// Copyright 2020 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto3";

package google.cloud.automl.v1beta1;

option go_package = "cloud.google.com/go/automl/apiv1beta1/automlpb;automlpb";
option java_multiple_files = true;
option java_package = "com.google.cloud.automl.v1beta1";
option php_namespace = "Google\\Cloud\\AutoMl\\V1beta1";
option ruby_package = "Google::Cloud::AutoML::V1beta1";

// Input configuration for ImportData Action.
//
// The format of input depends on the dataset_metadata of the Dataset into
// which the import is happening. As input source the
// [gcs_source][google.cloud.automl.v1beta1.InputConfig.gcs_source]
// is expected, unless specified otherwise. Additionally any input .CSV file
// by itself must be 100MB or smaller, unless specified otherwise.
// If an "example" file (that is, image, video etc.) with identical content
// (even if it had a different GCS_FILE_PATH) is mentioned multiple times,
// then its labels, bounding boxes etc. are appended. The same file should
// always be provided with the same ML_USE and GCS_FILE_PATH; if it is not,
// then these values are nondeterministically selected from the given ones.
//
// The formats are represented in EBNF with commas being literal and with
// non-terminal symbols defined near the end of this comment. The formats are:
//
// * For Image Classification:
//     CSV file(s) with each line in format:
//       ML_USE,GCS_FILE_PATH,LABEL,LABEL,...
//     GCS_FILE_PATH leads to an image of up to 30MB in size. Supported
//     extensions: .JPEG, .GIF, .PNG, .WEBP, .BMP, .TIFF, .ICO
//     For the MULTICLASS classification type, at most one LABEL is allowed
//     per image. If an image has not yet been labeled, then it should be
//     mentioned just once with no LABEL.
//     Some sample rows:
//       TRAIN,gs://folder/image1.jpg,daisy
//       TEST,gs://folder/image2.jpg,dandelion,tulip,rose
//       UNASSIGNED,gs://folder/image3.jpg,daisy
//       UNASSIGNED,gs://folder/image4.jpg
//
// * For Image Object Detection:
//     CSV file(s) with each line in format:
//       ML_USE,GCS_FILE_PATH,(LABEL,BOUNDING_BOX | ,,,,,,,)
//     GCS_FILE_PATH leads to an image of up to 30MB in size. Supported
//     extensions: .JPEG, .GIF, .PNG.
//     Each image is assumed to be exhaustively labeled. The minimum
//     allowed BOUNDING_BOX edge length is 0.01, and no more than 500
//     BOUNDING_BOX-es per image are allowed (one BOUNDING_BOX is defined
//     per line). If an image has not yet been labeled, then it should be
//     mentioned just once with no LABEL and the ",,,,,,," in place of the
//     BOUNDING_BOX. Images which are known to not contain any bounding
//     boxes should be labeled explicitly as "NEGATIVE_IMAGE", followed by
//     ",,,,,,," in place of the BOUNDING_BOX.
//     Sample rows:
//       TRAIN,gs://folder/image1.png,car,0.1,0.1,,,0.3,0.3,,
//       TRAIN,gs://folder/image1.png,bike,.7,.6,,,.8,.9,,
//       UNASSIGNED,gs://folder/im2.png,car,0.1,0.1,0.2,0.1,0.2,0.3,0.1,0.3
//       TEST,gs://folder/im3.png,,,,,,,,,
//       TRAIN,gs://folder/im4.png,NEGATIVE_IMAGE,,,,,,,,,
//
// * For Video Classification:
//     CSV file(s) with each line in format:
//       ML_USE,GCS_FILE_PATH
//     where the ML_USE VALIDATE value should not be used. The
//     GCS_FILE_PATH should lead to another .csv file which describes
//     examples that have the given ML_USE, using the following row format:
//       GCS_FILE_PATH,(LABEL,TIME_SEGMENT_START,TIME_SEGMENT_END | ,,)
//     Here GCS_FILE_PATH leads to a video of up to 50GB in size and up
//     to 3h duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.
//     TIME_SEGMENT_START and TIME_SEGMENT_END must be within the
//     length of the video, and the end has to be after the start. Any
//     segment of a video which has one or more labels on it is considered
//     a hard negative for all other labels. Any segment with no labels on
//     it is considered to be unknown. If a whole video is unknown, then
//     it should be mentioned just once with ",," in place of LABEL,
//     TIME_SEGMENT_START,TIME_SEGMENT_END.
//     Sample top level CSV file:
//       TRAIN,gs://folder/train_videos.csv
//       TEST,gs://folder/test_videos.csv
//       UNASSIGNED,gs://folder/other_videos.csv
//     Sample rows of a CSV file for a particular ML_USE:
//       gs://folder/video1.avi,car,120,180.000021
//       gs://folder/video1.avi,bike,150,180.000021
//       gs://folder/vid2.avi,car,0,60.5
//       gs://folder/vid3.avi,,,
//
// * For Video Object Tracking:
//     CSV file(s) with each line in format:
//       ML_USE,GCS_FILE_PATH
//     where the ML_USE VALIDATE value should not be used. The
//     GCS_FILE_PATH should lead to another .csv file which describes
//     examples that have the given ML_USE, using one of the following row
//     formats:
//       GCS_FILE_PATH,LABEL,[INSTANCE_ID],TIMESTAMP,BOUNDING_BOX
//     or
//       GCS_FILE_PATH,,,,,,,,,,
//     Here GCS_FILE_PATH leads to a video of up to 50GB in size and up
//     to 3h duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.
//     Providing INSTANCE_IDs can help to obtain a better model. When
//     a specific labeled entity leaves the video frame and shows up again
//     later, it is not required, albeit preferable, that the same
//     INSTANCE_ID is given to it.
//     TIMESTAMP must be within the length of the video; the BOUNDING_BOX
//     is assumed to be drawn on the video frame closest to the TIMESTAMP.
//     Any frame referenced by a TIMESTAMP is expected to be exhaustively
//     labeled, and no more than 500 BOUNDING_BOX-es per frame are allowed.
//     If a whole video is unknown, then it should be mentioned just once
//     with ",,,,,,,,,," in place of LABEL,[INSTANCE_ID],TIMESTAMP,
//     BOUNDING_BOX.
//     Sample top level CSV file:
//       TRAIN,gs://folder/train_videos.csv
//       TEST,gs://folder/test_videos.csv
//       UNASSIGNED,gs://folder/other_videos.csv
//     Seven sample rows of a CSV file for a particular ML_USE:
//       gs://folder/video1.avi,car,1,12.10,0.8,0.8,0.9,0.8,0.9,0.9,0.8,0.9
//       gs://folder/video1.avi,car,1,12.90,0.4,0.8,0.5,0.8,0.5,0.9,0.4,0.9
//       gs://folder/video1.avi,car,2,12.10,.4,.2,.5,.2,.5,.3,.4,.3
//       gs://folder/video1.avi,car,2,12.90,.8,.2,,,.9,.3,,
//       gs://folder/video1.avi,bike,,12.50,.45,.45,,,.55,.55,,
//       gs://folder/video2.avi,car,1,0,.1,.9,,,.9,.1,,
//       gs://folder/video2.avi,,,,,,,,,,,
//
// * For Text Extraction:
//     CSV file(s) with each line in format:
//       ML_USE,GCS_FILE_PATH
//     GCS_FILE_PATH leads to a .JSONL (that is, JSON Lines) file which
//     either imports text in-line or as documents. Any given
//     .JSONL file must be 100MB or smaller.
//     The in-line .JSONL file contains, per line, a proto that wraps a
//     TextSnippet proto (in JSON representation) followed by one or more
//     AnnotationPayload protos (called annotations), which have
//     display_name and text_extraction detail populated. The given text
//     is expected to be annotated exhaustively, for example, if you look
//     for animals and the text contains "dolphin" that is not labeled,
//     then "dolphin" is assumed to not be an animal. Any given text
//     snippet content must be 10KB or smaller, and also be UTF-8 NFC
//     encoded (ASCII already is).
//     The document .JSONL file contains, per line, a proto that wraps a
//     Document proto. The Document proto must have either document_text
//     or input_config set. In the document_text case, the Document proto
//     may also contain the spatial information of the document, including
//     layout, document dimension and page number. In the input_config
//     case, only PDF documents are supported for now, and each document
//     may be up to 2MB in size. Currently, annotations on documents cannot
//     be specified at import.
//     Three sample CSV rows:
//       TRAIN,gs://folder/file1.jsonl
//       VALIDATE,gs://folder/file2.jsonl
//       TEST,gs://folder/file3.jsonl
//     Sample in-line JSON Lines file for entity extraction (presented here
//     with artificial line breaks, but the only actual line break is
//     denoted by \n):
//       {
//         "document": {
//           "document_text": {"content": "dog cat"},
//           "layout": [
//             {
//               "text_segment": {
//                 "start_offset": 0,
//                 "end_offset": 3,
//               },
//               "page_number": 1,
//               "bounding_poly": {
//                 "normalized_vertices": [
//                   {"x": 0.1, "y": 0.1},
//                   {"x": 0.1, "y": 0.3},
//                   {"x": 0.3, "y": 0.3},
//                   {"x": 0.3, "y": 0.1},
//                 ],
//               },
//               "text_segment_type": TOKEN,
//             },
//             {
//               "text_segment": {
//                 "start_offset": 4,
//                 "end_offset": 7,
//               },
//               "page_number": 1,
//               "bounding_poly": {
//                 "normalized_vertices": [
//                   {"x": 0.4, "y": 0.1},
//                   {"x": 0.4, "y": 0.3},
//                   {"x": 0.8, "y": 0.3},
//                   {"x": 0.8, "y": 0.1},
//                 ],
//               },
//               "text_segment_type": TOKEN,
//             }
//           ],
//           "document_dimensions": {
//             "width": 8.27,
//             "height": 11.69,
//             "unit": INCH,
//           },
//           "page_count": 1,
//         },
//         "annotations": [
//           {
//             "display_name": "animal",
//             "text_extraction": {"text_segment": {"start_offset": 0,
//               "end_offset": 3}}
//           },
//           {
//             "display_name": "animal",
//             "text_extraction": {"text_segment": {"start_offset": 4,
//               "end_offset": 7}}
//           }
//         ],
//       }\n
//       {
//         "text_snippet": {
//           "content": "This dog is good."
//         },
//         "annotations": [
//           {
//             "display_name": "animal",
//             "text_extraction": {
//               "text_segment": {"start_offset": 5, "end_offset": 8}
//             }
//           }
//         ]
//       }
//     Sample document JSON Lines file (presented here with artificial line
//     breaks, but the only actual line break is denoted by \n):
//       {
//         "document": {
//           "input_config": {
//             "gcs_source": { "input_uris": [ "gs://folder/document1.pdf" ] }
//           }
//         }
//       }\n
//       {
//         "document": {
//           "input_config": {
//             "gcs_source": { "input_uris": [ "gs://folder/document2.pdf" ] }
//           }
//         }
//       }
//
// * For Text Classification:
//     CSV file(s) with each line in format:
//       ML_USE,(TEXT_SNIPPET | GCS_FILE_PATH),LABEL,LABEL,...
//     TEXT_SNIPPET and GCS_FILE_PATH are distinguished by a pattern. If
//     the column content is a valid GCS file path, i.e. prefixed by
//     "gs://", it is treated as a GCS_FILE_PATH; otherwise, if the content
//     is enclosed within double quotes (""), it is treated as a
//     TEXT_SNIPPET. In the GCS_FILE_PATH case, the path must lead to a
//     .txt file with UTF-8 encoding, for example,
//     "gs://folder/content.txt", and the content in it is extracted
//     as a text snippet. In the TEXT_SNIPPET case, the column content,
//     excluding quotes, is treated as the text snippet to import. In
//     both cases, the text snippet/file size must be within 128kB.
//     At most 100 unique labels are allowed per CSV row.
//     Sample rows:
//       TRAIN,"They have bad food and very rude",RudeService,BadFood
//       TRAIN,gs://folder/content.txt,SlowService
//       TEST,"Typically always bad service there.",RudeService
//       VALIDATE,"Stomach ache to go.",BadFood
//
// * For Text Sentiment:
//     CSV file(s) with each line in format:
//       ML_USE,(TEXT_SNIPPET | GCS_FILE_PATH),SENTIMENT
//     TEXT_SNIPPET and GCS_FILE_PATH are distinguished by a pattern. If
//     the column content is a valid GCS file path, that is, prefixed by
//     "gs://", it is treated as a GCS_FILE_PATH, otherwise it is treated
//     as a TEXT_SNIPPET. In the GCS_FILE_PATH case, the path must lead to
//     a .txt file with UTF-8 encoding, for example,
//     "gs://folder/content.txt", and the content in it is extracted as a
//     text snippet. In the TEXT_SNIPPET case, the column content itself
//     is treated as the text snippet to import. In both cases, the text
//     snippet must be up to 500 characters long.
//     Sample rows:
//       TRAIN,"@freewrytin this is way too good for your product",2
//       TRAIN,"I need this product so bad",3
//       TEST,"Thank you for this product.",4
//       VALIDATE,gs://folder/content.txt,2
//
// * For Tables:
//     Either
//     [gcs_source][google.cloud.automl.v1beta1.InputConfig.gcs_source] or
//     [bigquery_source][google.cloud.automl.v1beta1.InputConfig.bigquery_source]
//     can be used. All inputs are concatenated into a single
//     [primary_table][google.cloud.automl.v1beta1.TablesDatasetMetadata.primary_table_name].
//     For gcs_source:
//       CSV file(s), where the first row of the first file is the header,
//       containing unique column names. If the first row of a subsequent
//       file is the same as the header, then it is also treated as a
//       header. All other rows contain values for the corresponding
//       columns.
//       Each .CSV file by itself must be 10GB or smaller, and their total
//       size must be 100GB or smaller.
//       First three sample rows of a CSV file:
//         "Id","First Name","Last Name","Dob","Addresses"
//         "1","John","Doe","1968-01-22","[{"status":"current","address":"123_First_Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456_Main_Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]"
//         "2","Jane","Doe","1980-10-16","[{"status":"current","address":"789_Any_Avenue","city":"Albany","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321_Main_Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]"
//     For bigquery_source:
//       A URI of a BigQuery table. The user data size of the BigQuery
//       table must be 100GB or smaller.
//     An imported table must have between 2 and 1,000 columns, inclusive,
//     and between 1,000 and 100,000,000 rows, inclusive. At most 5 import
//     data operations may run in parallel.
//
// Definitions:
//   ML_USE = "TRAIN" | "VALIDATE" | "TEST" | "UNASSIGNED"
//     Describes how the given example (file) should be used for model
//     training. "UNASSIGNED" can be used when the user has no preference.
//   GCS_FILE_PATH = A path to a file on GCS, e.g. "gs://folder/image1.png".
//   LABEL = A display name of an object on an image, video etc., e.g. "dog".
//     Must be up to 32 characters long and can consist only of ASCII
//     Latin letters A-Z and a-z, underscores (_), and ASCII digits 0-9.
//     For each label an AnnotationSpec is created whose display_name
//     becomes the label; AnnotationSpecs are given back in predictions.
//   INSTANCE_ID = A positive integer that identifies a specific instance of
//     a labeled entity on an example. Used e.g. to track two cars on
//     a video while being able to tell apart which one is which.
//   BOUNDING_BOX = VERTEX,VERTEX,VERTEX,VERTEX | VERTEX,,,VERTEX,,
//     A rectangle parallel to the frame of the example (image,
//     video). If 4 vertices are given they are connected by edges
//     in the order provided; if 2 are given they are recognized
//     as diagonally opposite vertices of the rectangle (a worked
//     example follows these definitions).
//   VERTEX = COORDINATE,COORDINATE
//     The first coordinate is horizontal (x), the second is vertical (y).
//   COORDINATE = A float in the 0 to 1 range, relative to the total length
//     of the image or video in the given dimension. For fractions the
//     leading non-decimal 0 can be omitted (i.e. 0.3 = .3).
//     Point 0,0 is in the top left.
//   TIME_SEGMENT_START = TIME_OFFSET
//     Expresses a beginning, inclusive, of a time segment
//     within an example that has a time dimension (e.g. video).
//   TIME_SEGMENT_END = TIME_OFFSET
//     Expresses an end, exclusive, of a time segment within
//     an example that has a time dimension (e.g. video).
//   TIME_OFFSET = A number of seconds as measured from the start of an
//     example (e.g. video). Fractions are allowed, up to a
//     microsecond precision. "inf" is allowed, and it means the end
//     of the example.
//   TEXT_SNIPPET = The content of a text snippet, UTF-8 encoded, enclosed
//     within double quotes ("").
//   SENTIMENT = An integer between 0 and
//     Dataset.text_sentiment_dataset_metadata.sentiment_max
//     (inclusive). Describes the ordinal of the sentiment - a higher
//     value means a more positive sentiment. All the values are
//     completely relative, i.e. neither 0 needs to mean a negative or
//     neutral sentiment nor sentiment_max needs to mean a positive one
//     - it is just required that 0 is the least positive sentiment
//     in the data, and sentiment_max is the most positive one.
//     The SENTIMENT shouldn't be confused with "score" or "magnitude"
//     from the previous Natural Language Sentiment Analysis API.
//     All SENTIMENT values between 0 and sentiment_max must be
//     represented in the imported data. On prediction the same 0 to
//     sentiment_max range will be used. The difference between
//     neighboring sentiment values need not be uniform, e.g. 1 and
//     2 may be similar whereas the difference between 2 and 3 may be
//     large.
//
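// As a worked illustration of the BOUNDING_BOX grammar above (the box
// itself is hypothetical): a box covering the top-left quadrant of an
// image can be written either with all 4 vertices, connected by edges in
// the order given:
//   0,0,0.5,0,0.5,0.5,0,0.5
// or with just the 2 diagonally opposite vertices:
//   0,0,,,0.5,0.5,,
//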
// Errors:
//   If any of the provided CSV files can't be parsed or if more than a
//   certain percentage of CSV rows cannot be processed then the operation
//   fails and nothing is imported. Regardless of overall success or failure
//   the per-row failures, up to a certain count cap, are listed in
//   Operation.metadata.partial_failures.
//
message InputConfig {
  // The source of the input.
  oneof source {
    // The Google Cloud Storage location for the input content.
    // In ImportData, the gcs_source points to a CSV file with the
    // structure described above.
    GcsSource gcs_source = 1;

    // The BigQuery location for the input content.
    BigQuerySource bigquery_source = 3;
  }

  // Additional domain-specific parameters describing the semantics of the
  // imported data; any string must be up to 25000 characters long.
  //
  // * For Tables:
  //    `schema_inference_version` - (integer) Required. The version of the
  //        algorithm that should be used for the initial inference of the
  //        schema (columns' DataTypes) of the table the data is being
  //        imported into. Allowed values: "1".
  map<string, string> params = 2;
}
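
// A minimal sketch of an InputConfig, in protobuf text format, for a Tables
// import (the bucket path is hypothetical; `schema_inference_version` is
// the Tables parameter documented above):
//
//   gcs_source {
//     input_uris: "gs://folder/import.csv"
//   }
//   params { key: "schema_inference_version" value: "1" }
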
// Input configuration for BatchPredict Action.
//
// The format of input depends on the ML problem of the model used for
// prediction. As input source the
// [gcs_source][google.cloud.automl.v1beta1.InputConfig.gcs_source]
// is expected, unless specified otherwise.
//
// The formats are represented in EBNF with commas being literal and with
// non-terminal symbols defined near the end of this comment. The formats
// are:
//
// * For Image Classification:
//     CSV file(s) with each line having just a single column:
//       GCS_FILE_PATH
//     which leads to an image of up to 30MB in size. Supported
//     extensions: .JPEG, .GIF, .PNG. This path is treated as the ID in
//     the Batch predict output.
//     Three sample rows:
//       gs://folder/image1.jpeg
//       gs://folder/image2.gif
//       gs://folder/image3.png
//
// * For Image Object Detection:
//     CSV file(s) with each line having just a single column:
//       GCS_FILE_PATH
//     which leads to an image of up to 30MB in size. Supported
//     extensions: .JPEG, .GIF, .PNG. This path is treated as the ID in
//     the Batch predict output.
//     Three sample rows:
//       gs://folder/image1.jpeg
//       gs://folder/image2.gif
//       gs://folder/image3.png
//
// * For Video Classification:
//     CSV file(s) with each line in format:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END
//     GCS_FILE_PATH leads to a video of up to 50GB in size and up to 3h
//     duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.
//     TIME_SEGMENT_START and TIME_SEGMENT_END must be within the
//     length of the video, and the end has to be after the start.
//     Three sample rows:
//       gs://folder/video1.mp4,10,40
//       gs://folder/video1.mp4,20,60
//       gs://folder/vid2.mov,0,inf
//
// * For Video Object Tracking:
//     CSV file(s) with each line in format:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END
//     GCS_FILE_PATH leads to a video of up to 50GB in size and up to 3h
//     duration. Supported extensions: .MOV, .MPEG4, .MP4, .AVI.
//     TIME_SEGMENT_START and TIME_SEGMENT_END must be within the
//     length of the video, and the end has to be after the start.
//     Three sample rows:
//       gs://folder/video1.mp4,10,240
//       gs://folder/video1.mp4,300,360
//       gs://folder/vid2.mov,0,inf
//
// * For Text Classification:
//     CSV file(s) with each line having just a single column:
//       GCS_FILE_PATH | TEXT_SNIPPET
//     Any given text file can be up to 128kB in size.
//     Any given text snippet content must have 60,000 characters or less.
//     Supported file extensions: .txt, .pdf
//     Three sample rows:
//       gs://folder/text1.txt
//       "Some text content to predict"
//       gs://folder/text3.pdf
//
// * For Text Sentiment:
//     CSV file(s) with each line having just a single column:
//       GCS_FILE_PATH | TEXT_SNIPPET
//     Any given text file can be up to 128kB in size.
//     Any given text snippet content must have 500 characters or less.
//     Supported file extensions: .txt, .pdf
//     Three sample rows:
//       gs://folder/text1.txt
//       "Some text content to predict"
//       gs://folder/text3.pdf
//
// * For Text Extraction:
//     .JSONL (i.e. JSON Lines) file(s) which either provide text in-line
//     or as documents (for a single BatchPredict call only one of these
//     formats may be used).
//     The in-line .JSONL file(s) contain, per line, a proto that
//     wraps a temporary user-assigned TextSnippet ID (string up to 2000
//     characters long) called "id", a TextSnippet proto (in
//     JSON representation) and zero or more TextFeature protos. Any given
//     text snippet content must have 30,000 characters or less, and also
//     be UTF-8 NFC encoded (ASCII already is). The IDs provided should be
//     unique.
//     The document .JSONL file(s) contain, per line, a proto that wraps a
//     Document proto with input_config set. Only PDF documents are
//     supported for now, and each document must be up to 2MB in size.
//     Any given .JSONL file must be 100MB or smaller, and no more than 20
//     files may be given.
//     Sample in-line JSON Lines file (presented here with artificial line
//     breaks, but the only actual line break is denoted by \n):
//       {
//         "id": "my_first_id",
//         "text_snippet": { "content": "dog car cat"},
//         "text_features": [
//           {
//             "text_segment": {"start_offset": 4, "end_offset": 6},
//             "structural_type": PARAGRAPH,
//             "bounding_poly": {
//               "normalized_vertices": [
//                 {"x": 0.1, "y": 0.1},
//                 {"x": 0.1, "y": 0.3},
//                 {"x": 0.3, "y": 0.3},
//                 {"x": 0.3, "y": 0.1},
//               ]
//             },
//           }
//         ],
//       }\n
//       {
//         "id": "2",
//         "text_snippet": {
//           "content": "An elaborate content",
//           "mime_type": "text/plain"
//         }
//       }
//     Sample document JSON Lines file (presented here with artificial line
//     breaks, but the only actual line break is denoted by \n):
//       {
//         "document": {
//           "input_config": {
//             "gcs_source": { "input_uris": [ "gs://folder/document1.pdf" ] }
//           }
//         }
//       }\n
//       {
//         "document": {
//           "input_config": {
//             "gcs_source": { "input_uris": [ "gs://folder/document2.pdf" ] }
//           }
//         }
//       }
//
// * For Tables:
//     Either
//     [gcs_source][google.cloud.automl.v1beta1.InputConfig.gcs_source] or
//     [bigquery_source][google.cloud.automl.v1beta1.InputConfig.bigquery_source].
//     GCS case:
//       CSV file(s), each by itself 10GB or smaller and with a total size
//       of 100GB or smaller, where the first file must have a header
//       containing column names. If the first row of a subsequent file is
//       the same as the header, then it is also treated as a header. All
//       other rows contain values for the corresponding columns.
//       The column names must contain the model's
//       [input_feature_column_specs'][google.cloud.automl.v1beta1.TablesModelMetadata.input_feature_column_specs]
//       [display_name-s][google.cloud.automl.v1beta1.ColumnSpec.display_name]
//       (order doesn't matter). The columns corresponding to the model's
//       input feature column specs must contain values compatible with the
//       column spec's data types. Prediction on all the rows, i.e. the CSV
//       lines, will be attempted. For FORECASTING
//       [prediction_type][google.cloud.automl.v1beta1.TablesModelMetadata.prediction_type]:
//       all columns having
//       [TIME_SERIES_AVAILABLE_PAST_ONLY][google.cloud.automl.v1beta1.ColumnSpec.ForecastingMetadata.ColumnType]
//       type will be ignored.
//       First three sample rows of a CSV file:
//         "First Name","Last Name","Dob","Addresses"
//         "John","Doe","1968-01-22","[{"status":"current","address":"123_First_Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456_Main_Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]"
//         "Jane","Doe","1980-10-16","[{"status":"current","address":"789_Any_Avenue","city":"Albany","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321_Main_Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]"
//     BigQuery case:
//       A URI of a BigQuery table. The user data size of the BigQuery
//       table must be 100GB or smaller.
//       The column names must contain the model's
//       [input_feature_column_specs'][google.cloud.automl.v1beta1.TablesModelMetadata.input_feature_column_specs]
//       [display_name-s][google.cloud.automl.v1beta1.ColumnSpec.display_name]
//       (order doesn't matter). The columns corresponding to the model's
//       input feature column specs must contain values compatible with the
//       column spec's data types. Prediction on all the rows of the table
//       will be attempted. For FORECASTING
//       [prediction_type][google.cloud.automl.v1beta1.TablesModelMetadata.prediction_type]:
//       all columns having
//       [TIME_SERIES_AVAILABLE_PAST_ONLY][google.cloud.automl.v1beta1.ColumnSpec.ForecastingMetadata.ColumnType]
//       type will be ignored.
//
// Definitions:
//   GCS_FILE_PATH = A path to a file on GCS, e.g. "gs://folder/video.avi".
//   TEXT_SNIPPET = The content of a text snippet, UTF-8 encoded, enclosed
//     within double quotes ("").
//   TIME_SEGMENT_START = TIME_OFFSET
//     Expresses a beginning, inclusive, of a time segment
//     within an example that has a time dimension (e.g. video).
//   TIME_SEGMENT_END = TIME_OFFSET
//     Expresses an end, exclusive, of a time segment within
//     an example that has a time dimension (e.g. video).
//   TIME_OFFSET = A number of seconds as measured from the start of an
//     example (e.g. video). Fractions are allowed, up to a
//     microsecond precision. "inf" is allowed and it means the end
//     of the example.
//
// Errors:
//   If any of the provided CSV files can't be parsed or if more than a
//   certain percentage of CSV rows cannot be processed then the operation
//   fails and prediction does not happen. Regardless of overall success or
//   failure the per-row failures, up to a certain count cap, will be
//   listed in Operation.metadata.partial_failures.
message BatchPredictInputConfig {
  // Required. The source of the input.
  oneof source {
    // The Google Cloud Storage location for the input content.
    GcsSource gcs_source = 1;

    // The BigQuery location for the input content.
    BigQuerySource bigquery_source = 2;
  }
}
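
// A minimal sketch of a BatchPredictInputConfig, in protobuf text format,
// reading batch inputs from a hypothetical CSV file:
//
//   gcs_source {
//     input_uris: "gs://folder/batch_predict.csv"
//   }
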
// Input configuration of a [Document][google.cloud.automl.v1beta1.Document].
message DocumentInputConfig {
  // The Google Cloud Storage location of the document file. Only a single
  // path should be given.
  // Max supported size: 512MB.
  // Supported extensions: .PDF.
  GcsSource gcs_source = 1;
}

// Output configuration for ExportData Action.
//
// * For Translation:
//     CSV file `translation.csv`, with each line in format:
//       ML_USE,GCS_FILE_PATH
//     GCS_FILE_PATH leads to a .TSV file which describes examples that
//     have the given ML_USE, using the following row format per line:
//       TEXT_SNIPPET (in source language) \t TEXT_SNIPPET (in target
//       language)
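//     For illustration (the paths and text are hypothetical), the exported
//     `translation.csv` could contain rows such as:
//       TRAIN,gs://folder/train.tsv
//       TEST,gs://folder/test.tsv
//     and a row of a .TSV file (with \t a literal tab) could look like:
//       Hello, world!\tBonjour le monde!
//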
// * For Tables:
//     Output depends on whether the dataset was imported from GCS or
//     BigQuery.
//     GCS case:
//       [gcs_destination][google.cloud.automl.v1beta1.OutputConfig.gcs_destination]
//       must be set. Exported are CSV file(s) `tables_1.csv`,
//       `tables_2.csv`,...,`tables_N.csv` with each having as header line
//       the table's column names, and all other lines contain values for
//       the header columns.
//     BigQuery case:
//       [bigquery_destination][google.cloud.automl.v1beta1.OutputConfig.bigquery_destination]
//       pointing to a BigQuery project must be set. In the given project a
//       new dataset will be created with name
//       `export_data_<automl-dataset-display-name>_<timestamp-of-export-call>`
//       where <automl-dataset-display-name> will be made
//       BigQuery-dataset-name compatible (e.g. most special characters
//       will become underscores), and the timestamp will be in
//       YYYY_MM_DDThh_mm_ss_sssZ "based on ISO-8601" format. In that
//       dataset a new table called `primary_table` will be created, and
//       filled with precisely the same data as that obtained on import.
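//       For example (hypothetical name and time), a dataset displayed as
//       "My Dataset", exported on Jan 1, 2020 at 12:00:00.000 UTC, could
//       land in a BigQuery dataset named:
//         export_data_My_Dataset_2020_01_01T12_00_00_000Z
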
message OutputConfig {
  // Required. The destination of the output.
  oneof destination {
    // The Google Cloud Storage location where the output is to be written
    // to. For Image Object Detection, Text Extraction, Video
    // Classification and Tables, in the given directory a new directory
    // will be created with name:
    //   export_data-<dataset-display-name>-<timestamp-of-export-call>
    // where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
    // All export output will be written into that directory.
    GcsDestination gcs_destination = 1;

    // The BigQuery location where the output is to be written to.
    BigQueryDestination bigquery_destination = 2;
  }
}
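
// A minimal sketch of an OutputConfig, in protobuf text format, exporting
// to a hypothetical GCS directory:
//
//   gcs_destination {
//     output_uri_prefix: "gs://folder/export"
//   }
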
// Output configuration for BatchPredict Action.
//
// As destination the
// [gcs_destination][google.cloud.automl.v1beta1.BatchPredictOutputConfig.gcs_destination]
// must be set unless specified otherwise for a domain. If gcs_destination
// is set then in the given directory a new directory is created. Its name
// will be
//   "prediction-<model-display-name>-<timestamp-of-prediction-call>",
// where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format. Its
// contents depend on the ML problem the predictions are made for.
//
// * For Image Classification:
//     In the created directory files `image_classification_1.jsonl`,
//     `image_classification_2.jsonl`,...,`image_classification_N.jsonl`
//     will be created, where N may be 1, and depends on the
//     total number of the successfully predicted images and annotations.
//     A single image will be listed only once with all its annotations,
//     and its annotations will never be split across files.
//     Each .JSONL file will contain, per line, a JSON representation of a
//     proto that wraps the image's "ID" : "<id_value>" followed by a list
//     of zero or more AnnotationPayload protos (called annotations), which
//     have classification detail populated.
//     If prediction for any image failed (partially or completely), then
//     additional `errors_1.jsonl`, `errors_2.jsonl`,..., `errors_N.jsonl`
//     files will be created (N depends on the total number of failed
//     predictions). These files will have a JSON representation of a proto
//     that wraps the same "ID" : "<id_value>" but here followed by
//     exactly one
//     [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//     containing only `code` and `message` fields.
//
// * For Image Object Detection:
//     In the created directory files `image_object_detection_1.jsonl`,
//     `image_object_detection_2.jsonl`,...,`image_object_detection_N.jsonl`
//     will be created, where N may be 1, and depends on the
//     total number of the successfully predicted images and annotations.
//     Each .JSONL file will contain, per line, a JSON representation of a
//     proto that wraps the image's "ID" : "<id_value>" followed by a list
//     of zero or more AnnotationPayload protos (called annotations), which
//     have image_object_detection detail populated. A single image will
//     be listed only once with all its annotations, and its annotations
//     will never be split across files.
//     If prediction for any image failed (partially or completely), then
//     additional `errors_1.jsonl`, `errors_2.jsonl`,..., `errors_N.jsonl`
//     files will be created (N depends on the total number of failed
//     predictions). These files will have a JSON representation of a proto
//     that wraps the same "ID" : "<id_value>" but here followed by
//     exactly one
//     [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//     containing only `code` and `message` fields.
//
// * For Video Classification:
//     In the created directory a video_classification.csv file, and a
//     .JSON file for each video classification requested in the input
//     (i.e. each line in the given CSV(s)), will be created.
//
//     The format of video_classification.csv is:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END,JSON_FILE_NAME,STATUS
//     where:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END = matches 1 to 1
//         the prediction input lines (i.e. video_classification.csv has
//         precisely the same number of lines as the prediction input had).
//       JSON_FILE_NAME = Name of the .JSON file in the output directory,
//         which contains prediction responses for the video time segment.
//       STATUS = "OK" if prediction completed successfully, or an error
//         code with message otherwise. If STATUS is not "OK" then the
//         .JSON file for that line may not exist or be empty.
//
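//     A sample video_classification.csv line (with a hypothetical
//     JSON_FILE_NAME) could look like:
//       gs://folder/video1.mp4,10,40,video_classification_1.json,OK
//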
//     Each .JSON file, assuming STATUS is "OK", will contain a list of
//     AnnotationPayload protos in JSON format, which are the predictions
//     for the video time segment the file is assigned to in the
//     video_classification.csv. All AnnotationPayload protos will have
//     the video_classification field set, and will be sorted by the
//     video_classification.type field (note that the returned types are
//     governed by the `classification_types` parameter in
//     [PredictService.BatchPredictRequest.params][]).
//
// * For Video Object Tracking:
//     In the created directory a video_object_tracking.csv file will be
//     created, and multiple files video_object_tracking_1.json,
//     video_object_tracking_2.json,..., video_object_tracking_N.json,
//     where N is the number of requests in the input (i.e. the number of
//     lines in the given CSV(s)).
//
//     The format of video_object_tracking.csv is:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END,JSON_FILE_NAME,STATUS
//     where:
//       GCS_FILE_PATH,TIME_SEGMENT_START,TIME_SEGMENT_END = matches 1 to 1
//         the prediction input lines (i.e. video_object_tracking.csv has
//         precisely the same number of lines as the prediction input had).
//       JSON_FILE_NAME = Name of the .JSON file in the output directory,
//         which contains prediction responses for the video time segment.
//       STATUS = "OK" if prediction completed successfully, or an error
//         code with message otherwise. If STATUS is not "OK" then the
//         .JSON file for that line may not exist or be empty.
//
//     Each .JSON file, assuming STATUS is "OK", will contain a list of
//     AnnotationPayload protos in JSON format, which are the predictions
//     for each frame of the video time segment the file is assigned to in
//     video_object_tracking.csv. All AnnotationPayload protos will have
//     the video_object_tracking field set.
//
// * For Text Classification:
//     In the created directory files `text_classification_1.jsonl`,
//     `text_classification_2.jsonl`,...,`text_classification_N.jsonl`
//     will be created, where N may be 1, and depends on the
//     total number of inputs and annotations found.
//
//     Each .JSONL file will contain, per line, a JSON representation of a
//     proto that wraps the input text snippet or input text file and a
//     list of zero or more AnnotationPayload protos (called annotations),
//     which have classification detail populated. A single text snippet
//     or file will be listed only once with all its annotations, and its
//     annotations will never be split across files.
//
//     If prediction for any text snippet or file failed (partially or
//     completely), then additional `errors_1.jsonl`, `errors_2.jsonl`,...,
//     `errors_N.jsonl` files will be created (N depends on the total
//     number of failed predictions). These files will have a JSON
//     representation of a proto that wraps the input text snippet or
//     input text file followed by exactly one
//     [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//     containing only `code` and `message`.
//
// * For Text Sentiment:
//     In the created directory files `text_sentiment_1.jsonl`,
//     `text_sentiment_2.jsonl`,...,`text_sentiment_N.jsonl`
//     will be created, where N may be 1, and depends on the
//     total number of inputs and annotations found.
//
//     Each .JSONL file will contain, per line, a JSON representation of a
//     proto that wraps the input text snippet or input text file and a
//     list of zero or more AnnotationPayload protos (called annotations),
//     which have text_sentiment detail populated. A single text snippet
//     or file will be listed only once with all its annotations, and its
//     annotations will never be split across files.
//
//     If prediction for any text snippet or file failed (partially or
//     completely), then additional `errors_1.jsonl`, `errors_2.jsonl`,...,
//     `errors_N.jsonl` files will be created (N depends on the total
//     number of failed predictions). These files will have a JSON
//     representation of a proto that wraps the input text snippet or
//     input text file followed by exactly one
//     [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//     containing only `code` and `message`.
//
// * For Text Extraction:
//     In the created directory files `text_extraction_1.jsonl`,
//     `text_extraction_2.jsonl`,...,`text_extraction_N.jsonl`
//     will be created, where N may be 1, and depends on the
//     total number of inputs and annotations found.
//     The contents of these .JSONL file(s) depend on whether the input
//     used inline text or documents.
//     If the input was inline, then each .JSONL file will contain, per
//     line, a JSON representation of a proto that wraps the request-given
//     text snippet's "id" (if specified), followed by the input text
//     snippet, and a list of zero or more
//     AnnotationPayload protos (called annotations), which have
//     text_extraction detail populated. A single text snippet will be
//     listed only once with all its annotations, and its annotations will
//     never be split across files.
//     If the input used documents, then each .JSONL file will contain,
//     per line, a JSON representation of a proto that wraps the
//     request-given document proto, followed by its OCR-ed representation
//     in the form of a text snippet, finally followed by a list of zero
//     or more AnnotationPayload protos (called annotations), which have
//     text_extraction detail populated and refer, via their indices, to
//     the OCR-ed text snippet. A single document (and its text snippet)
//     will be listed only once with all its annotations, and its
//     annotations will never be split across files.
//     If prediction for any text snippet failed (partially or completely),
//     then additional `errors_1.jsonl`, `errors_2.jsonl`,...,
//     `errors_N.jsonl` files will be created (N depends on the total
//     number of failed predictions). These files will have a JSON
//     representation of a proto that wraps either the "id" : "<id_value>"
//     (in case of inline) or the document proto (in case of document) but
//     here followed by exactly one
//     [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//     containing only `code` and `message`.
//
// * For Tables:
//     Output depends on whether
//     [gcs_destination][google.cloud.automl.v1beta1.BatchPredictOutputConfig.gcs_destination]
//     or
//     [bigquery_destination][google.cloud.automl.v1beta1.BatchPredictOutputConfig.bigquery_destination]
//     is set (either is allowed).
//     GCS case:
//       In the created directory files `tables_1.csv`, `tables_2.csv`,...,
//       `tables_N.csv` will be created, where N may be 1, and depends on
//       the total number of the successfully predicted rows.
//       For all CLASSIFICATION
//       [prediction_type-s][google.cloud.automl.v1beta1.TablesModelMetadata.prediction_type]:
//         Each .csv file will contain a header, listing all columns'
//         [display_name-s][google.cloud.automl.v1beta1.ColumnSpec.display_name]
//         given on input followed by M target column names in the format
//         of
//         "<[target_column_specs][google.cloud.automl.v1beta1.TablesModelMetadata.target_column_spec]
//         [display_name][google.cloud.automl.v1beta1.ColumnSpec.display_name]>_<target value>_score"
//         where M is the number of distinct target values, i.e. the
//         number of distinct values in the target column of the table
//         used to train the model. Subsequent lines will contain the
//         respective values of successfully predicted rows, with the
//         last, i.e. the target, columns having the corresponding
//         prediction
//         [scores][google.cloud.automl.v1beta1.TablesAnnotation.score].
//       For REGRESSION and FORECASTING
//       [prediction_type-s][google.cloud.automl.v1beta1.TablesModelMetadata.prediction_type]:
//         Each .csv file will contain a header, listing all columns'
//         [display_name-s][google.cloud.automl.v1beta1.ColumnSpec.display_name]
//         given on input followed by the predicted target column with
//         name in the format of
//         "predicted_<[target_column_specs][google.cloud.automl.v1beta1.TablesModelMetadata.target_column_spec]
//         [display_name][google.cloud.automl.v1beta1.ColumnSpec.display_name]>"
//         Subsequent lines will contain the respective values of
//         successfully predicted rows, with the last, i.e. the target,
//         column having the predicted target value.
//       If prediction for any row failed, then additional `errors_1.csv`,
//       `errors_2.csv`,..., `errors_N.csv` files will be created (N
//       depends on the total number of failed rows). These files will
//       have a format analogous to `tables_*.csv`, but always with a
//       single target column having
//       [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//       represented as a JSON string, and containing only `code` and
//       `message`.
//     BigQuery case:
//       [bigquery_destination][google.cloud.automl.v1beta1.OutputConfig.bigquery_destination]
//       pointing to a BigQuery project must be set. In the given project a
//       new dataset will be created with name
//       `prediction_<model-display-name>_<timestamp-of-prediction-call>`
//       where <model-display-name> will be made
//       BigQuery-dataset-name compatible (e.g. most special characters
//       will become underscores), and the timestamp will be in
//       YYYY_MM_DDThh_mm_ss_sssZ "based on ISO-8601" format. In the
//       dataset two tables will be created, `predictions`, and `errors`.
//       The `predictions` table's column names will be the input columns'
//       [display_name-s][google.cloud.automl.v1beta1.ColumnSpec.display_name]
//       followed by the target column with name in the format of
//       "predicted_<[target_column_specs][google.cloud.automl.v1beta1.TablesModelMetadata.target_column_spec]
//       [display_name][google.cloud.automl.v1beta1.ColumnSpec.display_name]>"
//       The input feature columns will contain the respective values of
//       successfully predicted rows, with the target column having an
//       ARRAY of
//       [AnnotationPayloads][google.cloud.automl.v1beta1.AnnotationPayload],
//       represented as STRUCT-s, containing
//       [TablesAnnotation][google.cloud.automl.v1beta1.TablesAnnotation].
//       The `errors` table contains rows for which the prediction has
//       failed; it has analogous input columns while the target column
//       name is in the format of
//       "errors_<[target_column_specs][google.cloud.automl.v1beta1.TablesModelMetadata.target_column_spec]
//       [display_name][google.cloud.automl.v1beta1.ColumnSpec.display_name]>",
//       and as a value has
//       [`google.rpc.Status`](https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto)
//       represented as a STRUCT, and containing only `code` and `message`.
message BatchPredictOutputConfig {
  // Required. The destination of the output.
  oneof destination {
    // The Google Cloud Storage location of the directory where the output
    // is to be written to.
    GcsDestination gcs_destination = 1;

    // The BigQuery location where the output is to be written to.
    BigQueryDestination bigquery_destination = 2;
  }
}
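
// A minimal sketch of a BatchPredictOutputConfig, in protobuf text format,
// directing Tables predictions to a hypothetical BigQuery project:
//
//   bigquery_destination {
//     output_uri: "bq://projectId"
//   }
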
// Output configuration for ModelExport Action.
message ModelExportOutputConfig {
  // Required. The destination of the output.
  oneof destination {
    // The Google Cloud Storage location where the model is to be written
    // to. This location may only be set for the following model formats:
    //   "tflite", "edgetpu_tflite", "tf_saved_model", "tf_js", "core_ml".
    //
    // Under the directory given as the destination a new one with name
    // "model-export-<model-display-name>-<timestamp-of-export-call>",
    // where the timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format,
    // will be created. Inside it, the model and any of its supporting
    // files will be written.
    GcsDestination gcs_destination = 1;

    // The GCR location where the model image is to be pushed to. This
    // location may only be set for the following model formats:
    //   "docker".
    //
    // The model image will be created under the given URI.
    GcrDestination gcr_destination = 3;
  }

  // The format in which the model must be exported. The available, and
  // default, formats depend on the problem and model type (if a given
  // problem and type combination doesn't have a format listed, it means
  // its models are not exportable):
  //
  // * For Image Classification mobile-low-latency-1, mobile-versatile-1,
  //     mobile-high-accuracy-1:
  //     "tflite" (default), "edgetpu_tflite", "tf_saved_model", "tf_js",
  //     "docker".
  //
  // * For Image Classification mobile-core-ml-low-latency-1,
  //     mobile-core-ml-versatile-1, mobile-core-ml-high-accuracy-1:
  //     "core_ml" (default).
  //
  // * For Image Object Detection mobile-low-latency-1, mobile-versatile-1,
  //     mobile-high-accuracy-1:
  //     "tflite", "tf_saved_model", "tf_js".
  //
  // * For Video Classification cloud:
  //     "tf_saved_model".
  //
  // * For Video Object Tracking cloud:
  //     "tf_saved_model".
  //
  // * For Video Object Tracking mobile-versatile-1:
  //     "tflite", "edgetpu_tflite", "tf_saved_model", "docker".
  //
  // * For Video Object Tracking mobile-coral-versatile-1:
  //     "tflite", "edgetpu_tflite", "docker".
  //
  // * For Video Object Tracking mobile-coral-low-latency-1:
  //     "tflite", "edgetpu_tflite", "docker".
  //
  // * For Video Object Tracking mobile-jetson-versatile-1:
  //     "tf_saved_model", "docker".
  //
  // * For Tables:
  //     "docker".
  //
  // Formats description:
  //
  // * tflite - Used for Android mobile devices.
  // * edgetpu_tflite - Used for [Edge TPU](https://cloud.google.com/edge-tpu/)
  //     devices.
  // * tf_saved_model - A TensorFlow model in SavedModel format.
  // * tf_js - A [TensorFlow.js](https://www.tensorflow.org/js) model that
  //     can be used in the browser and in Node.js using JavaScript.
  // * docker - Used for Docker containers. Use the params field to
  //     customize the container. The container is verified to work
  //     correctly on the Ubuntu 16.04 operating system. See more at the
  //     [containers quickstart](https://cloud.google.com/vision/automl/docs/containers-gcs-quickstart).
  // * core_ml - Used for iOS mobile devices.
  string model_format = 4;

  // Additional model-type and format specific parameters describing the
  // requirements for the model files to be exported; any string must be up
  // to 25000 characters long.
  //
  // * For `docker` format:
  //    `cpu_architecture` - (string) "x86_64" (default).
  //    `gpu_architecture` - (string) "none" (default), "nvidia".
  map<string, string> params = 2;
}
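
// A minimal sketch of a ModelExportOutputConfig, in protobuf text format,
// exporting a "docker" model image to a hypothetical GCR repository with
// the documented `gpu_architecture` parameter:
//
//   gcr_destination {
//     output_uri: "gcr.io/my-project/my-model-image"
//   }
//   model_format: "docker"
//   params { key: "gpu_architecture" value: "nvidia" }
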
// Output configuration for ExportEvaluatedExamples Action. Note that this
// call is available only for 30 days after the moment the model was
// evaluated. The output depends on the domain, as follows (note that only
// examples from the TEST set are exported):
//
// * For Tables:
//     [bigquery_destination][google.cloud.automl.v1beta1.OutputConfig.bigquery_destination]
//     pointing to a BigQuery project must be set. In the given project a
//     new dataset will be created with name
//     `export_evaluated_examples_<model-display-name>_<timestamp-of-export-call>`
//     where <model-display-name> will be made BigQuery-dataset-name
//     compatible (e.g. most special characters will become underscores),
//     and the timestamp will be in YYYY_MM_DDThh_mm_ss_sssZ "based on
//     ISO-8601" format. In the dataset an `evaluated_examples` table will
//     be created. It will have all the same columns as the
//     [primary_table][google.cloud.automl.v1beta1.TablesDatasetMetadata.primary_table_spec_id]
//     of the
//     [dataset][google.cloud.automl.v1beta1.Model.dataset_id] from which
//     the model was created, as they were at the moment of the model's
//     evaluation (this includes the target column with its ground
//     truth), followed by a column called "predicted_<target_column>".
//     That last column will contain the model's prediction result for
//     each respective row, given as ARRAY of
//     [AnnotationPayloads][google.cloud.automl.v1beta1.AnnotationPayload],
//     represented as STRUCT-s, containing
//     [TablesAnnotation][google.cloud.automl.v1beta1.TablesAnnotation].
message ExportEvaluatedExamplesOutputConfig {
  // Required. The destination of the output.
  oneof destination {
    // The BigQuery location where the output is to be written to.
    BigQueryDestination bigquery_destination = 2;
  }
}

// The Google Cloud Storage location for the input content.
message GcsSource {
  // Required. Google Cloud Storage URIs to input files, up to 2000
  // characters long. Accepted forms:
  // * Full object path, e.g. gs://bucket/directory/object.csv
  repeated string input_uris = 1;
}

// The BigQuery location for the input content.
message BigQuerySource {
  // Required. BigQuery URI to a table, up to 2000 characters long.
  // Accepted forms:
  // * BigQuery path, e.g. bq://projectId.bqDatasetId.bqTableId
  string input_uri = 1;
}

// The Google Cloud Storage location where the output is to be written to.
message GcsDestination {
  // Required. Google Cloud Storage URI to the output directory, up to 2000
  // characters long.
  // Accepted forms:
  // * Prefix path: gs://bucket/directory
  // The requesting user must have write permission to the bucket.
  // The directory is created if it doesn't exist.
  string output_uri_prefix = 1;
}

// The BigQuery location for the output content.
message BigQueryDestination {
  // Required. BigQuery URI to a project, up to 2000 characters long.
  // Accepted forms:
  // * BigQuery path, e.g. bq://projectId
  string output_uri = 1;
}

// The GCR location where the image must be pushed to.
message GcrDestination {
  // Required. Google Container Registry URI of the new image, up to 2000
  // characters long. See
  // https://cloud.google.com/container-registry/docs/pushing-and-pulling#pushing_an_image_to_a_registry
  // Accepted forms:
  // * [HOSTNAME]/[PROJECT-ID]/[IMAGE]
  // * [HOSTNAME]/[PROJECT-ID]/[IMAGE]:[TAG]
  //
  // The requesting user must have permission to push images to the
  // project.
  string output_uri = 1;
}