bq_load>: Importing Data into Google BigQuery
The bq_load> operator can be used to import data into Google BigQuery tables.
_export:
  bq:
    dataset: my_dataset

+ingest:
  bq_load>: gs://my_bucket/data.csv
  destination_table: my_data

+process:
  bq>: queries/process.sql
  destination_table: my_result

Secrets
If you don't know how to set secrets, please refer to Managing Workflow Secrets.
- gcp.credential: CREDENTIAL

  See gcp.credential.
Options
- bq_load>: URI | LIST

  A URI or list of URIs identifying files in GCS to import.

  Examples:

  bq_load>: gs://my_bucket/data.csv

  bq_load>:
    - gs://my_bucket/data1.csv.gz
    - gs://my_bucket/data2_*.csv.gz
- dataset: NAME

  The dataset that the destination table is located in or should be created in. Can also be specified directly in the table reference.

  Examples:

  dataset: my_dataset

  dataset: my_project:my_dataset
- destination_table: NAME

  The table to store the imported data in.

  Examples:

  destination_table: my_result_table

  destination_table: some_dataset.some_table

  destination_table: some_project:some_dataset.some_table

  You can append a date in $YYYYMMDD form to the end of the table name to store data in a specific partition. See the Creating and Updating Date-Partitioned Tables document for details.

  destination_table: some_dataset.some_partitioned_table$20160101
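  For instance, a daily scheduled workflow can load each run's data into the matching partition by combining this with Digdag's built-in ${session_date_compact} variable (the session date as YYYYMMDD). This is a minimal sketch; the bucket layout and table name are hypothetical, and it assumes the literal $ before the variable is left untouched by Digdag's ${...} substitution:

  +load_daily_partition:
    bq_load>: gs://my_bucket/events/${session_date_compact}/*.csv.gz
    dataset: my_dataset
    # Intended to expand to e.g. events$20160101, so each run writes that day's partition.
    destination_table: events$${session_date_compact}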
- location: LOCATION

  The location where the job should run. The source GCS bucket and the table must be in this location. See BigQuery locations for a list of available locations.

  Examples:

  location: asia-northeast1
- project: NAME

  The project that the table is located in or should be created in. Can also be specified directly in the table reference or the dataset parameter.
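  For example (my_project is a placeholder project id):

  project: my_project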
- source_format: CSV | NEWLINE_DELIMITED_JSON | AVRO | DATASTORE_BACKUP

  The format of the files to be imported. Default: CSV.

  Examples:

  source_format: CSV

  source_format: NEWLINE_DELIMITED_JSON

  source_format: AVRO

  source_format: DATASTORE_BACKUP
- field_delimiter: CHARACTER

  The separator used between fields in CSV files to be imported. Default: ,.

  Examples:

  field_delimiter: '\\t'
- create_disposition: CREATE_IF_NEEDED | CREATE_NEVER

  Specifies whether the destination table should be automatically created when performing the import.

  - CREATE_IF_NEEDED: (default) The destination table is created if it does not already exist.
  - CREATE_NEVER: The destination table must already exist, otherwise the import will fail.

  Examples:

  create_disposition: CREATE_IF_NEEDED

  create_disposition: CREATE_NEVER
- write_disposition: WRITE_TRUNCATE | WRITE_APPEND | WRITE_EMPTY

  Specifies whether to permit importing data into an already existing destination table.

  - WRITE_TRUNCATE: If the destination table already exists, any data in it will be overwritten.
  - WRITE_APPEND: If the destination table already exists, any data in it will be appended to.
  - WRITE_EMPTY: (default) The import fails if the destination table already exists and is not empty.

  Examples:

  write_disposition: WRITE_TRUNCATE

  write_disposition: WRITE_APPEND

  write_disposition: WRITE_EMPTY
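  As a sketch of how the two dispositions work together, the following task rebuilds a table from scratch on every run: it is created if missing and its contents are overwritten otherwise (the bucket, dataset, and table names are hypothetical):

  +reload_snapshot:
    bq_load>: gs://my_bucket/full_dump_*.csv
    dataset: my_dataset
    destination_table: snapshot
    # Create the table on the first run, overwrite its contents on later runs,
    # so repeated runs stay idempotent.
    create_disposition: CREATE_IF_NEEDED
    write_disposition: WRITE_TRUNCATE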
- skip_leading_rows: INTEGER

  The number of leading rows to skip in CSV files to import. Default: 0.

  Examples:

  skip_leading_rows: 1
- encoding: UTF-8 | ISO-8859-1

  The character encoding of the data in the files to import. Default: UTF-8.

  Examples:

  encoding: ISO-8859-1
- quote: CHARACTER

  The character used to quote data sections in the CSV files to import. Default: '"'.

  Examples:

  quote: ''

  quote: "'"
- max_bad_records: INTEGER

  The maximum number of bad records to ignore before failing the import. Default: 0.

  Examples:

  max_bad_records: 100
- allow_quoted_newlines: BOOLEAN

  Whether to allow quoted data sections that contain newline characters in a CSV file. Default: false.
- allow_jagged_rows: BOOLEAN

  Whether to accept rows that are missing trailing optional columns in CSV files. Default: false.
- ignore_unknown_values: BOOLEAN

  Whether to ignore extra values in data that are not represented in the table schema. Default: false.
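  Several of the CSV-related options above are often combined when importing a messy export; a minimal sketch, assuming a tab-separated file with a header row at a hypothetical path:

  +load_messy_csv:
    bq_load>: gs://my_bucket/export.tsv
    dataset: my_dataset
    destination_table: raw_export
    source_format: CSV
    field_delimiter: '\\t'
    skip_leading_rows: 1          # skip the header row
    allow_quoted_newlines: true   # quoted fields may contain newlines
    allow_jagged_rows: true       # tolerate missing trailing columns
    ignore_unknown_values: true   # drop values not present in the table schema
    max_bad_records: 10           # fail only after 10 unparsable rows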
- projection_fields: LIST

  A list of names of Cloud Datastore entity properties to load. Requires source_format: DATASTORE_BACKUP.
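  For example, loading only two entity properties from a Datastore backup (the property names are hypothetical):

  source_format: DATASTORE_BACKUP
  projection_fields:
    - name
    - created_at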
- autodetect: BOOLEAN

  Whether to automatically infer options and schema for CSV and JSON sources. Default: false.
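  For example, a quick load of newline-delimited JSON without writing a schema by hand (the bucket and table names are hypothetical):

  +load_json_autodetect:
    bq_load>: gs://my_bucket/events.json
    dataset: my_dataset
    destination_table: events_raw
    source_format: NEWLINE_DELIMITED_JSON
    # Let BigQuery infer column names and types from the data.
    autodetect: true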
- schema_update_options: LIST

  A list of destination table schema updates that may be automatically performed when performing the import.

  schema_update_options:
    - ALLOW_FIELD_ADDITION
    - ALLOW_FIELD_RELAXATION
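  Note that BigQuery generally applies schema update options only to jobs that append to an existing table (or truncate a single partition). A sketch of an appending load that tolerates new nullable columns in the incoming files (the names are hypothetical):

  +append_with_schema_update:
    bq_load>: gs://my_bucket/daily_*.avro
    dataset: my_dataset
    destination_table: events_history
    source_format: AVRO
    write_disposition: WRITE_APPEND
    # Allow the job to add new NULLABLE columns or relax REQUIRED columns
    # to NULLABLE when the files differ from the current table schema.
    schema_update_options:
      - ALLOW_FIELD_ADDITION
      - ALLOW_FIELD_RELAXATION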
- schema: OBJECT | STRING

  A table schema. It accepts an object, or a path to a JSON or YAML file.

  Example:

  You can write the schema directly in the .dig file:

  +step:
    bq_load>: gs://<bucket>/path/to_file
    ...
    schema:
      fields:
        - name: "name"
          type: "string"
        ...

  Or you can write it in an external file:

  JSON:

  {
    "fields": [
      {"name": "name", "type": "STRING"},
      ...
    ]
  }

  YAML:

  fields:
    - name: "name"
      type: "string"
    ...

  Then specify the file path in the workflow. Supported formats are YAML and JSON. If the extension of the path is .json, bq_load tries to parse it as JSON, otherwise as YAML.

  +step:
    bq_load>: gs://<bucket>/path/to_file
    ...
    schema: path/to/schema.json
    # or
    # schema: path/to/schema.yml
Output parameters
- bq.last_job_id

  The id of the BigQuery job that performed this import.

  Note: The bq.last_jobid parameter is kept only for backward compatibility, but you must not use it because it will be removed in a near future release.
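  As a usage sketch, a later task can read this parameter, for example to log the job id with Digdag's echo> operator (the task names are hypothetical):

  +ingest:
    bq_load>: gs://my_bucket/data.csv
    destination_table: my_data

  +log_job_id:
    # ${bq.last_job_id} was exported by the preceding bq_load> task.
    echo>: "BigQuery load job: ${bq.last_job_id}"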