bq_load>: Importing Data into Google BigQuery

The bq_load> operator imports data into Google BigQuery tables.

_export:
  bq:
    dataset: my_dataset

+ingest:
  bq_load>: gs://my_bucket/data.csv
  destination_table: my_data

+process:
  bq>: queries/process.sql
  destination_table: my_result
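
A task can combine several of the options documented below. For example, here is a minimal sketch of a CSV load that skips a header row and replaces the destination table's contents (the bucket, table, and task names are illustrative):

+ingest_daily:
  bq_load>: gs://my_bucket/daily_*.csv.gz
  destination_table: my_dataset.daily_data
  source_format: CSV
  skip_leading_rows: 1
  write_disposition: WRITE_TRUNCATE
  max_bad_records: 10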

Secrets

If you are unsure how to set secrets, see Managing Workflow Secrets.

Options

  • bq_load> : URI | LIST

    A URI or list of URIs identifying files in GCS to import.

    Examples:

    bq_load>: gs://my_bucket/data.csv

    bq_load>:
    - gs://my_bucket/data1.csv.gz
    - gs://my_bucket/data2_*.csv.gz
  • dataset : NAME

    The dataset that the destination table is located in or should be created in. Can also be specified directly in the table reference.

    Examples:

    dataset: my_dataset

    dataset: my_project:my_dataset
  • destination_table : NAME

    The table to store the imported data in.

    Examples:

    destination_table: my_result_table

    destination_table: some_dataset.some_table

    destination_table: some_project:some_dataset.some_table

    You can append a date in $YYYYMMDD form to the end of the table name to store data in a specific partition. See the Creating and Updating Date-Partitioned Tables document for details.

    destination_table: some_dataset.some_partitioned_table$20160101
  • location : LOCATION

    The location where the job should run. The source GCS bucket and the table must be in this location. See BigQuery locations for a list of available locations.

    Examples:

    location: asia-northeast1
  • project : NAME

    The project that the table is located in or should be created in. Can also be specified directly in the table reference or the dataset parameter.
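
    Example (my_project is an illustrative project ID):

    project: my_project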

  • source_format : CSV | NEWLINE_DELIMITED_JSON | AVRO | DATASTORE_BACKUP

    The format of the files to be imported. Default: CSV.

    Examples:

    source_format: CSV

    source_format: NEWLINE_DELIMITED_JSON

    source_format: AVRO

    source_format: DATASTORE_BACKUP
  • field_delimiter : CHARACTER

    The separator used between fields in CSV files to be imported. Default: ,.

    Examples:

    field_delimiter: '\t'
  • create_disposition : CREATE_IF_NEEDED | CREATE_NEVER

    Specifies whether the destination table should be automatically created when performing the import.

    • CREATE_IF_NEEDED : (default) The destination table is created if it does not already exist.
    • CREATE_NEVER : The destination table must already exist, otherwise the import will fail.

    Examples:

    create_disposition: CREATE_IF_NEEDED

    create_disposition: CREATE_NEVER
  • write_disposition : WRITE_TRUNCATE | WRITE_APPEND | WRITE_EMPTY

    Specifies whether to permit importing data to an already existing destination table.

    • WRITE_TRUNCATE : If the destination table already exists, any data in it will be overwritten.
    • WRITE_APPEND : If the destination table already exists, any data in it will be appended to.
    • WRITE_EMPTY : (default) The import fails if the destination table already exists and is not empty.

    Examples:

    write_disposition: WRITE_TRUNCATE

    write_disposition: WRITE_APPEND

    write_disposition: WRITE_EMPTY
  • skip_leading_rows : INTEGER

    The number of leading rows to skip in CSV files to import. Default: 0.

    Examples:

    skip_leading_rows: 1
  • encoding : UTF-8 | ISO-8859-1

    The character encoding of the data in the files to import. Default: UTF-8.

    Examples:

    encoding: ISO-8859-1
  • quote : CHARACTER

    The quote character used in the data files to import. Default: '"'.

    Examples:

    quote: ''

    quote: "'"
  • max_bad_records : INTEGER

    The maximum number of bad records to ignore before failing the import. Default: 0.

    Examples:

    max_bad_records: 100
  • allow_quoted_newlines : BOOLEAN

    Whether to allow quoted data sections that contain newline characters in a CSV file. Default: false.
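
    Example (accept quoted CSV fields that contain embedded newlines):

    allow_quoted_newlines: true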

  • allow_jagged_rows : BOOLEAN

    Whether to accept rows that are missing trailing optional columns in CSV files. Default: false.
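
    Example (accept CSV rows that are missing trailing optional columns):

    allow_jagged_rows: true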

  • ignore_unknown_values : BOOLEAN

    Whether to ignore extra values in data that are not represented in the table schema. Default: false.
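
    Example (ignore values that do not match any column in the table schema):

    ignore_unknown_values: true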

  • projection_fields : LIST

    A list of names of Cloud Datastore entity properties to load. Requires source_format: DATASTORE_BACKUP.
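
    Example (a sketch with illustrative entity property names):

    source_format: DATASTORE_BACKUP
    projection_fields:
    - name
    - created_at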

  • autodetect : BOOLEAN

    Whether to automatically infer options and schema for CSV and JSON sources. Default: false.
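
    Example (let BigQuery infer the schema and format options from the source files):

    autodetect: true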

  • schema_update_options : LIST

    A list of destination table schema updates that may be applied automatically during the import.

    Example:

    schema_update_options:
    - ALLOW_FIELD_ADDITION
    - ALLOW_FIELD_RELAXATION
  • schema : OBJECT | STRING

    A table schema. It accepts an inline object or the path to a JSON or YAML file.

    Example:

    You can write the schema directly in the .dig file.

    +step:
      bq_load>: gs://<bucket>/path/to_file
      ...
      schema:
        fields:
          - name: "name"
            type: "string"
          ...

    Or you can write it in an external file.

    JSON:

    {
      "fields": [
        {"name": "name", "type": "STRING"},
        ...
      ]
    }

    YAML:

    fields:
    - name: "name"
      type: "string"
    ...

    Then specify the file path. Supported formats are YAML and JSON. If the extension of the path is .json, bq_load tries to parse it as JSON; otherwise it is parsed as YAML.

    +step:
      bq_load>: gs://<bucket>/path/to_file
      ...
      schema: path/to/schema.json
      # or
      # schema: path/to/schema.yml

Output parameters

  • bq.last_job_id

    The id of the BigQuery job that performed this import.

    Note: The bq.last_jobid parameter is kept only for backward compatibility; do not use it, as it will be removed in a near-future release.
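
    For example, a downstream task can read this value with the usual ${...} parameter syntax (a minimal sketch; the echo> task is only for illustration):

    +ingest:
      bq_load>: gs://my_bucket/data.csv
      destination_table: my_data

    +show_job:
      echo>: "bq_load job id: ${bq.last_job_id}"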