TD Toolbelt Reference

You can run Treasure Data from the command line using these commands.

Command Category        Example
Basic Commands          td
Database Commands       td db:create <db>
Table Commands          td table:list [db]
Query Commands          td query [sql]
Import Commands         td import:list
Bulk Import Commands    td bulk_import:list
Result Commands         td result:list
Schedule Commands       td sched:list
Schema Commands         td schema:show <db> <table>
Connector Commands      td connector:guess [config]
User Commands           td user:list
Workflow Commands       td workflow <command>
Job Commands            td job:show <job_id>

Basic Commands

You can use the following commands to enable basic functions in Treasure Data.

td

Show list of options in Treasure Data.

Usage

     td

Options

Description

-c, --config PATH path to the configuration file (default: ~/.td/td.conf)
-k, --apikey KEY use this API key instead of reading the config file
-e, --endpoint API_SERVER specify the URL for API server to use (default: https://api.treasuredata.com). The URL must contain a scheme (http:// or https:// prefix) to be valid.
--insecure insecure access: disable SSL (enabled by default)
-v, --verbose verbose mode
-r, --retry-post-requests retry on failed post requests. Warning: can cause resource duplication, such as duplicated job submissions.
--version show version
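
For example, the global options can be combined with any subcommand. A minimal sketch that overrides the config file with an explicit API key and endpoint (the key shown is a placeholder, not a real credential):

     td -k 1234/abcdef0123456789abcdef -e https://api.treasuredata.com db:list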

Additional Commands

Usage

     td <command>

Options

Description

db

create/delete/list databases

table

create/delete/list/import/export/tail tables

query

issue a query

job

show/kill/list jobs

import

manage bulk import sessions (Java-based fast processing)

bulk_import

manage bulk import sessions (old Ruby-based implementation)

result

create/delete/list result URLs

sched

create/delete/list schedules that run a query periodically

schema

create/delete/modify schemas of tables

connector

manage connectors

workflow

manage workflows

status

show scheds, jobs, tables and results

apikey

show/set API key

server

show status of the Treasure Data server

sample

create a sample log file

help

show help messages


Database Commands

You can create, delete, and view lists of databases from the command line.

td db create

Create a database.

Usage

     td db:create <db>

Example

     td db:create example_db

td db delete

Delete a database.

Usage

     td db:delete <db>

Options

Description

-f, --force

clear tables and delete the database

Example

     td db:delete example_db

td db list

Show list of databases.

Usage

     td db:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td db:list
     td dbs

Table Commands

You can create, list, show, and organize table structure using the command line.

td table list

Show list of tables.

Usage

     td table:list [db]

Options

Description

-n, --num_threads VAL

number of threads to get list in parallel

--show-bytes

show estimated table size in bytes

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td table:list
     td table:list example_db
     td tables

td table show

Describe information about a table.

Usage

     td table:show <db> <table>

Options

Description

-v

show more attributes

Example

     td table example_db table1

td table create

Create a table.

Usage

     td table:create <db> <table>

Options

Description

-T, --type TYPE

set table type (log)

--expire-days DAYS

set table expire days

--include-v BOOLEAN

set include_v flag

--detect-schema BOOLEAN

set detect schema flag

Example

     td table:create example_db table1

td table delete

Delete a table.

Usage

     td table:delete <db> <table>

Options

Description

-f, --force

never prompt

Example

     td table:delete example_db table1

td table import

Parse and import files to a table

Usage

     td table:import <db> <table> <files...>

Options

Description

--format FORMAT

file format (default: apache)

--apache

same as --format apache; apache common log format

--syslog

same as --format syslog; syslog

--msgpack

same as --format msgpack; msgpack stream format

--json

same as --format json; LF-separated json format

-t, --time-key COL_NAME

time key name for json and msgpack format (e.g. 'created_at')

--auto-create-table

Create the table and database if they don't exist

Example

     td table:import example_db table1 --apache access.log
     td table:import example_db table1 --json -t time - < test.json

How is the import command's time format set in a Windows batch file?

In batch files, '%' marks an environment variable, so you must escape it by doubling it to '%%'.

 td import:prepare --format csv --column-header --time-column 'date' --time-format '%%Y-%%m-%%d' test.csv
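
For comparison, the same command in a Unix shell uses single '%' characters, since '%' is not special there:

     td import:prepare --format csv --column-header --time-column 'date' --time-format '%Y-%m-%d' test.csv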

td table export

Dump logs in a table to the specified storage

Usage

     td table:export <db> <table>

Options

Description

-w, --wait

wait until the job is completed

-f, --from TIME

export data which is newer than or equal to the TIME

-t, --to TIME

export data which is older than the TIME

-b, --s3-bucket NAME

name of the destination S3 bucket (required)

-p, --prefix PATH

path prefix of the file on S3

-k, --aws-key-id KEY_ID

AWS access key id to export data (required)

-s, --aws-secret-key SECRET_KEY

AWS secret access key to export data (required)

-F, --file-format FILE_FORMAT

file format for exported data.

Available formats are tsv.gz (tab-separated values per line) and jsonl.gz (JSON record per line).

The json.gz and line-json.gz formats are the default and still available, but only for backward compatibility purposes; their use is discouraged because they have far lower performance.

-O, --pool-name NAME

specify resource pool by name

-e, --encryption ENCRYPT_METHOD

export with server side encryption with the ENCRYPT_METHOD

-a, --assume-role ASSUME_ROLE_ARN

export with assume role with ASSUME_ROLE_ARN as role arn

Example

     td table:export example_db table1 --s3-bucket mybucket -k KEY_ID -s SECRET_KEY

td table swap

Swap the names of two tables.

Usage

     td table:swap <db> <table1> <table2>

Example

     td table:swap example_db table1 table2

td table rename

Rename the existing table.

Usage

     td table:rename <db> <from_table> <dest_table>

Options

Description

--overwrite

replace existing dest table

Example

     td table:rename example_db table1 table2

td table tail

Get recently imported logs.

Usage

     td table:tail <db> <table>

Options

Description

-n, --count N

number of logs to get

-P, --pretty

pretty print

Example

     td table:tail example_db table1
     td table:tail example_db table1 -n 30

td table partial delete

Delete logs from the table within the specified time range.

Usage

     td table:partial_delete <db> <table>

Options

Description

-t, --to TIME

end time of logs to delete in Unix time >0 and multiple of 3600 (1 hour)

-f, --from TIME

start time of logs to delete in Unix time >0 and multiple of 3600 (1 hour)

-w, --wait

wait for the job to finish

-O, --pool-name NAME

specify resource pool by name

Example

     td table:partial_delete example_db table1 --from 1341000000 --to 1341003600

td table expire

Expire data in table after specified number of days. Set to 0 to disable the expiration.

Usage

     td table:expire <db> <table> <expire_days>

Example

     td table:expire example_db table1 30

Query Commands

You can issue queries from the command line.

td query

Issue a query

Usage

     td query [sql]

Options

Description

-d, --database DB_NAME

use the database (required)

-w, --wait[=SECONDS]

wait for finishing the job (for seconds)

-G, --vertical

use vertical table to show results

-o, --output PATH

write result to the file

-f, --format FORMAT

format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz)

-r, --result RESULT_URL

write result to the URL (see also result:create subcommand)

It is recommended to use this option with -x / --exclude to suppress printing the query result to stdout, or with -o / --output to write the query result to a file.


-u, --user NAME

set user name for the result URL

-p, --password

ask password for the result URL

-P, --priority PRIORITY

set priority

-R, --retry COUNT

automatic retrying count

-q, --query PATH

use file instead of inline query

-T, --type TYPE

set query type (hive, presto)

--sampling DENOMINATOR

OBSOLETE - enable random sampling to reduce records 1/DENOMINATOR

-l, --limit ROWS

limit the number of result rows shown when not outputting to file

-c, --column-header

output of the columns' header when the schema is available for the table (only applies to json, tsv and csv formats)

-x, --exclude

do not automatically retrieve the job result

-O, --pool-name NAME

specify resource pool by name

--domain-key DOMAIN_KEY

optional user-provided unique ID.

You can include this ID with your `create` request to ensure idempotence

--engine-version ENGINE_VERSION

specify query engine version by name

Example

     td query -d example_db -w -r rset1 "select count(*) from table1"
     td query -d example_db -w -r rset1 -q query.txt
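
As a further sketch, the output options above can be combined to save a query result locally as CSV with a header row (result.csv is an illustrative file name):

     td query -d example_db -w -f csv -c -o result.csv "select count(*) from table1"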

Import Commands

You can import and organize data from the command line using these commands.

td import list

List bulk import sessions

Usage

     td import:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td import:list

td import show

Show list of uploaded parts.

Usage

     td import:show <name>

Example

     td import:show logs_201201

td import create

Create a new bulk import session to the table

Usage

     td import:create <name> <db> <table>

Example

     td import:create logs_201201 example_db event_logs

td import jar version

Show import jar version

Usage

     td import:jar_version

Example

     td import:jar_version

td import jar update

Update import jar to the latest version

Usage

     td import:jar_update

Example

     td import:jar_update

td import prepare

Convert files into part file format

Usage

     td import:prepare <files...>

Options

Description

-f, --format FORMAT

source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv

-C, --compress TYPE

compressed type [gzip, none, auto]; default=auto detect

-T, --time-format FORMAT

specifies the strftime format of the time column

The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported.

-e, --encoding TYPE

encoding type [UTF-8, etc.]

-o, --output DIR

output directory. default directory is 'out'.

-s, --split-size SIZE_IN_KB

size of each part (default: 16384)

-t, --time-column NAME

name of the time column

--time-value TIME,HOURS

time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways:

  • Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported.

  • Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours.

This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001th record will be 1394409600 again and the timestamp will restart from there.

--primary-key NAME:TYPE

pair of name and type of primary key declared in your item table

--prepare-parallel NUM

prepare in parallel (default: 2; max 96)

--only-columns NAME,NAME,...

only columns

--exclude-columns NAME,NAME,...

exclude columns

--error-records-handling MODE

error records handling mode [skip, abort]; default=skip

--invalid-columns-handling MODE

invalid columns handling mode [autofix, warn]; default=warn

--error-records-output DIR

write error records; default directory is 'error-records'.

--columns NAME,NAME,...

column names (use --column-header instead if the first line has column names)

--column-types TYPE,TYPE,...

column types [string, int, long, double]

--column-type NAME:TYPE

column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int'

-S, --all-string

disable automatic type conversion

--empty-as-null-if-numeric

the empty string values are interpreted as null values if columns are numerical types.

CSV/TSV Specific Options

Options

Description

--column-header

first line includes column names

--delimiter CHAR

delimiter CHAR; default="," at csv, "\t" at tsv

--escape CHAR

escape CHAR; default=\

--newline TYPE

newline [CRLF, LF, CR]; default=CRLF

--quote CHAR

quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE

MySQL Specific Options

Options

Description

--db-url URL

JDBC connection URL

--db-user NAME

user name for MySQL account

--db-password PASSWORD

password for MySQL account

REGEX Specific Options

Options

Description

--regex-pattern PATTERN

pattern to parse line. When 'regex' is used as source file format, this option is required

Example

  td import:prepare logs/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
  td import:prepare logs/*.csv --format csv --columns date_code,uid,price,count --time-value 1394409600,10 -o parts/
  td import:prepare mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
  td import:prepare "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/

td import upload

Upload or re-upload files into a bulk import session

Usage

     td import:upload <session name> <files...>

Options

Description

--retry-count NUM

the upload process will automatically retry up to the specified number of times; default: 10

--auto-create DATABASE.TABLE

automatically create a bulk import session for the specified database and table names

If you use the 'auto-create' option, you must not specify a session name as the first argument.

--auto-perform

perform bulk import job automatically

--auto-commit

commit bulk import job automatically

--auto-delete

delete bulk import session automatically

--parallel NUM

upload in parallel (default: 2; max 8)

-f, --format FORMAT

source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv

-C, --compress TYPE

compressed type [gzip, none, auto]; default=auto detect

-T, --time-format FORMAT

specifies the strftime format of the time column. The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported.

-e, --encoding TYPE

encoding type [UTF-8, etc.]

-o, --output DIR

output directory. default directory is 'out'.

-s, --split-size SIZE_IN_KB

size of each part (default: 16384)

-t, --time-column NAME

name of the time column

--time-value TIME,HOURS

time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways:

  • Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported.

  • Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours. This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001th record will be 1394409600 again and the timestamp will restart from there.

--primary-key NAME:TYPE

pair of name and type of primary key declared in your item table

--prepare-parallel NUM

prepare in parallel (default: 2; max 96)

--only-columns NAME,NAME,...

only columns

--exclude-columns NAME,NAME,...

exclude columns

--error-records-handling MODE

error records handling mode [skip, abort]; default=skip

--invalid-columns-handling MODE

invalid columns handling mode [autofix, warn]; default=warn

--error-records-output DIR

write error records; default directory is 'error-records'.

--columns NAME,NAME,...

column names (use --column-header instead if the first line has column names)

--column-types TYPE,TYPE,...

column types [string, int, long, double]

--column-type NAME:TYPE

column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int'

-S, --all-string

disable automatic type conversion

--empty-as-null-if-numeric

the empty string values are interpreted as null values if columns are numerical types.

CSV/TSV Specific Options

Options

Description

--column-header

first line includes column names

--delimiter CHAR

delimiter CHAR; default="," at csv, "\t" at tsv

--escape CHAR

escape CHAR; default=\

--newline TYPE

newline [CRLF, LF, CR]; default=CRLF

--quote CHAR

quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE

MySQL Specific Options

Options

Description

--db-url URL

JDBC connection URL

--db-user NAME

user name for MySQL account

--db-password PASSWORD

password for MySQL account

REGEX Specific Options

Options

Description

--regex-pattern PATTERN

pattern to parse line. When 'regex' is used as source file format, this option is required

Example

     td import:upload mysess parts/* --parallel 4
     td import:upload mysess parts/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
     td import:upload parts/*.csv --auto-create mydb.mytbl --format csv --columns time,uid,price,count --time-column time -o parts/
     td import:upload mysess mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
     td import:upload "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/

td import auto

Automatically upload or re-upload files into a bulk import session. It is the functional equivalent of the 'upload' command with the 'auto-perform', 'auto-commit', and 'auto-delete' options enabled. By default it does not enable the 'auto-create' option; if you want that behavior, you must declare it explicitly in the command options.

Usage

     td import:auto <session name> <files...>

Options

Description

--retry-count NUM

the upload process will automatically retry up to the specified number of times; default: 10

--auto-create DATABASE.TABLE

automatically create a bulk import session for the specified database and table names

If you use the 'auto-create' option, you must not specify a session name as the first argument.

--parallel NUM

upload in parallel (default: 2; max 8)

-f, --format FORMAT

source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv

-C, --compress TYPE

compressed type [gzip, none, auto]; default=auto detect

-T, --time-format FORMAT

specifies the strftime format of the time column. The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported.

-e, --encoding TYPE

encoding type [UTF-8, etc.]

-o, --output DIR

output directory. default directory is 'out'.

-s, --split-size SIZE_IN_KB

size of each part (default: 16384)

-t, --time-column NAME

name of the time column

--time-value TIME,HOURS

time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways:

  • Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported.

  • Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours. This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001th record will be 1394409600 again and the timestamp will restart from there.


--primary-key NAME:TYPE

pair of name and type of primary key declared in your item table

--prepare-parallel NUM

prepare in parallel (default: 2; max 96)

--only-columns NAME,NAME,...

only columns

--exclude-columns NAME,NAME,...

exclude columns

--error-records-handling MODE

error records handling mode [skip, abort]; default=skip

--invalid-columns-handling MODE

invalid columns handling mode [autofix, warn]; default=warn

--error-records-output DIR

write error records; default directory is 'error-records'.

--columns NAME,NAME,...

column names (use --column-header instead if the first line has column names)

--column-types TYPE,TYPE,...

column types [string, int, long, double]

--column-type NAME:TYPE

column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int'

-S, --all-string

disable automatic type conversion

--empty-as-null-if-numeric

the empty string values are interpreted as null values if columns are numerical types.

CSV/TSV Specific Options

Options

Description

--column-header

first line includes column names

--delimiter CHAR

delimiter CHAR; default="," at csv, "\t" at tsv

--escape CHAR

escape CHAR; default=\

--newline TYPE

newline [CRLF, LF, CR]; default=CRLF

--quote CHAR

quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE

MySQL Specific Options

Options

Description

--db-url URL

JDBC connection URL

--db-user NAME

user name for MySQL account

--db-password PASSWORD

password for MySQL account

REGEX Specific Options

Options

Description

--regex-pattern PATTERN

pattern to parse line. When 'regex' is used as source file format, this option is required

Example

     td import:auto mysess parts/* --parallel 4
     td import:auto mysess parts/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
     td import:auto parts/*.csv --auto-create mydb.mytbl --format csv --columns time,uid,price,count --time-column time -o parts/
     td import:auto mysess mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
     td import:auto "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/

td import perform

Start to validate and convert uploaded files

Usage

     td import:perform <name>

Options

Description

-w, --wait

wait for finishing the job

-f, --force

force start performing

-O, --pool-name NAME

specify resource pool by name

Example

     td import:perform logs_201201

td import error records

Show records which did not pass validations

Usage

     td import:error_records <name>

Example

     td import:error_records logs_201201

td import commit

Start to commit a performed bulk import session

Usage

     td import:commit <name>

Options

Description

-w, --wait

wait for finishing the commit

Example

     td import:commit logs_201201
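
Putting the import subcommands together, a typical bulk import session lifecycle looks like the following sketch (the session, database, table, and file names are illustrative):

     # create a session bound to a database and table
     td import:create logs_201201 example_db event_logs
     # convert source files into part files
     td import:prepare logs/*.csv --format csv --column-header --time-column time -o parts/
     # upload the part files into the session
     td import:upload logs_201201 parts/*
     # validate and convert the uploaded data, waiting for completion
     td import:perform logs_201201 -w
     # inspect any records that failed validation
     td import:error_records logs_201201
     # commit the performed session to the table
     td import:commit logs_201201 -w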

td import delete

Delete a bulk import session

Usage

     td import:delete <name>

Example

     td import:delete logs_201201

td import freeze

Pause further data uploads to a bulk import session; subsequent uploads are rejected.

Usage

     td import:freeze <name>

Example

     td import:freeze logs_201201

td import unfreeze

Unfreeze a bulk import session

Usage

     td import:unfreeze <name>

Example

     td import:unfreeze logs_201201

td import config

Create a guess config from arguments.

Usage

     td import:config <files...>

Options

Description

-o, --out FILE_NAME

output file name for connector:guess

-f, --format FORMAT

source file format [csv, tsv, mysql]; default=csv

--db-url URL

Database Connection URL

--db-user NAME

user name for database

--db-password PASSWORD

password for database

--columns COLUMNS

not supported

--column-header COLUMN-HEADER

not supported

--time-column TIME-COLUMN

not supported

--time-format TIME-FORMAT

not supported

Example

     td import:config "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" -o seed.

Bulk Import Commands

You can create and organize bulk imports from the command line.

For instructions on how to use the bulk import commands, refer to the Bulk Import API Tutorial.

td bulk import list

List bulk import sessions

Usage

     td bulk_import:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td bulk_import:list

td bulk import show

Shows a list of uploaded parts

Usage

     td bulk_import:show <name>

Example

     td bulk_import:show logs_201201

td bulk import create

Creates a new bulk import session to the table

Usage

  td bulk_import:create <name> <db> <table>

Example

 td bulk_import:create logs_201201 example_db event_logs

td bulk import prepare parts

Converts files into part file format

Usage

     td bulk_import:prepare_parts <files...>

Options

Description

-f, --format NAME

source file format [csv, tsv, msgpack, json]

-h, --columns NAME,NAME,...

column names (use --column-header instead if the first line has column names)

-H, --column-header

first line includes column names

-d, --delimiter REGEX

delimiter between columns (default: (?-mix:\t|,))

--null REGEX

null expression for the automatic type conversion (default: (?i-mx:\A(?:null||\-|\\N)\z))

--true REGEX

true expression for the automatic type conversion (default: (?i-mx:\A(?:true)\z))

--false REGEX

false expression for the automatic type conversion (default: (?i-mx:\A(?:false)\z))

-S, --all-string

disable automatic type conversion

-t, --time-column NAME

name of the time column

-T, --time-format FORMAT

strftime(3) format of the time column

--time-value TIME

value of the time column

-e, --encoding NAME

text encoding

-C, --compress NAME

compression format name [plain, gzip] (default: auto detect)

-s, --split-size SIZE_IN_KB

size of each part (default: 16384)

-o, --output DIR

output directory

Example

     td bulk_import:prepare_parts logs/*.csv --format csv --columns time,uid,price,count --time-column "time" -o parts/

td bulk import upload parts

Uploads or re-uploads files into a bulk import session

Usage

     td bulk_import:upload_parts <name> <files...>

Options

Description

-P, --prefix NAME

add prefix to parts name

-s, --use-suffix COUNT

use COUNT number of . (dots) in the source file name to the parts name

--auto-perform

perform bulk import job automatically

--parallel NUM

perform uploading in parallel (default: 2; max 8)

-O, --pool-name NAME

specify resource pool by name

Example

 td bulk_import:upload_parts parts/* --parallel 4

td bulk import delete parts

Delete uploaded files from a bulk import session

Usage

 td bulk_import:delete_parts <name> <ids...>

Options

Description

-P, --prefix NAME

add prefix to parts name

Example

     td bulk_import:delete_parts logs_201201 01h 02h 03h

td bulk import perform

Start to validate and convert uploaded files

Usage

     td bulk_import:perform <name>

Options

Description

-w, --wait

wait for finishing the job

-f, --force

force start performing

-O, --pool-name NAME

specify resource pool by name

Example

     td bulk_import:perform logs_201201

td bulk import error records

Show records which did not pass validations

Usage

     td bulk_import:error_records <name>

Example

     td bulk_import:error_records logs_201201

td bulk import commit

Start to commit a performed bulk import session

Usage

     td bulk_import:commit <name>

Options

Description

-w, --wait

wait for finishing the commit

Example

     td bulk_import:commit logs_201201

td bulk import delete

Delete a bulk import session

Usage

     td bulk_import:delete <name>

Example

     td bulk_import:delete logs_201201

td bulk import freeze

Block the upload to a bulk import session

Usage

     td bulk_import:freeze <name>

Example

     td bulk_import:freeze logs_201201

td bulk import unfreeze

Unfreeze a frozen bulk import session

Usage

     td bulk_import:unfreeze <name>

Example

     td bulk_import:unfreeze logs_201201

Result Commands

You can use the command line to list, create, show, and delete results.

td result list

Show list of result URLs

Usage

     td result:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td result:list
     td results

td result show

Describe information of a result URL.

Usage

     td result:show <name>

Example

     td result name

td result create

Create a result URL

Usage

     td result:create <name> <URL>

Options

Description

-u, --user NAME

set user name for authentication

-p, --password

ask password for authentication

Example

     td result:create name mysql://my-server/mydb
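
Once created, the result URL can be referenced by name from a query or schedule. A sketch using the -r option of td query described above:

     td query -d example_db -w -x -r name "select count(*) from table1"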

td result delete

Delete a result URL.

Usage

     td result:delete <name>

Example

     td result:delete name

Schedule Commands

You can use the command line to schedule, update, delete, and list queries.

td sched list

Show list of schedules

Usage

     td sched:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td sched:list
     td scheds

td sched create

Create a schedule

Usage

     td sched:create <name> <cron> [sql]

Options

Description

-d, --database DB_NAME

use the database (required)

-t, --timezone TZ

name of the timezone.

Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC (5 AM Pacific) without the timezone option, and at 12 PM Pacific with the -t / --timezone 'America/Los_Angeles' option.

-D, --delay SECONDS

delay time of the schedule

-r, --result RESULT_URL

write result to the URL (see also result:create subcommand)

-u, --user NAME

set user name for the result URL

-p, --password

ask password for the result URL

-P, --priority PRIORITY

set priority

-q, --query PATH

use file instead of inline query

-R, --retry COUNT

automatic retrying count

-T, --type TYPE

set query type (hive)

Example

     td sched:create sched1 "0 * * * *" -d example_db "select count(*) from table1" -r rset1
     td sched:create sched1 "0 * * * *" -d example_db -q query.txt -r rset2
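
To illustrate the timezone behavior described above, the following sketch schedules the same cron expression in Pacific time, so it runs daily at 12 PM America/Los_Angeles rather than 12 PM UTC (the schedule name and query are illustrative):

     td sched:create sched_daily "0 12 * * *" -d example_db -t "America/Los_Angeles" "select count(*) from table1"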

td sched delete

Delete a schedule

Usage

     td sched:delete <name>

Example

     td sched:delete sched1

td sched update

Modify a schedule

Usage

     td sched:update <name>

Options

Description

-n, --newname NAME

change the schedule's name

-s, --schedule CRON

change the schedule

-q, --query SQL

change the query

-d, --database DB_NAME

change the database

-r, --result RESULT_URL

change the result target (see also result:create subcommand)

-t, --timezone TZ

name of the timezone.

Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC (5 AM Pacific) without the timezone option, and at 12 PM Pacific with the -t / --timezone 'America/Los_Angeles' option.

-D, --delay SECONDS

change the delay time of the schedule

-P, --priority PRIORITY

set priority

-R, --retry COUNT

automatic retrying count

-T, --type TYPE

set query type (hive)

--engine-version ENGINE_VERSION

EXPERIMENTAL: specify query engine version by name

Example

     td sched:update sched1 -s "0 */2 * * *" -d my_db -t "Asia/Tokyo" -D 3600

td sched history

Show history of scheduled queries

Usage

     td sched:history <name> [max]

Options

Description

-p, --page PAGE

skip N pages

-s, --skip N

skip N schedules

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)


Example

     td sched sched1 --page 1

td sched run

Run scheduled queries for the specified time

Usage

     td sched:run <name> <time>

Options

Description

-n, --num N

number of jobs to run

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td sched:run sched1 "2013-01-01 00:00:00" -n 6

td sched result

Show status and result of the last job run. --last [N] shows the result of the Nth job before the last. The other options are identical to those of the job:show command.

Usage

     td sched:result <name>

Options

Description

-v, --verbose

show logs

-w, --wait

wait for finishing the job

-G, --vertical

use vertical table to show results

-o, --output PATH

write result to the file

-l, --limit ROWS

limit the number of result rows shown when not outputting to file

-c, --column-header

output of the columns' header when the schema is available for the table (only applies to tsv and csv formats)

-x, --exclude

do not automatically retrieve the job result

--null STRING

null expression in csv or tsv

-f, --format FORMAT

format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz)

--last [Number]

show the result before N from the last. default: 1

Example

     td sched:result NAME
     td sched:result NAME --last
     td sched:result NAME --last 3

Schema Commands

Use the command line to work with schema in a table.

td schema show

Show schema of a table

Usage

     td schema:show <db> <table>

Example

     td schema example_db table1

td schema set

Set new schema on a table

Usage

     td schema:set <db> <table> [columns...]

Example

     td schema:set example_db table1 user:string size:int

td schema add

Add new columns to a table.

Usage

     td schema:add <db> <table> <columns...>

Example

     td schema:add example_db table1 user:string size:int

td schema remove

Remove columns from a table

Usage

     td schema:remove <db> <table> <columns...>

Example

     td schema:remove example_db table1 user size

Connector Commands

You can use the command line to control several elements related to connectors.

td connector guess

Run guess to generate a connector config file

Usage

     td connector:guess [config]

Options

Description

--type[=TYPE]

(obsoleted)

--access-id ID

(obsoleted)

--access-secret SECRET

(obsoleted)

--source SOURCE

(obsoleted)

-o, --out FILE_NAME

output file name for connector:preview

-g, --guess NAME,NAME,...

specify list of guess plugins that users want to use

Example

     td connector:guess config.yml -o td-load.yml

Example config.yml

    in:
      type: s3
      bucket: my-s3-bucket
      endpoint: s3-us-west-1.amazonaws.com
      path_prefix: path/prefix/to/import/
      access_key_id: ABCXYZ123ABCXYZ123
      secret_access_key: AbCxYz123aBcXyZ123
    out:
      mode: append
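
The guess, preview, and issue subcommands are typically chained. A sketch of that flow, assuming the seed config above is saved as config.yml and the destination database and table already exist:

     # generate a complete load config from the seed config
     td connector:guess config.yml -o td-load.yml
     # inspect a sample of the data the connector would fetch
     td connector:preview td-load.yml
     # run the load once and wait for the job to finish
     td connector:issue td-load.yml --database example_db --table access_logs -w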

td connector preview

Show a subset of possible data that the data connector fetches

Usage

     td connector:preview <config>

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td connector:preview td-load.yml

td connector issue

Runs connector execution one time only

Usage

     td connector:issue <config>

Options

Description

--database DB_NAME

destination database

--table TABLE_NAME

destination table

--time-column COLUMN_NAME

data partitioning key

-w, --wait

wait for finishing the job

--auto-create-table

Create the table and database if they don't exist

Example

     td connector:issue td-load.yml

td connector list

Shows a list of connector sessions

Usage

     td connector:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td connector:list

td connector create

Creates a new connector session

Usage

     td connector:create <name> <cron> <database> <table> <config>

Options

Description

--time-column COLUMN_NAME

data partitioning key

-t, --timezone TZ

name of the timezone.

Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC (5 AM Pacific) without the timezone option, and at 12 PM Pacific with the -t / --timezone 'America/Los_Angeles' option.

-D, --delay SECONDS

delay time of the schedule

Example

     td connector:create connector1 "0 * * * *" connector_database connector_table td-load.yml

td connector show

Shows the execution settings for a connector such as name, timezone, delay, database, table

Usage

     td connector:show <name>

Example

     td connector:show connector1

td connector update

Modify a connector session

Usage

     td connector:update <name> [config]

Options

Description

-n, --newname NAME

change the schedule's name

-d, --database DB_NAME

change the database

-t, --table TABLE_NAME

change the table

-s, --schedule [CRON]

change the schedule or leave blank to remove the schedule

-z, --timezone TZ

name of the timezone.

Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC (5 AM Pacific) without the timezone option, and at 12 PM Pacific with the -z / --timezone 'America/Los_Angeles' option.

-D, --delay SECONDS

change the delay time of the schedule

-T, --time-column COLUMN_NAME

change the name of the time column

-c, --config CONFIG_FILE

update the connector configuration

--config-diff CONFIG_DIFF_FILE

update the connector config_diff

Example

     td connector:update connector1 -c td-bulkload.yml -s '@daily' ...

td connector delete

Delete a connector session

Usage

     td connector:delete <name>

Example

     td connector:delete connector1

td connector history

Show the job history of a connector session

Usage

     td connector:history <name>

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td connector:history connector1

td connector run

Run a connector session for the specified time option.

Usage

     td connector:run <name> [time]

Options

Description

-w, --wait

wait for finishing the job

Example

     td connector:run connector1 "2016-01-01 00:00:00"

User Commands

You can use the command line to control several elements related to users.

td user list

Show a list of users.

Usage

     td user:list

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Example

     td user:list

     td user:list -f csv

td user show

Show a user.

Usage

     td user:show <name>

Example

     td user:show "Roberta Smith"

td user create

Create a user. As part of the user creation process, you will be prompted to provide a password for the user.

Usage

     td user:create <first_name> --email <email_address>

Example

     td user:create "Roberta" --email "roberta.smith@acme.com"

td user delete

Delete a user.

Usage

     td user:delete <email_address>

Example

     td user:delete roberta.smith@acme.com

td user apikey list

Show API keys for a user.

Options

Description

-f, --format FORMAT

format of the result rendering (tsv, csv, json or table. default is table)

Usage

     td user:apikey:list <email_address>

Example

     td user:apikey:list roberta.smith@acme.com

     td user:apikey:list roberta.smith@acme.com -f csv

td user apikey add

Add an API key to a user.

Usage

     td user:apikey:add <email_address>

Example

     td user:apikey:add roberta.smith@acme.com

td user apikey remove

Remove an API key from a user.

Usage

     td user:apikey:remove <email_address> <apikey>

Example

     td user:apikey:remove roberta.smith@acme.com 1234565/abcdefg

Workflow Commands

You can create or modify workflows from the CLI using the following commands. The command wf can be used interchangeably with workflow.

Basic Workflow Commands

td workflow reset

Reset the workflow module

Usage

     td workflow:reset

td workflow update

Update the workflow module

Usage

     td workflow:update [version]

td workflow version

Show workflow module version

Usage

     td workflow:version

Local-mode commands

You can use the following commands to locally initiate changes to workflows.

Usage

     td workflow <command> [options...]

Options

Description

init <dir>

create a new workflow project

r[un] <workflow.dig>

run a workflow

c[heck]

show workflow definitions

sched[uler]

run a scheduler server

migrate (run|check)

migrate database

selfupdate

update CLI to the latest version

Info

To manage secrets in local mode, use the following command:

td workflow secrets --local
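
A minimal local-mode sketch, assuming init generates a myproject.dig workflow file inside the new project directory (as Digdag's init does):

     # create a new workflow project
     td workflow init myproject
     cd myproject
     # show the workflow definition, then run it locally
     td workflow check
     td workflow run myproject.dig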

Server-mode commands

You can use the following commands to initiate changes to workflows from the server.

Usage

     td workflow <command> [options...]

Options

Description

server

start server

Client-mode commands

You can use the following commands to initiate changes to workflows from the client.

Usage

     td workflow <command> [options...]

Options

Description

push <project-name>

create and upload a new revision

download <project-name>

pull an uploaded revision

start <project-name> <name>

start a new session attempt of a workflow

retry <attempt-id>

retry a session

kill <attempt-id>

kill a running session attempt

backfill <schedule-id>

start sessions of a schedule for past times

backfill <project-name> <name>

start sessions of a schedule for past times

reschedule <schedule-id>

skip sessions of a schedule to a future time

reschedule <project-name> <name>

skip sessions of a schedule to a future time

projects [name]

show projects

workflows [project-name] [name]

show registered workflow definitions

schedules

show registered schedules

disable <schedule-id>

disable a workflow schedule

disable <project-name>

disable all workflow schedules in a project

disable <project-name> <name>

disable a workflow schedule

enable <schedule-id>

enable a workflow schedule

enable <project-name>

enable all workflow schedules in a project

enable <project-name> <name>

enable a workflow schedule

sessions

show sessions for all workflows

sessions <project-name>

show sessions for all workflows in a project

sessions <project-name> <name>

show sessions for a workflow

session <session-id>

show a single session

attempts

show attempts for all sessions

attempts <session-id>

show attempts for a session

attempt <attempt-id>

show a single attempt

tasks <attempt-id>

show tasks of a session attempt

delete <project-name>

delete a project

secrets --project <project-name>

manage secrets

version

show client and server version

Parameter

Description

-L, --log PATH

output log messages to a file (default: -)

-l, --log-level LEVEL

log level (error, warn, info, debug or trace)

-X KEY=VALUE

add a performance system config

-c, --config PATH.properties

Configuration file (default: /Users/<user_name>/.config/digdag/config)

--version

show client version

Client options:

Parameter

Description

-e, --endpoint URL

Server endpoint

-H, --header KEY=VALUE

Additional headers

--disable-version-check

Disable server version check

--disable-cert-validation

Disable certificate verification

--basic-auth <user:pass>

Add an Authorization header with the provided username and password
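
A client-mode sketch that pushes the project in the current directory and starts a workflow session; note that the --session flag is a Digdag option assumed here, not documented in the table above:

     # create and upload a new revision of the project
     td workflow push myproject
     # start a new session attempt of the 'daily_job' workflow for the current time
     td workflow start myproject daily_job --session now
     # list attempts to monitor progress
     td workflow attempts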


Job Commands

You can view status and results of jobs, view lists of jobs and delete jobs using the CLI.

td job show

Show status and results of a job.

Usage

  td job:show <job_id>

Example

  td job:show 1461

Options Description
-v, --verbose show logs
-w, --wait wait for finishing the job
-G, --vertical use vertical table to show results
-o, --output PATH write result to the file
-l, --limit ROWS limit the number of result rows shown when not outputting to file
-c, --column-header output of the columns' header when the schema is available for the table (only applies to tsv and csv formats)
-x, --exclude do not automatically retrieve the job result
--null STRING null expression in csv or tsv
-f, --format FORMAT format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz)
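
A sketch combining these options to wait for a job and save its result as CSV (the job ID and file name are illustrative):

  td job:show 1461 -w -f csv -o result.csv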

td job status

Show status progress of a job.

Usage

  td job:status <job_id>

Example

  td job:status 1461

td job list

Show list of jobs.

Usage

  td job:list [max]

[max] is the number of jobs to show.

Example

  td jobs --page 1
Options Description
-p, --page PAGE skip N pages
-s, --skip N skip N jobs
-R, --running show only running jobs
-S, --success show only succeeded jobs
-E, --error show only failed jobs
--slow [SECONDS] show slow queries (default threshold: 3600 seconds)
-f, --format FORMAT format of the result rendering (tsv, csv, json or table. default is table)

td job kill

Kill or cancel a job.

Usage

  td job:kill <job_id>

Example

  td job:kill 1461