Java Client for TD-API

Using the Java client for Treasure Data API, you can:

  • Submit Hive/Presto queries to Treasure Data.
  • Check the status of jobs (queries).
  • Retrieve query results.
  • Look up database and table information.

Note that td-client-java 0.8.0 requires Java 1.8 or higher; td-client-java 0.7.x requires Java 7.

Install

You can download a Jar file (td-client-java-(version)-shade.jar) from here.

For the information about the older versions.

Use the following dependency settings for either Maven or the Standalone Jar file.

Maven

<dependency>
  <groupId>com.treasuredata.client</groupId>
  <artifactId>td-client</artifactId>
  <version>(version)</version>
</dependency>

<!-- If you are not using any SLF4J logger binding, add the following dependency, too. -->
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.2.3</version>
</dependency>

Standalone Jar

<dependency>
  <groupId>com.treasuredata.client</groupId>
  <artifactId>td-client</artifactId>
  <version>(version)</version>
  <classifier>shade</classifier>
</dependency>

Basic Use

Set API Key

Option 1: Config file

To use td-client-java, you need to set your API key in the $HOME/.td/td.conf file.

[account]
  user = (your TD account e-mail address)
  apikey = <YOUR_API_KEY>

Option 2: Environment variable

It is also possible to use the TD_API_KEY environment variable. Add the following line to your shell configuration file (.bash_profile, .zprofile, etc.):

export TD_API_KEY=YOUR_API_KEY

For Windows, add the TD_API_KEY environment variable in the user preference panel.
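
Before creating a client, it can help to confirm that one of the two options actually supplies a key. The following is a minimal standalone sketch using only the JDK. The class ApiKeyLookup and its helper methods are hypothetical and not part of td-client-java, the parser only understands the simple "key = value" layout shown above, and checking the config file before the environment variable is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class ApiKeyLookup {
    // Hypothetical helper: extract "apikey" from td.conf-style lines ("key = value").
    static Optional<String> apiKeyFromConf(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (eq > 0 && trimmed.substring(0, eq).trim().equals("apikey")) {
                String value = trimmed.substring(eq + 1).trim();
                if (!value.isEmpty()) {
                    return Optional.of(value);
                }
            }
        }
        return Optional.empty();
    }

    // Hypothetical helper: prefer the config file value, then fall back to the environment value.
    static Optional<String> resolve(List<String> confLines, String envValue) {
        Optional<String> fromConf = apiKeyFromConf(confLines);
        if (fromConf.isPresent()) {
            return fromConf;
        }
        return Optional.ofNullable(envValue).map(String::trim).filter(v -> !v.isEmpty());
    }

    public static void main(String[] args) throws IOException {
        Path conf = Paths.get(System.getProperty("user.home"), ".td", "td.conf");
        List<String> lines = Files.exists(conf)
                ? Files.readAllLines(conf)
                : Collections.<String>emptyList();
        Optional<String> key = resolve(lines, System.getenv("TD_API_KEY"));
        // Report only whether a key exists; never print the key itself.
        System.out.println(key.isPresent() ? "API key found" : "No API key configured");
    }
}
```

Running this check once at startup gives a clearer message than a failed API call when neither option has been set up.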

Example Code

import com.treasuredata.client.*;
import com.google.common.base.Function;
import org.msgpack.core.MessagePack;
import org.msgpack.core.MessageUnpacker;
import org.msgpack.value.ArrayValue;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.zip.GZIPInputStream;
...

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();
try {

// Retrieve database and table names
List<TDDatabase> databaseNames = client.listDatabases();
for(TDDatabase db : databaseNames) {
   System.out.println("database: " + db.getName());
   for(TDTable table : client.listTables(db.getName())) {
      System.out.println(" table: " + table);
   }
}

// Submit a new Presto query (for Hive, use TDJobRequest.newHiveQuery)
String jobId = client.submit(TDJobRequest.newPrestoQuery("sample_datasets", "select count(1) from www_access"));

// Wait until the query finishes
ExponentialBackOff backoff = new ExponentialBackOff();
TDJobSummary job = client.jobStatus(jobId);
while(!job.getStatus().isFinished()) {
  Thread.sleep(backoff.nextWaitTimeMillis());
  job = client.jobStatus(jobId);
}

// Read the detailed job information
TDJob jobInfo = client.jobInfo(jobId);
System.out.println("log:\n" + jobInfo.getCmdOut());
System.out.println("error log:\n" + jobInfo.getStdErr());

// Read the job results in msgpack.gz format
client.jobResult(jobId, TDResultFormat.MESSAGE_PACK_GZ, new Function<InputStream, Object>() {
  @Override
  public Object apply(InputStream input) {
    try {
      MessageUnpacker unpacker = MessagePack.newDefaultUnpacker(new GZIPInputStream(input));
      while(unpacker.hasNext()) {
         // Each row of the query result is an array-type value (e.g., [1, "name", ...])
         ArrayValue array = unpacker.unpackValue().asArrayValue();
         int id = array.get(0).asIntegerValue().toInt();
      }
      return null; // Nothing to return; rows are consumed inside the loop
    }
    catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
});

...

}
finally {
  // Never forget to close the TDClient.
  client.close();
}

Bulk upload

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();

File f = new File("./sess/part01.msgpack.gz");

TDBulkImportSession session = client.createBulkImportSession("session_name", "database_name", "table_name");
client.uploadBulkImportPart(session.getName(), "session_part01", f);

Data Connector Bulk Loading

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();

client.startBulkLoadSession("session_name");

Advanced Use

Proxy Server

If you need to access the Web through a proxy, add the following configuration to the $HOME/.td/td.conf file:

[account]
  user = (your TD account e-mail address)
  apikey = (your API key)
  td.client.proxy.host = (optional: proxy host name)
  td.client.proxy.port = (optional: proxy port number)
  td.client.proxy.user = (optional: proxy user name)
  td.client.proxy.password = (optional: proxy password)
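
The same keys can presumably also be supplied programmatically through a Properties object, as described under Configuring TDClient. The sketch below only builds the Properties object with the JDK. The class ProxyProperties, its helper method, and the host and port values are placeholders invented for illustration, and it is an assumption that the proxy keys are honored when passed this way.

```java
import java.util.Properties;

public class ProxyProperties {
    // Hypothetical helper: collect proxy settings under the keys listed in this document.
    static Properties proxyProperties(String host, int port, String user, String password) {
        Properties prop = new Properties();
        prop.setProperty("td.client.proxy.host", host);
        prop.setProperty("td.client.proxy.port", Integer.toString(port));
        // User and password are optional; only set them when provided.
        if (user != null) {
            prop.setProperty("td.client.proxy.user", user);
        }
        if (password != null) {
            prop.setProperty("td.client.proxy.password", password);
        }
        return prop;
    }

    public static void main(String[] args) {
        // "myproxy.example.com" and 8080 are placeholder values.
        Properties prop = proxyProperties("myproxy.example.com", 8080, null, null);
        prop.list(System.out);
        // With td-client on the classpath, the object could then be handed to the builder:
        // TDClient client = TDClient.newBuilder().setProperties(prop).build();
    }
}
```

Building the settings in code keeps proxy credentials out of td.conf when they come from another secret store.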

Configuring TDClient

To configure TDClient, use TDClient.newBuilder():

TDClient client = TDClient
    .newBuilder()
    .setApiKey("(your api key)")
    .setEndpoint("api.ybi.idcfcloud.net")   // For using a non-default endpoint
    .build();

It is also possible to set the configuration with a Properties object:

Properties prop = new Properties();
// Set your own properties
prop.setProperty("td.client.retry.limit", "10");
...

// This overrides the default configuration parameters with the given Properties
TDClient client = TDClient.newBuilder().setProperties(prop).build();

Configuration Parameters

The precedence of the configuration parameters is as follows:

  1. Properties object passed to TDClient.newBuilder().setProperties(Properties p)
  2. Parameters written in $HOME/.td/td.conf
  3. System properties (passed with -D option when launching JVM)
  4. Environment variable (only for TD_API_KEY parameter)
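
To make the lookup order concrete, here is a minimal sketch of a first-match-wins resolution across several sources. It assumes the list above runs from highest to lowest precedence. The class PrecedenceDemo and its lookup method are hypothetical, plain maps stand in for the real sources, and none of this is the library's internal implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecedenceDemo {
    // Return the value from the first source that defines the key, or null if none does.
    static String lookup(String key, List<Map<String, String>> sourcesInPrecedenceOrder) {
        for (Map<String, String> source : sourcesInPrecedenceOrder) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the Properties object passed to the builder.
        Map<String, String> properties = new HashMap<>();
        properties.put("td.client.retry.limit", "10");
        // Stand-in for values read from td.conf.
        Map<String, String> tdConf = new HashMap<>();
        tdConf.put("td.client.retry.limit", "3");
        tdConf.put("td.client.endpoint", "api.treasuredata.com");
        // Stand-in for JVM system properties (empty here).
        Map<String, String> systemProperties = Collections.emptyMap();

        List<Map<String, String>> sources = Arrays.asList(properties, tdConf, systemProperties);
        System.out.println(lookup("td.client.retry.limit", sources)); // prints 10
        System.out.println(lookup("td.client.endpoint", sources));    // prints api.treasuredata.com
    }
}
```

In this sketch the retry limit from the Properties stand-in wins over the td.conf stand-in, while the endpoint falls through to td.conf because no earlier source defines it.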

Key                               Default Value                Description
apikey                            (none)                       API key to access Treasure Data. You can also set this via the TD_API_KEY environment variable.
user                              (none)                       Account e-mail address (unnecessary if apikey is set)
password                          (none)                       Account password (unnecessary if apikey is set)
td.client.proxy.host              (none)                       (optional) Proxy host, e.g., "myproxy.com"
td.client.proxy.port              (none)                       (optional) Proxy port, e.g., "80"
td.client.proxy.user              (none)                       (optional) Proxy user
td.client.proxy.password          (none)                       (optional) Proxy password
td.client.usessl                  true                         (optional) Use SSL encryption
td.client.retry.limit             7                            (optional) Maximum number of API request retries
td.client.retry.initial-interval  500                          (optional) Backoff retry interval = (interval) * (multiplier) ^ (retry count)
td.client.retry.max-interval      60000                        (optional) Maximum retry interval
td.client.retry.multiplier        2.0                          (optional) Retry interval multiplier
td.client.connect-timeout         15000                        (optional) Connection timeout before reaching the API
td.client.read-timeout            60000                        (optional) Timeout when no data is coming from the API
td.client.connection-pool-size    64                           (optional) Connection pool size
td.client.endpoint                api.treasuredata.com         (optional) TD REST API endpoint name
td.client.port                    80 for non-SSL, 443 for SSL  (optional) TD API port number
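
The backoff formula given for td.client.retry.initial-interval can be worked through with the default values. This is a sketch under stated assumptions: the retry count starts at 0, td.client.retry.max-interval acts as an upper cap, and the intervals are in milliseconds. The class RetrySchedule is hypothetical and is not the library's own ExponentialBackOff.

```java
public class RetrySchedule {
    // Wait before the given retry (0-based): interval * multiplier ^ retryCount, capped at the maximum.
    static long waitMillis(int retryCount, long initialIntervalMillis,
                           double multiplier, long maxIntervalMillis) {
        double raw = initialIntervalMillis * Math.pow(multiplier, retryCount);
        return (long) Math.min(raw, (double) maxIntervalMillis);
    }

    public static void main(String[] args) {
        // Defaults from the table: initial interval 500, multiplier 2.0, max interval 60000, limit 7.
        for (int retry = 0; retry < 7; retry++) {
            System.out.println("retry " + retry + ": " + waitMillis(retry, 500, 2.0, 60000) + " ms");
        }
    }
}
```

With these defaults the waits are 500, 1000, 2000, 4000, 8000, 16000, and 32000 ms, about 63.5 seconds in total across seven retries under these assumptions.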

Further Reading