As you might know, Treasure Data provides two types of query engines: Hive and Presto. Since the engines are different, there are natural differences in query syntax and supported functions. Of course, the basic syntax of typical functions like `COUNT` is the same. Although you can learn how each function and piece of syntax behaves from the documentation, I imagine most of you will want to actually run a Hive or Presto job before deploying it to production. How, then, do you prepare your test data? Even in cases where a few rows of records are sufficient for testing, it is tedious to create an empty table and insert rows with `INSERT` statements, or to import a table via a bulk load job (e.g. using a Data Connector). Also, from an operational standpoint, it is not good to accumulate tables that exist only for testing, even if they are deleted once the test is completed. Therefore, in this article, let me show you how to easily check the behavior of Hive or Presto jobs without creating a test table.

# Case 1: When one row is enough for testing

First, let me show this small test case. Suppose you wanted to use the [SPLIT](https://trinodb.github.io/docs.trino.io/350/functions/string.html#split) function in Presto. The documentation says the following:

![Presto Split Function](/assets/prestosplitfunction.e34a4227b13ea709e36b274c84c088a4d4c2e066c2f3921b3f81fd1df2c9e6f3.c5c3197e.png)

If you don't know this function, it might be better to confirm its behavior before you use it in production query statements. Usually, you might create a new test table, store some records in it, and then run the following query statement:

```sql
SELECT SPLIT(col1, ',') AS split_col FROM table_a WHERE col2 = 100;
```

Of course, this works. However, creating a new table just to check the behavior of a function or a simple test case is excessive. Instead, why not pass a string literal directly to the `SPLIT` function instead of specifying a column name? It is easy to assume that a `FROM` clause is required, but the query statement works without one. Here is an example:

```sql
SELECT SPLIT('aaaa,bbbb,cccc,dddd', ',') AS split_col;
```

To run this query from the TD Console, go to **Data Workbench > Queries > New Query**, select a database, enter your SQL query, and then select **Run**.

Here is an example of the output using TD Console:

![Presto One Line SQL Test](/assets/onelinesqlconsole.f47412f5c86e3a00561fa0459d767309bbb6e08bac4a3351fbe7ce304d35657f.c5c3197e.png)

Here is an example of the output using the TD CLI or cURL:

```bash
# Curl Example
curl --location --request POST 'https://api.treasuredata.com/v3/job/issue/presto/sample_test_db' \
  --header 'Accept: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789' \
  -d "query=SELECT SPLIT('aaaa,bbbb,cccc,dddd', ',') as split_col;"
> {"job":"1838627512","job_id":"1838627512","database":"sample_test_db"}

curl --location 'https://console.treasuredata.com/v3/job/result/1838627512' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789'
> "[""aaaa"", ""bbbb"", ""cccc"", ""dddd""]"
```

```bash
# TD CLI Example
td query -d sample_db_test -T presto -w "SELECT SPLIT('aaaa,bbbb,cccc,dddd', ',') as split_col;"
> Job 1838376808 is queued. Use 'td job:show 1838376808' to show the status.
started at 2023-06-06T19:43:51Z
presto version: 350
executing query: SELECT SPLIT('aaaa,bbbb,cccc,dddd', ',') as split_col
.
.
.
.
Status     : success
Result     :
WARNING: the job result is being downloaded...
+-------------------------------+
| split_col                     |
+-------------------------------+
| ["aaaa","bbbb","cccc","dddd"] |
+-------------------------------+
1 row in set
```
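As a variation on the same idea, nothing limits a `FROM`-less `SELECT` to a single expression. Here is a minimal sketch along the same lines, combining `SPLIT` with Presto's `CARDINALITY` function and array subscripting (neither appears in the example above); the aliases `first_element` and `element_count` are just illustrative:

```sql
-- Illustrative combined check; every argument is a literal, so no table is needed
SELECT
  SPLIT('aaaa,bbbb,cccc,dddd', ',')              AS split_col,     -- the array itself
  SPLIT('aaaa,bbbb,cccc,dddd', ',')[1]           AS first_element, -- Presto array indexing is 1-based
  CARDINALITY(SPLIT('aaaa,bbbb,cccc,dddd', ',')) AS element_count  -- number of elements
;
```

If this runs as expected, it should return `aaaa` and `4` alongside the array, which lets you verify how the function handles delimiters before writing the production query.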
# Case 2: When multiple rows are required

If you would like to check the behavior of an aggregation function via a query, a single row is not enough for testing, so you need to prepare multiple rows as test data. Let me show you how to achieve this in Presto and Hive without creating a test table.

## Presto Case

In Presto, you can prepare multiple rows by using `VALUES`. You define the rows after `VALUES` in the form `(first_row), (second_row), ..., (nth_row)`, and you can define column names by aliasing the derived table with `AS`.

```sql
SELECT
  *
FROM
  (
    VALUES
      (1, 100, 'test1'), -- first row
      (1, 200, 'test2'), -- second row
      (2, 200, 'test3'), -- third row
      (2, 400, 'test4'), -- fourth row
      (2, 600, 'test5')  -- fifth row
  ) AS t(col1, col2, col3) -- define column names
;
```

Here is an example of the output using TD Console:

![Presto Multiple Line SQL Test](/assets/multlinesqlconsolepresto.39f92d01c41f21ae90739b5c5b26ac03a436df19a07f8b60803a8a9f2fcb122d.c5c3197e.png)

Here is an example of the output using the TD CLI or cURL:

```bash
# Curl Example
curl --location --request POST 'https://api.treasuredata.com/v3/job/issue/presto/sample_test_db' \
  --header 'Accept: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789' \
  -d "query=SELECT * FROM (VALUES (1, 100, 'test1'),(1, 200, 'test2'),(2, 200, 'test3'),(2, 400, 'test4'),(2, 600, 'test5') ) AS t(col1, col2, col3);"
> {"job":"1838631936","job_id":"1838631936","database":"sample_test_db"}

curl --location 'https://console.treasuredata.com/v3/job/result/1838631936' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789'
> 1,100,test1
1,200,test2
2,200,test3
2,400,test4
2,600,test5
```

```bash
# TD CLI Example
td query -d sample_db_test -T presto -w "SELECT * FROM (VALUES (1, 100, 'test1'),(1, 200, 'test2'),(2, 200, 'test3'),(2, 400, 'test4'),(2, 600, 'test5')) AS t(col1, col2, col3)"
> Job 1838434823 is queued. Use 'td job:show 1838434823' to show the status.
started at 2023-06-06T21:00:40Z
presto version: 350
executing query: SELECT * FROM (VALUES (1, 100, 'test1'),(1, 200, 'test2'),(2, 200, 'test3'),(2, 400, 'test4'),(2, 600, 'test5')) AS t(col1, col2, col3)
.
.
.
.
Status     : success
Result     :
WARNING: the job result is being downloaded...
+------+------+-------+
| col1 | col2 | col3  |
+------+------+-------+
|    1 |  100 | test1 |
|    1 |  200 | test2 |
|    2 |  200 | test3 |
|    2 |  400 | test4 |
|    2 |  600 | test5 |
+------+------+-------+
5 rows in set
```
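Since the whole point of multiple rows is usually to test an aggregation, here is a minimal sketch of running one over the same inline rows; the aliases `total_col2` and `row_count` are just illustrative:

```sql
-- Illustrative aggregation check over the same inline rows
SELECT
  col1,
  SUM(col2) AS total_col2, -- illustrative alias
  COUNT(*)  AS row_count   -- illustrative alias
FROM
  (
    VALUES
      (1, 100, 'test1'),
      (1, 200, 'test2'),
      (2, 200, 'test3'),
      (2, 400, 'test4'),
      (2, 600, 'test5')
  ) AS t(col1, col2, col3)
GROUP BY col1;
```

If this runs as expected, it should return one row per `col1` value (300 and 2 for `col1 = 1`, 1200 and 3 for `col1 = 2`), which is enough to confirm the aggregation logic before pointing the query at a real table.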
The advantage of this approach is that you can share the example with your colleagues without relying on an existing table. Of course, if you need a few dozen rows of test data, it may be easier to prepare a table instead.

## Hive Case

In Hive, you can prepare multiple rows by using `STACK`. Unlike `VALUES` in Presto, you specify the number of rows as the first argument, and the individual rows are not enclosed in parentheses.

```sql
SELECT
  *
FROM
  (
    SELECT STACK(
      5,               -- the number of rows
      1, 100, 'test1', -- first row
      1, 200, 'test2', -- second row
      2, 200, 'test3', -- third row
      2, 400, 'test4', -- fourth row
      2, 600, 'test5'  -- fifth row
    ) AS (col1, col2, col3) -- define column names
  ) AS t
;
```

Here is an example of the output using TD Console:

![Hive Multiple Line SQL Test](/assets/multlinesqlconsolehive.379460897cb34459926b4eb85afb9fbdc1cbf38da42bc7ec2ccf593d13ec486d.c5c3197e.png)

Here is an example of the output using the TD CLI or cURL:

```bash
# Curl Example
curl --location --request POST 'https://api.treasuredata.com/v3/job/issue/hive/sample_test_db' \
  --header 'Accept: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789' \
  -d "query=SELECT * FROM (SELECT STACK (5,1, 100, 'test1',1, 200, 'test2',2, 200, 'test3',2, 400, 'test4',2, 600, 'test5') AS (col1, col2, col3) ) AS t;"
> {"job":"1838639145","job_id":"1838639145","database":"sample_test_db"}

curl --location 'https://console.treasuredata.com/v3/job/result/1838639145' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: TD1 123/abcdef•••••••••••••••••••••••0123456789'
> 1,100,test1
1,200,test2
2,200,test3
2,400,test4
2,600,test5
```

```bash
# TD CLI Example
td query -d sample_db_test -T hive -w "SELECT * FROM (SELECT STACK(5,1, 100,'test1',1, 200,'test2',2, 200,'test3',2, 400,'test4',2, 600,'test5')AS(col1, col2, col3)) AS t;"
> Job 1838369237 is queued. Use 'td job:show 1838369237' to show the status.
Hive history file=/mnt/hive/tmp/3867/hive_job_log_5c427fc1-ee26-4e52-9d8f-fa0c308e206e_123537841.txt
Job is running in resource pool: hadoop2 with priority: default
Execution engine version: 2020.1
TD Hive worker version: release-0.2.4-11028
.
.
.
.
Status     : success
Result     :
+------+------+-------+
| col1 | col2 | col3  |
+------+------+-------+
|    1 |  100 | test1 |
|    1 |  200 | test2 |
|    2 |  200 | test3 |
|    2 |  400 | test4 |
|    2 |  600 | test5 |
+------+------+-------+
5 rows in set
```

# Summary

The techniques described in this article will help you confirm the behavior of a function or query statement in either Hive or Presto. When you encounter unexpected behavior, use these methods to reproduce it, and then share the query statement with your team or with support.