Presto Troubleshooting

Review one or more of the following procedures to help you troubleshoot Presto issues:

Troubleshooting Presto Query Runtime Limits and Query Hint Override

By default, the Presto service limits query execution time to 6 hours. You can manage the Presto query execution time in several ways.
You might want to limit some or all Presto queries to run for a shorter period. For example:

  • You can limit execution time for ad hoc queries by default, to avoid unexpectedly high Presto costs.
  • You can encourage users to choose Hive over Presto for large, long-running data pipeline jobs.

Presto Query Limits

Policies to Limit Query Execution Time

Contact Support to request custom limits on the following types of queries:

  • Ad hoc queries from the Console, and queries submitted through ODBC and JDBC
  • Queries that run as scheduled queries or as steps in workflows
  • Queries explicitly assigned to any named resource pool

The limits apply to queries created by any user.

Limiting Execution Time of a Single Query with Query Hints

If you want to override any other limits for a single query, you can apply the following query hint at the beginning of the query:

-- SET SESSION query_max_execution_time = '6h'
SELECT 
  COUNT(*)
FROM
  ALL_EVENTS
;

If limits are set in your account and you write a query with a hint, the smallest of the account, resource-group, query-hint, and global limits is applied.

Supported time measures are d (days), h (hours), m (minutes), s (seconds).
Note: All Presto queries are limited to 6 hours of runtime, regardless of any account-level policy or query hint.

If the limit specified in the query hint is longer than the configured limit, then the shorter configured limit still applies. For example, if a 1-hour limit is configured for ad hoc queries, then you can use the magic comment to limit a query to 5 minutes (5m) but not to increase the limit to 2 hours (2h).
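
For example, a minimal sketch of shortening a single query's limit to 5 minutes with a query hint (the table name is illustrative):

-- SET SESSION query_max_execution_time = '5m'
SELECT COUNT(*) FROM www_access;  -- www_access is a hypothetical table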

Using Query Hints to Extend Query Runtime Beyond Configured Limits

You can run a query in a special overcommit resource pool if you want to extend the runtime beyond the query's normal limits. In this case, the limit specified in the query hint is greater than the query's normal limit.

For example, if there is a 30-minute limit configured on scheduled queries, you can run a longer query with the following query hints:

-- set property resource_group = 'overcommit'
-- set session query_max_execution_time = '1d' 

Troubleshooting Presto Queries

Typically, when Presto queries running on Treasure Data have problems, you can try the solutions described in the following sections.

If they do not solve your particular issue, email support@treasuredata.com with the job IDs of your queries. If possible, include information about the expected results and the meaning of your data set.

Start with a smaller data set and review subqueries

Presto can process millions of rows in a second. If you see query errors or unexpected results, try to minimize your input data set. Here are some typical ways you can refine your query:

  • Narrow down the input data size by using TD_TIME_RANGE
  • Create a sample table by using CREATE TABLE AS SELECT … WHERE ….
    • Review: Use CREATE TABLE AS SELECT.
  • Minimize your query:
    • Remove irrelevant columns from the SELECT statements
    • Reduce the number of joins
    • Replace SELECT c1, c2, … with SELECT count(*)
      • If you are still experiencing query errors, skipping the query result generation with count(*) will reduce your investigation time.
  • Extract sub-queries and check their results one by one.
    • Running individual subqueries can help identify the performance bottleneck.
  • Check the conditions in the WHERE clause carefully.
    • Comparing a value with NULL using the equal (=) operator never evaluates to true. Instead, use IS NULL (or IS NOT NULL); see the sketch after this list.
    • The following query scans the entire data set even if TD_TIME_RANGE is used:
SELECT * FROM table1
WHERE
  col1 < 100
  OR col2 is TRUE
  AND TD_TIME_RANGE(time, '2015-11-01')

The full data set scan occurs because AND has higher precedence than OR. This condition is equivalent to:

(col1 < 100) OR (col2 is TRUE AND TD_TIME_RANGE(time, '2015-11-01'))

The first condition does not specify any TD_TIME_RANGE, so it results in scanning the whole table. To fix this, use parentheses appropriately:

(col1 < 100 OR col2 is TRUE) AND TD_TIME_RANGE(time, '2015-11-01')
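
Putting several of these suggestions together, here is a minimal sketch (all table and column names are illustrative) that creates a small sample table with CREATE TABLE AS SELECT, narrows the scan with TD_TIME_RANGE, and uses IS NOT NULL instead of the equal operator:

-- Illustrative names: sample_events is the new sample table, events is the source table.
CREATE TABLE sample_events AS
SELECT user_id, event_name, time
FROM events
WHERE TD_TIME_RANGE(time, '2015-11-01', '2015-11-02')  -- scan one day only
  AND user_id IS NOT NULL                               -- never user_id = NULL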

Create a Sample Data Set with a VALUES Clause

To test Presto queries, use the VALUES clause to prepare a sample data set. You don’t need to import a data set. For example:

SELECT * FROM (VALUES (1, 'apple'), (2, 'banana')) as fruit(id, name);

# This query gives the following table:
#  id |  name
#-----+--------
#   1 | apple
#   2 | banana

Troubleshooting Presto Errors

Review the following sections to troubleshoot various Presto error codes.

Presto error - Concatenated string is too large

This error is caused by using the concat function (concat or ||) to concatenate strings that have a combined length greater than 1048576 characters.

Resolution

As a workaround, break your concat statement into two or more statements that each yield fewer than 1048576 total characters.
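
For instance, a hedged sketch that splits one oversized concatenation into two shorter output columns (table and column names are illustrative):

-- Each concatenated part stays under the 1048576-character limit instead of forming one huge string.
SELECT
  concat(col_a, col_b) AS part_1,
  concat(col_c, col_d) AS part_2
FROM my_table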

Presto error - mismatched input xxxx expecting - SYNTAX ERROR

Description

For example, the following query causes the error shown in the title:

Example

SELECT * from tbl SELECT;

Error Message

line 1:15: mismatched input 'SELECT'. Expecting: '(', 'LATERAL', 'UNNEST', <identifier>

Cause

A keyword appears where the SQL grammar does not allow it, so the query fails to compile with the error shown above.

Resolution

Locate the keyword by using the position given in the error message (in this case, "line 1:15"), then correct the query.

SELECT * from tbl;

Presto error - second argument of max/min must be less than or equal to 10000 - found xxxx

Description

Presto can take an additional argument (n) to return the n largest or smallest input values in the following four functions:

  • min(col, n)
  • max(col, n)
  • min_by(col1, col2, n)
  • max_by(col1, col2, n)

However, Presto restricts n to 10000 or less since v0.194.

Cause

When calling the above functions, using a large value for n can cause performance issues in Presto clusters. As a result, Presto returns an error when a value greater than 10000 is used for n.

Resolution

Assign a value of 10000 or less to n when you call any of these four functions.
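
For example, a minimal sketch that stays within the restriction (table and column names are illustrative):

-- Returns the 100 largest purchase amounts per user; 100 is well below the 10000 cap.
SELECT user_id, max(amount, 100) AS top_amounts
FROM purchases
GROUP BY user_id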

Presto error - column xxxx cannot be resolved - SYNTAX ERROR

If you receive this error message, you can take actions to resolve the issue.

Description

The following is an example.

Table Definition:

A table named tbl with a single numeric column, id.

Based on this example, the following queries cause the error Column 'xxxx' cannot be resolved:

SELECT id1 FROM tbl;
Column 'id1' cannot be resolved

or

SELECT id FROM tbl WHERE id="123" ;
Column '123' cannot be resolved

Cause

Column 'id1' does not exist (the column specified in the SQL query is incorrect).

or

Double quotation marks are used around the string in the WHERE clause. In Presto, double quotation marks can be used only to quote identifiers such as column names.

Resolution

Use the correct column name in the SQL query.

SELECT id FROM tbl;

or

Use single quotation marks around string values.

Do not use quotation marks when specifying a number.

SELECT id FROM tbl WHERE id=123 ;
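
If you are filtering on a string value instead, a sketch along the same lines (the name column is hypothetical and not part of the example table):

SELECT id FROM tbl WHERE name = 'apple';  -- name is a hypothetical varchar column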

Presto error - Unexpected parameters for function yyyy

This example query would cause the following error:

SELECT td_time_format(time) from pageviews;

Error:

Unexpected parameters (bigint) for function td_time_format. 
Expected: td_time_format(bigint, varchar, varchar) , td_time_format(bigint, varchar)

Cause

This error occurs when the number of arguments or the data types provided to the function differ from what the function expects.

Resolution

Provide the correct number and type of arguments required by the function; the expected signatures are listed in the error message.

Fixed query:

SELECT td_time_format(time, 'yyyy-MM-dd HH:mm:ss', 'UTC') from pageviews;

TOO_MANY_REQUESTS_FAILED

Error Example:

Query 20160519_223238_81890_gf734 failed: TOO_MANY_REQUESTS_FAILED: Failed to read f1a04ceb-c217-4ee4-ba4f-17b10c78ec9f. 

Description

The TOO_MANY_REQUESTS_FAILED error indicates that many data-read errors occurred. Presto itself does not have fault tolerance, so in this situation it exits with an error once. The log from the failed attempt is cleared, and the Presto query is automatically re-executed in TD after a few minutes; the retried query itself is usually successful.

Cause

This message appears when internal read errors increase. However, the Presto job retries automatically, so the job usually succeeds.

Resolution

The Presto job retries automatically. If it does not, re-execute the query in TD.

Invalid Table Alias or Column Reference Error

Description

The job failed with the following error message:

FAILED: SemanticException [Error 10004]: Line 3:29 Invalid table alias or column reference 'order_id': (possible column names are: _c0, _c1)

Failed Query

SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  1

Cause

The issue is caused by using a column number in the Hive ORDER BY clause, which leads to a semantic analysis failure.

Solution

Use a column name or alias in the ORDER BY clause instead of a column number.

Working Query

SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  ORDER_ID

Presto Reliability and PAGE TRANSPORT TIMEOUT Error

Presto transfers data through the network. Because of connection timeouts or problems in worker nodes, this network data transfer may occasionally fail with the PAGE_TRANSPORT_TIMEOUT error.

Presto is designed for faster query processing than Hive, so it sacrifices fault tolerance to some extent. Typically, more than 99.5% of Presto queries finish without any error on the first run.

Treasure Data provides a query retry mechanism on query failures, so nearly 100% of queries finish successfully after being retried.

Because of the nature of network-based distributed query processing, if your query tries to process billions of records, the chance of hitting network failures increases. If you start seeing the PAGE_TRANSPORT_TIMEOUT error frequently, try to reduce the input data size by narrowing down the TD_TIME_RANGE or reducing the number of columns in SELECT statements.
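
For instance, a hedged sketch of narrowing the input (column names are illustrative; pageviews is the table from the earlier example):

-- Scan one week instead of the full table, and select only the columns you need.
SELECT user_id, path
FROM pageviews
WHERE TD_TIME_RANGE(time, '2015-11-01', '2015-11-08')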