Presto Troubleshooting

Review one or more of the following procedures to help you troubleshoot Presto issues:

Troubleshooting Presto Query Runtime Limits and Query Hint Override

By default, the Presto service limits query execution time to 6 hours. You can manage the Presto query execution time in several ways.
You might want to limit some or all Presto queries to run for a shorter period. For example:

  • You can limit execution time for ad hoc queries by default, to avoid unexpectedly high Presto costs.
  • You can encourage users to choose Hive over Presto for large, long-running data pipeline jobs.

Presto Query Limits

Policies to Limit Query Execution Time

Contact Support to request custom limits on the following types of queries:

  • Ad hoc queries from the Console, and queries submitted through ODBC and JDBC
  • Queries that run as scheduled queries or as steps in workflows
  • Queries explicitly assigned to any named resource pool

The limits apply to queries created by any user.

Limiting Execution Time of a Single Query with Query Hints

If you want to override any other limits for a single query, you can apply the following query hint at the beginning of the query:

-- SET SESSION query_max_execution_time = '6h'
SELECT 
  COUNT(*)
FROM
  ALL_EVENTS
;

If limits are set in your account and you write a query with a hint, the smallest of the account, resource-group, query-hint, and global limits is applied.

Supported time measures are d (days), h (hours), m (minutes), s (seconds).
Note: All Presto queries are limited to 6 hours of runtime, regardless of any account-level policy or query hint.

If the limit specified in the query hint is longer than the configured limit, then the shorter configured limit still applies. For example, if a 1-hour limit is configured for ad hoc queries, then you can use the magic comment to limit a query to 5 minutes (5m) but not to increase the limit to 2 hours (2h).
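
For example, a minimal sketch of shortening a single query's limit to 5 minutes with a query hint (the table name is illustrative):

-- SET SESSION query_max_execution_time = '5m'
SELECT COUNT(*) FROM www_access;  -- www_access is a hypothetical table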

Using Query Hints to Extend Query Runtime Beyond Configured Limits

You can run a query in a special overcommit resource pool if you want to extend the runtime beyond the query's normal limits. In this case, the limit specified in the query hint is greater than the query's normal limit.

For example, if there is a 30-minute limit configured on scheduled queries, you can run a longer query with the following query hints:

-- set property resource_group = 'overcommit'
-- set session query_max_execution_time = '1d' 

Troubleshooting Presto Queries

Typically, when Presto queries running on Treasure Data have problems, you can try the solutions described in the following sections.

If they do not solve your particular issue, email support@treasuredata.com with the job IDs of your queries. If possible, include information about the expected results and the meaning of your data set.

Start with a smaller data set and review subqueries

Presto can process millions of rows in a second. If you see query errors or unexpected results, try to minimize your input data set. Here are some typical ways you can refine your query:

  • Narrow down the input data size by using TD_TIME_RANGE
  • Create a sample table by using CREATE TABLE AS SELECT … WHERE ….
    • Review: Use CREATE TABLE AS SELECT.
  • Minimize your query:
    • Remove irrelevant columns from the SELECT statements
    • Reduce the number of joins
    • Replace SELECT c1, c2, … with SELECT count(*)
      • If you are still experiencing query errors, skipping the query result generation with count(*) will reduce your investigation time.
  • Extract sub-queries and check their results one by one.
    • Running individual subqueries can help identify the performance bottleneck.
  • Check the conditions in the WHERE clause carefully.
    • Comparing a value with NULL using the equal (=) operator never evaluates to true. Instead, use IS NULL (or IS NOT NULL); see the sketch after this list.
    • The following query scans the entire data set even if TD_TIME_RANGE is used:
SELECT * FROM table1
WHERE
  col1 < 100
  OR col2 is TRUE
  AND TD_TIME_RANGE(time, '2015-11-01')

The full data set scan occurs because AND has higher precedence than OR. This condition is equivalent to:

(col1 < 100) OR (col2 is TRUE AND TD_TIME_RANGE(time, '2015-11-01'))

The first condition does not specify any TD_TIME_RANGE, so it results in scanning the whole table. To fix this, use parentheses appropriately:

(col1 < 100 OR col2 is TRUE) AND TD_TIME_RANGE(time, '2015-11-01')
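
Putting several of these suggestions together, here is a minimal sketch (all table and column names are illustrative) that creates a small sample table with CREATE TABLE AS SELECT, narrows the scan with TD_TIME_RANGE, and uses IS NOT NULL instead of the equal operator:

-- Illustrative names: sample_events is the new sample table, events is the source table.
CREATE TABLE sample_events AS
SELECT user_id, event_name, time
FROM events
WHERE TD_TIME_RANGE(time, '2015-11-01', '2015-11-02')  -- scan one day only
  AND user_id IS NOT NULL                               -- never user_id = NULL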

Create a Sample Data Set with a VALUES Clause

To test Presto queries, use the VALUES clause to prepare a sample data set. You don’t need to import a data set. For example:

SELECT * FROM (VALUES (1, 'apple'), (2, 'banana')) as fruit(id, name);

# This query gives the following table:
#  id |  name
#-----+--------
#   1 | apple
#   2 | banana

Troubleshooting Presto Errors

Review the following sections to troubleshoot various Presto error codes.

Presto error - Concatenated string is too large

This error is caused by using the concat function (concat or ||) to concatenate strings that have a combined length greater than 1048576 characters.

Resolution

As a workaround, break your concat statement into two or more statements that each yield fewer than 1048576 total characters.
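
For instance, a hedged sketch that splits one oversized concatenation into two shorter output columns (table and column names are illustrative):

-- Each concatenated part stays under the 1048576-character limit instead of forming one huge string.
SELECT
  concat(col_a, col_b) AS part_1,
  concat(col_c, col_d) AS part_2
FROM my_table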

Presto error - mismatched input xxxx expecting - SYNTAX ERROR

Description

For example, the following query causes the error shown in the title:

Example

SELECT * from tbl SELECT;

Error Message

line 1:15: mismatched input 'SELECT'. Expecting: '(', 'LATERAL', 'UNNEST', <identifier>

Cause

A keyword appears where the SQL grammar does not allow it, so the query fails to compile with the error shown above.

Resolution

Locate the keyword by using the position given in the error message (in this case, "line 1:15"), then correct the query.

SELECT * from tbl;

Presto error - second argument of max/min must be less than or equal to 10000 - found xxxx

Description

Presto can take an additional argument (n) to return the n largest or smallest input values in the following four functions:

  • min(col, n)
  • max(col, n)
  • min_by(col1, col2, n)
  • max_by(col1, col2, n)

However, Presto restricts n to 10000 or less since v0.194.

Cause

When calling the above functions, using a large value for n can cause performance issues in Presto clusters. As a result, Presto returns an error when a value greater than 10000 is used for n.

Resolution

Assign a value of 10000 or less to n when you call any of these four functions.
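
For example, a minimal sketch that stays within the restriction (table and column names are illustrative):

-- Returns the 100 largest purchase amounts per user; 100 is well below the 10000 cap.
SELECT user_id, max(amount, 100) AS top_amounts
FROM purchases
GROUP BY user_id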

Presto error - column xxxx cannot be resolved - SYNTAX ERROR

If you receive this error message, you can take actions to resolve the issue.

Description

The following is an example.

Table Definition:

A table named tbl with a single numeric column, id.

Based on this example, the following queries cause the error Column 'xxxx' cannot be resolved:

SELECT id1 FROM tbl;
Column 'id1' cannot be resolved

or

SELECT id FROM tbl WHERE id="123" ;
Column '123' cannot be resolved

Cause

Column 'id1' does not exist (the column specified in the SQL query is incorrect).

or

Double quotation marks are used around the string in the WHERE clause. In Presto, double quotation marks can be used only to quote identifiers such as column names.

Resolution

Use the correct column name in the SQL query.

SELECT id FROM tbl;

or

Use single quotation marks around string values.

Do not use quotation marks when specifying a number.

SELECT id FROM tbl WHERE id=123 ;
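
If you are filtering on a string value instead, a sketch along the same lines (the name column is hypothetical and not part of the example table):

SELECT id FROM tbl WHERE name = 'apple';  -- name is a hypothetical varchar column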

Presto error - Unexpected parameters for function yyyy

This example query would cause the following error:

SELECT td_time_format(time) from pageviews;

Error:

Unexpected parameters (bigint) for function td_time_format. 
Expected: td_time_format(bigint, varchar, varchar) , td_time_format(bigint, varchar)

Cause

This error occurs when the number of arguments or the data types provided to the function differ from what the function expects.

Resolution

Provide the correct number and type of arguments required by the function; the expected signatures are listed in the error message.

Fixed query:

SELECT td_time_format(time, 'yyyy-MM-dd HH:mm:ss', 'UTC') from pageviews;

TOO_MANY_REQUESTS_FAILED

Error Example:

Query 20160519_223238_81890_gf734 failed: TOO_MANY_REQUESTS_FAILED: Failed to read f1a04ceb-c217-4ee4-ba4f-17b10c78ec9f. 

Description

The TOO_MANY_REQUESTS_FAILED error indicates that many data-read errors occurred. Presto itself does not have fault tolerance, so in this situation it exits with an error once. The log from the failed attempt is cleared, and the Presto query is automatically re-executed in TD after a few minutes; the retried query itself is usually successful.

Cause

This message appears when internal read errors increase. However, the Presto job retries automatically, so the job usually succeeds.

Resolution

The Presto job retries automatically. If it does not, re-execute the query in TD.

Invalid Table Alias or Column Reference Error

Description

The job failed with the following error message:

FAILED: SemanticException [Error 10004]: Line 3:29 Invalid table alias or column reference 'order_id': (possible column names are: _c0, _c1)

Failed Query

SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  1

Cause

The issue is caused by using a column number in the Hive ORDER BY clause, which leads to a semantic analysis failure.

Solution

Use a column name or alias in the ORDER BY clause instead of a column number.

Working Query

SELECT
  TD_DATE_TRUNC(
    'day',
    CAST(CONV(SUBSTR(order_id, 0, 8), 16, 10) AS BIGINT),
    'America/Los_Angeles'
  ),
  COUNT(*)
FROM
  pj_db.pj_table
GROUP BY
  1
ORDER BY
  ORDER_ID

Presto Reliability and PAGE TRANSPORT TIMEOUT Error

Presto transfers data through the network. Because of connection timeouts or problems in worker nodes, this network data transfer may occasionally fail with the PAGE_TRANSPORT_TIMEOUT error.

Presto is designed for faster query processing than Hive, so it sacrifices fault tolerance to some extent. Typically, more than 99.5% of Presto queries finish without any error on the first run.

Treasure Data provides a query retry mechanism on query failures, so nearly 100% of queries finish successfully after being retried.

Because of the nature of network-based distributed query processing, if your query tries to process billions of records, the chance of hitting network failures increases. If you start seeing the PAGE_TRANSPORT_TIMEOUT error frequently, try to reduce the input data size by narrowing down the TD_TIME_RANGE or reducing the number of columns in SELECT statements.
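
For instance, a hedged sketch of narrowing the input (column names are illustrative; pageviews is the table from the earlier example):

-- Scan one week instead of the full table, and select only the columns you need.
SELECT user_id, path
FROM pageviews
WHERE TD_TIME_RANGE(time, '2015-11-01', '2015-11-08')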