Accessing Trino 423

Trino 423 is the current version of Trino (Presto) now available for Treasure Data queries. The new features and improvements are

  • Performance Improvements
  • New UDFs
  • New SQL support
  • JDBC Improvements
  • tzdata version update

Performance Improvements

Trino 423 realizes performance issues in the following areas:

  • join
    • improved join performance. ( #5981 , #8974 , #13352 )
    • improved performance of certain queries involving LEFT, RIGHT and FULL JOIN when one side of the join is known to produce a single row. ( #7090 )
    • improved performance of queries that contain joins on varchar keys of different length. ( #7644 )
    • improved performance of inequality joins. ( #9307 )
    • improved performance of joins involving a small table on one side. ( #9851 )
    • improved performance of queries that contain JOIN and UNION clauses. ( #11935 )
    • improved join performance when one side of the join is small. ( #12257 )
    • improved performance of queries involving joins on a single bigint column. ( #13432 )
    • improved planning performance for queries involving joins. ( #17458 )
  • in, exists
    • improved performance for queries using IN predicate with moderate to large number of constants. ( #8833 )
    • improved performance of IN ( < subquery > ) expressions. ( #8639 )
    • improved query planning performance for queries containing large IN predicates. ( #9874 )
    • improved planning performance of queries with large IN lists. ( #11902 , #11918 , #11956 )
    • improved performance of queries involving correlated IN or EXISTS predicates. ( #12047 )
  • group by
    • improved performance of GROUP BY with a large number of groups. ( #11011 , #11361 )
    • improved performance of queries that contain GROUP BY clauses. ( #12095 )
    • improved planning performance for queries with many GROUP BY clauses. ( #15292 )
  • functions
    • improved performance of queries using rank() window function. ( #6333 )
    • improved performance of sum() and avg() for decimal types. ( #6591 , #8878 )
    • improved performance of decimal aggregations. ( #9640 , #13573 )
    • improved performance of queries that contain array_distinct(array_sort(…)) expressions. ( #8777 )
    • improved performance of queries with window functions. ( #15994 )
    • improved performance of queries with row_number() and rank() window functions. ( #16753 )
  • types
    • improved performance of specific queries which compare table columns of type timestamp with date literals. ( #11170 )
    • improved performance of queries that compare date columns with timestamp(n) with time zone literals. ( #5798 )
    • improved performance of queries that process row or array data. ( #9402 )
    • improved performance of map and row types. ( #10469 )
    • improved read performance for row data types. ( #12926 )
    • improved performance of queries involving row type or certain aggregations such as sum, avg, etc. ( #12762 )
    • improved performance of queries that process string data. ( #12798 )
    • improved performance of queries with an UNNEST clause. ( #10506 )
    • improved performance of LIKE expressions. ( #15999 )
    • improved performance of UNION ALL queries. ( #17265 )

New UDFs

The following new UDFs are available in Trino 423:

  • version() ( #4627 )
  • to_geojson_geometry() and from_geojson_geometry() ( #6355 )
  • soundex() ( #4022 )
  • format_number() ( #1878 )
  • geometry_nearest_points() . ( #8280 )
  • support for listagg() . ( #4835 )
  • contains function to check whether a CIDR contains an IP address. ( #9654 )
  • trim_array() ( #11238 )
  • to_base32() and from_base32() ( #11439 )
  • json_exists , json_query , json_value , json_array , and json_object JSON functions . ( #9081 )
  • sinh() ( #16494 )
  • quantile_at_value() ( #16736 )
  • array_histogram() ( #14725 )
  • support for any_value() ( #17777 )

New SQL support

The following SQL support has been added:

JDBC improvements

Trino 423 implements the following method:

  • PreparedStatement.getParameterMetaData() ( #2978 )

tzdata

Trino 423 updates the tzdata version. With this new version, the Pacific/Kantontime zone is now supported; however, as a result, queries can no longer reference the US/Pacific-New zone.(#6660,#10679)

Migration Guide

Currently, the default Presto query engine is Presto 350. You can specify the version of Trino/Presto you want to use by prefacing your SQL query with a "magic comment:"

Presto 350

Copy
Copied
-- @TD engine_version: 350 

Trino 423

Copy
Copied
-- @TD engine_version: 423 

Using pytd, you can add the magic comment in the query with db parameter as shown here.

Copy
Copied
client.query('-- @TD engine_version: 350 \n select * from nasdaq limit 10', db='sample_datasets')

Trino 423 Limitations and Differences

While most existing Presto 350 queries are compatible with Trino 423, some queries will require changes.

Casting from double or real to varchar

Follow the SQL standard when casting from double or real to varchar.

350423
Copy
Copied
presto> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST("k" AS VARCHAR) from t;
 _col0 
-------
 10.3  
 100.0 
(2 rows)
Copy
Copied
trino> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST("k" AS VARCHAR) from t;
 _col0  
--------
 1.03E1 
 1.0E2  
(2 rows)
To get the same result as 350, you need to cast to decimal:

If you want to get the same result as Presto 350, you need to cast to decimal:

Copy
Copied
trino> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST(CAST("k" AS DECIMAL(10,1)) AS VARCHAR) from t;
 _col0 
-------
 10.3  
 100.0 
(2 rows)

Handling return type from unixtime

When using time zone, change return type of from_unixtime(), and from_unixtime_nanos(), to timestamp(p) with time zone.

350423
Copy
Copied
presto> select from_unixtime(time) from sample_datasets.www_access order by time limit 1;
          _col0          
-------------------------
 2014-10-03 07:20:45.000 
(1 row)

presto> select from_unixtime_nanos(1699503369586210000);
          _col0          
-------------------------
 2023-11-09 04:16:09.586 
(1 row)
Copy
Copied
trino> select from_unixtime(time) from sample_datasets.www_access order by time limit 1;
               _col0                
------------------------------------
 2014-10-03 16:20:45.000 Asia/Tokyo 
(1 row)

trino> select from_unixtime_nanos(1699503369586210000);
                  _col0                   
------------------------------------------
 2023-11-09 13:16:09.586210000 Asia/Tokyo 
(1 row)

Differences in to_unixtime

In the Treasure Data implementation of Trino 423 the to_unixtime function continues to operate in the same way as Presto 350.

Be aware, however, that the Trino 423 specification introduced new timestamp semantics that required timestamps to be cast to timestapm(p) with time zone before being passed to to_unixtime.

Casting from row to json does not return a JSON array in Trino 423

350423
Copy
Copied
presto> select cast(row("code", "host", "time") as json) from sample_datasets.www_access;
               _col0                
------------------------------------
 [200,"132.174.221.142",1412374112] 
 [200,"104.216.96.28",1412374101]   
 ...
Copy
Copied
trino> select cast(row("code", "host", "time") as json) from sample_datasets.www_access;
                    _col0                    
---------------------------------------------
 {"":200,"":"60.183.134.133","":1412366395}  
 {"":200,"":"76.45.157.42","":1412366383}    
...

Invalid timestamp literals can't be casted

In Presto 350, an invalid timestamp literal can be casted. However, this is a bug and has been fixed in Trino 423.

350423
Copy
Copied
presto> select cast('2023-12-13 14.18.34.000' as timestamp);
          _col0          
-------------------------
 2023-12-13 00:00:00.000 
(1 row)

# The literals are converted with due validation, so the same literal raises an error

presto> select TIMESTAMP '2023-12-13 14.18.34.000';
Query 20231213_052357_26435_7amdj failed: line 1:8: '2023-12-13 14.18.34.000' is not a valid timestamp literal
Copy
Copied
# In Trino 423, the query fails with an error.

trino> select cast('2023-12-13 14.18.34.000' as timestamp);
Query 20231213_053737_00008_abjfn failed: Value cannot be cast to timestamp: 2023-12-13 14.18.34.000

Additionally, when using Trino 423, ISO-8601 (cast to timestamp) also fails. (In Presto 350, casting ISO-8601 to timestamp is not correct.)

350423
Copy
Copied
# incorrect

presto> SELECT cast('2023-12-13T14:18:34+0000' as timestamp);
          _col0          
-------------------------
 2023-12-13 00:00:00.000 
(1 row)

# correct
presto> SELECT from_iso8601_timestamp('2023-12-13T14:18:34+0000');
            _col0            
-----------------------------
 2023-12-13 14:18:34.000 UTC 
(1 row)
Copy
Copied
# In 423, the query fails.

trino> SELECT cast('2023-12-13T14:18:34+0000' as timestamp);
Query 20231213_060050_00012_abjfn failed: Value cannot be cast to timestamp: 2023-12-13T14:18:34+0000
 
# correct
trino> SELECT from_iso8601_timestamp('2023-12-13T14:18:34+0000');
            _col0            
-----------------------------
 2023-12-13 14:18:34.000 UTC 
(1 row)

Generally, use

  • from_iso8601_timestamp
  • from_iso8601_timestamp_nanos
  • from_iso8601_date
  • to_iso8601

Changes to the handling of row expressions for IN predicates, quantified comaprisons, and scalar subqueries

Trino 423 fixed the handling of row expressions for IN predicates, quantified comparisons and scalar subqueries.

350423
Copy
Copied
# incorrect behavior


presto> SELECT
     ->   time, (code, host)
     -> FROM 
     ->   sample_datasets.www_access 
     -> WHERE
     ->   TD_TIME_RANGE(time, '2014-10-04 01:13:00', '2014-10-04 01:13:05') 
     ->   AND (time, code, host) IN
     ->   (
     ->     SELECT (time, code, host) FROM sample_datasets.www_access
     ->   );
    time    |         _col1         
------------+-----------------------
 1412385182 | {200, 40.81.151.94}   
 1412385184 | {200, 216.51.135.153} 
 1412385180 | {200, 76.75.28.148}   
(3 rows)
Copy
Copied
# In 423, the query fails with a type mismatch error.

trino> SELECT
    ->   time, (code, host)
    -> FROM 
    ->   sample_datasets.www_access 
    -> WHERE
    ->   TD_TIME_RANGE(time, '2014-10-04 01:13:00', '2014-10-04 01:13:05') 
    ->   AND (time, code, host) IN
    ->   (
    ->     SELECT (time, code, host) FROM sample_datasets.www_access
    ->   );
Query 20231110_073823_00001_ymnng failed: line 7:26: Value expression and result of subquery must be of the same type: row(bigint, bigint, varchar) vs row(row(bigint, bigint, varchar))

#  Need to change:
#    (time, code, host) IN (SELECT (time, code, host) FROM sample_datasets.www_access)
#  to:
#    (time, code, host) IN (SELECT time, code, host FROM sample_datasets.www_access)

Differences in casting numeric values to varchar(n)

In Presto 350, even when the varchar type length was too short, a query was successful. In Trino 423, the query fails with an invalid cast argument error.

Numeric Values

350423
Copy
Copied
presto> select CAST(12345678900 AS varchar(11));
    _col0    
-------------
 12345678900 
(1 row)

presto> select CAST(12345678900 AS varchar(50));
    _col0    
-------------
 12345678900 
(1 row)

presto> select CAST(12345678900 AS varchar(10));
   _col0    
------------
 1234567890 
(1 row)
Copy
Copied
trino> select CAST(12345678900 AS varchar(11));
    _col0    
-------------
 12345678900 
(1 row)

trino> select CAST(12345678900 AS varchar(50));
    _col0    
-------------
 12345678900 
(1 row)

trino> select CAST(12345678900 AS varchar(10));
Query 20240130_084534_31703_24gcm failed: Value 12345678900 cannot be represented as varchar(10)

Date

350423
Copy
Copied
presto> select CAST(DATE '2013-02-02' AS varchar(10));
   _col0    
------------
 2013-02-02 
(1 row)

presto> select CAST(DATE '2013-02-02' AS varchar(50));
   _col0    
------------
 2013-02-02 
(1 row)

presto> select CAST(DATE '2013-02-02' AS varchar(9));
   _col0   
-----------
 2013-02-0 
(1 row)
Copy
Copied
trino> select CAST(DATE '2013-02-02' AS varchar(10));
   _col0    
------------
 2013-02-02 
(1 row)

trino> select CAST(DATE '2013-02-02' AS varchar(50));
   _col0    
------------
 2013-02-02 
(1 row)

trino> select CAST(DATE '2013-02-02' AS varchar(9));
Query 20240131_022940_06992_24gcm failed: Value 2013-02-02 cannot be represented as varchar(9)

Encoding Values For example, now() returns a timestamp(3) with time zone type value.

350423
Copy
Copied
select cast((now()) AS varchar(10));
  // => 2024-01-31
Copy
Copied
select cast((now()) AS varchar(10));
  // => Query 20240131_024017_07317_24gcm failed: Value [2024-01-31 02:40:17.842 UTC] does not fit in type varchar(10)

UNNEST ARRAY with 255+ items causes an error

According to the Trino community,

Due to the way arrays literals are modeled in Trino, there’s currently a physical limitation on how many items you can have in an array declaration. It’s possible that before it worked “by chance” based on when the array declaration was being converted into a function call and which optimizer rules ran in between.

Consider using the alternatives listed here.

CREATE TABLE and DROP TABLE no longer return result sets

In earlier versions of Presto, DDL queries would return outputs. With Trino 423, CREATE TABLE and DROP TABLE no longer return result sets.

In case you use pytd library, update to v1.5.2or later to support this result sets handling. Earlier versions will returnTypeError: 'NoneType' object is not iterable` when pytd fetches a result of CREATE TABLE and DROP TABLE.

information_schema.role_authorization_descriptors table has been removed

Becasue the table was only intended to be used in the SQL views that define the actual tables, the information_schema.role_authorization_descriptors table has been removed.

Name of catalog changes

In 423, the catalog is td. However, to ensure backward compatibility, td-presto continues as an alias for td. The same URL works as is in 423.

However, be aware of the alias does not display in the results you get back from information_schema or system.jdbc.

information_schema

350423
Copy
Copied
presto> select table_catalog, table_schema, table_name from information_schema.tables where table_schema = 'sample_datasets' and table_name = 'www_access';
 table_catalog |  table_schema   | table_name 
---------------+-----------------+------------
 td-presto     | sample_datasets | www_access 
(1 row)

presto> select table_catalog, table_schema, table_name, column_name from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access';
 table_catalog |  table_schema   | table_name | column_name 
---------------+-----------------+------------+-------------
 td-presto     | sample_datasets | www_access | user        
 td-presto     | sample_datasets | www_access | host 
 ...
Copy
Copied
trino> select table_catalog, table_schema, table_name from information_schema.tables where table_schema = 'sample_datasets' and table_name = 'www_access';
 table_catalog |  table_schema   | table_name 
---------------+-----------------+------------
 td            | sample_datasets | www_access 
(1 row)

trino> select table_catalog, table_schema, table_name, column_name from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access';
 table_catalog |  table_schema   | table_name | column_name 
---------------+-----------------+------------+-------------
 td            | sample_datasets | www_access | user        
 td            | sample_datasets | www_access | host    
 ...

JDBC

350423
Copy
Copied
select TABLE_SCHEM, TABLE_CATALOG from system.jdbc.schemas where TABLE_CATALOG = 'td-presto'
select TABLE_CAT, TABLE_SCHEM, TABLE_NAME from system.jdbc.tables where TABLE_CAT = 'td-presto'
select TABLE_CAT, TABLE_SCHEM, COLUMN_NAME from system.jdbc.columns where TABLE_CAT = 'td-presto' and TABLE_NAME = 'www_access'
Copy
Copied
select TABLE_SCHEM, TABLE_CATALOG from system.jdbc.schemas where TABLE_CATALOG = 'td'
select TABLE_CAT, TABLE_SCHEM, TABLE_NAME from system.jdbc.tables where TABLE_CAT = 'td'
select TABLE_CAT, TABLE_SCHEM, COLUMN_NAME from system.jdbc.columns where TABLE_CAT = 'td' and TABLE_NAME = 'www_access'

Some extra information eliminated from SHOW COLUMNS and information_schema.columns

In Trino 423, SHOW COLUMNS and information_schema.columns do not show extra information.

350423
Copy
Copied
presto> SHOW COLUMNS FROM sample_datasets.www_access Like '%host%';
 Column |  Type   |    Extra     | Comment 
--------+---------+--------------+---------
 host   | varchar | keyName=host |         
(1 row)

presto> select column_name, data_type, extra_info, comment from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access' and column_name Like '%host%';
 column_name | data_type |  extra_info  | comment 
-------------+-----------+--------------+---------
 host        | varchar   | keyName=host | NULL    
(1 row)
Copy
Copied
trino> SHOW COLUMNS FROM sample_datasets.www_access Like '%host%';
 Column |  Type   | Extra | Comment 
--------+---------+-------+---------
 host   | varchar |       |         
(1 row)

trino> select column_name, data_type, extra_info, comment from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access' and column_name Like '%host%';
 column_name | data_type | extra_info | comment 
-------------+-----------+------------+---------
 host        | varchar   | NULL       | NULL    
(1 row)

To see keys names with Trino 423, use SHOW CREATE TABLE.

350423
Copy
Copied
presto> SHOW CREATE TABLE your_db.language;
                 Create Table                  
-----------------------------------------------
 CREATE TABLE "td-presto".your_db.language ( 
    name varchar,                              
    time bigint                                
 )                                             
(1 row)
Copy
Copied
trino> SHOW CREATE TABLE your_db.language;
                Create Table                 
---------------------------------------------
 CREATE TABLE td.your_db.language (        
    name varchar WITH ( key_name = 'name' ), 
    time bigint WITH ( key_name = 'time' )   
 )                                           
(1 row)

Further Reading