Migrating Hive Queries into Presto

Presto follows the ANSI SQL syntax and semantics, while Hive uses a different SQL syntax dialect. Differences include:

  • Quotation — In Presto, use double quotations for column and table names (e.g., "column1"), single quotations for strings (for example, ‘hello presto’).
  • Implicit type conversion — In Presto, SELECT CAST(5 AS DOUBLE) / 2 gives 2.5, but SELECT 5 / 2 gives 2. In Hive, both return 2.5.

To migrate Hive queries into Presto:

  1. Review the SQL syntax requirements for Presto at Migrating from Hive .
  2. Determine which of your queries might need to have syntax updates.
  3. Make and test the changes to your queries. Or, create copies of the queries you want to migrate to Presto and make changes in the new queries.
  4. On the Query page of the TD Console for your query, make sure that the Type is set to Presto.

image

Supported Time Formats in TD TIME FORMAT UDF

This is a summary of the time formats supported by the TD_TIME_FORMAT_UDF udf.

Formats

Syntax Date or Time Component Presentation Examples
G Era designator Text AD

yyyy Year Year 1996
yy Year Year (2 digits) 96
MMMM Month in year Month long name July
MMM Month in year Month short name Jul
MM, M Month in year Month number 07
ww, w Week in year Number 06
DDD, DD, D Day in year Number 189
dd, d Day in month Number 10
EEEE Day in week Text Tuesday
E, EEE Day in week Text (short form) Tue
a Am/pm marker Text PM
HH, H Hour in day (0-23) Number 00
kk, k Hour in day (1-24) Number 24
KK, K Hour in AM/PM (0-11) Number 00
hh, h Hour in AM/PM (1-12) Number 12
mm, m Minute in hour Number 30
ss, s Second in minute Number 55
SSS, SS, S Millisecond Number 978
wyear Week of the year Number TD_TIME_FORMAT(1582859260, 'wyear'
TD_TIME_FORMAT(1582859260, 'wyear', 'UTC'))
zzzz Time zone Zone long name Pacific Standard Time, or GMT+01:00
z Time zone Zone short name PST, or GMT+01:00
Z Time zone Zone offset -0800
u Day number of week (1-7) Number 1 (for Monday)

Example

This example formats a UNIX timestamp into a date formatted string. You can use the sample_datasets database for example to try it out:

SQL

Copy
Copied
SELECT 
  time,
  TD_TIME_FORMAT(time,
    'yyyy-MM-dd HH:mm:ss z')
FROM
  www_access LIMIT 1
;

Result

Copy
Copied
2012-01-01 00: 00: 00 UTC

SQL DDL Command Differences in Hive and Presto

Data Definition Language (DDL) is a concept in SQL that can be used to create and modify the structure of database objects or schema objects. You can use DDL statements to create, manage, and delete tables, indexes, users, and other schema objects.

As it pertains to data-processing, Presto is optimized for quickly processing smaller volumes of information while Hive is better for querying more expansive databases requiring large amounts of memory. Both Hive and Presto use DDL commands but in slightly different ways.

The following table illustrates DDL commands required for Hive and Presto.

SQL DDL Commands and Their Equivalents

SQL DDL Commands Hive SQL DDL Presto SQL DDL Query Definition
ALTER ALTER ALTER Modifies the structure of an existing table.
CREATE CREATE CREATE Creates a database or objects including tables, views, stored procedures, etc.
DROP DROP DROP Deletes an existing database or an object within a database.
TRUNCATE TRUNCATE DELETE Removes all the records from a table.