SQL Tips for Hive and Presto

For tips specifically concerning Presto performance, refer to Presto Performance Tuning.

Use of PIVOT and UNPIVOT

You can use the PIVOT and UNPIVOT operators in standard SQL, Hive, and Presto.

The PIVOT operator transforms rows into columns. The UNPIVOT operator transforms columns into rows.

Example Tables

Example of vertical table (vtable)

uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23

Example of horizontal table (htable)

uid c1 c2 c3
101 11 12 13
102 21 22 23

Standard SQL

The following examples show how you can use an SQL syntax query language.

PIVOT

Copy
Copied
SELECT 
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
    SELECT 
      uid,
      to_map(
        KEY,
        VALUE
      ) kv
    FROM
      vtable
    GROUP BY
      uid
  ) t
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

Copy
Copied
SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable
uid key value
101 c1 11
102 c1 21
101 c2 12
102 c2 22
101 c3 13
102 c3 23

Hive

The following examples show how you can use Hive.

PIVOT

Hive on Treasure Data supports to_map UDAF, which can generate Map type, and then transforms rows into columns. The general Hive function doesn’t offer the same support.

Copy
Copied
SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, to_map(key, value) kv
  FROM vtable
  GROUP BY uid
) t
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

LATERAL VIEW explode function transforms columns into rows.

Copy
Copied
SELECT 
  t1.uid,
  t2.key,
  t2.value
FROM
  htable t1
LATERAL VIEW
  explode(
    MAP(
      'c1',
      c1,
      'c2',
      c2,
      'c3',
      c3
    )
  ) t2 AS KEY,
  VALUE
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23

Presto

The following examples show how you can use Presto.

PIVOT

This SQL transforms rows into columns by map_agg function.

Copy
Copied
SELECT 
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
    SELECT 
      uid,
      map_agg(
        KEY,
        VALUE
      ) kv
    FROM
      vtable
    GROUP BY
      uid
  ) t
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

CROSS JOIN unnest function is similar to LATERAL VIEW explode function. It also transforms columns into rows.

Copy
Copied
SELECT 
  t1.uid,
  t2.key,
  t2.value
FROM
  htable t1 CROSS
JOIN
  unnest(
    array['c1',
    'c2',
    'c3'],
    array[c1,
    c2,
    c3]
  ) t2(
    KEY,
    VALUE
  )
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23

Getting Data from Multiple Databases

From within in Hive and Presto, you can create a single query to obtain data from several databases or analyze data in different databases.

Example Databases and Tables

  • Example of animals table in zoo _ a database
id name sex
1 Lion m
2 Gorilla m
3 Zebra f
4 Giraffe f
  • Example of emp animals in zoo _ b database
animal_id name sex
101 Lion m
102 Bear f
103 Elephant f
104 Gorilla f
105 Tiger m
106 Monkey m
107 Rhinoceros f

Use of Fully Qualified Table Names

If you use the fully qualified name of a table in the format database.table, you can query and process data from multiple databases.

Copy
Copied
SELECT 
  name,
  sex
FROM
  zoo_a.animals
UNION
SELECT 
name,
sex
FROM
zoo_b.animals
name sex
Lion m
Giraffe f
Bear f
Monkey m
Gorilla m
Gorilla f
Elephant f
Rhinoceros f
Tiger m
Zebra f

Treasure Data Native Data Types

Because Treasure Data supports so many different sources and targets of data, Treasure Data uses a specific set of primitive data types native to our platform.

TD processes queries using different processing engines (Presto and Hive). Each engine has its own data type system, and the TD native data types map to types available in the query engine.

When read from and written to the underlying storage layer in TD, which uses MessagePack mpc1 format, these TD types map to MessagePack's formats as well.

Schema relation

Treasure Data Presto Hive
int bigint smallint
int bigint int
long bigint bigint
double double decimal
float double float
double double double
Convert to string or int boolean boolean
string varchar string or varchar
string or Convert to long date string
string or Convert to long timestamp timestamp

Our storage stores data as MessagePack mpc1 format. They support INT as:

positive fixint uint16 int16
negative fixint unit32 int32
uint8 int8

For example, the Hive query result for ResultWorker a float data type becomes a double data type.

TD Primitive Data Types Hive Query Result for ResultWorker Presto Result msgpack ValueType
int int > int Integer
long bigint > long integer
float float > double float
double double > double float
string string > string string