Working with DB-API

Intro

pytd implements Python Database API Specification v2.0 with the help of prestodb/presto-python-client.

Setup

To connect to the API declare a connect object. The default engine is Presto. You can manually specify Hive via the default_engine argument.

Copy
Copied
from pytd.dbapi import connect
conn = connect(pytd.Client(database='sample_datasets'))

# or, connect with Hive:
conn = connect(pytd.Client(database='sample_datasets', default_engine='hive'))

Query

The cursor function allows you to flexibly fetch query results from a custom function:

Copy
Copied
def query(sql, connection):
    cur = connection.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    columns = [desc[0] for desc in cur.description]
    return {'data': rows, 'columns': columns}

query('select symbol, count(1) as cnt from nasdaq group by 1 order by 1', conn)

Example

Below is an example of generator-based iterative retrieval, just like pandas.DataFrame.iterrows:

Copy
Copied
def iterrows(sql, connection):
    cur = connection.cursor()
    cur.execute(sql)
    index = 0
    columns = None
    while True:
        row = cur.fetchone()
        if row is None:
            break
        if columns is None:
            columns = [desc[0] for desc in cur.description]
        yield index, dict(zip(columns, row))
        index += 1

for index, row in iterrows('select symbol, count(1) as cnt from nasdaq group by 1 order by 1', conn):
    print(index, row)
# 0 {'cnt': 590, 'symbol': 'AAIT'}
# 1 {'cnt': 82, 'symbol': 'AAL'}
# 2 {'cnt': 9252, 'symbol': 'AAME'}
# 3 {'cnt': 253, 'symbol': 'AAOI'}
# 4 {'cnt': 5980, 'symbol': 'AAON'}
# ...

Further Reading