API Reference

class heavyai.Connection(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)
change_dashboard_sources(dashboard: TDashboard, remap: dict) -> TDashboard

Change the sources of a dashboard

Parameters:
dashboard: TDashboard

The HeavyDB dashboard object to transform

remap: dict

EXPERIMENTAL: A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a ‘name’ key holding the new table name. This structure can be extended later to support changing column names.

Returns:
dashboard: TDashboard

A HeavyDB dashboard with the sources remapped

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> dash = con.get_dashboard(1)
>>> newdash = con.change_dashboard_sources(dash, source_remap)
create_dashboard(dashboard: TDashboard) -> int

Create a new dashboard

Parameters:
dashboard: TDashboard

The HeavyDB dashboard object to create

Returns:
dashboardid: int

The dashboard id of the new dashboard
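
Examples

A minimal sketch, assuming an existing dashboard with id 1 and that TDashboard exposes a dashboard_name field (the field name is an assumption):

>>> dash = con.get_dashboard(1)
>>> dash.dashboard_name = 'my dashboard (copy)'
>>> newid = con.create_dashboard(dash)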

create_table(table_name, data, preserve_index=False)

Create a table from a pandas.DataFrame

Parameters:
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to create a column in the table for the DataFrame index
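
Examples

A minimal sketch; the table name and DataFrame contents are illustrative:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['x', 'y', 'z']})
>>> con.create_table('foo', df, preserve_index=False)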

deallocate_ipc(df, device_id=0)

Deallocate a DataFrame using CPU shared memory.

Parameters:
df: pandas.DataFrame

The DataFrame to deallocate, as returned by select_ipc

device_id: int

GPU which contains TDataFrame

deallocate_ipc_gpu(df, device_id=0)

Deallocate a DataFrame using GPU memory.

Parameters:
df: cudf.GpuDataFrame

The GPU DataFrame to deallocate, as returned by select_ipc_gpu

device_id: int

GPU which contains TDataFrame
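
Examples

These calls pair with select_ipc and select_ipc_gpu when release_memory=False; a sketch (the table name is illustrative):

>>> gdf = con.select_ipc_gpu("select qty from stocks", release_memory=False)
>>> # ... work with gdf ...
>>> con.deallocate_ipc_gpu(gdf)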

duplicate_dashboard(dashboard_id, new_name=None, source_remap=None)

Duplicate an existing dashboard, returning the new dashboard id.

Parameters:
dashboard_id: int

The id of the dashboard to duplicate

new_name: str

The name for the new dashboard

source_remap: dict

EXPERIMENTAL: A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a ‘name’ key holding the new table name. This structure can be extended later to support changing column names.

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
get_column_details(table_name)

Get the column names and data types associated with a table.

Parameters:
table_name: str
Returns:
details: List[ColumnDetails]

Examples

>>> con.get_column_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ...
]
get_dashboard(dashboard_id)

Return the dashboard object of a specific dashboard

Examples

>>> con.get_dashboard(123)
get_dashboards()

List all the dashboards in the database

Examples

>>> con.get_dashboards()
get_table_details(table_name)

Get the TTableDetails object returned by the get_table_details thrift API call

Parameters:
table_name: str

Name of the table to get details for.

Returns:
details: TTableDetails

Examples

>>> con.get_table_details('test')
TTableDetails(row_desc=[TColumnType(col_name='a',
  col_type=TTypeInfo(type=1, encoding=0, nullable=True, is_array=False,
  precision=0, scale=0, comp_param=0, size=-1, dict_key=None),
  is_reserved_keyword=False, src_name='', is_system=False,
  is_physical=False, col_id=1, default_value=None)],
  fragment_size=32000000, page_size=2097152,
  max_rows=4611686018427387904, view_sql='', shard_count=0,
  key_metainfo='[]', is_temporary=False, partition_detail=0,
  table_type=0, refresh_info=TTableRefreshInfo(update_type=0,
                                               timing_type=0,
                                               start_date_time='',
                                               interval_type=0,
                                               interval_count=0,
                                               last_refresh_time='',
                                               next_refresh_time=''),
  sharded_column_name='')
get_table_details_for_database(table_name, db_name)

Get the TTableDetails object returned by the get_table_details_for_database thrift API call

Parameters:
table_name: str

Name of the table to get details for.

db_name: str

Name of the database in which table resides.

Returns:
details: TTableDetails

Examples

>>> con.get_table_details_for_database('test', 'test_db')
TTableDetails(row_desc=[TColumnType(col_name='a',
  col_type=TTypeInfo(type=1, encoding=0, nullable=True, is_array=False,
  precision=0, scale=0, comp_param=0, size=-1, dict_key=None),
  is_reserved_keyword=False, src_name='', is_system=False,
  is_physical=False, col_id=1, default_value=None)],
  fragment_size=32000000, page_size=2097152,
  max_rows=4611686018427387904, view_sql='', shard_count=0,
  key_metainfo='[]', is_temporary=False, partition_detail=0,
  table_type=0, refresh_info=TTableRefreshInfo(update_type=0,
                                               timing_type=0,
                                               start_date_time='',
                                               interval_type=0,
                                               interval_count=0,
                                               last_refresh_time='',
                                               next_refresh_time=''),
  sharded_column_name='')
get_tables()

List all the tables in the database

Examples

>>> con.get_tables()
['flights_2008_10k', 'stocks']
load_table(table_name, data, method='infer', preserve_index=False, create='infer', column_names=[])

Load data into a table

Parameters:
table_name: str
data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
method: {‘infer’, ‘columnar’, ‘rows’, ‘arrow’}

Method to use for loading the data. Three options are available

  1. pyarrow and Apache Arrow loader

  2. columnar loader

  3. row-wise loader

The Arrow loader is typically the fastest, followed by the columnar loader, and then the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader is used. If pyarrow isn’t available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.

preserve_index: bool, default False

Whether to keep the index when loading a pandas DataFrame

create: {“infer”, True, False}

Whether to issue a CREATE TABLE before inserting the data.

  • infer: check to see if the table already exists, and create a table if it does not

  • True: attempt to create the table, without checking if it exists

  • False: do not attempt to create the table
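
Examples

A minimal usage sketch (the table name is illustrative); with a DataFrame and pyarrow installed, this dispatches to the Arrow loader and creates the table if it does not exist:

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table('foo', df)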

load_table_arrow(table_name, data, preserve_index=False, load_column_names=[])

Load a pandas.DataFrame, pyarrow.Table, or pyarrow.RecordBatch into the database using the Arrow columnar format for interchange

Parameters:
table_name: str
data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

Notes

Use load_table_columnar to load geometry data if heavydb <= 7.0

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
load_table_columnar(table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False, column_names=[])

Load a pandas DataFrame to the database using HeavyDB’s Thrift-based columnar format

Parameters:
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

chunk_size_bytes: integer, default 0

Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk; the DataFrame is sent as a single request.

col_names_from_schema: bool, default False

Read the existing table schema to determine the column names. This will read the schema of an existing table in HeavyDB and match those names to the column names of the dataframe. This is for user convenience when loading from data that is unordered, especially handy when a table has a large number of columns.

Notes

Use pymapd >= 0.11.0 while running with heavydb >= 4.6.0 in order to avoid loading inconsistent values into DATE columns.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
load_table_rowwise(table_name, data, column_names=[])

Load data into a table row-wise

Parameters:
table_name: str
data: Iterable of tuples

Each element of data should be a row to be inserted

Examples

>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table_rowwise('bar', data)
render_vega(vega, compression_level=1)

Render vega data on the database backend, returning the image as a PNG.

Parameters:
vega: dict

The vega specification to render.

compression_level: int

The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
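
Examples

A minimal sketch, assuming vega_spec is a valid vega specification dict referencing a table in the database (constructing the spec itself is out of scope here):

>>> png = con.render_vega(vega_spec, compression_level=1)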

select_ipc(operation, parameters=None, first_n=-1, release_memory=True, transport_method=1)

Execute a SELECT operation using CPU shared memory

Parameters:
operation: str

A SQL select statement

parameters: dict, optional

Parameters to insert for a parametrized query

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc(df) after DataFrame created

Returns:
df: pandas.DataFrame

Notes

This method requires the Python code to be executed on the same machine where HeavyDB is running.
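
Examples

A usage sketch against the stocks table from the earlier examples:

>>> df = con.select_ipc("select symbol, qty from stocks")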

select_ipc_gpu(operation, parameters=None, device_id=0, first_n=-1, release_memory=True)

Execute a SELECT operation using GPU memory.

Parameters:
operation: str

A SQL statement

parameters: dict, optional

Parameters to insert into a parametrized query

device_id: int

GPU to return results to

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc_gpu(df) after DataFrame created

Returns:
gdf: cudf.GpuDataFrame

Notes

This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.

This method requires the Python code to be executed on the same machine where HeavyDB is running.
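
Examples

A sketch, assuming cudf is installed and a GPU-enabled HeavyDB on the same machine (the table name is illustrative):

>>> gdf = con.select_ipc_gpu("select symbol, qty from stocks", device_id=0)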

class heavyai.Cursor(connection)

A database cursor.

property arraysize

The number of rows to fetch at a time with fetchmany. Default 1.

See also

fetchmany
close()

Close this cursor.

property description

Read-only sequence describing columns of the result set. Each column is an instance of Description describing

  • name

  • type_code

  • display_size

  • internal_size

  • precision

  • scale

  • null_ok

We only use name, type_code, and null_ok; the rest are always None.
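
For example, after running a query (a sketch reusing the stocks example below):

>>> c.execute("select symbol, qty from stocks")
>>> [d.name for d in c.description]
['symbol', 'qty']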

execute(operation, parameters=None)

Execute a SQL statement.

Parameters:
operation: str

A SQL query

parameters: dict

Parameters to substitute into operation.

Returns:
self: Cursor

Examples

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

Passing in parameters:

>>> c.execute("select symbol qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
[('RHAT', 100.0), ('IBM', 500.0)]
executemany(operation, parameters)

Execute a SQL statement for many sets of parameters.

Parameters:
operation: str
parameters: list of dict
Returns:
results: list of lists
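
Examples

A sketch, reusing the two-column bar table from the load_table_rowwise example; the :x and :y placeholder names are illustrative:

>>> c = conn.cursor()
>>> c.executemany("insert into bar values (:x, :y)",
...               parameters=[{"x": 4, "y": 'd'}, {"x": 5, "y": 'e'}])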
fetchmany(size=None)

Fetch size rows from the result set. If size is None, arraysize is used.

fetchone()

Fetch a single row from the result set.
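
Examples

A sketch of both fetch methods, reusing the stocks data from the execute example above:

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> c.fetchone()
('RHAT', 100.0)
>>> c.fetchmany(size=2)
[('IBM', 1000.0), ('MSFT', 1000.0)]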

heavyai.connect(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)

Create a new Connection.

Parameters:
uri: str
user: str
password: str
host: str
port: int
dbname: str
protocol: {‘binary’, ‘http’, ‘https’}
sessionid: str
bin_cert_validate: bool, optional, binary encrypted connection only

Whether to continue if there is any certificate error

bin_ca_certs: str, optional, binary encrypted connection only

Path to the CA certificate file

idpurl: str

EXPERIMENTAL Enable SAML authentication by providing the logon page of the SAML Identity Provider.

idpformusernamefield: str

The HTML form ID for the username, defaults to ‘username’.

idpformpasswordfield: str

The HTML form ID for the password, defaults to ‘password’.

idpsslverify: bool

Enable / disable certificate checking, defaults to True.

Returns:
conn: Connection

Examples

You can pass a string uri, all the individual components, or an existing sessionid, in which case user, password, and dbname are not needed.

>>> connect('heavydb://admin:HyperInteractive@localhost:6274/heavyai?'
...         'protocol=binary')
Connection(heavydb://admin:***@localhost:6274/heavyai?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='heavyai')
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=443, idpurl='https://sso.localhost/logon',
...         protocol='https')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')

Exceptions