API Reference
- class heavyai.Connection(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)
- change_dashboard_sources(dashboard: TDashboard, remap: dict) → TDashboard
Change the sources of a dashboard.
- Parameters:
- dashboard: TDashboard
The HeavyDB dashboard object to transform
- remap: dict
EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a ‘name’ key holding the new table value. This structure can be used later to support changing column names.
- Returns:
- dashboard: TDashboard
A HeavyDB dashboard with the sources remapped
See also
duplicate_dashboard
Examples
>>> source_remap = {'oldtablename1': {'name': 'newtablename1'},
...                 'oldtablename2': {'name': 'newtablename2'}}
>>> dash = con.get_dashboard(1)
>>> newdash = con.change_dashboard_sources(dash, source_remap)
- create_dashboard(dashboard: TDashboard) → int
Create a new dashboard.
- Parameters:
- dashboard: TDashboard
The HeavyDB dashboard object to create
- Returns:
- dashboardid: int
The dashboard id of the new dashboard
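Examples
A minimal sketch; it assumes a dashboard with id 1 exists to serve as a template and that TDashboard exposes a dashboard_name field to rename it:
>>> dash = con.get_dashboard(1)  # existing dashboard used as a template
>>> dash.dashboard_name = 'my dashboard copy'
>>> dashboard_id = con.create_dashboard(dash)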
- create_table(table_name, data, preserve_index=False)
Create a table from a pandas.DataFrame.
- Parameters:
- table_name: str
- data: DataFrame
- preserve_index: bool, default False
Whether to create a column in the table for the DataFrame index
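Examples
A minimal sketch, assuming pandas is imported as pd and that the table 'foo' does not already exist:
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['x', 'y', 'z']})
>>> con.create_table('foo', df)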
- deallocate_ipc(df, device_id=0)
Deallocate a DataFrame using CPU shared memory.
- Parameters:
- df: pandas.DataFrame
The DataFrame to deallocate
- device_id: int
GPU which contains TDataFrame
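Examples
A sketch assuming a stocks table exists; passing release_memory=False to select_ipc keeps the shared memory alive so it can be released explicitly here:
>>> df = con.select_ipc("select qty from stocks", release_memory=False)
>>> con.deallocate_ipc(df)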
- deallocate_ipc_gpu(df, device_id=0)
Deallocate a DataFrame using GPU memory.
- Parameters:
- df: cudf.GpuDataFrame
The DataFrame to deallocate
- device_id: int
GPU which contains TDataFrame
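Examples
A sketch assuming cudf is installed and a stocks table exists; release_memory=False defers the release to the explicit call:
>>> gdf = con.select_ipc_gpu("select qty from stocks", release_memory=False)
>>> con.deallocate_ipc_gpu(gdf)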
- duplicate_dashboard(dashboard_id, new_name=None, source_remap=None)
Duplicate an existing dashboard, returning the new dashboard id.
- Parameters:
- dashboard_id: int
The id of the dashboard to duplicate
- new_name: str
The name for the new dashboard
- source_remap: dict
EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a ‘name’ key holding the new table value. This structure can be used later to support changing column names.
Examples
>>> source_remap = {'oldtablename1': {'name': 'newtablename1'},
...                 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
- get_column_details(table_name)
Get the column names and data types associated with a table.
- Parameters:
- table_name: str
- Returns:
- details: List[ColumnDetails]
Examples
>>> con.get_column_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0, scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0, scale=0, comp_param=32, encoding='DICT'),
 ...]
- get_dashboard(dashboard_id)
Return the dashboard object of a specific dashboard.
Examples
>>> con.get_dashboard(123)
- get_dashboards()
List all the dashboards in the database.
Examples
>>> con.get_dashboards()
- get_table_details(table_name)
Get the TTableDetails object returned by the get_table_details Thrift API call.
- Parameters:
- table_name: str
Name of the table to get details for.
- Returns:
- details: TTableDetails
Examples
>>> con.get_table_details('test')
TTableDetails(row_desc=[TColumnType(col_name='a', col_type=TTypeInfo(type=1, encoding=0, nullable=True, is_array=False, precision=0, scale=0, comp_param=0, size=-1, dict_key=None), is_reserved_keyword=False, src_name='', is_system=False, is_physical=False, col_id=1, default_value=None)], fragment_size=32000000, page_size=2097152, max_rows=4611686018427387904, view_sql='', shard_count=0, key_metainfo='[]', is_temporary=False, partition_detail=0, table_type=0, refresh_info=TTableRefreshInfo(update_type=0, timing_type=0, start_date_time='', interval_type=0, interval_count=0, last_refresh_time='', next_refresh_time=''), sharded_column_name='')
- get_table_details_for_database(table_name, db_name)
Get the TTableDetails object returned by the get_table_details_for_database Thrift API call.
- Parameters:
- table_name: str
Name of the table to get details for.
- db_name: str
Name of the database in which table resides.
- Returns:
- details: TTableDetails
Examples
>>> con.get_table_details_for_database('test', 'test_db')
TTableDetails(row_desc=[TColumnType(col_name='a', col_type=TTypeInfo(type=1, encoding=0, nullable=True, is_array=False, precision=0, scale=0, comp_param=0, size=-1, dict_key=None), is_reserved_keyword=False, src_name='', is_system=False, is_physical=False, col_id=1, default_value=None)], fragment_size=32000000, page_size=2097152, max_rows=4611686018427387904, view_sql='', shard_count=0, key_metainfo='[]', is_temporary=False, partition_detail=0, table_type=0, refresh_info=TTableRefreshInfo(update_type=0, timing_type=0, start_date_time='', interval_type=0, interval_count=0, last_refresh_time='', next_refresh_time=''), sharded_column_name='')
- get_tables()
List all the tables in the database.
Examples
>>> con.get_tables()
['flights_2008_10k', 'stocks']
- load_table(table_name, data, method='infer', preserve_index=False, create='infer', column_names=[])
Load data into a table.
- Parameters:
- table_name: str
- data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
- method: {‘infer’, ‘columnar’, ‘rows’, ‘arrow’}
Method to use for loading the data. Three options are available:
1. pyarrow and Apache Arrow loader
2. columnar loader
3. row-wise loader
The Arrow loader is typically the fastest, followed by the columnar loader, followed by the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader will be used. If Arrow isn’t available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.
- preserve_index: bool, default False
Whether to keep the index when loading a pandas DataFrame
- create: {“infer”, True, False}
Whether to issue a CREATE TABLE before inserting the data.
infer: check to see if the table already exists, and create a table if it does not
True: attempt to create the table, without checking if it exists
False: do not attempt to create the table
See also
load_table_arrow, load_table_columnar, load_table_rowwise
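Examples
A sketch assuming pandas is imported as pd; with the defaults, both the loader and the CREATE TABLE behaviour are inferred:
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table('foo', df, preserve_index=False)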
- load_table_arrow(table_name, data, preserve_index=False, load_column_names=[])
Load a pandas.DataFrame or a pyarrow Table or RecordBatch to the database using Arrow columnar format for interchange.
- Parameters:
- table_name: str
- data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
- preserve_index: bool, default False
Whether to include the index of a pandas DataFrame when writing.
Notes
Use load_table_columnar to load geometry data if heavydb <= 7.0.
Examples
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
- load_table_columnar(table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False, column_names=[])
Load a pandas DataFrame to the database using HeavyDB’s Thrift-based columnar format.
- Parameters:
- table_name: str
- data: DataFrame
- preserve_index: bool, default False
Whether to include the index of a pandas DataFrame when writing.
- chunk_size_bytes: integer, default 0
Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk and send the dataframe as a single request
- col_names_from_schema: bool, default False
Read the existing table schema to determine the column names. This will read the schema of an existing table in HeavyDB and match those names to the column names of the dataframe. This is for user convenience when loading from data that is unordered, especially handy when a table has a large number of columns.
See also
load_table
Notes
Use pymapd >= 0.11.0 while running with heavydb >= 4.6.0 in order to avoid loading inconsistent values into a DATE column.
Examples
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
- load_table_rowwise(table_name, data, column_names=[])
Load data into a table row-wise.
- Parameters:
- table_name: str
- data: Iterable of tuples
Each element of data should be a row to be inserted
Examples
>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table_rowwise('bar', data)
- render_vega(vega, compression_level=1)
Render Vega data on the database backend, returning the image as a PNG.
- Parameters:
- vega: dict
The vega specification to render.
- compression_level: int
The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
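Examples
A sketch; vega_spec stands in for a complete Vega specification dict and is not a working example on its own:
>>> png = con.render_vega(vega_spec, compression_level=3)  # vega_spec: full Vega dict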
- select_ipc(operation, parameters=None, first_n=-1, release_memory=True, transport_method=1)
Execute a SELECT operation using CPU shared memory.
- Parameters:
- operation: str
A SQL select statement
- parameters: dict, optional
Parameters to insert for a parametrized query
- first_n: int, optional
Number of records to return
- release_memory: bool, optional
Call self.deallocate_ipc(df) after the DataFrame is created
- Returns:
- df: pandas.DataFrame
Notes
This method requires the Python code to be executed on the same machine where HeavyDB is running.
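Examples
A sketch assuming a stocks table and, per the note above, that this code runs on the HeavyDB host:
>>> df = con.select_ipc("select symbol, qty from stocks")
>>> df.head()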
- select_ipc_gpu(operation, parameters=None, device_id=0, first_n=-1, release_memory=True)
Execute a SELECT operation using GPU memory.
- Parameters:
- operation: str
A SQL statement
- parameters: dict, optional
Parameters to insert into a parametrized query
- device_id: int
GPU to return results to
- first_n: int, optional
Number of records to return
- release_memory: bool, optional
Call self.deallocate_ipc_gpu(df) after the DataFrame is created
- Returns:
- gdf: cudf.GpuDataFrame
Notes
This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.
This method requires the Python code to be executed on the same machine where HeavyDB is running.
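Examples
A sketch assuming cudf and libcudf are installed and a stocks table exists:
>>> gdf = con.select_ipc_gpu("select symbol, qty from stocks", device_id=0)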
- class heavyai.Cursor(connection)
A database cursor.
- property arraysize
The number of rows to fetch at a time with fetchmany. Default 1.
See also
fetchmany
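Examples
A sketch showing that arraysize is an ordinary read/write property:
>>> c = conn.cursor()
>>> c.arraysize
1
>>> c.arraysize = 10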
- close()
Close this cursor.
- property description
Read-only sequence describing columns of the result set. Each column is an instance of Description describing:
- name
- type_code
- display_size
- internal_size
- precision
- scale
- null_ok
We only use name, type_code, and null_ok; the rest are always None.
- execute(operation, parameters=None)
Execute a SQL statement.
- Parameters:
- operation: str
A SQL query
- parameters: dict
Parameters to substitute into operation.
- Returns:
- self: Cursor
Examples
>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]
Passing in parameters:
>>> c.execute("select symbol, qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
[('RHAT', 100.0), ('IBM', 500.0)]
- executemany(operation, parameters)
Execute a SQL statement for many sets of parameters.
- Parameters:
- operation: str
- parameters: list of dict
- Returns:
- results: list of lists
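Examples
A sketch; the stocks table and the parameter values are assumptions:
>>> c = conn.cursor()
>>> c.executemany("select symbol from stocks where qty = :qty",
...               parameters=[{"qty": 100}, {"qty": 500}])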
- fetchmany(size=None)
Fetch size rows from the results set.
- fetchone()
Fetch a single row from the results set.
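Examples
A sketch continuing the stocks data from the execute example above; the rows shown are taken from that example:
>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> c.fetchone()
('RHAT', 100.0)
>>> c.fetchmany(size=2)
[('IBM', 1000.0), ('MSFT', 1000.0)]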
- heavyai.connect(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)
Create a new Connection.
- Parameters:
- uri: str
- user: str
- password: str
- host: str
- port: int
- dbname: str
- protocol: {‘binary’, ‘http’, ‘https’}
- sessionid: str
- bin_cert_validate: bool, optional, binary encrypted connection only
Whether to continue if there is any certificate error
- bin_ca_certs: str, optional, binary encrypted connection only
Path to the CA certificate file
- idpurl: str
EXPERIMENTAL Enable SAML authentication by providing the logon page of the SAML Identity Provider.
- idpformusernamefield: str
The HTML form ID for the username, defaults to ‘username’.
- idpformpasswordfield: str
The HTML form ID for the password, defaults to ‘password’.
- idpsslverify: bool
Enable / disable certificate checking, defaults to True.
- Returns:
- conn: Connection
Examples
You can either pass a string uri, all the individual components, or an existing sessionid (excluding user, password, and database):
>>> connect('heavydb://admin:HyperInteractive@localhost:6274/heavyai?'
...         'protocol=binary')
Connection(mapd://heavydb:***@localhost:6274/heavyai?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='heavyai')
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=443, idpurl='https://sso.localhost/logon',
...         protocol='https')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')