How to add a RuntimeDataConnector to a SimpleSqlAlchemyDatasource configuration

wks · May 18, 2021, 11:03pm

If you follow a tutorial on configuring a Datasource connected to a database (MySQL, Snowflake, etc.), it will ask you to build a configuration with a SimpleSqlAlchemyDatasource. Although the SimpleSqlAlchemyDatasource offers a number of convenience methods that simplify configuration, it does not allow additional DataConnectors to be added. If you need to connect to data using a query, or to use additional DataConnectors, the following will be helpful. Below is a configuration that uses the generic Datasource class and is equivalent to a SimpleSqlAlchemyDatasource.

name: my_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE> # or credentials go here
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    name: whole_table

This configuration behaves the same way as a SimpleSqlAlchemyDatasource: you can send a BatchRequest with the name of the table you would like to retrieve as a batch. In the following BatchRequest, the table name is taxi_data.

from great_expectations.core.batch import BatchRequest

batch_request = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="taxi_data",  # this is the name of the table you want to retrieve
)

You can also add additional DataConnectors to this configuration, like the RuntimeDataConnector in the following example:

name: my_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE> # or credentials go here
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    name: whole_table
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
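
If you build the configuration in Python rather than YAML, adding the connector amounts to adding one more key under data_connectors. The following is a minimal sketch of that idea using plain dictionaries; add_runtime_data_connector is a hypothetical helper written for this post, not part of Great Expectations.

```python
import copy


def add_runtime_data_connector(datasource_config, connector_name, batch_identifiers):
    """Return a copy of datasource_config with a RuntimeDataConnector entry added.

    Hypothetical helper for illustration only; it just edits a dictionary.
    """
    updated = copy.deepcopy(datasource_config)  # leave the caller's dict untouched
    connectors = updated.setdefault("data_connectors", {})
    if connector_name in connectors:
        raise ValueError(f"data connector {connector_name!r} already exists")
    connectors[connector_name] = {
        "class_name": "RuntimeDataConnector",
        "batch_identifiers": list(batch_identifiers),
    }
    return updated


# The first configuration from this post, written as a dictionary.
base_config = {
    "name": "my_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>",
    },
    "data_connectors": {
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "name": "whole_table",
        },
    },
}

# The second configuration: the same Datasource plus a RuntimeDataConnector.
full_config = add_runtime_data_connector(
    base_config,
    "default_runtime_data_connector_name",
    ["default_identifier_name"],
)
```

Assuming a V3 API Data Context named context, the resulting dictionary could then be registered with context.add_datasource(**full_config) once the connection string placeholders are filled in.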

With the RuntimeDataConnector, you can retrieve data from the Datasource with a query by using a RuntimeBatchRequest. For a RuntimeBatchRequest, the user must specify the data_asset_name and the batch_identifiers that uniquely identify the data (e.g. a run_id from an Airflow DAG run).

from great_expectations.core.batch import RuntimeBatchRequest

batch_request = RuntimeBatchRequest(
    datasource_name="my_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="default_name",  # this can be anything that identifies this data
    runtime_parameters={"query": "SELECT * FROM taxi_data LIMIT 10"},
    batch_identifiers={"default_identifier_name": "identifier"},
)
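
Note that the key used in batch_identifiers (default_identifier_name) is the same name that was declared under batch_identifiers in the RuntimeDataConnector configuration. As far as I understand, Great Expectations rejects a request whose keys were not declared there. The sketch below only illustrates that rule with plain Python; check_batch_identifiers is a hypothetical function, not the library's own validation code.

```python
def check_batch_identifiers(declared, supplied):
    """Raise ValueError if any supplied key was not declared on the connector.

    Hypothetical illustration of the matching rule, not Great Expectations code.
    """
    unknown = sorted(set(supplied) - set(declared))
    if unknown:
        raise ValueError(
            f"undeclared batch identifiers: {unknown}; declared: {sorted(declared)}"
        )
    return True


# Names listed under batch_identifiers in the connector configuration.
declared_identifiers = ["default_identifier_name"]

# Matches the configuration, so the check passes.
ok = check_batch_identifiers(
    declared_identifiers, {"default_identifier_name": "identifier"}
)

# "run_id" was never declared on the connector, so the check fails.
try:
    check_batch_identifiers(declared_identifiers, {"run_id": "2021-05-18"})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

If you want to identify batches by something like a run_id, add that name to the batch_identifiers list of the connector first.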
