I am trying to write some expectations on an Athena table partitioned by date, and I want to pass the date in as a parameter. I do not want to run my expectations over the whole table; I want to run them every day after our ETL jobs add data to it.
I cannot find anything other than this thread, How to add a RuntimeDataConnector to a SimpleSqlAlchemyDatasource configuration, that covers this. Other threads on Slack and here say row filtering is only supported for the Pandas datasource, so it cannot be used for AWS Athena.
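From that thread, my understanding is that the datasource itself needs a RuntimeDataConnector before a query can be passed at runtime. This is roughly what I think my athena_db datasource would have to look like (the connection string and names here are placeholders I made up, not my actual config):

name: athena_db
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: awsathena+rest://@athena.REGION.amazonaws.com/DB_NAME?s3_staging_dir=S3_STAGING_DIR
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name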
Using great-expectations==0.13.31
I ran the following command:
great_expectations --v3-api checkpoint new my_checkpoint
In the generated notebook I have this:
yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: athena_db
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: data_asset_name_redacted
      data_connector_query:
        index: -1
    expectation_suite_name: my_suite_name_redacted
"""
I tried adding runtime_parameters to the batch request:
validations:
  - batch_request:
      datasource_name: athena_db
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: TABLE_NAME
      runtime_parameters:
        query: select * from DB_NAME.TABLE_NAME where date = '2021-05-01'
    expectation_suite_name: SUITE_NAME
I got this exception:
~/miniconda3/envs/great_exp/lib/python3.8/site-packages/great_expectations/data_context/data_context.py in get_batch_list(self, datasource_name, data_connector_name, data_asset_name, batch_request, batch_data, data_connector_query, batch_identifiers, limit, index, custom_filter_function, batch_spec_passthrough, sampling_method, sampling_kwargs, splitter_method, splitter_kwargs, runtime_parameters, query, path, batch_filter_parameters, **kwargs)
1596 if batch_request:
1597 # TODO: Raise a warning if any parameters besides batch_requests are specified
-> 1598 return datasource.get_batch_list_from_batch_request(
1599 batch_request=batch_request
1600 )
~/miniconda3/envs/great_exp/lib/python3.8/site-packages/great_expectations/datasource/new_datasource.py in get_batch_list_from_batch_request(self, batch_request)
157 batch_spec,
158 batch_markers,
--> 159 ) = data_connector.get_batch_data_and_metadata(
160 batch_definition=batch_definition,
161 runtime_parameters=runtime_parameters,
TypeError: get_batch_data_and_metadata() got an unexpected keyword argument 'runtime_parameters'
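For clarity, this is what I am effectively trying to express, written as a RuntimeBatchRequest in Python (the data connector name is my assumption; my datasource does not currently define a RuntimeDataConnector):

from great_expectations.core.batch import RuntimeBatchRequest

# What the checkpoint validation above is trying to say. Assumes the
# datasource registers a RuntimeDataConnector under this name, which
# mine does not yet.
batch_request = RuntimeBatchRequest(
    datasource_name="athena_db",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="TABLE_NAME",
    runtime_parameters={
        "query": "SELECT * FROM DB_NAME.TABLE_NAME WHERE date = '2021-05-01'"
    },
    batch_identifiers={"default_identifier_name": "2021-05-01"},
)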
Questions
- How do I add runtime parameters to this configuration so that I can filter by date? (See the sketch after these questions for what I am aiming at.)
- I tried using database_name.table_name, but it was not recognized. How do I qualify the table with a database name? In Athena we have separate databases for the staging and production environments, so I need to be able to qualify tables with database_name.
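For context, here is the shape of what I am ultimately after: something our ETL can call once a day with the partition date and the target database. This is a hypothetical sketch (run_daily_validation, the connector name, and the checkpoint wiring are my guesses, not working code):

import great_expectations as ge
from great_expectations.core.batch import RuntimeBatchRequest

def run_daily_validation(run_date: str, database: str) -> None:
    """Hypothetical: validate a single date partition after the daily ETL load."""
    context = ge.data_context.DataContext()
    batch_request = RuntimeBatchRequest(
        datasource_name="athena_db",
        data_connector_name="default_runtime_data_connector_name",  # assumed
        data_asset_name="TABLE_NAME",
        runtime_parameters={
            "query": f"SELECT * FROM {database}.TABLE_NAME WHERE date = '{run_date}'"
        },
        batch_identifiers={"default_identifier_name": run_date},
    )
    context.run_checkpoint(
        checkpoint_name="my_checkpoint",
        validations=[
            {"batch_request": batch_request, "expectation_suite_name": "SUITE_NAME"}
        ],
    )

# e.g. after the production ETL finishes:
run_daily_validation("2021-05-01", "DB_NAME")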