Add expectations to an expectation suite without obtaining a sample batch

I’d like to be able to programmatically add a few expectations to a batch (or validator?) without immediately executing the query.

My use case is supporting rules generated by domain experts for data that will exist in the future, but doesn’t yet exist. Our workflow involves developing new features, promoting them to our prod environment, and then waiting for data to roll in from usage – we’d like our authors to write expectations during feature development, rather than waiting for the actual data to be available.


By default, when you create an Expectation Suite, you load a batch of sample data and add Expectations to the suite by calling expect_* methods. Each call evaluates the Expectation against the sample batch and then adds it to the suite.
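For context, here is a minimal sketch of that default, interactive workflow using the legacy batch_kwargs API; the suite name, datasource name, file path, and column name are placeholders:

import great_expectations as ge

context = ge.data_context.DataContext()
suite = context.create_expectation_suite(
    "my_suite", overwrite_existing=True  # placeholder suite name
)

# Load a batch of real sample data from an existing datasource
batch_kwargs = {
    "datasource": "my_datasource",  # placeholder: your datasource name
    "path": "data/sample.csv",      # placeholder: path to sample data
}
batch = context.get_batch(batch_kwargs, suite)

# Each expect_* call is evaluated against the sample batch immediately
# and the resulting Expectation is added to the suite
batch.expect_column_values_to_not_be_null("mycolumn")

context.save_expectation_suite(batch.get_expectation_suite(), "my_suite")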

@ryan describes another great use case: you want to add Expectations to a suite based on your domain knowledge, without validating them against a sample (or when no sample exists yet).

Approach 1

Keep using the code that adds Expectations to a suite and validates them as it goes, but “trick” it by providing an empty “dummy” batch of data. Set the interactive_evaluation flag to False to save time and ensure the validation does not complain. The code snippet below shows a complete example.

The advantage of this approach is that when you start typing “batch.expect”, auto-complete in Jupyter or in the IDE of your choice will display all the Expectation types and their documentation.

import great_expectations as ge
import pandas as pd

context = ge.data_context.DataContext()

expectation_suite_name = "my_new_expectation_suite"  # TODO: replace with the name of your suite
suite = context.create_expectation_suite(
    expectation_suite_name, overwrite_existing=True
)

batch_kwargs = {
    "dataset": pd.DataFrame(),  # the simplest possible "dummy" dataset
    "datasource": "my_datasource",  # TODO: replace with the name of your datasource
    "data_asset_name": "some_name",
}
batch = context.get_batch(batch_kwargs, suite)

# Record expect_* calls without evaluating them against the batch
batch.set_config_value("interactive_evaluation", False)

batch.expect_column_max_to_be_between("mycolumn", min_value=1, max_value=100)
# TODO: add more expectations

suite = batch.get_expectation_suite()
context.save_expectation_suite(suite, expectation_suite_name)

Approach 2

Don’t create a batch at all, not even a “dummy” one. Instead, create an Expectation Suite and add Expectations to it by specifying their configurations directly.
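For illustration, a minimal sketch of this approach; the expectation type is real, but the suite and column names are placeholders, and this assumes the ExpectationConfiguration / add_expectation API available in recent releases:

import great_expectations as ge
from great_expectations.core import ExpectationConfiguration

context = ge.data_context.DataContext()

# Create an empty suite -- no batch, no sample data
suite = context.create_expectation_suite(
    "my_new_expectation_suite", overwrite_existing=True
)

# Describe the Expectation as a configuration object instead of
# calling an expect_* method on a batch
config = ExpectationConfiguration(
    expectation_type="expect_column_max_to_be_between",
    kwargs={"column": "mycolumn", "min_value": 1, "max_value": 100},
)
suite.add_expectation(config)

context.save_expectation_suite(suite, "my_new_expectation_suite")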

This notebook contains a complete example:

https://github.com/great-expectations/great_expectations/blob/eugene/example_notebook_create_suite_without_sample_202012/examples/notebooks/create_expectation_suite_without_sample_data.ipynb

This short video walks through the details:
https://www.loom.com/share/4eb133fadf6e427984e531573b661e29

Please tell us in the comments which approach you like more and why.


Thanks for the detailed response! There’s something to like in both of these approaches, for sure. I suspect that for our use case we’ll manually add the parameterized ExpectationConfiguration objects, since we’ll be doing this outside of a Jupyter notebook. But for the notebook case, I would probably go with the first solution to get the nice autocomplete functionality.
