I am building an expectation suite in Python and have run into a few scenarios where adding another expectation of the same type seems to overwrite an existing one.
Example 1: expect_column_values_to_be_in_set
(different value_sets for same column)
cohorts = ["Cohort1", "Cohort2", "Cohort3"]

# First expectation: values must be in the hard-coded list
validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set=cohorts,
    meta={"profiler_details": {"metric_configuration": {"metric_name": "cohort_in_approved_set"}}},
)

# Second expectation: same column, value_set resolved from a store parameter
validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set={"$PARAMETER": "urn:great_expectations:stores:sql_data_store:lookup_cohort_list"},
    meta={"profiler_details": {"metric_configuration": {"metric_name": "cohort_configured_in_platform"}}},
)
In this example, the second expectation runs but not the first.
Example 2: expect_table_row_count_to_be_between
(different row_conditions)
# One row-count expectation per cohort, differing only in row_condition and bounds
for cohort in cohorts:
    validator.expect_table_row_count_to_be_between(
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser="spark",
        min_value=get_min_expected_rows_for_cohort(cohort),
        max_value=get_max_expected_rows_for_cohort(cohort),
        meta={"profiler_details": {"metric_configuration": {"metric_name": f"{cohort}_row_count"}}},
    )
In this example, the expectation for Cohort3 runs, but not the expectations for Cohort1 or Cohort2.
Note that there doesn’t seem to be a fundamental issue with using the same expectation type multiple times, as demonstrated below.
Example 3: expect_column_values_to_be_in_set
(different row_conditions and value_sets)
for cohort in cohorts:
    # Expectation for inactive rows within this cohort
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser="spark",
        value_set=[False],
        mostly=get_min_expected_inactive_for_cohort(cohort),
        meta={"profiler_details": {"metric_configuration": {"metric_name": f"{cohort}_inactive"}}},
    )
    # Expectation for active rows within this cohort (same column and row_condition)
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser="spark",
        value_set=[True],
        mostly=get_min_expected_active_for_cohort(cohort),
        meta={"profiler_details": {"metric_configuration": {"metric_name": f"{cohort}_active"}}},
    )
Here, the second expectation runs for each cohort, but not the first - so I get 3 of the 6 expectations I was hoping for.
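For anyone trying to reproduce this, a minimal sketch of how the surviving expectations could be tallied is below. It works on a plain dict in the shape of a serialized suite (an "expectations" list whose items carry "expectation_type", "kwargs" and "meta"). The helper names summarize_expectations and metric_names, and the sample_suite data, are hypothetical and only for illustration; the sample is not my real suite output.

```python
from collections import Counter


def summarize_expectations(suite_dict):
    """Count expectations per (expectation_type, column, row_condition)."""
    counts = Counter()
    for exp in suite_dict.get("expectations", []):
        kwargs = exp.get("kwargs", {})
        key = (
            exp.get("expectation_type"),
            kwargs.get("column"),         # None for table-level expectations
            kwargs.get("row_condition"),  # None when no condition is set
        )
        counts[key] += 1
    return counts


def metric_names(suite_dict):
    """List the metric_name stored in each expectation's meta, in suite order."""
    names = []
    for exp in suite_dict.get("expectations", []):
        meta = exp.get("meta", {})
        config = meta.get("profiler_details", {}).get("metric_configuration", {})
        names.append(config.get("metric_name"))
    return names


# Hypothetical sample data shaped like a serialized suite (illustration only).
sample_suite = {
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": "Active", "row_condition": 'Cohort=="Cohort1"', "value_set": [True]},
            "meta": {"profiler_details": {"metric_configuration": {"metric_name": "Cohort1_active"}}},
        },
        {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": "Active", "row_condition": 'Cohort=="Cohort2"', "value_set": [True]},
            "meta": {"profiler_details": {"metric_configuration": {"metric_name": "Cohort2_active"}}},
        },
    ]
}

for key, count in summarize_expectations(sample_suite).items():
    print(key, count)
print(metric_names(sample_suite))
```

With a real validator, I assume the dict could be obtained from something like validator.get_expectation_suite(discard_failed_expectations=False).to_json_dict(), though that may depend on the Great Expectations version in use. Comparing the metric names that come back against the ones I added is how I would expect to see which expectations were replaced.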
Perhaps what I’m trying to achieve is not possible (I know Conditional Expectations are experimental), but I’m hoping someone may be able to point me in the right direction.