Similar expectations overwriting one another

I am building an expectation suite in Python, and am coming across a few different scenarios where adding another expectation of the same type seems to overwrite an existing one.

Example 1: expect_column_values_to_be_in_set (different value_sets for same column)

cohorts = ["Cohort1","Cohort2","Cohort3"]

validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set=cohorts,
    meta={ "profiler_details": { "metric_configuration": { "metric_name": "cohort_in_approved_set" } } }
)
validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set={ "$PARAMETER": "urn:great_expectations:stores:sql_data_store:lookup_cohort_list" },
    meta={ "profiler_details": { "metric_configuration": { "metric_name": "cohort_configured_in_platform" } } }
)

In this example, the second expectation runs but not the first.

Example 2: expect_table_row_count_to_be_between (different row_conditions)

for cohort in cohorts:
    validator.expect_table_row_count_to_be_between(
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        min_value=get_min_expected_rows_for_cohort(cohort),
        max_value=get_max_expected_rows_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_row_count" } } }
    )

In this example, the expectation for Cohort3 runs, but not the expectations for Cohort1 or Cohort2.

Note that there doesn’t seem to be a fundamental issue with utilising the same expectation type multiple times, as demonstrated below.

Example 3: expect_column_values_to_be_in_set (different row_conditions and value_sets)

for cohort in cohorts:
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        value_set=[ False ],
        mostly=get_min_expected_inactive_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_inactive" } } }
    )
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        value_set=[ True ],
        mostly=get_min_expected_active_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_active" } } }
    )

Here, only the second expectation runs for each cohort, not the first, so I get 3 of the 6 expectations I was hoping for.

Perhaps what I’m trying to achieve is not possible (I know Conditional Expectations are experimental), but I’m hoping someone may be able to point me in the right direction.

In the interests of attempting to answer my own question, I have now come across some relevant documentation, which suggests GE may be cleverer than I want, editing my expectations rather than adding new ones, since I’m creating them in “interactive” mode: Create Expectations interactively with Python | Great Expectations
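To make the “editing rather than adding” behaviour concrete, here is a toy model in plain Python. It is not Great Expectations code: it assumes, purely for illustration, that interactive mode matches expectations on type + column + row_condition and replaces any match. That key reproduces Examples 1 and 3 (in Example 3, six calls leave three survivors), but Example 2 suggests the real matching rules for table-level expectations differ, so treat the key as an approximation.

```python
# Toy model, NOT Great Expectations code. Assumption for illustration only:
# interactive mode "upserts" on a domain key, while a plain list append just adds.

def domain_key(expectation):
    """Hypothetical domain: type + column + row_condition (value_set and meta ignored)."""
    kwargs = expectation["kwargs"]
    return (expectation["expectation_type"], kwargs.get("column"), kwargs.get("row_condition"))


def upsert(suite, expectation):
    """Replace an existing expectation with the same domain key, otherwise add it."""
    for i, existing in enumerate(suite):
        if domain_key(existing) == domain_key(expectation):
            suite[i] = expectation
            return
    suite.append(expectation)


def in_set(column, row_condition, value_set, metric_name):
    """Build a simplified expectation dict (meta flattened for brevity)."""
    return {
        "expectation_type": "expect_column_values_to_be_in_set",
        "kwargs": {"column": column, "row_condition": row_condition, "value_set": value_set},
        "meta": {"metric_name": metric_name},
    }


# Replay the calls from Example 3.
cohorts = ["Cohort1", "Cohort2", "Cohort3"]
calls = []
for cohort in cohorts:
    condition = f'Cohort=="{cohort}"'
    calls.append(in_set("Active", condition, [False], f"{cohort}_inactive"))
    calls.append(in_set("Active", condition, [True], f"{cohort}_active"))

interactive_suite, appended_suite = [], []
for expectation in calls:
    upsert(interactive_suite, expectation)   # second call per cohort replaces the first
    appended_suite.append(expectation)       # every call is kept

print(len(interactive_suite), len(appended_suite))  # 3 6
print([e["meta"]["metric_name"] for e in interactive_suite])
# ['Cohort1_active', 'Cohort2_active', 'Cohort3_active']
```

Only the `_active` expectations survive the upsert, matching what I saw in Example 3, while appending to a plain list keeps all six.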

Since I am coding up my production system rather than interactively interrogating the data, I am going to explore using this alternative method for creating my expectations:

This looks interesting.
Yeah, writing the expectations with Python overwrites the existing expectation if it targets the same column. I have no idea if GX allows having multiple expectations like this if you write them directly to YAML. I’m interested in how this works, so please do update the thread with what you find out!

Besides that, this sounds to me like you might want multiple versions of something, maybe Assets or Expectation Suites. Granted, this is without knowing the specifics, i.e. what data you are validating.

I took a look at this, and apparently it’s possible to add the same Expectation multiple times directly in the YAML/JSON files, and it works just fine!
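Here is a sketch of what adding the same Expectation twice to a suite file looks like, using only the standard library and the two expectations from Example 1. The field names (`expectation_suite_name`, `expectations`, `expectation_type`, `kwargs`, `meta`) follow the 0.x-era suite JSON as I understand it; treat that layout as an assumption and compare it with a suite exported from your own version. The suite name and file location are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumed 0.x-style suite layout; verify against a suite exported by your GX version.
suite = {
    "expectation_suite_name": "cohort_suite",  # hypothetical name
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": "Cohort", "value_set": ["Cohort1", "Cohort2", "Cohort3"]},
            "meta": {"profiler_details": {"metric_configuration": {"metric_name": "cohort_in_approved_set"}}},
        },
        {
            # Same type and column as above; written directly, nothing de-duplicates it.
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {
                "column": "Cohort",
                "value_set": {"$PARAMETER": "urn:great_expectations:stores:sql_data_store:lookup_cohort_list"},
            },
            "meta": {"profiler_details": {"metric_configuration": {"metric_name": "cohort_configured_in_platform"}}},
        },
    ],
    "meta": {},
}

# Write to a temporary directory here; a real project keeps suites in its expectations store.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cohort_suite.json"
    path.write_text(json.dumps(suite, indent=2))
    reloaded = json.loads(path.read_text())

print(len(reloaded["expectations"]))  # 2
```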

Thanks Tovio. Yes, appending expectations to a suite using Python seems to be doing the trick too.


The specific syntax that works for me is:

suite.expectations.append()

Note that suite.add_expectation_configuration() (as currently stated in the documentation) is invalid, and suite.add_expectation() seems to apply the same de-duplication logic which I am trying to avoid.
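One way to use the `suite.expectations.append()` route is to build all the configurations in a loop first and then append each one. The sketch below does that for Example 2. `get_min_expected_rows_for_cohort` and `get_max_expected_rows_for_cohort` are stubbed with hypothetical values, and the GX step appears only as a comment because it assumes the 0.x `ExpectationConfiguration(expectation_type=..., kwargs=..., meta=...)` constructor; check the import path for your version.

```python
# Hypothetical stand-ins for the helpers used in Example 2 (values are made up).
EXPECTED_ROWS = {"Cohort1": (100, 200), "Cohort2": (50, 80), "Cohort3": (10, 30)}


def get_min_expected_rows_for_cohort(cohort):
    return EXPECTED_ROWS[cohort][0]


def get_max_expected_rows_for_cohort(cohort):
    return EXPECTED_ROWS[cohort][1]


def row_count_configs(cohorts):
    """Build one row-count configuration per cohort as plain dicts."""
    configs = []
    for cohort in cohorts:
        configs.append({
            "expectation_type": "expect_table_row_count_to_be_between",
            "kwargs": {
                "row_condition": f'Cohort=="{cohort}"',
                "condition_parser": "spark",
                "min_value": get_min_expected_rows_for_cohort(cohort),
                "max_value": get_max_expected_rows_for_cohort(cohort),
            },
            "meta": {"profiler_details": {"metric_configuration": {"metric_name": f"{cohort}_row_count"}}},
        })
    return configs


configs = row_count_configs(["Cohort1", "Cohort2", "Cohort3"])

# Assumed GX 0.x usage (not executed here):
#   from great_expectations.core import ExpectationConfiguration
#   for cfg in configs:
#       suite.expectations.append(ExpectationConfiguration(**cfg))

print([c["meta"]["profiler_details"]["metric_configuration"]["metric_name"] for c in configs])
# ['Cohort1_row_count', 'Cohort2_row_count', 'Cohort3_row_count']
```

Keeping the dict-building separate from the GX call also makes it easy to unit-test the generated configurations without a data context.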

See here for fuller examples, but swap in suite.expectations.append() at the point of adding the expectation to the suite:
