Similar expectations overwriting one another

mike_f50 · April 9, 2024, 2:23pm

I am building an expectation suite in python, and am coming across a few different scenarios where adding another expectation of the same type seems to overwrite an existing one.

Example 1: expect_column_values_to_be_in_set (different value_sets for same column)

cohorts = ["Cohort1","Cohort2","Cohort3"]

validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set=cohorts,
    meta={ "profiler_details": { "metric_configuration": { "metric_name": "cohort_in_approved_set" } } }
)
validator.expect_column_values_to_be_in_set(
    "Cohort",
    value_set={ "$PARAMETER": "urn:great_expectations:stores:sql_data_store:lookup_cohort_list" },
    meta={ "profiler_details": { "metric_configuration": { "metric_name": "cohort_configured_in_platform" } } }
)

In this example, the second expectation runs but not the first.

Example 2: expect_table_row_count_to_be_between (different row_conditions)

for cohort in cohorts:
    validator.expect_table_row_count_to_be_between(
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        min_value=get_min_expected_rows_for_cohort(cohort),
        max_value=get_max_expected_rows_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_row_count" } } }
    )

In this example, the expectation for Cohort3 runs, but not the expectations for Cohort1 or Cohort2.

Note that there doesn’t seem to be a fundamental issue with utilising the same expectation type multiple times, as demonstrated below.

Example 3: expect_column_values_to_be_in_set (different row_conditions and value_sets)

for cohort in cohorts:
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        value_set=[ False ],
        mostly=get_min_expected_inactive_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_inactive" } } }
    )
    validator.expect_column_values_to_be_in_set(
        "Active",
        row_condition=f"Cohort==\"{cohort}\"",
        condition_parser='spark',
        value_set=[ True ],
        mostly=get_min_expected_active_for_cohort(cohort),
        meta={ "profiler_details": { "metric_configuration": { "metric_name": f"{cohort}_active" } } }
    )

Here, the second expectation runs for each cohort, but not the first - so I get 3 of the 6 expectations I was hoping for.

Perhaps what I’m trying to achieve is not possible (I know Conditional Expectations are experimental), but I’m hoping someone may be able to point me in the right direction.

mike_f50 · April 9, 2024, 2:57pm

In the interests of attempting to answer my own question, I have now come across some relevant documentation, which suggests GE may be being cleverer than I want and editing my expectations rather than adding additional ones, since I’m creating them in the “interactive” mode: Create Expectations interactively with Python | Great Expectations

Since I am coding up my production system rather than interactively interrogating the data, I am going to explore using this alternative method for creating my expectations:

ToivoMattila · April 9, 2024, 3:31pm

This looks interesting.
Yeah, writing the expectations with Python overwrites the existing expectation if it targets the same column. I have no idea if GX allows having multiple expectations like this if you write them directly to YAML. I’m interested in how this works, please do update the thread with what you find out!

Besides that, to me, this sounds like you might want to have multiple versions of something, maybe Assets or Expectation Suites. Granted, this is without knowing the specifics, i.e. what data you are validating.

ToivoMattila · April 9, 2024, 4:02pm

Took a look at this and apparently it’s possible to just directly add the same Expectation multiple times into the YAML/JSON files and it works just fine!
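
For illustration, here is a minimal sketch of what such a file could contain, built with Python's json module. The layout (expectation_suite_name, expectations, expectation_type, kwargs, meta) assumes a 0.18-style suite file and the suite name is hypothetical, so check a suite file written by your own GX version before copying the key names.

```python
import json

# Assumed 0.18-style suite layout; key names may differ in other GX versions.
suite = {
    "expectation_suite_name": "cohort_suite",  # hypothetical name
    "expectations": [
        {
            # First expectation on the "Cohort" column: a literal value set.
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {
                "column": "Cohort",
                "value_set": ["Cohort1", "Cohort2", "Cohort3"],
            },
            "meta": {"profiler_details": {"metric_configuration": {
                "metric_name": "cohort_in_approved_set"}}},
        },
        {
            # Second expectation of the same type on the same column:
            # it is a separate list entry, so nothing is overwritten.
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {
                "column": "Cohort",
                "value_set": {
                    "$PARAMETER": "urn:great_expectations:stores:sql_data_store:lookup_cohort_list"
                },
            },
            "meta": {"profiler_details": {"metric_configuration": {
                "metric_name": "cohort_configured_in_platform"}}},
        },
    ],
}

# Serialise to the text that would be saved as the suite's JSON file.
suite_json = json.dumps(suite, indent=2)
print(suite_json)
```

Because the file is just a list of entries, both expectations are kept side by side, which matches Example 1 from the original question.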

mike_f50 · April 9, 2024, 4:17pm

Thanks Toivo. Yes, appending expectations to a suite using Python seems to be doing the trick too.

mike_f50 · April 11, 2024, 10:28am

The specific syntax that works for me is:

suite.expectations.append()

Note that suite.add_expectation_configuration() (as currently stated in the documentation) is invalid, and suite.add_expectation() seems to apply the same de-duplication logic which I am trying to avoid.
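
To make the difference concrete, here is a toy model in plain Python. It is not Great Expectations code, and the matching rule (expectation type plus column plus row_condition) is only a guess that happens to fit Example 3. It does not reproduce Example 2, where different row_conditions were still collapsed, so the real rule must differ at least for that expectation type. The function names domain_key, add_with_dedup and build_configs are hypothetical.

```python
# Toy model only: this is NOT Great Expectations code. It imitates the
# behaviour described in this thread to show why an "add" that de-duplicates
# can replace entries while a plain list append keeps them all.


def domain_key(config):
    """Hypothetical match key: expectation type, column and row_condition."""
    kwargs = config["kwargs"]
    return (
        config["expectation_type"],
        kwargs.get("column"),
        kwargs.get("row_condition"),
    )


def add_with_dedup(expectations, config):
    """Replace an entry with the same key, otherwise append (assumed rule)."""
    for i, existing in enumerate(expectations):
        if domain_key(existing) == domain_key(config):
            expectations[i] = config
            return
    expectations.append(config)


def build_configs(cohorts):
    """Two in-set checks per cohort, mirroring Example 3."""
    configs = []
    for cohort in cohorts:
        for flag in (False, True):
            configs.append({
                "expectation_type": "expect_column_values_to_be_in_set",
                "kwargs": {
                    "column": "Active",
                    "row_condition": f'Cohort=="{cohort}"',
                    "condition_parser": "spark",
                    "value_set": [flag],
                },
            })
    return configs


cohorts = ["Cohort1", "Cohort2", "Cohort3"]

# De-duplicating add: the True check replaces the False check per cohort.
deduplicated = []
for config in build_configs(cohorts):
    add_with_dedup(deduplicated, config)

# Plain append: every configuration is kept.
appended = []
for config in build_configs(cohorts):
    appended.append(config)

print(len(deduplicated), len(appended))  # prints "3 6"
```

Under this assumed rule the de-duplicating add leaves 3 expectations (the second of each pair), while appending keeps all 6, which is the gap described in Example 3.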

See here for fuller examples, but swap in suite.expectations.append() at the point of adding the expectation to the suite:

chandru.kdj · June 9, 2025, 5:17am

I tried swapping in suite.expectations.append() today in version 1.4.6. It doesn’t seem to work. The expectation suite validation result says zero expectations were added.
