I am validating some data with a `ValueString` column and a `ValueInt` column. When `ValueString` is populated, I expect it to correspond to the `ValueInt`. (In the data, this happens for specific values, e.g. a `ValueString` of `">90"` with a `ValueInt` of `90`.) When `ValueString` is null, `ValueInt` can be whatever it wants.
I have tried to achieve this via an `expect_column_pair_values_to_be_in_set` expectation with a `row_condition`, configured as follows:
```json
{
  "column_A": "ValueString",
  "column_B": "ValueInt",
  "condition_parser": "spark",
  "row_condition": "ValueString IS NOT NULL",
  "value_pairs_set": [[">90", 90], [">60", 60], [">140", 140], [">=90", 90]]
}
```
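In case it helps, here is a minimal, self-contained sketch of the setup. The toy DataFrame, Spark session, and the V2 `SparkDFDataset` wrapper are my stand-ins for the real pipeline, so treat the surrounding names as assumptions:

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the real data: rows with a populated ValueString,
# plus rows where ValueString is null and ValueInt is arbitrary.
df = spark.createDataFrame(
    [(">90", 90), (">60", 60), (None, 75), (None, 160)],
    ["ValueString", "ValueInt"],
)

ds = SparkDFDataset(df)
result = ds.expect_column_pair_values_to_be_in_set(
    column_A="ValueString",
    column_B="ValueInt",
    value_pairs_set=[(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
    row_condition="ValueString IS NOT NULL",
    condition_parser="spark",
)
print(result)
```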
However, the data docs rendering is failing with:
```
Invalid result values were found when trying to instantiate an ExpectationValidationResult.
- Invalid result values are likely caused by inconsistent cache values.
- Great Expectations enables caching by default.
- Please ensure that caching behavior is consistent between the underlying Dataset (e.g. Spark) and Great Expectations.
Result: {
  "element_count": 14777,
  "unexpected_count": 50065,
  "unexpected_percent": 77.21075845902348,
  "partial_unexpected_list": [
    [null, 75],
    [null, 160],
    [null, 82],
    [null, 53],
    [null, 69],
    …
```
My observations:

- The error message comes through consistently, so I think the suggested cause relating to caching is a red herring.
- The `element_count` is smaller than the `unexpected_count`.
- The `element_count` corresponds to the number of records passing the `row_condition`.
- The `partial_unexpected_list` shows examples which should fail the `row_condition` (see the sanity check after this list for what I would expect these numbers to be).
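To back up the last two observations, a plain-Spark sanity check (reusing the toy `df` from the sketch above, so the same caveats apply) shows what `element_count` and `unexpected_count` should look like if the `row_condition` were honored throughout:

```python
# Rows that the row_condition should keep in scope.
in_scope = df.filter("ValueString IS NOT NULL")
print("expected element_count:", in_scope.count())

# The valid pairs, as a small DataFrame for an anti-join.
valid = spark.createDataFrame(
    [(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
    ["ValueString", "ValueInt"],
)

# In-scope rows whose (ValueString, ValueInt) pair is not in the set.
# By construction this can never exceed the in-scope count, yet GE is
# reporting an unexpected_count larger than element_count.
unexpected = in_scope.join(valid, ["ValueString", "ValueInt"], "left_anti")
print("expected unexpected_count:", unexpected.count())
```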
Hence my question: is there something about `ColumnPairMapExpectation`s in general, or `expect_column_pair_values_to_be_in_set` specifically, that is incompatible with `row_condition`s?
Reviewing the validations store, I can see the expectation failed with no result, so the exception is occurring during validation and not just during data docs rendering.
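As a possible interim workaround (my own assumption, not something from the GE docs), the condition could be applied to the DataFrame itself before wrapping it, so that no `row_condition` is involved at all (again reusing the names from the sketch above):

```python
# Filter the data up front, then validate without a row_condition.
filtered_ds = SparkDFDataset(df.filter("ValueString IS NOT NULL"))
workaround_result = filtered_ds.expect_column_pair_values_to_be_in_set(
    column_A="ValueString",
    column_B="ValueInt",
    value_pairs_set=[(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
)
```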