I am validating some data with a `ValueString` column and a `ValueInt` column. When `ValueString` is populated, I expect it to correspond to the `ValueInt`. (In the data, this happens for specific values, e.g. a `ValueString` of `">90"` with a `ValueInt` of `90`.) When `ValueString` is null, `ValueInt` can be whatever it wants.
I have tried to achieve this via an `expect_column_pair_values_to_be_in_set` expectation with a `row_condition`, configured as follows:
```json
{
  "column_A": "ValueString",
  "column_B": "ValueInt",
  "condition_parser": "spark",
  "row_condition": "ValueString IS NOT NULL",
  "value_pairs_set": [[">90", 90], [">60", 60], [">140", 140], [">=90", 90]]
}
```
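In case it helps, here is a minimal, self-contained sketch of the setup. The toy DataFrame, Spark session, and the V2 `SparkDFDataset` wrapper are my stand-ins for the real pipeline, so treat the surrounding names as assumptions:

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the real data: rows with a populated ValueString,
# plus rows where ValueString is null and ValueInt is arbitrary.
df = spark.createDataFrame(
    [(">90", 90), (">60", 60), (None, 75), (None, 160)],
    ["ValueString", "ValueInt"],
)

ds = SparkDFDataset(df)
result = ds.expect_column_pair_values_to_be_in_set(
    column_A="ValueString",
    column_B="ValueInt",
    value_pairs_set=[(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
    row_condition="ValueString IS NOT NULL",
    condition_parser="spark",
)
print(result)
```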
However, the data docs rendering is failing with:
```
Invalid result values were found when trying to instantiate an ExpectationValidationResult.
- Invalid result values are likely caused by inconsistent cache values.
- Great Expectations enables caching by default.
- Please ensure that caching behavior is consistent between the underlying Dataset (e.g. Spark) and Great Expectations.
Result: {
  "element_count": 14777,
  "unexpected_count": 50065,
  "unexpected_percent": 77.21075845902348,
  "partial_unexpected_list": [
    [null, 75],
    [null, 160],
    [null, 82],
    [null, 53],
    [null, 69],
    …
```
My observations:

- The error message comes through consistently, so I think the suggested cause relating to caching is a red herring.
- The `element_count` is smaller than the `unexpected_count`.
- The `element_count` corresponds to the number of records passing the `row_condition`.
- The `partial_unexpected_list` shows examples which should fail the `row_condition` (see the sanity check after this list for what I would expect these numbers to be).
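To back up the last two observations, a plain-Spark sanity check (reusing the toy `df` from the sketch above, so the same caveats apply) shows what `element_count` and `unexpected_count` should look like if the `row_condition` were honored throughout:

```python
# Rows that the row_condition should keep in scope.
in_scope = df.filter("ValueString IS NOT NULL")
print("expected element_count:", in_scope.count())

# The valid pairs, as a small DataFrame for an anti-join.
valid = spark.createDataFrame(
    [(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
    ["ValueString", "ValueInt"],
)

# In-scope rows whose (ValueString, ValueInt) pair is not in the set.
# By construction this can never exceed the in-scope count, yet GE is
# reporting an unexpected_count larger than element_count.
unexpected = in_scope.join(valid, ["ValueString", "ValueInt"], "left_anti")
print("expected unexpected_count:", unexpected.count())
```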
Hence my question: is there something about `ColumnPairMapExpectation`s in general, or `expect_column_pair_values_to_be_in_set` specifically, that is incompatible with `row_condition`s?
Reviewing the validations store, I can see the expectation failed with no result, so the exception is occurring during validation and not just during data docs rendering.
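As a possible interim workaround (my own assumption, not something from the GE docs), the condition could be applied to the DataFrame itself before wrapping it, so that no `row_condition` is involved at all (again reusing the names from the sketch above):

```python
# Filter the data up front, then validate without a row_condition.
filtered_ds = SparkDFDataset(df.filter("ValueString IS NOT NULL"))
workaround_result = filtered_ds.expect_column_pair_values_to_be_in_set(
    column_A="ValueString",
    column_B="ValueInt",
    value_pairs_set=[(">90", 90), (">60", 60), (">140", 140), (">=90", 90)],
)
```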