Can ColumnPairMapExpectations handle row_conditions?

I am validating some data with a ValueString column and a ValueInt column.

When ValueString is populated, I expect it to correspond to the ValueInt. (In the data, this happens for specific values, e.g. ValueString of ">90" with ValueInt of 90.) When ValueString is null, ValueInt can be whatever it wants.

I have tried to achieve this via an expect_column_pair_values_to_be_in_set expectation with a row_condition, configured as follows:

{
  "column_A": "ValueString",
  "column_B": "ValueInt",
  "condition_parser": "spark",
  "row_condition": "ValueString IS NOT NULL",
  "value_pairs_set": [ [">90", 90], [">60", 60], [">140", 140], [">=90", 90] ]
}

However, the data docs rendering is failing with:

Invalid result values were found when trying to instantiate an ExpectationValidationResult. - Invalid result values are likely caused by inconsistent cache values. - Great Expectations enables caching by default. - Please ensure that caching behavior is consistent between the underlying Dataset (e.g. Spark) and Great Expectations. Result: { "element_count": 14777, "unexpected_count": 50065, "unexpected_percent": 77.21075845902348, "partial_unexpected_list": [ [ null, 75 ], [ null, 160 ], [ null, 82 ], [ null, 53 ], [ null, 69 ], …

My observations:

  1. The error message is coming through consistently, so I think the suggestion of a cause relating to caching is a red herring.
  2. The element_count is smaller than the unexpected_count, which should be impossible.
  3. The element_count corresponds to the number of records passing the row_condition.
  4. The partial_unexpected_list shows examples which should fail the row_condition.
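Taken together, observations 2–4 suggest that element_count is computed on the rows passing the row_condition while the unexpected pairs are collected from the unfiltered data. A toy sketch of that mismatch in plain Python (invented rows, not GE's actual implementation) reproduces the same impossible shape as the reported result:

```python
# Toy rows: (ValueString, ValueInt). None stands in for SQL NULL.
rows = [(">90", 90), (">60", 61), (None, 75), (None, 160), (None, 82)]
value_pairs_set = {(">90", 90), (">60", 60), (">140", 140), (">=90", 90)}

# element_count as if the row_condition "ValueString IS NOT NULL" were applied
filtered = [r for r in rows if r[0] is not None]
element_count = len(filtered)  # 2

# unexpected pairs as if they were collected from the UNFILTERED data
unexpected = [r for r in rows if r not in value_pairs_set]
unexpected_count = len(unexpected)  # 4, including (None, 75) etc.

# unexpected_count > element_count, and the unexpected examples are exactly
# the null-ValueString rows the row_condition should have excluded —
# the same shape as the 14777-vs-50065 result above.
print(element_count, unexpected_count)
```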

Hence my question: Is there something about ColumnPairMapExpectations, or expect_column_pair_values_to_be_in_set specifically, being incompatible with row_conditions?

Reviewing the validations store, the expectation failed, with no result, so the exception is occurring during validation and not just during data docs rendering.

I have discovered a workaround.

Instead of a row_condition, it’s possible to use ignore_row_if. This does not seem to be well documented (the best information I found was on this issue), but the following does what I require:

{
  "column_A": "ValueString",
  "column_B": "ValueInt",
  "condition_parser": "spark",
  "ignore_row_if": "either_value_is_missing",
  "value_pairs_set": [ [">90", 90], [">60", 60], [">140", 140], [">=90", 90] ]
}
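For anyone comparing the two approaches, my understanding of the ignore_row_if="either_value_is_missing" semantics, sketched as a plain-Python emulation (toy rows; not GE's actual implementation):

```python
# Toy rows: (ValueString, ValueInt). None stands in for SQL NULL.
rows = [(">90", 90), (">140", 140), (None, 75), (None, 160), (">60", 61)]
value_pairs_set = {(">90", 90), (">60", 60), (">140", 140), (">=90", 90)}

# ignore_row_if="either_value_is_missing": drop a row if EITHER column is null.
# Slightly broader than my row_condition, which only tested ValueString,
# but equivalent for my data because ValueInt is never null.
kept = [(a, b) for a, b in rows if a is not None and b is not None]

element_count = len(kept)  # 3: the two null-ValueString rows are ignored
unexpected = [pair for pair in kept if pair not in value_pairs_set]
unexpected_count = len(unexpected)  # 1: (">60", 61) is not in the set
print(element_count, unexpected_count)
```

With this framing the counts stay consistent (unexpected_count can never exceed element_count), which matches the behaviour I see with the workaround.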

I would still be interested to know whether this should have been possible with a row_condition, if anyone knows.