How can I use the unexpected_index_list from a validation result to select rows from a PandasDataset?

I'm looking to pass the unexpected index list as a parameter to the next task in my data pipeline in Airflow or Dagster.

Here’s a toy example that fetches the unexpected rows from a PandasDataset.

import great_expectations as ge
import pandas as pd
df = pd.DataFrame({
    "x": [0,1,2,3,4],
    "y": list("abcde")
})
ge_df = ge.from_pandas(df)

expectation_suite = ge.core.ExpectationSuite(**{
  "expectation_suite_name": "default",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {
        "column": "x",
        "min_value": 1,
        "max_value": 3
      },
      "meta": {}
    }
  ],
})
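# result_format="COMPLETE" is what makes unexpected_index_list appear in each result
result = ge_df.validate(expectation_suite, result_format="COMPLETE")
# The suite has a single expectation, so its result is at position 0
unexpected_indexes = result.results[0].result["unexpected_index_list"]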

print(unexpected_indexes)
>> [0, 4]
failed_rows_df = df.ix[unexpected_indexes]
print(failed_rows_df)
>>>
   x  y
0  0  a
4  4  e

This example uses a single expectation. With a little more work, you could aggregate the bad rows across multiple expectations.
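
As a rough sketch of that aggregation step: the helper below merges several unexpected_index_list values into one sorted, de-duplicated list. The function name `union_unexpected_indexes` and the stand-in lists are made up for illustration and are not part of Great Expectations; it also assumes the index labels are mutually sortable (for example, all integers).

```python
def union_unexpected_indexes(index_lists):
    """Merge several unexpected_index_list values into one sorted list without duplicates."""
    merged = set()
    for index_list in index_lists:
        # Some results may carry no index list at all; treat None as empty.
        merged.update(index_list or [])
    return sorted(merged)


# Stand-in data (hypothetical): index lists as they might come back from two expectations.
between_failures = [0, 4]
not_null_failures = [2, 4]

bad_indexes = union_unexpected_indexes([between_failures, not_null_failures])
print(bad_indexes)
```

With a real validation result, the input could probably be built with something like `[r.result.get("unexpected_index_list") for r in result.results]`, since not every expectation type necessarily reports an index list. The merged list can then be used to select rows from the original DataFrame in the same way as the single-expectation case above.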

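This was a great solution. It worked for me, with the exception of the second-to-last line.

df.ix was deprecated and has since been removed from pandas (as of 1.0), so you would want to replace it with df.iloc[unexpected_indexes]. Note that iloc selects by position, which matches here only because the DataFrame uses the default integer index. If your DataFrame has a custom index, df.loc[unexpected_indexes] is likely the safer choice, since the list appears to hold index labels.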