How can I use result_format's unexpected_index_list to select rows from a PandasDataset?

I'm looking to pass the unexpected index list as a parameter to the next task in my data pipeline (Airflow or Dagster).

Here's a toy example that fetches the unexpected index list from a PandasDataset and uses it to select the failing rows.

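import great_expectations as ge
import pandas as pd

# Toy data: x holds 0-4, y holds the letters a-e
df = pd.DataFrame({
    "x": [0, 1, 2, 3, 4],
    "y": list("abcde"),
})
ge_df = ge.from_pandas(df)

# A suite with a single expectation: every value of x must lie in [1, 3]
expectation_suite = ge.core.ExpectationSuite(**{
    "expectation_suite_name": "default",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {
                "column": "x",
                "min_value": 1,
                "max_value": 3,
            },
            "meta": {},
        }
    ],
})

# result_format="COMPLETE" makes each result include unexpected_index_list
result = ge_df.validate(expectation_suite, result_format="COMPLETE")
unexpected_indexes = result.results[0].result["unexpected_index_list"]
print(unexpected_indexes)
# [0, 4]

# Select the rows that failed the expectation
failed_rows_df = df.ix[unexpected_indexes]
print(failed_rows_df)
#    x  y
# 0  0  a
# 4  4  e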

This example uses a single expectation. With a little more work, you could aggregate the bad rows across multiple expectations.
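One way to do that aggregation is to take the union of unexpected_index_list over every expectation result. The sketch below assumes each input item is the .result dict of one entry in result.results (for example built with [r.result for r in result.results]) and that the index labels are sortable, such as the default integer index; collect_unexpected_indexes is a hypothetical helper, not part of Great Expectations.

```python
from typing import Any, Dict, List, Optional


def collect_unexpected_indexes(
    expectation_results: List[Optional[Dict[str, Any]]],
) -> List[Any]:
    """Union the unexpected_index_list of every expectation result (hypothetical helper)."""
    bad = set()
    for res in expectation_results:
        # Column-map expectations run with result_format="COMPLETE" carry this key;
        # other results (e.g. table-level checks) may not, so treat them as empty.
        bad.update((res or {}).get("unexpected_index_list", []))
    # Sorting assumes comparable index labels, e.g. a default integer index
    return sorted(bad)


results = [
    {"unexpected_index_list": [0, 4]},
    {"unexpected_index_list": [2, 4]},
    {"observed_value": 5},
]
print(collect_unexpected_indexes(results))  # [0, 2, 4]
```

The combined list can then be used like the single-expectation one, e.g. df.loc[collect_unexpected_indexes([r.result for r in result.results])].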


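This was a great solution. It worked for me, except for the df.ix line.

df.ix has been deprecated (and was removed in pandas 1.0), so replace it with df.iloc[unexpected_indexes]. Because unexpected_index_list holds index labels, df.loc[unexpected_indexes] is the more general choice; with the default integer index, both select the same rows.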
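For the original goal of handing the list to a downstream Airflow or Dagster task, it may help to turn it into plain JSON first, since task-to-task passing (such as Airflow XComs) commonly expects JSON-serializable values. A minimal sketch, assuming an integer index and a downstream task that receives a JSON string; to_json_payload and from_json_payload are hypothetical helpers, not part of either orchestrator or Great Expectations.

```python
import json
from typing import Iterable, List


def to_json_payload(unexpected_indexes: Iterable) -> str:
    """Serialize index labels to a JSON string (hypothetical helper)."""
    # int() converts NumPy integer scalars to plain Python ints so json.dumps
    # can encode them; this assumes an integer index.
    return json.dumps([int(i) for i in unexpected_indexes])


def from_json_payload(payload: str) -> List[int]:
    """Parse the JSON string back into a list of row labels (hypothetical helper)."""
    return json.loads(payload)


payload = to_json_payload([0, 4])
print(payload)  # [0, 4]
```

In the downstream task, df.loc[from_json_payload(payload)] would then select the failing rows, assuming that task can load the same DataFrame.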