Desperate for examples of full custom expectations

I cannot for the life of me find good examples of custom expectations written for v1.x. I’m only working in pandas, so I’d think it would be easy, but I’m having a really tough time, and now I’m down to the wire.

If it helps, the two things I’m trying to do are:

  1. detect rows outside of n IQRs of the median of a column (a rough sketch of my current workaround follows this list)
  2. for a table that includes two columns, Date and HourEnding, confirm that those columns form a unique key, AND that every row in a reference table appears in the target table as long as its Date falls between the earliest and latest Date in the target table (also sketched below).
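To make it concrete (and in case someone can tell me the "proper" custom-expectation way to do this in 1.x), here is the stopgap sketch I've been working from. It isn't a custom expectation at all: for (1) I compute the median/IQR bounds in plain pandas and hand them to the built-in ExpectColumnValuesToBeBetween, and for (2) I'm hoping the built-in ExpectCompoundColumnsToBeUnique covers the key check, with the reference-table containment done in pandas before validation even starts. The column name "value", n_iqr, the suite name, and the toy DataFrames are all placeholders for my real ones.

import great_expectations as gx
import great_expectations.expectations as gxe
import pandas as pd

context = gx.get_context()
suite = context.suites.add(gx.ExpectationSuite(name="sketch_suite"))

# Toy stand-ins for the real tables, just so the sketch runs end to end.
df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "HourEnding": [1, 2, 1],
    "value": [10.0, 11.0, 250.0],
})
ref_df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02"],
    "HourEnding": [1, 1],
})

# (1) Flag values outside n IQRs of the column median by precomputing the
# bounds in pandas and using the built-in between-values expectation.
n_iqr = 3
median = df["value"].median()
iqr = df["value"].quantile(0.75) - df["value"].quantile(0.25)
suite.add_expectation(
    gxe.ExpectColumnValuesToBeBetween(
        column="value",
        min_value=median - n_iqr * iqr,
        max_value=median + n_iqr * iqr,
    )
)

# (2a) Date + HourEnding should form a unique key.
suite.add_expectation(
    gxe.ExpectCompoundColumnsToBeUnique(column_list=["Date", "HourEnding"])
)

# (2b) Every reference row whose Date falls inside the target table's date
# range must appear in the target table; checked in plain pandas, outside GX.
in_range = ref_df[ref_df["Date"].between(df["Date"].min(), df["Date"].max())]
joined = in_range.merge(df, on=["Date", "HourEnding"], how="left", indicator=True)
missing = joined[joined["_merge"] == "left_only"]
assert missing.empty, f"{len(missing)} reference rows missing from target table"

The suite then gets wired up to a pandas data source and batch definition the same way as in the snippet further down; the obvious downside is that the bounds and the containment check live outside the suite, which is exactly why I'd rather have real custom expectations.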

I’ve spent days trying to parse everything in the repo under great_expectations/great_expectations/expectations at develop · great-expectations/great_expectations · GitHub and under the docs snippets at great_expectations/docs/docusaurus/docs/snippets at develop · great-expectations/great_expectations · GitHub, to no avail. I’ve also tried the v0.18 tutorials, but things seem to have changed a lot since then, and the release notes just don’t explain any of the breaking changes.

If anyone can point me in the right direction, I’d really appreciate it. Alternatively, if someone would be willing to work with me for an hour, I’d happily pay a consulting fee.

Thanks,
Mike

Here is the Spark data source code I wrote; I think it should be pretty straightforward to change it to pandas: simply call add_pandas instead of add_spark (there’s a sketch of that swap right after the snippet).

import great_expectations as gx
import great_expectations.expectations as gxe

# However you normally get your context; gx.get_context() is the usual entry point.
context = gx.get_context()

# Create the expectation suite.
expectation_suite_name = "my_expectation_suite_name"
suite = context.suites.add(
    gx.ExpectationSuite(
        name=expectation_suite_name
    )
)

# Add a simple not-null expectation with some metadata attached.
suite.add_expectation(
    gxe.ExpectColumnValuesToNotBeNull(
        column="id",
        meta={
            "priority": "P1",
            "notes": {
                "format": "markdown",
                "content": "Critical! Missing KEY for Construction Method Category - Hard Delete",
            },
        },
    )
)

# Ask for full results, including the query for unexpected rows.
result_format = {
    "result_format": "COMPLETE",
    "return_unexpected_index_query": True
}

# However you build your Spark dataframe.
df = my_get_df_method(...)

# Register the Spark data source, a dataframe asset, and a whole-dataframe batch definition.
data_source = context.data_sources.add_spark(
    name="my_spark",
    persist=True
)

data_asset_name = "my_data_asset_name"
data_asset = data_source.add_dataframe_asset(name=data_asset_name)

batch_definition = data_asset.add_batch_definition_whole_dataframe(
    name="my_data_asset_name"
)

# Tie the batch definition and the suite together in a validation definition.
validation_definition = gx.ValidationDefinition(
    data=batch_definition,
    suite=suite,
    name=data_asset_name
)

context.validation_definitions.add(validation_definition)

# The dataframe itself is only passed in at run time.
batch_parameters = {"dataframe": df}

validation_result = validation_definition.run(
    batch_parameters=batch_parameters,
    result_format=result_format
)
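For the pandas version I mentioned above, the only lines I’d expect to change are the data source registration; something like the following, assuming the pandas dataframe data source otherwise behaves like the Spark one (I dropped persist, since I believe that option is Spark-specific). The asset, batch definition, validation definition, and run call stay the same.

# pandas instead of Spark: only the data source registration changes.
data_source = context.data_sources.add_pandas(name="my_pandas")

data_asset = data_source.add_dataframe_asset(name=data_asset_name)

batch_definition = data_asset.add_batch_definition_whole_dataframe(
    name="my_data_asset_name"
)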

hi Mike,

Were these pages in our docs not helpful for getting started?

If not, please share where you are facing the challenge.