Expectation on a subset of data

Hi there,

Let’s say I have a table of users, each one has a name, age and country. I would like to define an expectation that checks that the average age for user in US is between 25 and 30.

In the docs I read that I should prepare an aggregated table that has an average age for a country. But if I want to do this for multiple countries, I would have to create a table like this for each country.

Is there a better way to do this? Like an expectation - there should be a row with country = US and avg_age column between 25 and 30? Is there anything on the GE roadmap to help with these kind of expectations?

Thanks

1 Like

This is currently true - you might need to prepare a table per country.

However, the following can be helpful:
We have a relatively recent community contribution that can help: conditional expectations. Conditional expectations allow (among other things) to express an expectation about values in one column conditional on values in another column.To follow the example described in the post, expect the average of “Age” column to be between 25 and 30 for records where “Country”=“US”.

This feature is documented here: https://docs.greatexpectations.io/en/latest/reference/conditional_expectations.html

Conditional expectations are currently implemented only for Pandas. Contributions implementing this feature for Spark and SQLAlchemy are most welcome.

1 Like