Enhancement request:
We are incorporating GE into a custom data quality tool. The problem we are experiencing is that some expectations are structural and others check actual data within rows. It would be very helpful if the GE expectations provided a way for us to know programmatically whether to expect just a pass/fail or whether to expect statistics about how many rows passed/failed.
At the moment our workaround is to parse the response for specific words, but that is obviously brittle.
Great question/request!
The main relevant concepts are column_map_expectations and column_aggregate_expectations. Map Expectations apply on a row-by-row basis. Aggregate Expectations apply to a whole column at once. (There are also analogous classes for multicolumn Expectations.)
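To make the distinction concrete on the result side, here is a minimal sketch that inspects a validation result instead of parsing its wording. It assumes the default result format, where row-by-row (map) results carry an "unexpected_count" key and aggregate results carry an "observed_value" key inside "result"; the function name result_kind and the three labels are made up for illustration, and the key names should be checked against the GE version in use.

```python
def result_kind(validation_result: dict) -> str:
    """Guess what kind of detail a single expectation result carries.

    Assumption: with the default result format, map expectations report
    "unexpected_count" and aggregate expectations report "observed_value".
    """
    result = validation_result.get("result") or {}
    if "unexpected_count" in result:
        # Row-level statistics are available (how many rows failed).
        return "row_level"
    if "observed_value" in result:
        # A single column-wide value was compared against the bounds.
        return "aggregate"
    # Nothing beyond the top-level "success" flag.
    return "pass_fail_only"
```

This only looks at key presence, so a result produced with a boolean-only result format would fall through to "pass_fail_only" regardless of the expectation type.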
These concepts are currently implemented as decorators within the Dataset class. We’re in the process of refactoring them to be their own classes. Among other things, this will make them much more inspectable.
In the meantime, I believe you can work around the issue by grepping for the decorator names (MetaDataset.column_map_expectation and MetaDataset.column_aggregate_expectation) in dataset.py:
$ grep -B 1 column_map_expectation dataset.py
(Ignoring a few unrelated grepped matches...)
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
<great_expectations.data_asset.data_asset.DataAsset.expectation>`, not a
``column_map_expectation`` or ``column_aggregate_expectation``.
--
expect_column_values_to_be_unique is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_not_be_null is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_null is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
For PandasDataset columns with dtype of 'object' expect_column_values_to_be_of_type is a
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>` and will
--
'object'). For PandasDataset columns with dtype of 'object' expect_column_values_to_be_of_type is a
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>` and will
--
expect_column_values_to_be_in_set is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_not_be_in_set is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_between is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_increasing is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_decreasing is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_value_lengths_to_be_between is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_between is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_match_regex is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_not_match_regex is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_match_regex_list is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_not_match_regex_list is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_match_strftime_format is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_dateutil_parseable is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_be_json_parseable is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
--
expect_column_values_to_match_json_schema is a \
:func:`column_map_expectation <great_expectations.dataset.dataset.MetaDataset.column_map_expectation>`.
$ grep -A 1 @MetaDataset.column_aggregate_expectation dataset.py
(Ignoring a few unrelated grepped matches...)
@MetaDataset.column_aggregate_expectation
def expect_column_distinct_values_to_be_in_set(
--
@MetaDataset.column_aggregate_expectation
def expect_column_distinct_values_to_equal_set(
--
@MetaDataset.column_aggregate_expectation
def expect_column_distinct_values_to_contain_set(
--
@MetaDataset.column_aggregate_expectation
def expect_column_mean_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_median_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_quantile_values_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_stdev_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_unique_value_count_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_proportion_of_unique_values_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_most_common_value_to_be_in_set(
--
@MetaDataset.column_aggregate_expectation
def expect_column_sum_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_min_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_max_to_be_between(
--
@MetaDataset.column_aggregate_expectation
def expect_column_chisquare_test_p_value_to_be_greater_than(
--
@MetaDataset.column_aggregate_expectation
def expect_column_kl_divergence_to_be_less_than(
--
@MetaDataset.column_aggregate_expectation
def expect_column_pair_cramers_phi_value_to_be_less_than(