How to set a pass threshold on an Expectation

I have an expectation: ExpectColumnValuesToBeBetween.
I want it to pass if 60% of rows meet the expectation. How to set that?

In the validation results, I do see stats like
"""
"result": {
    "element_count": 22651362,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0,
    "partial_unexpected_counts": [],
    "unexpected_list": [],
    "unexpected_index_query": ""
},
"""

but parsing the validation results will be an extra effort, especially when I have multiple expectations. Is there a more straightforward way to set the threshold in the expectation config?

You can pass the mostly argument to the Expectation; see the example code below.

import great_expectations as gx

gx.expectations.ExpectColumnValuesToBeBetween(
    column="passenger_count",
    min_value=1,
    max_value=5,
    mostly=0.6,
)

This does exactly what you are looking for. The Expectation passes as long as at least 60% of the values in the passenger_count column are between 1 and 5. If more than 40% of the values are less than 1 or more than 5, the Expectation fails.

The mostly argument is not mentioned in the documentation for the 1.0 version of ExpectColumnValuesToBeBetween, but it is covered in the Keyword Args section of the documentation for the legacy version, expect_column_values_to_be_between.
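
To make the threshold concrete, here is a minimal sketch of the pass/fail rule in plain Python. It is not Great Expectations' own code: the function name passes_with_mostly is made up for illustration, and it assumes that missing (null) values are excluded before the fraction is computed, in line with the unexpected_percent_nonmissing field in the validation results.

```python
def passes_with_mostly(element_count, missing_count, unexpected_count, mostly):
    """Return True when the share of non-missing values that meet the
    expectation is at least `mostly` (hypothetical helper, for illustration)."""
    nonmissing = element_count - missing_count
    if nonmissing == 0:
        # Assumption: with nothing to evaluate, treat the check as passing.
        return True
    expected_fraction = (nonmissing - unexpected_count) / nonmissing
    return expected_fraction >= mostly


# 100 rows, none missing, 40 out of range: exactly 60% are in range, so it passes.
print(passes_with_mostly(100, 0, 40, mostly=0.6))  # True
# 41 out of range leaves only 59% in range, so it fails.
print(passes_with_mostly(100, 0, 41, mostly=0.6))  # False
```

With the numbers from the validation results in the question (unexpected_count of 0), any mostly value between 0 and 1 would pass under this rule.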

Thank you Toivo!
One more quick question: is the “mostly” argument only for certain expectations, or is it supported in ALL expectations?

Just tried “mostly” with ExpectColumnValuesToNotBeNull and it worked. I would assume it’s a general argument supported in different expectations.

This is an interesting question! Hadn’t thought of that, thanks!

Looks like only some Expectations have the mostly parameter.

Here are the Expectations that have the mostly parameter:

ExpectColumnPairValuesAToBeGreaterThanB
ExpectColumnPairValuesToBeEqual
ExpectColumnPairValuesToBeInSet
ExpectColumnValueLengthsToBeBetween
ExpectColumnValueLengthsToEqual
ExpectColumnValueZScoresToBeLessThan
ExpectColumnValuesToBeBetween
ExpectColumnValuesToBeDateutilParseable
ExpectColumnValuesToBeDecreasing
ExpectColumnValuesToBeInSet
ExpectColumnValuesToBeInTypeList
ExpectColumnValuesToBeIncreasing
ExpectColumnValuesToBeJsonParseable
ExpectColumnValuesToBeNull
ExpectColumnValuesToBeOfType
ExpectColumnValuesToBeUnique
ExpectColumnValuesToMatchJsonSchema
ExpectColumnValuesToMatchLikePattern
ExpectColumnValuesToMatchLikePatternList
ExpectColumnValuesToMatchRegex
ExpectColumnValuesToMatchRegexList
ExpectColumnValuesToMatchStrftimeFormat
ExpectColumnValuesToNotBeInSet
ExpectColumnValuesToNotBeNull
ExpectColumnValuesToNotMatchLikePattern
ExpectColumnValuesToNotMatchLikePatternList
ExpectColumnValuesToNotMatchRegex
ExpectColumnValuesToNotMatchRegexList
ExpectCompoundColumnsToBeUnique
ExpectMulticolumnSumToEqual
ExpectMulticolumnValuesToBeUnique
ExpectSelectColumnValuesToBeUniqueWithinRecord

whereas these Expectations do not have the mostly parameter:

ExpectColumnDistinctValuesToBeInSet
ExpectColumnDistinctValuesToContainSet
ExpectColumnDistinctValuesToEqualSet
ExpectColumnKLDivergenceToBeLessThan
ExpectColumnMaxToBeBetween
ExpectColumnMeanToBeBetween
ExpectColumnMedianToBeBetween
ExpectColumnMinToBeBetween
ExpectColumnMostCommonValueToBeInSet
ExpectColumnProportionOfUniqueValuesToBeBetween
ExpectColumnQuantileValuesToBeBetween
ExpectColumnStdevToBeBetween
ExpectColumnSumToBeBetween
ExpectColumnToExist
ExpectColumnUniqueValueCountToBeBetween
ExpectTableColumnCountToBeBetween
ExpectTableColumnCountToEqual
ExpectTableColumnsToMatchOrderedList
ExpectTableColumnsToMatchSet
ExpectTableRowCountToBeBetween
ExpectTableRowCountToEqual
ExpectTableRowCountToEqualOtherTable

For future reference, here’s the code for getting the Expectations with the mostly parameter:

from inspect import signature

import great_expectations.expectations as gxe
from great_expectations.expectations import *

# Print every name exported by the module (all the Expectations and some more),
# then manually copy the Expectation classes into the list below.
print(dir(gxe))

all_expectations = [
    ExpectColumnDistinctValuesToBeInSet,
    ExpectColumnDistinctValuesToContainSet,
    ExpectColumnDistinctValuesToEqualSet,
    ExpectColumnKLDivergenceToBeLessThan,
    ExpectColumnMaxToBeBetween,
    ExpectColumnMeanToBeBetween,
    ExpectColumnMedianToBeBetween,
    ExpectColumnMinToBeBetween,
    ExpectColumnMostCommonValueToBeInSet,
    ExpectColumnPairValuesAToBeGreaterThanB,
    ExpectColumnPairValuesToBeEqual,
    ExpectColumnPairValuesToBeInSet,
    ExpectColumnProportionOfUniqueValuesToBeBetween,
    ExpectColumnQuantileValuesToBeBetween,
    ExpectColumnStdevToBeBetween,
    ExpectColumnSumToBeBetween,
    ExpectColumnToExist,
    ExpectColumnUniqueValueCountToBeBetween,
    ExpectColumnValueLengthsToBeBetween,
    ExpectColumnValueLengthsToEqual,
    ExpectColumnValueZScoresToBeLessThan,
    ExpectColumnValuesToBeBetween,
    ExpectColumnValuesToBeDateutilParseable,
    ExpectColumnValuesToBeDecreasing,
    ExpectColumnValuesToBeInSet,
    ExpectColumnValuesToBeInTypeList,
    ExpectColumnValuesToBeIncreasing,
    ExpectColumnValuesToBeJsonParseable,
    ExpectColumnValuesToBeNull,
    ExpectColumnValuesToBeOfType,
    ExpectColumnValuesToBeUnique,
    ExpectColumnValuesToMatchJsonSchema,
    ExpectColumnValuesToMatchLikePattern,
    ExpectColumnValuesToMatchLikePatternList,
    ExpectColumnValuesToMatchRegex,
    ExpectColumnValuesToMatchRegexList,
    ExpectColumnValuesToMatchStrftimeFormat,
    ExpectColumnValuesToNotBeInSet,
    ExpectColumnValuesToNotBeNull,
    ExpectColumnValuesToNotMatchLikePattern,
    ExpectColumnValuesToNotMatchLikePatternList,
    ExpectColumnValuesToNotMatchRegex,
    ExpectColumnValuesToNotMatchRegexList,
    ExpectCompoundColumnsToBeUnique,
    ExpectMulticolumnSumToEqual,
    ExpectMulticolumnValuesToBeUnique,
    ExpectSelectColumnValuesToBeUniqueWithinRecord,
    ExpectTableColumnCountToBeBetween,
    ExpectTableColumnCountToEqual,
    ExpectTableColumnsToMatchOrderedList,
    ExpectTableColumnsToMatchSet,
    ExpectTableRowCountToBeBetween,
    ExpectTableRowCountToEqual,
    ExpectTableRowCountToEqualOtherTable,
]
# An Expectation supports `mostly` if the parameter appears in its constructor signature.
expectations_with_mostly_parameter = [
    expectation.__name__
    for expectation in all_expectations
    if "mostly" in signature(expectation).parameters
]
for expectation in expectations_with_mostly_parameter:
    print(expectation)

expectations_without_mostly_parameter = [
    expectation.__name__
    for expectation in all_expectations
    if "mostly" not in signature(expectation).parameters
]
for expectation in expectations_without_mostly_parameter:
    print(expectation)
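
The same signature check can be wrapped in a small reusable helper. The sketch below is only an illustration of the technique: split_by_parameter, RowLevelCheck and AggregateCheck are hypothetical names, and the two stand-in classes exist only so the example runs without Great Expectations installed.

```python
from inspect import signature


def split_by_parameter(callables, parameter_name):
    """Partition callables into (with, without) lists of names, depending on
    whether `parameter_name` appears in each callable's signature."""
    with_param, without_param = [], []
    for obj in callables:
        has_param = parameter_name in signature(obj).parameters
        (with_param if has_param else without_param).append(obj.__name__)
    return with_param, without_param


# Hypothetical stand-ins for Expectation classes (not part of any library).
class RowLevelCheck:
    def __init__(self, column, mostly=1.0):
        self.column = column
        self.mostly = mostly


class AggregateCheck:
    def __init__(self, column, min_value=None, max_value=None):
        self.column = column
        self.min_value = min_value
        self.max_value = max_value


with_mostly, without_mostly = split_by_parameter(
    [RowLevelCheck, AggregateCheck], "mostly"
)
print(with_mostly)     # ['RowLevelCheck']
print(without_mostly)  # ['AggregateCheck']
```

With the real library, you would pass the all_expectations list in place of the two stand-in classes.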