Exact match for expect_table_columns_to_match_set

GX has an expectation called expect_table_columns_to_match_set that checks whether the columns of a data frame match an unordered set of column names. Its exact_match parameter controls whether the expected list must match the observed columns exactly. In my experience, the effect of this parameter is not obvious to all users.

To show how this parameter affects the expectation’s behavior, let’s consider an example. Assume we have a data frame called df with the columns “a”, “b”, and “c”. We can run the expectation with different arguments:

pd_gx = gx.from_pandas(df)

  1. pd_gx.expect_table_columns_to_match_set(["a", "b"], exact_match=False)
    This expectation will succeed (success is True) because exact_match is set to False, meaning that the data frame must have at least the expected columns (in any order). The extra column “c” is allowed.
  2. pd_gx.expect_table_columns_to_match_set(["a", "b"], exact_match=True)
    This expectation will fail (success is False) because exact_match is set to True, meaning that the data frame must have exactly the expected columns (in any order). Here the data frame has an extra column, “c”.
  3. pd_gx.expect_table_columns_to_match_set(["a", "b", "c", "d"], exact_match=False)
  4. pd_gx.expect_table_columns_to_match_set(["a", "b", "c", "d"], exact_match=True)
    Both of these expectations will fail (success is False), regardless of exact_match, because the data frame is missing the expected column “d”.

I hope these examples help you gain a deeper understanding of the exact_match parameter.
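The four cases above can also be modeled with plain Python sets. This is only a sketch of the behavior described in this post, not GX's actual implementation: the function name columns_match_set is hypothetical, and the default of exact_match=True is an assumption.

```python
def columns_match_set(actual_columns, expected_columns, exact_match=True):
    """Simplified model of the exact_match semantics described above.

    Hypothetical helper for illustration only; not part of GX.
    """
    actual = set(actual_columns)
    expected = set(expected_columns)
    missing = expected - actual      # expected, but absent from the data frame
    unexpected = actual - expected   # present in the data frame, but not expected
    if exact_match:
        # Both sets must be identical: nothing missing, nothing extra.
        return not missing and not unexpected
    # Extra columns are tolerated; missing ones are not.
    return not missing


columns = ["a", "b", "c"]
results = [
    columns_match_set(columns, ["a", "b"], exact_match=False),
    columns_match_set(columns, ["a", "b"], exact_match=True),
    columns_match_set(columns, ["a", "b", "c", "d"], exact_match=False),
    columns_match_set(columns, ["a", "b", "c", "d"], exact_match=True),
]
print(results)  # [True, False, False, False]
```

In this model, exact_match=False only asks "is anything missing?", while exact_match=True also asks "is anything extra?". Column order never matters in either mode.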

Full example:

import pandas as pd
import great_expectations as gx

# Data frame with the columns "a", "b", and "c"
df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"], "c": [True, False, True]})
pd_gx = gx.from_pandas(df)

# Collect the success flag of each of the four expectations discussed above
result = [
    pd_gx.expect_table_columns_to_match_set(["a", "b"], exact_match=False).success,
    pd_gx.expect_table_columns_to_match_set(["a", "b"], exact_match=True).success,
    pd_gx.expect_table_columns_to_match_set(["a", "b", "c", "d"], exact_match=False).success,
    pd_gx.expect_table_columns_to_match_set(["a", "b", "c", "d"], exact_match=True).success,
]
print(result)  # [True, False, False, False]

Thank you for sharing this information with the community, @Aleksei!
