Hi everyone,
I want to check whether all the values in my column, including duplicates and their counts, are the same as the values in a list. I couldn't do this with expect_column_values_to_be_in_set, because it only checks that each column value is a member of the set. So I can't confirm that the column matches the list including duplicates and counts =(
For example, if the value '2023-10-10' appears 3 times in the list and 4 times in the column, the expectation still passes, which is not what I want to achieve. For the expectation to count as successful, the number of occurrences should also match.
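To make the mismatch concrete, here is a minimal sketch in plain Python (no Great Expectations involved). The sample values are made up for illustration, and collections.Counter is only one possible way to compare values together with their counts:

```python
from collections import Counter

# Hypothetical sample data, only to illustrate the problem.
expected_values = ["2023-10-10"] * 3 + ["2023-10-11"]  # the list
column_values = ["2023-10-10"] * 4 + ["2023-10-11"]    # the column

# Set-membership check: passes as long as every column value occurs in the list.
allowed = set(expected_values)
in_set_ok = all(v in allowed for v in column_values)

# Multiset check: values AND their counts must be identical.
col_counts = Counter(column_values)
exp_counts = Counter(expected_values)
counts_ok = col_counts == exp_counts

# Values whose counts differ, as (count in column, count in list).
mismatches = {
    v: (col_counts[v], exp_counts[v])
    for v in col_counts.keys() | exp_counts.keys()
    if col_counts[v] != exp_counts[v]
}

print(in_set_ok, counts_ok, mismatches)
```

The membership check succeeds while the count comparison fails, and that second kind of comparison is what I am trying to express as an expectation.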
Here is the part of my code where I build expected_values and create the expectation with expect_column_values_to_be_in_set:
import pandas as pd
from great_expectations.core.expectation_configuration import ExpectationConfiguration

# Dates derived from post_time, skipping nulls (duplicates stay in the list)
expected_values = [str(pd.to_datetime(val).date()) for val in df['post_time'] if pd.notnull(val)]

expectation_configuration_partition_date_matches_post_time = ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_in_set",
    kwargs={
        "column": "partition_date",
        "value_set": expected_values,
        "result_format": "COMPLETE",
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Confirm that all values in the [partition_date] column match the transformed values from the "
                       "[post_time] column, while also handling missing values and duplicates."
                       "Markdown Supported",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_partition_date_matches_post_time)
context.add_or_update_expectation_suite(expectation_suite=suite)
Can anyone tell me what the problem is, or suggest a ready-made solution for this expectation?