How to get a column's distinct values and their corresponding weights when adding an expect_column_kl_divergence_to_be_less_than expectation

eugene.mandel · June 2, 2020, 6:07pm

When adding an expect_column_kl_divergence_to_be_less_than expectation on a low-cardinality column, we need to pass a “partition_object” argument that holds the distinct values of the column and their weights.
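For illustration, here is a rough sketch of what such an object looks like for a categorical column: a dictionary with parallel "values" and "weights" lists, where each weight is that value's share of the rows. The column values are made up, and the check that the weights sum to 1 reflects my understanding of what GE expects rather than anything stated in this thread.

```python
import math

# Hypothetical partition object for a low-cardinality column.
# "values" lists the distinct values; "weights" gives each value's fraction of rows.
partition_object = {
    "values": ["cash", "card", "voucher"],
    "weights": [0.5, 0.4, 0.1],
}

# Sanity checks: one weight per value, and the weights form a distribution.
assert len(partition_object["values"]) == len(partition_object["weights"])
assert math.isclose(sum(partition_object["weights"]), 1.0)
```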

How can GE help get this object from a sample batch of data?

eugene.mandel · June 2, 2020, 6:11pm

This answer assumes that you:

are in a notebook generated by GE (when you created an expectation suite)
have obtained a batch (this code is present in the generated notebook)

First, call expect_column_kl_divergence_to_be_less_than without specifying a partition object or a threshold. GE interprets this as you having no constraints and returns the partition that it observed in the batch:

profiling_result = batch.expect_column_kl_divergence_to_be_less_than(
    COLUMN_NAME,
    bucketize_data=False,
    partition_object=None,
    threshold=None,
    result_format="COMPLETE",
)
observed_partition = profiling_result.result["details"]["observed_partition"]

The step above essentially profiled the column.
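To make "profiled" concrete, the sketch below computes the same kind of information by hand in plain Python: the distinct non-null values and their relative frequencies. The helper name categorical_partition and the sample data are hypothetical, and this is not GE's implementation, only an approximation of what the observed partition contains.

```python
from collections import Counter

def categorical_partition(column_values):
    """Return distinct values and their relative frequencies (hypothetical helper)."""
    # Ignore missing values so the weights describe non-null rows only (an assumption).
    non_null = [v for v in column_values if v is not None]
    counts = Counter(non_null)
    total = sum(counts.values())
    values = sorted(counts)
    weights = [counts[v] / total for v in values]
    return {"values": values, "weights": weights}

# Made-up sample column with one missing value.
sample = ["a", "b", "a", "c", "a", "b", None, "a"]
partition = categorical_partition(sample)
print(partition)
```

Comparing this output with observed_partition on your own column can be a quick way to confirm the profiling step picked up the values you expect.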
Now you can use the observed partition object to create a real expectation (don’t forget to specify a threshold as well):

batch.expect_column_kl_divergence_to_be_less_than(
    COLUMN_NAME,
    bucketize_data=False,
    partition_object=observed_partition,
    threshold=0.4,
)
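If you are unsure what threshold to pick, it can help to compute the divergence yourself for a distribution shift you would consider acceptable. The sketch below is a plain-Python KL divergence over two aligned weight lists. It assumes the divergence is measured from the observed weights to the expected weights using the natural logarithm; check that against your GE version before relying on the numbers. The function name and the weights are made up.

```python
import math

def kl_divergence(observed_weights, expected_weights):
    """KL(observed || expected) in nats, for weight lists aligned by value."""
    total = 0.0
    for p, q in zip(observed_weights, expected_weights):
        if p == 0:
            continue  # a value absent from the observed data contributes nothing
        if q == 0:
            return math.inf  # observed a value the expected partition rules out
        total += p * math.log(p / q)
    return total

# Hypothetical weights for the same three values, in the same order.
expected = [0.5, 0.3, 0.2]
new_batch = [0.4, 0.4, 0.2]

divergence = kl_divergence(new_batch, expected)
print(divergence, divergence < 0.4)
```

A shift of this size stays well under the 0.4 threshold used above, so a tighter threshold may be appropriate if you want to catch moderate drift.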
