How to get a column's distinct values and their corresponding weights when adding an expect_column_kl_divergence_to_be_less_than expectation

When adding an expect_column_kl_divergence_to_be_less_than expectation on a low-cardinality column, we need to pass a "partition_object" argument that holds the column's distinct values and their weights.
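For a categorical column, the partition object can be pictured as two parallel lists: the distinct values, and the fraction of rows holding each value, so the weights sum to 1. The minimal pure-Python sketch below builds that structure by hand. It assumes the {"values": [...], "weights": [...]} layout and values that can be sorted. build_categorical_partition is a hypothetical helper for illustration, not part of GE's API.

```python
from collections import Counter


def build_categorical_partition(values):
    """Hypothetical helper: turn a sample of column values into a
    {"values": [...], "weights": [...]} dict, where each weight is the
    fraction of rows holding that value (so the weights sum to 1)."""
    counts = Counter(values)
    total = sum(counts.values())
    # Sort the distinct values so the output order is deterministic
    # (assumes the values are mutually comparable).
    distinct = sorted(counts)
    return {
        "values": distinct,
        "weights": [counts[v] / total for v in distinct],
    }


sample = ["red", "blue", "red", "green", "red", "blue"]
partition = build_categorical_partition(sample)
print(partition)
```

Doing this by hand is only meant to show the shape of the object; the point of the rest of this answer is to let GE compute it from a batch.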

How can Great Expectations (GE) help us get this object from a sample batch of data?

This answer assumes that you:

  1. are in a notebook that GE generated when you created an expectation suite
  2. have obtained a batch (the code for this is already in the generated notebook)

First, call expect_column_kl_divergence_to_be_less_than without specifying any partition object or threshold. GE will interpret this as you having no constraints and will return the partition that it observed in the batch. Setting bucketize_data=False tells GE to treat the column as discrete values rather than binning it:

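```python
# No partition_object and no threshold: GE profiles the column and reports
# the partition it observed instead of testing against one.
profiling_result = batch.expect_column_kl_divergence_to_be_less_than(
    COLUMN_NAME,
    bucketize_data=False,  # treat the column as categorical (no binning)
    partition_object=None,
    threshold=None,
    result_format="COMPLETE",
)
observed_partition = profiling_result.result["details"]["observed_partition"]
```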
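Before reusing observed_partition, a quick look at what was captured can help. The sketch below prints each value with its weight and checks the basic properties a categorical partition should have: one weight per distinct value, non-negative weights, and weights summing to 1. check_categorical_partition is a hypothetical helper, and example_partition is made-up data standing in for the real observed_partition (named differently so it does not overwrite it in your notebook).

```python
import math


def check_categorical_partition(partition, tol=1e-6):
    """Hypothetical sanity check for a categorical partition dict of the
    form {"values": [...], "weights": [...]} before reusing it."""
    values = partition["values"]
    weights = partition["weights"]
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if len(set(values)) != len(values):
        raise ValueError("values must be distinct")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights), 1.0, abs_tol=tol):
        raise ValueError("weights must sum to 1")
    return True


# Made-up numbers standing in for observed_partition.
example_partition = {"values": ["a", "b", "c"], "weights": [0.5, 0.3, 0.2]}
for value, weight in zip(example_partition["values"], example_partition["weights"]):
    print(f"{value!r}: {weight:.2%}")
check_categorical_partition(example_partition)
```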

The step above essentially profiled the column. Now you can use the observed partition object as the expected distribution of a real expectation (don't forget to also specify a threshold, the largest KL divergence you are willing to tolerate):

```python
batch.expect_column_kl_divergence_to_be_less_than(
    COLUMN_NAME,
    bucketize_data=False,
    partition_object=observed_partition,  # distribution profiled above
    threshold=0.4,  # example value; tune it for your data
)
```
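To get a feel for what threshold=0.4 bounds, here is a sketch of the Kullback-Leibler divergence between a new batch's categorical distribution and the expected partition, D_KL(observed || expected) = sum of p * ln(p / q) over the observed values. The direction of the divergence, the natural log, and returning infinity for a value the expected partition does not cover are assumptions of this sketch; GE's own implementation may differ in those details, so treat it only as intuition when choosing a threshold. The numbers are made up.

```python
import math


def kl_divergence(observed, expected):
    """Sketch of D_KL(observed || expected) for two categorical partition
    dicts ({"values": [...], "weights": [...]}), using natural logs.
    Returns math.inf if the observed data has a value that the expected
    partition gives zero (or no) weight to."""
    expected_weights = dict(zip(expected["values"], expected["weights"]))
    total = 0.0
    for value, p in zip(observed["values"], observed["weights"]):
        if p == 0:
            continue  # 0 * ln(0 / q) is taken as 0
        q = expected_weights.get(value, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log(p / q)
    return total


expected = {"values": ["a", "b", "c"], "weights": [0.5, 0.3, 0.2]}
new_batch = {"values": ["a", "b", "c"], "weights": [0.4, 0.4, 0.2]}
divergence = kl_divergence(new_batch, expected)
# "Passes" here means strictly below the threshold, in this sketch.
print(f"KL divergence: {divergence:.4f}, below 0.4: {divergence < 0.4}")
```

A divergence of 0 means the two distributions are identical; the further a new batch drifts from the profiled one, the larger the value gets.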