expect_column_values_to_be_in_set takes too long to process

Piyush · June 21, 2021, 3:27pm

Hi,
I am using Great Expectations on Databricks. My source dataset has only 97 records, while the reference dataset has 5 lakh (500,000) records. I need to check whether my source dataset is a subset of the reference dataset, so I am using the expectation ‘expect_column_values_to_be_in_set’. The validation takes around 55 minutes to return a result. Can someone please help me optimize the performance and reduce this time?
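For scale, the check itself is cheap. With 97 source rows, a hash-set lookup against 500,000 reference values finishes almost instantly. This suggests the 55 minutes are spent on how the check is executed, not on the comparison. One plausible, unverified cause: as far as I know, the Spark backend implements the in-set check with `Column.isin(value_set)`, so a 500,000-element list becomes one huge literal expression for Spark to plan. The sketch below is plain Python, not Great Expectations. It assumes both columns fit in memory and uses a hypothetical helper, `values_not_in_reference`, to compute the "unexpected values" an in-set check would report.

```python
from typing import Hashable, Iterable, List


def values_not_in_reference(
    source_values: Iterable[Hashable],
    reference_values: Iterable[Hashable],
) -> List[Hashable]:
    # Build the lookup table once: O(m) for m reference values,
    # then each membership test is O(1) on average.
    reference = set(reference_values)
    unexpected: List[Hashable] = []
    seen = set()
    for value in source_values:
        # Skip nulls; assumed to mirror how GX column map expectations
        # report missing values separately rather than as unexpected.
        if value is None:
            continue
        # Record each missing value once, in the order it first appears.
        if value not in reference and value not in seen:
            seen.add(value)
            unexpected.append(value)
    return unexpected


if __name__ == "__main__":
    # Hypothetical data sized like the question: 500,000 reference IDs
    # and 97 source IDs, one of which is not in the reference data.
    reference_ids = range(500_000)
    source_ids = list(range(0, 960, 10)) + [999_999]
    print(len(source_ids))  # prints 97
    print(values_not_in_reference(source_ids, reference_ids))  # prints [999999]
```

An empty result means the source is a subset of the reference. On Spark, the same idea can be expressed as a left anti join instead of a literal value list, for example `source_df.join(reference_df, on="value", how="left_anti")`, where the column name `value` is hypothetical. If that join returns no rows, every source value is present in the reference data.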

Related topics

Topic · Category (tags) · Replies · Views · Activity
Validations need to be ran twice to store and create data docs · Archive (how-to, help-wanted) · 1 · 487 · March 8, 2021
Parallel Execution of Great Expectation Validations · Feedback · 0 · 428 · June 5, 2023
Wanted help in creating query Expectation to use on diffrent column name · GX Core Support (how-to, help-wanted, databricks) · 3 · 344 · October 25, 2023
Issue while running Great expectation validations on Big Query External Tables · GX Core Support · 0 · 121 · May 16, 2024
Abnormal behavior on expectation expect_column_pair_values_a_to_be_greater_than_b · GX Core Support (help-wanted) · 1 · 45 · March 31, 2025
