Row-by-row comparison in Great Expectations between two datasources

sd6974 · March 11, 2022, 10:36am

I am able to compare two datasources, but it does not give me a row-by-row comparison of column values at the table level.
Can somebody help?

I tried custom expectations but didn't get anywhere.

diego.chapman · March 14, 2022, 4:13pm

If the tables are located in the same database, I would create a RuntimeBatchRequest with a SQL query that joins both tables and applies whatever logic is needed. The outcome is one batch with data from both tables. On that batch, I would add an expectation that compares the two columns.
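A minimal sketch of that idea, not a definitive implementation. The table names (`orders_old`, `orders_new`), the key and column names, and the datasource and data connector names are all hypothetical, and the Great Expectations calls assume the V3 (Batch Request) API of the 0.13 to 0.15 releases with a SQL datasource that has a runtime data connector configured. `FULL OUTER JOIN` is not available in every SQL dialect, so the query may need adapting.

```python
def build_join_query(left_table, right_table, key, column):
    """Return SQL that places one column from both tables side by side."""
    # FULL OUTER JOIN keeps keys that exist in only one table; the missing
    # side is NULL there. Assumption: the database supports FULL OUTER JOIN.
    return (
        f"SELECT COALESCE(l.{key}, r.{key}) AS {key}, "
        f"l.{column} AS {column}_left, "
        f"r.{column} AS {column}_right "
        f"FROM {left_table} AS l "
        f"FULL OUTER JOIN {right_table} AS r ON l.{key} = r.{key}"
    )


def build_batch_request_kwargs(query):
    """Keyword arguments for RuntimeBatchRequest; every name is a placeholder."""
    return {
        "datasource_name": "my_sql_datasource",  # hypothetical
        "data_connector_name": "default_runtime_data_connector_name",  # hypothetical
        "data_asset_name": "orders_old_vs_new",  # hypothetical
        "runtime_parameters": {"query": query},
        "batch_identifiers": {"default_identifier_name": "row_comparison"},
    }


def compare_columns(context, suite_name="orders_comparison_suite"):
    """Validate the joined batch; needs great_expectations and a DataContext."""
    # Imported here so the helpers above can be used without the library.
    from great_expectations.core.batch import RuntimeBatchRequest

    query = build_join_query("orders_old", "orders_new", "id", "amount")
    batch_request = RuntimeBatchRequest(**build_batch_request_kwargs(query))
    context.create_expectation_suite(suite_name, overwrite_existing=True)
    validator = context.get_validator(
        batch_request=batch_request, expectation_suite_name=suite_name
    )
    # Rows where the two sides differ are reported as unexpected.
    return validator.expect_column_pair_values_to_be_equal(
        column_A="amount_left", column_B="amount_right"
    )
```

Whether `expect_column_pair_values_to_be_equal` is implemented for a given SQL backend depends on the Great Expectations version, so it is worth checking before relying on it.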

There might be another solution, but I have succeeded in the past with this workaround.

I hope it works for you. All the best!

sd6974 · March 15, 2022, 11:25am

Hi Diego, both are CSV files.
I am reading them through Spark.
My goal is to find the rows that are not common to the two CSVs.

If any column value mismatches, an expectation should print the unexpected rows.

diego.chapman · March 16, 2022, 6:23pm

I guess both files have the same columns. Do you have many columns?

I would do this:

Step 1) Generate a new data frame (df3) with a new boolean column “common_row”. This data frame contains all the rows from both data frames, and the new column tells whether a row is present in both (TRUE) or in only one of them (FALSE). You need to work out the logic to compute this.

Step 2) Create an expectation suite for this generated data frame (df3), and add an expectation that validates the boolean column “common_row”. The validation should fail when there are uncommon rows (FALSE).

Step 3) Run the validation with a RuntimeBatchRequest, passing df3 as the batch data.
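The three steps could look roughly like the sketch below. It is one possible way to compute “common_row”, not the only one: rows are tagged with their source, unioned, and grouped on all columns, so df3 ends up with one row per distinct row. It assumes both data frames have the same column names and no column called `_src`. The datasource, data connector, and suite names are hypothetical, and the Great Expectations calls assume the V3 API of the 0.13 to 0.15 releases with a Spark datasource and a runtime data connector.

```python
def flag_common_rows(df1, df2):
    """Step 1: build df3 with a boolean common_row column."""
    # Imported here; requires PySpark to be installed.
    from pyspark.sql import functions as F

    cols = df1.columns  # assumption: df2 has the same column names
    tagged = (
        df1.select(*cols)
        .withColumn("_src", F.lit("df1"))
        .unionByName(df2.select(*cols).withColumn("_src", F.lit("df2")))
    )
    # Group identical rows; a row is common when it was seen in both sources.
    return (
        tagged.groupBy(*cols)
        .agg(F.countDistinct("_src").alias("_n_sources"))
        .withColumn("common_row", F.col("_n_sources") == 2)
        .drop("_n_sources")
    )


def validate_common_rows(context, df3, suite_name="row_comparison_suite"):
    """Steps 2 and 3: create the suite and validate df3 as a runtime batch."""
    # Imported here; requires great_expectations to be installed.
    from great_expectations.core.batch import RuntimeBatchRequest

    context.create_expectation_suite(suite_name, overwrite_existing=True)
    batch_request = RuntimeBatchRequest(
        datasource_name="my_spark_datasource",  # hypothetical
        data_connector_name="default_runtime_data_connector_name",  # hypothetical
        data_asset_name="df3_row_comparison",  # hypothetical
        runtime_parameters={"batch_data": df3},
        batch_identifiers={"default_identifier_name": "row_comparison"},
    )
    validator = context.get_validator(
        batch_request=batch_request, expectation_suite_name=suite_name
    )
    # Fails if any row has common_row == False.
    return validator.expect_column_values_to_be_in_set(
        column="common_row", value_set=[True]
    )
```

The expectation result only reports the unexpected values of “common_row” itself. To see the mismatching rows in full, `df3.filter("NOT common_row").show()` prints them directly from Spark.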

There might be other solutions.
