Row-by-row comparison in Great Expectations between two datasources

I am able to compare two datasources, but the result does not give me a row-by-row comparison of column values at the table level.
Can somebody help?

I tried custom expectations but didn’t get anywhere.

If the tables are located in the same database, I would create a RuntimeBatchRequest with a SQL query that joins both tables and applies the logic needed. The outcome would be one batch with data from both tables. Based on that batch, I would create an expectation that compares the two columns.
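As a rough sketch of the join idea, the snippet below uses Python's built-in sqlite3 as a stand-in database. The table names (table_a, table_b), the columns (id, amount) and the sample values are all made up for illustration. In Great Expectations the same query string would typically go into the RuntimeBatchRequest (in the V3 API, as runtime_parameters={"query": ...}), and the comparison done here in plain Python would instead be an expectation such as expect_column_pair_values_to_be_equal on amount_a and amount_b. The exact setup depends on your version and datasource, so treat this as an outline only.

```python
import sqlite3

# Hypothetical tables and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO table_a VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO table_b VALUES (1, 100), (2, 250), (3, 300);
""")

# Join both tables on the key so the two columns sit side by side in one result.
COMPARISON_QUERY = """
    SELECT a.id AS id, a.amount AS amount_a, b.amount AS amount_b
    FROM table_a AS a
    JOIN table_b AS b ON a.id = b.id
    ORDER BY a.id
"""

rows = conn.execute(COMPARISON_QUERY).fetchall()

# Stand-in for the column-pair expectation: keep rows where the values differ.
mismatches = [row for row in rows if row[1] != row[2]]
print(mismatches)  # [(2, 200, 250)]
```

Note that an inner join drops rows whose key exists in only one table. If those rows matter, a left join or a union of two left joins would be needed instead.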

There might be another solution, but I have succeeded in the past with this workaround.

I hope it works for you. All the best!

Hi Diego, both are CSV files.
I am reading them through Spark.
My goal is to find the rows that the two CSVs do not have in common.

If there is any column mismatch, it should print the unexpected rows through an expectation.

I guess both files have the same columns. Do you have many columns?

I would do this:

Step 1) Generate a new data frame (df3) with a new boolean column “common_row”. This data frame would contain all the rows from both data frames, and the new column tells whether each row is common to both (TRUE) or not (FALSE). You need to work out the logic to get this.

Step 2) Create an expectation suite for this generated data frame (df3), and add an expectation which validates the boolean column “common_row”. The validation should check if there are uncommon rows (FALSE).

Step 3) Execute this with a RuntimeBatchRequest, passing df3 as the batch data.
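To make the “common_row” logic from Step 1 concrete, here is a minimal plain-Python sketch using only the standard library. It is not Spark or Great Expectations code: the CSV contents and the helper names (read_rows, flag_common_rows) are hypothetical, and df3 here is just a list of (row, common_row) pairs. In Spark the same flag could probably be derived with a full outer join on all columns, and the Step 2 check could be an expectation like expect_column_values_to_be_in_set on “common_row” with the value set [True].

```python
import csv
import io

# Hypothetical CSV contents standing in for the two files.
CSV_1 = "id,name\n1,alice\n2,bob\n3,carol\n"
CSV_2 = "id,name\n1,alice\n2,bobby\n4,dave\n"


def read_rows(text):
    """Parse CSV text into a list of row tuples, skipping the header line."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip header
    return [tuple(row) for row in reader]


def flag_common_rows(rows_1, rows_2):
    """Return every distinct row from both inputs with a common_row flag."""
    set_1, set_2 = set(rows_1), set(rows_2)
    return [(row, row in set_1 and row in set_2) for row in sorted(set_1 | set_2)]


# Step 1: build the combined data with the boolean flag.
df3 = flag_common_rows(read_rows(CSV_1), read_rows(CSV_2))

# Step 2 stand-in: the rows an expectation on common_row would report as unexpected.
uncommon = [row for row, common_row in df3 if not common_row]
print(uncommon)
# [('2', 'bob'), ('2', 'bobby'), ('3', 'carol'), ('4', 'dave')]
```

This version compares whole rows and collapses duplicates, so a row that appears twice in one file and once in the other still counts as common. If duplicate counts matter, the logic would need to compare counts as well.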

  • There might be other solutions.