Is it possible to see the number of records in a PySpark DataFrame that didn't pass the validation?

adeola · October 23, 2024, 3:05pm

Hey there @woodbine, welcome to our community!

Can you try adding unexpected_index_column_names to your result format? If your DataFrame has a unique identifier column (like an ID or record number), specify that column in unexpected_index_column_names. The validation output will then include the identifier values of the failing records.

result_format = {
    "result_format": "SUMMARY",
    "unexpected_index_column_names": ["id_column"],
}

Replace "id_column" with the name of your unique identifier column.
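For the count itself, here is a minimal sketch of pulling the number of failing records (and their identifiers) out of a single expectation result once it has been converted to a plain dict. The sample dict below is made up for illustration, and the helper name summarize_failures is hypothetical. It assumes the result carries unexpected_count plus either unexpected_index_list or partial_unexpected_index_list, with each entry being a dict keyed by the identifier column; which of these keys actually appear depends on your result format and GX version, so check your own output first.

```python
# Illustrative data only: shaped like the dict form of one expectation
# result, but the values here are invented, not real GX output.
sample_result = {
    "success": False,
    "result": {
        "element_count": 5,
        "unexpected_count": 2,
        "unexpected_percent": 40.0,
        "partial_unexpected_index_list": [
            {"id_column": 3, "age": -1},
            {"id_column": 5, "age": 200},
        ],
    },
}


def summarize_failures(validation_result, id_column):
    """Return (failed_count, failed_ids) from one expectation result dict.

    Hypothetical helper: assumes the keys described above are present and
    falls back to empty values when they are not.
    """
    result = validation_result.get("result", {})
    failed_count = result.get("unexpected_count", 0)
    # Prefer the full index list if present, otherwise use the partial one.
    index_list = (
        result.get("unexpected_index_list")
        or result.get("partial_unexpected_index_list")
        or []
    )
    failed_ids = [row[id_column] for row in index_list if id_column in row]
    return failed_count, failed_ids


count, ids = summarize_failures(sample_result, "id_column")
print(f"{count} records failed; ids: {ids}")
```

Note that a partial list may be truncated, so the number of identifiers returned can be smaller than the reported count.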
