I’m currently working on a project that uses the Snowflake connector with Great Expectations (version 0.17.23). I’ve set up some validation rules, and my data is failing on several of these. However, even though more than 1000 records fail, Great Expectations only returns a maximum of 20 failed records per rule.
After investigating with the Snowflake debugger, we discovered that Great Expectations is automatically adding a LIMIT 20 clause when fetching the failed records. I would like to disable this limit to retrieve details for all failed records rather than just 20.
Any guidance on how to remove this limit would be greatly appreciated!
Hi there. Where are you seeing this limit of 20: in Data Docs or in your validation results? Data Docs is limited because it is meant to summarize results for stakeholders rather than to serve as a debugging tool.
When it comes to your validation results, you can set the level of detail returned by specifying a value for the optional result_format parameter. More detail here.
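For illustration, here is a minimal sketch of what a result_format value can look like. The helper build_result_format is hypothetical (it is not part of Great Expectations); it only assembles the dictionary. The level names and the "result_format" and "partial_unexpected_count" keys are the ones I know from the Great Expectations documentation, but please check them against the version you are running.

```python
# Hypothetical helper, not part of Great Expectations: it only builds the
# dictionary that is passed as the result_format argument.
VALID_LEVELS = ("BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE")


def build_result_format(level="COMPLETE", partial_unexpected_count=None):
    """Return a result_format dictionary for the requested level of detail."""
    if level not in VALID_LEVELS:
        raise ValueError(f"Unknown result_format level: {level!r}")
    result_format = {"result_format": level}
    # Only set the partial count when the caller asks for it, so the
    # library default applies otherwise.
    if partial_unexpected_count is not None:
        result_format["partial_unexpected_count"] = partial_unexpected_count
    return result_format


# Request the most detailed level.
result_format = build_result_format("COMPLETE")
```

You would then pass this dictionary as result_format=result_format to whatever call runs your validation.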
Hi again, I could not do that. The output was always 200 elements, and I am sure there are many more, since out of 64000 objects, 63826 did not pass validation.
Maybe you could point out what I am doing wrong in my implementation?
import great_expectations as ge
import great_expectations.expectations as gex
context = ge.get_context(mode="ephemeral")
data_source = context.data_sources.add_spark(name="my_data_source")
data_asset = data_source.add_dataframe_asset(name="my_data_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("my_batch_definition")
suite = ge.ExpectationSuite(name="my_suite")
suite = context.suites.add(suite)
suite.add_expectation(… )
# suite.add_expectation etc. etc.
validation_definition = context.validation_definitions.add(ge.core.validation_definition.ValidationDefinition(name="my_validation_definition", data=batch_definition, suite=suite))
results = validation_definition.run(batch_parameters={"dataframe": my_df}, result_format={"result_format": "COMPLETE"})
I also tried:
results = validation_definition.run(batch_parameters={"dataframe": my_df}, result_format={"result_format": "COMPLETE", "partial_unexpected_count": 1500})
but it didn't help. I'm also not aware of any parameter for customizing the unexpected_list count.
Are you seeing the key unexpected_index_query returned in your results dictionary when using COMPLETE? That query can be used to retrieve all unexpected values without a limit.
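To check quickly, something like the sketch below could pull every unexpected_index_query out of a results dictionary. It is written under some assumptions: collect_unexpected_index_queries is a hypothetical helper, the dictionary layout (a top-level "results" list whose items carry "expectation_config" and "result") is what I would expect from calling results.to_json_dict(), and the sample values are placeholders, not real output.

```python
# Hypothetical helper, not part of Great Expectations. It assumes the results
# have already been converted to a plain dictionary.
def collect_unexpected_index_queries(results_dict):
    """Map each expectation type to the unexpected_index_query values found."""
    queries = {}
    for item in results_dict.get("results", []):
        config = item.get("expectation_config") or {}
        # Assumption: the type is stored under "type" in newer versions
        # and under "expectation_type" in older ones.
        name = config.get("type") or config.get("expectation_type") or "unknown"
        query = (item.get("result") or {}).get("unexpected_index_query")
        if query is not None:
            queries.setdefault(name, []).append(query)
    return queries


# Placeholder input that only mimics the assumed layout.
sample_results = {
    "results": [
        {
            "expectation_config": {"type": "expect_column_values_to_not_be_null"},
            "result": {"unexpected_index_query": "<query placeholder>"},
        },
        {
            "expectation_config": {"type": "expect_column_values_to_be_unique"},
            "result": {},  # no query returned for this expectation
        },
    ]
}

queries = collect_unexpected_index_queries(sample_results)
for expectation_type, found in queries.items():
    print(expectation_type, found)
```

If the returned mapping is empty for your real results, the key is not being returned, and that would be worth mentioning here.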