Checkpoint with "result_format": "COMPLETE" json that includes validation result without saving all results to GX context

cary · August 28, 2024, 12:40pm

Hello,

I am using a checkpoint with the “result_format”: “COMPLETE” option so that I can use the complete JSON in custom Python code that flags row-by-row issues by a unique identifier (in this case point_id).

The issue I am having is that I also want a validation result HTML summary page (/dbfs/mnt/gx/uncommitted/data_docs/local_site/validations/exp_suite/20240801-160244-gx-run-exp_suite/20240801T160244.027774Z/memory_datasource-exp_suite.html) to be built, and I cannot find an option that stops the store_evaluation_params action from writing all of the COMPLETE failed checks out to my GX data context. To make matters worse, I am on Databricks, using a service principal to connect to the Azure storage where my GX context is located, which makes checkpoint runtime performance VERY slow (essentially unusable) when this happens.

Since I could not find a way to stop this, I attempted to write a custom action for store_evaluation_params. Unfortunately, due to issues with the service principal, GX does not appear to locate my plugins or custom_actions in the Azure blob location, whereas it does when they are located locally on DBFS. I can read the rest of my GX context just fine, but the Python files with custom actions in the plugins directory are never imported; instead, I get an error saying the module cannot be located. I have set the path in gx.yml and defined the modules, and I also appended the path to sys.path in Python, without any luck.

I am wondering if:

MOST PREFERRED OUTCOME: Is there a way to generate a validation result summary HTML page with “result_format”: “COMPLETE” set, but without having store_evaluation_params write all the results out to my GX context?

OR

Is there a better way to implement a custom action that applies this modification to the validation result, without depending on GX locating the plugins directory?

My checkpoint:

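from great_expectations.checkpoint import Checkpoint

# context, checkpoint_name and expectation_suite_name are defined earlier in the notebook
checkpoint = Checkpoint(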
    name=checkpoint_name,
    expectation_suite_name=expectation_suite_name,
    data_context=context,
    run_name_template=f"%Y%m%d-%H%M%S-gx-run-{expectation_suite_name}",
    validations=[
        {
            "expectation_suite_name": expectation_suite_name,
        }
    ],
    action_list=[
        {
            "name": "store_evaluation_params",
            "action": {
                "class_name": "StoreEvaluationParametersAction"
            }
        },
        {
            "name": "update_data_docs",
            "action": {
                "class_name": "UpdateDataDocsAction"
            }
        }
    ],
    runtime_configuration={
        "result_format": {
            "result_format": "COMPLETE",
            "unexpected_index_column_names": ["point_id"],
            "return_unexpected_index_query": True,
        },
    },
)
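Since the COMPLETE result is being used for row-level flagging by point_id, a minimal sketch of that step may be useful. It works on the plain dict returned by `to_json_dict()` and assumes that each entry in `unexpected_index_list` is a dict keyed by the column names passed in `unexpected_index_column_names`; `collect_flagged_ids` is a hypothetical helper, not a GX API.

```python
from collections import defaultdict
from typing import Any, Dict, Set


def collect_flagged_ids(
    validation_result: Dict[str, Any], id_column: str = "point_id"
) -> Dict[str, Set[Any]]:
    """Map each failed expectation (type:column) to the IDs of its unexpected rows."""
    flagged: Dict[str, Set[Any]] = defaultdict(set)
    for result in validation_result.get("results", []):
        if result.get("success", True):
            continue  # only failed expectations carry rows worth flagging
        config = result.get("expectation_config") or {}
        column = (config.get("kwargs") or {}).get("column", "<table>")
        key = f'{config.get("expectation_type", "unknown")}:{column}'
        details = result.get("result") or {}
        for row in details.get("unexpected_index_list") or []:
            # Assumption: each entry is a dict keyed by the index column names
            if isinstance(row, dict) and id_column in row:
                flagged[key].add(row[id_column])
    return dict(flagged)
```

Because this needs only the in-memory result returned by the run, the flagging step does not depend on what any store action later writes to the GX context.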

My custom action:

import copy

from great_expectations.checkpoint.actions import StoreValidationResultAction
from great_expectations.core.expectation_validation_result import (
    ExpectationSuiteValidationResult,
)
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)


class CustomStoreValidationResultAction(StoreValidationResultAction):
    def _run(
        self,
        validation_result_suite: ExpectationSuiteValidationResult,
        validation_result_suite_identifier: ValidationResultIdentifier,
        data_asset,
        payload=None,
        expectation_suite_identifier=None,
        checkpoint_identifier=None,
    ):
        # Build slimmed copies so the in-memory COMPLETE result stays intact
        modified_results = []
        for result in validation_result_suite.results:
            # deepcopy keeps the ExpectationValidationResult type (to_json_dict() would yield plain dicts)
            modified_result = copy.deepcopy(result)
            details = modified_result.result or {}
            # Drop the per-row index list and mask the unexpected values before storing
            details.pop("unexpected_index_list", None)
            if "unexpected_list" in details:
                details["unexpected_list"] = ["<removed for storage>"]
            modified_results.append(modified_result)

        modified_result_suite = ExpectationSuiteValidationResult(
            results=modified_results,
            success=validation_result_suite.success,
            statistics=validation_result_suite.statistics,
            evaluation_parameters=validation_result_suite.evaluation_parameters,
            meta=validation_result_suite.meta,
        )

        # Let the parent class store the slimmed results and pass its return value through
        return super()._run(
            validation_result_suite=modified_result_suite,
            validation_result_suite_identifier=validation_result_suite_identifier,
            data_asset=data_asset,
            payload=payload,
            expectation_suite_identifier=expectation_suite_identifier,
            checkpoint_identifier=checkpoint_identifier,
        )
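The slimming logic in the action above could also be factored into a plain function over the `to_json_dict()` form of a single expectation result, which makes it testable without a data context. This is a sketch: `slim_result_dict` is a hypothetical helper, and also dropping `unexpected_index_query` (requested via `return_unexpected_index_query`) is an assumption about what is safe to discard.

```python
import copy
from typing import Any, Dict

# Assumption: these keys hold the bulky per-row payloads under COMPLETE
HEAVY_KEYS = ("unexpected_index_list", "unexpected_index_query")


def slim_result_dict(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one expectation result dict with per-row payloads removed."""
    slim = copy.deepcopy(result)  # never mutate the original used for flagging
    details = slim.get("result") or {}
    for key in HEAVY_KEYS:
        details.pop(key, None)
    if "unexpected_list" in details:
        details["unexpected_list"] = ["<removed for storage>"]
    return slim
```

Keeping the transformation separate from the action means the same rules can be checked in a unit test before they are wired into whichever store action ends up being used.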

I also noticed this GX docs page for Configure Actions is down - https://docs.greatexpectations.io/docs/oss/guides/validation/validation_actions/actions_lp/

Any ideas are appreciated!

adeola · September 12, 2024, 8:17pm

Hi @cary, great questions. It does look like the plugins_directory is expected on the local filesystem, unfortunately.

what if you try modifying the default store_validation_result action?

cary · September 17, 2024, 3:12pm

Hi @adeola,

Thank you for your feedback. You mentioned:

it does look like the plugins_directory is expected on the local filesystem unfortunately.

Is it possible to point this path at a different context location? I have already tried, but it never seems to take effect.

I define my context as follows at the start of my code. Where specifically does GX look locally when the context is set up this way?

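import logging

import great_expectations as gx

logger = logging.getLogger(__name__)

# DBFS mount of the Azure Blob container (accessed via a service principal)
context_root_dir = "/dbfs/mnt/spatial-metadata/gx"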
try:
    context = gx.get_context(context_root_dir=context_root_dir)
    logger.info(f"Great Expectations context located at {context_root_dir}")
except Exception as e:
    logger.error(f"Error creating Great Expectations context: {str(e)}")
    raise
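If plugins must be on the local filesystem, one possible workaround (untested with a service principal on Databricks) is to copy the module from the mount to driver-local disk and import it before the checkpoint runs. The helper name `import_from_mounted_dir`, the `plugins` subfolder under `context_root_dir`, and the assumption that GX will then resolve `module_name: custom_actions` from the already imported module are all hypothetical.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Optional


def import_from_mounted_dir(
    src_dir: str, module_name: str, local_dir: Optional[str] = None
) -> ModuleType:
    """Copy <module_name>.py from src_dir to a local directory and import it from there."""
    target = Path(local_dir or tempfile.mkdtemp(prefix="gx_plugins_"))
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(src_dir) / f"{module_name}.py", target / f"{module_name}.py")
    if str(target) not in sys.path:
        sys.path.insert(0, str(target))  # the local copy takes precedence on import
    importlib.invalidate_caches()  # make the import system notice the newly copied file
    return importlib.import_module(module_name)
```

For example, `import_from_mounted_dir(f"{context_root_dir}/plugins", "custom_actions")` before building the checkpoint. Once imported, the module is cached in `sys.modules`, so after editing the file you would need `importlib.reload` rather than calling the helper again.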

what if you try modifying the default store_validation_result action?

I will try this, although the change would then apply to everyone using this framework, which isn’t ideal given my implementation.

A few follow up questions:

Have any changes been made in the newly released GX 1.0 to how plugins are referenced, or to how the store_validation_result action writes results out (storing all data to the output location)?
This seems like a pretty common use case for GX. Is there any possibility the GX devs could add an option to store_validation_result so users can choose not to write out the results when building the summary report (code can be found above)?

I’d appreciate any suggestions on existing settings that could modify this behavior or help GX locate my custom plugins on a cloud storage location.

For additional context, here is my updated checkpoint YAML:

action_list:
- action:
    class_name: CustomStoreValidationResultAction
    module_name: custom_actions
  name: store_validation_result
- action:
    class_name: UpdateDataDocsAction
  name: update_data_docs
batch_request: {}
class_name: Checkpoint
config_version: 1.0
default_validation_id: null
evaluation_parameters: {}
expectation_suite_ge_cloud_id: null
expectation_suite_name: record
ge_cloud_id: null
module_name: great_expectations.checkpoint
name: checkpoint_record
notify_on: null
notify_with: null
profilers: []
run_name_template: '%Y%m%d-%H%M%S-gx-run-record'
runtime_configuration:
  result_format:
    result_format: COMPLETE
    return_unexpected_index_query: true
    unexpected_index_column_names:
    - point_id
site_names: null
slack_webhook: null
template_name: null
validations:
- batch_request: null
  expectation_suite_ge_cloud_id: null
  expectation_suite_name: record
  id: null
  name: null
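The `run_name_template` here is a strftime-style pattern, which is why the data docs path in the first post contains a folder named `20240801-160244-gx-run-exp_suite`. A minimal illustration with the standard library, assuming GX expands the template with the run time in essentially this way; `expand_run_name` is a hypothetical helper.

```python
from datetime import datetime


def expand_run_name(template: str, when: datetime) -> str:
    """Expand a strftime-style run name template for a given run time."""
    return when.strftime(template)


run_name = expand_run_name("%Y%m%d-%H%M%S-gx-run-record", datetime(2024, 8, 1, 16, 2, 44))
print(run_name)  # 20240801-160244-gx-run-record
```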

adeola · October 29, 2024, 8:52pm

Hi @cary, unfortunately there isn’t a workaround for generating the COMPLETE HTML summary without sending all validation data to the GX context. Regarding custom actions, we don’t have specific documentation on how they have changed in 1.x at the moment, but this is a recent priority for us.

I know of a user who previously got a custom action running by redefining it with our new Pydantic syntax, subclassing the Checkpoint class, and overriding action_list so that Pydantic would accept the new class as a valid CheckpointAction. The user also said he monkey-patched the Checkpoint class in a few places to stop Pydantic from complaining. I have since asked the user whether they are willing to share more details about how they got this working.

I’m sorry I couldn’t be of more help, but I have strongly advocated for releasing the relevant documentation, which should be out soon. I’ll make sure to follow up with you once it’s available.

adeola · December 23, 2024, 2:40pm

Hi @cary, I am happy to share that we have added back the ability to add custom actions! Thank you for your patience while we worked on this.

Related topics:
- COMPLETE result_format not working on some expectations (GX Core Support, August 15, 2023)
- Checkpoint results to delta lake (GX Core Support, September 28, 2024)
- No support for Spark DF in Result Format COMPLETE Mode (GX Core Support, February 9, 2024)
- Issue with checkpoint.run() results when we have multiple validation definitions (GX Core Support, April 1, 2025)
- Different result format per validation in checkpoint (GX Core Support, February 26, 2024)
