Checkpoint with "result_format": "COMPLETE" json that includes validation result without saving all results to GX context

Hello,

I am using a checkpoint with the "result_format": "COMPLETE" option so that I can use the complete JSON in custom Python code that flags issues row by row using a unique identifier (in this case point_id).

The issue I am having is that I also want a validation result HTML summary page (/dbfs/mnt/gx/uncommitted/data_docs/local_site/validations/exp_suite/20240801-160244-gx-run-exp_suite/20240801T160244.027774Z/memory_datasource-exp_suite.html) to be built, and I cannot find an option that stops the store_evaluation_params action from writing all the COMPLETE failed checks out to my GX data context. To make matters worse, I am on Databricks using a service principal to connect to Azure, where my GX context is located, which makes checkpoint runtime performance VERY slow (essentially unusable) when this happens.

Since I could not find a way to stop this from happening, I attempted to write a custom action for store_evaluation_params. Unfortunately, due to issues with the service principal, it does not appear that GX can locate my plugins or custom_actions in the Azure blob location, unlike when they are located locally on dbfs. I am able to read the rest of my GX context just fine, but the Python files with custom actions in the plugins directory are never imported, and GX instead reports that the module cannot be located. I have set the path in gx.yml and defined the modules, and I have also appended the path to sys.path in Python, without any luck.

I am wondering if:

  1. MOST PREFERRED OUTCOME: Is there a way to generate a validation result summary HTML page with "result_format": "COMPLETE" set, but without the store_evaluation_params action writing all the results out to my GX context?

OR

  2. Is there a better way to implement a custom action that makes this modification to the validation result without depending on GX locating the plugins directory?

My checkpoint:

from great_expectations.checkpoint import Checkpoint

checkpoint = Checkpoint(
    name=checkpoint_name,
    expectation_suite_name=expectation_suite_name,
    data_context=context,
    run_name_template=f"%Y%m%d-%H%M%S-gx-run-{expectation_suite_name}",
    validations=[
        {
            "expectation_suite_name": expectation_suite_name,
        }
    ],
    action_list=[
        {
            "name": "store_evaluation_params",
            "action": {
                "class_name": "StoreEvaluationParametersAction"
            }
        },
        {
            "name": "update_data_docs",
            "action": {
                "class_name": "UpdateDataDocsAction"
            }
        }
    ],
    runtime_configuration={
        "result_format": {
            "result_format": "COMPLETE",
            "unexpected_index_column_names": ["point_id"],
            "return_unexpected_index_query": True,
        },
    },
)
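
For context, the row-level flagging that this result_format is meant to feed can be sketched roughly as follows. This is a minimal sketch that works on the JSON (dict) form of a validation result, not on GX objects. It assumes that each entry of unexpected_index_list is a dict keyed by the index column name (point_id here); the helper name collect_flagged_ids and the sample data are hypothetical and not part of GX.

```python
from collections import defaultdict


def collect_flagged_ids(validation_json, id_column="point_id"):
    """Map each failed expectation type to the sorted ids of its unexpected rows.

    Assumes `validation_json` is the dict form of a suite validation result and
    that `unexpected_index_list` entries are dicts containing `id_column`.
    Failed expectations that report no row ids are left out of the mapping.
    """
    flagged = defaultdict(set)
    for result in validation_json.get("results", []):
        if result.get("success"):
            continue  # only failed expectations carry rows to flag
        config = result.get("expectation_config", {})
        expectation = config.get("expectation_type", "unknown")
        entries = result.get("result", {}).get("unexpected_index_list") or []
        for entry in entries:
            if isinstance(entry, dict) and id_column in entry:
                flagged[expectation].add(entry[id_column])
    return {name: sorted(ids) for name, ids in flagged.items()}


# Hand-written sample in the assumed shape (not real GX output).
sample = {
    "results": [
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null"
            },
            "result": {
                "unexpected_index_list": [
                    {"point_id": 7, "elevation": None},
                    {"point_id": 3, "elevation": None},
                ]
            },
        },
        {
            "success": True,
            "expectation_config": {"expectation_type": "expect_column_to_exist"},
            "result": {},
        },
    ]
}
flagged = collect_flagged_ids(sample)
```

Because the function only reads plain dicts, it can be run against the in-memory result returned by the checkpoint (after converting it to JSON) without anything having to be written to the context first.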

My custom action:

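from great_expectations.checkpoint.actions import StoreValidationResultAction
from great_expectations.core import ExpectationSuiteValidationResult
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)


class CustomStoreValidationResultAction(StoreValidationResultAction):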
    def _run(
        self,
        validation_result_suite: ExpectationSuiteValidationResult,
        validation_result_suite_identifier: ValidationResultIdentifier,
        data_asset: dict,
        payload=None,
        expectation_suite_identifier=None,
        checkpoint_identifier=None,
    ):
        # Create a new ExpectationSuiteValidationResult with modified results
        modified_results = []
        for result in validation_result_suite.results:
            modified_result = result.to_json_dict()
            if "unexpected_index_list" in modified_result.get("result", {}):
                del modified_result["result"]["unexpected_index_list"]
            if "unexpected_list" in modified_result.get("result", {}):
                modified_result["result"]["unexpected_list"] = ["<removed for storage>"]
            modified_results.append(modified_result)

        modified_result_suite = ExpectationSuiteValidationResult(
            results=modified_results,
            success=validation_result_suite.success,
            statistics=validation_result_suite.statistics,
            evaluation_parameters=validation_result_suite.evaluation_parameters,
            meta=validation_result_suite.meta
        )

        # Call the parent class method with the modified results and pass its
        # return value back to the checkpoint
        return super()._run(
            validation_result_suite=modified_result_suite,
            validation_result_suite_identifier=validation_result_suite_identifier,
            data_asset=data_asset,
            payload=payload,
            expectation_suite_identifier=expectation_suite_identifier,
            checkpoint_identifier=checkpoint_identifier,
        )
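
The trimming step inside the action can also be tried on its own, without a data context. The sketch below operates on the JSON dict of a single expectation result. The function name strip_row_level_fields and the set of keys it drops are assumptions chosen for illustration; the keys mirror the ones that show up in my COMPLETE output, so check them against your own results before relying on them.

```python
import copy

# Keys assumed to hold the bulky row-level payload in COMPLETE results.
ROW_LEVEL_KEYS = ("unexpected_index_list", "unexpected_list", "unexpected_index_query")


def strip_row_level_fields(result_json, keys=ROW_LEVEL_KEYS):
    """Return a copy of one expectation result dict without row-level fields."""
    trimmed = copy.deepcopy(result_json)  # leave the caller's COMPLETE result intact
    details = trimmed.get("result")
    if isinstance(details, dict):
        for key in keys:
            details.pop(key, None)  # ignore keys that are not present
    return trimmed


# Hand-written sample result (not real GX output).
full = {
    "success": False,
    "result": {
        "unexpected_count": 2,
        "unexpected_list": [None, None],
        "unexpected_index_list": [{"point_id": 3}, {"point_id": 7}],
    },
}
slim = strip_row_level_fields(full)
```

Working on a deep copy means the original COMPLETE result stays available for the row-level flagging, while only the slimmed copy would be handed to the store.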

I also noticed this GX docs page for Configure Actions is down - https://docs.greatexpectations.io/docs/oss/guides/validation/validation_actions/actions_lp/

Any ideas are appreciated!

hi @cary, great questions. it does look like the plugins_directory is expected on the local filesystem unfortunately.

what if you try modifying the default store_validation_result action?

Hi @adeola,

Thank you for your feedback. You mentioned:

it does look like the plugins_directory is expected on the local filesystem unfortunately.

Is it possible to modify this path to look at a different context location? I have already tried but it never seems to be executed.

I am defining my context in this manner at the start of my code. Where specifically does GX look locally when the context is set up in this manner?

import logging

import great_expectations as gx

logger = logging.getLogger(__name__)

# Service Principal path to azure blob
context_root_dir = "/dbfs/mnt/spatial-metadata/gx"
try:
    context = gx.get_context(context_root_dir=context_root_dir)
    logger.info(f"Great Expectations context located at {context_root_dir}")
except Exception as e:
    logger.error(f"Error creating Great Expectations context: {str(e)}")
    raise

You also suggested:

what if you try modifying the default store_validation_result action?

I will try this, although the action would then remain modified for everyone using this framework, which isn't the best outcome given my implementation.

A few follow up questions:

  • Have any changes been made in the newly released GX 1.0 to how plugins are referenced, or to how the store_validation_result action writes results out (it stores all data to the output location)?
  • This seems like a pretty common use case for GX. Is there any possibility the GX devs could add an option to store_validation_result so that users can choose not to write out the full results when building the summary report (code can be found above)?

Appreciate any suggestions on existing settings that could modify this behavior or assist in locating my custom plugins located on a cloud storage location.
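
Given that the plugins directory seems to be expected on the local filesystem, one workaround that might be worth trying is to copy the plugin files from the mounted location to a local directory before creating the context, and to put that local directory on sys.path. The following is an untested sketch that uses only the standard library. The function name stage_plugins_locally is hypothetical, and whether GX then resolves module_name: custom_actions this way is an assumption, not something I have confirmed.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def stage_plugins_locally(remote_plugins_dir, local_root=None):
    """Copy plugin files to a local directory and make them importable.

    `remote_plugins_dir` is the mounted plugins folder; `local_root` is an
    optional local parent directory (a fresh temp directory by default).
    """
    local_root = Path(local_root or tempfile.mkdtemp(prefix="gx_plugins_"))
    local_plugins = local_root / "plugins"
    # dirs_exist_ok lets the copy be re-run on a warm cluster (Python 3.8+).
    shutil.copytree(remote_plugins_dir, local_plugins, dirs_exist_ok=True)
    if str(local_plugins) not in sys.path:
        sys.path.insert(0, str(local_plugins))
    importlib.invalidate_caches()  # make freshly copied modules visible to import
    return local_plugins
```

With my setup this would be called before gx.get_context, for example as stage_plugins_locally("/dbfs/mnt/spatial-metadata/gx/plugins") (the plugins subfolder name is an assumption about my layout), followed by importlib.import_module("custom_actions") to confirm that the module can be found.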

For additional assistance, this is my updated checkpoint yml:

action_list:
- action:
    class_name: CustomStoreValidationResultAction
    module_name: custom_actions
  name: store_validation_result
- action:
    class_name: UpdateDataDocsAction
  name: update_data_docs
batch_request: {}
class_name: Checkpoint
config_version: 1.0
default_validation_id: null
evaluation_parameters: {}
expectation_suite_ge_cloud_id: null
expectation_suite_name: record
ge_cloud_id: null
module_name: great_expectations.checkpoint
name: checkpoint_record
notify_on: null
notify_with: null
profilers: []
run_name_template: '%Y%m%d-%H%M%S-gx-run-record'
runtime_configuration:
  result_format:
    result_format: COMPLETE
    return_unexpected_index_query: true
    unexpected_index_column_names:
    - point_id
site_names: null
slack_webhook: null
template_name: null
validations:
- batch_request: null
  expectation_suite_ge_cloud_id: null
  expectation_suite_name: record
  id: null
  name: null