How to run a checkpoint to validate multiple dataframes

Hi, I am using GX version 1.2.4.
My scenario is to validate a Spark DataFrame in which some columns are arrays, but GX does not support validating array columns. My current approach is to split those array columns into new DataFrames, then create an expectation suite and a validation definition for each one.
I gathered all the validation definitions into a single checkpoint, which looks like the following:

{
  "actions": [
    {
      "name": "update_all_data_docs",
      "site_names": [],
      "type": "update_data_docs"
    }
  ],
  "id": "d7bc316a-0501-452b-885b-b5bc4dab79df",
  "name": "bronze_to_silver__checkpoint",
  "result_format": "COMPLETE",
  "validation_definitions": [
    {
      "id": "1e993b0d-ccd7-40bb-8a18-70253edee2d0",
      "name": "validation_def__dataframe1"
    },
    {
      "id": "35c0f783-f48e-46eb-ba14-9607c2677222",
      "name": "validation_def__dataframe2"
    },
    {
      "id": "1af3d180-68fa-406f-a3ae-95fc59db7f0a",
      "name": "validation_def__dataframe3"
    }
  ]
}

How can I pass those 3 DataFrames to checkpoint.run()?

Hi May! You can create distinct validation definitions for different DataFrames by adding them to your context and passing the corresponding validation definitions to the checkpoint. Example:

# bd is presumably a batch definition and suite an expectation suite, both created earlier
vd = context.validation_definitions.add(
    gx.ValidationDefinition(name=name, data=bd, suite=suite)
)

checkpoint = context.checkpoints.add(
    gx.Checkpoint(
        name=name,
        validation_definitions=[vd],
        actions=[gx.checkpoint.actions.UpdateDataDocsAction(name=name)],
    )
)

Hi Adeola. Thank you for your response! I understand how to configure a checkpoint with multiple validation definitions, but I don’t know how to run it. I’m currently passing a DataFrame in batch_parameters like this:

batch_parameters = {"dataframe": df}
run_results = checkpoint.run(batch_parameters=batch_parameters)

However, this passes only one DataFrame to the checkpoint. Since I have 3 validation definitions, I would expect to pass the DataFrames like this:

batch_parameters = [
    {"dataframe": df1},
    {"dataframe": df2},
    {"dataframe": df3},
]
run_results = checkpoint.run(batch_parameters=batch_parameters)

However, this does not work. Could you guide me on this?

Hi May,

Apologies for the confusion earlier. You’re correct: what I described is only possible for non-DataFrame data sources. For DataFrames, a checkpoint takes a single set of batch parameters per run, so validating several DataFrames at once isn’t feasible with a single checkpoint.
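One possible workaround, offered as a rough sketch rather than an official recommendation, is to skip the shared checkpoint and run each validation definition against its own DataFrame. The helper name `run_validations_per_dataframe` and the label keys are hypothetical. The only thing it assumes is that each validation definition exposes `run(batch_parameters=...)`, which `gx.ValidationDefinition` does in GX 1.x as far as I know.

```python
from typing import Any, Dict, Mapping


def run_validations_per_dataframe(
    validation_definitions: Mapping[str, Any],
    dataframes: Mapping[str, Any],
) -> Dict[str, Any]:
    """Run each validation definition with its own DataFrame.

    Both mappings are keyed by the same (hypothetical) labels, e.g.
    "dataframe1" -> validation definition and "dataframe1" -> DataFrame.
    """
    # Fail early if a validation definition has no matching DataFrame.
    missing = set(validation_definitions) - set(dataframes)
    if missing:
        raise KeyError(f"No DataFrame supplied for: {sorted(missing)}")

    results: Dict[str, Any] = {}
    for label, validation_definition in validation_definitions.items():
        # Assumption: the object accepts run(batch_parameters=...),
        # with the DataFrame passed under the "dataframe" key.
        results[label] = validation_definition.run(
            batch_parameters={"dataframe": dataframes[label]}
        )
    return results
```

With the names from this thread, the first argument could be built with something like `{"dataframe1": context.validation_definitions.get("validation_def__dataframe1"), ...}` and the second with `{"dataframe1": df1, ...}`. Note that running validation definitions directly bypasses checkpoint actions such as updating Data Docs. If you need those, an alternative is one checkpoint per DataFrame, each holding a single validation definition and run with its own `batch_parameters`.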

Got it! Thank you very much