Using batch_parameters to pass a dataframe and parameters that restrict data together

Following the documentation, I wanted to pass batch_parameters = {"dataframe": dataframe} together with a subset of my table using
batch = batch_definition.get_batch(batch_parameters={"year": "2019", "month": "01"}). However, I get an error saying that only "dataframe" is expected as a parameter. How can I pass both simultaneously? I am following the GX Core documentation links below.

Environment: Spark on Microsoft Fabric

Hi there,

I’m not sure I understand what you mean by “passing both simultaneously.” You have two options:

  1. Whole Table Batch Definition – This provides all records in the Data Asset as a single batch.
  2. Partitioned Batch Definition – This splits data into multiple batches based on a time-based partition, allowing you to validate each batch separately.

Are you working with SQL data or a dataframe?

For SQL Data: You can partition your Batch Definition to create multiple batches based on time. Here’s an example with “daily_batch” and “monthly_batch” definitions:

daily_batch = daily_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1, "day": 14}
)
daily_batch.head()

monthly_batch = monthly_batch_definition.get_batch(
    batch_parameters={"year": 2020, "month": 1}
)
monthly_batch.head()

This approach allows multiple Batch Definitions (full table or partitioned) to be validated separately.

For DataFrames: DataFrames are always passed as a whole, so a dataframe Batch Definition accepts only the "dataframe" batch parameter — that is why the year/month parameters in your get_batch call are rejected. You’d use add_batch_definition_whole_dataframe() to define a Batch Definition over the entire dataset.
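Since a dataframe Batch Definition receives the whole dataframe, the practical way to get the year/month restriction from the question is to filter the dataframe in Spark first, then pass the filtered result as the single "dataframe" batch parameter. A minimal sketch — the Spark and GX calls are shown as comments because they need a live session, while the filter logic itself is demonstrated on plain Python records:

```python
# With Spark (the environment from the question), the idea would be:
#   subset = df.filter((df.year == "2019") & (df.month == "01"))
#   batch = batch_definition.get_batch(batch_parameters={"dataframe": subset})

# The same filter-first idea, demonstrated on plain Python records
# so this sketch runs on its own:
rows = [
    {"year": "2019", "month": "01", "value": 10},
    {"year": "2019", "month": "02", "value": 20},
    {"year": "2020", "month": "01", "value": 30},
]
subset = [r for r in rows if r["year"] == "2019" and r["month"] == "01"]
print(subset)  # only the 2019-01 rows remain
```

In other words, the restriction happens in Spark before GX ever sees the data; GX still treats the filtered dataframe as one whole batch.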