Following the documentation, I wanted to pass batch_parameters = {"dataframe": dataframe} and also select a subset of my table using batch = batch_definition.get_batch(batch_parameters={"year": "2019", "month": "01"}). However, I get an error saying that only dataframe is expected as a parameter. How can I pass both simultaneously? I am following the GX Core documentation linked below.
I’m not sure I understand what you mean by “passing both simultaneously.” You have two options:
Whole Table Batch Definition – This provides all records in the Data Asset as a single batch.
Partitioned Batch Definition – This splits data into multiple batches based on a time-based partition, allowing you to validate each batch separately.
Are you working with SQL data or a dataframe?
For SQL Data: You can partition your Batch Definition to create multiple batches based on time. Here’s an example with “daily_batch” and “monthly_batch” definitions:
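A minimal sketch of what such definitions might look like, assuming a GX Core 1.x SQL table asset. The helper name add_time_batch_definitions and the column name pickup_datetime are hypothetical, chosen only for illustration; the add_batch_definition_* methods are the ones GX Core exposes on table assets.

```python
def add_time_batch_definitions(table_asset, column):
    """Register whole-table, daily, and monthly Batch Definitions on a SQL table asset.

    `table_asset` is assumed to be a GX Core table asset, e.g. the object
    returned by data_source.add_table_asset(...). `column` is the date or
    datetime column used to partition the table.
    """
    return {
        # One batch containing every record in the table.
        "full_table": table_asset.add_batch_definition_whole_table(name="full_table"),
        # One batch per calendar day found in `column`.
        "daily_batch": table_asset.add_batch_definition_daily(
            name="daily_batch", column=column
        ),
        # One batch per calendar month found in `column`.
        "monthly_batch": table_asset.add_batch_definition_monthly(
            name="monthly_batch", column=column
        ),
    }


# Hypothetical usage (requires great_expectations and a reachable database):
#   definitions = add_time_batch_definitions(table_asset, "pickup_datetime")
#   batch = definitions["monthly_batch"].get_batch(
#       batch_parameters={"year": 2019, "month": 1}
#   )
```

With a partitioned definition like monthly_batch, the year and month go in batch_parameters; no dataframe parameter is involved because the data comes from the table itself.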
This approach allows multiple Batch Definitions (full table or partitioned) to be validated separately.
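For DataFrames: DataFrames are always passed as a whole, which is why dataframe is the only batch parameter accepted. You’d use add_batch_definition_whole_dataframe() to define a Batch Definition covering the entire dataset, then supply the DataFrame itself when you call get_batch().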
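If you only want to validate one month of a DataFrame, one possible workaround is to filter it yourself before handing it to GX. The sketch below assumes pandas and a datetime-like column; the helper names select_month and get_month_batch and the column pickup_datetime are hypothetical, and batch_definition is assumed to come from add_batch_definition_whole_dataframe().

```python
import pandas as pd


def select_month(dataframe, column, year, month):
    """Return only the rows of `dataframe` whose `column` falls in the given year and month."""
    timestamps = pd.to_datetime(dataframe[column])
    mask = (timestamps.dt.year == year) & (timestamps.dt.month == month)
    return dataframe.loc[mask]


def get_month_batch(batch_definition, dataframe, column, year, month):
    """Filter first, then pass the subset as the single `dataframe` batch parameter."""
    subset = select_month(dataframe, column, year, month)
    return batch_definition.get_batch(batch_parameters={"dataframe": subset})


# Small illustrative dataset (made up for this example).
sample = pd.DataFrame(
    {
        "pickup_datetime": ["2019-01-15", "2019-02-01", "2018-01-03"],
        "fare": [10.0, 12.5, 8.0],
    }
)
january_2019 = select_month(sample, "pickup_datetime", 2019, 1)
# january_2019 keeps only the 2019-01-15 row.
```

The subset is still passed as a whole DataFrame, so this stays within what the whole-dataframe Batch Definition expects.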