To create a batch from two SQL assets, you can follow these steps:
- Import GX and instantiate a Data Context.
- Retrieve the SQL Data Source and Data Assets that you want to include in the batch.
- Add a Splitter to each Data Asset to divide its data into batches based on a specific column.
- (Optional) Add Batch Sorters to each Data Asset to specify the order in which the batches are returned.
- Use a Batch Request to verify that the Data Assets work as desired.
Here is an example of how you can implement these steps using Great Expectations:

    # Assumes the pre-1.0 Fluent Data Source API (roughly GX 0.16 to 0.18);
    # method names differ in other versions.
    import great_expectations as gx

    # Step 1: Import GX and instantiate a Data Context
    context = gx.get_context()

    # Step 2: Retrieve the SQL Data Source and Data Assets
    # ("my_datasource", "table1" and "table2" must already be configured)
    data_source = context.get_datasource("my_datasource")
    data_asset1 = data_source.get_asset("table1")
    data_asset2 = data_source.get_asset("table2")

    # Step 3: Add a Splitter to each Data Asset
    # (one batch per distinct value of the named column)
    data_asset1.add_splitter_column_value(column_name="column1")
    data_asset2.add_splitter_column_value(column_name="column2")

    # Step 4: (Optional) Add Batch Sorters to each Data Asset
    # ("+" sorts ascending, "-" sorts descending)
    data_asset1.add_sorters(["+column1"])
    data_asset2.add_sorters(["+column2"])

    # Step 5: Use a Batch Request to verify the Data Assets
    batch_request1 = data_asset1.build_batch_request()
    batch_request2 = data_asset2.build_batch_request()
    batches1 = data_asset1.get_batch_list_from_batch_request(batch_request1)
    batches2 = data_asset2.get_batch_list_from_batch_request(batch_request2)

In this example, we first import Great Expectations and instantiate a Data Context. Then, we retrieve the SQL Data Source and the two Data Assets we want to work with. We add a Splitter to each Data Asset so that its data is divided into batches based on a specific column. Optionally, we add Batch Sorters to control the order in which those batches are returned. Finally, we build a Batch Request for each Data Asset and use it to retrieve that asset's batches. Note that each Data Asset yields its own batches; the two assets are not merged into a single batch.
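To make the roles of the Splitter and the Sorter more concrete, here is a minimal plain-Python sketch of the idea. It does not use Great Expectations at all: `split_by_column`, `sort_batches`, and the sample rows are hypothetical names invented for illustration, and the sketch assumes a splitter that creates one batch per distinct column value.

```python
from collections import defaultdict


def split_by_column(rows, column):
    """Group rows into batches keyed by the value found in `column`."""
    batches = defaultdict(list)
    for row in rows:
        batches[row[column]].append(row)
    return dict(batches)


def sort_batches(batches, descending=False):
    """Return (key, rows) pairs ordered by the batch key."""
    return sorted(batches.items(), key=lambda item: item[0], reverse=descending)


# Hypothetical rows standing in for the contents of "table1".
rows = [
    {"column1": "b", "value": 1},
    {"column1": "a", "value": 2},
    {"column1": "b", "value": 3},
]

batches = split_by_column(rows, "column1")
ordered = sort_batches(batches)
for key, batch_rows in ordered:
    print(key, len(batch_rows))
```

Splitting decides which rows belong together in a batch, while sorting only changes the order in which the resulting batches come back.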
Note that the specific implementation may vary depending on your SQL data source and the structure of your data assets. Make sure to adjust the code accordingly.
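Before adjusting the code for your own tables, it can help to confirm that the column you plan to split on exists and to see how many distinct values it holds. The sketch below uses an in-memory SQLite database as a stand-in for your SQL data source. The helper `describe_split_column` and the sample table are hypothetical, and the sketch assumes a splitter that produces one batch per distinct column value.

```python
import sqlite3


def describe_split_column(conn, table, column):
    """Return (column_exists, distinct_value_count) for table.column.

    Table and column names are interpolated into the SQL text, so pass
    only trusted identifiers.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info('{table}')")]
    if column not in columns:
        return False, 0
    # COUNT(DISTINCT ...) ignores NULL values.
    count = conn.execute(
        f'SELECT COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()[0]
    return True, count


# Hypothetical table standing in for "table1".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("a", 1), ("b", 2), ("a", 3)],
)

exists, distinct_count = describe_split_column(conn, "table1", "column1")
missing = describe_split_column(conn, "table1", "no_such_column")
print(exists, distinct_count)
print(missing)
```

A very large distinct count suggests the column would produce many small batches, in which case a coarser split may be a better fit.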