Validate foreign keys / load multiple files to a single spark dataframe with batch generator

Hi all,

Is there a way to use the batch generator to create a Spark DataFrame from multiple files? For example, I have a fact table in one Parquet file and several dimension tables in other Parquet files. Is there a way to validate joins / foreign keys across them? Or would I have to build the DataFrame outside and then replace my batch.spark_df?

If your Parquet files are on S3, you can use S3GlobReaderBatchKwargsGenerator to configure a “data asset” with a prefix and a regex, and set the asset's directory_assets property to True. That property tells the generator to load all the files into one DataFrame.
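
As a rough sketch of what that asset configuration might look like, here it is written as a Python dict that mirrors the YAML you would put in great_expectations.yml. The bucket name, asset name, and the directory_asset helper are made up for illustration. I'm also assuming the key names (prefix, regex_filter, directory_assets) from the generator docs, so double-check them against the link below for your version.

```python
# Sketch only: mirrors the YAML generator config in great_expectations.yml.
# The bucket name, asset name, and this helper are hypothetical.
import json


def directory_asset(prefix, regex_filter=".*"):
    """Build one asset entry with directory_assets switched on."""
    return {
        "prefix": prefix,              # S3 key prefix for the asset
        "regex_filter": regex_filter,  # regex applied to the listed keys
        "directory_assets": True,      # load the files as one DataFrame
    }


generator_config = {
    "class_name": "S3GlobReaderBatchKwargsGenerator",
    "bucket": "my-example-bucket",
    "assets": {
        "sales_fact": directory_asset("sales_fact/"),
    },
}

# Print the structure so it can be compared with the YAML version.
print(json.dumps(generator_config, indent=2))
```

Each table (the fact table and each dimension) would get its own entry under assets in the same way.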

See the generator’s configuration here:
https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/datasource/batch_kwargs_generator/s3_batch_kwargs_generator/index.html