Creating Data source for s3 with pandas

Garima93 · June 1, 2021, 8:04am

We are getting the error below when we try to configure a Datasource and read data from S3 with the pandas engine in GE.

great_expectations.exceptions.exceptions.BatchKwargsError: Unable to build batch_kwargs. The asset may not be configured correctly. If s3 returned common prefixes it may not have been able to identify desired keys, and they are included in the incomplete batch_kwargs object returned with this error.

We tried the configuration below, but it gives the above error:

pandas_s3:
  class_name: PandasDatasource
  batch_kwargs_generators:
    pandas_s3_generator:
      class_name: S3GlobReaderBatchKwargsGenerator
      bucket: xxx-xxx-xxx-xxxxxx-bucket
      reader_method: read_csv
      reader_options:
        sep: ","
      delimiter: "/"
      assets:
        client_csv_file_test:
          prefix: /xxx/xx-xx-xxxxx/xxxxx/
          regex_filter: '/xxx/xx-xx-xxxxx/xxxxx/Client.*.csv'
  module_name: great_expectations.datasource
  data_asset_type:
    class_name: PandasDataset
    module_name: great_expectations.dataset

Can you please help here?

eugene.mandel · June 14, 2021, 3:28pm

@Garima93 Could you please use GitHub issues to report this? Thank you!
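
For anyone hitting the same error, a quick local check may help narrow it down before filing an issue. The sketch below assumes (this is an assumption, not confirmed in this thread) that the generator lists keys under the asset's prefix and then keeps only keys for which the regex_filter matches from the start of the full key. The key names and the matching_keys helper are hypothetical, made up for illustration. S3 object keys normally do not begin with "/", so a prefix and regex that start with "/" may match nothing, which could leave the generator with no batch to build.

```python
import re


def matching_keys(keys, prefix, regex_filter):
    """Hypothetical helper: keep keys that start with `prefix` and whose
    beginning matches `regex_filter` (re.match anchors at the start)."""
    pattern = re.compile(regex_filter)
    return [k for k in keys if k.startswith(prefix) and pattern.match(k)]


# Hypothetical key listing; real S3 keys usually have no leading "/".
keys = [
    "data/clients/Client_2021.csv",
    "data/clients/Client_2020.csv",
    "data/clients/readme.txt",
]

# Prefix and regex with a leading "/", shaped like the config in the question.
with_slash = matching_keys(keys, "/data/clients/", r"/data/clients/Client.*.csv")

# Same asset without the leading "/", and with the dot before "csv" escaped.
without_slash = matching_keys(keys, "data/clients/", r"data/clients/Client.*\.csv")

print("with leading slash:   ", with_slash)
print("without leading slash:", without_slash)
```

To run the same check against the real bucket, the keys list could be filled from boto3's list_objects_v2 output instead of the hard-coded values. If the second form returns the expected file, removing the leading "/" from both prefix and regex_filter in the YAML is worth trying.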
