Use S3 as data source 2022

ncastrog · November 10, 2022, 1:09am

I am having problems connecting to our AWS S3 bucket.

I followed the manual and ended up with this configuration:

enaible_yaml = r"""
name: enaible_s3_datasource
class_name: Datasource
execution_engine:
    class_name: PandasExecutionEngine
data_connectors:
    default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        batch_identifiers:
            - default_identifier_name
    default_inferred_data_connector_name:
        class_name: InferredAssetS3DataConnector
        bucket: enaible-public-data
        prefix: data_quality/final
        default_regex:
          pattern: (.*)/(.*)\.parquet
          group_names:
            - prefix
            - data_asset_name
"""
print(enaible_yaml)

But I get this error when I test the YAML configuration:

ValueError: S3 query may not have been configured correctly.

I also need to load Parquet files rather than CSV.
great_expectations, version 0.15.29

Any enlightenment will be appreciated.

Thanks in advance

c_m · November 16, 2022, 5:12pm

I can’t offer a solution, but have you tried testing this against a newly created, empty S3 bucket?

See: great_expectations/util.py at 5c21d539dd5f280fa0afb55a506b1e51d871bfbf · great-expectations/great_expectations · GitHub

which seems to indicate that the S3 bucket might be misconfigured. Using GE 0.15.32, I connected to an S3 bucket with the same configuration as yours and got no error.
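
One way to narrow this down outside of Great Expectations is to list the bucket directly with boto3 and see whether anything comes back for the configured prefix, then check which keys the default_regex pattern would accept. The following is only a rough sketch: the helper names list_keys and split_by_regex are made up for illustration, it assumes AWS credentials are already configured for boto3, and it assumes the connector matches the pattern against the full object key from the start of the string, which I have not verified for every GE version.

```python
import re


def list_keys(s3_client, bucket, prefix):
    """Return the object keys under `prefix` (first page only, up to 1000 keys).

    `s3_client` is anything exposing list_objects_v2, e.g. boto3.client("s3").
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # S3 leaves out the "Contents" entry entirely when no object matches,
    # so an empty list here means the bucket/prefix pair returned nothing.
    return [obj["Key"] for obj in response.get("Contents", [])]


def split_by_regex(keys, pattern, group_names):
    """Map each key that matches `pattern` to its named regex groups."""
    compiled = re.compile(pattern)
    matched = {}
    for key in keys:
        m = compiled.match(key)  # assumption: matched from the start of the key
        if m:
            matched[key] = dict(zip(group_names, m.groups()))
    return matched


# Hypothetical usage with the values from the config in the question
# (requires boto3 and valid credentials, so it is left commented out):
#
# import boto3
# keys = list_keys(boto3.client("s3"), "enaible-public-data", "data_quality/final")
# print(keys)
# print(split_by_regex(keys, r"(.*)/(.*)\.parquet", ["prefix", "data_asset_name"]))
```

If list_keys returns an empty list, the bucket name, prefix, region, or permissions are the first things to look at. If keys come back but split_by_regex returns nothing, the pattern is the more likely culprit.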
