This article is for comments to: https://docs.greatexpectations.io/en/latest/how_to_guides/creating_batches/how_to_load_data_from_s3_for_validation_with_pandas_as_a_batch.html
Please comment +1 if this How to is important to you.
+1
I don’t see how to define an S3 datasource in 0.11.0 so that it works. I have tried multiple versions of:
devices_s3:
  batch_kwargs_generators:
    s3gen:
      class_name: S3GlobReaderBatchKwargsGenerator
      bucket: mybucket/
      reader_method: parquet
      assets:
        device_ip_sample:
          prefix: device_ip_sample/
          regex_filter: .*
      dictionary_assets: True
  module_name: great_expectations.datasource
  data_asset_type:
    module_name: great_expectations.dataset
    class_name: SparkDFDataset
  class_name: SparkDFDatasource
but running great_expectations datasource profile devices_s3 gives me this error:
Unrecognized batch_parameter(s): {'data_asset_name'}
(...omitted errors...)
File "/home/kris/anaconda3/lib/python3.7/site-packages/great_expectations/datasource/batch_kwargs_generator/s3_batch_kwargs_generator.py", line 208, in _build_batch_kwargs
"s3": "s3a://" + self.bucket + "/" + key,
TypeError: can only concatenate str (not "dict") to str
Thanks for your help!
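In the meantime, here is a minimal sketch of a workaround that bypasses the batch kwargs generator entirely: read the parquet data from S3 with Spark and hand the in-memory DataFrame to get_batch as a "dataset" batch. This is only a sketch, it assumes GE 0.11.x with the devices_s3 SparkDFDatasource from the config above, an existing expectation suite, and a working s3a/hadoop-aws setup; the suite name and column name are placeholders.

# Sketch of a workaround (GE 0.11.x): skip S3GlobReaderBatchKwargsGenerator and
# validate an in-memory Spark DataFrame directly. Paths, suite name, and column
# name below are placeholders / assumptions.
import great_expectations as ge
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
context = ge.data_context.DataContext()

# Read the parquet files from S3 with Spark itself (requires the usual
# hadoop-aws / s3a credential configuration on the cluster).
df = spark.read.parquet("s3a://mybucket/device_ip_sample/")

# Pass the DataFrame as an in-memory "dataset" batch against the Spark datasource,
# instead of letting the S3 generator build the batch kwargs.
batch_kwargs = {"dataset": df, "datasource": "devices_s3"}
batch = context.get_batch(batch_kwargs, expectation_suite_name="device_ip_sample.warning")

# Validate the batch against the suite (or add expectations interactively first).
results = batch.validate()
print(results.success)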
@krisp we’ll work on reproducing this! Thanks for the report. Just to confirm, is this on version 0.11.0?
@krisp -> Have you confirmed the version by chance? I believe this was an issue that has since been fixed…
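For anyone following along, a quick way to confirm the installed version (the exact release containing the fix isn’t stated in this thread):

# Print the installed Great Expectations version to confirm whether the
# environment is actually on 0.11.0 or something older/newer.
import great_expectations as ge
print(ge.__version__)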