Setting up a BatchRequest with a pattern

Hello Great Expectations community,

Please bear with me, I have only used GE for a few weeks, and I suspect I have simply misunderstood how to set the pattern correctly for my asset. I have attached my Python code below.

First I set up the Azure Blob Storage data source and test it. Everything works as expected, and it correctly finds four matches in blob storage.

ExecutionEngine class name: PandasExecutionEngine
Data Connectors:
default_configured_data_connector_name : ConfiguredAssetAzureDataConnector

Available data_asset_names (1 of 1):
	inference_year_month (3 of 4): ['cases_2021-06-*.csv', 'cases_2021-06-*.csv', 'cases_2021-07-*.csv']

Unmatched data_references (0 of 0):[]

The file names follow the pattern “cases_YYYY-MM-DD.csv”. I don’t want to specify the DD part, so I set it to “.*” in the regex pattern below, while setting the year and month in the “data_connector_query”. By doing this I hope it finds (and matches) whichever files exist for a given year and month.
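To sanity-check the regex itself outside of GE, here is a minimal sketch with plain `re`; the sample file names are my own assumptions, following the “cases_YYYY-MM-DD.csv” scheme:

```python
import re

# Minimal sketch: test the asset pattern against assumed sample blob names.
pattern = re.compile(r"cases_(\d{4})-(\d{2})-.*\.csv")

samples = [
    "cases_2021-06-15.csv",  # should match, groups ("2021", "06")
    "cases_2021-07-01.csv",  # should match, groups ("2021", "07")
    "notes_2021-07-01.txt",  # should not match
]
for name in samples:
    m = pattern.fullmatch(name)
    print(name, "->", m.groups() if m else "no match")
```

So the pattern itself seems to capture year and month as I intend.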

Next I try to add a BatchRequest based on this data source, but when I test it with the get_batch_list(…) I receive an error:
“ResourceNotFoundError: The specified blob does not exist.
RequestId:c6908321-b01e-0027-3a1a-0d5693000000
Time:2022-01-19T09:51:18.7407504Z
ErrorCode:BlobNotFound”.

If I hardcode the DD part (e.g. “cases_(\d{4})-(\d{2})-29\.csv”) instead of using “.*” in the pattern argument, it actually works (no error; it finds and loads the file). So the problem seems related to the DD part.
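Just to confirm the regex itself is not the issue, both variants match the same dated file name (a minimal check; the sample name is assumed):

```python
import re

# Both the wildcard pattern and the hardcoded-day pattern match the same
# file name, so the difference only shows up when GE resolves ".*" back
# into a concrete blob path.
wildcard = re.compile(r"cases_(\d{4})-(\d{2})-.*\.csv")
hardcoded = re.compile(r"cases_(\d{4})-(\d{2})-29\.csv")

name = "cases_2021-06-29.csv"
print(bool(wildcard.fullmatch(name)), bool(hardcoded.fullmatch(name)))  # True True
```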

I use great-expectations version 0.14.0

I hope I made myself clear and that you understand my problem. I hope you can offer some advice or a hint on what my problem is. :slight_smile:

My code:
datasource_config = {
    "name": "my_azure_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "PandasExecutionEngine",
        "azure_options": {
            "account_url": "https://xxxx.blob.core.windows.net/",
            "credential": "xxxx",
        },
    },
    "data_connectors": {
        "default_configured_data_connector_name": {
            "class_name": "ConfiguredAssetAzureDataConnector",
            "azure_options": {
                "account_url": "https://xxxxxx.blob.core.windows.net/",
                "credential": "xxxxx",
            },
            "container": "blobstore-xxxx",
            "name_starts_with": "",
            "default_regex": {
                "pattern": "(.*)",
                "group_names": ["data_asset_name"],
            },
            "assets": {
                "inference_year_month": {
                    "base_directory": "",
                    "pattern": "cases_(\d{4})-(\d{2})-.*\.csv",
                    "group_names": ["year", "month"],
                }
            },
        },
    },
}

Test configuration

context.test_yaml_config(yaml.dump(datasource_config))

Save the data source configuration

context.add_datasource(**datasource_config)

data_connector_query_202107 = {
    "batch_filter_parameters": {
        "year": "2021",
        "month": "07"
    }
}

Create batch / data set

batch_request = BatchRequest(
    datasource_name="my_azure_datasource",
    data_connector_name="default_configured_data_connector_name",
    data_asset_name="inference_year_month",
    batch_spec_passthrough={
        "reader_method": "csv",
        "reader_options": {"sep": ";"}
    },
    data_connector_query=data_connector_query_202107
)

List all Batches associated with the DataAsset

context.get_batch_list(
    datasource_name="my_azure_datasource",
    data_connector_name="default_configured_data_connector_name",
    data_asset_name="inference_year_month"
)

Thanks
Andy