I need some pointers as to what I’m doing wrong, please.
Following the process described here and in the docs for setting up a multi-file Pandas filesystem datasource, I’ve created a datasource and a CSVAsset with a batching_regex that I’ve verified matches two files in the same data directory.
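One way to double-check that the pattern matches both files, independently of Great Expectations, is to apply it to a directory listing with Python's standard `re` and `pathlib` modules. This is a minimal sketch: the helper name `matching_files` and the file names are hypothetical. It also assumes the pattern is applied with `fullmatch` to the bare file name. Whether Great Expectations uses `fullmatch`, `match`, or `search`, and whether it applies the pattern to the file name or a relative path, may depend on the version.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as the asset's batching_regex, written as a raw string
# (equivalent to the non-raw 'customer_(?P<datetime>\\d{14})\\.csv').
BATCHING_REGEX = re.compile(r"customer_(?P<datetime>\d{14})\.csv")


def matching_files(data_dir: Path) -> list[str]:
    """Return sorted names of files in data_dir whose whole name matches the pattern.

    Hypothetical helper; assumes fullmatch on the bare file name.
    """
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_file() and BATCHING_REGEX.fullmatch(p.name)
    )


if __name__ == "__main__":
    # Hypothetical files standing in for the real data directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        for name in ["customer_20230101120000.csv", "customer_20230102120000.csv", "notes.txt"]:
            (data_dir / name).write_text("id\n1\n")
        print(matching_files(data_dir))
```

If this lists both files but the batch list still contains only one, the pattern itself is probably not the cause.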
When I then ran a checkpoint, I saw only one file being processed. Working backward to find the problem, I found that only one file is listed when I run:
mybatchrequest = asset.build_batch_request()
for batch in asset.get_batch_list_from_batch_request(mybatchrequest):
    print(batch.batch_spec)
There isn’t anything special that has to be done to get all the files when the regex includes named groups, is there? The regex in my asset is batching_regex=re.compile('customer_(?P<datetime>\\d{14})\\.csv')
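The escaping in that pattern is fine as written. In a normal (non-raw) Python string, `\\d` becomes `\d`, so the compiled pattern is identical to its raw-string form. The quick standalone check below uses only the standard `re` module, and its file names are hypothetical. It confirms the escaping and shows that each file yields a distinct value for the `datetime` named group. Whether distinct group values are what keep the files as separate batches inside Great Expectations is an assumption here, not something this snippet tests.

```python
import re

# The pattern exactly as written in the post (non-raw string, doubled backslashes).
as_posted = re.compile('customer_(?P<datetime>\\d{14})\\.csv')
# The same pattern written as a raw string.
as_raw = re.compile(r"customer_(?P<datetime>\d{14})\.csv")

# Doubled backslashes in a normal string become single backslashes,
# so both compiled patterns have the same source text.
assert as_posted.pattern == as_raw.pattern

# Hypothetical file names; each full match exposes its named group via groupdict().
for name in ["customer_20230101120000.csv", "customer_20230102120000.csv"]:
    m = as_posted.fullmatch(name)
    print(name, "->", m.groupdict() if m else None)
```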
Are there other likely causes of this that I should be looking into?