Regex Compile Error When Trying to add_batch_definition_path

Hello,

I’m using Windows 10 and GX version 1.3.1 with an ephemeral context. My issue is that I’m not able to create a batch definition: I get either a regex compile error or a PathNotFoundError.

If I pass the file path either as a WindowsPath from pathlib or as a string with escaped backslashes, I get an “error: incomplete escape \U at position 2” message. That error is raised by the regex module when add_batch_definition_path tries to compile the path as a regular expression.

import re
from pathlib import Path

batch_definition_path = 'c:\\Users\\me\\project_data\\test_file_20240902.csv'
#  or batch_definition_path = Path(source_folder) / batch_definition_name
#  or batch_definition_path = str(Path(source_folder) / batch_definition_name)
if Path(batch_definition_path).exists():
    print(f"Batch definition path exists: {batch_definition_path}")
    batch_definition = file_data_asset.add_batch_definition_path(name=batch_definition_name, path=batch_definition_path)
    # raises error: incomplete escape \U at position...
    # compiling the path directly, as below, raises the same error
    regex = re.compile(str(batch_definition_path))
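
To illustrate why the escaped-backslash path fails, here is a minimal sketch that uses only the standard re module. It does not show how GX builds its pattern internally. It only demonstrates that a Windows path is not a valid regular expression until it is escaped. The helper name compiles_as_regex is made up for this example, and the path is the placeholder one from the post.

```python
import re

# Placeholder Windows path, the same shape as the one in the post.
windows_path = 'c:\\Users\\me\\project_data\\test_file_20240902.csv'


def compiles_as_regex(pattern: str) -> bool:
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The unescaped path contains "\U", which the regex parser reads as the
# start of a Unicode escape, so compilation fails.
raw_ok = compiles_as_regex(windows_path)

# re.escape turns each backslash (and the dot) into a literal match.
escaped = re.escape(windows_path)
escaped_ok = compiles_as_regex(escaped)
matches_literal = re.fullmatch(escaped, windows_path) is not None

print(raw_ok, escaped_ok, matches_literal)
```

This only explains the error message. As the replies further down show, the actual resolution was to pass a path relative to the data source's base_directory rather than to escape an absolute one.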

If I use a raw string with forward slashes for the path, the regex compiles, but then add_batch_definition_path raises a PathNotFoundError. I know the path exists because my code below prints “Batch definition path exists…” before the error.

batch_definition_path = r'c:/Users/me/project_data/test_file_20240902.csv'
if Path(batch_definition_path).exists():
    print(f"Batch definition path exists: {batch_definition_path}")
    batch_definition = file_data_asset.add_batch_definition_path(name=batch_definition_name, path=batch_definition_path)
    # will raise a PathNotFoundError

Am I doing something wrong here? How should I be adding a batch definition path with Windows?

Hi @FreeLance - are you using a filesystem data asset? If so, the path is concatenated with the base_directory you configured on the data source, so the path you give the batch definition should be relative to that base directory.

Hey @joshstauffer - thanks for getting back to me.

“the path you give the batch definition should be relative to the base directory.”

Aha! This is where I was going wrong. I think this section of the documentation confused me.

“If you are using a File Data Context, you can provide a path that is relative to the Data Context’s base_directory. Otherwise, you should provide the absolute path to the folder that contains your data.”

For now I’m just using an ephemeral context, so I thought I needed to provide an absolute path for the batch definition. Instead, the base_directory of the data source should be an absolute path, and the batch definition path should be relative to that base_directory. I should rewrite the if statement above as:

# data_source.base_directory is a WindowsPath object
if (data_source.base_directory / batch_definition_path).exists():
    batch_definition = file_data_asset.add_batch_definition_path(name=batch_definition_name, path=batch_definition_path)
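
If you already have an absolute path and need the relative one, a small pathlib helper can derive it. This is only a sketch: the function name relative_to_base is hypothetical, and it assumes the data source's base_directory is c:\Users\me\project_data, which the post does not state. It uses PureWindowsPath so it behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def relative_to_base(base_directory: str, absolute_path: str) -> str:
    """Return absolute_path expressed relative to base_directory.

    Raises ValueError if absolute_path is not inside base_directory.
    """
    base = PureWindowsPath(base_directory)
    target = PureWindowsPath(absolute_path)
    # as_posix() gives forward slashes, so no backslashes remain in the result.
    return target.relative_to(base).as_posix()


# Assumed base directory and the placeholder file from the post.
base_directory = r'c:\Users\me\project_data'
absolute_path = r'c:\Users\me\project_data\test_file_20240902.csv'

relative_path = relative_to_base(base_directory, absolute_path)
print(relative_path)  # test_file_20240902.csv
```

The resulting relative_path is the kind of value to pass as path= to add_batch_definition_path, with the absolute part living only on the data source.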