Datasource pandas_filesystem does not contain "add_directory_csv_asset" as suggested in the docs

Hi,

I am following the documentation to read several CSV files in a folder.

import great_expectations as gx
import great_expectations.expectations as gxe
import great_expectations.exceptions as exceptions
from great_expectations.core.expectation_suite import ExpectationSuite
from demo_gym.notebooks.constants import (
    BASE_DIRECTORY,
    DATASOURCE_NAME,
    ASSET_NAME,
    BATCH_DEFINITION_NAME,
    SUITE_NAME,
    VALIDATION_DEFINITION_NAME,
)

context = gx.get_context(mode="file")

data_source = context.data_sources.add_pandas_filesystem(name=DATASOURCE_NAME, base_directory=BASE_DIRECTORY)

However, the resulting data_source object has no add_directory_csv_asset method, even though the documentation suggests it.

I ran print(dir(data_source)) and the output did not list “add_directory_csv_asset”.

Since I could not create a directory asset, I could not find the add_batch_definition_whole_directory method either.

My second option:

I also tried add_csv_asset with the glob_directive parameter, for instance:

directory_csv_asset = data_source.add_csv_asset(ASSET_NAME, glob_directive="*.csv")

However, it still reads only the last CSV file in my BASE_DIRECTORY.

My environment:
Local
Python 3.11.9
Great Expectations 1.0.2


Hi there,

You’re running into this because pandas Filesystem Data Sources do not support Directory Data Assets. The documentation does mention this, although I agree it could be highlighted more. The relevant part states:

“Spark Filesystem Data Sources support Directory Data Assets for all supported Filesystem environments. However, pandas Filesystem Data Sources do not support Directory Data Assets at all.”

In your case, since you are using a pandas Filesystem Data Source, I recommend using glob_directive or a batching_regex so that the asset matches multiple files.
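
To illustrate the idea, here is a minimal sketch that uses only the Python standard library, not the Great Expectations API. It shows what a glob such as "*.csv" selects in a directory and how a regex with named groups can tell the matched files apart. The file names, the regex, and the helper functions match_files and group_by_regex are all made up for this example, so treat it as an illustration under those assumptions rather than as how the library works internally.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical file names, used only for illustration.
FILE_NAMES = ["sales_2024-01.csv", "sales_2024-02.csv", "sales_2024-03.csv", "notes.txt"]

# Hypothetical regex: each distinct (year, month) pair identifies one file.
BATCH_PATTERN = re.compile(r"sales_(?P<year>\d{4})-(?P<month>\d{2})\.csv")


def match_files(base_directory: Path, glob_pattern: str = "*.csv") -> list[Path]:
    """Return the files in base_directory selected by the glob, sorted by name."""
    return sorted(p for p in base_directory.glob(glob_pattern) if p.is_file())


def group_by_regex(paths: list[Path], pattern: re.Pattern) -> dict[tuple[str, str], Path]:
    """Map each (year, month) captured by the regex to the file it came from."""
    batches = {}
    for path in paths:
        match = pattern.fullmatch(path.name)
        if match:
            batches[(match["year"], match["month"])] = path
    return batches


# Build a throwaway directory so the example does not depend on real data.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in FILE_NAMES:
        (base / name).write_text("id,value\n1,10\n")

    matched = match_files(base)
    batches = group_by_regex(matched, BATCH_PATTERN)

    matched_names = [p.name for p in matched]
    batch_keys = sorted(batches)
    print(matched_names)
    print(batch_keys)
```

The glob selects all three CSV files and skips notes.txt, and the regex yields one key per file. If each matched file is treated as its own batch, as I understand is the case for file-based assets, then asking for a single batch would hand back only one of them, which may be why you saw just the last CSV file.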

Hey @adeola, indeed, I overlooked this information.
Many thanks for your reply.

Cheers,