Datasource pandas_filesystem does not contain "add_directory_csv_asset" as suggested in the docs

dpires92 · September 5, 2024, 8:35pm

Hi,

I am following the documentation to read several CSV files from a folder.

import great_expectations as gx
import great_expectations.expectations as gxe
import great_expectations.exceptions as exceptions
from great_expectations.core.expectation_suite import ExpectationSuite
from demo_gym.notebooks.constants import (
    BASE_DIRECTORY,
    DATASOURCE_NAME,
    ASSET_NAME,
    BATCH_DEFINITION_NAME,
    SUITE_NAME,
    VALIDATION_DEFINITION_NAME,
)

# File-mode context: the configuration is persisted to disk
context = gx.get_context(mode="file")

# Register a pandas filesystem data source rooted at BASE_DIRECTORY
data_source = context.data_sources.add_pandas_filesystem(name=DATASOURCE_NAME, base_directory=BASE_DIRECTORY)

However, the data_source object does not have the add_directory_csv_asset method suggested in the documentation.

I ran print(dir(data_source)), and add_directory_csv_asset was not listed.

Without a directory asset, I also could not reach the add_batch_definition_whole_directory method.
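A more targeted check than reading the full dir() output is to filter it for the add_* factory methods. The sketch below is generic Python introspection, not a Great Expectations API: list_add_methods is a hypothetical helper, and FakeDataSource only stands in for the real data source so the example runs without GX installed.

```python
def list_add_methods(obj, keyword=""):
    """Return the callables on obj whose names start with 'add_' and contain
    the optional keyword (hypothetical helper, not part of GX)."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith("add_")
        and keyword in name
        # getattr is only evaluated for names that passed the checks above
        and callable(getattr(obj, name))
    )


class FakeDataSource:
    # Stand-in for the real data source; method names are illustrative only.
    def add_csv_asset(self, name):
        return name

    def add_parquet_asset(self, name):
        return name


print(list_add_methods(FakeDataSource()))
print(list_add_methods(FakeDataSource(), keyword="directory"))
```

With the real object you would call list_add_methods(data_source, keyword="directory"); an empty list means no directory asset methods are exposed.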

My second option:

I also tried add_csv_asset with the glob_directive parameter, for instance:

directory_csv_asset = data_source.add_csv_asset(ASSET_NAME, glob_directive="*.csv")

However, it still reads only the last CSV file in my BASE_DIRECTORY.
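To confirm which files a pattern such as "*.csv" actually selects, you can evaluate it with Python's standard pathlib before passing it to GX. This only illustrates glob semantics; it is not how GX resolves glob_directive internally, and the matching_files helper and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(base_directory, pattern="*.csv"):
    """Return the names of files under base_directory matched by a glob pattern."""
    return sorted(p.name for p in Path(base_directory).glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files standing in for the contents of BASE_DIRECTORY.
    for name in ["sales_2024-01.csv", "sales_2024-02.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("id,amount\n1,10\n")
    print(matching_files(tmp))              # CSV files in the top-level folder
    print(matching_files(tmp, "**/*.csv"))  # also searches subdirectories
```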

My environment:
Local
Python 3.11.9
Great Expectations 1.0.2

adeola · October 4, 2024, 4:21pm

Hi there,

It seems you're running into this issue because pandas Filesystem Data Sources do not support Directory Data Assets. This is mentioned in the documentation, but I agree that it could be highlighted more. The relevant passage states:

“Spark Filesystem Data Sources support Directory Data Assets for all supported Filesystem environments. However, pandas Filesystem Data Sources do not support Directory Data Assets at all.”
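Conceptually, a Directory Data Asset treats all the files in a folder as one dataset rather than as separate files. The standard-library sketch below illustrates that idea by concatenating the rows of every CSV in a folder; read_directory_csv is a hypothetical helper, not part of Great Expectations, and it assumes all files share the same header row.

```python
import csv
import tempfile
from pathlib import Path


def read_directory_csv(directory, pattern="*.csv"):
    """Read every CSV file in `directory` into one list of dict rows.

    Assumes all files share the same header (hypothetical helper, not GX)."""
    rows = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part1.csv").write_text("id,amount\n1,10\n")
    Path(tmp, "part2.csv").write_text("id,amount\n2,20\n")
    print(read_directory_csv(tmp))
```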

For your case, since you are using the Pandas Filesystem Data Source, I recommend using the glob_directive or a batching_regex to capture multiple files.
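A batching regex typically uses named groups so that each matching file maps to one batch identified by the captured values. The sketch below only shows how such a pattern partitions file names with Python's re module; the file names, the year/month group names, and the batches_from_filenames helper are assumptions for illustration, so check the GX documentation for the exact regex conventions your version expects.

```python
import re

# Hypothetical pattern: one batch per file, keyed by year and month.
BATCHING_REGEX = re.compile(r"sales_(?P<year>\d{4})-(?P<month>\d{2})\.csv")


def batches_from_filenames(filenames, pattern=BATCHING_REGEX):
    """Map each matching file name to the values captured by its named groups."""
    batches = {}
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:  # files that do not match the pattern are skipped
            batches[name] = match.groupdict()
    return batches


files = ["sales_2024-01.csv", "sales_2024-02.csv", "notes.txt"]
for name, keys in batches_from_filenames(files).items():
    print(name, keys)
```

Under this scheme each file is a separate batch, so retrieving a single batch yields a single file rather than the whole folder.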

dpires92 · October 7, 2024, 11:01am

Hey @adeola, indeed, I overlooked this information.
Many thanks for your reply.

Cheers,
