Does Great Expectations support file-level expectations?

For example, iterating through each year + date combination and performing the following:

  • Check for number of parquet files
  • Check size of parquet files
  • Check for the list of columns within a parquet file
  • Validate column value for each row within a parquet file

Great Expectations doesn’t support any expectations set against the existence or metadata of a file itself – number of files in a directory, size of those files, whether a given file exists, etc.

Great Expectations does support many expectations set against the data contained within those files, including validating column existence and the values within those columns. For more on the expectations provided by GX, see our Expectation Gallery.

Hi @austin_gx, while looking into this particular question, I started reading the source code in the GX repo. There I found the file great_expectations/data_asset/file_data_asset.py, which includes functions for FileDataAsset like:

  • expect_file_hash_to_equal
  • expect_file_size_to_be_between
  • expect_file_to_exist
  • expect_file_to_have_valid_table_header
  • expect_file_to_be_valid_json

Could you clarify what is the role/usage of these functions?

Thanks in advance,
César

Hmm. In a similar situation, I would create a dataframe of the statistics I am interested in and validate the dataframe.
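
To make that suggestion concrete, here is a minimal sketch of gathering the file-level statistics first. It assumes the parquet files live under a root folder with one subfolder per year/date partition; the function names and the record keys (`partition`, `file_name`, `size_bytes`) are made up for illustration and are not part of GX.

```python
from pathlib import Path


def collect_file_stats(root):
    """Return one record per parquet file found under `root` (searched recursively)."""
    records = []
    for path in sorted(Path(root).rglob("*.parquet")):
        records.append(
            {
                # Folder of the file relative to root, e.g. "2023/01-15" (assumed layout)
                "partition": str(path.parent.relative_to(root)),
                "file_name": path.name,
                "size_bytes": path.stat().st_size,
            }
        )
    return records


def count_files_per_partition(records):
    """Count how many parquet files each partition folder contains."""
    counts = {}
    for record in records:
        counts[record["partition"]] = counts.get(record["partition"], 0) + 1
    return counts
```

Passing the records to `pandas.DataFrame(records)` gives a dataframe with one row per file, so a size check becomes an ordinary column expectation such as `expect_column_values_to_be_between` on `size_bytes`. How that dataframe gets connected to GX depends on the version in use, so that part is left out here. The column list of each file would need a parquet reader and is also not shown.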

Hey @CesarGarcia! Great Q. The *DataAsset classes like FileDataAsset are from a much earlier (< 0.13) version of GX and are no longer actively supported.

Those earlier versions of GX did indeed support certain file-level expectations, and if you rolled back to these early versions of the API & set up your context & datasources appropriately, you may be able to make use of these – however, those versions of the API are significantly less stable & less performant.