Using Great Expectations to validate Excel files

sam · October 13, 2020, 4:06pm

We got a question about validating Excel files without having to export them to CSV. I can think of two options:

One quick option is to follow the instructions in this tutorial and validate the Excel file in-line in a notebook. You can use ge.read_excel instead of read_csv (make sure the headers are loaded correctly). However, this means you won’t be able to save the expectation suite or generate data docs, since you won’t have a data context.
Another option is to create a data context with a “dummy” datasource (e.g. one pointing at any directory), load the Excel data as a pandas dataframe, create a batch, and then follow the standard workflow shown in the notebook generated by great_expectations suite new. The only difference is how you set up the batch_kwargs: 1) Set the ‘dataset’ key to the dataframe itself. 2) Unfortunately, you’ll also need to set the ‘datasource’ key to the name of the dummy datasource (this isn’t quite intuitive). Here’s some code that worked for me: https://gist.github.com/spbail/6723728aecf0295bf50622d07e09840e
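
For reference, here is a rough sketch of both options. It is not a copy of the gist. It assumes the batch_kwargs-style API that was current when this was posted (before 0.13), an already-initialized data context in the working directory, and a datasource you have named "dummy" in great_expectations.yml. The file path, suite name, column name, and helper function names are all made up for illustration.

```python
def build_batch_kwargs(dataframe, datasource_name):
    """Build batch_kwargs for an in-memory dataframe.

    'dataset' holds the dataframe itself; 'datasource' names the
    (dummy) datasource configured in the data context.
    """
    return {"dataset": dataframe, "datasource": datasource_name}


def validate_inline(path, column):
    """Option 1: no data context, so nothing is saved and no data docs."""
    import great_expectations as ge  # imported lazily; legacy (pre-0.13) API assumed

    # Extra keyword arguments are passed through to pandas.read_excel;
    # header=0 treats the first row as column names.
    df = ge.read_excel(path, header=0)
    return df.expect_column_values_to_not_be_null(column)


def get_excel_batch(path, suite_name, datasource_name="dummy"):
    """Option 2: load the file with pandas, then get a batch from the context."""
    import pandas as pd
    import great_expectations as ge  # legacy (pre-0.13) API assumed

    context = ge.data_context.DataContext()
    context.create_expectation_suite(suite_name, overwrite_existing=True)
    df = pd.read_excel(path)
    batch_kwargs = build_batch_kwargs(df, datasource_name)
    return context.get_batch(batch_kwargs, suite_name)
```

With option 2 you would call something like get_excel_batch("my_file.xlsx", "my_suite"), add expectations on the returned batch, and save the suite as in the standard notebook. Newer releases replaced batch_kwargs with a different batch API, so check the docs for your version before relying on this.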
