Using Great Expectations to validate Excel files

We got a question about validating Excel files without having to export them to CSV. I can think of two options:

  1. One quick option would be to follow the instructions provided in this tutorial to validate Excel files in-line in notebook. You can use ge.read_excel instead of read_csv (make sure the headers are loaded correctly). However, this means you won’t be able to save the expectation suite or generate data docs since you won’t have a data context.

  2. Another option would be to create a data context with a “dummy” datasource (e.g. pointing at any directory), then loading the Excel data as a pandas dataframe, creating a batch, and using the standard workflow that’s shown in the great_expectations suite new notebook. The only difference is how you set up the batch_kwargs: 1) Pass the ‘dataset’ key to point to the dataframe. 2) Unfortunately you’ll need the ‘datasource’ key to point to a dummy datasource (this isn’t quite intuitive). Here’s some code that worked for me: https://gist.github.com/spbail/6723728aecf0295bf50622d07e09840e

1 Like