Can I use checkpoints to validate data on a spark cluster?

The following answer also applies to the question of whether you can use checkpoints to validate data on any file system.

Currently, Great Expectations’ checkpoint functionality is not set up to work automatically with file system data. The difficulty lies in automating the loading of a new batch of data when that batch arrives as new files with different names. Checkpoints currently support new batches in a database, but not in files.

A fix for this gap in checkpoint functionality is planned for the next major release. In the meantime, here’s the workaround: checkpoints are just a thin veneer on top of validation operators, so you can automate the workflow by using validation operators directly for now.

Here’s a sample notebook included in the project that implements this: https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/init_notebooks/spark/validation_playground.ipynb
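As a rough sketch of the workaround, the snippet below runs a validation operator directly against a new file-based batch, following the pattern in the playground notebooks. It assumes the legacy (pre-0.13) Great Expectations API; the datasource name, suite name, and file path are placeholders you would replace with your own project's values.

```python
import great_expectations as ge

# Load the project's data context (assumes a configured
# great_expectations/ directory with a Spark datasource).
context = ge.data_context.DataContext()

# Point batch_kwargs at the newly arrived file. Updating this
# path for each new batch is the manual step that checkpoints
# cannot yet automate for file system data.
batch_kwargs = {
    "path": "/data/incoming/events.parquet",   # placeholder path
    "datasource": "my_spark_datasource",       # placeholder datasource name
}
batch = context.get_batch(batch_kwargs, "my_suite")  # placeholder suite name

# Run the validation operator directly instead of a checkpoint.
results = context.run_validation_operator(
    "action_list_operator",
    assets_to_validate=[batch],
)

if not results["success"]:
    raise ValueError("Validation failed for the new batch")
```

A scheduler (cron, Airflow, etc.) can then call a script like this each time new files land, substituting the fresh file path into `batch_kwargs`.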