I’d like to be able to run Great Expectations at several steps in an Airflow pipeline. Between those steps, Spark is used to cleanse/transform data and store Parquet files in S3 buckets. Can Great Expectations access these as a data source? I can only find references to Spark on the filesystem, and to S3 in conjunction with Pandas.
@alexc Yes, you can validate a Parquet file in an S3 bucket as a step in your Airflow DAG. We will create a documentation article for this case, but in the meantime, here is a rough outline of the approach.
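As a minimal sketch: read the Parquet file into a Spark DataFrame and wrap it in `SparkDFDataset` so you can run expectations against it directly. The bucket path, column names, and expectations below are placeholders for your own, and this assumes your Spark cluster has the hadoop-aws jars and AWS credentials configured so it can read `s3a://` paths.

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.appName("ge-validation").getOrCreate()

# Load the Parquet output of the upstream transform step.
# (Hypothetical bucket/prefix -- substitute your own.)
df = spark.read.parquet("s3a://my-bucket/cleansed/events/")

# Wrap the Spark DataFrame so expectations can be run against it.
ge_df = SparkDFDataset(df)

# Example expectations; column names here are placeholders.
ge_df.expect_column_values_to_not_be_null("event_id")
ge_df.expect_column_values_to_be_between("amount", 0, 10000)

# Run all expectations and fail loudly if any of them did not pass,
# so the surrounding Airflow task fails and halts the DAG.
results = ge_df.validate()
if not results.success:
    raise ValueError("Great Expectations validation failed")
```

In Airflow, a script like this can run as its own task (e.g., via a PythonOperator or a spark-submit step) between your transform stages, so a failed validation stops the pipeline before bad data propagates downstream.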