How do I configure a PySpark datasource to access data in AWS S3?

The current version of the Great Expectations documentation (0.9.4) does not contain any examples of how to configure a PySpark datasource to access files in AWS S3. It would be really helpful to have an example of such a configuration.

You’re right! In fact, it’s very similar to the example for pandas, since Spark’s reader methods also know how to process S3 paths:

datasources:
  nyc_taxi:
    class_name: SparkDFDatasource
    generators:
      s3:
        class_name: S3GlobReaderBatchKwargsGenerator
        bucket: nyc-tlc
        delimiter: '/'
        reader_options:
          # Spark CSV reader options; the pandas-only `engine: python`
          # option from the pandas example does not apply to Spark
          sep: ','
          header: true
        assets:
          taxi-green:
            prefix: trip data/
            regex_filter: 'trip data/green.*\.csv'
          taxi-fhv:
            prefix: trip data/
            regex_filter: 'trip data/fhv.*\.csv'
    data_asset_type:
      class_name: SparkDFDataset
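
Once this is in great_expectations.yml, you can load a batch through the generator. Below is a minimal sketch, assuming the 0.9.x workflow of context.build_batch_kwargs() followed by context.get_batch(); the suite name taxi_green.warning and the column vendor_id are hypothetical placeholders, and the expectation suite is assumed to already exist:

import great_expectations as ge

# Load the project's data context (reads great_expectations.yml)
context = ge.data_context.DataContext()

# Ask the "s3" generator of the "nyc_taxi" datasource for batch_kwargs
# pointing at the "taxi-green" asset defined above
batch_kwargs = context.build_batch_kwargs("nyc_taxi", "s3", "taxi-green")

# Materialize the batch as a SparkDFDataset and run an expectation on it
batch = context.get_batch(batch_kwargs, "taxi_green.warning")
result = batch.expect_column_values_to_not_be_null("vendor_id")
print(result)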
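
One caveat: Spark’s readers only handle S3 paths if the cluster itself can reach S3, i.e. the hadoop-aws package (with a matching AWS SDK) is on the classpath and credentials are resolvable. That is an environment concern rather than a Great Expectations one. A quick way to sanity-check it, using a hypothetical example key, is a direct read:

from pyspark.sql import SparkSession

# Assumes hadoop-aws is on the classpath and AWS credentials are
# available via the default provider chain (env vars, instance profile, ...)
spark = SparkSession.builder.appName("s3-read-check").getOrCreate()

# Hypothetical key; substitute any file the regex_filter above would match
df = spark.read.csv(
    "s3a://nyc-tlc/trip data/green_tripdata_2019-01.csv",
    sep=",",
    header=True,
)
df.show(5)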