How to configure a self-managed Spark Datasource

This article is for comments to: https://docs.greatexpectations.io/en/latest/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.html

Please comment +1 if this how-to is important to you.

I am looking for documentation on how to use a Spark datasource when the files are stored remotely (e.g., in S3).
I found the two articles below. Is the first article about using a local Spark for validation, while the second method uses a Spark cluster? If I want the validation to scale to larger amounts of data, should I follow the second article? Is it true that the first article's method is only useful for validating small amounts of data, since it appears to use a local Spark and to download data from the remote location during validation? (A minimal sketch of what I am trying appears after the list.)

  1. How to configure a PySpark datasource for accessing the data from AWS S3?
  2. https://docs.greatexpectations.io/en/latest/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.html
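
For context, here is roughly what I am trying: a minimal sketch assuming the legacy `SparkDFDataset` API from the docs above. The bucket, path, and column names are placeholders, and it assumes the hadoop-aws package is on Spark's classpath so the `s3a://` scheme works.

```python
# Minimal sketch (placeholders throughout): read a CSV from S3 directly
# into Spark via s3a://, so executors scan the data in place instead of
# downloading it to a local machine first.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = (
    SparkSession.builder
    .appName("ge-validation")
    # Assumes hadoop-aws is available; credentials come from the
    # default AWS provider chain (env vars, instance profile, etc.).
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    )
    .getOrCreate()
)

df = spark.read.csv("s3a://my-bucket/data/my_file.csv", header=True)

# Wrap the Spark DataFrame so expectations run as Spark jobs.
ge_df = SparkDFDataset(df)
ge_df.expect_column_values_to_not_be_null("id")  # placeholder column
print(ge_df.validate())
```

My understanding is that whether this runs locally or on a cluster depends only on the SparkSession's master setting, not on Great Expectations itself, which is why I am asking whether the two articles really differ in scalability.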