How to configure a self-managed Spark Datasource

This article is for comments to: https://docs.greatexpectations.io/en/latest/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.html

Please comment +1 if this how-to is important to you.

I am looking for documentation on how to use a Spark datasource when the files are stored remotely (e.g., in S3).
I found the two articles below. Is the first article about using a local Spark for validation, while the second method uses a Spark cluster? If I want the validation to scale to larger amounts of data, should I follow the second article? Is it true that the first article's method is only useful for validating small amounts of data, since it appears to use a local Spark and to download data from the remote location during validation? (A minimal sketch of what I am trying appears after the list.)

  1. How to configure a PySpark datasource for accessing the data from AWS S3?
  2. https://docs.greatexpectations.io/en/latest/guides/how_to_guides/configuring_datasources/how_to_configure_a_self_managed_spark_datasource.html
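
For context, here is roughly what I am trying: a minimal sketch assuming the legacy `SparkDFDataset` API from the docs above. The bucket, path, and column names are placeholders, and it assumes the hadoop-aws package is on Spark's classpath so the `s3a://` scheme works.

```python
# Minimal sketch (placeholders throughout): read a CSV from S3 directly
# into Spark via s3a://, so executors scan the data in place instead of
# downloading it to a local machine first.
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = (
    SparkSession.builder
    .appName("ge-validation")
    # Assumes hadoop-aws is available; credentials come from the
    # default AWS provider chain (env vars, instance profile, etc.).
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    )
    .getOrCreate()
)

df = spark.read.csv("s3a://my-bucket/data/my_file.csv", header=True)

# Wrap the Spark DataFrame so expectations run as Spark jobs.
ge_df = SparkDFDataset(df)
ge_df.expect_column_values_to_not_be_null("id")  # placeholder column
print(ge_df.validate())
```

My understanding is that whether this runs locally or on a cluster depends only on the SparkSession's master setting, not on Great Expectations itself, which is why I am asking whether the two articles really differ in scalability.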