Hi Team,
I am exploring Great Expectations as a data quality tool for our data lake, which is stored in Delta format in a GCS bucket. I plan to use Composer for orchestration, with a DataprocSubmitPySparkJobOperator triggering the Python job.
I have a simple Python job, shown below:
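from great_expectations.data_context import DataContext

def run_great_expectations(context_root_dir, checkpoint_name):
    # Load the project configuration (great_expectations.yml) from context_root_dir.
    context = DataContext(context_root_dir=context_root_dir)
    print(context.datasources.values())
    # Run the named checkpoint and print the validation result.
    checkpoint_result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    print(checkpoint_result)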
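For context, this is roughly how the two arguments could reach the job when it is submitted with the operator's arguments list. It is only a sketch: the flag names and the example values are hypothetical, and the call to run_great_expectations is left out so the snippet runs without Great Expectations installed.

```python
import argparse


def parse_job_args(argv=None):
    """Parse the job arguments (flag names are hypothetical, for illustration)."""
    parser = argparse.ArgumentParser(description="Run a Great Expectations checkpoint")
    parser.add_argument("--context-root-dir", required=True,
                        help="Directory that contains great_expectations.yml")
    parser.add_argument("--checkpoint-name", required=True,
                        help="Name of the checkpoint to run")
    return parser.parse_args(argv)


# Example invocation with made-up values; in the real job these would come
# from sys.argv and be passed on to run_great_expectations(...).
args = parse_job_args(["--context-root-dir", "/tmp/gx",
                       "--checkpoint-name", "brands_delta_checkpoint"])
print(args.context_root_dir, args.checkpoint_name)
```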
Below is my great_expectations.yml for the datasource spark_delta_read:
datasources:
  spark_delta_read:
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: SparkDFExecutionEngine
      module_name: great_expectations.execution_engine
    data_connectors:
      default_inferred_data_connector_name:
        name: default_inferred_data_connector_name
        class_name: InferredAssetFilesystemDataConnector
        module_name: great_expectations.datasource.data_connector
        base_directory: gs://bucket/data/
        default_regex:
          group_names:
            - data_asset_name
          pattern: (.*)
        batch_spec_passthrough:
          reader_method: delta
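To make the intent of the default_regex above concrete, here is a small sketch of what the pattern captures, using only Python's re module. The relative paths are hypothetical and not taken from the real bucket, and the snippet says nothing about how the data connector actually lists paths under a gs:// base directory. It only shows the regex and group name mapping.

```python
import re

# Same pattern and group name as in default_regex above.
pattern = re.compile(r"(.*)")
group_names = ["data_asset_name"]

# Hypothetical paths relative to base_directory.
relative_paths = ["brands", "brands/part-00000.snappy.parquet"]


def infer_groups(path):
    """Map the regex capture groups onto the configured group names."""
    match = pattern.match(path)
    return dict(zip(group_names, match.groups()))


for path in relative_paths:
    print(path, "->", infer_groups(path))
```

Because (.*) captures the whole string, the inferred data_asset_name is simply whatever relative path is matched, so the asset name brands only results when the matched path is exactly brands.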
My checkpoint file looks like this:
name: brands_delta_checkpoint
config_version: 1.0
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: spark_delta_read
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: brands
      data_connector_query:
        index: -1
    action_list:
      - name: store_validation_result
        action:
          class_name: StoreValidationResultAction
      - name: store_evaluation_params
        action:
          class_name: StoreEvaluationParametersAction
      - name: update_data_docs
        action:
          class_name: UpdateDataDocsAction
    expectation_suite_name: brands
As soon as I load my context, I get the following error: great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource spark_delta_read, error: 'spark.remote'
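Since the message only shows the wrapped error text, a full chained traceback would probably help narrow this down. Below is a minimal sketch of how the context load could be wrapped to print it. The helper names are hypothetical, and fake_loader is a stand-in that merely imitates a wrapped error (assuming the inner error is a KeyError, which the quoted 'spark.remote' suggests but does not prove). It is not Great Expectations code.

```python
import traceback


def format_exception_chain(exc):
    """Return the traceback text for exc, including chained causes."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def load_with_diagnostics(loader):
    """Call loader(); on failure print the full chained traceback and re-raise."""
    try:
        return loader()
    except Exception as exc:
        print(format_exception_chain(exc))
        raise


def fake_loader():
    # Stand-in for DataContext(context_root_dir=...): demo only.
    try:
        raise KeyError("spark.remote")
    except KeyError as err:
        raise RuntimeError(f"Cannot initialize datasource, error: {err}") from err


try:
    load_with_diagnostics(fake_loader)
except RuntimeError as exc:
    report = format_exception_chain(exc)
```

In the real job, the loader would be something like lambda: DataContext(context_root_dir=context_root_dir), so that the inner exception and the line that raised it show up in the Dataproc driver log.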
Below are the versions I am using
great_expectations : 0.18.7
pyspark : 3.4.2
Please help me understand this issue, and let me know whether I need to change anything in my great_expectations.yml file.