Hi, I am trying to use Great Expectations with AWS Glue. The data will be read at runtime, and I am following this tutorial: How to Use Great Expectations in AWS Glue | Great Expectations.
My problem is understanding what to set for `datasource_name`: when I use the name provided by the tutorial, it does not work. I get this error message:

DatasourceError: Cannot initialize datasource version-0.15.50 spark_s3, error: The given datasource could not be retrieved from the DataContext; please confirm that your configuration is accurate.
From the code provided, it looks like this:

```python
expectation_suite_name = "version-0.15.50 suite_name"
suite = context_gx.add_expectation_suite(expectation_suite_name)
batch_request = RuntimeBatchRequest(
    datasource_name="version-0.15.50 spark_s3",
    data_asset_name="version-0.15.50 datafile_name",
    batch_identifiers={"runtime_batch_identifier_name": "default_identifier"},
    data_connector_name="version-0.15.50 default_inferred_data_connector_name",
    runtime_parameters={"batch_data": df},
)
```
I understand that the datasource describes what kind of data I am working with, in this case PySpark, but what is `datasource_name`? I don't see where it is set in the YAML, which looks like this:
```yaml
config_version: 3.0
datasources:
  spark_s3:
    module_name: great_expectations.datasource
    class_name: Datasource
    execution_engine:
      module_name: great_expectations.execution_engine
      class_name: SparkDFExecutionEngine
    data_connectors:
      default_runtime_data_connector_name:
        batch_identifiers:
          - runtime_batch_identifier_name
        module_name: great_expectations.datasource.data_connector
        class_name: RuntimeDataConnector
```
Is `datasource_name` set by default, or should I set it myself? Can you help me understand that, please? Actually, if someone could explain all the parameters to set for `RuntimeBatchRequest` (`datasource_name`, `data_connector_name`, `data_asset_name`, `batch_identifiers`, `runtime_parameters`), that would be awesome, because they are not clear to me.
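To make the question concrete, here is a small sanity check showing my current guess (which may well be wrong, please correct me): that `datasource_name` has to match the key under `datasources:` in the YAML (`spark_s3` here), that `data_connector_name` has to match the key under `data_connectors:` (`default_runtime_data_connector_name`), and that the `version-0.15.50` prefix is not part of the name at all.

```python
# My current guess (not confirmed): the names passed to RuntimeBatchRequest
# must match the keys in great_expectations.yml, without any
# "version-0.15.50" prefix. The dict below mirrors my YAML above.
yaml_config = {
    "datasources": {
        "spark_s3": {
            "data_connectors": {
                "default_runtime_data_connector_name": {
                    "batch_identifiers": ["runtime_batch_identifier_name"],
                }
            }
        }
    }
}

# The names I would now pass to RuntimeBatchRequest:
datasource_name = "spark_s3"
data_connector_name = "default_runtime_data_connector_name"
batch_identifiers = {"runtime_batch_identifier_name": "default_identifier"}

# Check that each name resolves against the config, the way I assume
# the DataContext resolves it when building a batch.
datasource_cfg = yaml_config["datasources"][datasource_name]
connector_cfg = datasource_cfg["data_connectors"][data_connector_name]
assert set(batch_identifiers) == set(connector_cfg["batch_identifiers"])
print("names line up with the YAML")
```

If that guess is right, `data_asset_name` would just be a free-form label for the in-memory DataFrame, and `runtime_parameters={"batch_data": df}` would carry the DataFrame itself, but I would appreciate confirmation.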