Help understanding datasource_name config for RuntimeBatchRequest

Hi, I'm trying to use Great Expectations with AWS Glue. The data will be read at runtime; I'm following this tutorial: How to Use Great Expectations in AWS Glue | Great Expectations.
My problem is understanding what to set for datasource_name: when I use the name provided by the tutorial, it doesn't work. I get this error message: DatasourceError: Cannot initialize datasource version-0.15.50 spark_s3, error: The given datasource could not be retrieved from the DataContext; please confirm that your configuration is accurate.
The code provided looks like this:
expectation_suite_name = "version-0.15.50 suite_name"
suite = context_gx.add_expectation_suite(expectation_suite_name)

batch_request = RuntimeBatchRequest(
    datasource_name="version-0.15.50 spark_s3",
    data_asset_name="version-0.15.50 datafile_name",
    batch_identifiers={"runtime_batch_identifier_name": "default_identifier"},
    data_connector_name="version-0.15.50 default_inferred_data_connector_name",
    runtime_parameters={"batch_data": df},
)

I understand that the datasource describes what kind of data it is, in this case PySpark, but what is datasource_name? I don't see where it is set in the YAML, which looks like this:

config_version: 3.0
datasources:
  spark_s3:
    module_name: great_expectations.datasource
    class_name: Datasource
    execution_engine:
      module_name: great_expectations.execution_engine
      class_name: SparkDFExecutionEngine
    data_connectors:
      default_runtime_data_connector_name:
        batch_identifiers:
          - runtime_batch_identifier_name
        module_name: great_expectations.datasource.data_connector
        class_name: RuntimeDataConnector

Is there a default datasource_name, or should I set it? Can you help me understand that, please? Actually, if someone could explain all the parameters to set for RuntimeBatchRequest (datasource_name, data_asset_name, batch_identifiers, data_connector_name, runtime_parameters), that would be awesome, because it is not clear to me.

Hi @anais, thanks for the question! Can you please verify which version of GX you are running this code with?

I'm using version 0.15.50, as for now it is the last version that supports AWS Glue.
I finally found that the datasource_name has to be the one written in the YAML file, which is "spark_s3", not "version-0.15.50 spark_s3" as written in the tutorial. The same goes for data_asset_name and data_connector_name: I removed the version mention. A corrected batch request is sketched below.
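In case it helps others, here is a minimal sketch of the batch request with the version prefixes stripped. Note that my YAML declares default_runtime_data_connector_name (a RuntimeDataConnector), not the default_inferred_data_connector_name from the tutorial snippet, so I use the runtime one; "datafile_name" is just an illustrative asset label:

from great_expectations.core.batch import RuntimeBatchRequest

batch_request = RuntimeBatchRequest(
    # must match the key under "datasources:" in great_expectations.yml
    datasource_name="spark_s3",
    # must match the connector key under "data_connectors:"
    data_connector_name="default_runtime_data_connector_name",
    # a free-form label for the data being validated (illustrative name)
    data_asset_name="datafile_name",
    # keys must match the batch_identifiers declared in the YAML
    batch_identifiers={"runtime_batch_identifier_name": "default_identifier"},
    # the in-memory Spark dataframe to validate
    runtime_parameters={"batch_data": df},
)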
I also needed to add the parameter "force_reuse_spark_context: true" under execution_engine; otherwise it was not working. See the YAML snippet below.
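For reference, here is the execution_engine section with the added flag, based on the YAML from my first post:

execution_engine:
  module_name: great_expectations.execution_engine
  class_name: SparkDFExecutionEngine
  # reuse the SparkSession that Glue has already created instead of starting a new one
  force_reuse_spark_context: true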
I also adapted the end of the script, as the checkpoint part was not working with the provided code. My adapted version is sketched below.
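Roughly, my adapted ending looks like this; it's a sketch of my working version with the V3 (0.15.x) checkpoint API, not the tutorial's exact code, and the checkpoint name is just what I chose:

checkpoint_config = {
    "name": "glue_checkpoint",  # illustrative name
    "config_version": 1.0,
    "class_name": "SimpleCheckpoint",
    "run_name_template": "%Y%m%d-%H%M%S-glue-run",
}
context_gx.add_checkpoint(**checkpoint_config)

# validate the in-memory dataframe against the suite created earlier
results = context_gx.run_checkpoint(
    checkpoint_name="glue_checkpoint",
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name,
        }
    ],
)
print(results["success"])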
In the end, it is still not clear to me what data_asset_name is used for, since the data seems to be identified by the batch identifiers.

Ah yes, so @anais, our support for AWS Glue only goes up to 0.15, and that version is quite old and deprecated. Our new version doesn't fully support AWS Glue, as it has not been tested on our end. Therefore, it's likely that the tutorial will not work.

Thank you for your answer, I finally managed to make it work.
But I wonder if I should keep going with GX, since, as you said, the version is old and deprecated. Do you know if Glue support will be integrated into the new version?

Hey @anais, can you share your solution here so it can be helpful for our community? I don't think there is a timeline for this request yet, due to our internal prioritization.