Help understanding datasource_name config for RuntimeBatchRequest

Hi, I'm trying to use Great Expectations with AWS Glue. The data will be read at runtime; I'm following this tutorial: How to Use Great Expectations in AWS Glue | Great Expectations.
My problem is understanding what to set for datasource_name: when I use the name provided by the tutorial, it doesn't work. I get this error message: DatasourceError: Cannot initialize datasource version-0.15.50 spark_s3, error: The given datasource could not be retrieved from the DataContext; please confirm that your configuration is accurate.
The code provided looks like this:
expectation_suite_name = "version-0.15.50 suite_name"
suite = context_gx.add_expectation_suite(expectation_suite_name)

batch_request = RuntimeBatchRequest(
    datasource_name="version-0.15.50 spark_s3",
    data_asset_name="version-0.15.50 datafile_name",
    batch_identifiers={"runtime_batch_identifier_name": "default_identifier"},
    data_connector_name="version-0.15.50 default_inferred_data_connector_name",
    runtime_parameters={"batch_data": df},
)

I understand that the datasource describes what kind of data it is, in this case PySpark, but what is datasource_name? I don't see where it is set in the YAML, which looks like this:

config_version: 3.0
datasources:
  spark_s3:
    module_name: great_expectations.datasource
    class_name: Datasource
    execution_engine:
      module_name: great_expectations.execution_engine
      class_name: SparkDFExecutionEngine
    data_connectors:
      default_runtime_data_connector_name:
        batch_identifiers:
          - runtime_batch_identifier_name
        module_name: great_expectations.datasource.data_connector
        class_name: RuntimeDataConnector

Is there a default datasource_name, or should I set it? Can you help me understand that, please? Actually, if someone could explain all the parameters to set for RuntimeBatchRequest (datasource_name, data_asset_name, batch_identifiers, data_connector_name, runtime_parameters), that would be awesome, because it is not clear to me.

Hi @anais, thanks for the question! Can you please verify which version of GX you are running this code with?

I'm using version 0.15.50, as for now it is the last version that supports AWS Glue.
I finally found that the datasource_name has to be the one written in the YAML file, which is "spark_s3", not "version-0.15.50 spark_s3" as written in the tutorial. The same goes for data_asset_name and data_connector_name: I removed the version mention. A corrected batch request is sketched below.
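In case it helps others, here is a minimal sketch of the batch request with the version prefixes stripped. Note that my YAML declares default_runtime_data_connector_name (a RuntimeDataConnector), not the default_inferred_data_connector_name from the tutorial snippet, so I use the runtime one; "datafile_name" is just an illustrative asset label:

from great_expectations.core.batch import RuntimeBatchRequest

batch_request = RuntimeBatchRequest(
    # must match the key under "datasources:" in great_expectations.yml
    datasource_name="spark_s3",
    # must match the connector key under "data_connectors:"
    data_connector_name="default_runtime_data_connector_name",
    # a free-form label for the data being validated (illustrative name)
    data_asset_name="datafile_name",
    # keys must match the batch_identifiers declared in the YAML
    batch_identifiers={"runtime_batch_identifier_name": "default_identifier"},
    # the in-memory Spark dataframe to validate
    runtime_parameters={"batch_data": df},
)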
I also needed to add the parameter "force_reuse_spark_context: true" under execution_engine; otherwise it was not working. See the YAML snippet below.
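For reference, here is the execution_engine section with the added flag, based on the YAML from my first post:

execution_engine:
  module_name: great_expectations.execution_engine
  class_name: SparkDFExecutionEngine
  # reuse the SparkSession that Glue has already created instead of starting a new one
  force_reuse_spark_context: true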
I also adapted the end of the script, as the checkpoint part was not working with the provided code. My adapted version is sketched below.
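Roughly, my adapted ending looks like this; it's a sketch of my working version with the V3 (0.15.x) checkpoint API, not the tutorial's exact code, and the checkpoint name is just what I chose:

checkpoint_config = {
    "name": "glue_checkpoint",  # illustrative name
    "config_version": 1.0,
    "class_name": "SimpleCheckpoint",
    "run_name_template": "%Y%m%d-%H%M%S-glue-run",
}
context_gx.add_checkpoint(**checkpoint_config)

# validate the in-memory dataframe against the suite created earlier
results = context_gx.run_checkpoint(
    checkpoint_name="glue_checkpoint",
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name,
        }
    ],
)
print(results["success"])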
In the end, it is still not clear to me what data_asset_name is used for, since the data seems to be identified by the batch identifiers.

Ah yes, so @anais, our support for AWS Glue only goes up to 0.15, and that version is quite old and deprecated. Our new version doesn't fully support AWS Glue, as it has not been tested on our end. Therefore, it's likely that the tutorial will not work.

Thank you for your answer, I finally managed to make it work.
But I wonder if I should keep going with GX, since, as you said, the version is old and deprecated. Do you know if Glue support will be integrated into the new version?

Hey @anais, can you share your solution here so it can be helpful for our community? I don't think there is a timeline for this request yet, due to our internal prioritization.