Issues configuring metadata store connections to Azure Data Lake Storage Gen2

Hello,

I’m trying to configure a Great Expectations context in a Databricks notebook, with the metadata stores saved in an Azure Data Lake Storage Gen2 container, but I’m getting connection errors. For this configuration I copied code from the following example:

I made the following changes to the code:

  • I did not copy the data source used in the GitHub example; instead I’m using a Spark data source with a dataframe as the asset.
  • I did not copy the “local_site_for_hosting” site because I’m not using a Docker container.
  • I started by using “credential” and “account_url” for each metadata store (same as in the example), but when that didn’t work I removed them and switched to “connection_string” instead (see the sketch just below this list).
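
For reference, this is roughly what the “credential” / “account_url” version looked like before I switched (the account URL and the credential variable are placeholders):

# Earlier attempt, mirroring the example: authenticate with "account_url"
# and "credential" instead of a connection string.
# The account URL and the credential environment variable are placeholders.
context.add_store(
    "abs_expectations_store",
    {
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "TupleAzureBlobStoreBackend",
            "container": "${ABS_METADATA_STORES_CONTAINER_NAME}",
            "prefix": "expectations",
            "account_url": "https://<storage-account-name>.blob.core.windows.net",
            "credential": "${AZURE_STORAGE_ACCOUNT_KEY}",
        },
    },
)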

Everything works fine until I try to add an expectation suite to the context. At that point I get the error “StoreBackendError: Due to exception: "Connection string is either blank or malformed.", "azure_client" could not be created.” I’m using a connection string for the storage account that I copied from the Azure portal.
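
As far as I understand, the “${AZURE_STORAGE_CONNECTION_STRING}” and “${ABS_METADATA_STORES_CONTAINER_NAME}” placeholders in the store configs are substituted from environment variables, so they need to be set in the notebook process before the stores are added. A minimal sketch of that (with placeholder secret scope and key names) would be:

import os

# Placeholder secret scope / key names; the actual connection string is the
# one from the "Access keys" page of the storage account in the Azure portal.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = dbutils.secrets.get(
    scope="my-secret-scope", key="adls-connection-string"
)
os.environ["ABS_METADATA_STORES_CONTAINER_NAME"] = "gx-metadata"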

Here’s a copy of the code I’m using:

import great_expectations as gx
import os
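
# Note: metadata_stores_bucket_name, catalog_name, schema_name, table_name and
# data_asset_name are assumed to be defined in earlier notebook cells (their
# values are omitted from this snippet).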

context = gx.get_context()

# Set up metadata stores (for expectations and validations)
if metadata_stores_bucket_name:
    expectations_store_name = "abs_expectations_store"
    validations_store_name = "abs_validations_store"
    context.add_store(
        expectations_store_name,
        {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleAzureBlobStoreBackend",
                "container": "${ABS_METADATA_STORES_CONTAINER_NAME}",
                "prefix": "expectations",
                "connection_string": "${AZURE_STORAGE_CONNECTION_STRING}"
            },
        },
    )
    context.add_store(
        validations_store_name,
        {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleAzureBlobStoreBackend",
                "container": "${ABS_METADATA_STORES_CONTAINER_NAME}",
                "prefix": "validations",
                "connection_string": "${AZURE_STORAGE_CONNECTION_STRING}"
            },
        },
    )
    # Set these stores as the active stores
    context.expectations_store_name = expectations_store_name
    context.validations_store_name = validations_store_name
else:
    print(
        "No bucket name provided for metadata stores, reverting to local file based storage."
    )

# Set up data docs site
if metadata_stores_bucket_name:
    new_site_name = "abs_site"
    new_site_config = {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleAzureBlobStoreBackend",
            "container": "${ABS_METADATA_STORES_CONTAINER_NAME}",
            "prefix": "data_docs",
            "connection_string": "${AZURE_STORAGE_CONNECTION_STRING}"
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }

    context.add_data_docs_site(new_site_name, new_site_config)

else:
    print(
        "No bucket name provided for data docs site stores, reverting to local file based storage."
    )

# Define a datasource and add it

datasource_name = "abs_test"

datasource = context.sources.add_spark(name=datasource_name)

if datasource_name in context.datasources:
    print('datasource is in the context')

# Create a dataframe and add it as an asset

df = spark.read.table(f"{catalog_name}.{schema_name}.{table_name}")

dataframe_asset = datasource.add_dataframe_asset(name=data_asset_name, dataframe=df)

# Create a batch request

my_batch_request = dataframe_asset.build_batch_request(dataframe=df)

batches = dataframe_asset.get_batch_list_from_batch_request(my_batch_request)

print("len(batches):", len(batches))

# Create an expectation suite

expectation_suite_name = "gx_test_validations"

context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)

What’s strange, though, is that when I use code from the following module (as a separate project):

I can see that that Great Expectations project is able to connect to the same ADLS account and container and to read information about the CSV files there.
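
Roughly, the datasource in that project is created directly against the container, along these lines (the datasource and asset names, the regex, and the azure_options keys are approximations from memory):

# Rough sketch of the other project's datasource setup (names approximate).
abs_datasource = context.sources.add_pandas_abs(
    name="abs_csv_source",
    azure_options={"conn_str": "${AZURE_STORAGE_CONNECTION_STRING}"},
)
csv_asset = abs_datasource.add_csv_asset(
    name="csv_files",
    batching_regex=r".*\.csv",
    abs_container="${ABS_METADATA_STORES_CONTAINER_NAME}",
)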

I would appreciate any help I can get, because I don’t know what else I can do to solve this.