Hello there,
I am reposting this from Slack. I have a problem with an Azure Blob container. I’m testing Great Expectations on Databricks and tried to connect to the NYC taxi records in the Azure Open Data blob storage account.
But when I validate my configuration file, I get a DatasourceInitializationError. The details show that BlobServiceClient is None (the traceback reports a 'NoneType' object).
It looks like GX is doing something wrong here. Any idea what? Should this be a GitHub issue?
My configuration file (Azure Open Data info):
name: yellow_taxi_datasource
class_name: Datasource
execution_engine:
  class_name: SparkDFExecutionEngine
  azure_options:
    account_url: https://azureopendatastorage.blob.core.windows.net
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
  yellow_taxi_connector:
    class_name: InferredAssetAzureDataConnector
    azure_options:
      account_url: https://azureopendatastorage.blob.core.windows.net
    container: nyctlc
    name_starts_with: yellow
    default_regex:
      pattern: (.*)\.csv
      group_names:
        - data_asset_name
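For reference, this is how I understand the default_regex above to map blob names to data_asset_name. It is only a sketch using Python's re module: the helper asset_name_for and the blob names are made up for illustration, and GX's own matching may differ in details.

```python
import re
from typing import Optional

# Same pattern as default_regex in the config above.
PATTERN = re.compile(r"(.*)\.csv")


def asset_name_for(blob_name: str) -> Optional[str]:
    """Return the first capture group (data_asset_name), or None if no match."""
    match = PATTERN.match(blob_name)
    return match.group(1) if match else None


# Hypothetical blob names, not taken from the real container.
for blob in ["yellow/part-0001.csv", "yellow/part-0001.parquet"]:
    print(blob, "->", asset_name_for(blob))
```

If this understanding is right, blobs that do not end in .csv would simply not be picked up as assets by this connector.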
Details about my error:
Exception details:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-602a5fde-cbf7-4b3c-a133-4c7043bc490c/lib/python3.9/site-packages/great_expectations/datasource/data_connector/inferred_asset_azure_data_connector.py in __init__(self, name, datasource_name, container, execution_engine, default_regex, sorters, name_starts_with, delimiter, azure_options, batch_spec_passthrough, id)
105 ).group(1)
--> 106 self._azure = BlobServiceClient(**azure_options)
107 except (TypeError, AttributeError):
TypeError: 'NoneType' object is not callable
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-602a5fde-cbf7-4b3c-a133-4c7043bc490c/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py in _instantiate_datasource_from_config(self, raw_config, substituted_config)
3786 try:
-> 3787 datasource: Datasource = self._build_datasource_from_config(
3788 raw_config=raw_config, substituted_config=substituted_config
...
DatasourceInitializationError: Cannot initialize datasource yellow_taxi_datasource, error: Unable to load Azure BlobServiceClient (it is required for InferredAssetAzureDataConnector). Please ensure that you have provided the appropriate keys to `azure_options` for authentication.
Another attempt, this time with connection_string in azure_options:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-602a5fde-cbf7-4b3c-a133-4c7043bc490c/lib/python3.9/site-packages/great_expectations/datasource/data_connector/inferred_asset_azure_data_connector.py in __init__(self, name, datasource_name, container, execution_engine, default_regex, sorters, name_starts_with, delimiter, azure_options, batch_spec_passthrough, id)
100 ).group(1)
--> 101 self._azure = BlobServiceClient.from_connection_string(**azure_options)
102 elif account_url is not None:
AttributeError: 'NoneType' object has no attribute 'from_connection_string'
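Since BlobServiceClient is None in both tracebacks, one thing I can check from the notebook is whether the azure.storage.blob module (from the azure-storage-blob package) is importable on the cluster at all. My assumption, which I have not confirmed, is that GX falls back to None when that import fails. The helper is_importable is my own name, not part of GX. A minimal sketch:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if module_name can be located in the current environment."""
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ModuleNotFoundError (an ImportError) is raised when a parent
        # package such as "azure" is missing.
        return False


print("azure.storage.blob importable:", is_importable("azure.storage.blob"))
```

If this prints False, installing azure-storage-blob on the cluster and retrying would be the first thing to try before opening an issue.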