Hello,
I am trying to validate a column expectation (ExpectColumnValuesToNotBeNull) on a query asset with a Trino connector.
The column I am testing is not recognized, and the validation returns the following error message: "Error: The column <column_name> in BatchData does not exist."
Here is the code I am using to reproduce the issue:
import great_expectations as gx
import great_expectations.expectations as gxe

# Ephemeral context: nothing is persisted between runs.
context = gx.get_context(mode="ephemeral")

# Placeholder credentials; the schema is given in the query, not in the URL.
connection_string = "trino://username:password@host:port/catalog_name"
data_source = context.data_sources.add_or_update_sql(
    name="test", connection_string=connection_string
)

query = "SELECT * FROM schema_name.table_name"
data_asset = data_source.add_query_asset(name="test_asset", query=query)
batch_definition = data_asset.add_batch_definition(name="FULL_TABLE")
# Fetch the batch that the expectation is validated against.
batch = batch_definition.get_batch()

expectation = gxe.ExpectColumnValuesToNotBeNull(column="column_to_test")
validation_result = batch.validate(expectation)
It returns the following validation result, whose exception_info contains the traceback:
{'success': False, 'expectation_config': {'type': 'expect_column_values_to_not_be_null', 'kwargs': {'column': 'column_to_test', 'batch_id': 'test-test_asset'}, 'meta': {}, 'severity': 'critical'}, 'result': {}, 'meta': {}, 'exception_info': {"MetricConfigurationID(metric_name='column_values.nonnull.condition', metric_domain_kwargs_id='f190dfdd20cb14615256281f2b017006', metric_value_kwargs_id=())": {'exception_traceback': 'Traceback (most recent call last):
File "../great_expectations/execution_engine/execution_engine.py", line 577, in _process_direct_and_bundled_metric_computation_configurations
metric_computation_configuration.metric_fn( # type: ignore[misc] # F not callable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**metric_computation_configuration.metric_provider_kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "../great_expectations/expectations/metrics/metric_provider.py", line 99, in inner_func
return metric_fn(*args, **kwargs)
File "../great_expectations/expectations/metrics/map_metric_provider/column_condition_partial.py", line 165, in inner_func
metric_domain_kwargs = get_dbms_compatible_metric_domain_kwargs(
metric_domain_kwargs=metric_domain_kwargs,
batch_columns_list=metrics["table.columns"],
)
File "../great_expectations/expectations/metrics/util.py", line 746, in get_dbms_compatible_metric_domain_kwargs
column_name: str | sqlalchemy.quoted_name = get_dbms_compatible_column_names(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
column_names=metric_domain_kwargs["column"],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
batch_columns_list=batch_columns_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "../great_expectations/expectations/metrics/util.py", line 816, in get_dbms_compatible_column_names
_verify_column_names_exist_and_get_normalized_typed_column_names_map(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
column_names=column_names,
^^^^^^^^^^^^^^^^^^^^^^^^^^
batch_columns_list=batch_columns_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error_message_template=error_message_template,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "../great_expectations/expectations/metrics/util.py", line 901, in _verify_column_names_exist_and_get_normalized_typed_column_names_map
raise gx_exceptions.InvalidMetricAccessorDomainKwargsKeyError(
message=error_message_template.format(column_name=column_name)
)
great_expectations.exceptions.exceptions.InvalidMetricAccessorDomainKwargsKeyError: Error: The column "column_to_test" in BatchData does not exist.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "../great_expectations/validator/validation_graph.py", line 290, in _resolve
self._execution_engine.resolve_metrics(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
metrics_to_resolve=computable_metrics,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
metrics=metrics,
^^^^^^^^^^^^^^^^
runtime_configuration=runtime_configuration,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "../great_expectations/execution_engine/execution_engine.py", line 294, in resolve_metrics
return self._process_direct_and_bundled_metric_computation_configurations(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
metric_fn_direct_configurations=metric_fn_direct_configurations,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
metric_fn_bundle_configurations=metric_fn_bundle_configurations,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "../great_expectations/execution_engine/execution_engine.py", line 582, in _process_direct_and_bundled_metric_computation_configurations
raise gx_exceptions.MetricResolutionError(
...<2 lines>...
) from e
great_expectations.exceptions.exceptions.MetricResolutionError: Error: The column "column_to_test" in BatchData does not exist.
', 'exception_message': 'Error: The column "column_to_test" in BatchData does not exist.', 'raised_exception': True}}}
Using a table asset lets the validation run successfully, but the query I want to use in my project is more complex than the one in this reproduction.
From my understanding, the validator uses the table schema to check that the column exists, and with a query asset that lookup returns an empty list:
>>> batch = batch_definition.get_batch()
>>> batch.columns()
Calculating Metrics: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3.68it/s]
[]
while batch.head() returns results containing the column_to_test.
This looks like a bug, but I may be doing something wrong and would appreciate some help understanding the issue.
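To make my reading of the traceback concrete, here is a simplified sketch of the check as I understand it. It is not the Great Expectations implementation: verify_column_exists and ColumnNotFoundError are hypothetical names, and judging by its name the real function also normalizes column names. The only point is that an empty column list makes the lookup fail for any column name.

```python
# Simplified stand-in for the column-existence check seen in the traceback.
# NOT the Great Expectations implementation; all names here are hypothetical.


class ColumnNotFoundError(KeyError):
    """Hypothetical stand-in for InvalidMetricAccessorDomainKwargsKeyError."""


def verify_column_exists(column_name: str, batch_columns: list[str]) -> str:
    """Return column_name if it is in batch_columns, otherwise raise."""
    if column_name in batch_columns:
        return column_name
    raise ColumnNotFoundError(
        f'Error: The column "{column_name}" in BatchData does not exist.'
    )


# With a populated column list (what I would expect for the query asset):
found = verify_column_exists("column_to_test", ["id", "column_to_test"])

# With the empty list that batch.columns() returns for my query asset:
try:
    verify_column_exists("column_to_test", [])
    empty_list_error = None
except ColumnNotFoundError as exc:
    empty_list_error = exc.args[0]
```

If this reading is right, the expectation itself is fine and the problem is upstream, in how the column list is computed for a query asset.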
Notes:
- The query that I want to use in my project is actually more complex than in the code I’m using to reproduce the issue.
- Using UnexpectedRowsExpectation works on the query asset, as it does not seem to check for columns in the table schema. But I would like to use ExpectColumnValuesToNotBeNull in order to have access to element_count and not only unexpected_count in the output.
- I tried moving schema_name into the connection string, but then batch.columns() fails with the following error instead of returning an empty list:
KeyError: MetricConfigurationID(metric_name='table.columns', metric_domain_kwargs_id='batch_id=test-test_asset', metric_value_kwargs_id=())
- The issue seems similar to this one: "Problem with data_source.add_query_asset", but that one does not seem to be solved.
- I am using great_expectations==1.11.0
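
As a stopgap while the column check fails, both counts could be computed outside GX by wrapping the query in a subquery. A minimal sketch, assuming the standard-library sqlite3 module as a stand-in for Trino and a made-up four-row table; the aggregate is plain SQL that I would expect to work on Trino as well, but I have not verified it there.

```python
import sqlite3

# sqlite3 stands in for Trino here; the table and its rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, column_to_test TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c"), (4, None)],
)

# The (possibly complex) query that the query asset would wrap.
query = "SELECT * FROM table_name"

# Wrap the query as a subquery and count total rows and NULLs in one pass.
counts_sql = f"""
SELECT
    COUNT(*) AS element_count,
    SUM(CASE WHEN column_to_test IS NULL THEN 1 ELSE 0 END) AS unexpected_count
FROM ({query}) AS q
"""
element_count, unexpected_count = conn.execute(counts_sql).fetchone()
conn.close()
```

This gives the two numbers I am after, but it bypasses the expectation entirely, so I would still prefer to get ExpectColumnValuesToNotBeNull working on the query asset.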