Validator.columns() returns empty list - validator.head() shows data though

Hello everyone,

I am trying to set up GX for a table that we have stored in S3; the connection goes through Trino.
I am running Python 3.11 with GX 0.17.14 in a Jupyter notebook.

I tried to follow the getting started guide in the documentation. Here is my code so far:

import great_expectations as gx

context = gx.get_context()

connection_string = "trino://$USER:$PASSWORD@MY_TRINO_CONN_STRING/deltalake/deltalake_schema"

data_source = context.sources.add_sql(
    name="my_datasource", connection_string=connection_string
)
my_asset = data_source.add_query_asset(
    name="my_asset",
    query="SELECT * FROM some_table LIMIT 100",
)
my_asset.add_splitter_column_value("tenant")

my_batch_request = my_asset.build_batch_request()

context.add_or_update_expectation_suite("my_gx_suite")

validator = context.get_validator(
    batch_request=my_batch_request,
    expectation_suite_name="my_gx_suite",
)
validator.head()

This code shows some output:
[Screenshot: validator.head() output, 13.09.23, 14:29]

However if I run

validator.columns()

it returns an empty list.

I am surprised that I can see data in the head() output, yet somehow the data does not seem to be (really) present.

Did I forget any crucial steps?

Just FYI: don't be surprised that the data frame returned by validator.head() has only a single row - our test environment sometimes has some very sparse tables :upside_down_face:

Hm… @allekai that does sound very strange. Overall your code looks good; no steps forgotten. It may be that your sparse tables are causing issues that make it difficult for the validator to track the columns. Please let me know if this keeps happening. You're also welcome to try with non-sparse tables, just as a testing procedure.
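
If you want to dig a little deeper, you could also ask the validator for the table.columns metric directly and see what the execution engine reports for the active batch. A minimal sketch (untested, against the fluent setup from your post):

from great_expectations.validator.metric_configuration import MetricConfiguration

# Ask the execution engine which columns it sees for the active batch.
# If this also comes back empty, the batch data itself (query/splitter)
# is the place to look, rather than the expectations.
table_columns = validator.get_metric(
    metric=MetricConfiguration(
        metric_name="table.columns",
        metric_domain_kwargs={},
    )
)
print(validator.active_batch_id)
print(table_columns)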

Hi, thanks for your feedback! I also tried a different table that has at least a few hundred rows, and I get the same behavior there :confused:

Here is a count(*) grouped by tenant for the table:

[Screenshot: count(*) per tenant for the table]

So the size should probably not be a problem.
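
In case the tenant splitter plays a role here, this is how I would try to narrow it down to a single batch (just a sketch; "some_tenant" is a placeholder value):

# One batch should be produced per distinct value of the splitter column "tenant".
batches = my_asset.get_batch_list_from_batch_request(my_asset.build_batch_request())
print(len(batches))

# Request one specific tenant explicitly and re-check the columns.
single_tenant_request = my_asset.build_batch_request(options={"tenant": "some_tenant"})
validator = context.get_validator(
    batch_request=single_tenant_request,
    expectation_suite_name="my_gx_suite",
)
print(validator.columns())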

I also observe the following warning when running either validator.head() or validator.expect_column_values_to_not_be_null(column="my_column"):

EXECUTE IMMEDIATE not available for MY_TRINO_URL:443; defaulting to legacy prepared statements (TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 1:19: mismatched input ''SELECT 1''. Expecting: 'USING', ", query_id=20230913_143138_26694_xwq25))

Interestingly, validator.expect_column_to_exist(column="my_column") runs without that warning (but of course with "success": false, which makes sense, since the validator doesn't find any columns).
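
To rule out a simple naming or case-sensitivity issue, I can also check the column names outside of GX by running the query directly through SQLAlchemy (plain SQLAlchemy, nothing GX-specific; just a sketch):

import sqlalchemy as sa

# Same connection string as above; prints the column names exactly as
# Trino/SQLAlchemy report them.
engine = sa.create_engine(connection_string)
with engine.connect() as conn:
    result = conn.execute(sa.text("SELECT * FROM some_table LIMIT 1"))
    print(list(result.keys()))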

It looks like the columns get lost somewhere in the MetricsCalculator part? I am new to GX and still trying to wrap my head around the concepts :sweat_smile:

Here is the stack trace when running validator.expect_column_values_to_not_be_null(column="my_column"):

InvalidMetricAccessorDomainKwargsKeyError Traceback (most recent call last)
File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/execution_engine/execution_engine.py:548, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    545 try:
    546     resolved_metrics[
    547         metric_computation_configuration.metric_configuration.id
--> 548     ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    549         **metric_computation_configuration.metric_provider_kwargs
    550     )
    551 except Exception as e:

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/metrics/metric_provider.py:90, in metric_partial.<locals>.wrapper.<locals>.inner_func(*args, **kwargs)
     88 @wraps(metric_fn)
     89 def inner_func(*args, **kwargs):
---> 90     return metric_fn(*args, **kwargs)

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/metrics/map_metric_provider/column_condition_partial.py:169, in column_condition_partial.<locals>.wrapper.<locals>.inner_func(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, runtime_configuration)
    154 @metric_partial(
    155     engine=engine,
    156     partial_fn_type=partial_fn_type,
   (...)
    167     runtime_configuration: dict,
    168 ):
--> 169     metric_domain_kwargs = get_dbms_compatible_metric_domain_kwargs(
    170         metric_domain_kwargs=metric_domain_kwargs,
    171         batch_columns_list=metrics["table.columns"],
    172     )
    174     (
    175         selectable,
    176         compute_domain_kwargs,
   (...)
    179         domain_kwargs=metric_domain_kwargs, domain_type=domain_type
    180     )

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/metrics/util.py:629, in get_dbms_compatible_metric_domain_kwargs(metric_domain_kwargs, batch_columns_list)
    628 if "column" in metric_domain_kwargs:
--> 629     column_name: str | sqlalchemy.quoted_name = get_dbms_compatible_column_names(
    630         column_names=metric_domain_kwargs["column"],
    631         batch_columns_list=batch_columns_list,
    632     )
    633     metric_domain_kwargs["column"] = column_name

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/metrics/util.py:703, in get_dbms_compatible_column_names(column_names, batch_columns_list, error_message_template)
    683 """
    684 Case non-sensitivity is expressed in upper case by common DBMS backends and in lower case by SQLAlchemy, with any
    685 deviations enclosed with double quotes.
   (...)
    698     Single property-typed column name object or list of property-typed column name objects (depending on input).
    699 """
    700 normalized_typed_batch_columns_mappings: List[
    701     Tuple[str, str | sqlalchemy.quoted_name]
    702 ] = (
--> 703     _verify_column_names_exist_and_get_normalized_typed_column_names_map(
    704         column_names=column_names,
    705         batch_columns_list=batch_columns_list,
    706         error_message_template=error_message_template,
    707     )
    708     or []
    709 )
    711 element: Tuple[str, str | sqlalchemy.quoted_name]

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/metrics/util.py:799, in _verify_column_names_exist_and_get_normalized_typed_column_names_map(column_names, batch_columns_list, error_message_template, verify_only)
    798 if normalized_column_name_mapping is None:
--> 799     raise gx_exceptions.InvalidMetricAccessorDomainKwargsKeyError(
    800         message=error_message_template.format(column_name=column_name)
    801     )
    802 else:  # noqa: PLR5501

InvalidMetricAccessorDomainKwargsKeyError: Error: The column "my_column" in BatchData does not exist.

The above exception was the direct cause of the following exception:

MetricResolutionError                     Traceback (most recent call last)
Cell In[43], line 2
      1 import pdb; pdb.set_trace()
----> 2 validator.expect_column_values_to_not_be_null(column="my_column")

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validator.py:594, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    588         validation_result = ExpectationValidationResult(
    589             success=False,
    590             exception_info=exception_info,
    591             expectation_config=configuration,
    592         )
    593     else:
--> 594         raise err
    596 if self._include_rendered_content:
    597     validation_result.render()

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validator.py:557, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    553     validation_result = ExpectationValidationResult(
    554         expectation_config=copy.deepcopy(expectation.configuration)
    555     )
    556 else:
--> 557     validation_result = expectation.validate(
    558         validator=self,
    559         evaluation_parameters=self._expectation_suite.evaluation_parameters,
    560         data_context=self._data_context,
    561         runtime_configuration=basic_runtime_configuration,
    562     )
    564 # If validate has set active_validation to true, then we do not save the config to avoid
    565 # saving updating expectation configs to the same suite during validation runs
    566 if self._active_validation is True:

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/expectations/expectation.py:1276, in Expectation.validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
   1267 self._warn_if_result_format_config_in_expectation_configuration(
   1268     configuration=configuration
   1269 )
   1271 configuration.process_evaluation_parameters(
   1272     evaluation_parameters, interactive_evaluation, data_context
   1273 )
   1274 expectation_validation_result_list: list[
   1275     ExpectationValidationResult
-> 1276 ] = validator.graph_validate(
   1277     configurations=[configuration],
   1278     runtime_configuration=runtime_configuration,
   1279 )
   1280 return expectation_validation_result_list[0]

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validator.py:1069, in Validator.graph_validate(self, configurations, runtime_configuration)
   1067         return evrs
   1068     else:
-> 1069         raise err
   1071 configuration: ExpectationConfiguration
   1072 result: ExpectationValidationResult

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validator.py:1048, in Validator.graph_validate(self, configurations, runtime_configuration)
   1041 resolved_metrics: _MetricsDict
   1043 try:
   1044     (
   1045         resolved_metrics,
   1046         evrs,
   1047         processed_configurations,
-> 1048     ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
   1049         graph=graph,
   1050         runtime_configuration=runtime_configuration,
   1051         expectation_validation_graphs=expectation_validation_graphs,
   1052         evrs=evrs,
   1053         processed_configurations=processed_configurations,
   1054         show_progress_bars=self._determine_progress_bars(),
   1055     )
   1056 except Exception as err:
   1057     # If a general Exception occurs during the execution of "ValidationGraph.resolve()", then
   1058     # all expectations in the suite are impacted, because it is impossible to attribute the failure to a metric.
   1059     if catch_exceptions:

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validator.py:1207, in Validator._resolve_suite_level_graph_and_process_metric_evaluation_errors(self, graph, runtime_configuration, expectation_validation_graphs, evrs, processed_configurations, show_progress_bars)
   1199 resolved_metrics: _MetricsDict
   1200 aborted_metrics_info: Dict[
   1201     _MetricKey,
   1202     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
   1203 ]
   1204 (
   1205     resolved_metrics,
   1206     aborted_metrics_info,
-> 1207 ) = self._metrics_calculator.resolve_validation_graph(
   1208     graph=graph,
   1209     runtime_configuration=runtime_configuration,
   1210     min_graph_edges_pbar_enable=0,
   1211 )
   1213 # Trace MetricResolutionError occurrences to expectations relying on corresponding malfunctioning metrics.
   1214 rejected_configurations: List[ExpectationConfiguration] = []

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/metrics_calculator.py:287, in MetricsCalculator.resolve_validation_graph(self, graph, runtime_configuration, min_graph_edges_pbar_enable)
    282 resolved_metrics: _MetricsDict
    283 aborted_metrics_info: Dict[
    284     _MetricKey,
    285     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
    286 ]
--> 287 resolved_metrics, aborted_metrics_info = graph.resolve(
    288     runtime_configuration=runtime_configuration,
    289     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    290     show_progress_bars=self._show_progress_bars,
    291 )
    292 return resolved_metrics, aborted_metrics_info

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validation_graph.py:209, in ValidationGraph.resolve(self, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    203 resolved_metrics: Dict[_MetricKey, MetricValue] = {}
    205 # updates graph with aborted metrics
    206 aborted_metrics_info: Dict[
    207     _MetricKey,
    208     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
--> 209 ] = self._resolve(
    210     metrics=resolved_metrics,
    211     runtime_configuration=runtime_configuration,
    212     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    213     show_progress_bars=show_progress_bars,
    214 )
    216 return resolved_metrics, aborted_metrics_info

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validation_graph.py:315, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    311                 failed_metric_info[failed_metric.id]["exception_info"] = {
    312                     exception_info
    313                 }
    314     else:
--> 315         raise err
    316 except Exception as e:
    317     if catch_exceptions:

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/validator/validation_graph.py:285, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    280         computable_metrics.add(metric)
    282 try:
    283     # Access "ExecutionEngine.resolve_metrics()" method, to resolve missing "MetricConfiguration" objects.
    284     metrics.update(
--> 285         self._execution_engine.resolve_metrics(
    286             metrics_to_resolve=computable_metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    287             metrics=metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    288             runtime_configuration=runtime_configuration,
    289         )
    290     )
    291     progress_bar.update(len(computable_metrics))
    292     progress_bar.refresh()

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/execution_engine/execution_engine.py:283, in ExecutionEngine.resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    274 metric_fn_bundle_configurations: List[MetricComputationConfiguration]
    275 (
    276     metric_fn_direct_configurations,
    277     metric_fn_bundle_configurations,
   (...)
    281     runtime_configuration=runtime_configuration,
    282 )
--> 283 return self._process_direct_and_bundled_metric_computation_configurations(
    284     metric_fn_direct_configurations=metric_fn_direct_configurations,
    285     metric_fn_bundle_configurations=metric_fn_bundle_configurations,
    286 )

File ~/opt/anaconda3/envs/gx/lib/python3.11/site-packages/great_expectations/execution_engine/execution_engine.py:552, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    546         resolved_metrics[
    547             metric_computation_configuration.metric_configuration.id
    548         ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    549             **metric_computation_configuration.metric_provider_kwargs
    550         )
    551     except Exception as e:
--> 552         raise gx_exceptions.MetricResolutionError(
    553             message=str(e),
    554             failed_metrics=(
    555                 metric_computation_configuration.metric_configuration,
    556             ),
    557         ) from e
    559 try:
    560     # an engine-specific way of computing metrics together
    561     resolved_metric_bundle: Dict[
    562         Tuple[str, str, str], MetricValue
    563     ] = self.resolve_metric_bundle(
    564         metric_fn_bundle=metric_fn_bundle_configurations
    565     )

MetricResolutionError: Error: The column "my_column" in BatchData does not exist.

I will try it again with an older version of GX - I hope I can find time today
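
For the downgrade I would simply pin the package inside the notebook, roughly:

%pip install "great_expectations==0.16.16"

and restart the kernel afterwards.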

Unfortunately, I still get the error even with the downgraded version 0.16.16.

@HaebichanGX should I open a GitHub issue for this? Or how else can we proceed here?