Unable to see data asset name in data doc

Below is the code block I use to generate Data Docs, but I am not able to filter the validation results by data asset. Additionally, I am not able to see any profiler info in the Data Docs.

data_asset_name = "test"

dataframe_datasource = context.sources.add_or_update_spark(
    name=table_name,
)
dataframe_asset = dataframe_datasource.add_dataframe_asset(
    name=data_asset_name,
    dataframe=df,
)
batch_request = dataframe_asset.build_batch_request()
print(batch_request)

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)

# validate the data
results = validator.validate()

checkpoint = context.add_or_update_checkpoint(
    name=checkpoint_name,
    validator=validator,
)
checkpoint_result = checkpoint.run()

context.build_data_docs()
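
(One note on the profiler point: nothing in the snippet above actually runs a profiler, so there is no profiler output for the Data Docs to show. In the 0.18 fluent API the closest equivalent would be something like the onboarding data assistant; a rough sketch, reusing the variable names from the snippet above:)

    # Sketch only: profiler-style content appears in Data Docs after something
    # generates it, e.g. the onboarding data assistant (GX 0.18-style API).
    data_assistant_result = context.assistants.onboarding.run(batch_request=batch_request)
    suite = data_assistant_result.get_expectation_suite(
        expectation_suite_name=expectation_suite_name
    )
    context.add_or_update_expectation_suite(expectation_suite=suite)
    context.build_data_docs()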


I don’t have a solution but I replicated this and don’t have the asset name in Data Docs either.

The asset name is not listed on the validation page either.

I can see the data asset name in the validation just fine, so that’s not the problem.


Is there any workaround for this? Can we pass the asset name manually before building the Data Docs, or when building the batch request?

I’ll take a look at this tomorrow.
Someone from the GX staff might also have some ideas; hopefully they check out this thread.

Hey all, I believe this was previously answered here: Asset Name not shown in Data Docs

Please take a look to see if that resolves the issue!
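
For context, batch_spec_passthrough comes from the older block-style batch requests, where it looks roughly like this (the datasource and connector names below are placeholders, and this style needs a block-config datasource with a runtime data connector):

    from great_expectations.core.batch import RuntimeBatchRequest

    # Block-style (non-fluent) runtime batch request; batch_spec_passthrough is
    # merged into the batch spec, which is what the Data Docs renderer reads.
    batch_request = RuntimeBatchRequest(
        datasource_name="my_runtime_datasource",  # placeholder
        data_connector_name="default_runtime_data_connector_name",  # placeholder
        data_asset_name="fruits",
        runtime_parameters={"batch_data": df},  # the Spark/pandas dataframe
        batch_identifiers={"default_identifier_name": "default_identifier"},
        batch_spec_passthrough={"data_asset_name": "fruits"},
    )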


Thanks @nevintan!

Unfortunately, I couldn’t get this working with batch_spec_passthrough; I got an error instead.

Here’s my code:

checkpoint = context.add_or_update_checkpoint(
    name="example checkpoint",
    batch_request={
        'datasource_name': 'example',
        'data_asset_name': 'fruits',
        'batch_spec_passthrough': {
            'data_asset_name': "fruits",
        },
    },
    expectation_suite_name="example suite",
)

This results in example_checkpoint.yml looking like this:

...
batch_request:
  datasource_name: example
  data_asset_name: fruits
  batch_spec_passthrough:
    data_asset_name: fruits
...

Is this correct? As far as I can tell, this matches Example 1 from the mentioned thread.

However, running checkpoint.run() throws a ValidationError, stack trace below.

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[22], line 1
----> 1 checkpoint.run()

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/core/usage_statistics/usage_statistics.py:266, in usage_statistics_enabled_method.<locals>.usage_statistics_wrapped_method(*args, **kwargs)
    263         args_payload = args_payload_fn(*args, **kwargs) or {}
    264         nested_update(event_payload, args_payload)
--> 266     result = func(*args, **kwargs)
    267     message["success"] = True
    268 except Exception:

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/checkpoint/checkpoint.py:315, in BaseCheckpoint.run(self, template_name, run_name_template, expectation_suite_name, batch_request, validator, action_list, evaluation_parameters, runtime_configuration, validations, profilers, run_id, run_name, run_time, result_format, expectation_suite_ge_cloud_id)
    305         self._run_validation(
    306             substituted_runtime_config=substituted_runtime_config,
    307             async_validation_operator_results=async_validation_operator_results,
   (...)
    312             validation_dict=validation_dict,
    313         )
    314 else:
--> 315     self._run_validation(
    316         substituted_runtime_config=substituted_runtime_config,
    317         async_validation_operator_results=async_validation_operator_results,
    318         async_executor=async_executor,
    319         result_format=result_format,
    320         run_id=run_id,
    321     )
    323 checkpoint_run_results: dict = {}
    324 async_validation_operator_result: AsyncResult

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/checkpoint/checkpoint.py:454, in BaseCheckpoint._run_validation(self, substituted_runtime_config, async_validation_operator_results, async_executor, result_format, run_id, idx, validation_dict)
    449     validation_dict = CheckpointValidationConfig(
    450         id=substituted_runtime_config.get("default_validation_id")
    451     )
    453 try:
--> 454     substituted_validation_dict: dict = get_substituted_validation_dict(
    455         substituted_runtime_config=substituted_runtime_config,
    456         validation_dict=validation_dict,
    457     )
    458     validate_validation_dict(
    459         validation_dict=substituted_validation_dict,
    460         batch_request_required=(not self._validator),
    461     )
    463     batch_request: Union[
    464         BatchRequest, FluentBatchRequest, RuntimeBatchRequest, None
    465     ] = substituted_validation_dict.get("batch_request")

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/checkpoint/util.py:193, in get_substituted_validation_dict(substituted_runtime_config, validation_dict)
    189 def get_substituted_validation_dict(
    190     substituted_runtime_config: dict, validation_dict: CheckpointValidationConfig
    191 ) -> dict:
    192     substituted_validation_dict = {
--> 193         "batch_request": get_substituted_batch_request(
    194             substituted_runtime_config=substituted_runtime_config,
    195             validation_batch_request=validation_dict.get("batch_request"),
    196         ),
    197         "expectation_suite_name": validation_dict.get("expectation_suite_name")
    198         or substituted_runtime_config.get("expectation_suite_name"),
    199         "expectation_suite_ge_cloud_id": validation_dict.get(
    200             "expectation_suite_ge_cloud_id"
    201         )
    202         or substituted_runtime_config.get("expectation_suite_ge_cloud_id"),
    203         "action_list": get_updated_action_list(
    204             base_action_list=substituted_runtime_config.get("action_list", []),
    205             other_action_list=validation_dict.get("action_list", {}),
    206         ),
    207         "evaluation_parameters": nested_update(
    208             substituted_runtime_config.get("evaluation_parameters") or {},
    209             validation_dict.get("evaluation_parameters", {}),
    210             dedup=True,
    211         ),
    212         "runtime_configuration": nested_update(
    213             substituted_runtime_config.get("runtime_configuration") or {},
    214             validation_dict.get("runtime_configuration", {}),
    215             dedup=True,
    216         ),
    217         "include_rendered_content": validation_dict.get("include_rendered_content")
    218         or substituted_runtime_config.get("include_rendered_content")
    219         or None,
    220     }
    222     for attr in ("name", "id"):
    223         if validation_dict.get(attr) is not None:

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/checkpoint/util.py:268, in get_substituted_batch_request(substituted_runtime_config, validation_batch_request)
    259         raise gx_exceptions.CheckpointError(
    260             f'BatchRequest attribute "{key}" was provided with different values'
    261         )
    263 effective_batch_request: dict = {
    264     **validation_batch_request,
    265     **substituted_runtime_batch_request,
    266 }
--> 268 return materialize_batch_request(batch_request=effective_batch_request)

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/core/batch.py:942, in materialize_batch_request(batch_request)
    939 else:
    940     batch_request_class = BatchRequest
--> 942 return batch_request_class(**effective_batch_request)

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/great_expectations/datasource/fluent/batch_request.py:91, in BatchRequest.__init__(self, **kwargs)
     89 if "batch_slice" in kwargs:
     90     _batch_slice_input = kwargs.pop("batch_slice")
---> 91 super().__init__(**kwargs)
     92 self._batch_slice_input = _batch_slice_input

File ~/code/gx-exploration/.venv/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for BatchRequest
batch_spec_passthrough
  extra fields not permitted (type=value_error.extra)

GX version: 0.18.12
Python: 3.10.13
OS: Linux
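
For what it’s worth, the trace bottoms out in the fluent-style BatchRequest model, which is strict about its fields. A minimal sketch that reproduces the same error outside the checkpoint (the class path is taken straight from the traceback):

    from great_expectations.datasource.fluent.batch_request import BatchRequest

    # The fluent BatchRequest is a strict pydantic model, so the extra
    # "batch_spec_passthrough" key is rejected before the checkpoint ever runs.
    BatchRequest(
        datasource_name="example",
        data_asset_name="fruits",
        batch_spec_passthrough={"data_asset_name": "fruits"},  # -> "extra fields not permitted"
    )

So it looks like batch_spec_passthrough is only accepted by the older block-style batch requests, and the workaround from the linked thread doesn’t carry over to fluent datasources.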

Hey,

I’ve also encountered this issue, and it appears that the root cause is that the Data Docs renderer is not aware of all the potential locations of “data_asset_name”.

Like others here, when using a fluent Spark datasource and .build_batch_request(), the asset name appears in neither “batch_spec” nor “batch_kwargs” in the validation result’s meta section.
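
A quick way to confirm this on your own results (a rough sketch; checkpoint_result is whatever checkpoint.run() returned):

    # Check where the asset name ends up in each validation result's meta.
    for validation_result in checkpoint_result.list_validation_results():
        meta = validation_result.meta
        print(meta.get("batch_spec", {}).get("data_asset_name"))    # empty for a fluent Spark dataframe asset
        print(meta.get("batch_kwargs", {}).get("data_asset_name"))  # empty as well
        abd = meta.get("active_batch_definition", {})
        # depending on how the result was loaded, this is a dict or a BatchDefinition object
        name = abd.get("data_asset_name") if isinstance(abd, dict) else abd.data_asset_name
        print(name)  # the asset name does show up here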

Just as a proof of concept, the following amendment to the GX code shows the expected asset name just fine for me in the Data Docs:

# from: great_expectations/render/renderer/site_builder.py : 959

validation_success = validation.success
batch_kwargs = validation.meta.get("batch_kwargs", {})
batch_spec = validation.meta.get("batch_spec", {})
# added
active_batch_def = validation.meta.get("active_batch_definition", {})

self.add_resource_info_to_index_links_dict(
    asset_name=batch_kwargs.get("data_asset_name")
    or batch_spec.get("data_asset_name")
    # added
    or active_batch_def.get("data_asset_name"),
)

I’m sure there will be other amendments needed beyond this, but hopefully it’s a useful starting point for the actual fix.


Thanks @aslic for the workaround, but when I navigated to the great_expectations/render/renderer/ directory, it was empty. I have set up GX in a Databricks environment, and every time I can see these folders, but they are empty.

That’s odd, @roger67. I have no idea why that is; perhaps somebody from GX can shed some light.
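
One quick sanity check (just a sketch) is to print where Python is actually importing the module from; on Databricks the installed copy usually lives in the cluster’s site-packages rather than the workspace folders you browse:

    import great_expectations.render.renderer.site_builder as site_builder

    # Prints the path of the installed module that GX actually uses.
    print(site_builder.__file__)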

In my particular use case I’m not going to patch the library code as a fix; I just wanted to raise it with the GX team.

But if I do manage to figure out an actual workaround I’m happy to use in my code in the meantime, I will update here.