Hi all
I am trying to integrate GE with DataHub. I followed the documentation (Great Expectations | DataHub), which says to add an action to checkpoint.yml.
Has anyone been able to integrate them?
When I execute my script, checkpoint.yml is reverted and the action is deleted.
Hey @Ric_Denmark ! Thanks for reaching out.
Which backend & version of GX are you using? I believe the DataHub integration only supports SQLAlchemy backends, and may only support up to v0.15.50, possibly 0.16.16.
Additionally, the GX <> DataHub integration is maintained and supported by the folks over at DataHub. They may be able to provide more help for you over in their community.
Hi Austin. Thank you for your reply.
I use GE 0.15.50 with SQLAlchemy, no pandas here.
Since you wrote, I started all over using the gx-tutorials. My code is no longer reverted and I can run the GE checkpoint from the CLI, but the results do not show up in DataHub.
The metadata is sent, but the Validation tab is not displayed.
I set the DATAHUB_DEBUG variable to true and I see that the Dataset URN might be the issue.
Here is the action in my checkpoint.yml:
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: https://meta.test-data.domain.it/api/gms
    token: 'string_of_token'
    env: TEST
    platform_instance_map: {"name_my_datasource": "Synapse.Data_Platform"}
    graceful_exceptions: true
When I run my checkpoint I get:
* Using v3 (Batch Request) API
* Calculating Metrics: 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:06<00:00, 3.92it/s]
* Finding datasets being validated
* GE expectation_suite_name - name_of_my_suite, expectation_type - expect_table_columns_to_match_set, Assertion URN - urn:li:assertion:bebcf033345475640f99b73e34873ad1
* GE expectation_suite_name - name_of_my_suite, expectation_type - expect_column_values_to_not_be_null, Assertion URN - urn:li:assertion:da4173832d02233a39d60186a6deed59
* GE expectation_suite_name - name_of_my_suite, expectation_type - expect_column_values_to_not_be_null, Assertion URN - urn:li:assertion:a64d6234eb91f4e8cdfd250584032c24
* GE expectation_suite_name - name_of_my_suite, expectation_type - expect_column_values_to_be_between, Assertion URN - urn:li:assertion:158ae3d98a2e227e126a226746524418
* Sending metadata to datahub ...
* Dataset URN - urn:li:dataset:(urn:li:dataPlatform:mssql,Synapse.Data_Platform..name_container.name_dataset,TEST)
* Assertion URN - urn:li:assertion:bebcf033345475640f99b73e34873ad1
* Assertion URN - urn:li:assertion:da4173832d02233a39d60186a6deed59
* Assertion URN - urn:li:assertion:a64d6234eb91f4e8cdfd250584032c24
* Assertion URN - urn:li:assertion:158ae3d98a2e227e126a226746524418
* Metadata sent to datahub.
* Validation succeeded!
* Suite Name Status Expectations met
* - name_of_my_suite ✔ Passed 4 of 4 (100.0 %)
The Dataset URN contains two consecutive dots where there should be one, judging by the URL of the dataset page where I expect to see the Validation tab.
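For anyone comparing URNs by eye, here is a minimal Python sketch that splits the dataset name out of a logged URN and flags empty segments such as the one behind the double dot above. The helpers `dataset_name_parts` and `has_empty_segment` are hypothetical names made up for this post, not part of the DataHub or Great Expectations APIs, and the pattern assumes the platform and env fields contain no commas.

```python
import re
from typing import List, Tuple

# Assumed shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,()]+)\)$"
)


def dataset_name_parts(urn: str) -> Tuple[str, List[str], str]:
    """Return (platform, dot-separated name segments, env) for a dataset URN."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    platform, name, env = match.groups()
    return platform, name.split("."), env


def has_empty_segment(urn: str) -> bool:
    """True when the dataset name contains an empty segment (e.g. '..')."""
    _, parts, _ = dataset_name_parts(urn)
    return any(part == "" for part in parts)


# URN copied from the debug output above.
logged = (
    "urn:li:dataset:(urn:li:dataPlatform:mssql,"
    "Synapse.Data_Platform..name_container.name_dataset,TEST)"
)
print(dataset_name_parts(logged)[1])
print(has_empty_segment(logged))
```

An empty segment suggests that one component of the name (for example a database or instance value) reached the action as an empty string. Which component it is depends on the datasource and the action arguments, so treat this only as a hint about where to look.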
Solution found.
After activating DATAHUB_DEBUG, I could see which Dataset URN was used to send the metadata.
This allowed me to correct the following arguments in the action:
platform_instance_map
platform_alias
env
By the way, the token should be in single quotes (' ') and should not include the "Authorization: Bearer" prefix.
If you still run into issues, make sure that every yml file has the right GE version (0.15.50).
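In case it helps others reproduce the debugging step, this is a rough Python sketch of running a checkpoint with DATAHUB_DEBUG set for that one process. It assumes the `great_expectations` CLI is on the PATH and that the variable only needs to be present in the environment. The function names and `my_checkpoint` are placeholders made up for this example.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def debug_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and switch on DataHub debug logging."""
    env = dict(os.environ if base is None else base)
    env["DATAHUB_DEBUG"] = "true"
    return env


def checkpoint_command(checkpoint_name: str) -> List[str]:
    """Build the CLI call that runs a named checkpoint."""
    return ["great_expectations", "checkpoint", "run", checkpoint_name]


def run_checkpoint_with_debug(checkpoint_name: str) -> int:
    """Run the checkpoint with DATAHUB_DEBUG set; return the exit code."""
    result = subprocess.run(
        checkpoint_command(checkpoint_name),
        env=debug_env(),
        check=False,  # leave handling of a failed validation to the caller
    )
    return result.returncode


# Usage (placeholder checkpoint name):
# run_checkpoint_with_debug("my_checkpoint")
```

With the variable set, the "Dataset URN" line in the output can be compared against the URN in the URL of the dataset page in DataHub.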