We’ve been getting some questions about integrating validation results from Great Expectations into existing dashboards, e.g. to see data quality metrics right next to the actual data. Here are some thoughts on how to do this:
Approach 1: Metrics
If you just want some basic stats like the number of tests run and failed, you can quickly tap into metrics by configuring a Metrics Store and a validation action to store metrics; see the docs here.
This will add a table “ge_metrics” to the database you’ve configured, containing metadata about the expectation suite run plus some basic statistics: evaluated_expectations, successful_expectations, unsuccessful_expectations, success_percent. You can then integrate those into your dashboard.
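For example, a dashboard tile showing the worst-performing runs could be fed by a query along these lines. This is just a sketch: the four statistics columns are the ones described above, but the exact shape of ge_metrics can vary between GE versions, so check your actual schema first:

-- The 20 runs with the lowest pass rate.
-- NOTE: assumes the statistics are stored as columns, as described above;
-- verify this against your actual ge_metrics schema (it can vary by GE version).
select
    evaluated_expectations,
    successful_expectations,
    unsuccessful_expectations,
    success_percent
from ge_metrics
order by success_percent asc
limit 20;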
Approach 2: Validation results
If you want to show the type of detailed information you see in Data Docs, such as the expected and observed values, you’ll need to tap into the validation results. You have two options for that, depending on where you store validation results:
- If your validation results store uses the default backend (JSON files), you can parse the JSON files yourself. I’ll sketch the schema below.
- You can also set up a DB backend for a validation store (we currently only support Postgres for that; see the docs). The DB version gives you a few of the fields as structured columns (e.g. the expectation suite name and the run time), but most of the information is still stored as JSON in the “value” column.
From there, you’ll have to unpack the JSON for each validation run, which has the following structure:
- results (array, one element per expectation in the suite)
- success (whether the suite as a whole passed or failed)
- meta (expectation suite name, GE version, batch_kwargs, etc.)
- evaluation_parameters (the evaluation parameters that were used)
- statistics (the same statistics as listed under metrics above)
Here’s a small snippet that shows how you’d parse out the top-level elements from the JSON blob in “value” if the results are stored in a Postgres DB:
-- Cast the raw blob to JSON once, then pull out the top-level elements.
with value_json as (
    select
        expectation_suite_name,
        run_time,
        value::json as value  -- the raw validation result, cast to JSON
    from ge_validations_store
),
value_items as (
    select
        expectation_suite_name,
        run_time,
        value -> 'success' as success,
        value -> 'results' as results_array,
        value -> 'meta' as meta,
        value -> 'evaluation_parameters' as evaluation_parameters,
        value -> 'statistics' as stats
    from value_json
)
select * from value_items;
The main challenge is then unpacking the results array, which contains one element for each expectation in the suite that was used for validation. The schema for each result is listed here, with a query sketch after the list:
- success (whether the expectation passed or failed)
- expectation_config (another JSON object containing the actual expectation name and its parameters; note that the structure differs depending on the expectation type!)
- meta (metadata about this individual result)
- result (contains the observed value and related details)
- exception_info (whether an exception was raised while evaluating the expectation)
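Here’s a rough sketch of how that unpacking could look in Postgres, building on the CTEs above. json_array_elements explodes the array into one row per expectation result; the nested field names follow the schema just listed, but note that observed_value only exists for aggregate-style expectations (map-style expectations report unexpected counts instead), so treat the selected fields as a starting point:

-- One row per expectation result, exploded out of the results array.
with value_json as (
    select
        expectation_suite_name,
        run_time,
        value::json as value
    from ge_validations_store
),
expectation_results as (
    select
        expectation_suite_name,
        run_time,
        -- json_array_elements turns the JSON array into a set of rows
        json_array_elements(value -> 'results') as result
    from value_json
)
select
    expectation_suite_name,
    run_time,
    (result ->> 'success')::boolean as success,
    result -> 'expectation_config' ->> 'expectation_type' as expectation_type,
    -- NOTE: observed_value is only present for aggregate-style expectations;
    -- map-style expectations expose unexpected_count etc. instead
    result -> 'result' ->> 'observed_value' as observed_value,
    result -> 'exception_info' ->> 'raised_exception' as raised_exception
from expectation_results;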
So, long story short: you’ll have to do a fair amount of JSON parsing and unpacking, and you’ll need to take into account the different parameters for the different expectation types. But other than that, getting to the validation results and displaying them in a dashboard or some other kind of visualization is pretty straightforward.