How do I integrate validation results into my dashboard?

We’ve been getting some questions about integrating validation results from Great Expectations into existing dashboards, e.g. to see data quality metrics right next to the actual data. Here are some thoughts on how to do this:

Approach 1: Metrics
If you just want basic stats like the number of expectations evaluated and how many failed, you can quickly tap into metrics by configuring a Metrics Store and a validation action (StoreMetricsAction) to store metrics, see the docs.
This will add a table “ge_metrics” to the database you’ve configured, with metadata about each expectation suite run and some basic statistics: evaluated_expectations, successful_expectations, unsuccessful_expectations, success_percent. You can then integrate those into your dashboard.
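Once those metrics are in the table, feeding a dashboard is just a SQL query away. Here’s a minimal sketch – note that the column names (expectation_suite_identifier, metric_name, etc.) are assumptions based on my setup, so check the actual schema of your ge_metrics table:

select
	expectation_suite_identifier,
	run_time,
	metric_name,	-- hypothetical, e.g. 'statistics.success_percent'
	value
from ge_metrics
where metric_name like 'statistics.%'
order by run_time desc;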

Approach 2: Validation results
If you want to show the type of detailed information you see in Data Docs, such as the expected and observed values, you’ll need to tap into the validation results. You have two options for that, depending on where you store validation results:

  1. If your validation results store uses the default (JSON files), you can parse the JSON files directly. I’ll add a small outline of the schema below.
  2. You can also set up a DB backend for the validation store – we currently only support Postgres for that, see the docs. The DB version gives you a few of the fields as structured columns (e.g. the expectation suite name and the run time), but most of the information is still dumped as JSON into the “value” column.

From there, you’ll have to unpack the JSON for each validation run, which has the following structure:

  • results (array, one element per expectation in the suite)
  • success (whether the suite as a whole passed or failed)
  • meta (expectation suite name, GE version, batch_kwargs, etc.)
  • evaluation_parameters (the eval params that were used)
  • statistics (the same statistics listed under metrics above)

Here’s a small snippet that shows how you’d parse the top-level elements out of the JSON blob in “value” if the results are stored in a Postgres DB:

with value_json as (
	-- cast the raw text in "value" to JSON once, up front
	select
		expectation_suite_name,
		run_time,
		value::json as value
	from ge_validations_store
),
value_items as (
	-- pull out the top-level keys of the validation result document
	select
		expectation_suite_name,
		run_time,
		value -> 'success' as success,
		value -> 'results' as results_array,
		value -> 'meta' as meta,
		value -> 'evaluation_parameters' as evaluation_parameters,
		value -> 'statistics' as stats
	from value_json
)
select * from value_items;
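If all your dashboard needs is a pass rate per run, you can pull it straight out of the statistics object without touching the results array, for example:

-- extract the suite-level pass rate per validation run
select
	expectation_suite_name,
	run_time,
	(value::json -> 'statistics' ->> 'success_percent')::float as success_percent
from ge_validations_store
order by run_time desc;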

The main challenge is then unpacking the results array, which contains one element for each expectation in the suite that was used for validation (see the sketch after this list). The schema for each result is:

  • success (whether the expectation passed or failed)
  • expectation_config (another JSON object, contains the actual expectation name and parameters – the structure here is different depending on the expectation type!)
  • meta (metadata)
  • result (contains the observed value)
  • exception_info (whether an exception was raised)
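To make that concrete, here’s a minimal sketch of the unpacking step in Postgres, building on the CTEs above. json_array_elements turns the results array into one row per expectation; the nested paths used here (expectation_type, observed_value) are what I see in our validation JSON, but double-check them against your own results, since the result object varies by expectation type:

with value_json as (
	select
		expectation_suite_name,
		run_time,
		value::json as value
	from ge_validations_store
),
results as (
	-- one row per expectation result in the suite
	select
		expectation_suite_name,
		run_time,
		json_array_elements(value -> 'results') as result
	from value_json
)
select
	expectation_suite_name,
	run_time,
	result -> 'expectation_config' ->> 'expectation_type' as expectation_type,
	(result ->> 'success')::boolean as success,
	result -> 'result' ->> 'observed_value' as observed_value
from results;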

So, long story short: you’ll have to do a fair amount of JSON parsing and unpacking, and account for the different parameters of the different expectation types. But other than that, getting the validation results into a dashboard or some other kind of visualization is pretty straightforward.


This is interesting, we are currently using the default Data Docs site for monitoring daily jobs. Storing the results in a database and displaying them in a dashboard is probably a cleaner solution.

Just tried the Postgres backend and it works… I feel storing in a database is better than files; I can get rid of the nested folder structure with simple SQL on a table.

I am now getting this error while trying to reproduce this example; the message is not very informative, and I do not know what is going on.

great_expectations.exceptions.ClassInstantiationError: The module “great_expectations.data_context.store” exists; however, the system is unable to create an instance of the class “MetricStore”, searched for inside this module. Please make sure that the class named “MetricStore” is properly defined inside its intended module and declared correctly by the calling entity. This error is unrecoverable.

@sam
Is there any plan to expand support to other DBs? The current default is Postgres.

I tried Oracle, which does not work with the current table creation:

sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00906: missing left parenthesis
[SQL:
CREATE TABLE ge_validations_store (
        expectation_suite_name VARCHAR2 NOT NULL,
        run_name VARCHAR2 NOT NULL,
        run_time VARCHAR2 NOT NULL,
        batch_identifier VARCHAR2 NOT NULL,
        value VARCHAR2,
        PRIMARY KEY (expectation_suite_name, run_name, run_time, batch_identifier)
)

For Oracle, we need to specify a max length for VARCHAR2, which is capped at 4000 bytes and certainly not enough for GE validation result JSON.

Max length:
  • Oracle: 4,000 bytes
  • MySQL: 65,535 bytes
  • Postgres: 1 GB

Even if I could solve the table creation problem, the max length would still be a problem for storing the JSON text.
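One hypothetical way around both issues would be to create the table manually with a CLOB for the value column (CLOB is not subject to the 4,000-byte VARCHAR2 cap), along the lines of the sketch below – but GE doesn’t generate this DDL today, so a custom store backend would have to:

-- hypothetical Oracle DDL, not something GE emits; lengths chosen arbitrarily
CREATE TABLE ge_validations_store (
	expectation_suite_name VARCHAR2(512) NOT NULL,
	run_name VARCHAR2(512) NOT NULL,
	run_time VARCHAR2(64) NOT NULL,
	batch_identifier VARCHAR2(512) NOT NULL,
	value CLOB,  -- avoids the 4,000-byte VARCHAR2 limit
	PRIMARY KEY (expectation_suite_name, run_name, run_time, batch_identifier)
);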

@nok We currently don’t have any plans to add other backends to store validation results, but feel free to give it a shot if you’d like to implement something. FYI, GE also doesn’t officially support Oracle yet for datasources, so this is currently pretty untested territory. And yes, I believe we’ve only implemented validation store support for Postgres because it provides good native JSON support.

Regarding the metrics error above, could you file a GitHub issue if this is still a problem? That seems like a bug.