How to configure a Great Expectations deployment for use in production and dev environments

eugene.mandel · December 17, 2020, 4:41pm

We have received this question from multiple users who want to run their GE Data Context (deployment) against both their production environment and their dev/test environment: how should great_expectations.yml be configured to support this?

eugene.mandel · December 21, 2020, 3:52am

Typically teams want to use a Great Expectations Data Context (project) in production and dev environments. We will use “prod” and “dev” labels for brevity.
“prod” is an environment shared by multiple team members where the data is real and validation is crucial for operating the business.
“dev” is a personal environment used by an individual team member to develop, experiment and test.

Which areas of Great Expectations configuration should vary between “prod” and “dev”?

Metadata stores (expectations_store and validations_store)

A “prod” environment stores its validation results and/or expectations in a shared location, such as S3, GCS or a database. A “dev” environment should have its own store, separate from “prod”, at least for validation results (and maybe for expectations).

Datasources
“Prod” and “dev” might need to validate data from different datasources, such as databases, S3 buckets, etc.

Data Docs
The “prod” Data Docs sites are usually deployed on S3, GCS or another service that multiple users can access. “Dev” environments should not update the team’s “prod” sites; their sites are best deployed on each team member’s local drive.

Notifications
Checkpoints/Validation Operators that are used for validating data have configurable lists of actions, each performing an action on the validation result. One of the actions in the chain sends notifications of the validation’s success or failure. The default class is SlackNotificationAction, but other messaging platforms can be used. “Dev” environments should not send alerts and notifications that can be mistaken for production ones.

Recommended Approach: One config file with variable substitution

Have one great_expectations.yml, but parametrize all the config properties whose values are environment dependent. Great Expectations supports ${VAR} variables that are substituted at run time (as shown in this how-to guide).
This feature allows you to supply environment-specific values for these variables from environment variables or from a file.
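As a minimal sketch of the file-based option, assuming the project keeps the default `config_variables_file_path: uncommitted/config_variables.yml` setting in great_expectations.yml, a team member's “dev” copy of that file might look like the following. The variable names are the ones used in the rest of this post; the store names and the webhook placeholder are hypothetical values chosen for illustration.

```yaml
# uncommitted/config_variables.yml ("dev" copy, kept out of version control)
# The store names below are hypothetical; they must match names defined
# under `stores` in great_expectations.yml.
EXPECTATIONS_STORE_NAME: expectations_store_dev
VALIDATIONS_STORE_NAME: validations_store_dev
# local_site is the pre-built filesystem Data Docs site.
ACTIVE_SITE: local_site
# Placeholder: point this at a personal or test channel, not the production one.
validation_notification_slack_webhook: "<dev-slack-webhook-url>"
```

In “prod”, the same variable names can instead be supplied as environment variables by whatever scheduler or container runs the validation.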

In the stores section of the config, define two variations each of the expectations store and the validation results store, giving them distinct names.
Set the default store names from variables:
    expectations_store_name: ${EXPECTATIONS_STORE_NAME}
    validations_store_name: ${VALIDATIONS_STORE_NAME}
Define two Data Docs sites in the data_docs_sites section.
The pre-built local_site is appropriate for “dev”, since it uses the local filesystem.
Add the config of a site to be used for “prod”.
Validation Operators write to the stores, update Data Docs and send notifications. Parametrize the configuration of the actions of Validation Operators as well.

For example:

    ...
    class_name: ActionListValidationOperator
    action_list:
    - name: store_validation_result
      action:
        class_name: StoreValidationResultAction
        target_store_name: ${VALIDATIONS_STORE_NAME}
    - name: update_data_docs
      action:
        class_name: UpdateDataDocsAction
        site_names:
          - ${ACTIVE_SITE}
    - name: send_slack_notification_on_validation_result
      action:
        class_name: SlackNotificationAction
        slack_webhook: ${validation_notification_slack_webhook}
        notify_on: all
        notify_with:
        renderer:
          module_name: great_expectations.render.renderer.slack_renderer
          class_name: SlackRenderer
    ...
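For the variables in the example above to resolve, great_expectations.yml needs matching store and site definitions. The following is a sketch rather than a drop-in config: the names `expectations_store_dev`, `expectations_store_prod`, `validations_store_dev`, `validations_store_prod` and `s3_site`, as well as the bucket placeholders and prefixes, are hypothetical, and S3 is used only as one example of a shared backend.

```yaml
stores:
  # "dev" variants: local filesystem, private to each team member
  expectations_store_dev:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
  validations_store_dev:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
  # "prod" variants: shared location (S3 is an assumption here)
  expectations_store_prod:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: "<your-prod-bucket>"
      prefix: expectations
  validations_store_prod:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: "<your-prod-bucket>"
      prefix: validations

# The active stores are selected per environment through variables.
expectations_store_name: ${EXPECTATIONS_STORE_NAME}
validations_store_name: ${VALIDATIONS_STORE_NAME}

data_docs_sites:
  # Pre-built site on the local filesystem, suitable for "dev"
  local_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
  # Shared site for "prod" (hypothetical name and bucket)
  s3_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: "<your-prod-docs-bucket>"
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
```

With `ACTIVE_SITE` set to `local_site`, the `UpdateDataDocsAction` in a “dev” run rebuilds only the local site and leaves the shared one untouched.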

Alternative approach: separate config files

Some users keep two great_expectations.yml config files: one for “prod” and one for “dev”. Each has its own configuration of Datasources, expectations store, validation results store, Data Docs site and Validation Operator. If you are running GE in Docker, set the file to be used when starting the container. Outside of Docker, use manual (or scripted) renaming to put the right file in place.

Please use comments to ask questions and share your best practices for configuring Great Expectations for “prod” and “dev” environments.

