Modern Production stack for Great Expectations

Hello, community.
I'd like to raise the topic of a modern stack for running GX in production.
As I see it, for production use:

  • A fluent data source is better than a block config data source
  • Checkpoint is better than a validator
  • Checkpoint is better than SimpleCheckpoint
  • A Data Assistant is better than a rule-based profiler

Thoughts?

Hi Aleksei,

Thanks for raising such an important topic. To answer that question, I think it would be quite important to map these concepts to the related documentation and the steps along the GX workflow. I have to review the new updates to the docs and I’ll share any insights in this thread.

@CesarGarcia thank you! Yes, it would be great to point out best practices in the GX docs. But before that, they need approval from the community and GX.

One more point, about expectations: storing expectations as JSON is better than defining them in code.
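
To make the JSON point concrete, here is a minimal sketch using only the Python standard library. The suite layout (`expectation_suite_name`, `expectations`, `expectation_type`, `kwargs`) follows what GX 0.x writes to its expectations store, as far as I know; the suite name, the column names, and the `summarize_suite` helper are made up for illustration.

```python
import json

# Hypothetical suite. The keys mirror the JSON layout used by GX 0.x
# expectation stores (assumption); the names and values are invented.
SUITE_JSON = """
{
  "expectation_suite_name": "orders.warning",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {"column": "order_id"},
      "meta": {}
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {"column": "amount", "min_value": 0, "max_value": 10000},
      "meta": {}
    }
  ],
  "meta": {}
}
"""


def summarize_suite(raw: str) -> list[tuple[str, str]]:
    """Return (expectation_type, column) pairs from a suite JSON string."""
    suite = json.loads(raw)
    return [
        (exp["expectation_type"], exp["kwargs"].get("column", ""))
        for exp in suite["expectations"]
    ]


summary = summarize_suite(SUITE_JSON)
for expectation_type, column in summary:
    print(f"{column}: {expectation_type}")
```

Because the suite is plain data, it can be diffed in code review, linted, and shipped separately from the code that runs it.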

Any news on this topic? We are currently exploring the possibility of deploying and running expectations on Kubernetes. The setup we have in mind would be:

  • Build Docker image with dependencies
  • Store great_expectations.yml as a ConfigMap and use Kustomize to manage environments
  • Use Airflow to trigger the checkpoint run
  • Use a sidecar pod with access to the expectation suites

We are currently struggling with how to integrate this process into CI/CD, and with what the development process will look like. Any ideas?
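
One possible starting point for the CI side, as a sketch only: a check that runs before the image is built and fails if any expectation suite file is broken. It assumes the suites are stored as JSON files in a directory in the repository and that each suite carries the `expectation_suite_name` and `expectations` keys; the `check_suites` helper and the file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Keys every suite file is assumed to carry (GX 0.x JSON layout, assumption).
REQUIRED_KEYS = {"expectation_suite_name", "expectations"}


def check_suites(suite_dir: Path) -> list[str]:
    """Return human-readable problems found in the *.json suites under suite_dir."""
    problems = []
    for path in sorted(suite_dir.rglob("*.json")):
        try:
            suite = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(suite, dict):
            problems.append(f"{path.name}: top level is not a JSON object")
            continue
        missing = REQUIRED_KEYS - suite.keys()
        if missing:
            problems.append(f"{path.name}: missing keys {sorted(missing)}")
    return problems


# Small demonstration with one valid and one incomplete suite file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.json").write_text(
        json.dumps({"expectation_suite_name": "good", "expectations": []}),
        encoding="utf-8",
    )
    (root / "bad.json").write_text('{"expectations": []}', encoding="utf-8")
    demo_problems = check_suites(root)

print(demo_problems)
```

In a pipeline, the same function could be pointed at the real suite directory and the job failed whenever the returned list is non-empty.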

Is there also a best practice for always running the same actions for all checkpoints?
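
I am not aware of an official recommendation here, but one way to sketch it is to keep the shared actions in a single place and generate every checkpoint configuration from it. The example below only builds plain dictionaries. The key names and the two action class names (`StoreValidationResultAction`, `UpdateDataDocsAction`) follow GX 0.x checkpoint configuration as far as I know; `build_checkpoint_config` and the checkpoint and suite names are hypothetical.

```python
import copy

# Actions every checkpoint should run. The class names exist in GX 0.x;
# the "name" values are free-form labels.
COMMON_ACTIONS = [
    {
        "name": "store_validation_result",
        "action": {"class_name": "StoreValidationResultAction"},
    },
    {
        "name": "update_data_docs",
        "action": {"class_name": "UpdateDataDocsAction"},
    },
]


def build_checkpoint_config(name, suite_name, extra_actions=None):
    """Build a checkpoint config dict that always includes the common actions."""
    return {
        "name": name,
        "config_version": 1.0,
        "class_name": "Checkpoint",
        "expectation_suite_name": suite_name,
        # Deep copy so that editing one checkpoint never mutates the shared list.
        "action_list": copy.deepcopy(COMMON_ACTIONS) + list(extra_actions or []),
    }


orders = build_checkpoint_config("orders_checkpoint", "orders.warning")
customers = build_checkpoint_config("customers_checkpoint", "customers.warning")

print([action["name"] for action in orders["action_list"]])
```

The resulting dictionaries can then be handed to whatever your GX version uses to register checkpoints, or dumped to YAML, so that the action list is defined once and reviewed once.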

Anything else to look out for? Anything missing?