GX in a Production Setting

Hello all,

I’m a data engineer working on a POC with Great Expectations and need guidance on architecting the project for production use. I’ve read the documentation and completed simple tests in a single .py file, but I’m struggling to find a suitable architecture for production and widespread use.

I’m looking for high-level architectural considerations, best practices, and examples of successful deployments. Specifically, I need advice on structuring the Python project, organizing its modules, and managing the File Data Context so that re-running the project doesn’t fail (e.g., handling data source/asset read errors). For context, I’ll be using GX Core to test tables in my Snowflake database.
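
For illustration, this is roughly the kind of idempotent, get-or-create setup I’m trying to structure properly (a minimal sketch assuming the GX 1.x fluent API; the data source/asset names and the connection string are placeholders, and the exact exception types may differ by GX version):

```python
import great_expectations as gx

# A File Data Context persists configuration under gx/, so re-runs
# will see entities created by earlier runs.
context = gx.get_context(mode="file")

DS_NAME = "snowflake_ds"   # placeholder data source name
ASSET_NAME = "orders"      # placeholder table asset name

# Get-or-create, so a second run doesn't fail on "already exists".
try:
    data_source = context.data_sources.get(DS_NAME)
except LookupError:  # exact exception may vary by GX version
    data_source = context.data_sources.add_snowflake(
        name=DS_NAME,
        # Placeholder; inject real credentials via env vars or a secrets manager.
        connection_string="snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>&role=<role>",
    )

try:
    asset = data_source.get_asset(ASSET_NAME)
except LookupError:
    asset = data_source.add_table_asset(name=ASSET_NAME, table_name="ORDERS")
```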

Any insights or experiences would be greatly appreciated!

Thanks and have a great day!

Hello,

I would refer you to this page in our documentation, which provides an intuitive guided tutorial environment with working examples of GX data validation in a data pipeline. This is the repo featured on the doc page. The tutorials cover both the happy path and pipeline failures when validating data as it is ingested into a database.
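
In the meantime, here is a rough sketch (not taken from the tutorial repo) of how a validation step can gate a pipeline run, assuming a GX 1.x File Data Context in which a Validation Definition named `orders_validation` has already been saved; the name is a placeholder:

```python
import sys

import great_expectations as gx

context = gx.get_context(mode="file")

# Retrieve a previously saved Validation Definition (placeholder name).
validation = context.validation_definitions.get("orders_validation")

result = validation.run()

if result.success:
    print("Validation passed - continue to the next pipeline step")
else:
    # Pipeline-fail path: surface the failures and stop the run so bad
    # data doesn't propagate downstream.
    print(result.describe())
    sys.exit(1)
```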