Hello everyone,
I’m looking for advice and best practices from anyone who has built a data validation pipeline with Great Expectations in an AWS DevOps environment.
My project is a data pipeline that ingests, transforms, and loads data into an Amazon Redshift cluster. Here’s a quick rundown of my setup:
- Data Ingestion: Data is ingested into a staging area in S3 from several sources, including RDS and S3.
- Data Transformation: The data is transformed by AWS Glue jobs.
- Data Loading: The transformed data is loaded into Amazon Redshift.
I want to incorporate Great Expectations to ensure data quality and integrity at several points in this pipeline. Specifically, I want to:
- Validate the raw data as soon as it lands in S3 (see the first sketch after this list).
- Validate the transformed data before it is loaded into Redshift (second sketch).
- Set up ongoing monitoring and validation of the data already in Redshift (third sketch).
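For context, here’s roughly what I have in mind for the first point. It’s only a minimal sketch using the legacy pandas-backed Great Expectations API (`ge.from_pandas`); newer releases (0.13+/1.x) replace this with Data Contexts and Checkpoints. The bucket, key, and column names are placeholders:

```python
import io
import sys

import boto3
import pandas as pd
import great_expectations as ge

# Pull the freshly ingested raw file from the S3 staging area
# (bucket and key are placeholders).
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-staging-bucket", Key="raw/orders.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Wrap the DataFrame so expectation methods become available.
gdf = ge.from_pandas(df)

# Basic structural checks on the raw extract (placeholder columns).
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_unique("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0)

# validate() re-runs every expectation declared above and returns
# an overall success flag plus per-expectation results.
results = gdf.validate()
if not results["success"]:
    print(results)
    sys.exit(1)  # non-zero exit fails the surrounding pipeline stage
```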
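For the second point, my current thinking is to run the checks inside the Glue job itself, right before the load. Again just a sketch: it assumes the legacy `SparkDFDataset` wrapper and that the `great_expectations` package is shipped to Glue via `--additional-python-modules` (both assumptions on my part); the path and column names are placeholders:

```python
from pyspark.sql import SparkSession
from great_expectations.dataset import SparkDFDataset

spark = SparkSession.builder.getOrCreate()

# Transformed output written by the earlier Glue step (placeholder path).
df = spark.read.parquet("s3://my-staging-bucket/transformed/orders/")

gdf = SparkDFDataset(df)
gdf.expect_table_row_count_to_be_between(min_value=1)
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_of_type("amount", "DoubleType")

results = gdf.validate()
if not results["success"]:
    # Fail the Glue job so the COPY into Redshift never runs.
    raise RuntimeError(f"Validation failed: {results['statistics']}")
```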
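For the third point, a scheduled job (say, an EventBridge rule triggering a Lambda or a small CodeBuild project) could run checks directly against Redshift. This sketch assumes the legacy `SqlAlchemyDataset` plus the `sqlalchemy-redshift` dialect and `psycopg2` driver; the connection string and table name are placeholders:

```python
from sqlalchemy import create_engine
from great_expectations.dataset import SqlAlchemyDataset

# Placeholder connection string; in practice I'd pull credentials
# from Secrets Manager rather than hard-coding them.
engine = create_engine(
    "redshift+psycopg2://user:password@my-cluster.example."
    "us-east-1.redshift.amazonaws.com:5439/analytics"
)

# Wrap an existing Redshift table so expectations run as SQL.
table = SqlAlchemyDataset(table_name="orders", engine=engine, schema="public")

table.expect_table_row_count_to_be_between(min_value=1)
table.expect_column_values_to_not_be_null("order_id")

results = table.validate()
print(results["statistics"])
```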
I’ve also looked at this for reference: DevOps pipeline example - AWS CodePipeline, but it doesn’t clarify exactly what I need for data validation.
Here are my questions:
What are the best practices for integrating Great Expectations into an AWS DevOps workflow?
How can I automate the validation steps using AWS CodePipeline and CodeBuild? (My current idea for the CodeBuild step is sketched below.)
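To make the second question concrete, here’s the kind of entrypoint I imagine CodeBuild running (e.g. `python run_validation.py` in the buildspec’s build phase), so a failed validation fails the build and halts the pipeline. It assumes a Great Expectations project with a configured Checkpoint, using the v3-style `get_context`/`run_checkpoint` API; the checkpoint name is a placeholder:

```python
import sys

import great_expectations as ge

# Load the GE project config checked into the repo alongside the code.
context = ge.get_context()

# Run a pre-configured Checkpoint (placeholder name).
result = context.run_checkpoint(checkpoint_name="staging_validation_checkpoint")

if not result.success:
    print("Data validation failed; failing this CodeBuild stage.")
    sys.exit(1)  # CodeBuild marks the build failed and CodePipeline stops

print("Data validation passed.")
```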
Input from anyone who has overcome similar obstacles, or suggestions for improving this setup, would be greatly appreciated.
Thanks in advance for your guidance and support!