Hello, I am looking for guidance on how to set up a GE environment for a team of 10 developers who will create expectation suites together. My first idea was to create one shared Docker container that would serve as a common development environment for everyone. But I can't imagine how it would be maintained or how the image would be prepared, considering that multiple users could be working at the same time. I suspect that working simultaneously in one container would be problematic, because people would interfere with each other's work. Another issue is that GE needs a command line for its commands, so everyone would have to enter the container with docker exec … to get one. I don't think this is the best approach.
Another option:
Prepare a Docker image with all the required setup (connections to our DBs, etc.). Each developer would run it locally and add their personal configuration, mainly SSH keys for the Git repository.
Another idea:
Local GE installation for each developer:
each developer would install GE locally
before starting development, they would fetch the latest changes from Git
do the development locally
push the changes to Git
Last idea:
Have an EC2 machine with GE installed, where every developer would have access and create their own working folder, and again:
before starting development, they would fetch the latest changes from Git
do the development in their working folder
push the changes to Git
Can you suggest the best solution for our scenario? We would be running GE in a multi-user team where some developers work on one DB and another group of developers works on a different DB.
If you are a project team, the second approach is more natural. Expectations are just JSON files, so you can easily manage them with Git.
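To illustrate why JSON suites work well with Git, here is a minimal sketch that writes a suite with sorted keys and fixed indentation, so that two developers saving the same suite produce identical files and diffs stay small. The suite name, column name, and the helpers `save_suite` and `load_suite` are hypothetical, and the dictionary is only a simplified picture of what GE stores in its expectations folder.

```python
import json
from pathlib import Path


def save_suite(suite: dict, path: Path) -> None:
    """Write a suite as stable JSON: sorted keys, 2-space indent, trailing newline."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(suite, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")


def load_suite(path: Path) -> dict:
    """Read a suite back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical example suite (simplified; names are illustrative only).
suite = {
    "expectation_suite_name": "orders.warning",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "order_id"},
            "meta": {},
        }
    ],
    "meta": {},
}
```

With a stable serialization like this, a change to one expectation shows up in a pull request as a change to a few lines instead of a rewrite of the whole file.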
A local install should be easy; you just need to figure out how to connect it to your data. If it is a database or a Spark cluster, that should be relatively straightforward.
If it is a production workflow, you can then have a central expectation store and run all the checks daily.
@tomas Your options 1 and 3 are super interesting and they spark some ideas for improvements, but I agree with @nok - option 2 (a local GE installation for every engineer and synchronization through Git) is the most common solution for a team.
Since you said that the collaborators are developers, my reading is that creating a Python environment and installing GE in it is a reasonable ask for them. This becomes more challenging when the collaborators are analysts who can author Expectations in Jupyter notebooks, but prefer not to deal with installation.
Yes, we are a project team of developers, so I agree that installation is a reasonable task. I am also thinking about preparing a Docker image for the team with Python and GE installed and offering it as a secondary option. People could choose based on personal preference.
I have one personal issue with Docker, though. I understand a container as an isolated, preconfigured, standalone environment, so I do not feel it is the right or best approach to "step into" a running container and do development inside it. And as I wrote in my first post, using Docker for GE development currently means you have to go inside and work there. I am not saying it is impossible (I have been trying it), but I believe it is not a best practice. Is my assumption right?
To answer the other questions: we are going to use GE with our Snowflake databases, so, as you already mentioned, there is no blocker for local installation.
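For a local setup against Snowflake, one way to keep personal credentials out of the shared Git repository is to build the connection string from environment variables on each developer's machine. The sketch below assumes the URL layout used by the Snowflake SQLAlchemy dialect; the environment variable names and the helper `snowflake_url` are hypothetical and only for illustration.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def snowflake_url(env: Mapping[str, str] = os.environ) -> str:
    """Build a SQLAlchemy-style Snowflake URL from (hypothetical) environment variables."""
    # User and password are percent-encoded so special characters do not break the URL.
    user = quote_plus(env["SNOWFLAKE_USER"])
    password = quote_plus(env["SNOWFLAKE_PASSWORD"])
    account = env["SNOWFLAKE_ACCOUNT"]
    database = env["SNOWFLAKE_DATABASE"]
    schema = env.get("SNOWFLAKE_SCHEMA", "PUBLIC")  # assumed default schema
    warehouse = quote_plus(env["SNOWFLAKE_WAREHOUSE"])
    return (
        f"snowflake://{user}:{password}@{account}/"
        f"{database}/{schema}?warehouse={warehouse}"
    )
```

Because the two groups of developers work against different databases, each group would only need to set a different `SNOWFLAKE_DATABASE` value while sharing the same repository.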
For the production workload we plan to use GitHub Actions at the beginning, just to simplify the setup. The jobs will be scheduled to run at regular intervals. Since we plan to store the GE code in Git, I hope this will work smoothly, but I need to verify that. In later phases we will probably orchestrate GE jobs in Airflow and possibly link them with our data pipelines.
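A scheduled CI job is marked as failed when its process exits with a non-zero code, so one piece of the plan above could be a small helper that turns validation results into an exit code. This is only a sketch: it assumes each validation result has been serialized to a dictionary with a boolean `success` field, and the suite names and the helpers `failed_suites` and `exit_code` are hypothetical. How the results are produced is left out.

```python
from typing import Dict, List


def failed_suites(results: Dict[str, dict]) -> List[str]:
    """Return the sorted names of suites whose result is not successful.

    A result without a "success" field is treated as a failure.
    """
    return sorted(
        name for name, result in results.items()
        if not result.get("success", False)
    )


def exit_code(results: Dict[str, dict]) -> int:
    """Return 0 when every suite passed, 1 otherwise."""
    return 1 if failed_suites(results) else 0
```

The job's entry script could then end with `sys.exit(exit_code(results))`, so that a failed suite shows up as a failed run in the scheduler.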