We are running a cron job for our ML model in a Docker container. Every time it runs, the Data Docs are generated and pushed to a GCS bucket.
Does this mean the docs are overwritten on each run, or are the existing docs just updated? If they are overwritten, is there a way to prevent that?
We also push the Data Docs when we run locally (develop), so every time I run the checkpoint the docs get pushed. But when someone else runs it locally, they overwrite them as well. We could sync the docs using gsutil rsync, but this seems a bit hacky. Is there another way to do this?
If you are calling a Validation Operator (or a Checkpoint) to validate your new batch of data, Data Docs are not overwritten. Instead, the UpdateDataDocsAction inside the Validation Operator generates HTML files for the new validation results and rebuilds index.html to add links to the new results.
Data Docs are rebuilt only when you run the CLI command great_expectations docs build or invoke the build_data_docs method on your DataContext in Python.
To the second part of the question: if I understand it correctly, you do not want the shared Data Docs site that is hosted on GCS to be updated when team members run validation locally on their dev machines (please correct me if this is a misunderstanding). You can configure this by having two Validation Operators in your Data Context, one for the “production” environment and the other for the dev environment. The difference between the operators’ configurations is the site_names property set on the UpdateDataDocsAction action. In the production operator you set it to the name of the GCS site, and in the dev operator to the “local” one.
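As a rough sketch of that setup, here are the two operator configurations written as Python dicts that mirror what would go under validation_operators in great_expectations.yml. The operator names (prod_operator, dev_operator), the site names (gcs_site, local_site) and the make_operator helper are placeholders for illustration, not names from your project; the site names need to match whatever is defined under data_docs_sites in your own config.

```python
def make_operator(site_names):
    """Build an ActionListValidationOperator config (hypothetical helper).

    The UpdateDataDocsAction is restricted to the Data Docs sites listed
    in site_names, so other configured sites are left untouched.
    """
    return {
        "class_name": "ActionListValidationOperator",
        "action_list": [
            {
                "name": "store_validation_result",
                "action": {"class_name": "StoreValidationResultAction"},
            },
            {
                "name": "update_data_docs",
                "action": {
                    "class_name": "UpdateDataDocsAction",
                    # Only these sites get new validation pages and a new index.
                    "site_names": list(site_names),
                },
            },
        ],
    }


# Assumed names: "gcs_site" is the shared site on GCS, "local_site" is the
# filesystem site on each developer's machine.
validation_operators = {
    "prod_operator": make_operator(["gcs_site"]),
    "dev_operator": make_operator(["local_site"]),
}


def sites_updated_by(operator_name):
    """Return the Data Docs sites a given operator would update."""
    actions = validation_operators[operator_name]["action_list"]
    for entry in actions:
        if entry["action"]["class_name"] == "UpdateDataDocsAction":
            return entry["action"]["site_names"]
    return []
```

The cron job would then run validation with prod_operator, and local runs would use dev_operator, so only the cron job touches the shared GCS site.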
To the second part of the question, if I understand it correctly, you do not want the shared Data Docs site that is hosted on GCS to be updated when team members run validation locally on their dev machines
I want everyone on the team to run GE locally and have their Data Docs pushed under a prefix named develop in the GCS bucket, so that I can see everyone's Data Docs. We do have a separate prefix for prod.