How to host Data Docs on GCS?

I’m trying to set up GCS hosting for Data Docs. I know there’s a tutorial for AWS. I saw GCS support mentioned a few times in the docs, but couldn’t find a tutorial. Could anyone point me towards a good place to start?

A few things:

  1. You’ll need credentials configured correctly

  2. You’ll need to configure a data docs site as follows in your great_expectations.yml:

    data_docs_sites:
      gs_site:  # the site name is arbitrary
        class_name: SiteBuilder
        store_backend:
          class_name: TupleGCSStoreBackend
          bucket: YOUR_GCP_BUCKET
          project: YOUR_GCP_PROJECT
        site_index_builder:
          class_name: DefaultSiteIndexBuilder

  3. You’ll then probably run into this bug that I’m fixing now:

  4. Then you may notice that some of the links between data docs pages don’t work. I will file this and begin work on these bugs as well.

  5. Once these bugs are worked out I plan on making a “how to” guide in our official docs.

How did you configure credentials?

I presume via gcloud auth on the command line. I’m not sure how permissions are typically saved in GE, but GCP commonly relies on service accounts, which you authenticate as using JSON key files. Service accounts are special accounts designated for programmatic use by a specific task.

So, the nice thing about GE in this case is that it doesn’t even know about your credentials: it relies on the google-cloud library, which handles service account authentication for you. You might need to create a service account, download the key, and set the environment variable like this:

 export GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account-key-sdfjwefsdf.json
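
If you want a quick sanity check before building the docs, something like the sketch below might help. It only confirms that the environment variable points at a readable JSON file that looks like a service account key; it does not contact Google or verify permissions. The helper name `check_service_account_key` is made up for illustration, and the `type` and `client_email` fields are assumed to be present, as they normally are in a downloaded key file.

```python
import json
import os


def check_service_account_key(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the service account email from the key file, or raise ValueError.

    Hypothetical helper: a local file check only, no network calls.
    """
    path = os.environ.get(env_var)
    if not path:
        raise ValueError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise ValueError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Assumption: downloaded service account keys carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key.get("client_email")


if __name__ == "__main__":
    try:
        print("Using service account:", check_service_account_key())
    except ValueError as err:
        print("Credential check failed:", err)
```

If this fails, fix the path or re-download the key before looking at anything on the GE side.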

Side note: the first bug is fixed and merged, and a colleague has fixes for the other bugs that I’m hoping to ship tomorrow!

Alright, I have some good news! In the upcoming 0.10.9 release, which is shipping this morning, GCS Data Docs is verified working, with one small caveat!

The caveat is that if you have a prefix configured, your site will not have the correct URLs, so until this bug is fixed you will need to operate with prefix: "".
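
To spot sites affected by the prefix caveat, a rough sketch like the following could be run over your config. It assumes you have already parsed the `data_docs_sites` section of great_expectations.yml into a plain dict (with whatever YAML library you use) and that the backend settings sit under a `store_backend` key. The function name `sites_with_prefix` and the sample site names are made up for illustration.

```python
def sites_with_prefix(data_docs_sites):
    """Return names of GCS-backed sites whose store backend has a non-empty prefix.

    Hypothetical helper operating on an already-parsed config dict.
    """
    flagged = []
    for name, site in data_docs_sites.items():
        backend = site.get("store_backend", {})
        if backend.get("class_name") != "TupleGCSStoreBackend":
            continue  # only GCS-backed sites are affected by this caveat
        if backend.get("prefix"):  # "" and a missing key both count as no prefix
            flagged.append(name)
    return flagged


# Example input mirroring the YAML structure; all values are placeholders.
example_sites = {
    "gs_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "bucket": "YOUR_GCP_BUCKET",
            "project": "YOUR_GCP_PROJECT",
            "prefix": "",
        },
    },
    "gs_site_with_prefix": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "bucket": "YOUR_GCP_BUCKET",
            "project": "YOUR_GCP_PROJECT",
            "prefix": "docs",
        },
    },
}

if __name__ == "__main__":
    print(sites_with_prefix(example_sites))
```

Any site name this returns is one where the generated URLs may be wrong until the prefix bug is fixed.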

0.10.11 also has related fixes.