How can I deploy GE on Databricks Azure?

Caveat: this path has not been formally tested by anyone at Superconductive, but we have been able to help GE users complete these steps to get GE working on Azure.

  1. Project creation. We recommend that you either:
    a. use `great_expectations init` locally and configure a SparkDFDatasource that reads from a local directory, or
    b. use `DataContext.create()` directly from a notebook.

Option (a) makes it easy to rapidly tweak your configuration and experiment with GE; starting locally is often a useful path. However, you will then need to copy your configuration to Azure Blob Storage yourself.

Option (b) lets you work with Azure Blob Storage directly from the start.
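As a rough illustration of option (b), the sketch below scaffolds a project from a Databricks notebook. The mount point `/mnt/ge-demo` is a hypothetical name; it assumes you have already mounted your Blob container onto DBFS (see step 2).

```python
# Run inside a Databricks notebook. Assumes your Azure Blob Storage
# container is already mounted at the hypothetical path /mnt/ge-demo.
from great_expectations.data_context import DataContext

# Scaffold a new great_expectations/ project directory on the mounted storage.
context = DataContext.create("/dbfs/mnt/ge-demo")
```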

  2. Ensure that you have configured DBFS to work with Azure Blob Storage. The Databricks docs are here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage
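For orientation, mounting a Blob Storage container from a notebook typically looks something like the following; the container, storage account, and secret scope names are placeholders, and the linked docs are the authoritative reference.

```python
# Run once from a Databricks notebook; dbutils is provided by Databricks.
# All angle-bracketed names are placeholders for your own resources.
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/ge-demo",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope>", key="<storage-key>")
    },
)
```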

  3. Load your DataContext from Azure Blob Storage: `context = DataContext("/your/dbfs/project/path")`
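Continuing the hypothetical mount from the earlier sketches, loading the project looks something like this; the path is illustrative and should point at the directory containing your great_expectations.yml:

```python
from great_expectations.data_context import DataContext

# Load the project that DataContext.create() scaffolded on the mount.
context = DataContext("/dbfs/mnt/ge-demo/great_expectations")
```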

At that point, you will be able to use the standard Great Expectations flows, whether from a notebook you originally created locally or directly against your data in a Databricks notebook.
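As a sketch of what that flow might look like with a SparkDFDatasource under the pre-0.13 batch_kwargs API (the datasource name, suite name, and data path below are all hypothetical):

```python
# Illustrative only: validate an in-memory Spark DataFrame against a suite.
df = spark.read.parquet("/mnt/ge-demo/data/events.parquet")

# Create (or overwrite) a suite, then get a validation batch for the DataFrame.
context.create_expectation_suite("events.warning", overwrite_existing=True)
batch = context.get_batch(
    {"datasource": "my_spark_datasource", "dataset": df},
    expectation_suite_name="events.warning",
)

# Expectations run immediately against the batch and accumulate on the suite.
batch.expect_column_values_to_not_be_null("event_id")
batch.save_expectation_suite()
```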
