Caveat: this path has not been tested by anyone at Superconductive, but we’ve been able to help GE users complete these steps to work with GE on Azure.
- Project creation. We recommend that you either:
  a. use `great_expectations init` locally and configure a SparkDFDatasource that reads from a local directory, or
  b. use `DataContext.create()` directly from a notebook.
Option (a) makes it easy to rapidly tweak your configuration and experiment with GE; starting locally is often a useful path. However, it requires you to copy your configuration to Azure Blob Storage yourself afterward.
Option (b) allows you to use Azure Blob Storage directly from the start; see the sketch just below.
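Here is a minimal sketch of option (b), assuming a Databricks notebook with Great Expectations installed; the project path is a placeholder for a DBFS location of your choosing:

```python
# Sketch only: scaffold a Great Expectations project from a notebook.
# The path below is hypothetical; point it at a DBFS directory backed
# by your Azure Blob Storage mount.
from great_expectations.data_context import DataContext

context = DataContext.create("/dbfs/great_expectations/my_project")
```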
- Ensure that you have configured DBFS to work with Azure Blob Storage. The Databricks docs are here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage (one common approach is to mount a Blob Storage container to DBFS, as in the sketch below).
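The following is a sketch of that mount step, following the pattern in the Databricks docs linked above and assuming the storage account key is kept in a Databricks secret scope; the container, storage account, mount point, scope, and key names are all placeholders:

```python
# Hypothetical names throughout; substitute your own container, storage
# account, mount point, and secret scope. Run once per workspace.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/great-expectations",
    extra_configs={
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope-name>", key="<key-name>")
    },
)
```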
- Load your DataContext from Azure Blob Storage (the path below is a placeholder for your project root on DBFS):
from great_expectations.data_context import DataContext
context = DataContext("/your/dbfs/project/path")
At that point, you will be able to use the standard Great Expectations flows, including running a notebook you created locally or working with your data directly from a Databricks notebook, as in the sketch below.
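As one hedged illustration of such a flow, here is a minimal validation sketch assuming a GE v2-style project with a SparkDFDatasource named my_spark_datasource, an existing expectation suite named my_suite, and the default action_list_operator created by `great_expectations init`; all of these names, and the data path, are placeholders:

```python
# Sketch only: validate a Spark DataFrame against an existing suite.
# "my_spark_datasource", "my_suite", and the data path are hypothetical;
# replace them with names from your own great_expectations.yml.
df = spark.read.parquet("/mnt/great-expectations/data/my_table")

batch = context.get_batch(
    batch_kwargs={"datasource": "my_spark_datasource", "dataset": df},
    expectation_suite_name="my_suite",
)
results = context.run_validation_operator(
    "action_list_operator", assets_to_validate=[batch]
)
# Overall pass/fail; the exact result shape can vary across GE releases.
print(results.success)
```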