How can I deploy GE on Databricks Azure?

Caveat: this path has not been formally tested by anyone at Superconductive, but we have been able to help GE users complete these steps to get GE working on Azure.

  1. Project creation. We recommend that you either:
    a. use `great_expectations init` locally and configure a SparkDFDatasource that reads from a local directory, or
    b. use `DataContext.create()` directly from a notebook.

Option (a) makes it easy to rapidly tweak your configuration and experiment with GE; starting locally is often a useful path. However, you will then need to copy your configuration to Azure Blob Storage yourself.

Option (b) lets you work with Azure Blob Storage directly from the start.
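As a rough illustration of option (b), the sketch below scaffolds a project from a Databricks notebook. The mount point `/mnt/ge-demo` is a hypothetical name; it assumes you have already mounted your Blob container onto DBFS (see step 2).

```python
# Run inside a Databricks notebook. Assumes your Azure Blob Storage
# container is already mounted at the hypothetical path /mnt/ge-demo.
from great_expectations.data_context import DataContext

# Scaffold a new great_expectations/ project directory on the mounted storage.
context = DataContext.create("/dbfs/mnt/ge-demo")
```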

  2. Ensure that you have configured DBFS to work with Azure Blob Storage. The Databricks docs are here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage
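For orientation, mounting a Blob Storage container from a notebook typically looks something like the following; the container, storage account, and secret scope names are placeholders, and the linked docs are the authoritative reference.

```python
# Run once from a Databricks notebook; dbutils is provided by Databricks.
# All angle-bracketed names are placeholders for your own resources.
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/ge-demo",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope>", key="<storage-key>")
    },
)
```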

  3. Load your DataContext from Azure Blob Storage: `context = DataContext("/your/dbfs/project/path")`
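Continuing the hypothetical mount from the earlier sketches, loading the project looks something like this; the path is illustrative and should point at the directory containing your great_expectations.yml:

```python
from great_expectations.data_context import DataContext

# Load the project that DataContext.create() scaffolded on the mount.
context = DataContext("/dbfs/mnt/ge-demo/great_expectations")
```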

At that point, you will be able to use the standard Great Expectations flows, whether from a notebook you originally created locally or directly against your data in a Databricks notebook.
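As a sketch of what that flow might look like with a SparkDFDatasource under the pre-0.13 batch_kwargs API (the datasource name, suite name, and data path below are all hypothetical):

```python
# Illustrative only: validate an in-memory Spark DataFrame against a suite.
df = spark.read.parquet("/mnt/ge-demo/data/events.parquet")

# Create (or overwrite) a suite, then get a validation batch for the DataFrame.
context.create_expectation_suite("events.warning", overwrite_existing=True)
batch = context.get_batch(
    {"datasource": "my_spark_datasource", "dataset": df},
    expectation_suite_name="events.warning",
)

# Expectations run immediately against the batch and accumulate on the suite.
batch.expect_column_values_to_not_be_null("event_id")
batch.save_expectation_suite()
```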
