Using ADLS instead of DBFS in Azure Databricks for all GX artefacts, especially data docs

Hi @hdamczy, apologies for the delay.

I dug into this some more and was able to get the tutorial writing to my directory of choice on Databricks - I think this is what you’re looking for. There are two things I had to adjust to get it working.

  1. Set the context project root directory. The project_dir here can be a little tricky: if you’re using a DBFS directory (one you might otherwise access via dbutils.fs.ls("/tmp/discourse1427")), make sure you begin the path with /dbfs, as shown below along with a quick sanity check. If the directory you’re using is not on DBFS but is available at some other path on the Databricks machine you’re running the code on, just use the full path name.
import great_expectations as gx

project_dir = "/dbfs/tmp/discourse1427/"

context = gx.get_context(project_root_dir=project_dir)
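As a quick sanity check (assuming the same /tmp/discourse1427 directory as above), you can create the DBFS directory and confirm that the driver sees it at the /dbfs FUSE path - the path you hand to GX is just the dbutils path with /dbfs prepended:

# Create the DBFS directory if it doesn't exist yet (dbutils is available
# in Databricks notebooks without an import).
dbutils.fs.mkdirs("/tmp/discourse1427")

# The driver's local filesystem exposes DBFS under /dbfs, so the same
# directory should be visible at the FUSE path GX will use.
import os
print(os.path.isdir("/dbfs/tmp/discourse1427"))  # True if the mapping works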
  2. The Data Docs config automatically creates the default local_site site with a temp directory path - this is what you’re seeing in the most recent output you included. Instead, create a new site config with your desired file path in the base_directory and remove the local_site, like this (a quick verification sketch follows the snippet):
context.add_data_docs_site(
    site_config={
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "/dbfs/tmp/discourse1427/data_docs",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    },
    site_name="my_new_data_docs_site",
)

context.delete_data_docs_site(site_name="local_site")

print(f"context_root_dir: {context_root_dir}")
print(f"context: {context}")

After I made these changes, I was able to run through the tutorial and verify that Data Docs output was written to the folder I specified in the site config base_directory.
