Struggling to identify correct base directory for add_spark_filesystem

After a flash of inspiration overnight, I thought I should try using a Spark DataFrame, which turns out to work a treat and avoids the need to copy my data onto the Databricks cluster.

Sharing my solution here in the hope it will help others. (I’m very new to Spark and Databricks, which doubtless shows!)

data_frame = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(
        f"abfss://{blob_container_name}@{datavalidation_storage_account_name}"
        f".dfs.core.windows.net/{for_validation_data_folder}/MyFileName.csv"
    )
)
display(data_frame)
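
Note that the abfss:// read only works if the cluster can authenticate to the storage account. Mine was already set up, but as a minimal sketch, assuming account-key access stored in a Databricks secret scope (the scope name "my-secret-scope" and key name "storage-account-key" are hypothetical; service principals or credential passthrough are more common in practice):

# Hypothetical sketch of account-key auth for abfss:// paths.
# The secret scope and key names are placeholders, not from my notebook.
spark.conf.set(
    f"fs.azure.account.key.{datavalidation_storage_account_name}.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-secret-scope", key="storage-account-key"),
)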

import great_expectations as gx

context = gx.get_context()  # Fluent API data context (GX 0.16+)
datasource = context.sources.add_spark("spark_datasource")
asset = datasource.add_dataframe_asset(name="spark_dataframe_asset")
batch_request = asset.build_batch_request(dataframe=data_frame)

# Sanity check: a dataframe asset should resolve to a single batch
batches = asset.get_batch_list_from_batch_request(batch_request)
print(batches)
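
From there the batch request can feed straight into a validator. A minimal follow-on sketch (the suite name "my_suite" and the column "Id" are placeholders, not part of my original notebook):

# Hypothetical continuation: run an expectation against the batch.
# "my_suite" and the column name "Id" are placeholders.
context.add_or_update_expectation_suite("my_suite")
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",
)
validator.expect_column_values_to_be_not_null("Id")
print(validator.validate())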