Hello,
I’m trying to create a datasource for a Delta table. I’m working entirely on Databricks (so I’m not running great_expectations locally).
from great_expectations.data_context.types.base import DatasourceConfig

my_spark_datasource_config = DatasourceConfig(
    class_name="SparkDFDatasource",
    batch_kwargs_generators={
        "subdir_reader": {
            "class_name": "DatabricksTableBatchKwargsGenerator",
            "base_directory": "delta_table_name or path",
            "reader_method": "delta",
        }
    },
)
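
I then register it with the context along these lines (I adapted this from the “How to instantiate a Data Context on Databricks” guide, so I’m not sure datasourceConfigSchema is even the right way to serialize this config):

from great_expectations.data_context.types.base import datasourceConfigSchema

# Serialize the DatasourceConfig and attach it to the data context
context.add_datasource(
    name="my_spark_datasource",
    **datasourceConfigSchema.dump(my_spark_datasource_config),
)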
But I get the feeling that this is not the way to do it? Adding the directory doesn’t feel very useful (and adding the path isn’t useful for Delta Lake, since the directory just contains snappy.parquet files…). I could leave it blank (is this bad practice?), select the rows I want from the Delta table into a dataframe, and test the dataframe without ever pointing GE at the Delta table, something like the sketch below. But that defeats the purpose of a datasource in the first place, no?
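
To be concrete, this is roughly what I mean by the dataframe-only approach (using the legacy SparkDFDataset API; the table, filter, and column names are just placeholders):

from great_expectations.dataset import SparkDFDataset

# Read the Delta table straight into a Spark dataframe
# ("my_db.my_delta_table" is a placeholder name)
df = spark.table("my_db.my_delta_table").where("event_date = '2021-01-01'")

# Wrap the dataframe and run expectations on it directly,
# without a datasource pointing at the Delta table
ge_df = SparkDFDataset(df)
ge_df.expect_column_values_to_not_be_null("id")
results = ge_df.validate()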
Sorry, I’m a noob at using GE and I’m a bit lost reading the documentation.