How to set a dynamic path for the data source?

Hi GE folks :wave:!

In my pipeline, the data source file name changes every day but always follows the same pattern: tableXYZ_YYYYMMDD.csv.

So I need to set the path dynamically. I tried to do this by adding a few lines of code to the top of the suite’s Jupyter notebook.
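Because only the date part of the name changes, the file name for a given day can be built directly from the pattern. A minimal sketch, assuming the daily files live in a single directory such as `../data`; `daily_table_path` is a hypothetical helper, not part of Great Expectations:

```python
import datetime
from pathlib import Path


# Hypothetical helper: builds the expected file name for a given day,
# following the tableXYZ_YYYYMMDD.csv pattern.
def daily_table_path(data_dir, day=None, prefix="tableXYZ"):
    """Return the path of the CSV file for `day` (defaults to today)."""
    if day is None:
        day = datetime.date.today()
    # %Y%m%d renders e.g. 2020-06-15 as "20200615"
    return Path(data_dir) / f"{prefix}_{day.strftime('%Y%m%d')}.csv"


if __name__ == "__main__":
    print(daily_table_path("../data"))
```

This only computes the name; it does not check that the file exists, so a pipeline that runs before the day's file arrives would still need its own existence check.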

This is what I did:

1.) I opened the suite’s notebook via

great_expectations suite edit "Table Checker"

2.) I changed the first chunk of the notebook to

import glob 
import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

context = ge.data_context.DataContext()

expectation_suite_name = "Table Checker"
suite = context.get_expectation_suite(expectation_suite_name)
suite.expectations = []

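# glob returns matches in arbitrary order; YYYYMMDD names sort
# chronologically, so the last sorted match is the newest file.
path_to_csv = sorted(glob.glob("../data/tableXYZ*.csv"))[-1]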

batch_kwargs = {
    "data_asset_name": "Table XYZ",
    "datasource": "Data Provider ZZZ",
    "path": path_to_csv,
}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

3.) I ran the Jupyter notebook (successfully) and saved it.

4.) But when I later reopen the Jupyter notebook via

great_expectations suite edit "Table Checker"

the code from step 2.) has been overwritten.
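The glob call in step 2.) selects a file by its position in a list of matches, which relies on every match following the naming pattern. A slightly stricter sketch parses the date out of each file name and skips anything that does not match `tableXYZ_YYYYMMDD.csv` exactly or does not contain a valid calendar date. `latest_table_file` is a hypothetical helper, not a Great Expectations function:

```python
import datetime
import re
from pathlib import Path

# Matches names like tableXYZ_20200615.csv and captures the date part.
PATTERN = re.compile(r"^tableXYZ_(\d{8})\.csv$")


def latest_table_file(data_dir):
    """Hypothetical helper: return the newest tableXYZ_YYYYMMDD.csv in data_dir.

    Files whose names do not match the pattern, or whose date part is not a
    valid calendar date, are skipped. Raises FileNotFoundError if none match.
    """
    dated = []
    for path in Path(data_dir).iterdir():
        match = PATTERN.match(path.name)
        if not match:
            continue
        try:
            day = datetime.datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # e.g. tableXYZ_20201399.csv is not a real date
        dated.append((day, path))
    if not dated:
        raise FileNotFoundError(f"no tableXYZ_YYYYMMDD.csv file in {data_dir}")
    # max() compares the (date, path) tuples by date first
    return max(dated)[1]
```

In the notebook, `path_to_csv = str(latest_table_file("../data"))` could take the place of the glob line; the result goes into the `path` entry of `batch_kwargs` as before.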


Questions:

  • How can I make sure that changes to the first code cell of the suite’s Jupyter notebook are not overwritten later?
  • Is there a more elegant or sustainable way to set the data source path dynamically?
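One direction for the second question, sketched under an assumption: some older, batch-kwargs-based releases of the CLI accept a `--batch-kwargs` option on `suite edit` that takes a JSON dictionary, which would let the path be chosen outside the notebook instead of in its regenerated first cell. Whether the installed version supports this option is an assumption to verify with `great_expectations suite edit --help`. The sketch below only builds and prints such a command line; it does not run it:

```python
import json
import shlex
import sys


def suite_edit_command(suite_name, datasource, path):
    """Build (but do not run) a `suite edit` command that passes batch kwargs.

    ASSUMPTION: the installed great_expectations CLI accepts --batch-kwargs
    on `suite edit`; check `great_expectations suite edit --help` first.
    """
    batch_kwargs = {"datasource": datasource, "path": path}
    return [
        "great_expectations", "suite", "edit", suite_name,
        "--batch-kwargs", json.dumps(batch_kwargs),
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python suite_edit_cmd.py <path to the current CSV file>
    cmd = suite_edit_command("Table Checker", "Data Provider ZZZ", sys.argv[1])
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

Printing the command instead of executing it keeps the sketch harmless if the option turns out not to exist; `subprocess.run(cmd)` would be the next step once the option is confirmed.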

Best from Berlin and thx for your great work :bowing_man:,

Guido