How to set a dynamic path for the data source?

chamaoskurumi · June 5, 2020, 1:29pm

Hi GE folks!

In my pipeline, the datasource filename changes every day but follows a systematic pattern: tableXYZ_YYYYMMDD.csv.

Hence, I need to set the path dynamically, which I tried to do by adding a few lines of code at the top of the suite’s Jupyter notebook.
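
To make the naming pattern concrete, here is a minimal sketch of how the expected file name for a given day could be built. The function name `expected_filename` and its `prefix` parameter are hypothetical and only serve as illustration; they are not part of Great Expectations.

```python
import datetime


def expected_filename(day: datetime.date, prefix: str = "tableXYZ") -> str:
    """Return the file name expected for `day`, following tableXYZ_YYYYMMDD.csv."""
    # The date format spec renders the day as an eight-digit YYYYMMDD suffix.
    return f"{prefix}_{day:%Y%m%d}.csv"


print(expected_filename(datetime.date(2020, 6, 5)))  # tableXYZ_20200605.csv
```

Passing `datetime.date.today()` would give the name expected for the current run.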

This is what I did:

1.) I opened the suite’s notebook via

great_expectations suite edit "Table Checker"

2.) I changed the first chunk of the notebook to

import glob 
import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

context = ge.data_context.DataContext()

expectation_suite_name = "Table Checker"
suite = context.get_expectation_suite(expectation_suite_name)
suite.expectations = []

# glob returns matches in arbitrary order; sorting puts the newest YYYYMMDD suffix last
path_to_csv = sorted(glob.glob("../data/tableXYZ*.csv"))[-1]

batch_kwargs = {
    "data_asset_name": "Table XYZ",
    "datasource": "Data Provider ZZZ",
    "path": path_to_csv,
}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

3.) I ran the Jupyter notebook (successfully) and saved it.

4.) But when I return to the Jupyter notebook later via

great_expectations suite edit "Table Checker"

the code from step 2.) has been overwritten.

Questions:

How can I make sure that changes to the first code chunk of the suite's Jupyter notebook do not get overwritten later?
Is there a more elegant or sustainable solution for setting the path to the datasource dynamically?
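
As one possible direction for the second question, the path lookup could live in a small helper module outside the generated notebook. The following is only a sketch under assumptions: the helper names `latest_csv` and `build_batch_kwargs` are hypothetical, the date suffix is assumed to always be eight digits, and files that do not match the pattern are simply skipped. The batch_kwargs keys and values are the ones used in step 2.).

```python
import datetime
import glob
import os
import re
from typing import Optional

# Matches the eight-digit date suffix in names such as tableXYZ_20200605.csv.
DATE_SUFFIX = re.compile(r"_(\d{8})\.csv$")


def latest_csv(directory: str, prefix: str = "tableXYZ") -> Optional[str]:
    """Return the matching file with the newest date suffix, or None if there is none."""
    dated = []
    for path in glob.glob(os.path.join(directory, f"{prefix}_*.csv")):
        match = DATE_SUFFIX.search(os.path.basename(path))
        if match is None:
            continue  # name does not end in _YYYYMMDD.csv
        try:
            day = datetime.datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a real date
        dated.append((day, path))
    # Tuples compare by date first, so max() picks the most recent file.
    return max(dated)[1] if dated else None


def build_batch_kwargs(path: str) -> dict:
    """Assemble the batch_kwargs from step 2.) for the given CSV path."""
    return {
        "data_asset_name": "Table XYZ",
        "datasource": "Data Provider ZZZ",
        "path": path,
    }
```

The resulting dictionary could then be passed to `context.get_batch(batch_kwargs, suite)` as in step 2.). Whether calling it from a standalone script avoids the overwrite described in step 4.) is an assumption and has not been confirmed here.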

Best from Berlin, and thanks for your great work,

Guido
