I have Airflow DAGs set up to run Great Expectations’ checkpoints.yml alongside corresponding expectations.json files. These DAGs work well for full Data Quality tests.
Now I need a DAG that can be triggered with a configuration, such as {"load_start_date": "2023-07-01"}, for incremental DQ tests. For full tests, I intend to trigger it without any configuration.
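To make the intended behaviour concrete, here is a minimal sketch of the lookup I am after. The helper name resolve_load_start_date and the constant DEFAULT_LOAD_START_DATE are hypothetical and only used for illustration; they are not part of Airflow or Great Expectations.

```python
from typing import Optional

# Hypothetical default that makes the checkpoint query cover the full table.
DEFAULT_LOAD_START_DATE = "1900-01-01"


def resolve_load_start_date(conf: Optional[dict]) -> str:
    """Return the start date from the trigger configuration, or the full-run default."""
    # A run triggered without configuration may carry None or an empty dict.
    return (conf or {}).get("load_start_date", DEFAULT_LOAD_START_DATE)


# Incremental run: the configured date is used.
incremental = resolve_load_start_date({"load_start_date": "2023-07-01"})
# Full run: no configuration, so the default applies.
full = resolve_load_start_date(None)
print(incremental, full)
```

In the DAG this value would have to be resolved while the task runs, since the trigger configuration does not exist yet when the DAG file is parsed.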
My approach uses an environment variable whose value changes based on the passed configuration. I tried the Jinja templating syntax {{ dag_run.conf.get('key', 'default_value') }} when declaring this environment variable. However, Jinja templates are only rendered in an operator's templated fields at task run time, not in top-level DAG code, so the entire {{ … }} expression is passed to the checkpoint as a literal string. Here are the relevant parts of my DAG code:
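import os
from datetime import datetime

from airflow import DAG
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

# Intended to carry the trigger configuration value into the checkpoint
os.environ["load_start_date"] = "{{ dag_run.conf.get('load_start_date', '1900-01-01') }}"

# DAG definition with GreatExpectationsOperator tasks
with DAG(
    dag_id="ge_dag_name",
    start_date=datetime(2021, 12, 15),
    catchup=False,
    schedule_interval=None,
) as dag:
    task_GE_c_hub_tbl_name = GreatExpectationsOperator(
        task_id="task_GE_c_hub_tbl_name",
        data_context_root_dir=ge_root_dir,  # defined elsewhere in the DAG file
        checkpoint_name="chk_name",
        trigger_rule="all_done",
        fail_task_on_validation_failure=True,
        return_json_dict=True,
    )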
Here is a fragment of my Great Expectations checkpoint:
....
runtime_parameters:
  query: "
    SELECT * FROM tbl_name
    WHERE load_datetime >= $load_start_date :: DATE
    "
...
The value of os.environ["load_start_date"] does reach the checkpoint's SQL, but the Jinja expression is never rendered. Instead of a date such as '2023-07-01' or '1900-01-01', the raw template expression ends up in my SQL query:
WHERE load_datetime >= '{{ dag_run.conf.get('load_start_date', '1900-01-01') }}' :: DATE
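The symptom can be reproduced outside Airflow with a small stand-alone sketch. It assumes that the checkpoint's $load_start_date placeholder is filled by plain string substitution from the environment; string.Template from the standard library is only a stand-in here, not the substitution code Great Expectations actually uses.

```python
import os
from string import Template

# At DAG parse time this is an ordinary string assignment; nothing renders the Jinja.
os.environ["load_start_date"] = "{{ dag_run.conf.get('load_start_date', '1900-01-01') }}"

# Stand-in for the checkpoint query with its $load_start_date placeholder.
query_template = Template(
    "SELECT * FROM tbl_name WHERE load_datetime >= '$load_start_date' :: DATE"
)

# Substituting from the environment copies the unrendered expression into the SQL.
rendered_query = query_template.substitute(
    load_start_date=os.environ["load_start_date"]
)
print(rendered_query)
```

The printed query contains the {{ … }} expression verbatim, which matches what I see in the checkpoint.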
I'm looking for an alternative way to pass the DAG configuration value to my checkpoint.
Any guidance would be appreciated.