Batch Requests converted to non-optimized queries in BigQuery (GoogleSQL)

Good morning GX Support,

I mapped a BigQuery dataset to a GX Datasource and then I mapped a BigQuery table to a GX Data Asset. Afterward, I added a splitter (yearly, monthly, and daily) to the Data Asset so that I can control what data to fetch via GX Batch Requests based on the interval of the Airflow DAG I’m running.

The problem is that the final request inferred by GX and executed on BigQuery is not optimized to leverage pruning. In a nutshell, I added the splitter by defining as column_name the partition column of the table I’m working with. By doing so, I expect to get a final query that can leverage pruning and load the partition I need only (not the entire table every time) in order to not waste resources. However, the way GX translates the Batch Request to the GoogleSQL dialect is something like this:

CREATE OR REPLACE TABLE `<GX TEMP ID>`
	OPTIONS(
		expiration_timestamp=TIMESTAMP_ADD(
		CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
	)
	AS SELECT * 
FROM `<NAME OF THE TABLE>` 
WHERE EXTRACT(year FROM `<PARTITION COLUMN>`) = <YEAR> AND EXTRACT(month FROM `<PARTITION COLUMN>`) = <MONTH> AND EXTRACT(day FROM `<PARTITION COLUMN>`) = <DAY>

instead of something like this:

CREATE OR REPLACE TABLE `<GX TEMP ID>`
	OPTIONS(
		expiration_timestamp=TIMESTAMP_ADD(
		CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
	)
	AS SELECT * 
FROM `<NAME OF THE TABLE>` 
WHERE TIMESTAMP_TRUNC(<PARTITION COLUMN>, DAY) = TIMESTAMP("<YEAR>-<MONTH>-<DAY>")

In my specific case, the first query processes 33GB while the second would process 130MB.

Is there any chance this is something you’re working on? In case you aren’t, is there another way to address this problem? I want to add some quality checks within my ETL pipeline on the new batch I’m processing without wasting resources (especially considering I’m working with very big tables sometimes).

Thanks in advance.

Hey @lmandruzzato! Thanks for reaching out. Would you be willing to open a GitHub Issue detailing this functionality you’re looking for? That will help us to better review the behavior here.

To keep track of the issue and an eventual solution, I leave here the GitHub Issue link: Optimize BigQuery Batch Requests to fully leverage pruning · Issue #8518 · great-expectations/great_expectations · GitHub.