Performance difference for "table" vs "query" batch kwargs

sam · August 28, 2020, 5:27pm

This is just an observation I made, which might clarify some things for users (though I'd want to confirm it with the GE core eng team!)

When creating a batch from a SQLAlchemy backend, I can pass in either a table+schema or a SQL query. I believe the SQL query creates a temp table behind the scenes, whereas the table option doesn’t, which may slow things down a little when creating expectations. See the attached screenshot for my timing tests. The input here was:

postgres RDS instance
table with ~200k rows
ran timeit multiple times and the trend seems the same
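
For anyone who wants to try a similar comparison without a database server, here is a minimal sketch of the mechanism I am assuming. It does not use Great Expectations at all: it uses the standard-library sqlite3 module in place of Postgres, and the table name `events`, the temp table name `ge_tmp`, and both helper functions are made up for illustration. The "query" path materializes the query result into a temp table before running the check, which is my guess at what happens behind the scenes.

```python
import sqlite3
import timeit


def make_connection(n_rows=20_000):
    """Build an in-memory SQLite database with one populated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO events (value) VALUES (?)",
        ((float(i % 97),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def count_from_table(conn):
    # "table" style: the check runs directly against the existing table.
    sql = "SELECT COUNT(*) FROM events WHERE value IS NOT NULL"
    return conn.execute(sql).fetchone()[0]


def count_from_query(conn, query="SELECT * FROM events"):
    # "query" style (assumed mechanism): copy the query result into a
    # temp table first, then run the same check against that copy.
    conn.execute("DROP TABLE IF EXISTS temp.ge_tmp")
    conn.execute(f"CREATE TEMP TABLE ge_tmp AS {query}")
    sql = "SELECT COUNT(*) FROM ge_tmp WHERE value IS NOT NULL"
    return conn.execute(sql).fetchone()[0]


conn = make_connection()
table_time = timeit.timeit(lambda: count_from_table(conn), number=5)
query_time = timeit.timeit(lambda: count_from_query(conn), number=5)
print(f"table: {table_time:.4f}s  query via temp table: {query_time:.4f}s")
```

Both paths return the same count; only the extra copy step differs. Absolute numbers from SQLite in memory will not match a Postgres RDS instance, so treat this as a way to see the shape of the overhead, not its size.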

I think this implies that in practice, even if I selected a whole table with “select *” when creating an expectation suite, I should probably switch the batch kwargs to use a table+schema instead?
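
To make the switch concrete, here is a rough sketch of the two batch kwargs shapes I am comparing. The keys `datasource`, `table`, `schema`, and `query` are the ones I believe the SQLAlchemy datasource reads; the helper functions, the datasource name `my_postgres`, and the table name `my_table` are hypothetical and only here for illustration.

```python
def table_batch_kwargs(datasource, table, schema=None):
    """Batch kwargs that point at an existing table (no custom SQL)."""
    kwargs = {"datasource": datasource, "table": table}
    if schema is not None:
        kwargs["schema"] = schema
    return kwargs


def query_batch_kwargs(datasource, query):
    """Batch kwargs that wrap an arbitrary SQL query."""
    return {"datasource": datasource, "query": query}


# The query form, as it might look after selecting a whole table.
query_kwargs = query_batch_kwargs("my_postgres", "SELECT * FROM public.my_table")

# The equivalent table+schema form that I am suggesting switching to.
table_kwargs = table_batch_kwargs("my_postgres", "my_table", schema="public")

print(query_kwargs)
print(table_kwargs)
```

As far as I understand, either dict would then be passed as the batch kwargs argument to `context.get_batch(...)` together with the expectation suite name. The swap only makes sense when the query is a plain "select *" on a single table with no filtering or joins.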

Are my assumptions here correct?
