Performance difference for "table" vs "query" batch kwargs

This is just an observation I made, which might clarify some things for users (and which I'd like to confirm with the GE core eng team!)

When creating a batch from a SQLAlchemy backend, I can pass in either a table+schema or a SQL query. I believe the SQL query creates a temp table behind the scenes, whereas the table type doesn’t, which may slow things down a little when creating expectations. See the attached screenshot for my timing tests. The setup was:

  • postgres RDS instance
  • table with ~200k rows
  • ran timeit multiple times and the trend seems the same
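For anyone who wants to reproduce the general effect without my database, here is a minimal sketch of the mechanism I suspect. It is not Great Expectations code and it does not use Postgres: it uses an in-memory SQLite database as a stand-in, and the table name `events`, the temp table name `ge_tmp`, and both function names are made up for illustration. It compares counting rows straight from a table against first materialising a “select *” into a temp table (which is my assumption about what the query path does) and then counting.

```python
import sqlite3
import timeit

N_ROWS = 200_000  # roughly the size of the table in my test

# In-memory SQLite stand-in for the real Postgres table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO events (value) VALUES (?)",
    ((float(i),) for i in range(N_ROWS)),
)
conn.commit()


def count_from_table():
    """Query the existing table directly (my guess at the 'table' path)."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def count_via_temp_table():
    """Copy the query result into a temp table first, then query the copy
    (my guess at the 'query' path; this is an assumption, not confirmed)."""
    conn.execute("DROP TABLE IF EXISTS temp.ge_tmp")
    conn.execute("CREATE TEMPORARY TABLE ge_tmp AS SELECT * FROM events")
    return conn.execute("SELECT COUNT(*) FROM temp.ge_tmp").fetchone()[0]


if __name__ == "__main__":
    # Time a handful of runs of each path; absolute numbers will vary by machine.
    direct = timeit.timeit(count_from_table, number=5)
    via_temp = timeit.timeit(count_via_temp_table, number=5)
    print(f"direct table:   {direct:.4f}s for 5 runs")
    print(f"via temp table: {via_temp:.4f}s for 5 runs")
```

Both functions return the same row count; only the extra copy step differs, so any gap in the timings comes from materialising the temp table.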

I think this implies that in practice, even if I start from a “select *” query on a table when creating an expectation suite, I should probably switch the batch kwargs to use table+schema instead?
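To make the question concrete, these are the two shapes of batch kwargs I am comparing. The datasource name `my_postgres`, the schema `public`, and the table `events` are placeholders, not my real setup.

```python
# Table-style batch kwargs: point directly at an existing table.
table_batch_kwargs = {
    "datasource": "my_postgres",  # placeholder datasource name
    "schema": "public",           # placeholder schema
    "table": "events",            # placeholder table
}

# Query-style batch kwargs: the same rows, expressed as a SQL query.
query_batch_kwargs = {
    "datasource": "my_postgres",
    "query": "SELECT * FROM public.events",
}
```

As far as I understand, either dict can be passed to `context.get_batch(batch_kwargs, expectation_suite_name)`, so switching from the second form to the first should not require any other changes.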

Are my assumptions here correct?
