This is just an observation I made, which might clarify some things for users (and which I'd like to confirm with the GE core eng team!).
When creating a batch from a SQLAlchemy backend, I can pass in either a table+schema or a SQL query. I believe the query option creates a temp table behind the scenes, whereas the table option doesn’t, which may slow things down a little when creating expectations. See the attached screenshot for my timing tests. The input here was:
- a Postgres RDS instance
- table with ~200k rows
- I ran timeit multiple times and the trend was the same each time
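For anyone who wants to try a similar comparison without a database server, here is a rough sketch of the kind of timing test I mean. It is not what GE runs internally: it uses an in-memory SQLite database as a stand-in for Postgres, and the table name `events`, the temp table name `batch_copy`, and both helper functions are made up for illustration. It only shows why copying a query result into a temp table before validating could cost more than reading the table directly.

```python
import sqlite3
import timeit

# Assumption: in-memory SQLite stands in for the Postgres RDS instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO events (value) VALUES (?)",
    ((float(i),) for i in range(200_000)),  # roughly the 200k rows from my test
)
conn.commit()


def count_from_table():
    """Read straight from the source table (the table+schema style)."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


def count_via_temp_table():
    """Copy the query result into a temp table first (my guess at the query style)."""
    conn.execute("DROP TABLE IF EXISTS temp.batch_copy")
    conn.execute("CREATE TEMP TABLE batch_copy AS SELECT * FROM events")
    return conn.execute("SELECT COUNT(*) FROM temp.batch_copy").fetchone()[0]


direct = timeit.timeit(count_from_table, number=5)
copied = timeit.timeit(count_via_temp_table, number=5)
print(f"direct table: {direct:.4f}s, via temp table: {copied:.4f}s")
```

The absolute numbers will differ a lot from a real Postgres instance, so only the relative trend is worth looking at.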
I think this implies that in practice, even if I use a “select *” query on a table when creating an expectation suite, I should probably switch the batch kwargs to use table+schema instead?
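To make the question concrete, this is roughly what I mean by the two styles of batch kwargs. The datasource, schema, and table names are placeholders, and `to_table_batch_kwargs` is a hypothetical helper of my own, not part of GE. I'm assuming the batch kwargs keys are `datasource`, `query`, `table`, and `schema`, as in the batch-kwargs style API I'm using.

```python
# Placeholder names: "my_postgres_db", "public", and "my_table" are made up.
query_batch_kwargs = {
    "datasource": "my_postgres_db",
    "query": "SELECT * FROM public.my_table",
}


def to_table_batch_kwargs(batch_kwargs, table, schema):
    """Hypothetical helper: swap a query-based batch for a table-based one.

    Only safe when the query is a plain SELECT * over that same table,
    which the caller has to know; nothing here parses the SQL.
    """
    table_kwargs = {k: v for k, v in batch_kwargs.items() if k != "query"}
    table_kwargs["table"] = table
    table_kwargs["schema"] = schema
    return table_kwargs


table_batch_kwargs = to_table_batch_kwargs(
    query_batch_kwargs, table="my_table", schema="public"
)
print(table_batch_kwargs)
```

Either dict would then be passed to the context when getting a batch; the suite itself stays the same, only the batch kwargs change.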
Are my assumptions here correct?