We ask all of our user interview participants this question: if you could add a feature or change anything about GE that would help you and your data team, what would it be?
I requested that we find a way to make GE predict where we should go for lunch each day based on our collective team taste… fingers crossed it makes it out of the backlog.
Our main pipeline consists of using dask to read Parquet files from an external source through the intake library. Integrating Great Expectations with dask dataframes (instead of pandas) would be awesome because we’d be able to run expectations in a distributed way instead of having to download each partition on its own.
Of course, some expectations can’t trivially be distributed, but plenty of them can, and we’d be happy to get only those if it means a speedup on the order of the number of workers.
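To make the "plenty of them can" point concrete, here is a minimal sketch of why a range check distributes well: each partition only has to report a few counts, and the counts simply add up. This is not the Great Expectations or dask API. The function names `partition_summary` and `expect_values_between` are hypothetical, plain Python lists stand in for dataframe partitions, and a thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

SUMMARY_KEYS = ("element_count", "null_count", "unexpected_count")


def partition_summary(values, min_value, max_value):
    """Per-partition work: count rows, nulls, and values outside the range."""
    non_null = [v for v in values if v is not None]
    unexpected = sum(1 for v in non_null if not (min_value <= v <= max_value))
    return {
        "element_count": len(values),
        "null_count": len(values) - len(non_null),
        "unexpected_count": unexpected,
    }


def expect_values_between(partitions, min_value, max_value, n_workers=4):
    """Hypothetical distributed range check: map over partitions, then reduce."""
    # Map step: every partition is summarised independently, so no worker
    # needs to see another worker's data.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        summaries = list(
            pool.map(
                lambda part: partition_summary(part, min_value, max_value),
                partitions,
            )
        )
    # Reduce step: the per-partition counts are additive.
    total = {key: sum(s[key] for s in summaries) for key in SUMMARY_KEYS}
    total["success"] = total["unexpected_count"] == 0
    return total


# Three toy "partitions"; 42 is out of range and None is treated as null.
partitions = [[1, 5, 9], [3, None, 7], [2, 42, 8]]
result = expect_values_between(partitions, min_value=0, max_value=10)
print(result)
```

An expectation on the median, by contrast, cannot be assembled from per-partition medians alone, which is the kind of case that does not distribute trivially.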