Connection Timeout and Suite/Checkpoint Not Found in Multi-threaded Execution (Using Separate Contexts per Thread)

Hi Great Expectations team and community,

I’m using Great Expectations with PySpark DataFrames in Databricks for data validation, following this general flow, all wrapped in a single function (a simplified sketch follows the list):

  1. Create a new EphemeralDataContext
  2. Create an Expectation Suite, add rules, and register it with the context
  3. Create a Checkpoint and add it to the context
  4. Run validation using the defined rules
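
For concreteness, the function looks roughly like this (assuming the GX 1.x fluent API and heavily simplified; the data source/asset/suite/checkpoint names and the single not-null expectation are placeholders for what the real code builds per DataFrame):

```python
import great_expectations as gx


def validate_dataframe(name, df):
    """Build an ephemeral context, suite, and checkpoint, then validate `df`.

    All names below are placeholders; the real code derives them from `name`
    and registers many more expectations.
    """
    # 1. Fresh in-memory context per call (and therefore per thread)
    context = gx.get_context(mode="ephemeral")

    # 2. Expectation Suite holding the validation rules
    suite = context.suites.add(gx.ExpectationSuite(name=f"{name}_suite"))
    suite.add_expectation(
        gx.expectations.ExpectColumnValuesToNotBeNull(column="id")
    )

    # Spark data source / asset / batch definition for the in-memory DataFrame
    data_source = context.data_sources.add_spark(name=f"{name}_source")
    asset = data_source.add_dataframe_asset(name=f"{name}_asset")
    batch_definition = asset.add_batch_definition_whole_dataframe(f"{name}_batch")

    validation_definition = context.validation_definitions.add(
        gx.ValidationDefinition(
            name=f"{name}_validation", data=batch_definition, suite=suite
        )
    )

    # 3. Checkpoint registered with the same context
    checkpoint = context.checkpoints.add(
        gx.Checkpoint(
            name=f"{name}_checkpoint",
            validation_definitions=[validation_definition],
        )
    )

    # 4. Run validation against the DataFrame
    return checkpoint.run(batch_parameters={"dataframe": df})
```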

:white_check_mark: Sequential Execution

When I run this sequentially across multiple DataFrames, everything works as expected. However, I still see occasional ConnectionError (timeout) messages in the logs, even though all validations pass.
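For reference, the sequential driver is just a loop over that function (simplified; `dataframes` stands in for my dict of {name: Spark DataFrame}):

```python
# Sequential driver: validate each Spark DataFrame one at a time.
results = {name: validate_dataframe(name, df) for name, df in dataframes.items()}

for name, result in results.items():
    print(name, "passed" if result.success else "failed")
```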

Q1: What causes these timeouts? Are they related to usage tracking, Data Docs, or any API interactions?


:warning: Multi-threaded Execution

To improve performance, I switched to multi-threading using concurrent.futures.ThreadPoolExecutor. I ensure that each thread creates its own fresh DataContext.
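
The threaded driver is essentially this (simplified; the worker count is arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Threaded driver: each validate_dataframe call builds its own
# EphemeralDataContext, so no GX objects should be shared across threads.
results = {}
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {
        executor.submit(validate_dataframe, name, df): name
        for name, df in dataframes.items()
    }
    for future in as_completed(futures):
        results[futures[future]] = future.result()
```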

Still, for some dataframes, I get intermittent errors like:

  • Expectation suite not found in the context
  • Checkpoint not found in the context
  • Validation definition not found

These errors suggest that even though the context is isolated per thread, GE sometimes fails to locate suites or checkpoints created moments earlier in the same thread.

Q2: Is there any known thread-safety issue or race condition in how suites, checkpoints, and validation definitions are added to and looked up from an in-memory (ephemeral) context?

Q3: Would switching to multiprocessing (instead of threading) be more reliable for this use case?


Any advice, patterns, or configuration suggestions for safely running GE in a concurrent environment (especially with PySpark) would be highly appreciated.

Thanks in advance!