Checkpoint Run Result in DEV is much faster than in PROD.
A run takes around 10-20 seconds in DEV, while PROD takes more than 5 minutes. This slows down our data loading process, which is not what we want.
I cross-checked the PROD and DEV environments, and all the resources are the same.
Any feedback on what to check in both environments that could cause this delay?
The data, the expectations, and the versions are the same in both environments.
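To back up the claim that both environments match, one option is to dump a small snapshot in DEV and in PROD and diff the two outputs. The following is only a sketch: the package list is an assumption about what this stack uses, and `environment_snapshot` is a hypothetical helper, not part of GX or Airflow.

```python
import json
import os
import platform
import sys
from importlib import metadata

# Assumed package list for illustration; adjust it to what is actually installed.
PACKAGES = [
    "great_expectations",
    "airflow-provider-great-expectations",
    "apache-airflow",
    "pandas",
    "sqlalchemy",
]


def environment_snapshot(packages=PACKAGES):
    """Collect package versions and basic host facts for a DEV vs PROD diff."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Run this in both environments and diff the two JSON outputs.
    print(json.dumps(environment_snapshot(), indent=2, sort_keys=True))
```

Running it from inside an Airflow task in each environment, rather than from a shell, shows what the worker process actually sees.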
Hi @lionel, I just tested the Airflow + GX integration locally and in production, and I'm getting similar timings in both, not the drastic difference you're observing. I can't speak to GX-specific reasons why this might be occurring, but there are some external factors that could explain it:
- Resource Availability: Production environments often have more complex setups and shared resources. Other applications and services may be running concurrently on the same infrastructure, leading to competition for resources like CPU, memory, and network bandwidth.
- Scale: In production, workflows may handle larger datasets or higher volumes of data, which can naturally lead to longer processing times.
- Infrastructure: Differences in infrastructure between local and production environments can contribute to variations in execution time. For instance, production environments might have distributed systems, load balancers, and more complex network configurations that can introduce latency.
- Dependencies: Production workflows might have dependencies on external services, APIs, databases, etc. Delays in accessing these dependencies can impact the overall execution time.
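To see which of the factors above applies, it can help to time each stage separately in both environments. The sketch below uses only the standard library; the `timed` helper and the stage labels are hypothetical, and the `time.sleep` calls are placeholders for the real steps (loading the data context, connecting to a database, running the checkpoint).

```python
import time
from contextlib import contextmanager

# Elapsed seconds per labeled stage.
timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def report():
    """Print the recorded stages, slowest first."""
    for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:<30} {seconds:8.3f}s")


if __name__ == "__main__":
    # Placeholder stages; replace the sleeps with the real calls.
    with timed("load_context"):
        time.sleep(0.01)
    with timed("run_checkpoint"):
        time.sleep(0.02)
    report()
```

If one stage dominates in PROD but not in DEV, that narrows the search to resource contention, network latency, or an external dependency for that stage.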
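I tried deploying a task that validates a simple pandas DataFrame:

    import pandas as pd
    from great_expectations_provider.operators.great_expectations import (
        GreatExpectationsOperator,
    )

    # ge_root_dir is assumed to be defined elsewhere (path to the GX project root).
    GreatExpectationsOperator(
        task_id="gx_validate_pg",
        data_context_root_dir=ge_root_dir,
        data_asset_name="strawberries",
        # Small in-memory DataFrame, so data volume cannot explain the delay.
        dataframe_to_validate=pd.DataFrame(
            {
                "id": ["001", "002", "003", "004", "005"],
                "name": [
                    "Strawberry Order 1",
                    "Strawberry Order 2",
                    "Strawberry Order 3",
                    "Strawberry Order 4",
                    "Strawberry Order 5",
                ],
                "amount": [10, 5, 8, 3, 12],
            }
        ),
        execution_engine="PandasExecutionEngine",
        expectation_suite_name="strawberry_suite",
        return_json_dict=True,
    )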
Still the same result: it took 12 minutes to run on PROD. The delay starts at the Checkpoint Run Result step.
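Since even a five-row DataFrame takes minutes, profiling the slow call in PROD may show where the time is spent (for example, in network calls or store access rather than in validation itself). This is a minimal sketch with the standard-library `cProfile`; `profile_call` and `example_workload` are hypothetical names, and the workload stands in for whatever callable triggers the checkpoint.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def example_workload():
    # Stand-in for the real call that runs the checkpoint.
    return sum(range(100_000))


if __name__ == "__main__":
    _, text = profile_call(example_workload)
    print(text)
```

Comparing the top entries of the DEV and PROD reports side by side should point at the functions that account for the extra minutes.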