Execute checkpoint behind proxy

DJuanes · July 28, 2023, 8:49pm

Hello
I wanted to know if there is a way to run Great Expectations behind a corporate proxy.
Locally the checkpoint works fine, but when deploying it I receive the following error:
requests.exceptions.ProxyError: HTTPSConnectionPool(host='stats.greatexpectations.io', port=443): Max retries exceeded with url: /great_expectations/v1/usage_statistics (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 AuthorizedOnlyYPF')))

DJuanes · July 31, 2023, 6:41pm

Hello,
Can someone help me with this problem?

HaebichanGX · August 2, 2023, 4:00pm

Hi @DJuanes, it looks like you have to turn off anonymous_usage_statistics. That should be a key in your great_expectations.yml that needs to be flipped from true to false. Please let me know if that works for you!
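
For deployments where editing great_expectations.yml is awkward, the 0.x documentation also describes an environment-variable opt-out. The following is a minimal sketch, assuming the installed version honors the GE_USAGE_STATS variable; the function name is hypothetical, and the variable has to be set before the Data Context is created.

```python
import os


def disable_gx_usage_statistics() -> None:
    """Opt out of usage statistics for this process.

    Assumption: the installed Great Expectations version reads the
    GE_USAGE_STATS environment variable, as the 0.x docs describe.
    """
    os.environ["GE_USAGE_STATS"] = "FALSE"


# Call this before constructing the Data Context.
disable_gx_usage_statistics()
```

On a hosted service, the same variable can usually be set as an application setting instead of in code.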

DJuanes · August 2, 2023, 8:26pm

Thanks for the reply @HaebichanGX
I set that option to false, but now it shows me another error.
Running it locally works fine, but it fails in the production environment (Azure Web App):
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/trace/__init__.py", line 573, in use_span
    yield span
  File "/usr/local/lib/python3.9/site-packages/opentelemetry/sdk/trace/__init__.py", line 1045, in start_as_current_span
    yield span_context
  File "/parent-child/app/api.py", line 179, in _etl
    df_pressures, df_estados = main.elt_data()
  File "/parent-child/src/main.py", line 53, in elt_data
    valid_data = data.validate_input()
  File "/parent-child/src/data.py", line 29, in validate_input
    result: CheckpointResult = data_context.run_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 260, in usage_statistics_wrapped_method
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 2447, in run_checkpoint
    return self._run_checkpoint(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 2491, in _run_checkpoint
    result: CheckpointResult = checkpoint.run_with_runtime_args(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/checkpoint/checkpoint.py", line 911, in run_with_runtime_args
    return self.run(**checkpoint_run_arguments)
  File "/usr/local/lib/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 260, in usage_statistics_wrapped_method
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/great_expectations/checkpoint/checkpoint.py", line 304, in run
    self._run_validation(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/checkpoint/checkpoint.py", line 479, in _run_validation
    validator: Validator = self._validator or self.data_context.get_validator(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 2741, in get_validator
    return self.get_validator_using_batch_list(
  File "/usr/local/lib/python3.9/site-packages/great_expectations/data_context/data_context/abstract_data_context.py", line 2767, in get_validator_using_batch_list
    raise gx_exceptions.InvalidBatchRequestError(
great_expectations.exceptions.exceptions.InvalidBatchRequestError: Validator could not be created because BatchRequest returned an empty batch_list.
Please check your parameters and try again.
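
The final error says the batch request matched no data, so one check that does not depend on GX is whether the data directory exists and contains files on the deployed host. The following is a minimal sketch using only the standard library; the function name and the example path are hypothetical and are not part of the Great Expectations API.

```python
from pathlib import Path
from typing import List


def list_data_files(base_directory: str) -> List[str]:
    """Return the sorted names of regular files directly under base_directory.

    Returns an empty list when the directory does not exist, which is the
    situation that would leave a filesystem data connector with no batches.
    """
    root = Path(base_directory).resolve()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


# Hypothetical path: replace it with the directory your data connector uses.
print(list_data_files("./data/input"))
```

Running this inside the deployed container, from the same working directory as the application, shows whether the relative path resolves to the expected files there.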

HaebichanGX · August 3, 2023, 2:29pm

@DJuanes It’s hard to diagnose without seeing the code. Can you share yours?

DJuanes · August 3, 2023, 2:52pm

Ok. This is the validation function:

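from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult
from great_expectations.data_context import DataContext


def validate_input() -> bool:
    """Validate the input data by running the GE suite.

    Returns:
        bool: Whether the data is valid.
    """
    # Load the project configuration from the GX directory in the repository.
    data_context: DataContext = DataContext(context_root_dir="./tests/great_expectations")

    # No batch request is passed here, so the checkpoint's own configuration is used.
    result: CheckpointResult = data_context.run_checkpoint(
        checkpoint_name="novedades_gidi",
        batch_request=None,
        run_name=None,
    )

    return result.success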

DJuanes · August 3, 2023, 2:55pm

And great_expectations.yml:

config_version: 3.0

datasources:
  local_data:
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    class_name: Datasource
    module_name: great_expectations.datasource
    data_connectors:
      default_inferred_data_connector_name:
        base_directory: …..\data\input
        class_name: InferredAssetFilesystemDataConnector
        module_name: great_expectations.datasource.data_connector
        default_regex:
          group_names:
            - data_asset_name
          pattern: (.*)
      default_runtime_data_connector_name:
        assets:
          my_runtime_asset_name:
            batch_identifiers:
              - runtime_batch_identifier_name
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector

config_variables_file_path: config_variables.yml

plugins_directory: plugins/

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore

  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    # set to false to hide how-to buttons in Data Docs
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
  blob_storage:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: $web
      connection_string: ${GE_STORAGE_CONNECTION_STRING}
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

anonymous_usage_statistics:
  enabled: false
  data_context_id: 7ad54930-713b-41c9-9c2c-36ef198c8960
notebooks:
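
One detail in this configuration may be worth checking: base_directory is written with backslash separators, while the traceback paths (for example /usr/local/lib/python3.9/...) suggest the production host is Linux. The sketch below, using only the standard library, shows how the same string is parsed under Windows and POSIX rules. The path in it is a hypothetical stand-in, since the real value is elided in the post, and this is a possible cause of the empty batch list rather than a confirmed diagnosis.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical relative path written with Windows separators.
raw = r"..\data\input"

# Under Windows rules the backslashes separate three path components.
windows_parts = PureWindowsPath(raw).parts
print(windows_parts)  # ('..', 'data', 'input')

# Under POSIX rules a backslash is an ordinary character, so the whole
# string is a single (almost certainly nonexistent) file or directory name.
posix_parts = PurePosixPath(raw).parts
print(len(posix_parts))  # 1

# Forward slashes are accepted on both platforms.
portable = PureWindowsPath(raw).as_posix()
print(portable)  # ../data/input
```

If this is the cause, writing the path with forward slashes in great_expectations.yml would let the same file work locally and in the container.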

DJuanes · August 7, 2023, 12:48pm

Is this a proxy problem or is it something else?

DJuanes · August 9, 2023, 12:43pm

Is this a proxy problem or is it something else?

DJuanes · August 11, 2023, 12:13pm

Does this tool work in production environments, or only in development?
It’s really frustrating.
Great Deception!!!

HaebichanGX · August 15, 2023, 2:14pm

Hi there @DJuanes. No, GX works for both production and development, but it really depends on how you set up GX. Based on your config, it does look like you have an older implementation of GX. To be sure, can you please share your entire code showing how you implemented GX end to end?

For more context, we currently use Fluent Data Sources (FDS), which no longer use Data Connectors. Please use this flowchart to visualize the end-to-end implementation of the new system: GX Simple Use Case Flow Chart | Lucidspark
