Problem: validator does not work on a shared high concurrency cluster

Hello,

I’m running GX on a shared cluster that has not enabled credential-passthrough for user-level data access. This code snippet:

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="test_gcs_suite"
)

produces the following error:

py4j.security.Py4JSecurityException: Constructor public org.apache.spark.SparkConf(boolean) is not whitelisted.

Is there any workaround for this?

Nadya

Hi @Nadya, this isn’t a GX issue but has to do with running Spark on Databricks. I would recommend looking into their documentation to solve this problem.

I found some discussion here: pyspark - Error running Spark on Databricks: constructor public XXX is not whitelisted - Stack Overflow

Hey @Nadya, thanks for reaching out!

This is a known issue (errors when running GX on Databricks shared clusters) and we’re actively working on a fix. I’ll post an update here once it’s released.

For the time being, I’d recommend using single-user compute on Databricks to run GX, since it doesn’t enforce the process isolation framework that is causing the py4j.security.Py4JSecurityException error.

Hi @Nadya, just a quick update that we recently released GX version 0.18.5, which adds support for running GX on Databricks shared clusters. This latest version should resolve the py4j.security.Py4JSecurityException errors you were encountering.
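If it helps, here is a minimal sketch (standard library only) for confirming that the cluster's installed GX version is at least 0.18.5 before running validations. The helper names are made up for illustration; the version threshold is the release mentioned above, and the simple tuple comparison assumes plain `major.minor.patch` version strings (no pre-release suffixes).

```python
from importlib.metadata import version, PackageNotFoundError


def parse_version(v: str) -> tuple:
    """Split a plain 'major.minor.patch' string into a comparable int tuple.

    Note: this does not handle pre-release suffixes like '0.18.5rc1'.
    """
    return tuple(int(part) for part in v.split(".")[:3])


def meets_minimum(installed: str, minimum: str = "0.18.5") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical usage: look up the installed great_expectations version.
try:
    gx_version = version("great_expectations")
    print(f"GX {gx_version} shared-cluster support: {meets_minimum(gx_version)}")
except PackageNotFoundError:
    print("great_expectations is not installed in this environment")
```

On a shared cluster you could run this in a notebook cell first and upgrade the library (e.g. via the cluster's library configuration) if the check fails.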