I followed this How to create an Expectation Suite with the Onboarding Data Assistant | Great Expectations and Tried to run onboarding data assitant script except 7 cases all other fails and the exception which I get is (google.api_core.exceptions.BadRequest: 400 Invalid date: ‘2023-09-12T00:00:00’) and sqlalchemy points to Error Messages — SQLAlchemy 1.4 Documentation, I tried multiple things downgrading dependencies versions etc but no luck, can anyone please help me here I would really appreciate
Hi @Ravindra! I noticed that you referenced the Onboarding Data Assistant tutorial from an older version of GX - what version of GX are you currently running?
If you’re able to upgrade to the latest version, does the error still persist?
Hi @rachel.house I am using the latest version 0.17.16 following this example : Create an Expectation Suite with the Onboarding Data Assistant | Great Expectations.
Some how Great Expectation is treating Date field in Bigquery as datetime field the parameters which are sent with query is datetime instead of date for a date field due to which around 67 expectations fails.
Hi @Ravindra, thanks for the additional context! I have a couple more questions to help make sure I understand the behavior you’re seeing. (I’m still relatively new to GX as well, thanks for your patience).
You’re connecting to a BigQuery database and then running the Onboarding Data Assistant.
- Does the
run(...)
method fail with thegoogle.api_core.exceptions.BadRequest: 400 Invalid date
Exception that you included in your original post - Or does the
run(...)
method succeed, and then Expectations from the generated Expectation Suite fail with date/datetime data type errors?
Hi @rachel.house this is the exception: google.api_core.exceptions.BadRequest: 400 Invalid date: ‘2023-09-12T00:00:00’ ,
I do checkpoint.run() which executes , and then I do 1.
context.view_validation_result(checkpoint_result)
which opens the html on browser there I see this exception
Hey @Ravindra, can you please share the code you’re using that results in the exception, so that we can try to reproduce the error? Thanks!
Sure @rachel.house, please find the code below:
import great_expectations as gx
from great_expectations.exceptions import DataContextError
context = gx.get_context()
expectation_suite_name = "Onboarding_suite"
try:
suite = context.get_expectation_suite(expectation_suite_name=expectation_suite_name)
print(
f'Loaded ExpectationSuite "{suite.expectation_suite_name}" containing {len(suite.expectations)} expectations.'
)
except DataContextError:
suite = context.add_expectation_suite(expectation_suite_name=expectation_suite_name)
print(f'Created ExpectationSuite "{suite.expectation_suite_name}".')
datasource = context.sources.add_or_update_sql(
name="my_bigquery_datasource",
connection_string="bigquery://project-dev", #changed_my_actaull_project_name
create_temp_table=False,
)
table_asset = datasource.add_table_asset(
name="mytable", table_name="dataset.tablename" #changed my actaual dataset/tablename
)
request = table_asset.build_batch_request()
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)
validator = context.get_validator(
batch_request=request, expectation_suite_name=expectation_suite_name
)
data_assistant_result = context.assistants.onboarding.run(
validator=validator,
exclude_column_names=[],
)
validator.expectation_suite = data_assistant_result.get_expectation_suite(
expectation_suite_name=expectation_suite_name
)
validator.save_expectation_suite(discard_failed_expectations=False)
checkpoint = context.add_or_update_checkpoint(
name=expectation_suite_name,
validator=validator,
)
checkpoint_result = checkpoint.run()
context.view_validation_result(checkpoint_result)
Thanks, @Ravindra! So I don’t see any problems with your code - as a next step, I’d suggest trying to isolate why and where this issue is happening by simplifying what your code is doing.
Instead of using the Onboarding Data Assistant, can you try instead creating a simple Expectation Suite with just a few Expectations that validates the specific date/datetime columns in your BigQuery table?
If you are able to reproduce the error with the simplified code, would you then be able to share:
- your simplified code
- a sample of your BigQuery data, including the column data types
If your code succeeds without using the Onboarding Data Assistant, then we’ll dig in there next.