This article is for comments to: How to validate data without a Checkpoint — great_expectations documentation
Please comment +1 if this How to is important to you.
This how-to guide is a stub and has not been published yet.
In the meantime, when you create a Great Expectations project by running great_expectations init, your new project contains three Jupyter notebooks in the notebooks directory. Each one is a working step-by-step example of running validation with Validation Operators (instead of a Checkpoint), and each varies slightly to show the differences when validating in Pandas, Spark, or a database.
Hi, I’ve tried to run the pandas example for validating a pandas dataframe, using version 0.13.33 of Great Expectations on macOS 11.6. I get the following error when I call batch = context.get_batch(batch_kwargs, expectation_suite_name) from the notebook:
Traceback (most recent call last):
  File "/Users/colmginty/Projects/dsa-great-expectations-spike/validate_against_expectations.py", line 107, in <module>
    batch = context.get_batch(batch_kwargs, expectation_suite_name)
  File "/Users/colmginty/.pyenv/versions/venv-3.8.2/lib/python3.8/site-packages/great_expectations/data_context/data_context.py", line 1566, in get_batch
    return self._get_batch_v2(
  File "/Users/colmginty/.pyenv/versions/venv-3.8.2/lib/python3.8/site-packages/great_expectations/data_context/data_context.py", line 1267, in _get_batch_v2
    batch = datasource.get_batch(
AttributeError: 'Datasource' object has no attribute 'get_batch'
I’m guessing this is a version problem? Ideally I’d like to use a recent version of the library.
Another thing from that example notebook…
# If you already loaded the data into a Pandas Data Frame:
batch_kwargs = {'dataset': "YOUR_DATAFRAME", 'datasource': datasource_name}
How am I supposed to derive a string representing my pandas dataframe? Is that a typo in the example, or am I misunderstanding how GE works and how it expects me to handle the dataframe? I would expect to pass in the variable pointing to my dataframe rather than a string.
Hi @Ntlzyjstdntcare - This is an old path. The notebooks in that directory use outdated methods and are going to be deleted in an upcoming release. Technically, we could still get this working, but I think you would be better off using the V3 API, which will be the standard API very soon, and following the documentation for connecting to your data with the V3 API.
I would recommend going through the getting started tutorial: Getting started with Great Expectations | Great Expectations
Then look at the guide to connect to pandas:
How to connect to in-memory data in a Pandas dataframe | Great Expectations
Thanks @bhcastleton! I’ve got the pandas connection code working. Now I want to launch Data Docs that show the validation results. The getting started tutorial shows how to do that, but only with a Checkpoint. I can’t figure out how to use a Checkpoint with the RuntimeBatchRequest from my pandas example, and I can’t figure out how to launch validation results without one. Can you point me to examples that can get me unstuck?
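For anyone following along, here is a minimal sketch of validating an in-memory pandas dataframe through the V3 API without a Checkpoint. It is not taken from the official guide. It assumes a datasource with a runtime data connector is already configured in the project, and that the connector declares a batch identifier named default_identifier_name. The helper names runtime_batch_request_kwargs and validate_dataframe are made up for illustration, as are all datasource, connector, and asset names. Note that the dataframe object itself is passed, not a string. The sketch only returns the validation result; it does not store it, so it may not appear in Data Docs on its own.

```python
def runtime_batch_request_kwargs(df, datasource_name, data_connector_name, data_asset_name):
    """Build keyword arguments for a RuntimeBatchRequest around an in-memory dataframe.

    Hypothetical helper: all names passed in must match your own project configuration.
    """
    return {
        "datasource_name": datasource_name,
        "data_connector_name": data_connector_name,
        "data_asset_name": data_asset_name,
        # The dataframe object itself is passed here, not a string naming it.
        "runtime_parameters": {"batch_data": df},
        # Assumption: the runtime data connector was configured with this identifier key.
        "batch_identifiers": {"default_identifier_name": "default_identifier"},
    }


def validate_dataframe(context, df, expectation_suite_name, **names):
    """Validate df against an existing expectation suite without a Checkpoint."""
    # Imported lazily so the helper above works without Great Expectations installed.
    from great_expectations.core.batch import RuntimeBatchRequest

    batch_request = RuntimeBatchRequest(**runtime_batch_request_kwargs(df, **names))
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
    )
    # Runs every expectation in the suite against the batch and returns the result object.
    return validator.validate()
```

Usage would look something like results = validate_dataframe(context, df, "my_suite", datasource_name="my_datasource", data_connector_name="my_runtime_connector", data_asset_name="my_dataframe"), after which results.success reports whether every expectation passed.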