How to implement my own custom profiler?

eugene.mandel · May 12, 2020, 5:55pm

We have heard this question from users a few times, so I want to put a couple of points in writing, since more people may find them useful.

Users notice that BasicDatasetProfiler generates only loosely specified expectations, without constraints. When a dataset has many columns/features, it may be impractical to author all the expectations manually.

Before resorting to writing your own profiler, it is worth checking out an alternative profiler that is built into GE. Run this CLI command: great_expectations suite scaffold SOME_SUITE_NAME. It will generate a Jupyter notebook that lets you control which columns the profiler examines and which types of expectations it generates. More details about this command: How to use the Great Expectations command line interface (CLI) — great_expectations documentation

If, after trying this, you still need to implement your own profiler, extend the BasicSuiteBuilderProfiler class. It has a lot of functionality you will find useful (e.g., classifying columns into low-cardinality, numeric, and string ones and selecting reasonable types of expectations for each).
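To make the "classify columns, then pick expectations" idea concrete, here is a minimal standalone sketch in plain Python. It does not use or reproduce the BasicSuiteBuilderProfiler code: the function names, the cardinality threshold, and the mapping from column kind to expectations are all assumptions made for illustration. Only the expectation names themselves are existing Great Expectations expectation types.

```python
from typing import Any, Dict, List

# Hypothetical threshold: a column with at most this many distinct values
# is treated as low-cardinality. This number is not taken from GE.
LOW_CARDINALITY_MAX = 10

# Illustrative mapping from column kind to expectation types. The names are
# real GE expectations; which ones to pick per kind is an assumption.
EXPECTATIONS_BY_KIND: Dict[str, List[str]] = {
    "low_cardinality": ["expect_column_values_to_be_in_set"],
    "numeric": [
        "expect_column_min_to_be_between",
        "expect_column_max_to_be_between",
    ],
    "string": ["expect_column_value_lengths_to_be_between"],
    "unknown": ["expect_column_values_to_not_be_null"],
}


def classify_column(values: List[Any]) -> str:
    """Return 'low_cardinality', 'numeric', 'string', or 'unknown'."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    # Assumes values are hashable scalars (numbers, strings, and so on).
    if len(set(non_null)) <= LOW_CARDINALITY_MAX:
        return "low_cardinality"
    # bool is a subclass of int, so exclude it explicitly.
    if all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    ):
        return "numeric"
    if all(isinstance(v, str) for v in non_null):
        return "string"
    return "unknown"


def suggest_expectations(table: Dict[str, List[Any]]) -> Dict[str, List[str]]:
    """Map each column name to the expectation types suggested for its kind."""
    return {
        column: EXPECTATIONS_BY_KIND[classify_column(values)]
        for column, values in table.items()
    }
```

Calling suggest_expectations on a dict of column name to list of values returns, per column, the list of expectation types to try. A custom profiler would follow the same shape but produce configured expectations rather than just their names.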

As usual, if you need help, reach us on Slack.

Sandra · September 2, 2020, 6:36pm

Can we have the link above updated to the current working address for these docs?

anthony · September 2, 2020, 7:17pm

Hi Sandra! Thanks for finding this broken link! I believe this is the relevant document: https://docs.greatexpectations.io/en/latest/guides/how_to_guides/miscellaneous/command_line.html?highlight=scaffold#great-expectations-suite-scaffold-suite-name
