How to implement my own custom profiler?

We have heard this question a few times from users, so I want to put a couple of points in writing, since others will likely find them useful.

Users notice that BasicDatasetProfiler generates loosely specified expectations, without meaningful constraints. When a dataset has many columns/features, authoring all the expectations manually may be impractical.

Before resorting to writing your own profiler, it is worth checking out an alternative profiler that is built into GE. Run this CLI command: great_expectations suite scaffold SOME_SUITE_NAME. It will generate a Jupyter notebook that lets you control which columns the profiler will profile and which types of expectations it will create. More details about this command: How to use the Great Expectations command line interface (CLI), in the great_expectations documentation.

If, after trying this, you still need to implement your own profiler, extend the BasicSuiteBuilderProfiler class. It has a lot of functionality you will find useful (e.g., classifying columns into low-cardinality, numeric and string ones, and selecting reasonable types of expectations for each).
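To make the "classify columns, then pick expectation types per class" idea concrete, here is a hypothetical, dependency-free sketch in plain Python. It is not the Great Expectations API and does not reproduce BasicSuiteBuilderProfiler's actual logic: the helper names (classify_column, expectation, build_expectations), the LOW_CARDINALITY_MAX cut-off and the dict layout of each expectation are assumptions made for illustration. Only the expectation type names are real GE expectations. In an actual custom profiler you would put logic like this in your BasicSuiteBuilderProfiler subclass and rely on the functionality that class already provides.

```python
from typing import Any, Dict, List

# Hypothetical cut-off: a column with at most this many distinct non-null
# values is treated as low-cardinality (an assumption, not a GE default).
LOW_CARDINALITY_MAX = 10


def classify_column(non_null: List[Any]) -> str:
    """Return 'low_cardinality', 'numeric' or 'string' for a column's non-null values."""
    if len(set(non_null)) <= LOW_CARDINALITY_MAX:
        return "low_cardinality"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "numeric"
    return "string"


def expectation(expectation_type: str, **kwargs: Any) -> Dict[str, Any]:
    """Build a plain dict loosely shaped like a GE expectation configuration (hypothetical)."""
    return {"expectation_type": expectation_type, "kwargs": kwargs}


def build_expectations(table: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Pick expectation types for each column based on its classification."""
    suite: List[Dict[str, Any]] = []
    for column, values in table.items():
        non_null = [v for v in values if v is not None]
        if not non_null:
            continue  # nothing to learn from an all-null column
        kind = classify_column(non_null)
        if kind == "low_cardinality":
            # Few distinct values: pin the column to the observed set.
            suite.append(expectation("expect_column_values_to_be_in_set",
                                     column=column, value_set=sorted(set(non_null), key=str)))
        elif kind == "numeric":
            # Many numeric values: bound them by the observed range.
            suite.append(expectation("expect_column_values_to_be_between",
                                     column=column, min_value=min(non_null), max_value=max(non_null)))
        else:
            # Many string-like values: bound their lengths instead.
            lengths = [len(str(v)) for v in non_null]
            suite.append(expectation("expect_column_value_lengths_to_be_between",
                                     column=column, min_value=min(lengths), max_value=max(lengths)))
        if len(non_null) == len(values):
            # No nulls observed, so expect none in future batches.
            suite.append(expectation("expect_column_values_to_not_be_null", column=column))
    return suite


if __name__ == "__main__":
    sample = {
        "status": ["ok", "fail", "ok"],
        "amount": [float(i) for i in range(20)],
        "comment": [f"note {i}" for i in range(15)],
    }
    for exp in build_expectations(sample):
        print(exp)
```

The design choice mirrors the point above: rather than emitting every possible expectation per column (which is what makes BasicDatasetProfiler output loose), each column gets only the expectation types that make sense for its class, with bounds taken from the observed data.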

As usual, if you need help, reach out to us on Slack.

Can we have the link above updated to the current working address for these docs? :pray:


Hi Sandra! Thanks for finding this broken link! I believe this is the relevant document: https://docs.greatexpectations.io/en/latest/guides/how_to_guides/miscellaneous/command_line.html?highlight=scaffold#great-expectations-suite-scaffold-suite-name
