What is your recommendation for replacing the deleted JsonSchemaProfiler?

sof · January 24, 2024, 10:36pm

My team is using GX to validate CSV files. We use FastAPI (and Pydantic) so that users who submit CSV files can retrieve the expected CSV format. The format is returned as a JSON schema generated from a Pydantic BaseModel.

The pydantic object:

from pydantic import BaseModel, Field

class Model(BaseModel):
    Column1: str = Field(
        title="First Column", description="..."
    )

    Column2: str = Field(
        title="Second Column",
        description="...",
        nullable=True,
    )

The resulting json schema:

{
  "title": "schema-name",
  "description": "...",
  "type": "object",
  "properties": {
    "Column1": {
      "title": "First Column",
      "description": "...",
      "nullable": false,
      "type": "string"
    },
    "Column2": {
      "title": "Second Column",
      "description": "...",
      "nullable": true,
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    }
  },
  "required": [
    "Column1"
  ]
}

We use this same json schema to generate GX expectations via the JsonSchemaProfiler in order to validate the csv files submitted by our users.

As this is no longer supported by GX, we were considering implementing a formatter in our own code base, but we would like to know what your strategy is regarding profilers: are you planning to remove the feature altogether? Do you recommend any other implementation that meets our needs?

Thanks!

ToivoMattila · February 9, 2024, 4:01pm

Did I understand your use case correctly?

1. A user submits a CSV file with an unknown schema.
2. You automatically generate a schema for that CSV in JSON format and return it to the user.
3. The user later submits other such CSVs, which you then validate against this schema, and GX is used for this validation.

My understanding is that GX is definitely developing profilers but the current versions/implementations will be deleted.
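If you do end up writing your own formatter in the meantime, here is a rough sketch of what it could look like. It is not a GX API: `schema_to_expectations` and `is_nullable` are hypothetical helpers, and the `expectation_type`/`kwargs` dict layout is an assumption you would adapt to however your GX version accepts expectation configurations. It only emits `expect_column_to_exist` and `expect_column_values_to_not_be_null`, and it reads the `nullable` / `anyOf` hints shown in the schema above.

```python
def is_nullable(prop: dict) -> bool:
    """Return True if a JSON schema property allows null values."""
    if prop.get("nullable") is True:
        return True
    declared = prop.get("type")
    if declared == "null" or (isinstance(declared, list) and "null" in declared):
        return True
    # Pydantic-style optional fields: anyOf [{"type": "string"}, {"type": "null"}]
    return any(option.get("type") == "null" for option in prop.get("anyOf", []))


def schema_to_expectations(schema: dict) -> list[dict]:
    """Build plain-dict expectation configs from a JSON schema (hypothetical helper)."""
    expectations = []
    for column, prop in schema.get("properties", {}).items():
        # Every declared property should be present as a CSV column.
        expectations.append(
            {"expectation_type": "expect_column_to_exist", "kwargs": {"column": column}}
        )
        # Non-nullable properties must not contain missing values.
        if not is_nullable(prop):
            expectations.append(
                {
                    "expectation_type": "expect_column_values_to_not_be_null",
                    "kwargs": {"column": column},
                }
            )
    return expectations


# Example with the two columns from the question.
schema = {
    "properties": {
        "Column1": {"type": "string", "nullable": False},
        "Column2": {"nullable": True, "anyOf": [{"type": "string"}, {"type": "null"}]},
    }
}
expectations = schema_to_expectations(schema)
```

The resulting dicts could then be turned into whatever expectation objects your GX version expects. Type checks are left out on purpose, since the type names to pass depend on the execution engine.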

You might be interested in ydata-profiling as an alternative. Some links below:
Documentation: Welcome - YData Profiling
Article on how to integrate it with GX: How to Use Ydata-Profiling with Great Expectations V3 API | by Bvolodarskiy | Provectus Articles | Medium

@Aleksei might have more information about this
