My team uses GX to validate CSV files. We use FastAPI (and Pydantic) so that users who submit CSV files can retrieve the expected CSV format. The format is returned as a JSON schema generated from a Pydantic BaseModel.
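The Pydantic model:

```python
from typing import Optional

from pydantic import BaseModel, Field


class Model(BaseModel):
    Column1: str = Field(
        title="First Column",
        description="...",
        nullable=False,
    )
    Column2: Optional[str] = Field(
        default=None,
        title="Second Column",
        description="...",
        nullable=True,
    )
```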
The resulting JSON schema:

```json
{
  "title": "schema-name",
  "description": "...",
  "type": "object",
  "properties": {
    "Column1": {
      "title": "First Column",
      "description": "...",
      "nullable": false,
      "type": "string"
    },
    "Column2": {
      "title": "Second Column",
      "description": "...",
      "nullable": true,
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ]
    }
  },
  "required": [
    "Column1"
  ]
}
```
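In a schema like this one, nullability is encoded twice (the custom `nullable` key passed through from `Field(nullable=...)`, and an `anyOf` branch of type `null`, which is what Pydantic v2 generates for `Optional[...]` fields), while `required` is a list at the root of the schema. As a minimal sketch, assuming flat schemas without `$ref`/`$defs`, the standard library is enough to read those signals back out; `column_specs` is a hypothetical helper name, not part of GX or Pydantic:

```python
import json
from typing import Any


def column_specs(schema: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Summarise each property of a flat JSON schema (no $ref/$defs handling)."""
    required = set(schema.get("required", []))
    specs: dict[str, dict[str, Any]] = {}
    for name, prop in schema.get("properties", {}).items():
        types: list[str] = []
        # A property carries either a single "type" or an "anyOf" list of branches.
        for branch in prop.get("anyOf", [prop]):
            declared = branch.get("type", [])
            # JSON Schema also allows a list, e.g. "type": ["string", "null"].
            types.extend(declared if isinstance(declared, list) else [declared])
        specs[name] = {
            "types": [t for t in types if t != "null"],
            # Nullable if either the custom "nullable" flag or a null branch says so.
            "nullable": bool(prop.get("nullable", False)) or "null" in types,
            "required": name in required,
        }
    return specs


schema = json.loads(
    """
    {
      "type": "object",
      "properties": {
        "Column1": {"nullable": false, "type": "string"},
        "Column2": {"nullable": true,
                    "anyOf": [{"type": "string"}, {"type": "null"}]}
      },
      "required": ["Column1"]
    }
    """
)
for column, spec in column_specs(schema).items():
    print(column, spec)
```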
We use this same JSON schema to generate GX expectations via the JsonSchemaProfiler, in order to validate the CSV files submitted by our users.
Since GX no longer supports this, we are considering implementing our own formatter in our code base. Before doing so, we would like to know your strategy regarding the profilers: are you planning to remove the feature altogether? Do you recommend another implementation that meets our needs?
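For the formatter option, here is a minimal sketch under our own assumptions. It walks the same JSON schema and emits expectation configurations as plain dicts in the `expectation_type`/`kwargs` shape used by serialized (0.x-style) expectation suites. The mapping itself (required columns must exist, required non-nullable columns must not contain nulls, `enum` and `minLength`/`maxLength` become set and length checks) is our own choice and is not claimed to reproduce what JsonSchemaProfiler emitted; `schema_to_expectations` is a hypothetical name, and the `maxLength` in the sample schema is added only for illustration:

```python
import json
from typing import Any


def _config(expectation_type: str, **kwargs: Any) -> dict[str, Any]:
    # Same shape as an entry in a serialized (0.x-style) expectation suite.
    return {"expectation_type": expectation_type, "kwargs": kwargs}


def _is_nullable(prop: dict[str, Any]) -> bool:
    # Custom "nullable" flag, or an anyOf branch (as for Optional[...]) that allows null.
    if prop.get("nullable", False):
        return True
    return any(branch.get("type") == "null" for branch in prop.get("anyOf", []))


def schema_to_expectations(schema: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical JSON-schema-to-expectations mapping for flat CSV schemas."""
    required = set(schema.get("required", []))
    configs: list[dict[str, Any]] = []
    for column, prop in schema.get("properties", {}).items():
        if column in required:
            # Required fields must be present as CSV columns...
            configs.append(_config("expect_column_to_exist", column=column))
            # ...and, unless declared nullable, must not contain nulls.
            if not _is_nullable(prop):
                configs.append(_config("expect_column_values_to_not_be_null", column=column))
        # Only top-level keywords are handled; $ref/$defs (e.g. Pydantic enums) are ignored.
        if "enum" in prop:
            configs.append(
                _config("expect_column_values_to_be_in_set", column=column, value_set=list(prop["enum"]))
            )
        if "minLength" in prop or "maxLength" in prop:
            configs.append(
                _config(
                    "expect_column_value_lengths_to_be_between",
                    column=column,
                    min_value=prop.get("minLength"),
                    max_value=prop.get("maxLength"),
                )
            )
    return configs


schema = json.loads(
    """
    {
      "type": "object",
      "properties": {
        "Column1": {"nullable": false, "type": "string", "maxLength": 50},
        "Column2": {"nullable": true,
                    "anyOf": [{"type": "string"}, {"type": "null"}]}
      },
      "required": ["Column1"]
    }
    """
)
for config in schema_to_expectations(schema):
    print(config)
```

Keeping the output as plain data lets the mapping be unit-tested without GX installed. On GX 1.x, the last step would instead instantiate the matching classes from `gx.expectations` (for example `gx.expectations.ExpectColumnValuesToNotBeNull(column=...)`) and add them to a suite.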
Thanks!