How do I clean old HTML files from data docs?

We’ve been running our expectations through GX for some months now, and it’s been working brilliantly for us. However, over time the process has been slowing down.

We’ve identified that the issue is the build-up of historic data docs.

We are using the update_data_docs action to update data docs as part of our checkpoint.

Adding a validation_results_limit (below) reduces the size of the index page, but it leaves all the existing HTML documents in place and doesn’t give the speed boost we’re after.

data_docs_sites={
    "local_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend", 
            "base_directory": "/dbfs/data-docs/"
        },
        "site_index_builder": {
            "class_name": "DefaultSiteIndexBuilder", 
            "validation_results_limit": 100
        },
    }
},

Specifying an empty directory for the data docs makes everything much faster, but produces data docs for the current run only. We would like to include some historic runs, but not everything since the beginning of time.
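To make "some historic runs, but not everything" concrete, here is a minimal sketch of the kind of age-based pruning we have in mind, using only the Python standard library rather than any GX API. The function name prune_old_html, the max_age_days cutoff, and the assumption that per-run pages live under a "validations" subdirectory of the data docs base directory are all illustrative assumptions, not something confirmed by the GX documentation. It only removes rendered HTML files; it does not touch the validation results store.

```python
import time
from pathlib import Path


def prune_old_html(base_dir, max_age_days=30, subdir="validations", now=None):
    """Delete *.html files under base_dir/subdir older than max_age_days.

    Hypothetical helper: `subdir` assumes per-run pages are kept in a
    "validations" folder below the data docs base directory.
    Returns the list of paths that were deleted.
    """
    root = Path(base_dir) / subdir
    if not root.is_dir():
        # Nothing to prune if the assumed layout is not present.
        return []

    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day

    deleted = []
    for path in root.rglob("*.html"):
        # Compare the file's last-modified time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted
```

In our case this would be called as prune_old_html("/dbfs/data-docs/", max_age_days=30) before the checkpoint runs. Filtering on modification time is a rough proxy for run age, so it would be worth trying on a copy of the directory first.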

I came across a reference to great_expectations docs clean here, which sounds like what we need. However, I’m unsure how to trigger this from our Python notebook. (I tried using %sh, but got an error about a missing gx directory.)

Thanks in anticipation!