Analytics enabled by default configuration (GX core)

I would like to bring to the GX team’s attention a critical analysis of how GX core tracks analytics events. I hope this feedback provides some food for thought on how to improve GX.

The documentation only provides a general overview of the data collected, without going into detail. This is critical, especially in a context like the EU, where transparency and control over data are very important.

The element I find most critical is the default setting for analytics data collection. In great_expectations.yml, it would be more appropriate to have analytics_enabled: False by default, so that data is sent only after the user has given explicit consent.
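As a minimal sketch of what I am proposing, the relevant excerpt of great_expectations.yml would look like the following. Only the analytics_enabled key comes from the existing configuration; the comments are my own wording, and the rest of the file is omitted.

```yaml
# great_expectations.yml (excerpt; all other settings omitted)
# Proposed default: analytics stay off until the user opts in.
analytics_enabled: False
```

A user who agrees to share usage data would then change the value to True themselves, which is what makes the consent explicit.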

Missing elements in the documentation

Data specificity: The documentation mentions “which GX features are used with which operating system and Python version”. However, it does not specify exactly which features are tracked. For example, are the names of the datasets recorded? Is information collected about how often certain features are used?

There is also no clear definition of what is meant by “analytics events”.

Purpose Details: “Help us improve Great Expectations” is a generic purpose. Users would like to know how this data is actually used to improve the product. For example, is it used to identify bugs, optimize performance, or develop new features?

Anonymization and Security: The documentation does not provide information on how (technically) the data is anonymized or pseudonymized.

There are also no details on the security measures in place to protect the data from loss, unauthorized access, or disclosure.

Sharing with third parties: Users would like to know if the data is shared with third parties, and if so, for what purposes.

Data retention: How long is the collected data retained? Users have the right to know how long their data will be kept.

Legal information: The privacy policy covers all the services offered in a single document. It would be better to specify which aspects are common to all services and which are specific to individual services and require particular attention.

Questions that a serious user might ask themselves are:

  • What are the risks to my privacy?
  • How can I be sure that my data is safe?
  • Can I access my data and correct or delete it?
  • How can I file a complaint if I believe that my data has been processed improperly?

Below are some suggestions for improving communication on this topic:

  • Provide a detailed list of the data collected, with specific examples.

  • Explain clearly and concisely how the data is used to improve the product.

  • Provide information on the anonymization and security measures adopted.

  • Indicate whether the data is shared with third parties and for what purposes.

  • Specify the data retention period.

  • Add information on the legal basis for the processing, so that users can determine whether using GX core is compatible with their own circumstances or whether further measures are necessary.

  • Provide a link to the full privacy policy, specifying the differences between the data processed in the different services offered (core, cloud…). What is reported in the “Usage Information” section of the privacy policy page is not detailed by type of service.

In conclusion, it is essential to provide more detailed and transparent information on the collection and use of analytics data, and to prevent the default settings from sending data before the user has given explicit consent. This would increase not only user trust but also user engagement, and it would support GX’s own development in data protection, which is one of the key quality dimensions.