Duplication check

Hi,
I have a very simple question :slight_smile:
How to implement a duplication check if the PRIMARY KEY is COMPOSITE key (name,email)

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1
1 Like

Hi @mik_q3,

Thanks for asking. I think this PR should implement exactly what youโ€™re looking for. Take a look and like the PR if it would be helpful for you.

While youโ€™re waiting for that PR to get into a release, try this workaround:

  1. Add a column to your dataset concatenating the two fields you need as a composite key.
SELECT
     CONCAT(name, email) AS helper_column_key,
     name,
     email
FROM
     users
;
  1. Configure this new data set to be the dataset that you run Great Expectations on.
  2. Add an expect_column_values_to_be_unique expectation to your suite on the new helper column. https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/dataset/dataset/index.html#great_expectations.dataset.dataset.Dataset.expect_column_values_to_be_unique
  3. When troubleshooting, be sure to account for NULLs and empty strings and spaces in the fields in your composite key. The above code would only work 100% of the time if name and email were always populated. You may need to add something like CONCAT(COALESCE(name,"Blank"),COALESCE(email,"Blank")) for dealing with NULLs etc.
1 Like

@bhcastleton
Thx for the quick answer