How to validate a list of values against a set

Hello! I’m looking for some information on how to validate a list of strings in a column against a preset list of values.

Here is an example:
data column = “Codes”: value = [“01”,“02”,“03”,“04”,“05”]
value_set = [“01”,“02”,“03”,“04”,“05”,“06”,“07”,“08”,“09”,“10”]

What I am trying to to do is loop through the “Codes” list and compare each value in the list against each value in the value_set list. This example here should pass because each value in codes exists in the value set.

If a row had a column “Codes” with a value [“01”,“02”,“W”,“X”,“03”] I would expect this to fail.

I have tried these so far but they are not working for me.
expect_column_values_to_match_like_pattern_list
expect_column_values_to_be_in_set

Thank you for the tips/help in advance!

Hello friends! any help out there for this?

Hi @jakethesdet, thanks for reaching out!

I’m not aware of any out-of-the-box Expectations that will let you validate a list within each row of your Codes column - the list of strings within the cells is what makes this a trickier use case. I think your options here are:

  1. You could create a custom Expectation (I’d check out Custom Column Map Expectations first) that properly parses the list within each cell and then validates the list contents against your value_set.

  2. You could preprocess your data before you validate it with GX. If you explode your tabular data on the Codes column, you could then use the expect_column_values_to_be_in_set expectation to validated using your value_set, and it will fail on individual Codes values. With this approach, you could have multiple failures for the same original row - depending on your use case, you might need to do some post-processing on the Expectation output to get a clean report on which rows failed validation.

1 Like

Thank you for this! It is what I was expecting but wanted to make sure before diving in!