Hello! I’m looking for some information on how to validate a list of strings in a column against a preset list of values.
Here is an example:
data column = “Codes”: value = [“01”,“02”,“03”,“04”,“05”]
value_set = [“01”,“02”,“03”,“04”,“05”,“06”,“07”,“08”,“09”,“10”]
What I am trying to to do is loop through the “Codes” list and compare each value in the list against each value in the value_set list. This example here should pass because each value in codes exists in the value set.
If a row had a column “Codes” with a value [“01”,“02”,“W”,“X”,“03”] I would expect this to fail.
I have tried these so far but they are not working for me.
Thank you for the tips/help in advance!
Hello friends! any help out there for this?
Hi @jakethesdet, thanks for reaching out!
I’m not aware of any out-of-the-box Expectations that will let you validate a list within each row of your
Codes column - the list of strings within the cells is what makes this a trickier use case. I think your options here are:
You could create a custom Expectation (I’d check out Custom Column Map Expectations first) that properly parses the list within each cell and then validates the list contents against your
You could preprocess your data before you validate it with GX. If you explode your tabular data on the
Codes column, you could then use the
expect_column_values_to_be_in_set expectation to validated using your
value_set, and it will fail on individual
Codes values. With this approach, you could have multiple failures for the same original row - depending on your use case, you might need to do some post-processing on the Expectation output to get a clean report on which rows failed validation.
Thank you for this! It is what I was expecting but wanted to make sure before diving in!