Hello! I’m looking for some information on how to validate a list of strings in a column against a preset list of values.
Here is an example:
data column = “Codes”: value = [“01”,“02”,“03”,“04”,“05”]
value_set = [“01”,“02”,“03”,“04”,“05”,“06”,“07”,“08”,“09”,“10”]
What I am trying to to do is loop through the “Codes” list and compare each value in the list against each value in the value_set list. This example here should pass because each value in codes exists in the value set.
If a row had a column “Codes” with a value [“01”,“02”,“W”,“X”,“03”] I would expect this to fail.
I have tried these so far but they are not working for me.
expect_column_values_to_match_like_pattern_list
expect_column_values_to_be_in_set
Thank you for the tips/help in advance!
Hello friends! any help out there for this?
Hi @jakethesdet, thanks for reaching out!
I’m not aware of any out-of-the-box Expectations that will let you validate a list within each row of your Codes
column - the list of strings within the cells is what makes this a trickier use case. I think your options here are:
-
You could create a custom Expectation (I’d check out Custom Column Map Expectations first) that properly parses the list within each cell and then validates the list contents against your value_set
.
-
You could preprocess your data before you validate it with GX. If you explode your tabular data on the Codes
column, you could then use the expect_column_values_to_be_in_set
expectation to validated using your value_set
, and it will fail on individual Codes
values. With this approach, you could have multiple failures for the same original row - depending on your use case, you might need to do some post-processing on the Expectation output to get a clean report on which rows failed validation.
1 Like
Thank you for this! It is what I was expecting but wanted to make sure before diving in!