How do I use flags with my regex expectations?

austin_gx · July 14, 2023, 7:59pm

View in #gx-community-support on Slack

@Vinicius_Machado_Mansur: Hi people!
I’m trying to test a Canadian zipcode column in my table, using this code:
validator.expect_column_values_to_match_regex(
    column="zipcode",
    regex=r"/^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$/gi"
)
This regex can’t find a match…
If I remove the flags, though, it works perfectly:
validator.expect_column_values_to_match_regex(
    column="zipcode",
    regex=r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$"
)
Am I doing something wrong here? How can I use flags on regex expressions?

austin_gx · July 14, 2023, 8:01pm

@Austin_Robinson_(GX): Hey @Vinicius_Machado_Mansur! Thanks for reaching out.

What is your backend (Pandas / Spark / SQLAlchemy)? The regex is passed from GX through to the backend-specific regex parser, and so support for those flags will be dependent on whether that engine can parse them.

@Vinicius_Machado_Mansur: I’m using Pandas. I wanted to make the test case-insensitive (the /i flag).

@Austin_Robinson_(GX): Hey @Vinicius_Machado_Mansur! Thanks for that context.

Pandas uses Python's re module to parse regex; see the re module documentation. From their docs:
(?aiLmsux)

    (One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
GX doesn’t currently support passing a flags argument, so the above would be the best way forward here.
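
To make the difference concrete, here is a minimal sketch using only Python's re module (no GX or Pandas involved). It assumes the backend ends up handing the pattern string to re unchanged, as described above. The sample values and the variable names are made up for illustration. The point is that the JavaScript-style slashes and trailing "gi" are treated as literal characters by re, while a leading (?i) turns on case-insensitive matching:

```python
import re

# Postal code pattern from this thread, written as a raw string so that
# \d reaches the regex engine instead of being read as a string escape.
PATTERN = r"^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$"

# JavaScript-style /.../gi is not flag syntax in Python's re module:
# the slashes and the letters "gi" become literal characters to match.
js_style = re.compile("/" + PATTERN + "/gi")

# A leading (?i) group sets re.IGNORECASE for the whole expression.
inline_flag = re.compile("(?i)" + PATTERN)

# Hypothetical sample values, not taken from the original poster's data.
samples = ["K1A 0B1", "k1a 0b1", "K1A-0B1", "Z1A 0B1"]

for value in samples:
    print(
        value,
        "| js-style:", bool(js_style.search(value)),
        "| inline (?i):", bool(inline_flag.search(value)),
    )
```

The same idea applies to the other inline letters listed in the quoted documentation, for example (?m) for multi-line matching.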

@Vinicius_Machado_Mansur: OK, thank you @Austin_Robinson_(GX)!
The “inline” notation worked well:
validator.expect_column_values_to_match_regex(
    column="zipcode",
    regex=r"(?i)^[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z][ -]?\d[ABCEGHJ-NPRSTV-Z]\d$"
)
