The first time I saw a regex, I thought someone had dropped a bowl of spaghetti on my keyboard. Patterns like \d+, [a-zA-Z], and ^.*$ looked like ancient hieroglyphics. Fifteen years later, regex is one of my most-used tools.
Regex patterns are a bit like snowflakes: each one looks unique, but they're all built from the same small set of pieces. Once you understand those building blocks, you can assemble almost any pattern you need.
The Building Blocks
Literals and Metacharacters
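Most characters in a regex match themselves: "abc" matches the string "abc". Simple. But some characters have special meaning: . ^ $ * + ? { } [ ] \ | ( ). To match one of these literally, escape it with a backslash, so "\." matches a period and "\(" matches an opening parenthesis.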
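Here is a minimal sketch of escaping, using Python's standard re module (the sample strings are made up for illustration). It shows an unescaped dot matching too much, the escaped version matching only a period, and re.escape(), which escapes every metacharacter in a string for you:

```python
import re

# An unescaped "." is a metacharacter, so "3.14" also matches "3x14".
loose = re.fullmatch(r"3.14", "3x14") is not None
# Escaping the dot makes it match only a literal period.
strict = re.fullmatch(r"3\.14", "3x14") is not None
exact = re.fullmatch(r"3\.14", "3.14") is not None

# re.escape() backslash-escapes every metacharacter in a string,
# which is useful when the literal text comes from user input.
pattern = re.escape("(555)")
found = re.search(pattern, "call (555) now") is not None

print(loose, strict, exact, pattern, found)  # True False True \(555\) True
```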
The Dot (.)
The dot matches any single character except a newline (unless you turn on a "dotall" mode). "a.c" matches "abc", "a1c", and "a c", but not "ac", because there must be exactly one character between a and c.
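A quick check of those cases, sketched with Python's re module. re.fullmatch() requires the whole string to match, and re.DOTALL is Python's name for the mode that lets the dot match a newline:

```python
import re

candidates = ["abc", "a1c", "a c", "ac", "abbc", "a\nc"]
# fullmatch requires the whole candidate to match "a.c".
results = {s: re.fullmatch(r"a.c", s) is not None for s in candidates}
print(results)

# With re.DOTALL the dot also matches a newline.
dotall = re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None
print(dotall)  # True
```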
Character Classes [ ]
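A character class matches exactly one character from a set. "[aeiou]" matches any lowercase vowel, "[0-9]" any digit, and "[a-zA-Z]" any ASCII letter. A caret just inside the opening bracket negates the set: "[^0-9]" matches any character that is not a digit. The most common classes have shorthands: \d for a digit, \w for a word character (letter, digit, or underscore), and \s for whitespace.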
# Match US phone numbers like (555) 123-4567
\(\d{3}\) \d{3}-\d{4}
# Match dates like 2024-01-15
\d{4}-\d{2}-\d{2}
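To see the two patterns above in action, here is a small sketch with Python's re module. The sample text is invented, and it also exercises a plain class ([aeiou]) and a negated class ([^0-9], any non-digit). Note that the date pattern only checks the shape of a date, not whether it is a real one:

```python
import re

PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

text = "Call (555) 123-4567 before 2024-01-15."
phones = PHONE.findall(text)                   # ['(555) 123-4567']
dates = DATE.findall(text)                     # ['2024-01-15']
vowels = re.findall(r"[aeiou]", "regex")       # ['e', 'e']
non_digits = re.findall(r"[^0-9]+", "ab12cd")  # ['ab', 'cd']

# The date pattern checks shape only, so an impossible date still matches.
bogus_date_ok = DATE.fullmatch("2024-99-99") is not None

print(phones, dates, vowels, non_digits, bogus_date_ok)
```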
Quantifiers
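A quantifier says how many times the item just before it may repeat: * means zero or more, + means one or more, ? means zero or one, {n} means exactly n times, {n,} means at least n times, and {n,m} means between n and m times.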
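One way to get a feel for quantifiers is to run each one against the same few words. This sketch uses Python's re module; the words "gl" through "goool" are arbitrary test strings, and every quantifier here applies only to the letter "o":

```python
import re

words = ["gl", "gol", "gool", "goool"]
patterns = [r"go*l", r"go+l", r"go?l", r"go{2}l", r"go{2,}l", r"go{1,2}l"]

# Each quantifier applies to the item just before it: here, the letter "o".
table = {p: [w for w in words if re.fullmatch(p, w)] for p in patterns}
for p, hits in table.items():
    print(f"{p:10} {hits}")
```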
Anchors
^ matches the start of the string and $ matches the end. On its own, "\d+" finds a run of digits anywhere in the string; "^\d+$" matches only if the entire string is digits.
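A sketch of the difference, using Python's re.search() (which scans the whole string for a match). The last line shows a Python-specific quirk worth knowing: in Python, $ also matches just before a trailing newline, whereas re.fullmatch() insists on the entire string:

```python
import re

def check(s):
    # Returns (unanchored match found, anchored match found).
    unanchored = re.search(r"\d+", s) is not None
    anchored = re.search(r"^\d+$", s) is not None
    return unanchored, anchored

for s in ["12345", "abc123", "123 "]:
    print(repr(s), check(s))

# Python quirk: "$" also matches just before a trailing newline,
# so "123\n" passes the anchored search. fullmatch() is stricter.
print(check("123\n")[1], re.fullmatch(r"\d+", "123\n") is not None)
```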
Practical Examples
Email Validation
Fully standards-compliant email validation is surprisingly complex, but for most purposes this pattern is good enough: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
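Here is a minimal sketch of that pattern used as a validator with Python's re module. The helper name looks_like_email and the sample addresses are hypothetical. The last sample shows the "good enough" trade-off: the pattern accepts consecutive dots in the local part, which a strict validator would reject:

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(s):
    # Hypothetical helper. fullmatch anchors the pattern to the whole
    # string; search() would also accept "junk jane@example.com junk".
    return EMAIL.fullmatch(s) is not None

samples = ["jane.doe+news@example.co.uk", "no-at-sign.com", "a@b.c",
           "x@localhost", "a..b@example.com"]
for s in samples:
    print(f"{s:30} {looks_like_email(s)}")
```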
URL Parsing
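To pull apart a URL, "https?://([^/]+)(/.*)?" captures the domain (everything up to the first slash) in the first group and the path (everything from that slash onward) in the second.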
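A short sketch of reading those two groups with Python's re module (the URLs are examples). When the optional path group doesn't take part in the match, Python reports it as None:

```python
import re

URL = re.compile(r"https?://([^/]+)(/.*)?")

m = URL.fullmatch("https://example.com/docs/regex?lang=en")
host, path = m.group(1), m.group(2)
print(host, path)  # example.com /docs/regex?lang=en

m2 = URL.fullmatch("http://example.com")
print(m2.group(1), m2.group(2))  # example.com None
```

Keep in mind that the first group also picks up a port (as in "example.com:8080") if one is present. For anything beyond quick parsing, Python's urllib.parse.urlsplit() handles these cases properly.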
Password Strength
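To require at least 8 characters with at least one lowercase letter, one uppercase letter, and one digit: "^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$". Each (?=...) is a lookahead: it checks that its condition holds somewhere ahead without consuming any characters, so all three checks run from the start of the string before .{8,} enforces the length.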
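A sketch of the password rule with Python's re module. The helper name is_strong and the sample passwords are illustrative; each failing sample is missing exactly one requirement:

```python
import re

STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")

def is_strong(pw):
    # Hypothetical helper: all three lookaheads plus the length check.
    return STRONG.search(pw) is not None

# Passes; no uppercase; no lowercase; too short; no digit.
for pw in ["Passw0rd", "password1", "PASSWORD1", "Pass1", "Password"]:
    print(pw, is_strong(pw))
```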
Common Mistakes
- Not escaping special characters when you want literal matches: "a.b" also matches "axb", so write "a\.b"
- Forgetting that quantifiers are greedy by default: ".*" grabs as much as it can, so "<.*>" matches all of "<b>bold</b>" instead of just "<b>"
- Not deciding what should be anchored: a validator that uses "\d+" without ^ and $ will happily accept "abc123"
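Greediness is the mistake that bites most often. This sketch, using Python's re module on a made-up HTML snippet, compares the greedy .*, the lazy .*? (which matches as little as possible), and a negated character class that simply cannot run past a ">":

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)     # one match spanning the whole string
lazy = re.findall(r"<.*?>", html)      # each tag separately
negated = re.findall(r"<[^>]*>", html) # same result as lazy

print(greedy)
print(lazy)
print(negated)
```

The negated-class version is often the better habit: it states exactly what may appear inside the tag, so it doesn't depend on the engine backtracking.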