Guide 08 May 2026 12 min read

Regular Expressions (Regex) Complete Guide: Patterns, Syntax & Real Examples (2026)

Master regex from scratch. Learn character classes, quantifiers, groups, lookaheads, and real-world patterns for email validation, URL parsing, log analysis, and more.

What Is a Regular Expression?

A regular expression (regex) is a sequence of characters that defines a search pattern. It is used to match, extract, replace, or validate text. Regex is supported in virtually every programming language — Python, JavaScript, PHP, Java, Go, Ruby — and in command-line tools like grep, sed, and awk.

A regex pattern like \b[A-Z][a-z]+\b means: match a word that starts with an uppercase letter followed by one or more lowercase letters. Once you understand the building blocks, you can construct patterns for almost any text processing task.
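As a quick check of the pattern above, here is a minimal Python sketch using the standard re module. The sample sentence and variable names are illustrative assumptions, not part of any real dataset.

```python
import re

# The pattern from the paragraph above: an uppercase letter followed by
# one or more lowercase letters, bounded by word boundaries.
CAPITALISED_WORD = re.compile(r"\b[A-Z][a-z]+\b")

# Hypothetical sample text.
sample = "Alice met Bob in Paris, not in NASA or at iHop."

capitalised = CAPITALISED_WORD.findall(sample)
print(capitalised)  # ['Alice', 'Bob', 'Paris']
```

"NASA" and "iHop" are skipped because neither is an uppercase letter followed only by lowercase letters between two word boundaries.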

Key insight: Regex patterns are not code — they are descriptions of text structure. Thinking in terms of "what does this text look like?" rather than "how do I find it?" makes regex much easier to write.

Core Syntax Reference

Literal Characters

Most characters match themselves. The pattern hello matches the exact string "hello" wherever it appears in the input text.

Character Classes [...]

Match any one character from a set:

  • [abc] — matches "a", "b", or "c"
  • [a-z] — matches any lowercase letter
  • [A-Za-z0-9] — matches any alphanumeric character
  • [^abc] — matches any character EXCEPT "a", "b", or "c" (negation)
  • [a-zA-Z_][a-zA-Z0-9_]* — matches a valid identifier (letter or underscore, followed by letters/digits/underscores)
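The classes above can be tried out with a short Python sketch. The helper name is_identifier and the sample strings are hypothetical; fullmatch is used so that the whole string has to fit the pattern.

```python
import re

IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier(text: str) -> bool:
    """Return True if the whole string looks like an identifier."""
    return IDENTIFIER.fullmatch(text) is not None

# A set matches exactly one character per position.
vowels = re.findall(r"[aeiou]", "regular")       # ['e', 'u', 'a']
non_vowels = re.findall(r"[^aeiou]", "regular")  # ['r', 'g', 'l', 'r']

print(is_identifier("_tmp1"))   # True
print(is_identifier("2fast"))   # False: starts with a digit
```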

Shorthand Character Classes

  • \d — digit (same as [0-9])
  • \D — non-digit (same as [^0-9])
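  • \w — word character (letter, digit, or underscore: [a-zA-Z0-9_] in ASCII mode; Unicode-aware engines such as Python 3 also match letters and digits from other scripts)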
  • \W — non-word character
  • \s — whitespace (space, tab, newline, etc.)
  • \S — non-whitespace
  • . — any character except newline (use [\s\S] to include newlines)
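A small Python sketch of the shorthand classes, assuming a made-up input string. It also shows the difference between . and [\s\S] when the text contains a newline.

```python
import re

# Hypothetical input containing a tab and a trailing newline.
text = "Order 66\tshipped on 2026-05-08\n"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters
tokens = re.findall(r"\S+", text)   # runs of non-whitespace

# "." stops at a newline, while [\s\S] matches any character at all.
dot_default = re.findall(r"a.b", "a\nb a-b")
any_char = re.findall(r"a[\s\S]b", "a\nb a-b")

print(digits)       # ['66', '2026', '05', '08']
print(tokens)       # ['Order', '66', 'shipped', 'on', '2026-05-08']
print(dot_default)  # ['a-b']
print(any_char)     # ['a\nb', 'a-b']
```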

Quantifiers

Control how many times the preceding element must match:

  • * — 0 or more times
  • + — 1 or more times
  • ? — 0 or 1 time (makes the element optional)
  • {3} — exactly 3 times
  • {2,5} — between 2 and 5 times
  • {3,} — 3 or more times

Greedy vs. Lazy:
By default, quantifiers are greedy: they match as much as possible.
Add ? after a quantifier to make it lazy, so that it matches as little as possible.
Applied to <b>bold</b>, <.+> matches the entire string.
<.+?> matches only <b>.
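The greedy and lazy behaviour described above can be reproduced with a few lines of Python. The variable names and the number string are illustrative assumptions.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ grabs as much as it can, then backs off to the last ">".
greedy = re.search(r"<.+>", html).group()

# Lazy: .+? stops at the first ">" that lets the match succeed.
lazy = re.search(r"<.+?>", html).group()
all_tags = re.findall(r"<.+?>", html)

# Counted quantifier: exactly three digits as a standalone number.
three_digits = re.findall(r"\b\d{3}\b", "7 42 123 9999")

print(greedy)        # <b>bold</b>
print(lazy)          # <b>
print(all_tags)      # ['<b>', '</b>']
print(three_digits)  # ['123']
```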

Anchors

Anchors do not match characters — they match positions in the string:

  • ^ — start of string (or start of line in multiline mode)
  • $ — end of string (or end of line in multiline mode)
  • \b — word boundary (between a \w and a \W character, or between a \w character and the start or end of the string)
  • \B — non-word boundary
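  • \A — absolute start of string (Python/Java/Ruby; ignores multiline)
  • \Z — absolute end of string in Python; in Java, Ruby, and PCRE, \Z also matches just before a final newline, and \z is the absolute end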
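To see how anchors pick positions rather than characters, here is a minimal Python sketch over a hypothetical three-line string.

```python
import re

# Hypothetical three-line input.
text = "cat\nconcatenate\ncat food"

start_only = re.findall(r"^cat", text)                     # start of string only
per_line = re.findall(r"^cat", text, re.MULTILINE)         # start of every line
absolute_start = re.findall(r"\Acat", text, re.MULTILINE)  # \A ignores multiline
whole_words = re.findall(r"\bcat\b", text)                 # "cat" as a word
inside_words = re.findall(r"\Bcat\B", text)                # "cat" buried in a word

print(len(start_only), len(per_line), len(absolute_start))  # 1 2 1
print(len(whole_words), len(inside_words))                  # 2 1
```

The only \Bcat\B hit is the "cat" in the middle of "concatenate".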

Groups and Alternation

  • (abc) capturing group: matches "abc" and captures the match for later use
  • (?:abc) non-capturing group: groups without capturing (more efficient)
  • cat|dog — alternation: matches "cat" or "dog"
  • (cat|dog)s? — matches "cat", "cats", "dog", or "dogs"
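  • \1, \2 backreference: matches the same text that group 1 or group 2 captured

Named groups:
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
Names the groups "year", "month", and "day" for readable extraction. Python requires the (?P<name>...) spelling shown here; JavaScript, Java, and .NET use (?<name>...) instead, and PHP (PCRE) accepts both.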
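In Python, the named-group pattern above can be used roughly like this. The sample sentences are assumptions made up for the sketch, which also shows alternation and a backreference.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("Released on 2026-05-08.")
parts = match.groupdict()  # group names become dictionary keys

# Alternation inside a non-capturing group, with an optional plural "s".
animals = re.findall(r"(?:cat|dog)s?", "cats and dog")

# Backreference: \1 must repeat whatever group 1 captured.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")

print(parts)             # {'year': '2026', 'month': '05', 'day': '08'}
print(animals)           # ['cats', 'dog']
print(repeated.group())  # is is
```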

Lookaheads and Lookbehinds

Lookaround assertions match a position based on what comes before or after it, without consuming the characters:

  • (?=abc) positive lookahead: position is followed by "abc"
  • (?!abc) negative lookahead: position is NOT followed by "abc"
  • (?<=abc) positive lookbehind: position is preceded by "abc"
  • (?<!abc) negative lookbehind: position is NOT preceded by "abc"

Example: Extract prices without the currency symbol:
Pattern: (?<=\$)\d+(\.\d{2})?
Input: "Pay $29.99 or $9.00 monthly"
Matches: "29.99", "9.00" (the symbol is checked but not included in the match)
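The price example can be run in Python as sketched below. The optional decimal part is written as a non-capturing group here because Python's findall returns group contents rather than whole matches when a pattern contains capturing groups. The currency strings in the lookahead examples are assumptions.

```python
import re

# Lookbehind: digits that are preceded by a dollar sign.
PRICE = re.compile(r"(?<=\$)\d+(?:\.\d{2})?")
prices = PRICE.findall("Pay $29.99 or $9.00 monthly")

amounts = "10 USD, 20 EUR, 30 USD"

# Positive lookahead: numbers followed by " USD".
usd = re.findall(r"\d+(?= USD)", amounts)

# Negative lookahead: whole numbers NOT followed by " USD".
# The \b anchors stop the engine from matching only part of a number.
not_usd = re.findall(r"\b\d+\b(?! USD)", amounts)

print(prices)   # ['29.99', '9.00']
print(usd)      # ['10', '30']
print(not_usd)  # ['20']
```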

Regex Flags (Modifiers)

Flags change how the engine interprets the pattern:

  • i (case-insensitive): /hello/i matches "Hello", "HELLO", "hElLo"
  • g (global): find all matches, not just the first (JavaScript)
  • m (multiline): ^ and $ match start/end of each line
  • s (dotall): makes . match newlines too
  • x / verbose — ignore whitespace in pattern and allow comments (great for long patterns)
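In Python the flags are passed as constants from the re module, as in this minimal sketch. Python has no g flag; findall and finditer already return every match. The sample strings are assumptions.

```python
import re

# i: case-insensitive matching.
greetings = re.findall(r"hello", "Hello HELLO hElLo", re.IGNORECASE)

# m: ^ matches at the start of every line.
first_words = re.findall(r"^\w+", "one fish\ntwo fish", re.MULTILINE)

# s: "." also matches newlines.
paragraph = "<p>a\nb</p>"
without_dotall = re.search(r"<p>.*</p>", paragraph)
with_dotall = re.search(r"<p>.*</p>", paragraph, re.DOTALL)

# x: whitespace in the pattern is ignored and "#" starts a comment.
DATE_VERBOSE = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)
date_parts = DATE_VERBOSE.fullmatch("2026-05-08").groups()

print(greetings)       # ['Hello', 'HELLO', 'hElLo']
print(first_words)     # ['one', 'two']
print(without_dotall)  # None
print(date_parts)      # ('2026', '05', '08')
```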

20 Essential Regex Patterns

Validation Patterns

# Email address (simplified but practical)
^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$

# URL
https?://[\w\-]+(\.[\w\-]+)+(/[\w\-._~:/?#[\]@!$&'()*+,;=%]*)?

# IPv4 address
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$

# Phone number (flexible international)
^\+?[\d\s\-().]{7,20}$

# Postal code (US ZIP)
^\d{5}(-\d{4})?$

# Credit card (generic, 13-19 digits)
^[\d ]{13,19}$

# Strong password (min 8 chars, uppercase, lowercase, digit, special)
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$

# Date YYYY-MM-DD
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Hex color code
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

# Semantic version (1.2.3)
^\d+\.\d+\.\d+$
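One way to put a few of these validation patterns to work in Python is sketched below. The VALIDATORS table and the is_valid helper are hypothetical names. The sketch uses fullmatch because Python's $ also matches just before a trailing newline.

```python
import re

# Patterns copied from the list above; the dictionary keys are made up.
VALIDATORS = {
    "ipv4": re.compile(r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"),
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "date": re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$"),
    "hex_color": re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$"),
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if the whole value fits the named pattern."""
    return VALIDATORS[kind].fullmatch(value) is not None

print(is_valid("ipv4", "192.168.1.1"))  # True
print(is_valid("ipv4", "256.1.1.1"))    # False
print(is_valid("zip", "12345\n"))       # False, thanks to fullmatch
print(is_valid("date", "2026-02-30"))   # True: shape is checked, not the calendar
```

As the last line shows, the date pattern checks only the shape of the string, so an impossible day such as 30 February still passes. A calendar check would need a date library on top.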

Extraction Patterns

# Extract all URLs from text
https?://[^\s<>"']+

# Extract hashtags
#[a-zA-Z]\w*

# Extract @mentions
@[a-zA-Z0-9_]+

# Extract quoted strings (allowing backslash-escaped characters)
"([^"\\]|\\.)*"

# Extract HTML tag content
<(\w+)[^>]*>(.*?)<\/\1>

# Extract numbers (integers and decimals)
-?\d+(\.\d+)?

# Extract words (no punctuation)
\b[a-zA-Z]+\b

# Extract lines containing a keyword (case-insensitive)
(?im)^.*error.*$

# Extract everything between parentheses
\(([^)]*)\)

# Extract JSON key-value pairs
"(\w+)"\s*:\s*"([^"]*)"

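Several of the extraction patterns above can be tried with Python's findall, as in this sketch. The input strings are assumptions, and the number pattern uses a non-capturing group so that findall returns whole matches.

```python
import re

hashtags = re.findall(r"#[a-zA-Z]\w*", "Loving #regex and #Python3, not #100days")
mentions = re.findall(r"@[a-zA-Z0-9_]+", "cc @ada_l and @bob42!")
numbers = re.findall(r"-?\d+(?:\.\d+)?", "Temps: -3.5, 12 and 0.25")
in_parens = re.findall(r"\(([^)]*)\)", "f(x) and g(a, b)")

# With two capturing groups, findall returns (key, value) tuples.
pairs = re.findall(
    r'"(\w+)"\s*:\s*"([^"]*)"',
    '{"lang": "python", "level": "beginner"}',
)

print(hashtags)   # ['#regex', '#Python3']
print(mentions)   # ['@ada_l', '@bob42']
print(numbers)    # ['-3.5', '12', '0.25']
print(in_parens)  # ['x', 'a, b']
print(pairs)      # [('lang', 'python'), ('level', 'beginner')]
```

The key-value pattern only handles flat string values. For real JSON, a parser such as Python's json module is the safer choice.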
Real-World Use Cases

Log File Analysis

Parse Apache/Nginx access logs to extract IP, timestamp, method, path, and status code:

^(\S+)\s+\S+\s+\S+\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+\S+"\s+(\d{3})\s+(\d+|-)

Groups: 1=IP, 2=datetime, 3=method, 4=path, 5=status, 6=bytes
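A minimal Python sketch of the log pattern in use. The log line is a made-up example in the common access-log layout, and the variable names are assumptions.

```python
import re

LOG_LINE = re.compile(
    r'^(\S+)\s+\S+\s+\S+\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+\S+"\s+(\d{3})\s+(\d+|-)'
)

# Hypothetical access-log line.
line = '203.0.113.7 - - [08/May/2026:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5123'

match = LOG_LINE.match(line)
if match:
    ip, timestamp, method, path, status, size = match.groups()
    print(ip, method, path, int(status), size)
    # 203.0.113.7 GET /index.html 200 5123
```

Lines that do not fit the layout make match return None, so the check before unpacking matters when scanning real files.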

Data Cleaning

Remove all HTML tags from a string:

<[^>]+> → replace with empty string

Normalise whitespace (replace multiple spaces/tabs/newlines with a single space):

\s+ → replace with " "

Remove leading and trailing whitespace:

^\s+|\s+$ → replace with empty string
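The three cleaning steps above can be chained with re.sub, as in this sketch. The function names and the sample string are hypothetical.

```python
import re

def strip_tags(html: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", html)

def normalise_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into one space."""
    return re.sub(r"\s+", " ", text)

def trim(text: str) -> str:
    """Remove leading and trailing whitespace."""
    return re.sub(r"^\s+|\s+$", "", text)

raw = "  <p>Hello,\n\t<b>world</b>!</p>  "
clean = trim(normalise_whitespace(strip_tags(raw)))
print(clean)  # Hello, world!
```

For trimming alone, Python's str.strip() does the same job without a regex. The tag pattern is a quick clean-up and does not replace an HTML parser for untrusted or malformed markup.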

Code Analysis

Find all TODO comments in source code:

//\s*TODO:?\s*(.*)

Find function definitions (Python):

^def\s+(\w+)\s*\(([^)]*)\):

Find import statements (JavaScript/TypeScript):

import\s+(?:\{[^}]+\}|[\w*]+)\s+from\s+['"]([^'"]+)['"]
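Here is a small Python sketch that runs the three code-analysis patterns over made-up source snippets. The ^def pattern needs the multiline flag so that ^ matches at the start of each line.

```python
import re

# Hypothetical Python source to scan.
py_source = '''\
import os

def load(path, mode="r"):
    return open(path, mode)

def main():
    pass
'''

# Hypothetical JavaScript source to scan.
js_source = '''\
import { useState } from "react";
import React from 'react';
// TODO: remove debug logging
// todo lowercase is not matched
'''

FUNC_DEF = re.compile(r"^def\s+(\w+)\s*\(([^)]*)\):", re.MULTILINE)
TODO = re.compile(r"//\s*TODO:?\s*(.*)")
JS_IMPORT = re.compile(r"""import\s+(?:\{[^}]+\}|[\w*]+)\s+from\s+['"]([^'"]+)['"]""")

functions = FUNC_DEF.findall(py_source)
todos = TODO.findall(js_source)
modules = JS_IMPORT.findall(js_source)

print(functions)  # [('load', 'path, mode="r"'), ('main', '')]
print(todos)      # ['remove debug logging']
print(modules)    # ['react', 'react']
```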

Performance Tips

  • Avoid catastrophic backtracking: Nested quantifiers like (a+)+ on a long non-matching string can take exponential time. Use atomic groups or possessive quantifiers when available.
  • Prefer non-capturing groups: Use (?:...) instead of (...) when you do not need to capture — it is slightly faster.
  • Anchor your patterns: A leading ^ or \b lets the engine reject most starting positions immediately instead of attempting a full match at each one.
  • Use character classes instead of alternation: [aeiou] is faster than a|e|i|o|u.
  • Compile patterns once: In Python (re.compile()) and Java (Pattern.compile()), compile your pattern once and reuse it instead of recompiling it in a loop.
  • Be specific: \d{4} is faster than \d+ when you know the exact length.
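A few of these tips can be sketched in Python without any timing code. The function and variable names are assumptions. The inputs are kept short on purpose, because the nested pattern slows down sharply on long runs of "a" that do not end in "b".

```python
import re

# Compile once at module level, then reuse on every call.
FOUR_DIGIT_NUMBER = re.compile(r"\b\d{4}\b")

def count_four_digit_numbers(lines):
    """Count standalone four-digit numbers across all lines."""
    return sum(len(FOUR_DIGIT_NUMBER.findall(line)) for line in lines)

total = count_four_digit_numbers(
    ["Founded 1999, rebuilt 2026", "id 123456", "no numbers"]
)

# (a+)+b and a+b accept the same strings, but only the nested form
# is prone to catastrophic backtracking.
nested = re.compile(r"(a+)+b")
flat = re.compile(r"a+b")
samples = ["ab", "aaab", "b", "aaaa", "xaab"]
same_results = all(
    bool(nested.search(s)) == bool(flat.search(s)) for s in samples
)

# A character class and the equivalent alternation find the same matches.
class_hits = re.findall(r"[aeiou]", "performance")
alternation_hits = re.findall(r"a|e|i|o|u", "performance")

print(total)                           # 2
print(same_results)                    # True
print(class_hits == alternation_hits)  # True
```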

Common Regex Mistakes

  • Forgetting to escape metacharacters: ., *, +, ?, (, ), [, {, ^, $, |, \ are special. To match a literal dot, write \. rather than a bare "." character.
  • Using . when you mean "any visible character": . also matches spaces and tabs. If you want to exclude whitespace, use \S.
  • Greedy matching eating too much: <.+> on <b>text</b> matches the whole string. Use <.+?> or <[^>]+>.
  • Forgetting word boundaries: The pattern cat also matches "concatenate". Use \bcat\b to match the word "cat" only.
  • Assuming ^ and $ work per line by default: In most engines, they match the start/end of the entire string. Add the m flag for line-by-line matching.
  • Case-sensitive by default: Regex is case-sensitive unless you add the i flag. /hello/ will not match "Hello".
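Each of these mistakes is easy to reproduce. The Python sketch below pairs the faulty pattern with its fix, using made-up sample strings.

```python
import re

# Unescaped dot matches any character; the escaped dot is literal.
unescaped = re.findall(r"file.txt", "file.txt fileXtxt")
escaped = re.findall(r"file\.txt", "file.txt fileXtxt")
auto_escaped = re.escape("file.txt")  # lets the library add the backslash

# Greedy match swallows both tags; a negated class stops at each ">".
greedy = re.findall(r"<.+>", "<b>text</b>")
per_tag = re.findall(r"<[^>]+>", "<b>text</b>")

# Without word boundaries, "cat" is also found inside "concatenate".
loose = re.findall(r"cat", "cat concatenate")
strict = re.findall(r"\bcat\b", "cat concatenate")

# ^ and $ apply to the whole string unless the multiline flag is set.
whole_string = re.findall(r"^\d+$", "1\n22\n333")
per_line = re.findall(r"^\d+$", "1\n22\n333", re.MULTILINE)

# Matching is case-sensitive unless asked otherwise.
case_sensitive = re.search(r"hello", "Hello")
case_insensitive = re.search(r"hello", "Hello", re.IGNORECASE)

print(unescaped, escaped)       # ['file.txt', 'fileXtxt'] ['file.txt']
print(greedy, per_tag)          # ['<b>text</b>'] ['<b>', '</b>']
print(len(loose), len(strict))  # 2 1
print(whole_string, per_line)   # [] ['1', '22', '333']
```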

Regex in Different Languages

# Python
import re
pattern = re.compile(r"\b\w+@\w+\.\w+\b")
matches = pattern.findall("Send to alice@example.com or bob@example.org")  # placeholder addresses

# JavaScript
const emails = text.match(/\b\w+@\w+\.\w+\b/g);

# PHP
preg_match_all('/\b\w+@\w+\.\w+\b/', $text, $matches);

# Go
re := regexp.MustCompile(`\b\w+@\w+\.\w+\b`)
matches := re.FindAllString(text, -1)

# Bash (grep)
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt