Regex Cheat Sheet


A quick reference for regular expressions: look up, copy, and apply.

Anchors

Pattern	Description	Example
`^`	Start of string (or start of line in multiline mode)	`^Hello matches 'Hello world' but not 'Say Hello'`
`$`	End of string (or end of line in multiline mode)	`world$ matches 'Hello world' but not 'world peace'`
`\A`	Start of string (never matches at line breaks)	`\AHello matches only if string starts with 'Hello'`
`\Z`	End of string (before optional final newline)	`world\Z matches 'Hello world' at end`
`\z`	Absolute end of string	`end\z matches absolute end of string`
`\b`	Word boundary (between \w and \W)	`\bcat\b matches 'cat' in 'the cat sat' but not 'catch'`
`\B`	Non-word boundary	`\Bcat\B matches 'cat' in 'concatenate' but not standalone 'cat'`
`\G`	Position where the previous match ended (or start of string on the first attempt)	`\Gfoo matches consecutive 'foo' occurrences, each starting where the last match ended`
`(?m)^`	Start of each line in multiline mode	`(?m)^line matches 'line' at start of any line`
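
As a rough illustration of the anchors above, here is a minimal sketch using Python's `re` module (an assumption; the table itself is engine-neutral, and not every anchor listed is available in every engine). The sample strings are made up for the example.

```python
import re

text = "Hello world\nSay Hello"

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \b keeps 'cat' from matching inside 'catch'.
whole_word = re.findall(r"\bcat\b", "the cat sat on the catch")

# \B requires 'cat' to sit inside a longer word.
inner = re.findall(r"\Bcat\B", "concatenate cat")

print(start_only)  # ['Hello']
print(each_line)   # ['Hello', 'Say']
print(whole_word)  # ['cat']
print(inner)       # ['cat']
```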

Quantifiers

Pattern	Description	Example
`*`	Match 0 or more times (greedy)	`ab*c matches 'ac', 'abc', 'abbc', 'abbbc'`
`+`	Match 1 or more times (greedy)	`ab+c matches 'abc', 'abbc' but not 'ac'`
`?`	Match 0 or 1 time (greedy)	`colou?r matches 'color' and 'colour'`
`{n}`	Match exactly n times	`a{3} matches 'aaa' only`
`{n,}`	Match n or more times	`a{2,} matches 'aa', 'aaa', 'aaaa', etc.`
`{n,m}`	Match between n and m times (inclusive)	`a{2,4} matches 'aa', 'aaa', 'aaaa'`
`*?`	Match 0 or more times (lazy/non-greedy)	`<.*?> matches '<b>' in '<b>bold</b>' instead of whole string`
`+?`	Match 1 or more times (lazy/non-greedy)	`a+? matches a single 'a' in 'aaa', taking as few as possible`
`??`	Match 0 or 1 time (lazy/non-greedy)	`colou??r prefers 'color' over 'colour'`
`{n,m}?`	Match between n and m times (lazy)	`a{2,4}? matches 'aa' preferring fewer repetitions`
`*+`	Possessive: match 0 or more, never backtrack	`a*+b — possessive, won't give back matched a's`
`++`	Possessive: match 1 or more, never backtrack	`\d++[abc] — possessive digit matching`
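
The counting quantifiers can be checked quickly with whole-string matching. This is a small sketch in Python (chosen for illustration); `fits` is a hypothetical helper, not part of any library.

```python
import re

def fits(pattern, s):
    """Return True only if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# Keep only the candidates that each pattern accepts in full.
star = [s for s in ["ac", "abc", "abbbc"] if fits(r"ab*c", s)]
plus = [s for s in ["ac", "abc", "abbc"] if fits(r"ab+c", s)]
optional = [s for s in ["color", "colour", "colouur"] if fits(r"colou?r", s)]
bounded = [s for s in ["a", "aa", "aaaa", "aaaaa"] if fits(r"a{2,4}", s)]

print(star)      # ['ac', 'abc', 'abbbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(bounded)   # ['aa', 'aaaa']
```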

Character Classes

Pattern	Description	Example
`.`	Any character except newline (by default)	`a.b matches 'axb', 'a2b', 'a b' but not 'a\nb'`
`\d`	Any digit [0-9]	`\d+ matches '123', '42', '0'`
`\D`	Any non-digit [^0-9]	`\D+ matches 'abc', 'foo bar'`
`\w`	Any word character [a-zA-Z0-9_]	`\w+ matches 'hello', 'foo_bar', 'Test123'`
`\W`	Any non-word character [^a-zA-Z0-9_]	`\W+ matches '!@#', ' ', '->'`
`\s`	Any whitespace character (space, tab, newline, etc.)	`\s+ matches spaces, tabs, newlines between words`
`\S`	Any non-whitespace character	`\S+ matches words or tokens without spaces`
`[abc]`	Character class: matches a, b, or c	`[aeiou] matches any vowel`
`[^abc]`	Negated class: matches any char except a, b, c	`[^0-9] matches any non-digit character`
`[a-z]`	Range: matches any lowercase letter a through z	`[a-z]+ matches 'hello', 'world'`
`[a-zA-Z]`	Range: matches any letter (upper or lower)	`[a-zA-Z]+ matches alphabetic strings`
`[0-9a-fA-F]`	Hexadecimal digit	`[0-9a-fA-F]{6} matches hex color like 'FF5733'`
`\p{L}`	Unicode letter (PCRE/Unicode mode)	`\p{L}+ matches letters in any language`
`\p{N}`	Unicode number	`\p{N}+ matches numeric characters including non-ASCII digits`
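
A short sketch of the character classes in use, assuming Python's `re` module and an invented sample line. One engine-specific point worth knowing: for Python 3 `str` patterns, `\w` and `\d` are Unicode-aware by default, and `re.ASCII` restricts them to the ASCII ranges shown in the table.

```python
import re

line = "Order 42: 3 items, ref FF5733"

digits = re.findall(r"\d+", line)             # runs of digits
words = re.findall(r"[a-zA-Z]+", line)        # runs of ASCII letters
tokens = re.findall(r"\S+", line)             # whitespace-separated tokens
hex6 = re.findall(r"\b[0-9a-fA-F]{6}\b", line)  # standalone 6-digit hex values

# \w is Unicode-aware by default; re.ASCII limits it to [a-zA-Z0-9_].
accented = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", accented)
ascii_words = re.findall(r"\w+", accented, flags=re.ASCII)

print(digits)       # ['42', '3', '5733']
print(words)        # ['Order', 'items', 'ref', 'FF']
print(tokens)       # ['Order', '42:', '3', 'items,', 'ref', 'FF5733']
print(hex6)         # ['FF5733']
print(ascii_words)  # ['na', 've', 'caf']
```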

Groups & References

Pattern	Description	Example
`(abc)`	Capturing group — captures matched text	`(\d{4})-(\d{2})-(\d{2}) captures year, month, day`
`(?:abc)`	Non-capturing group: groups without capturing	`(?:foo|bar)+ matches 'foo', 'bar', 'foofoo', 'foobar'`
`(?P<name>abc)`	Named capturing group (Python/PCRE syntax)	`(?P<year>\d{4}) captures year by name`
`(?<name>abc)`	Named capturing group (.NET/PCRE2 syntax)	`(?<year>\d{4}) captures year by name`
`\1`	Backreference to group 1	`(\w+) \1 matches repeated words like 'hello hello'`
`\k<name>`	Named backreference	`(?<word>\w+) \k<word> matches repeated named word`
`(?|...)`	Branch reset group: alternatives share group numbers	`(?|(a)|(b)) both alternatives use group 1`
`(?>abc)`	Atomic group: no backtracking inside	`(?>a|ab)c is atomic and won't retry alternatives, so it matches 'ac' but not 'abc'`
`\g{1}`	Backreference using \g syntax (PCRE)	`(\w+) \g{1} same as \1 but clearer syntax`
`\g<name>`	Recursive reference to named group	`(?<balanced>\((?:[^()]|\g<balanced>)*\)) recursive match of balanced parentheses`
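
The following sketch shows numbered groups, named groups, a backreference, and a non-capturing group, assuming Python's `re` module (which uses the `(?P<name>...)` spelling). Branch reset and recursive references from the table are not part of Python's built-in `re`, so they are left out here. The sample strings are invented.

```python
import re

# Numbered capturing groups: year, month, day.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-01-31.")
year, month, day = m.groups()

# Named groups are read back by name instead of position.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-01-31")
named_year = named.group("year")

# \1 must repeat exactly what group 1 captured (doubled words).
repeated = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# A non-capturing group groups for repetition but records nothing.
non_capturing = re.fullmatch(r"(?:foo|bar)+", "foobar")

print(year, month, day)        # 2024 01 31
print(named_year)              # 2024
print(repeated)                # ['is', 'test']
print(non_capturing.groups())  # ()
```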

Lookarounds

Pattern	Description	Example
`(?=abc)`	Positive lookahead — matches if followed by abc	`\d+(?= dollars) matches number only if followed by ' dollars'`
`(?!abc)`	Negative lookahead: matches if NOT followed by abc	`\d+(?! dollars) matches digits not followed by ' dollars' (it can backtrack to a shorter run, e.g. '10' in '100 dollars')`
`(?<=abc)`	Positive lookbehind — matches if preceded by abc	`(?<=\$)\d+ matches digits preceded by '$'`
`(?<!abc)`	Negative lookbehind: matches if NOT preceded by abc	`(?<!\$)\d+ matches digits NOT preceded by '$' (it still matches '00' in '$100'; a leading \b prevents that)`
`(?=.*abc)`	Lookahead that allows characters before the target	`^(?=.*\d)(?=.*[a-z]).{8,}$ password with digit and lowercase`
`(?<=\b)\w+`	Lookbehind at word boundary	`(?<=\bpre)\w+ matches suffix after 'pre'`
`(?=(?:...)*$)`	Lookahead for repeated pattern to end of string	`^(?=(?:\d{3})*$)\d+ matches digit strings whose length is a multiple of 3`
`(?<!\\)"`	Match quote not preceded by backslash	`(?<!\\)" matches unescaped double quotes`
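
Lookarounds check context without consuming it, which is easiest to see by looking at what ends up in the match. A minimal sketch, assuming Python's `re` module (where lookbehind must be fixed-width) and an invented price string:

```python
import re

prices = "cost: $100, weight: 250 grams, fee: 30 dollars"

# Positive lookahead: the number is returned, ' dollars' is only checked.
before_dollars = re.findall(r"\d+(?= dollars)", prices)

# Positive lookbehind: digits that directly follow a '$'.
after_sign = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind; the \b guards stop it from matching part of '100'.
not_after_sign = re.findall(r"\b(?<!\$)\d+\b", prices)

# Stacked lookaheads: at least one digit and one lowercase letter, 8+ chars.
password = re.compile(r"^(?=.*\d)(?=.*[a-z]).{8,}$")
ok = password.search("secret123") is not None
no_digit = password.search("password") is not None

print(before_dollars)  # ['30']
print(after_sign)      # ['100']
print(not_after_sign)  # ['250', '30']
print(ok, no_digit)    # True False
```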

Flags / Modifiers

Pattern	Description	Example
`i`	Case-insensitive matching	`/hello/i matches 'Hello', 'HELLO', 'hello'`
`g`	Global: find all matches (not just first)	`/\d+/g finds all numbers in a string`
`m`	Multiline: ^ and $ match start/end of each line	`/^foo/m matches 'foo' at start of any line`
`s`	Dotall: . matches newline characters too	`/a.b/s matches 'a\nb' with dot matching newline`
`u`	Unicode: treat pattern and string as Unicode	`/\p{Emoji}/u matches emoji characters`
`y`	Sticky: match only from lastIndex position	`/foo/y matches 'foo' only at current position`
`x`	Extended: ignore whitespace and allow comments	`/hello # greeting/x ignores space and comment`
`(?i)`	Inline flag for case-insensitive (embedded in pattern)	`(?i)hello matches case-insensitively from that point`
`(?m)`	Inline multiline flag	`(?m)^start matches 'start' at beginning of each line`
`(?s)`	Inline dotall flag	`(?s)begin.*end matches across newlines`
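
The slash-delimited flags in the table follow JavaScript/PCRE notation. As a hedged sketch of how they map onto another engine, Python's `re` module takes them as constants: `i` is `re.IGNORECASE`, `m` is `re.MULTILINE`, `s` is `re.DOTALL`, and `x` is `re.VERBOSE`. There is no `g` flag in Python; `re.findall()` or `re.finditer()` play that role.

```python
import re

# i: case-insensitive
ci = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# m: ^ matches at the start of every line
ml = re.findall(r"^foo", "foo\nbar\nfoo", flags=re.MULTILINE)

# s: '.' does not match a newline unless DOTALL is set
no_dotall = re.search(r"a.b", "a\nb")
dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# x: whitespace and comments inside the pattern are ignored
phone = re.compile(r"""
    \d{3}   # three-digit prefix
    -       # separator
    \d{4}   # four-digit line number
""", re.VERBOSE)

# Inline flag; Python expects global inline flags at the start of the pattern.
inline = re.search(r"(?i)hello", "HELLO")

print(ci)                                # ['Hello', 'HELLO', 'hello']
print(ml)                                # ['foo', 'foo']
print(no_dotall is None, dotall is not None)  # True True
print(bool(phone.fullmatch("555-1234")))      # True
```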

Common Patterns

Pattern	Description	Example
`^[\w.-]+@[\w.-]+\.\w{2,}$`	Basic email address validation	`e.g. user@example.com, first.last@mail.example.org`
`^https?:\/\/[\w\-.]+(?:\.[\w\-.]+)+[\/\w\-.?=%&#]*$`	URL validation (http and https)	`https://example.com/path?q=1&r=2`
`^(?:\d{1,3}\.){3}\d{1,3}$`	IPv4 address (basic)	`192.168.1.1, 10.0.0.255`
`^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`	MAC address	`00:1A:2B:3C:4D:5E`
`^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$`	Hex color code	`#FF5733 or #F57 or FF5733`
`^\+?[1-9]\d{1,14}$`	International phone number (E.164)	`+14155552671, +442071838750`
`^\d{4}-\d{2}-\d{2}$`	Date in YYYY-MM-DD format	`2024-01-31, 2000-12-25`
`^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$`	Time in HH:MM or HH:MM:SS (24-hour)	`14:30, 09:05:00, 23:59:59`
`^[a-zA-Z0-9_-]{3,16}$`	Username: 3-16 chars, letters, digits, underscore, hyphen	`john_doe, user-123, MyName`
`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%?&])[A-Za-z\d@$!%?&]{8,}$`	Strong password: min 8 chars, upper, lower, digit, special (only @ $ ! % ? & allowed as specials)	`P@ssw0rd!, Secure$123`
`^\d{5}(-\d{4})?$`	US ZIP code (5 digit or ZIP+4)	`90210, 10001-1234`
`^[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}$`	IBAN bank account number	`GB29NWBK60161331926819`
`<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>`	Basic HTML tag with content (non-nested)	`<b>bold</b>, <span class="x">text</span>`
`^([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))$`	Strict date YYYY-MM-DD with range validation	`2024-01-31, 1999-12-25`
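
To show how a few of these patterns behave as validators, here is a small sketch in Python. The patterns are taken from the table, with alternation written as a plain `|`; the `PATTERNS` dictionary and the `is_valid` helper are hypothetical names introduced for the example. Note what the "basic" and "strict" patterns do not check: octet ranges for IPv4, and month lengths for dates.

```python
import re

# Patterns from the table above; the dictionary keys are illustrative names.
PATTERNS = {
    "ipv4_basic": r"^(?:\d{1,3}\.){3}\d{1,3}$",
    "mac": r"^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$",
    "hex_color": r"^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$",
    "time_24h": r"^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$",
    "us_zip": r"^\d{5}(-\d{4})?$",
    "strict_date": r"^([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))$",
}

def is_valid(kind, value):
    """Return True if value matches the pattern registered under kind."""
    return re.search(PATTERNS[kind], value) is not None

print(is_valid("ipv4_basic", "192.168.1.1"))      # True
print(is_valid("ipv4_basic", "999.999.999.999"))  # True: ranges are not checked
print(is_valid("hex_color", "#F57"))              # True
print(is_valid("time_24h", "24:00"))              # False
print(is_valid("strict_date", "2024-13-01"))      # False: month out of range
print(is_valid("strict_date", "2024-02-31"))      # True: calendar is not checked
```

For input that must be truly valid (real dates, routable addresses), a pattern like these is usually only a first filter before parsing with a dedicated library.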

Frequently Asked Questions

Q: What is a regular expression (regex)?

A regular expression (regex) is a sequence of characters that defines a search pattern. It is used to search, match, and manipulate text in programming languages, text editors, and command-line tools. For example, the pattern \d{3}-\d{4} matches a phone number format such as "555-1234".

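Q: How do I use a regex pattern in my code?

Most programming languages have built-in regex support. In JavaScript, write /pattern/flags and use methods such as .test(), .match(), or .replace(). In Python, import the re module and call re.search(), re.findall(), or re.sub(). In PHP, use preg_match(), preg_match_all(), or preg_replace(). Patterns copied from this cheat sheet can be passed to these functions, although some syntax (for example \G, possessive quantifiers, or \p{...}) is not available in every engine.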
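
As a minimal sketch of the Python functions named above, the following applies the \d{3}-\d{4} pattern to an invented sample sentence:

```python
import re

text = "Call 555-1234 or 555-9876 today."
pattern = r"\d{3}-\d{4}"

# re.search returns the first match object, or None if nothing matches.
first = re.search(pattern, text)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(pattern, text)

# re.sub replaces every match with the given replacement text.
masked = re.sub(pattern, "XXX-XXXX", text)

print(first.group())  # 555-1234
print(all_numbers)    # ['555-1234', '555-9876']
print(masked)         # Call XXX-XXXX or XXX-XXXX today.
```

Writing the pattern as a raw string (the `r"..."` prefix) keeps Python from interpreting the backslashes before the regex engine sees them.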

Q: What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, {n,}) match as much text as possible. Lazy, or non-greedy, quantifiers (*?, +?, {n,}?) match as little as possible. For example, given the input "<b>bold</b> and <i>italic</i>", the greedy pattern <.+> matches the entire string from the first < to the last >, while the lazy pattern <.+?> matches only "<b>", and then each remaining tag separately.
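
The difference is easy to confirm by running both patterns over the same input. A short sketch in Python, assuming a sample string with `<b>` and `<i>` tags:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last '>' in the string, giving one long match.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```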

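Q: What is the difference between lookahead and lookbehind?

Lookahead (?=...) checks that what follows the current position matches a pattern, without including it in the result. Lookbehind (?<=...) checks that what precedes the current position matches a pattern. The negative versions, (?!...) and (?<!...), assert that the pattern does not match.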

Q: What are the most common real-world uses of regex?

Regex is widely used to validate input formats (email addresses, phone numbers, postal codes), search and replace text in code editors, parse log files and extract specific data, define URL routes in web frameworks, validate forms on both the client and the server, and scrape web data. The "Common Patterns" section of this cheat sheet has ready-to-use examples for many of these situations.

