Regex Cheat Sheet

Quick reference for regular expressions: find, copy, and use

Anchors

Pattern	Description	Example
`^`	Start of string (or start of line in multiline mode)	`^Hello matches 'Hello world' but not 'Say Hello'`
`$`	End of string (or end of line in multiline mode)	`world$ matches 'Hello world' but not 'world peace'`
`\A`	Start of string (never matches at line breaks)	`\AHello matches only if string starts with 'Hello'`
`\Z`	End of string (before optional final newline)	`world\Z matches 'Hello world' at end`
`\z`	Absolute end of string	`end\z matches absolute end of string`
`\b`	Word boundary (between \w and \W)	`\bcat\b matches 'cat' in 'the cat sat' but not 'catch'`
`\B`	Non-word boundary	`\Bcat\B matches 'cat' in 'concatenate' but not standalone 'cat'`
`\G`	End of the previous match, or start of the string on the first attempt (useful with the global flag)	`\Gfoo matches consecutive 'foo' occurrences, each starting where the last match ended`
`(?m)^`	Start of each line in multiline mode	`(?m)^line matches 'line' at start of any line`
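
The anchors above can be tried out with a short sketch. It assumes Python's built-in `re` module, which supports only some of the anchors in the table (it has no `\G`, for example), and the sample strings are made up for illustration.

```python
import re

text = "Hello world\nSay Hello"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \b needs a word boundary on both sides, so 'catch' is skipped.
whole_words = re.findall(r"\bcat\b", "the cat sat, catch the cat")

# \B needs the opposite: 'cat' must sit inside a longer word.
inner = re.findall(r"\Bcat\B", "concatenate cat")

print(starts_default)    # ['Hello']
print(starts_multiline)  # ['Hello', 'Say']
print(whole_words)       # ['cat', 'cat']
print(inner)             # ['cat']
```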

Quantifiers

Pattern	Description	Example
`*`	Match 0 or more times (greedy)	`ab*c matches 'ac', 'abc', 'abbc', 'abbbc'`
`+`	Match 1 or more times (greedy)	`ab+c matches 'abc', 'abbc' but not 'ac'`
`?`	Match 0 or 1 time (greedy)	`colou?r matches 'color' and 'colour'`
`{n}`	Match exactly n times	`a{3} matches 'aaa' only`
`{n,}`	Match n or more times	`a{2,} matches 'aa', 'aaa', 'aaaa', etc.`
`{n,m}`	Match between n and m times (inclusive)	`a{2,4} matches 'aa', 'aaa', 'aaaa'`
`*?`	Match 0 or more times (lazy/non-greedy)	`<.*?> matches '<b>' in '<b>bold</b>' instead of whole string`
`+?`	Match 1 or more times (lazy/non-greedy)	`a+? matches single 'a' as few times as possible`
`??`	Match 0 or 1 time (lazy/non-greedy)	`colou??r prefers 'color' over 'colour'`
`{n,m}?`	Match between n and m times (lazy)	`a{2,4}? matches 'aa' preferring fewer repetitions`
`*+`	Possessive: match 0 or more, never backtrack	`a*+b — possessive, won't give back matched a's`
`++`	Possessive: match 1 or more, never backtrack	`\d++[abc] — possessive digit matching`
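
A minimal sketch of the greedy and lazy quantifiers from the table, assuming Python's `re` module. The possessive forms are left out because support for them varies by engine and version. Variable names are illustrative only.

```python
import re

# Greedy {2,4} takes as many 'a's as it can; lazy {2,4}? stops at the minimum.
greedy = re.match(r"a{2,4}", "aaaaa").group()
lazy = re.match(r"a{2,4}?", "aaaaa").group()

candidates = ["ac", "abc", "abbc"]

# * allows zero 'b's, + requires at least one.
star_matches = [s for s in candidates if re.fullmatch(r"ab*c", s)]
plus_matches = [s for s in candidates if re.fullmatch(r"ab+c", s)]

# ? makes the preceding 'u' optional.
optional_u = re.findall(r"colou?r", "color colour colr")

print(greedy, lazy)   # aaaa aa
print(star_matches)   # ['ac', 'abc', 'abbc']
print(plus_matches)   # ['abc', 'abbc']
print(optional_u)     # ['color', 'colour']
```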

Character Classes

Pattern	Description	Example
`.`	Any character except newline (by default)	`a.b matches 'axb', 'a2b', 'a b' but not 'a\nb'`
`\d`	Any digit [0-9]	`\d+ matches '123', '42', '0'`
`\D`	Any non-digit [^0-9]	`\D+ matches 'abc', 'foo bar'`
`\w`	Any word character [a-zA-Z0-9_]	`\w+ matches 'hello', 'foo_bar', 'Test123'`
`\W`	Any non-word character [^a-zA-Z0-9_]	`\W+ matches '!@#', ' ', '->'`
`\s`	Any whitespace character (space, tab, newline, etc.)	`\s+ matches spaces, tabs, newlines between words`
`\S`	Any non-whitespace character	`\S+ matches words or tokens without spaces`
`[abc]`	Character class: matches a, b, or c	`[aeiou] matches any vowel`
`[^abc]`	Negated class: matches any char except a, b, c	`[^0-9] matches any non-digit character`
`[a-z]`	Range: matches any lowercase letter a through z	`[a-z]+ matches 'hello', 'world'`
`[a-zA-Z]`	Range: matches any letter (upper or lower)	`[a-zA-Z]+ matches alphabetic strings`
`[0-9a-fA-F]`	Hexadecimal digit	`[0-9a-fA-F]{6} matches hex color like 'FF5733'`
`\p{L}`	Unicode letter (PCRE/Unicode mode)	`\p{L}+ matches letters in any language`
`\p{N}`	Unicode number	`\p{N}+ matches numeric characters including non-ASCII digits`
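
The shorthand and bracketed classes can be compared side by side. This is a sketch assuming Python's `re` module, which does not provide the `\p{...}` Unicode properties, so those are omitted. The sample line and phone number are invented.

```python
import re

line = "Order #A7 shipped 3 boxes to foo_bar on 2024-01-31"

# \d+ collects each run of digits; \w+ collects runs of letters, digits, and _.
digits = re.findall(r"\d+", line)
words = re.findall(r"\w+", line)

# A negated class removes everything that is not an ASCII digit.
digits_only = re.sub(r"[^0-9]", "", "+1 (415) 555-2671")

# A bracketed class with a counted quantifier checks a six-digit hex value.
is_hex = bool(re.fullmatch(r"[0-9a-fA-F]{6}", "FF5733"))
not_hex = bool(re.fullmatch(r"[0-9a-fA-F]{6}", "GG5733"))

print(digits)           # ['7', '3', '2024', '01', '31']
print(digits_only)      # 14155552671
print(is_hex, not_hex)  # True False
```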

Groups & References

Pattern	Description	Example
`(abc)`	Capturing group — captures matched text	`(\d{4})-(\d{2})-(\d{2}) captures year, month, day`
`(?:abc)`	Non-capturing group: groups without capturing	`(?:foo|bar)+ matches 'foo', 'bar', 'foofoo', 'foobar'`
`(?P<name>abc)`	Named capturing group (Python/PCRE syntax)	`(?P<year>\d{4}) captures year by name`
`(?<name>abc)`	Named capturing group (.NET/PCRE2 syntax)	`(?<year>\d{4}) captures year by name`
`\1`	Backreference to group 1	`(\w+) \1 matches repeated words like 'hello hello'`
`\k<name>`	Named backreference	`(?<word>\w+) \k<word> matches repeated named word`
`(?|...)`	Branch reset group: subgroups share numbers	`(?|(a)|(b)) both alternatives use group 1`
`(?>abc)`	Atomic group: no backtracking inside	`(?>a|ab)c is atomic, won't retry alternatives`
`\g{1}`	Backreference using \g syntax (PCRE)	`(\w+) \g{1} same as \1 but clearer syntax`
`\g<name>`	Recursive reference to named group	`(?<balanced>\((?:[^()]|\g<balanced>)*\)) recursive match of balanced parentheses`
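
A short sketch of capturing groups, named groups, and backreferences, assuming Python's `re` module. It uses only the syntax that module accepts: `(?P<name>...)` for named groups and `\1` for numbered backreferences. Branch reset and recursive references are not shown. The sample sentences are made up.

```python
import re

# Numbered groups are returned in order by .groups().
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-01-31.")
year, month, day = m.groups()

# Named groups can be read by name or as a dictionary.
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-01-31")
parts = named.groupdict()

# \1 must repeat exactly what group 1 captured, which finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "it was was a very very long day")

# A non-capturing group applies + to the whole alternation without capturing.
grouped = re.fullmatch(r"(?:foo|bar)+", "foobar") is not None

print(year, month, day)  # 2024 01 31
print(parts)             # {'year': '2024', 'month': '01'}
print(doubled)           # ['was', 'very']
print(grouped)           # True
```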

Lookarounds

Pattern	Description	Example
`(?=abc)`	Positive lookahead — matches if followed by abc	`\d+(?= dollars) matches number only if followed by ' dollars'`
`(?!abc)`	Negative lookahead: matches if NOT followed by abc	`\d+(?! dollars) matches '100' in '100 euros'; in '100 dollars' it still matches '10', because the engine backtracks one digit`
`(?<=abc)`	Positive lookbehind — matches if preceded by abc	`(?<=\$)\d+ matches digits preceded by '$'`
`(?<!abc)`	Negative lookbehind — matches if NOT preceded by abc	`(?<!\$)\d+ matches digits NOT preceded by '$'`
`(?=.*abc)`	Lookahead that allows characters before the target	`^(?=.*\d)(?=.*[a-z]).{8,}$ password with digit and lowercase`
`(?<=\b)\w+`	Lookbehind at word boundary	`(?<=\bpre)\w+ matches suffix after 'pre'`
`(?=(?:...)*$)`	Lookahead for repeated pattern to end of string	`^(?=(?:\d{3})*$)\d+ matches digit strings whose length is a multiple of 3`
`(?<!\\)"`	Match quote not preceded by backslash	`(?<!\\)" matches unescaped double quotes`
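
Two of the lookaround rows can be checked with a small sketch, assuming Python's `re` module. The password check writes each lookahead with `.*` so the required character may appear anywhere in the string. The test strings are invented.

```python
import re

# Both lookaheads start at position 0 and scan forward without consuming text.
has_digit_and_lower = re.compile(r"^(?=.*\d)(?=.*[a-z]).{8,}$")

ok = bool(has_digit_and_lower.match("passw0rdxyz"))
no_lowercase = bool(has_digit_and_lower.match("PASSWORD123"))
too_short = bool(has_digit_and_lower.match("abc1"))

# The negative lookbehind skips quotes that directly follow a backslash.
text = r'say \"hi\" to "bob"'
unescaped_positions = [m.start() for m in re.finditer(r'(?<!\\)"', text)]

print(ok, no_lowercase, too_short)  # True False False
print(unescaped_positions)          # [14, 18]
```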

Flags / Modifiers

Pattern	Description	Example
`i`	Case-insensitive matching	`/hello/i matches 'Hello', 'HELLO', 'hello'`
`g`	Global: find all matches (not just first)	`/\d+/g finds all numbers in a string`
`m`	Multiline: ^ and $ match start/end of each line	`/^foo/m matches 'foo' at start of any line`
`s`	Dotall: . matches newline characters too	`/a.b/s matches 'a\nb' with dot matching newline`
`u`	Unicode: treat pattern and string as Unicode	`/\p{Emoji}/u matches emoji characters`
`y`	Sticky: match only from lastIndex position	`/foo/y matches 'foo' only at current position`
`x`	Extended: ignore whitespace and allow comments	`/hello # greeting/x ignores space and comment`
`(?i)`	Inline flag for case-insensitive (embedded in pattern)	`(?i)hello matches case-insensitively from that point`
`(?m)`	Inline multiline flag	`(?m)^start matches 'start' at beginning of each line`
`(?s)`	Inline dotall flag	`(?s)begin.*end matches across newlines`
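
The `/pattern/flags` notation in the table is JavaScript-style. As a rough sketch of the same ideas in Python's `re` module: flags are passed as constants, and there is no `g` or `y` flag (`re.findall` returns all matches instead). The group names and sample strings are assumptions made for illustration.

```python
import re

# re.IGNORECASE plays the role of the i flag.
case_insensitive = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# re.MULTILINE (m): ^ matches at the start of every line.
line_starts = re.findall(r"^foo", "foo\nbar\nfoo", flags=re.MULTILINE)

# re.DOTALL (s): . also matches a newline.
without_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# re.VERBOSE (x): whitespace and # comments inside the pattern are ignored.
year_month = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    """,
    re.VERBOSE,
)

# An inline flag at the start of the pattern has the same effect as re.IGNORECASE.
inline = re.search(r"(?i)hello", "Say HELLO")

print(case_insensitive)                       # ['Hello', 'HELLO', 'hello']
print(line_starts)                            # ['foo', 'foo']
print(without_dotall)                         # None
print(year_month.search("2024-01").group("month"))  # 01
print(inline.group())                         # HELLO
```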

Common Patterns

Pattern	Description	Example
`^[\w.-]+@[\w.-]+\.\w{2,}$`	Basic email address validation	`[email protected], [email protected]`
`^https?:\/\/[\w\-.]+(?:\.[\w\-.]+)+[\/\w\-.?=%&#]*$`	URL validation (http and https)	`https://example.com/path?q=1&r=2`
`^(?:\d{1,3}\.){3}\d{1,3}$`	IPv4 address (basic)	`192.168.1.1, 10.0.0.255`
`^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`	MAC address	`00:1A:2B:3C:4D:5E`
`^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$`	Hex color code	`#FF5733 or #F57 or FF5733`
`^\+?[1-9]\d{1,14}$`	International phone number (E.164)	`+14155552671, +442071838750`
`^\d{4}-\d{2}-\d{2}$`	Date in YYYY-MM-DD format	`2024-01-31, 2000-12-25`
`^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$`	Time in HH:MM or HH:MM:SS (24-hour)	`14:30, 09:05:00, 23:59:59`
`^[a-zA-Z0-9_-]{3,16}$`	Username: 3-16 chars, letters, digits, underscore, hyphen	`john_doe, user-123, MyName`
`^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$`	Strong password: min 8 chars, upper, lower, digit, special	`P@ssw0rd!, Secure$123`
`^\d{5}(-\d{4})?$`	US ZIP code (5 digit or ZIP+4)	`90210, 10001-1234`
`^[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}([A-Z0-9]?){0,16}$`	IBAN bank account number	`GB29NWBK60161331926819`
`<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>`	Basic HTML tag with content (non-nested)	`<b>bold</b>, <span class="x">text</span>`
`^([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))$`	Strict date YYYY-MM-DD with range validation	`2024-01-31, 1999-12-25`
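
A few of these patterns can be wrapped as validators. The sketch below assumes Python's `re` module. The constant names and the `is_ipv4` helper are hypothetical, introduced only for this example. It also shows a limit of pattern-only validation: the IPv4 pattern checks the shape but not the 0 to 255 range, and the strict date pattern cannot know how many days a month has.

```python
import re

STRICT_DATE = re.compile(r"^([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))$")
TIME_24H = re.compile(r"^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$")
IPV4_SHAPE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def is_ipv4(value: str) -> bool:
    """Check the dotted shape with the regex, then the 0-255 range in code."""
    if not IPV4_SHAPE.fullmatch(value):
        return False
    return all(int(part) <= 255 for part in value.split("."))


print(bool(STRICT_DATE.fullmatch("2024-01-31")))  # True
print(bool(STRICT_DATE.fullmatch("2024-13-01")))  # False, month 13 is rejected
print(bool(STRICT_DATE.fullmatch("2024-02-31")))  # True, the pattern ignores month length
print(bool(TIME_24H.fullmatch("24:00")))          # False
print(is_ipv4("192.168.1.1"))                     # True
print(is_ipv4("999.1.1.1"))                       # False
```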

Frequently Asked Questions

Q: What is a regular expression (regex)?

A regular expression (regex) is a sequence of characters that defines a search pattern. Regex is used to search, match, and manipulate text in programming languages, text editors, and command-line tools. For example, the pattern \d{3}-\d{4} matches a phone number format such as "555-1234".

Q: How do I use a regex pattern in my code?

Most programming languages have built-in regex support. In JavaScript, write /pattern/flags and use methods such as .test(), .match(), or .replace(). In Python, import the re module and use re.search(), re.findall(), or re.sub(). In PHP, use preg_match(), preg_match_all(), or preg_replace(). Copy any pattern from this cheat sheet and paste it into these functions, adding whatever delimiters or string escaping the language requires.
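
As a minimal sketch of the Python functions named above, the same date pattern is used to find the first match, find all matches, and replace matches. The log string and variable names are invented for the example.

```python
import re

log = "2024-01-31 ERROR disk full; 2024-02-01 INFO ok; 2024-02-02 ERROR timeout"
date_pattern = r"\d{4}-\d{2}-\d{2}"

# re.search returns the first match object, or None if nothing matches.
first = re.search(date_pattern, log)

# re.findall returns every non-overlapping match as a list of strings.
all_dates = re.findall(date_pattern, log)

# re.sub replaces every match with the given replacement text.
masked = re.sub(date_pattern, "<date>", log)

print(first.group())  # 2024-01-31
print(all_dates)      # ['2024-01-31', '2024-02-01', '2024-02-02']
print(masked)         # <date> ERROR disk full; <date> INFO ok; <date> ERROR timeout
```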

Q: What is the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, {n,}) match as much text as possible. Lazy (non-greedy) quantifiers (*?, +?, {n,}?) match as little text as possible. For example, given the input "<b>bold</b> and <i>italic</i>", the greedy pattern <.+> matches the entire string from the first < to the last >, while the lazy pattern <.+?> matches only "<b>" and then each remaining tag individually.
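
The tag example can be reproduced directly. This sketch assumes Python's `re` module and an input of two simple HTML-style tag pairs.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backs up to the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first '>' it can reach, so each tag is separate.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```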

Q: What is the difference between lookahead and lookbehind?

A lookahead (?=...) asserts that what follows the current position matches a pattern, without including it in the match. A lookbehind (?<=...) asserts that what precedes the current position matches a pattern. The negative versions (?!...) and (?<!...) assert that the pattern does not match.
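
A small sketch of the lookaround forms on one invented string, assuming Python's `re` module. It also shows a common surprise: a negative lookbehind on `$` alone still lets a match begin in the middle of a number, so the last pattern excludes a preceding digit as well.

```python
import re

text = "cost: $40, weight: 15 kg, fee: 7 dollars"

# Positive lookbehind: digits directly after a '$'.
after_dollar_sign = re.findall(r"(?<=\$)\d+", text)

# Positive lookahead: digits directly before ' dollars'.
before_dollars_word = re.findall(r"\d+(?= dollars)", text)

# Negative lookbehind on '$' alone: the '0' of '$40' still qualifies,
# because the character before it is '4', not '$'.
naive_negative = re.findall(r"(?<!\$)\d+", text)

# Excluding a preceding digit too keeps whole numbers together.
careful_negative = re.findall(r"(?<![$\d])\d+", text)

print(after_dollar_sign)    # ['40']
print(before_dollars_word)  # ['7']
print(naive_negative)       # ['0', '15', '7']
print(careful_negative)     # ['15', '7']
```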

Q: What are the most common real-world uses of regex?

Regex is widely used to: validate input formats (email addresses, phone numbers, postal codes), find and replace text in code editors, parse log files and extract specific data, route URLs in web frameworks, validate forms on the client and server, and scrape data. The "Common Patterns" section of this cheat sheet provides ready-to-use examples for most of these scenarios.
