Cheat Sheet – Regular Expression

A quick reference guide for regular expressions, including symbols, ranges, grouping, assertions and some sample patterns to get you started.

Metacharacters Defined

Metacharacter Definition
^ Start of a string.
$ End of a string.
. Any character (except \n newline)
| Alternation.
{…} Explicit quantifier notation.
[…] Explicit set of characters to match.
(…) Logical grouping of part of an expression.
* 0 or more of previous expression.
+ 1 or more of previous expression.
? 0 or 1 of previous expression; also forces minimal matching when an expression might match several strings within a search string.
\ Placed before one of the above, makes it a literal instead of a special character. Placed before certain other characters, forms a special escape sequence or character class (see Character Escapes and Character Classes below).
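The two roles of ? and the escaping role of \ are easiest to see side by side. The snippet below is a minimal sketch using Python's standard re module, which is an assumption of this example (the definitions above follow the .NET documentation); the sample strings are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as possible, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group()

# Lazy: a ? after the quantifier makes * match as little as possible.
lazy = re.search(r"<.*?>", text).group()

# Escaping: \. matches a literal dot, while a bare . matches any character.
literal_dot = re.findall(r"a\.c", "a.c abc")
any_char = re.findall(r"a.c", "a.c abc")

print(greedy)       # <b>bold</b> and <i>italic</i>
print(lazy)         # <b>
print(literal_dot)  # ['a.c']
print(any_char)     # ['a.c', 'abc']
```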

Metacharacter Examples

Pattern Sample Matches
^abc abc, abcdefg, abc123, …
abc$ abc, endsinabc, 123abc, …
a.c abc, aac, acc, adc, aec, …
bill|ted ted, bill
ab{2}c abbc
a[bB]c abc, aBc
(abc){2} abcabc
ab*c ac, abc, abbc, abbbc, …
ab+c abc, abbc, abbbc, …
ab?c ac, abc
a\sc a c
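The rows above can be checked mechanically. The following is a rough sketch in Python's re module (an assumption of this example, not the engine the table was written for); the helper name check and the non-matching strings are hypothetical additions made for illustration.

```python
import re

# (pattern, strings expected to match, strings expected NOT to match)
samples = [
    (r"^abc",     ["abc", "abcdefg", "abc123"],   ["xabc"]),
    (r"abc$",     ["abc", "endsinabc", "123abc"], ["abcx"]),
    (r"a.c",      ["abc", "aac", "acc"],          ["ac"]),
    (r"bill|ted", ["ted", "bill"],                ["bob"]),
    (r"ab{2}c",   ["abbc"],                       ["abc", "abbbc"]),
    (r"a[bB]c",   ["abc", "aBc"],                 ["adc"]),
    (r"(abc){2}", ["abcabc"],                     ["abc"]),
    (r"ab*c",     ["ac", "abc", "abbbc"],         ["adc"]),
    (r"ab+c",     ["abc", "abbc"],                ["ac"]),
    (r"ab?c",     ["ac", "abc"],                  ["abbc"]),
    (r"a\sc",     ["a c"],                        ["abc"]),
]

def check(rows):
    """Return the (pattern, text) pairs that do not behave as listed."""
    failures = []
    for pattern, should_match, should_not_match in rows:
        for text in should_match:
            if re.search(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text))
    return failures

print(check(samples))  # [] when every row behaves as the table says
```

re.search is used rather than re.match so that unanchored patterns may match anywhere in the string, which is how the table reads.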

Character Escapes (http://tinyurl.com/5wm3wl)

Escaped Char Description
ordinary characters Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves.
\a Matches a bell (alarm) \u0007.
\b Matches a backspace \u0008 when inside a [] character class; otherwise matches a word boundary (the position between a \w and a \W character).
\t Matches a tab \u0009.
\r Matches a carriage return \u000D.
\v Matches a vertical tab \u000B.
\f Matches a form feed \u000C.
\n Matches a new line \u000A.
\e Matches an escape \u001B.
\040 Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character \040 represents a space.
\x20 Matches an ASCII character using hexadecimal representation (exactly two digits).
\cC Matches an ASCII control character; for example \cC is control-C.
\u0020 Matches a Unicode character using a hexadecimal representation (exactly four digits).
\* When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.
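A few of these escapes can be tried out directly. This is a minimal sketch in Python's re module, chosen here as an assumption; it covers only escapes that Python shares with the table above. Python's re does not accept \e or \cC, so those are left out.

```python
import re

# Hex, Unicode and octal escapes all denote the same space character here.
space_forms = [bool(re.fullmatch(p, " ")) for p in (r"\x20", r"\u0020", r"\040")]

# \t matches a tab character.
tab_positions = [m.start() for m in re.finditer(r"\t", "a\tb\tc")]

# Outside a character class, \b is a zero-width word boundary.
words = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a character class, \b is a backspace (\u0008).
backspaces = re.findall(r"[\b]", "x\x08y")

# A backslash before a metacharacter makes it literal: \* is the same as \x2A.
stars = re.findall(r"\*", "2*3*4")
stars_hex = re.findall(r"\x2A", "2*3*4")

print(space_forms)    # [True, True, True]
print(tab_positions)  # [1, 3]
print(words)          # ['cat', 'cat']
print(backspaces)     # ['\x08']
print(stars)          # ['*', '*']
```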

Character Classes (http://tinyurl.com/5ck4ll)

Char Class Description
. Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options.
[aeiou] Matches any single character included in the specified set of characters.
[^aeiou] Matches any single character not in the specified set of characters.
[0-9a-fA-F] Use of a hyphen (-) allows specification of contiguous character ranges.
\p{name} Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.
\P{name} Matches text not included in groups and block ranges specified in {name}.
\w Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
\W Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9].
\s Matches any white-space character. Equivalent to the escape sequences and Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v].
\S Matches any non-white-space character. Equivalent to the escape sequences and Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v].
\d Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior.
\D Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior.
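The classes above can be illustrated with a short sketch in Python's re module (an assumption of this example; the table itself describes .NET behavior). Python's standard re does not support \p{name}, so that row is not shown. Treating re.ASCII as a loose counterpart of the ECMAScript option, and re.DOTALL as a counterpart of Singleline, is also an assumption made for illustration.

```python
import re

text = "3 apples,\t12 kiwis"

# Sets, negated sets and ranges.
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")       # ['r', 'g', 'x']
hex_runs = re.findall(r"[0-9a-fA-F]+", "A1f9 xyz")  # ['A1f9']

# Shorthand classes and their negations.
digits = re.findall(r"\d+", text)   # ['3', '12']
words = re.findall(r"\w+", text)    # ['3', 'apples', '12', 'kiwis']
spaces = re.findall(r"\s", text)    # [' ', '\t', ' ']
chunks = re.findall(r"\S+", text)   # ['3', 'apples,', '12', 'kiwis']

# Unicode-aware \d also matches non-ASCII digits such as U+0663;
# re.ASCII restricts \d to [0-9].
mixed = "\u0663" + "5"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, re.ASCII)

# . skips \n by default; re.DOTALL lets it match any character.
no_newline = re.findall(r"a.c", "a\nc")
with_newline = re.findall(r"a.c", "a\nc", re.DOTALL)

print(len(unicode_digits), ascii_digits)    # 2 ['5']
print(len(no_newline), len(with_newline))   # 0 1
```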

http://regexlib.com/CheatSheet.aspx

http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

4 thoughts on “Cheat Sheet – Regular Expression”

  1. Learning regular expressions does not take deep understanding so much as memorization; they are very easy to forget, though, so a cheat sheet is needed for reference. Regular expression patterns are usually written once and stored for reuse; sitting down to work them out again from scratch takes quite a while.

  2. Although there may be a few differences, the standard regular expression syntax is the same in every programming language. So once you understand regular expressions, you can apply them in any programming language.
