Regular Expressions
Metacharacters
Single characters
\t Tab
\n Newline
\r Carriage return
\s Whitespace [\ \t\r\n\f] (space, tab, carriage return, newline, formfeed)
\S Non-whitespace [^\s] (inverse of \s)
\w Any word character [a-zA-Z0-9_]
\W Any non-word character [^a-zA-Z0-9_] (inverse of \w)
\d Digit [0-9]
\D Any non-digit character [^0-9] (inverse of \d)
. Matches any character but newline [^\n]
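- The backslash escapes above can be used both inside and outside of character classes
- Inside a character class, . loses its special meaning and matches a literal period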
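As a rough illustration of the single-character escapes above, the following sketch applies a few of them to made-up strings. The sample values and variable names are assumptions chosen for the example, not part of the original reference.

```perl
use strict;
use warnings;

# Hypothetical sample line: a name, a tab, then a numeric ID.
my $line = "alice\t42";

# \w+ matches the word characters, \s the tab, \d+ the digits.
my $has_fields = $line =~ /\w+\s\d+/ ? 1 : 0;

# \D matches any non-digit, so this strips everything except digits.
(my $digits_only = $line) =~ s/\D//g;

# . matches any character except newline, so .+ stops at "\n".
my ($first_line) = "one\ntwo" =~ /(.+)/;

print "$has_fields $digits_only $first_line\n";   # prints "1 42 one"
```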
Character class
[] Specifies a set of possible characters, rather than just a single character
- The set of possible characters is placed within the brackets
- Within a character class, a hyphen (-) specifies a range of characters, unless it is the first or the last character in the class
^ In the first position of a character class, this denotes a negated character class.
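The character class rules above (ranges, negation, and a literal hyphen) might be exercised as in this minimal sketch. The test strings are invented for illustration.

```perl
use strict;
use warnings;

# [0-9a-fA-F] combines three ranges: one hexadecimal digit.
my $is_hex = "3fA9" =~ /^[0-9a-fA-F]+$/ ? 1 : 0;

# A leading ^ negates the class: [^aeiou] is any character
# except a lowercase vowel, so deleting those leaves only vowels.
(my $vowels_only = "regex") =~ s/[^aeiou]//g;

# A hyphen in the last position is a literal hyphen, not a range.
my $has_sign = "a-b" =~ /[+-]/ ? 1 : 0;

print "$is_hex $vowels_only $has_sign\n";   # prints "1 ee 1"
```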
Anchors
^ Match at the beginning of the string.
$ Match at the end of the string, or before a newline at the end of the string.
\b Match only at a word boundary.
\B Match only at a non-word boundary.
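/^beginning/ # Matches "beginning" only at the start of the string
/end$/ # Matches "end" only at the end of the string
/^entirety$/ # Matches only if the string is exactly "entirety" (a trailing newline is allowed)
/anywhere/ # Matches "anywhere" at any position in the string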
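To see how the anchors behave on actual input, here is a small sketch under assumed sample strings. It also shows that $ matches before a trailing newline and that \b and \B are opposites.

```perl
use strict;
use warnings;

my $text = "the cat sat\n";   # hypothetical input with a trailing newline

my $starts = $text =~ /^the/    ? 1 : 0;  # ^ anchors at the start of the string
my $ends   = $text =~ /sat$/    ? 1 : 0;  # $ also matches before a final newline
my $word   = $text =~ /\bcat\b/ ? 1 : 0;  # "cat" as a whole word

# Inside "concatenate" there is no word boundary on either side of "cat".
my $inside = "concatenate" =~ /\bcat\b/ ? 1 : 0;
my $nonbnd = "concatenate" =~ /\Bcat\B/ ? 1 : 0;

print "$starts $ends $word $inside $nonbnd\n";   # prints "1 1 1 0 1"
```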
Quantifiers
? Match 0 or 1 times
* Match 0 or more times (any number of times)
+ Match 1 or more times (at least once)
{n} Match exactly n times
{n,} Match n or more times (at least n times)
{n,m} Match at least n times, but not more than m times (between n and m matches, inclusive)
- Placed immediately after the relevant character, character class, or grouping
- By default, each of the above will match as large an amount of the string as possible (greedy closure)
- For non-greedy behavior (reluctant closure), follow each of the above by a question mark
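The difference between greedy and non-greedy matching is easiest to see side by side. The sketch below uses an invented HTML-like string and also demonstrates a bounded {n,m} quantifier.

```perl
use strict;
use warnings;

my $html = "<b>bold</b>";   # hypothetical sample string

# Greedy: .+ takes as much as possible, so the match runs to the last ">".
my ($greedy) = $html =~ /(<.+>)/;

# Non-greedy: .+? stops at the first ">" that lets the match succeed.
my ($lazy) = $html =~ /(<.+?>)/;

# {2,4}: between 2 and 4 digits; the anchors make the count
# apply to the whole string rather than to a substring.
my $ok       = "123"   =~ /^\d{2,4}$/ ? 1 : 0;
my $too_long = "12345" =~ /^\d{2,4}$/ ? 1 : 0;

print "$greedy $lazy $ok $too_long\n";   # prints "<b>bold</b> <b> 1 0"
```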
Alternation
| Specifies a set of possible strings
/one|two/ # Matches either "one" or "two"
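One point worth illustrating is that alternation binds more loosely than anchors, so an anchored pattern usually needs parentheses around the alternatives. The following sketch, with a made-up word list, shows the difference.

```perl
use strict;
use warnings;

# Hypothetical word list, used only for illustration.
my @words = ("cat", "dog", "bird", "hotdog");

# Unanchored, /cat|dog/ matches either string anywhere in the word.
my @hits = grep { /cat|dog/ } @words;

# Without parentheses, /^cat|dog$/ means
# "starts with cat" OR "ends with dog".
my $loose = "catalog" =~ /^cat|dog$/ ? 1 : 0;

# Parentheses limit the alternation, so both anchors
# apply to each alternative.
my $grouped = "catalog" =~ /^(cat|dog)$/ ? 1 : 0;

print "@hits | $loose $grouped\n";   # prints "cat dog hotdog | 1 0"
```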
Grouping
() Allows part of a regular expression to be treated as a single unit
- For each grouping, the part that matched inside goes into the special variables $1, $2, ...
- If the result of a match is assigned to a list, a match with groupings will return a list of matched values: ($1, $2, ...)
/(pre|suf)fix/ # Matches "prefix" or "suffix"
($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
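As a fuller sketch of the capture behavior described above, the example below checks whether the match succeeded before using the captured values. The sample time string is an assumption for illustration.

```perl
use strict;
use warnings;

my $time = "09:41:07";   # hypothetical sample value

# In list context a successful match returns the captured groups;
# a failed match returns an empty list, so the if-test doubles
# as validation of the input format.
my ($hours, $minutes, $seconds);
if (($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/) {
    print "$hours h $minutes m $seconds s\n";   # prints "09 h 41 m 07 s"
}

# The same captures are available as $1, $2, $3 after a successful match.
my $joined = "";
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    $joined = "$1-$2-$3";
    print "$joined\n";   # prints "09-41-07"
}
```

Testing the match result first matters because $1, $2, ... keep their values from the last successful match, so reading them after a failed match can silently return stale data.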