Regular Expressions

Metacharacters

  • {}[]()^$.|*+?\
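
To match any of these characters literally, precede it with a backslash. The sketch below illustrates this; the sample strings and variable names are made up for the example, and quotemeta() is Perl's built-in for escaping a whole string at once.

```perl
use strict;
use warnings;

my $price = 'Total: $4.50 (approx.)';

# An unescaped "." matches any character, so "4x50" matches too.
my $loose  = ('4x50' =~ /4.50/)  ? 1 : 0;
# Escaping the "." makes it match only a literal period.
my $strict = ('4x50' =~ /4\.50/) ? 1 : 0;

# quotemeta() backslash-escapes every non-word character in a string,
# which helps when the text to match comes from a variable.
my $literal = quotemeta('$4.50');
my $found   = ($price =~ /$literal/) ? 1 : 0;

print "loose=$loose strict=$strict found=$found\n";   # loose=1 strict=0 found=1
```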

Single characters

\t Tab
\n Newline
\r Carriage return
\s Whitespace [\ \t\r\n\f] (space, tab, carriage return, newline, formfeed)
\S Non-whitespace [^\s] (inverse of \s)
\w Any word character [a-zA-Z0-9_]
\W Any non-word character [^a-zA-Z0-9_] (inverse of \w)
\d Digit [0-9]
\D Any non-digit character [^0-9] (inverse of \d)
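. Matches any character except newline [^\n]
  • The backslash escapes above (\t, \n, \r, \s, \S, \w, \W, \d, \D) can be used both inside and outside of character classes; inside a character class, . matches only a literal period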
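
A minimal sketch of the shorthand escapes in use. The sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $line = "id_42\tready\n";

my @digits = $line =~ /\d/g;            # every single digit
my @words  = $line =~ /\w+/g;           # runs of word characters
my $has_ws = ($line =~ /\s/) ? 1 : 0;   # the tab counts as whitespace

# "." does not match a newline, so .+ stops at the end of the first line.
my ($first_line) = "top\nbottom" =~ /(.+)/;

# Inside brackets \d keeps its meaning, while "." is a literal period.
my $ok  = ('3.14' =~ /^[\d.]+$/) ? 1 : 0;
my $bad = ('3x14' =~ /^[\d.]+$/) ? 1 : 0;

print "@digits | @words | $has_ws | $first_line | $ok | $bad\n";
# 4 2 | id_42 ready | 1 | top | 1 | 0
```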

Character class

  • Specifies a set of possible characters, rather than just a single character
  • The set of possible characters is placed within brackets []
- Specifies a range of characters, unless it is the first or the last character in a character class
^ In the first position of a character class, this denotes a negated character class
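
The following sketch shows a range, a negated class, and a literal hyphen in first position. The word lists are made up for the example.

```perl
use strict;
use warnings;

# Ranges: one or more hexadecimal digits.
my @hex = grep { /^[0-9a-fA-F]+$/ } qw(ff 1A2b xyz 09);

# Negated class: strings that contain none of the listed vowels.
my @no_vowels = grep { /^[^aeiou]+$/ } qw(rhythm tree sky);

# "-" in first position is a literal hyphen, not a range.
my @signed = grep { /^[-+]?[0-9]+$/ } qw(-7 +12 3 4-5);

print "@hex\n";        # ff 1A2b 09
print "@no_vowels\n";  # rhythm sky
print "@signed\n";     # -7 +12 3
```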

Anchors

^ Match at the beginning of the string.
$ Match at the end of the string, or before a newline at the end of the string.
\b Match only at a word boundary.
\B Match only at a non-word boundary.
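/^beginning/    # Matches "beginning" only at the start of the string
/end$/          # Matches "end" only at the end of the string
/^entirety$/    # Matches only if the whole string is "entirety" (a trailing newline is allowed)
/anywhere/      # Matches "anywhere" at any position in the string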
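
A short sketch of the anchors applied to one sample string. The string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = "cat concat catalog\n";

my $starts = ($text =~ /^cat/)     ? 1 : 0;   # "cat" at the very beginning
my $ends   = ($text =~ /catalog$/) ? 1 : 0;   # $ matches before the final newline

# \b on both sides: "cat" as a whole word only.
my @whole = $text =~ /\bcat\b/g;

# \B before, \b after: "cat" at the end of a longer word ("concat").
my @tail = $text =~ /\Bcat\b/g;

print "starts=$starts ends=$ends whole=", scalar(@whole), " tail=", scalar(@tail), "\n";
# starts=1 ends=1 whole=1 tail=1
```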

Quantifiers

? Match 0 or 1 times
* Match 0 or more times (any number of times)
+ Match 1 or more times (at least once)
{n} Match exactly n times
{n,} Match n or more times (at least n times)
{n,m} Match at least n times, but not more than m times (between n and m matches, inclusive)
  • Placed immediately after the relevant character, character class, or grouping
  • By default, each of the above matches as much of the string as possible (greedy closure)
  • For non-greedy behavior (reluctant closure), follow the quantifier with a question mark: ??, *?, +?, {n,m}?
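
The difference between greedy and non-greedy matching is easiest to see side by side. This is a minimal sketch; the sample markup and word lists are invented for the example.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ runs to the last ">" in the string.
my ($greedy) = $html =~ /(<.+>)/;

# Non-greedy: .+? stops at the first ">" it can.
my ($lazy) = $html =~ /(<.+?>)/;

# {n,m}: between two and four digits, nothing else.
my @short_numbers = grep { /^\d{2,4}$/ } qw(7 42 2024 12345);

# ?: the "u" is optional.
my @spellings = grep { /^colou?r$/ } qw(color colour colouur);

print "$greedy\n";          # <b>bold</b> and <i>italic</i>
print "$lazy\n";            # <b>
print "@short_numbers\n";   # 42 2024
print "@spellings\n";       # color colour
```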

Alternation

| Separates alternative patterns; the match succeeds if any one of them matches
/one|two/    # Matches either "one" or "two"
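
Alternation binds more loosely than anchors and ordinary characters, so an anchor applies only to the alternative it sits in unless the alternatives are grouped. A small sketch with a made-up word list:

```perl
use strict;
use warnings;

my @words = qw(one onerous two twofold);

# Parsed as (^one) or (two$): starts with "one" OR ends with "two".
my @loose = grep { /^one|two$/ } @words;

# Parentheses make both anchors apply to each alternative.
my @exact = grep { /^(one|two)$/ } @words;

print "@loose\n";   # one onerous two
print "@exact\n";   # one two
```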

Grouping

() Allows part of a regular expression to be treated as a single unit
  • For each grouping, the text matched inside it goes into the special variables $1, $2, ..., numbered by opening parenthesis from left to right
  • In list context, a successful match with groupings returns the list of captured values: ($1, $2, ...)
/(pre|suf)fix/    # Matches "prefix" or "suffix"

($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
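
A runnable sketch of both ways to read captured groups, assuming a sample value for $time. It also shows that a failed match in list context returns an empty list, so the result should be checked before use.

```perl
use strict;
use warnings;

my $time = '09:41:07';   # sample value, assumed for the example

# List context: the captures are returned directly.
my ($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/;

# Scalar (boolean) context: read the captures from $1, $2, $3 afterwards.
my $from_vars = '';
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    $from_vars = join('-', $1, $2, $3);
}

# A failed match in list context yields an empty list.
my @none  = 'noon' =~ /(\d\d):(\d\d)/;
my $count = scalar @none;

print "$hours $minutes $seconds\n";   # 09 41 07
print "$from_vars\n";                 # 09-41-07
print "$count\n";                     # 0
```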
