Pattern Matching
Matching variables
$1, $2, ...
- Used outside of a regex
- When grouping metacharacters () are used, the parts of a string that matched are saved in matching variables
# Replace abcdef with cd
s/ab(cd)ef/$1/
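As a minimal sketch of how matching variables are filled in, the following assumes a hypothetical date string (the variable names and sample value are illustrative only). It reads $1, $2 and $3 after a successful match and then reuses them in the replacement part of a substitution.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen for illustration.
my $date = "2024-03-15";

my ($year, $month, $day);
if ($date =~ /(\d{4})-(\d\d)-(\d\d)/) {
    # $1, $2, $3 hold the text captured by each () group, left to right.
    ($year, $month, $day) = ($1, $2, $3);
    print "year=$year month=$month day=$day\n";
}

# The same variables are available in the replacement part of s///.
(my $swapped = $date) =~ s/(\d{4})-(\d\d)-(\d\d)/$3.$2.$1/;
print "$swapped\n";
```

Checking the result of the match before reading $1 avoids picking up stale values left over from an earlier successful match.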
Backreferences
\1, \2, ...
- Used inside a regex
- Backreferences are essentially matching variables that can be used inside a regex
# Replace abcabc with xyz
s/(abc)\1/xyz/
# Find a 3-letter word that is immediately repeated, separated by one whitespace character ("the the")
/\b(\w\w\w)\s\1\b/
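The doubled-word idea can be sketched for words of any length. The sample text and variable names below are hypothetical; the pattern uses \1 inside the regex and $1 in the replacement.

```perl
use strict;
use warnings;

# Hypothetical sample text containing two doubled words.
my $text = "Paris in the the spring and and summer";

# \b keeps the match on whole words; \1 must repeat exactly what (\w+) captured.
my @doubled = $text =~ /\b(\w+)\s+\1\b/g;
print "doubled: @doubled\n";

# Collapse each doubled word to a single copy: \1 inside the regex, $1 in the replacement.
(my $fixed = $text) =~ s/\b(\w+)\s+\1\b/$1/g;
print "$fixed\n";
```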
Matching operator
- Find a matching string
//, m//
/reg-expr/
- Operates on
$_by default - Can be bound to another variable using
=~ - Returns a true value if the pattern matched; otherwise, it returns false
- Can use other delimiters when the "m"
prefix is used, such as
m||,m!!andm{} - If single quotes are
used,
m'', then the regex is treated as a single-quoted string (no substitutions are made) - In scalar context, a match /regex/ will
return the value "
1" upon a successful match, otherwise it will return the value ""
- In list context, a match /regex/ with no groupings () returns the list (1) on success; with the /g modifier it returns a list of every matched substring
- A match /regex/ with groupings () implicitly sets the matching variables $1, $2, .... In list context, it also returns the list ($1, $2, ...), which can be assigned explicitly to the stated list
@-, @+
- $-[0]: position of the start of the entire match
- $+[0]: position of the end
- $-[n]: position of the start of the $n match
- $+[n]: position of the end
- If $n is undefined, so are $-[n] and $+[n]
$str =~ /orange/; # Search for "orange" in $str
$str = 'red';
m'$str'; # Matches '$str', not 'red'
# extract hours, minutes, seconds
($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
# extract hours, minutes, seconds
$time =~ /(\d\d):(\d\d):(\d\d)/; # Match hh:mm:ss format
$hours = $1;
$minutes = $2;
$seconds = $3;
# extract minutes
($minutes) = ($time =~ /\d\d:(\d\d):\d\d/);
# variables are interpolated into the regex
$foo = 'house';
'cathouse' =~ /cat$foo/; # matches
'housecat' =~ /${foo}cat/; # matches
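The context rules and the @- and @+ arrays can be seen side by side in a short sketch. The sample string and variable names are hypothetical. The offsets are copied right after the match they describe, because the next successful match overwrites them.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $time = "start 12:34:56 end";

# Scalar context: 1 on success, "" on failure.
my $ok  = ($time =~ /\d\d:\d\d/);
my $bad = ($time =~ /xyz/);
print "ok=<$ok> bad=<$bad>\n";

# List context with groupings: the captured substrings are returned.
my ($hours, $minutes, $seconds) = $time =~ /(\d\d):(\d\d):(\d\d)/;
print "$hours h, $minutes m, $seconds s\n";

# @- and @+ describe that last successful match; copy them before matching again.
my ($match_start, $match_end) = ($-[0], $+[0]);
my ($min_start,   $min_end)   = ($-[2], $+[2]);
print "whole match at $match_start..$match_end, minutes at $min_start..$min_end\n";

# List context, no groupings: (1) without /g, every match with /g.
my @plain = $time =~ /\d\d/;
my @all   = $time =~ /\d\d/g;
print "plain=(@plain) all=(@all)\n";
```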
Substitution operator
- Search and replace
s///, s|||
s/reg-expr/replacement-string/modifiers
- If there is a match, s/// returns the number of substitutions made; otherwise it returns false
- Can use other delimiters, such as s!!! and s{}{}, and even s{}//
- If single quotes are used, s''', then the regex and replacement are treated as single-quoted strings
s/apples/oranges/; # Replace the first occurrence of "apples" with "oranges"
s/cats/dogs/g; # Replace all occurrences of "cats" with "dogs"
$y = "'quoted words'";
$y =~ s/^'(.*)'$/$1/; # strip single quotes; $y contains "quoted words"
# reverse all the words in a string
$x = "the cat in the hat";
$x =~ s/(\w+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
# convert percentage to decimal
$x = "A 39% hit rate";
$x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"
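The return value of s/// is easy to overlook, so here is a small sketch of it, together with one more /e replacement. The strings and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = "cats chase cats; dogs chase cats";

# With /g, s/// returns how many replacements were made.
my $count = ($text =~ s/cats/dogs/g);
print "$count replacements: $text\n";

# When nothing matches, the return value is false (an empty string).
my $none = ($text =~ s/birds/fish/);
print "no match\n" unless $none;

# /e evaluates the replacement as Perl code; here each number is doubled.
my $prices = "apples 3, pears 12";
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;
print "$doubled\n";
```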
Command line
- With the -p switch, perl processes its input line by line (like sed); -i edits the file in place
perl -pi -e "s|apple|orange|gis;" "$file" # apple ==> orange
perl -pi -e "s|([^q])q([^q])|\$1 \$2|gs;" "$file" # Replace isolated q's with a space (aqb ==> a b)
# Backreferences (substitution variables)
perl -pi -e "s|a(p)\1le|orange|gs;" "$file"
# Matching variables
perl -pi -e "s|a(p)ple|\1|gs;" "$file" # \1 works in the replacement, but $1 is preferred
perl -pi -e "s|a(p)ple|\$1|gs;" "$file" # \$ keeps the shell from expanding $1
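The line-by-line behavior can be approximated in an ordinary script. The following is a rough sketch of the loop that -p wraps around the -e code. It reads from a hypothetical in-memory string instead of a real file and collects the output instead of printing each line as it goes.

```perl
use strict;
use warnings;

# Hypothetical input, read from an in-memory filehandle instead of a real file.
my $input = "apple pie\nApple tart\nplum jam\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my $output = '';
while (<$in>) {          # -p reads one line at a time into $_
    s|apple|orange|gi;   # the -e code runs once per line
    $output .= $_;       # -p prints $_ after each iteration; collected here instead
}
close $in;
print $output;
```

Because each line is handled separately, a pattern run this way cannot match across a line boundary.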
Split operator
split //, STRING
split /reg-expr/, STRING
- The regex determines the character sequence that the string is split with respect to
- If the empty regex, //, is used, the string is split into individual characters
- If the regex has groupings, then the list produced contains the matched substrings from the groupings as well
@arr = split /\s+/, "zero one two"; # Whitespace: "zero", "one", "two"
@arr = split /,\s*/, "aa,bb, cc"; # Comma-delimited: "aa", "bb", "cc"
@arr = split //, "xyz"; # Individual characters: "x", "y", "z"
@arr = split /(:)/, "10:20"; # Groupings: "10", ":", "20"
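The three split behaviors can be combined on one record. This is a small sketch; the record format and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical record, chosen for illustration.
my $record = "name=alice;age=30;city=paris";

# Split on the delimiter only: the separators are discarded.
my @fields = split /;/, $record;
print scalar(@fields), " fields: @fields\n";

# Grouping in the regex: the captured separator is kept in the result list.
my @parts = split /(=)/, $fields[0];
print join('|', @parts), "\n";

# Empty regex: one element per character.
my @chars = split //, "xyz";
print join(',', @chars), "\n";
```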
Modifiers
Default behavior
- . matches any character except \n
- ^ matches only at the beginning of the string
- $ matches only at the end or before a newline at the end
/s
- Treats the string as a single long line
- . matches any character, including \n
- ^ matches only at the beginning of the string
- $ matches only at the end or before a newline at the end
/m
- Treats the string as a set of multiple lines
- . matches any character except \n
- ^ and $ can match at the start or end of any line within the string
/sm
- Treats the string as a single long line, but detects multiple lines
- . matches any character, including \n
- ^ and $ can match at the start or end of any line within the string
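The differences between the default, /s, /m and /sm are easiest to see on one two-line string. The following is a minimal sketch with a hypothetical sample string; each variable is set to 1 if the pattern matched and 0 if it did not.

```perl
use strict;
use warnings;

# Hypothetical two-line string.
my $text = "alpha one\nbeta two\n";

# Default: . stops at \n and ^ anchors only at the start of the string.
my $default_dot   = ($text =~ /alpha.*beta/) ? 1 : 0;
my $default_caret = ($text =~ /^beta/)       ? 1 : 0;
# Default $ still matches before the newline that ends the string.
my $default_end   = ($text =~ /two$/)        ? 1 : 0;

# /s: . also matches \n, so one pattern can span both lines.
my $s_dot = ($text =~ /alpha.*beta/s) ? 1 : 0;

# /m: ^ and $ anchor at the start and end of every line.
my $m_caret    = ($text =~ /^beta/m) ? 1 : 0;
my @last_words = $text =~ /(\w+)$/mg;

# /sm: both behaviors together.
my $sm = ($text =~ /^alpha.*^beta/sm) ? 1 : 0;

print "default: dot=$default_dot caret=$default_caret end=$default_end\n";
print "/s: dot=$s_dot\n";
print "/m: caret=$m_caret last words=@last_words\n";
print "/sm: $sm\n";
```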
/g
- Global (applies to all occurrences of the search pattern)
/i
- Case insensitive
/o
- Performs variable substitutions in the regex only once (useful in loops)
/x
- Allows extended regular expressions (improves readability by allowing whitespace and comments to be used)
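As an illustration of /x, the sketch below writes the same hh:mm:ss pattern twice, once compactly and once spread over several lines with comments. The sample value and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical time string.
my $time = "09:41:07";

# Compact form.
my @compact = $time =~ /^(\d\d):(\d\d):(\d\d)$/;

# Same pattern with /x: whitespace in the regex is ignored and # starts a comment.
my @verbose = $time =~ /
    ^
    (\d\d)   # hours
    :
    (\d\d)   # minutes
    :
    (\d\d)   # seconds
    $
/x;

print "compact: @compact\n";
print "verbose: @verbose\n";
```

Under /x, a literal space in the pattern has to be written explicitly, for example as \s or as an escaped space.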
Modifiers specific to matching
/c
- The search position on a failed match is not reset when /g is in effect
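The effect of /c can be observed with the built-in pos function, which reports the current search position of a string. This is a minimal sketch with a hypothetical sample string and variable names.

```perl
use strict;
use warnings;

my $str = "aaa bbb";

# Plain /g in scalar context: a failed match resets the search position.
$str =~ /a+/g;                         # succeeds, search position is now 3
my $after_success = pos($str);
$str =~ /x/g;                          # fails, search position is reset
my $after_failure = defined pos($str) ? pos($str) : 'undef';

# /gc: a failed match leaves the search position where it was.
$str =~ /a+/gc;                        # succeeds again from the start
$str =~ /x/gc;                         # fails, but the position is kept
my $kept = pos($str);

# The next /g match in scalar context continues from the kept position.
my $next = ($str =~ /(\w+)/g) ? $1 : undef;

print "after success: $after_success\n";
print "after failed /g: $after_failure\n";
print "after failed /gc: $kept\n";
print "next word: $next\n";
```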
Modifiers specific to substitution
/e
- Evaluates the right side as an expression