sed

[<command> | ] sed [-e "<sed-script>"] [-f "<script-file>"] [input-file]* [ | <command>] 
  • sed is a line-by-line editor
  • Reads the current line from the input stream, removes the trailing newline, places it in the pattern space, and then runs the commands
  • Can concatenate consecutive lines in the pattern space (the lines are separated by \n). For Unix-formatted files, blank lines in the pattern space match the following regular expressions:
- A single blank line: '^$'
- Two consecutive blank lines (concatenated): '^\n$'
- Three consecutive blank lines (concatenated): '^\n\n$'
  • When the commands are finished, unless the -n option is used, the pattern space is printed out to the output stream, with the trailing newline restored
  • By default, the pattern space is cleared at the end of each cycle, before the next line is read
  • The hold space retains its data between cycles
  • By default, if the -e or -f option is not used, the first non-option argument is run as a sed script
  • Typically, piping is used for input and output
  • A trailing newline character (Unix) at the end of the last (non-empty) line of a file does not start a new line; it simply terminates the last line of the file
  • Pressing <enter> at the end of the last (non-empty) line in a text editor does not add a trailing blank line; the last line is still the same line, only now it ends with a newline character (Unix). Editors that allow the cursor to be placed below the last line may make it look as if a blank line was added
  • Using '...' instead of "..." prevents the Unix shell from expanding $ and `...`
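
The points above can be sketched with a made-up two-line input. This is only an illustration: the variable names are hypothetical, and any POSIX-style sed should behave the same way here.

```shell
# Default cycle: read a line, run the script, print the pattern space.
plain=$(printf 'one\ntwo\n' | sed 's/o/0/')

# With -n, nothing is printed unless a command such as p requests it.
quiet=$(printf 'one\ntwo\n' | sed -n 's/o/0/')

# N appends the next line, so the pattern space becomes "one\ntwo"
# and the script can match the embedded newline.
joined=$(printf 'one\ntwo\n' | sed 'N; s/\n/+/')

# Single quotes keep the shell from expanding $p; sed sees the address $.
last=$(printf 'one\ntwo\n' | sed -n '$p')

printf '%s\n' "$plain" "[$quiet]" "$joined" "$last"
```

Written as "$p" instead, the shell would substitute the value of a variable named p before sed ever ran.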

Terms

Pattern space
  • The current line (or concatenated set of lines), minus the trailing newline
Hold space
  • A persistent buffer
Cycle
  • Advancing to the next line of the file (which initially is the first line) and applying the script from the beginning

Syntax

command             Run the command on the pattern space
command;command; Run multiple commands on the pattern space

ADDRESS command Run the command if the pattern space matches ADDRESS (the separating space is usually optional)
ADDRESS!command Run the command if the pattern space does not match ADDRESS

ADDRESS {commands} Run the commands if the pattern space matches ADDRESS (the separating space is usually optional)
ADDRESS!{commands} Run the commands if the pattern space does not match ADDRESS

ADDRESS {commands};ADDRESS {commands} Run multiple conditional commands on the pattern space
  • The ADDRESS is essentially an if statement, used to determine if the command should be applied to the pattern space:
/one/ command   # if (/one/) command
/one/!command # if (!/one/) command
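
A small sketch of the "if statement" reading of addresses, using a made-up three-line input (the variable names are hypothetical):

```shell
# if (/t/) append " !" to the line
matched=$(printf 'one\ntwo\nthree\n' | sed '/t/ s/$/ !/')

# if (!/t/) append " !" to the line
unmatched=$(printf 'one\ntwo\nthree\n' | sed '/t/!s/$/ !/')

# if (/t/) { substitute; print } with automatic printing turned off
grouped=$(printf 'one\ntwo\nthree\n' | sed -n '/t/ { s/t/T/; p; }')

printf '%s\n' "$matched" "--" "$unmatched" "--" "$grouped"
```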
Regular expressions
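/regex/     Address regular expression (selects lines that match regex)
^ Start of the pattern space (normally the start of the line)
$ End of the pattern space (normally the end of the line)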

/one/ Lines that contain "one"
/^$/ Empty line
/./ Non-empty line

\%regex% Use different delimiter
\|regex| Use different delimiter

/regex/I Case insensitive
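  • Backreference (in the search) and matched group (in the replacement), respectively: s|(ab)\1|\1| (as written, requires -r)
  • The interval {n} may not be supported by every implementation (in basic regular expressions it is written \{n\})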
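
The address forms above, tried on made-up input. This sketch assumes GNU sed, since the I modifier and -r are GNU extensions; the variable names are hypothetical.

```shell
# \%...% as delimiter avoids escaping the slashes in a path-like regex.
slashes=$(printf '/usr/bin\n/etc\n' | sed -n '\%^/usr/%p')

# I makes the address case insensitive (GNU extension).
nocase=$(printf 'One\ntwo\n' | sed -n '/one/Ip')

# /./ selects non-empty lines.
nonblank=$(printf 'a\n\nb\n' | sed -n '/./p')

# \1 in the search is a backreference: the line must contain "abab".
doubled=$(printf 'abab\nabba\n' | sed -n '/\(ab\)\1/p')

# \1 in the replacement inserts the text captured by the first group.
collapsed=$(printf 'abab\n' | sed -r 's|(ab)\1|\1|')

printf '%s\n' "$slashes" "$nocase" "$nonblank" "$doubled" "$collapsed"
```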
Line number
n   Line number
1 Line 1
$ Last line
Line range
START,END     First line that matches START, to first subsequent line that matches END (inclusive)

1,5 Lines 1 through 5
5,$ Line 5 through last line

/one/,/two/ First line that contains "one" to first line that contains "two"
/one/,15 First line that contains "one" to line 15
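
The range forms can be checked on a short made-up input; -n with p prints only the selected lines (the variable names are hypothetical):

```shell
# Lines 1 through 3.
first3=$(printf '%s\n' a b c d e f | sed -n '1,3p')

# Line 5 through the last line.
tail2=$(printf '%s\n' a b c d e f | sed -n '5,$p')

# From the first line containing "one" to the next line containing "two".
between=$(printf '%s\n' a one b two c | sed -n '/one/,/two/p')

# A regex start combined with a line-number end.
mixed=$(printf '%s\n' a one b c d | sed -n '/one/,4p')

printf '%s\n' "$first3" "--" "$tail2" "--" "$between" "--" "$mixed"
```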

Commands

:LABEL    Specify the location of LABEL for branch commands

bLABEL Unconditionally branch to LABEL (goto, jump) (remains in the current cycle; pattern space unchanged)

tLABEL Branch to LABEL only if there has been a successful substitution
since the last input line was read or conditional branch was taken

{ COMMANDS } A group of commands may be enclosed between { and } characters.
Allows a group of commands to be triggered by a single address (or address-range) match.
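
Two common loop idioms built from labels and branches, sketched on made-up input. Each command is passed with its own -e so the label name ends cleanly; the variable names are hypothetical.

```shell
# Unconditional branch: keep appending lines (N) until the last line
# is reached, then replace every embedded newline with a comma.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g')

# Conditional branch: repeat the substitution for as long as it succeeds,
# stripping one innermost pair of parentheses per pass.
stripped=$(printf '((x))\n' | sed -e ':a' -e 's/(\([^()]*\))/\1/' -e 'ta')

printf '%s\n' "$joined" "$stripped"
```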
Basic commands
q           Quit, printing the current pattern space by default (unless -n is used)

-e '...' Add the script '...' to the commands to be run
-n Suppress automatic printing; only output explicitly requested with "p" is printed
(by default, the entire pattern space is printed at the end of each cycle)
Regular expressions
-r          Extended regular expressions (GNU extension; -E is the more portable spelling)
(requires fewer regex characters to be escaped) (default: basic regular expressions)

Basic regular expressions (default)
To be used as special characters, the following must be escaped: ?, (, ), +, |, {, } (\?, \+, and \| are GNU extensions)

Extended regular expressions (-r)
None of the above characters must be escaped, unless they are meant
to be used as literal characters
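
The same substitution written in both syntaxes, as a sketch that assumes GNU sed (for \+ in basic mode and for -r); the variable names are hypothetical:

```shell
# Basic: groups and + must be escaped to act as special characters.
bre=$(printf 'aaa bbb\n' | sed 's/\(a\+\) \(b\+\)/\2 \1/')

# Extended: the same characters are special without a backslash.
ere=$(printf 'aaa bbb\n' | sed -r 's/(a+) (b+)/\2 \1/')

# Literal parentheses: unescaped in basic mode, escaped in extended mode.
bre_literal=$(printf 'f(x)\n' | sed 's/(x)/[x]/')
ere_literal=$(printf 'f(x)\n' | sed -r 's/\(x\)/[x]/')

printf '%s\n' "$bre" "$ere" "$bre_literal" "$ere_literal"
```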
Pattern space commands
  • [nN]ext, [dD]elete, [pP]rint
  • NDP are the multi-line equivalents of ndp
d       Delete the pattern space; immediately start next cycle
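D Delete text in the pattern space up to the first newline (first line in the pattern space); if text remains, restart the cycle without reading a new line of input

n Print the pattern space (unless -n is used), then replace it with the next line of input (the remaining commands are applied to that line)
N Add a newline to the pattern space, then append the next line of input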

p Print the pattern space (to standard output) (used in conjunction with -n)
P Print the pattern space up to the first newline (first line in the pattern space)
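
A sketch of how these commands combine, on made-up input (the variable names are hypothetical). The last command is a well-known idiom that drops consecutive duplicate lines by keeping a two-line window in the pattern space.

```shell
# n prints the current line and loads the next one, which d then deletes:
# only the odd-numbered lines survive.
odd=$(printf '%s\n' 1 2 3 4 | sed 'n; d')

# N joins each line with the following one; the newline becomes a space.
pairs=$(printf '%s\n' 1 2 3 4 | sed 'N; s/\n/ /')

# Two-line window: append the next line (except on the last line), print
# the first line only if the second is different, then drop the first line.
deduped=$(printf '%s\n' a a b | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$odd" "--" "$pairs" "--" "$deduped"
```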
Hold space commands
  • [hH]old, [gG]et, [x]change
  • HG are the multi-line equivalents of hg
h       Replace the hold space with the pattern space
H Append a newline to the hold space, and then append the pattern space to the hold space

g Replace the pattern space with the hold space
G Append a newline to the pattern space, and then append the hold space to the pattern space

x Exchange the hold and pattern spaces
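
Two small hold-space sketches on made-up input (the variable names are hypothetical): reversing the order of the lines, and printing the line that precedes a match.

```shell
# Reverse the lines: on every line but the first, append the hold space
# (the lines seen so far, already reversed) to the current line, then
# save the result. Print only once the last line has been processed.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G; h; $p')

# Print the line before each match: the hold space always contains the
# previous line, so swap it in, print it, and swap back.
before=$(printf 'a\nb\nmatch\n' | sed -n -e '/match/{x;p;x;}' -e 'h')

printf '%s\n' "$reversed" "--" "$before"
```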
Substitution
s/search/replace/flags

- Flags
I, i Case insensitive
g Global (apply to all matches)
M, m Multi-line (allows ^ and $ to match for individual lines)
(\` and \' will always match the beginning or end of the buffer)

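e If a substitution was made, execute the pattern space as a shell command and replace the pattern space with its output (GNU extension) (trailing newline is suppressed)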

NUMBER Only replace the NUMBERth match
p Print the new pattern space if a substitution was made
w FILE Write the result (if modified) to FILE
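
The flags can be compared side by side on made-up input. This sketch assumes GNU sed for the I and M flags; the variable names are hypothetical.

```shell
first=$(printf 'a-a-a\n' | sed 's/a/X/')      # no flag: first match only
all=$(printf 'a-a-a\n' | sed 's/a/X/g')       # g: every match
second=$(printf 'a-a-a\n' | sed 's/a/X/2')    # NUMBER: only the 2nd match
nocase=$(printf 'Aa\n' | sed 's/a/X/Ig')      # I: ignore case

# p together with -n: only lines where a substitution was made are printed.
printed=$(printf 'a\nb\n' | sed -n 's/a/X/p')

# M: after N the pattern space holds two lines, and ^ matches at the
# start of each of them.
multi=$(printf 'a\nb\n' | sed 'N; s/^/> /Mg')

printf '%s\n' "$first" "$all" "$second" "$nocase" "$printed" "$multi"
```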

Sample commands

    1d          Delete the first line
$d Delete the last line
1,5d Delete the first 5 lines
n;n;d; Delete every third line (skip skip delete ...)

q Print the first line
5q Print the first 5 lines
-n $p Print the last line

-n /regex/p Print lines that match regex
/regex/!d Print lines that match regex
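
The sample commands can be tried against a six-line input. The helper function nums and the variable names are made up for this sketch.

```shell
# Hypothetical helper: prints the numbers 1 to 6, one per line.
nums() { printf '%s\n' 1 2 3 4 5 6; }

no_first=$(nums | sed '1d')          # delete the first line
no_last=$(nums | sed '$d')           # delete the last line
no_third=$(nums | sed 'n;n;d')       # delete every third line
head3=$(nums | sed '3q')             # print the first 3 lines
last=$(nums | sed -n '$p')           # print the last line
match_p=$(nums | sed -n '/[35]/p')   # print lines that match
match_d=$(nums | sed '/[35]/!d')     # same result, by deleting the rest

printf '%s\n' "$no_third" "--" "$head3" "--" "$last" "--" "$match_p"
```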

Examples

File names
path=`echo "$file" | sed -e "s|/\?[^/]*$||g"`               # Extract <path> from <path>/<filename>
filename=`echo "$file" | sed -e "s|^.*/\([^/]*$\)|\1|g"` # Extract <filename> from <path>/<filename>

path_is_absolute=`echo "$file" | sed -e "s|^/.*$|true|g"` # "true", if the path starts with "/"

file_ext=`echo "$file" | sed -e "s|^.*\.\([^.]*\)$|\1|g"` # Extract <ext> from [<path>/]<base>.<ext>
file_base=`echo "$filename" | sed -e "s|\.\([^.]*\)$||g"` # Extract <base> from <base>.<ext>

number=`echo "$file" | sed -re "s|([0-9]*).*|\1|g"` # Extract the leading number from the filename
is_file_group=`echo "$data" | sed -e "s|^[ \t]*\[.*\][ \t]*$|true|g"`      # [text]
file_group_name=`echo "$data" | sed -e "s|^[ \t]*\[||g;s|\][ \t]*$||g"` # text
Trailing blank lines
# Identify files with a trailing blank line
#
# Description:
# - If the last line of the file is a blank line, the sed command will output "found".
# - Print the file name for each such file.
# - Supports Unix-, DOS-, and Mac-formatted files.
#
# Regular expressions for matching a trailing blank line in the pattern space:
# - Unix: /^$/
# - DOS: /^\r+$/
# - Mac: /\r\r$/
#
for file in $(find . -name "*.txt"); do
    result=$(sed -nre '/(^\r*$)|(\r\r$)/ {s/.*/found/;$p}' "$file")
    if [ "$result" = "found" ]; then
        echo "$file"
    fi
done
# Delete trailing blank lines
#
# Description:
# - Concatenate each set of consecutive blank lines in the pattern space.
# - When the end of the file is reached, delete the pattern space
# (which will delete the trailing blank lines, if there were any).
# - This command will overwrite all of the files, even those without trailing blank lines.
#
# - The regular expression matches any single blank line in the file. It also matches any number
# of blank lines that have been concatenated in the pattern space.
#
# - Supports Unix-, DOS-, and Mac-formatted files.
#
# Regular expressions for matching concatenated trailing blank lines in the pattern space:
# - Unix: /^\n*$/
# - DOS: /^\r(\n\r)*$/
# - Mac: /\r{2,}$/
#
for file in $(find . -name "*.txt"); do
    sed -re '/\r{2,}$/ {$s/\r*$/\r/}; :a /^[\r\n]*$/ {$d;N;ba}' "$file" > ~/sed.tmp
    mv ~/sed.tmp "$file"
done
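
A reduced, Unix-only sketch of the same idea, run on a scratch file so the effect is easy to check. It covers only the /^\n*$/ case and is not a replacement for the command above; the scratch file and variable names are made up.

```shell
# Hypothetical scratch file: "a", a blank line, "b", two trailing blank lines.
tmp=$(mktemp)
printf 'a\n\nb\n\n\n' > "$tmp"

# Collect consecutive blank lines in the pattern space. If the end of the
# file is reached while only blank lines are held, delete them; otherwise
# they are printed as usual together with the non-blank line that follows.
kept=$(sed -e ':a' -e '/^\n*$/{$d;N;ba' -e '}' "$tmp" | wc -l)
rm -f "$tmp"

echo "lines kept: $kept"
```

The blank line between "a" and "b" is kept; only the two blank lines at the end of the file are removed.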
# Identify files with trailing blank lines, and delete the offending lines from those files.
#
# Description:
# - Supports Unix-, DOS-, and Mac-formatted files.
#
files=$(for file in $(find . -name "*.txt"); do
    result=$(sed -nre '/(^\r*$)|(\r\r$)/ {s/.*/found/;$p}' "$file")
    if [ "$result" = "found" ]; then
        echo "$file"
    fi
done)
for file in $files; do
    echo "$file"
    sed -re '/\r{2,}$/ {$s/\r*$/\r/}; :a /^[\r\n]*$/ {$d;N;ba}' "$file" > ~/sed.tmp
    mv ~/sed.tmp "$file"
done