sed: Regexp Addresses
4.3 selecting lines by text matching
====================================
GNU 'sed' supports the following regular expression addresses. The
default regular expression syntax is Basic Regular Expression (BRE)
syntax (⇒BRE syntax). If the '-E' or '-r' option is used, the regular
expression should be in Extended Regular Expression (ERE) syntax
(⇒ERE syntax). ⇒BRE vs ERE.
'/REGEXP/'
This will select any line which matches the regular expression
REGEXP. If REGEXP itself includes any '/' characters, each must be
escaped by a backslash ('\').
The following command prints lines in '/etc/passwd' which end with
'bash'(1):
sed -n '/bash$/p' /etc/passwd
The empty regular expression '//' reuses the most recently used
regular expression (the same holds if the empty regular expression is
passed to the 's' command). Note that modifiers to regular
expressions are evaluated when the regular expression is compiled,
so it is invalid to specify them together with the empty regular
expression.
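As a minimal sketch of the empty regular expression (the sample words
are made up for illustration), the address '/an/' below selects a line
and the empty pattern in 's//AN/' then reuses that same expression:

```shell
# The address /an/ selects the line; the empty regex in s//AN/ reuses "an".
# The input words are illustrative only.
reused=$(printf '%s\n' apple banana cherry | sed -n '/an/s//AN/p')
printf '%s\n' "$reused"
```

Only the first occurrence of 'an' on the matching line is replaced,
because the 's' command is given without the 'g' flag.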
'\%REGEXP%'
(The '%' may be replaced by any other single character.)
This also matches the regular expression REGEXP, but allows one to
use a different delimiter than '/'. This is particularly useful if
the REGEXP itself contains a lot of slashes, since it avoids the
tedious escaping of every '/'. If REGEXP itself includes any
delimiter characters, each must be escaped by a backslash ('\').
The following commands are equivalent. They print lines which
start with '/home/alice/documents/':
sed -n '/^\/home\/alice\/documents\//p'
sed -n '\%^/home/alice/documents/%p'
sed -n '\;^/home/alice/documents/;p'
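The equivalence can be checked on a small input. This is a sketch
only; the two sample paths are hypothetical and are fed on standard
input rather than read from a file:

```shell
# Two hypothetical paths, one per line.
paths=$(printf '%s\n' /home/alice/documents/a.txt /home/bob/notes.txt)

# Default delimiter: every '/' inside the regex must be escaped.
slash=$(printf '%s\n' "$paths" | sed -n '/^\/home\/alice\/documents\//p')

# Custom delimiter '%': the slashes need no escaping.
percent=$(printf '%s\n' "$paths" | sed -n '\%^/home/alice/documents/%p')

printf '%s\n' "$slash" "$percent"
```

Both commands select the same single line, so the path under
'/home/alice/documents/' is printed twice.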
'/REGEXP/I'
'\%REGEXP%I'
The 'I' modifier to regular-expression matching is a GNU extension
which causes the REGEXP to be matched in a case-insensitive manner.
In many other programming languages, a lower case 'i' is used for
case-insensitive regular expression matching. However, in 'sed'
the 'i' is used for the insert command (⇒insert command).
Observe the difference between the following examples.
In this example, '/b/I' is the address: a regular expression with the
'I' modifier. 'd' is the delete command:
$ printf "%s\n" a b c | sed '/b/Id'
a
c
Here, '/b/' is the address: a regular expression. 'i' is the
insert command. 'd' is the value to insert. A line with 'd' is
then inserted above the matched line:
$ printf "%s\n" a b c | sed '/b/id'
a
d
b
c
'/REGEXP/M'
'\%REGEXP%M'
The 'M' modifier to regular-expression matching is a GNU 'sed'
extension which directs GNU 'sed' to match the regular expression
in 'multi-line' mode. The modifier causes '^' and '$' to match
respectively (in addition to the normal behavior) the empty string
after a newline, and the empty string before a newline. There are
special character sequences ('\`' and '\'') which always match the
beginning or the end of the buffer. In addition, the period
character does not match a newline character in multi-line mode.
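The effect of 'M' is easiest to see when the pattern space holds more
than one line. The following sketch assumes GNU 'sed' and uses the 'N'
command to join two illustrative input lines into one pattern space
before the address is tested:

```shell
# After N, the pattern space is "one\ntwo".
# Without M, ^ and $ match only at the start and end of the whole buffer,
# so /^two$/ does not match and nothing is printed.
without_m=$(printf '%s\n' one two | sed -n 'N; /^two$/p')

# With M, ^ also matches just after the embedded newline,
# so the address matches and the whole pattern space is printed.
with_m=$(printf '%s\n' one two | sed -n 'N; /^two$/Mp')

printf 'without M: [%s]\n' "$without_m"
printf 'with M: [%s]\n' "$with_m"
```

The first command prints nothing; the second prints both lines, since
'p' prints the entire pattern space rather than only the matching part.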
Regex addresses operate on the content of the current pattern space.
If the pattern space is changed (for example, with the 's///' command),
the regular expression matching will operate on the changed text.
In the following example, automatic printing is disabled with '-n'.
The 's/2/X/' command replaces the first '2' on a line with 'X'. The
command '/[0-9]/p' matches lines with digits and prints them. Because
the second line is changed before the '/[0-9]/' address is tested, it no
longer matches and is not printed:
$ seq 3 | sed -n 's/2/X/ ; /[0-9]/p'
1
3
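For contrast, here is a sketch of the same two commands in both orders
(assuming 'seq' is available). When the address is tested before the
substitution, the second line still contains a digit at that point and
is printed unchanged:

```shell
# Substitution first: line 2 becomes "X" before /[0-9]/ is tested.
changed_first=$(seq 3 | sed -n 's/2/X/ ; /[0-9]/p')

# Address first: line 2 is printed as "2", then changed (and never printed again).
tested_first=$(seq 3 | sed -n '/[0-9]/p ; s/2/X/')

printf 'substitute, then test:\n%s\n' "$changed_first"
printf 'test, then substitute:\n%s\n' "$tested_first"
```

The first ordering prints '1' and '3'; the second prints '1', '2' and
'3'.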
---------- Footnotes ----------
(1) There are of course many other ways to do the same, e.g.
grep 'bash$' /etc/passwd
awk -F: '$7 == "/bin/bash"' /etc/passwd