5.7 Back-references and Subexpressions
======================================
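"Back-references" are regular expression constructs that refer to a
previous part of the matched regular expression. Back-references are
written as a backslash followed by a single digit (e.g. '\1'; '\1'
through '\9' are available). The part of the regular expression they
refer to is called a "subexpression", and is delimited with parentheses:
'\(' and '\)' in basic regular expressions, or plain '(' and ')' when
the '-E' option selects extended regular expressions.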
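A minimal sketch of the same substitution written in both syntaxes (the
word 'hello' is an arbitrary sample input). By default sed uses basic
regular expressions, where the group is written '\(l\)'; with '-E' it is
written '(l)'. In both cases '\1' refers to the captured letter:

```shell
# Basic regular expression (the default): the group is written \( \)
echo hello | sed 's/\(l\)\1/[\1\1]/'
# prints: he[ll]o

# Extended regular expression (-E): the group is written ( )
echo hello | sed -E 's/(l)\1/[\1\1]/'
# prints: he[ll]o
```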
Back-references and subexpressions are used in two places: in the
regular expression search pattern, and in the REPLACEMENT part of the
's' command (see Regexp Addresses and The "s" Command).
In a regular expression pattern, back-references are used to match
the same content as a previously matched subexpression. In the
following example, the subexpression is '.', which matches any single
character (surrounding it with parentheses makes it a subexpression).
The back-reference '\1' asks to match the same content (the same
character) that the subexpression matched.
The command below matches three-letter words that start with any
character, followed by the letter 'o', followed by the same character
as the first:
$ sed -E -n '/^(.)o\1$/p' /usr/share/dict/words
bob
mom
non
pop
sos
tot
wow
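The output above depends on the contents of /usr/share/dict/words,
which varies between systems and may be missing. A self-contained
sketch that feeds the same command a small, made-up word list instead:

```shell
# Hypothetical sample list standing in for /usr/share/dict/words
printf '%s\n' bob boa mom moo wow |
  sed -E -n '/^(.)o\1$/p'
# prints:
# bob
# mom
# wow
```

'boa' and 'moo' are rejected because their last letter differs from
their first.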
Multiple subexpressions are automatically numbered from left to right.
This command searches for six-letter palindromes (the first three
letters are three subexpressions, followed by three back-references in
reverse order):
$ sed -E -n '/^(.)(.)(.)\3\2\1$/p' /usr/share/dict/words
redder
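When subexpressions are nested, the numbering follows the order of
their opening parentheses, so an outer group is numbered before the
groups inside it. A short sketch with arbitrary inputs, followed by the
palindrome search run on a made-up two-word list rather than the system
dictionary:

```shell
# Group 1 is the outer ((a)b), which matched "ab";
# group 2 is the inner (a), which matched "a".
echo abc | sed -E 's/((a)b)c/[\1][\2]/'
# prints: [ab][a]

# Palindrome search on a sample list; "hello" is not a six-letter palindrome
printf '%s\n' redder hello | sed -E -n '/^(.)(.)(.)\3\2\1$/p'
# prints: redder
```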
In the 's' command, back-references can be used in the REPLACEMENT
part to refer back to subexpressions in the REGEXP part.
The following example uses two subexpressions in the regular
expression to match two space-separated words. The back-references in
the REPLACEMENT part print the words in a different order:
$ echo "James Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
The name is Bond, James Bond.
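The '(.*) (.*)' pattern splits the input at the last space rather than
the first, because the first subexpression matches as much as it can.
A sketch with an arbitrary three-word input (the middle name is made up
for illustration):

```shell
# The first (.*) takes "James Herbert" (everything up to the last space);
# the second (.*) is left with only "Bond".
echo "James Herbert Bond" | sed -E 's/(.*) (.*)/The name is \2, \1 \2./'
# prints: The name is Bond, James Herbert Bond.
```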
When used with alternation, if the referenced group does not
participate in the match, the back-reference makes the whole match
fail. For example, 'a(.)|b\1' will not match 'ba'. When multiple
regular expressions are given with '-e' or read from a file
('-f FILE'), back-references are local to each expression.
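A sketch of both rules using arbitrary inputs. The first command prints
nothing, because the 'b\1' branch refers to a group that exists only in
the 'a(.)' branch. In the second, each '-e' expression numbers its own
groups:

```shell
# b\1 refers to the group in the other branch, which does not take part
# in a match that starts with "b"; nothing is printed.
echo ba | sed -E -n '/a(.)|b\1/p'

# \1 in the second expression refers to its own (b), not to (a)
# from the first expression.
echo ab | sed -E -e 's/(a)/[\1]/' -e 's/(b)/<\1>/'
# prints: [a]<b>
```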