5.8 Escape Sequences - specifying special characters
====================================================
Until this chapter, we have only encountered escapes of the form '\^',
which tell 'sed' not to interpret the circumflex as a special character,
but rather to take it literally. For example, '\*' matches a single
asterisk rather than zero or more backslashes.
This chapter introduces another kind of escape(1)--that is, escapes
that are applied to a character or sequence of characters that
ordinarily are taken literally, and that 'sed' replaces with a special
character. This provides a way of encoding non-printable characters in
patterns in a visible manner. There is no restriction on the appearance
of non-printing characters in a 'sed' script, but when a script is being
prepared in the shell or by text editing, it is usually easier to use
one of the following escape sequences than the binary character it
represents.
The list of these escapes is:
'\a'
Produces or matches a BEL character, that is, an "alert" (ASCII 7).
'\f'
Produces or matches a form feed (ASCII 12).
'\n'
Produces or matches a newline (ASCII 10).
'\r'
Produces or matches a carriage return (ASCII 13).
'\t'
Produces or matches a horizontal tab (ASCII 9).
'\v'
Produces or matches a so-called "vertical tab" (ASCII 11).
'\cX'
Produces or matches 'CONTROL-X', where X is any character. The
precise effect of '\cX' is as follows: if X is a lower case letter,
it is converted to upper case. Then bit 6 of the character (hex
40) is inverted. Thus '\cz' becomes hex 1A, but '\c{' becomes hex
3B, while '\c;' becomes hex 7B.
'\dXXX'
Produces or matches a character whose decimal ASCII value is XXX.
'\oXXX'
Produces or matches a character whose octal ASCII value is XXX.
'\xXX'
Produces or matches a character whose hexadecimal ASCII value is
XX.
'\b' (backspace) was omitted because of the conflict with the
existing "word boundary" meaning.
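As a rough illustration of the list above, the following shell sketch
reproduces the '\cX' arithmetic and then tries a few of the other
escapes. The function name 'ctrl_code' is hypothetical and is not part
of 'sed'; the 'sed' commands assume GNU 'sed', since most of these
escapes are GNU extensions.

```shell
# ctrl_code is a hypothetical helper, not part of sed. It mirrors the
# rule described for '\cX': convert a lower case letter to upper case,
# then invert bit 6 (hex 40) of the character code.
ctrl_code() {
  upper=$(printf '%s' "$1" | LC_ALL=C tr '[:lower:]' '[:upper:]')
  # A leading quote makes printf yield the numeric value of the character.
  code=$(printf '%d' "'$upper")
  printf '%02X\n' $(( code ^ 0x40 ))
}

ctrl_code z     # prints 1A
ctrl_code '{'   # prints 3B
ctrl_code ';'   # prints 7B

# Assuming GNU sed: '\t' in the regular expression matches a tab.
printf 'a\tb\n' | sed 's/\t/<TAB>/'   # prints a<TAB>b

# Assuming GNU sed: decimal 065, octal 101 and hexadecimal 41 all name
# the letter 'A', so each of these commands should replace it with 'x'.
echo ABC | sed 's/\d065/x/'
echo ABC | sed 's/\o101/x/'
echo ABC | sed 's/\x41/x/'
```

The helper only performs the arithmetic; it does not call 'sed', so it
can be used to check which byte a given '\cX' would stand for before
putting it in a script.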
5.8.1 Escaping Precedence
-------------------------
GNU 'sed' processes escape sequences _before_ passing the text on to the
regular-expression matching of the 's///' command and address matching.
Thus the following two commands are equivalent ('0x5e' is the hexadecimal
ASCII value of the character '^'):
$ echo 'a^c' | sed 's/^/b/'
ba^c
$ echo 'a^c' | sed 's/\x5e/b/'
ba^c
As are the following ('0x5b' and '0x5d' are the hexadecimal ASCII values
of '[' and ']', respectively):
$ echo abc | sed 's/[a]/x/'
xbc
$ echo abc | sed 's/\x5ba\x5d/x/'
xbc
However, it is recommended to avoid writing such special characters as
escape sequences, because of unexpected edge cases. For example, the
following are not equivalent:
$ echo 'a^c' | sed 's/\^/b/'
abc
$ echo 'a^c' | sed 's/\\\x5e/b/'
a^c
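Because the outcome of such edge cases may depend on the 'sed'
implementation and version in use, it can help to compare two scripts
directly instead of relying on expectation. The following is a minimal
sketch; the function name 'same_result' is hypothetical and is not part
of 'sed'.

```shell
# same_result is a hypothetical helper, not part of sed. It runs two sed
# scripts on the same input line and reports whether their outputs match.
same_result() {
  input=$1
  first=$2
  second=$3
  out1=$(printf '%s\n' "$input" | sed -e "$first")
  out2=$(printf '%s\n' "$input" | sed -e "$second")
  if [ "$out1" = "$out2" ]; then
    echo "same: $out1"
  else
    echo "differ: $out1 / $out2"
  fi
}

# The pairs discussed in this section (the '\x' escapes assume GNU sed).
same_result 'a^c' 's/^/b/' 's/\x5e/b/'
same_result 'abc' 's/[a]/x/' 's/\x5ba\x5d/x/'
same_result 'a^c' 's/\^/b/' 's/\\\x5e/b/'
```

No output is shown for these calls on purpose: the helper reports what
the installed 'sed' actually does with each pair.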
---------- Footnotes ----------
(1) All the escapes introduced here are GNU extensions, with the
exception of '\n'. In basic regular expression mode, setting
'POSIXLY_CORRECT' disables them inside bracket expressions.