Which Special Characters Must Be Escaped in Regular Expressions?
In most regular expression engines (PCRE, JavaScript, Python, Go, and Java), these special characters must be escaped outside of character classes:
[ * + ? { . ( ) ^ $ | \
If you want to find one of these metacharacters literally, add \ before it. For example, to find the text $100, use \$100. If you want to find the backslash itself, double it: \\.
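As a quick check of the rule above, here is a minimal Python sketch (Python is one of the engines listed; the sample text is made up for illustration). It shows that an unescaped $ is treated as an anchor, while \$ and \\ match the literal characters.

```python
import re

# Sample text invented for this illustration.
text = r"Price: $100, path: C:\temp"

# Unescaped, $ is an end-of-string anchor, so "$100" can never match.
unescaped = re.search(r"$100", text)

# Escaped, \$ matches a literal dollar sign.
escaped = re.search(r"\$100", text)

# A literal backslash is matched by a doubled backslash in the pattern.
backslash = re.search(r"C:\\temp", text)

print(unescaped)          # None
print(escaped.group())    # $100
print(backslash.group())  # C:\temp
```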
Inside character classes [square brackets], you must escape the following characters:
\ ] -
For example, to find an opening or a closing bracket, use [[\]].
If you need to include the dash into a character class, you can make it the first or the last character instead of escaping it. Use [a-z-] or [a-z\-] to find a Latin letter or a dash.
If you need to include the caret ^ into a character class, it cannot be the first character; otherwise, the class is negated and matches any character except the specified ones. For example: [^aeiouy] means "any character except vowels," while [a^eiouy] means "any vowel or a caret." Alternatively, you can escape the caret: [\^aeiouy].
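The character class rules above can be tried out with a short Python sketch. The input strings are invented for illustration, and the bracket example uses the equivalent spelling [\][] rather than [[\]].

```python
import re

# An escaped ] inside the class: matches either bracket.
brackets = re.findall(r"[\][]", "a[0]")

# A dash placed last (or escaped) is literal, not a range operator.
dash_last = re.findall(r"[a-z-]", "x-1")
dash_escaped = re.findall(r"[a-z\-]", "x-1")

# A leading ^ negates the class; elsewhere (or escaped) it is literal.
negated = re.findall(r"[^aeiouy]", "sea^")
caret_inside = re.findall(r"[a^eiouy]", "sea^")
caret_escaped = re.findall(r"[\^aeiouy]", "sea^")

print(brackets)       # ['[', ']']
print(dash_last)      # ['x', '-']
print(dash_escaped)   # ['x', '-']
print(negated)        # ['s', '^']
print(caret_inside)   # ['e', 'a', '^']
print(caret_escaped)  # ['e', 'a', '^']
```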
JavaScript
In JavaScript, you also need to escape the slash / in regular expression literals:
/AC\/DC/.test('AC/DC')
Lone closing brackets ] and } are allowed by default, but if you use the 'u' flag, then you must escape them:
/]}/.test(']}') // true
/]}/u.test(']}') // throws an exception
This restriction is specific to JavaScript; lone closing brackets are allowed in other languages.
If you create a regular expression on the fly from a user-supplied string, you can use the following function to properly escape the special characters:
function escapeRe(str) {
    return str.replace(/[[\]*+?{}.()^$|\\-]/g, '\\$&');
}
var re = new RegExp(escapeRe(start) + '.*?' + escapeRe(end));
PHP
In PHP, you have the preg_quote function to insert a user-supplied string into a regular expression pattern. In addition to the characters listed above, it also escapes # (in 7.3.0 and higher), the NUL character, and the following characters: = ! < > : -, which do not have a special meaning in PCRE regular expressions but are sometimes used as delimiters. Closing brackets ] and } are escaped, too, which is unnecessary:
preg_match('/]}/', ']}'); // returns 1
Just like in JavaScript, you also need to escape the delimiter, which is usually /, but you can use another special character such as # or = if the slash appears inside your pattern:
if (preg_match('/\/posts\/([0-9]+)/', $path, $matches)) { }
// Can be simplified to:
if (preg_match('#/posts/([0-9]+)#', $path, $matches)) { }
Note that by default preg_quote does not escape the tilde ~ and the slash /, so you should not use them as delimiters if you construct regexes from strings, unless you pass the delimiter as the second argument of preg_quote.
In double quotes, \1 and $ are interpreted differently than in regular expressions, so the best practice is:
- to use single quotes with preg_match, preg_replace, etc.;
- to repeat the backslash 4 times if you need to match a literal backslash. This is because you need to escape the backslash in the regular expression, but you also need to escape it in the single-quoted string. So it's escaped twice:
$text = 'C:\\Program files\\';
echo $text;
if (preg_match('/C:\\\\Program files\\\\/', $text, $matches)) {
    print_r($matches);
}
Python
Python has a raw string syntax (r''), which conveniently avoids the backslash escaping idiosyncrasies of PHP:
import re
re.match(r'C:\\Program files/Tools', 'C:\\Program files/Tools')
You only need to escape the quote in raw strings:
re.match(r'\'', "'")
re.match(r"'", "'")  # or just use double quotes if you have a regex with a single quote
re.match(r"\"", '"')
re.match(r'"', '"')  # or use single quotes if you have a regex with a double quote
re.match(r'"\'', '"\'')  # multiple quote types; cannot avoid escaping them
A raw string literal cannot end with a single backslash, but this is not a problem for a valid regular expression.
To match a literal ] inside a character class, you can make it the first character: [][] matches a closing or an opening bracket. Python supports this syntax, as does Aba Search & Replace, but some other languages, such as JavaScript, do not. You can also quote the ] character with a backslash, which works in all languages: [\][] or [[\]].
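A small sketch of the two spellings in Python, assuming the standard re module; the sample string is invented for illustration:

```python
import re

# ] placed first in the class is taken literally by Python's re module.
first = re.findall(r"[][]", "a[0]")

# The backslash-escaped form is the more portable spelling.
escaped = re.findall(r"[\][]", "a[0]")

print(first)    # ['[', ']']
print(escaped)  # ['[', ']']
```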
For inserting a string into a regular expression, Python offers the re.escape function. Unlike JavaScript with the u flag, Python tolerates escaping non-special punctuation characters, so this function also escapes -, #, &, and ~:
print(re.escape(r'-#&~'))  # prints \-\#\&\~
re.match(r'\@\~', '@~')  # matches
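To build a pattern from user-supplied strings, as the JavaScript escapeRe example does, a Python version might look like the sketch below. The helper name between and the sample inputs are hypothetical, chosen only for illustration.

```python
import re

def between(start: str, end: str) -> re.Pattern:
    """Hypothetical helper: match text from a literal start to a literal end."""
    # re.escape makes every metacharacter in the delimiters literal.
    return re.compile(re.escape(start) + r".*?" + re.escape(end))

pattern = between("(*", "*)")
match = pattern.search("x = 1 (* a comment *) y = 2")
print(match.group())  # (* a comment *)
```

Without re.escape, the delimiter (* would be read as an opening group followed by a quantifier, and the pattern would fail to compile.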
Java
Java allows escaping non-special punctuation characters, too:
Assert.assertTrue(Pattern.matches("\\@\\}\\] }]", "@}] }]"));
Similarly to PHP, you need to repeat the backslash character 4 times, but in Java, you also must double the backslash character when escaping other characters:
Assert.assertTrue(Pattern.matches("C:\\\\Program files \\(x86\\)\\\\", "C:\\Program files (x86)\\"));
This is because the backslash must be escaped in a Java string literal, so if you want to pass \\ \[ to the regular expression engine, you need to double each backslash: "\\\\ \\[". There are no raw string literals in Java, so regular expressions are just ordinary strings.
There is the Pattern.quote method for inserting a string into a regular expression. It surrounds the string with \Q and \E, which escapes multiple characters in Java regexes (borrowed from Perl). If the string contains \E, it will be escaped with the backslash \:
Assert.assertEquals("\\Q()\\E", Pattern.quote("()"));
Assert.assertEquals("\\Q\\E\\\\E\\Q\\E", Pattern.quote("\\E"));
Assert.assertEquals("\\Q(\\E\\\\E\\Q)\\E", Pattern.quote("(\\E)"));
The \Q...\E syntax is another way to escape multiple special characters at once. Besides Java, it's supported in PHP/PCRE and Go regular expressions, but not in Python or JavaScript.
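Since Python lacks \Q...\E, one way to get a similar effect is to expand such spans with re.escape before compiling. The sketch below is only an illustration: expand_quoted_spans is a hypothetical helper, not part of the re module, and it ignores the corner case of a \Q that is itself preceded by an escaping backslash.

```python
import re

# Finds \Q, then the shortest run of text up to \E (or the end of the pattern).
_QUOTED_SPAN = re.compile(r"\\Q(.*?)(?:\\E|\Z)", re.DOTALL)

def expand_quoted_spans(pattern: str) -> str:
    # Hypothetical helper: replace each \Q...\E span with its escaped content.
    # A function is used as the replacement so that backslashes produced by
    # re.escape are inserted as-is.
    return _QUOTED_SPAN.sub(lambda m: re.escape(m.group(1)), pattern)

expanded = expand_quoted_spans(r"\Q||\E")
print(expanded)                             # \|\|
print(re.search(expanded, "a||b").group())  # ||
```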
Go
Go raw string literals are characters between backquotes: `\(`. It's preferable to use them for regular expressions because you don't need to double-escape the backslash:
r := regexp.MustCompile(`\(text\)`)
fmt.Println(r.FindString("(text)"))
A backquote cannot be used in a raw string literal, so you have to resort to the usual "`" string syntax for it. But this is a rare character.
The \Q...\E syntax is supported, too:
r := regexp.MustCompile(`\Q||\E`)
fmt.Println(r.FindString("||"))
There is a regexp.QuoteMeta function for inserting strings into a regular expression. In addition to the characters listed above, it also escapes closing brackets ] and }.
Published at DZone with permission of Peter Kankowski. See the original article here.