Regex: Dos and Don'ts
In this article, I will outline the use of regex in security, what can go wrong when regexes are not set up properly, and how to avoid and troubleshoot issues that can arise when using regex as a security measure.
Regex for Security
Regular expressions play a big role in the security world. They are used as a security measure across multiple layers of a corporation’s infrastructure. Below are a few of the most common use cases of regex for security.
Firewall Rules
Developers use regex to fine-tune how firewalls behave. For example, you can use regex to create rules to block requests to certain file types, from certain IP addresses, or from certain user-agents that are known to be malicious.
User Input Validation
Another important use case for regex patterns is validating user input. When an application accepts user input, it opens its doors to a wide range of potential vulnerabilities, like XSS, open redirect, and SQL injection. Regex is used to filter and sanitize user input as a defense mechanism against these attacks.
Malware Detection
Lastly, regex is often used to customize the behavior of malware detectors. System administrators can use regex rules to detect potentially dangerous content in files and to quarantine these files accordingly.
Faulty Regexes
Since regex is so prevalent as a security measure, incorrectly deployed regex patterns have the potential to impact many different aspects of a system. So, what can go wrong with these regex patterns? Faulty regex patterns that lead to vulnerabilities are often patterns that fail to consider one or multiple edge cases. This happens a lot in public-facing web applications and leads to a significant number of newly discovered vulnerabilities. Defending a system is a lot harder than attacking it. Often, all an attacker needs to compromise an application is to find a single user input that is incorrectly validated!
Web Vulnerabilities Caused by Faulty Regexes
In web applications, regexes are often used to filter and sanitize potentially malicious user input. When these regexes are composed incorrectly, the protection fails and gives hackers a chance to attack the application. Here are a few real-life vulnerabilities that were caused by faulty regexes (the names of the websites are replaced with “examplesite.com”).
SSRF Protection via a Blacklist
SSRF, or Server-Side Request Forgery, is a vulnerability that happens when an attacker is able to send requests on behalf of a server. It allows attackers to “forge” the request signatures of the vulnerable server, therefore assuming a privileged position on a network, bypassing firewall controls, and gaining access to internal services. You can learn more about SSRFs in Intro to SSRF.
examplesite.com allows users to load content from external domains via the “img” URL parameter. A legitimate image request would look like this:
examplesite.com/load?img=https://images.com/puppies.png
The website prevents SSRF by rejecting img parameters that contain certain URLs in a blacklist. The regex used looks like this:
^https?://(127\.|10\.|192\.168\.).*$
This regex pattern checks the submitted URL against a blacklist of local IP address prefixes and rejects the request if it matches. The problem is that the website fails to consider another possible form of a local address: “0.0.0.0”, which on many systems can be used to refer to the local machine. So, the protection can be bypassed by using the request:
examplesite.com/load?img=https://0.0.0.0
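To make the gap concrete, here is a minimal Python sketch of the bypass. It assumes the blacklist is written with `https?` so that both http and https URLs are checked. The helper names `blocked_by_blacklist` and `is_internal_ip` are hypothetical and not part of the original application. The second helper shows one possible stricter check built on the standard `ipaddress` module instead of a hand-written list of prefixes; it only handles IP literals, so hostnames that resolve to internal addresses would still need separate handling.

```python
import ipaddress
import re

# Blacklist from the example above ("https?" is an assumption so https is covered).
BLACKLIST = re.compile(r"^https?://(127\.|10\.|192\.168\.).*$")


def blocked_by_blacklist(url):
    """Return True if the blacklist regex rejects the URL."""
    return BLACKLIST.match(url) is not None


def is_internal_ip(host):
    """Hypothetical stricter check: flag any non-public IP literal as internal."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; hostnames must be resolved and checked separately
    return ip.is_private or ip.is_loopback or ip.is_unspecified or ip.is_link_local


print(blocked_by_blacklist("https://127.0.0.1/admin"))  # the blacklist catches this one
print(blocked_by_blacklist("https://0.0.0.0"))          # the blacklist misses this one
print(is_internal_ip("0.0.0.0"))                        # the ipaddress-based check flags it
```

Delegating the "is this address internal?" decision to a library that knows the reserved ranges avoids having to enumerate every prefix by hand, which is where the blacklist above went wrong.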
Open Redirect Filter Bypass
Here’s another example of a faulty regex leading to a vulnerability. The application protects against open redirect by requiring URLs to fit the following two criteria:
- The URL must contain “examplesite.com/”.
- The URL must end with an image-related extension, such as jpg or png.
The faulty regex looks like this:
^.*examplesite\.com\/.*(jpg|jpeg|png)$
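The issue with this regex is that it is too permissive and allows for too much flexibility in user input (the two .* in the pattern will match any number of any character). Nothing forces “examplesite.com/” to be the host of the URL, so it can just as well appear in the query string of an attacker-controlled domain. This open redirect filter can be bypassed by a URL like this: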
https://attackersite.com?examplesite.com/abc.png
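The following Python sketch reproduces the bypass and contrasts it with one possible stricter approach: parsing the URL first and then comparing its components. The names `ALLOWED_HOSTS`, `faulty_filter_allows`, and `strict_filter_allows` are hypothetical, and the set of allowed hosts is an assumption made for illustration.

```python
import re
from urllib.parse import urlsplit

# The faulty pattern from the example above.
FAULTY = re.compile(r"^.*examplesite\.com\/.*(jpg|jpeg|png)$")

# Assumed allowlist of hosts and extensions, for illustration only.
ALLOWED_HOSTS = {"examplesite.com", "www.examplesite.com"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def faulty_filter_allows(url):
    """Return True if the faulty regex accepts the URL."""
    return FAULTY.match(url) is not None


def strict_filter_allows(url):
    """Parse the URL and check scheme, host, and path separately."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    return (
        parts.scheme == "https"
        and parts.hostname in ALLOWED_HOSTS
        and parts.path.lower().endswith(IMAGE_EXTENSIONS)
    )


bypass = "https://attackersite.com?examplesite.com/abc.png"
print(faulty_filter_allows(bypass))  # the regex accepts the attacker's URL
print(strict_filter_allows(bypass))  # the parsed check rejects it
print(strict_filter_allows("https://examplesite.com/img/abc.png"))
```

Checking the parsed host against an exact allowlist removes the flexibility that the two .* wildcards introduced.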
Regex Safety Best Practices
So, how can developers prevent these mistakes from happening? Regex safety is hard. It is difficult to consider all the cases you’ll need to check for, and you never know what creative ideas hackers are going to come up with! But, it is possible to minimize the potential for attack by following a few regex best practices.
Be Strict!
First, be strict when validating user input. When in doubt, use a whitelist instead of a blacklist when filtering for file types, IP addresses, user-agents, and more. Reduce unnecessary flexibility for predictable user input. For example, when the user is asked to input their age, only digits should be allowed, and the number should not be too large. The length of any user input should also be checked. Strict input validation practices like these might seem like overkill, but they save you from worrying about a variety of potential vulnerabilities.
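As an illustration of the age example, here is a small Python sketch of a strict validator. The function name `parse_age` and the upper bound of 130 are assumptions chosen for the example. It uses `re.fullmatch` so that the whole input must match, and `[0-9]` rather than `\d`, because in Python `\d` also matches non-ASCII digits in text patterns.

```python
import re

# One to three ASCII digits; the length limit is part of the pattern.
AGE_PATTERN = re.compile(r"[0-9]{1,3}")


def parse_age(raw, max_age=130):
    """Return the age as an int, or None if the input is not acceptable."""
    if AGE_PATTERN.fullmatch(raw) is None:
        return None  # wrong characters, empty, too long, or trailing newline
    age = int(raw)
    return age if age <= max_age else None


print(parse_age("42"))    # accepted
print(parse_age("42\n"))  # rejected: fullmatch does not tolerate a trailing newline
print(parse_age("999"))   # rejected: matches the pattern but is out of range
print(parse_age("-1"))    # rejected: the sign is not a digit
```

Note that a pattern anchored with `^...$` and used with `re.match` would accept "42\n", since `$` also matches just before a trailing newline. This is exactly the kind of edge case strict validation is meant to close.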
Don’t Publish Regex Patterns
The second tip is to avoid exposing regex patterns online. Sometimes applications publish their regex patterns because the project is open-sourced, or accidentally expose them because they use the same patterns in both client-side code and server-side code. This makes it easier for attackers to find security holes in the regex pattern and exploit them.
Use Validated Patterns
Ideally, you should avoid writing your own regex patterns for common use cases (like username validation, password validation, and comment boxes). Instead, find validated and secured regex patterns online. These patterns have been vetted and have stood the test of time, so they are often better than custom-written regex patterns.
Defense-in-depth
In addition to using safe regex, employ defense-in-depth measures. Defense-in-depth means that you do not rely on a single protection mechanism and instead use multiple layers of protection to prevent attacks. For example, in addition to rigorous input validation, you can use prepared statements, the principle of least privilege, and hashed passwords to minimize the impact of a potential SQL injection.
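To show what one of these extra layers looks like, here is a minimal sketch of a prepared (parameterized) statement using Python's built-in `sqlite3` module. The `users` table and the `find_user` helper are hypothetical. The point is that the input is passed as a bound parameter, so it is treated as data even if a validation regex lets something unexpected through.

```python
import sqlite3


def find_user(conn, username):
    """Look up a user with a parameterized query; input is never spliced into the SQL."""
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cur.fetchall()


# Hypothetical in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # []  (the payload is compared as a plain string)
```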
Fuzz Testing
Finally, rigorously test your application by supplying it with illegal and unexpected inputs to verify that your regexes are doing their jobs.
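A very small fuzzing harness can be sketched in Python as follows. It compares a regex-based validator against an independent, deliberately simple reference check (an "oracle") on a mix of hand-picked edge cases and random strings, and reports every input where the two disagree. The username rule, the function names, and the edge-case list are all assumptions made for illustration. A real project would more likely use a dedicated fuzzing or property-based testing tool.

```python
import random
import re
import string

USERNAME = re.compile(r"[a-z0-9_]{3,16}")  # hypothetical validation rule
ALLOWED = set(string.ascii_lowercase + string.digits + "_")
EDGE_CASES = ["", "abc\n", "a" * 17, "ABC", "ab c", "abc\x00"]


def strict_validator(value):
    """Validator under test: the whole string must match."""
    return USERNAME.fullmatch(value) is not None


def loose_validator(value):
    """Subtly faulty variant: '$' also matches before a trailing newline."""
    return re.match(r"^[a-z0-9_]{3,16}$", value) is not None


def oracle(value):
    """Independent reference check written without regex."""
    return 3 <= len(value) <= 16 and all(ch in ALLOWED for ch in value)


def fuzz(validator, rounds=5000, seed=0):
    """Return the inputs on which the validator disagrees with the oracle."""
    rng = random.Random(seed)
    alphabet = string.printable + "\x00\u00e9\u0664"
    candidates = list(EDGE_CASES)
    for _ in range(rounds):
        length = rng.randint(0, 20)
        candidates.append("".join(rng.choice(alphabet) for _ in range(length)))
    return [c for c in candidates if validator(c) != oracle(c)]


print(fuzz(strict_validator))           # no disagreements expected
print("abc\n" in fuzz(loose_validator))  # the trailing-newline case is caught
```

Seeding the run with known tricky inputs matters here: purely random strings rarely hit a case like the trailing newline, so the hand-picked list does most of the work of catching it.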
Regex Security Resources
Here are a few resources to help you secure your regex patterns.
OWASP Validation Regex Repository
The OWASP Validation Regex Repository is a database of validated and tested regex patterns that you can use. Here, you can find a variety of patterns that could be used to validate usernames, emails, IPs, credit card numbers, and more. Using these regex patterns is a good idea as they are strict validation patterns that don’t allow for most potentially dangerous inputs.
Additionally, if you can’t find the patterns you need in the repository, search for them in the Regular Expression Library, an even larger database of already written regex patterns that you can use.
If you need to write your own patterns, consult the OWASP input validation cheatsheet for a few things that you need to consider to make sure that your regexes are safe.
Lastly, remember to always test your regexes against illegal input, regardless of where your patterns come from!
Published at DZone with permission of Vickie Li, DZone MVB. See the original article here.