How to Parse and Standardize Street/Postal Addresses
Learn how to parse and standardize addresses with different tools, including regex, npm packages, online validators, and geocoding APIs.
For any app or website that works with addresses, those addresses need to be parsed, standardized, validated, and verified. Different mechanisms suit different projects, so figuring out exactly what you need isn't always easy.
What Problems Appear Around Parsing and Standardization?
There are three primary issues that often occur in the parsing and standardization process.
- In general, addresses are not regular. They can be a short string in a fixed format or a long fragment written in a very specific way. Also, one abbreviation can stand for more than one word (for example, "St" can mean "Street" or "Saint"). Most challenging of all, there is no universal open-source tool that splits and standardizes all of this.
- There are many different ways to write an address. Some people enter street names and house numbers, while others use zip codes, post office boxes, etc. Punctuation varies too, so a parsing mechanism must be robust enough to handle all of these variants.
- Countries and regions have different address formats, which makes address parsing even more complicated.
Ways to Parse and Standardize Postal Addresses
Keeping in mind the three main difficulties, you'll now need to choose a suitable method. Here are some of the most popular technologies, from simple ones to the most complicated and versatile.
Regex
This is the easiest solution when all of your addresses follow one regular form. You write a regular expression that reads this particular form and no other. For example, the form can look like [HOUSE_NUMBER, STREET_NAME, CITY_NAME, STATE_NAME]. The regex then splits the string into the corresponding components.
Here is an example of a regex that works well for US addresses containing a house number, street, and city:
// Address examples:
// 123 W 34th St, Richmond
// 3700 Crutchfield St, Richmond
// 202 E 35th St, Richmond
// 420 Kenyon St NW, Washington
// 102 Irving St NW, Washington
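const address = '123 W 34th St, Richmond';
const pattern = /^(?<house_number>\d+[a-zA-Z]*)\s+(?<street>.+),\s*(?<city>.+)$/u;
// exec() returns null when the address does not fit the pattern, so fall back to {}
const { house_number, street, city } = pattern.exec(address)?.groups ?? {};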
Try building and testing regular expressions at RegExr.
A regex address parser needs no external libraries or APIs, but it only works with addresses that follow the expected form. Complex patterns are also hard to read and debug, and poorly written ones can cause performance problems.
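To make the trade-off concrete, here is a minimal sketch that wraps a pattern for the "house number, street, city" form in a reusable function. The names `ADDRESS_PATTERN` and `parseSimpleAddress` are illustrative and not part of any library, and the pattern assumes exactly one comma between street and city.

```javascript
// Pattern for the "<house number> <street>, <city>" form used in the examples above.
const ADDRESS_PATTERN =
  /^(?<houseNumber>\d+[a-zA-Z]*)\s+(?<street>[^,]+),\s*(?<city>[^,]+)$/u;

// Returns the parsed components, or null when the input does not fit the pattern.
function parseSimpleAddress(input) {
  const match = ADDRESS_PATTERN.exec(input.trim());
  if (match === null) {
    return null;
  }
  const { houseNumber, street, city } = match.groups;
  return { houseNumber, street: street.trim(), city: city.trim() };
}

const samples = [
  '123 W 34th St, Richmond',
  '420 Kenyon St NW, Washington',
  'PO Box 12, Richmond', // does not start with a house number, so it is rejected
];

for (const sample of samples) {
  console.log(sample, '->', parseSimpleAddress(sample));
}
```

Returning null for anything that does not match keeps the failure explicit: the caller can route those addresses to a more tolerant method instead of crashing on a failed destructuring.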
Npm Packages
Another popular option is npm packages for Node.js. Again, there is a wide choice of packages; most of them target one specific country or data type. Some popular ones are:
- parse-address for the US. This package is regex-based, knows about many types of data (prefixes, grid-based addresses, official abbreviations, etc.), and is very forgiving with user-provided addresses.
- addresser takes an address string and converts it into structured geographic data. It handles abbreviations and normalizes them well. It also has a getRandomCity function, which is helpful for testing.
- humanparser works with human names and splits strings into the first name, last name, middle name, suffixes, and other components. It also parses addresses with the regex method.
While these packages are community-driven, based on open data, and effective, they also have drawbacks, primarily around licenses and dependencies. Be careful: some npm packages have licenses that restrict use in commercial projects, so check the license before adopting one.
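To give a feel for the kind of abbreviation normalization such packages perform, here is a toy, self-contained sketch. It is not the implementation of any package listed above; the lookup tables and the `expandStreetLine` name are made up for illustration, and real packages ship far larger lists.

```javascript
// Illustrative lookup tables only; real packages cover many more abbreviations.
const STREET_SUFFIXES = new Map([
  ['st', 'Street'],
  ['ave', 'Avenue'],
  ['blvd', 'Boulevard'],
  ['rd', 'Road'],
]);

const DIRECTIONS = new Map([
  ['n', 'North'], ['s', 'South'], ['e', 'East'], ['w', 'West'],
  ['nw', 'Northwest'], ['ne', 'Northeast'], ['sw', 'Southwest'], ['se', 'Southeast'],
]);

// Expands known abbreviations token by token; unknown tokens pass through unchanged.
function expandStreetLine(streetLine) {
  return streetLine
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => {
      // Drop a trailing period ("St.") and ignore case for the lookup.
      const key = token.replace(/\.$/, '').toLowerCase();
      return STREET_SUFFIXES.get(key) ?? DIRECTIONS.get(key) ?? token;
    })
    .join(' ');
}

console.log(expandStreetLine('420 Kenyon St NW'));
console.log(expandStreetLine('123 W 34th St.'));
```

A token-by-token lookup like this cannot tell "St" for "Street" from "St" for "Saint", which is one reason a maintained package with context-aware rules is usually a better choice than a hand-rolled table.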
Online Address Validator Tools
Do you have a one-time job? Then there is no need to reinvent the wheel! You can parse and standardize addresses with an online address validator tool. Usually, these tools are compatible with CSV, Excel, and Text formats. The tool will verify each address and you will receive a CSV file with all strings checked.
An address validator is convenient and straightforward, but the number of addresses it lets you process may be smaller than you need. Try these tools to parse a batch of addresses:
- Address Validation Tool by Geoapify
- Address Standardization Tool by Geoapify
- Batch Address Check Tool by Melissa
- Bulk Address Validation Tool by SmartyStreets
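If your addresses live in application code rather than in a spreadsheet, you first need to export them to a file such a tool accepts. The sketch below builds a one-column CSV with standard quoting. The `address` header name is an assumption; each tool documents the columns it expects.

```javascript
// Quote a field using the common CSV convention: wrap it in double quotes when it
// contains a comma, a quote, or a line break, and double any embedded quotes.
function toCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Builds a single-column CSV document with a header row.
function addressesToCsv(addresses, header = 'address') {
  const rows = [header, ...addresses].map(toCsvField);
  return rows.join('\r\n') + '\r\n';
}

const csv = addressesToCsv([
  '123 W 34th St, Richmond',
  '420 Kenyon St NW, Washington',
]);
console.log(csv);
```

Quoting matters here because almost every address contains a comma, and an unquoted comma would split one address across several columns.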
Geocoding API
The final and most powerful technology on today's list is a geocoding API. It handles all the operations you need, including parsing, postal address normalization, postal code lookup by address, validation, and verification.
It not only structures an address but also returns the location's coordinates and information about it. The purpose of a geocoding API is not to split addresses into components but to find the best-matching location. For example, if you enter an address that does not exist, you may get the nearest one instead.
Some geocoding APIs, such as Geoapify Geocoding API, also provide a confidence level for each found address. With it, you can judge the quality of the results and check that the found location corresponds to the entered address.
The API as an address parsing technology will probably handle all your tasks and work reliably. Still, do not expect that it is a silver bullet that will work for any address you pass. As with many other cases, the better input you provide, the better results you get. In addition, even if most geocoding API providers offer a free tier, the geocoding service is not free for a large number of addresses. You'll also need additional coding and logic to deal with not-found addresses.
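As a rough sketch of that extra logic, the code below builds a request URL and keeps a result only when its confidence reaches a threshold. The endpoint, the `text`, `format`, and `apiKey` parameters, and the `results[].rank.confidence` response shape are assumptions modeled on a Geoapify-style API, so verify them against your provider's documentation. The 0.8 threshold is arbitrary, and the global `fetch` assumes Node 18+ or a browser.

```javascript
// Assumed endpoint and parameter names; verify against your provider's documentation.
const GEOCODING_ENDPOINT = 'https://api.geoapify.com/v1/geocode/search';

function buildGeocodingUrl(addressText, apiKey) {
  const url = new URL(GEOCODING_ENDPOINT);
  url.searchParams.set('text', addressText);
  url.searchParams.set('format', 'json');
  url.searchParams.set('apiKey', apiKey);
  return url.toString();
}

// Picks the highest-confidence result, or null when nothing reaches the threshold.
// Assumes each result carries a numeric rank.confidence between 0 and 1.
function pickConfidentResult(response, minConfidence = 0.8) {
  const results = response?.results ?? [];
  const best = results.reduce(
    (top, current) =>
      (current.rank?.confidence ?? 0) > (top?.rank?.confidence ?? -1) ? current : top,
    null,
  );
  return best !== null && (best.rank?.confidence ?? 0) >= minConfidence ? best : null;
}

// Sends the request and returns a confident match or null (address not found).
async function geocode(addressText, apiKey) {
  const response = await fetch(buildGeocodingUrl(addressText, apiKey));
  if (!response.ok) {
    throw new Error(`Geocoding request failed with status ${response.status}`);
  }
  return pickConfidentResult(await response.json());
}
```

A null return is the hook for your not-found handling, for example asking the user to correct the address or queueing it for manual review.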
Which One to Choose?
With so many technologies, it might be challenging to choose the best one for your project. Here is a piece of advice on picking the right one.
- Work with regex if you have strictly regular addresses only. In other cases, use it to eliminate special symbols that shouldn’t be in the address.
- For projects based in a single country, npm packages work well. However, dependencies can cause difficulties, and you must check the license and maintainer information carefully.
- If you need to validate a small number of addresses, an online validator suits you well. For more powerful mechanisms, move to geocoding APIs. They greatly simplify the developer’s work and provide high-precision data, which makes them suitable for almost any project.
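For the cleanup use of regex mentioned in the first point, a minimal sketch might look like this. The set of allowed characters (letters, digits, whitespace, and `, . ' # / -`) is an assumption, so adjust it to the address formats you expect; `sanitizeAddress` is an illustrative name.

```javascript
// Replaces characters that rarely belong in a street address and tidies whitespace.
// \p{L} and \p{N} keep letters and digits from any script, not only ASCII.
function sanitizeAddress(raw) {
  return raw
    .replace(/[^\p{L}\p{N}\s,.'#\/-]/gu, ' ') // drop unexpected symbols
    .replace(/\s+/g, ' ')                      // collapse runs of whitespace
    .replace(/\s+,/g, ',')                     // no space before a comma
    .trim();
}

console.log(sanitizeAddress('  123 W 34th St*** ,   Richmond!! '));
console.log(sanitizeAddress('Hauptstraße 5, München'));
```

Running a step like this before a parser, npm package, or geocoding API is cheap and tends to improve results, since cleaner input generally produces better matches.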
We hope you’ve found a suitable way of parsing. Try different methods to see which one fits best, is the most comfortable, and requires the least effort. Remember that different apps and websites might not have the same requirements!
Published at DZone with permission of Alfiya Tarasenko. See the original article here.