How to Fully Validate URLs in Java
URLs can turn out to be invalid for a variety of reasons. We can call a free API to check syntax, domain name existence, and endpoint availability all at once.
Uniform Resource Locators (URLs) function as the address of unique resources on the internet. Entering a website URL into our browser retrieves the HTML/CSS files required to construct the page we’re visiting, and making API calls against an endpoint URL allows us to remotely access and/or modify important data; the list goes on. URLs effectively facilitate the interconnectivity we take for granted on the internet today.
When we capture URL string inputs in our web applications, it’s critical that we validate those inputs to ensure the URLs are useful. Retrieving and storing any form of address data (whether that's a URL address, an IP address, or even a physical street address) without immediately validating its utility is a waste of time; it’ll leave us empty-handed when we attempt to access important resources in the future.
Automating URL validation isn’t as straightforward as it sounds, however. Any given URL can present multiple issues at once, and some of those issues are harder and more resource-intensive to detect than others. We can look at URL validity from a syntax perspective (i.e., ensuring the URL is well-formed), and we can also look at it from a domain and endpoint validity perspective (i.e., ensuring the domain exists and the unique resources are actually accessible).
In this article, we’ll discuss what constitutes a valid URL from a syntax, domain, and endpoint validity perspective, and we’ll learn how to call an API (using ready-to-run Java code examples) that validates all three of these factors simultaneously.
Understanding URL Validity
Validating a URL string starts with checking the URL syntax. Each component of the URL structure must be incorporated correctly to access any given URL's resources.
Let’s quickly break down the basic components of a valid URL. We’ll use https://example.com as a simple example.
A valid URL begins with a correctly typed scheme that identifies the internet protocol used for communication. In the case of https://example.com, that protocol is https. The scheme must be followed by the scheme delimiter :// to separate it from the rest of the URL. Errors in scheme syntax are common, but they’re relatively easy to identify with lightweight programmatic methods.
After the scheme, a valid URL presents a domain name made up of a second-level domain (e.g., example) followed by a top-level domain (e.g., .com). A subdomain (e.g., api in api.example.com) can sometimes precede the second-level domain. A domain syntax error might involve a simple misspelling at this stage, such as https://examplecom. The missing period between example and com means the top-level domain is missing, and the URL cannot be accessed.
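The scheme and domain checks described above can be sketched with the JDK’s java.net.URI class. This is a minimal illustration, not a complete validator: the class name UrlSyntaxCheck, the method name hasValidSyntax, and the decision to accept only http and https are assumptions made for this example, and the "host must contain a dot" rule is a simplification that would reject hosts such as localhost.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

public class UrlSyntaxCheck {

    // Schemes this sketch accepts (an assumption; adjust for your application)
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    /** Returns true if the input has an allowed scheme and a host containing a dot. */
    public static boolean hasValidSyntax(String input) {
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || !ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
                return false; // missing or unexpected scheme
            }
            // A host without a dot has no separate top-level domain
            return host != null && host.contains(".");
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI at all
        }
    }

    public static void main(String[] args) {
        System.out.println(hasValidSyntax("https://example.com")); // true
        System.out.println(hasValidSyntax("https://examplecom"));  // false: no top-level domain
        System.out.println(hasValidSyntax("htp://example.com"));   // false: mistyped scheme
    }
}
```

Note that a misspelled but well-formed URL such as https://exmpl.com would still pass this check, since the check looks only at the shape of the string.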
Syntax is crucially important, but validating syntax alone won’t entirely ensure a URL is functional. A misspelled domain can appear syntactically correct, but we won’t know it’s a real domain unless we check the DNS (Domain Name System) to see if it’s registered there. If we misspell our example URL as https://exmpl.com, for instance, we won’t be able to access https://example.com resources (unless the owner of example.com also owns the exmpl.com domain), but we will technically have a syntactically valid URL string.
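A DNS check of this kind might look like the following sketch, which uses the JDK’s InetAddress class. The class name DomainCheck and the method name resolves are assumptions made for this example. Resolution is only a proxy for registration: a registered domain with no address records would fail it, and the result depends on the resolver and network the code runs on.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DomainCheck {

    /** Returns true if the host name resolves to at least one IP address. */
    public static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false; // no address found, or the resolver could not be reached
        }
    }

    public static void main(String[] args) {
        // Results depend on the network and resolver in use
        System.out.println(resolves("example.com"));
        System.out.println(resolves("exmpl.com"));
    }
}
```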
Furthermore, validating a domain name with a DNS lookup doesn’t necessarily mean we can access resources from that URL, either. Well-formed URLs with registered domain names can still point to resources that are inaccessible for one reason or another. For example, if we’re planning to make API calls against https://api.example.com, we’ll need to make a request to the URL endpoint directly to determine whether it’s listening and prepared to modify/return resources as expected.
Validating URLs in Java
There are a few standard ways we can validate URLs in Java. In this case, we’ll briefly discuss two common classes that can be used for this purpose: java.net.URL and java.net.HttpURLConnection. Both classes are part of the java.net package, which is provided by the Java Development Kit (JDK).
Using the java.net.URL class, we can perform limited validation checks during URL parsing. We can catch some syntax errors in a URL string (for example, a missing or unrecognized scheme), and we can ensure URLs follow a standard format. However, this class isn’t primarily designed for validation; rather, it’s designed for working with URLs in other important ways, such as parsing or composing URLs. We won’t be able to validate domain names and endpoints with this class.
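To illustrate how limited parse-time validation is, here is a small sketch that builds a java.net.URL through URI.toURL() and reports whether parsing succeeded. The class name UrlParseCheck and the method name parsesAsUrl are assumptions made for this example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlParseCheck {

    /** Returns true if the input parses into a URL with a non-empty host. */
    public static boolean parsesAsUrl(String input) {
        try {
            URL url = new URI(input).toURL();
            String host = url.getHost();
            return host != null && !host.isEmpty();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // Illegal characters, an unknown scheme, or a string with no scheme at all
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsUrl("htp://example.com"));  // false: unknown scheme
        System.out.println(parsesAsUrl("https://examplecom")); // true: parsing alone does not catch this
    }
}
```

The second call shows the gap: a URL with no top-level domain still parses, so parsing cannot stand in for domain or endpoint validation.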
Using the HttpURLConnection class, we can open a connection with a URL and check the response code from the underlying server. This technically works as a method for validating URL endpoints, but it’s a bit resource-intensive (and, much like the java.net.URL class, it's not explicitly designed with validation in mind). When we use the HttpURLConnection class, we need our application to handle the connection setup, send requests, read responses, and manage errors, all of which puts a significant burden on our server.
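A do-it-yourself endpoint check with HttpURLConnection could be sketched roughly as follows. The class name EndpointCheck, the method name isReachable, the use of a HEAD request, and the rule that any 2xx or 3xx status counts as reachable are all assumptions made for this example; some servers reject HEAD requests, so a real check may need to fall back to GET.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class EndpointCheck {

    /** Returns true if a HEAD request gets a 2xx or 3xx response within the timeout. */
    public static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no response body
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            int status = connection.getResponseCode();
            return status >= 200 && status < 400;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // Connection failure, timeout, malformed input, or a non-HTTP scheme
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // The result depends on the network the code runs on
        System.out.println(isReachable("https://example.com", 3000));
    }
}
```

Even this small sketch has to deal with timeouts, several exception types, and connection cleanup, which is the kind of overhead described above.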
Fully Validating URLs With a Free API
Rather than build a URL validation workflow around a Java class, we can instead take advantage of a free URL validation API that performs an exhaustive URL validation check on our behalf.
This way, we can validate URL syntax, domain existence, and endpoint availability in one step. Perhaps most importantly, we can offload the heavy lifting involved in domain and endpoint validation to another server. Our application won’t need to open a connection to each URL it validates or interpret the resulting errors by itself, and, as an added benefit, it won’t need to deal directly with potentially threatening URLs either.
If we use this API to validate our earlier example https://example.com, we’ll get the following response:
{
  "ValidURL": true,
  "Valid_Syntax": true,
  "Valid_Domain": true,
  "Valid_Endpoint": true,
  "WellFormedURL": "https://example.com/"
}
With a simple response object like this, we can quickly determine if URLs are usable based on several important URL validation categories.
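As a rough illustration of how these flags might be interpreted, the sketch below maps the three stage flags to a short description of the first stage that failed. The class name ValidationSummary and the method name describe are hypothetical and not part of the API or its SDK; the three boolean parameters simply mirror the Valid_Syntax, Valid_Domain, and Valid_Endpoint fields in the response.

```java
public class ValidationSummary {

    /** Describes the first failing validation stage, or "usable" if all stages passed. */
    public static String describe(boolean validSyntax, boolean validDomain, boolean validEndpoint) {
        if (!validSyntax) {
            return "malformed URL";
        }
        if (!validDomain) {
            return "domain does not exist";
        }
        if (!validEndpoint) {
            return "endpoint is not reachable";
        }
        return "usable";
    }

    public static void main(String[] args) {
        // Flags taken from the sample response for https://example.com
        System.out.println(describe(true, true, true)); // prints "usable"
    }
}
```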
Demonstration
To take advantage of this multi-step URL validation API, we can use the ready-to-run Java code examples provided below to structure our API call, and we can use a free API key to authorize our API calls. With a free API key, we can make up to 800 API calls per month without any additional commitments.
To install the client SDK, let’s add the following repository reference to our Maven POM file (JitPack is used to dynamically compile the library):
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
And then let’s add the following reference to the dependency:
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v4.25</version>
</dependency>
</dependencies>
Next, let’s add the imports to our file:
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.DomainApi;
// The request/response model classes are assumed to live in the SDK's model package:
import com.cloudmersive.client.model.*;
And after that, let’s use the example below to call the URL validation function, replacing the "YOUR API KEY" placeholder text with our own API key:
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

DomainApi apiInstance = new DomainApi();
ValidateUrlRequestFull request = new ValidateUrlRequestFull(); // ValidateUrlRequestFull | Input URL request
// Set the URL to validate (the setter name setURL is assumed from the SDK's generated model class)
request.setURL("https://example.com");
try {
    // Send the request and print the validation result
    ValidateUrlResponseFull result = apiInstance.domainUrlFull(request);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling DomainApi#domainUrlFull");
    e.printStackTrace();
}
That’s all the code we’ll need. We can now easily use this API to capture URL input strings in any of our Java web applications and carry out a useful multi-step validation check.
Conclusion
In this article, we discussed the importance of validating URLs, the various components of a valid URL, and two Java classes we can use to handle URL validation. In the end, we learned how to call a free URL validation API that performs a multi-step URL validation check on our behalf.