How To Scan a URL for Malicious Content and Threats in Java
This article discusses some of the threats posed by URL links and provides a free-to-use URL Security API with complementary, ready-to-run Java code snippets.
At this point, we’ve all heard the horror stories about clicking on malicious links, and if we’re unlucky enough, perhaps we’ve been the subject of one of those stories.
Here’s one we’ll probably all recognize: an unsuspecting employee receives an email from a seemingly trustworthy source, claiming there’s been an attempt to breach one of their most important online accounts and urging them to follow a link to secure it. The employee, feeling an immediate sense of dread, clicks on the link instinctively, hoping to salvage the situation before management becomes aware. When they follow the link, they’re confronted with a login interface they’re accustomed to seeing – or so they believe. Entering their email and password is second nature: they input this information rapidly and click “enter” without much thought.
In their rush, this employee didn’t notice that the login interface looks very different than normal. Further, they’ve overlooked that the email address alerting them to this account “breach” contained 10 more characters than it would have if it had come from the account provider. On top of all that, they’ve failed to see that the link itself – a mix of tightly packed letters, symbols, and words which, in truth, they’ve hardly glanced at in the best of circumstances – contains improper spellings and characters all over the place. In about 30 seconds, this employee has unwittingly compromised an account with access to some of their employer’s most sensitive data, handing their login details to a cybercriminal far away who will, no doubt, waste little time in exploiting the situation for monetary gain.
A boilerplate email phishing scenario such as this – the most basic example of a tried-and-true social engineering tactic, dating back to the early days of the internet – is just one of many URL-based threats that continue to drive immense scrutiny around the origin and dissemination of malicious links. As the internet has scaled, the utility of URLs has grown in lockstep. We use URLs to share important content with our friends, colleagues, managers, clients, and customers all the time, and that ubiquity quietly ensures URLs can keep expanding in their role as vehicles for social engineering scams, viruses, malware, and various other cybersecurity threats.
From this scrutiny, a culture of individual accountability has predominantly emerged: we, the targets of threatening URLs, are (justifiably) viewed as the most pivotal barrier between attack and breach. As a result, at an organizational level, the most important and common step taken to mitigate this issue involves training users on how to spot fraudulent links on their own. Employees of companies in diverse industries all over the world are increasingly taught to identify the obvious signs of malicious links (and social engineering/untrustworthy outreach), a practice which has, no doubt, proved highly beneficial in reducing instances of URL-driven breach.
However, the vast criminal potential of URLs means user training isn’t quite enough to mitigate the issue entirely. To properly secure our invaluable data, we need to proactively implement security policies that can accurately identify and flag URL-based threats on their own. Much like living viruses, the underlying strategies of URL threats (and all cybersecurity threats) inexorably evolve to defeat their victims, diminishing the utility of past security training until its relevance is dubious at best.
For example, URLs are increasingly used as a lightweight method for sharing files across a network. When we receive a file link from someone we trust (someone we regularly receive files from), we have little reason to believe that link may be compromised, and – despite all our intense security training – we are still very much in danger of clicking on it. Unbeknownst to us, this link may point to a malicious ForcedDownload file that seeks to capitalize on our brief error in judgment and compromise our system before we can react. While individual accountability means blunders such as this should (and will) be considered our fault in the short term, that blame has a limited ability to deter the issue as it continues to evolve. The person who sent this link to us may have received it from a source they usually trust, and that source may have received it from someone they also usually trust, and someone towards the beginning of that chain of communication may not have had any security training at their job whatsoever, blindly forwarding links from a source they believed to be valuable but had never actually investigated before. Just as it’s important for us to assume links such as this might be dangerous, it’s equally important for our system’s security policies to assume the same, and to act against those links as diligently as possible before they reach a human layer of discretion.
To that end, URL security APIs can play a key role, offering an efficient, value-add service to our application architecture while removing some of the burdens on our users to prevent malicious links from compromising our systems by themselves.
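As a rough illustration of acting on links before they reach a person, the sketch below pulls http(s) links out of a block of inbound text so that each one could be handed to a scanner. The class name `LinkExtractor` and its pattern are assumptions made for this example, not part of any SDK, and the pattern is deliberately simple rather than a complete URL grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects http(s) links from free text for later scanning. */
final class LinkExtractor {
    // Simplified pattern (an assumption): the scheme, then a run of characters
    // that are not whitespace, quotes, or angle brackets.
    private static final Pattern LINK =
            Pattern.compile("https?://[^\\s\"'<>]+", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        while (matcher.find()) {
            // Drop sentence punctuation that happens to trail the link.
            links.add(matcher.group().replaceAll("[.,;:!?)]+$", ""));
        }
        return links;
    }
}
```

Each extracted link could then be scanned before the message is delivered; what to do with a flagged message (quarantine, rewrite, warn) is a policy choice left open here.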
Demonstration
This article provides a powerful, free-to-use REST API that scans website URLs for various forms of threats. The API accepts a website link string (beginning with "http://" or "https://") as input and quickly returns key information about the contents of that URL. The response body includes the following information:
- “CleanResult” – A Boolean indicating whether or not the link is clean, so an unsafe link can be diverted immediately from its intended destination
- “WebsiteThreatType” – A string value identifying whether the underlying threat within the link is of the Malware, ForcedDownload, or Phishing variety (clean links will return “none”)
- “FoundViruses” – A list of the viruses found within a given file URL, giving the file (“FileName”) and the name of each virus (“VirusName”)
- “WebsiteHttpResponseCode” – The three-digit HTTP response code returned by the link
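One way an application might act on the response fields listed above is sketched below. The `LinkPolicy` class, its `Action` values, and the thresholds are hypothetical choices for illustration; the parameters simply mirror “CleanResult”, “WebsiteThreatType”, and “WebsiteHttpResponseCode” as plain Java values rather than using the SDK's result type.

```java
/** Hypothetical policy sketch; parameters mirror the response fields described above. */
final class LinkPolicy {
    enum Action { ALLOW, BLOCK, REVIEW }

    static Action decide(boolean cleanResult, String websiteThreatType, int httpResponseCode) {
        // Anything the scan marks as not clean is blocked outright.
        if (!cleanResult) {
            return Action.BLOCK;
        }
        // A clean flag paired with a named threat type is inconsistent; send it for review.
        if (websiteThreatType != null && !"none".equalsIgnoreCase(websiteThreatType)) {
            return Action.REVIEW;
        }
        // Assumption: a clean link that answers with an error status deserves a second look.
        if (httpResponseCode >= 400) {
            return Action.REVIEW;
        }
        return Action.ALLOW;
    }
}
```

For instance, `LinkPolicy.decide(false, "Phishing", 200)` yields `BLOCK`, while a clean link returning a 200 status yields `ALLOW`. How strict the review rules should be depends on the application.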
To complete a free API request, a free-tier API key is required, which can be obtained by registering a free account on the Cloudmersive website (please note, this yields a limit of 800 API calls per month with no commitments).
To take advantage of this API, follow the steps below to structure your API call in Java using complementary, ready-to-run code examples.
The first step is to install the Java SDK. To install with Maven, add the following repository reference to pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
To complete the installation with Maven, add the following dependency reference to pom.xml:
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>
To install with Gradle instead, add the JitPack repository to your root build.gradle at the end of repositories:
allprojects {
    repositories {
        ...
        maven { url 'https://jitpack.io' }
    }
}
Next, add the dependency in build.gradle, and you’re all done with the installation step:
dependencies {
    implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25'
}
With installation out of the way, our next step is to add the imports and call the Virus Scan API:
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ScanApi;
//import com.cloudmersive.client.model.WebsiteScanRequest;
//import com.cloudmersive.client.model.WebsiteScanResult;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ScanApi apiInstance = new ScanApi();

// Build the request; the link must begin with "http://" or "https://"
WebsiteScanRequest input = new WebsiteScanRequest();
input.setUrl("https://example.com"); // placeholder: replace with the link to scan

try {
    // Submit the link for scanning and print the result fields
    WebsiteScanResult result = apiInstance.scanWebsite(input);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ScanApi#scanWebsite");
    e.printStackTrace();
}
After that, you’re all done – no more code is required.
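Because the API expects a link beginning with "http://" or "https://", an application may optionally want to reject other input before spending one of its monthly calls. The following is a minimal sketch using only the standard library; `ScanInput` and `isScannable` are hypothetical names, and the host check is an extra assumption rather than a documented API requirement.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical pre-check: accepts only well-formed http(s) links that name a host. */
final class ScanInput {
    static boolean isScannable(String link) {
        if (link == null) {
            return false;
        }
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            boolean isWebScheme =
                    "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // Assumption: a link without a host is not worth submitting.
            return isWebScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Malformed input (for example, text containing spaces) is rejected.
            return false;
        }
    }
}
```

A caller could run `ScanInput.isScannable(link)` first and only build the scan request when it returns true.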