OWASP Top Ten and Software Composition Analysis (SCA)
In this post, learn about PVS-Studio as an SCA (Software Composition Analysis) tool. Which approach to implementation should we choose? Let's figure it out!
Join the DZone community and get the full member experience.
Join For FreeOne of the priority areas for PVS-Studio development is to cover categories from the OWASP Top Ten 2017 in the C# analyzer. We also plan to cover the Top Ten 2021 in the future. The most unusual for us is the A9:2017 category: Using Components with Known Vulnerabilities. This category has the A6 position in the preliminary version of OWASP 2021. The rule implementation for this category is an important task for our analyzer. It allows us to classify PVS-Studio as an SCA (Software Composition Analysis) tool. Which approach to implementation should we choose? Let's figure it out!
Using Components With Known Vulnerabilities
The A9 threat category (which turned into A6 in the preliminary OWASP 2021 version) is dedicated to using components with known vulnerabilities. These are the components that have the corresponding entries in the CVE database. CVE (Common Vulnerabilities and Exposures) is a database of records about real-life vulnerabilities in software, hardware, service components, etc.
A9 is quite atypical from the point of view of its coverage in PVS-Studio. That's because the existing analyzer architecture is designed to search for errors in the code itself. The architecture uses syntax trees, semantic models, and various technologies like data-flow analysis and others. These technologies were generally sufficient to implement diagnostic rules that would cover certain categories from the OWASP Top Ten 2017.
For example, on the basis of the existing data-flow mechanism, we implemented taint analysis and various related diagnostic rules:
- V5608 searches for SQL Injection
- V5609 searches for Path Traversal/Directory Traversal
- V5610 searches for potential XSS vulnerabilities
- Others
Each of these rules searches for potential vulnerabilities in code and works by traversing a syntax tree. At the same time, they correspond to one or more OWASP Top Ten 2017 categories. You can find the full list of correspondences here.
The situation with A9 is completely different. From the point of view of C# projects, the rule implementation for A9 is a check of all the project dependency libraries for CVE. In other words, for each dependency, we need to check whether there is a corresponding entry in the CVE database.
This task goes far beyond the usual syntax tree traversal and the study of code semantics. However, we are determined to cover this category. Besides, it is very important that the implementation of the A9 rule would let PVS-Studio position the analyzer as an SCA solution.
Software Composition Analysis
In general, SCA tools are designed to check the project for problematic dependencies.
For example, if a project depends on an open source library, it is extremely important to take into account the license under which this library is distributed. Terms of use violations can cause huge damage to the business.
Another possible problem is the presence of vulnerabilities in the library. In the context of SCA, we are talking about known vulnerabilities — CVE. It's almost impossible to determine the use of a dependency that contains a non-recorded vulnerability :). It is not difficult to guess that if we use a library with a [publicly known] vulnerability, we can make a product vulnerable to various attacks.
Besides, using libraries whose maintenance was discontinued is a dangerous approach. Potentially, these dependencies also contain vulnerabilities. However, developers most likely don't know about them. Fixing such vulnerabilities is out of the question: no one is going to do that.
SCA and PVS-Studio
We are gradually coming to the main question, which is how to implement the SCA functionality First, we need to say that we are going to develop these features within the coverage of the A9:2017 category (Using Components with Known Vulnerabilities). Thus, we are going to search for dependencies with known vulnerabilities in the first place. However, the PVS-Studio analyzer already has diagnostic rules that warn developers about copyleft licenses:
It is possible that over time we will implement other SCA features.
Detecting components with known vulnerabilities consists of two parts. The first step is to obtain all (both direct and transitive) project dependencies and then search for the CVEs that match them. The first part of this plan seems simple. The second part, though, is more difficult.
At the moment, we plan to implement the specified functionality for the C# analyzer. It's easy to obtain the list of dependencies for a C# project. Roslyn helps us a lot — our analyzer is built on its base. To be more precise, the main factor is the use of the same build platform (MSBuild) and a compiler for all C# projects. At the same time, Roslyn is closely related to MSBuild. This makes obtaining the dependencies list trivial.
Since the ecosystem of C++ and Java is much more diverse, obtaining the dependencies list is going to be more difficult. We'll do this another time :).
Well, we got the dependencies from the project. How do we understand which of them have vulnerabilities? Besides, we need to keep in mind that the vulnerability may be relevant only for specific library versions. Obviously, we need some kind of database, where the dependencies, versions, and the corresponding CVEs would be stored.
The main question of implementation: how do we find (or, perhaps, create) a database that allows us to compare the available information about project dependencies with specific CVE? The answer to that question depends on the tools you use.
Using a CPE Open Database
The first option we have studied is the approach used in OWASP Dependency Check. The approach is simple: for each dependency, this utility searches for a corresponding identifier in the CPE (Common Platform Enumeration) database. In fact, the CPE database is a list with information about products, their versions, vendors, and so on. To implement SCA, we must obtain CPE and CVE correspondences. Thus, getting a CVE list is just searching for the corresponding entry in the CPE database.
You can find the CPE database and CVE compliance on the official website National Vulnerability Database. One of the ways to get the necessary information is to use the Rest API. It's described here. For example, the following query allows us to get the first 20 elements of the CPE database including corresponding CVEs:
https://services.nvd.nist.gov/rest/json/cpes/1.0?addOns=cves
Below is an example of CPE for ActivePerl:
{
"deprecated": false,
"cpe23Uri": "cpe:2.3:a:activestate:activeperl:-:*:*:*:*:*:*:*",
"lastModifiedDate": "2007-09-14T17:36Z",
"titles": [
{
"title": "ActiveState ActivePerl",
"lang": "en_US"
}
],
"refs": [],
"deprecatedBy": [],
"vulnerabilities": [ "CVE-2001-0815", "CVE-2004-0377" ]
}
The most important part here is the "cpe23Uri" value. It contains important information for us in a certain format, and, of course, "vulnerabilities" (although they are not a part of the CPE list). For simplicity, we read the "cpe23Uri" string as:
cpe:2.3:a:<vendor>:<product>:<version>:<update>:...
According to the specification, a hyphen in place of one of the fragments means a logical "NA" value. As far as I understand, this can be interpreted as "the value is not set". The "*" character put in place of a fragment means "ANY".
When we implement a CPE-based solution, the main difficulty is to find the right element for each dependency. The problem here is that the library name (obtained when we parsed the project links) may not match the corresponding CPE entry. For example, the CPE list has entries with the following "cpe23Uri":
cpe:2.3:a:microsoft:asp.net_model_view_controller:2.0:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:asp.net_model_view_controller:3.0:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:asp.net_model_view_controller:4.0:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:asp.net_model_view_controller:5.0:*:*:*:*:*:*:*
cpe:2.3:a:microsoft:asp.net_model_view_controller:5.1:*:*:*:*:*:*:*
After processing the entries, the analyzer concludes that they are all related to various versions of a product with the name "asp.net_model_view_controller" released by a company called Microsoft. All these entries correspond to a vulnerability with the CVE-2014-4075 identifier. However, the library in which the vulnerability was discovered is called "System.Web.Mvc". Most likely we'll get this name from the list of dependencies. In CPE, the name of the product is "Microsoft ASP.NET Model View Controller".
Besides, we need to take into account the vendor, whose identifier is an integral part of the CPE entries. There are also problems with this, as the actual dependency does not always provide the necessary information in any form suitable for parsing. Not to mention the compliance of this information with any entry from CPE.
You can guess that similar problems arise with the library version.
Another problem is that many records in the database are not relevant when we look for matches. Let's take as an example the entry given at the beginning of this section:
cpe:2.3:a:activestate:activeperl
ActivePerl is a distribution of the Perl language from ActiveState. The probability that something like this would be a dependency on a C# project...well, is low. There are a lot of "unnecessary" (in the context of analyzing C# projects) entries. It's hard to say how we can teach the analyzer to distinguish them from the useful ones.
Despite the mentioned problems, the CPE-based approach can still be effective. Its implementation should be much trickier than a simple pair of string comparisons. For example, the OWASP Dependency Check works in an interesting way. For each dependency, this tool collects evidence strings that can correspond to the vendor, product, and version values from the desired CPE.
Using GitHub Advisory
We found another approach to searching for CVEs. We investigate GitHub Advisory to find the entries that correspond to the dependency we need to check. GitHub Advisory is a vulnerability database (CVE) discovered in open source projects that are stored on GitHub. The full list of positions is available here.
After we got acquainted with CPE, we understood that the method of recording data is extremely important when we choose the data source. We have to admit that in this case GitHub Advisory is much more convenient than CPE. Perhaps, this database was originally created for being used by various SCA tools. Anyway, various solutions like GitHub SCA and SCA by Microsoft use this database.
For programmatic access to GitHub Advisory, we need to use GraphQL. It's a powerful technology, but we must note that it's much easier to understand Rest API. Nevertheless, worn out by GitHub's GraphQL Explorer, I finally managed to make a query that outputs almost what I wanted. Namely, it outputs a list of packages and corresponding CVEs. Here's one of the elements I received:
{
"identifiers": [
{
"value": "GHSA-mv2r-q4g5-j8q5",
"type": "GHSA"
},
{
"value": "CVE-2018-8269",
"type": "CVE"
}
],
"vulnerabilities": {
"nodes": [
{
"package": {
"name": "Microsoft.Data.OData"
},
"severity": "HIGH",
"vulnerableVersionRange": "< 5.8.4"
}
]
}
}
Obviously, I did not make the most optimal query, so I got a little extra information at the output.
If you're an expert in GraphQL, please write in the comments how you would construct a query that allows you to get a list of matches of this form: (package name, version) => CVE list.
Anyway, the query result clearly indicates the package name: the one that corresponds to this dependency in NuGet. The package name corresponds to CVE, and versions, for which the vulnerabilities are relevant. I'm sure that with a better understanding of this topic, we could easily create a utility that would automatically download all the necessary information.
We must say that selecting packages specifically for NuGet is a useful feature. In many cases (if not all) we would like to look for entries that correspond to a particular dependency among those packages. More specifically, we would like to do it without all the stuff for Composer, pip, etc.
Alas, this solution has its flaws. At the time of writing this article, the GitHub Advisory had 4753 entries and only 140 NuGet packages. In comparison to the CPE database that contains more than 700 000 entries, this collection doesn't look that impressive. Although we must note that not all CPEs have corresponding CVEs. Plus the description implies that the GitHub Advisory database will contain information about vulnerabilities of GitHub-stored projects only. This drastically narrows the sample.
Nevertheless, the convenience of presenting vulnerabilities in this database at least makes us think about using it, if not as the main, then at least as one of the auxiliary data sources.
Our Own Database
Powerful SCA tools, such as Black Duck and Open Source Lifecycle Management, form and use their own databases. These databases, judging by the description, contain even more information than the National Vulnerability Database. Obviously, such databases present information in the most convenient form for the relevant tools.
Working in this direction, we have to transform the public data found about vulnerable components into some form convenient for our analyzer. We only need to find data convenient for such a transformation. Most likely, all SCA tools have their own databases of vulnerable components. However, not all of them contain information about vulnerabilities that are not in NVD or some other public source. One of the important distinguishing features of powerful SCA solutions is that they build their custom base that surpasses similar bases of other tools. Therefore, when working on the SCA implementation in PVS-Studio, we will take into account the need to expand our vulnerability base in the future.
Places Where Vulnerable Components Are Used
It may seem that the implementation of the SCA functionality in PVS-Studio will require the creation of something fundamentally new, without the possibility of using any of our existing developments. And, frankly, not in vain. The fact is that dependency analysis is a brand-new functionality and PVS-Studio doesn't have anything like this right now.
However, we have an idea of how we can use the existing architecture to enhance our SCA implementation. Instead of simply making the analyzer trigger at the presence of a link to an unsafe library, we will try to look for its use in code. We have plenty of ready-made mechanisms for this :).
In my opinion, if the library is not even used, the analyzer still should warn about its presence among the dependencies. And if the library capabilities are somehow applied in the code, then the analyzer should issue a warning of the highest level. So far these are just thoughts.
As you see, we haven't decided what implementation approach to use. We haven't resolved some issues about it, too. For example: if we use a library with a vulnerability many times in a project, should the analyzer issue a warning for each place of use? Or will the user drown in warnings? Should the analyzer issue one warning per file or should we simply raise the level if it detects the use of such a library?
We have a lot of such questions about this solution. That's why we would like to know: how would YOU like to see SCA in PVS-Studio? How should an effective tool for finding problematic vulnerabilities work? What level should warnings have? Should we try to find other information sources about vulnerabilities? Should the analyzer trigger at transitive (indirect) dependencies?
We are waiting for your comments. Thank you for your attention!
Published at DZone with permission of Nick Lipilin. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments