All the Questions You Were Afraid to Ask About SBOM
What if we could enumerate all the software components we use and produce, and we could distribute and consume it easily? This is what SBOM is trying to solve.
In the wake of many recent security incidents, we have heard a lot about the lack of knowledge of code dependencies, attacks on the software supply chain, Software Bill of Materials (SBOM), digital signatures, provenance, attestation, and so on.
The fact is, every time a new vulnerability appears in the landscape, we usually need to spend a lot of time and effort to detect the real impact on the applications and services that are running in our environment.
What if we had a way to enumerate all the software components we use and produce, and we were able to distribute and consume it easily?
This is what SBOM is trying to solve, and we would like to explain it. Let’s dive into the details of SBOM!
What Is the SBOM?
A Software Bill of Materials is simply an artifact containing a comprehensive list of package dependencies, files, licenses, and other assets that compose a piece of software together.
According to NTIA SBOM FAQ:
A Software Bill of Materials (SBOM) is a formal record containing the details and supply chain relationships of various components used in building software. These components, including libraries and modules, can be open source or proprietary, free or paid, and the data can be widely available or access-restricted.
The concept is not something new, and BOMs (Bills of Materials, without “software”) have always been there as part of industrial processes. They are very similar to a “list of ingredients,” although a BOM usually includes the concept of “hierarchy,” so every component is broken down into the list of subcomponents, which again includes the BOM for each of them.
In a Software Bill of Materials, the “pieces” are commonly libraries, modules, binaries, compilers, files, etc., and they usually include licensing information (Apache 2.0, GNU, BSD, …) and additional metadata.
Is the SBOM Something Recent?
Not really. It might look pretty new, but the open source community noticed the need to create SBOMs more than 10 years ago, and the SPDX open standard was started in 2010 as an initial effort to address the problem.
The truth is, things have started to move fast in recent years due to the increase in attacks on the supply chain:
- October 2015 – SWID Tags standard, from NIST, published as ISO/IEC 19770-2:2015.
- May 2017 – Initial drafts of CycloneDX, an OWASP SBOM standard.
- December 2020 – The ISO International Standard for open source license compliance (ISO/IEC 5230:2020 – Information technology — OpenChain Specification) is published, requiring a process for managing a bill of materials for supplied software.
- 2020 – 2021 – NTIA publishes its latest work as part of the ongoing Software Component Transparency effort around Software Bill of Materials (SBOM).
- February 2021 – Executive Order 14017 on America’s Supply Chain.
- May 2021 – Executive Order 14028 on Improving the Nation’s Cybersecurity.
- July 2021 – NIST releases the Recommended Minimum Standards for Vendor or Developer Verification (Testing) of Software Under Executive Order (EO) 14028.
- August 2021 – SPDX published as ISO/IEC 5962:2021 standard.
- September 2021 – First draft of SLSA (Supply-Chain Levels for Software Artifacts) framework.
- February 2022 – DoD plan on Securing Defense-Critical Supply Chains which includes Software Supply Chain.
So, especially since 2021, everyone seems to be concerned and talking about it. And this is just the beginning. 2022 is becoming a turning point in how the industry approaches SBOM and software supply chain security challenges.
Where Does the SBOM Apply?
An SBOM applies to any software component, either external or internal, open source or proprietary (like files, packages, modules, shared libraries, …), used in the construction of software products. This includes firmware and embedded software, too. Hardware might take part in the distribution or execution of software (e.g., network equipment, cryptographic devices, chips, …) but it is not considered part of an SBOM, although standards like CycloneDX support devices as a type of component too.
In an ideal world, every software company would attach an SBOM to each deliverable, and everyone would have full visibility into the components used in software and know exactly which vulnerabilities impact that software.
But not everything in the garden is rosy.
What are some typical assets that could have a companion SBOM?
- Development dependencies: Every time a developer includes a third-party dependency (either open source or an internal component), they are often also adding transitive dependencies, that is, the modules or packages that the dependency itself uses. A detailed SBOM would give visibility into those transitive dependencies.
- Software applications or packages: When an application is distributed, the SBOM would help the consumer quickly identify the application version, helper tools that are included in the package, and all the pieces that were involved in the build process. This makes it much easier to identify vulnerabilities or troubleshoot issues that might be caused by buggy software dependencies, etc.
- Container images: They basically are a filesystem composed of a base image distribution, plus a set of additional packages and components added during the build process.
- Hosts (e.g., a virtual machine appliance image or an AWS AMI): The SBOM would include the base operating system type, vendor, version, and a comprehensive list of each package installed in the host, either from the base operating system (e.g., the Linux distribution) or manually deployed from external sources.
- Hardware devices: Examples include a firewall, an IoT device, or a mobile phone, which are running software.
In any case, the SBOM should capture the multi-level dependencies. So, for example, if package libfoobar-1.5.3-r3-u8 is part of the SBOM, it should also include each package name, version, license, etc. used to assemble libfoobar-1.5.3-r3-u8, and the components for each of these, resulting in a multi-level tree where each node is decomposed into its dependencies.
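The multi-level tree described above can be sketched as a small recursive data structure. This is only an illustration: the `Component` class, the `flatten` helper, and the dependencies listed under libfoobar are hypothetical and not part of any SBOM standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One node of a hypothetical SBOM dependency tree."""
    name: str
    version: str
    license: str
    dependencies: list[Component] = field(default_factory=list)


def flatten(component: Component, depth: int = 0):
    """Yield (depth, component) for a node and all of its transitive dependencies."""
    yield depth, component
    for dependency in component.dependencies:
        yield from flatten(dependency, depth + 1)


# Illustrative data only: the dependencies of libfoobar are invented for this sketch.
zlib = Component("zlib", "1.2.11", "Zlib")
libbar = Component("libbar", "2.0.1", "MIT", [zlib])
libfoobar = Component("libfoobar", "1.5.3-r3-u8", "Apache-2.0", [libbar])

for depth, comp in flatten(libfoobar):
    # Indent each line by its depth in the tree.
    print("  " * depth + f"{comp.name} {comp.version} ({comp.license})")
```

Each node carries its own name, version, and license, so walking the tree gives the full list of ingredients at every level.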
It is also important to point out that every time one of the assets changes, like on every release of a product or even on each build, a new SBOM should be created to match the changes for that version.
Why Do I Need an SBOM?
First, because without an SBOM, you don’t have visibility. A piece of software is a black box with respect to the packages and libraries it is assembled from.
The reason why we need an SBOM is basically the same reason we need a list of ingredients for food. You can check for the presence of allergens, animal substances for vegans, chemical preservatives, and so on. Certainly, you can eat food without checking the list of ingredients, but you are assuming some risks. The same applies to software: it might be fine to use any dependency for a quick test in a sandbox environment, but you definitely want to know what’s inside when deploying critical services to production or delivering software in environments with strong compliance regulations.
Availability of SBOM for third-party dependencies also makes it easier for you to build the SBOM of your software by simply composing the lists from the dependencies and adding your own ingredients. Please note that your software can also be the input or dependency of a more complex product, and consumers might demand the presence of the SBOM as part of their supplier’s minimal requirements.
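As a rough sketch of that composition step, assume each SBOM is reduced to a plain list of dictionaries keyed by package URL (purl). The `compose_sbom` function and this simplified shape are assumptions for illustration; real SPDX or CycloneDX documents carry much more structure.

```python
def compose_sbom(own_components, dependency_sboms):
    """Merge component lists, keeping one entry per package URL (purl)."""
    merged = {}
    for component in own_components:
        merged[component["purl"]] = component
    for sbom in dependency_sboms:
        for component in sbom:
            # Keep the first entry seen for each purl; later duplicates are ignored.
            merged.setdefault(component["purl"], component)
    return sorted(merged.values(), key=lambda c: c["purl"])


# Simplified, hypothetical inputs that reuse purls from the fastify example.
own = [{"purl": "pkg:npm/fastify@4.1.0"}]
ajv_sbom = [
    {"purl": "pkg:npm/ajv@8.11.0"},
    {"purl": "pkg:npm/uri-js@4.4.1"},
    {"purl": "pkg:npm/punycode@2.1.1"},
]
uri_js_sbom = [
    {"purl": "pkg:npm/uri-js@4.4.1"},
    {"purl": "pkg:npm/punycode@2.1.1"},
]

product_sbom = compose_sbom(own, [ajv_sbom, uri_js_sbom])
for component in product_sbom:
    print(component["purl"])
```

Shared transitive dependencies appear once in the result, even when several upstream SBOMs list them.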
Knowing the licenses of the different software pieces is also very important. Otherwise, distributing software which is using third-party libraries under multiple types of licenses might break the usage terms or force you to make source code public, which can be inconvenient and even get your company into trouble.
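A license check over the SBOM contents could look like the following sketch. The `REVIEW_REQUIRED` policy and the component dictionaries are hypothetical; which licenses need review is a decision for each organization, not something the SBOM defines.

```python
# Hypothetical policy: SPDX license identifiers that require legal review.
REVIEW_REQUIRED = {"GPL-3.0-only", "AGPL-3.0-only"}

# Values that mean "no usable license information" in this sketch.
UNKNOWN = {None, "NONE", "NOASSERTION"}


def licenses_needing_review(components):
    """Return (name, license) pairs that are unknown or on the review list."""
    flagged = []
    for component in components:
        license_id = component.get("license")
        if license_id in UNKNOWN or license_id in REVIEW_REQUIRED:
            flagged.append((component["name"], license_id))
    return flagged


# Illustrative components; only ajv-compiler's MIT license comes from the article.
components = [
    {"name": "ajv-compiler", "license": "MIT"},
    {"name": "CodePointIM", "license": "NONE"},
    {"name": "example-lib", "license": "GPL-3.0-only"},
]

print(licenses_needing_review(components))
```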
Finally, the SBOM is a key piece in the vulnerability scanning process. Provided that you have an accurate SBOM and reliable and updated vulnerability feeds from different vendors and sources, it is pretty straightforward to find which vulnerabilities are present in the software.
Without an SBOM, the vulnerability scanner needs to compute and guess an SBOM from the software itself, which might be quite tricky or even impossible for some opaque components.
A good SBOM should also allow the answering of questions, like “Am I vulnerable to CVE-2022-22965 (Spring4Shell) vulnerability?” As described in the linked article, exploiting this vulnerability requires a set of conditions to happen simultaneously in the host or container running the exploitable Java package:
- Using Spring Framework versions 5.3.0 to 5.3.17, 5.2.0 to 5.2.19, or older, unsupported versions
- Using JDK 9 or higher
- Running Apache Tomcat as the Servlet container
- Having the library packaged as WAR
- Using the spring-webmvc or spring-webflux dependency
Most of these conditions can be checked in the contents of a comprehensive SBOM, making it easier to assess the risk in your environments by focusing on fixing the exploitable applications first.
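The conditions listed above could be evaluated against SBOM data roughly as follows. This is a sketch under several assumptions: the SBOM is reduced to a hypothetical dictionary with `components`, `jdk_major`, and `packaging` fields, Tomcat is recognized by a package name starting with "tomcat", and versions are purely numeric dotted strings.

```python
def parse_version(text):
    """Turn '5.3.17' into (5, 3, 17); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def spring_version_affected(version):
    """Check a version against the ranges listed for CVE-2022-22965."""
    v = parse_version(version)
    if v[:2] == (5, 3):
        return v <= (5, 3, 17)
    if v[:2] == (5, 2):
        return v <= (5, 2, 19)
    return v < (5, 2, 0)  # older, unsupported versions


def spring4shell_conditions(sbom):
    """Return one boolean per exploitation condition."""
    versions = {c["name"]: c["version"] for c in sbom["components"]}
    spring = versions.get("spring-core")
    return {
        "affected_spring": spring is not None and spring_version_affected(spring),
        "jdk_9_or_higher": sbom["jdk_major"] >= 9,
        "tomcat": any(name.startswith("tomcat") for name in versions),
        "war_packaging": sbom["packaging"] == "war",
        "webmvc_or_webflux": bool({"spring-webmvc", "spring-webflux"} & versions.keys()),
    }


# Hypothetical, simplified SBOM content for one application.
sbom = {
    "jdk_major": 11,
    "packaging": "war",
    "components": [
        {"name": "spring-core", "version": "5.3.17"},
        {"name": "spring-webmvc", "version": "5.3.17"},
        {"name": "tomcat-embed-core", "version": "9.0.60"},
    ],
}

conditions = spring4shell_conditions(sbom)
print("exploitable" if all(conditions.values()) else "not exploitable", conditions)
```

Reporting each condition separately, rather than a single yes or no, makes it easier to see which applications to fix first.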
How Do You Create an SBOM?
Generating SBOMs is a complex topic: multiple competing standards, distribution mechanisms, and so on make adoption slower than desired.
Many tools exist that can help you create an SBOM for a piece of software. But even before considering producing an SBOM, it is critical that the build process is completely automated (this is Level 1 in the SLSA framework) and that SBOM creation is integrated into the build pipeline.
Below are some example executions of open source tools, along with the corresponding (truncated) SBOM output in SPDX or CycloneDX, two of the most common standards.
Syft
Syft can generate an SBOM in SPDX or CycloneDX format from a filesystem or container image. It also powers the docker sbom command, which Docker shipped as an experimental CLI plugin.
    $ syft neo4j:latest
     ✔ Parsed image
     ✔ Cataloged packages [376 packages]
    NAME          VERSION                   TYPE
    CodePointIM   11.0.15                   java-archive
    FastInfoset   1.2.16                    java-archive
    ...
    util-linux    2.36.1-8+deb11u1          deb
    wget          1.21-1+deb11u1            deb
    zlib1g        1:1.2.11.dfsg-2+deb11u1   deb
    zstd-jni      1.5.0-4                   java-archive
    zstd-proxy    4.4.8                     java-archive
When using the -o flag to set the output to spdx-json format, it will produce a document like:
    $ syft -o spdx-json neo4j:latest
     ✔ Parsed image
     ✔ Cataloged packages [376 packages]
    {
      "SPDXID": "SPDXRef-DOCUMENT",
      "name": "neo4j-latest",
      "spdxVersion": "SPDX-2.2",
      "creationInfo": {
        "created": "2022-06-23T10:09:26.751733Z",
        "creators": [
          "Organization: Anchore, Inc",
          "Tool: syft-0.48.1"
        ],
        "licenseListVersion": "3.17"
      },
      ...
      "packages": [
        {
          "SPDXID": "SPDXRef-fd9f083cc189cf0c",
          "name": "CodePointIM",
          "licenseConcluded": "NONE",
          "checksums": [
            {
              "algorithm": "SHA1",
              "checksumValue": "50a6f2c46702b14cb129aac653d9abfcdc324363"
            }
          ],
          "downloadLocation": "NOASSERTION",
          "externalRefs": [
            {
              "referenceCategory": "SECURITY",
              "referenceLocator": "cpe:2.3:a:oracle-corporation:CodePointIM:11.0.15:*:*:*:*:*:*:*",
              "referenceType": "cpe23Type"
            },
            ...
            {
              "referenceCategory": "PACKAGE_MANAGER",
              "referenceLocator": "pkg:maven/CodePointIM/CodePointIM@11.0.15",
              "referenceType": "purl"
            }
          ],
          "filesAnalyzed": true,
          "licenseDeclared": "NONE",
          "sourceInfo": "acquired package info from installed java archive: /usr/local/openjdk-11/demo/jfc/CodePointIM/CodePointIM.jar",
          "versionInfo": "11.0.15"
        },
        {
          "SPDXID": "SPDXRef-80979ce84b1617b2",
          "name": "FastInfoset",
          "licenseConcluded": "NONE",
          ...
        },
        ...
      ],
      "files": [
        {
          "SPDXID": "SPDXRef-9e950849d3fbc974",
          "comment": "layerID: sha256:ad6562704f3759fb50f0d3de5f80a38f65a85e709b77fd24491253990f30b6be",
          "licenseConcluded": "NOASSERTION",
          "fileName": "/bin/bash"
        },
        {
          "SPDXID": "SPDXRef-d1fd1bc48eedeaba",
          "comment": "layerID: sha256:ad6562704f3759fb50f0d3de5f80a38f65a85e709b77fd24491253990f30b6be",
          "licenseConcluded": "NOASSERTION",
          "fileName": "/bin/cat"
        },
        ...
      ],
      "relationships": [
        {
          "spdxElementId": "SPDXRef-a124711c55c5b5ec",
          "relationshipType": "CONTAINS",
          "relatedSpdxElement": "SPDXRef-9f73084aac22b0b3"
        },
        {
          "spdxElementId": "SPDXRef-a124711c55c5b5ec",
          "relationshipType": "CONTAINS",
          "relatedSpdxElement": "SPDXRef-23989aa2a193ea3d"
        },
        ...
      ]
    }
This not only includes the packages, but also the files in the image, relationships between the elements, licensing information, and more.
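To give an idea of how such a document can be consumed, here is a minimal Python sketch that lists each package with its version and package URL. It relies only on field names visible in the SPDX JSON output above; the `list_packages` helper is hypothetical, and the embedded sample is a trimmed copy of one package entry.

```python
import json

# Trimmed copy of one package entry from the SPDX output shown above.
SPDX_SAMPLE = """
{
  "spdxVersion": "SPDX-2.2",
  "packages": [
    {
      "name": "CodePointIM",
      "versionInfo": "11.0.15",
      "licenseDeclared": "NONE",
      "externalRefs": [
        {
          "referenceCategory": "PACKAGE_MANAGER",
          "referenceLocator": "pkg:maven/CodePointIM/CodePointIM@11.0.15",
          "referenceType": "purl"
        }
      ]
    }
  ]
}
"""


def list_packages(spdx_document):
    """Return (name, version, purl) for each package; purl is None if absent."""
    rows = []
    for package in spdx_document.get("packages", []):
        purl = next(
            (
                ref["referenceLocator"]
                for ref in package.get("externalRefs", [])
                if ref.get("referenceType") == "purl"
            ),
            None,
        )
        rows.append((package["name"], package.get("versionInfo"), purl))
    return rows


packages = list_packages(json.loads(SPDX_SAMPLE))
for name, version, purl in packages:
    print(name, version, purl)
```

In practice, the document would be read from the file produced by the tool rather than from an inline string.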
cyclonedx/bom
The Node.js package cyclonedx/bom generates a CycloneDX-format SBOM from a Node.js project. An example of the output when generating the SBOM for github.com/fastify/fastify looks like:
    $ cyclonedx-bom
    $ cat bom.xml
    <?xml version="1.0" encoding="utf-8"?>
    <bom xmlns="http://cyclonedx.org/schema/bom/1.3" serialNumber="urn:uuid:be53de33-6897-49ca-855d-926383866c21" version="1">
      <metadata>
        <timestamp>2022-06-23T10:03:17.018Z</timestamp>
        <tools>
          <tool>
            <vendor>CycloneDX</vendor>
            <name>Node.js module</name>
            <version>3.10.1</version>
          </tool>
        </tools>
        <component type="library" bom-ref="pkg:npm/fastify@4.1.0">
          <author>Matteo Collina</author>
          <name>fastify</name>
          <version>4.1.0</version>
          <description>
            <![CDATA[Fast and low overhead web framework, for Node.js]]>
          </description>
          ...
        </component>
      </metadata>
      <components>
        <component type="library" bom-ref="pkg:npm/%40fastify/ajv-compiler@3.1.0">
          <author>Manuel Spigolon</author>
          <group>@fastify</group>
          <name>ajv-compiler</name>
          <version>3.1.0</version>
          <description>
            <![CDATA[Build and manage the AJV instances for the fastify framework]]>
          </description>
          <licenses>
            <license>
              <id>MIT</id>
            </license>
          </licenses>
          <purl>pkg:npm/%40fastify/ajv-compiler@3.1.0</purl>
          <externalReferences>
            <reference type="website">
              <url>https://github.com/fastify/ajv-compiler#readme</url>
            </reference>
            <reference type="issue-tracker">
              <url>https://github.com/fastify/ajv-compiler/issues</url>
            </reference>
            <reference type="vcs">
              <url>git+https://github.com/fastify/ajv-compiler.git</url>
            </reference>
          </externalReferences>
        </component>
        ...
      </components>
      <dependencies>
        <dependency ref="pkg:npm/fast-deep-equal@3.1.3"/>
        <dependency ref="pkg:npm/json-schema-traverse@1.0.0"/>
        <dependency ref="pkg:npm/require-from-string@2.0.2"/>
        <dependency ref="pkg:npm/punycode@2.1.1"/>
        <dependency ref="pkg:npm/uri-js@4.4.1">
          <dependency ref="pkg:npm/punycode@2.1.1"/>
        </dependency>
        <dependency ref="pkg:npm/ajv@8.11.0">
          <dependency ref="pkg:npm/fast-deep-equal@3.1.3"/>
          <dependency ref="pkg:npm/json-schema-traverse@1.0.0"/>
          <dependency ref="pkg:npm/require-from-string@2.0.2"/>
          <dependency ref="pkg:npm/uri-js@4.4.1"/>
        </dependency>
        ...
      </dependencies>
    </bom>
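A CycloneDX XML document like this one can be read with Python's standard library. The sketch below uses the namespace and element names visible in the output above; the `list_components` helper is hypothetical, and the embedded sample is a trimmed copy of the ajv-compiler entry.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <bom> element above.
NS = {"bom": "http://cyclonedx.org/schema/bom/1.3"}

# Trimmed copy of one component from the CycloneDX output shown above.
CYCLONEDX_SAMPLE = """<bom xmlns="http://cyclonedx.org/schema/bom/1.3" version="1">
  <components>
    <component type="library" bom-ref="pkg:npm/%40fastify/ajv-compiler@3.1.0">
      <group>@fastify</group>
      <name>ajv-compiler</name>
      <version>3.1.0</version>
      <licenses>
        <license>
          <id>MIT</id>
        </license>
      </licenses>
      <purl>pkg:npm/%40fastify/ajv-compiler@3.1.0</purl>
    </component>
  </components>
</bom>"""


def list_components(xml_text):
    """Return name, version, license IDs, and purl for each component."""
    root = ET.fromstring(xml_text)
    rows = []
    for component in root.findall("bom:components/bom:component", NS):
        rows.append(
            {
                "name": component.findtext("bom:name", namespaces=NS),
                "version": component.findtext("bom:version", namespaces=NS),
                "licenses": [
                    element.text
                    for element in component.findall("bom:licenses/bom:license/bom:id", NS)
                ],
                "purl": component.findtext("bom:purl", namespaces=NS),
            }
        )
    return rows


components = list_components(CYCLONEDX_SAMPLE)
for component in components:
    print(component)
```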
snyk2spdx
The snyk2spdx tool leverages the Snyk open-source API to create an SBOM from your code repositories. Unfortunately, at the time of writing, this repository is outdated and unmaintained.
Others
There are also online tools, like https://sbom.democert.org/sbom/, that allow importing different formats or manually adding components to the SBOM definition and then downloading it.
The NTIA also published the “How to Guide for SBOM Generation” as a collection of simple instructions and guidance on how to generate an SBOM. It is interesting that the guide includes the concept of “completeness assertion” for cases where the dependencies of some components are missing.
Vendor-Provided SBOM or Guessed SBOM?
Ideally, the vendor of a product should tell us every component and provide it in a digitally signed document to prevent tampering or modifications. But we are still far from there, and not many vendors produce and provide an SBOM. It’s a complex process, involving multiple tools and pieces, and there are multiple standards for SBOM distribution.
In Dreamland, every vendor would provide a 100% accurate and comprehensive bill of materials, in a common standard, and digitally signed. But in the real world, we usually need scanning tools that can produce a “guessed” Bill of Materials. This is harder, as many components are opaque and it is difficult to discover the dependencies or libraries used during the build.
Still, scanning is necessary, as the SBOM from the vendor might be wrong. The build process in the vendor might be compromised, so some components might be intentionally omitted from the vendor SBOM, which brings us to the following question…
Can the SBOM be Wrong or Inaccurate?
Yes. The quality of an SBOM depends on the quality and automation of the process that builds the SBOM.
It is easy to produce the root level, like the version and details of the software you are directly building, and the first level of dependencies (packages and third party libraries). It becomes more difficult for transitive dependencies and harder as you navigate deeper in the tree, because many components might not provide their own SBOM and detecting dependencies can be complex or plainly impossible, like in statically linked binaries with stripped information.
Even with a perfect toolchain and perfect SBOM information during the build phase, an attacker could tamper with the contents of the SBOM (e.g., modify the companion file or artifact at rest) to hide the fact that the software contains vulnerable or malicious components. A consumer would then retrieve the modified version of the SBOM and miss these dangerous components.
A common, recommended practice is adding a digital signature to the SBOM artifact to make sure the consumer can verify its authenticity and integrity.
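The sign-then-verify flow can be sketched with Python's standard library. Note the substitution: real SBOM signing normally uses asymmetric signatures, so that consumers only need a public key. This sketch uses an HMAC with a shared secret as a simplified stand-in, and the key and the SBOM contents are hypothetical.

```python
import hashlib
import hmac


def sign_sbom(sbom_bytes, key):
    """Return a hex HMAC-SHA256 tag over the SBOM contents (signature stand-in)."""
    return hmac.new(key, sbom_bytes, hashlib.sha256).hexdigest()


def verify_sbom(sbom_bytes, key, signature):
    """Recompute the tag and compare it in constant time."""
    expected = sign_sbom(sbom_bytes, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical key and SBOM contents, for illustration only.
key = b"demo-shared-secret"
sbom = b'{"packages": [{"name": "wget", "versionInfo": "1.21-1+deb11u1"}]}'
signature = sign_sbom(sbom, key)

# An attacker removes the package from the SBOM at rest.
tampered = b'{"packages": []}'

print(verify_sbom(sbom, key, signature))      # original document
print(verify_sbom(tampered, key, signature))  # modified document
```

The verification step detects changes made after signing. It cannot detect an SBOM that was already wrong when it was signed.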
Even worse, it is possible that an attacker compromises the build pipeline, being able to modify the process of creating the SBOM, which would result in a digitally signed but altered list of components.
That is on the building side. From a “scanning” tool perspective, a software component is usually a black box, or the amount of information that can be obtained from analysis might be quite limited, as most of it (like pom.xml or go.mod files) is available during the build but removed from the final deliverable.
By comparison, a scanner or analysis tool produces quantitative data, whereas a vendor-provided SBOM can also contain qualitative data that is lost or invisible to analysis.
To minimize the risks of poor quality SBOMs or attacks, it is recommended to use scanning solutions even in the presence of vendor-provided SBOMs.
How Is the SBOM Related to Vulnerabilities?
A vulnerability is a weakness or flaw that an attacker can exploit to bypass security boundaries, get access to a system, and more. Vulnerabilities are a typical way to attack or compromise the software supply chain.
To find vulnerabilities in a piece of software (or in a running host, a container image, and so on), you need something that matches “known” vulnerabilities with the set of components in your software. This is called vulnerability scanning. And it’s where the SBOM comes into play, as it contains a comprehensive list of packages and versions composing your software.
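At its core, this matching step is a join between the SBOM's package list and a vulnerability feed. The following sketch assumes a deliberately simplified feed that lists exact affected versions. The feed entries and their identifiers are invented for illustration, and real scanners also handle version ranges and distribution-specific version schemes.

```python
# Package names and versions taken from the Syft output earlier in the article.
SBOM_PACKAGES = [
    ("wget", "1.21-1+deb11u1"),
    ("zlib1g", "1:1.2.11.dfsg-2+deb11u1"),
    ("zstd-jni", "1.5.0-4"),
]

# Invented feed entries: the IDs and affected versions are NOT real advisories.
VULNERABILITY_FEED = [
    {"id": "EXAMPLE-0001", "package": "zlib1g", "affected_versions": {"1:1.2.11.dfsg-2+deb11u1"}},
    {"id": "EXAMPLE-0002", "package": "wget", "affected_versions": {"1.20-1"}},
]


def scan(sbom_packages, feed):
    """Return (vulnerability id, package, version) for every exact match."""
    findings = []
    for name, version in sbom_packages:
        for vulnerability in feed:
            if vulnerability["package"] == name and version in vulnerability["affected_versions"]:
                findings.append((vulnerability["id"], name, version))
    return findings


findings = scan(SBOM_PACKAGES, VULNERABILITY_FEED)
print(findings)
```

A package that is missing from the SBOM never reaches this loop, which is why the accuracy of the SBOM directly limits the accuracy of the scan.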
Then, another big question arises: where do the “known” vulnerabilities come from? Note that vulnerabilities need to be known in advance; you cannot detect an unknown flaw!
Researchers and hackers discover them, and they end up in vulnerability databases that can be consumed by humans or computers. There are two main sources for vulnerabilities:
- Vendors can provide feeds for vulnerabilities in their products, like major Linux distributions, or package repositories like Go, NPM, and so on. Vendors have good context around how the vulnerability impacts the product. However, they might also be biased with regard to severity.
- Independent providers like NIST, MITRE, the Open Source Vulnerabilities Database, and commercial offerings like Snyk and VulnDB collect, analyze, and provide information about vulnerabilities. The drawback is that the score is generic, without specific context on how the vulnerability applies to different products. In some cases, a vulnerability might not even impact a vendor-specific product version because the code is forked or the patch has been backported.
Consuming vulnerability feeds can be challenging because different formats and standards exist for vulnerability information exchange, like:
- Red Hat OVAL – An XML format used for Red Hat Enterprise Linux, OpenShift, and other Red Hat products, and also available for Ubuntu.
- Different JSON feeds like the Debian Security Tracker.
- APIs like OSV that allow querying for specific open source package versions.
- Security advisories meant for humans, but not for automated processing, like Gentoo security.
- CVE JSON v5.0, the record format of the CVE Program, an attempt to create a standard CVE format.
- CSAF (Common Security Advisory Framework) – “Standardizing automated disclosure of cybersecurity vulnerability issues”.
- VEX (Vulnerability Exploitability eXchange), which has been implemented as a profile of CSAF.
VEX is interesting, as it allows vendors to provide a kind of “negative” security advisory, for example stating that a certain vulnerability does not apply to a component because the affected submodule in the package is not even used in the product.
Another source that can help prioritize vulnerabilities is CISA’s KEV (Known Exploited Vulnerabilities) catalog, an updated list of vulnerabilities that have an assigned CVE ID, reliable evidence of active exploitation in the wild, and a clear remediation (such as a vendor update).
The following diagram describes the full vulnerability management flow:
A typical flow comprises a scanning tool that is capable of creating an SBOM by analyzing a container image, host, or workload, or directly by consuming a pre-computed SBOM (or both!). The scanning tool then matches known vulnerabilities from different sources (usually the vendor provided sources for the corresponding Linux distribution, plus generic sources like NVD) to report the list of vulnerabilities impacting the software.
You can see an example of matching the Kubernetes SBOM (which is publicly available for each version) with known vulnerabilities from OSV using the spdx-to-osv tool.
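For a sense of what such a lookup involves, the sketch below builds a request for the OSV query endpoint (`https://api.osv.dev/v1/query`) for a single package version. It is not how spdx-to-osv is implemented. The helper names are hypothetical, and `query_osv` performs a network call, so it is defined here but not invoked.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version, ecosystem):
    """Build the JSON body for an OSV query about one package version."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def query_osv(payload):
    """POST the payload to OSV and return the list of matching vulnerabilities."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The "vulns" key is absent when nothing matches, so default to an empty list.
        return json.load(response).get("vulns", [])


# Package name and version taken from the fastify SBOM earlier in the article.
payload = build_osv_query("ajv", "8.11.0", "npm")
print(json.dumps(payload, indent=2))
```

Running one query per SBOM package gives a basic scanner. Error handling and rate limiting are left out of this sketch.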
The list of detected vulnerabilities can be curated and prioritized with additional information, like VEX (not exploitable vulnerabilities) from the Vendor, KEV list (Known Exploited Vulnerabilities), or Risk Spotlight information from Sysdig, which can detect the packages effectively loaded during the execution of the workload. This filters out the packages that are inside a container image, but never executed, so they are not exploitable.
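This curation step can be sketched as a filter followed by a sort. Everything here is a simplified assumption: findings are plain dictionaries, VEX statements are reduced to a set of (vulnerability, package) pairs marked as not affected, KEV is a set of IDs, runtime information is a set of loaded package names, and all identifiers are invented.

```python
def prioritize(findings, vex_not_affected, kev_ids, loaded_packages):
    """Drop findings the vendor marked as not affected, then sort by urgency."""
    remaining = [
        finding
        for finding in findings
        if (finding["id"], finding["package"]) not in vex_not_affected
    ]

    def rank(finding):
        # False sorts before True: known-exploited first, then loaded packages.
        return (
            finding["id"] not in kev_ids,
            finding["package"] not in loaded_packages,
            finding["id"],
        )

    return sorted(remaining, key=rank)


# Invented identifiers and packages, for illustration only.
findings = [
    {"id": "EXAMPLE-0001", "package": "libalpha"},
    {"id": "EXAMPLE-0002", "package": "libbeta"},
    {"id": "EXAMPLE-0003", "package": "libgamma"},
    {"id": "EXAMPLE-0004", "package": "libdelta"},
]
vex_not_affected = {("EXAMPLE-0001", "libalpha")}
kev_ids = {"EXAMPLE-0003"}
loaded_packages = {"libdelta", "libgamma"}

for finding in prioritize(findings, vex_not_affected, kev_ids, loaded_packages):
    print(finding["id"], finding["package"])
```

Sorting unloaded packages last, instead of dropping them, is a conservative choice made for this sketch. A team could also filter them out entirely.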
What Are the Pitfalls?
Even though there is much literature on supply chain security and an ever-growing set of tools and products, many of these tools generate SBOMs by analyzing the components and guessing the dependencies.
The optimal approach of generating the SBOM on every component upstream still requires tailor-made solutions for most cases. This translates into incomplete or inaccurate SBOMs. That’s not to mention the different existing formats (CycloneDX, SPDX, SWID) and lack of a standardized distribution mechanism, making consumption of SBOM pretty hard.
Another problem to consider is that the limitations in the SBOM propagate to vulnerability scanners.
For example, a missing package in the SBOM can result in a false negative (an existing vulnerability not being reported), and applying a patch on a custom package version can result in a false positive. In general, any package customization which doesn’t result in a version reflected in vulnerability databases might cause a FP/FN issue. And there is no single provider for vulnerability information, nor a single exchange standard. Ideally, every vendor would provide their own source of security advisories, and VEX (exploitability) to allow flawless identification of the existing weaknesses.
Conclusion
SBOM is a key piece in securing the software supply chain and fundamental for vulnerability matching and management. It is becoming more important as software consumers and governments are raising the collective bar on security requirements and software quality for their providers.
At the time of writing, there are still different competing standards, a plethora of tools, and a lot of uncertainty, and most of the actors are still struggling to get there. But the general consensus is that we need to secure the supply chain, converge on common standards, and make the SBOM an essential part of the build process.
An interesting initiative to follow in order to start securing the supply chain is the SLSA framework (pronounced “salsa”), which introduces different levels of maturity in the software supply chain, so you can start from nothing and progressively implement different mechanisms to become as resilient as possible at any link in the chain.
Published at DZone with permission of Álvaro Iradier. See the original article here.