Binary Code Verification in the Open Source World
We all (developers and organizations) using open source must be aware of the inherent risks that come with OSS packages and unverified code.
The IT industry has faced new security challenges with the growing popularity of Open Source Software (OSS). Although the challenges were always there, the number of applications based on OSS has highlighted the problem. Furthermore, because OSS is the basis for many cloud systems, the pressure is even greater: a bug in one OSS component exposes the whole ecosystem to risk.
OSS, as an approach, was designed for free use, encouraging the developer community to build new applications and software on top of it and to contribute to the code. This idea has proved itself as an effective way of creating code, improving it, and finding and fixing bugs. However, at the current scale of OSS adoption, security seems to be lagging behind.
Today most applications are built using open-source packages or code. However, as we know, utilizing any third-party code (including open-source packages) can introduce security risks, which are often dismissed by non-development users and even by software development companies themselves.
Log4Shell, followed by Text4Shell and others, set a chain of precedents that requires us to look at the problem differently, because digital and software security goes beyond OSS development and the IT industry. The consequences of a software vulnerability may have a tremendous impact on other sectors as well.
We all (developers and organizations) using open source must be aware of the inherent risks that come with OSS packages and unverified code. Meanwhile, most companies mainly address software supply chain security, and the core of code safety, binary verification, is often forgotten. This article focuses on the binary check and its importance among the community's efforts to raise software security.
Binary and Open-Source Code
In the Java community, we contribute to developing and enhancing the code via the OpenJDK initiatives. The code is developed as source, but the final product is delivered as binary. In practice, this means that Java-based products are initially written in the Java language and then compiled into a binary form that a machine can use. There is no single binary form, however: the Java runtime is delivered as machine code, while a Java class library may be delivered as bytecode or as compiled machine code. These different binary forms further complicate the verification process.
Most often, once compilation is done, the original program cannot be restored from the result. Nevertheless, some Java code can be recovered from bytecode. At run time, the JVM executes the bytecode either by interpreting it or by JIT compilation, in the latter case transforming it into machine code.
Alternatively, the code can be compiled ahead of time to machine code and linked with a runtime (Native Image or AOT). Besides, the Java runtime in its binary form consists of machine code and a Java class library, which can exist in the form of bytecode or compiled machine code.
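To make the distinction between binary forms concrete, here is a minimal sketch that recognizes Java bytecode by its header. It relies only on the class-file layout from the JVM specification: a 4-byte magic number 0xCAFEBABE, then a 2-byte minor and a 2-byte major version. The class name ClassFileHeader and the method majorVersion are illustrative names, not part of any OpenJDK tooling, and the check says nothing about whether the code is safe.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassFileHeader {
    /** Magic number that opens every class file, per the JVM specification. */
    static final int MAGIC = 0xCAFEBABE;

    /**
     * Returns the major class-file version (for example 52 for Java 8, 61 for Java 17),
     * or -1 if the bytes do not start with a class-file header.
     */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            return -1; // too short to hold magic + minor + major
        }
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (header.getInt() != MAGIC) {
            return -1; // some other binary form, e.g. native machine code
        }
        header.getShort();                 // minor version, skipped here
        return header.getShort() & 0xFFFF; // major version as an unsigned 16-bit value
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader <path-to-.class-file>");
            return;
        }
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(major < 0 ? "not a class file" : "class-file major version " + major);
    }
}
```

A native library or a Native Image executable would fail this check, which illustrates why one verification method rarely covers every binary in a Java product.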
More importantly, when we build new OpenJDK tools, we combine different libraries and runtimes taken from various sources. In this process, supply chain security management, and binary verification in particular, are principal factors for the future security of software. Shipping unverified code and libraries is unacceptable in the community of leading OpenJDK vendors. Most of these vendors run verification checks internally, applying different methods.
The market situation, however, is not quite as straightforward.
Snyk, one of the leading companies for security development solutions, delivers interesting yearly reports on what is happening in the security niche. This year's report examines the complexity introduced by open-source packages and how organizations manage that complexity as part of their Software Development Life Cycle (SDLC).
The research shows some important software security realities, including that many organizations still don't have any policies or governance around open-source software (OSS) security. A few striking numbers include:
- 41% of organizations are not confident in their open-source software security;
- 51% of organizations don't have a security policy for open-source development or usage;
- 30% of organizations without an open-source security policy readily recognize that no one on their team is responsible for addressing open-source security.
These statistics show that although companies use open source widely, many still need to gain internal OSS expertise and establish procedures for dealing with it.
Ensuring the Full Compliance of Binary Code Against Open-Source Code
Among all the security measures for OSS, binary code verification should come first. Nevertheless, it receives little attention, as most organizations focus on overall software supply chain security. This lack of attention may be related to the difficulty of complete binary verification.
The problem with verification is that compiling open-source code to binary is not reversible, so a binary cannot simply be converted back and compared with its source. At the same time, the security of the code and software is vital for both the vendor and the final user, so security checks are sometimes done on both sides.
The Snyk report demonstrates that fewer than 50% of companies have an OSS security policy. Companies without such a policy are unlikely to analyze the binary or to arrange additional checks by external security providers.
Often, conformity of binary code is taken for granted. However, using binary code based on trust in the company producing the final product is an unhealthy pattern. Such an approach rests on the faulty assumption that someone else has made these checks for you and the code is secure, which in turn may bring an avalanche of problems in the future. Furthermore, when an unverified binary is inserted into the supply chain, that part of the code or library may contain a bug that makes your application vulnerable. Ideally, software vendors should accompany their products with proof of binary conformity.
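As a minimal sketch of the simplest check a final user can run, the code below compares the SHA-256 digest of a downloaded artifact with a digest published by the vendor. It uses only the standard java.security.MessageDigest API. The class name ChecksumCheck and its methods are illustrative, and the published digest is assumed to come from a channel you trust. Note that a matching digest only shows that the file is the one the vendor published. It does not prove that the binary corresponds to the open-source code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheck {
    /** Returns the SHA-256 digest of the data as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // bytes are formatted as unsigned values
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** True if the artifact's digest equals the published one (case and whitespace ignored). */
    static boolean matchesPublishedDigest(byte[] artifact, String publishedHex) {
        return sha256Hex(artifact).equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChecksumCheck <artifact> <published-sha256-hex>");
            return;
        }
        // Reading the whole file is fine for a sketch; stream large artifacts instead.
        byte[] artifact = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matchesPublishedDigest(artifact, args[1]) ? "digest matches" : "DIGEST MISMATCH");
    }
}
```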
Tightening Up the Binary Security
Using open-source packages and code safely requires a set of checks. The prevalence of OSS-based applications today expands the attack surface and requires more scrutiny in the security process. The key strategy for better and safer open-source use would be implementing security practices, ideally on both sides: the developers and the final users.
Today, the OSS community demonstrates a growing focus on supply chain security. Supply chains are getting more complex and use a vast number of binaries. In addition, the supply chain depends on the libraries in use. Therefore, for full verification, the vendor needs to open up every library component used and confirm that it does not contain any bugs. Only after completing this procedure can it be said that the product is secure.
Generally, a software supply chain is anything that touches an application or concerns its development throughout the entire SDLC. The binary check in this process includes verifying any third-party and proprietary code. Vendors and final users are responsible for performing these security activities and providing proof of their security efforts to their clients.
The continuous rise of code reuse and cloud-native approaches has increased the number of potential gaps and bugs in the code, attracting more attacks. Unfortunately, exploiting just one weakness can let an attacker traverse the entire supply chain, and we've seen plenty of examples of that in recent times.
There is a high demand for practical methods and tools to ensure the total correctness of critical software components. A usual assumption is that the machine code (or binary code) generated by a compiler follows the programming language's semantics. Unfortunately, modern compilers such as GCC and LLVM are too complex to verify thoroughly, and bugs in the generated code are common. Besides, the code can be corrupted on purpose.
Binary Check Methods
Binary-to-source code matching is part of many security and software engineering tasks, such as malware detection, reverse engineering, and vulnerability assessment.
Various methods are used for verification: some of them confine themselves to checking the absence of specific kinds of bugs (e.g., runtime errors), while others try to prove the total/overall correctness of the software under analysis.
The total correctness approach usually refers to deductive verification. Such software verification aims at formally verifying that all possible behaviors of a given program satisfy formally defined, possibly complex properties, where the verification process is based on logical inference.
Despite the many existing approaches, complete verification of binary code remains challenging.
In fact, OpenJDK vendors often combine different verification methods to fully confirm the binary. In addition, they also rebuild the code from scratch.
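One way to picture the rebuild check is to compare a shipped archive with one rebuilt from source, entry by entry. The sketch below is only an illustration under stated assumptions, not any vendor's actual procedure. It assumes the build is reproducible enough that the same source yields byte-identical entries, and the names JarCompare, entryDigests, and differences are hypothetical. It hashes the uncompressed content of each entry, so archive-level metadata such as entry timestamps and ordering is ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class JarCompare {
    /** Maps each file entry in the archive to the SHA-256 (hex) of its uncompressed content. */
    static Map<String, String> entryDigests(InputStream archive) throws IOException {
        Map<String, String> digests = new TreeMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.isDirectory()) {
                    continue;
                }
                MessageDigest sha256 = newSha256();
                for (int n = zip.read(buffer); n != -1; n = zip.read(buffer)) {
                    sha256.update(buffer, 0, n);
                }
                digests.put(entry.getName(), String.format("%064x", new BigInteger(1, sha256.digest())));
            }
        }
        return digests;
    }

    /** Names of entries that are missing on one side or whose content differs. */
    static Set<String> differences(Map<String, String> rebuilt, Map<String, String> shipped) {
        Set<String> names = new TreeSet<>(rebuilt.keySet());
        names.addAll(shipped.keySet());
        names.removeIf(name -> Objects.equals(rebuilt.get(name), shipped.get(name)));
        return names;
    }

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarCompare <rebuilt.jar> <shipped.jar>");
            return;
        }
        try (InputStream rebuilt = Files.newInputStream(Paths.get(args[0]));
             InputStream shipped = Files.newInputStream(Paths.get(args[1]))) {
            Set<String> diff = differences(entryDigests(rebuilt), entryDigests(shipped));
            System.out.println(diff.isEmpty() ? "all entries match" : "entries that differ: " + diff);
        }
    }
}
```

A reported difference is not necessarily malicious: compiler versions, embedded build dates, or signature files can all change an entry. It only marks the places where one of the other verification methods has to take over.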
Software components delivered by vendors together with verification offer a much higher level of conformity. Obtaining open-source products from such vendors, rather than taking open source from public repositories, is a more secure business approach and a sound practice for companies using open source. Besides, vendors provide tech support for their open-source code, helping to guard your software against future threats.
To reduce security risks, OSS vendors use different methods for binary verification. They also often combine several checks, depending on the data available. This could involve static analysis, sanitizers run with unit and regression tests, and code coverage testing for some parts of the product, and rebuilding the code for others. Such an approach relies on high competence and a deep understanding of the problem, because different parts of Java software require different kinds of verification. As Java applications and their dependencies consist of binaries taken from various sources, all these binaries may require separate checks. Therefore, verifying an entire Java application using just one or even two standard approaches is difficult. However, companies developing this software have the internal competence to check each part of the binary and, by delivering those binary checks, bring certified and secure software to the market.
A standard Technology Compatibility Kit (TCK) exists for Java. The TCK is a suite of tests, ratified through the Java Community Process, that at least nominally checks a particular alleged implementation of a Java Specification Request (JSR) for compliance. Passing the TCK demonstrates compliance with the umbrella Java SE specification and the JSRs specific to the particular Java version. However, the TCK does not guarantee that the binaries are secure, only that they comply with the Java SE specifications.
The safest road toward better and safer development would be using only TCK-verified Java software. Tech support from the same vendor, together with its internal binary checks, gives protection against rare failure cases in a critical enterprise environment.