What Is Project Valhalla?
Curious about Project Valhalla and what it will bring to your Java development experience? Here's everything you wanted to know and more about it.
For over three years, Project Valhalla has been a buzzword in the Java community, but even with all of this anticipation and foreboding, surprisingly little has been published on this important project. For some, it means the ability to create value types, and for others, it means reified generic runtime types.
But through all of the confusion and desire, Project Valhalla has a very specific purpose: To cease the requirement that Java developers choose between performance and abstraction. In this article, we will clear up the confusion about what Project Valhalla is and what it brings to the table. In doing so, we will examine what is and what is not included in the project as well as delve into the reasoning behind each of the major inclusions and exclusions.
What Is Project Valhalla?
Project Valhalla is an OpenJDK project started in 2014 and headed by Brian Goetz with the purpose of introducing value-based optimizations to Java Development Kit (JDK) 10 or a future Java release. The project is primarily focused on allowing developers to create and utilize value types, or non-reference values that act as though they are primitives. In the words of Goetz:
Codes like a class, works like an int
The major benefit of value types over reference types (objects) is the removal of reference type overhead, both in memory and in computation. For example, although the actual size and overhead associated with an object is specific to a Java Virtual Machine (JVM) implementation (see section 2.7 of the JDK 9 Specification), typical JVMs include bytes to store information about the object, including polymorphism information, identification information, synchronization information, and garbage collection metadata (such as mark bits or the age of the object). Additionally, when the object associated with a reference type must be accessed, the reference must be dereferenced, imparting a level of indirection. In some cases, this overhead is superfluous, as the benefits of the object (such as polymorphism, identification, or synchronization) are not needed.
Although value types deviate from objects in some important characteristics, which we will see shortly, they maintain a level of abstraction that is akin to classes. For example, value types may still have methods and fields, both with visibility modifiers that facilitate encapsulation. One of the major differences, though, in the current Java release (JDK 9 at the time of writing) is the inability of value types (such as primitives) to be used as generic type arguments; i.e. List<int> is not a valid type in the current release of Java.
Although primitive types such as int can be autoboxed to reference types (such as Integer), this has a major drawback: It reintroduces the overhead of objects. While this has been an obstacle in Java generics since their introduction in JDK 5, it has become much more pressing in Project Valhalla, as developers will be allowed to create new value types. In order to solve this issue, Valhalla has also been tasked with providing a mechanism for allowing value types to be supplied as valid generic arguments, while still maintaining the current type-erased generic semantics of Java. The solution currently reached by the team is to utilize generic specialization, which we will delve into after a more detailed look into value types.
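To make the boxing cost concrete, here is a minimal sketch in present-day Java. The class name BoxingSketch and its methods are purely illustrative. It contrasts a List<Integer>, where every element is an Integer object that must be unboxed before use, with an int[], which stores the values directly.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingSketch {
    // Each add() autoboxes the int into an Integer object (via Integer.valueOf).
    static List<Integer> boxedRange(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(i);
        }
        return values;
    }

    // Each element is a reference that is followed and unboxed before the addition.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer value : values) {
            sum += value;
        }
        return sum;
    }

    // The array holds the int values themselves; no objects are involved.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(boxedRange(5)));                  // prints 10
        System.out.println(sumPrimitive(new int[] {0, 1, 2, 3, 4}));  // prints 10
    }
}
```

Both methods compute the same result; the difference lies only in how the elements are represented in memory.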
What Are Value Types?
Value types are groups of data whose immediate value is stored in memory, rather than a reference (or pointer) to the data. Value types can, therefore, be thought of as consuming only enough memory to store the aggregate of the data contained in their fields, with no additional overhead. Conceptually, primitives are a relatable example of value types, where an int only consumes 32 bits with no additional bytes to store metadata.
Taking data and directly placing its value into memory (rather than a reference) is called flattening, and its benefits are more acutely demonstrated with arrays. In the case of an array of objects, each element in the array stores a reference to the object associated with that element, requiring that a dereference be performed before accessing the object. In an array of value types, the values are directly placed into the array and are guaranteed to be in contiguous memory (which increases locality and, consequently, the chance of cache hits). This idea is illustrated in the figure below:
The use of value types over reference types has some major benefits, including:
- Reduced memory usage: No additional memory is used to store object metadata, such as flags facilitating synchronization, identity, and garbage collection. For a small object such as a boxed Integer, this overhead can match or even surpass the size of the data itself (a primitive int requires 32 bits in Java).
- Reduced indirection: Since objects are stored as reference types in Java, each time an object is accessed it must first be dereferenced, causing additional instructions to be executed. The flattened data associated with value types are immediately present in the location in which they are needed and, therefore, require no dereferencing.
- Increased locality: Flattened value objects remove indirection which increases the likelihood that values are adjacently stored in memory–especially for arrays or other contiguous memory structures such as classes (i.e. if a class contains value type fields).
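Since value types are not available in current Java releases, the layout difference can only be approximated by hand. The following sketch (FlatteningSketch, Point, and the method names are hypothetical and not part of any Valhalla prototype) compares an array of Point references with a manually flattened double[] that stores the coordinates contiguously.

```java
final class FlatteningSketch {
    // An ordinary reference type: every instance is a separate heap object.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Array of references: each element must be dereferenced to reach its x field.
    static double sumX(Point[] points) {
        double sum = 0;
        for (Point point : points) {
            sum += point.x;
        }
        return sum;
    }

    // Hand-flattened layout: [x0, y0, x1, y1, ...] in one contiguous array.
    static double[] flatten(Point[] points) {
        double[] flat = new double[points.length * 2];
        for (int i = 0; i < points.length; i++) {
            flat[2 * i] = points[i].x;
            flat[2 * i + 1] = points[i].y;
        }
        return flat;
    }

    // Reads the x values directly from the array, with no per-element indirection.
    static double sumXFlat(double[] flat) {
        double sum = 0;
        for (int i = 0; i < flat.length; i += 2) {
            sum += flat[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = {new Point(1.0, 2.0), new Point(3.0, 4.0)};
        System.out.println(sumX(points));
        System.out.println(sumXFlat(flatten(points)));
    }
}
```

The flattened version gives up the Point abstraction to obtain the layout; the goal of value types is to get this layout without abandoning the abstraction.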
One of the major differences between reference types and value types is the corresponding definition of identity:
The identity of a reference type is intrinsically bound to the object while the identity of a value type is bound to its current state
For example, a date can be thought of as a value type, where one value instance that resolves to January 1, 2018, is equal to another value instance that resolves to January 1, 2018, regardless of the fact that the two instances are not the same (i.e. do not occupy the same location in memory). Informally, so long as the data of one instance matches field-for-field with the data of another instance, the two instances are interchangeable (i.e. they can be identified as the same value). In practice, dates are commonly stored as units of time (such as milliseconds) since some epoch. Thus, if two date value types have the same number of milliseconds since the same epoch, they are equal.
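State-based identity can be emulated in today's Java by overriding equals and hashCode. The class below is a hypothetical sketch (DateValue is not a JDK or Valhalla type): two separately allocated instances remain distinct references, yet compare as equal because their single field matches.

```java
final class DateValue {
    private final long epochMillis;

    DateValue(long epochMillis) {
        this.epochMillis = epochMillis;
    }

    // Equality is defined purely by state: the same milliseconds since the epoch.
    @Override
    public boolean equals(Object other) {
        return other instanceof DateValue
                && ((DateValue) other).epochMillis == epochMillis;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(epochMillis);
    }

    public static void main(String[] args) {
        // 1514764800000 ms after the Unix epoch is 2018-01-01T00:00:00Z.
        DateValue first = new DateValue(1514764800000L);
        DateValue second = new DateValue(1514764800000L);
        System.out.println(first == second);      // prints false: two distinct objects
        System.out.println(first.equals(second)); // prints true: same state
    }
}
```

With a true value type, the distinction shown by the == comparison would disappear, since there would be no separate object identity to compare.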
Although value types have many important benefits, they also have many difficulties, especially within the existing Java language. One of the overarching concerns through each iteration of JDK releases is backward compatibility, primarily compatibility of existing binary bytecode representations. In practice, this means additions to Java are prohibited from altering the existing functionality of bytecode operations or semantics. Although it is tempting to try and fit value types into the primitive value model currently employed by Java, there are some major deficiencies that hamstring this effort, including:
- Primitives do not currently support qualified method calls or field access (i.e. use of the dot operator)
- No literals exist for user-defined value types
- Built-in operators (such as +) are not supported by all value types
- It is difficult to define default values for all user-defined types
In addition to these shortcomings in the existing primitive model, there are also other concerns that must be addressed for user-defined value types, including:
- Can value types implement an interface?
- Can one value subclass another?
- Do value types implicitly subclass a base class, as all objects do with the Object class? (Note: value types cannot simply inherit implicitly from Object, since the usage of Object instances assumes that the instance is a reference type)
- Can a value type be treated as an Object when necessary (i.e. autoboxed)?
Although all of these questions, and more, are answered in the State of the Values article, they make clear that the existing Java bytecode operations and descriptors must be expanded to allow for proper value types. At the moment, the plan for this expansion is to create a new bytecode descriptor category, Q, analogous to the existing L type descriptor used for classes. For example, bytecode would denote a value type, Point, as QPoint, where a reference type named Point would be called LPoint.
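The descriptors that exist today can be inspected with the standard java.lang.invoke.MethodType API. The sketch below (DescriptorSketch and describe are illustrative names) shows that class types are written with an L prefix and a trailing semicolon, while primitives and void use single letters. The value type descriptors from the proposal are not available in released JDKs, so only the existing forms are shown.

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {
    // Builds the JVM descriptor of a method with one parameter.
    static String describe(Class<?> returnType, Class<?> parameterType) {
        return MethodType.methodType(returnType, parameterType)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // A reference type parameter appears as L<binary name>;
        System.out.println(describe(int.class, String.class)); // prints (Ljava/lang/String;)I
        // Primitives and void use single-letter descriptors.
        System.out.println(describe(void.class, int.class));   // prints (I)V
    }
}
```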
While there is still a great deal of work to be completed before value types are ready for introduction into the next JDK release, major steps have been taken to experiment with an initial implementation. The Minimal Value Types article goes into great detail on this initial exploratory implementation and although this first cut includes many approximations of the eventual implementation (such as the use of an annotation to denote a value type rather than an express keyword for this purpose), it provides a promising path for the eventual inclusion of value types into Java.
What Are Generic Specializations?
Since the inclusion of generics in JDK 5, one of the major inadequacies of Java generics is their lack of support for primitive type arguments. For example, although List<Integer> is a valid Java type, List<int> is not. This shortcoming is due to the requirement that generics be introduced into Java without changing the binary runtime characteristics of class files. This resulted in the type arguments of generic types being removed through type erasure, resulting in the same runtime type for all generic usages of a generic class. For example, although List<Integer> and List<String> are processed during compile-time to ensure that each type argument is type-safe, both type arguments are erased and resolve to the same type at runtime. Thus, the following source code lines...
List<String> stringList = new ArrayList<String>();
List<Integer> intList = new ArrayList<Integer>();
...resolve to the following lines at runtime:
List stringList = new ArrayList();
List intList = new ArrayList();
This allows for existing code to continue using the raw type of the generic class without making changes, thus preserving the backward compatibility of existing Java code. For example, existing code could still function by using the raw type List (without any generic type arguments) since the runtime type of both the raw type and the generic type would resolve to the same type through erasure: List. Furthermore, since a generic type argument no longer exists for the generic type, all unbounded generic type parameters are replaced by the reference type Object. For example, given the following class definition:
public class Box<T> {
    private T value;

    public T getValue() {
        return value;
    }
}
The equivalent runtime type would be:
public class Box {
    private Object value;

    public Object getValue() {
        return value;
    }
}
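Erasure can be observed directly at runtime using getClass and reflection. In the following sketch (ErasureSketch and its helper methods are illustrative names), two differently parameterized lists share one runtime class, and the getValue method of a generic Box reports Object as its return type.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureSketch {
    static final class Box<T> {
        private T value;

        public T getValue() {
            return value;
        }
    }

    // Both lists are instances of the single runtime class java.util.ArrayList.
    static boolean sameRuntimeClass() {
        List<String> stringList = new ArrayList<String>();
        List<Integer> intList = new ArrayList<Integer>();
        return stringList.getClass() == intList.getClass();
    }

    // Reflection sees the erased signature of getValue().
    static Class<?> erasedReturnType() {
        try {
            return Box.class.getDeclaredMethod("getValue").getReturnType();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());  // prints true
        System.out.println(erasedReturnType());  // prints class java.lang.Object
    }
}
```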
The compiler is then responsible for generating proper casts and bridge methods to ensure that all uses of the homogeneously translated runtime type are type-safe. For example, the following snippet...
Box<Integer> myBox = new Box<Integer>();
Integer myValue = myBox.getValue();
...would effectively become:
Box myBox = new Box();
Integer myValue = (Integer) myBox.getValue();
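The compiler-generated cast becomes visible when a raw type is used to bypass the compile-time checks. The sketch below is a hedged illustration: CastSketch is a hypothetical name, and the setValue method is an addition that the Box class above does not have. A String is stored in a Box<Integer> through a raw reference, and the hidden cast fails when the value is read back.

```java
final class CastSketch {
    static final class Box<T> {
        private T value;

        // Assumed setter, added only so that a value can be stored.
        void setValue(T value) {
            this.value = value;
        }

        T getValue() {
            return value;
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFailsOnPollutedBox() {
        Box<Integer> intBox = new Box<Integer>();
        Box rawBox = intBox;               // raw type: the type argument is ignored
        rawBox.setValue("not an Integer"); // unchecked call, accepted by the compiler
        try {
            // The compiler inserts an (Integer) cast here, which fails at runtime.
            Integer value = intBox.getValue();
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailsOnPollutedBox()); // prints true
    }
}
```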
Due to type erasure, the runtime generic argument must be resolved to some type. Object was selected because it encompassed the uppermost type that could store any user-defined type. Since the primitive values (e.g. int, double, boolean, etc.) did not inherit from the Object class, they were therefore prohibited from being used as generic type arguments. Practically, the restriction that the runtime nature of Java could not change with the introduction of generics resulted in primitive types being precluded from being used as generic type arguments.
How Can Value Types be Used as Type Arguments?
In general, there are two techniques for supporting generics in an object-oriented programming language. The first technique, homogeneous translation, resolves generic types to a single type, regardless of its type argument. This is the technique used by Java, whereby List<String> and List<Integer> both resolve to the runtime type List. The second technique, heterogeneous translation (or generic specialization), results in different type arguments for a single generic type resolving to different types at runtime (see C++ template classes for more details). For example, List<String> and List<Integer> would result in runtime types akin to List_String and List_Integer, respectively (as opposed to the single homogeneous type List). There are many important side-effects of using specialization, including the disjunction of generic type hierarchies; further information can be found in the State of the Specialization article by Brian Goetz.
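Heterogeneous translation can be imitated by hand in present-day Java, which is roughly what the JDK itself does with types such as IntStream and OptionalInt. The following sketch uses hypothetical names (SpecializationSketch, Box, IntBox) and is not output of any Valhalla prototype: it places a homogeneous generic class next to a manually written int specialization of it.

```java
final class SpecializationSketch {
    // Homogeneous translation: one class serves every reference type argument,
    // and the field is effectively an Object reference at runtime.
    static final class Box<T> {
        private final T value;

        Box(T value) {
            this.value = value;
        }

        T getValue() {
            return value;
        }
    }

    // Hand-written "specialization" for int: a distinct class whose field
    // holds the primitive directly, with no boxing.
    static final class IntBox {
        private final int value;

        IntBox(int value) {
            this.value = value;
        }

        int getValue() {
            return value;
        }
    }

    public static void main(String[] args) {
        Box<Integer> boxed = new Box<>(7);     // 7 is autoboxed to an Integer
        IntBox specialized = new IntBox(7);    // 7 is stored as a plain int
        System.out.println(boxed.getValue() + specialized.getValue()); // prints 14
    }
}
```

Note that Box<Integer> and IntBox are unrelated types here, which is a small-scale instance of the disjoint type hierarchies that specialization introduces.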
With the introduction of user-defined value types, the need for non-reference types to be used as generic type arguments has become even stronger. As we have already seen, homogeneous translation is insufficient for supplying non-reference types as generic type arguments. In order to rectify this dilemma, while maintaining backward compatibility with existing type-erased generics, a hybrid solution has been devised, where homogeneous translation is used for reference types, while heterogeneous translation is used for value types.
In order to ensure that existing generic classes can still operate, a new keyword, any, is used in conjunction with the generic type parameter declaration to denote that it will use the enhanced generics (allowing for both value types and reference types to be supplied as generic type arguments). For example, an enhanced generic Box class would resemble the following:
public class Box<any T> {
    private T value;

    public T getValue() {
        return value;
    }
}
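While it may appear cumbersome to include the any keyword whenever enhanced generics are desired, there are important cases in which existing generic semantics must be maintained. For example, consider a list class modeled on ArrayList with the following simplified pair of overloaded methods (the real ArrayList declares them as E remove(int index) and boolean remove(Object o)):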
public void remove(int position);
public void remove(T element);
Originally, the remove(int) method was designed with the assumption that the generic type argument would be a reference type. If this restriction were relaxed and primitive generic type arguments were permitted, a method clash would arise: the two methods above would resolve to the same signature whenever the generic parameter T resolves to the type int. This can be solved by creating two methods with separate names, such as removeByPosition and removeElement, respectively, but this would still require a change to the interface of ArrayList. Thus, we cannot assume that all generic classes can automatically accept value type generic arguments.
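A milder form of this tension already exists for List<Integer> in current Java, where the int overload and the Object overload of remove are kept apart only by boxing. The sketch below uses illustrative names (RemoveOverloadSketch and its two helpers) to show how the same number selects a different method depending on whether it is passed as an int or as an Integer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RemoveOverloadSketch {
    // An int argument selects remove(int index): removes by position.
    static List<Integer> removeAtIndex(List<Integer> source, int index) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(index);
        return copy;
    }

    // An Integer argument selects remove(Object o): removes the first equal element.
    static List<Integer> removeElement(List<Integer> source, int element) {
        List<Integer> copy = new ArrayList<>(source);
        copy.remove(Integer.valueOf(element));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(10, 20, 30);
        System.out.println(removeAtIndex(source, 1));  // prints [10, 30]
        System.out.println(removeElement(source, 10)); // prints [20, 30]
    }
}
```

If T could be int itself, there would be no boxed form left to tell the two overloads apart, which is the clash described above.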
At this point in Project Valhalla, there are still a lot of open questions about generic specialization, but we can conclude that if value types are to be added to Java, generics that accept these value types are very likely to come along with them. While there is still a great deal of work that needs to be done to include value-type-capable generic semantics, there have been some major decisions made, even in its infancy. For more information on these conclusions and possible implementations of generic specializations in Project Valhalla, see the State of the Specialization article.
Does This Mean Java Will Have Reified Types?
As a corollary to the use of homogeneous translation by Java, the generic arguments for a generic type are not reified, or available at runtime. This has been a major frustration for countless applications but was accepted as a trade-off for backward compatibility of existing Java applications when generics were introduced. With the revisiting of generic types in Project Valhalla, it is only natural to ask the question: Will Java have reified types?
The short answer to this question is possibly, but only in the case of value type generics. Although an implementation for the generic specialization of value type arguments has not been solidified, it is possible that the implementation may include reified generic types. In order to maintain backward compatibility, though, it is highly unlikely that reference type generic arguments will see reification (see slide 42 of the August 2016 Project Valhalla Update presentation). Although there are many benefits to universally reified generic types, there are also a few major disadvantages:
- Incompatibility with existing reference type generics
- Invalidation of optimizations that assumed generics to be erased at runtime
- Complication of bytecode to maintain runtime type information
Although nothing is currently set in stone, it is highly unlikely that universal generic reification will make its way into Project Valhalla, even if value type generics see some level of partial reification. For more information on the trade-offs of reification in generics, see the August 2016 Project Valhalla Update presentation.
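Until any form of reification arrives, a common workaround is to carry the type argument explicitly as a Class token. The following sketch shows this general idiom under assumed names (TypeTokenSketch, TypedBox, holds); it is not something proposed by Project Valhalla.

```java
final class TypeTokenSketch {
    // The Class<T> token preserves at runtime what erasure removes from T.
    static final class TypedBox<T> {
        private final Class<T> type;
        private final T value;

        TypedBox(Class<T> type, T value) {
            this.type = type;
            this.value = value;
        }

        Class<T> getType() {
            return type;
        }

        T getValue() {
            return value;
        }
    }

    // A runtime question that erased generics alone cannot answer.
    static boolean holds(TypedBox<?> box, Class<?> candidate) {
        return box.getType() == candidate;
    }

    public static void main(String[] args) {
        TypedBox<String> box = new TypedBox<>(String.class, "hello");
        System.out.println(holds(box, String.class));  // prints true
        System.out.println(holds(box, Integer.class)); // prints false
    }
}
```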
Conclusion
With the recent inclusion of Project Jigsaw and the previous introduction of functional programming semantics, it is an exciting time for the Java language. While Java is far from a perfect language, major steps are being taken to mature the language, providing new functionality and semantics that allow for better performance and problem-mapping without sacrificing proper abstraction. Project Valhalla, with its exploration of value types and generic support for value types, is a major step in the right direction.
Further Reading
The following sources were used throughout this article and constitute some very useful resources in understanding both the thought process behind Project Valhalla, as well as the progress made so far in its experimental implementations. The personal notes of the author (used in the creation of this article) can be found on Google Drive. A complete list of the resources used for this article can also be found on Google Drive.