Java Bytecode Simplified: Journey to the Wonderland (Part 2)
As we continue the journey into Java bytecode, part two of the series delves a bit deeper into ConstantPool.
Our previous article introduced bytecode and discussed what it includes. This article will delve a bit deeper into ConstantPool.
Highlights
- Bytecode is an abstract representation: a set of instructions for an imaginary machine known as the Java virtual machine (JVM). The JVM is a piece of software that executes bytecode.
- The JVM is a stack-based machine, whereas real CPUs are register-based and execute machine code. Java source is compiled into bytecode, an intermediate form; the JVM then interprets it and uses the just-in-time (JIT) compiler to turn frequently executed code into machine code.
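To make "stack-based" concrete, here is a toy sketch of how a stack machine evaluates 1 + 2. The opcode names `iconst_1`, `iconst_2`, and `iadd` are real JVM instructions, but the class `TinyStackMachine` and its `run` method are invented for illustration and are not how the JVM is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class TinyStackMachine {

    /** Executes a tiny subset of JVM-like instructions and returns the value left on top of the stack. */
    static int run(List<String> program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String op : program) {
            switch (op) {
                case "iconst_1":
                    stack.push(1); // push the int constant 1
                    break;
                case "iconst_2":
                    stack.push(2); // push the int constant 2
                    break;
                case "iadd":
                    // pop two operands, push their sum
                    stack.push(stack.pop() + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("iconst_1", "iconst_2", "iadd"))); // prints 3
    }
}
```

Note that no instruction names a register: operands are taken implicitly from the stack and results are pushed back onto it.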
Before going any further, let's explore javap, which is a very handy tool for disassembling bytecode.
javap
javap is a standard tool included in the JDK's bin subdirectory. A useful aspect of javap is that it does not need the Java source code; it works directly on the compiled binary, the .class file.
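Because javap only needs compiled classes, it can also be invoked from Java code. The following is a minimal sketch, assuming the program runs on a full JDK (Java 9 or later) that exposes javap through `java.util.spi.ToolProvider`; the class name `JavapRunner` and the method `disassemble` are made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

public class JavapRunner {

    /** Runs javap with the given arguments and returns whatever it wrote to standard output. */
    static String disassemble(String... args) {
        // Assumption: the running JDK registers a tool named "javap".
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap is not available in this runtime"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = javap.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed: " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Disassemble a class that is always present, without needing its source.
        System.out.println(disassemble("java.lang.Object"));
    }
}
```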
Let's see an example:
package ca.bazlur;

public class Lamp {
    private boolean isOn;

    public void turnOn() {
        this.isOn = true;
        printStatus();
    }

    public void turnOff() {
        this.isOn = false;
        printStatus();
    }

    private void printStatus() {
        System.out.println("Light is turned " + (isOn ? "on" : "off"));
    }

    public static void main(String[] args) {
        var lamp = new Lamp();
        lamp.turnOn();
        lamp.turnOff();
    }
}
If we compile this code using javac, we will get a class file, and then we can use javap to disassemble the bytecode from the command line as follows:
javap Lamp
We will get the following output:
Compiled from "Lamp.java"
public class ca.bazlur.Lamp {
  public ca.bazlur.Lamp();
  public void turnOn();
  public void turnOff();
  public static void main(java.lang.String[]);
}
Note that by default it prints only the public, protected, and package-private members; the private field and the private method are missing from the output above. If we also wish to view private members, we must specify an additional switch, -p.
javap -p Lamp
Compiled from "Lamp.java"
public class ca.bazlur.Lamp {
  private boolean isOn;
  public ca.bazlur.Lamp();
  public void turnOn();
  public void turnOff();
  private void printStatus();
  public static void main(java.lang.String[]);
}
Nonetheless, this only prints the member declarations. We are looking for more information, including the bytecode of each method body. This requires another switch, which is -c.
javap -c Lamp
Compiled from "Lamp.java"
public class ca.bazlur.Lamp {
  public ca.bazlur.Lamp();
    Code:
       0: aload_0
       1: invokespecial #1  // Method java/lang/Object."<init>":()V
       4: return

  public void turnOn();
    Code:
       0: aload_0
       1: iconst_1
       2: putfield #7  // Field isOn:Z
       5: aload_0
       6: invokevirtual #13  // Method printStatus:()V
       9: return

  public void turnOff();
    Code:
       0: aload_0
       1: iconst_0
       2: putfield #7  // Field isOn:Z
       5: aload_0
       6: invokevirtual #13  // Method printStatus:()V
       9: return

  public static void main(java.lang.String[]);
    Code:
       0: new #8  // class ca/bazlur/Lamp
       3: dup
       4: invokespecial #36  // Method "<init>":()V
       7: astore_1
       8: aload_1
       9: invokevirtual #37  // Method turnOn:()V
      12: aload_1
      13: invokevirtual #40  // Method turnOff:()V
      16: return
}
Now this becomes much more interesting: we can see the bytecode instructions of each method. If we examine the first instruction of the main method, we see the following:
new #8
The listing contains other operands of the same form, such as #1, #7, and #13. These numbers are indexes into the constant pool. If we wish to view the constant pool, we must use an additional switch, -v.
javap -v Lamp
Classfile /bytecode-simplified/src/main/java/ca/bazlur/Lamp.class
Last modified Aug. 11, 2022; size 1245 bytes
SHA-256 checksum cf727468acdcc0b2dd0a6a858a313110e437e01a6625cf4e03f1f0fa41910dae
Compiled from "Lamp.java"
public class ca.bazlur.Lamp
minor version: 0
major version: 62
flags: (0x0021) ACC_PUBLIC, ACC_SUPER
this_class: #8 // ca/bazlur/Lamp
super_class: #2 // java/lang/Object
interfaces: 0, fields: 1, methods: 5, attributes: 3
Constant pool:
#1 = Methodref #2.#3 // java/lang/Object."<init>":()V
#2 = Class #4 // java/lang/Object
#3 = NameAndType #5:#6 // "<init>":()V
#4 = Utf8 java/lang/Object
#5 = Utf8 <init>
#6 = Utf8 ()V
#7 = Fieldref #8.#9 // ca/bazlur/Lamp.isOn:Z
#8 = Class #10 // ca/bazlur/Lamp
#9 = NameAndType #11:#12 // isOn:Z
#10 = Utf8 ca/bazlur/Lamp
#11 = Utf8 isOn
#12 = Utf8 Z
#13 = Methodref #8.#14 // ca/bazlur/Lamp.printStatus:()V
#14 = NameAndType #15:#6 // printStatus:()V
#15 = Utf8 printStatus
#16 = Fieldref #17.#18 // java/lang/System.out:Ljava/io/PrintStream;
#17 = Class #19 // java/lang/System
#18 = NameAndType #20:#21 // out:Ljava/io/PrintStream;
#19 = Utf8 java/lang/System
#20 = Utf8 out
#21 = Utf8 Ljava/io/PrintStream;
#22 = String #23 // on
#23 = Utf8 on
#24 = String #25 // off
#25 = Utf8 off
#26 = InvokeDynamic #0:#27 // #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
#27 = NameAndType #28:#29 // makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
#28 = Utf8 makeConcatWithConstants
#29 = Utf8 (Ljava/lang/String;)Ljava/lang/String;
#30 = Methodref #31.#32 // java/io/PrintStream.println:(Ljava/lang/String;)V
#31 = Class #33 // java/io/PrintStream
#32 = NameAndType #34:#35 // println:(Ljava/lang/String;)V
#33 = Utf8 java/io/PrintStream
#34 = Utf8 println
#35 = Utf8 (Ljava/lang/String;)V
#36 = Methodref #8.#3 // ca/bazlur/Lamp."<init>":()V
#37 = Methodref #8.#38 // ca/bazlur/Lamp.turnOn:()V
#38 = NameAndType #39:#6 // turnOn:()V
#39 = Utf8 turnOn
#40 = Methodref #8.#41 // ca/bazlur/Lamp.turnOff:()V
#41 = NameAndType #42:#6 // turnOff:()V
#42 = Utf8 turnOff
#43 = Utf8 Code
#44 = Utf8 LineNumberTable
#45 = Utf8 StackMapTable
#46 = Class #47 // java/lang/String
#47 = Utf8 java/lang/String
#48 = Utf8 main
#49 = Utf8 ([Ljava/lang/String;)V
#50 = Utf8 SourceFile
#51 = Utf8 Lamp.java
The output is quite large, so only the class header and the constant pool are shown here.
The output starts with the minor and major versions of the class file format, which tell us which Java release the class was compiled for; major version 62 corresponds to Java 18. Next come the flags. ACC_PUBLIC is set because this class is a public class. ACC_SUPER was introduced to fix a problem with super invocation (invokespecial), but since Java 8 it has no effect, because the JVM treats every class as if the flag were set. Perhaps it will be removed in the future; in fact, there is a JEP proposal to eliminate it. We will not discuss all of the content of the class file here; rather, let's move on to ConstantPool.
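The version numbers and flags can also be read without javap. The sketch below reads the first eight bytes of a class file (a fixed magic number, then the minor and major versions) and decodes the flag value 0x0021 from the listing. It assumes the class file can be found on the class path or in the runtime image; `ClassFileHeader`, `readHeader`, and `describeFlags` are names invented for this example, and only the two flags discussed here are decoded.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ClassFileHeader {

    /** Reads {magic, minor, major} from the first eight bytes of a class file. */
    static int[] readHeader(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // every class file starts with 0xCAFEBABE
            int minor = data.readUnsignedShort(); // minor version
            int major = data.readUnsignedShort(); // major version, e.g. 52 for Java 8
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Names the two class-level flags discussed in the text, if they are set. */
    static List<String> describeFlags(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & 0x0001) != 0) names.add("ACC_PUBLIC");
        if ((flags & 0x0020) != 0) names.add("ACC_SUPER");
        return names;
    }

    public static void main(String[] args) {
        int[] header = readHeader("java.lang.Object");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
        System.out.println(describeFlags(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```

The major version printed for `java.lang.Object` depends on the JDK running the program, so no fixed output is shown for it.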
ConstantPool
ConstantPool can be considered a table of variable-length entries, each identified by its index. In the JVM specification, the general format of an entry is given as follows:
cp_info {
    u1 tag;
    u1 info[];
}
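Each entry starts with a one-byte tag that identifies its kind, followed by data whose layout depends on that tag. The pool holds numerous kinds of entries, including class names, field names, interface names, strings, numbers, references to classes, fields, or methods, and type descriptors, and every entry is addressed by its index.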
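The tag-then-payload layout can be seen by walking a real constant pool. The following sketch skips the class file header, reads the entry count, and then counts how many entries carry each tag; the payload sizes per tag follow the JVM specification. The names `ConstantPoolTags` and `countTags` are hypothetical, and the code assumes the class file is reachable through the system class loader.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class ConstantPoolTags {

    /** Returns a map from tag value to the number of constant pool entries with that tag. */
    static Map<Integer, Integer> countTags(String className) {
        String resource = className.replace('.', '/') + ".class";
        Map<Integer, Integer> counts = new TreeMap<>();
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(new BufferedInputStream(in));
            data.readInt();                       // magic
            data.readUnsignedShort();             // minor version
            data.readUnsignedShort();             // major version
            int count = data.readUnsignedShort(); // constant_pool_count; valid indexes are 1..count-1
            for (int index = 1; index < count; index++) {
                int tag = data.readUnsignedByte(); // the "u1 tag" of cp_info
                counts.merge(tag, 1, Integer::sum);
                switch (tag) {                     // the "u1 info[]" part depends on the tag
                    case 1:                        // Utf8: u2 length + bytes
                        data.readUTF();
                        break;
                    case 3: case 4:                // Integer, Float: 4 bytes
                        data.readInt();
                        break;
                    case 5: case 6:                // Long, Double: 8 bytes, occupying two indexes
                        data.readLong();
                        index++;
                        break;
                    case 7: case 8: case 16: case 19: case 20: // one u2 index
                        data.readUnsignedShort();
                        break;
                    case 9: case 10: case 11: case 12: case 17: case 18: // two u2 indexes
                        data.readInt();
                        break;
                    case 15:                       // MethodHandle: u1 kind + u2 index
                        data.readUnsignedByte();
                        data.readUnsignedShort();
                        break;
                    default:
                        throw new IllegalStateException("Unknown constant pool tag: " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tag 1 is Utf8, 7 is Class, 10 is Methodref, 12 is NameAndType.
        System.out.println(countTags("java.lang.Object"));
    }
}
```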
For instance, the first entry, #1, is a Methodref, which is composed of entries #2 and #3. Entry #2 is a Class entry whose content is #4. Finally, entry #4 is a Utf8 value that is essentially a string, namely java/lang/Object.
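This chain of indexes can be followed mechanically. The sketch below hand-copies entries #1 to #6 from the listing into a small map and resolves #1 recursively. It is a simplified model, not the JVM's actual data structure; the `Entry` class and the `resolve` method are invented for the example, and the output omits the quotes that javap puts around `<init>`.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantPoolModel {

    /** A simplified entry: a kind plus either literal text (Utf8) or up to two indexes. */
    static final class Entry {
        final String kind;
        final String text;
        final int first;
        final int second;

        Entry(String kind, String text, int first, int second) {
            this.kind = kind;
            this.text = text;
            this.first = first;
            this.second = second;
        }
    }

    // Entries #1 to #6, copied by hand from the javap -v listing.
    static final Map<Integer, Entry> POOL = new HashMap<>();
    static {
        POOL.put(1, new Entry("Methodref", null, 2, 3));
        POOL.put(2, new Entry("Class", null, 4, 0));
        POOL.put(3, new Entry("NameAndType", null, 5, 6));
        POOL.put(4, new Entry("Utf8", "java/lang/Object", 0, 0));
        POOL.put(5, new Entry("Utf8", "<init>", 0, 0));
        POOL.put(6, new Entry("Utf8", "()V", 0, 0));
    }

    /** Follows indexes until only Utf8 text remains. */
    static String resolve(int index) {
        Entry entry = POOL.get(index);
        switch (entry.kind) {
            case "Utf8":
                return entry.text;
            case "Class":
                return resolve(entry.first);
            case "NameAndType":
                return resolve(entry.first) + ":" + resolve(entry.second);
            case "Methodref":
                return resolve(entry.first) + "." + resolve(entry.second);
            default:
                throw new IllegalStateException("Unsupported kind: " + entry.kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(1)); // prints java/lang/Object.<init>:()V
    }
}
```

The resolved string matches the comment javap prints next to #1, which is exactly how those comments are produced: by following the indexes down to the Utf8 entries.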
If you use javap to print the entire class file, you will find strings known as descriptors, also referred to as "type descriptors." These are strings, stored as Utf8 entries in the constant pool, that describe the type of a field or the parameter and return types of a method.
BaseType Character | Type | Interpretation |
---|---|---|
B | byte | Signed byte |
C | char | Unicode character code point in the Basic Multilingual Plane, encoded with UTF-16 |
D | double | Double-precision floating-point value |
F | float | Single-precision floating-point value |
I | int | Integer |
J | long | Long integer |
L ClassName; | reference | An instance of class ClassName |
S | short | Signed short |
Z | boolean | true or false |
[ | reference | One array dimension |
Descriptors are short and concise, particularly for primitive types, but for reference types we must always use the fully qualified name, written with / instead of . as the package separator, for example Ljava/lang/String;.
Let’s see how we read them. For example:
()Ljava/lang/String;
The empty parentheses indicate that this method doesn’t take any parameters. Whatever follows the closing parenthesis is always the return type. So this descriptor represents a method that takes nothing and returns a String; for example, toString().
(I)V
This one takes a single int parameter and returns void. The V doesn’t exist in the table, but it means void. The reason it’s not present in the table is that void is not actually a type; it denotes the absence of a return value.
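Descriptors do not have to be written by hand. As a small sketch, the standard `java.lang.invoke.MethodType` class can produce the descriptor string for a given return type and parameter list; the wrapper class `DescriptorDemo` and its `descriptorOf` helper are just illustrative names.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {

    /** Builds the JVM method descriptor for the given return type and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf(String.class));                // prints ()Ljava/lang/String;
        System.out.println(descriptorOf(void.class, int.class));       // prints (I)V
        System.out.println(descriptorOf(void.class, String[].class));  // prints ([Ljava/lang/String;)V
    }
}
```

The last line is the descriptor of `main`, which appears as entry #49 in the constant pool listing.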
The constant pool includes all the information required to verify a class during class loading.
If you are interested in knowing more about ConstantPool, I would recommend reading the JVM specification.
This is all for today. Next, we will discuss the bytecode catalog and the family of bytecode.
Published at DZone with permission of A N M Bazlur Rahman, DZone MVB. See the original article here.