Recently, I had a chance to dive deeper into JVM bytecode than I had ever expected. It took me tons of documentation, articles, Stack Overflow topics, and bytecode listings to gain some understanding of the JVM internals. Generating my own bytecode and making a lot of mistakes was also a very insightful experience 😀.
My goal is to bring that knowledge to one place and publish it in small chunks as a beginner-friendly blog series.
Now I’m going to dive right into it. Have fun!
The bytecode
Let’s take a look at the following example class written in Java:
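A minimal sketch of such a class could look like this. It is an assumption pieced together from details discussed later in this post (an increment on line 4 of main and a call to println with an int argument), so the exact source may differ:

```java
class Example {
    public static void main(String[] args) {
        int i = 0;
        i++;                    // line 4: an increment of a local int
        System.out.println(i);
    }
}
```

Running this sketch prints 1.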
Let’s compile it with javac Example.java.
After compilation, an Example.class file gets generated, and it contains the JVM bytecode.
Now we can use the javap tool, which shows the bytecode in a human-readable format:
This is the full output:
When we write a program in Java and compile it, the compiler translates our code into JVM bytecode. If we write our code in Scala or any other JVM language, it will be compiled to the same bytecode format. The bytecode is the “code” that the JVM understands and can execute, so the task of compilers is to translate any code to that standard bytecode format.
Bytecode, by itself, is just an array of bytes, and it’s not readable for us humans. However, the format of bytecode is very strict, and each byte has a meaning. The JVM will only accept the compiled classes that comply with a specific format.
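To get a feel for how strict the format is, here is a small sketch that reads the first eight bytes of a compiled class. Every class file starts with the magic number 0xCAFEBABE, followed by the minor and major version. The class name ClassFileHeader and the choice of java/lang/Object.class as input are just assumptions for illustration; any class file reachable by the system class loader would do.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns {magic, minorVersion, majorVersion} read from the start of a class file. */
    static int[] readHeader(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resourceName);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // 4 bytes, always 0xCAFEBABE
            int minor = data.readUnsignedShort();  // 2 bytes, minor version
            int major = data.readUnsignedShort();  // 2 bytes, major version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/Object.class");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The version numbers depend on the JDK that runs the sketch, but the magic number is the same for every valid class file.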
The javap tool shows the bytecode in a readable way. The bytecode operations are represented with helpful mnemonics, such as iload or istore. The references to the special memory area called the constant pool are complemented with hints about the actual values, for example: // Method java/io/PrintStream.println:(I)V.
If we run javap with the -v flag, we get the verbose output, which includes everything that is written in the class file: the constant pool, the debug information, and so on.
Debug information
The debug information consists of three parts: the source file name, line tables, and variable tables.
Variable tables are only necessary for debugging purposes, so they are generated only when we compile with debug mode enabled. To compile Java code in debug mode, we should use the -g flag:
The source file name and line tables are present by default even when debug mode is off. So you can already see these parts in the bytecode:
The line table is the mapping of line numbers in Java code (or any other language from which the program is compiled) to the bytecode indices. The above example shows the line table of the main method of the Example class.
In our Java code, line 4 contains the following:
The line table maps line 4 of the Java code to index 2 in the code section of the main method:
This instruction increments an integer local variable by one.
Each method has a separate code section in the bytecode, so there is a separate line table for each method including the constructor.
Line tables are not super important for running the application if we’re not debugging it. However, when we get an exception, we can see helpful line numbers in the stack trace. These line numbers are available thanks to the line tables.
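Here is a small sketch of that effect, assuming the class is compiled with the default javac settings so the line table is present. The class name LineNumberDemo is made up for this example. It throws an exception on purpose and reads the line number from the top stack frame:

```java
class LineNumberDemo {

    /** Throws on purpose and returns the source line recorded for the throwing frame. */
    static int lineOfFailure() {
        try {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            StackTraceElement top = e.getStackTrace()[0];
            return top.getLineNumber(); // negative when no line information is available
        }
    }

    public static void main(String[] args) {
        System.out.println("Exception was created at line " + lineOfFailure());
    }
}
```

If the same class were compiled without line tables (for example with -g:none), the stack trace would still list the method, but without a usable line number.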
If we compile our class with debug mode enabled, we will also see the variable tables for each method:
This table contains the variable names (Name), types (Signature), the starting code index where they become available (Start), the length of the bytecode range, counted in code indices, over which they stay available (Length), and the index in the local variable array where the variable is stored (Slot).
Internally, the JVM doesn’t know about our variable names; it stores and accesses the variables by their indices. However, when we debug our application, we ask for the value of a variable with a certain name. For this purpose the JVM needs to know the names too. Also, in order to fetch the value of a certain variable, the JVM needs to know exactly where it’s stored and what its type is.
Constant pool
This whole section describes the constant pool:
The Code section contains the information describing what should be executed and how: what methods should be invoked, in which classes they reside, what constant values should be loaded in memory, and so on. However, certain values, such as class names, method names, and descriptors, take too much space to be embedded in the instructions themselves, so they are put in the Constant pool, and the Code section contains the references to them.
Let’s take a look at the first line:
This is the value under reference #1. We can see that it’s a method reference, but the value is actually constructed of two other references: #5 and #14.
This is #5:
It’s a class name, and the actual value is stored at #20:
Finally, this is the string, and the actual value is: java/lang/Object.
Now, let’s see what is located under #14:
If we go through the labyrinth of #6 and #7 references, we will end up with this value: "<init>":()V.
When we combine #20.#6:#7 together, we will have the following: java/lang/Object."<init>":()V. This is exactly the value that the javap tool nicely provides for us as a hint comment, so now we know where it comes from.
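The chain of references can be followed in code as well. The sketch below is a toy model, not how the JVM actually stores the pool: the ConstantPoolSketch class, its Entry type, and the string tags are all made up for illustration, and the contents of #6 and #7 are inferred from the resolved value.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantPoolSketch {

    /** A simplified entry: a tag plus either literal text (Utf8) or indices of other entries. */
    static final class Entry {
        final String tag;
        final String text;
        final int[] refs;

        Entry(String tag, String text, int... refs) {
            this.tag = tag;
            this.text = text;
            this.refs = refs;
        }
    }

    /** Entries #1, #5, #14 and #20 as described in the text; #6 and #7 are inferred. */
    static Map<Integer, Entry> samplePool() {
        Map<Integer, Entry> pool = new HashMap<>();
        pool.put(1, new Entry("Methodref", null, 5, 14));
        pool.put(5, new Entry("Class", null, 20));
        pool.put(6, new Entry("Utf8", "<init>"));
        pool.put(7, new Entry("Utf8", "()V"));
        pool.put(14, new Entry("NameAndType", null, 6, 7));
        pool.put(20, new Entry("Utf8", "java/lang/Object"));
        return pool;
    }

    /** Follows references recursively until only Utf8 text is left. */
    static String resolve(Map<Integer, Entry> pool, int index) {
        Entry entry = pool.get(index);
        if (entry == null) {
            throw new IllegalArgumentException("No entry at #" + index);
        }
        switch (entry.tag) {
            case "Utf8":
                return entry.text;
            case "Class":
                return resolve(pool, entry.refs[0]);
            case "NameAndType": {
                String name = resolve(pool, entry.refs[0]);
                // Mimic the javap hint, which quotes special names such as <init>.
                String printed = name.startsWith("<") ? "\"" + name + "\"" : name;
                return printed + ":" + resolve(pool, entry.refs[1]);
            }
            case "Methodref":
                return resolve(pool, entry.refs[0]) + "." + resolve(pool, entry.refs[1]);
            default:
                throw new IllegalArgumentException("Unsupported tag: " + entry.tag);
        }
    }

    public static void main(String[] args) {
        System.out.println("#1 = " + resolve(samplePool(), 1));
    }
}
```

Running it prints #1 = java/lang/Object."<init>":()V, the same text as the javap hint comment.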
What does that value actually mean? It is a symbolic reference to a method, and its last part is a method descriptor.
Internally, the JVM represents types and methods with descriptors. java/lang/Object specifies the class where the method is located, and "<init>" is the actual method name. Normally, method names are just the regular names we are used to, but this particular one is the internal name for constructors. The last bit, ()V, is the descriptor: it describes the types that the method accepts and returns. Argument types are written within the parentheses; in this case there are no arguments at all. After the parentheses comes what the method returns: V means that it returns void.
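To play with descriptors, we can let the JDK parse them for us. The sketch below uses java.lang.invoke.MethodType, which can read and produce method descriptor strings. The DescriptorDemo class and its describe helper are made up for this example, and the third descriptor is the usual one for a main method, added here as an extra illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {

    /** Renders a JVM method descriptor in a Java-like form, e.g. "(I)V" becomes "void (int)". */
    static String describe(String descriptor) {
        // A null loader means the system class loader is used to resolve class names.
        MethodType type = MethodType.fromMethodDescriptorString(descriptor, null);
        StringBuilder out = new StringBuilder(type.returnType().getSimpleName()).append(" (");
        for (int i = 0; i < type.parameterCount(); i++) {
            if (i > 0) {
                out.append(", ");
            }
            out.append(type.parameterType(i).getSimpleName());
        }
        return out.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("()V"));                    // void ()
        System.out.println(describe("(I)V"));                   // void (int)
        System.out.println(describe("([Ljava/lang/String;)V")); // void (String[])
        // The other direction: build a descriptor from Java types.
        System.out.println(MethodType.methodType(void.class, int.class).toMethodDescriptorString()); // (I)V
    }
}
```

Here I stands for int, V for void, a leading [ marks an array, and L...; wraps a class name.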
We can see that this value, #1, is referenced in the code section of the constructor:
This is the instruction to invoke the constructor of the java.lang.Object class.
Let’s talk about the Code section in the next part.