In the last post, I introduced what memory looks like in the JVM and pointed out that most mnemonics interact with the operand stack. Now, I am going to introduce bytecode.
Types in JVM
Although Java, a language that runs on the JVM, is known for being strongly and statically typed, the JVM itself keeps little type information. There is no type information on the operand stack: the values on it are primitive values, references, or addresses. In other words, the operand stack has no idea what type of object a reference on it points to.
Each primitive type has its own set of instructions. In mnemonics, the primitive types and object references are denoted as:
- boolean: z
- byte: b
- char: c
- short: s
- int: i
- long: l
- float: f
- double: d
- reference: a
Instructions
Java bytecode is the instruction set of the Java virtual machine. Each instruction is composed of one, or in some cases two, bytes that represent the operation (the opcode), followed by zero or more bytes of parameters. As of January 2017, 205 of the 256 possible one-byte opcodes are assigned, counting the three that are reserved for debuggers and JVM implementations.
Instructions fall into a number of broad groups:
- Load and store (e.g. aload_0, istore)
- Operand stack management (e.g. swap, dup2)
- Arithmetic and logic (e.g. ladd, fcmpl)
- Type conversion (e.g. i2b, d2i)
- Object creation and manipulation (e.g. new, putfield)
- Control transfer (e.g. ifeq, goto)
- Method invocation and return (e.g. invokespecial, areturn)
There are also a few instructions for a number of more specialized tasks such as exception throwing, synchronization, etc.
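To make the "opcode byte followed by zero or more parameter bytes" layout concrete, here is a minimal sketch of a disassembler that knows only five opcodes. The opcode values are the ones assigned in the JVM specification; the class and method names (TinyDisassembler, disassemble) are made up for this post, and everything else about real class files is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it understands only five opcodes.
class TinyDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff;   // opcodes are unsigned bytes
            switch (opcode) {
                case 0x10:                    // bipush is followed by one signed byte
                    out.add("bipush " + code[pc++]);
                    break;
                case 0x1a: out.add("iload_0"); break;
                case 0x1b: out.add("iload_1"); break;
                case 0x60: out.add("iadd");    break;
                case 0xac: out.add("ireturn"); break;
                default:
                    throw new IllegalArgumentException("unknown opcode " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Code of: static int f(int a) { return a + 42; }
        byte[] code = {0x1a, 0x10, 42, 0x60, (byte) 0xac};
        System.out.println(disassemble(code));
    }
}
```

Only bipush consumes an extra byte here; the other four instructions are a single byte each, which is why the whole method body fits in five bytes.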
Load and Store
load means pushing some value onto the operand stack. It has several forms:
- load a value from the local variable array: tload
  t denotes the type and is one of i, l, f, d, a. boolean, byte, char and short are treated as int, so there are 5 opcodes in total.
- load a value from the local variable array at a fixed index: tload_#
  The tload opcodes above take a parameter that specifies the index of the variable in the local variable array, so each instruction occupies 2 bytes. To make bytecode more compact, frequently accessed indices have dedicated opcodes. # ranges from 0 to 3, so there are 20 opcodes in total.
- load a specific constant: tconst_#
  These opcodes load frequently used values. When t is i, # ranges from m1 to 5, denoting the integers from -1 to 5. When t is l, # ranges from 0 to 1. When t is f, # ranges from 0 to 2. When t is d, # ranges from 0 to 1. When t is a, # is null. So there are 15 opcodes in total.
- load a byte constant: bipush
- load a short constant: sipush
- load other constants: ldc, ldc_w, ldc2_w
  ldc loads a constant from the constant pool and takes a one-byte index into the constant pool. ldc_w does the same but takes a two-byte index. ldc2_w loads a long or double constant and also takes a two-byte index.
- load a value from an array: taload
  It operates on an array reference on the operand stack, where t is one of b, c, s, i, l, f, d, a. So there are 8 opcodes in total.
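The constant-loading forms above overlap, so a compiler has to pick one. The following is a sketch of the usual rule for int constants: use the shortest instruction whose range covers the value. The class and method names are hypothetical, and the printed operand after ldc is only illustrative, because a real ldc carries a constant pool index rather than the value itself.

```java
// Hypothetical helper: picks the shortest instruction that pushes an int constant.
class IntConstants {
    static String pushInt(int v) {
        if (v >= -1 && v <= 5) {
            return v == -1 ? "iconst_m1" : "iconst_" + v;   // 1 byte
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return "bipush " + v;                            // opcode + 1 byte
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return "sipush " + v;                            // opcode + 2 bytes
        }
        // Anything larger lives in the constant pool (ldc_w if the index needs two bytes).
        return "ldc " + v;
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 5, 6, 128, 40000}) {
            System.out.println(v + " -> " + pushInt(v));
        }
    }
}
```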
store means popping a value off the operand stack and storing it somewhere else. It has several forms:
- store a value into the local variable array: tstore
  t denotes the type and is one of i, l, f, d, a. So there are 5 opcodes in total.
- store a value into the local variable array at a fixed index: tstore_#
  # ranges from 0 to 3. So there are 20 opcodes in total.
- store a value into an array: tastore
  It operates on an array reference on the operand stack, where t is one of b, c, s, i, l, f, d, a. So there are 8 opcodes in total.
So there are 86 opcodes in this section.
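As a rough illustration of why the short forms exist, the sketch below picks the encoding for loading an int local variable by its index. It assumes the three encodings described in this post: iload_# for indices 0 to 3, iload with a one-byte index, and the wide prefix for 16-bit indices. LocalLoads and its methods are names I made up.

```java
// Hypothetical helper: which form of iload fits a given local variable index?
class LocalLoads {
    static String loadInt(int index) {
        checkIndex(index);
        if (index <= 3)   return "iload_" + index;   // index is baked into the opcode
        if (index <= 255) return "iload " + index;   // one unsigned index byte
        return "wide iload " + index;                // wide prefix, two index bytes
    }

    static int sizeInBytes(int index) {
        checkIndex(index);
        if (index <= 3)   return 1;                  // opcode only
        if (index <= 255) return 2;                  // opcode + index byte
        return 4;                                    // wide + opcode + 2 index bytes
    }

    private static void checkIndex(int index) {
        if (index < 0 || index > 0xffff) {
            throw new IllegalArgumentException("bad local index: " + index);
        }
    }
}
```

The same choice applies to the other types and to the store instructions.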
Operand Stack Management
The opcodes used for operand stack management include:
- discard the top stack operand(s): pop, pop2
- duplicate the top stack operand(s): dup, dup_x1, dup_x2, dup2, dup2_x1, dup2_x2
  Writing the stack with its top on the right, dup turns 1 into 1,1; dup_x1 turns 2,1 into 1,2,1; dup_x2 turns 3,2,1 into 1,3,2,1; dup2 turns 2,1 into 2,1,2,1; dup2_x1 turns 3,2,1 into 2,1,3,2,1; dup2_x2 turns 4,3,2,1 into 2,1,4,3,2,1.
- exchange the top two stack operands: swap
So there are 9 opcodes in this section.
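The shuffles are easier to follow when simulated. Below is a small sketch that models the operand stack with a Deque of Integers and implements dup, dup_x1 and swap. It is only a model: the real operand stack also holds two-slot long and double values, which this ignores. StackOps and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical model of three stack-management opcodes; the deque's head is the stack top.
class StackOps {
    static void dup(Deque<Integer> s) {
        s.push(s.peek());
    }

    static void dupX1(Deque<Integer> s) {
        int v1 = s.pop(), v2 = s.pop();
        s.push(v1);   // copy of the old top goes underneath
        s.push(v2);
        s.push(v1);
    }

    static void swap(Deque<Integer> s) {
        int v1 = s.pop(), v2 = s.pop();
        s.push(v1);
        s.push(v2);
    }

    // Builds a stack from values given bottom first, so the last argument is the top.
    static Deque<Integer> stackOf(int... bottomToTop) {
        Deque<Integer> s = new ArrayDeque<>();
        for (int v : bottomToTop) s.push(v);
        return s;
    }

    // Renders the stack bottom first, so the top is on the right.
    static String show(Deque<Integer> s) {
        List<Integer> items = new ArrayList<>(s);   // iteration starts at the top
        Collections.reverse(items);
        return items.toString();
    }
}
```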
Arithmetic and Logic
Arithmetic operations include addition, subtraction, multiplication, division, remainder, negation, and bit-shifting:
- addition: tadd, where t is one of i, l, f, d.
- subtraction: tsub, where t is one of i, l, f, d.
- multiplication: tmul, where t is one of i, l, f, d.
- division: tdiv, where t is one of i, l, f, d.
- remainder from division: trem, where t is one of i, l, f, d.
- negation: tneg, where t is one of i, l, f, d.
- shifting: ishl, lshl, ishr, lshr, iushr, lushr
Besides these arithmetic operations on stack operands, there is one that increments a local variable in place: iinc. It increments the local variable at a given index by a signed byte constant.
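The difference between the arithmetic and the logical right shift is easiest to see on a negative number. In the sketch below each Java operator compiles to the instruction it is named after; the wrapper class Shifts is only there for illustration.

```java
// Each method body compiles to the shift instruction it is named after.
class Shifts {
    static int ishl(int v, int n)  { return v << n; }    // shift left
    static int ishr(int v, int n)  { return v >> n; }    // arithmetic shift right, keeps the sign
    static int iushr(int v, int n) { return v >>> n; }   // logical shift right, fills with zeros

    public static void main(String[] args) {
        System.out.println(ishr(-8, 1));    // -4
        System.out.println(iushr(-8, 1));   // 2147483644
        System.out.println(ishl(1, 33));    // 2, only the low 5 bits of the distance are used
    }
}
```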
Logic operations include bitwise and, bitwise or, and bitwise xor:
- and: tand, where t is i or l.
- or: tor, where t is i or l.
- xor: txor, where t is i or l.
There are also some comparison operations: lcmp, fcmpl, fcmpg, dcmpl, dcmpg.
lcmp compares two long operands and results in 0 if they are equal, 1 if the first operand is larger, and -1 if the first operand is smaller. f denotes float, and d denotes double. In the JVM, a comparison of floating-point numbers always fails if one of the numbers being compared is NaN, so fcmpg results in 1 when one of the operands is NaN, while fcmpl results in -1. dcmpl and dcmpg behave in the same way.
So there are 42 opcodes in this section.
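The only difference between the l and g variants is the result for NaN. Here is a sketch of the semantics written in plain Java; the class FloatCompare is hypothetical and simply mirrors the rules described above.

```java
// Hypothetical model of the comparison opcodes described above.
class FloatCompare {
    static int lcmp(long a, long b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    static int fcmpl(float a, float b) {
        if (a > b)  return 1;
        if (a == b) return 0;
        return -1;            // a < b, or at least one operand is NaN
    }

    static int fcmpg(float a, float b) {
        if (a < b)  return -1;
        if (a == b) return 0;
        return 1;             // a > b, or at least one operand is NaN
    }
}
```

Having both variants lets a compiler pick the one whose NaN result makes the source-level comparison false, whichever direction the comparison goes.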
Type Conversion
Type conversion operations are for converting primitive values from one type to another.
- conversion between i, l, f, d: t2t'
  t is one of i, l, f, d, and t' is one of the remaining three.
- conversion from i to b, s, and c: i2t
  t is one of b, s, c.
Yes, there is no z here: the JVM has no instructions dedicated to boolean. On the operand stack and in local variables, zs are stored as is, occupying 4 bytes instead of 1 bit of memory.
So there are 15 opcodes in this section.
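A Java cast between primitive types compiles to one of these conversion instructions, so their behaviour can be observed directly from Java. The wrapper class Conversions is hypothetical; the results follow from the narrowing rules of the language.

```java
// Each cast compiles to the conversion instruction the method is named after.
class Conversions {
    static byte i2b(int v)   { return (byte) v; }    // keeps the low 8 bits, sign-extended
    static char i2c(int v)   { return (char) v; }    // keeps the low 16 bits, zero-extended
    static short i2s(int v)  { return (short) v; }   // keeps the low 16 bits, sign-extended
    static int l2i(long v)   { return (int) v; }     // keeps the low 32 bits
    static int d2i(double v) { return (int) v; }     // rounds toward zero, saturates, NaN becomes 0

    public static void main(String[] args) {
        System.out.println(i2b(200));          // -56
        System.out.println((int) i2c(-1));     // 65535
        System.out.println(d2i(1e10));         // 2147483647
    }
}
```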
Object Creation and Manipulation
For non-array objects:
- creation: new
  new only allocates an uninitialized object and pushes a reference to it. To initialize the object, <init> must be invoked on that object reference. new/dup/invokespecial/astore is a common pattern for creating an object and storing it into a local variable.
- manipulation: getstatic, putstatic, getfield, putfield
- type: checkcast, instanceof
For arrays:
- creation: newarray, anewarray, multianewarray
- manipulation: arraylength
So there are 11 opcodes in this section.
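Here is how the new/dup/invokespecial/astore pattern lines up with Java source. The comments show what javac typically emits for each statement; I have left out constant pool indices, and the exact output may vary between compiler versions. The class Creation is just a container for the example.

```java
class Creation {
    static int demo() {
        // new StringBuilder       allocate; push a reference to the uninitialized object
        // dup                     second copy, because <init> consumes one reference
        // invokespecial <init>    run the constructor
        // astore_0                store the remaining reference into local variable 0
        StringBuilder sb = new StringBuilder();

        // iconst_3
        // newarray int
        // astore_1
        int[] xs = new int[3];

        // aload_1
        // arraylength
        // (followed by the call to sb.length() and an iadd)
        return xs.length + sb.length();
    }
}
```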
Control Transfer
The JVM uses gotos to implement control flow:
- goto, goto_w
  go to the instruction at some branch offset.
- jsr/ret, jsr_w
  this group of opcodes was used to implement the finally clause in Java prior to Java 6 and has been deprecated since then.
- switches: tableswitch, lookupswitch
- comparing with 0: ifeq, ifne, iflt, ifge, ifgt, ifle
- comparing 2 ints: if_icmpt
  here t is a condition rather than a type, one of eq, ne, lt, ge, gt, le
- comparing with null: ifnull, ifnonnull
- comparing 2 references: if_acmpeq, if_acmpne
So there are 23 opcodes in this section.
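A plain for loop shows how a conditional branch and a goto combine. The listing in the comment is what javac typically produces for the method below; treat it as an illustration, since offsets and layout can differ between compiler versions. The numbers are bytecode offsets, which is what the branch targets refer to.

```java
class Loops {
    //  0: iconst_0
    //  1: istore_1         sum = 0
    //  2: iconst_0
    //  3: istore_2         i = 0
    //  4: iload_2
    //  5: iload_0
    //  6: if_icmpge 19     leave the loop when i >= n
    //  9: iload_1
    // 10: iload_2
    // 11: iadd
    // 12: istore_1         sum += i
    // 13: iinc 2, 1        i++
    // 16: goto 4           jump back to the loop test
    // 19: iload_1
    // 20: ireturn
    static int sum(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}
```

Note that the source condition i < n becomes the inverted branch if_icmpge: the branch is taken to skip the body, not to enter it.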
Method Invocation and Return
Method invocation includes:
- invokevirtual
  Methods in Java are virtual by default, unless marked final, which means that each Java class is associated with a virtual method table that contains links to the bytecode of each method of the class. The table is inherited from the superclass of a particular class and extended with the new methods of the subclass. invokevirtual enables dynamic binding in the JVM. It is used when the method is called on a class instance rather than through an interface, and the method is not private.
  Since the methods in the vtable are known at compile time, the JVM can remember each method's position in the table, so as to call methods efficiently.
- invokespecial
  invokespecial is for invoking instance initialization methods, private methods, and methods of a specific superclass of the current class. That means there is no dynamic binding: invokespecial always invokes the particular class's version of a method. In Java 8, invokespecial is also used to call default methods via super.
- invokestatic
  invokestatic is used to call class methods, that is, methods declared with the static keyword. There is no need to load a target object reference onto the operand stack. The method is identified by a reference in the constant pool. Only the parameters are passed in, so the first local variable of the method being called is not this.
- invokeinterface
  The differences between invokeinterface and invokevirtual include:
  - invokeinterface does not check the accessibility of the method, since all methods in an interface are declared public up to Java 8. For more information, check this SO question.
  - invokeinterface has a different method lookup process. The method table of an interface can have different offsets in different implementing classes, so invokeinterface has no chance for the style of optimization that invokevirtual allows. For more information, check Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless.
- invokedynamic
  It was introduced with Java 7, originally to support dynamic languages running on the JVM. In Java 8, invokedynamic is used under the hood to implement lambda expressions, as well as serving as a primary dispatch mechanism for dynamic languages on the JVM.
  invokedynamic allows user code to decide which method to call at runtime, instead of naming the target through a constant in the constant pool. java.lang.invoke.MethodHandle represents the methods that invokedynamic can target, and it receives some special treatment from the JVM in order to operate correctly. Method handles are invoked using a polymorphic signature, which the Java compiler creates from the types of the actual arguments and the expected return type at a call site. When an invokedynamic instruction is first encountered, it does not have a known target. A method handle (the bootstrap method) is invoked, which returns a CallSite containing another method handle, which is the actual target of the invokedynamic call.
  Currently, a lambda expression in Java is implemented as follows: the lambda's body is copied into a private method inside the class in which the expression is defined. If the lambda expression makes no use of non-static fields or methods of the enclosing class, the method is also defined to be static. (final fields are directly copied.) The lambda expression itself is replaced by an invokedynamic call site. To bootstrap the call site, the invokedynamic instruction currently delegates to the LambdaMetafactory class, which is responsible for creating a class that implements the functional interface and invokes the method that contains the lambda's body in the original class. Because the method that contains the lambda's body is private, the generated class is loaded using anonymous class loading, so as to receive the host class's full security context.
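The pieces that invokedynamic is built from can also be used directly from Java. The sketch below looks up String.length() as a MethodHandle, wraps it in a ConstantCallSite the way a bootstrap method would, and calls it through invokeExact. It uses only the java.lang.invoke API; the class HandleDemo and its method are hypothetical, and no invokedynamic instruction is actually emitted for this code.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleDemo {
    static int lengthViaCallSite(String s) {
        try {
            // A handle for String.length(); its type is (String)int.
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // A bootstrap method would hand back a CallSite such as this one.
            CallSite site = new ConstantCallSite(length);
            // invokeExact has a polymorphic signature: the compiler derives (String)int
            // from the static type of the argument and the cast applied to the result.
            return (int) site.dynamicInvoker().invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

If the cast were (long) or the argument were typed as Object, invokeExact would fail at runtime, because the derived signature would no longer match the handle's type exactly.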
Return includes: ireturn, lreturn, freturn, dreturn, areturn, return. The plain return is for returning void from a method invocation.
So there are 11 opcodes in this section.
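To tie the five invoke instructions to source code, here is a small example where each call is annotated with the instruction javac typically picks for it. The types (Shape, Polygon, Square, Invocations) are invented for the example, and the annotations describe a Java 8 style compilation; newer compilers may differ in details.

```java
import java.util.function.IntSupplier;

interface Shape {
    int sides();
}

class Polygon implements Shape {
    private final int n;
    Polygon(int n) { this.n = n; }
    public int sides() { return n; }
    public String toString() { return "Polygon(" + n + ")"; }
}

class Square extends Polygon {
    Square() { super(4); }                        // invokespecial Polygon.<init>
    public String toString() {
        return "Square/" + super.toString();      // invokespecial Polygon.toString
    }
}

class Invocations {
    static int twice(int x) { return 2 * x; }

    static String demo() {
        Polygon p = new Square();                 // new, dup, invokespecial Square.<init>
        Shape s = p;
        int a = p.sides();                        // invokevirtual Polygon.sides
        int b = s.sides();                        // invokeinterface Shape.sides
        int c = twice(a);                         // invokestatic Invocations.twice
        IntSupplier f = () -> 1;                  // invokedynamic, bootstrapped by LambdaMetafactory
        int d = f.getAsInt();                     // invokeinterface IntSupplier.getAsInt
        return p + ":" + a + "," + b + "," + c + "," + d;
    }
}
```

The same object p is called through invokevirtual and invokeinterface; which one is used depends only on the static type of the reference at the call site.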
Others
Opcodes for other specific tasks include:
- nop
  perform no operation
- monitor: monitorenter/monitorexit
- athrow
  throws an error or exception; the rest of the stack is cleared, leaving only a reference to the Throwable.
- wide
  execute opcode, where opcode is tload, tstore, or ret, but assume the index is 16 bits; or execute iinc, where the index is 16 bits and the constant to increment by is a signed 16-bit short.
- reserved for debuggers and JVM implementations: breakpoint, impdep1, impdep2
So there are 8 opcodes in this section.
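Two of these show up in everyday Java: a synchronized block is bracketed by monitorenter and monitorexit, and a throw statement ends in athrow. A minimal example, with the typical instructions noted in comments (the class Others is hypothetical):

```java
class Others {
    private static final Object LOCK = new Object();
    private static int counter;

    static int increment() {
        synchronized (LOCK) {      // monitorenter
            return ++counter;      // monitorexit runs on every way out, including exceptions
        }
    }

    static String thrower() {
        try {
            // new, dup, ldc "boom", invokespecial <init>, athrow
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            // The handler starts with only the Throwable reference on the operand stack.
            return e.getMessage();
        }
    }
}
```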
Class File
A compiled class file consists of the following structure:
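As a partial illustration only, the sketch below reads the first four fields that every class file starts with according to the JVM specification: the magic number 0xCAFEBABE, the minor and major version, and the constant pool count. It stops there and does not parse the constant pool or anything after it. ClassHeader and describe are names I made up.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper that reads only the first 10 bytes of a class file.
class ClassHeader {
    static String describe(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();                // u4 magic
            int minor = in.readUnsignedShort();      // u2 minor_version
            int major = in.readUnsignedShort();      // u2 major_version, e.g. 52 for Java 8
            int cpCount = in.readUnsignedShort();    // u2 constant_pool_count
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            return "version " + major + "." + minor + ", constant_pool_count " + cpCount;
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated class file", e);
        }
    }
}
```

All multi-byte values in a class file are big-endian, which is why DataInputStream can read them without any byte swapping.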
To decompile the classfile, use
To view the assembler code, use
Loading, Linking and Initialization
The JVM starts up by loading an initial class using the bootstrap classloader. The class is then linked and initialized before main is invoked. The execution of this method will in turn drive further loading, linking and initialization as required.
- Loading
  Load the byte array of the class definition. Any class or interface named as a direct superclass is also loaded.
- Linking, which consists of three steps: verifying, preparing, and optionally resolving.
  - verifying confirms that the representation is structurally correct and obeys the semantic requirements.
  - preparing allocates the memory for static storage and any data structures used by the JVM, such as method tables. Static fields are created and initialized to their default values. No initializers or code are executed at this stage, as that happens as part of initialization.
  - resolving checks symbolic references by loading the referenced classes or interfaces and checks that the references are correct. If this does not take place at this point, the resolution of symbolic references can be deferred until just prior to their use by a bytecode instruction.
- Initialization, which executes the initialization method <clinit>.
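The gap between loading and initialization can be observed from Java. In the sketch below, Class.forName is called with initialize set to false, so the nested class is loaded but its <clinit> does not run until a static field is first read. InitOrder and Lazy are hypothetical names, and the result shown assumes Lazy has not been touched earlier in the same JVM.

```java
class InitOrder {
    static final StringBuilder LOG = new StringBuilder();

    static class Lazy {
        static int value = 42;                      // assigned inside <clinit>
        static { LOG.append("initialized;"); }      // also part of <clinit>
    }

    static String run() {
        try {
            // initialize = false: load the class without running <clinit>
            Class.forName(InitOrder.class.getName() + "$Lazy", false,
                    InitOrder.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        LOG.append("loaded;");
        int v = Lazy.value;                         // getstatic triggers initialization
        LOG.append("read " + v + ";");
        return LOG.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());                  // loaded;initialized;read 42;
    }
}
```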