Reading Java Bytecode – Reverse Engineering

 

In Java, we can read .class files with java.io.DataInputStream and write them with java.io.DataOutputStream. The Java API documentation describes DataInputStream as follows:

 

“A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.”

 

The Java class file is organized around its own data types: u1, u2, and u4. These represent unsigned one-, two-, and four-byte quantities, stored in big-endian order. Since DataInputStream reads primitive data types, we can use its methods to read part or all of a .class file: readByte() reads 8 bits (1 byte), readUnsignedShort() reads 16 bits (2 bytes), and readInt() reads 32 bits (4 bytes). Keep in mind that readByte() and readInt() return signed values, so a u1 above 0x7F or a u4 above 0x7FFFFFFF comes back negative; readUnsignedByte() avoids this for single bytes.
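As a minimal sketch of that mapping, the helpers below wrap the DataInputStream calls so each one returns an unsigned value. The class name UnsignedReads and the helper names readU1, readU2, and readU4 are made up for this illustration, and the sample bytes are hand-written rather than taken from a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class UnsignedReads {
    // u1: one byte, returned as an int in the range 0..255
    static int readU1(DataInputStream in) throws IOException {
        return in.readUnsignedByte();
    }

    // u2: two bytes, big-endian, returned as an int in the range 0..65535
    static int readU2(DataInputStream in) throws IOException {
        return in.readUnsignedShort();
    }

    // u4: four bytes, big-endian; readInt() is signed, so widen to long and mask
    static long readU4(DataInputStream in) throws IOException {
        return in.readInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        // Hand-written sample: a u4, then a u2, then a u1
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x34, (byte) 0xFF};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sample))) {
            System.out.printf("u4 = 0x%X%n", readU4(in)); // u4 = 0xCAFEBABE
            System.out.println("u2 = " + readU2(in));     // u2 = 52
            System.out.println("u1 = " + readU1(in));     // u1 = 255
        }
    }
}
```

Returning a long for u4 is one possible design choice; comparing the raw int against an int literal such as 0xCAFEBABE also works, because both sides carry the same bit pattern.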

 

The structure of the class file is defined as follows:

 

    ClassFile {
        u4 magic;
        u2 minor_version;
        u2 major_version;
        u2 constant_pool_count;
        cp_info constant_pool[constant_pool_count-1];
        u2 access_flags;
        u2 this_class;
        u2 super_class;
        u2 interfaces_count;
        u2 interfaces[interfaces_count];
        u2 fields_count;
        field_info fields[fields_count];
        u2 methods_count;
        method_info methods[methods_count];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

 

Following the class file structure in the Java Virtual Machine Specification, we need to call the following sequence of methods to reach the constant pool:

 

    readInt();            // magic number, defined as 0xCAFEBABE
    readUnsignedShort();  // minor version of the class file format
    readUnsignedShort();  // major version of the class file format
    readUnsignedShort();  // constant_pool_count: number of constant pool entries plus one
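Put together, the four reads might look like the sketch below. The class name ClassFileHeader and its fields are assumptions made for this example, not part of any standard API, and the input is a hand-built ten-byte header rather than a real compiled class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the fixed-size fields that precede the constant pool.
    static ClassFileHeader read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file, magic was 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();       // u2 minor_version
        int major = in.readUnsignedShort();       // u2 major_version
        int cpCount = in.readUnsignedShort();     // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Hand-built header: magic, minor 0, major 52 (the Java 8 format), count 16
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x10};
        ClassFileHeader h = read(new ByteArrayInputStream(header));
        // prints: major=52 minor=0 constant_pool_count=16
        System.out.println("major=" + h.majorVersion + " minor=" + h.minorVersion
                + " constant_pool_count=" + h.constantPoolCount);
    }
}
```

To try it on a real file, pass a java.io.FileInputStream for any compiled class to read() in place of the in-memory bytes.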

 

Now we have the initial contents of the class file and the size of the constant pool. The work begins. Since the constant pool consists of different structures, we must first determine which kind of structure we're dealing with: string constants, class names, field names, and so on. Each structure is identified by one of the following tag values:

 

    Constant Type                  Value
    CONSTANT_Class                 7
    CONSTANT_Fieldref              9
    CONSTANT_Methodref             10
    CONSTANT_InterfaceMethodref    11
    CONSTANT_String                8
    CONSTANT_Integer               3
    CONSTANT_Float                 4
    CONSTANT_Long                  5
    CONSTANT_Double                6
    CONSTANT_NameAndType           12
    CONSTANT_Utf8                  1

 

Each entry is stored in a cp_info structure that begins with a one-byte tag holding one of the above values. We can read the tag with the DataInput method readByte().
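As a rough illustration of the tag lookup, the sketch below reads one tag byte and maps it to the names in the table above. The class name ConstantTags and its methods are hypothetical, and only the tags listed in this post are covered; later versions of the class file format define further tags (15 through 20), which this sketch would report as unknown.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class ConstantTags {
    // Maps a tag value to the constant type name from the table above.
    static String nameOf(int tag) {
        switch (tag) {
            case 1:  return "CONSTANT_Utf8";
            case 3:  return "CONSTANT_Integer";
            case 4:  return "CONSTANT_Float";
            case 5:  return "CONSTANT_Long";
            case 6:  return "CONSTANT_Double";
            case 7:  return "CONSTANT_Class";
            case 8:  return "CONSTANT_String";
            case 9:  return "CONSTANT_Fieldref";
            case 10: return "CONSTANT_Methodref";
            case 11: return "CONSTANT_InterfaceMethodref";
            case 12: return "CONSTANT_NameAndType";
            default: return "unknown (" + tag + ")";
        }
    }

    // Reads the one-byte tag at the current stream position and names it.
    static String readTagName(DataInputStream in) throws IOException {
        return nameOf(in.readUnsignedByte());
    }

    public static void main(String[] args) throws IOException {
        // A single hand-written tag byte, not a full cp_info entry
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[]{7}));
        System.out.println(readTagName(in)); // prints: CONSTANT_Class
    }
}
```

This only identifies the entry; the bytes that follow the tag differ for each constant type and still have to be read before the next tag can be found.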

 

To be continued next week…