Review of pipeline stages

We looked at a simplified model of a CPU pipeline with the stages

This pipeline model was used on the MIPS architecture, in which “normal” instructions do not access memory; only the Load and Store instructions are allowed to access memory, and those instructions do nothing else. Hence, for Load/Store, the EX stage is used to compute the effective address (as on x86, memory operands may use a base register + constant displacement, requiring some arithmetic to find the actual address).

For x86, the pipeline is much more complex, with many more stages. At a minimum, we would need to add another memory-access stage, before the EX stage, because instructions can use a memory operand as an input, or an output (but usually not both). Thus, our pipeline might look like

IF ---> ID ---> MemIn ---> EX ---> MemOut ---> WB

Note that the CPU does not necessarily need to have two memory units; it can have a single unit, and the ID stage simply configures the CPU so that it is used either at the before, or after, the EX stage.

On MIPS, the IF stage is relatively simple: every instruction is 32-bits wide, so excluding jumps, the next instruction is simply the next 32-bits after the current one. Similarly, as we’ll see, the MIPS instruction format always stores the actual operation to be performed in the high 6 bits of the instruction, so the ID stage is mostly a matter of looking these bits up in a table mapping operations to CPU configuration flags.

On x86, different instructions have different sizes, different formats (the operation is not always in the high/low n bits of the instruction), and can have a number of optional prefixes which further modify the instruction. All of this serves to complicate the IF and ID stages.

Instruction formats

The instruction format of an instruction set specifies how instructions are encoded as bytes. That is, how do we get from mov rax, rbx to … ? The instruction format determines the behavior and complexity of the instruction fetch (IF) and decode (ID) pipeline stages.

An IF can be either fixed-size or variable-size, depending on whether every instruction encodes to the same number of bytes, or a variable number. There are advantages and disadvantages to each

Microcode

To simplify the design of variable-size instruction sets, these opcodes are often decoded by the CPU into short “programs” of microcode. The CPU does not directly run the opcodes, only microcode. This allows the microcode to be much simpler (fixed-size, directly decoded), since a single “macrocode” instruction can decode into a sequence a microcode instructions. Thus, the complexity of decoding is front-loaded into the IF and ID stages, and the remainder of the pipeline can be relatively simple.

MIPS32 opcode format

MIPS32 (the architecture whose pipeline design we’ve already looked at) uses fixed-size instructions, 32-bits each. There are three different instruction formats, although all three use the high 6 bits for the opcode (thus, determining which instruction format is in use is easy). The instruction formats are referred to as R, I, and J.

The R instruction format

The R instruction format has fields for three registers (typically, two sources and a destination), as well a shift amount (5 bits) and a function (6 bits). It is used for arithmetic/bitwise instructions which do not have an immediate operand.

Opcode = 000000RSRTRDSh.Amt.Func.
6 bits5 5 5 5 6

The Function field specifies the actual arithmetic function to be applied to the operands given by the RS, RT (sources) and RD (destination) fields. For example, function 32 (100000b) is addition. The Left/Right Shift instructions use the shift amount field to specify the amount to shift.

Note that all R-instructions have a 0 opcode field.

The I instruction format

The I instruction format contains fields for two registers (typically source and destination) and for a 16-bit immediate value. The I format is used for arithmetic operations with an immediate operand, and also for Load/Store instructions in which a base+displacement memory addressing scheme is used (the displacement is stored in the immediate field and is limited to 16-bits, signed).

OpcodeRS RD Immediate/Address
6 bits5 bits5 bits16 bits

The Reg1 and Reg2 fields encode the source and destination registers (MIPS has 32 registers = 25), while the Immediate field encodes any immediate value.

The J instruction format

The J format is used for the Jump instruction, which jumps to an absolute address. Because instructions must be aligned to 32-bits, the low 3 bits of every valid address are always 0. Thus, we can jump to any one of 232-6+3 addresses; the currently-executing address is used to supply the missing 3 bits.

OpcodeAddress
6 bits26 bits

The only instructions which use the J format are Jump and Jump-and-Link, which have opcodes 0x2 and 0x3.

Examples

Here we look at some examples of MIPS instructions and decode them.

Example 1

00000000001000100001100000100000 = 0x00221820

Because the high 6 bits are 0, this is an arithmetic operation, using the R encoding, so we break apart the instruction as

00000000001000100001100000100000
OpcodeRs Rt Rd Sh. Amt.Operation

The two source registers are r1 and r2; the destination register is r3. The operation is 32, which corresponds to the add instruction, so this decodes into

add r3, r1, r2

Example 2

00001000110110101100010000100110

The opcode (high 6 bits) is 2, which indicates a Jump instruction, decoded as

00001000110110101100010000100110
OpcodeAddress

This corresponds to a Jump with an address of 0xDAC426 (the address will be extended to 32-bits using the current contents of the instruction pointer).

Example 3

001000 00001 00010 0000000000100101

The opcode is 8, which is neither a R-instruction nor a J-instruction, so it must use the I-format:

00100000001000100000000000100101
OpcodeRs Rd Immediate

Instruction 8 is Add-with-Immediate, this corresponds to the instruction

addi r2, r1, 37

x86-64 opcode format

The x86 instruction format is described in volume 2A of the Intel x86 reference manuals.

The basic instruction format looks like this:

Field PrefixesOpcodeModR/MSIBDisplacementImmediate
Size (bytes)1-4 1,2,3 0,1 0,10,1,2,4,8 0,1,2,4,8

The maximum instruction size is limited to 15 bytes; it’s possible to construct theoretical instructions larger than this, but they are illegal. Note that a few instructions allow an 8-byte immediate, or an 8-byte address, but obviously never both at the same time.

Prefixes

Prefixes are used to modify instructions. For example, the repetition prefixes rep, repne and repe are all prefixes: they apply to the existing string instructions. There are four groups of prefixes, 1-4, and only one prefix from each group can be present. The groups can be given in any order, except that the REX prefix, if present must be the last prefix (i.e., should be immediately before the opcode). Because all 64-bit instructions use the REX prefix, this means that every instruction we’ve used has consisted of at least REX followed by the opcode.

Some prefixes are mandatory for certain instructions, indicating that the prefix byte must be present for those instructions, although it does not have its normal meaning. When a mandatory prefix is present, it must be placed immediately before the opcode (or, before the REX prefix if used). Mandatory prefixes are often used to “extend” the space of available opcodes, by reusing an existing opcode for a different purpose.

The prefix groups are

  1. Locking and repetition prefixes. Locking is used to indicate that an instruction should be executed atomically. The repetition prefixes have values 0xF2 (repne) and 0xF3 (repe/rep). These prefixes are mandatory for a few unrelated instructions.

  2. Segment override prefixes. These prefixes indicate that the instruction should use the segment registers (if segmentation is enabled) differently from their normal interpretation.

    Group 2 also contains the branch hint prefixes: these are applied to condition jump instructions and give a “hint” to the CPU as to whether the jump is likely, or not likely, to be taken. 0x2e indicates unlikely, while 0x3e indicates likely.

  3. Operand-size override, encoded as 0x66. This switches between 16- (default) and 32-bit (if prefix is used) sized operands. Mandatory for some SSE instructions.

  4. Address-size override, encoded as 0x67. Switches between 16- and 32-bit address sizes. The default is set by the current processor mode; if the prefix is present the non-default size is select.

ModR/M and SIB fields

The ModR/M and SIB fields, when present, are used to specify how memory operands “work”. Remember that a memory operand can consist of a (constant) displacement, two registers (base and offset), and a scale (0,1,2,4, or 8). The configuration must be encoded in the ModR/M and SIB bytes.

ModR/M is broken down as

FieldModReg/OpcodeR/M
Bits7,65,4,3 2,1,0

The three bits of the Reg and R/M fields, when interpreted as a register, have the following decoding:

BitsDecimalRegister
000 0 rax
001 1 rcx
010 2 rdx
011 3 rbx
100 4 rsp
101 5 rbp
110 6 rsi
111 7 rdi

The encoding of ModR/M is quite complex, as it is used to specify which register(s) to use in an instruction, and thus has different interpretations for normal instructions, XMM instructions, etc.

SIB is broken down as

FieldScaleIndexBase
Bits 7,6 5,4,32,1,0

In some memory addressing modes, these are interpreted differently.

REX prefixes

The REX prefix is not part of the above four groups; REX is required for any instruction which uses either one of the extended 64-bit registers (r8-r15, or the xmm registers) or uses a 64-bit sized immediate or address. If present, the REX prefix must occur immediately before the opcode (and immediately after any mandatory prefixes). The REX prefix values are actually valid opcodes; they are interpreted as REX only in 64-bit mode. For example, in 32-bit mode, the REX prefix would be interpreted as an inc instruction. (This further implies that inc must have a different encoding in 64-bit mode!)

There are 16 different REX prefixes, all encoded as 0x40-0x4f. REX is encoded as

Field0100WRXB
Bits7,6,5,43210

The REX prefix is also used for mov instructions with 64-bit immediate operands; the default operand size, even in 64-bit mode, is 32-bits.

All of these have to do with extending existing instructions to support 64-bit operation.

(Intel’s 256- and 512-bit vector instructions use variations on the REX prefix called VEX and EVEX. The remainder of the instruction format is the same.)

Opcode

The opcode itself can be between 1 and 3 bytes.

Examples

Here we look at some example instructions (taken from our programs) and decode them. (The fact that all the fields of the opcode lie on byte boundaries makes this easier than decoding MIPS instructions.)

If you want to see what the decoding of an instruction is, write a simple assembly file containing just the section .text and the instruction, e.g.,

; Save as decode.s

section .text
mov rax, byte [100]     ; Mov from memory

Assemble this to an object file with

yasm -f elf64 -o decode.o decode.s -l decode.lst

and then view the listing file decode.lst.

Example 1

add rax, rbx

This instruction uses 64-bit registers, so it will need the REX.W prefix byte:

01001000
    WRXB

The opcode for the add instruction varies depending on the register used, and whether or not an immediate operand is present. For two 64-bit registers, neither of them in r8-r15, the opcode is 0x03 =

00000011

With two 64-bit register operands, the operands are encoded in the ModR/M byte, the first (destination) operand in the Reg field, and the second in the R/M field. The encoding of rax as a register number is 000b, and rbx is 011. The Mod field is 11, because both operands are registers. Thus, the ModR/M byte is

11 011 000

(Note that the order of the operands is the reverse of that in the decoded instruction: rbx is in the Reg field, while R/M contains the destination register rax.)

Assembling these together gives a three-byte instruction:

0100 1000  00000011  11011000
REX  W     Opcode    ModR/M

or 0x48 01 D8 in hex.

Example 2

add r10, r11

This example uses two of the registers that are only available in 64-bit mode. This will require using the bits of the REX prefix to extend the register indexes in the ModR/M byte:

0100 1101
REX  WRXB

R = 1 will be used as an extra high bit for the Reg field, while B = 1 will be used as an extra high bit in the R/M field.

As above, the destination register is r10 (= register number 10), which should be placed in the R/M field. In order to encode 10 in binary, we need four bits (10 decimal = 1010 binary), so the high bit is set as REX.B in the
prefix, and the remaining three bits 010 are placed in R/M. Similarly, 11 decimal is 1011 binary, so we place a 1 in REX.R, and then the remaining three bits in the Reg field. This gives us a ModR/M byte of

11 011 010
M  Reg R/M

Finally, add, for two 64-bit register operands, where the destination is an an extended 64-bit register, uses opcode 1 (= 00000001b). This leaves us with the final encoding of

0100 1101  00000001  11 011 010
REX  WRXB  Opcode    M  Reg R/M

Example 3

add r10, 17

This is addition, with an extended 64-bit register as the destination, of an immediate. The immediate is small enough that it can be encoded in a single byte, 17 = 00010001b.

The destination register, r10, requires setting the high bit in REX.B, and 010 in R/M. Immediate-Register operations also use a Mod = 11b, so our ModR/M byte is

11 000 010
Mo Reg R/M

and REX is

0100 1001
REX  WRXB

Finally, the opcode for add-with-immediate is 0x83 (= 131 decimal, 10000011b), giving a decoding of

0100 1001  10000011  11 000 010  00010001
REX  WRXB  Opcode    Mo Reg R/M  Imm.

Example 4

add eax, 10

Here, we are not using any of the 64-bit registers, so no REX prefix is needed. Addition with immediate uses opcode 0x83. The destination register is specified using the ModR/M byte (rax = register 0), giving a decoding

10000011  11 000 000  00001010
Opcode    Mo Reg R/M  Imm.

Example 5

scasb

This instruction does not reference any operands, 64-bit or otherwise, so it can be represented without a prefix or ModR/M byte. The opcode for scasb is 0xAE = 10101110, which in turn is the entire decoding:

10101110
Opcode

Example 6

repe scasq

Here, we have made two changes: upgrade the instruction size to 64-bits (requiring the REX prefix) and add the repe prefix. The REX prefix must come immediately before the opcode (0xAF = 10101111; 0xAF is the opcode for scas larger than a byte). The repe prefix is 0xF3 = 11110011b. Thus, we have

11110011  0100 1000  10101111
repe      REX  W     Opcode

Example 7

syscall

Syscall is a two-byte instruction, starting with 0x0F. Its primary opcode is 0x05, giving an encoding of

00001111 00000101

Example 8

lea rax, [rbx + 2*rcx]

As usual, we need the REX.W prefix = 01001000. The opcode for lea is 10001101b = 0x8D. Because we are using a memory operand with both base and offset (index) registers, we will need both the ModR/M and SIB bytes.

This leads to a ModR/M of

00 000 100
Mo Reg R/M

and an SIB of

01 001 011
Sc Ind Bas

for a final encoding of

0100 1000  10001101 00 000 100  01 001 011
REX  W     Opcode   Mo Reg R/M  Sc Ind Bas

Example 9

add rax, qword [10]

This is the most complex example, as it uses a 64-bit register, and a qword-sized memory operand.

The opcode is still 3, for addition into a 64-bit register, from either a register or memory source. The destination register will be specified via the Reg field (rax = register number 0). The Mod and R/M fields combine to specify the addressing mode, which here is just displacement (no base, offset, or scale present). The encoding for this is Mod = 0, R/M = 100b.

0100 1000  00000011  00 000 100  00 100 101  00001010  00000000  00000000  00000000
REX  W     Opcode    Mo Reg R/M  Sc Ind Bas  Address   
                                    rsp rbp

The extra 0 bytes on the end are padding bytes, added to make the instruction have a size of 8.