Review of pipeline stages
We looked at a simplified model of a CPU pipeline with the stages
Instruction fetch – Load the next instruction from memory
Instruction decode – transform the bits of the opcode into the CPU configuration necessary to execute it.
Execute – Perform artihmetic operations
Memory – Access memory
Writeback – write results back into destination register
This pipeline model was used on the MIPS architecture, in which “normal” instructions do not access memory; only the Load and Store instructions are allowed to access memory, and those instructions do nothing else. Hence, for Load/Store, the EX stage is used to compute the effective address (as on x86, memory operands may use a base register + constant displacement, requiring some arithmetic to find the actual address).
For x86, the pipeline is much more complex, with many more stages. At a minimum, we would need to add another memory-access stage, before the EX stage, because instructions can use a memory operand as an input, or an output (but usually not both). Thus, our pipeline might look like
IF ---> ID ---> MemIn ---> EX ---> MemOut ---> WB
Note that the CPU does not necessarily need to have two memory units; it can have a single unit, and the ID stage simply configures the CPU so that it is used either at the before, or after, the EX stage.
On MIPS, the IF stage is relatively simple: every instruction is 32-bits wide, so excluding jumps, the next instruction is simply the next 32-bits after the current one. Similarly, as we’ll see, the MIPS instruction format always stores the actual operation to be performed in the high 6 bits of the instruction, so the ID stage is mostly a matter of looking these bits up in a table mapping operations to CPU configuration flags.
On x86, different instructions have different sizes, different formats (the operation is not always in the high/low n bits of the instruction), and can have a number of optional prefixes which further modify the instruction. All of this serves to complicate the IF and ID stages.
Instruction formats
The instruction format of an instruction set specifies how instructions are
encoded as bytes. That is, how do we get from mov rax, rbx
to … ? The
instruction format determines the behavior and complexity of the instruction
fetch (IF) and decode (ID) pipeline stages.
An IF can be either fixed-size or variable-size, depending on whether every instruction encodes to the same number of bytes, or a variable number. There are advantages and disadvantages to each
Fixed-size makes fetching instructions easy: just get the next n bytes.
Fixed-size enables a decoding optimization known as direct encoding, in which the bits of the opcode’s value are mapped directly to various CPU control flags. E.g., one bit in the opcode might specify whether the Ex pipeline stage is enabled. Four bits might specify an index into the register file for the source register. This implies that the Decode pipeline stage is a no-op: once we have loaded the instruction, it is already decoded.
Fixed-size does not allow us to take advantage of Huffman encoding. Huffman encoding says that more-frequently-used opcodes should be shorter, and less-frequently-used opcodes longer, which leads to smaller program sizes overall.
When direct encoding is used, many opcodes are invalid (imply invalid combinations of CPU configurations); these cannot be reused in the future for additional instructions. Thus, although in theory 232 instruction values are available, the actual number of possible valid instructions may be much smaller. Indeed, with a fixed-size encoding, once the “encoding space” of instructions is exhausted, either with valid or invalid instructions, the only way to add additional instructions is to globally increase the instruction size (e.g., boost every instruction from 32-bit to, say, 48-bit).
Instruction fetch and decode are inextricably mixed with variable-width instruction formats. We cannot know how long the instruction is until we have decoded at least part of it. Ideally, the low byte of the instruction contains enough information to know the full instruction size, so that the process is: Fetch 1 byte, Decode, Fetch remaining bytes, Decode. Often, however, this is not the case, and the IF and ID proceed in a loop-like fashion: fetch the beginning of the instruction, decode it to determine how much more to fetch, then fetch more, decode, and so forth, until the entire instruction has been fetched and decoded.
Decoding variable-size instructions, even if the entire instruction is available (it isn’t!) is more complex. Direct encoding is almost impossible. The relevant parts of the instruction may be in different places for different instructions. As we’ll see below, MIPS has only three different instruction formats, and the operation is always in the same place in all three. x86, in contrast, has
Extending a variable-size instruction format with new instructions is relatively easy.
Microcode
To simplify the design of variable-size instruction sets, these opcodes are often decoded by the CPU into short “programs” of microcode. The CPU does not directly run the opcodes, only microcode. This allows the microcode to be much simpler (fixed-size, directly decoded), since a single “macrocode” instruction can decode into a sequence a microcode instructions. Thus, the complexity of decoding is front-loaded into the IF and ID stages, and the remainder of the pipeline can be relatively simple.
MIPS32 opcode format
MIPS32 (the architecture whose pipeline design we’ve already looked at) uses fixed-size instructions, 32-bits each. There are three different instruction formats, although all three use the high 6 bits for the opcode (thus, determining which instruction format is in use is easy). The instruction formats are referred to as R, I, and J.
The R instruction format
The R instruction format has fields for three registers (typically, two sources and a destination), as well a shift amount (5 bits) and a function (6 bits). It is used for arithmetic/bitwise instructions which do not have an immediate operand.
Opcode = 000000 | RS | RT | RD | Sh.Amt. | Func. |
6 bits | 5 | 5 | 5 | 5 | 6 |
The Function field specifies the actual arithmetic function to be applied to
the operands given by the RS, RT (sources) and RD (destination) fields. For
example, function 32 (100000b
) is addition. The Left/Right Shift instructions
use the shift amount field to specify the amount to shift.
Note that all R-instructions have a 0 opcode field.
The I instruction format
The I instruction format contains fields for two registers (typically source and destination) and for a 16-bit immediate value. The I format is used for arithmetic operations with an immediate operand, and also for Load/Store instructions in which a base+displacement memory addressing scheme is used (the displacement is stored in the immediate field and is limited to 16-bits, signed).
Opcode | RS | RD | Immediate/Address | |
6 bits | 5 bits | 5 bits | 16 bits |
The Reg1 and Reg2 fields encode the source and destination registers (MIPS has 32 registers = 25), while the Immediate field encodes any immediate value.
The J instruction format
The J format is used for the Jump instruction, which jumps to an absolute address. Because instructions must be aligned to 32-bits, the low 3 bits of every valid address are always 0. Thus, we can jump to any one of 232-6+3 addresses; the currently-executing address is used to supply the missing 3 bits.
Opcode | Address | |||
6 bits | 26 bits |
The only instructions which use the J format are Jump and Jump-and-Link, which
have opcodes 0x2
and 0x3
.
Examples
Here we look at some examples of MIPS instructions and decode them.
Example 1
00000000001000100001100000100000 = 0x00221820
Because the high 6 bits are 0, this is an arithmetic operation, using the R encoding, so we break apart the instruction as
000000 | 00001 | 00010 | 00011 | 00000 | 100000 |
Opcode | Rs | Rt | Rd | Sh. Amt. | Operation |
The two source registers are r1
and r2
; the destination register is
r3
. The operation is 32, which corresponds to the add
instruction, so
this decodes into
add r3, r1, r2
Example 2
00001000110110101100010000100110
The opcode (high 6 bits) is 2, which indicates a Jump instruction, decoded as
000010 | 00110110101100010000100110 | |||
Opcode | Address |
This corresponds to a Jump with an address of 0xDAC426
(the address will be
extended to 32-bits using the current contents of the instruction pointer).
Example 3
001000 00001 00010 0000000000100101
The opcode is 8, which is neither a R-instruction nor a J-instruction, so it must use the I-format:
001000 | 00001 | 00010 | 0000000000100101 | |
Opcode | Rs | Rd | Immediate |
Instruction 8 is Add-with-Immediate, this corresponds to the instruction
addi r2, r1, 37
x86-64 opcode format
The x86 instruction format is described in volume 2A of the Intel x86 reference manuals.
The basic instruction format looks like this:
Field | Prefixes | Opcode | ModR/M | SIB | Displacement | Immediate |
---|---|---|---|---|---|---|
Size (bytes) | 1-4 | 1,2,3 | 0,1 | 0,1 | 0,1,2,4,8 | 0,1,2,4,8 |
The maximum instruction size is limited to 15 bytes; it’s possible to construct theoretical instructions larger than this, but they are illegal. Note that a few instructions allow an 8-byte immediate, or an 8-byte address, but obviously never both at the same time.
Prefixes
Prefixes are used to modify instructions. For example, the repetition
prefixes rep
, repne
and repe
are all prefixes: they apply to the
existing string instructions. There are four groups of prefixes, 1-4, and
only one prefix from each group can be present. The groups can be given in
any order, except that the REX prefix, if present must be the last prefix
(i.e., should be immediately before the opcode). Because all 64-bit
instructions use the REX prefix, this means that every instruction we’ve used
has consisted of at least REX followed by the opcode.
Some prefixes are mandatory for certain instructions, indicating that the prefix byte must be present for those instructions, although it does not have its normal meaning. When a mandatory prefix is present, it must be placed immediately before the opcode (or, before the REX prefix if used). Mandatory prefixes are often used to “extend” the space of available opcodes, by reusing an existing opcode for a different purpose.
The prefix groups are
Locking and repetition prefixes. Locking is used to indicate that an instruction should be executed atomically. The repetition prefixes have values 0xF2 (
repne
) and 0xF3 (repe/rep
). These prefixes are mandatory for a few unrelated instructions.Segment override prefixes. These prefixes indicate that the instruction should use the segment registers (if segmentation is enabled) differently from their normal interpretation.
Group 2 also contains the branch hint prefixes: these are applied to condition jump instructions and give a “hint” to the CPU as to whether the jump is likely, or not likely, to be taken.
0x2e
indicates unlikely, while0x3e
indicates likely.Operand-size override, encoded as
0x66
. This switches between 16- (default) and 32-bit (if prefix is used) sized operands. Mandatory for some SSE instructions.Address-size override, encoded as
0x67
. Switches between 16- and 32-bit address sizes. The default is set by the current processor mode; if the prefix is present the non-default size is select.
ModR/M and SIB fields
The ModR/M and SIB fields, when present, are used to specify how memory operands “work”. Remember that a memory operand can consist of a (constant) displacement, two registers (base and offset), and a scale (0,1,2,4, or 8). The configuration must be encoded in the ModR/M and SIB bytes.
ModR/M is broken down as
Field | Mod | Reg/Opcode | R/M |
---|---|---|---|
Bits | 7,6 | 5,4,3 | 2,1,0 |
The Mod and R/M fields together specify one of eight registers and 24 different addressing modes (combinations of displacement, base, offset, scale). For register-register operands, the Mod field will have a value of 3 (
= 11b
).The Reg/Opcode field encodes either a register (base/displacement, or some other use), or else it is an extension to the main opcode.
R/M can either be another register, or an extension to the Mod field.
The three bits of the Reg and R/M fields, when interpreted as a register, have the following decoding:
Bits | Decimal | Register |
---|---|---|
000 | 0 | rax |
001 | 1 | rcx |
010 | 2 | rdx |
011 | 3 | rbx |
100 | 4 | rsp |
101 | 5 | rbp |
110 | 6 | rsi |
111 | 7 | rdi |
The encoding of ModR/M is quite complex, as it is used to specify which register(s) to use in an instruction, and thus has different interpretations for normal instructions, XMM instructions, etc.
SIB is broken down as
Field | Scale | Index | Base |
---|---|---|---|
Bits | 7,6 | 5,4,3 | 2,1,0 |
Scale gives the scale factor, 0,1,2, or 4.
Index gives the register number (0-8) of the index register, using the encoding given above.
Base gives the register number (0-8) of the base register.
In some memory addressing modes, these are interpreted differently.
REX prefixes
The REX prefix is not part of the above four groups; REX is required for any
instruction which uses either one of the extended 64-bit registers (r8
-r15
,
or the xmm
registers)
or uses a 64-bit sized immediate or address. If present, the REX prefix must
occur immediately before the opcode (and immediately after any mandatory
prefixes). The REX prefix values are actually valid opcodes; they are
interpreted as REX only in 64-bit mode. For example, in 32-bit mode,
the REX prefix would be interpreted as an inc
instruction. (This further
implies that inc
must have a different encoding in 64-bit mode!)
There are 16 different REX prefixes, all encoded as 0x40
-0x4f
. REX is
encoded as
Field | 0100 | W | R | X | B | |||
---|---|---|---|---|---|---|---|---|
Bits | 7,6,5,4 | 3 | 2 | 1 | 0 |
W, if set, indicates 64-bit operand size.
R is used as an extra bit in the Reg field of the ModR/M byte. Because the Reg field is only 3 bits, it can normally only access the first eight registers. An extra bit is needed to access registers
r8-r15
.X is used as an extra bit in the Index field of the SIB byte. As the Index field names a register, this has the same justification as the R field above.
B is used as an extra bit in the Base field of the SIB byte, and sometimes as an extra bit in the R/M field of the ModR/M byte. Both of these fields name a register, so an extra bit is needed when accessing
r8-r15
.
The REX prefix is also used for mov
instructions with 64-bit immediate
operands; the default operand size, even in 64-bit mode, is 32-bits.
All of these have to do with extending existing instructions to support 64-bit operation.
(Intel’s 256- and 512-bit vector instructions use variations on the REX prefix called VEX and EVEX. The remainder of the instruction format is the same.)
Opcode
The opcode itself can be between 1 and 3 bytes.
A one-byte opcode has a value
!= 0x0f
If the first byte is
0x0f
, then a second opcode byte must be present after it.If the first byte is a mandatory prefix of
0x66, 0xF2
or0xF3
, then a second opcode byte must be present after it.If the first byte is either an escape byte or mandatory prefix, then two opcode bytes may follow it for three-byte opcodes. The first byte of the opcode determines whether another opcode byte is following it.
Examples
Here we look at some example instructions (taken from our programs) and decode them. (The fact that all the fields of the opcode lie on byte boundaries makes this easier than decoding MIPS instructions.)
If you want to see what the decoding of an instruction is, write a simple
assembly file containing just the section .text
and the instruction, e.g.,
; Save as decode.s
section .text
mov rax, byte [100] ; Mov from memory
Assemble this to an object file with
yasm -f elf64 -o decode.o decode.s -l decode.lst
and then view the listing file decode.lst
.
Example 1
add rax, rbx
This instruction uses 64-bit registers, so it will need the REX.W prefix byte:
01001000
WRXB
The opcode for the add
instruction varies depending on the register used,
and whether or not an immediate operand is present. For two 64-bit registers,
neither of them in r8-r15
, the opcode is 0x03 =
00000011
With two 64-bit register operands, the operands are encoded in the ModR/M byte,
the first (destination) operand in the Reg field, and the second in the R/M
field. The encoding of rax
as a register number is 000b
, and rbx
is
011
. The Mod field is 11, because both operands are registers.
Thus, the ModR/M byte is
11 011 000
(Note that the order of the operands is the reverse of that in the decoded
instruction: rbx
is in the Reg field, while R/M contains the destination
register rax
.)
Assembling these together gives a three-byte instruction:
0100 1000 00000011 11011000
REX W Opcode ModR/M
or 0x48 01 D8
in hex.
Example 2
add r10, r11
This example uses two of the registers that are only available in 64-bit mode. This will require using the bits of the REX prefix to extend the register indexes in the ModR/M byte:
0100 1101
REX WRXB
R = 1
will be used as an extra high bit for the Reg field, while B = 1
will be
used as an extra high bit in the R/M field.
As above, the destination register is r10
(= register number 10), which should
be placed in the R/M field. In order to encode 10 in binary, we need four
bits (10 decimal = 1010
binary), so the high bit is set as REX.B in the
prefix, and the remaining three bits 010
are placed in R/M. Similarly, 11
decimal is 1011
binary, so we place a 1 in REX.R, and then the remaining three
bits in the Reg field. This gives us a ModR/M byte of
11 011 010
M Reg R/M
Finally, add
, for two 64-bit register operands, where the destination is an
an extended 64-bit register, uses opcode 1 (= 00000001b
). This leaves us
with the final encoding of
0100 1101 00000001 11 011 010
REX WRXB Opcode M Reg R/M
Example 3
add r10, 17
This is addition, with an extended 64-bit register as the destination, of an
immediate. The immediate is small enough that it can be encoded in a single
byte, 17 = 00010001b
.
The destination register, r10
, requires setting the high bit in REX.B, and
010
in R/M. Immediate-Register operations also use a Mod = 11b
, so our
ModR/M byte is
11 000 010
Mo Reg R/M
and REX is
0100 1001
REX WRXB
Finally, the opcode for add-with-immediate is 0x83 (= 131 decimal, 10000011b
),
giving a decoding of
0100 1001 10000011 11 000 010 00010001
REX WRXB Opcode Mo Reg R/M Imm.
Example 4
add eax, 10
Here, we are not using any of the 64-bit registers, so no REX prefix is needed.
Addition with immediate uses opcode 0x83
. The destination register is
specified using the ModR/M byte (rax
= register 0), giving a decoding
10000011 11 000 000 00001010
Opcode Mo Reg R/M Imm.
Example 5
scasb
This instruction does not reference any operands, 64-bit or otherwise, so
it can be represented without a prefix or ModR/M byte. The opcode for
scasb
is 0xAE = 10101110
, which in turn is the entire decoding:
10101110
Opcode
Example 6
repe scasq
Here, we have made two changes: upgrade the instruction size to 64-bits (requiring
the REX prefix) and add the repe
prefix. The REX prefix must come immediately
before the opcode (0xAF = 10101111
; 0xAF
is the opcode for scas
larger than
a byte). The repe
prefix is 0xF3 = 11110011b
. Thus, we have
11110011 0100 1000 10101111
repe REX W Opcode
Example 7
syscall
Syscall is a two-byte instruction, starting with 0x0F
. Its primary opcode is
0x05
, giving an encoding of
00001111 00000101
Example 8
lea rax, [rbx + 2*rcx]
As usual, we need the REX.W
prefix = 01001000
. The opcode for lea
is
10001101b = 0x8D
. Because we are using a memory operand with both base and
offset (index) registers, we will need both the ModR/M and SIB bytes.
To encode the base + index * scale addressing scheme, we set Mod = 0 and R/M =
100b
.The destination register,
rax
is register number 0, which goes in the Reg field.The possible scale values 1,2,4,8 are mapped to the binary values
00,01,10,11
. Our scale of 2 is placed in the Scale field as01
.The index (offset) register is
rcx
, which is register number 1. This is placed in the index field.The base register is
rbx
, which is register number 3, placed in the Base field.
This leads to a ModR/M of
00 000 100
Mo Reg R/M
and an SIB of
01 001 011
Sc Ind Bas
for a final encoding of
0100 1000 10001101 00 000 100 01 001 011
REX W Opcode Mo Reg R/M Sc Ind Bas
Example 9
add rax, qword [10]
This is the most complex example, as it uses a 64-bit register, and a qword-sized memory operand.
The opcode is still 3, for addition into a 64-bit register, from either a
register or memory source. The destination register will be specified via the
Reg field (rax
= register number 0). The Mod and R/M fields combine to
specify the addressing mode, which here is just displacement (no base, offset,
or scale present). The encoding for this is Mod = 0, R/M = 100b
.
0100 1000 00000011 00 000 100 00 100 101 00001010 00000000 00000000 00000000
REX W Opcode Mo Reg R/M Sc Ind Bas Address
rsp rbp
The extra 0 bytes on the end are padding bytes, added to make the instruction have a size of 8.