Review
Any sort of non-sequential control flow (anything other than just moving from
one instruction to the next) in assembly involves branching, using one of a
number of conditional and/or unconditional jump instructions. This means that
if-else structures, switch-case, and any kind of loop will need to be
implemented using jumps.
Unconditional jumps
jmp target
where target is a label in the .text section of the program (hopefully
within the same function!), causes execution to continue not at the instruction
immediately after the jump, but at the instruction marked by the label target:.
Normally, I’ll use local labels (those starting with .) for labels inside
functions.
Internally, in the CPU, the jmp instruction just modifies the rip register,
which contains the address of the next instruction to execute. Normally, rip
is updated by the CPU automatically, to point to the following instruction.
Computed jumps
It’s possible to jump to a target which is a register, rather than a label. This can be used for things like this:
mov rax, target
...
jmp rax
You can even build an array of targets in the .data section and then jump to
one:
section .data
jmp_tbl: dq _start.label0, _start.label1, ...
section .text
_start:
...
mov rax, qword [jmp_tbl + 8*1] ; Entry 1; each entry is 8 bytes
jmp rax
We’ll use this “jump table” technique later to implement a kind of switch-case.
Conditional jumps
Conditional jumps examine the flags register rflags and jump to the target
only if the flags match a specific pattern; otherwise, execution proceeds
to the following instruction as usual.
The conditional jump instructions all have the form
jcc target
where cc is any of the condition codes:
Operation | Description | Flag condition |
---|---|---|
je | Jump if op1 == op2 | ZF == 1 |
jne | Jump if op1 != op2 | ZF == 0 |
jl | Jump if op1 < op2, signed | SF != OF |
jle | Jump if op1 <= op2, signed | ZF == 1 or SF != OF |
jg | Jump if op1 > op2, signed | ZF == 0 and SF == OF |
jge | Jump if op1 >= op2, signed | SF == OF |
jb | Jump if op1 < op2, unsigned | CF == 1 |
jbe | Jump if op1 <= op2, unsigned | CF == 1 or ZF == 1 |
ja | Jump if op1 > op2, unsigned | CF == 0 and ZF == 0 |
jae | Jump if op1 >= op2, unsigned | CF == 0 |
Operation | Description |
---|---|
jna | Jump if not above |
jnae | Jump if not above or equal |
jnb | Jump if not below |
jnbe | Jump if not below or equal |
jng | Jump if not greater-than |
jnge | Jump if not greater-than or equal |
jnl | Jump if not less-than |
jnle | Jump if not less-than or equal |
Operation | Description |
---|---|
jcxz | Jump if cx == 0 (not available in 64-bit mode) |
jecxz | Jump if ecx == 0 |
jrcxz | Jump if rcx == 0 |
Operation | Description |
---|---|
jc | Jump if CF == 1 |
jnc | Jump if CF == 0 |
jz | Jump if ZF == 1 |
jnz | Jump if ZF == 0 |
jo | Jump if OF == 1 |
jno | Jump if OF == 0 |
js | Jump if SF == 1 |
jns | Jump if SF == 0 |
jp | Jump if PF == 1 |
jpe | Jump if PF == 1 (jump if parity even) |
jpo | Jump if PF == 0 (jump if parity odd) |
Conditions
Before executing a conditional jump, we obviously want to set the flags so that
the jump is meaningful. Although this can be done by any instruction which
updates the flags, the easiest way is by using the cmp
(compare) instruction:
cmp a, b
“compares” a and b by performing the subtraction a - b. The result of the
subtraction is discarded, but the flags are still updated. E.g., if a == b
then the result is zero and the zero flag is set (= 1), while if a > b then
we would expect the zero flag to be unset (= 0).
Less-than and greater-than comparisons come in two forms, signed and unsigned. Signed comparisons use the “Less-than” and “Greater-than” names, and rely on the state of the overflow (OF), sign (SF), and zero (ZF) flags. Unsigned comparisons use the names “Below” and “Above” and rely on the state of the carry (CF) and zero (ZF) flags.
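To see the two families disagree, here is a minimal sketch, assuming NASM on x86-64 Linux (the file name, the labels, and the use of the exit status to report the result are choices made for this illustration only). It compares -1 with 1 both ways: as a signed value, -1 is less than 1, but the same bit pattern read as unsigned is the largest 64-bit value.

```nasm
;;; signed_vs_unsigned.s (hypothetical file name)
;;; Exit status: bit 0 set if the signed test "rax < rbx" succeeds,
;;;              bit 1 set if the unsigned test "rax < rbx" succeeds.
section .text
global _start
_start:
    xor     rdi, rdi        ; rdi = 0, accumulates the result
    mov     rax, -1         ; All 64 bits set
    mov     rbx, 1

    cmp     rax, rbx
    jnl     .not_less       ; Signed: skip unless rax < rbx
    or      rdi, 1          ; Signed less-than was true
.not_less:

    cmp     rax, rbx        ; Compare again: the or above changed the flags
    jnb     .not_below      ; Unsigned: skip unless rax < rbx
    or      rdi, 2          ; Unsigned below was true
.not_below:

    mov     rax, 60         ; exit syscall, status = rdi
    syscall
```

Assembled with `nasm -f elf64` and linked with `ld`, this should exit with status 1: only the signed comparison considers -1 smaller than 1.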
Another comparison instruction, less general than cmp (but with a shorter
encoding when checking a register against zero), is the test instruction:
test a, b
performs a bitwise AND operation between a and b, discarding the result
and updating the flags. This can be used to compare for (not-)equal-to-zero:
test rax, rax
jz target ; Jump if rax == 0
or to check the sign:
test rax, rax
js target ; Jump if rax < 0
test is most useful for testing specific bits of a register, by constructing
a bitmask.
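As a minimal sketch of the bitmask idea, assuming NASM on x86-64 Linux (the constant name BIT3, the example value, and the exit-status reporting are all made up for this illustration), the following checks whether bit 3 of rax is set:

```nasm
;;; test_bit.s (hypothetical file name)
;;; Exits with status 1 if bit 3 of rax is set, 0 otherwise.
BIT3: equ 1 << 3            ; Mask with only bit 3 set (name chosen here)

section .text
global _start
_start:
    mov     rax, 1010b      ; Example value: bits 1 and 3 are set
    xor     rdi, rdi        ; Assume the bit is clear
    test    rax, BIT3       ; ZF = 1 exactly when (rax & BIT3) == 0
    jz      .done
    mov     rdi, 1          ; Bit 3 was set
.done:
    mov     rax, 60         ; exit syscall, status = rdi
    syscall
```

With the example value above this should exit with status 1. Note that test leaves rax untouched, so the same register can be tested against several masks in a row.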
Computed conditional jumps
The conditional jump instructions only exist in a relative form: they do not
take an absolute address, but rather a signed byte offset. That is, jcc target
actually stores the positive or negative distance, in bytes, from the end of
the jcc instruction (the address of the following instruction) to the target,
and then says “jump forward (or backward) distance many bytes”. A direct
jmp target is encoded the same way, but jmp additionally has the indirect form
jmp rax seen above, which jcc lacks: the offset is stored directly in the
encoded instruction, and cannot be a register.
We can still perform a kind of “computed conditional jump”, but it’s a two-step
process:
mov rax, target ; Real jump target
cmp ...
jn__ .skip_jmp ; Note: negated condition
jmp rax
.skip_jmp:
...
Converting C/C++ control structures
We started by looking at how to convert common C/C++ control structures (if-else, loops, etc.) into their assembly equivalents.
Simple if-else
An if
with no else
becomes just
cmp ...
jn__ .end_if ; Note negated condition
... ; Body of the if
.end_if:
Similarly, an if-else
becomes
cmp ...
jn__ .else
... ; Body of if
jmp .endif
.else:
... ; Body of else
.endif:
Note that in both cases we negate the condition (i.e., if the if was
if(rax < 0) then the condition becomes jge or jnl), because we want the
jump to occur, skipping over the if { ... } part, when the condition is
false.
This can be generalized to an if-else if-else chain:
cmp ...
jn__ .else_if1
... ; Body of if
jmp .end_if
.else_if1:
cmp ...
jn__ .else_if2
... ; Body of else-if #1
jmp .end_if
.else_if2:
...
.else:
... ; Body of the final else
.end_if:
If-else with compound conditions
If the condition on the if is compound (contains the logical operators !,
&& or ||) then things are more tricky.
! is simple: just invert the condition itself:
cmp rax, 0 ; if(!(rax == 0))
je .end_if ; Negated jne == je
... ; Body of if
.end_if:
&& is relatively easy: we check both conditions at the beginning and jump to
.end_if if either fails:
cmp ... ; Condition 1
jn__ .end_if
cmp ... ; Condition 2
jn__ .end_if
... ; Body of the if
.end_if:
||
(logical OR) is somewhat more tricky. There are a few ways we can deal
with this. A roundabout way is to test each condition, jumping into the body
of the if
if successful, otherwise skipping over it with an unconditional jump:
; rax == 0 || rbx == 0
cmp rax, 0
je .if
cmp rbx, 0
je .if
jmp .end_if
.if:
... ; Body of the if
.end_if:
This corresponds roughly to rewriting
if(rax == 0 || rbx == 0) {
// Body of if
}
into
if(rax != 0 && rbx != 0) {
// Empty
}
else {
// Body of if
}
With a bit of reorganization, we can save ourselves the trouble of the extra jump:
cmp rax, 0 ; rax == 0 || rbx == 0
je .if
cmp rbx, 0
jne .end_if ; Note negated condition!
.if:
... ; Body of if
.end_if:
Note that in an A || B || ... || Z condition, only the last condition is
negated: all prior conditions jump into the body of the if, whereas if all
prior conditions have failed and the last condition fails, too, then we jump
over the if-body (corresponding to all of the conditions being false).
An else
or else-if
chain can be added to the above by combining them with
the above patterns.
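For instance, combining the || pattern with an else might look like the following sketch (assuming NASM on x86-64 Linux; the input values and the use of rdi as an exit status are arbitrary choices for illustration). The only change from the pattern without an else is that the final, negated jump goes to .else instead of .end_if, and the if-body ends with a jump over the else-body.

```nasm
;;; or_else.s (hypothetical file name)
;;; if(rax == 0 || rbx == 0) rdi = 1; else rdi = 2;
section .text
global _start
_start:
    mov     rax, 5          ; Example inputs (arbitrary)
    mov     rbx, 0

    cmp     rax, 0
    je      .if             ; First condition true: enter the body
    cmp     rbx, 0
    jne     .else           ; Last condition is negated: go to the else
.if:
    mov     rdi, 1          ; Body of if
    jmp     .end_if         ; Skip over the else
.else:
    mov     rdi, 2          ; Body of else
.end_if:
    mov     rax, 60         ; exit syscall, status = rdi
    syscall
```

With rax = 5 and rbx = 0 the second condition holds, so this should exit with status 1.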
Loops
do-while loops
We’ve already seen that the loop instruction serves to implement a kind of
do-while loop, so long as you want to use a decrementing rcx as the loop
variable, with the loop ending when rcx == 0. With conditional jumps we can
build a more general do-while loop:
.do: ; do {
... ; Loop body
cmp rax, rbx
je .do ; } while(rax == rbx);
while loops
Implementing a while loop requires testing the loop condition at the beginning of the loop, and possibly jumping to past-the-end of the loop if it fails. Thus, we need a label at both the beginning (so that we can jump back and repeat the loop) and the end of the loop:
.while: ; while(rax != rbx) {
cmp rax, rbx
je .end_while
... ; Loop body
jmp .while
.end_while: ; }
...
A for loop is just a specialized kind of while loop, so for example,
for(rax = 0; rax < 100; ++rax) {
...
}
would become
xor rax, rax ; rax = 0
.for:
cmp rax, 100
jge .end_for
... ; Loop body
inc rax
jmp .for
.end_for:
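As a complete, runnable instance of the same pattern, here is a small sketch, assuming NASM on x86-64 Linux, that sums the integers 1 through 10 (the file name and the choice to report the sum as the exit status are mine, purely for illustration):

```nasm
;;; for_sum.s (hypothetical file name)
;;; rdi = 0; for(rax = 1; rax <= 10; ++rax) rdi += rax;
section .text
global _start
_start:
    xor     rdi, rdi        ; rdi = 0 (running sum)
    mov     rax, 1          ; rax = 1 (loop variable)
.for:
    cmp     rax, 10
    jg      .end_for        ; Negation of rax <= 10
    add     rdi, rax        ; Loop body
    inc     rax             ; ++rax
    jmp     .for
.end_for:
    mov     rax, 60         ; exit syscall, status = rdi
    syscall
```

This should exit with status 55, which can be checked in the shell with `echo $?` after running it.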
break and continue
break is equivalent to a jmp to a label immediately after the end of the
loop. continue is equivalent to a jmp to a label at the beginning of the
loop (or, in a for loop, to the increment step just before the jump back, so
that the loop variable is still updated). The common pattern of
if(condition)
break; // Or continue
can be done with a conditional jump to the end/beginning; there is no need to
emulate the entire if structure.
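A minimal sketch of both, assuming NASM on x86-64 Linux (the loop itself is invented for illustration): the loop below skips odd values with continue and stops at 10 with break, so it adds up 0 + 2 + 4 + 6 + 8. Because this is a for loop, the continue target used here is a .continue label placed at the increment rather than at the very top.

```nasm
;;; break_continue.s (hypothetical file name)
;;; for(rax = 0; rax < 100; ++rax) {
;;;     if(rax & 1)   continue;
;;;     if(rax == 10) break;
;;;     rdi += rax;
;;; }
section .text
global _start
_start:
    xor     rdi, rdi        ; rdi = 0 (running sum)
    xor     rax, rax        ; rax = 0 (loop variable)
.for:
    cmp     rax, 100
    jge     .end_for
    test    rax, 1
    jnz     .continue       ; if(rax & 1) continue;
    cmp     rax, 10
    je      .end_for        ; if(rax == 10) break;
    add     rdi, rax
.continue:
    inc     rax             ; The increment still runs after a continue
    jmp     .for
.end_for:
    mov     rax, 60         ; exit syscall, status = rdi
    syscall
```

This should exit with status 20.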
switch-case statements
Unlike if-else, switch-case does not have a single translation to assembly.
Depending on the number of case labels and their values, the compiler may
translate a switch-case into an if-else chain like the above, or into a
table-based jump (unconditional jump to a computed target, where the target is
looked up in an array of addresses), or even to a hash-table-like structure!
We will examine the second alternative, constructing a table of jump targets
and then using this to implement a switch-case:
;;;;
;;;; switch_case.s
;;;; Implementing a switch-case statement as a jump table.
;;;;
section .data
jump_table: dq _start.case0, _start.case1, _start.case2, _start.case3
section .text
global _start
_start:
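; Switch on rcx = 0, 1, 2, 3, default
cmp rcx, 4
jae .default ; Unsigned compare, so negative values also go to default
mov rbx, qword [jump_table + 8*rcx] ; Only read the table once rcx is in range
jmp rbx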
.case0:
...
jmp .end_switch
.case1:
...
jmp .end_switch
.case2:
...
jmp .end_switch
.case3:
...
jmp .end_switch
.default:
...
.end_switch:
...
Notes:
In the definition of the jump table, we have to use the full names of the .case labels. If we just wrote .case0, it would refer to the (non-existing) label jump_table.case0.
Every case has to end with a jump to the end of the switch. This is why, in C/C++, every case has to end with a break! (If you left out the jumps, what would happen?)
The memory operand qword [jump_table + 8*rcx] is using an extended form of memory lookup which we’ll go over later: suffice it to say that memory operands are more general than just [addr]. In this case, we use jump_table as the displacement of the lookup, and then multiply rcx by 8, because each table entry is 64 bits (8 bytes).
The table of case labels is always indexed as 0, 1, 2, 3, … If the actual case label values do not correspond to this, then we have to transform them in some way (the compiler would normally do this for us). E.g., if our labels were 10, 11, 12, 13, we could simply subtract 10 and use that as our indexes. If the labels were 10, 20, 30, 40, we could divide by 10 and subtract 1. If the labels were 3, 1, 2, 0, we could just renumber the cases.
If the case labels do not fit any pattern, we might have to simply loop through an array of values to find the correct label, or maybe even do a binary search (if the set of label values is big enough). In this case, we have two arrays, one for the label values, and one for the label targets.
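The two-array approach might look something like the following sketch, assuming NASM on x86-64 Linux. The case values 7, 42 and 100, the array names case_values and case_targets, and the constant NUM_CASES are all invented for illustration; a linear search is used here, which is only reasonable for a handful of cases.

```nasm
;;; sparse_switch.s (hypothetical file name)
;;; switch(rcx) { case 7: ...; case 42: ...; case 100: ...; default: ... }
section .data
case_values:  dq 7, 42, 100
case_targets: dq _start.case7, _start.case42, _start.case100
NUM_CASES:    equ 3

section .text
global _start
_start:
    mov     rcx, 42             ; Value being switched on (example)
    xor     rbx, rbx            ; rbx = index into both arrays
.search:
    cmp     rbx, NUM_CASES
    jae     .default            ; Ran off the end: no case matched
    cmp     rcx, qword [case_values + 8*rbx]
    je      .found
    inc     rbx
    jmp     .search
.found:
    jmp     qword [case_targets + 8*rbx]    ; Same index, other array
.case7:
    mov     rdi, 1
    jmp     .end_switch
.case42:
    mov     rdi, 2
    jmp     .end_switch
.case100:
    mov     rdi, 3
    jmp     .end_switch
.default:
    mov     rdi, 0
.end_switch:
    mov     rax, 60             ; exit syscall, status = rdi
    syscall
```

With rcx = 42 this should exit with status 2. Note that the jump here goes directly through memory rather than first loading the target into a register; both forms are unconditional indirect jumps.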
Examples
A function that converts a byte from lowercase to uppercase
;;; uppercase
;;; Converts byte [rdi] from lowercase to uppercase.
;;;
uppercase:
; rdi = addr. of byte to convert
cmp byte [rdi], 'a'
jb .done
cmp byte [rdi], 'z'
ja .done
sub byte [rdi], 32
.done:
ret
This is equivalent to
if(*rdi >= 'a')
if(*rdi <= 'z')
*rdi -= 32;
This could also be done by using the subtraction-based range testing trick, provided we move the value into a register first:
uppercase:
mov al, byte [rdi]
sub al, 'a' ; Values below 'a' wrap around to large unsigned values
cmp al, 'z' - 'a'
ja .done
sub byte [rdi], 32
.done:
ret
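To try the function out, one possible driver is sketched below, assuming NASM on x86-64 Linux. The message, the labels, and the loop structure are illustrative choices; it relies on uppercase leaving rdi and rcx untouched, which holds for the version shown here. It uses call, which is explained in the next section.

```nasm
;;; upcase_string.s (hypothetical file name)
;;; Uppercases a string in place, one byte at a time, then prints it.
section .data
msg:    db "Hello, world!", 10
MSGLEN: equ $-msg

section .text
global _start
_start:
    mov     rdi, msg            ; rdi = address of current byte
    mov     rcx, MSGLEN         ; rcx = bytes remaining
.next:
    call    uppercase
    inc     rdi
    dec     rcx
    jnz     .next               ; Repeat until no bytes remain

    mov     rax, 1              ; write syscall
    mov     rdi, 1              ; stdout
    mov     rsi, msg
    mov     rdx, MSGLEN
    syscall

    mov     rax, 60             ; exit syscall
    xor     rdi, rdi
    syscall

uppercase:
    ; rdi = addr. of byte to convert; clobbers al
    mov     al, byte [rdi]
    sub     al, 'a'             ; 'a'..'z' become 0..25
    cmp     al, 'z' - 'a'
    ja      .done               ; Anything else is above 25 (unsigned)
    sub     byte [rdi], 32
.done:
    ret
```

This should print HELLO, WORLD! followed by a newline; the comma, space and exclamation mark fall outside the range and are left alone.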
Simple functions
As we’ll see later, calling C functions from assembly, or making our assembly functions callable from C/C++ requires a few extra steps to set up the stack correctly. However, as long as we stay purely in “assembly-land”, we don’t need to worry about the extra complexity; we can essentially make functions work however we like, so long as we are consistent about it. The only requirement is that we be able to return from a function and get back to where we were, and fortunately the call and ret instructions handle that for us.
The two instructions that handle functions are call and ret. Both use
the stack internally:
call takes an address (a label in our .text section) and performs two steps: pushes rip, the instruction pointer, onto the stack, and then jumps to the address given. Remember that rip points to the next instruction to be executed, so the value pushed onto the stack is effectively the return address of the function, the address where execution should resume when the function returns.
ret pops the top element of the stack and jumps to it. rip is updated to the following instruction automatically.
These work together as follows (addresses are just made up):
Address | Instruction | Address | Instruction |
---|---|---|---|
_start: | | my_func: | |
0x100 | call my_func | 0x200 | mov eax, ... |
0x108 | mov rbx, rax … | 0x208 | … |
… | | 0x280 | ret |
While my_func is executing, the stack contains 0x108, the return address.
When ret is executed, this address is popped off the stack and we resume
execution at that point. (Later, we’ll see that this means if you’re using
the stack for anything else, you have to make sure you’ve popped everything
else off before you ret, so that at that point, the only thing on the stack
is the return address.)
As for getting arguments into functions, and return values out of functions, remember that registers are essentially global variables. We can, at this point, simply decide to use some registers to hold the arguments, and some register to hold the return value. However, to get used to the way things will be later, we will try to stick with what will eventually become the convention for calling functions:
Pass the first six arguments in registers rdi, rsi, rdx, rcx, r8 and r9. Note that this is slightly different from the syscall convention (rcx instead of r10).
Return results in rax.
As an example, let’s write a function that prints a string (given as an
address and a length) and adds a newline at the end. This will just wrap up
the call to the write
syscall we’ve been using:
section .data
newline: db 10
section .text
write_ln:
; rdi = address
; rsi = length
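mov rax, 1 ; write syscall
mov rdx, rsi ; Length (copy before rsi is overwritten)
mov rsi, rdi ; Address (copy before rdi is overwritten)
mov rdi, 1 ; stdout
syscall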
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
ret
(Because we can switch sections at any time, you could put this code in a
file and then %include
it to make the write_ln
function available in any
program you write.)
To use this, we load the registers with the appropriate information and then
issue a call
:
section .data
msg: db "Hello, world!"
MSGLEN: equ $-msg
section .text
global _start
_start:
mov rdi, msg
mov rsi, MSGLEN
call write_ln
; Normal exit syscall...
We could, of course, further tidy this up by writing a function to wrap up
the write
syscall:
sys_write:
; rdi = address
; rsi = length
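mov rax, 1 ; write syscall
mov rdx, rsi ; Length (copy before rsi is overwritten)
mov rsi, rdi ; Address (copy before rdi is overwritten)
mov rdi, 1 ; stdout
syscall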
ret
and then our write_ln
becomes just
write_ln:
; rdi = address
; rsi = length
call sys_write
mov rdi, newline
mov rsi, 1
call sys_write
ret
Note that our functions “clobber” the registers rdi, rsi, rax and rdx (and,
because the syscall instruction itself overwrites them, rcx and r11): when
calling either write_ln or sys_write you cannot rely on the values of
these registers being preserved through the call. Theoretically, which registers
are clobbered and which are preserved should be part of our function calling
convention, so that when we call a function, we know which registers are “safe”.
As of right now, we don’t have an easy way to “save” the value of a register,
except by putting it in another register, which in this case doesn’t actually
help! The eventual solution will be to push
any registers we need to save
onto the stack, and then pop
them off later. Different registers are
designated as caller saved (meaning the function doing the call
instruction is responsible for saving them) vs. callee saved, meaning the
function being called is responsible for saving them.
As above, we’ll try to use the same set of caller/callee saved registers as will eventually be required when we interoperate with C-code:
Register | Use |
---|---|
rax | Return value |
rbx | Callee-saved |
rcx | 4th argument |
rdx | 3rd argument |
rsi | 2nd argument |
rdi | 1st argument |
rbp | Callee-saved |
rsp | Stack pointer |
r8 | 5th argument |
r9 | 6th argument |
r10 | Temporary (caller-saved) |
r11 | Temporary (caller-saved) |
r12-r15 | Callee-saved |
The general flow of a function call, and the function itself, then becomes:
...
push r11 ; Push any caller-saved registers
push r10
call func
pop r10 ; Pop in opposite order after return
pop r11
...
func:
push r12 ; Push any callee-saved registers
push rbx
...
pop rbx ; Pop in opposite order before return
pop r12
ret
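Filling in the skeleton with something concrete, here is a small sketch, assuming NASM on x86-64 Linux. The function add_one and the values used are invented for illustration. The caller wants r10 to survive the call, so it saves it; the function wants to use rbx as scratch space, so it saves and restores it.

```nasm
;;; save_regs.s (hypothetical file name)
section .text
global _start
_start:
    mov     r10, 40             ; Value the caller needs after the call
    mov     rdi, 1              ; 1st argument

    push    r10                 ; Caller-saved: add_one may overwrite it
    call    add_one             ; rax = 2
    pop     r10                 ; r10 = 40 again

    lea     rdi, [r10 + rax]    ; rdi = 40 + 2
    mov     rax, 60             ; exit syscall, status = rdi
    syscall

;;; add_one: returns rdi + 1 in rax.
add_one:
    push    rbx                 ; Callee-saved: must be restored before ret
    mov     rbx, rdi
    inc     rbx
    xor     r10, r10            ; Free to trash a caller-saved register
    mov     rax, rbx            ; Return value
    pop     rbx
    ret
```

This should exit with status 42. If the push/pop of r10 in _start is removed, the result becomes 2 instead, since add_one zeroes r10.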
“Function pointers”
The address passed to call
can be a register, not just a label:
mov r11, my_function
call r11
This is equivalent to calling a function through a function pointer. As with
unconditional jump labels, you can build an array of function labels in the
.data section and then call into it. This is essentially how virtual methods
are typically implemented in C++: there is an array of pointers-to-functions,
and the method name is mapped to an index into the array.
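A minimal sketch of such a table, assuming NASM on x86-64 Linux (the functions double_it and square_it, the table name func_table, and the index in rcx are all invented for illustration):

```nasm
;;; func_table.s (hypothetical file name)
;;; Calls one of two functions, selected by an index at run time.
section .data
func_table: dq double_it, square_it

section .text
global _start
_start:
    mov     rdi, 7              ; Argument
    mov     rcx, 1              ; Index: 0 = double_it, 1 = square_it
    call    qword [func_table + 8*rcx]
    mov     rdi, rax            ; Use the return value as the exit status
    mov     rax, 60             ; exit syscall
    syscall

;;; double_it: returns 2 * rdi in rax.
double_it:
    lea     rax, [rdi + rdi]
    ret

;;; square_it: returns rdi * rdi in rax.
square_it:
    mov     rax, rdi
    imul    rax, rdi
    ret
```

With index 1 this should exit with status 49; changing rcx to 0 should give 14. Here the call goes through memory directly, but loading the entry into a register first and then using call on the register works equally well.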