Review

Any control flow in assembly other than sequential execution (just moving from one instruction to the next) involves branching, using one of a number of conditional and/or unconditional jump instructions. This means that if-else structures, switch-case, and any kind of loop will need to be implemented using jumps.

Unconditional jumps

jmp target

where target is a label in the .text section of the program (hopefully within the same function!), causes execution to continue not at the instruction immediately after the jump, but at the instruction marked by the label target:. Normally, I’ll use local labels (those starting with .) for labels inside functions.

Internally, in the CPU, the jmp instruction just modifies the rip register, which contains the address of the next instruction to execute. Normally, rip is updated by the CPU automatically, to point to the following instruction.

Computed jumps

It’s possible to jump to a target whose address is held in a register, rather than to a label. This can be used for things like this:

mov rax, target
...
jmp rax

You can even build an array of targets in the .data section and then jump to one:

section .data

jmp_tbl:    dq      _start.label0, _start.label1, ...

section .text
_start:

    ...
    mov rax, qword [jmp_tbl + 8]    ; Entry 1 (each entry is 8 bytes)
    jmp rax

We’ll use this “jump table” technique later to implement a kind of switch-case.

Conditional jumps

Conditional jumps examine the flags register rflags and jump to the target only if the flags match a specific pattern; otherwise, execution proceeds to the following instruction as usual.

The conditional jump instructions all have the form

jcc target

where cc is any of the condition codes:

In the first table, op1 and op2 are the operands of a preceding cmp op1, op2.

Operation   Description                          Flag condition
je          Jump if op1 == op2                   ZF == 1
jne         Jump if op1 != op2                   ZF == 0
jl          Jump if op1 < op2, signed            SF != OF
jle         Jump if op1 <= op2, signed           ZF == 1 or SF != OF
jg          Jump if op1 > op2, signed            ZF == 0 and SF == OF
jge         Jump if op1 >= op2, signed           SF == OF
jb          Jump if op1 < op2, unsigned          CF == 1
jbe         Jump if op1 <= op2, unsigned         CF == 1 or ZF == 1
ja          Jump if op1 > op2, unsigned          CF == 0 and ZF == 0
jae         Jump if op1 >= op2, unsigned         CF == 0

Operation   Description                          Same as
jna         Jump if not above                    jbe
jnae        Jump if not above or equal           jb
jnb         Jump if not below                    jae
jnbe        Jump if not below or equal           ja
jng         Jump if not greater-than             jle
jnge        Jump if not greater-than or equal    jl
jnl         Jump if not less-than                jge
jnle        Jump if not less-than or equal       jg

Operation   Description
jcxz        Jump if cx == 0 (not encodable in 64-bit mode)
jecxz       Jump if ecx == 0
jrcxz       Jump if rcx == 0

Operation   Description
jc          Jump if CF == 1
jnc         Jump if CF == 0
jz          Jump if ZF == 1
jnz         Jump if ZF == 0
jo          Jump if OF == 1
jno         Jump if OF == 0
js          Jump if SF == 1
jns         Jump if SF == 0
jp          Jump if PF == 1
jnp         Jump if PF == 0
jpe         Jump if PF == 1 (jump if parity even; same as jp)
jpo         Jump if PF == 0 (jump if parity odd; same as jnp)

Conditions

Before executing a conditional jump, we obviously want to set the flags so that the jump is meaningful. Although this can be done by any instruction which updates the flags, the easiest way is by using the cmp (compare) instruction:

cmp a, b

“compares” a and b by performing the subtraction a - b. The result of the subtraction is discarded, but the flags are still updated. E.g., if a == b then the result is zero and the zero flag is set (= 1), whereas if a > b we would expect the zero flag to be unset (= 0).

Less-than and greater-than comparisons come in two forms, signed and unsigned. Signed comparisons use the “Less-than” and “Greater-than” names, and rely on the state of the overflow (OF), sign (SF), and zero (ZF) flags. Unsigned comparisons use the names “Below” and “Above” and rely on the state of the carry (CF) and zero (ZF) flags.
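To see why both families are needed, here is a small C sketch (the function names are hypothetical, chosen just for illustration) that compares the same 64-bit patterns once as signed and once as unsigned values. A compiler would typically translate the first into a cmp followed by a jl-style test and the second into a cmp followed by a jb-style test.

```c
#include <stdint.h>

/* Signed comparison: corresponds to cmp followed by jl (uses SF and OF). */
int less_signed(int64_t a, int64_t b) {
    return a < b;
}

/* Unsigned comparison: corresponds to cmp followed by jb (uses CF). */
int below_unsigned(uint64_t a, uint64_t b) {
    return a < b;
}
```

With a = -1 (all bits set) and b = 1, less_signed reports "less", while below_unsigned does not, because the same bit pattern is the largest possible unsigned value.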

Another comparison instruction, which is less general than cmp (but slightly more compact when checking a register against zero), is test:

test a, b

performs a bitwise AND operation between a and b, discarding the result and updating the flags. This can be used to compare for (not-)equal-to-zero:

test rax, rax
jz target       ; Jump if rax == 0

or to check the sign:

test rax, rax
js target       ; Jump if rax < 0

test is more useful for testing specific bits of a register, by constructing a bitmask.
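As a rough C-level picture of the bitmask idea, the sketch below checks individual bits with a bitwise AND, which is what test computes before setting the flags. The mask value and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mask selecting bit 3 (value 8). */
#define BIT3_MASK 0x08u

/* Mirrors: test rax, 0x08 / jnz target. A nonzero AND means the bit is set. */
int bit3_is_set(uint64_t value) {
    return (value & BIT3_MASK) != 0;
}

/* Mirrors: test rax, 1 / jz target. Bit 0 clear means the value is even. */
int is_even(uint64_t value) {
    return (value & 1u) == 0;
}
```

A mask with several bits set works the same way: jnz is then taken if any of the selected bits is set.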

Computed conditional jumps

The conditional jump instructions cannot jump to an address held in a register. Their target is always encoded as a signed offset, in bytes, relative to the address of the instruction following the jcc. That is, jcc target actually stores the positive or negative distance to the target, and then says “jump forward (or backward) distance many bytes”. (A direct jmp target is encoded the same way, but jmp also has the indirect form jmp rax that we saw above; jcc does not.) The offset is stored directly in the encoded instruction. We can still perform a kind of “computed conditional jump”, but it’s a two-step process:

    mov rax, target         ; Real jump target
    cmp ...
    jn__ .skip_jmp           ; Note: negated condition
    jmp rax

.skip_jmp:
    ... 

Converting C/C++ control structures

We started by looking at how to convert common C/C++ control structures (if-else, loops, etc.) into their assembly equivalents.

Simple if-else

An if with no else becomes just

    cmp ...
    jn__ .end_if        ; Note negated condition

    ...                 ; Body of the if
.end_if:

Similarly, an if-else becomes

    cmp ...
    jn__ .else

    ...                 ; Body of if

    jmp .end_if

.else:

    ...                 ; Body of else

.end_if:

Note that in both cases we negate the condition (e.g., if the if was if(rax < 0) then the jump becomes jge or jnl), because we want the jump to occur, skipping over the if { ... } part, when the condition is false.

This can be generalized to an if-else if-else chain:

    cmp ...
    jn__ .else_if1

    ...                 ; Body of if

    jmp .end_if

.else_if1: 
    cmp ...
    jn__ .else_if2

    ...                 ; Body of else-if #1

    jmp .end_if

.else_if2:

    ...

.else:

    ...                 ; Body of the final else

.end_if:

If-else with compound conditions

If the condition on the if is compound (contains the logical operators !, && or ||) then things are more tricky.

! is simple: just invert the condition itself:

    cmp rax, 0              ; if(!(rax == 0))
    je .end_if              ; Negated jne == je

    ...                     ; Body of if

.end_if:

&& is relatively easy: we check both conditions at the beginning and jump to .end_if if either fails:

    cmp ...             ; Condition 1
    jn__ .end_if
    cmp ...             ; Condition 2
    jn__ .end_if

    ...                 ; Body of the if

.end_if:

|| (logical OR) is somewhat more tricky. There are a few ways we can deal with this. A roundabout way is to test each condition, jumping into the body of the if if successful, otherwise skipping over it with an unconditional jump:

    ; rax == 0 || rbx == 0
    cmp rax, 0          
    je .if
    cmp rbx, 0
    je .if
    jmp .end_if
.if:

    ...                 ; Body of the if

.end_if:

This corresponds roughly to rewriting

if(rax == 0 || rbx == 0) {
    // Body of if
}

into

if(rax != 0 && rbx != 0) { 
    // Empty
}
else {
    // Body of if
}

With a bit of reorganization, we can save ourselves the trouble of the extra jump:

    cmp rax, 0          ; rax == 0 || rbx == 0
    je .if
    cmp rbx, 0
    jne .end_if         ; Note negated condition!

.if:
    ...                 ; Body of if

.end_if:

Note that in an A || B || ... || Z condition, only the last condition is negated: all prior conditions jump into the body of the if, whereas if all prior conditions have failed and the last condition fails, too, then we jump over the if-body (corresponding to all of the conditions being false).
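The three-condition case can be sketched in C with one goto per jump. The function name and its arguments are hypothetical; the point is only the shape of the jumps.

```c
/* if (a == 0 || b == 0 || c == 0) return 1; else return 0;
   written with explicit jumps. */
int any_zero(long a, long b, long c) {
    if (a == 0) goto body;      /* cmp / je .if */
    if (b == 0) goto body;      /* cmp / je .if */
    if (c != 0) goto end_if;    /* cmp / jne .end_if (only the last test is negated) */
body:
    return 1;                   /* Body of the if */
end_if:
    return 0;
}
```

Execution falls into the body only when the last test does not jump, which is exactly the case where the last condition is true.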

An else or else-if chain can be added by combining the above with the earlier if-else patterns.

Loops

do-while loops

We’ve already seen that the loop instruction serves to implement a kind of do-while loop, so long as you want to use a decrementing rcx as the loop variable, with the loop ending when rcx == 0. With conditional jumps we can build a more general do-while loop:

.do:                ; do {

  ...               ;   Loop body

  cmp rax, rbx      
  je .do            ; } while(rax == rbx);

while loops

Implementing a while loop requires testing the loop condition at the beginning of the loop, and possibly jumping past the end of the loop if it fails. Thus, we need a label at both the beginning (so that we can jump back and repeat the loop) and the end of the loop:

.while:         ; while(rax != rbx) {
  cmp rax, rbx
  je .end_while

  ...           ;   Loop body

  jmp .while
.end_while:     ; }

  ...

A for loop is just a specialized kind of while loop, so for example,

for(rax = 0; rax < 100; ++rax) {
  ...
}

would become

  xor rax, rax      ; rax = 0
.for:     
  cmp rax, 100
  jge .end_for

  ...               ; Loop body

  inc rax
  jmp .for
.end_for:

break and continue

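break is equivalent to a jmp to a label immediately after the end of the loop. continue is equivalent to a jmp to the point where the next iteration starts: the condition test at the top of a while loop, the increment step of a for loop (jumping to the very top would skip the increment), or the condition test at the bottom of a do-while loop. The common pattern of

if(condition)
  break; // Or continue

can be done with a single conditional jump to the appropriate label; there is no need to emulate the entire if structure.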
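Here is a sketch in C of a for loop containing both a break and a continue, first in structured form and then with every jump written out as a goto. The functions and their parameters are hypothetical. Note that in a for loop the continue has to land on the increment rather than on the loop test.

```c
/* Structured version: sums the odd values of i below n, stopping at `stop`. */
long sum_odd_structured(long n, long stop) {
    long sum = 0;
    for (long i = 0; i < n; ++i) {
        if (i == stop) break;
        if (i % 2 == 0) continue;
        sum += i;
    }
    return sum;
}

/* Same loop with one goto per jmp/jcc. */
long sum_odd_jumps(long n, long stop) {
    long sum = 0;
    long i = 0;
for_top:
    if (i >= n) goto end_for;        /* cmp / jge .end_for (negated loop test) */
    if (i == stop) goto end_for;     /* break: cmp / je .end_for */
    if (i % 2 == 0) goto for_next;   /* continue: jump to the increment */
    sum += i;
for_next:
    ++i;                             /* inc */
    goto for_top;                    /* jmp .for */
end_for:
    return sum;
}
```

Both versions return the same result for any n and stop; each break or continue costs a single conditional jump.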

switch-case statements

Unlike if-else, switch-case does not have a single translation to assembly. Depending on the number of case labels and their values, the compiler may translate a switch-case into an if-else chain like the above, or into a table-based jump (unconditional jump to a computed target, where the target is looked up in an array of addresses), or even to a hash-table-like structure! We will examine the second alternative, constructing a table of jump targets and then using this to implement a switch-case:

;;;; 
;;;; switch_case.s
;;;; Implementing a switch-case statement as a jump table.
;;;;

section .data

jump_table: dq  _start.case0, _start.case1, _start.case2, _start.case3

section .text

global _start
_start:

  ; Switch on rcx = 0, 1, 2, 3, default
  cmp rcx, 4
  jae .default                          ; Range check before touching the table
  mov rbx, qword [jump_table + 8*rcx]
  jmp rbx

.case0:

  ...
  jmp .end_switch

.case1:

  ...
  jmp .end_switch 

.case2:

  ...
  jmp .end_switch

.case3:

  ...
  jmp .end_switch

.default:
  ...

.end_switch:
  ...

Notes:

The table of case labels is always indexed as 0, 1, 2, 3, … If the actual case label values do not correspond to this, then we have to transform them in some way (the compiler would normally do this for us). E.g., if our labels were 10, 11, 12, 13, we could simply subtract 10 and use that as our indexes. If the labels were 10, 20, 30, 40, we could divide by 10 and subtract 1. If the labels were 3, 1, 2, 0, we could just renumber the cases.
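The two arithmetic transformations can be sketched in C as follows. The tables hold made-up result codes that stand in for jump targets, and the function names are hypothetical. For the 10, 20, 30, 40 case the sketch also checks the remainder, since otherwise a value such as 15 would be mistaken for case 10.

```c
/* Labels 10, 11, 12, 13: subtract 10 to get indexes 0..3. */
int dispatch_offset(long label) {
    static const int table[4] = { 100, 101, 102, 103 };

    /* Labels below 10 wrap around to huge unsigned values, so a single
       unsigned compare rejects both ends of the range. */
    unsigned long index = (unsigned long)label - 10u;
    if (index >= 4) {            /* cmp rcx, 4 / jae .default */
        return -1;               /* default case */
    }
    return table[index];
}

/* Labels 10, 20, 30, 40: divide by 10 and subtract 1. */
int dispatch_scaled(long label) {
    static const int table[4] = { 100, 101, 102, 103 };

    if (label % 10 != 0) {       /* not a multiple of 10: matches no case */
        return -1;
    }
    unsigned long index = (unsigned long)(label / 10) - 1u;
    if (index >= 4) {
        return -1;               /* default case */
    }
    return table[index];
}
```

In both functions -1 plays the role of the default case.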

If the case labels do not fit any pattern, we might have to simply loop through an array of values to find the correct label, or maybe even do a binary search (if the set of label values is big enough). In this case, we have two arrays, one of the label values, and one for the label targets.
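A minimal sketch of the two-array approach in C, using a binary search over sorted label values. The label values, the target codes, and the names are all made up for illustration; in assembly the second array would hold code addresses instead of integers.

```c
#include <stddef.h>

/* Hypothetical sparse case labels, kept sorted so binary search works. */
static const long case_values[]  = { 3, 17, 42, 1000, 65536 };
/* Parallel array: entry i stands in for the jump target of case_values[i]. */
static const int  case_targets[] = { 0, 1, 2, 3, 4 };

#define CASE_COUNT (sizeof case_values / sizeof case_values[0])

/* Returns the target for label, or -1 for the default case. */
int find_case(long label) {
    size_t lo = 0, hi = CASE_COUNT;          /* search the range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (case_values[mid] == label) {
            return case_targets[mid];
        }
        if (case_values[mid] < label) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return -1;
}
```

For a handful of labels a simple linear scan over case_values would do just as well; the binary search only pays off once the set of labels is large.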

Examples

A function that converts a byte from lowercase to uppercase

;;; uppercase
;;; Converts byte [rdi] from lowercase to uppercase.
;;;
uppercase:
  ; rdi = addr. of byte to convert

  cmp byte [rdi], 'a'
  jb .done
  cmp byte [rdi], 'z'
  ja .done

  sub byte [rdi], 32 

  .done:
  ret

This is equivalent to

if(*rdi >= 'a')
  if(*rdi <= 'z')
      *rdi -= 32;

This could also be done by using the subtraction-based range testing trick, provided we move the value into a register first:

uppercase:

  mov al, byte [rdi]
  sub al, 'a'         ; Values below 'a' wrap around to large unsigned values
  cmp al, 'z' - 'a'
  ja .done

  sub byte [rdi], 32

  .done:
  ret
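The same trick reads naturally in C. The sketch below (hypothetical function name, ASCII assumed) replaces the two comparisons against 'a' and 'z' with one subtraction and one unsigned comparison:

```c
/* Converts one byte to uppercase using a single unsigned comparison. */
unsigned char to_upper_byte(unsigned char c) {
    /* sub al, 'a': bytes below 'a' wrap around to values above 'z' - 'a'. */
    unsigned char offset = (unsigned char)(c - 'a');
    if (offset > 'z' - 'a') {    /* cmp al, 'z' - 'a' / ja .done */
        return c;                /* not a lowercase letter: leave unchanged */
    }
    return (unsigned char)(c - 32);
}
```

The cast back to unsigned char is what makes the wrap-around happen, just as the 8-bit register al does in the assembly version.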

Simple functions

As we’ll see later, calling C functions from assembly, or making our assembly functions callable from C/C++, requires a few extra steps to set up the stack correctly. However, as long as we stay purely in “assembly-land”, we don’t need to worry about the extra complexity; we can essentially make functions work however we like, so long as we are consistent about it. The only requirement is that we be able to return from a function and get back to where we were, and fortunately the call and ret instructions handle that for us.

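The two instructions that handle functions are call and ret. Both use the stack internally: call pushes the address of the instruction following it (the return address) onto the stack and then jumps to the target, while ret pops an address off the stack and jumps to it.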

These work together as follows (addresses are just made up):

Address   Instruction         Address   Instruction
          _start:                       my_func:
0x100     call my_func        0x200     mov eax, ...
0x108     mov rbx, rax        0x208     ...
                              0x280     ret

While my_func is executing, the stack contains 0x108, the return address. When ret is executed, this address is popped off the stack and we resume execution at that point. (Later, we’ll see that this means if you’re using the stack for anything else, you have to make sure you’ve popped everything off before you ret, so that at that point, the return address is on top of the stack.)

As for getting arguments into functions, and return values out of functions, remember that registers are essentially global variables. We can, at this point, simply decide to use some registers to hold the arguments, and some register to hold the return value. However, to get used to the way things will be later, we will try to stick with what will eventually become the convention for calling functions: arguments go in rdi, rsi, rdx, rcx, r8, and r9 (in that order), and the return value goes in rax.

As an example, let’s write a function that prints a string (given as an address and a length) and adds a newline at the end. This will just wrap up the call to the write syscall we’ve been using:

section .data

newline:    db      10

section .text

write_ln:

    ; rdi = address
    ; rsi = length

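    mov rdx, rsi    ; rdx = length (copy before rsi is overwritten)
    mov rsi, rdi    ; rsi = address (copy before rdi is overwritten)
    mov rdi, 1      ; File descriptor 1 = stdout
    mov rax, 1      ; Syscall 1 = write
    syscall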

    mov rax, 1
    mov rdi, 1
    mov rsi, newline
    mov rdx, 1
    syscall

    ret

(Because we can switch sections at any time, you could put this code in a file and then %include it to make the write_ln function available in any program you write.)

To use this, we load the registers with the appropriate information and then issue a call:

section .data

msg:    db      "Hello, world!"
MSGLEN: equ     $-msg

section .text

global _start
_start:

    mov rdi, msg
    mov rsi, MSGLEN
    call write_ln

    ; Normal exit syscall...

We could, of course, further tidy this up by writing a function to wrap up the write syscall:

sys_write:

    ; rdi = address
    ; rsi = length

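    mov rdx, rsi    ; rdx = length (copy before rsi is overwritten)
    mov rsi, rdi    ; rsi = address (copy before rdi is overwritten)
    mov rdi, 1      ; File descriptor 1 = stdout
    mov rax, 1      ; Syscall 1 = write
    syscall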

    ret

and then our write_ln becomes just

write_ln:

    ; rdi = address
    ; rsi = length
    call sys_write

    mov rdi, newline
    mov rsi, 1
    call sys_write

    ret

Note that our functions “clobber” the registers rdi, rsi, rax and rdx (and the syscall instruction itself overwrites rcx and r11): when calling either write_ln or sys_write you cannot rely on the values of these registers being preserved through the call. Theoretically, which registers are clobbered and which are preserved should be part of our function calling convention, so that when we call a function, we know which registers are “safe”. As of right now, we don’t have an easy way to “save” the value of a register, except by putting it in another register, which in this case doesn’t actually help! The eventual solution will be to push any registers we need to save onto the stack, and then pop them off later. Different registers are designated as caller saved (meaning the function doing the call instruction is responsible for saving them) vs. callee saved (meaning the function being called is responsible for saving them).

As above, we’ll try to use the same set of caller/callee saved registers as will eventually be required when we interoperate with C-code:

Register    Use
rax         Return value
rbx         Callee-saved
rcx         4th argument
rdx         3rd argument
rsi         2nd argument
rdi         1st argument
rbp         Callee-saved
rsp         Stack pointer
r8          5th argument
r9          6th argument
r10         Temporary (caller-saved)
r11         Temporary (caller-saved)
r12-r15     Callee-saved

The general flow of a function call, and the function itself, then becomes:

    ...
    push r11        ; Push any caller-saved registers
    push r10        
    call func
    pop r10         ; Pop in opposite order after return
    pop r11

...
func:

    push r12        ; Push any callee-saved registers
    push rbx
    ...
    pop rbx         ; Pop in opposite order before return
    pop r12
    ret

“Function pointers”

The address passed to call can be a register, not just a label:

mov r11, my_function
call r11

This is equivalent to calling a function through a function pointer. As with jump targets, you can build an array of function labels in the .data section and then call through it. This is how virtual methods typically work in C++: there is an array of pointers-to-functions, and each virtual method name is mapped to an index into the array.
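In C, the same idea is an array of function pointers. The sketch below uses two hypothetical functions with a common signature; the index is assumed to be valid, so real code would want a range check like the one in the jump table.

```c
/* Two hypothetical "methods" sharing one signature. */
static long square(long x) { return x * x; }
static long negate(long x) { return -x; }

/* Table of function addresses, like a dq array of labels in .data. */
static long (*const method_table[])(long) = { square, negate };

/* Mirrors: mov r11, [method_table + 8*index] / call r11. */
long call_method(unsigned long index, long arg) {
    return method_table[index](arg);
}
```

Calling call_method(0, 7) goes through the first table entry and squares the argument; call_method(1, 7) goes through the second and negates it.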