Review: Arithmetic operations

Arithmetic operations

add dest, src       ; dest += src
sub dest, src       ; dest -= src

add and sub perform addition and subtraction between two operands of the same size. Internally, sub is just addition with the second operand negated, and the carry inverted at the end.

Flags

add and sub set/unset the OF, SF, ZF, AF, CF, and PF flags:

Note that all flags are set/cleared on all operations, but some flags only make sense on signed/unsigned operations. The add/sub instructions don’t know whether you are performing a signed or unsigned operation, so its up to you to make sure you check the correct flags for the type of operation you are performing.

Example: Let’s perform a add operation and see how the flags are set from it:

  111  11
   10110011   = 179 (unsigned)   = -77 (signed)
 + 01100110   = 102 (unsigned)   = 102 (signed)
────────────
1  00011001   =  25 (unsigned)   = 25  (signed)

Increment and decrement

inc dest    ; ++dest
dec dest    ; --dest

inc and dec increment/decrement their single operand, which can be either a register or a memory location. inc and dec do not modify the carry flag, as an add r, 1 or sub r, 1 instruction would. The flags OF, SF, ZF, AF, and PF are set/cleared as expected. When used on signed values, the behavior is still correct (incrementing a negative value brings it closer to 0, decrementing a negative value makes it more negative).

Addition/subtraction larger than 64-bits

The largest registers we have are 64-bits (qword). What if we want to perform an addition/subtraction on 128-bit operands (represented as, e.g., rdx:rax)? Let’s consider how we would perform word-sized addition, if the only addition we could do natively was byte-sized:

     111111←   1111
   00101101 11001101
 + 00010010 10101011 
─────────────────────
   01000000 01111000  

Adding the low bytes produced an extra carry (CF = 1), which we then used to start the addition of the high bytes. We effectively need two kinds of addition:

This is how we perform larger-than-qword addition, there is another addition operation, add-with-carry, adc which uses the status of the carry flag CF as an input for the first bit’s addition.

adc dest, src       ; dest = dest + src + CF

For subtraction there is sbb, subtract-with-borrow.

Thus, to add the double-qword rdx:rax to rcx:rbx, we would do

add rax, rbx 
adc rdx, rcx

The analogue for subtraction is sbb, subtract-with-borrow.

Multiplication and division

Multiplication and division are more complex than addition/subtraction. We will cover them in more detail later, but for now:

To store a double-qword (128-bit) result, we use a combination of rax and rdx: rax stores the low qword while rdx stores the high qword. We write this combination as rdx:rax. (Using a similar notation, we could say that ax = ah:al.) Smaller multiplications do not require this extension.

The unsigned/signed multiplication instructions are mul and imul, respectively:

Instruction Equivalent
mul rm rdx:rax *= rm, unsigned
imul rm rdx:rax *= rm, signed
imul r, rm r *= rm, signed
imul r, rm, imm r = rm * imm, signed

The CF and OF flags are set/cleared together, if the sign of the result is incorrect. If the result of the multiplication does not fit into the destination, the results are truncated (high bits discarded). The values in the other flags are undefined.

Division only has a single operand form, where the operand contains the divisor; the destination (which is also the dividend) is in rdx:rax. The result of div/idiv is both the rounded-down result in rax, but also the remainder (i.e., modulo or %) in rdx. Unlike C++ where we have / for integer division and % for integer modulo, in assembly a single instruction gives us both results.

Instruction Equivalent
div rm rax = rdx:rax / rm and rdx = rdx:rax % rm, unsigned
idiv rm rax = rdx:rax / rm and rdx = rdx:rax % rm, signed

An overflow in division is indicated not by setting the carry flag, but by a divide-error exception #DE, which is sent to our process as a signal SIGFPE. For now, this will immediately crash our program, but later we’ll see how to write a signal handler to deal with it in a more graceful manner. (Of course, we could also avoid the overflow by checking the operands before performing the division.)

Functions, branching and conditional instructions

Assembly language does not have dedicated looping constructs (like for, do, while, etc.). It only has

Function calls and returns are just specialized forms of branches, which manipulate the stack.

The structure of an assembly language program

We’ve said that the difference between assembly and a language like C/C++ is that each “statement” (instruction) in assembly corresponds to exactly one CPU operation. In contrast, in C/C++, a single statement may generate many operations during compilation. This means that assembly cannot have “conditional statements” or “looping statements” the way C/C++ does; in C/C++ these are compound statements, statements with other statements inside them. This necessarily means that an if-else or a while-loop generates more than one CPU operation. Hence, in assembly language, loops and conditions work very differently.

An assembly-language program is, in the end, just a sequence of instructions. That’s it. There’s no real division between (for example) different functions, or between the “body” of a loop or if-else and the rest of the function in which it is written. The program is just a big blob of instructions, and hence it falls to us to impose some structure on it. The usual programming language constructs — functions, conditionals, loops — are things that we have to build for ourself.

Every instruction in an assembly language program has an address, the location at which it will end up in memory when the program is eventually run. Adding a label tells the assembler that this address (the address of the instruction following the label) is important, important enough to be saved and given a name. Hence, when we write

_start:
  ...

the “value” of _start is the address of the first instruction immediately following it. This is true for local labels (those whose name starts with .) as well.

The normal flow of control in an assembly language program is simple: each instruction is executed in order from first to last. The CPU always knows what instruction “comes next” in the program: its the one directly after this one.

The only kind of other control flow supported by assembly language is, instead of running instructions in first-to-last sequence, jumping to an address (implemented by the CPU as changing the value of rip, the instruction pointer register). All of our existing flow control structures (if-else, switch-case, while, do-while) will have to be translated into this primitive notion of either skipping forward in the program, over some instructions, or skipping backwards in the program, so that some addresses in the program are fed to the CPU for instruction more than once.

Branches

Branches work by jumping to a new location. Because the instruction pointer register points to the following register, this is not just a matter of changing rip; it involves redirecting the CPU to the new address and loading the instruction immediately after the address into rip. Fortunately, all this is done under the hood; we only provide the address to jump to, usually in the form of an address.

Labels

The target of a jump must be a label. Labels consist of an identifier followed by a colon:

Target:
    ...

A local label is a label whose name starts with a period. The full name of a local label is the most recent non-local label added to the local label name. E.g.

my_function:
  ...
  .begin_loop:    ; Full name: my_function.begin_loop

This allows us to have multiple labels with the same “name” so long as they are in different functions/chunks of code.

A label is just an address, the address of the next instruction (or data, if used in the .data section).

Jumps

To jump to a label, use the jmp instruction:

jmp target

Note that the “value” of a label is simply its address within the program. Thus, it’s possible to store a label within a register, jump to a register, etc.

mov rax, Target 
jmp rax

This is sometimes called a “computed jump”. We could, for example, store a set of labels in an array (in the .data section) and then use an array index to determine what label to jump to. This technique will be used later to implement switch-case structures.

Comparisons

There are two comparison instructions, of which cmp is the first and most straightforward; it requires two operands, and both operands must be the same size. The first operand cannot be an immediate, but the second can be. Either of the operands can be in memory, but not both at the same time.

cmp op1, op2

The comparison instruction internally performs the subtraction op1 - op2, discarding the result but updating the flags register accordingly. For example, if op1 - op2 == 0 then the zero flag ZF will be set; but if op1 - op2 == 0 then op1 == op2, so the set zero flag tells us that the original operands were equal. Similarly, subtraction sets the carry flag if op1 > op2.

Conditions

Various combinations of flags can be used to determine the relationships between the two operands of sub a, b:

Each of these condition codes will be used, later, in a conditional jump instruction. For signed comparisons we normally use the terms “less than” and “greater than”; for unsigned comparisons we say “below” and “above”.

Memory-memory comparisons

The cmp instruction cannot compare two operands both in memory. The cmps* family of instructions can compare two operands in memory, the first located at [rsi] and the second at [rdi].

Instruction Description
cmpsb Compares byte [rsi] with byte [rdi]
cmpsw Compares word [rsi] with word [rdi]
cmpsd Compares dword [rsi] with dword [rdi]
cmpsq Compares qword [rsi] with qword [rdi]

The cmps* instructions do not take any operands; they always use rsi and rdi.

The test instruction

An alternative to the cmp instruction is test: while cmp performs subtraction and updates the same flags as sub, test performs a binary AND and updates only the SF, ZF, and PF flags. The CF and OF flags are cleared. This means that test cannot be used to determine any of the conditions which depend on those flags (ordering comparisons such as greater, less, above, below), or equality. Because it uses AND instead of subtraction, the uses for test are more limited:

test is subject to a few restrictions on its operands:

Neither operand is modified by the test instruction; only the flags register is changed.

Other instructions

Remember that many other instructions setup the flags register; you don’t have to just restrict yourself to cmp and test! For example, suppose you want to decrement rcx and then jump somewhere as long as it is not equal to 0. This can be done with just

dec rcx
jnz target

dec will set ZF if the result is 0, so there’s no need to use a cmp or test.

Another example would be if we needed to perform the subtraction rax -= rbx and then jump on whether rax was 0 afterwards. It would be wasteful to do

sub rax, rbx
cmp rax, 0
jz label

when we could just do

sub rax, rbx
jz label

because sub will have set the zero flag if rax == 0.

Conditional branch instructions

The conditional branch instructions examine the flags register and either jump to the target or not. These are usually referred to as jcc, where cc is a condition code:

Operation Description Flag condition
je Jump if op1 == op2 ZF == 1
jne Jump if op1 != op2 ZF == 0
jl Jump if op1 < op2, signed SF != OF
jle Jump if op1 <= op2, signed ZF == 1 or SF != OF
jg Jump if op1 > op2, signed ZF == 0 and SF == OF
jge Jump if op1 >= op2, signed SF == OF
jb Jump if op1 < op2, unsigned CF == 1
jbe Jump if op1 <= op2, unsigned CF == 1 or ZF == 1
ja Jump if op1 > op2, unsigned CF == 0 and ZF == 0
jae Jump if op1 >= op2, unsigned CF == 0

(For the unsigned comparisons, “a” is short for “above” and “b” is short for “below”. For signed, “greater” and “less” are used. “ae” is “above or equal”, while “ge” is “greater-than or equal”.)

For those who were annoyed that C/C++ does not have negated comparisons (!<, !>=) will be pleased to know that assembly has jnl (not-less-than) as a synonym for jge and so forth:

Operation Description
jna Jump if not above
jnae Jump if not above or equal
jnb Jump if not below
jnbe Jump if not below or equal
jng Jump if not greater-than
jnge Jump if not greater-than or equal
jnl Jump if not less-than
jnle Jump if not less-than or equal

These are just aliases for the above instructions (e.g., jna is an alias for jbe).

There are a set of jumps which mimic the operation of loop by examining the rcx register:

Operation Description
jcxz Jump if cx == 0
jecxz Jump if ecx == 0
jrcxz Jump if rcx == 0

Note that these jump if rcx is equal to zero, while loop jumps if rcx is not equal to zero.

Finally, there are a set of semi-redundant conditions which refer to the flag names directly:

Operation Description
jc Jump if CF == 1
jnc Jump if CF == 0
jz Jump if ZF == 1
jnz Jump if ZF == 0
jo Jump if OF == 1
jno Jump if OF == 0
js Jump if SF == 1
jns Jump if SF == 0
jz Jump if ZF == 1
jnz Jump if ZF == 0
jp Jump if PF == 1
jpo Jump if PF == 1 (jump if parity odd)
jpe Jump if PF == 0 (jump if parity even)

These may be more suitable for use with test or with other instructions that set the flags.

For example, suppose we want to implement the following code

if(rcx == 0)
    rax = rbx;

Using conditions and conditional branches, we could do this with

cmp rcx, 0
jne NotZero
mov rax, rbx
NotZero:
...

Jump targets

The normal jmp instruction can jump to any address. The conditional jumps store their target as a signed 8- or 32-bit offset from the current jump instruction’s address. In assembly we write a label, and the assembler determines how many bytes ahead (positive) or before (negative) the jump the label is, and writes this offset into the instruction. An 8-bit jump is called a “short” jump while a “32-bit” jump is called a “near” jump.

An 8-bit jump can be encoded in a smaller instruction than a 32-bit one, for obvious reasons. Yet another reason to keep your loops short!

Conditional jumps to computed targets

With an unconditional jump, it is easy to jump to a target defined by a register:

target:
...
mov rax, target
jmp rax

because the value of the rax register is the address to jump to. Because conditional jumps use not an absolute address, but an offset from the current address, computed conditional jumps require some more finesse.

The simplest method is to make the conditional jump to a fixed target, where the target is a normal jmp to a computed address:

jcc my_jmp_target

  ⋮

my_jmp_target:  jmp rax

This method is slightly slower than ideal, because it involves two jumps. A faster method is to compute the distances between the final jcc instruction and the various jump targets, and then store these distances into some register. Because these distances are fixed at assembly time, it would be inefficient to compute them at runtime. The general strategy is to label the conditional jump instruction itself, so that we have access to its address:

target1:

  ⋮
                mov rax, computed_jump - target1  ; Pick target to jump to
computed_jump:  jcc rax                           ; Jump 

  ⋮

target2:

  ⋮

The mov would of course be part of a conditional structure that would either do

mov rax, computed_jump - target1

or

mov rax, computed_jump - target2

depending on some condition. We could also store an array of computed_jump - target1, computed_jump - target2, etc. offsets and then index into that.

Compound conditions

How can we check compound conditions, e.g., rbx >= 10 and rbx < 100 and perform a jump if the compound condition is true?

Optimizing jumps

Conditional jumps are expensive! (Unconditional jumps are more expensive than normal sequential control flow, but not as expensive as conditional jumps.) The processor does not know what instruction will be taken until it has examined the flags register, which means many of the optimizations it performs have to be delayed. The best way to optimize jumps is to minimize their usage: try to keep as much of your control flow sequential as possible. Beyond that, try to

SETcc and bool → int conversion

Sometimes, in C/C++ we rely on the implicit conversion from bool → int to avoid writing an if/else. E.g., to count the number of negative values in an array we could do:

int c = 0;
for(int* p = arr; p < arr + size)
   c += (*p < 0);

This works because bool true converts to 1 (thus becoming c += 1) and false converts to 0 (becoming c += 0). This code is actually faster than the equivalent code:

int c = 0;
for(int* p = arr; p < arr + size)
   if(*p > 0)
      ++c;

because evaluating a conditional branch is slower for the CPU. To implement the above version, we can use the SETcc instruction, which sets a given (byte) register to 1 if the condition code cc is satisfied, or 0 if it is not. E.g., to increment rax only when rbx > 0 we could do

mov rcx, 0

cmp rbx, 0
seta cl      ; Set cl = 1 if rbx > 0
add rax, rcx

Transforming C/C++ constructs

If-else chains

The classic C/C++ if-else if construct:

if(condition1) {
  ... // body1
}
else if(condition2) {
  ... // body2
}
...
else {
  ... // else body
}

does not have a direct analogue in assembly. We have to reconstruct its behavior using comparisons and both conditional and unconditional jumps.

Thus, the above translates into something like

cmp ...
jncc .elseif1 
  ; body 1

  jmp end_else

.elseif1:
cmp ...
jncc .elseif2
  ; body2

  jmp end_else

... ; other else-if comparisons and bodies

.else: 

  ; else body

.end_else:

...

(Of course, you should try to use more descriptive label names!)

Nested if-else

A nested if-else, such as

if(rax == rbx) {
  if(rbx < rcx) {
    ...
  }
}

as we have seen can be translated into

cmp rax, rbx      ; Or whatever you need for the outer condition
jne .end          ; Note: jump if NOT equal
cmp rbx, rcx      
jge .end

...               ; Actual body 

.end
...               ; Rest of program

We test each condition in turn and jump to the label after the body of the nested-if, if the condition is not met. Thus, all the conditions are negated in the j** instructions.

do-while loops

We’ve already seen that the loop instruction serves to implement a kind of do-while loop, so long as you want to use a decrementing rcx as the loop variable, with the loop ending when rcx == 0. With conditional jumps we can build a more general do-while loop:

.do                 ; do {

  ...               ;   Loop body

  cmp rax, rbx      
  je .do            ; } while(rax == rbx);

while loops

Implementing a while loop requires testing the loop condition at the beginning of the loop, and possibly jumping to past-the-end of the loop if it fails. Thus, we need a label at both the beginning (so that we can do the loop at all) and the end of the loop:

.while:         ; while(rax != rbx) {
  cmp rax, rbx
  je .end_whle

  ...           ;   Loop body

  jmp .while
.end_while:     ; }

  ...

A for loop is just a specialized kind of while loop, so for example,

for(rax = 0; rax < 100; ++rax) {
  ...
}

would become

  xor rax, rax      ; rax = 0
.for:     
  cmp rax, 100
  jge .end_for

  ...               ; Loop body

  inc rax
  jmp .for
.end_for:

break and continue

break is equivalent to a jmp to a label immediately after the end of the loop. continue is equivalent to a jmp to a label at the beginning of the loop. The common pattern of

if(condition)
  break; // Or continue

can be done with a conditional jump to the end/beginning; there is no need to emulate the entire if structure.