Review: Arithmetic operations

Arithmetic operations

add dest, src       ; dest += src
sub dest, src       ; dest -= src

add and sub perform addition and subtraction between two operands of the same size. Internally, sub is just addition with the second operand negated, and the carry inverted at the end.

Flags

add and sub set/unset the OF, SF, ZF, AF, CF, and PF flags:

For signed operations the OF flag indicates that an overflow/underflow occurred. This bit is set if the sign bit of the result is incorrect, because the correct result was too big/small for the destination to hold. For example, the signed byte addition 127+127 will generate an overflow, because 254 is larger than 127, the largest signed byte value. (Unsigned 127+127 = 254, which still fits in an unsigned byte, and hence would not set the carry flag.)

The correct sign of the result is easy to determine: if both inputs are positive, the result should be positive; if both are negative, then the result should be negative. If one is positive and the other is negative, then the correct sign of the result is more difficult to determine, but it turns out in this case that it doesn’t matter: it’s impossible for an addition of a positive and a negative value to overflow. Subtraction is treated similarly, except that the sign of the second operand is flipped (i.e., a - b is treated as a + (-b)).

If the inputs are unsigned, then the OF flag is still set/unset, but its value is meaningless.
For unsigned operations the CF flag indicates that an extra carry/borrow was “left over” at the end. This indicates that the result of the operation was too big/small for the destination to hold. For example, 255+127 is too big for a byte, and would set the carry flag. It would not set the overflow flag, because 255 unsigned = -1 signed, and 127-1 = 126, which fits in a signed byte.

Note that after an unsigned subtraction a - b, if the carry flag is set, this indicates that b > a.

The CF flag is still set/unset if the inputs are signed, but its value is meaningless in that case.
The ZF flag is set if the result was 0 (all zero bits).
The SF flag is set to a copy of the sign bit (set if result was negative, for signed operations). For unsigned operations it’s just a copy of the high bit of the result.
The PF flag is set if the number of set (=1) bits in the low byte of the result is even. The parity flag is something of a historical artifact and is not used much, partly because it does not give the parity of the entire result, just the lowest byte of it.
We’ll ignore the AF flag, as it only makes sense in the context of BCD arithmetic.

Note that all flags are set/cleared on all operations, but some flags only make sense on signed/unsigned operations. The add/sub instructions don’t know whether you are performing a signed or unsigned operation, so it’s up to you to make sure you check the correct flags for the type of operation you are performing.

Example: Let’s perform an add operation and see how the flags are set from it:

  111  11
   10110011   = 179 (unsigned)   = -77 (signed)
 + 01100110   = 102 (unsigned)   = 102 (signed)
────────────
1  00011001   =  25 (unsigned)   = 25  (signed)

Interpreted as signed values, one input was negative, the other was positive, so the sign of the result is guaranteed to be correct. OF = 0.
Interpreted as unsigned, the addition produced an extra carry, CF = 1.
The result is not 0, so ZF = 0.
The number of set bits in the low byte of the result is odd (three), so PF = 0.
The high bit is unset (the result is positive), so SF = 0.
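
The flag rules above can be checked mechanically. The following is a minimal C sketch that models a byte-sized add; it is not how the hardware is built, and the names `flags8` and `add8` are invented here purely for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the flags set by a byte-sized add. */
struct flags8 {
    uint8_t result;
    int cf, of, zf, sf, pf;
};

struct flags8 add8(uint8_t a, uint8_t b)
{
    struct flags8 f;
    unsigned wide = (unsigned)a + (unsigned)b;   /* 9-bit sum */

    f.result = (uint8_t)wide;
    f.cf = (wide >> 8) & 1;                      /* carry out of bit 7 */
    /* Overflow: the inputs share a sign bit that the result does not. */
    f.of = ((~(a ^ b) & (a ^ f.result)) >> 7) & 1;
    f.zf = (f.result == 0);
    f.sf = (f.result >> 7) & 1;

    /* PF is set when the low byte has an even number of 1 bits. */
    int ones = 0;
    for (int i = 0; i < 8; ++i)
        ones += (f.result >> i) & 1;
    f.pf = (ones % 2 == 0);
    return f;
}
```

Calling `add8(179, 102)` reproduces the worked example: the result is 25 with CF set and OF, ZF, SF, and PF all clear.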

Increment and decrement

inc dest    ; ++dest
dec dest    ; --dest

inc and dec increment/decrement their single operand, which can be either a register or a memory location. inc and dec do not modify the carry flag, as an add r, 1 or sub r, 1 instruction would. The flags OF, SF, ZF, AF, and PF are set/cleared as expected. When used on signed values, the behavior is still correct (incrementing a negative value brings it closer to 0, decrementing a negative value makes it more negative).

Addition/subtraction larger than 64 bits

The largest registers we have are 64 bits (qword). What if we want to perform an addition/subtraction on 128-bit operands (represented as, e.g., rdx:rax)? Let’s consider how we would perform word-sized addition, if the only addition we could do natively was byte-sized:

     111111←   1111
   00101101 11001101
 + 00010010 10101011 
─────────────────────
   01000000 01111000

Adding the low bytes produced an extra carry (CF = 1), which we then used to start the addition of the high bytes. We effectively need two kinds of addition:

Low byte addition, which doesn’t start with a carry (ignores CF)
High byte addition, which uses CF to start the addition.

This is how we perform larger-than-qword addition: there is another addition operation, add-with-carry (adc), which uses the status of the carry flag CF as an input for the first bit’s addition.

adc dest, src       ; dest = dest + src + CF

Thus, to add the double-qword rcx:rbx to rdx:rax (i.e., rdx:rax += rcx:rbx), we would do

add rax, rbx 
adc rdx, rcx

The analogue for subtraction is sbb, subtract-with-borrow.
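
The add/adc pairing can be mirrored in C. This is only a sketch: the `u128` struct and `add128` function are hypothetical names, and the carry is recovered by noticing that the low sum wrapped around, which is the information the CPU keeps in CF.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two qwords, like rdx:rax. */
struct u128 {
    uint64_t lo;
    uint64_t hi;
};

struct u128 add128(struct u128 a, struct u128 b)
{
    struct u128 r;
    r.lo = a.lo + b.lo;              /* add: low qwords, no carry in */
    uint64_t carry = (r.lo < a.lo);  /* CF: the low sum wrapped      */
    r.hi = a.hi + b.hi + carry;      /* adc: high qwords plus carry  */
    return r;
}
```

Adding 1 to a value whose low qword is all ones yields a low qword of 0 and bumps the high qword by one, which is the carry travelling from the add into the adc.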

Multiplication and division

Multiplication and division are more complex than addition/subtraction. We will cover them in more detail later, but for now:

The result of multiplying two n-bit values can have up to 2n bits. Hence, when multiplying two qword values, we need somewhere to store the double-qword result.
Division has the opposite problem: we may want to divide a double-qword value by a qword divisor.
Signed multiplication is different from unsigned, and similarly for division. Different instructions are used for each.
Multiplication instructions exist in a two-operand form (dest *= src) as you would expect, but also in a one-operand form in which rax is implicitly used as one source and rdx:rax as the destination, and even a three-operand form, equivalent to dest = src * immediate. The division instructions only take a single operand, and always store their results into a combination of rax and rdx.

To store a double-qword (128-bit) result, we use a combination of rax and rdx: rax stores the low qword while rdx stores the high qword. We write this combination as rdx:rax. (Using a similar notation, we could say that ax = ah:al.) Smaller multiplications do not require this extension.

The unsigned/signed multiplication instructions are mul and imul, respectively:

Instruction	Equivalent
`mul rm`	`rdx:rax = rax * rm`, unsigned
`imul rm`	`rdx:rax = rax * rm`, signed
`imul r, rm`	`r *= rm`, signed
`imul r, rm, imm`	`r = rm * imm`, signed

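The CF and OF flags are set/cleared together: both are set if the upper half of the full 2n-bit product carries significant information, i.e., the product does not fit in n bits. For mul, that means the high half is nonzero; for imul, it means the full product differs from the sign-extension of its low half. In the two- and three-operand forms of imul only the low half is stored, so a product that does not fit into the destination is truncated (high bits discarded). The values in the other flags are undefined.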
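
To make the “2n bits” point concrete, here is a scaled-down C sketch of an unsigned one-operand multiply, using 32-bit inputs and a 64-bit product split into a high and a low half (analogous to rdx:rax for qwords). The names `mul_result` and `mul32` are made up for this illustration; for unsigned mul, CF and OF are set exactly when the high half is nonzero.

```c
#include <stdint.h>

/* Hypothetical result of a widening unsigned multiply. */
struct mul_result {
    uint32_t lo;     /* low half of the product  */
    uint32_t hi;     /* high half of the product */
    int cf_of;       /* CF and OF, which mul sets to the same value */
};

struct mul_result mul32(uint32_t a, uint32_t b)
{
    struct mul_result r;
    uint64_t wide = (uint64_t)a * b;   /* full 2n-bit product */
    r.lo = (uint32_t)wide;
    r.hi = (uint32_t)(wide >> 32);
    r.cf_of = (r.hi != 0);             /* high half is significant */
    return r;
}
```

Multiplying 0x10000 by itself gives a low half of 0 and a high half of 1, so the flags are set; 1000 * 1000 fits in the low half and leaves them clear.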

Division only has a single-operand form, where the operand contains the divisor; the dividend is in rdx:rax. div/idiv produce both the quotient in rax (rounded down for div, truncated toward zero for idiv) and the remainder (i.e., modulo or %) in rdx. Unlike C++ where we have / for integer division and % for integer modulo, in assembly a single instruction gives us both results.

Instruction	Equivalent
`div rm`	`rax = rdx:rax / rm` and `rdx = rdx:rax % rm`, unsigned
`idiv rm`	`rax = rdx:rax / rm` and `rdx = rdx:rax % rm`, signed

An overflow in division (a quotient too large to fit in rax), like a division by zero, is indicated not by setting the carry flag, but by a divide-error exception #DE, which is sent to our process as a signal SIGFPE. For now, this will immediately crash our program, but later we’ll see how to write a signal handler to deal with it in a more graceful manner. (Of course, we could also avoid the overflow by checking the operands before performing the division.)
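
The “check the operands first” idea can be sketched in C. The model below is scaled down to a 64-bit dividend (hi:lo) and a 32-bit divisor; `div32` is a hypothetical helper that returns -1 in the situations where the CPU would raise #DE, instead of crashing.

```c
#include <stdint.h>

/* Hypothetical model of an unsigned divide of a 2n-bit dividend (hi:lo)
   by an n-bit divisor, scaled down to n = 32.
   Returns 0 on success, -1 where the CPU would raise #DE. */
int div32(uint32_t hi, uint32_t lo, uint32_t divisor,
          uint32_t *quotient, uint32_t *remainder)
{
    if (divisor == 0)
        return -1;                        /* division by zero */

    uint64_t dividend = ((uint64_t)hi << 32) | lo;
    uint64_t q = dividend / divisor;
    if (q > UINT32_MAX)
        return -1;                        /* quotient does not fit */

    *quotient = (uint32_t)q;
    *remainder = (uint32_t)(dividend % divisor);
    return 0;
}
```

In this model the quotient overflows exactly when hi >= divisor, so a single unsigned comparison of the high half against the divisor is enough to rule out the overflow case before dividing.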

Functions, branching and conditional instructions

Assembly language does not have dedicated looping constructs (like for, do, while, etc.). It only has

Branches (also known as jumps, goto statements, etc.) jump to a new location in the program.
Comparisons compare two operands and then set the flags register appropriately. A single comparison instruction, cmp, covers all of the usual comparisons (equality, less-than, equal-to-zero, etc.).
Conditional branches perform a branch or continue on to the next instruction normally, depending on the state of one of the flags (set previously by a comparison operation).
Conditional moves perform a mov or not depending on the state of one of the flags.

Function calls and returns are just specialized forms of branches, which manipulate the stack.

The structure of an assembly language program

We’ve said that the difference between assembly and a language like C/C++ is that each “statement” (instruction) in assembly corresponds to exactly one CPU operation. In contrast, in C/C++, a single statement may generate many operations during compilation. This means that assembly cannot have “conditional statements” or “looping statements” the way C/C++ does; in C/C++ these are compound statements, statements with other statements inside them. This necessarily means that an if-else or a while-loop generates more than one CPU operation. Hence, in assembly language, loops and conditions work very differently.

An assembly-language program is, in the end, just a sequence of instructions. That’s it. There’s no real division between (for example) different functions, or between the “body” of a loop or if-else and the rest of the function in which it is written. The program is just a big blob of instructions, and hence it falls to us to impose some structure on it. The usual programming language constructs (functions, conditionals, loops) are things that we have to build for ourselves.

Every instruction in an assembly language program has an address, the location at which it will end up in memory when the program is eventually run. Adding a label tells the assembler that this address (the address of the instruction following the label) is important, important enough to be saved and given a name. Hence, when we write

_start:
  ...

the “value” of _start is the address of the first instruction immediately following it. This is true for local labels (those whose name starts with .) as well.

The normal flow of control in an assembly language program is simple: each instruction is executed in order from first to last. The CPU always knows what instruction “comes next” in the program: it’s the one directly after this one.

The only other kind of control flow supported by assembly language is, instead of running instructions in first-to-last sequence, jumping to an address (implemented by the CPU as changing the value of rip, the instruction pointer register). All of our existing flow control structures (if-else, switch-case, while, do-while) will have to be translated into this primitive notion of either skipping forward in the program, over some instructions, or skipping backwards in the program, so that some addresses in the program are fed to the CPU for execution more than once.

Branches

Branches work by jumping to a new location. Because the instruction pointer register normally points to the following instruction, a branch amounts to redirecting the CPU to the new address by loading that address into rip, so that the instruction found there is the next one executed. Fortunately, all this is done under the hood; we only provide the address to jump to, usually in the form of a label.

Labels

The target of a jump is usually a label. Labels consist of an identifier followed by a colon:

Target:
    ...

A local label is a label whose name starts with a period. The full name of a local label is the name of the most recent non-local label followed by the local label name. E.g.

my_function:
  ...
  .begin_loop:    ; Full name: my_function.begin_loop

This allows us to have multiple labels with the same “name” so long as they are in different functions/chunks of code.

A label is just an address, the address of the next instruction (or data, if used in the .data section).

Jumps

To jump to a label, use the jmp instruction:

jmp Target

Note that the “value” of a label is simply its address within the program. Thus, it’s possible to store a label within a register, jump to a register, etc.

mov rax, Target 
jmp rax

This is sometimes called a “computed jump”. We could, for example, store a set of labels in an array (in the .data section) and then use an array index to determine what label to jump to. This technique will be used later to implement switch-case structures.

Comparisons

There are two comparison instructions, of which cmp is the first and most straightforward; it requires two operands, and both operands must be the same size. The first operand cannot be an immediate, but the second can be. Either of the operands can be in memory, but not both at the same time.

cmp op1, op2

The comparison instruction internally performs the subtraction op1 - op2, discarding the result but updating the flags register accordingly. For example, if op1 - op2 == 0 then the zero flag ZF will be set; but if op1 - op2 == 0 then op1 == op2, so the set zero flag tells us that the original operands were equal. Similarly, the subtraction sets the carry flag if op1 < op2 (unsigned), because a borrow is needed.

Conditions

Various combinations of flags can be used to determine the relationships between the two operands of sub a, b:

If a == b then the result will be 0, which will set ZF = 1. So we can detect equality by looking at the zero flag. (The condition code for ZF == 1 is e or z.)
If a != b then the result will be non-zero, so ZF == 0. (Condition code: ne or nz.)
If a > b, unsigned, then the result will be non-zero, and no extra carries/borrows will be needed, so ZF == 0 and CF == 0. (Condition code: a)
If a >= b, unsigned, then no extra carries/borrows will be needed (the result might be zero or might not), so CF == 0. (Condition code: ae, nc)
If a < b, unsigned, then this is just the negation of a >= b, so CF == 1 (an extra borrow was needed). (Condition code: b, c)
Similarly, a <= b, unsigned, is the negation of a > b, so ZF == 1 or CF == 1. (Condition code: be)
If a > b, signed, then things are a bit more interesting: we know that the result will not be 0, so ZF == 0, but the rest of the result depends on both the overflow flag and the sign flag:
- If a and b have the same sign, then OF == 0 (overflow is impossible). If a > b then if both are positive, the result will be positive (SF == 0). If both are negative, then the result will also be positive (e.g., -2 > -10, -2 - -10 = +8). So in this case we have SF == OF.
- If a and b have different signs, then overflow is possible. Because a > b, a must be non-negative and b negative, so a - b = a + |b|, whose true value is positive:
  
  * If a + |b| does not overflow, then the sign is positive, so SF == OF == 0
  
  * If a + |b| does overflow, then the sign is negative, but OF == 1, so we have OF == SF == 1
  
  Either way, we have SF == OF again.
Thus, our final condition for a > b signed is ZF == 0 and SF == OF. (Condition code: g)
If a >= b, signed, then we simply ignore the zero flag: SF == OF. (Condition code: ge)
a < b is the negation of a >= b, so SF != OF. (Condition code: l)
a <= b is the negation of a > b, so ZF == 1 or SF != OF. (Condition code: le)

Each of these condition codes will be used, later, in a conditional jump instruction. For signed comparisons we normally use the terms “less than” and “greater than”; for unsigned comparisons we say “below” and “above”.
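
The flag combinations above can be verified exhaustively for byte-sized operands. The C sketch below models the flags left by cmp a, b and then evaluates each condition code from those flags alone; all names (`cmp_flags`, `cmp8`, `cc_a`, and so on) are hypothetical, and the cast from uint8_t to int8_t assumes the usual two’s-complement behaviour.

```c
#include <stdint.h>

/* Hypothetical model of the flags after a byte-sized cmp a, b. */
struct cmp_flags {
    int zf, cf, sf, of;
};

struct cmp_flags cmp8(uint8_t a, uint8_t b)
{
    struct cmp_flags f;
    uint8_t diff = (uint8_t)(a - b);

    f.zf = (diff == 0);
    f.cf = (a < b);                            /* a borrow was needed */
    f.sf = (diff >> 7) & 1;
    /* Overflow: operands differ in sign and the result's sign
       differs from the first operand's. */
    f.of = (((a ^ b) & (a ^ diff)) >> 7) & 1;
    return f;
}

/* Condition codes, written as the flag tests listed in the text. */
int cc_a(struct cmp_flags f)  { return !f.zf && !f.cf; }
int cc_ae(struct cmp_flags f) { return !f.cf; }
int cc_b(struct cmp_flags f)  { return f.cf; }
int cc_be(struct cmp_flags f) { return f.zf || f.cf; }
int cc_g(struct cmp_flags f)  { return !f.zf && f.sf == f.of; }
int cc_ge(struct cmp_flags f) { return f.sf == f.of; }
int cc_l(struct cmp_flags f)  { return f.sf != f.of; }
int cc_le(struct cmp_flags f) { return f.zf || f.sf != f.of; }

/* Checks every pair of byte values against C's own comparisons.
   Returns 1 if all eight condition codes agree everywhere. */
int check_all_conditions(void)
{
    for (int i = 0; i < 256; ++i) {
        for (int j = 0; j < 256; ++j) {
            uint8_t a = (uint8_t)i, b = (uint8_t)j;
            int8_t sa = (int8_t)a, sb = (int8_t)b;
            struct cmp_flags f = cmp8(a, b);
            if (cc_a(f)  != (a > b)   || cc_ae(f) != (a >= b)   ||
                cc_b(f)  != (a < b)   || cc_be(f) != (a <= b)   ||
                cc_g(f)  != (sa > sb) || cc_ge(f) != (sa >= sb) ||
                cc_l(f)  != (sa < sb) || cc_le(f) != (sa <= sb))
                return 0;
        }
    }
    return 1;
}
```

`check_all_conditions()` walks all 65,536 operand pairs, so it exercises the overflow cases in the signed derivation as well as the simple ones.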

Memory-memory comparisons

The cmp instruction cannot compare two operands both in memory. The cmps* family of instructions can compare two operands in memory, the first located at [rsi] and the second at [rdi].

Instruction	Description
`cmpsb`	Compares `byte [rsi]` with `byte [rdi]`
`cmpsw`	Compares `word [rsi]` with `word [rdi]`
`cmpsd`	Compares `dword [rsi]` with `dword [rdi]`
`cmpsq`	Compares `qword [rsi]` with `qword [rdi]`

The cmps* instructions do not take any operands; they always use rsi and rdi.

The `test` instruction

An alternative to the cmp instruction is test: while cmp performs subtraction and updates the same flags as sub, test performs a bitwise AND and updates only the SF, ZF, and PF flags according to the result. The CF and OF flags are always cleared. This means that test cannot be used to determine the conditions which depend on those flags (ordering comparisons such as greater, less, above, below), nor the equality of two different operands. Because it uses AND instead of subtraction, the uses for test are more limited:

A common pattern to jump if a register is equal to 0 is
```
 test reg, reg
 jz target          ; or je target, jump if ZF == 1
```
If you AND a value with itself, you get back that same value; the only way to get 0 as a result is if reg == 0 in the first place. Both je and jz jump only if ZF == 1.
test reg, reg can also be used to determine the sign of a register: if SF == 1 then reg is negative. Thus, we can do
```
 test reg, reg
 js target
```
to jump to target if reg < 0. (If reg is unsigned, this will jump if the high bit is set.)
Similarly, test reg, reg followed by jle will jump if reg <= 0, although figuring out why this works takes a little work:
- jle jumps if ZF == 1 or SF != OF
- test always sets OF = 0, so this is effectively ZF == 1 or SF == 1
- ZF == 1 is the condition used above for reg == 0
- SF == 1 is the condition used above for reg < 0
- So ZF == 1 or SF == 1 is equivalent to reg <= 0
Of course, this only makes sense if reg is a signed value.
test reg, 00000010b can be used to test whether a specific bit (or combination of bits) is set in a register. If the result of the AND is 0, then the bit was not set, and ZF == 1; if the bit is set, then ZF == 0. So we can do
```
 test reg, 00000010b
 jnz target              ; jump if bit 1 (the second-lowest bit) is set
```
This is probably the primary use for test.
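
The patterns above can be summarised in a small C model. This is a sketch under the assumption that only ZF and SF matter (OF is always 0 after test); the names `test_flags`, `test64`, `is_zero`, `is_negative`, `is_le_zero`, and `any_bit_set` are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a qword test a, b: AND the operands and
   keep only the flags. CF and OF are always cleared. */
struct test_flags {
    int zf, sf;
};

struct test_flags test64(uint64_t a, uint64_t b)
{
    struct test_flags f;
    uint64_t r = a & b;
    f.zf = (r == 0);
    f.sf = (int)(r >> 63);
    return f;
}

/* test reg, reg followed by jz */
int is_zero(int64_t reg)
{
    return test64((uint64_t)reg, (uint64_t)reg).zf;
}

/* test reg, reg followed by js */
int is_negative(int64_t reg)
{
    return test64((uint64_t)reg, (uint64_t)reg).sf;
}

/* test reg, reg followed by jle */
int is_le_zero(int64_t reg)
{
    struct test_flags f = test64((uint64_t)reg, (uint64_t)reg);
    return f.zf || f.sf != 0;   /* ZF == 1 or SF != OF, with OF == 0 */
}

/* test reg, mask followed by jnz: is any bit of mask set in reg? */
int any_bit_set(uint64_t reg, uint64_t mask)
{
    return !test64(reg, mask).zf;
}
```

For example, `any_bit_set(6, 2)` is 1 because 6 is 110 in binary, while `any_bit_set(5, 2)` is 0 because 5 is 101.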

test is subject to a few restrictions on its operands:

The second operand must be a register or immediate, not a memory location.
The first operand can be either register or memory
Both must be the same size

Neither operand is modified by the test instruction; only the flags register is changed.

Other instructions

Remember that many other instructions set up the flags register; you don’t have to just restrict yourself to cmp and test! For example, suppose you want to decrement rcx and then jump somewhere as long as it is not equal to 0. This can be done with just

dec rcx
jnz target

dec will set ZF if the result is 0, so there’s no need to use a cmp or test.

Another example would be if we needed to perform the subtraction rax -= rbx and then jump on whether rax was 0 afterwards. It would be wasteful to do

sub rax, rbx
cmp rax, 0
jz label

when we could just do

sub rax, rbx
jz label

because sub will have set the zero flag if rax == 0.

Conditional branch instructions

The conditional branch instructions examine the flags register and either jump to the target or not. These are usually referred to as jcc, where cc is a condition code:

Operation	Description	Flag condition
`je`	Jump if `op1 == op2`	`ZF == 1`
`jne`	Jump if `op1 != op2`	`ZF == 0`
`jl`	Jump if `op1 < op2`, signed	`SF != OF`
`jle`	Jump if `op1 <= op2`, signed	`ZF == 1 or SF != OF`
`jg`	Jump if `op1 > op2`, signed	`ZF == 0 and SF == OF`
`jge`	Jump if `op1 >= op2`, signed	`SF == OF`
`jb`	Jump if `op1 < op2`, unsigned	`CF == 1`
`jbe`	Jump if `op1 <= op2`, unsigned	`CF == 1 or ZF == 1`
`ja`	Jump if `op1 > op2`, unsigned	`CF == 0 and ZF == 0`
`jae`	Jump if `op1 >= op2`, unsigned	`CF == 0`

(For the unsigned comparisons, “a” is short for “above” and “b” is short for “below”. For signed, “greater” and “less” are used. “ae” is “above or equal”, while “ge” is “greater-than or equal”.)

Those who were annoyed that C/C++ does not have negated comparisons (!<, !>=) will be pleased to know that assembly has jnl (not-less-than) as a synonym for jge, and so forth:

Operation	Description
`jna`	Jump if not above
`jnae`	Jump if not above or equal
`jnb`	Jump if not below
`jnbe`	Jump if not below or equal
`jng`	Jump if not greater-than
`jnge`	Jump if not greater-than or equal
`jnl`	Jump if not less-than
`jnle`	Jump if not less-than or equal

These are just aliases for the above instructions (e.g., jna is an alias for jbe).

There are a set of jumps which mimic the operation of loop by examining the rcx register:

Operation	Description
`jcxz`	Jump if `cx == 0` (not encodable in 64-bit mode)
`jecxz`	Jump if `ecx == 0`
`jrcxz`	Jump if `rcx == 0`

Note that these jump if rcx is equal to zero, while loop jumps if rcx is not equal to zero.

Finally, there are a set of semi-redundant conditions which refer to the flag names directly:

Operation	Description
`jc`	Jump if `CF == 1`
`jnc`	Jump if `CF == 0`
`jz`	Jump if `ZF == 1`
`jnz`	Jump if `ZF == 0`
`jo`	Jump if `OF == 1`
`jno`	Jump if `OF == 0`
`js`	Jump if `SF == 1`
`jns`	Jump if `SF == 0`
`jp`	Jump if `PF == 1`
`jnp`	Jump if `PF == 0`
`jpe`	Jump if `PF == 1` (jump if parity even)
`jpo`	Jump if `PF == 0` (jump if parity odd)

These may be more suitable for use with test or with other instructions that set the flags.

For example, suppose we want to implement the following code

if(rcx == 0)
    rax = rbx;

Using conditions and conditional branches, we could do this with

cmp rcx, 0
jne NotZero
mov rax, rbx
NotZero:
...

Jump targets

The normal jmp instruction can jump to any address. The conditional jumps store their target as a signed 8- or 32-bit offset, measured from the address of the instruction that follows the jump. In assembly we write a label, and the assembler determines how many bytes ahead (positive) or behind (negative) the label is, and writes this offset into the instruction. An 8-bit jump is called a “short” jump while a 32-bit jump is called a “near” jump.

An 8-bit jump can be encoded in a smaller instruction than a 32-bit one, for obvious reasons. Yet another reason to keep your loops short!

Conditional jumps to computed targets

With an unconditional jump, it is easy to jump to a target defined by a register:

target:
...
mov rax, target
jmp rax

because the value of the rax register is the address to jump to. Because conditional jumps use not an absolute address, but an offset from the current address, computed conditional jumps require some more finesse.

The simplest method is to make the conditional jump to a fixed target, where the target is a normal jmp to a computed address:

jcc my_jmp_target

  ⋮

my_jmp_target:  jmp rax

This method is slightly slower than ideal, because when the condition holds it involves two jumps. There is no way to shortcut this by placing an offset in a register: x86-64 conditional jumps only accept a relative offset that is encoded into the instruction at assembly time, so a form such as jcc rax does not exist. An alternative that takes at most one jump to reach the computed target is to invert the condition and jump over an indirect jmp:

                jncc .no_jump     ; Inverted condition: skip the jump
                jmp rax           ; Executed only if cc holds
.no_jump:

  ⋮

The address in rax would of course be chosen earlier, by a conditional structure that would either do

mov rax, target1

mov rax, target2

depending on some condition. We could also store an array of the addresses target1, target2, etc. and then index into that.

Compound conditions

How can we check compound conditions, e.g., rbx >= 10 and rbx < 100 and perform a jump if the compound condition is true?

One method is to perform a multi-step jump:
```
 cmp rbx, 10
 jge .rbx_ge_10
 jmp .else

 .rbx_ge_10:
 cmp rbx, 100
 jge .else          ; rbx >= 100 fails the second condition

    ; rbx >= 10 and rbx < 100
 jmp .end           ; skip over the else part

 .else:

    ; condition failed.

 .end:
```
Each condition requires its own cmp and conditional jump. (Because every cmp overwrites the flags set by the previous one, you cannot “combine” multiple comparisons.)

This is effectively equivalent to transforming
```
 if(rbx >= 10 and rbx < 100) {
  ...
 }
```
into
```
 if(rbx >= 10) {
     if(rbx < 100) {
      ...
     }
 }
```
This is the most common and straightforward method.
set** checks a specific condition flag (or combination of flags) and sets a (byte) register/memory to 1 or 0. These can then be combined using the normal and/or/not bitwise operations, and the z, nz conditions can be used to check for false/true. E.g.,
```
 cmp rbx, 10
 setge al
 cmp rbx, 100
 setl  ah
 and al, ah      ; Sets the zero flag if (al & ah) == 0
 jz .outside

    ; Inside

 .outside:
```
set** supports the same set of condition codes as conditional jumps.
The lahf instruction can be used to save the values of the CF, ZF, SF, and PF flags into the ah register for later manipulation. Because this does not include OF, signed comparisons cannot be used with this method.
A range check like the above example actually has a simple version using subtraction:
```
 sub rbx, 10
 cmp rbx, 100 - 10
 jae .outside
    ; Inside the range

 .outside:
    ; Outside the range 
```
This works because if rbx < 10 the subtraction will wrap around to a very large unsigned value, so both values < 10 and values >= 100 will jump to .outside. This assumes that rbx is unsigned; note also that the sub modifies rbx.
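
The same wrap-around trick can be written in C, where unsigned arithmetic is defined to wrap. This is a minimal sketch; `in_range` is a hypothetical helper and it assumes lo <= hi.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if lo <= x < hi, using a single unsigned
   comparison. Assumes lo <= hi. */
int in_range(uint64_t x, uint64_t lo, uint64_t hi)
{
    /* If x < lo, x - lo wraps around to a huge unsigned value,
       which is certainly >= hi - lo. */
    return (x - lo) < (hi - lo);
}
```

With lo = 10 and hi = 100 this accepts 10 and 99 and rejects 9, 100, and 0, matching the sub/cmp/jae sequence.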

Optimizing jumps

Conditional jumps are expensive! (Unconditional jumps are more expensive than normal sequential control flow, but not as expensive as conditional jumps.) The processor does not know what instruction will be taken until it has examined the flags register, which means many of the optimizations it performs have to be delayed. The best way to optimize jumps is to minimize their usage: try to keep as much of your control flow sequential as possible. Beyond that, try to

Keep conditional jumps short, within -128 to +127 bytes
Try to arrange conditional jumps so that the condition is usually false or usually true, not frequently alternating. The processor will try to do branch prediction, storing the choices made by a small number of jumps, but this functions best if conditional jumps are mostly consistent in their choices (i.e., if the “prediction” is correct most of the time).

For example, in a loop, the loop condition is true most of the time, and false only at the very end. The processor will learn this behavior and “guess” that the loop will repeat, so that most loop jumps will be fast. Only the final jump, out of the loop, will be slow, as that’s where the prediction fails.
Avoid conditional branches completely by using conditional moves (see below) or the setcc instruction.

`SETcc` and bool → int conversion

Sometimes, in C/C++ we rely on the implicit conversion from bool → int to avoid writing an if/else. E.g., to count the number of negative values in an array we could do:

int c = 0;
for(int* p = arr; p < arr + size; ++p)
   c += (*p < 0);

This works because bool true converts to 1 (thus becoming c += 1) and false converts to 0 (becoming c += 0). This code can actually be faster than the equivalent code:

int c = 0;
for(int* p = arr; p < arr + size; ++p)
   if(*p < 0)
      ++c;

because evaluating a conditional branch is slower for the CPU. To implement the above version, we can use the SETcc instruction, which sets a given (byte) register to 1 if the condition code cc is satisfied, or 0 if it is not. E.g., to increment rax only when rbx > 0 (signed) we could do

mov rcx, 0

cmp rbx, 0
setg cl      ; Set cl = 1 if rbx > 0 (signed)
add rax, rcx
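
For reference, here is a compilable sketch of the two counting loops as complete functions (the function names are made up here). Whether a compiler emits setcc, a conditional move, or a branch for either one is its own decision, so this illustrates the source-level equivalence rather than guaranteeing any particular machine code.

```c
#include <stddef.h>

/* Branch-free: the comparison result (0 or 1) is added directly. */
int count_negative_branchless(const int *arr, size_t size)
{
    int c = 0;
    for (const int *p = arr; p < arr + size; ++p)
        c += (*p < 0);
    return c;
}

/* Equivalent version with an explicit conditional. */
int count_negative_branching(const int *arr, size_t size)
{
    int c = 0;
    for (const int *p = arr; p < arr + size; ++p)
        if (*p < 0)
            ++c;
    return c;
}
```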

Transforming C/C++ constructs

If-else chains

The classic C/C++ if-else if construct:

if(condition1) {
  ... // body1
}
else if(condition2) {
  ... // body2
}
...
else {
  ... // else body
}

does not have a direct analogue in assembly. We have to reconstruct its behavior using comparisons and both conditional and unconditional jumps.

Each if requires a cmp or test (possibly multiple, if the condition is more complex than a simple comparison), and a conditional jump if the condition is false. The jump target is the next if in the chain.
The body of each if ends with an unconditional jump to a label just after the end of the final else.
The body of the else does not need a jump, because it “falls through” directly to the following code.

Thus, the above translates into something like

cmp ...
jncc .elseif1 
  ; body 1

  jmp .end_else

.elseif1:
cmp ...
jncc .elseif2
  ; body2

  jmp .end_else

... ; other else-if comparisons and bodies

.else: 

  ; else body

.end_else:

...

(Of course, you should try to use more descriptive label names!)
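
One way to get a feel for this translation is to write it in C using goto, which maps almost one-to-one onto jumps and labels. The sketch below uses a made-up three-way classification; `classify` is the structured form and `classify_goto` has the shape of the assembly version.

```c
/* Structured version. */
int classify(int x)
{
    int result;
    if (x < 0) {
        result = -1;
    } else if (x == 0) {
        result = 0;
    } else {
        result = 1;
    }
    return result;
}

/* The same logic in the shape of the assembly translation. */
int classify_goto(int x)
{
    int result;

    if (!(x < 0)) goto elseif1;   /* cmp / jncc .elseif1 */
    result = -1;                  /* body 1 */
    goto end_else;                /* jmp .end_else */

elseif1:
    if (!(x == 0)) goto else_;    /* cmp / jncc .else */
    result = 0;                   /* body 2 */
    goto end_else;                /* jmp .end_else */

else_:
    result = 1;                   /* else body, falls through */

end_else:
    return result;
}
```

Note how every condition appears negated in the goto version: each test jumps away when its body should be skipped.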

Nested `if-else`

A nested if-else, such as

if(rax == rbx) {
  if(rbx < rcx) {
    ...
  }
}

as we have seen can be translated into

cmp rax, rbx      ; Or whatever you need for the outer condition
jne .end          ; Note: jump if NOT equal
cmp rbx, rcx      
jge .end

...               ; Actual body 

.end:
...               ; Rest of program

We test each condition in turn and jump to the label after the body of the nested-if, if the condition is not met. Thus, all the conditions are negated in the j** instructions.

`do-while` loops

We’ve already seen that the loop instruction serves to implement a kind of do-while loop, so long as you want to use a decrementing rcx as the loop variable, with the loop ending when rcx == 0. With conditional jumps we can build a more general do-while loop:

.do:                ; do {

  ...               ;   Loop body

  cmp rax, rbx      
  je .do            ; } while(rax == rbx);

`while` loops

Implementing a while loop requires testing the loop condition at the beginning of the loop, and possibly jumping past the end of the loop if it fails. Thus, we need a label at both the beginning (so that we can jump back to repeat the loop) and the end of the loop (so that we can exit it):

.while:         ; while(rax != rbx) {
  cmp rax, rbx
  je .end_while

  ...           ;   Loop body

  jmp .while
.end_while:     ; }

  ...

A for loop is just a specialized kind of while loop, so for example,

for(rax = 0; rax < 100; ++rax) {
  ...
}

would become

  xor rax, rax      ; rax = 0
.for:     
  cmp rax, 100
  jge .end_for

  ...               ; Loop body

  inc rax
  jmp .for
.end_for:
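
The same loop shape can be tried out in C with goto standing in for the jumps. This is only an illustrative sketch; `sum_below_goto` is a hypothetical function that sums the loop counter, with comments naming the assembly instruction each line corresponds to.

```c
/* Hypothetical example: sum of 0..n-1, written in the shape of
   the assembly for loop. */
long sum_below_goto(long n)
{
    long total = 0;
    long i = 0;                   /* xor rax, rax */
for_top:
    if (i >= n) goto end_for;     /* cmp rax, n / jge .end_for */
    total += i;                   /* loop body */
    ++i;                          /* inc rax */
    goto for_top;                 /* jmp .for */
end_for:
    return total;
}
```

Because the condition is tested before the body, calling it with n = 0 (or a negative n) skips the body entirely, just as a for loop would.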

`break` and `continue`

break is equivalent to a jmp to a label immediately after the end of the loop. continue is equivalent to a jmp to a label at the beginning of the loop. The common pattern of

if(condition)
  break; // Or continue

can be done with a conditional jump to the end/beginning; there is no need to emulate the entire if structure.
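
As a sketch of both patterns together, the C function below uses goto in place of break and continue. The function name and its behaviour (summing odd counters until a limit is exceeded) are invented for illustration. One detail worth noticing: in a for-style loop, continue must still run the increment, so it targets a label just before the increment rather than the very top of the loop.

```c
/* Hypothetical example: sum the odd values of i below n, stopping
   as soon as the total exceeds limit. */
long sum_odd_until(long n, long limit)
{
    long total = 0;
    long i = 0;
loop_top:
    if (i >= n) goto loop_end;         /* loop condition */
    if (i % 2 == 0) goto loop_next;    /* if(...) continue; */
    total += i;
    if (total > limit) goto loop_end;  /* if(...) break; */
loop_next:
    ++i;                               /* increment step */
    goto loop_top;
loop_end:
    return total;
}
```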