Final Exam Review

Binary arithmetic

Binary addition

      1111
    1001011
 +  1100101
────────────
 1  0110000

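In this case, we had an extra carry out of the top column, so the result (10110000) needs one more bit than the 7-bit operands. When the operands already fill a whole byte, such a carry means the true result is too big to fit into a single byte.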
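
The column-by-column procedure can be checked with a small Python sketch. The function name `add_binary` is made up for illustration; it assumes both operands are bit strings of the same width and reports the carry out of the top column separately.

```python
def add_binary(a, b):
    """Add two equal-width bit strings column by column.

    Returns (sum bits at the same width, carry out of the top column).
    """
    assert len(a) == len(b)
    carry = 0
    out = []
    # Walk the columns right to left, exactly as on paper.
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        out.append(str(total % 2))  # digit written in this column
        carry = total // 2          # carry into the next column
    return "".join(reversed(out)), carry


print(add_binary("1001011", "1100101"))  # ('0110000', 1)
```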

Binary subtraction

Subtract-with-borrow, but see also the negate-then-add method below.

Subtraction follows a similar pattern, but with “borrowing” instead of carrying. E.g.,


    110110
 -  100001
────────────

0 - 1 = -1, so we borrow a 1 from the next column (i.e., we are doing 10 - 1 = 1)

        1
    110110
 -  100001
────────────
         1

This column already lent its 1, so 1 - 1 (the borrow) - 0 = 0:

        1
    110110
 -  100001
────────────
        01

1 - 0 = 1:

        1
    110110
 -  100001
────────────
       101

0 - 0 = 0:

        1
    110110
 -  100001
────────────
      0101

1 - 0 = 1:

        1
    110110
 -  100001
────────────
     10101

And 1 - 1 = 0 (we could drop the leading 0 in the answer):

        1
    110110
 -  100001
────────────
    010101

It’s possible to end up with an extra “borrow” out of the top column, indicating underflow (the true result is negative).
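
A minimal Python sketch of the same borrowing procedure, assuming equal-width bit strings. The name `sub_binary` is hypothetical; the second value it returns is the borrow out of the top column.

```python
def sub_binary(a, b):
    """Subtract bit string b from a, column by column, with borrowing."""
    assert len(a) == len(b)
    borrow = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        diff = int(x) - int(y) - borrow
        if diff < 0:
            diff += 2   # borrow from the next column: 10 - 1 = 1
            borrow = 1
        else:
            borrow = 0
        out.append(str(diff))
    return "".join(reversed(out)), borrow


print(sub_binary("110110", "100001"))  # ('010101', 0)
print(sub_binary("0001", "0010"))      # ('1111', 1)
```

The second call ends with a borrow of 1, which is the underflow case: 1 - 2 does not fit in an unsigned value.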

Binary to decimal, and the reverse

To binary: repeatedly divide by 2. Each remainder is the next bit, from lowest to highest.

Input	Remainder	Binary
1234	0	`__________0`
617	1	`_________10`
308	0	`________010`
154	0	`_______0010`
77	1	`______10010`
38	0	`_____010010`
19	1	`____1010010`
9	1	`___11010010`
4	0	`__011010010`
2	0	`_0011010010`
1	1	`10011010010`

To decimal: multiply each bit by its power of two and add up the results.
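
Both directions can be sketched in a few lines of Python. The helper names `to_binary` and `to_decimal` are illustrative only.

```python
def to_binary(n):
    """Repeatedly divide by 2; the remainders are the bits, lowest first."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))
    return "".join(reversed(bits))


def to_decimal(bits):
    """Multiply each bit by its power of two and add up the results."""
    return sum(int(bit) << i for i, bit in enumerate(reversed(bits)))


print(to_binary(1234))            # 10011010010
print(to_decimal("10011010010"))  # 1234
```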

Two’s complement representation

Represent the negation of a value by

1. Flipping all the bits
2. Adding 1

E.g., 00110110 negated gives

00110110
11001001  Flip all bits
11001010  Add 1

Negative values will always have the high bit set.

Addition/subtraction can be done normally. (To do subtraction, just negate the second operand and then add.)
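
As a rough check, here is a Python sketch of 8-bit two's complement. The names `negate`, `to_signed` and `subtract` are made up for this example, and the width of 8 bits is an assumption; the same idea works at any width.

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF: keeps results within one byte


def negate(x):
    """Flip all the bits, then add 1 (wrapping to 8 bits)."""
    return ((x ^ MASK) + 1) & MASK


def to_signed(x):
    """Interpret an 8-bit pattern as a signed value."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def subtract(a, b):
    """a - b done as a + (-b), discarding any carry out of the byte."""
    return (a + negate(b)) & MASK


print(format(negate(0b00110110), "08b"))            # 11001010
print(to_signed(negate(0b00110110)))                # -54
print(format(subtract(0b110110, 0b100001), "08b"))  # 00010101
```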

Registers, and their uses

Syscall register use

64-bits	Low 32-bits	Low 16-bits	Low 8-bits	Comment
`rax`	`eax`	`ax`	`al`	Accumulator; `syscall` code and return
`rbx`	`ebx`	`bx`	`bl`	Base
`rcx`	`ecx`	`cx`	`cl`	Count (syscall clobbered)
`rdx`	`edx`	`dx`	`dl`	Dword accum.; 3rd `syscall` arg.
`rsi`	`esi`	`si`	`sil`	Source index; 2nd `syscall` arg.
`rdi`	`edi`	`di`	`dil`	Dest. index; 1st `syscall` arg.
`rbp`	`ebp`	`bp`	`bpl`	Stack base pointer
`rsp`	`esp`	`sp`	`spl`	Stack pointer
`r8`	`r8d`	`r8w`	`r8b`	5th `syscall` arg.
`r9`	`r9d`	`r9w`	`r9b`	6th `syscall` arg.
`r10`	`r10d`	`r10w`	`r10b`	4th `syscall` arg.
`r11`	`r11d`	`r11w`	`r11b`	(syscall clobbered)
…	…	…	…
`r15`	`r15d`	`r15w`	`r15b`

The first four registers allow access to their second byte (the high byte of the word-sized register): ah, bh, ch, dh. These cannot be mixed with any of the newer registers (e.g., mov r15b, ah is invalid).
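
The narrower names are views onto the low bits of the same 64-bit register. A small Python sketch of that aliasing, using masks and shifts (the function name `views` and the sample value are invented for illustration):

```python
def views(rax):
    """Return the values seen through eax, ax, al and ah for a given rax."""
    return {
        "eax": rax & 0xFFFFFFFF,  # low 32 bits
        "ax": rax & 0xFFFF,       # low 16 bits
        "al": rax & 0xFF,         # low 8 bits
        "ah": (rax >> 8) & 0xFF,  # second byte (high byte of ax)
    }


for name, value in views(0x1122334455667788).items():
    print(name, hex(value))
```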

C-style functions

Registers:

Register	Use
`rax`	Return value
`rbx`	Callee-preserved
`rcx`	4th argument
`rdx`	3rd argument
`rsi`	2nd argument
`rdi`	1st argument
`rbp`	Callee-preserved
`rsp`	Stack pointer
`r8`	5th argument
`r9`	6th argument
`r10`	Temporary (caller-preserved)
`r11`	Temporary (caller-preserved)
`r12`-`r15`	Callee-preserved

Stack (rsp) must be aligned to a multiple of 16 before any call. On function entry the stack sits at a multiple of 16 plus 8, because the call pushed an 8-byte return address, so usually we can just do either

sub rsp, 8

or

push rbp
mov rbp, rsp

Either way, we have to undo the process (add rsp, 8 or pop rbp) before returning with ret.

Arithmetic operations

add
sub
inc
dec

mul and div (and imul and idiv) and their register usage.

mul rm      ; Multiply rax by rm, store the 128-bit product into rdx:rax
div rm      ; Divide rdx:rax by rm, store the results back into rax and rdx

Note that div uses rdx:rax as a 128-bit input; if you are not using the full 128 bits, you should zero rdx before the operation (or, for idiv, sign-extend rax into rdx with cqo).

Division stores the quotient into rax and the remainder (modulo) into rdx.
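
The register usage of the unsigned forms can be modeled in Python, whose integers are unbounded. This is only a sketch: the function names `mul` and `div` mirror the instructions but are ordinary Python functions, and each returns the pair (rdx, rax).

```python
MASK64 = (1 << 64) - 1


def mul(rax, rm):
    """Model of `mul rm`: rdx:rax = rax * rm (unsigned)."""
    product = rax * rm
    return product >> 64, product & MASK64  # (rdx, rax)


def div(rdx, rax, rm):
    """Model of `div rm`: divide the 128-bit value rdx:rax by rm."""
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, rm)
    if quotient > MASK64:
        # On real hardware this raises a divide error.
        raise OverflowError("quotient does not fit in rax")
    return remainder, quotient  # (rdx, rax)


print(mul(1 << 63, 4))  # (2, 0): the product spills into rdx
print(div(0, 17, 5))    # (2, 3): remainder 2, quotient 3
print(div(1, 17, 5))    # stale rdx changes the dividend and the answer
```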

Comparisons

cmp – basically just a sub which discards the results (but keeps the flags).

test – just an and which discards the results. Mostly useful for testing individual bits.

Flags and their meanings: CF, OF, SF, ZF

CF – Carry flag, set if addition/subtraction generated an extra carry/borrow. Indicates overflow for unsigned arithmetic operations. Meaningless if the operands are signed.
OF – Overflow flag, set if addition/subtraction generated an overflow when interpreted as signed. (I.e., set if the sign of the result is not what it should be.) Meaningless for unsigned operations.
SF – Sign flag, copy of the high bit of the result.
ZF – Zero flag, set if the result = 0.
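
To see how CF and OF differ, here is a Python sketch that computes the four flags for a byte-sized add. The name `add8_flags` is hypothetical, and the 8-bit width is chosen only to keep the numbers small.

```python
def add8_flags(a, b):
    """Return (result, flags) for an 8-bit `add a, b`."""
    total = a + b
    result = total & 0xFF
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    flags = {
        "CF": total > 0xFF,                            # carry out of bit 7
        "OF": sign_a == sign_b and sign_r != sign_a,   # sign of result is wrong
        "SF": sign_r == 1,                             # copy of the high bit
        "ZF": result == 0,
    }
    return result, flags


print(add8_flags(0x7F, 0x01))  # 127 + 1: OF set (signed overflow), CF clear
print(add8_flags(0xFF, 0x01))  # 255 + 1: CF set (unsigned overflow), ZF set
```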

Condition codes

a, b – “above” and “below” for unsigned comparisons. a means CF unset, ZF unset (if x > y then x - y > 0 with no borrow). ae means CF unset, ZF ignored. be means CF == 1 or ZF == 1. b means CF == 1, ZF ignored.
l, g – “less than”, “greater than” for signed comparisons. l means SF != OF, le means SF != OF or ZF == 1. g means SF == OF and ZF == 0, ge means SF == OF.
e, ne – “equal”, “not equal”. e means ZF == 1 (x - x = 0) and ne means ZF == 0.
There are condition codes for each of the flags. E.g., ns means SF == 0.
There are negated forms of all the conditions. E.g., nae means “not above-or-equal” and is equivalent to b.
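
The flag combinations can be checked exhaustively on byte-sized operands. The following Python sketch models `cmp a, b` and evaluates each condition code; the names `cmp8_flags`, `CONDITIONS` and `taken` are invented for this example.

```python
def cmp8_flags(a, b):
    """Flags after an 8-bit `cmp a, b`, which computes a - b."""
    result = (a - b) & 0xFF
    cf = a < b                                  # unsigned borrow
    zf = result == 0
    sf = (result & 0x80) != 0
    of = ((a ^ b) & (a ^ result) & 0x80) != 0   # signed overflow
    return cf, of, sf, zf


CONDITIONS = {
    "a":  lambda cf, of, sf, zf: not cf and not zf,
    "ae": lambda cf, of, sf, zf: not cf,
    "b":  lambda cf, of, sf, zf: cf,
    "be": lambda cf, of, sf, zf: cf or zf,
    "g":  lambda cf, of, sf, zf: sf == of and not zf,
    "ge": lambda cf, of, sf, zf: sf == of,
    "l":  lambda cf, of, sf, zf: sf != of,
    "le": lambda cf, of, sf, zf: sf != of or zf,
    "e":  lambda cf, of, sf, zf: zf,
    "ne": lambda cf, of, sf, zf: not zf,
}


def taken(cc, a, b):
    """Would `jCC` be taken after `cmp a, b`?"""
    return CONDITIONS[cc](*cmp8_flags(a, b))


# 0xFF is 255 unsigned but -1 signed, so the two families disagree.
print(taken("a", 0xFF, 0x01))  # True:  255 > 1 unsigned
print(taken("g", 0xFF, 0x01))  # False: -1 > 1 is false signed
```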

Jumps and branches

jmp target: unconditional jump

jCC target: conditional jump, replace CC with condition code

loop target: decrement rcx, jump to target if not 0.

Memory operands and arrays/strings

[displacement + scale*offset + base]

displacement is an immediate address (typically a label)
scale is 1, 2, 4, or 8; if omitted, 1 is assumed
offset is the offset (index) register
base is the base register

Memory-memory operations are generally forbidden.

lea reg, mem computes the effective address of mem (i.e., does the math) and then stores the address, not the value in memory, into reg.
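
The address arithmetic is simple enough to sketch in Python. The function `effective_address` and the sample address 0x1000 are assumptions made for illustration; the parameter names follow the terms used above.

```python
def effective_address(displacement=0, base=0, offset=0, scale=1):
    """Compute displacement + scale*offset + base, wrapped to 64 bits."""
    assert scale in (1, 2, 4, 8)
    return (displacement + scale * offset + base) & ((1 << 64) - 1)


# Element 3 of a qword array assumed to start at address 0x1000:
print(hex(effective_address(displacement=0x1000, offset=3, scale=8)))  # 0x1018

# lea can also do plain arithmetic, e.g. lea rax, [rdi + 4*rdi] gives 5*rdi:
print(effective_address(base=7, offset=7, scale=4))  # 35
```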

String operations

String operations implicitly use [rsi] (source) and [rdi] (destination) as their operands.

Instruction	Description
`lodsb`	Load `byte [rsi]` into `al`
`stosb`	Write byte from `al` into `byte [rdi]`
`movsb`	Copy byte from `[rsi]` to `[rdi]`
`cmpsb`	Compare `[rsi]` with `[rdi]` and update flags
`scasb`	Compare `al` with `[rdi]` and update flags

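Replace b with w for word-sized, d for dword, q for qword; the accumulator operand grows to match (ax, eax, rax).

All of these implicitly advance rdi and rsi (whichever they use) by the operand size: incrementing when the direction flag is clear (the usual state, set up by cld), decrementing when it is set (std).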

Repetition prefixes:

rep – Repeat rcx times, decrementing rcx on each repetition. Can be used with lodsb, stosb, movsb.
repe/repne – Repeat while equal/not equal (ZF set/clear), up to rcx times. Can be used with cmpsb and scasb.
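
A rough Python model of two common combinations, with memory represented as a bytearray and registers as plain integers. It assumes the direction flag is clear; the function names are invented, and the model starts ZF as False where the real instruction would leave the flags untouched if rcx is already 0.

```python
def rep_movsb(memory, rsi, rdi, rcx):
    """Model of `rep movsb`: copy rcx bytes from [rsi] to [rdi]."""
    while rcx != 0:
        memory[rdi] = memory[rsi]
        rsi += 1
        rdi += 1
        rcx -= 1
    return rsi, rdi, rcx


def repne_scasb(memory, rdi, rcx, al):
    """Model of `repne scasb`: scan until a byte equals al or rcx hits 0."""
    zf = False  # simplification, see the note above
    while rcx != 0:
        zf = memory[rdi] == al  # compare al with [rdi]
        rdi += 1
        rcx -= 1
        if zf:
            break
    return rdi, rcx, zf


memory = bytearray(b"hello\x00" + bytes(6))
print(rep_movsb(memory, 0, 6, 6))     # (6, 12, 0)
print(bytes(memory))                  # b'hello\x00hello\x00'
print(repne_scasb(memory, 0, 12, 0))  # (6, 6, True)
```

After the scan, rdi points one past the matching byte, so the string length here is rdi - 1 = 5.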

Structures and alignments

struc/endstruc – Shortcut for defining a bunch of equ definitions. E.g.,

struc thing
    a:  resb    1
    b:  resb    1
    c:  resw    1
    d:  resd    1
    e:  resq    1
endstruc

defines the following constants:

thing:      equ     0
a:          equ     0
b:          equ     1
c:          equ     2
d:          equ     4
e:          equ     8
thing_size: equ     16

To be C-compatible, the elements of a structure must be aligned (placed in memory at a multiple of their size). So a qword member must start at a multiple of 8. Padding bytes can be added with extra resbs or with an alignment directive (alignb is the variant meant for reserved space).

Instances of structures must be placed in memory at structure alignment, which is the alignment of the largest element of the structure. E.g., the above structure would have 8-byte alignment.
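
The layout rule can be sketched as a short Python function. The name `layout` is hypothetical, and the sketch assumes every member is a scalar whose alignment equals its size (1, 2, 4 or 8 bytes).

```python
def layout(member_sizes):
    """Return (member offsets, total size, structure alignment)."""
    offsets = []
    offset = 0
    for size in member_sizes:
        # Round up to the next multiple of the member's size (padding).
        offset = (offset + size - 1) // size * size
        offsets.append(offset)
        offset += size
    alignment = max(member_sizes)
    # Pad the end so that consecutive instances stay aligned.
    total = (offset + alignment - 1) // alignment * alignment
    return offsets, total, alignment


print(layout([1, 1, 2, 4, 8]))  # ([0, 1, 2, 4, 8], 16, 8)
print(layout([1, 8, 1]))        # ([0, 8, 16], 24, 8)
```

The first call reproduces the offsets of the example structure; the second shows padding appearing both between members and at the end.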

Floating-point operations

Floating point registers are xmm0 through xmm15. Operations are suffixed with their operand size: ss for single-precision (float), sd for double-precision (double).

Use movss, movsd to move float values between registers and memory. There are no float immediates; store floating point constants in the .data section and then movs* them into a register.

addss dest, src         ; dest += src (float)
addsd dest, src         ; dest += src (double)
subss dest, src         ; dest -= src (float)
subsd dest, src         ; dest -= src (double)
mulss dest, src         ; dest *= src (float)
mulsd dest, src         ; dest *= src (double)
divss dest, src         ; dest /= src (float)
divsd dest, src         ; dest /= src (double)

All of these are also available in three-operand forms:

vaddss dest, src1, src2  ; dest = src1 + src2
vaddsd dest, src1, src2  ; dest = src1 + src2
vsubss dest, src1, src2  ; dest = src1 - src2
vsubsd dest, src1, src2  ; dest = src1 - src2
vmulss dest, src1, src2  ; dest = src1 * src2
vmulsd dest, src1, src2  ; dest = src1 * src2
vdivss dest, src1, src2  ; dest = src1 / src2
vdivsd dest, src1, src2  ; dest = src1 / src2

Comparisons use ucomiss and ucomisd, which update the flags as if for an unsigned comparison, so follow them with the a/b/e condition codes rather than g/l.

Bitwise operations

and, or, not, xor, andn (AND with the first source inverted: dest = ~src1 & src2).

Except for not, these set flags, so they can be used for various purposes.

Shifts and rotates

shl – Shift left, fill in low bits with 0
shr – Shift right, fill in high bits with 0
sar – Shift arithmetic right, for signed values, fill high bits with copies of the existing sign bit.
rol, ror – Rotate left/right.

The shift/rotate amount can either be an immediate or the cl register.
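
The difference between the shifts and rotates is easiest to see on a single byte. This Python sketch models them at an assumed width of 8 bits for shift amounts 0 through 7; the function names mirror the instructions, and flag effects are not modeled.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shl(x, n):
    return (x << n) & MASK            # low bits filled with 0


def shr(x, n):
    return (x & MASK) >> n            # high bits filled with 0


def sar(x, n):
    signed = x - (1 << BITS) if x & 0x80 else x
    return (signed >> n) & MASK       # high bits copy the sign bit


def rol(x, n):
    n %= BITS
    return ((x << n) | (x >> (BITS - n))) & MASK


def ror(x, n):
    n %= BITS
    return ((x >> n) | (x << (BITS - n))) & MASK


x = 0b10010110
for op in (shl, shr, sar, rol, ror):
    print(op.__name__, format(op(x, 1), "08b"))
# shl 00101100
# shr 01001011
# sar 11001011
# rol 00101101
# ror 01001011
```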