Binary arithmetic

Binary addition

      1111
    1001011
 +  1100101
────────────
 1  0110000

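In this case, we had an extra carry out of the top column: the true result (10110000) needs one more bit than the 7-bit operands. In a fixed-width register (e.g., a single byte), a carry out of the top bit means the true result of the addition is too big to fit.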
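The column-by-column process can be sketched in Python. This is only an illustration of the pencil-and-paper method; the function name add_binary is made up for this example.

```python
def add_binary(a, b):
    """Add two equal-width bit strings column by column, right to left.

    Returns (sum_bits, carry_out); sum_bits has the same width as the inputs.
    """
    assert len(a) == len(b)
    carry = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        out.append(str(total % 2))   # bit written under this column
        carry = total // 2           # carry into the next column
    return "".join(reversed(out)), carry


# The example above: the carry out of the top column is the "extra" 1.
result, carry_out = add_binary("1001011", "1100101")
```

Here result is "0110000" and carry_out is 1, matching the worked example.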

Binary subtraction

Subtract-with-borrow, but see also the negate-then-add method below.

Subtraction follows a similar pattern, but with “borrowing” instead of carrying. E.g.,


    110110
 -  100001
────────────

0 - 1 = -1, so we borrow a 1 from the next column (i.e., we are doing 10 - 1 = 1)

        1
    110110
 -  100001
────────────
         1

1 - 1 = 0 (the top bit is 1, the bottom bit is 0, and we subtract the 1 we just borrowed):

        1
    110110
 -  100001
────────────
        01

1 - 0 = 1:

        1
    110110
 -  100001
────────────
       101

0 - 0 = 0:

        1
    110110
 -  100001
────────────
      0101

1 - 0 = 1:

        1
    110110
 -  100001
────────────
     10101

And 1 - 1 = 0 (we could drop the leading 0 in the answer):

        1
    110110
 -  100001
────────────
    010101

It’s possible to end up with an extra “borrow”, indicating underflow.
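The same steps as a Python sketch, assuming equal-width bit strings; sub_binary is an illustrative name, not anything from an assembler or library.

```python
def sub_binary(a, b):
    """Subtract bit string b from a, column by column, with borrowing.

    Returns (difference_bits, borrow_out). A borrow_out of 1 means the
    true result is negative (underflow for unsigned values).
    """
    assert len(a) == len(b)
    borrow = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        diff = int(x) - int(y) - borrow
        if diff < 0:
            diff += 2      # borrow from the next column: 10 - 1 = 1
            borrow = 1
        else:
            borrow = 0
        out.append(str(diff))
    return "".join(reversed(out)), borrow


difference, borrow_out = sub_binary("110110", "100001")
```

Subtracting a larger value from a smaller one (e.g., "0001" minus "0010") leaves borrow_out set to 1.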

Binary to decimal, and the reverse

To binary: repeatedly divide by two; the remainders are the bits, lowest bit first. E.g., for 1234:

Input  Remainder  Binary
1234   0          __________0
617    1          _________10
308    0          ________010
154    0          _______0010
77     1          ______10010
38     0          _____010010
19     1          ____1010010
9      1          ___11010010
4      0          __011010010
2      0          _0011010010
1      1          10011010010

To decimal: multiply each bit by its power of two and add up the results.
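Both directions can be sketched in a few lines of Python. The function names are made up for this example; Python's built-in bin() and int(s, 2) do the same job.

```python
def to_binary(n):
    """Repeated division by 2; each remainder is the next bit, lowest first."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        n, remainder = divmod(n, 2)
        bits = str(remainder) + bits   # new bits go on the left
    return bits


def to_decimal(bits):
    """Multiply each bit by its power of two and sum."""
    return sum(int(bit) * 2**i for i, bit in enumerate(reversed(bits)))
```

For the table above, to_binary(1234) gives "10011010010", and to_decimal turns that back into 1234.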

Two’s complement representation

Represent the negation of a value by

  1. Flipping all the bits

  2. Adding 1

E.g., 00110110 negated gives

00110110
11001001  Flip all bits
11001010  Add 1

Negative values will always have the high bit set.

Addition/subtraction can be done normally. (To do subtraction, just negate the second operand and then add.)
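A small Python model of 8-bit two's complement, assuming values are kept as unsigned integers in the range 0 to 255. The names negate, subtract, and as_signed are illustrative only.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0xFF: keep only the low 8 bits


def negate(x):
    """Flip all the bits, then add 1 (discarding any carry out)."""
    return ((x ^ MASK) + 1) & MASK


def subtract(a, b):
    """a - b done as a + (-b), wrapping to 8 bits."""
    return (a + negate(b)) & MASK


def as_signed(x):
    """Interpret an 8-bit pattern as signed: high bit set means negative."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x
```

With these, negate(0b00110110) is 0b11001010 as in the example above, and as_signed(subtract(33, 54)) is -21.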

Registers, and their uses

Syscall register use

64 bits  Low 32 bits  Low 16 bits  Low 8 bits  Comment
rax      eax          ax           al          Accumulator; syscall code and return
rbx      ebx          bx           bl          Base
rcx      ecx          cx           cl          Count (syscall clobbered)
rdx      edx          dx           dl          Dword accum.; 3rd syscall arg.
rsi      esi          si           sil         Source index; 2nd syscall arg.
rdi      edi          di           dil         Dest. index; 1st syscall arg.
rbp      ebp          bp           bpl         Stack base pointer
rsp      esp          sp           spl         Stack pointer
r8       r8d          r8w          r8b         5th syscall arg.
r9       r9d          r9w          r9b         6th syscall arg.
r10      r10d         r10w         r10b        4th syscall arg.
r11      r11d         r11w         r11b        (syscall clobbered)
r12      r12d         r12w         r12b
r13      r13d         r13w         r13b
r14      r14d         r14w         r14b
r15      r15d         r15w         r15b

The first four registers allow access to their second byte (the high byte of the word-sized register): ah, bh, ch, dh. These cannot be mixed with any of the newer registers (sil, dil, bpl, spl, or any part of r8-r15); e.g., mov r15b, ah is invalid.
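The smaller names are just views onto the low bits of the 64-bit register. A Python sketch of how they line up, using rax as the example; sub_registers is a made-up helper that only models reading, not writing.

```python
def sub_registers(rax):
    """Return the values seen through each narrower name of rax."""
    return {
        "eax": rax & 0xFFFFFFFF,    # low 32 bits
        "ax":  rax & 0xFFFF,        # low 16 bits
        "al":  rax & 0xFF,          # low 8 bits
        "ah":  (rax >> 8) & 0xFF,   # second byte (high byte of ax)
    }


views = sub_registers(0x1122334455667788)
```

For this value, eax is 0x55667788, ax is 0x7788, al is 0x88, and ah is 0x77.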

C-style functions

Registers:

Register  Use
rax       Return value
rbx       Callee-preserved
rcx       4th argument
rdx       3rd argument
rsi       2nd argument
rdi       1st argument
rbp       Callee-preserved
rsp       Stack pointer
r8        5th argument
r9        6th argument
r10       Temporary (caller-preserved)
r11       Temporary (caller-preserved)
r12-r15   Callee-preserved

Stack (rsp) must be aligned to a multiple of 16 before any call. The call itself pushes an 8-byte return address, so immediately after function entry the stack is at a multiple of 16, plus 8. Usually we can restore the alignment by doing either

sub rsp, 8

or

push rbp
mov rbp, rsp

Either way, we have to undo the process (add rsp, 8 or pop rbp, respectively) before ret-urning.
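The alignment arithmetic can be checked with a small Python model, assuming the System V AMD64 convention (rsp is a multiple of 16 at the moment of the call, and call pushes an 8-byte return address). The starting address and the helper names are hypothetical.

```python
def call(rsp):
    """Model `call`: push the 8-byte return address."""
    return rsp - 8


def push(rsp):
    """Model `push reg`: rsp moves down by 8."""
    return rsp - 8


def sub_rsp(rsp, n):
    """Model `sub rsp, n`."""
    return rsp - n


rsp_before_call = 0x7FFFFFFFE000        # hypothetical, 16-byte aligned
rsp_at_entry = call(rsp_before_call)    # multiple of 16, plus 8
rsp_fixed_a = sub_rsp(rsp_at_entry, 8)  # option 1: sub rsp, 8
rsp_fixed_b = push(rsp_at_entry)        # option 2: push rbp
```

Both fixes leave rsp at a multiple of 16 again, so the function can safely make calls of its own.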

Arithmetic operations

add
sub
inc
dec

mul and div (and imul and idiv) and their register usage.

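mul rm      ; Multiply rax by rm, store the 128-bit result into rdx:rax
div rm      ; Divide rdx:rax by rm, store the quotient into rax and the remainder into rdx

Note that div uses rdx:rax as a 128-bit input; if you are not using the full 128 bits, you should zero rdx before the operation (for idiv, sign-extend rax into rdx with cqo instead). mul only reads rax, but it always overwrites rdx with the high half of the product.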

Division stores the quotient into rax and the remainder (modulo) into rdx.
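A Python model of the unsigned 64-bit forms, to make the register usage concrete. The function names mirror the instructions but are just illustrative; both return the pair (rdx, rax) as it would be after the instruction.

```python
MASK64 = (1 << 64) - 1


def mul(rax, rm):
    """rdx:rax = rax * rm (unsigned). Returns (rdx, rax)."""
    product = rax * rm
    return (product >> 64) & MASK64, product & MASK64


def div(rdx, rax, rm):
    """Divide the 128-bit value rdx:rax by rm (unsigned).

    Returns (rdx, rax) = (remainder, quotient). The real instruction
    faults if rm is 0 or the quotient does not fit in 64 bits; here
    that shows up as a Python exception.
    """
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, rm)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax")
    return remainder, quotient
```

For example, div(0, 17, 5) leaves 2 in rdx and 3 in rax, while div(1, 0, 1) overflows because rdx was not zeroed.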

Comparisons

cmp – basically just a sub which discards the results (but keeps the flags).

test – just an and which discards the results. Mostly useful for testing individual bits.

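Flags and their meanings: CF (carry: unsigned overflow or borrow), OF (overflow: signed overflow), SF (sign: high bit of the result), ZF (zero: the result was 0)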
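As a rough model of how cmp a, b sets these four flags, here is a Python sketch for 8-bit operands. The function name cmp_flags and the dictionary layout are made up for this example.

```python
def cmp_flags(a, b, bits=8):
    """Flags after `cmp a, b`, i.e. after computing a - b and discarding it."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),                   # unsigned borrow
        "ZF": int(result == 0),             # operands were equal
        "SF": int(bool(result & sign)),     # high bit of the result
        # Signed overflow: operands differ in sign, and the result's
        # sign differs from a's sign.
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),
    }
```

For instance, cmp_flags(3, 5) sets CF and SF, and cmp_flags(0x80, 1) sets OF because -128 - 1 does not fit in a signed byte.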

Condition codes

Jumps and branches

jmp target: unconditional jump

jCC target: conditional jump, replace CC with condition code

loop target: decrement rcx, jump to target if not 0.

Memory operands and arrays/strings

[displacement + scale*index + base]

Here base and index are registers, scale is 1, 2, 4, or 8, and displacement is a constant.

Memory-memory operations are generally forbidden.

lea reg, mem computes the effective address of mem (i.e., does the math) and then stores the address, not the value in memory, into reg.
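The address math that lea performs can be written out in Python. This is a sketch with made-up parameter names, where index is the register that gets multiplied by the scale; real hardware does the same sum and wraps it to 64 bits.

```python
MASK64 = (1 << 64) - 1


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute displacement + scale*index + base, as lea would."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK64


# Address of element 3 of an array of qwords that starts 16 bytes
# past a (hypothetical) base address of 0x1000.
addr = effective_address(base=0x1000, index=3, scale=8, displacement=16)
```

Because lea stores the computed number rather than reading memory, it also works as cheap arithmetic: with base and index both 7 and scale 4, the result is 35.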

String operations

String operations implicitly use [rsi] (source) and [rdi] (destination) as their operands.

Instruction  Description
lodsb        Load byte [rsi] into al
stosb        Write byte from al into byte [rdi]
movsb        Copy byte from [rsi] to [rdi]
cmpsb        Compare [rsi] with [rdi] and update flags
scasb        Compare al with [rdi] and update flags

Replace b with w for word-sized, d for dword, etc.

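All of these implicitly advance rdi and/or rsi (whichever they use) by the operand size: incrementing when the direction flag is clear (the default), decrementing when it is set.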
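A toy Python model of the byte-sized forms, assuming the direction flag is clear so the pointers move forward. The class StringUnit and its fields are invented for this sketch; memory is just a bytearray and rsi/rdi are indexes into it.

```python
class StringUnit:
    """Minimal model of lodsb, stosb, and movsb on a flat byte array."""

    def __init__(self, memory):
        self.mem = bytearray(memory)
        self.rsi = 0    # source index
        self.rdi = 0    # destination index
        self.al = 0

    def lodsb(self):
        self.al = self.mem[self.rsi]    # load from the source
        self.rsi += 1

    def stosb(self):
        self.mem[self.rdi] = self.al    # store to the destination
        self.rdi += 1

    def movsb(self):
        self.mem[self.rdi] = self.mem[self.rsi]   # source -> destination
        self.rsi += 1
        self.rdi += 1


unit = StringUnit(b"abc\x00\x00\x00")
unit.rsi, unit.rdi = 0, 3
for _ in range(3):      # three movsb steps copy "abc" to offset 3
    unit.movsb()
```

After the loop the memory holds "abcabc", with rsi at 3 and rdi at 6.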

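Repetition prefixes: rep (repeat rcx times), repe/repz (repeat while equal, at most rcx times), repne/repnz (repeat while not equal, at most rcx times).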

Structures and alignments

struc/endstruc – Shortcut for defining a bunch of equ definitions. E.g.,

struc thing
    a:  resb 1
    b:  resb 1
    c:  resw 1
    d:  resd 1
    e:  resq 1
endstruc

defines the following constants:

thing:      equ     0
a:          equ     0
b:          equ     1
c:          equ     2
d:          equ     4
e:          equ     8
thing_size: equ     16

To be C-compatible, the elements of a structure must be aligned (placed in memory at a multiple of their size). So a qword member must start at a multiple of 8. Extra resbs can be used to add padding bytes, or the alignb directive.

Instances of structures must be placed in memory at structure alignment, which is the alignment of the largest element of the structure. E.g., the above structure would have 8-byte alignment.
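The layout rule can be sketched as a small Python function, assuming every member is a simple scalar whose alignment equals its size (1, 2, 4, or 8 bytes). The name layout and its return shape are made up for this example.

```python
def layout(members):
    """Compute C-compatible offsets for (name, size) scalar members.

    Returns (offsets, total_size, alignment).
    """
    offset = 0
    alignment = 1
    offsets = {}
    for name, size in members:
        offset = (offset + size - 1) // size * size   # pad up to a multiple of size
        offsets[name] = offset
        offset += size
        alignment = max(alignment, size)
    # Pad the end so that arrays of the structure stay aligned.
    total = (offset + alignment - 1) // alignment * alignment
    return offsets, total, alignment


thing = layout([("a", 1), ("b", 1), ("c", 2), ("d", 4), ("e", 8)])
```

The structure above happens to need no padding (offsets 0, 1, 2, 4, 8; size 16). Putting a byte directly before a qword, as in layout([("x", 1), ("y", 8)]), pushes y out to offset 8.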

Floating-point operations

Floating point registers are xmm0 through xmm15. Operations are suffixed with their operand size: ss for single-precision (float), sd for double-precision (double).

Use movss, movsd to move float values into/out of operands. There are no float immediates; store floating point constants in the .data section and then movs* them into a register.

addss dest, src         ; dest += src (float)
addsd dest, src         ; dest += src (double)
subss dest, src         ; dest -= src (float)
subsd dest, src         ; dest -= src (double)
mulss dest, src         ; dest *= src (float)
mulsd dest, src         ; dest *= src (double)
divss dest, src         ; dest /= src (float)
divsd dest, src         ; dest /= src (double)

All of these are also available in three-operand forms:

vaddss dest, src1, src2  ; dest = src1 + src2
vaddsd dest, src1, src2  ; dest = src1 + src2
vsubss dest, src1, src2  ; dest = src1 - src2
vsubsd dest, src1, src2  ; dest = src1 - src2
vmulss dest, src1, src2  ; dest = src1 * src2
vmulsd dest, src1, src2  ; dest = src1 * src2
vdivss dest, src1, src2  ; dest = src1 / src2
vdivsd dest, src1, src2  ; dest = src1 / src2

Comparisons use ucomiss, ucomisd which update the flags as if for an unsigned comparison (so follow them with ja/jb/je and friends, not jg/jl).
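A Python sketch of the flag results of ucomiss/ucomisd a, b, following the behaviour documented for these instructions: an "unordered" result (either operand is NaN) sets ZF, PF, and CF all to 1. The function name ucomis is made up.

```python
import math


def ucomis(a, b):
    """Flags after comparing floats a and b the way ucomiss/ucomisd do."""
    if math.isnan(a) or math.isnan(b):
        return {"ZF": 1, "PF": 1, "CF": 1}   # unordered
    if a > b:
        return {"ZF": 0, "PF": 0, "CF": 0}   # ja would be taken
    if a < b:
        return {"ZF": 0, "PF": 0, "CF": 1}   # jb would be taken
    return {"ZF": 1, "PF": 0, "CF": 0}       # je would be taken
```

Since only ZF and CF distinguish the ordered cases, the unsigned condition codes are the ones that make sense afterwards; PF is how a NaN operand can be detected.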

Bitwise operations

and, or, not, xor, andn (negates the first source operand, then ANDs it with the second).

These set flags (except not, which leaves the flags unchanged), so they can be used for various purposes.
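Python's own bitwise operators behave the same way once results are masked to 64 bits. A short sketch, with made-up helper names, of the two operations that have no direct Python operator, plus a test-style bit check:

```python
MASK64 = (1 << 64) - 1


def not_(x):
    """64-bit `not`: flip every bit."""
    return ~x & MASK64


def andn(src1, src2):
    """`andn`: (NOT src1) AND src2. Note this is not NAND."""
    return ~src1 & src2 & MASK64


def bit_is_set(value, bit):
    """What `test value, 1 << bit` checks: ZF is clear when the bit is set."""
    return (value & (1 << bit)) != 0
```

For example, andn(0b1100, 0b1010) is 0b0010: the bits of the second operand that are not set in the first.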

Shifts and rotates

The shift/rotate amount can be either an immediate or the byte-sized register cl.
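A Python sketch of what the common shifts and rotates compute, assuming a shift amount n with 0 <= n < bits (the hardware masks larger counts, which this model does not attempt). The function names follow the instruction mnemonics.

```python
def _mask(bits):
    return (1 << bits) - 1


def shl(x, n, bits=64):
    """Shift left; bits shifted out of the top are lost."""
    return (x << n) & _mask(bits)


def shr(x, n, bits=64):
    """Logical shift right; zeros come in at the top."""
    return (x & _mask(bits)) >> n


def sar(x, n, bits=64):
    """Arithmetic shift right; copies of the sign bit come in at the top."""
    x &= _mask(bits)
    if x >> (bits - 1):
        x -= 1 << bits          # reinterpret as a negative number
    return (x >> n) & _mask(bits)


def rol(x, n, bits=64):
    """Rotate left; bits leaving the top re-enter at the bottom."""
    x &= _mask(bits)
    return ((x << n) | (x >> (bits - n))) & _mask(bits)


def ror(x, n, bits=64):
    """Rotate right; bits leaving the bottom re-enter at the top."""
    x &= _mask(bits)
    return ((x >> n) | (x << (bits - n))) & _mask(bits)
```

On the 8-bit value 10000001 with a count of 1: shl gives 00000010, shr gives 01000000, sar gives 11000000, rol gives 00000011, and ror gives 11000000.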