;;;
;;; hello.s
;;; Prints "Hello, world!"
;;;
section .data
msg: db "Hello, world!", 10
MSGLEN: equ $-msg
section .text
;; Program code goes here
global _start
_start:
mov rax, 1 ; Syscall code in rax
mov rdi, 1 ; 1st arg, file desc. to write to
mov rsi, msg ; 2nd arg, addr. of message
mov rdx, MSGLEN ; 3rd arg, num. of chars to print
syscall
;; Terminate process
mov rax, 60 ; Syscall code in rax
mov rdi, 0 ; First parameter in rdi
syscall ; End process
This can be assembled and linked with
asm hello.s
or, manually, with
yasm -g dwarf2 -f elf64 hello.s -l hello.lst
ld -g -o hello hello.o
and then run via the usual
./hello
Today’s lecture will cover conversions between different number representations, the general-purpose registers available to us, and the additional special-purpose registers used by the system to maintain our program. We’ll hopefully end with our first in-class group project.
Operands
Each assembly instruction has a number of “operands”, inputs to the instruction. The biggest instructions have three operands, most have two or one, and some (like syscall) have none. Each operand can be one of the following things (with some restrictions which depend on the exact instruction):
- A register name, like rax.
- A constant, like 60 or msg. (msg, being a label, is still a constant: the assembler computes the address of the beginning of our string and writes the actual numeric value into the instruction.) Constant operands are called immediates in assembly-language terminology. Note that because the assembler can do arithmetic, an immediate can be something like 4 * msg + 1, as the value of this is still known at assembly-time. (What you can’t do is something like rax + 1, as the value of register rax is not known until the program is running.)
- A memory-direct lookup: [msg] gives the value in memory, at the address msg. I.e., this would give the first 8 bytes of our string, as a qword value.
- A memory-indirect lookup: [rax] gives the value in memory, at the address stored in the rax register. There are several different forms of memory-indirect operands, which allow for accessing arrays and structures in natural ways. We’ll look at these later.
Normally when we use a memory operand we’ll give a size qualifier with it, like byte [msg] (first byte of the message) or qword [rax] (the 64 bits at the address in rax). The size is often optional, as it can be deduced from the other operands, but it’s good practice to put it in anyway, as it helps to catch size-mismatch mistakes.
Often, the memory operands are grouped together, as generally speaking, wherever memory-direct operands are allowed, memory-indirect are allowed also. Hence, we can describe the allowed operand types by a combination of R(egister), I(mmediate), and M(emory). If we say that a given operand is RM, that means it supports register and memory operands, but not immediates.
Most instructions have the following restrictions:
- For two-operand instructions, both must be the same size. (There are specialized instructions for converting between different sized operands.) Some instructions only operate on specific sizes.
- Many instructions can only read memory values smaller than a qword. mov and a few others are the only instructions which support accessing qword [addr].
- Only mov supports 64-bit immediate values.
- Both operands cannot be memory in a single instruction.
- The destination operand cannot be an immediate (obviously).
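To make the operand kinds concrete, here is a minimal sketch that uses each one as the source of a mov. It assumes the same yasm/ld build as hello.s above; the choice of registers is arbitrary and only for illustration.

```nasm
section .data
msg:    db "Hello, world!", 10

section .text
global _start
_start:
    mov rax, 60             ; immediate: the constant 60
    mov rbx, rax            ; register: copy rax into rbx
    mov rsi, msg            ; immediate: the address of msg, known at assembly/link time
    mov cl, byte [msg]      ; memory-direct: the first byte of the string ('H')
    mov dl, byte [rsi]      ; memory-indirect: the same byte, via the address held in rsi
    ; mov byte [msg], byte [rsi]   ; not allowed: both operands are memory

    mov rdi, 0              ; exit status 0 (rax still holds 60, the exit syscall code)
    syscall
```

The size qualifiers (byte) are redundant here, since cl and dl already fix the size, but they are written out as suggested above.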
Representations of Numbers
Although we’re all familiar with the decimal representation of numbers, it may be helpful to review how it works: Suppose we have a decimal number, say, 15386. The numerical value of this number is given by
\[ 15386 = 1 \times 10^4 + 5 \times 10^3 + 3 \times 10^2 + 8 \times 10^1 + 6 \times 10^0 \]
In other words, to find the value of a number, we start at the first (far right) digit and multiply it by \(10^0\), and then accumulate a sum, multiplying each digit by 10 raised to the power of its position. We use 10 because there are 10 possible values for each digit: 0, 1, 2, …, 8, 9.
Our other systems will work essentially the same way; the only thing that changes is how many possible values there are for each digit:
For binary there are only two choices, 0 or 1, so binary is base-2.
For hexadecimal there are 16 choices, 0, 1, 2, …, 8, 9, a, b, …, e, f. Hence, hexadecimal is base-16.
For octal, there are eight choices: 0, 1, 2, …, 6, 7, so it is base-8.
Converting to decimal
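To translate a binary (base-2) number, say 1001011b (the b suffix identifies it as binary), into decimal, we compute
\[ 1001011_2 = 1 \times 2^6 + 0 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 64 + 8 + 2 + 1 = 75 \]
To translate a hexadecimal (base-16) number, say 0x1a2b (the 0x prefix identifies it as hexadecimal), we compute
\[ \mathrm{1a2b}_{16} = 1 \times 16^3 + 10 \times 16^2 + 2 \times 16^1 + 11 \times 16^0 = 4096 + 2560 + 32 + 11 = 6699 \]
(where 11 is the value of b and 10 is the value of a).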
Octal is similar, except that the base is 8. Octal is used relatively rarely, and we won’t cover it further.
Converting from decimal
Converting a decimal value into binary or hexadecimal is a bit more tricky. We’ll look at the procedure in binary, where it’s not too difficult, and then generalize it to hexadecimal.
To translate a decimal value — say 1234 — into binary, we divide by 2, and then look at the remainder: 0. This becomes the low bit, and we use the value, divided by two and rounded down (i.e., 617), as input for the next stage:
xxxxxxxxxx0
Again, we divide 617 by 2 and look at the remainder. It’s 1, so we set the next higher bit to 1, and use 617/2 = 308 as input for the next cycle:
xxxxxxxxx10
Continuing this process we have:
Input | Remainder | Binary |
---|---|---|
1234 | 0 | __________0 |
617 | 1 | _________10 |
308 | 0 | ________010 |
154 | 0 | _______0010 |
77 | 1 | ______10010 |
38 | 0 | _____010010 |
19 | 1 | ____1010010 |
9 | 1 | ___11010010 |
4 | 0 | __011010010 |
2 | 0 | _0011010010 |
1 | 1 | 10011010010 |
In the last step, 1/2 = 0 (rounded down), and we stop the process when we reach 0. If we want, we can check our work by converting the binary value back to decimal.
A similar process works for converting decimal to hexadecimal, except that we divide by 16. E.g., to convert 1234 to hex:
Input | Remainder | Hexadecimal |
---|---|---|
1234 | 2 | 0x___2 |
77 | 13 = d | 0x__d2 |
4 | 4 | 0x4d2 |
(Dividing by 16 each time obviously gets us to 0 much faster than dividing by 2.)
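As a check on both tables, we can convert the results back to decimal using the positional sums from the previous section. Only the bits (or hex digits) that are nonzero contribute:

```latex
\begin{align*}
10011010010_2 &= 2^{10} + 2^{7} + 2^{6} + 2^{4} + 2^{1} \\
              &= 1024 + 128 + 64 + 16 + 2 = 1234 \\
\mathrm{4d2}_{16} &= 4 \times 16^{2} + 13 \times 16^{1} + 2 \times 16^{0} \\
              &= 1024 + 208 + 2 = 1234
\end{align*}
```

Both come back to 1234, so the two conversions agree with each other and with the original input.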
Binary arithmetic
Unsigned binary addition
Unsigned binary addition (i.e., both values positive) works in a way analogous to normal decimal addition, except that there are only two digits, so 1 + 1 = 10, and we “carry” the 1. For example:
       1001011
    +  1100101
    ──────────
1 + 1 = 10, so we put a 0 on the answer line and carry the “extra” leading 1 digit:
            1
       1001011
    +  1100101
    ──────────
             0
Again, 1 + 1 = 10 (the 1 in the top row plus the carried 1) so we put a 0 and carry the 1:
           11
       1001011
    +  1100101
    ──────────
            00
Again, 1 + 1 = 10, so we put a 0 and carry the 1:
          111
       1001011
    +  1100101
    ──────────
           000
Add and carry again:
         1111
       1001011
    +  1100101
    ──────────
          0000
1 + 0 + 0 = 1, so there’s no need to carry:
         1111
       1001011
    +  1100101
    ──────────
         10000
0 + 1 = 1:
         1111
       1001011
    +  1100101
    ──────────
        110000
And finally 1 + 1 = 10, and the carried 1 drops into the answer:
         1111
       1001011
    +  1100101
    ──────────
      10110000
To check our work: the top value in decimal is 75, the bottom is 101, and the answer is 176, which is correct.
This example also illustrates a problem that sometimes occurs: the result of adding two n-digit values may have n+1 digits. E.g., if we add two bytes, the result may not fit in a byte! We’ll see how the CPU deals with this situation later.
Another example:
       111111
       01011011 = 91
    +  01110110 = 118
    ─────────────────
       11010001 = 209
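The last example can be tried directly on the CPU. The following is a minimal sketch, assuming the same yasm/ld build as hello.s; it uses the byte register al and the instructions add and movzx, which we have not formally covered yet, and it reports the sum as the process exit status.

```nasm
section .text
global _start
_start:
    mov al, 01011011b       ; 91
    add al, 01110110b       ; + 118, so al = 11010001b = 209

    movzx rdi, al           ; zero-extend the byte result into rdi (exit status)
    mov rax, 60             ; exit syscall code (set last, since al is part of rax)
    syscall
```

After running ./a.out (or whatever you name the executable), echo $? in the shell should print 209.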
Unsigned binary subtraction
Subtraction follows a similar pattern, but with “borrowing” instead of carrying. E.g.,
       110110
    -  100001
    ─────────
0 - 1 = -1, so we borrow a 1 from the next column (i.e., we are doing 10 - 1 = 1)
       110102
    -  100001
    ─────────
            1
We cheat a little bit and write the borrowed-into digit as “2” (binary 10 = 2), and 2 - 1 = 1. The borrowed-from digit in the next column drops from 1 to 0.
In that column we now have 0 - 0 = 0:
       110102
    -  100001
    ─────────
           01
1 - 0 = 1:
       110102
    -  100001
    ─────────
          101
0 - 0 = 0:
       110102
    -  100001
    ─────────
         0101
1 - 0 = 1:
       110102
    -  100001
    ─────────
        10101
And 1 - 1 = 0 (we could drop the leading 0 in the answer):
       110102
    -  100001
    ─────────
       010101
To check our work: the top is 54 and the bottom is 33, and the result is 21, which is correct.
Sometimes we may need to borrow more than once in a row:
       1  1  1  0  0  0
    -  1  0  0  0  1  1
    ───────────────────
The first step is 2 - 1 = 1, with the extra 1 borrowed from the next column.
                  -1
       1  1  1  0  0  2
    -  1  0  0  0  1  1
    ───────────────────
                      1
But then the next column is -1 + 0 - 1. So we borrow, again, from the next column, giving us -1 + 10 - 1 = 0:
               -1 -1
       1  1  1  0  2  2
    -  1  0  0  0  1  1
    ───────────────────
                   0  1
We have to keep borrowing over until we find a digit that is set:
            -1 -1 -1
       1  1  1  2  2  2
    -  1  0  0  0  1  1
    ───────────────────
       0  1  0  1  0  1
What happens if the bottom value is larger than the top?
       1
         011
    -    100
    ────────
      -1 111
Arithmetically, this says that 3 - 4 = 7, with an extra borrow. The answer that we get is effectively the result we would have if there was another column to borrow from (i.e., if we had done 1011 - 100 the correct result is 7).
We’re left trying to borrow a 1 that doesn’t exist. This is the opposite situation of adding two values where the result doesn’t fit; here we need an extra 1 which is not present. As we’ll see later, both these situations are treated similarly by the CPU, by setting a flag to indicate that a carry/borrow occurred in the most recent addition/subtraction operation.
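The same wrap-around can be observed in a register. This is a minimal sketch, assuming the same yasm/ld build as hello.s; because it uses the 8-bit register al rather than the 3-bit values above, the “borrowed” result is 255 instead of 7. The sub and movzx instructions are used ahead of their formal introduction.

```nasm
section .text
global _start
_start:
    mov al, 3               ; 00000011b
    sub al, 4               ; 3 - 4 borrows past the top bit: al = 11111111b = 255

    movzx rdi, al           ; zero-extend the byte result into rdi (exit status)
    mov rax, 60             ; exit syscall code (set last, since al is part of rax)
    syscall
```

Running the program and then echo $? should print 255, i.e., the answer we would get if there were a ninth bit to borrow from (256 + 3 - 4).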
Another example:
         -1-1  -1-1
      0 1 1 1 0 1 1 0 = 118
    - 0 1 0 1 1 0 1 1 = 91
    ─────────────────────────
      0 0 0 1 1 0 1 1 = 27
Registers and memory
Registers occupy the highest level of the memory hierarchy; they are located on the CPU itself and are directly accessible by instructions, and thus are the fastest place to store values you are working with. On the other hand, because they must be physically close to the CPU, they can’t take up too much space; x86-64 has 16 64-bit general purpose registers, 16 128-bit floating-point/SIMD registers, and a number of special-purpose registers.
General purpose registers
The general purpose registers are arranged in such a way that the full 64 bits is partitioned into the low 32 bits of that, the low 16 bits of that, and the low (and sometimes high) 8 bits of that. E.g., for rax:
Name | Size | Bits of rax |
---|---|---|
rax | 64 bits | 63–0 |
eax | 32 bits | 31–0 |
ax | 16 bits | 15–0 |
ah | 8 bits | 15–8 |
al | 8 bits | 7–0 |
Only registers rax, rbx, rcx, and rdx allow access to the high byte (of the low word, so that it’s actually kind of in the middle of the whole register), via ah, bh, ch, and dh. There are some restrictions on when/where these can be used (in 64-bit mode).
64-bits | Low 32-bits | Low 16-bits | Low 8-bits | Comment |
---|---|---|---|---|
rax | eax | ax | al | syscall code and return; Accumulator |
rbx | ebx | bx | bl | Base |
rcx | ecx | cx | cl | Loop-Count; syscall temp register |
rdx | edx | dx | dl | 3rd syscall arg.; Dword accum. |
rsi | esi | si | sil | 2nd syscall arg.; Source index |
rdi | edi | di | dil | 1st syscall arg.; Dest. index |
rbp | ebp | bp | bpl | Stack base pointer |
rsp | esp | sp | spl | Stack pointer |
r8 | r8d | r8w | r8b | 5th syscall arg. |
r9 | r9d | r9w | r9b | 6th syscall arg. |
r10 | r10d | r10w | r10b | 4th syscall arg. |
r11 | r11d | r11w | r11b | syscall temp. reg. |
… | … | … | … | |
r15 | r15d | r15w | r15b | |
When we get to functions, we’ll see that each register also falls into one of two categories, depending on who is responsible for saving it across a function call: either we, the function caller, are responsible for saving the register if we need it (“caller-preserved registers”), or the function we call is responsible for saving the register if it needs it (“callee-preserved registers”).
rsp and rbp, although general-purpose, are used for managing the stack; rsp points to the top of the stack, while rbp (“base pointer”) traditionally points to the beginning of the current function’s stack frame (as we’ll see, this is not automatic). rsp should not be used for anything else, but rbp is not strictly off-limits. Note that rsp points to the element on the top of the stack, and not the empty space above it.
rax is called the accumulator and several instructions implicitly use it as their destination. Similarly, rbx is sometimes called the “base” register, rcx is called the “count register”, and rdx is called the “dword accumulator”; there are a few instructions that use them implicitly, but for the most part you can use them for any purpose.
By “use implicitly”, I mean there are instructions which either read their input from, or write their output to, one of these registers without mentioning it. For example, to divide rax by rbx the instruction is
idiv rbx
This will read from rax (and rdx!) and then write the quotient back into rax and the remainder into rdx, even though it doesn’t mention them.
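Because idiv also reads rdx as the high half of the dividend, rdx has to hold something sensible before the division. Here is a minimal sketch of a complete signed division, assuming the same yasm/ld build as hello.s; the values 17 and 5 are arbitrary, and cqo (which sign-extends rax into rdx) is an instruction we have not covered yet.

```nasm
section .text
global _start
_start:
    mov rax, 17             ; dividend (low qword)
    cqo                     ; sign-extend rax into rdx, so rdx:rax = 17
    mov rbx, 5              ; divisor
    idiv rbx                ; implicitly uses rdx:rax; rax = 3 (quotient), rdx = 2 (remainder)

    mov rdi, rdx            ; report the remainder as the exit status
    mov rax, 60             ; exit syscall code
    syscall
```

Running this and then echo $? should print 2. Note that neither rax nor rdx appears in the idiv instruction itself.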
rsi and rdi are the Source and Destination Indexes used by certain string operations implicitly, but you can use them as general-purpose registers in other contexts.
When a double-qword (128-bit) value is needed, it is commonly stored in a combination of rax and rdx, with the high qword in rdx (this is written as rdx:rax). We’ll see this with multiplication and division.
The SIMD/floating-point registers are named xmm0 through xmm15. They are separate from the older x87 floating-point registers st0 through st7 (which share space with the MMX registers mm0 through mm7). All of these can only be used with special floating-point/SIMD instructions. (Generally these instructions start with f or p; e.g., fadd is x87 floating-point addition and paddd is a packed integer SIMD addition.) They cannot be accessed by normal instructions.
Syscall register usage
Syscalls return their result, or an error code, via rax, which means that the value of rax after the syscall returns is probably not what you set it to. Similarly, the two “temporary” registers rcx and r11 are overwritten by the syscall.
Second-byte registers
The registers rax, rbx, rcx, and rdx are the only registers which allow access to the second bytes, via ah, bh, ch, and dh. However, there are some restrictions on the use of these registers: they cannot be used with any instruction which uses the REX prefix, which all the “new” 64-bit instructions use. So any instruction which uses features added by x86-64 is unable to also use the *h registers. Examples of this restriction include:
- mov ah, sil: The low-byte versions of rsi, rdi, rsp, rbp were added by x86-64 and hence require the REX prefix to use.
- mov r8b, ah: Similarly, r8 through r15 were added by x86-64.
- mov ah, byte [r8]: Using one of the new registers, even as an address, requires the REX prefix. (An address in one of the original registers, as in mov ah, byte [rax], does not need the prefix.)
- The instructions that convert values of different sizes cannot convert from *h to any of the 64-bit registers.
We won’t use the *h registers much, so you probably won’t run into these restrictions, but they’re worth being aware of.
syscall register use
As we’ve seen, some registers are used specially by syscalls: rax is used for the syscall code, and then rdi, rsi, rdx, r10, r8 and r9 are used for the arguments to the syscall, in that order. (As we’ll see, this order is very similar to the order used for C-style function calls; the only difference is that rcx is used instead of r10.) There are no syscalls that take more than six arguments. If the syscall returns a value, it will be in rax. Negative values generally indicate an error.
The syscall itself is allowed to overwrite the values in the rcx and r11 registers, but it will preserve all other registers. You should bear this in mind if you are using rcx or r11 with syscalls: the values you put in the registers before a syscall may not be there after the syscall!
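One simple way to stay out of trouble is to keep any value that must survive a syscall in a register the syscall does not touch. The following is a minimal sketch, assuming the same yasm/ld build as hello.s; the use of r12 and the value 3 are arbitrary choices for illustration.

```nasm
section .data
msg:    db "Hello, world!", 10
MSGLEN: equ $-msg

section .text
global _start
_start:
    mov r12, 3              ; a value we still need after the syscall: not in rcx or r11

    mov rax, 1              ; write syscall code
    mov rdi, 1              ; 1st arg: stdout
    mov rsi, msg            ; 2nd arg: address of the message
    mov rdx, MSGLEN         ; 3rd arg: number of bytes
    syscall                 ; afterwards rax = bytes written (or a negative error);
                            ; rcx and r11 may hold anything

    mov rdi, r12            ; r12 was preserved, so the exit status is 3
    mov rax, 60             ; exit syscall code
    syscall
```

Had the 3 been kept in rcx instead, the exit status would depend on whatever the kernel left there.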
This usage is not an official part of x86-64 assembly, but simply a convention written into the System-V Unix ABI specification which describes how programs running on x86-64 Unix systems can expect to interact with the OS. (Among other things, the specification also states that the address space of each process is 48 bits, and that each process’s .text section is mapped starting at 0x400000.)
The mov instruction
The most fundamental assembly instruction is mov, which moves data from one location (memory, register, immediate) to another (memory, register). It has the form
mov destination, source
where destination can be a register or memory, and source can be a register, memory, or immediate value. Both destination and source must be the same size, and both cannot be in memory in a single instruction. (Memory-to-memory transfers require two mov instructions.)
An important thing to remember is that, in 64-bit mode, mov is the only instruction which supports qword immediate operands. All other instructions can operate on 64-bit values only if they are already loaded into a register. Thus, most qword operations on immediate values begin with a mov. For example, you cannot add a 64-bit constant to a register directly:
add rax, some_huge_constant
you have to mov the constant into a register, and then add it:
mov rbx, some_huge_constant
add rax, rbx
A special case is when the source/destination are dword (32-bit) values, e.g.,
mov eax, ebx
In this situation, and only this situation, the high double-word of rax is implicitly set to 0. This zeroing does not occur when setting ax, or ah/al. (This behavior applies not just to mov but to many other instructions as well. E.g., xor eax, ebx will zero the high dword of rax.)
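The difference between dword writes and smaller writes is easy to see by first filling a register with 1 bits. This is a minimal sketch, assuming the same yasm/ld build as hello.s; the register choices are arbitrary.

```nasm
section .text
global _start
_start:
    mov rax, -1             ; rax = 0xffffffffffffffff
    mov eax, 1              ; dword write: high dword is zeroed, rax = 0x0000000000000001

    mov rbx, -1             ; rbx = 0xffffffffffffffff
    mov bx, 1               ; word write: upper bits untouched, rbx = 0xffffffffffff0001

    mov rcx, -1             ; rcx = 0xffffffffffffffff
    mov cl, 1               ; byte write: upper bits untouched, rcx = 0xffffffffffffff01

    mov rdi, rax            ; exit status 1, since all of rax above bit 0 was cleared
    mov rax, 60             ; exit syscall code
    syscall
```

Stepping through this in a debugger and printing rax, rbx, and rcx after each mov is a good way to convince yourself of the rule.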
Swapping registers
Swapping (exchanging) the values in two registers (or a register and a memory location) is a common enough operation that a dedicated instruction is provided for it:
xchg a, b
exchanges the values in locations a and b. Either can be a register or memory, but both cannot be memory at the same time, and neither can be an immediate value (for obvious reasons). This allows us to swap values without needing a third “temporary” register.
Like mov, xchg on the 32-bit registers (eax, ebx, etc.) will implicitly zero the high dword.
Clearing registers
The easiest way to set a register to 0 is
mov reg, 0
A slightly more efficient way is to XOR the register with itself:
xor reg, reg
Remember that the result of XOR is 1 if and only if exactly one of its inputs is 1. If we XOR a value with itself, each pair of bits being XOR’d is either (0,0) (0 XOR 0 = 0) or (1,1) (1 XOR 1 = 0), so every bit of the result is 0.
The encoding of xor reg, reg is smaller than that of mov with an immediate, and thus can be loaded into the CPU faster; it also allows the CPU to perform a number of data flow optimizations that are not otherwise possible. Eventually, your brain will automatically translate xor reg, reg into reg = 0, but I don’t particularly care which you use.
Special purpose registers
The following registers exist for particular purposes, which are enforced by the CPU. You either can’t put general data into them or can’t get it out of them. Typically specialized instructions (not mov) have to be used to access them.
- rip: The Instruction Pointer points to the next instruction to be executed (i.e., the instruction immediately after this one). The low 32 bits are accessible as eip, and the low word as ip, but since addresses are always 64 bits, this isn’t particularly useful. Branch instructions modify rip directly.
- The rflags register: each bit of the flags register has a different meaning, and the various flags are set or unset depending on the results of certain operations. We will look at the flag register in depth when we learn about tests and conditional operations. Normally you don’t need to worry about accessing the flags register, as it is set and tested by the relevant operations automatically. The flags register is organized as
  Bit | 0 | 2 | 4 | 6 | 7 | 10 | 11 | 21 |
  ---|---|---|---|---|---|---|---|---|
  Purpose | Carry (CF) | Parity (PF) | Adjust (AF) | Zero (ZF) | Sign (SF) | Direction (DF) | Overflow (OF) | Identification (ID) |
  (Unused bits are reserved.)
  - CF is set if the previous addition/subtraction operation ended with a carried (or borrowed) 1.
  - PF is set if the last operation produced an even number of 1s (in the low byte of its result).
  - AF is set if the last BCD addition/subtraction operation ended with a carry. We’ll look at BCD later.
  - ZF is set if the result of the last operation was zero.
  - SF is set if the result of the last operation was negative (i.e., its high bit was 1).
  - OF is set if the last signed arithmetic operation overflowed (wrapped around).
  - DF determines the direction the repeating string operations move in (increment or decrement). We’ll see its use when we look at string operations.
  - ID indicates the presence of the cpuid instruction. All modern x86 CPUs support this instruction, so we can ignore this flag.
  Normally we don’t need to worry about examining the flags register, as there are dedicated conditional instructions (e.g., branches, moves) that execute only if a specific flag is/is not set. If you want to set/clear a specific flag, the st*/cl* family of instructions do so. E.g., stc sets the carry flag to true, while clc clears it (sets it to false).
- The data segment registers ds, es, ss, fs and gs and the code segment register cs are not useful in (non-kernel) x86-64 code, but you should not use them for your own purposes, either. (Windows and Linux both use fs and gs to point to thread-local storage, but that’s not a standard thing, just a convention.) They control how your process’s memory is mapped to the system’s memory address space.
- The control registers cr0 through cr15 are not accessible in user-mode code at all; they control whether the CPU is running in protected (kernel) or unprotected (user) mode, whether it is running in 16-, 32-, or 64-bit mode, etc.
- There are a few additional registers that have to do with memory management, debugging breakpoints, internal performance parameters, etc. Most of these are not useful for us, and many are inaccessible by user code anyway.
The normal mov instruction usually cannot be used to manipulate these registers. Instead, specialized instructions exist for getting/setting their values.
In-class group project
To allow you to get your feet wet with assembly, here is our first group project:
Write an assembly program which prompts the user for their name, printing
What is your name?
and then accepts up to 255 characters of input, and then prints out
Hello, name, nice to meet you!
followed by a newline.
You’ll have to use both the SYS_WRITE (= 1) and SYS_READ (= 0) syscalls. Use the following .data section:
section .data
prompt: db "What is your name?"
prompt_len: equ $-prompt
buffer: times 255 db '!'
resp1: db "Hello, "
resp1_len: equ $-resp1
resp2: db ", nice to meet you!", 10
resp2_len: equ $-resp2
buffer is the input buffer to pass to the SYS_READ call; it consists of 255 ! characters. Note that SYS_READ will “return” the actual number of bytes read in rax, which you will then have to use when you print out the contents of the buffer. (If you get the length of the input wrong, you’ll see either the user’s name cut off, or with !!!!s added onto the end of it.)
The “fd” parameter to both SYS_READ and SYS_WRITE is a file descriptor, a number which identifies a file or stream. The standard file descriptors which are always available are
FD Number | Stream |
---|---|
0 | Standard input |
1 | Standard output |
2 | Standard error (output) |
So you’ll SYS_READ from FD #0, and SYS_WRITE to FD #1 (as we did before).
Don’t forget to end your program with a SYS_EXIT (= 60) syscall, to gracefully end your program!
Save your work in ~/cs241/group1/.