Today we’ll cover:
- Binary representations for negative values.
- In-memory representations of multi-byte numbers.
- The special-purpose registers rip, rflags and a few others.
- The arithmetic instructions add, sub, inc, dec.
- Their effects on the flags register rflags.
- We will briefly look at the unsigned multiplication/division/modulo instructions mul and div.
- The function instructions call and ret.
Instructions covered last time:
- mov – Move operand values between registers, memory.
- inc – Increment a register or memory value.
- syscall – Invoke an operating-system function.
Review
Registers
Last time we looked at the full set of general-purpose registers available to us: rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp, r8 through r15.
We saw how each of these can be accessed either at its full 64-bit (qword) width, or as the low dword, or the low word, or the low byte:
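Register | Width | Portion of rax |
---|---|---|
rax | 64 bits | the entire register |
eax | 32 bits | low dword |
ax | 16 bits | low word |
ah | 8 bits | high byte of ax |
al | 8 bits | low byte of ax |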
We saw that many instructions (including mov and xor), when operating on the low dword portion of a register, will implicitly zero the high dword, and we looked at some strategies for preserving the high dword.
We looked at the flags register rflags, which is used to store information about the results of various operations. The most important flags for us will be
- CF – carry flag, set if the result of an unsigned operation produces an extra carry/borrow
- OF – overflow flag, set if the result of a signed operation is too big/small to store
- ZF – zero flag, set if all bits of the result are 0.
- SF – sign flag, set if the high bit of the result is set
We saw that the mov instruction is our basic tool for moving data around, between registers, memory, and immediate (constant) values. We also saw the xchg instruction, which swaps operands, saving us from having to use a register as a temporary variable.
Negative values
All of the arithmetic operations described previously assume that the incoming value is positive. How are negative values handled? Historically, there have been four choices:
Sign-bit: in decimal, we indicate a negative value by a - sign, so why not simply dedicate a bit to meaning “this number is negative”? Historically, the high bit has been dedicated to this. So 00000011b is 3, but 10000011b is -3. Although this method is simple for us to understand, it has several drawbacks when it comes to implementation:
- There are now two values for 0, a positive and a negative.
- Arithmetic on values is more complicated, as we have to check the sign bits of both values. If we forget and perform an unsigned operation, the results are nonsense.
- The CPU has to switch between performing an addition operation, and performing a subtraction operation, depending on the value of the sign bit. That is, the CPU cannot set itself up to perform an addition/subtraction until it also knows the values being operated on.
Floating-point values use a sign bit, partly because they want to have both positive and negative zeros.
Biased: this represents all values (not just negative ones) as being arithmetically “shifted” by a fixed amount. So 0 is not 0, but 0+127 (=01111111b) or something. 3 is 3 + 127 = 10000010b, and -3 is -3 + 127 = 01111100b. Note that the high bit serves as a kind of “positive sign bit”; if it is set, then the number is positive (greater than 0).
Addition and subtraction work (kind of) normally, except that we have to “unbias” the values after performing the normal operation. E.g., adding 3 and -3 gives:
  10000010
+ 01111100
─────────────
  11111110 = 127 (254-127)
We have to subtract 127 from the result again, because two “copies” of 127 were added, so the final result is actually
  11111110
- 01111111
─────────────
  01111111 = 0 (127-127)
Positive numbers look weird. Zero looks weird.
Detecting non-negative numbers (≥ 0) is tricky.
Biased representation is used in a few places (e.g., in the exponent of floating point values)
Ones-complement: this represents negative values as the binary inverse (flip all bits) of the corresponding number. So 3 is 00000011b, while -3 is 11111100b. Note that we can determine whether a number is negative by examining the high bit: if it is set, the number is negative. But the high bit is not a sign bit: we cannot make a negative number positive by simply flipping the high bit, we have to flip all the bits.
If we perform a normal binary addition of +3 and -3, we get
  00000011
+ 11111100
─────────────
  11111111 = -0
As with the sign-bit representation, we have two representations for 0, a positive one (as 00000000b) and a negative one (as 11111111b). However, we can use normal binary addition to add signed numbers, and the result will be correct when interpreted as a ones-complement number.
Subtraction is a bit more difficult:
    111111
  00000011 = 3
- 00000100 = 4
─────────────
  11111110 = -1
When the borrowed 1 gets to the far left, it “wraps around” and is borrowed from the low bit of the answer. This is known as an end-around borrow. We have to adjust for this when it happens.
Twos-complement: this represents negative values as the binary inverse (flip all bits) plus 1. Technically, the twos complement of an 8-bit value n is defined as 2^8 - n. So 3 is 00000011b while -3 is 11111101b. Note that the high bit still can be used to detect negative values. You can think of twos complement as 1-biased ones complement.
- To negate a value, take its ones complement (i.e., flip every bit) and then add 1 to the result. E.g., if 3 = 00000011b, -3 = 11111100b + 1 = 11111101b.
All arithmetic operations work normally, without needing to know the sign of the incoming values. We can just do normal addition, and the result will “make sense” regardless of whether either or both of the inputs were negative. E.g.,
   1111111
  00000011
+ 11111101
─────────────
  00000000
There is only one representation of 0, not two, and it’s just 0b.
The range of values represented (by a byte) is -128 to +127.
There’s no need to do an end-around borrow/carry in twos complement
x86-64 systems (and many others) use twos-complement, because it doesn’t require any extra circuitry to represent or manipulate negative values; you just do “normal” binary addition and the results are correct.
Multiplication is an expensive enough operation that we generally don’t bother trying to do it “inside” the number representation. Instead, we just make both operands positive, multiply, and then negate the result if needed.
Another example:
11 1
01110110 = 118
+ 11100101 = -27
────────────
01011011 = 91
To negate a value (positive or negative):
- Flip all the bits
- Add 1 (this boils down to setting the lowest zero bit to 1 and clearing all the bits below it)
This works regardless of whether the input value is positive or negative. E.g.,
11100101 = -27
00011010 (flip all bits)
00011011 = 27 (add 1)
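The negation recipe and the signed reading of a byte can be checked with a short Python sketch. Python is used here only as a calculator; the helper names to_signed8 and negate8 are made up for this illustration and are not part of any assembler or library.

```python
def to_signed8(bits):
    """Interpret an 8-bit pattern (0..255) as a twos-complement value."""
    return bits - 256 if bits & 0x80 else bits

def negate8(bits):
    """Negate by flipping all bits and adding 1, keeping only 8 bits."""
    return ((bits ^ 0xFF) + 1) & 0xFF

x = 0b11100101
print(to_signed8(x))                         # -27
print(format(negate8(x), "08b"))             # 00011011
# Plain 8-bit addition gives the right signed answer for 118 + (-27):
print(to_signed8((0b01110110 + x) & 0xFF))   # 91
```

The `& 0xFF` is what stands in for the fixed 8-bit width of the register: any carry out of the high bit is simply dropped.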
Expanding data sizes
Suppose we have an 8-bit value and we wish to store it into a 16-bit location. If the value is unsigned then this is easy: we copy the value into the low 8 bits, and we fill the high 8 bits with 0s. But what if the value is signed (twos-complement)? In this case, in order to get the equivalent value, we must sign extend the number, filling the high bits with copies of the high bit in the original value. If the high bit was originally 1, then all the high 8 bits must be 1, otherwise they should be 0.
Many arithmetic operations that can “mix” values of different word sizes will come in two forms: an unsigned form that “zero-extends” (fills with 0s) and a signed form that sign-extends (copies the high-bit).
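To make the difference between the two kinds of widening concrete, here is a small Python sketch. The function names zero_extend and sign_extend are hypothetical helpers written for this example; they only model the bit patterns, not any particular instruction.

```python
def zero_extend(value, from_bits):
    """Unsigned widening: keep the low from_bits bits, new high bits are 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Signed widening: copy the original high bit into every new high bit."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # High bit was 1: fill bits from_bits .. to_bits-1 with 1s.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

print(format(zero_extend(0b11111101, 8), "016b"))      # 0000000011111101
print(format(sign_extend(0b11111101, 8, 16), "016b"))  # 1111111111111101
print(format(sign_extend(0b00000011, 8, 16), "016b"))  # 0000000000000011
```

The same byte 11111101b widens to 253 when treated as unsigned, but stays -3 when sign-extended.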
In-memory representations
For single-byte values, the above representations are used. For multi-byte values, however, there are a few options. Consider a 16-bit value. When we place this into a memory address a, it can be done in two ways:
We can place the low byte (8 bits) into address a, and the high byte into a+1. This is called little-endian, because the low byte comes first.
We can place the low byte into address a+1, and the high into a. This is called big-endian, because the high byte comes first.
If big-endianness seems insane, consider that this is how a 16-bit value would be written from left-to-right, assuming memory addresses increase to the right:
high byte | low byte |
addr | addr + 1 |
Little-endian is used by Intel systems, and thus we don’t need to worry about big-endian. Big-endian is used by a few microcontrollers (AVR32) and a few big-iron processors. If you’re writing a file format (or a network protocol) then you have to define a “standard” endianness and have to ensure that the correct translations are done in software, when necessary. But for us, if you want to access the high byte of a 16-bit value in memory, it can be found at address + 1.
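The two byte orders can be seen directly with Python's built-in int.to_bytes and int.from_bytes. This is only a sketch of the layout; the value 0x1234 is an arbitrary example, and index 0 of each bytes object plays the role of address a.

```python
value = 0x1234                          # high byte 0x12, low byte 0x34

little = value.to_bytes(2, "little")    # low byte stored first
big = value.to_bytes(2, "big")          # high byte stored first

print(little.hex())                     # 3412
print(big.hex())                        # 1234
# In the little-endian layout the high byte sits at offset 1:
print(hex(little[1]))                   # 0x12
print(hex(int.from_bytes(little, "little")))  # 0x1234
```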
Accessing memory
In 64-bit mode all addresses are 64 bits, hence the full register (rax, rbx, etc.) must be used to store an address. As we’ve seen, the label used to define a string in the .data section is actually the address of that string, hence we can load the address of a string my_text into rax with
mov rax, my_text
You can think of rax in this usage as a pointer-typed variable, holding the address of something.
We can “dereference” a memory address by putting it in square brackets:
mov al, byte [my_text]
The byte qualifier is not strictly required here, but it’s good practice to add it. An easy mistake to make is
mov rax, [my_text] ; Read one *qword* from my_text
which reads not one byte but eight (a qword), whereas
mov rax, byte [my_text] ; Error: operand sizes do not match
will give an error when you try to assemble.
(Of course, if you need more than just the first byte, you might want to get them in chunks of eight, for speed…)
In reality, if we were processing a string, we would want to iterate through it, rather than just accessing the first byte. It would be more useful to put the address my_text into a register and then “dereference” that:
mov rsi, my_text
mov al, byte [rsi]
We can then inc rsi to increment rsi and access the next byte in the string. Because my_text is an immediate, we cannot increment it. (Again, the byte qualifier on [rsi] is not required, as it could be inferred from the size of al.)
Simple looping
Because doing anything interesting will require looping, we’ll introduce the loop instruction. loop takes a single operand, a label to jump to (internally, loop stores the offset of the label’s address from the current instruction’s address). The operation of loop is to perform the following steps:
- Decrement rcx
- If rcx != 0, jump to the label
- If rcx == 0, continue to the next instruction normally.
Thus, the structure of a basic loop would look something like this:
mov rcx, init ; Initialize rcx > 0
.start_loop:
; ... Perform loop operation using rcx
loop .start_loop
; ... Continue after end of loop
This is roughly equivalent to a C/C++-style do-while loop:
rcx = init;
do {
// ... Perform loop operation
--rcx;
} while(rcx != 0);
Note that because rcx is one of the registers syscall is allowed to clobber, if you do any syscalls inside the loop, you will need to save rcx before the call, and then restore it after, before loop.
As a demo of this, we can modify our “Hello, world” program to print “Hello, world!” backwards, by printing one character at a time, from the end to the beginning. (We’ll still use the write syscall, we’ll just tell it to print a single character instead of the entire string.)
It may be useful to consider how we would do this in C or C++. The C++ equivalent to our original assembly program would be something like this:
#include <iostream>

int main()
{
    const char* msg = "Hello, world!";
    const int MSGLEN = 13;
    std::cout.write(msg, MSGLEN); // equiv. to write syscall
}
To write one character at a time, we need a loop that starts at the end of the string, and writes one character at a time, backwards, something like this:
#include <iostream>

int main()
{
    const char* msg = "Hello, world!";
    const int MSGLEN = 13;
    int c = MSGLEN;
    do {
        const char* addr = msg + c - 1; // address of the c-th character
        std::cout.write(addr, 1);
        --c;
    } while(c != 0);
}
I’ve intentionally written the do-while loop in a way that mirrors the execution of the loop instruction, to make it easier to translate to assembly.
Our original program looked like this:
section .data
msg: db 10, "Hello, world!"
MSGLEN: equ $-msg
section .text
;; Program code goes here
global _start
_start:
mov rax, 1 ; Syscall code in rax
mov rdi, 1 ; 1st arg, file desc. to write to
mov rsi, msg ; 2nd arg, addr. of message
mov rdx, MSGLEN ; 3rd arg, num. of chars to print
syscall
;; Terminate process
mov rax, 60 ; Syscall code in rax
mov rdi, 0 ; First parameter in rdi
syscall ; End process
I’ve removed the trailing 10 (\n) from the text, and moved it to the beginning, so it will still print at the “end”.
The first syscall will be inside the loop, so we can add:
section .data
msg: db 10, "Hello, world!"
MSGLEN: equ $-msg
section .text
;; Program code goes here
global _start
_start:
mov rdi, 1 ; 1st arg, file desc. to write to
mov rdx, 1 ; 3rd arg, num. of chars to print
.begin_loop:
mov rax, 1 ; Syscall code in rax
mov rsi, msg ; 2nd arg, addr. of message
syscall
loop .begin_loop
;; Terminate process
mov rax, 60 ; Syscall code in rax
mov rdi, 0 ; First parameter in rdi
syscall ; End process
Note that syscall preserves rdi and rdx, so we can set those once outside the loop. However, rax is used for the return value, so we should set it every time through the loop, and rsi is the address of the start of the string, which will change as we move through the string.
We need to initialize rcx to the length of the string:
mov rcx, MSGLEN
and then we set rsi (the address to write) to rcx + msg - 1.
mov rsi, rcx
add rsi, msg-1
(add a, b performs addition, a += b, and dec a decrements --a. Both are subject to the usual restrictions: no memory-to-memory operations, both operands of the same size, etc. Because msg is a constant, the msg-1 is performed at assembly-time.)
Finally, note that rcx is one of the registers that syscall is allowed to “clobber” (r11 is the other), so we have to save it into another, safe register before the syscall and then restore it afterwards:
mov r15, rcx
syscall
mov rcx, r15
That leaves us with:
global _start
_start:
mov rdi, 1 ; 1st arg, file desc. to write to
mov rdx, 1 ; 3rd arg, num. of chars to print
mov rcx, MSGLEN ; loop counter = MSGLEN
.begin_loop:
; Print 1 char at [msg + rcx - 1]
mov rax, 1 ; Syscall code in rax
mov rsi, rcx ; rsi = addr to print
add rsi, msg
dec rsi
mov r15, rcx ; Save rcx before syscall
syscall
mov rcx, r15 ; Restore rcx
loop .begin_loop
;; Terminate process
mov rax, 60 ; Syscall code in rax
mov rdi, 0 ; First parameter in rdi
syscall ; End process
Local labels
When writing loops or other labels that exist inside a function, it’s useful to write them as “local” labels, by starting them with a period. A local label is actually named after the most recent non-local label, so the full name of .begin_loop is actually _start.begin_loop. Labels can normally only be defined once per file, so without a local label, no other function we wrote could use the label begin_loop.
We’ll use local labels for all our loop and branch targets, and only use non-local labels for functions.
Negative rcx
In case you’re curious, let’s consider what happens if rcx is negative and we decrement it. E.g., if rcx = 11111111 (= -1), and we decrement:
11111111
- 00000001
────────────
11111110 = -2
In other words, the result is exactly what you’d expect (but not particularly useful when used with loop).
Loop variants
There are two variants of the loop instruction which test the zero flag (ZF) along with the value of rcx:
- loope – “Loop Equal”; decrement rcx, loop if rcx != 0 and ZF == 1
- loopne – “Loop Not Equal”; decrement rcx, loop if rcx != 0 and ZF == 0
The zero flag is connected to the idea of (in)equality because, if we perform a subtraction:
sub a, b
and a == b, then the zero flag will be set, otherwise it will be unset.
Including files
Like C/C++, yasm has a simple mechanism for including the contents of one .s file into another:
%include "source.s"
copies the contents of source.s into the current assembly file. For example, we could start to centralize a lot of our syscall definitions into an include file:
;;;
;;; sysdefs.s
;;;
[section .data]
SYS_write equ 1
SYS_exit equ 60
SYS_stdin equ 0
SYS_stdout equ 1
...
__SECT__
The [section .data] and __SECT__ stuff is “magic” to temporarily switch to the data section and then switch back to whatever section we were in before.
Arithmetic operations
add dest, src ; dest += src
sub dest, src ; dest -= src
add and sub perform addition and subtraction between two operands of the same size. Internally, sub is just addition with the second operand negated, and the carry inverted at the end.
Like many operations, add and sub are dyadic: they take two operands, a destination and a source, with the destination serving as both the first input and the target for the output of the operation.
Flags
add and sub set/unset the OF, SF, ZF, AF, CF, and PF flags:
- For signed operations the OF flag indicates that an overflow/underflow occurred. This bit is set if the sign bit of the result is incorrect, because the correct result was too big/small for the destination to hold. For example, signed 127+127 will generate an overflow. (Unsigned 127+127 = 254 which can still fit in an unsigned byte, hence would not set the carry flag.)
  The correct sign of the result is easy to determine: if both inputs are positive, the result should be positive; if both are negative, then the result should be negative. If one is positive and the other is negative, then the correct sign of the result is more difficult to determine, but it turns out in this case that it doesn’t matter: it’s impossible for an addition of a positive and a negative value to overflow. Subtraction is treated similarly, except that the sign of the second operand is flipped (i.e., a - b is treated as a + (-b)).
  If the inputs are unsigned, then the OF flag is still set/unset, but its value is meaningless.
- For unsigned operations the CF flag indicates that an extra carry/borrow was “left over” at the end. This indicates that the result of the operation was too big/small for the destination to hold. For example, 255+127 is too big for a byte, and would set the carry flag. It would not set the overflow flag, because 255 unsigned = -1 signed, and 127-1 = 126 which fits in a signed byte.
  Note that after an unsigned subtraction a - b, if the carry flag is set, this indicates that b > a.
  The CF flag is still set/unset if the inputs are signed, but its value is meaningless in that case.
- The ZF flag is set if the result was 0 (all zero bits).
- The SF flag is set to a copy of the sign bit (set if result was negative, for signed operations). For unsigned operations it’s just a copy of the high bit of the result.
- The PF flag is set if the number of set (=1) bits in the low byte of the result is even. The parity flag is something of a historical artifact and is not used much, partly because it does not give the parity of the entire result, just the lowest byte of it.
- We’ll ignore the AF flag, as it only makes sense in the context of BCD arithmetic.
Note that all flags are set/cleared on all operations, but some flags only make sense on signed/unsigned operations. The add/sub instructions don’t know whether you are performing a signed or unsigned operation, so it’s up to you to make sure you check the correct flags for the type of operation you are performing.
Example: Let’s perform an add operation and see how the flags are set from it:
111 11
10110011 = 179 (unsigned) = -77 (signed)
+ 01100110 = 102 (unsigned) = 102 (signed)
────────────
1 00011001 = 25 (unsigned) = 25 (signed)
- Interpreted as signed values, one input was negative, the other was positive, so the sign of the result is guaranteed to be correct. OF = 0.
- Interpreted as unsigned, the addition produced an extra carry, CF = 1.
- The result is not 0, so ZF = 0.
- The number of set bits in the result is odd (three), and PF is only set for an even count, so PF = 0.
- The high bit is unset (the result is positive), so SF = 0.
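The flag rules for byte-sized addition can be written out as a short Python sketch, which is handy for checking examples by hand. The function add8 is a made-up helper for this illustration; it models only CF, OF, ZF and SF and leaves out PF and AF.

```python
def add8(a, b):
    """Add two 8-bit patterns; return (result, flags) as a byte-sized add would."""
    total = a + b
    result = total & 0xFF
    sa, sb, sr = a & 0x80, b & 0x80, result & 0x80   # sign bits
    flags = {
        "CF": int(total > 0xFF),            # extra carry out of the high bit
        "OF": int(sa == sb and sr != sa),   # same-sign inputs, wrong-sign result
        "ZF": int(result == 0),
        "SF": int(sr != 0),                 # copy of the result's high bit
    }
    return result, flags

result, flags = add8(0b10110011, 0b01100110)
print(format(result, "08b"), flags)
# 00011001 {'CF': 1, 'OF': 0, 'ZF': 0, 'SF': 0}
```

Note that the function never asks whether the inputs are signed or unsigned; it computes every flag, and it is up to the caller to look at the ones that are meaningful.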
Here are two tables summarizing the results of addition and subtraction on the OF, CF, ZF, and SF flags (original):
Addition (U = unsigned value, S = signed value):
A Hex | A U | A S | B Hex | B U | B S | A + B Hex | A + B U | A + B S | OF | SF | ZF | CF |
---|---|---|---|---|---|---|---|---|---|---|---|---|
7F | 127 | 127 | 0 | 0 | 0 | 7F | 127 | 127 | 0 | 0 | 0 | 0 |
FF | 255 | -1 | 7F | 127 | 127 | 7E | 126 | 126 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
FF | 255 | -1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
FF | 255 | -1 | 0 | 0 | 0 | FF | 255 | -1 | 0 | 1 | 0 | 0 |
FF | 255 | -1 | FF | 255 | -1 | FE | 254 | -2 | 0 | 1 | 0 | 1 |
FF | 255 | -1 | 80 | 128 | -128 | 7F | 127 | 127 | 1 | 0 | 0 | 1 |
80 | 128 | -128 | 80 | 128 | -128 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
7F | 127 | 127 | 7F | 127 | 127 | FE | 254 | -2 | 1 | 1 | 0 | 0 |
Subtraction (U = unsigned value, S = signed value):
A Hex | A U | A S | B Hex | B U | B S | A - B Hex | A - B U | A - B S | OF | SF | ZF | CF |
---|---|---|---|---|---|---|---|---|---|---|---|---|
FF | 255 | -1 | FE | 254 | -2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
7E | 126 | 126 | FF | 255 | -1 | 7F | 127 | 127 | 0 | 0 | 0 | 1 |
FF | 255 | -1 | FF | 255 | -1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
FF | 255 | -1 | 7F | 127 | 127 | 80 | 128 | -128 | 0 | 1 | 0 | 0 |
FE | 254 | -2 | FF | 255 | -1 | FF | 255 | -1 | 0 | 1 | 0 | 1 |
FE | 254 | -2 | 7F | 127 | 127 | 7F | 127 | 127 | 1 | 0 | 0 | 0 |
7F | 127 | 127 | FF | 255 | -1 | 80 | 128 | -128 | 1 | 1 | 0 | 1 |
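The subtraction rows can be reproduced with a small Python sketch. The helper sub8 is hypothetical, written only for this check; it treats the borrow as “b is larger than a when both are read as unsigned” and the overflow as “the inputs had different signs and the result’s sign differs from a’s”.

```python
def sub8(a, b):
    """Subtract 8-bit pattern b from a; return (result, flags)."""
    result = (a - b) & 0xFF
    sa, sb, sr = a & 0x80, b & 0x80, result & 0x80   # sign bits
    flags = {
        "OF": int(sa != sb and sr != sa),   # signed result has the wrong sign
        "SF": int(sr != 0),
        "ZF": int(result == 0),
        "CF": int(b > a),                   # unsigned borrow
    }
    return result, flags

# (A, B, A - B, OF, SF, ZF, CF), copied from three rows of the subtraction table
rows = [
    (0x7E, 0xFF, 0x7F, 0, 0, 0, 1),
    (0xFE, 0x7F, 0x7F, 1, 0, 0, 0),
    (0x7F, 0xFF, 0x80, 1, 1, 0, 1),
]
for a, b, expected, of, sf, zf, cf in rows:
    result, flags = sub8(a, b)
    ok = result == expected and flags == {"OF": of, "SF": sf, "ZF": zf, "CF": cf}
    print(format(a, "02X"), "-", format(b, "02X"), "->", format(result, "02X"), ok)
```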
Increment and decrement
inc dest ; ++dest
dec dest ; --dest
inc and dec increment/decrement their single operand, which can be either a register or a memory location. Unlike an add r, 1 or sub r, 1 instruction, inc and dec do not modify the carry flag.
The flags OF, SF, ZF, AF, and PF are set/cleared as expected. When used on
signed values, the behavior is still correct (incrementing a negative value
brings it closer to 0, decrementing a negative value makes it more negative).
Addition/subtraction larger than 64-bits
The largest registers we have are 64 bits (qword). What if we want to perform an addition/subtraction on 128-bit operands (represented as, e.g., rdx:rax)?
Let’s consider how we would perform word-sized addition, if the only addition
we could do natively was byte-sized:
111111← 1111
00101101 11001101
+ 00010010 10101011
─────────────────────
01000000 01111000
Adding the low bytes produced an extra carry (CF = 1), which we then used to start the addition of the high bytes. We effectively need two kinds of addition:
Low byte addition, which doesn’t start with a carry (ignores CF)
High byte addition, which uses CF to start the addition.
This is how we perform larger-than-qword addition: there is another addition operation, add-with-carry (adc), which uses the status of the carry flag CF as an input for the first bit’s addition.
adc dest, src ; dest = dest + src + CF
For subtraction there is sbb, subtract-with-borrow.
Thus, to add the double-qword rcx:rbx to rdx:rax (leaving the sum in rdx:rax), we would do
add rax, rbx
adc rdx, rcx
The analogue for subtraction uses sub on the low qwords, followed by sbb (subtract-with-borrow) on the high qwords.
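The add-then-adc pattern can be modelled in Python, where integers are unbounded and the 64-bit limit has to be imposed by hand. This is only a sketch: add128 and MASK64 are names invented for the example, and each value is passed as a (high qword, low qword) pair in the same spirit as rdx:rax.

```python
MASK64 = (1 << 64) - 1

def add128(a_hi, a_lo, b_hi, b_lo):
    """Add two 128-bit values held as (high qword, low qword) pairs."""
    lo = a_lo + b_lo
    carry = lo >> 64              # CF left over by the low-qword add
    hi = a_hi + b_hi + carry      # adc: high qwords plus the incoming carry
    return hi & MASK64, lo & MASK64, hi >> 64   # (high, low, final CF)

# (2**64 - 1) + 1: the low qword wraps to 0 and the carry lands in the high qword
hi, lo, cf = add128(0, MASK64, 0, 1)
print(hex(hi), hex(lo), cf)   # 0x1 0x0 0
```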
Multiplication and division
Multiplication and division are more complex than addition/subtraction. We will cover them in more detail later, but for now:
The result of multiplying two n-bit values can have up to 2n bits. Hence, when multiplying two qword values, we need somewhere to store the double-quad-word result.
Division has the opposite problem, we may want to divide a double-qword value by a qword divisor.
Signed multiplication is different from unsigned, and similarly for division. Different instructions are used for each.
Multiplication instructions exist in two-operand forms (dest *= src) as you would expect, but also in a one-operand form in which the rax register is implicitly used as the destination for the operation, and even a three-operand form, equivalent to dest = src * immediate. The division instructions only take a single operand, and always store their results into a combination of rax and rdx.
To store a double-qword (128-bit) result, we use a combination of rax and rdx: rax stores the low qword while rdx stores the high qword. We write this combination as rdx:rax. (Using a similar notation, we could say that ax = ah:al.) Smaller multiplications do not require this extension.
The unsigned/signed multiplication instructions are mul and imul, respectively:
Instruction | Equivalent |
---|---|
mul rm | rdx:rax = rax * rm, unsigned |
imul rm | rdx:rax = rax * rm, signed |
imul r, rm | r *= rm, signed |
imul r, rm, imm | r = rm * imm, signed |
(For some reason, the signed multiply comes in two- and three-argument variants, while the unsigned only takes one.)
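The CF and OF flags are set/cleared together: both are set if the full product does not fit in the low half of the result (for mul, if the high half in rdx is nonzero; for imul, if the product is not just the sign-extension of its low half), and both are cleared otherwise. In the two- and three-operand forms, if the result of the multiplication does not fit into the destination, the results are truncated (high bits discarded). The values in the other flags are undefined.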
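As a rough model of the one-operand unsigned multiply, the following Python sketch splits the full product into the two halves that would land in rdx and rax. The name mul64 is made up for this example, and the flag value it returns assumes the usual rule that CF and OF are set together when the high half is nonzero.

```python
MASK64 = (1 << 64) - 1

def mul64(rax, rm):
    """Unsigned 64x64 multiply; returns (rdx, rax, CF/OF) like one-operand mul."""
    product = rax * rm                  # up to 128 bits wide
    hi, lo = product >> 64, product & MASK64
    return hi, lo, int(hi != 0)         # flags set when the high half is in use

print(mul64(6, 7))          # (0, 42, 0)
print(mul64(1 << 63, 4))    # (2, 0, 1)
```

The second call shows why the double-qword destination is needed: 2^63 * 4 = 2^65 does not fit in 64 bits, so the low qword is 0 and the whole value lives in the high qword.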
Division only has a single-operand form, where the operand contains the divisor; the destination (which is also the dividend) is in rdx:rax. The result of div/idiv is both the quotient in rax (rounded down for div, truncated toward zero for idiv) and the remainder (i.e., modulo or %) in rdx. Unlike C++ where we have / for integer division and % for integer modulo, in assembly a single instruction gives us both results.
Instruction | Equivalent |
---|---|
div rm | rax = rdx:rax / rm and rdx = rdx:rax % rm, unsigned |
idiv rm | rax = rdx:rax / rm and rdx = rdx:rax % rm, signed |
An overflow in division is indicated not by setting the carry flag, but by a divide-error exception #DE, which is sent to our process as a signal SIGFPE. For now, this will immediately crash our program, but later we’ll see how to write a signal handler to deal with it in a more graceful manner. (Of course, we could also avoid the overflow by checking the operands before performing the division.)
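The behaviour of the unsigned divide, including the overflow case, can be sketched in Python. The function div64 is a hypothetical model written for this example; Python exceptions stand in for the #DE divide error, which in a real program would arrive as SIGFPE.

```python
MASK64 = (1 << 64) - 1

def div64(rdx, rax, rm):
    """Unsigned divide of rdx:rax by rm; returns (quotient, remainder)."""
    if rm == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (rdx << 64) | rax
    quotient, remainder = divmod(dividend, rm)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in rax")
    return quotient, remainder          # these would land in rax and rdx

print(div64(0, 17, 5))                  # (3, 2)
try:
    div64(1, 0, 1)                      # 2**64 / 1 needs 65 bits
except OverflowError as err:
    print(err)                          # #DE: quotient does not fit in rax
```

One practical consequence of this model: when the dividend is only a single qword, rdx has to be zeroed before an unsigned divide, otherwise whatever was left in rdx becomes the high half of the dividend.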
Simple functions
As we’ll see later, calling C functions from assembly, or making our assembly functions callable from C/C++ requires a few extra steps to set up the stack correctly. However, as long as we stay purely in “assembly-land”, we don’t need to worry about the extra complexity; we can essentially make functions work however we like. The only requirements are that we be able to return from a function and get back to where we were.
The two instructions that handle functions are call and ret. Both use the stack internally:
- call takes an address (a label in our .text section) and performs two steps: pushes rip, the instruction pointer, onto the stack, and then jumps to the address given. Remember that rip points to the next instruction to be executed, so the value pushed onto the stack is effectively the return address of the function, the address where execution should resume when the function returns.
- ret pops the top element of the stack and jumps to it. rip is updated to the following instruction automatically.
These work together as follows (addresses are just made up):
Address | Instruction | Address | Instruction |
---|---|---|---|
 | _start: | | my_func: |
0x100 | call my_func | 0x200 | mov eax, ... |
0x108 | mov rbx, rax … | 0x208 | … |
… | | 0x280 | ret |
While my_func is executing, the stack contains 0x108, the return address. When ret is executed, this address is popped off the stack and we resume execution at that point. (Later, we’ll see that this means if you’re using the stack for anything else, you have to make sure you’ve popped everything else off before you ret, so that at that point, the only thing on the stack is the return address.)
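The push-and-jump, pop-and-jump behaviour can be traced with a toy interpreter in Python. Everything here is invented for illustration: the run function, the "nop" and "halt" pseudo-instructions, and the program table, which reuses the made-up addresses from the table above. Each entry stores the address of the following instruction explicitly, standing in for what rip would hold.

```python
def run(program, start):
    """Execute until 'halt'; return the addresses executed, in order."""
    stack, trace, rip = [], [], start
    while True:
        op, target, next_rip = program[rip]
        trace.append(rip)
        if op == "call":
            stack.append(next_rip)   # push the return address
            rip = target             # jump to the function
        elif op == "ret":
            rip = stack.pop()        # pop the return address and jump to it
        elif op == "halt":
            return trace
        else:
            rip = next_rip           # ordinary instruction: fall through

# address: (operation, call target, address of the following instruction)
program = {
    0x100: ("call", 0x200, 0x108),
    0x108: ("halt", None, None),
    0x200: ("nop", None, 0x208),
    0x208: ("nop", None, 0x280),
    0x280: ("ret", None, None),
}
print([hex(addr) for addr in run(program, 0x100)])
# ['0x100', '0x200', '0x208', '0x280', '0x108']
```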
Although we can use any “calling convention” we like, in terms of passing arguments and returning results, you should try to stick with what will eventually become the convention for calling functions:
- Pass the first six arguments in registers rdi, rsi, rdx, rcx, r8 and r9. Note that this is slightly different from the syscall convention (rcx instead of r10).
- Return results in rax
As an example, let’s write a function that prints a string (given as an address and a length) and adds a newline at the end. This will just wrap up the call to the write syscall we’ve been using:
section .data
newline: db 10
section .text
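write_ln:
; rdi = address
; rsi = length
mov rdx, rsi ; 3rd arg, length (copy before rsi is overwritten)
mov rsi, rdi ; 2nd arg, address (copy before rdi is overwritten)
mov rdi, 1 ; 1st arg, file desc. to write to
mov rax, 1 ; Syscall code in rax
syscall
mov rax, 1
mov rdi, 1
mov rsi, newline
mov rdx, 1
syscall
ret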
(Because we can switch sections at any time, you could put this code in a file and then %include it to make the write_ln function available in any program you write.)
To use this, we load the registers with the appropriate information and then issue a call:
section .data
msg: db "Hello, world!"
MSGLEN: equ $-msg
section .text
mov rdi, msg
mov rsi, MSGLEN
call write_ln
; Normal exit syscall...
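We could, of course, further tidy this up by writing a function to wrap up the write syscall:
sys_write:
; rdi = address
; rsi = length
mov rdx, rsi ; 3rd arg, length (copy before rsi is overwritten)
mov rsi, rdi ; 2nd arg, address (copy before rdi is overwritten)
mov rdi, 1 ; 1st arg, file desc. to write to
mov rax, 1 ; Syscall code in rax
syscall
ret
and then our write_ln becomes just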
write_ln:
; rdi = address
; rsi = length
call sys_write
mov rdi, newline
mov rsi, 1
call sys_write
ret
Note that our functions “clobber” the registers rdi, rsi, rax and rdx (and, because they use syscall, possibly rcx and r11 as well): when calling either write_ln or sys_write you cannot rely on the values of these registers being preserved through the call. Theoretically, which registers are clobbered and which are preserved should be part of our function calling convention, so that when we call a function, we know which registers are “safe”. When we start using the stack, we’ll see how we can preserve register values by pushing them onto the stack before the function call, and then popping them off after it returns.
“Function pointers”
The address passed to call can be a register, not just a label:
mov r11, my_function
call r11
This is equivalent to calling a function through a function pointer.