Review
Again, the System V C ABI specification has all the details on how function calls actually work.
Calling C standard library functions
To call a function from the C standard library, we must:
1. Declare it in our .s file as extern. (E.g., extern printf.) extern is the opposite of global; while global makes a symbol visible to things outside our program, extern says that our program is using a symbol defined somewhere outside it (and, in fact, printf must have been declared global, or the C equivalent, in order for us to be able to access it this way!).
2. push any caller-preserved registers (rax, any of the argument registers, r10, r11) onto the stack if you are using them.
3. Ensure that the stack is correctly aligned before calling any functions. See the next section. Note that stack (re)alignment must be done after pushing caller-preserved registers, but before pushing any stack-based arguments.
4. Place the arguments to the function in registers rdi, rsi, rdx, rcx, r8, and r9, from first to last. Place floating-point arguments in registers xmm0 through xmm7. If there are more than six integer arguments (or more than 8 floating-point arguments), push the remainder onto the stack, in right-to-left order (i.e., the 7th argument should be pushed last). For a variadic function such as printf or scanf, also set al to the number of xmm registers used for arguments (0 if there are none).
5. If you have any stack-based arguments, you will need to re-align the stack before pushing them. I.e., you have to compute an offset to subtract from rsp that will result in the stack being correctly aligned after all the pushes have taken place.
6. call the function. E.g., call printf.
7. The return value, if any, will be placed in rax. 128-bit return values are placed in rdx:rax. (Return values larger than 128 bits are handled specially: the caller passes an address to use for the result as an “invisible” first argument, in rdi. On return, the results are written into this address, which is “returned” in rax.)
8. Floating-point return values will be in xmm0 (and xmm1 if big enough).
9. pop any caller-preserved registers you pushed before the call. Un-align the stack if needed.
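The register assignment in the steps above can be written down as a small lookup. This is only an illustrative sketch in C: the helper names int_arg_register and int_stack_arg_count are made up for this example, and it covers integer and pointer arguments only (not floating-point ones).

```c
#include <assert.h>
#include <stddef.h>

/* Registers for the first six integer or pointer arguments, in order. */
static const char *const int_arg_regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
enum { NUM_INT_ARG_REGS = 6 };

/* Returns the register that carries integer argument number `index`
   (0-based), or NULL if that argument is passed on the stack. */
const char *int_arg_register(int index)
{
    assert(index >= 0);
    return index < NUM_INT_ARG_REGS ? int_arg_regs[index] : NULL;
}

/* Number of integer arguments that must be pushed onto the stack for a
   call with `total_args` integer/pointer arguments. */
int int_stack_arg_count(int total_args)
{
    assert(total_args >= 0);
    return total_args > NUM_INT_ARG_REGS ? total_args - NUM_INT_ARG_REGS : 0;
}
```

For a call with nine integer arguments, int_stack_arg_count(9) is 3, so three arguments are pushed (the last one first).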
Stack alignment
Before call-ing any function, the stack pointer rsp must be a multiple of 16. Upon entry to a function (including main), rsp is a multiple of 16, plus 8, because the call that transferred control pushed an 8-byte return address. So every function that calls other functions must normally begin with a prologue which either
- pushes rbp (the calling function’s stack frame pointer) onto the stack, or
- manually adjusts rsp by subtracting 8 from it.
If you call any functions which take stack-based arguments, or if you use any caller-preserved registers, you may need to re-align the stack before each function call, rather than just once at the beginning of the function.
Most functions don’t need a frame base pointer; it’s really only useful if you’re storing a dynamic amount of data on the stack, which most functions don’t do. Hence, the most common prologue is simply
sub rsp, 8
We subtract because the stack grows down in memory. Then, at the end of the function, add rsp, 8 to revert the stack alignment before returning.
If a function uses some callee-preserved registers (rbp, rbx, r12-r15) then those will have to be pushed onto the stack during the prologue. Depending on the number of registers pushed, the extra alignment may or may not be needed. (E.g., if you use rbx, you’ll have to push it, but doing so in addition to rbp will give 16 extra bytes, so we’ll need to sub rsp, 8 in order to get another 8 bytes of alignment.) It doesn’t matter whether you do the alignment before or after pushing the callee-preserved registers, so long as you undo in the opposite order before returning.
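The "may or may not be needed" rule comes down to simple arithmetic. The sketch below assumes the System V rule that rsp + 8 is a multiple of 16 at function entry; the function name prologue_padding is hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* On entry to a function, the return address (8 bytes) has just been pushed.
   Each register pushed in the prologue moves rsp down by 8 more bytes.
   Returns the extra bytes (0 or 8) to subtract from rsp so that it is a
   multiple of 16 before the function makes any calls of its own. */
int prologue_padding(int pushed_registers)
{
    assert(pushed_registers >= 0);
    int bytes_used = 8 + 8 * pushed_registers;  /* return address + pushes */
    return (16 - bytes_used % 16) % 16;
}
```

With no pushes the result is 8 (the plain sub rsp, 8 prologue); pushing only rbp gives 0; pushing both rbp and rbx gives 8 again, matching the example in the text.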
When we call a function, the CPU pushes the (qword) return address (the rip of the instruction after the call) onto the stack, so on entry to the callee rsp is once again 8 bytes past a multiple of 16. Thus, every function which intends to call other functions must perform this prologue.
Before returning, the rip pushed by call must be on top of the stack. Hence, every function needs an epilogue, which is simply the reverse of the prologue:
1. pop any callee-preserved registers you pushed in the prologue.
2. Either pop rbp or add rsp, 8.
Either way, the stack is adjusted back to the state that it was in immediately after the call.
The state of the stack during a function g called by a function f can be visualized as
Position | Contents | Frame |
---|---|---|
8n + 16 + rbp … 16 + rbp | stack argument n … stack argument 0 | function f (previous frame) |
8 + rbp | Return address | function g (current frame) |
rbp | Previous rbp / unused 8 bytes | |
rbp - 8 … rsp | Callee-saved registers … Top of stack | |
rsp - 128 | “Red zone” | |
(The “red zone” is an optimization specific to the System V ABI: the 128 bytes below rsp (at lower addresses, starting at rsp - 128) are reserved for the function’s use, and can be used freely without adjusting rsp. Of course, this space will be overwritten by any called functions, so it is mainly useful in leaf functions!)
As an example, suppose we have a function with declaration
long f(int a, // rdi
char* d, // rsi
int* e, // rdx
long f, // rcx
char g, // r8
unsigned long h, // r9
unsigned long i, // stack
int* j, // stack
long k // stack
);
How would we write a call to this function equivalent to
f(1,nullptr,nullptr,5,'6',7,8,nullptr,9);
In assembly, this would be
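; Assuming rsp is already a multiple of 16 (i.e., correctly aligned for a call)
mov rdi, 1
mov rsi, 0
mov rdx, 0
mov rcx, 5
mov r8, '6'
mov r9, 7
sub rsp, 8 ; Align stack for call (8 bytes padding + 3 pushes = 32 bytes)
push 9 ; k = 9
push 0 ; j = nullptr
push 8 ; i = 8
call f
add rsp, 32 ; Remove the 3 stack arguments and the 8 bytes of padding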
Note that stack-based arguments are pushed in reverse order. Because we are pushing 3 qword arguments (24 bytes) onto the stack, we must re-align the stack pointer with 8 bytes of padding. However, the callee will expect to find its stack arguments directly next to the return address (rip) on the stack, so we must adjust the stack alignment first, before pushing any stack-based arguments.
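The amount of padding can be computed rather than guessed. The following is a sketch only: padding_before_pushes is a hypothetical helper, and it assumes every stack argument occupies exactly one qword.

```c
#include <assert.h>
#include <stdint.h>

/* Given the current value of rsp and the number of qword arguments about to
   be pushed, returns how many bytes to subtract from rsp *before* the pushes
   so that rsp is a multiple of 16 when the call instruction executes. */
uint64_t padding_before_pushes(uint64_t rsp, unsigned stack_args)
{
    assert(rsp % 8 == 0);                 /* rsp is always qword-aligned */
    uint64_t after_pushes = rsp - 8 * (uint64_t)stack_args;
    return after_pushes % 16;             /* either 0 or 8 */
}
```

Starting from an rsp that is a multiple of 16, three pushes need 8 bytes of padding, while two pushes need none.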
The stack layout within the callee will look like this:
Stack | Addr |
---|---|
… | |
9 | |
nullptr | |
8 | |
rip | <- rsp |
After the callee sets up the base pointer, we will have
Stack | Addr |
---|---|
… | |
9 | rbp+32 |
nullptr | rbp+24 |
8 | rbp+16 |
rip | rbp+8 |
rbp | <- rbp <- rsp |
and hence the callee can access its stack arguments as [rbp + 16], [rbp + 24], and [rbp + 32].
(Note that, while pushing arguments in reverse order might be counter-intuitive, from the perspective of the callee it makes perfect sense: from left-to-right, the 7th argument sits directly beneath rip in the stack (at the next higher address), the 8th beneath that, and so forth.)
Note that, no matter the arguments’ sizes, the stack pointer rsp should always be a multiple of 8. Thus, if we had a dword stack argument, we would round its size up to qword before pushing.
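Both rules (qword rounding and the rbp-relative offsets) are easy to express as arithmetic. A minimal sketch, with hypothetical helper names, assuming the callee starts with push rbp followed by mov rbp, rsp:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied on the stack by an argument of `size` bytes: sizes are
   rounded up to a multiple of 8, so a dword argument still takes a qword. */
size_t stack_slot_size(size_t size)
{
    assert(size > 0);
    return (size + 7) / 8 * 8;
}

/* Offset from rbp of the k-th qword stack argument (k = 0 is the 7th
   argument overall). [rbp] holds the saved rbp and [rbp + 8] holds the
   return address, so the arguments start at [rbp + 16]. */
size_t stack_arg_offset(size_t k)
{
    return 16 + 8 * k;
}
```

For the call above, stack_arg_offset(0), stack_arg_offset(1), and stack_arg_offset(2) give 16, 24, and 32, the same offsets shown in the table.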
Writing functions
Writing a “well-behaved” function is simply the opposite process of the above:
1. On entry, rsp is a multiple of 16, plus 8 (the return address was just pushed), so use the prologue to adjust it to a multiple of 16.
2. Save any of the callee-preserved registers you use on the stack.
3. Access any stack-based arguments by offset from rbp.
4. Do whatever your function does, including calling other functions.
5. Place the return value in rax (xmm0 if floating-point).
6. Pop callee-preserved registers.
7. De-align the stack.
8. ret
A function written in this way and declared global can be called from C/C++.
Note that if a function calls no other functions, and does not need to be called from C/C++, then you can ignore many of these rules. That’s what we did earlier in the semester. Similarly, a leaf function, one which calls no other function, is free to use an unaligned stack.
Example: student grades
Here’s a simple example: we want to write a program, using the C standard library, which
Reads in a number of student grades
When a -1 is entered, stops reading grades and reports the highest and lowest grades entered.
In C, this looks like this:
#include <stdio.h>
#include <limits.h>
int main() {
long high = LONG_MIN, low = LONG_MAX;
long grade;
printf("Enter grades: ");
do {
scanf("%ld", &grade);
if(grade == -1)
break;
if(grade > high)
high = grade;
if(grade < low)
low = grade;
} while(1);
printf("Highest grade: %ld\n", high);
printf("Lowest grade: %ld\n", low);
return 0;
}
(We should be checking the return value of scanf, but I’m being lazy.)
Here’s a sample run:
Enter grades: 1 10 100 4 6 57 92 28 -1
Highest grade: 100
Lowest grade: 1
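For comparison, here is one way the loop might look with the return value checked. This is a sketch rather than the version used in these notes: read_grades is a hypothetical helper, and it reads from a FILE * with fscanf (instead of scanf on standard input) so that it can be exercised on any stream.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Reads grades from `in` until -1, end of input, or a token that is not a
   number. Stores the highest and lowest grades seen and returns how many
   grades were read. */
int read_grades(FILE *in, long *high, long *low)
{
    assert(in != NULL && high != NULL && low != NULL);
    *high = LONG_MIN;
    *low = LONG_MAX;

    int count = 0;
    long grade;
    /* fscanf returns the number of items converted: 1 on success here,
       0 on a matching failure, EOF at end of input. */
    while (fscanf(in, "%ld", &grade) == 1 && grade != -1) {
        if (grade > *high) *high = grade;
        if (grade < *low)  *low = grade;
        ++count;
    }
    return count;
}
```

Calling read_grades(stdin, &high, &low) reproduces the behavior of the program above, except that malformed or missing input ends the loop instead of reusing a stale value of grade.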
To translate this into assembly, we have to carry out the following steps:
1. Any string literals must be placed in the .data section, terminated with a nul character.
2. Because scanf takes the address of a variable, we must have a qword somewhere in memory. We could use a global variable, declared in the .data section, but instead, we’ll store it on the stack, in the “unused” space we would normally create to align rsp. The high and low variables, because we never use their addresses, can be stored in registers.
3. Because we have a function call inside the loop, we will have to avoid using any of the caller-saved registers.
4. The if-else on grade can be done entirely with conditional moves; no jumps necessary!
5. Any library functions used must be declared extern.
6. Write main.
7. Ensure that the resulting object file is linked with the C library. (The asm script will do this for you.)
;;;;
;;;; grades.s
;;;;
section .data
scanf_format: db "%ld", 0
scanf_result: dq 0
printf_prompt: db "Enter grades: ", 0
printf_high: db "Highest grade: %ld", 10, 0
printf_low: db "Lowest grade: %ld", 10, 0
; Note that we don't need string lengths, because the strings are
; nul-terminated
LONG_MIN: equ -9223372036854775808
LONG_MAX: equ 9223372036854775807
section .text
extern printf
extern scanf
global main
main:
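sub rsp, 8 ; Align stack; this leaves 8 otherwise-unused bytes at [rsp]
mov r14, rsp ; Note: [r14] (the unused 8 bytes) is used for the result of scanf
; r12 = high, r13 = low
mov r12, LONG_MIN
mov r13, LONG_MAX
; Print the prompt
mov rdi, printf_prompt
mov rax, 0 ; al = 0: no floating-point arguments (printf is variadic)
call printf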
.begin_loop:
; Call scanf to get input
mov rdi, scanf_format
mov rsi, r14
mov rax, 0 ; al = 0: no floating-point arguments (scanf is variadic)
call scanf
; Check for -1
cmp qword [r14], -1
je .print_results
; Update low/high
cmp qword [r14], r12
cmovg r12, qword [r14]
cmp qword [r14], r13
cmovl r13, qword [r14]
jmp .begin_loop
.print_results:
; Print low/high grades
mov rdi, printf_high
mov rsi, r12
mov rax, 0 ; al = 0: no floating-point arguments
call printf
mov rdi, printf_low
mov rsi, r13
mov rax, 0
call printf
mov rax, 0 ; Return 0 from main
add rsp, 8 ; Un-align stack
ret
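The design of this program demonstrates an important facet of non-leaf functions (functions which call other functions): when choosing registers, choose callee-preserved registers first! We use r12, r13, and r14 for our intermediate results, as these are callee-preserved: scanf and printf must leave them unchanged, so we don’t have to worry about push/pop-ing them around each call. (Strictly speaking, a function that modifies callee-preserved registers must itself save them in its prologue and restore them in its epilogue; a fully ABI-conforming main would push r12, r13, and r14 on entry and pop them before returning.)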
Calling C functions from assembly
You can call C functions that you write, not just the standard library functions. The process is similar, except that you’ll have to link manually instead of using asm:
1. Write your assembly and C code in .s and .c files.
2. Declare any C functions you want to call from assembly as extern.
3. Assemble your .s files; compile your .c files.
4. Link all the resulting object files together, using gcc. E.g., gcc -o my_program asm_part.o c_part.o.
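As an illustration of the C side, here is the sort of function that might live in c_part.c. The function clamp_grades and its 0 to 100 range are invented for this sketch. From assembly it would be declared with extern clamp_grades and called with the array address in rdi and the element count in rsi.

```c
#include <assert.h>
#include <stddef.h>

/* c_part.c (hypothetical): clamps every grade in the array to 0..100.
   When called from assembly, `grades` arrives in rdi and `count` in rsi;
   the function returns nothing, so rax is not meaningful afterwards. */
void clamp_grades(long *grades, size_t count)
{
    assert(grades != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (grades[i] < 0)   grades[i] = 0;
        if (grades[i] > 100) grades[i] = 100;
    }
}
```

Because this is an ordinary non-static C function, the compiler exports its symbol automatically; no extra declaration is needed on the C side.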
Calling Assembly functions from C
To call an assembly function from a C program, use the opposite procedure:
1. Write your assembly and C code in .s and .c files.
2. In your C source code, write declarations for the assembly functions you wish to call. Remember that the compiler will use your declarations to determine how to pass the arguments, so make sure your declaration matches the arguments the function expects to receive!
3. In your assembly source code, declare the functions global. If you have a main in C, don’t write one in assembly also!
4. Assemble/compile the source files.
5. Link everything together with gcc.
As an example, let’s write a replacement for the strlen function in assembly. strlen takes the address of a nul-terminated string in memory, and returns its length, as an unsigned 64-bit integer. In C, its declaration would be
size_t strlen_asm(char* s);
(We have to give our function a different name in order to avoid a name collision with the standard library strlen when we link.)
Here’s the assembly function:
section .text
global strlen_asm
strlen_asm:
sub rsp, 8 ; Align stack
; rdi = addr of string
; Return: rax = Length
mov rax, 0
.begin_loop:
cmp byte [rdi + rax], 0
je .done
inc rax
jmp .begin_loop
.done:
add rsp, 8
ret
This is equivalent to the C/C++ loop
size_t strlen_asm(char* s)
{
size_t rax = 0;
while(*(s + rax) != 0)
++rax;
return rax;
}
(We’ll see later that there are a number of string-specific instructions that can accelerate this.)
We can save this as strlen_asm.s and assemble it, producing strlen_asm.o:
yasm -g dwarf2 -f elf64 strlen_asm.s
(Note that we use yasm manually to assemble this; if we used the asm script, it would try to link and fail, because our .s file does not have either a _start or main entry point.)
Meanwhile, we can write the following C code:
#include <stdio.h>
// In C++, declare this as `extern "C"`
size_t strlen_asm(char* s);
int main() {
size_t len = strlen_asm("Hello, world!");
printf("Length: %zu\n", len);
return 0;
}
Save as strlen_test.c, compile it to an object file with
gcc -c strlen_test.c
link the two object files together with
gcc -o strlen_test strlen_test.o strlen_asm.o
and then run with
./strlen_test
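A single test string is a thin check. One possible way to test more thoroughly is to compare the function against the library strlen on several inputs. The sketch below is an assumption-laden example: the names strlen_fn, strlen_reference, and count_strlen_mismatches are hypothetical, and it assumes the function under test is declared as taking const char * (the declaration in the text uses char *, which would need adjusting or a cast).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Type of a strlen-like function, e.g. strlen_asm from strlen_asm.s. */
typedef size_t (*strlen_fn)(const char *);

/* Plain C version of the same loop, usable as a stand-in when the
   assembly object file is not linked in. */
size_t strlen_reference(const char *s)
{
    size_t length = 0;
    while (s[length] != 0)
        ++length;
    return length;
}

/* Compares `impl` against the library strlen on a few sample strings and
   returns the number of strings on which the two disagree. */
int count_strlen_mismatches(strlen_fn impl)
{
    static const char *const samples[] = {
        "", "a", "Hello, world!", "embedded\ttab", "trailing space "
    };
    assert(impl != NULL);

    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        if (impl(samples[i]) != strlen(samples[i]))
            ++mismatches;
    return mismatches;
}
```

In strlen_test.c one would call count_strlen_mismatches(strlen_asm) from main and report a failure if the result is nonzero.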