Review

Again, the System V C ABI specification has all the details on how function calls actually work.

Calling C standard library functions

To call a function from the C standard library, we must

  1. Declare it in our .s file as extern (e.g., extern printf). extern is the opposite of global: while global makes a symbol visible to things outside our object file, extern says that our file is using a symbol defined somewhere outside it (and, in fact, printf must have been declared global, or the C equivalent, in order for us to be able to access it this way!).

  2. push any caller-preserved registers (rax, any of the argument registers, r10, r11) onto the stack if you are using them.

  3. Ensure that the stack is correctly aligned before calling any functions. See the next section. Note that stack (re)alignment must be done after pushing caller-preserved registers, but before pushing any stack-based arguments.

  4. Place the arguments to the function in registers rdi, rsi, rdx, rcx, r8, and r9, from first to last. Place floating-point arguments in registers xmm0 through xmm7. If there are more than six integer or pointer arguments (or more than 8 floating-point arguments), push the remainder onto the stack, in right-to-left order (i.e., the 7th argument should be pushed last).

    If you have any stack-based arguments, you will need to re-align the stack before pushing them. I.e., you have to compute an offset to subtract from rsp that will result in the stack being correctly aligned after all the pushes have taken place.

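  5. call the function. E.g., call printf. For a variadic function such as printf or scanf, the ABI also expects al to hold the number of vector (xmm) registers used for arguments, so set eax to 0 first if you pass no floating-point arguments.

  6. The return value, if any, will be placed in rax. 128-bit return values are placed in rdx:rax (high half in rdx). (Return values larger than 128 bits are handled specially: the caller passes an address to use for the result as an “invisible” first argument, in rdi. On return, the results are written into this address, which is “returned” in rax.)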

    Floating-point return values will be in xmm0 (and xmm1 if big enough).

  7. Remove any stack-based arguments and alignment padding from the stack (by adding to rsp), then pop any registers you pushed in (2).
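The register assignment in step 4 can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: every argument is either a plain integer/pointer or a plain float/double, and the names arg_class, assign_args, INT_REGS, and FLOAT_REGS are made up for this illustration. Structs, long double, and variadic details are not covered.

```c
#include <stddef.h>

/* Simplified argument classes: integers and pointers vs. float/double. */
enum arg_class { ARG_INT, ARG_FLOAT };

static const char *const INT_REGS[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};
static const char *const FLOAT_REGS[8] = {
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
};

/*
 * Walks the arguments from first to last and stores in where[i] the name of
 * the register that argument i is passed in, or "stack" if the registers of
 * its class have run out. Returns the number of stack-based arguments
 * (which the caller would then push in right-to-left order).
 */
size_t assign_args(const enum arg_class *classes, size_t n, const char **where)
{
    size_t next_int = 0, next_float = 0, on_stack = 0;

    for (size_t i = 0; i < n; i++) {
        if (classes[i] == ARG_INT && next_int < 6) {
            where[i] = INT_REGS[next_int++];
        } else if (classes[i] == ARG_FLOAT && next_float < 8) {
            where[i] = FLOAT_REGS[next_float++];
        } else {
            where[i] = "stack";
            on_stack++;
        }
    }
    return on_stack;
}
```

For nine integer arguments this model reports rdi through r9 for the first six and "stack" for the last three. Integer and floating-point arguments are counted separately, so a call taking (double, long, double) uses xmm0, rdi, and xmm1.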

Stack alignment

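Before calling any function, the stack pointer rsp must be aligned to a multiple of 16. The call instruction then pushes an 8-byte return address, so upon entry to any function (including main) rsp is a multiple of 16, plus 8. Every function which calls other functions must therefore normally begin with a prologue which either pushes rbp and sets up a frame base pointer (the push itself supplies the missing 8 bytes), or simply subtracts 8 from rsp.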

If you call any functions which take stack-based arguments, or if you use any caller-preserved registers, you may need to re-align the stack before each function call, rather than just once at the beginning of the function.

Most functions don’t need a frame base pointer; it’s really only useful if you’re storing a dynamic amount of data on the stack, which most functions don’t do. Hence, the most common prologue is simply

sub rsp, 8

We subtract because the stack grows down in memory. Then, at the end of the function, add rsp, 8 to revert the stack alignment before returning.

If a function uses some callee-preserved registers (rbp, rbx, r12-r15) then those will have to be pushed onto the stack during the prologue. Depending on the number of registers pushed, the extra alignment may or may not be needed. (E.g., if you use rbx, you’ll have to push it, but doing so in addition to rbp will give 16 extra bytes, so we’ll need to sub rsp, 8 in order to get another 8 bytes of alignment.) It doesn’t matter whether you do the alignment before or after pushing the callee-preserved registers, so long as you undo in the opposite order before returning.

When we call a function, the CPU pushes the (qword) return address (the rip of the instruction after the call) onto the stack, so on entry to the callee rsp is once again 8 bytes away from a multiple of 16. Thus, every function which intends to call other functions must perform this prologue.
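The amount of padding a prologue needs is simple modular arithmetic, sketched here in C. The function name prologue_padding is hypothetical, and the sketch assumes that rsp was a multiple of 16 immediately before the call that entered the function, so the 8-byte return address is already on the stack.

```c
#include <stddef.h>

/*
 * Bytes to subtract from rsp in a prologue so that rsp ends up a multiple
 * of 16. The leading 8 accounts for the return address pushed by call;
 * each pushed register adds another 8 bytes.
 */
size_t prologue_padding(size_t registers_pushed, size_t local_bytes)
{
    size_t used = 8 + 8 * registers_pushed + local_bytes;
    return (16 - used % 16) % 16;
}
```

With no pushes the result is 8, which is the plain sub rsp, 8 prologue. Pushing only rbp gives 0, and pushing both rbp and rbx gives 8 again.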

Before returning, the rip pushed by call must be on top of the stack. Hence, every function needs an epilogue, which is simply the reverse of the prologue: either pop rbp, or add 8 back to rsp.

Either way, the stack is adjusted back to the state it was in immediately after the call.

The state of the stack during a function g called by a function f can be visualized as

Position         Contents                       Frame
8n + 16 + rbp    stack argument n               function f (previous frame)
...              ...
16 + rbp         stack argument 0
8 + rbp          Return address                 function g (current frame)
rbp              Previous rbp/unused 8 bytes
rbp - 8          Callee-saved registers
...              ...
rsp              Top of stack
rsp - 128        “Red zone”

(The “red zone” is a System V ABI optimization: the 128 bytes just past the top of the stack, at addresses rsp - 128 through rsp - 1, are reserved for the function’s use, and can be used freely without adjusting rsp. Of course, this space will be overwritten by any called functions!)

As an example, suppose we have a function with declaration

long f(int a,             // rdi
       char* d,           // rsi
       int* e,            // rdx
       long f,            // rcx
       char g,            // r8
       unsigned long h,   // r9
       unsigned long i,   // stack
       int* j,            // stack
       long k             // stack
       );         

How would we write a call to this function equivalent to

f(1,nullptr,nullptr,5,'6',7,8,nullptr,9);

In assembly, this would be

  ; Assuming rsp is already a multiple of 16

  mov rdi, 1
  mov rsi, 0
  mov rdx, 0
  mov rcx, 5
  mov r8,  '6'
  mov r9,  7

  sub rsp, 8    ; Padding: 8 + 3*8 = 32 bytes keeps rsp a multiple of 16
  push 9        ; k = 9
  push 0        ; j = nullptr
  push 8        ; i = 8
  call f
  add rsp, 32   ; Remove the 3 stack arguments and the padding

Note that stack-based arguments are passed in reverse order. Because we are pushing 3 qword arguments (24 bytes) onto the stack, we must re-align the stack pointer so that it is a multiple of 16 at the call. However, the callee will expect to find its stack arguments immediately next to the return address (the 7th argument at [rsp + 8] on entry), so we must adjust the stack alignment first, before pushing any stack-based arguments.
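One way to check a hand-written call like this is to link it against a small C definition of f whose result depends on every argument. The body below is an assumption made for testing purposes only (the declaration above does not say what f computes): it returns -1 if any pointer argument is non-null, and otherwise the sum of the integer arguments.

```c
#include <stddef.h>

/* Hypothetical test body for f: inside the function, the name f refers to
   the long parameter, not to the function itself. */
long f(int a, char *d, int *e, long f, char g,
       unsigned long h, unsigned long i, int *j, long k)
{
    if (d != NULL || e != NULL || j != NULL)
        return -1;

    return (long)a + f + (long)g + (long)h + (long)i + k;
}
```

For the call f(1, nullptr, nullptr, 5, '6', 7, 8, nullptr, 9) this definition returns 1 + 5 + 54 + 7 + 8 + 9 = 84 ('6' is ASCII 54), so finding 84 in rax after call f suggests that all nine arguments arrived where the callee looked for them.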

The stack layout within the callee will look like this:

Stack      Addr
9
nullptr
8
rip        <- rsp

After the callee sets up the base pointer, we will have

Stack      Addr
9          rbp+32
nullptr    rbp+24
8          rbp+16
rip        rbp+8
rbp        <- rbp <- rsp

and hence the callee can access its stack arguments as [rbp + 16], [rbp + 24], and [rbp + 32].

(Note that, while pushing arguments in reverse order might be counter-intuitive, from the perspective of the callee it makes perfect sense: from left-to-right, the 7th argument is directly next to rip, at the lowest address, the 8th just past that, and so forth.)

Note that, no matter the arguments’ sizes, the stack pointer rsp should always be a multiple of 8. Thus, if we had a dword stack argument, we would round its size up to qword before pushing.
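The rounding rule and the call-site padding can be written out as arithmetic. This is a sketch, assuming scalar stack arguments of at most 8 bytes each and an rsp that is a multiple of 16 before any padding is applied; stack_slot_size and call_site_padding are names invented for this example.

```c
#include <stddef.h>

/* Every stack argument occupies a whole number of qwords. */
size_t stack_slot_size(size_t arg_size)
{
    return (arg_size + 7) / 8 * 8;
}

/*
 * Bytes to subtract from rsp before pushing the stack arguments, so that
 * rsp is a multiple of 16 again once all of them have been pushed.
 */
size_t call_site_padding(const size_t *arg_sizes, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += stack_slot_size(arg_sizes[i]);

    return (16 - total % 16) % 16;
}
```

For the three qword arguments in the example above the padding is 8, matching the sub rsp, 8 before the pushes. A dword followed by a qword takes two 8-byte slots and needs no padding.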

Writing functions

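Writing a “well-behaved” function is simply the opposite process of the above:

  1. push any callee-preserved registers (rbp, rbx, r12-r15) that the function uses.

  2. Adjust rsp so that it is a multiple of 16 before the function calls anything else.

  3. Read the arguments from rdi, rsi, rdx, rcx, r8, and r9 (xmm0 through xmm7 for floating-point arguments), and any remaining ones from the stack, just past the return address.

  4. Place the return value, if any, in rax (or xmm0 for floating-point values).

  5. Undo the stack adjustment, pop the callee-preserved registers in reverse order, and ret.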

A function written in this way, and declared global, can be called from C/C++. Note that if a function calls no other functions, and does not need to be called from C/C++, then you can ignore many of these rules. That’s what we did earlier in the semester. Similarly, a leaf function, one which calls no other function, is free to use an unaligned stack.

Example: student grades

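Here’s a simple example: we want to write a program, using the C standard library, which reads a sequence of grades from the user, stopping at -1, and then prints the highest and lowest grades entered.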

In C, this looks like this:

#include <stdio.h>
#include <limits.h>

int main() {
    long high = LONG_MIN, low = LONG_MAX;
    long grade;

    printf("Enter grades: ");
    do { 
        scanf("%ld", &grade);

        if(grade == -1)
            break;

        if(grade > high)
            high = grade;
        if(grade < low)
            low = grade;
    } while(1);

    printf("Highest grade: %ld\n", high);
    printf("Lowest grade: %ld\n", low);

    return 0;
}

(We should be checking the return value of scanf, but I’m being lazy.)
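For comparison, here is a sketch of the same loop with the return value checked. The helper name read_grade_range is hypothetical, and it reads from a FILE* with fscanf rather than from stdin with scanf so that it can be exercised on prepared input; the two functions report success in the same way.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Reads grades from `in` until -1, end of input, or a token that is not a
 * number. Stores the highest and lowest grade seen and returns how many
 * grades were read. If no grades were read, *high and *low keep their
 * initial LONG_MIN / LONG_MAX values.
 */
int read_grade_range(FILE *in, long *high, long *low)
{
    long grade;
    int count = 0;

    *high = LONG_MIN;
    *low = LONG_MAX;

    /* fscanf returns the number of items converted: 1 on success here,
       0 if the input is not a number, EOF at end of input. */
    while (fscanf(in, "%ld", &grade) == 1) {
        if (grade == -1)
            break;

        if (grade > *high)
            *high = grade;
        if (grade < *low)
            *low = grade;
        count++;
    }

    return count;
}
```

Without the check, a non-numeric token would leave grade unchanged and unread, and the original do/while loop would spin forever on the same input.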

Here’s a sample run:

Enter grades: 1 10 100 4 6 57 92 28 -1
Highest grade: 100
Lowest grade: 1

To translate this into assembly, we have to carry out the following steps:

  1. Any string literals must be placed in the .data section, terminated with a nul character.

  2. Because scanf takes the address of a variable, we must have a qword somewhere in memory. We could use a global variable, declared in the .data section, but instead, we’ll store it on the stack, in the “unused” space we would normally create to align rsp. The high and low variables, because we never use their addresses, can be stored in registers.

  3. Because we have a function call inside the loop, we will have to avoid using any of the caller-saved registers.

  4. The if-else on grade can be done entirely with conditional moves; no jumps necessary!

  5. Any library functions used must be declared extern.

  6. Write main

  7. Ensure that the resulting object file is linked with the C library (The asm script will do this for you.)
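The conditional-move idea in step 4 has a direct C counterpart: each if becomes a conditional expression that always assigns. The sketch below uses a hypothetical helper, update_range; compilers often, though not always, translate this shape into cmp followed by cmov.

```c
/* Branch-free update of the running maximum and minimum. */
void update_range(long grade, long *high, long *low)
{
    *high = (grade > *high) ? grade : *high;
    *low  = (grade < *low)  ? grade : *low;
}
```

Each line corresponds to one cmp/cmov pair: the comparison sets the flags, and the move only takes effect when the condition holds.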

;;;;
;;;; grades.s
;;;;
section .data

scanf_format:       db      "%ld", 0
scanf_result:       dq      0

printf_prompt:      db      "Enter grades: ", 0
printf_high:        db      "Highest grade: %ld", 10, 0
printf_low:         db      "Lowest grade: %ld", 10, 0

; Note that we don't need string lengths, because the strings are 
; nul-terminated

LONG_MIN:           equ     -9223372036854775808
LONG_MAX:           equ      9223372036854775807

section .text

extern printf
extern scanf
global main

main:
    push r12        ; Save the callee-preserved registers we use
    push r13
    sub rsp, 8      ; Align stack; [rsp] is used for the result of scanf

    ; r12 = high, r13 = low
    mov r12, LONG_MIN
    mov r13, LONG_MAX

    ; Print the prompt
    mov rdi, printf_prompt
    xor eax, eax    ; Variadic call: al = number of vector registers used
    call printf

.begin_loop:

    ; Call scanf to get input
    mov rdi, scanf_format
    mov rsi, rsp
    xor eax, eax
    call scanf

    ; Check for -1
    cmp qword [rsp], -1
    je .print_results

    ; Update low/high
    cmp qword [rsp], r12
    cmovg r12, qword [rsp]
    cmp qword [rsp], r13
    cmovl r13, qword [rsp]

    jmp .begin_loop

.print_results:

    ; Print high/low grades
    mov rdi, printf_high
    mov rsi, r12
    xor eax, eax
    call printf

    mov rdi, printf_low
    mov rsi, r13
    xor eax, eax
    call printf

    xor eax, eax    ; return 0
    add rsp, 8      ; Un-align stack
    pop r13         ; Restore registers in reverse order
    pop r12
    ret

The design of this program demonstrates an important facet of non-leaf functions (functions which call other functions): when choosing registers for values that must survive a call, choose callee-preserved registers first! We keep high and low in r12 and r13, as these are callee-preserved; scanf and printf must leave them unchanged, so we don’t have to worry about push/pop-ing them around each call. (The price is that main, being a callee itself, must save and restore them once for its own caller.)

Calling C functions from assembly

You can call C functions that you write, not just the standard library functions. The process is similar, except that you’ll have to link manually instead of using asm:

  1. Write your assembly and C code in .s and .c files.

  2. Declare any C functions you want to call from assembly as extern.

  3. Assemble your .s files; compile your .c files.

  4. Link all the resulting object files together, using gcc. E.g., gcc -o my_program asm_part.o c_part.o.
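As an illustration, the C side might look like the sketch below. The function clamp_grade and its behaviour are invented for this example; the point is only that an ordinary C function needs nothing special to be callable from assembly.

```c
/* c_part.c (hypothetical): limits a grade to the range [lo, hi].
   Under the calling convention described above, the assembly caller
   places grade in rdi, lo in rsi, hi in rdx, and finds the result in rax. */
long clamp_grade(long grade, long lo, long hi)
{
    if (grade < lo)
        return lo;
    if (grade > hi)
        return hi;
    return grade;
}
```

On the assembly side, this would be declared with extern clamp_grade and called like any library function, after loading the three argument registers.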

Calling Assembly functions from C

To call an assembly function from a C program, use the opposite procedure:

  1. Write your assembly and C code in .s and .c files.

  2. In your C source code, write declarations for the assembly functions you wish to call. Remember that the compiler will use your declarations to determine how to pass the arguments, so make sure your declaration matches the arguments the function expects to receive!

  3. In your assembly source code, declare the functions global. If you have a main in C, don’t write one in assembly also!

  4. Assemble/compile the source files.

  5. Link everything together with gcc.

As an example, let’s write a replacement for the strlen function in assembly. strlen takes the address of a nul-terminated string in memory, and returns its length, as an unsigned 64-bit integer. In C, its declaration would be

size_t strlen_asm(char* s);

(We have to give our function a different name in order to avoid a name collision with the standard library strlen when we link.)

Here’s the assembly function:

section .text
global strlen_asm

strlen_asm:
    sub rsp, 8  ; Align stack

    ; rdi = addr of string
    ; Return: rax = Length
    mov rax, 0

.begin_loop:
    cmp byte [rdi + rax], 0
    je .done

    inc rax
    jmp .begin_loop

.done:
    add rsp, 8
    ret

This is equivalent to the C/C++ loop

size_t strlen_asm(char* s) 
{
    size_t rax = 0;
    while(*(s + rax) != 0) 
        ++rax;

    return rax;
}

(We’ll see later that there are a number of string-specific instructions that can accelerate this.)

We can save this as strlen_asm.s and assemble it, producing strlen_asm.o:

yasm -g dwarf2 -f elf64 strlen_asm.s

(Note that we use yasm manually to assemble this; if we used the asm script, it would try to link and fail, because our .s file does not have either a _start or main entry point.)

Meanwhile, we can write the following C code:

#include <stdio.h>

// In C++, declare this as `extern "C"`
size_t strlen_asm(char* s);

int main() {
    size_t len = strlen_asm("Hello, world!");

    printf("Length: %zu\n", len);

    return 0;
}

Save this as strlen_test.c, compile it to strlen_test.o (e.g., with gcc -c strlen_test.c), and link the two together with

gcc -o strlen_test strlen_test.o strlen_asm.o 

and then run with

./strlen_test
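A single test string is a thin check. A slightly more thorough approach, sketched below, compares the function against the library strlen over several inputs. The names strlen_fn, count_strlen_mismatches, and strlen_ref are hypothetical; strlen_ref is a plain C stand-in with the same signature as strlen_asm, so the harness can be tried before the assembly version is linked in.

```c
#include <stddef.h>
#include <string.h>

/* Pointer to any function with the same signature as strlen_asm. */
typedef size_t (*strlen_fn)(char *s);

/* Returns the number of test strings on which fn disagrees with strlen. */
int count_strlen_mismatches(strlen_fn fn)
{
    static char cases[][20] = {
        "", "a", "Hello, world!", "tab\tand\nnewline"
    };
    int mismatches = 0;

    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (fn(cases[i]) != strlen(cases[i]))
            mismatches++;
    }
    return mismatches;
}

/* Stand-in written in C, mirroring the loop of the assembly version. */
size_t strlen_ref(char *s)
{
    size_t n = 0;

    while (s[n] != 0)
        ++n;
    return n;
}
```

Calling count_strlen_mismatches(strlen_asm) from main, with strlen_asm.o linked in as before, should then report 0 if the assembly function agrees with the library on every case.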