Structures and structure packing; signals

A C/C++ struct is in reality nothing more than several data values stored together in memory in a known arrangement. If we want to interoperate with C/C++ programs which use structures, we’ll need to see how to form the assembly equivalents.

An example structure:

struct thing { 
    double a;  // 8 bytes
    char   b;  // 1 byte
    int    c;  // 4 bytes
    char*  d;  // 8 bytes    
};

If we sum up the size of the elements of this structure, we get 21 bytes. However, if you compile this structure and print sizeof(thing) (e.g., cout << sizeof(thing)), C++ will report that its size is 24 bytes. Where did the extra 3 bytes come from? The answer has to do with structure layout, particularly alignment and structure packing.

Remember that the CPU can perform memory accesses faster if they are aligned, i.e., if the address is a multiple of the size of the value being accessed (typically 4 or 8 bytes, that is, 32 or 64 bits). In order to allow optimized moves, structure members are typically aligned, rather than being packed in as tightly as possible. This leads to some extra space, in the form of padding bytes added to each structure.

If we create an instance of thing and then examine the addresses of its members, we can deduce where, within those 24 bytes, each of the structure’s elements is located:

struct thing { 
    double a;  // at byte offset 0 from &x (same address as &x)
    char   b;  // at byte offset 8
    int    c;  // at byte offset 12
    char*  d;  // at byte offset 16
};

thing x;

First, the address of the structure itself is just the address of its first member, a.
b is located 8 bytes after a, immediately after it.
c is located not 1 byte after b, but 4 bytes after it. The char b has effectively been expanded to 4 bytes (or rather, 1 byte followed by 3 invisible padding bytes) so that the int c starts at a multiple of 4 bytes.
d is located 4 bytes after c (at offset 16, which is already a multiple of 8), and is itself 8 bytes wide.

In C++, this layout also holds, but only for a subset of structs and classes: those that are POD types (“Plain Old Data”). A POD type is one that

Has no user-provided default constructor (i.e., uses the compiler-generated default ctor)
Has no user-provided copy constructor or copy-assignment operator
Has no user-provided destructor
Has only public data members (all of which must also be POD)
Has no reference-type data members
Has no virtual methods, and no virtual base classes.

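Note that this does allow POD types to have (non-virtual) methods, and to use inheritance, as long as the inheritance is non-virtual and only one class in the hierarchy declares data members. POD types have the same layout as the equivalent C structures and, with some care, can be used from assembly as well.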

As usual, the Sys V C ABI defines how the elements of a structure are to be packed/aligned in memory. The rules for alignment are actually specified based on the data type, and are not particular to structures (i.e., every int in memory should be aligned to a multiple of 4 bytes, not just those in structures). A summary would be:

Align data values to multiples of their size, in bytes. Add padding bytes after smaller data members so that subsequent members are correctly aligned.
The entire structure, when stored in memory, should be aligned to the largest alignment of any of its members. E.g., thing would be aligned to (would always start on) an address that is a multiple of 8 bytes (64 bits), because its most strictly aligned members, the double a and the pointer d, require 8-byte alignment.
The entire structure should be padded at the end so that its size is a multiple of its alignment. (If you align correctly, this will happen naturally.)

Structures in assembly

We can “build” a structure by just arranging things in memory to conform to the structure layout. For example, to build an instance of the thing structure on the stack, assuming the values for a, b, c, and d are in xmm0, al, ecx, and rdx, we could do

sub rsp, 24             ; Make room for the struct
movsd [rsp],      xmm0  ; a (double) at offset 0
mov   [rsp + 8],  al    ; b (char)   at offset 8
mov   [rsp + 12], ecx   ; c (int)    at offset 12
mov   [rsp + 16], rdx   ; d (char*)  at offset 16

and then the address of the structure is rsp: the members are stored at increasing addresses, starting from the beginning of the structure. This is obviously tedious and error-prone. A better option is to use Yasm’s macros for building structures, struc and endstruc. To mirror the above structure in assembly, we’d use

struc thing
    a:      resq    1
    b:      resb    1
            resb    3 ; 3 Padding bytes
    c:      resd    1  
    d:      resq    1
endstruc

(The resb, resw, resd, and resq directives reserve a given number of bytes, words, dwords, or qwords, respectively.)

This implicitly defines, via equ, six constants:

thing is defined to be 0, as the offset of the address of the whole structure from the beginning.
Similarly, a is defined to be 0.
b is defined to be 8
c is defined to be 12
d is defined to be 16
thing_size is defined to be 24

Note that these are file-global constants, which means that the names a, thing, etc. cannot be used for labels or other constants anywhere else in the same file. If this is a problem, you can use local labels .a, .b, etc. for the member names; the structure name is then prepended to them, so the offsets are referred to as thing.a, thing.b, etc.

Instead of manually adding the padding bytes, we can also use alignb to request a specific alignment of subsequent data. alignb n skips (reserves) bytes until the current offset $ is a multiple of n, so we would add alignb directives before the elements:

struc thing
            alignb  8   ; Does nothing, already aligned
    a:      resq    1
            alignb  1   ; Does nothing, already aligned
    b:      resb    1
            alignb  4   ; Advance from offset 9 to 12
    c:      resd    1 
            alignb  8   ; Does nothing, already aligned (offset 16)
    d:      resq    1
endstruc

Note that the first, second, and fourth alignbs do not add any padding at all, because the offset at that point (0, 8, and 16, respectively) is already a multiple of the requested alignment. It’s safe to add extra alignbs, because they will not insert any padding unless it is needed.

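alignb reserves the skipped bytes (as resb does) rather than emitting instructions into them; inside a struc, it simply advances the offset of the next member, and in an initialized section such as .data the skipped bytes are filled with 0s.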

To instantiate a structure in the .data section, use istruc, at and iend:

my_thing:   istruc thing
    at a,   dq      0.0     ; a = 0.0
    at b,   db      '!'     ; b = '!'
    at c,   dd      -12     ; c = -12
    at d,   dq      0       ; d = nullptr
iend

The at macro advances to the correct offset within the structure. Fields within an istruc must be given in the exact same order as in the original struc.

Note that istruc/iend can only be used to declare instances in the .data section, i.e., as globals. To create an instance on the stack, we would first reserve the correct amount of space (the stack grows downward, so we subtract):

sub rsp, thing_size

and then populate it relative to rsp, which now points at the beginning of the structure:

mov qword [rsp + a], 0      ; a = 0.0 (the bit pattern of 0.0 is all zeros)
mov byte  [rsp + b], '!'
mov dword [rsp + c], -12
mov qword [rsp + d], 0      ; d = nullptr

(A mov cannot take a floating-point immediate such as 0.0; since 0.0 is all zero bits, storing the integer 0 has the same effect. To store any other double, load it into an xmm register and use movsd.)

Stack offset	Member	Value
rsp + 0	a	0.0
rsp + 8	b	'!'
(rsp + 9) to (rsp + 11)	padding bytes
rsp + 12	c	-12
rsp + 16	d	0
rsp + 24	old top of stack (where rsp pointed before the sub)

Function calling convention for structures

Structures which are passed as pointers are passed as 64-bit qword addresses, as usual. What about passing a structure directly, by value:

void f(thing x);

How would we call f from assembly? To pass a structure by value to a function, there are a number of different rules, mostly depending on the size of the structure and its members. The underlying theme is that a small structure is passed by “decomposing” it into qword-sized pieces and passing each piece in a register, so that members smaller than a qword may be combined in a single register. (The rules below cover structures whose members are integers, pointers, floats, and doubles; the ABI has extra cases for types such as long double and vector types.)

If the size of the structure is larger than 16 bytes (two qwords), or if any of its members are not correctly aligned, then the entire structure is passed on the stack.
If the size of the structure is ≤ 16 bytes, then we examine the structure as if it was a sequence of one or two qwords, classifying each qword separately by looking at the fields that go into it:
- Padding bytes are ignored, having no effect on how each qword is passed.
- If any of the fields in the qword is an integer or a pointer, then the whole qword is passed in the next available GP register.
- Otherwise (all of its fields are floats or doubles), the qword is passed in the next available xmm register.
If the size of the structure is ≤ 8 bytes, this means that the entire structure is packed into a single 64-bit register: a GP register, unless all of its fields are floating-point, in which case an xmm register.

Thus, to figure out how to pass a structure of 9 to 16 bytes to a function, we must build a “map” of the structure layout in groupings of 8 bytes, examine the fields in each grouping, and determine how each group will be passed. Applying this to thing gives:

Qword	Fields	Would be passed as
0	a	xmm register
1	b, padding, c	GP register
2	d	GP register

However, thing is 24 bytes, more than 16, so the map does not matter: the whole structure is passed on the stack. If we removed the member d, the remaining 16-byte structure would be passed with a in the next available xmm register, and b and c together in the next available GP register.

Structures are never broken up between registers and the stack. If there are not enough registers left for every qword of a structure (e.g., it needs two GP registers but only one remains), then the entire structure is passed on the stack, and the registers it would have used remain available for later arguments.

Returning structure values

Return values are classified according to the above process, except that only two GP registers (rax and rdx), and two xmm registers (xmm0 and xmm1) are available; if the structure does not fit into these, then it is returned as follows:

When calling the function, an invisible first argument is added, in rdi, holding the address where the return value should be written. This space can simply be allocated by the caller on its stack. Because rdi is taken, all of the visible arguments move over by one (the first one goes in rsi, and so on).
When the function returns, it has written its return value to that address, and it sets rax to a copy of the address it received in rdi.

Example

Here’s a rather complex example, borrowed from the Sys-V ABI specification:

// Note: sizeof(structparm) == 16
struct structparm {
    int a, b;
    double d;
};

structparm s;
int e, f, g, h, i, j, k;
double m, n;

structparm func(int e, int f, 
                structparm s, 
                int g, int h, 
                double m, 
                double n, 
                int i, int j, int k);

How will the registers and stack be set up for this function call?

GP registers	FP registers	Stack
rdi: e	xmm0: s.d	rsp + 0: j
rsi: f	xmm1: m	rsp + 8: k
rdx: s.a, s.b	xmm2: n
rcx: g
r8: h
r9: i

(Stack offsets are relative to rsp just before the call instruction.)

How will the return value be represented?

GP registers	FP registers	Stack
rax: ret.a, ret.b	xmm0: ret.d	None

Handling Signals

Currently, if we divide an integer by 0, our program simply crashes with a “Floating point exception”. (Floating-point division by 0 does not crash by default; it produces an infinity or NaN.) To avoid this, we have to install a signal handler to catch the SIGFPE signal, which is sent to our process on an integer divide-by-zero error. This requires setting up a sigaction structure and passing it to the C-library sigaction() function.

Note that trying to “recover” from SIGFPE is very dangerous in general; you have no idea what your program was doing (and POSIX leaves the behavior undefined if the handler for a genuine SIGFPE simply returns), so the only reason to catch this signal is to make your program exit in some particular way, e.g., after writing some information to a log. The only safe thing to do is to end the program. Hence, if we did catch SIGFPE, we would not try to “resume” computation in any way; we would just print a custom message and then exit. For our example, instead of handling SIGFPE, we’ll handle a signal which is safe to resume from: SIGWINCH, sent when the size of the text console changes (e.g., resize your PuTTY window and SIGWINCH is sent to the process you are running).

Signals

Signals are one of the ways in which a Unix-based operating system communicates with the processes running on it. Signals can be divided into those which our processes can catch and those which cannot be caught. Of the catchable ones, some have a default behavior (such as terminating the process) if not caught, while others do nothing.

Signal	Catchable?	Default Behavior
SIGINT (Ctrl-C)	Yes	Terminate process
SIGKILL	No	Terminate process
SIGTERM	Yes	Terminate process
SIGSEGV	Yes	Terminate process (Invalid memory access, e.g., null-pointer dereference)
SIGFPE	Yes	Terminate process (Arithmetic error, div. by 0)
SIGHUP	Yes	Terminate process (“Hangup”)
SIGWINCH	Yes	Nothing (“window change”)

Signals are delivered to a process asynchronously; this means that if a signal handler is installed, it may be triggered at any point in the program. It will appear to our program as if the signal handler function was called, e.g., in the middle of an operation. Hence, signal handlers have to be very careful, as the state of the program is unknown. In particular, signal handlers should only call functions that are documented as async-signal-safe, such as write() and _exit(); note that exit() and printf() are not on that list (it’s OK to call printf for testing or experimentation, but this should never be done in production). The typical behavior for a signal handler is to set some global flag variable (of type volatile sig_atomic_t in C) and then return, if the signal is not a fatal one. For fatal signals, the only real option is to clean up and then exit with _exit().

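It’s also possible to send and receive the user-defined signals SIGUSR1 and SIGUSR2, whose meaning is up to the program; note that, unlike SIGWINCH, their default behavior if not caught is to terminate the process.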

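Signals are a fairly simple communication method: with the basic handlers used here, you can’t attach information to a signal, and you also can’t determine which process sent it. (Handlers installed with the SA_SIGINFO flag can get a little more information, such as the sender’s process ID.) Signals are mostly used for simple “event notifications”: “something has happened”.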

Catching signals

To catch a signal, we can use one of two mechanisms. The first, simpler one is the signal() function:

#include <stdio.h>
#include <signal.h>

volatile sig_atomic_t window_resized = 0;  // Set by the handler; volatile so the loop re-reads it

void my_handler(int sig) {
    window_resized = 1;
}

int main() {
    if(signal(SIGWINCH, my_handler) == SIG_ERR) {
        return 1; // Handler could not be attached
    }

    // Wait for window resizes
    while(1) { 
        if(window_resized) {
            printf("Window resized!\n");
            window_resized = 0;
        }
    }

    return 0;
}

The signal function takes two parameters: a signal constant and a pointer to a handler function. Every handler function should have the prototype void handler(int sig), where the parameter will be the number of the signal that was caught (this allows a single handler function to be associated with different signals while still distinguishing them).

The behavior of the signal function varies between systems; in particular, it is not specified whether the handler stays installed after it has run, or what happens if a signal is caught while a handler is executing. Hence, the second method is preferred, which uses the sigaction structure and function:

#include <stdio.h>
#include <signal.h>

volatile sig_atomic_t window_resized = 0;  // Set by the handler; volatile so the loop re-reads it

void my_handler(int sig) {
    window_resized = 1;
}

int main() {
    struct sigaction act;
    act.sa_handler = my_handler;    // Handler function
    sigemptyset(&act.sa_mask);      // Signals to block while running handler
    act.sa_flags = SA_RESTART;      // Flags

    if(sigaction(SIGWINCH, &act, NULL) != 0) {
        return 1; // Could not register handler
    }

    // Wait for window resizes
    while(1) { 
        if(window_resized) {
            printf("Window resized!\n");
            window_resized = 0;
        }
    }

    return 0;
}

The sigaction structure defined in signal.h looks like this:

struct sigaction
{
    handler_t sa_handler;           // Function pointer (8 bytes) 
    unsigned long int sa_mask[16];  // Signal mask      (16*8 = 128 bytes)
    int sa_flags;                   // Flags            (4 bytes)

    // ... 4 padding bytes, then void (*sa_restorer)(void) (8 bytes)
};

The size of the structure as a whole is 152 bytes (!).

The assembly structure definition corresponding to this would be

struc sigaction_t
    sa_handler:     resq 1
    sa_mask:        resq 16
    sa_flags:       resd 1
                    resb 12 ; Padding/other members
endstruc

Fortunately for us, the structure is passed as a pointer, so we don’t have to worry about all the crazy structure-passing rules. We can just allocate a global instance of the structure and pass the address of that.

section .data

SIGWINCH:       equ         28
SA_RESTART:     equ         0x10000000          ; = 268435456
msg:            db          "Window resized!", 10, 0   ; 10 = newline ("\n" is not an escape in "..." strings)

window_resized: dq          0

action: istruc sigaction_t
    at sa_handler,  dq              my_handler
    at sa_mask,     times 16 dq     0
    at sa_flags,    dd              SA_RESTART
                    times 12 db     0 
iend

my_handler must be a C-compatible function taking a single int parameter:

section .text

my_handler:                 ; edi = signal number (not needed here)
    push rbp
    mov rbp, rsp

    mov qword [window_resized], 1

    pop rbp
    ret

The most complex part is main, as it has to set up the signal handler and then loop waiting for signals:

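global main
extern sigaction
extern printf

main:
    push rbp
    mov rbp, rsp

    ; Install signal handler: sigaction(SIGWINCH, &action, NULL)
    mov rdi, SIGWINCH
    mov rsi, action
    xor edx, edx            ; 3rd argument (old action) = NULL
    call sigaction

    cmp eax, 0              ; sigaction returns an int: 0 on success
    je .continue

    ; Couldn't register handler, return 1
    mov rax, 1
    pop rbp
    ret

    ; Loop forever
.continue:

    cmp qword [window_resized], 1
    jne .continue

    mov rdi, msg
    xor eax, eax            ; printf is variadic: al = number of xmm arguments (0)
    call printf
    mov qword [window_resized], 0
    jmp .continue

    ; Never reached
    pop rbp
    mov rax, 0
    ret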