A C/C++ struct is in reality nothing more than several data values stored
together in memory in a known arrangement. If we want to interoperate with
C/C++ programs which use structures, we’ll need to see how to form the assembly
equivalents.
An example structure:
struct thing {
double a; // 8 bytes
char b; // 1 byte
int c; // 4 bytes
char* d; // 8 bytes
};
If we sum up the size of the elements of this structure, we get 21 bytes.
However, if you compile this structure and then cout << sizeof(thing),
C++ will report that its size is 24 bytes. Where did the extra 3 bytes
come from? The answer has to do with structure layout, particularly
alignment and structure packing.
Remember that the CPU can perform memory accesses faster if they are aligned to multiples of certain powers of 2 (typically 4 or 8 bytes, i.e., 32 or 64 bits). In order to allow optimized moves, structure members are typically aligned, rather than being packed in as tightly as possible. This leads to some extra space, in the form of padding bytes, being added to each structure.
If we create an instance of thing
and then examine the addresses of its
members, we can deduce where, within those 24 bytes, each of the structure’s
elements is located:
thing x;
struct thing {
double a; // &x.a == &x
char b; // &x.b == &x + 8
int c; // &x.c == &x + 12
char* d; // &x.d == &x + 16
};
First, the address of the structure itself is just the address of its first member.
- b is located 8 bytes after a.
- c is located not 1 byte after, but 4 bytes after b. The char b has been expanded to 4 bytes (or rather, 1 byte followed by 3 invisible padding bytes) so that c, an int, is aligned to a multiple of 4 bytes.
- d is located 4 bytes after c, and is itself 8 bytes wide.
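The offsets above can be checked directly with offsetof. The following is a minimal sketch, assuming an x86-64 System V target (other platforms may lay the structure out differently); the helper names padding_after_b and check_addresses are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// The structure from the text.
struct thing {
    double a;  // 8 bytes
    char   b;  // 1 byte
    int    c;  // 4 bytes
    char*  d;  // 8 bytes
};

// Number of padding bytes the compiler inserted between b and c.
constexpr std::size_t padding_after_b() {
    return offsetof(thing, c) - (offsetof(thing, b) + sizeof(char));
}

// Compile-time checks of the layout (assumption: x86-64 System V).
static_assert(offsetof(thing, a) == 0,  "a starts the structure");
static_assert(offsetof(thing, b) == 8,  "b follows the 8-byte double");
static_assert(offsetof(thing, c) == 12, "c is aligned to 4 bytes");
static_assert(offsetof(thing, d) == 16, "d is aligned to 8 bytes");
static_assert(sizeof(thing) == 24,      "21 bytes of data + 3 of padding");

// Run-time version of the same check, using the addresses of a real instance.
inline void check_addresses() {
    thing x{};
    const char* base = reinterpret_cast<const char*>(&x);
    assert(reinterpret_cast<const char*>(&x.a) == base);
    assert(reinterpret_cast<const char*>(&x.b) == base + 8);
    assert(reinterpret_cast<const char*>(&x.c) == base + 12);
    assert(reinterpret_cast<const char*>(&x.d) == base + 16);
}
```

Because the checks are static_asserts, a target with a different layout fails to compile rather than silently disagreeing with hand-written assembly.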
In C++, this also holds, but only for a subset of structs and classes: those
that are POD types (“Plain Old Data”). A POD type is one that
- Has no user-provided default constructor (i.e., uses the compiler-generated default ctor)
- Has no user-provided copy constructor
- Has no user-provided destructor
- Has only public data members (all of which must also be POD)
- Has no reference-type data members
- Has no virtual methods, and no virtual base classes.
Note that this does allow POD types to have (non-virtual) methods, and to
use inheritance. POD types are completely compatible between C++ and C, and,
with some care, with assembly.
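The compiler can check these properties for us. The sketch below approximates "is POD" with standard type traits (standard layout, trivially default-constructible, trivially copyable); that approximation and the example types point, checked_point and shape are assumptions made for illustration, not part of the definition above.

```cpp
#include <cassert>
#include <type_traits>

// Plain data plus a non-virtual method: still POD.
struct point {
    double x, y;
    double norm2() const { return x * x + y * y; }
};

// Inheritance that adds only methods keeps the type POD.
struct checked_point : point {
    bool is_origin() const { return x == 0.0 && y == 0.0; }
};

// A virtual method adds a hidden vtable pointer, so this is not POD.
struct shape {
    double scale;
    virtual double area() const { return scale; }
};

// Approximation of "is POD" used in this sketch.
template <class T>
constexpr bool is_pod_like() {
    return std::is_standard_layout<T>::value &&
           std::is_trivially_default_constructible<T>::value &&
           std::is_trivially_copyable<T>::value;
}

static_assert(is_pod_like<point>(), "methods do not break POD-ness");
static_assert(is_pod_like<checked_point>(), "neither does simple inheritance");
static_assert(!is_pod_like<shape>(), "virtual methods do");
```

A static_assert like these next to any structure shared with assembly catches a later change, such as adding a virtual method, that would silently alter the layout.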
As usual, the Sys V C ABI defines how the elements of a structure are to be
packed/aligned in memory. The rules for alignment are actually specified based
on the data type, and are not particular to structures (i.e., every int
in memory should be aligned to a multiple of 4 bytes, not just those in structures).
A summary would be:
- Align data values to multiples of their size, in bytes. Add padding bytes after smaller data members so that subsequent members are correctly aligned.
- The entire structure, when stored in memory, should be aligned to the largest alignment of any of its members. E.g., thing would be aligned to (would always start on) 8-byte addresses, because its largest member is a double, which requires 64-bit (8-byte) alignment.
- The entire structure should be padded at the end so that its size is a multiple of its alignment. (For thing this happens naturally, because its last member is 8 bytes wide; a structure that ends in a smaller member needs explicit trailing padding.)
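These three rules are mechanical enough to write down as code. The following is a rough sketch, assuming every member is a scalar whose alignment equals its size (true for the types used here, not for nested structures or arrays); align_up, struct_layout and compute_layout are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

struct struct_layout {
    std::vector<std::size_t> offsets;  // offset of each member
    std::size_t size;                  // total size, including trailing padding
    std::size_t alignment;             // alignment of the whole structure
};

// Lay out scalar members given their sizes in declaration order.
inline struct_layout compute_layout(const std::vector<std::size_t>& member_sizes) {
    struct_layout result{{}, 0, 1};
    std::size_t offset = 0;
    for (std::size_t size : member_sizes) {
        assert(size != 0 && (size & (size - 1)) == 0);  // sizes must be powers of two
        offset = align_up(offset, size);   // rule 1: pad up to the member's alignment
        result.offsets.push_back(offset);
        offset += size;
        if (size > result.alignment) result.alignment = size;  // rule 2
    }
    result.size = align_up(offset, result.alignment);           // rule 3
    return result;
}
```

For thing, compute_layout({8, 1, 4, 8}) gives offsets 0, 8, 12, 16 and a size of 24; compute_layout({8, 1}) gives a size of 16, which shows the trailing padding of rule 3.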
Structures in assembly
We can “build” a structure by just arranging things in memory to conform
to the structure layout. For example, to build an instance of the thing
structure on the stack, we could do
sub rsp, 24 ; Make room for the struct
mov [rsp], a ; a at offset 0
mov [rsp + 8], b ; b at offset 8
mov [rsp + 12], c ; c at offset 12
mov [rsp + 16], d ; d at offset 16
and then the address of the structure is rsp. This is obviously
tedious and error-prone. A better option is to use Yasm’s macros
for building structures, struc
and endstruc
. To mirror the above structure
in assembly, we’d use
struc thing
a: resq 1
b: resb 1
resb 3 ; 3 Padding bytes
c: resd 1
d: resq 1
endstruc
(The resb,w,d,q
directives reserve a certain number of bytes, words, dwords,
or qwords, respectively.)
This implicitly defines, via equ, six constants:
- thing is defined to be 0, as the offset of the address of the whole structure from the beginning.
- Similarly, a is defined to be 0.
- b is defined to be 8
- c is defined to be 12
- d is defined to be 16
- thing_size is defined to be 24
Note that these are file-global constants, which means that the names a, thing,
etc. cannot be used for labels or other constants anywhere else in the same
file. If this is a problem, you can use local labels .a, .b, etc. for the
member names.
Instead of manually adding the padding bytes, we can also use alignb to
request a specific alignment of subsequent data. alignb n adds padding bytes to
the current section until the current address $ is a multiple of n, so we
would add alignb directives before the elements:
struc thing
alignb 8 ; Does nothing, already aligned
a: resq 1
alignb 1 ; Does nothing, already aligned
b: resb 1
alignb 4 ; Advance to multiple of 4
c: resd 1
alignb 8 ; Advance to multiple of 8
d: resq 1
endstruc
Note that the first, second, and fourth alignbs do not add any padding at all,
as the members a, b, and d are already naturally aligned. It’s safe to
add extra alignbs, because they will not insert any padding unless it is
needed. alignb fills the unused space with 0s.
To instantiate a structure in the .data
section, use istruc
, at
and iend
:
my_thing: istruc thing
at a, dq 0.0 ; a = 0.0
at b, db '!' ; b = '!'
at c, dd -12 ; c = -12
at d, dq 0 ; d = nullptr
iend
The at macro advances to the correct offset within the structure. Fields
within an istruc must be given in the exact same order as in the original
struc.
Note that istruc
/iend
can only be used to declare instances in the .data
section, i.e., as globals. To create an instance on the stack, we would
first reserve the correct amount of space:
sub rsp, thing_size
and then populate it relative to rsp:
mov qword [rsp + a], 0 ; a = 0.0 (the bit pattern of 0.0 is all zeros)
mov byte [rsp + b], '!'
mov dword [rsp + c], -12
mov qword [rsp + d], 0
(The stack grows downward, so after the sub the structure begins at rsp
itself, and the members are offset upward from there.)
Stack offset | Member | Value |
---|---|---|
rsp + 0 | a | 0.0 |
rsp + 8 | b | ‘!’ |
(rsp + 9) to (rsp + 11) | padding bytes | |
rsp + 12 | c | -12 |
rsp + 16 | d | 0 |
rsp + 24 | previous top of stack | |
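Whether it lives on the stack or in .data, a structure built by hand is just bytes written at the right offsets. The sketch below does the same thing from C++ with a raw buffer and then reads it back as a real thing; build_thing_by_hand and demo_build_thing are hypothetical names, and the hard-coded offsets assume the x86-64 layout described earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct thing { double a; char b; int c; char* d; };

// The hard-coded offsets below are only valid for this layout.
static_assert(offsetof(thing, b) == 8 && offsetof(thing, c) == 12 &&
              offsetof(thing, d) == 16 && sizeof(thing) == 24,
              "unexpected layout for thing");

// Fill a raw buffer member by member, the way the assembly code does,
// then copy the bytes into a real `thing`.
inline thing build_thing_by_hand(double a, char b, int c, char* d) {
    alignas(thing) unsigned char raw[sizeof(thing)] = {};  // padding stays 0
    std::memcpy(raw + 0,  &a, sizeof a);
    std::memcpy(raw + 8,  &b, sizeof b);
    std::memcpy(raw + 12, &c, sizeof c);
    std::memcpy(raw + 16, &d, sizeof d);
    thing result;
    std::memcpy(&result, raw, sizeof result);
    return result;
}

// Example use: the same values as in the assembly snippets.
inline void demo_build_thing() {
    thing t = build_thing_by_hand(0.0, '!', -12, nullptr);
    assert(t.a == 0.0 && t.b == '!' && t.c == -12 && t.d == nullptr);
}
```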
Function calling convention for structures
Structures which are passed as pointers are passed as 64-bit qword addresses,
as usual. What about passing a structure directly, by value:
void f(thing x);
How would we call f from assembly? To pass a structure by value to a
function, there are a number of different rules, mostly depending on the size
of the structure and its members. The underlying theme is that a structure is
passed by “decomposing” it into its members and passing them individually in
registers, except that members smaller than a qword may be combined in a
single register.
- If the size of the structure, in bytes, is ≤ 8, then the entire structure is packed into a single 64-bit register and passed through it.
- If the size of the structure is > 16 bytes and the first qword cannot be passed in an xmm register, then pass the entire structure on the stack. (This is the traditional, 32-bit way to pass structures, thus ensuring that “classic” structures are passed in the same way for 32- and 64-bit programs.)
- If the size of the structure is larger than 64, or if any of its members are not correctly aligned, then pass it on the stack.
- If the size of the structure is > 8 but ≤ 64, then we examine the structure as if it were a sequence of qwords, classifying each qword separately, by looking at the fields that go into it, classifying them according to the normal argument-passing rules, and then using the most stringent requirements. For example, if one of the fields in a qword can be passed in a register, but another must be passed on the stack, then the entire structure is passed on the stack.
  - Padding bytes are ignored, having no effect on how each qword is passed.
  - If all the fields in the qword can be passed the same way, then do so. E.g., if they can all be passed in general-purpose registers, then pass them in the next available argument register.
  - Otherwise, if any of the fields must be passed on the stack, then pass the entire structure on the stack. Note that this does not apply if the structure simply “spills” onto the stack due to not enough registers being available. E.g., if part of a qword is a substructure which is larger than 64 bytes, then it must be passed on the stack, regardless of how many registers are available.
  - Otherwise, if any of the fields can be passed in a GP register, pass them all in a GP register.
  - Otherwise, use the next xmm register (this only occurs if all fields are either floating-point or padding).
Thus, to figure out how to pass a larger-than-8-bytes structure to a function, we must build a “map” of the structure layout in groupings of 8 bytes, examine the fields in each grouping, and determine how each group is classified. For thing:
Qword | Fields | Class |
---|---|---|
0 | a | xmm register |
1 | b, padding, c | GP register |
2 | d | GP register |
Note that we only go through this process if the structure size is greater than
8 and at most 64. For a structure larger than 16 bytes the ABI adds a further
condition: the first qword must be xmm-class and every later qword must
continue that same vector value (as in a 256-bit vector type). thing is 24
bytes and its second and third qwords are GP-class, so, despite the map above,
thing is passed entirely on the stack. A structure of at most 16 bytes with the
same kinds of fields, such as { double a; int c; char b; }, really is split
between an xmm register and a GP register.
Structures are never broken up between registers and the stack; if any part of a structure (or of any value larger than a qword) cannot be passed in registers, then the entire value is passed on the stack. The exceptions are “spilled” structures (where the whole structure could be passed in registers, but there aren’t enough available), and sub-structures.
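Building the qword “map” can be sketched in a few lines. This is a simplified illustration, assuming the structure contains only integer, pointer, and float/double fields; it reports the class of each qword and does not decide whether the structure as a whole ends up in registers or on the stack. The names qword_class, field and classify_qwords are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class qword_class { none, sse, integer };  // none = only padding so far

struct field {
    std::size_t offset;  // byte offset within the structure
    std::size_t size;    // size in bytes
    bool is_floating;    // true for float/double, false for integers/pointers
};

// Classify each 8-byte group of a structure that is `struct_size` bytes long.
inline std::vector<qword_class> classify_qwords(std::size_t struct_size,
                                                const std::vector<field>& fields) {
    std::vector<qword_class> classes((struct_size + 7) / 8, qword_class::none);
    for (const field& f : fields) {
        if (f.size == 0) continue;
        const qword_class c = f.is_floating ? qword_class::sse : qword_class::integer;
        const std::size_t last = (f.offset + f.size - 1) / 8;
        for (std::size_t q = f.offset / 8; q <= last && q < classes.size(); ++q) {
            // A GP-class field "wins": a qword is xmm-class only if every
            // field in it is floating-point.
            if (classes[q] != qword_class::integer) classes[q] = c;
        }
    }
    return classes;
}

inline void demo_classify() {
    // thing: double a @0, char b @8, int c @12, char* d @16
    auto t = classify_qwords(24, {{0, 8, true}, {8, 1, false},
                                  {12, 4, false}, {16, 8, false}});
    assert(t.size() == 3);
    assert(t[0] == qword_class::sse);
    assert(t[1] == qword_class::integer);
    assert(t[2] == qword_class::integer);
}
```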
Returning structure values
Return values are classified according to the above process, except that only
two GP registers (rax and rdx), and two xmm registers (xmm0 and xmm1)
are available; if the structure does not fit into these, then it is returned
as follows:
- When calling the function, an invisible first argument is added, in rdi, as the address where the return value should be written. This can simply be allocated by the caller on the stack.
- When the function returns, it sets rax to a copy of the address it received in rdi and writes its return value to that address.
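Seen from C++, the hidden-pointer convention amounts to rewriting the function so that the caller supplies the storage. The sketch below shows the two forms side by side; make_thing_lowered is a hypothetical name for what the compiler effectively generates, with its first parameter standing in for rdi and its return value for rax.

```cpp
#include <cassert>

struct thing { double a; char b; int c; char* d; };  // 24 bytes

// What we write in C++.
inline thing make_thing(double a, int c) {
    thing t{a, '!', c, nullptr};
    return t;
}

// Roughly what the machine-level call looks like: `result` plays the role
// of the invisible first argument, and the returned pointer is a copy of it.
inline thing* make_thing_lowered(thing* result, double a, int c) {
    result->a = a;
    result->b = '!';
    result->c = c;
    result->d = nullptr;
    return result;
}

inline void demo_return() {
    thing slot;  // space allocated by the caller, e.g., on its stack
    thing* returned = make_thing_lowered(&slot, 1.5, 7);
    thing direct = make_thing(1.5, 7);
    assert(returned == &slot);
    assert(slot.a == direct.a && slot.b == direct.b &&
           slot.c == direct.c && slot.d == direct.d);
}
```

Note that the real arguments shift by one register: with the hidden pointer in rdi, the first visible integer argument moves to rsi.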
Example
Here’s a rather complex example, borrowed from the Sys-V ABI specification:
// Note: sizeof(structparm) == 16
struct structparm {
int a, b;
double d;
};
structparm s;
int e, f, g, h, i, j, k;
double m, n;
structparm func(int e, int f,
structparm s,
int g, int h,
double m,
double n,
int i, int j, int k);
How will the registers and stack be set up for this function call?
GP registers | FP registers | Stack |
---|---|---|
rdi: e | xmm0: s.d | 0: j |
rsi: f | xmm1: m | 8: k |
rdx: s.a, s.b | xmm2: n | |
rcx: g | | |
r8: h | | |
r9: i | | |
How will the return value be represented?
GP registers | FP registers | Stack |
---|---|---|
rax: ret.a, ret.b | xmm0: ret.d | None |
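The register assignment for a call like this one can be worked out mechanically: walk the arguments left to right, handing out GP and xmm registers in order, and send an argument to the stack as a whole once it no longer fits. The sketch below does exactly that for arguments that have already been classified qword by qword; assign_locations and example_call are hypothetical names, and the model ignores everything except integer-class and xmm-class qwords.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class qword_class { integer, sse };

// One argument is described by the class of each of its qwords
// (a single entry for an int, a pointer, or a double).
using argument = std::vector<qword_class>;

// Returns, for every argument, where each of its qwords is passed.
inline std::vector<std::vector<std::string>>
assign_locations(const std::vector<argument>& args) {
    static const char* const gp_regs[]  = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char* const xmm_regs[] = {"xmm0", "xmm1", "xmm2", "xmm3",
                                           "xmm4", "xmm5", "xmm6", "xmm7"};
    std::size_t next_gp = 0, next_xmm = 0;
    std::vector<std::vector<std::string>> result;
    for (const argument& arg : args) {
        // Count the registers of each kind this argument needs.
        std::size_t need_gp = 0, need_xmm = 0;
        for (qword_class c : arg) {
            if (c == qword_class::integer) ++need_gp; else ++need_xmm;
        }
        std::vector<std::string> where;
        if (next_gp + need_gp <= 6 && next_xmm + need_xmm <= 8) {
            for (qword_class c : arg) {
                where.push_back(c == qword_class::integer ? gp_regs[next_gp++]
                                                          : xmm_regs[next_xmm++]);
            }
        } else {
            // Not enough registers left: the whole argument goes on the stack.
            where.assign(arg.size(), std::string("stack"));
        }
        result.push_back(where);
    }
    return result;
}

// The call from the example: func(e, f, s, g, h, m, n, i, j, k).
inline std::vector<std::vector<std::string>> example_call() {
    const argument i32{qword_class::integer};
    const argument f64{qword_class::sse};
    const argument structparm{qword_class::integer, qword_class::sse};
    return assign_locations({i32, i32, structparm, i32, i32, f64, f64, i32, i32, i32});
}
```

Running example_call() places s in rdx and xmm0, i in r9, and j and k on the stack.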
Handling Signals
Currently, if we divide by 0 (either integer or floating-point), our program
simply crashes with a floating-point exception. To avoid this, we have to
install a signal handler, to catch the SIGFPE signal which is sent to
our process on a divide-by-zero error. This requires setting up a sigaction
structure and passing it to the C-library sigaction() function.
Note that trying to “recover” from SIGFPE is very dangerous in general;
you have no idea what your program was doing, so the only reason to catch
this signal is to make your program exit in some particular way: e.g., write
some information to a log. The only safe thing to do is to end the program.
Hence, a SIGFPE handler should not try to “resume” computation in any way; it
should just print a custom message and then exit. For our example, instead of
handling SIGFPE, we’ll handle a signal which is safe to resume from: SIGWINCH,
sent when the size of the text console changes (e.g., resize your PuTTY window
and SIGWINCH is sent to the process you are running).
Signals
Signals are one of the ways in which a Unix-based operating system communicates with the processes running on it. Signals can be broken into those which can be caught by our processes and those which cannot; the catchable ones divide further into those which have some default behavior and those which do nothing if not caught.
Signal | Catchable? | Default Behavior |
---|---|---|
SIGINT (Ctrl-C) | Yes | Terminate process |
SIGKILL | No | Terminate process |
SIGTERM | Yes | Terminate process |
SIGSEGV | Yes | Terminate process (Invalid memory access, e.g., null-pointer dereference) |
SIGFPE | Yes | Terminate process (Arithmetic error, div. by 0) |
SIGHUP | Yes | Terminate process (“Hangup”) |
SIGWINCH | Yes | Nothing (“window change”) |
Signals are sent to a process asynchronously; this means that if a
signal handler is installed, it may be triggered at any point in the program.
It will appear to our program that the signal handler function was called,
e.g., in the middle of an operation. Hence, signal handlers have to be very
careful, as the state of the program is unknown. In particular, signal handlers
should not call any standard library functions except the few that are
async-signal-safe, such as _exit() and write() (it’s OK to call printf for
testing or experimentation, but this should never be done in production). The
typical behavior for a signal handler is to set some (global) variables and
then return, if the signal is not a fatal one. For fatal signals, the only
real option is to clean up and then exit.
It’s possible to send and receive custom signals (SIGUSR1 and SIGUSR2); if they are not caught, their default behavior is to terminate the process.
Signals are a fairly simple communication method: you can’t attach information to signals, and you also can’t determine which process sent a signal to yours. They’re mostly used for simple “event notifications”: “something has happened”.
Catching signals
To catch a signal, we can use one of two mechanisms. The first is the signal function:
#include <stdio.h>
#include <signal.h>
volatile sig_atomic_t window_resized = 0; // Written by the handler, so it must be volatile
void my_handler(int sig) {
window_resized = 1;
}
int main() {
if(signal(SIGWINCH, my_handler) == SIG_ERR) {
return 1; // Handler could not be attached
}
// Wait for window resizes
while(1) {
if(window_resized) {
printf("Window resized!\n");
window_resized = 0;
}
}
return 0;
}
The signal function takes two parameters: a signal constant and a pointer
to a handler function. Every handler function should have the prototype
void handler(int sig), where the parameter will be the number of the signal
that was caught (this allows a single handler function to be associated with
different signals while still distinguishing them).
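To illustrate the last point, here is a small sketch in which one handler serves two signals and records which one arrived. It assumes a POSIX system (for SIGUSR1 and SIGUSR2) and sends the signals to itself with raise; record_signal and raise_and_record are hypothetical names.

```cpp
#include <cassert>
#include <csignal>

// Last signal number seen by the handler (0 = none yet).
volatile std::sig_atomic_t last_signal = 0;

// One handler shared by several signals; `sig` tells them apart.
extern "C" void record_signal(int sig) {
    last_signal = sig;
}

// Attach the shared handler to `sig`, send `sig` to this process, and
// return the number the handler recorded (-1 if attaching failed).
inline int raise_and_record(int sig) {
    last_signal = 0;
    if (std::signal(sig, record_signal) == SIG_ERR) return -1;
    std::raise(sig);  // returns only after the handler has run
    return last_signal;
}
```

Calling raise_and_record(SIGUSR1) and then raise_and_record(SIGUSR2) returns the respective signal numbers, even though both calls go through the same handler.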
The behavior of the signal function is not completely specified; in particular,
it is unclear what happens if a signal arrives while a handler is executing.
Hence, the second method is preferred, which uses the sigaction
structure and function:
#include <stdio.h>
#include <signal.h>
volatile sig_atomic_t window_resized = 0; // Written by the handler, so it must be volatile
void my_handler(int sig) {
window_resized = 1;
}
int main() {
struct sigaction act;
act.sa_handler = my_handler; // Handler function
sigemptyset(&act.sa_mask); // Signals to block while running handler
act.sa_flags = SA_RESTART; // Flags
if(sigaction(SIGWINCH, &act, NULL) != 0) {
return 1; // Could not register handler
}
// Wait for window resizes
while(1) {
if(window_resized) {
printf("Window resized!\n");
window_resized = 0;
}
}
return 0;
}
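The third parameter of sigaction, passed as NULL above, receives the previously installed action. The sketch below wraps the registration in a helper and uses that third parameter to ask which handler is currently installed; install_handler and current_handler are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <signal.h>  // POSIX: sigaction, sigemptyset, SA_RESTART

using handler_fn = void (*)(int);

volatile sig_atomic_t resize_seen = 0;
extern "C" void on_resize(int) { resize_seen = 1; }

// Install `handler` for `sig` with the same settings as in the program
// above. Returns true on success.
inline bool install_handler(int sig, handler_fn handler) {
    struct sigaction act = {};  // zero every member first
    act.sa_handler = handler;   // handler function
    sigemptyset(&act.sa_mask);  // block no extra signals while it runs
    act.sa_flags = SA_RESTART;  // restart interrupted system calls
    return sigaction(sig, &act, nullptr) == 0;
}

// Query the current handler without changing it: pass NULL as the new
// action and read the old one back.
inline handler_fn current_handler(int sig) {
    struct sigaction old = {};
    if (sigaction(sig, nullptr, &old) != 0) return SIG_ERR;
    return old.sa_handler;
}
```

Zero-initializing the structure before filling it in matters because it has members beyond the three set here.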
The sigaction
structure defined in signal.h
looks like this:
struct sigaction
{
handler_t sa_handler; // Function pointer (8 bytes)
unsigned long int sa_mask[16]; // Signal mask (16*8 = 128 bytes)
int sa_flags; // Flags (4 bytes)
// ... Other members
};
The size of the structure as a whole is 152 bytes (!).
The assembly structure definition corresponding to this would be
struc sigaction_t
sa_handler: resq 1
sa_mask: resq 16
sa_flags: resd 1
resb 12 ; Padding/other members
endstruc
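Since the assembly definition hard-codes the sizes and the padding, it is worth confirming them against the C headers on the target machine. The following sketch collects the relevant numbers; the expected values in the comments assume 64-bit Linux with glibc and may differ elsewhere. sigaction_layout and measure_sigaction are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <signal.h>

// Layout facts the assembly definition has to match.
struct sigaction_layout {
    std::size_t mask_offset;   // offset of sa_mask   (expected: 8)
    std::size_t mask_size;     // size of sa_mask     (expected: 128)
    std::size_t flags_offset;  // offset of sa_flags  (expected: 136)
    std::size_t total_size;    // size of the struct  (expected: 152)
};

inline sigaction_layout measure_sigaction() {
    return {offsetof(struct sigaction, sa_mask),
            sizeof(sigset_t),
            offsetof(struct sigaction, sa_flags),
            sizeof(struct sigaction)};
}
```

With these values the assembly struc adds up: 8 bytes of handler, 128 of mask, 4 of flags, and 12 of padding and other members give 152.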
Fortunately for us, the structure is passed as a pointer, so we don’t have to worry about all the crazy structure-passing rules. We can just allocate a global instance of the structure and pass the address of that.
section .data
SIGWINCH: equ 28
SA_RESTART: equ 268435456
msg: db "Window resized!", 10, 0 ; 10 = newline ("\n" is not an escape in double quotes)
window_resized: dq 0
action: istruc sigaction_t
at sa_handler, dq my_handler
at sa_mask, times 16 dq 0
at sa_flags, dd SA_RESTART
times 12 db 0
iend
my_handler
must be a C-compatible function taking a single int
parameter:
section .text
my_handler:
push rbp
mov rbp, rsp
mov qword [window_resized], 1 ; Just set the flag and return
pop rbp
ret
The most complex part is main
, as it has to set up the signal handler and
then loop waiting for signals:
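extern sigaction
extern printf
global main
main:
push rbp
mov rbp, rsp
; Install signal handler: sigaction(SIGWINCH, &action, NULL)
mov rdi, SIGWINCH
mov rsi, action
mov rdx, 0 ; Third argument (old action) = NULL
call sigaction
cmp eax, 0 ; sigaction returns a 32-bit int
je .continue
; Couldn't register handler, return 1
mov rax, 1
pop rbp
ret
; Loop forever
.continue:
cmp qword [window_resized], 1
jne .continue
mov rdi, msg
mov rax, 0 ; printf is variadic: al = number of xmm arguments
call printf
mov qword [window_resized], 0
jmp .continue
; Never reached
pop rbp
mov rax, 0
ret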