Today we’ll cover:

Binary representations for negative values.
In-memory representations of multi-byte numbers.
The special-purpose registers rip, rflags and a few others.
The arithmetic instructions add, sub, inc, dec.
Their effects on the flags register rflags.
We will briefly look at the unsigned multiplication/division/modulo instructions mul and div.
The function instructions call and ret

Instructions covered last time:

mov – Move operand values between registers, memory.
inc – Increment a register or memory value.
syscall – Invoke an operating-system function.

Review

Registers

Last time we looked at the full set of general-purpose registers available to us: rax, rbx, rcx, rdx, rdi, rsi, rbp, rsp, r8 through r15.

We saw how each of these can be accessed either at its full 64-bit (qword) width, or as the low dword, or the low word, or the low byte:

`rax` (64 bits)
	`eax` (32 bits)
		`ax` (16 bits)
		`ah` (8 bits)	`al`(8 bits)

We saw that many instructions (including mov and xor), when operating on the low dword portion of a register, will implicitly zero the high dword, and we looked at some strategies for preserving the high dword.

We looked at the flags register rflags which is used to store information about the results of various operations. The most important flags for us will be

CF – carry flag, set if the result of an unsigned operation produces an extra carry/borrow
OF – overflow flag, set if the result of a signed operation is too big/small to store
ZF – zero flag, set if all bits of the result are 0.
SF – sign flag, set if the high bit of the result is set

We saw that the mov instruction is our basic tool for moving data around, between registers, memory, and immediate (constant) values. We also saw the xchg instruction, which swaps operands, saving us from having to use a register as a temporary variable.

Negative values

All of the arithmetic operations described previously assume that the incoming value is positive. How are negative values handled? Historically, there have been four choices:

Sign-bit: in decimal, we indicate a negative value by a - sign, so why not simply dedicate a bit to meaning “this number is negative”? Historically, the high bit has been dedicated to this. So 00000011b is 3, but 10000011b is -3. Although this method is simple for us to understand, it has several drawbacks when it comes to implementation:
- There are now two values for 0, a positive and a negative.
- Arithmetic on values is more complicated, as we have to check the sign bits of both values. If we forget and perform an unsigned operation, the results are nonsense.
- The CPU has to switch between performing an addition operation, and performing a subtraction operation, depending on the value of the sign bit. That is, the CPU cannot set itself up to perform an addition/subtraction until it also knows the values being operated on.
Floating-point values use a sign bit, partly because they want to have both positive and negative zeros.
Biased: this represents all values (not just negative ones) as being arithmetically “shifted” by a fixed amount. So 0 is not 0, but 0+127 (=01111111b) or something. 3 is 3 + 127 = 10000010b, and -3 is -3 + 127 = 01111100b. Note that the high bit serves as a kind of “positive sign bit”; if it is set, then the number is positive (greater than 0).
- Addition and subtraction work (kind of) normally, except that we have to “unbias” the values after performing the normal operation. E.g., adding 3 and -3 gives:
```
     10000010
  +  01111100
 ─────────────
     11111110 = 127 (254-127) 
```
  We have to subtract 127 from the result again, because two “copies” of 127 were added, so the final result is actually
```
     11111110
  -  01111111
 ─────────────
     01111111 = 0 (127-127)
```
- Positive numbers look weird. Zero looks weird.
- Detecting non-negative numbers (≥ 0) is tricky.
Biased representation is used in a few places (e.g., in the exponent of floating point values)
Ones-complement: this represents negative values as the binary inverse (flip all bits) of the corresponding number. So 3 is 00000011b, while -3 is 11111100b. Note that we can determine whether a number is negative by examining the high bit: if it is set, the number is negative. But the high bit is not a sign bit: we cannot make a negative number positive by simply flipping the high bit, we have to flip all the bits.

If we perform a normal binary addition of +3 and -3, we get
```
     00000011
  +  11111100
 ─────────────
     11111111 = -0
```
As with the sign-bit representation, we have two representations for 0, a positive one (as 00000000b) and a negative one (as 11111111b). However, we can use normal binary addition to add signed numbers, and the result will be correct when interpreted as a ones-complement number.

Subtraction is a bit more difficult:
```
    111111  
     00000011   = 3
  -  00000100   = 4
 ─────────────
     11111110 =  -1 
```
When the borrowed 1 gets to the far left, it “wraps around” and is borrowed from the low bit of the answer. This is known as an end-around borrow. We have to adjust for this when it happens.
Twos-complement: this represents negative values as the binary inverse (flip all bits) plus 1. Technically, the twos complement is defined as 2⁸ - n. So 3 is 00000011b while -3 is 11111101b. Note that the high bit still can be used to detect negative values. You can think of twos complement as 1-biased ones complement.
- To negate a value, take its ones complement (i.e., flip every bit) and then add 1 to the result. E.g., if 3 = 00000011b, -3 = 11111100b + 1 = 11111101b.
- All arithmetic operations work normally, without needing to know the sign of the incoming values. We can just do normal addition, and the result will “make sense” regardless of whether either or both of the inputs were negative. E.g.,
```
      1111111
      00000011
   +  11111101
  ─────────────
      00000000 
```
- There is only one representation of 0, not two, and it’s just 0b.
- The range of values represented (by a byte) is -128 to +127.
- There’s no need to do an end-around borrow/carry in twos complement
x86-64 systems (and many others) use twos-complement, because it doesn’t require any extra circuitry to represent or manipulate negative values; you just do “normal” binary addition and the results are correct.
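
To compare the four schemes side by side, here is a small Python sketch that encodes +3 and -3 as 8-bit patterns in each one. The function names and the bias of 127 are just illustrative choices taken from the examples above, not anything the CPU provides.

```python
BITS = 8
MASK = (1 << BITS) - 1    # 0xFF, keeps results to one byte
BIAS = 127                # the bias used in the example above

def sign_bit(n):
    # Magnitude in the low 7 bits, high bit set for negative values
    return (0x80 | -n) if n < 0 else n

def biased(n):
    # Every value is shifted up by BIAS
    return (n + BIAS) & MASK

def ones_complement(n):
    # Negative values are the bitwise inverse of the magnitude
    return (~(-n) & MASK) if n < 0 else n

def twos_complement(n):
    # Negative values are 2**8 - |n|, which is just n modulo 2**8
    return n & MASK

encoders = [("sign-bit", sign_bit), ("biased", biased),
            ("ones-complement", ones_complement),
            ("twos-complement", twos_complement)]

for name, encode in encoders:
    print(f"{name:16} +3 = {encode(3):08b}   -3 = {encode(-3):08b}")
```

Only in the twos-complement row does plain byte addition of the two patterns give the pattern for 0 with no further adjustment.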

Multiplication is an expensive enough operation that we generally don’t bother trying to do it “inside” the number representation. Instead, we just make both operands positive, multiply, and then negate the result if needed.

Another example:

   11  1
   01110110  = 118
 + 11100101  = -27
────────────
   01011011  = 91

To negate a value (positive or negative):

Flip all the bits
Add 1 (this sets the lowest zero bit to 1 and clears all the one bits below it)

This works regardless of whether the input value is positive or negative. E.g.,

11100101  = -27
00011010        (flip all bits)
00011011  =  27 (add 1)
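
The flip-and-add-1 rule is easy to check mechanically. The following Python sketch models it on 8-bit patterns; negate8 is a hypothetical helper name, and the mask stands in for the fixed width of a byte register.

```python
MASK = 0xFF   # one byte

def negate8(x):
    """Twos-complement negation of an 8-bit pattern: flip all bits, then add 1."""
    flipped = x ^ MASK           # flip all the bits
    return (flipped + 1) & MASK  # add 1, discarding any carry out of the byte

x = 0b11100101                   # -27 as a byte
print(f"{x:08b} -> {negate8(x):08b}")
print(f"{negate8(x):08b} -> {negate8(negate8(x)):08b}")
```

Negating twice gives back the original pattern. The one exception to "negation changes the sign" is 10000000b (-128), which negates to itself because +128 does not fit in a signed byte.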

Expanding data sizes

Suppose we have an 8-bit value and we wish to store it into a 16-bit location. If the value is unsigned then this is easy: we copy the value into the low 8 bits, and we fill the high 8 bits with 0s. But what if the value is signed (twos-complement)? In this case, in order to get the equivalent value, we must sign extend the number, filling the high bits with copies of the high bit in the original value. If the high bit was originally 1, then all the high 8 bits must be 1, otherwise they should be 0.

Many arithmetic operations that can “mix” values of different word sizes will come in two forms: an unsigned form that “zero-extends” (fills with 0s) and a signed form that sign-extends (copies the high-bit).
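
As a rough model of the two kinds of extension, the Python sketch below widens a byte to 16 bits both ways. The helper names are made up for this example; as_signed just reinterprets a bit pattern as a twos-complement value so we can see what each extension "means".

```python
def zero_extend_8_to_16(b):
    # High 8 bits are filled with 0s
    return b & 0xFF

def sign_extend_8_to_16(b):
    # High 8 bits are filled with copies of bit 7
    b &= 0xFF
    return (b | 0xFF00) if (b & 0x80) else b

def as_signed(value, bits):
    # Interpret a bit pattern of the given width as a twos-complement number
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

byte = 0b11111101    # 253 unsigned, -3 signed
z = zero_extend_8_to_16(byte)
s = sign_extend_8_to_16(byte)
print(f"zero-extended: {z:016b} = {z} unsigned")
print(f"sign-extended: {s:016b} = {as_signed(s, 16)} signed")
```

Zero-extension preserves the unsigned value (253) and sign-extension preserves the signed value (-3); using the wrong one silently changes the number.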

In-memory representations

For single-byte values, the above representations are used. For multi-byte values, however, there are a few options. Consider a 16-bit value. When we place this into a memory address a, it can be done in two ways:

We can place the low byte (8 bits) into address a, and the high byte into a+1. This is called little-endian, because the low byte comes first.
We can place the low byte into address a+1, and the high into a. This is called big-endian, because the high byte comes first.

If big-endianness seems insane, consider that this is how a 16-bit value would be written from left-to-right, assuming memory addresses increase to the right:

high byte	low byte
addr	addr + 1

Little-endian is used by Intel systems, and thus we don’t need to worry about big-endian. Big-endian is used by a few microcontrollers (AVR32) and a few big-iron processors. If you’re writing a file format (or a network protocol) then you have to define a “standard” endianness and have to ensure that the correct translations are done in software, when necessary. But for us, if you want to access the high byte of a 16-bit value in memory, it can be found at address + 1.
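
If you want to see the two byte orders without writing any assembly, Python's built-in int.to_bytes will lay a value out either way. The value 0x1234 is an arbitrary example.

```python
value = 0x1234                         # high byte 0x12, low byte 0x34

little = value.to_bytes(2, "little")   # layout used on x86-64
big = value.to_bytes(2, "big")         # layout used by big-endian machines

for name, layout in [("little", little), ("big", big)]:
    print(f"{name:6} addr+0 = {layout[0]:#04x}   addr+1 = {layout[1]:#04x}")

# Reading the bytes back with the matching byte order recovers the value
assert int.from_bytes(little, "little") == value
assert int.from_bytes(big, "big") == value
```

In the little-endian layout the high byte (0x12) sits at address + 1, which is the rule stated above. Reading little-endian bytes with the wrong byte order would give 0x3412 instead.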

Accessing memory

In 64-bit mode all addresses are 64-bits, hence the full register (rax, rbx, etc.) must be used to store an address. As we’ve seen, the label used to define a string in the .data section is actually the address of that string, hence we can load the address of a string my_text into rax with

mov rax, my_text

You can think of rax in this usage as a pointer-typed variable, holding the address of something.

We can “dereference” a memory address by putting it in square brackets:

mov al, byte [my_text]

The byte qualifier is not strictly required here, but it’s good practice to add it. An easy mistake to make is

mov rax, [my_text] ; Read one *qword* from my_text

which reads not one byte but eight (qword), whereas

mov rax, byte [my_text] ; Read one byte from my_text

will give an error when you try to assemble.

(Of course, if you need more than just the first byte, you might want to get them in chunks of eight, for speed…)

In reality, if we were processing a string, we would want to iterate through it, rather than just accessing the first byte. It would be more useful to put the address my_text into a register and then “dereference” that:

mov rsi, my_text
mov al, byte [rsi]

We can then inc rsi to increment rsi and access the next byte in the string. Because my_text is an immediate, we cannot increment it. (Again, the byte qualifier on [rsi] is not required, as it could be inferred from the size of al.)

Simple looping

Because doing anything interesting will require looping, we’ll introduce the loop instruction. loop takes a single operand, a label to jump to (internally, loop stores the offset of the label’s address from the current instruction’s address). The operation of loop is to perform the following steps:

Decrement rcx
If rcx != 0, jump to the label
If rcx == 0, continue to the next instruction normally.

Thus, the structure of a basic loop would look something like this:

    mov rcx, init       ; Initialize rcx > 0

.start_loop:

    ; ... Perform loop operation using rcx

    loop .start_loop

    ; ... Continue after end of loop

This is roughly equivalent to a C/C++-style do-while loop:

rcx = init;
do {

    // ... Perform loop operation

    --rcx;
} while(rcx != 0);

Note that because rcx is one of the registers syscall is allowed to clobber, if you do any syscalls inside the loop, you will need to save rcx before the call, and then restore it after, before loop.

As a demo of this, we can modify our “Hello, world” program to print “Hello, world!” backwards, by printing one character at a time, from the end to the beginning. (We’ll still use the write syscall, we’ll just tell it to print a single character instead of the entire string.)

It may be useful to consider how we would do this in C or C++. The C++ equivalent to our original assembly program would be something like this:

#include <iostream>
using namespace std;

int main()
{
    const char* msg = "Hello, world!";
    const int MSGLEN = 13;

    cout.write(msg, MSGLEN); // equiv. to write syscall
}

To write one character at a time, we need a loop that starts at the end of the string, and writes one character at a time, backwards, something like this:

#include <iostream>
using namespace std;

int main()
{
    const char* msg = "Hello, world!";
    const int MSGLEN = 13;

    int c = MSGLEN;
    do {

        const char* addr = msg + c - 1;
        cout.write(addr, 1);

        --c;
    } while(c != 0);
}

I’ve intentionally written the do-while loop in a way that mirrors the execution of the loop instruction, to make it easier to translate to assembly.

Our original program looked like this:

section .data

msg:            db      10, "Hello, world!"
MSGLEN:          equ     $-msg

section .text

;; Program code goes here

global _start
_start:






    mov     rax,    1               ; Syscall code in rax
    mov     rdi,    1               ; 1st arg, file desc. to write to
    mov     rsi,    msg             ; 2nd arg, addr. of message
    mov     rdx,    MSGLEN          ; 3rd arg, num. of chars to print
    syscall






    ;; Terminate process
    mov     rax,    60              ; Syscall code in rax
    mov     rdi,    0               ; First parameter in rdi
    syscall                         ; End process

I’ve removed the trailing 10 (\n) from the text, and moved it to the beginning, so it will still print at the “end”.

The first syscall will be inside the loop, so we can add:

section .data

msg:            db      10, "Hello, world!"
MSGLEN:          equ     $-msg

section .text

;; Program code goes here

global _start
_start:

    mov     rdi,    1               ; 1st arg, file desc. to write to
    mov     rdx,    1               ; 3rd arg, num. of chars to print



.begin_loop:




    mov     rax,    1               ; Syscall code in rax
    mov     rsi,    msg             ; 2nd arg, addr. of message
    syscall




    loop .begin_loop

    ;; Terminate process
    mov     rax,    60              ; Syscall code in rax
    mov     rdi,    0               ; First parameter in rdi
    syscall                         ; End process

Note that syscall preserves rdi and rdx, so we can set those once outside the loop. However, rax is used for the return value, so we should set it every time through the loop, and rsi is the address of the start of the string, which will change as we move through the string.

We need to initialize rcx to the length of the string:

mov rcx, MSGLEN

and then we set rsi (the address to write) to rcx + msg - 1.

mov rsi, rcx
add rsi, msg-1

(add a, b performs addition, a += b, and dec a decrements --a. Both are subject to the usual restrictions: no memory-to-memory operations, both operands of the same size, etc. Because msg is a constant, the msg-1 is performed at assembly-time.)

Finally, note that rcx is one of the registers that syscall is allowed to “clobber” (r11 is the other), so we have to save it into another, safe register before the syscall and then restore it afterwards:

mov r15, rcx
syscall
mov rcx, r15

That leaves us with:

global _start
_start:

    mov     rdi,    1               ; 1st arg, file desc. to write to
    mov     rdx,    1               ; 3rd arg, num. of chars to print

    mov rcx, MSGLEN                 ; loop counter = MSGLEN

.begin_loop:

    ; Print 1 char at [msg + rcx - 1]

    mov     rax,    1               ; Syscall code in rax

    mov rsi, rcx                    ; rsi = addr to print
    add rsi, msg
    dec rsi

    mov r15, rcx                    ; Save rcx before syscall
    syscall
    mov rcx, r15                    ; Restore rcx

    loop .begin_loop

    ;; Terminate process
    mov     rax,    60              ; Syscall code in rax
    mov     rdi,    0               ; First parameter in rdi
    syscall                         ; End process

Local labels

When writing loops or other labels that exist inside a function, it’s useful to write them as “local” labels, by starting them with a period. A local label is actually named after the most recent non-local label, so the full name of .begin_loop is actually _start.begin_loop. Labels can normally only be defined once per file, so without a local label, no other function we wrote could use the label begin_loop.

We’ll use local labels for all our loop and branch targets, and only use non-local labels for functions.

Negative `rcx`

In case you’re curious, let’s consider what happens if rcx is negative and we decrement it. E.g., if rcx = 11111111 (= -1), and we decrement:

   11111111
 - 00000001
────────────
   11111110  = -2

In other words, the result is exactly what you’d expect (but not particularly useful when used with loop).
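
To see why a "negative" (or zero) starting count is not useful with loop, here is a small Python model of the decrement-then-test behavior described earlier. It is only a sketch: run_loop is a made-up name, and the register is shrunk to 8 bits (as in the example above) so that the simulation finishes quickly; the real rcx is 64 bits wide.

```python
def run_loop(rcx, bits=8):
    """Count how many times the body of a loop-style loop runs."""
    mask = (1 << bits) - 1
    count = 0
    while True:
        count += 1                 # ... loop body ...
        rcx = (rcx - 1) & mask     # loop: decrement the counter (wraps around)
        if rcx == 0:               # loop: fall through when it reaches 0
            return count

print(run_loop(3))      # ordinary positive count
print(run_loop(0xFF))   # 11111111b, i.e. "-1"
print(run_loop(0))      # zero wraps around to all ones first
```

Because loop only tests for zero, a counter of -1 is treated as the largest unsigned value, and a counter of 0 wraps around and runs the body the maximum number of times (2^64 with the real 64-bit rcx).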

Loop variants

There are two variants of the loop instruction which test the zero flag (ZF) along with the value of rcx:

loope – “Loop Equal”; decrement rcx, loop if rcx != 0 and ZF == 1
loopne – “Loop Not Equal”; decrement rcx, loop if rcx != 0 and ZF == 0

The zero flag is connected to the idea of (in)equality because, if we perform a subtraction:

sub a, b

and a == b, then the zero flag will be set, otherwise it will be unset.

Including files

Like C/C++, yasm has a simple mechanism for including the contents of one .s file into another:

%include "source.s"

copies the contents of source.s into the current assembly file. For example, we could start to centralize a lot of our syscall definitions in an include file:

;;;
;;; sysdefs.s
;;;
[section .data]

SYS_write   equ     1
SYS_exit    equ     60

SYS_stdin   equ     0
SYS_stdout  equ     1
...

__SECT__

The [section .data] and __SECT__ stuff is “magic” to temporarily switch to the data section and then switch back to whatever section we were in before.

Arithmetic operations

add dest, src       ; dest += src
sub dest, src       ; dest -= src

add and sub perform addition and subtraction between two operands of the same size. Internally, sub is just addition with the second operand negated, and the carry inverted at the end.

Like many operations, add and sub are dyadic: they take two operands, a destination and a source, with the destination serving as both the first input and the target for the output of the operation.

Flags

add and sub set/unset the OF, SF, ZF, AF, CF, and PF flags:

For signed operations the OF flag indicates that an overflow/underflow occurred. This bit is set if the sign bit of the result is incorrect, because the correct result was too big/small for the destination to hold. For example, signed 127+127 will generate an overflow. (Unsigned 127+127 = 254 which can still fit in an unsigned byte, hence would not set the carry flag.)

The correct sign of the result is easy to determine: if both inputs are positive, the result should be positive; if both are negative, then the result should be negative. If one is positive and the other is negative, then the correct sign of the result is more difficult to determine, but it turns out in this case that it doesn’t matter: it’s impossible for an addition of a positive and a negative value to overflow. Subtraction is treated similarly, except that the sign of the second operand is flipped (i.e., a - b is treated as a + (-b)).

If the inputs are unsigned, then the OF flag is still set/unset, but its value is meaningless.
For unsigned operations the CF flag indicates that an extra carry/borrow was “left over” at the end. This indicates that the result of the operation was too big/small for the destination to hold. For example, 255+127 is too big for a byte, and would set the carry flag. It would not set the overflow flag, because 255 unsigned = -1 signed, and 127-1 = 126 which fits in a signed byte.

Note that after an unsigned subtraction a - b, if the carry flag is set, this indicates that b > a.

The CF flag is still set/unset if the inputs are signed, but its value is meaningless in that case.
The ZF flag is set if the result was 0 (all zero bits).
The SF flag is set to a copy of the sign bit (set if result was negative, for signed operations). For unsigned operations it’s just a copy of the high bit of the result.
The PF flag is set if the number of set (=1) bits in the low byte of the result is even. The parity flag is something of a historical artifact and is not used much, partly because it does not give the parity of the entire result, just the lowest byte of it.
We’ll ignore the AF flag, as it only makes sense in the context of BCD arithmetic.

Note that all flags are set/cleared on all operations, but some flags only make sense on signed/unsigned operations. The add/sub instructions don’t know whether you are performing a signed or unsigned operation, so it’s up to you to make sure you check the correct flags for the type of operation you are performing.

Example: Let’s perform an add operation and see how the flags are set from it:

  111  11
   10110011   = 179 (unsigned)   = -77 (signed)
 + 01100110   = 102 (unsigned)   = 102 (signed)
────────────
1  00011001   =  25 (unsigned)   = 25  (signed)

Interpreted as signed values, one input was negative, the other was positive, so the sign of the result is guaranteed to be correct. OF = 0.
Interpreted as unsigned, the addition produced an extra carry, CF = 1.
The result is not 0, so ZF = 0.
The number of set bits in the low byte of the result is odd (three), so PF = 0 (PF is set only when that count is even).
The high bit is unset (the result is positive), so SF = 0.
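
The flag rules for addition are simple enough to model in a few lines. The Python sketch below is a hypothetical add8 helper (not how the hardware is built) that adds two bytes and reports the four flags we care most about. The OF test is the rule described above: both inputs have the same sign, and the sign of the result differs from it.

```python
def add8(a, b):
    """8-bit add: returns (result, flags) for OF, SF, ZF and CF."""
    a &= 0xFF
    b &= 0xFF
    total = a + b                 # full sum, may need 9 bits
    result = total & 0xFF         # what actually fits in the byte
    flags = {
        # Signed overflow: result's sign differs from the sign of both inputs
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        "SF": result >> 7,              # copy of the high bit
        "ZF": int(result == 0),         # all bits zero
        "CF": int(total > 0xFF),        # a carry was left over
    }
    return result, flags

result, flags = add8(0b10110011, 0b01100110)   # the example above
print(f"{result:08b} = {result}", flags)
```

Running this on the example reproduces the result 25 with only CF set. Trying add8(0x7F, 0x7F) shows the opposite case: OF is set but CF is not.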

Here are two tables summarizing the results of addition and subtraction on the OF, CF, ZF, and SF flags (original):

Addition
A			B			A + B			Flags
Hex	U	S	Hex	U	S	Hex	U	S	OF	SF	ZF	CF
7F	127	127	0	0	0	7F	127	127	0	0	0	0
FF	255	-1	7F	127	127	7E	126	126	0	0	0	1
0	0	0	0	0	0	0	0	0	0	0	1	0
FF	255	-1	1	1	1	0	0	0	0	0	1	1
FF	255	-1	0	0	0	FF	255	-1	0	1	0	0
FF	255	-1	FF	255	-1	FE	254	-2	0	1	0	1
FF	255	-1	80	128	-128	7F	127	127	1	0	0	1
80	128	-128	80	128	-128	0	0	0	1	0	1	1
7F	127	127	7F	127	127	FE	254	-2	1	1	0	0

Subtraction
A			B			A - B			Flags
Hex	U	S	Hex	U	S	Hex	U	S	OF	SF	ZF	CF
FF	255	-1	FE	254	-2	1	1	1	0	0	0	0
7E	126	126	FF	255	-1	7F	127	127	0	0	0	1
FF	255	-1	FF	255	-1	0	0	0	0	0	1	0
FF	255	-1	7F	127	127	80	128	-128	0	1	0	0
FE	254	-2	FF	255	-1	FF	255	-1	0	1	0	1
FE	254	-2	7F	127	127	7F	127	127	1	0	0	0
7F	127	127	FF	255	-1	80	128	-128	1	1	0	1
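
The subtraction table can be checked the same way. This Python sketch uses a hypothetical sub8 helper: CF is the borrow (set when b > a as unsigned values), and OF follows from treating a - b as a + (-b), so it is set when the inputs have different signs and the result's sign differs from that of a.

```python
def sub8(a, b):
    """8-bit subtract: returns (result, flags) for OF, SF, ZF and CF."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    flags = {
        # Signed overflow: inputs differ in sign and result's sign differs from a's
        "OF": int(((a ^ b) & (a ^ result) & 0x80) != 0),
        "SF": result >> 7,
        "ZF": int(result == 0),
        "CF": int(b > a),          # a borrow was needed
    }
    return result, flags

# Rows of the subtraction table: (A, B, A - B, OF, SF, ZF, CF)
rows = [
    (0xFF, 0xFE, 0x01, 0, 0, 0, 0),
    (0x7E, 0xFF, 0x7F, 0, 0, 0, 1),
    (0xFF, 0xFF, 0x00, 0, 0, 1, 0),
    (0xFF, 0x7F, 0x80, 0, 1, 0, 0),
    (0xFE, 0xFF, 0xFF, 0, 1, 0, 1),
    (0xFE, 0x7F, 0x7F, 1, 0, 0, 0),
    (0x7F, 0xFF, 0x80, 1, 1, 0, 1),
]

for a, b, expected, of, sf, zf, cf in rows:
    result, flags = sub8(a, b)
    assert result == expected
    assert flags == {"OF": of, "SF": sf, "ZF": zf, "CF": cf}
print("all rows match")
```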

Increment and decrement

inc dest    ; ++dest
dec dest    ; --dest

inc and dec increment/decrement their single operand, which can be either a register or a memory location. inc and dec do not modify the carry flag, as an add r, 1 or sub r, 1 instruction would. The flags OF, SF, ZF, AF, and PF are set/cleared as expected. When used on signed values, the behavior is still correct (incrementing a negative value brings it closer to 0, decrementing a negative value makes it more negative).

Addition/subtraction larger than 64-bits

The largest registers we have are 64-bits (qword). What if we want to perform an addition/subtraction on 128-bit operands (represented as, e.g., rdx:rax)? Let’s consider how we would perform word-sized addition, if the only addition we could do natively was byte-sized:

     111111←   1111
   00101101 11001101
 + 00010010 10101011 
─────────────────────
   01000000 01111000

Adding the low bytes produced an extra carry (CF = 1), which we then used to start the addition of the high bytes. We effectively need two kinds of addition:

Low byte addition, which doesn’t start with a carry (ignores CF)
High byte addition, which uses CF to start the addition.

This is how we perform larger-than-qword addition: there is another addition operation, add-with-carry (adc), which uses the status of the carry flag CF as an input for the first bit’s addition.

adc dest, src       ; dest = dest + src + CF

For subtraction there is sbb, subtract-with-borrow.

Thus, to add the double-qword rcx:rbx to rdx:rax (leaving the sum in rdx:rax), we would do

add rax, rbx 
adc rdx, rcx

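sbb is used the same way: to subtract rcx:rbx from rdx:rax, sub the low qwords and then sbb the high qwords, so that the borrow left in CF by the low half is taken out of the high half.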
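
Here is a Python sketch of the add/adc pair on a 128-bit value split into two 64-bit halves. The helper names are invented for this example, and the arguments are named after the registers used above. Python integers are unbounded, so we can compare against the "true" 128-bit sum directly.

```python
MASK64 = (1 << 64) - 1

def add64(a, b, carry_in=0):
    """64-bit add with optional carry in: returns (result, carry out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64

def add128(rdx, rax, rcx, rbx):
    """rdx:rax += rcx:rbx, returning (rdx, rax, final CF)."""
    rax, cf = add64(rax, rbx)        # add rax, rbx   (ignores incoming carry)
    rdx, cf = add64(rdx, rcx, cf)    # adc rdx, rcx   (uses CF from the low half)
    return rdx, rax, cf

# The low halves overflow, so a carry must ripple into the high half
hi, lo, cf = add128(0x1, MASK64, 0x2, 0x1)
print(hex(hi), hex(lo), cf)

x = (0x1 << 64) | MASK64
y = (0x2 << 64) | 0x1
assert ((hi << 64) | lo) == (x + y) & ((1 << 128) - 1)
```

If the second step used a plain add instead of adc, the carry out of the low qword would be lost and the high qword would be one too small.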

Multiplication and division

Multiplication and division are more complex than addition/subtraction. We will cover them in more detail later, but for now:

The result of multiplying two n-bit values can have up to 2n bits. Hence, when multiplying two qword values, we need somewhere to store the double-quad-word result.
Division has the opposite problem: we may want to divide a double-qword value by a qword divisor.
Signed multiplication is different from unsigned, and similarly for division. Different instructions are used for each.
Multiplication instructions exist in two-operand forms (dest *= src) as you would expect, but also in a one-operand form in which the rax register is implicitly used as the destination for the operation, and even a three operand form, equivalent to dest = src * immediate. The division instructions only take a single operand, and always store their results into a combination of rax and rdx.

To store a double-qword (128-bit) result, we use a combination of rax and rdx: rax stores the low qword while rdx stores the high qword. We write this combination as rdx:rax. (Using a similar notation, we could say that ax = ah:al.) Smaller multiplications do not require this extension.

The unsigned/signed multiplication instructions are mul and imul, respectively:

Instruction	Equivalent
`mul rm`	`rdx:rax = rax * rm`, unsigned
`imul rm`	`rdx:rax = rax * rm`, signed
`imul r, rm`	`r *= rm`, signed
`imul r, rm, imm`	`r = rm * imm`, signed

(For some reason, the signed multiply comes in two- and three-argument variants, while the unsigned only takes one.)

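The CF and OF flags are set/cleared together. For the one-operand forms they are set when the high half of the product carries significant bits: for mul, when rdx is nonzero; for imul, when rdx:rax is not just the sign-extension of rax. For the two- and three-operand forms of imul, the result is truncated to the size of the destination (high bits discarded), and CF and OF are set when that truncation changed the value. The values in the other flags are undefined.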
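
As a rough model of the one-operand unsigned multiply, the Python sketch below splits the full product into the rdx:rax pair. mul64 is a hypothetical name; the CF/OF rule it uses is the documented behavior of mul (set when the high half is nonzero).

```python
MASK64 = (1 << 64) - 1

def mul64(rax, rm):
    """Model of unsigned `mul rm`: returns (rdx, rax, CF/OF)."""
    product = (rax & MASK64) * (rm & MASK64)   # up to 128 bits
    lo = product & MASK64                      # low qword  -> rax
    hi = product >> 64                         # high qword -> rdx
    return hi, lo, int(hi != 0)                # CF = OF = 1 if rdx is needed

print(mul64(3, 5))             # small product, fits in rax alone
print(mul64(1 << 63, 2))       # product is exactly 2**64, spills into rdx
```

Checking CF (or OF) after a mul is therefore a cheap way to find out whether rdx can be ignored.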

Division only has a single operand form, where the operand contains the divisor; the destination (which is also the dividend) is in rdx:rax. The result of div/idiv is both the quotient in rax (truncated toward zero) and the remainder (i.e., modulo or %) in rdx. Unlike C++ where we have / for integer division and % for integer modulo, in assembly a single instruction gives us both results.

Instruction	Equivalent
`div rm`	`rax = rdx:rax / rm` and `rdx = rdx:rax % rm`, unsigned
`idiv rm`	`rax = rdx:rax / rm` and `rdx = rdx:rax % rm`, signed

An overflow in division is indicated not by setting the carry flag, but by a divide-error exception #DE, which is sent to our process as a signal SIGFPE. For now, this will immediately crash our program, but later we’ll see how to write a signal handler to deal with it in a more graceful manner. (Of course, we could also avoid the overflow by checking the operands before performing the division.)
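
To make the overflow condition concrete, here is a Python sketch of unsigned div. The name div64 and the use of Python exceptions to stand in for #DE are assumptions made for illustration; a divisor of zero raises the same #DE exception on real hardware.

```python
MASK64 = (1 << 64) - 1

def div64(rdx, rax, rm):
    """Model of unsigned `div rm`: returns (quotient -> rax, remainder -> rdx)."""
    if rm == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    dividend = (rdx << 64) | rax              # the 128-bit value rdx:rax
    quotient, remainder = divmod(dividend, rm)
    if quotient > MASK64:
        raise OverflowError("#DE: quotient does not fit in rax")
    return quotient, remainder

print(div64(0, 17, 5))          # 17 / 5 and 17 % 5 from one "instruction"
print(div64(1, 0, 2))           # 2**64 / 2 still fits in 64 bits

try:
    div64(1, 0, 1)              # 2**64 / 1 needs 65 bits
except OverflowError as e:
    print(e)
```

A consequence worth noticing is that the quotient overflows exactly when rdx >= rm, so a simple check before dividing is to make sure rdx is smaller than the divisor (for a plain 64-bit dividend, that means setting rdx to 0 first).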

Simple functions

As we’ll see later, calling C functions from assembly, or making our assembly functions callable from C/C++ requires a few extra steps to set up the stack correctly. However, as long as we stay purely in “assembly-land”, we don’t need to worry about the extra complexity; we can essentially make functions work however we like. The only requirements are that we be able to return from a function and get back to where we were.

The two instructions that handle functions are call and ret. Both use the stack internally:

call takes an address (a label in our .text section) and performs two steps: pushes rip, the instruction pointer, onto the stack, and then jumps to the address given. Remember that rip points to the next instruction to be executed, so the value pushed onto the stack is effectively the return address of the function, the address where execution should resume when the function returns.
ret pops the top element of the stack into rip, so execution resumes at the popped address (normally the instruction following the original call).

These work together as follows (addresses are just made up):

Address	Instruction	Address	Instruction
`_start:`		`my_func:`
0x100	`call my_func`	0x200	`mov eax, ...`
0x108	`mov rbx, rax`…	0x208	…
	…	0x280	`ret`

While my_func is executing, the stack contains 0x108, the return address. When ret is executed, this address is popped off the stack and we resume execution at that point. (Later, we’ll see that this means if you’re using the stack for anything else, you have to make sure you’ve popped everything off before you ret, so that at that point, the only thing on the stack is the return address.)
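
The push/jump and pop/jump behavior can be traced with a toy model. The Python sketch below is a deliberately simplified machine (the instruction tuples, the "halt" and "nop" stand-ins, and the run function are all invented for this example) using the made-up addresses from the table above.

```python
# address -> (operation, jump target, address of the following instruction)
program = {
    0x100: ("call", 0x200, 0x108),   # _start:  call my_func
    0x108: ("halt", None, None),     # stand-in for "mov rbx, rax" and the rest
    0x200: ("nop", None, 0x280),     # my_func: stand-in for the function body
    0x280: ("ret", None, None),      #          ret
}

def run(program, rip):
    """Execute until halt; returns (addresses visited, final stack)."""
    stack, trace = [], []
    while True:
        op, target, next_rip = program[rip]
        trace.append(rip)
        if op == "call":
            stack.append(next_rip)   # push the return address
            rip = target             # jump to the function
        elif op == "ret":
            rip = stack.pop()        # pop the return address and jump to it
        elif op == "halt":
            return trace, stack
        else:
            rip = next_rip           # ordinary instruction: fall through

trace, stack = run(program, 0x100)
print([hex(addr) for addr in trace], stack)
```

The trace visits 0x100, 0x200, 0x280 and then 0x108, and the stack is empty again afterwards. If the function body had pushed something and not popped it, ret would have jumped to that value instead of 0x108.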

Although we can use any “calling convention” we like, in terms of passing arguments and returning results, you should try to stick with what will eventually become the convention for calling functions:

Pass the first six arguments in registers rdi, rsi, rdx, rcx, r8 and r9. Note that this is slightly different from the syscall convention (rcx instead of r10).
Return results in rax

As an example, let’s write a function that prints a string (given as an address and a length) and adds a newline at the end. This will just wrap up the call to the write syscall we’ve been using:

section .data

newline:    db      10

section .text

write_ln:

    ; rdi = address
    ; rsi = length

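    mov rdx, rsi        ; 3rd arg: length (copy before rsi is overwritten)
    mov rsi, rdi        ; 2nd arg: address (copy before rdi is overwritten)
    mov rdi, 1          ; 1st arg: file descriptor (stdout)
    mov rax, 1          ; Syscall code for write
    syscall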

    mov rax, 1
    mov rdi, 1
    mov rsi, newline
    mov rdx, 1
    syscall

    ret

(Because we can switch sections at any time, you could put this code in a file and then %include it to make the write_ln function available in any program you write.)

To use this, we load the registers with the appropriate information and then issue a call:

section .data

msg:    db      "Hello, world!"
MSGLEN: equ     $-msg

section .text

    mov rdi, msg
    mov rsi, MSGLEN
    call write_ln

    ; Normal exit syscall...

We could, of course, further tidy this up by writing a function to wrap up the write syscall:

sys_write:

    ; rdi = address
    ; rsi = length

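    mov rdx, rsi        ; 3rd arg: length (copy before rsi is overwritten)
    mov rsi, rdi        ; 2nd arg: address (copy before rdi is overwritten)
    mov rdi, 1          ; 1st arg: file descriptor (stdout)
    mov rax, 1          ; Syscall code for write
    syscall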

    ret

and then our write_ln becomes just

write_ln:

    ; rdi = address
    ; rsi = length
    call sys_write

    mov rdi, newline
    mov rsi, 1
    call sys_write

    ret

Note that our functions “clobber” the registers rdi, rsi, rax and rdx (and, because they use syscall, rcx and r11 as well): when calling either write_ln or sys_write you cannot rely on the values of these registers being preserved through the call. Theoretically, which registers are clobbered and which are preserved should be part of our function calling convention, so that when we call a function, we know which registers are “safe”. When we start using the stack, we’ll see how we can preserve register values by pushing them onto the stack before the function call, and then popping them off after it returns.

“Function pointers”

The address passed to call can be a register, not just a label:

mov r11, my_function
call r11

This is equivalent to calling a function through a function pointer.