Just like C/C++, YASM has a preprocessor which operates purely on the text of your assembly program, before actually doing the assembly. Because it runs at assembly time, it can actually be a richer “language” than assembly itself. On the other hand, because it runs at assembly time, it cannot refer to any runtime information (contents of registers or memory). Text-based macro languages are actually quite common in computer science, so it’s worth getting to know at least one in depth (if you haven’t already looked into the C/C++ preprocessor).

Errors

We can stop assembly and print a message using the %error macro:

%error Something went wrong.

Include-files

Just like in C/C++ we can “include” the contents of another file within the current .s file:

%include "file.s"

will directly include the contents of file.s.

Note that there is a different mechanism for embedding a binary file into an assembly file:

data:     incbin "file.data"

embeds the raw bytes of “file.data” in the executable, labeled as data.
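
Because incbin simply emits bytes at the current position, the usual `$ - label` idiom should give the size of the embedded file. The following is a minimal sketch; the label names image and image_len are made up for illustration:

```nasm
section .data
image:      incbin "file.data"      ; raw bytes of the file
image_len:  equ $ - image           ; number of bytes that were embedded
```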

Single-line macros

At its heart, the macro preprocessor works by replacing some text with some other text. For example,

%define accumulator     rax

says that whenever the text accumulator is seen, it should be replaced with the text rax. Thus, we can now write things like

add accumulator, 10   ; Equivalent to    add rax, 10

Macros can be redefined: we can do

%define accumulator rax

and then later do

%define accumulator rcx

accumulator will expand into whatever the most recent definition says it should.

We can define one macro in terms of another:

%define increment   inc accumulator

Now, when we write

increment

This will first expand into inc accumulator, and then into inc rax. Note that the expansion of accumulator doesn’t happen when we %define increment, but when we use increment. This means that macro expansion is temporarily suspended during the definition of other macros. If accumulator is later redefined, as above, increment will use the most-recent definition, rather than the one which was in place when it was defined.

On the other hand, sometimes we want the definition to be expanded at definition time, instead of waiting until it is used. If a macro is defined with %xdefine the definition will be expanded immediately, at the point of definition. The difference can be seen by redefining a macro:

%define a  1
%define b  a
%xdefine c a        ; Equivalent to %define c 1
...
%define a 2
mov rax, b          ; Expands to mov rax, 2
mov rcx, c          ; Expands to mov rcx, 1

If you define a macro to expand to itself, e.g., %define x x, using it will not send the assembler into an infinite loop; it will expand once, and then stop. When we later see function-style macros, this implies that recursion is not possible. Note that this is not a problem for %xdefine macros.

A macro can be undefined (so that using its name is an error) by using %undef:

%undef accumulator

...
mov rbx, accumulator 

The mov instruction will now assume that accumulator is a label and, if it is not defined as such, give an error.

Function-style macros

Single-line macros can have parameters:

%define increment(r)    inc r

To use this, we must supply the parameter, which will be “spliced” into the expansion, wherever r is used:

increment(rax)              ; expands to inc rax
increment(qword [var])      ; expands to inc qword [var]

Multiple parameters are separated by commas:

%define swap(a,b)       xchg a, b
...
swap(rax, rdi)          ; Expands to xchg rax, rdi

YASM allows “macro overloading”: multiple macros with the same name but different numbers of parameters, each with its own definition.
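
As a small sketch of overloading, the two definitions below share a name but take different numbers of parameters. The macro name addr and its parameter names are hypothetical, chosen only for this example:

```nasm
%define addr(base)          [base]
%define addr(base, off)     [base + off]

mov rax, addr(rsi)          ; expands to mov rax, [rsi]
mov rax, addr(rsi, 8)       ; expands to mov rax, [rsi + 8]
```

The preprocessor picks the definition whose parameter count matches the use.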

Case-sensitivity

Macros are by default case-sensitive: %define foo 1 will expand foo, but not Foo or FOO. Some assemblers are case-insensitive, so for compatibility, YASM has %idefine and %ixdefine, which define case-insensitive macros.

Concatenating macro expansions

Sometimes we need to form a single string out of some macro expansions. Simply writing the pieces next to each other does not work:

%define reg(b) r b

reg(ax)         ; Expands to r ax, which is *not* a register!

The %+ operator, written a %+ b, concatenates the text on its left and right, “eating” any spaces around it. The proper definition of reg would be

%define reg(b) r %+ b

reg(ax)        ; Expands to rax

Arithmetic-expansion macros

Suppose we want to define a macro that contains a numeric value, and then be able to increment it. We might try something like this:

%xdefine v 0
...
%xdefine v v+1      ; Now v expands to 0+1

This will work, as the assembler will perform the arithmetic 0+1 after the preprocessor is finished and compute the correct value, but it’s cumbersome. After a few increments, we could have v expanding into 0+1+1+1+1+1+1+....

Instead of doing this, we can use %assign. %assign works like %xdefine, except that it evaluates any arithmetic in its definition. So we can use

%assign v 0
%assign v v+1       ; Now v expands to 1

Thus, the expansion of v is always something that resembles a number, rather than a string like 0+1.

String handling

YASM’s preprocessor has some features for working with string literals: "..." or '...'. You can extract the length (number of characters) of a string literal:

%strlen len "String"

This %assign-s 6 to len. Note that the following will not work:

string:     db  "String"

%strlen len string

because string is not a string, but rather a label (address) pointing to the first character of the string in memory. On the other hand, we can %define a macro which expands to a string literal, and then ask for its length:

%define string "String"
%strlen len string

Similarly, you can “subscript” into a string literal to extract individual characters:

%substr c "String" 2        ; Defines c to be 't'

Subscripts start at 1, not 0, so the last character’s position is equal to the string’s %strlen.
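
Putting %strlen and %substr together, the following sketch picks out the first and last characters of a string literal. The macro names greeting, n, first, and last are hypothetical, and it assumes that %substr expands single-line macros in its arguments the same way %strlen does:

```nasm
%define greeting "String"
%strlen n greeting              ; n is 6
%substr first greeting 1        ; first is 'S' (position 1)
%substr last  greeting n        ; last is 'g'  (position 6, the %strlen)
```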

Multiline Macros

More complex macros will need multi-line definitions. This is accomplished by using %macro and %endmacro:

%macro swap 2
    mov r11, %1
    mov %1, %2
    mov %2, r11
%endmacro

Unlike a single-line macro, a multi-line macro only knows the number of parameters (2, above), and not their names. The names of the parameters are always %1, %2, etc.

To “call” a multi-line macro, use its name, followed by its parameters (not in parentheses):

swap rax, rcx
; Expands into
;   mov r11, rax
;   mov rax, rcx
;   mov rcx, r11

As with single line macros, multi-line macros can be “overloaded” on the number of parameters. You can even define a multi-line macro with the same name as an instruction:

%macro push 2
    push %1
    push %2
%endmacro
...
push rax            ; Normal push instruction
push rax, rbx       ; Expands to the above

The assembler will give a warning, but the above code works just fine.

Rest parameter

You can create a multi-line macro which accepts any number of parameters (with some minimum). For example,

%macro print 1+
  section .data
    %%string:   db      %1
    %%strlen:   equ     $-%%string

  section .text
    mov rax, 1
    mov rdi, 1
    mov rsi, %%string
    mov rdx, %%strlen
    syscall
%endmacro

Here, we temporarily switch to the .data section to add a new string constant, and then switch back to .text and expand to a syscall which prints it. We can use this like

print "Hello world!", 10

and no matter how many “parameters” are given, they will all be placed in %1.

Note that we can no longer overload print with 2, 3, etc. parameters. The given definition effectively defines different versions of print for all parameter counts from 1 to infinity.

Default parameters

Instead of allowing an unlimited number of parameters, we can support a range, and supply defaults for any omitted parameters:

%macro swap 2-3 r11
    mov %3, %1
    mov %1, %2
    mov %2, %3
%endmacro

If we use this as swap rax, rbx then %3 expands to the default, r11. On the other hand, if we supply a third parameter, e.g., swap rax, rbx, r15 then the supplied third parameter will be used for %3.

If we create a macro with 3-5 parameters, then we can supply up to 5-3 = 2 default values, which become the defaults for parameters %4 and %5. If you omit a default, the parameter expands into nothing at all.
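
As a sketch of a macro with more than one default, the following takes one required parameter and two optional ones. The macro name buffer and the labels used with it are hypothetical:

```nasm
; Parameters: label name, [fill byte = 0], [count = 64]
%macro buffer 1-3 0, 64
    %1: times %3 db %2
%endmacro

section .data
buffer scratch                  ; scratch: times 64 db 0
buffer linebuf, ' '             ; linebuf: times 64 db ' '
buffer pagebuf, 0xFF, 4096      ; pagebuf: times 4096 db 0xFF
```

Defaults are filled in from the left, so there is no way to supply %3 while leaving %2 at its default.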

Default parameters can be combined with rest parameters; you can write 3-5+ and this means, 3 or more, but any more than 5 all go into %5.

You can specify an unlimited max number of parameters by writing 3-* (three to infinity). Of course, you won’t be able to write defaults for all of them. The difference between this and + is that + groups all remaining parameters into a single parameter, while this makes them all individually accessible. The %0 “parameter”, which expands into the number of actual parameters supplied, is very useful here.

“Rotating” the parameter list

Suppose a macro takes three parameters, %1, %2, %3. We can rotate the list by issuing the macro %rotate 1. After rotation, the original second parameter will be in the first position, and the third in the second position, and the first parameter will have been rotated all the way around to 3. This is mostly useful inside repeating macros, as it allows us to access all the parameters without numeric indexing. E.g., the following is a version of push which accepts any number of parameters and pushes all of them:

%macro push 2-*
  %rep %0
    push %1
    %rotate 1
  %endrep
%endmacro

push rax, rbx, rcx, qword [var]

%rotate n rotates the parameter list n places to the left (toward parameter %1) if n is positive. If n is negative, it rotates to the right.
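
A negative rotation is handy for walking the parameter list backwards. The sketch below is a counterpart to the multi-parameter push above; the name multipop is our own choice, not a built-in:

```nasm
%macro multipop 1-*
  %rep %0
    %rotate -1          ; bring the last parameter around to position 1
    pop %1
  %endrep
%endmacro

multipop rax, rbx, rcx  ; expands to pop rcx, pop rbx, pop rax
```

Because the pops come out in reverse order, multipop can be given the same parameter list as the matching push.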

%rep ... %endrep are discussed later, under Repetition macros.

Macro-local names

If multi-line macros can expand into code, we might want to expand into code which contains labels (e.g., to expand into a loop). However, this will cause problems if the macro is ever expanded more than once; then we will have multiple definitions of the same label. To fix this problem we can use macro-local labels. A macro-local label is a normal label whose name starts with %%. For example,

%macro retz 0
    jnz     %%skip
    ret

  %%skip:
%endmacro

Each time the macro is expanded, a new unique name will be generated for the label %%skip, so none of the expansions will interfere with each other.

A macro-local name does not actually have to be used as a label. We can, for example, use it as a single-line macro name to create a kind of “local variable”:

%macro testmacro 0
    %assign %%v 0
    mov rax, %%v
%endmacro

Here, the “variable” %%v will get a different name each time the macro is expanded.

Concatenating multi-line parameters

Unlike single-line parameters, which need the special %+ operator to concatenate, multi-line parameters need no such notation:

%macro string_n 2
    string%1:   db  %2
%endmacro

string_n 7, "Hello"         ; Expands into string7: db "Hello"

If we want to concatenate some text after a parameter, we can write %{1}text; this will expand %1 and then add “text” immediately after the expansion, with no space between.
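
The braces matter most when the text to append starts with a digit. The following sketch uses a hypothetical macro def_pair to define two numbered labels from one name:

```nasm
%macro def_pair 1
    %{1}1:  dq 0        ; writing %11 here would mean "parameter 11"
    %{1}2:  dq 0
%endmacro

section .data
def_pair counter        ; expands into counter1: dq 0 and counter2: dq 0
```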

Condition code parameters

YASM has special support for parameters which contain condition codes (z, ge, etc.). If %1 expands into a condition code, then %-1 expands into the negation of that code. E.g., z becomes nz, ge becomes l, and so forth. Similarly, %+1 expands into the original, unchanged, condition code, except that it enforces that the parameter actually is a condition code, and gives an error if it is not.

For example, our retz macro above can be generalized into one which allows any condition code (with a default of z) by doing

%macro retcc 0-1 z
    j%-1    %%skip
    ret

  %%skip:
%endmacro
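
A couple of uses of retcc might look as follows. In the comments, skip stands for whatever unique name the preprocessor generates for %%skip:

```nasm
test rdi, rdi
retcc               ; no parameter, so the default z is used:
                    ;   jnz skip / ret / skip:
cmp rax, rcx
retcc ge            ; ge is negated to l:
                    ;   jl skip / ret / skip:
```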

Conditional macros

Often we want to include some portion of our source text, depending on certain (assembly-time) parameters, and exclude it or replace it with some other text the rest of the time. This can be accomplished via conditional macros. The most basic conditional macro mirrors the familiar if-else if-else statement:

%if<condition>
    ...
%elif<condition>
    ...
%else
    ...
%endif

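As we would expect, any number of %elif-s (including none) is allowed, and the final %else is optional. Here <condition> stands for a keyword such as def, macro, or idn, which is written immediately after the %if/%elif with no space between (giving %ifdef, %elifdef, and so on); the plain %if with no keyword tests a numerical expression.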

def – Checking a single-line macro for defined-ness

We can use %ifdef to check whether a given (single line) macro has already been defined. For example,

%ifdef DEBUG
    ... ; Debug build code
%else
    ... ; Production code
%endif

Not-defined can be tested using the condition ndef, i.e., %ifndef.
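
One common use of the ndef condition, familiar from C/C++ headers, is an include guard, which keeps a file's definitions from being processed twice. A minimal sketch follows; the file name macros.inc and the guard name MACROS_INC are hypothetical:

```nasm
; macros.inc
%ifndef MACROS_INC
%define MACROS_INC              ; defined (as empty), so a second %include is skipped

%define accumulator rax
; ... more definitions ...

%endif
```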

macro – Checking multi-line defined-ness

%ifmacro checks to see whether a multi-line macro with the given name and parameter count is defined:

%ifmacro push 2+

    ; Multi-arg push is defined

%endif

Numerical expressions

%if expr will check the numerical expression expr and continue if its value is non-zero. You can use the normal comparison operators in the expression. Note that equality is = (a single equals), and inequality is <>.

idn – Text comparison

%ifidn t1, t2 succeeds if t1 and t2 expand into identical sequences of text.
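
As a sketch of text comparison inside a multi-line macro, the hypothetical macro load below emits the shorter xor idiom when asked to load the literal 0. It assumes the first parameter is a register:

```nasm
%macro load 2
  %ifidn %2, 0
    xor %1, %1          ; zeroing idiom (unlike mov, this changes the flags)
  %else
    mov %1, %2
  %endif
%endmacro

load rax, 0             ; expands to xor rax, rax
load rcx, 42            ; expands to mov rcx, 42
```

Since the comparison is textual, a value such as 0x0 or 00 would not match and would take the mov branch. The variant %ifidni performs the same comparison while ignoring case.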

num, id, str – Checking token types
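
In NASM-style preprocessors, %ifnum, %ifid, and %ifstr test whether a token is a numeric constant, an identifier, or a string literal, respectively. A minimal sketch follows, using a hypothetical macro emit that stores its parameter differently depending on its type:

```nasm
%macro emit 1
  %ifstr %1
    db %1, 0            ; string literal: store it NUL-terminated
  %elifnum %1
    dq %1               ; numeric constant: store it as a qword
  %elifid %1
    dq %1               ; identifier (assumed to be a label): store its address
  %else
    %error emit expects a string, a number, or an identifier
  %endif
%endmacro

section .data
msg:
    emit "Hello"
    emit 42
    emit msg
```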

Repetition macros

To repeat some text a certain number of times, we use %rep:

%rep 10
    inc rax
%endrep

This will expand into ten inc rax instructions. The argument to %rep can be a numeric expression.

%assign can be used in tandem with %rep to make “loop variables”:

%assign i 0
%rep 10
    mov qword [arr + 8*i], i    ; each qword occupies 8 bytes
    %assign i i+1
%endrep

This initializes a 10-qword array at address arr to 0, 1, 2, 3, … 9.

We can “break” out of a %rep early using %exitrep:

section .data
data:

%assign i 1
%rep 100
    dw i
    %assign i i*2
    %if i > 1024
        %exitrep
    %endif
%endrep

This creates an array of words labeled data which is initialized to 1, 2, 4, 8, …, 1024. We use dw rather than db because the larger values do not fit in a byte.

The Context stack

The context stack is a mechanism for allowing things like macro-local labels (which do not break if the macro is expanded more than once), but which are shared by more than one macro definition. E.g., currently, a macro-local label defined in one macro cannot be referred to by another in any way; it is invisible. This makes defining fancier macros (which often require more than one definition) difficult or impossible.

To solve this problem, YASM maintains a stack of “contexts”. Labels can be created which are local to the context on the top of the stack. A new context can be pushed on top of the stack with %push, and can be removed with %pop. Because a stack is used, fancy macros can be nested without breaking each other.

Context-local label

To create a label which is local to the current context, we write %$name. This can also be used to %define or %assign a macro whose name is local to the current context:

%define %$lm 5
%assign %$i 0

This prevents variables from interfering with other scopes.

Note that when a context is %pop-ed, all of its local labels/macros are undefined.

Example: block IF statements

Suppose we want to define a macro which will allow us to write a more natural if-like structure:

IF rax, e, rcx
    ...
ENDIF

where this expands into something like

cmp rax, rcx
jne .endif
    ...
.endif:

except that the label .endif should be generated uniquely for each IF-ENDIF.

We need to define two %macros:

%macro IF 3
    %push if 
    cmp %1, %3
    j%-2 %$endif
%endmacro

%macro ENDIF 0
    %$endif:
    %pop
%endmacro

  1. When we call the IF macro, it pushes the if context onto the stack, letting us know we are inside an if.

  2. It also performs the comparison, and does the jump if the comparison fails.

  3. The ENDIF macro creates the label which is the target of the jump in (2), and removes the if context from the stack (because we are no longer inside an if).

This macro works, but it will fail horribly if we use an ENDIF with no matching IF. We can use the %ifctx conditional to check the context on the top of the stack, and issue an error if we are not inside an IF:

%macro ENDIF 0
    %ifctx if
        %$endif:
        %pop
    %else
        %error Expected IF before ENDIF
    %endif
%endmacro

We could use a similar technique to define a DO-WHILE macro for iteration:

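%macro DO 0
    %push do_while
    %$do:                   ; loop entry label, local to this context
%endmacro

%macro WHILE 3
    %ifctx do_while
        cmp %1, %3
        j%+2 %$do           ; jump back to DO while the condition holds
        %pop                ; leave the do_while context
    %else
        %error Expected DO before WHILE
    %endif
%endmacro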

This can be used as

mov rax, 0
DO
    mov qword [arr + 8*rax], rax    ; qwords are 8 bytes apart
    inc rax

WHILE rax, le, 100 

These loops/ifs can even be nested within each other, so long as they use different registers.

Another example: PROC/ENDPROC are used in the Microsoft assembler to mark the beginning/end of functions:

PROC myfunction
    ... Stuff

ENDPROC

In MASM, these are needed to make labels inside functions local to the function; YASM doesn’t need this, because we have local labels, but we can still define them for compatibility. We can even add some error checking, so that ENDPROC without PROC, or nested PROCs, is an assembly-time error:

%macro PROC 1
    %ifnctx proc
        %push proc
        %{1}:
    %else
        %error Found PROC before the previous PROC was closed with ENDPROC
    %endif
%endmacro

%macro ENDPROC 0
    %ifctx proc
        %pop 
    %else
        %error Found ENDPROC without preceding PROC
    %endif
%endmacro
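
A use of these two macros might look like the sketch below. The function sum_to_n is a made-up example; it assumes rdi holds a count of at least 1 and returns rdi + (rdi-1) + ... + 1 in rax:

```nasm
section .text

PROC sum_to_n           ; expands into the label sum_to_n:
    xor rax, rax
  .loop:                ; ordinary local label, scoped to sum_to_n
    add rax, rdi
    dec rdi
    jnz .loop
    ret
ENDPROC
```

Writing a second PROC before the ENDPROC, or an ENDPROC with no PROC open, would stop assembly with the corresponding %error message.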