Just like C/C++, YASM has a preprocessor which operates purely on the text of your assembly program, before actually doing the assembly. Because it runs at assembly time, it can actually be a richer “language” than assembly itself. On the other hand, because it runs at assembly time, it cannot refer to any runtime information (contents of registers or memory). Text-based macro languages are actually quite common in computer science, so it’s worth getting to know at least one in depth (if you haven’t already looked into the C/C++ preprocessor).
Errors
We can stop assembly and print a message using the %error macro:
%error Something went wrong.
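In practice %error is usually guarded by a conditional so that it only fires when something really is wrong. A minimal sketch, assuming a hypothetical configuration macro named BUFFER_SIZE that the rest of the file relies on:

```nasm
; BUFFER_SIZE is a hypothetical macro used only for illustration.
%ifndef BUFFER_SIZE
%error BUFFER_SIZE must be defined before assembling this file.
%endif
```

If BUFFER_SIZE has been defined earlier in the file, the %error line is skipped and assembly continues normally.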
Include-files
Just like in C/C++, we can “include” the contents of another file within the current .s file:
%include "file.s"
will directly include the contents of file.s.
Note that there is a different mechanism for embedding a binary file into an assembly file:
data: incbin "file.data"
includes the contents of “file.data” in binary into the executable, labeled as data.
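Because the embedded bytes are laid out at the label like any other data, their length can be computed at assembly time. A small sketch (the name data_len is an assumption introduced here):

```nasm
section .data
data:     incbin "file.data"    ; raw bytes of the file, starting at data
data_len: equ $ - data          ; number of bytes that were embedded
```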
Single-line macros
At its heart, the macro preprocessor works by replacing some text with some other text. For example,
%define accumulator rax
says that whenever the text accumulator is seen, it should be replaced with the text rax. Thus, we can now write things like
add accumulator, 10 ; Equivalent to add rax, 10
Macros can be redefined: we can do
%define accumulator rax
and then later do
%define accumulator rcx
After this, accumulator will expand into whatever the most recent definition says it should.
We can define one macro in terms of another:
%define increment inc accumulator
Now, when we write
increment
This will first expand into inc accumulator, and then into inc rax. Note that the expansion of accumulator doesn’t happen when we %define increment, but when we use increment. This means that macro expansion is temporarily suspended during the definition of other macros. If accumulator is later redefined, as above, increment will use the most recent definition, rather than the one which was in place when it was defined.
On the other hand, sometimes we want the definition to be expanded at definition time, instead of waiting until it is used. If a macro is defined with %xdefine the definition will be expanded immediately, at the point of definition. The difference can be seen by redefining a macro:
%define a 1
%define b a
%xdefine c a ; Equivalent to %define c 1
...
%define a 2
mov rax, b ; Expands to mov rax, 2
mov rcx, c ; Expands to mov rcx, 1
If you define a macro to expand to itself, e.g., %define x x, using it will not send the assembler into an infinite loop; it will expand once, and then stop.
When we later see function-style macros, this implies that recursion is not possible. Note that this is not a problem for %xdefine macros: their definitions are expanded at the point of definition, so a reference to the macro’s own name picks up its previous definition.
A macro can be undefined (so that its name is no longer expanded) by using %undef:
%undef accumulator
...
mov rbx, accumulator
The mov instruction will now treat accumulator as a label, and if no such label is defined, the assembler gives an error.
Function-style macros
Single-line macros can have parameters:
%define increment(r) inc r
To use this, we must supply the parameter, which will be “spliced” into the expansion, wherever r is used:
increment(rax) ; expands to inc rax
increment(qword [var]) ; expands to inc qword [var]
Multiple parameters are separated by commas:
%define swap(a,b) xchg a, b
...
swap(rax, rdi) ; Expands to xchg rax, rdi
YASM allows “macro overloading”; multiple macros with the same name and different numbers of parameters, with different definitions.
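As a hedged illustration of overloading, the sketch below defines two versions of a hypothetical addr macro (the name and both definitions are invented here); which one is used depends only on how many arguments are supplied:

```nasm
; addr is a hypothetical macro name used only for illustration.
%define addr(base)       [base]
%define addr(base,index) [base + 8*index]

mov rax, addr(rbx)        ; one argument:  mov rax, [rbx]
mov rax, addr(rbx, rcx)   ; two arguments: mov rax, [rbx + 8*rcx]
```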
Case-sensitivity
Macros are by default case-sensitive: %define foo 1 will expand foo, but not Foo or FOO. Some assemblers are case-insensitive, so for compatibility, YASM has %idefine and %xidefine which define a case-insensitive macro.
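For instance, a short sketch using a hypothetical macro named counter:

```nasm
; counter is a hypothetical name; %idefine makes every spelling of it match.
%idefine counter rcx

inc counter     ; expands to inc rcx
inc Counter     ; expands to inc rcx
inc COUNTER     ; expands to inc rcx
```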
Concatenating macro expansions
Sometimes we need to form a single token out of some macro expansions. For example, if we write
%define reg(b) r b
reg(ax) ; Expands to r ax, which is *not* a register!
the expansion contains an unwanted space. The %+ operator, written a %+ b, concatenates the text on its left and right, “eating” any spaces around it. The proper definition of reg would be
%define reg(b) r %+ b
reg(ax) ; Expands to rax
Arithmetic-expansion macros
Suppose we want to define a macro that contains a numeric value, and then be able to increment it. We might try something like this:
%xdefine v 0
...
%xdefine v v+1 ; Now v expands to 0+1
This will work, as the assembler will perform the arithmetic 0+1 after the preprocessor is finished and compute the correct value, but it’s cumbersome. After a few increments, we could have v expanding into 0+1+1+1+1+1+1+....
Instead of doing this, we can use %assign. %assign works like %xdefine, except that it evaluates any arithmetic in its definition. So we can use
%assign v 0
%assign v v+1 ; Now v expands to 1
Thus, the expansion of v is always something that resembles a number, rather than a string like 0+1.
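As a small sketch of why this is convenient, the following uses %assign as a running counter to lay out byte offsets for a hypothetical two-field record. The names next_off, X_OFF, Y_OFF and POINT_SIZE are invented for illustration:

```nasm
%assign next_off 0
%assign X_OFF next_off          ; X_OFF is 0
%assign next_off next_off+8
%assign Y_OFF next_off          ; Y_OFF is 8
%assign next_off next_off+8
%assign POINT_SIZE next_off     ; POINT_SIZE is 16

mov rax, [rdi + Y_OFF]          ; expands to mov rax, [rdi + 8]
```

Because each %assign stores an evaluated number, the offsets stay short no matter how many fields are added.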
String handling
YASM’s preprocessor has some features for working with string literals: "..." or '...'. You can extract the length (number of characters) of a string literal:
%strlen len "String"
This %assign-s 6 to len. Note that the following will not work:
string: db "String"
%strlen len string
because string is not a string, but rather a label (address) pointing to the first character of the string in memory. On the other hand, we can %define a macro which expands to a string literal, and then ask for its length:
%define string "String"
%strlen len string
Similarly, you can “subscript” into a string literal to extract individual characters:
%substr c "String" 2 ; Defines c to be 't'
Subscripts start at 1, not 0, so the last character’s position is equal to the string’s %strlen.
Multiline Macros
More complex macros will need multi-line definitions. This is accomplished by using %macro and %endmacro:
%macro swap 2
mov r11, %1
mov %1, %2
mov %2, r11
%endmacro
Unlike a single-line macro, a multi-line macro only knows the number of parameters (2, above), and not their names. The names of the parameters are always %1, %2, etc.
To “call” a multi-line macro, use its name, followed by its parameters (not in parentheses):
swap rax, rcx
; Expands into
; mov r11, rax
; mov rax, rcx
; mov rcx, r11
As with single line macros, multi-line macros can be “overloaded” on the number of parameters. You can even define a multi-line macro with the same name as an instruction:
%macro push 2
push %1
push %2
%endmacro
...
push rax ; Normal push instruction
push rax, rbx ; Expands to the above
The assembler will give a warning, but the above code works just fine.
Rest parameter
You can create a multi-line macro which accepts any number of parameters (with some minimum). For example,
%macro print 1+
section .data
%%string: db %1
%%strlen: equ $-%%string
section .text
mov rax, 1
mov rdi, 1
mov rsi, %%string
mov rdx, %%strlen
syscall
%endmacro
Here, we temporarily switch to the .data section to add a new string constant, and then switch back to .text and expand to a syscall which prints it. We can use this like
print "Hello world!", 10
and no matter how many “parameters” are given, they will all be placed in %1.
Note that we can no longer overload print with 2, 3, etc. parameters. The given definition effectively defines different versions of print for all parameter counts from 1 to infinity.
Default parameters
Instead of allowing an unlimited number of parameters, we can support a range, and supply defaults for any omitted parameters:
%macro swap 2-3 r11
mov %3, %1
mov %1, %2
mov %2, %3
%endmacro
If we use this as swap rax, rbx then %3 expands to the default, r11. On the other hand, if we supply a third parameter, e.g., swap rax, rbx, r15 then the supplied third parameter will be used for %3.
If we create a macro with 3-5 parameters, then we can supply up to 5-3 = 2 default values, which become the defaults for parameters %4 and %5. If you omit a default, then the default is to expand into nothing at all.
Default parameters can be combined with rest parameters; you can write 3-5+ and this means 3 or more, but any more than 5 all go into %5.
You can specify an unlimited maximum number of parameters by writing 3-* (three to infinity). Of course, you won’t be able to write defaults for all of them. The difference between this and + is that + groups all remaining parameters into a single parameter, while this makes them all individually accessible. The %0 “parameter”, which expands into the number of actual parameters supplied, is very useful here.
“Rotating” the parameter list
Suppose a macro takes three parameters, %1, %2, %3. We can rotate the list by issuing the directive %rotate 1. After rotation, the original second parameter will be in the first position, the third in the second position, and the first parameter will have been rotated all the way around to position 3. This is mostly useful inside repeating macros, as it allows us to access all the parameters without numeric indexing. E.g., the following is a version of push which accepts any number of parameters (two or more) and pushes all of them:
%macro push 2-*
%rep %0
push %1
%rotate 1
%endrep
%endmacro
push rax, rbx, rcx, qword [var]
%rotate n rotates the parameter list n places to the left (toward parameter %1) if n is positive. If n is negative, it rotates to the right.
%rep ... %endrep are discussed later, under Repetition macros.
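A negative rotation is handy when parameters must be processed in reverse order. The sketch below is a counterpart to the multi-parameter push; the macro name multipop is an assumption chosen for illustration:

```nasm
; multipop is a hypothetical name: pops its arguments last-to-first.
%macro multipop 1-*
%rep %0
%rotate -1          ; bring the current last parameter into position %1
    pop %1
%endrep
%endmacro

    push rax
    push rbx
    push rcx
    multipop rax, rbx, rcx  ; expands to pop rcx / pop rbx / pop rax
```

Rotating before each pop means the registers are restored in the opposite order to the one in which they were pushed, so the same register list can be written in both places.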
Macro-local names
If multi-line macros can expand into code, we might want to expand into code which contains labels (e.g., to expand into a loop). However, this will cause problems if the macro is ever expanded more than once; then we will have multiple definitions of the same label. To fix this problem we can use macro-local labels. A macro-local label is a normal label whose name starts with %%. For example,
%macro retz 0
jnz %%skip
ret
%%skip:
%endmacro
Each time the macro is expanded, a new unique name will be generated for the label %%skip, so none of the expansions will interfere with each other.
A macro-local name does not actually have to be used as a label. We can, for example, use it as a single-line macro name to create a kind of “local variable”:
%macro testmacro 0
%assign %%v 0
mov rax, %%v
%endmacro
Here, the “variable” %%v will get a different name each time the macro is expanded.
Concatenating multi-line parameters
Unlike single-line parameters, which need the special %+ operator to concatenate, multi-line parameters need no such notation:
%macro string_n 2
string%1: db %2
%endmacro
string_n 7, "Hello" ; Expands into string7: db "Hello"
If we want to concatenate some text after a parameter, we can write %{1}text; this will expand %1 and then add “text” immediately after the expansion, with no space between.
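A brief sketch of the braces form, using a hypothetical macro named counter that builds a label from its parameter plus a fixed suffix:

```nasm
; counter is a hypothetical macro; it defines a zeroed qword named <arg>_count.
%macro counter 1
%{1}_count: dq 0
%endmacro

section .data
counter apples      ; expands to apples_count: dq 0
counter pears       ; expands to pears_count: dq 0
```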
Condition code parameters
YASM has special support for parameters which contain condition codes (z, ge, etc.). If %1 expands into a condition code, then %-1 expands into the negation of that code. E.g., z becomes nz, ge becomes l, and so forth. Similarly, %+1 expands into the original, unchanged, condition code, except that it enforces that the parameter actually is a condition code, and gives an error if it is not.
For example, our retz macro above can be generalized into one which allows any condition code (with a default of z) by doing
%macro retcc 0-1 z
j%-1 %%skip
ret
%%skip:
%endmacro
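A usage sketch for retcc follows. The surrounding function clamp_check and its register conventions are hypothetical; the macro definition is repeated so the fragment stands alone:

```nasm
%macro retcc 0-1 z
    j%-1 %%skip     ; jump over the ret when the condition does NOT hold
    ret
%%skip:
%endmacro

clamp_check:        ; hypothetical function: rdi = value, rsi = limit
    test rdi, rdi
    retcc           ; no argument: defaults to z, returns if rdi is zero
    cmp rdi, rsi
    retcc ge        ; returns if rdi >= rsi (signed)
    inc rdi
    ret
```

Each use expands to a conditional jump on the inverted condition followed by ret, with its own unique skip label.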
Conditional macros
Often we want to include some portion of our source text, depending on certain (assembly-time) parameters, and exclude it or replace it with some other text the rest of the time. This can be accomplished via conditional macros. The most basic conditional macro mirrors the familiar if-else if-else statement:
%if<condition>
...
%elif<condition>
...
%else
...
%endif
As we would expect, zero or more %elif-s are allowed, and the final %else is optional. Note that the condition keyword (def, macro, idn, and so on) is attached directly to the %if/%elif, with no space between, as in %ifdef.
def – Checking a single-line macro for defined-ness
We can use %ifdef to check whether a given (single-line) macro has already been defined. For example,
%ifdef DEBUG
... ; Debug build code
%else
... ; Production code
%endif
Not-defined can be tested using the condition ndef.
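A common use of ndef is an include guard, so that a file pulled in with %include more than once only contributes its definitions the first time. A minimal sketch, assuming a hypothetical include file and a guard macro named MACROS_INC:

```nasm
; Contents of a hypothetical include file, e.g. macros.inc
%ifndef MACROS_INC
%define MACROS_INC      ; mark the file as already included

%define accumulator rax

%endif
```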
macro – Checking multi-line defined-ness
macro checks to see if a multi-line macro is defined:
%ifmacro push 2+
; Multi-arg push is defined
%endif
Numerical expressions
%if expr checks the numerical expression expr and includes the text that follows if its value is non-zero. You can use the normal comparison operators in the expression. Note that equality is = (a single equals), and inequality is <>.
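As a sketch, the following picks an alignment based on an assembly-time constant. The names BUFFER_SIZE, ALIGNMENT and buffer are hypothetical and exist only for this illustration:

```nasm
%assign BUFFER_SIZE 4096        ; hypothetical configuration value

%if BUFFER_SIZE < 512
%error BUFFER_SIZE must be at least 512.
%elif BUFFER_SIZE = 4096        ; single = tests equality
%define ALIGNMENT 4096          ; page-sized buffer: align to a page
%else
%define ALIGNMENT 16
%endif

section .bss
alignb ALIGNMENT
buffer: resb BUFFER_SIZE
```

With the value shown, the %elif branch is taken and the buffer is aligned to 4096 bytes.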
idn – Text comparison
%ifidn t1, t2 succeeds if t1 and t2 expand into identical sequences of text.
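One plausible use is suppressing a redundant instruction when both operands of a macro are the same text. In the sketch below the macro name move is an assumption made for illustration:

```nasm
; move is a hypothetical macro: a mov that vanishes when source == destination.
%macro move 2
%ifidn %1, %2
    ; identical operand text: emit nothing
%else
    mov %1, %2
%endif
%endmacro

move rax, rbx       ; expands to mov rax, rbx
move rcx, rcx       ; expands to nothing
```

The comparison is purely textual, so two different spellings of the same location would not be recognised as identical.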
num, id, str – Checking token types
%ifnum t succeeds if t expands to something that looks like a number.
%ifid t succeeds if t expands to something that looks like an identifier (i.e., a label or equ).
%ifstr t succeeds if t expands to something that looks like a string literal.
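These tests let a single macro behave differently depending on what kind of argument it is given. A sketch, in which the macro name emit and the label some_label are hypothetical:

```nasm
; emit is a hypothetical macro that stores its argument in the current section.
%macro emit 1
%ifstr %1
    db %1, 0        ; string literal: store it NUL-terminated
%elifnum %1
    dq %1           ; number: store it as a qword
%elifid %1
    dq %1           ; identifier: store the address of that label
%else
%error emit expects a string, a number, or an identifier
%endif
%endmacro

section .data
some_label: db 0
emit "hello"
emit 42
emit some_label
```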
Repetition macros
To repeat some text a certain number of times, we use %rep:
%rep 10
inc rax
%endrep
This will expand into ten inc rax instructions. The argument to %rep can be a numeric expression.
%assign can be used in tandem with %rep to make “loop variables”:
%assign i 0
%rep 10
mov qword [arr + 8*i], i ; Each qword is 8 bytes wide
%assign i i+1
%endrep
This initializes a 10-qword array at address arr to 0, 1, 2, 3, … 9.
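The same pattern works in a data section, where it builds a lookup table entirely at assembly time. A small sketch (the labels squares and squares_len are invented for illustration):

```nasm
section .data
squares:
%assign i 0
%rep 8
    dq i*i          ; emits 0, 1, 4, 9, 16, 25, 36, 49
%assign i i+1
%endrep
squares_len: equ ($ - squares) / 8  ; number of qword entries (8)
```

No instructions run at program start to fill this table; the values are already present in the assembled file.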
We can “break” out of a %rep early using %exitrep:
section .data
data:
%assign i 1
%rep 100
dw i ; Words, because values above 255 do not fit in a byte
%assign i i*2
%if i > 1024
%exitrep
%endif
%endrep
This creates an array labeled data which is initialized to 1, 2, 4, 8, …, 1024.
The Context stack
The context stack is a mechanism for allowing things like macro-local labels (which do not break if the macro is expanded more than once), but which are shared by more than one macro definition. E.g., currently, a macro-local label defined in one macro cannot be referred to by another in any way; it is invisible. This makes defining fancier macros (which often require more than one definition) difficult or impossible.
To solve this problem, YASM maintains a stack of “contexts”. Labels can be created which are local to the context on the top of the stack. A new context can be pushed on top of the stack with %push, and can be removed with %pop.
Because a stack is used, fancy macros can be nested without breaking each other.
Context-local label
To create a label which is local to the current context, we write %$name. This can also be used to %define or %assign a macro whose name is local to the current context:
%define %$lm 5
%assign %$i 0
This prevents variables from interfering with other scopes.
Note that when a context is %pop-ed, all of its local labels/macros are undefined.
Example: block IF statements
Suppose we want to define a macro which will allow us to write a more natural if-like structure:
IF rax, e, rcx
...
ENDIF
where this expands into something like
cmp rax, rcx
jne .endif
...
.endif:
except that the label .endif should be generated uniquely for each IF-ENDIF. We need to define two %macro-s:
%macro IF 3
%push if
cmp %1, %3
j%-2 %$endif
%endmacro
%macro ENDIF 0
%$endif:
%pop
%endmacro
(1) When we call the IF macro, it pushes the if context onto the stack, letting us know we are inside an if.
(2) It also performs the comparison, and does the jump if the comparison fails.
(3) The ENDIF macro creates the label which is the target of the jump in (2), and removes the if context from the stack (because we are no longer inside an if).
This macro works, but it will fail horribly if we use an ENDIF with no matching IF. We can use the %ifctx conditional to check the context on the top of the stack, and issue an error if we are not inside an IF:
%macro ENDIF 0
%ifctx if
%$endif:
%pop
%else
%error Expected IF before ENDIF
%endif
%endmacro
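Because each IF pushes its own context, the pair can be nested. The following sketch repeats the two macro definitions so that it stands alone; the registers and the instruction in the inner body are arbitrary choices for illustration:

```nasm
%macro IF 3
%push if
    cmp %1, %3
    j%-2 %$endif    ; skip the body when the condition is false
%endmacro

%macro ENDIF 0
%ifctx if
%$endif:
%pop
%else
%error Expected IF before ENDIF
%endif
%endmacro

IF rax, e, rcx          ; outer context: cmp rax, rcx / jne outer endif
    IF rbx, l, rdx      ; inner context: cmp rbx, rdx / jnl inner endif
        inc r8          ; runs only if rax == rcx and rbx < rdx
    ENDIF               ; defines the inner label, pops the inner context
ENDIF                   ; defines the outer label, pops the outer context
```

Each %$endif refers to the context on top of the stack at the time it is expanded, so the inner and outer labels never collide.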
We could use a similar technique to define a DO-WHILE macro for iteration:
%macro DO 0
%push do_while
%$do:
%endmacro
%macro WHILE 3
%ifctx do_while
cmp %1, %3
j%+2 %$do ; Jump back to DO while the condition holds
%pop
%else
%error Expected DO before WHILE
%endif
%endmacro
This can be used as
mov rax, 0
DO
mov qword [arr + 8*rax], rax
inc rax
WHILE rax, le, 100
These loops/ifs can even be nested within each other, so long as they use different registers.
Another example: the Microsoft assembler (MASM) uses PROC/ENDP directives to mark the beginning and end of functions. We can define a similar PROC/ENDPROC pair of macros, used like this:
PROC myfunction
... Stuff
ENDPROC
In MASM, these are needed to make labels inside functions local to the function; YASM doesn’t need this, because we have local labels, but we can still define them for compatibility. We can even add some error checking, so that ENDPROC without PROC, or nested PROCs, is an assembly-time error:
%macro PROC 1
%ifnctx proc
%push proc
%{1}:
%else
%error Found PROC without preceding closing ENDPROC
%endif
%endmacro
%macro ENDPROC 0
%ifctx proc
%pop
%else
%error Found ENDPROC without preceding PROC
%endif
%endmacro