Here we’re going to learn about assembly on a different family of CPUs: the Z80 family. The Z80 CPUs were originally cloned from the Intel 8080 CPU, which was a distant ancestor of our modern Intel CPUs, so some features will seem familiar.
The Z80 is best known as the CPU (with some changes) used in the original GameBoy and GameBoy Color, and as the CPU used in the TI graphing calculators. The Z80 was intended to be binary compatible with the Intel 8080: the opcodes are the same (meaning that software written for the Intel 8080 can run on a Z80 without being reassembled, and sometimes vice versa, provided the program avoids the Z80’s additional instructions), but due to copyright issues, the assembly mnemonics (the instruction names and the way registers are written) are different.
Registers
The Z80 has 8 or 5 general-purpose registers depending on how you look at things:
- The registers A, F, B, C, D, E, H, and L can be used individually as 8-bit registers.
- The register pairs BC, DE, and HL can be used as 16-bit registers. As you might expect, B is the high byte of BC, while C is the low byte.
- Somewhat strangely, the A register can be combined with the flags register F as a 16-bit register AF.
- The registers IX and IY are 16-bit index registers, used with memory operands. HL is similarly used for memory operands.
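To make the pairing concrete, here is a minimal sketch in Python (used purely as a modeling language; the helper names are made up for illustration) of how two 8-bit registers read as one 16-bit pair:

```python
def make_pair(high, low):
    """Combine two 8-bit register values into one 16-bit pair value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_pair(value):
    """Split a 16-bit pair value into its (high, low) 8-bit halves."""
    return (value >> 8) & 0xFF, value & 0xFF


# With B = 0x12 and C = 0x34, the pair BC reads as 0x1234.
bc = make_pair(0x12, 0x34)
b, c = split_pair(bc)
```

Writing to BC therefore changes both B and C, and vice versa; they are two views of the same storage.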
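As on x86, the SP register points to the top of the stack, and the PC register (the program counter, the Z80’s counterpart of the x86 IP register) tracks the current position in the program. It is convenient to think of PC as pointing to the instruction being executed, but strictly speaking the hardware advances PC as it fetches each instruction byte, so, as with IP on x86, PC has already moved past an instruction by the time that instruction executes.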
The F (flags) register contains the flags: Sign, Zero, Half-Carry, Parity/Overflow, Add/Subtract, and Carry. The A and F registers can be combined in a few instructions into a single 16-bit register AF.
As a bit of foresight, the Z80 has an interrupt vector register I, which holds the high byte of the address of the interrupt vector table (used in interrupt mode 2); in Intel CPUs of this era, the address of the interrupt vector was fixed in memory.
Unlike x86, the Z80 has a set of shadow registers: a set of additional registers A', F', B', C', D', E', H', and L'. You cannot access these directly, but you can swap the main and shadow register sets with just two instructions (ex af, af' swaps AF, and exx swaps BC, DE, and HL). This provides an easy, but limited, way to save the values of all the registers and then restore them later (but, unlike the stack, you can only save a single set of register values).
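As a rough illustration of how the shadow set behaves, the following Python sketch models the Z80’s two exchange instructions, ex af, af' and exx. The class and method names are hypothetical, and only the register pairs are modeled:

```python
class Z80Registers:
    """Toy model of the main and shadow register pairs (nothing else)."""

    def __init__(self):
        self.main = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}
        self.shadow = {"AF": 0, "BC": 0, "DE": 0, "HL": 0}

    def ex_af_af(self):
        # ex af, af': exchange AF with its shadow copy
        self.main["AF"], self.shadow["AF"] = self.shadow["AF"], self.main["AF"]

    def exx(self):
        # exx: exchange BC, DE, and HL with their shadow copies
        for pair in ("BC", "DE", "HL"):
            self.main[pair], self.shadow[pair] = self.shadow[pair], self.main[pair]


regs = Z80Registers()
regs.main["HL"] = 0x1234
regs.exx()                 # "save" HL in the shadow set
regs.main["HL"] = 0xBEEF   # scratch work in the now-active set
regs.exx()                 # swap back: the original HL is restored
```

Note that the second exx leaves the scratch value sitting in the shadow set, which is why only one set of values can be saved this way.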
Instruction set
Most Z80 assemblers use a syntax more like the Intel syntax than the AT&T syntax. This means that instructions have the form
INSTRUCTION DESTINATION, SOURCE
and memory operands are written in (parens).
You can find a complete reference to the Z80 instruction set here.
mov has been renamed to ld (“load”), but it can be used to write values into registers or memory. E.g.,
ld a, 5 ; Set register A = 5
ld b, 7 ; Set register B = 7
add a, b ; Set A = A + B
ld [Output], a ; Write A into address Output
(Some Z80 assemblers write memory operands in parentheses (), while others, such as the one we’re going to use, use the more familiar [].)
The add instruction can only use a, hl, ix, or iy as its destination.
Memory operands can use immediate addresses or you can do something like
ld [hl], a ; Write A into the address contained in HL
If you want to use a register plus a constant offset in a memory operand, the register must be either IX or IY, and the offset must fit in a signed 8-bit value (-128 to +127):
ld [ix + 4], a ; Write A into the address IX + 4
As with x86, both operands to ld must be the same size: both 8-bit or both 16-bit. Immediate operands will be promoted to 16-bit as needed.
Z80 instructions are 1 to 4 bytes in size.
Conditional jumps
There are two conditional jump instructions:
jp CC, Target ; Conditional absolute jump
jr CC, Target ; Conditional relative jump
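where CC is a condition code. The main differences between the two are that jp is more flexible (it accepts all of the condition codes and can jump anywhere in the address space), while jr is more limited (it accepts only the Z, NZ, C, and NC conditions, is slower on the Z80 when the jump is taken, and can only reach targets within a signed 8-bit offset, -128 to +127 bytes, of the instruction after it) but takes up only 2 bytes in the program as opposed to jp’s 3 bytes.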
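The range limit on jr comes from its encoding: the second byte is a signed 8-bit offset measured from the end of the 2-byte instruction. A small Python sketch (the function name is made up for illustration) shows which targets are reachable:

```python
def jr_offset(instr_addr, target):
    """Return the signed offset byte for a jr located at instr_addr,
    or None if the target is out of range and jp must be used instead."""
    offset = target - (instr_addr + 2)  # relative to the byte after the jr
    return offset if -128 <= offset <= 127 else None


tight_loop = jr_offset(0x0100, 0x0100)   # a jr that jumps to itself
too_far = jr_offset(0x0100, 0x0200)      # 254 bytes ahead: not encodable
```

An assembler performs this same check and reports an error when the target is too far away.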
A more specialized conditional jump instruction, akin to loop in x86, is djnz, which stands for “Decrement, Jump if Non-Zero”:
djnz Target ; Decrement B, jump to Target if B ≠ 0
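One consequence of “decrement first, then test” is worth spelling out. The following Python sketch (a model only; the function name is hypothetical) counts how many times a loop body that ends in djnz runs for a given starting value of B:

```python
def djnz_iterations(b):
    """Count executions of a loop body that ends with djnz, starting from B = b."""
    count = 0
    while True:
        count += 1            # the loop body runs
        b = (b - 1) & 0xFF    # djnz decrements B with 8-bit wraparound
        if b == 0:            # ...and falls through once B reaches 0
            break
    return count
```

Starting with B = 0 does not skip the loop: B wraps around to 255 on the first decrement, so the body runs 256 times.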
Functions
call is used to call functions; as on x86, it pushes the return address and then jumps to the function address. (The call instruction takes up 3 bytes, so the return address is the address of the call instruction itself plus 3.) Similarly, ret returns from a function by popping the return address off the stack into PC, which resumes execution there.
Unlike x86, the Z80 has conditional return instructions:
ret p ; Return if positive (sign flag = 0)
The available conditions are C/NC (carry set/not set), M/P (sign = 1/0), Z/NZ (zero = 1/0), and PE/PO (parity even/odd, i.e., parity flag = 1/0). These are the same conditions used with the conditional jump instruction jp CC.
push and pop can be used to push/pop registers onto/off of the stack. Only 16-bit registers can be pushed/popped, so if you want to push A, you have to push it in combination with the flags register:
push af
Block transfer
Intel processors did not yet have the string instructions, let alone the string repetition prefixes, but the Z80 did have a minimal form of these in its block transfer opcodes. These allow a block of data to be copied from an I/O port or memory, to an I/O port or memory (memory-to-memory transfers are allowed).
The block I/O instructions use HL as the memory address to read/write, and B as the count; the memory-to-memory instructions use BC as a 16-bit count instead. The INI instruction (INput and Increment) reads a byte from the I/O port whose number is in C, copies it to the address pointed to by HL, increments HL, and decrements B. All of the block transfer instructions follow this pattern:
Instruction | Operation |
---|---|
IND (input and decrement) | Input to (HL), --HL, --B |
INI (input and increment) | Input to (HL), ++HL, --B |
INDR (input, decrement, and repeat) | IND, until B == 0 |
INIR (input, increment, and repeat) | INI, until B == 0 |
LDD (load and decrement) | Copy (HL) → (DE), --HL, --DE, --BC |
LDI (load and increment) | Copy (HL) → (DE), ++HL, ++DE, --BC |
LDDR (load, decrement, and repeat) | LDD, until BC == 0 |
LDIR (load, increment, and repeat) | LDI, until BC == 0 |
CPI (compare and increment) | Compare A with (HL), ++HL, --BC |
CPD (compare and decrement) | Compare A with (HL), --HL, --BC |
CPIR (compare, inc. and repeat) | CPI, until A == (HL) or BC == 0 |
CPDR (compare, dec. and repeat) | CPD, until A == (HL) or BC == 0 |
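As a sketch of what the repeating variants do, here is a Python model of LDIR, following the “copy, ++HL, ++DE, --BC, repeat until BC == 0” pattern from the table. The function and the 64KB bytearray standing in for memory are illustrative assumptions, not a cycle-accurate emulation:

```python
def ldir(memory, hl, de, bc):
    """Model LDIR: copy bytes from memory[hl...] to memory[de...], ascending.

    Returns the final (HL, DE, BC) register values.
    """
    while True:
        memory[de] = memory[hl]   # copy (HL) -> (DE)
        hl = (hl + 1) & 0xFFFF    # ++HL
        de = (de + 1) & 0xFFFF    # ++DE
        bc = (bc - 1) & 0xFFFF    # --BC
        if bc == 0:               # repeat until BC == 0
            break
    return hl, de, bc


mem = bytearray(0x10000)          # 64KB address space, all zeros
mem[0x1000:0x1005] = b"HELLO"
hl, de, bc = ldir(mem, 0x1000, 0x2000, 5)
```

Because the test happens after the decrement, starting with BC = 0 copies a full 64KB rather than nothing.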
Note that on x86, the string repetition instructions can decrement; this is done by setting the DF direction flag, rather than by using a separate instruction.
Arithmetic instructions
The arithmetic instructions are ADD, ADC, SUB, and SBC. The *C variants include the carry flag in the operation, which lets a sequence of 8-bit operations implement 16-bit (or wider) arithmetic. No hardware support for multiplication or division is provided (this was common in CPUs of this class from this era). Multiplication by powers of 2 can be done via shifts, of course.
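To see why the carry variants matter, here is a small Python sketch of a 16-bit addition built from two 8-bit additions, the way an ADD on the low bytes followed by an ADC on the high bytes would do it. The helper names are hypothetical:

```python
def add8(x, y, carry_in=0):
    """8-bit add with carry in; returns (8-bit result, carry out)."""
    total = (x & 0xFF) + (y & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add16_via_bytes(x, y):
    """Add two 16-bit values one byte at a time; returns (result, carry out)."""
    lo, carry = add8(x & 0xFF, y & 0xFF)                         # ADD: low bytes
    hi, carry = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)    # ADC: high bytes + carry
    return (hi << 8) | lo, carry


result, carry_out = add16_via_bytes(0x12FF, 0x0001)
```

The same chaining extends to 24-bit or 32-bit values by adding more ADC steps.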
The GameBoy Z80 variant
The GameBoy used a custom variant of the Z80, existing somewhat “in between” the
Intel 8080 and the Z80: the Z80-specific registers were removed, but some
of the Z80-specific instructions (bit shifts and rotates, which were not
yet part of the Intel architecture) were retained. All of the port I/O
instructions were removed, as the GameBoy used memory-mapped I/O exclusively. A few specialized variants of LD were added that did not exist in either the Z80 or the Intel 8080, to enable more efficient memory-mapped I/O.
The GameBoy CPU also lacks the “shadow” register set, and with it, the instructions for manipulating it. The IX and IY registers are also missing, which means that the HL register is primarily used for indexing into memory.
16-bit loads/stores from memory are not supported; they have to be done as two separate 8-bit instructions.
Some additions to the CPU:
- C can be used as an index in memory operands: ld [$ff00 + c], a
- The new swap instruction can swap the high and low 4 bits of any byte-sized register.
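Both additions are simple enough to model in a couple of lines of Python (the function names are made up for illustration):

```python
def swap_nibbles(value):
    """Model swap: exchange the high and low 4 bits of an 8-bit value."""
    return ((value << 4) | (value >> 4)) & 0xFF


def high_address(c):
    """Address accessed by a [$ff00 + c] memory operand."""
    return 0xFF00 + (c & 0xFF)
```

Since C is 8 bits wide, the [$ff00 + c] form can reach exactly the addresses 0xff00 through 0xffff, which is where the memory-mapped I/O registers live.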
The RGBDS assembler and tools
We’ll be using the RGBDS assembler and tools to assemble code for the GameBoy. A copy has already been installed on the server, although, obviously, you won’t be able to run a GameBoy emulator on the server.
Although the architecture is different, the assembly syntax supported by RGBASM is very similar to the YASM/Intel syntax we are familiar with. The main differences are
- Hexadecimal values start with $, not 0x.
- Binary values start with %, not 0b.
- “GameBoy graphics” can be embedded directly using a backtick followed by eight digits, as in `01012323. Because each pixel on the GameBoy can be one of four shades of gray, the values 0 to 3 are used, one digit per pixel. Each value is two bits (00, 01, 10, 11), but the two bits are not stored side by side: the high bits of the eight pixels are packed into one byte and the low bits into another. Thus, the above corresponds to the bytes %00001111 and %01010101, or $0f55.
- Like YASM, RGBASM allows assembly-time computation of expressions involving constants and arithmetic operators.
- The SECTION directive exists, but the section types name regions of the GameBoy address space; the ones we will use are ROM0 (cartridge ROM bank 0) and ROMx (cartridge ROM bank x). The SECTION directive supports additional options for specifying the address of the section, if you need more control.
- As in YASM, labels end with :, and local labels start with a period (.). Labels must begin in the first column of a line; because of this, instructions must be indented by at least one space or tab.
- Also as in YASM, equ can be used to define constants. However, the symbol name must not be followed by a : (in YASM, the : is optional):
Addr equ $ff40
- While Z80 assembly traditionally writes memory operands in (), RGBASM uses Intel-style [].
- For the ldi-family of instructions (load-and-increment), RGBASM supports two syntaxes. The usual way:
ldi [hl], a ; Write A into address hl, increment hl
and an alternate syntax:
ld [hli], a ; Write A into address hl, increment hl
The hli “register” indicates that register HL should be used and then incremented. This special syntax is only allowed with the ld instruction (you cannot just write hli anywhere and expect it to be incremented).
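The graphics notation is easier to follow with a worked example. The Python sketch below packs eight pixel digits into a 16-bit value, assuming the planar layout described in the RGBASM documentation, where `01012323 is equivalent to $0F55. The function name is hypothetical:

```python
def encode_gfx_row(pixels):
    """Pack eight pixel digits (each 0-3) into the 16-bit value for one tile row.

    The high bit of every pixel goes into the high byte and the low bit of
    every pixel goes into the low byte, leftmost pixel in the top bit.
    """
    if len(pixels) != 8:
        raise ValueError("a tile row is exactly 8 pixels wide")
    high_plane = 0
    low_plane = 0
    for digit in pixels:
        value = int(digit)
        high_plane = (high_plane << 1) | (value >> 1)
        low_plane = (low_plane << 1) | (value & 1)
    return (high_plane << 8) | low_plane


row = encode_gfx_row("01012323")
```

Each 8x8 tile is eight such rows, which is why a tile occupies 16 bytes of video RAM.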
You can find the full documentation on the RGBASM syntax here.
Hello World on a Game Boy (emulator)
The GameBoy hardware lacks any kind of text output to the screen, so we cannot actually print “Hello, world!”. Furthermore, it does not even allow direct access to the individual pixels on the screen; instead, it uses a tiled screen mode: the screen is broken into tiles, where each tile is 8x8 pixels. A tileset is stored in memory, and each on-screen tile is mapped to one of the tiles in the set. This saves a significant amount of memory over storing the entire (pixel) contents of the screen:
160 × 144 × 2 = 46080 bits = 5760 bytes
(160/8) × (144/8) × 8 = 2880 bits = 360 bytes
That is a 16x reduction in memory usage, quite a bit for a system with only 8KB of RAM in the first place.
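The arithmetic above can be double-checked with a few lines of Python. The constant names are ours; the numbers are the ones used in the text:

```python
SCREEN_W, SCREEN_H = 160, 144   # screen size in pixels
BITS_PER_PIXEL = 2              # four shades of gray
TILE_SIZE = 8                   # tiles are 8x8 pixels
BITS_PER_TILE_INDEX = 8         # one byte per on-screen tile

# Storing every pixel of the screen directly:
framebuffer_bytes = SCREEN_W * SCREEN_H * BITS_PER_PIXEL // 8

# Storing one tile index per on-screen tile instead:
tiles_on_screen = (SCREEN_W // TILE_SIZE) * (SCREEN_H // TILE_SIZE)
tilemap_bytes = tiles_on_screen * BITS_PER_TILE_INDEX // 8

reduction = framebuffer_bytes // tilemap_bytes
```

This counts only the on-screen tilemap; the tileset itself also takes memory (16 bytes per 8x8 tile at 2 bits per pixel), but it is shared by every tile position that uses it.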
A GameBoy runs software off cartridges, which contain the game executable stored in ROM. The layout of the cartridge ROM must match what the system expects in order for the cartridge to boot at all.
Cartridge memory map
The memory address space of the GameBoy is laid out like this:
Address range | Usage |
---|---|
0000-3fff | Cartridge ROM bank 0 |
4000-7fff | Cartridge ROM bank N |
8000-8fff | Video RAM: tiles/sprites |
9000-97ff | Video RAM: alternate tiles |
9800-9bff | Video RAM: Tilemap 1 |
9c00-9fff | Video RAM: Tilemap 2 |
a000-bfff | Cartridge RAM |
c000-cfff | Working RAM bank 0 |
d000-dfff | Working RAM bank N |
e000-efff | Mirror of working RAM bank 0 |
fe00-fe9f | Sprite attributes |
fea0-feff | Reserved |
ff00-ff7f | Memory mapped I/O |
ff80-fffe | Stack space |
ffff | Interrupt enable |
A note about “banking”: A cartridge may have more than 0x8000 (= 32KB) bytes of ROM, but only 32KB is directly accessible at any one time. To enable access to more ROM, the range 0x4000 - 0x7fff is switchable between different “banks”, where each bank is a 16KB chunk of cartridge ROM. You can access bank 0 at the same time as any of the other banks, but because you must switch to access the other banks, you cannot access (for example) banks 2 and 3 at the same time.
A similar system is used to provide access to extended cartridge RAM, via the range 0xa000-0xbfff, and to the additional working RAM banks, via the range 0xd000-0xdfff.
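To illustrate how ROM banking maps addresses, here is a Python sketch that converts a CPU address plus the currently selected bank into an offset within the ROM image. It assumes the 16KB banks are stored back to back in the file, and the function name is hypothetical:

```python
BANK_SIZE = 0x4000  # each ROM bank is 16KB


def rom_file_offset(bank, address):
    """Map a CPU address in cartridge ROM to an offset in the ROM image,
    given which bank is currently switched into 0x4000-0x7fff."""
    if 0x0000 <= address <= 0x3FFF:
        return address                                 # bank 0 is always visible
    if 0x4000 <= address <= 0x7FFF:
        return bank * BANK_SIZE + (address - 0x4000)   # switchable window
    raise ValueError("address is not in cartridge ROM")
```

For example, with bank 3 selected, CPU address 0x4123 reads the byte at offset 0xc123 of the ROM image, while address 0x0150 reads offset 0x0150 no matter which bank is selected.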
The range 0xff00-0xff7f is used instead of hardware ports (the GameBoy CPU lacks the in/out instructions) to access hardware. Some relevant values in this range:
- 0xff00: Joypad input
- 0xff42, 0xff43: Tilemap scroll origin Y, X
- 0xff40: LCD control register (can be used to turn the screen on and off, among other things)
- 0xff41: LCD status register
- 0xff47, 0xff48, 0xff49: Color palette data
- 0xff4a: “Window” position Y
- 0xff4b: Window position X plus 7
Cartridge ROM layout
On the GameBoy, the first 256 bytes of the cartridge (addresses 0x0000-0x00ff) are reserved for the restart and interrupt handlers. We could either fill these with dummy handlers (reti to immediately return from an interrupt) or just disable interrupts entirely. We do the latter.
INCLUDE "hardware.inc"
; Prior to address $100 is the interrupt table; we would have to set this up,
; except that we disable interrupts instead.
SECTION "GB Header", ROM0[$100]
EntryPoint:
di ; Disable interrupts
jp Begin ; Jump to executable start
; Header data...
The next portion of the cartridge must be laid out exactly as written: a one-byte instruction (conventionally a nop; we use di) followed immediately by a jp over a block of data. This data encodes the Nintendo logo displayed when the system starts up. If the system does not find the logo exactly at the address expected, the system will not start. (This system was Nintendo’s way of trying to thwart “unauthorized” cartridges: the logo was copyrighted by Nintendo, so in order to use it in a cartridge, you would need a Nintendo license, otherwise they could sue you for violating the copyright on their logo. However, when this system was actually tested in court, Nintendo lost, as the court ruled that, because creating “compatible” cartridges was explicitly allowed by law, and “copying” the logo was required for a cartridge to be compatible, this copying did not infringe on Nintendo’s copyright.)
; Nintendo logo: 0104 - 0133
; This will be added by RGBFIX; here we only reserve its 48 bytes
ds 48
; 0134-0142: Game name (upper-case)
db "CSCI241GAME.COM"
; GameBoy (00) /GameBoy Color (80) flag
db $00
; Manufacturer code
db 0,0
; Super GameBoy flag
db 0
; Cartridge type
db 0
; Cartridge ROM size
db 2 ; 0 = 32KB, 1 = 64KB, 2 = 128KB
; Cartridge RAM size
db 3 ; 3 = 32KB
; Country code
db 1 ; 1 = USA
; License code (must be 0x33)
db $33
; Cartridge version num.
db 0
The header portion of a GameBoy ROM ends with a pair of checksum values: a one-byte checksum computed over the header bytes, and a two-byte checksum computed over the bytes of the entire ROM. We will use an external tool to update these to the correct values once we have finished writing our ROM.
; Header checksum: sum of bytes 0134-014c
db 0
; ROM checksum: sum of all ROM bytes
dw 0
When we assemble our program, we will use the rgbfix tool both to add the logo image and to update the checksums:
rgbfix -p 0 -f lhg mygame.rom
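If you are curious what the header checksum actually is, the following Python sketch computes it the way the community-maintained Pan Docs describe: starting from 0, each header byte from 0134 to 014c is subtracted, along with an extra 1. This is not taken from the rgbfix source, and the function name is hypothetical:

```python
def header_checksum(rom):
    """Compute the one-byte header checksum over ROM bytes 0x0134-0x014c."""
    checksum = 0
    for address in range(0x0134, 0x014D):   # 0x014c inclusive
        checksum = (checksum - rom[address] - 1) & 0xFF
    return checksum


blank_rom = bytes(0x150)                    # an all-zero header, for illustration
blank_checksum = header_checksum(blank_rom)
```

The boot code on the console performs the same computation and refuses to start the cartridge if the result does not match the stored byte, which is why we let the tool fill it in.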
After the header, the executable portion of the cartridge proper begins:
Begin:
ld sp, $ffff ; Setup stack pointer
Setting up the screen
In order to display anything, we need to set up the screen to display a tilemap, a mapping from on-screen tile positions to tile indexes (in the tileset). Furthermore, because the tilemap can be scrolled around (as a character moves around), we must set the tilemap origin to (0,0):
TilemapY equ $ff42
TilemapX equ $ff43
xor a ; Set a = 0
ld hl, TilemapY ; Addr. 0xff42 = Tilemap origin Y, 0xff43 = X
ld [hl], a ; Set address 0xff42 = 0
inc hl
ld [hl], a ; Set address 0xff43 = 0
Anything that involves changing the tilemap or the tileset should be done with the screen turned off, to prevent graphical artifacts. This is because we will be copying data to video RAM. In order to turn off the screen, we have to wait for the next vertical refresh and then clear bit 7 of address 0xff40.
WaitLCD:
ld a, [$ff44] ; Load value at addr. 0xff44 into A (scanline)
cp 145 ; Compare A with 145
jr nz, WaitLCD ; Loop until equal
; Now the vertical refresh is in scanline 145, off the bottom of the visible
; screen. We can turn off the screen without it producing garbage.
ld hl, $ff40 ; Addr 0xff40 = LCD control
res 7, [hl] ; Reset bit 7 of (0xff40)
; Now the screen is off.
Note that we’ll have to reverse this process to turn it back on again once we have everything configured.
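The res and set instructions each change a single bit and leave the others alone, which matters here because the other bits of the LCD control register hold unrelated settings. A Python sketch of the two operations (the function names and the example register value are assumptions for illustration):

```python
LCD_ENABLE_BIT = 7  # bit 7 of the LCD control register at 0xff40


def res_bit(bit, value):
    """Model res: clear one bit of an 8-bit value."""
    return value & ~(1 << bit) & 0xFF


def set_bit(bit, value):
    """Model set: set one bit of an 8-bit value."""
    return (value | (1 << bit)) & 0xFF


lcdc = 0x91                                   # example register contents (assumed)
screen_off = res_bit(LCD_ENABLE_BIT, lcdc)    # bit 7 cleared, other bits untouched
screen_on = set_bit(LCD_ENABLE_BIT, screen_off)
```

Turning the screen off and back on this way returns the register to exactly its original value.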
In order to display any tiles from a tileset, we have to copy it into the tileset part of video memory, which starts at address $9000. Because tiles can technically contain four “colors” (four shades of gray), each tile is stored as two “bitplanes” (because 4 different values require 2 bits to represent). The loop below writes every byte of the font to both bitplanes, so each font pixel ends up with the value 0 or 3, and we don’t need to worry about the bitplanes any further.
ld de, Tileset ; Load address of tileset (from cart.)
ld hl, $9000 ; Load address of video RAM
ld bc, TilesetEnd-Tileset ; Size of tileset in bytes
CopyTilesetLoop:
ld a, [de] ; Load byte from tileset
ldi [hl], a ; Write to bitplane 1
ldi [hl], a ; Write to bitplane 2
inc de
dec bc
; This checks to see if BC == 0
ld a, b
or c
jr nz, CopyTilesetLoop
To compare BC to 0, we use a trick, because the cp instruction is only 8-bit and dec bc does not update the flags: we bitwise-OR the high and low bytes of BC with each other. If the result is 0, it can only be because both B and C are zero, indicating that BC is 0. (The or instruction can only bitwise-OR with the A register, which is why we first copy B into A.)
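The claim that the OR of the two halves is zero exactly when the whole 16-bit value is zero is easy to confirm by brute force. A short Python check (the function name is made up for illustration):

```python
def bc_is_zero(b, c):
    """Model of `ld a, b` / `or c`: the zero flag is set iff (B | C) == 0."""
    return (b | c) == 0


# Compare the trick against the direct 16-bit test for every possible B and C.
mismatches = [
    (b, c)
    for b in range(256)
    for c in range(256)
    if bc_is_zero(b, c) != (((b << 8) | c) == 0)
]
```

The list of mismatches comes out empty, so the two tests agree for all 65536 values of BC.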
In order to display our tileset, we have to configure some colors:
- The color to use for the background is stored at address 0xff47. This byte defines the colors for a few other things, as well.
ld a, %00011011
ld hl, $ff47
ldi [hl], a ; Background colors
- The colors to use for sprites are stored starting at address 0xff48. We can use the same colors we used for the background, together with their complement:
ldi [hl], a ; Sprite set 1 colors
cpl ; a = bitwise NOT of a (invert colors)
ldi [hl], a ; Sprite set 2 colors
(Note that the procedure for the GameBoy Color is different, as that uses actual colors.)
“Printing” text
With our tileset (containing text characters) copied into video RAM, “printing” text is just a matter of setting the correct entries in the tilemap to point to the characters from the tileset.
The tilemap starts at address 0x9800, and although the screen is only 20 tiles wide, the tilemap is 32 tiles wide. The extra 12 columns are used for scrolling. The tilemap is stored using row-major order, so to compute the address of a tile at location (x,y) use
Address = 0x9800 + x + y * 32
The multiplication by 32 must be done via a shift (shift left by 5), because the Z80 does not have a multiplication instruction.
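The address formula, with the multiplication written as a shift, looks like this in Python (the function name is hypothetical):

```python
TILEMAP_BASE = 0x9800   # start of the tilemap in video RAM
TILEMAP_SHIFT = 5       # the tilemap is 32 = 2**5 tiles wide


def tile_address(x, y):
    """Address of the tilemap entry at column x, row y (row-major order)."""
    return TILEMAP_BASE + x + (y << TILEMAP_SHIFT)


top_left = tile_address(0, 0)
second_row = tile_address(0, 1)
bottom_right_visible = tile_address(19, 17)   # last tile of the 20x18 visible area
```

Moving down one row always adds 32 to the address, even though only the first 20 entries of each row are visible when the scroll origin is (0,0).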
ld hl, $9800 ; Starting address
ld de, HelloWorldStr ; "Text" (tile indexes) to display
.copyString:
ld a, [de] ; Read byte from string
ldi [hl], a ; Write to destination and increment address
inc de ; Increment text index
and a ; Check for terminating NUL
jr nz, .copyString ; Loop until NUL
Finally, we’ll also want to shut the sound system down, so it isn’t making noise:
xor a
ld [$ff26], a ; Sound on/off control
Since we are done writing to video RAM, we can now safely turn the screen back on:
ld hl, $ff40 ; Addr 0xff40 = LCD control
set 7, [hl] ; Set bit 7 of (0xff40)
(It’s important that we wait until after we are done writing to the tilemap to turn the screen on; if you write to the tilemap while the screen is active, some tiles in the map will randomly not be displayed.)
Note that because the screen is turned off, there’s no need to wait for a vertical refresh before turning it back on.
Just like we did in our bootloader, we’ll end with an infinite loop, so that our text continues to be displayed:
.loopForever:
jr .loopForever
Data: Font and Text
The last thing we need to do is embed the font data and the string we want to use. I’m using a font included with RGBDS, for convenience. As in our bootloader, we treat the area after the infinite loop as a “data” section, because it is unreachable.
Tileset:
INCBIN "font.chr"
TilesetEnd:
HelloWorldStr:
db "Hello, world!", 0
Final assembly
To assemble this into a working cartridge, we assemble it into an object file using
rgbasm -o game.o game.asm
which we then “link” to produce a cartridge file:
rgblink -o game.gb game.o
The linker doesn’t actually link the object file with anything; it just places the sections in the cartridge ROM.
Finally, we use rgbfix to add the logo and update the checksums:
rgbfix -v -p 0 game.gb
At this point, you should be able to load the “cartridge” game.gb into your favorite emulator and start it up. If you have access to a physical flash cartridge, you can run it on an actual GameBoy. (I’m running it in Visual Boy Advance, which, among other things, allows remote GDB connections for debugging your code, similar to the way we could connect to QEMU from GDB to debug our bootloader.)
Hello World on a TI-83 Graphing Calculator (emulator)
A number of TI graphing calculators ran Z80 processors, hence we can write Z80 assembly for them. Unlike the GameBoy, TI calculators run a “stock” Z80 with no modifications.
We’re going to use the Pasmo assembler and the Oysterpac tool for “packing” raw binary data into TI calculator executables. The two-step process is simply to first assemble our source code into a binary file:
pasmo file.asm file.bin
and then run Oysterpac on it to produce a packed executable:
oysterpac file.bin file.83p
This will produce a TI executable named file, which can be run in a calculator emulator or transferred via link cable to an actual calculator.
Pasmo assembler
The syntax used by Pasmo is similar to that used by RGBASM, with a few alterations:
- Memory operands are written in (parens) instead of [brackets].
- Hexadecimal constants start with # instead of $ or 0x.
- Binary constants still start with %.
- Labels begin in column 1 but do not end with a colon.
- Similarly, equ definitions do not use a colon:
Symbol equ expression
- Instructions must be indented.
- As in YASM, $ is defined to be the current address.
Hello, world!
The GameBoy Hello World example was rather lengthy, because the GameBoy a) is not intended to display text, so we had to set up a font, and b) provides very little in the way of system routines to help do anything. The TI system is the opposite: except when graphing, it mostly displays text, and it provides a great deal of system routines to help with everything, including printing. Hence, Hello World for a TI-83+ is much shorter.
Before we start with the code proper, it’s helpful to define some macros and definitions:
- In order to call system routines, we use a “restart” instruction, which behaves much like a software interrupt. All of the TI-83+ system routines are under restart 0x28, and the two bytes following the rst instruction give the “subfunction address” specifying which system routine to call.
macro rom_call,addr
rst #28
dw addr
endm
- TI-83+ programs running under the Ion shell start at address 0x9d95.
ProgStart equ #9d95
- Finally, we define the addresses of a number of useful system routines, to be used with the rom_call macro we defined earlier:
_clearlcd equ #4543
_newline equ #452E
_puts equ #450A
;;
;; Executable code
;;
org ProgStart
; Ti-83+/Ion Shell programs must start with these two bytes:
db #bb, #6d
; Clear the LCD (system routine)
rom_call _clearlcd
; Set the "pen" column where text will be printed
ld hl, 0
ld (#86D7), hl
ld hl, msg
; Print text, followed by newline
rom_call _puts
rom_call _newline
ret
msg
db "Hello, world!", 0