Here we’re going to learn about assembly on a different family of CPUs: the Z80 family. The Z80 was designed as a software-compatible extension of the Intel 8080 CPU, which was a distant ancestor of our modern Intel CPUs, so some features will seem familiar.

The Z80 is most famous for being the CPU (with some changes) in the original GameBoy and GameBoy Color, and for being the CPU in many TI graphing calculators. The Z80 was intended to be binary compatible with the Intel 8080: the 8080's opcodes are all valid Z80 opcodes (meaning that software written for the Intel 8080 can run on a Z80 without being reassembled, and sometimes vice versa), but due to copyright issues, the instruction names (mnemonics) are different.

Registers

The Z80 has 8 or 5 general-purpose registers depending on how you look at things:

As on x86, the SP register points to the top of the stack, and the PC register points to the current instruction (note that this is a difference from x86, where the IP register points to the next instruction). The F (flags) register contains the flags: Sign, Zero, Half-carry, Parity/Overflow, Add/Subtract, and Carry. The A and F registers can be combined in a few instructions into a single 16-bit register AF.

As a bit of foresight, the Z80 has an interrupt vector register I, which holds the high byte of the address of the interrupt vector table (used in interrupt mode 2); in Intel CPUs of this era, the address of the interrupt vector was fixed in memory.

Unlike x86, the Z80 has a set of shadow registers: additional registers A', F', B', C', D', E', H', and L'. You cannot access these directly, but you can swap them with the main registers in a single instruction: ex af, af' swaps AF with AF', and exx swaps BC, DE, and HL with their shadow copies. This provides an easy, but limited, way to save the values of the registers and then restore them later (but, unlike the stack, you can only save a single set of register values).

Instruction set

Most Z80 assemblers use a syntax more like the Intel syntax than the AT&T syntax. This means that instructions have the form

    INSTRUCTION DESTINATION, SOURCE

and memory operands are traditionally written in parentheses, as in (hl).

You can find a complete reference to the Z80 instruction set here.

mov has been renamed to ld (“load”), and it is used both to load values into registers and to store values into memory. E.g.,

    ld  a, 5             ; Set register A = 5
    ld  b, 7             ; Set register B = 7
    add a, b             ; Set A = A + B
    ld [Output], a       ; Write A into address Output

(Some Z80 assemblers write memory operands in parentheses (), while others, such as the one we’re going to use, use the more familiar [].)

The add instruction can only use a, hl, ix, or iy as its destination. A memory operand can be an immediate address, or you can use a register pair that holds the address:

    ld  [hl], a          ; Write A into the address contained in HL

If you want to add a constant offset to a register in a memory operand, the register must be either IX or IY, and the offset is a signed 8-bit displacement (-128 to +127):

    ld  [ix + 5], a      ; Write A into the address IX + 5

As with x86, both operands to ld must be the same size: both 8-bit or both 16-bit. Immediate operands will be promoted to 16-bit as needed.

Z80 instructions are 1 to 4 bytes in size.

Conditional jumps

There are two conditional jump instructions:

    jp CC, Target           ; Conditional absolute jump

    jr CC, Target           ; Conditional relative jump

where CC is a condition code. The main differences between the two are that jp is more flexible (it can test more conditions and is faster when the jump is taken), while jr is more limited (it can only test Z, NZ, C, and NC, is slower when the jump is taken, and can only reach targets within about ±128 bytes) but takes up only 2 bytes in the program as opposed to jp’s 3 bytes.
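The ±128 figure comes from how jr is encoded. Its second byte is a signed 8-bit displacement, which is added to the address of the instruction that follows the 2-byte jr. The Python helper below is a sketch of that arithmetic only; the name `jr_displacement` is ours and is not part of any assembler. It computes the displacement byte and rejects targets that are out of range:

```python
def jr_displacement(jr_address: int, target: int) -> int:
    """Return the displacement byte (0-255) that a 2-byte `jr` located at
    jr_address must contain in order to reach target."""
    offset = target - (jr_address + 2)   # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"target {target:#06x} is out of range for jr; use jp")
    return offset & 0xFF                 # two's-complement encoding of offset


print(hex(jr_displacement(0x0200, 0x0200)))  # 0xfe: a jr to itself is -2
print(hex(jr_displacement(0x0200, 0x0281)))  # 0x7f: farthest forward target
print(hex(jr_displacement(0x0200, 0x0182)))  # 0x80: farthest backward target
```

Measured from the jr itself, then, the reachable window runs from 126 bytes back to 129 bytes forward.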

A more specialized conditional jump instruction, akin to loop in x86, is djnz, which stands for “Decrement, Jump if Non-Zero”:

    djnz Target          ; Decrement B, jump to Target if B ≠ 0
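djnz decrements B before testing it. As a result, a loop whose counter starts at 0 runs 256 times rather than 0 times, because B wraps around to 255. The Python model below simulates the instruction's behavior (it is not Z80 code) and makes that edge case explicit:

```python
def djnz_iterations(b: int) -> int:
    """Count how many times a loop body ending in `djnz` runs when
    register B starts at b (0-255)."""
    count = 0
    while True:
        count += 1                 # the loop body executes
        b = (b - 1) & 0xFF         # djnz: decrement B (8-bit wraparound)...
        if b == 0:                 # ...and fall through once B reaches 0
            return count


print(djnz_iterations(10))  # 10
print(djnz_iterations(1))   # 1
print(djnz_iterations(0))   # 256
```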

Functions

call is used to call functions; as on x86, it pushes PC+3 and then jumps to the function address. (Remember that PC contains the address of the current instruction, i.e., the call instruction itself. The call instruction takes up 3 bytes, hence the next instruction is at PC+3.) Similarly, ret returns from a function by popping the return address off the stack into PC.
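Here is a small Python model of push, pop, call, and ret that makes the stack behavior concrete. It is our own simulation, not an emulator API, and it assumes the Z80's conventions: the stack grows downward, SP is decremented before each byte is written, and 16-bit values end up little-endian in memory, with the low byte at the lower address.

```python
class Z80Stack:
    """Minimal model of the Z80 stack: 64KB of memory plus SP and PC."""

    def __init__(self, sp: int):
        self.mem = bytearray(0x10000)
        self.sp = sp
        self.pc = 0

    def push(self, value: int) -> None:
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = (value >> 8) & 0xFF   # high byte is written first...
        self.sp = (self.sp - 1) & 0xFFFF
        self.mem[self.sp] = value & 0xFF          # ...so the low byte sits at SP

    def pop(self) -> int:
        low = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.mem[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low

    def call(self, target: int) -> None:
        self.push((self.pc + 3) & 0xFFFF)  # return address: after the 3-byte call
        self.pc = target

    def ret(self) -> None:
        self.pc = self.pop()


cpu = Z80Stack(sp=0xFFFE)
cpu.pc = 0x0150                      # pretend a `call` instruction lives at $0150
cpu.call(0x0200)
print(hex(cpu.pc), hex(cpu.sp))      # 0x200 0xfffc
cpu.ret()
print(hex(cpu.pc), hex(cpu.sp))      # 0x153 0xfffe
```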

Unlike x86, the Z80 has conditional return instructions:

    ret p               ; Return if positive (sign flag = 0)

The available conditions are C/NC (carry set/not set), M/P (sign = 1/0), Z/NZ (zero = 1/0), PE/PO (parity/overflow = 1/0). These are the same conditions used with the conditional jump instruction jp CC.

push and pop can be used to push/pop registers onto/off of the stack. Only 16-bit registers can be pushed/popped, so if you want to push A, you have to push it in combination with the flags register:

    push af

Block transfer

Intel processors of the time did not yet have the string instructions, let alone the string repetition prefixes, but the Z80 did have a minimal form of these in its block transfer instructions. These allow a block of data to be copied from an I/O port or memory to an I/O port or memory (memory-to-memory transfers are allowed).

The I/O block transfer instructions use HL as the memory address to read/write, C as the port number, and B as the count. The memory block instructions use HL as the source address, DE as the destination address, and BC as a 16-bit count. The INI instruction (INput and Increment) reads one byte from the I/O port in C, copies it to the address pointed to by HL, increments HL, and decrements B. All of the block transfer instructions follow this pattern:

Instruction                             Operation
IND  (input and decrement)              Input to (HL), --HL, --B
INI  (input and increment)              Input to (HL), ++HL, --B
INDR (input, decrement, and repeat)     Input to (HL), --HL, --B until B == 0
INIR (input, increment, and repeat)     Input to (HL), ++HL, --B until B == 0
LDD  (load and decrement)               Copy (HL) → (DE), --HL, --DE, --BC
LDI  (load and increment)               Copy (HL) → (DE), ++HL, ++DE, --BC
LDDR (load, decrement, and repeat)      LDD until BC == 0
LDIR (load, increment, and repeat)      LDI until BC == 0
CPI  (compare and increment)            Compare A with (HL), ++HL, --BC
CPD  (compare and decrement)            Compare A with (HL), --HL, --BC
CPIR (compare, increment, and repeat)   CPI until A == (HL) or BC == 0
CPDR (compare, decrement, and repeat)   CPD until A == (HL) or BC == 0

Note that on x86, the string repetition instructions can decrement; this is done by setting the DF direction flag, rather than by using a separate instruction.
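The repeating instructions are easiest to understand as loops. Below is a Python model of LDIR. It simulates the behavior listed in the table and is not Z80 code; the function name is ours. Because LDIR copies one byte at a time, an overlapping copy with DE = HL + 1 spreads the first byte across the whole block, which is a common Z80 idiom for filling memory. Also note that BC is decremented before it is tested, so starting with BC = 0 copies 65536 bytes.

```python
def ldir(mem: bytearray, hl: int, de: int, bc: int) -> tuple[int, int, int]:
    """Model of LDIR: copy (HL) -> (DE), increment HL and DE, decrement BC,
    and repeat until BC == 0. Returns the final (HL, DE, BC)."""
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc


mem = bytearray(0x10000)
mem[0xC000:0xC005] = b"HELLO"
ldir(mem, hl=0xC000, de=0xD000, bc=5)            # ordinary copy
print(bytes(mem[0xD000:0xD005]))                 # b'HELLO'

mem[0xC100] = 0x20                               # seed byte
ldir(mem, hl=0xC100, de=0xC101, bc=15)           # overlapping copy = fill
print(mem[0xC100:0xC110] == bytes([0x20] * 16))  # True
```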

Arithmetic instructions

The arithmetic instructions are ADD, ADC, SUB, and SBC. The *C variants also add (or subtract) the carry flag, which lets you chain 8-bit operations into 16-bit (or wider) arithmetic. No hardware support for multiplication or division is provided (this was typical of 8-bit CPUs of this era). Multiplication by powers of 2 can be done via shifts, of course.
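The two techniques this paragraph implies are shown below as a Python model, not Z80 code. The first is carry chaining: 16-bit addition done by ADDing the low bytes and then ADCing the high bytes. The second is the usual shift-and-add loop that stands in for a missing multiply instruction. The function names are ours, for illustration only.

```python
def add16_via_adc(x: int, y: int) -> int:
    """16-bit addition done the 8-bit way: ADD the low bytes, then ADC
    the high bytes so the carry out of the low byte is not lost."""
    low = (x & 0xFF) + (y & 0xFF)            # ADD: low bytes
    carry = low >> 8                          # the carry flag after ADD
    high = (x >> 8) + (y >> 8) + carry        # ADC: high bytes plus carry
    return ((high & 0xFF) << 8) | (low & 0xFF)


def mul8(a: int, b: int) -> int:
    """8x8 -> 16-bit unsigned multiply by shift-and-add."""
    result = 0
    addend = a                  # a, shifted left once per bit of b
    for _ in range(8):
        if b & 1:               # test the low bit of the multiplier
            result = (result + addend) & 0xFFFF
        addend = (addend << 1) & 0xFFFF
        b >>= 1                 # move on to the next bit of the multiplier
    return result


print(hex(add16_via_adc(0x12FF, 0x0001)))  # 0x1300
print(mul8(200, 100))                      # 20000
```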

The GameBoy Z80 variant

The GameBoy used a custom variant of the Z80, existing somewhat “in between” the Intel 8080 and the Z80: the Z80-specific registers were removed, but some of the Z80-specific instructions were retained. These are the CB-prefixed bit test/set/reset, shift, and rotate instructions; the 8080 could only rotate the accumulator. All of the port I/O instructions were removed, as the GameBoy used memory-mapped I/O exclusively. A few specialized variants of LD were added that did not exist in either the Z80 or the Intel 8080, to enable more efficient memory-mapped I/O.

The GameBoy CPU also lacks the “shadow” register set, and with it, the instructions for manipulating it. The IX and IY registers are also missing, which means that the HL register is primarily used for indexing into memory.

Except for push, pop, and storing SP (ld [n16], sp), 16-bit loads and stores between registers and memory are not supported; they have to be done as two separate 8-bit instructions.

Some additions to the CPU:

The RGBDS assembler and tools

We’ll be using the RGBDS toolchain (the rgbasm assembler, the rgblink linker, and the rgbfix tool) to assemble code for the GameBoy. A copy has already been installed on the server, although you won’t be able to run a GameBoy emulator on the server, obviously.

Although the architecture is different, the assembly syntax supported by RGBASM is very similar to the YASM/Intel syntax we are familiar with. The main differences are

You can find the full documentation on the RGBASM syntax here.

Hello World on a Game Boy (emulator)

The GameBoy hardware lacks any kind of text output to the screen, so we cannot actually print “Hello, world!”. Furthermore, it does not even allow direct access to the individual pixels on the screen; instead, it uses a tiled screen mode: the screen is broken into tiles, where each tile is 8x8 pixels. A tileset is stored in memory, and each on-screen tile is mapped to one of the tiles in the set. This saves a significant amount of memory over storing the entire (pixel) contents of the screen:

    160 × 144 × 2 = 46080 bits = 5760 bytes

    (160/8) × (144/8) × 8 = 2880 bits = 360 bytes

That is a 16x reduction in memory usage, quite a bit for a system with only 8KB of RAM in the first place.
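The same arithmetic in Python, for checking. Like the figures above, it counts only the visible screen area and leaves out the memory needed for the tileset itself.

```python
WIDTH, HEIGHT = 160, 144      # screen size in pixels
BITS_PER_PIXEL = 2            # four shades of gray
TILE = 8                      # tiles are 8x8 pixels

framebuffer_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
tilemap_bytes = (WIDTH // TILE) * (HEIGHT // TILE)   # one byte per tile index

print(framebuffer_bytes, tilemap_bytes, framebuffer_bytes // tilemap_bytes)
# 5760 360 16
```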

A GameBoy runs software off cartridges, which contain the game executable stored in ROM. The layout of the cartridge ROM must match what the system expects in order for the cartridge to boot at all.

Cartridge memory map

The memory address space of the GameBoy is laid out like this:

Address range   Usage
0000-3fff       Cartridge ROM bank 0
4000-7fff       Cartridge ROM bank N (switchable)
8000-8fff       Video RAM: tiles/sprites
9000-97ff       Video RAM: alternate tiles
9800-9bff       Video RAM: Tilemap 1
9c00-9fff       Video RAM: Tilemap 2
a000-bfff       Cartridge RAM
c000-cfff       Working RAM bank 0
d000-dfff       Working RAM bank N
e000-fdff       Echo RAM (mirror of c000-ddff)
fe00-fe9f       Sprite attributes
fea0-feff       Not usable
ff00-ff7f       Memory mapped I/O
ff80-fffe       High RAM (stack space)
ffff            Interrupt enable

A note about “banking”: A cartridge may have more than 0x8000 (= 32KB) bytes of ROM, but only 32KB is directly accessible at any one time. To enable access to more ROM, the range 0x4000 - 0x7fff is switchable between different “banks”, where each bank is a 16KB chunk of cartridge ROM. You can access bank 0 at the same time as any of the other banks, but because you must switch to access the other banks, you cannot access (for example) banks 2 and 3 at the same time.
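The address translation this implies can be written as a small Python function. It is a sketch of the mapping only. On real hardware, the cartridge's memory bank controller selects the bank when the program writes to certain ROM addresses, and that selection mechanism is not modeled here. The function name is ours.

```python
BANK_SIZE = 0x4000   # 16KB per ROM bank


def rom_offset(address: int, bank: int) -> int:
    """Map a CPU address in $0000-$7fff to an offset in the cartridge ROM
    image, given the bank currently switched into $4000-$7fff."""
    if address < 0x4000:                  # fixed bank 0
        return address
    if address < 0x8000:                  # switchable window
        return bank * BANK_SIZE + (address - 0x4000)
    raise ValueError("not a cartridge ROM address")


print(hex(rom_offset(0x0150, bank=5)))   # 0x150: bank 0 is always visible
print(hex(rom_offset(0x4000, bank=2)))   # 0x8000
print(hex(rom_offset(0x4000, bank=3)))   # 0xc000
```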

A similar banking system provides access to extra cartridge RAM through the range 0xa000-0xbfff (and, on the GameBoy Color, to extra working RAM through the range 0xd000-0xdfff).

The range 0xff00-0xff7f is used instead of hardware ports (the GameBoy CPU lacks the in/out instructions) to access hardware. Some relevant values in this range:

Cartridge ROM layout

On the GameBoy, the first 256 bytes of the cartridge ($0000-$00ff) are reserved for rst and interrupt handlers. We could either fill these with dummy handlers (reti to immediately return from an interrupt) or just disable interrupts entirely. We do the latter.

INCLUDE "hardware.inc"

; Prior to address $100 is the interrupt table; we would have to set this up
; except that we disable interrupts
SECTION "GB Header", ROM0[$100]

EntryPoint:
    di                          ; Disable interrupts
    jp Begin                    ; Jump to executable start

; Header data...

The entry point at $100 has room for only four bytes. That is why it conventionally holds a nop (here, our di) followed immediately by a jp over the block of header data that comes next. The first part of that data encodes the Nintendo logo displayed when the system starts up. If the system does not find the logo exactly at the address expected, the system will not start. (This system was Nintendo’s way of trying to thwart “unauthorized” cartridges: the logo was copyrighted by Nintendo, so in order to use it in a cartridge, you would need a Nintendo license; otherwise, they could sue you for violating the copyright on their logo. However, when this system was actually tested in court, Nintendo lost. The court ruled that creating “compatible” cartridges was explicitly allowed by law and that “copying” the logo was required for a cartridge to be compatible, so this copying did not infringe on Nintendo’s copyright.)

; Nintendo logo: 0104 - 0133
; This will be added by RGBFIX

; 0134-0142: Game name (upper-case)
db "CSCI241GAME.COM"

; GameBoy (00) /GameBoy Color (80) flag
db $00

; New licensee code
db 0,0

; Super GameBoy flag
db 0

; Cartridge type
db 0

; Cartridge ROM size
db 2            ; 0 = 32KB, 1 = 64KB, 2 = 128KB

; Cartridge RAM size
db 3            ; 3 = 32KB

; Destination code
db 1            ; 0 = Japan, 1 = outside Japan

; License code (must be 0x33)
db $33

; Cartridge version num.
db 0

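The header portion of a GameBoy ROM ends with a pair of checksum values. The one-byte header checksum is computed from bytes $0134-$014c: starting from 0, subtract each byte plus one, keeping only the low 8 bits. The system refuses to boot if this checksum is wrong. The two-byte global checksum is the 16-bit sum of every byte in the ROM except the global checksum itself, stored high byte first. We will use an external tool to update these to the correct values once we have finished writing our ROM.

; Header checksum: computed from bytes 0134-014c
db 0

; Global checksum: 16-bit sum of all ROM bytes except these two
dw 0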
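For reference, here is how the two checksums are computed. This is a Python sketch we wrote from the header format described above, not rgbfix's source, and the function names are ours. rgbfix does this for us, so the sketch only shows what it computes. Note the ordering: the global checksum covers the header-checksum byte, so the header checksum has to be written first.

```python
def header_checksum(rom: bytes) -> int:
    """Header checksum stored at $014d: starting from 0, subtract each
    byte in $0134-$014c plus one, keeping the low 8 bits."""
    x = 0
    for addr in range(0x0134, 0x014D):
        x = (x - rom[addr] - 1) & 0xFF
    return x


def global_checksum(rom: bytes) -> int:
    """Global checksum stored at $014e-$014f (high byte first): the 16-bit
    sum of every byte in the ROM except those two checksum bytes."""
    return (sum(rom) - rom[0x014E] - rom[0x014F]) & 0xFFFF


def fix_checksums(rom: bytearray) -> None:
    """Write both checksums into the header."""
    rom[0x014D] = header_checksum(rom)
    g = global_checksum(rom)        # computed after $014d is filled in
    rom[0x014E] = g >> 8            # high byte first (big-endian)
    rom[0x014F] = g & 0xFF


rom = bytearray(0x8000)                      # a blank 32KB ROM image
rom[0x0134:0x0143] = b"CSCI241GAME.COM"      # title, as in our header
rom[0x014B] = 0x33                            # licensee code, as in our header
fix_checksums(rom)
print(hex(rom[0x014D]), rom[0x014E:0x0150].hex())   # 0xd4 04e7
```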

When we assemble our program, we will use the rgbfix tool both to add the logo image and to update the checksums:

    rgbfix -p 0 -f lhg game.gb

After the header, the executable portion of the cartridge begins proper:

Begin:
    ld  sp, $ffff              ; Set up stack pointer

Setting up the screen

In order to display anything, we need to set up the screen to display a tilemap: a grid of tile indexes (into the tileset) that says which tile graphic appears at each position. Furthermore, because the tilemap can be scrolled around (as a character moves around), we must set the tilemap origin to (0,0):

TilemapY        equ    $ff42    ; SCY: vertical scroll position
TilemapX        equ    $ff43    ; SCX: horizontal scroll position

    xor a                       ; Set a = 0

    ld hl, TilemapY             ; Addr. 0xff42 = Tilemap origin Y, 0xff43 = X
    ld [hl], a                  ; Set address 0xff42 = 0
    inc hl
    ld [hl], a                  ; Set address 0xff43 = 0

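Anything that involves changing the tilemap or the tileset should be done with the screen turned off, to prevent graphical artifacts. While the LCD is drawing, the CPU cannot reliably access video RAM, and we are about to copy a lot of data into video RAM. To turn off the screen, we have to wait for the vertical blanking period (the “vertical refresh”, after the last visible line has been drawn) and then clear bit 7 of address 0xff40, the LCD control register.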

WaitLCD:
    ld      a, [$ff44]         ; Load value at addr. 0xff44 into A (scanline)
    cp      145                 ; Compare A with 145
    jr      nz, WaitLCD         ; Loop until equal

; Now the LCD is drawing scanline 145, below the 144 visible lines (in the
; vertical blanking period). We can turn off the screen without it producing garbage.

    ld      hl, $ff40          ; Addr 0xff40 = LCD control
    res     7, [hl]             ; Reset bit 7 of (0xff40)

; Now the screen is off.

Note that we’ll have to reverse this process to turn it back on again once we have everything configured.

In order to display any tiles from a tileset, we have to copy it into the tile data part of video memory, starting at address $9000. Pixels can be one of four “colors” (four shades of gray), so tiles are stored with two bits per pixel. Each 8-pixel row of a tile is two bytes, one per “bitplane”. The font we are using stores only one bitplane, so we write each font byte twice, once for each bitplane, which makes every pixel either color 0 or color 3.

    ld        de, Tileset       ; Load address of tileset (from cart.)
    ld        hl, $9000         ; Load address of video RAM
    ld        bc, TilesetEnd-Tileset ; Size of tileset in bytes

CopyTilesetLoop:

    ld        a, [de]           ; Load byte from tileset

    ldi       [hl], a           ; Write to bitplane 1
    ldi       [hl], a           ; Write to bitplane 2

    inc       de
    dec       bc

    ; This checks to see if BC == 0
    ld        a, b
    or        c
    jr        nz, CopyTilesetLoop

To test whether BC is 0, we use a trick, because dec bc does not update the zero flag and the cp instruction is only 8-bit. We bitwise-OR the high and low bytes of BC with each other. If the result is 0, it can only be because both B and C are zero, indicating that BC is 0. (The or instruction can only bitwise-OR with the A register, which is why we first copy B into A.)
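The Python sketch below shows why writing each font byte twice produces solid text. In the GameBoy's 2-bit-per-pixel tile format, bit 7 of a row's first byte is the low bit of the leftmost pixel's color index, and bit 7 of the second byte is the high bit. It decodes one row and mimics what CopyTilesetLoop writes. It assumes, as the copy loop does, that the font stores one byte per row; the helper names are ours.

```python
def decode_tile_row(low: int, high: int) -> list[int]:
    """Decode one 8-pixel row of a 2bpp tile into color indexes 0-3,
    leftmost pixel first."""
    return [((high >> bit) & 1) << 1 | ((low >> bit) & 1)
            for bit in range(7, -1, -1)]


def expand_1bpp(font: bytes) -> bytes:
    """What CopyTilesetLoop writes: every source byte twice, once per bitplane."""
    out = bytearray()
    for byte in font:
        out += bytes([byte, byte])
    return bytes(out)


print(decode_tile_row(0b0011_1100, 0b0011_1100))  # [0, 0, 3, 3, 3, 3, 0, 0]
print(len(expand_1bpp(bytes(8))))                 # 16: one 8x8 tile
```

Because both bitplanes are equal, only colors 0 and 3 ever appear, so the two shades you see depend only on the palette.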

In order to display our tileset, we have to configure some colors:

(Note that the procedure for the GameBoy Color is different, as that uses actual colors.)

“Printing” text

With our tileset (containing text characters) copied into video RAM, “printing” text is just a matter of setting the correct entries in the tilemap to point to the characters from the tileset.

The tilemap starts at address 0x9800. Although the screen is only 20 tiles wide, the tilemap is 32 tiles wide. The extra 12 columns (and the 14 extra rows, since the map is also 32 tiles tall but only 18 rows are visible) are used for scrolling. The tilemap is stored using row-major order, so to compute the address of a tile at location (x,y) use

    Address = 0x9800 + x + y * 32 

The multiplication by 32 can be done with a shift (shift left by 5, since 32 = 2^5), because the GameBoy CPU, like the Z80, has no multiply instruction.
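The same address calculation in Python, as a model of the arithmetic only. On the CPU itself, one way to do the shift is to put y in HL and execute add hl, hl five times, since each one doubles HL. The function name is ours.

```python
def tile_address(x: int, y: int) -> int:
    """Address of the tilemap entry for tile column x (0-31), row y (0-31)."""
    return 0x9800 + x + (y << 5)   # y << 5 is y * 32


print(hex(tile_address(0, 0)))    # 0x9800: top-left
print(hex(tile_address(5, 2)))    # 0x9845
print(hex(tile_address(19, 17)))  # 0x9a33: bottom-right visible tile
```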

    ld      hl, $9800                ; Starting address
    ld      de, HelloWorldStr        ; "Text" (tile indexes) to display

.copyString:
    ld      a, [de]                  ; Read byte from string
    ldi     [hl], a                 ; Write to destination and increment address
    inc     de                      ; Increment text index
    and     a                       ; Check for terminating NUL
    jr      nz, .copyString         ; Loop until NUL
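A Python model of the .copyString loop makes two details visible. First, the loop relies on the font's tiles being in ASCII order starting at tile 0, so that each character code is also the index of its tile. That assumption is built into the original code, which writes the raw string bytes. Second, because the zero test comes after the store, the terminating 0 is itself written into the tilemap. The vram array (standing in for addresses $8000-$9fff) and the function name are ours.

```python
VRAM_BASE = 0x8000


def copy_string(vram: bytearray, text: bytes, dest: int = 0x9800) -> int:
    """Model of .copyString: copy the bytes of text, plus the terminating 0,
    to consecutive tilemap entries starting at dest. Returns the final HL."""
    hl = dest
    for byte in text + b"\0":
        vram[hl - VRAM_BASE] = byte   # ldi [hl], a: store...
        hl += 1                       # ...then increment HL
        if byte == 0:                 # and a / jr nz: stop after the NUL
            break
    return hl


vram = bytearray(0x2000)              # 8KB of video RAM, $8000-$9fff
end = copy_string(vram, b"Hello, world!")
print(bytes(vram[0x1800:0x180E]))     # b'Hello, world!\x00'
print(hex(end))                       # 0x980e
```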

Finally, we’ll also want to shut the sound system down, so it isn’t making noise:

    xor a
    ld [$ff26], a                   ; Sound on/off control

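Since we are done writing to video RAM, we can now safely turn the screen back on. Rather than only setting bit 7, we write the whole LCD control register so that bit 4 is clear, which makes tile index 0 refer to the tile at $9000, where we copied the font. (The boot ROM leaves bit 4 set, which would make tile index 0 refer to $8000 instead.)

    ld      a, %10000001       ; Bit 7: LCD on; bit 4 = 0: tile 0 at $9000;
                               ; bit 3 = 0: tilemap at $9800; bit 0: background on
    ld      [$ff40], a         ; Addr 0xff40 = LCD control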

(It’s important that we wait until after we are done writing to the tilemap to turn the screen on; if you write to the tilemap while the screen is active, some tiles in the map will randomly not be displayed.)

Note that because the screen is turned off, there’s no need to wait for a vertical refresh before turning it back on.

Just like we did in our bootloader, we’ll end with an infinite loop, so that our text continues to be displayed:

.loopForever:
    jr .loopForever

Data: Font and Text

The last thing we need to do is embed the font data and the string we want to use. I’m using a font included with RGBDS, for convenience. As in our bootloader, we treat the area after the infinite loop as a “data” section, because it is unreachable.

Tileset:
INCBIN  "font.chr"
TilesetEnd:

HelloWorldStr:
    db  "Hello, world!", 0

Final assembly

To assemble this into a working cartridge, we first assemble it into an object file using

    rgbasm -o game.o game.asm

which we then “link” to produce a cartridge file:

    rgblink -o game.gb game.o

(The linker doesn’t actually link the object file with anything else here; it just places the sections in the cartridge ROM.)

Finally, we use rgbfix to add the logo and update the checksums:

    rgbfix -v -p 0 game.gb

At this point, you should be able to load the “cartridge” game.gb into your favorite emulator and start it up. If you have access to a physical flash cartridge, you can run it on an actual GameBoy. (I’m running it in Visual Boy Advance, which, among other things, allows remote GDB connections for debugging your code, similar to the way we could connect to QEMU from GDB to debug our bootloader.)

Hello World on a TI-83 Graphing Calculator (emulator)

A number of TI graphing calculators ran Z80 processors, hence we can write Z80 assembly for them. Unlike the GameBoy, TI calculators run a “stock” Z80 with no modifications.

We’re going to use the Pasmo assembler and the Oysterpac tool for “packing” raw binary data into TI calculator executables. The two-step process is simply to first assemble our source code into a binary file:

    pasmo file.asm file.bin

and then run Oysterpac on it to produce a packed executable:

    oysterpac file.bin file.83p

This will produce a TI executable, file.83p, which can be run in a calculator emulator or transferred via link cable to an actual calculator.

Pasmo assembler

The syntax used by Pasmo is similar to that used by RGBASM, with a few alterations:

Hello, world!

The GameBoy Hello World example was rather lengthy, because the GameBoy a) is not intended to display text, so we had to set up a font, and b) provides very little in the way of system routines to help do anything. The TI system is the opposite: except when graphing, it mostly displays text, and it provides a great many system routines to help with everything, including printing. Hence, Hello World for a TI-83+ is much shorter.

Before we start with the code proper, it’s helpful to define some macros and definitions:

;; 
;; Executable code
;;
org      ProgStart

    ; TI-83+/Ion Shell programs must start with these two bytes:
    db #bb, #6d

    ; Clear the LCD (system routine)
    rom_call   _clearlcd

    ; Set the "pen" column where text will be printed
    ld     hl, 0
    ld     (#86D7), hl

    ld     hl, msg

    ; Print text, followed by newline
    rom_call _puts
    rom_call _newline
    ret

msg
    db "Hello, world!", 0