Review of last time
We built a simple bootloader which used BIOS services to print the text “Hello, world!” to the screen:
; Set 80x25 text mode
mov ah, 0x0
mov al, 0x3
int 0x10
; Print text
mov si, 0 ; Memory index/cursor position
print:
; Print character
mov ah, 0x0a ; Subfunction = write char
mov al, byte [si + string]
mov bh, 0 ; Page = 0
mov cx, 1 ; Write count = 1
int 0x10
; Move cursor
inc si
mov ah, 0x02 ; Subfunction = set cursor pos.
mov bh, 0 ; Page = 0
mov dh, 0 ; Cursor row = 0
mov dx, si ; Cursor col = si
mov dh, 0
int 0x10
cmp si, strlen
jne print
; Infinite loop
forever: jmp forever
; Unreachable,
string: db "Hello, world!"
strlen: equ $-string
screen_addr: equ 0xb8000
Today, we’re going to try to get ourselves into 32-bit protected mode (at startup, the system is in 16-bit real mode for compatibility with old software). This involves a fair bit of code, so first we’ll need to write a two stage bootloader. The first stage, which is loaded automatically and can be at most 440 bytes, exists only to (manually) load the second stage from disk and then start running it. The second stage will have no size restrictions.
Two-Stage Bootloader
Because the BIOS only loads the first 512 bytes of the disk into memory automatically, we will have to load the remainder ourselves. This will involve invoking interrupt 0x13, which is used for disk-related operations. We have to perform two steps:
Reset the disk (subfunction ah = 0x0)
Perform an extended read to load n blocks into memory (subfunction ah = 0x42)
Interrupt 0x13 covers disk-related functions, and subfunction ah = 0x42 does an extended read from disk into memory. dl should be 0x80 (drive number), and ds:si should contain the address of a structure describing what we want to load and where to load it:
struct disk_addr_pkt {
unsigned char sz; // Size of packet = 0x10
unsigned char _res; // Reserved, do not use
unsigned short blk_cnt; // How many blocks to transfer?
void* buffer; // Address to load into (16-bit offset, then 16-bit segment)
unsigned long long blk_num; // Starting block number (64 bits)
};
“Blocks” are not bytes; 1 block = 512 bytes. (The disk may think of “blocks” differently, but so much code is written using this assumption that the BIOS does the translation for us.)
The disk address packet must be aligned in memory to a multiple of 2 (i.e., on a word boundary). The “size” of the packet is stored inside the packet, because there are two versions of the packet structure: the 16-byte version we are using above, which holds a segment:offset buffer address and a 64-bit block number, and a longer version which adds a 64-bit buffer address.
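As a rough sketch of the packet layout, the following C fragment spells out each field with a fixed-width type and checks that the whole thing comes to 0x10 bytes. The helper names `make_packet` and `blocks_for` are hypothetical (they are not part of any BIOS interface), the buffer is split into explicit offset and segment halves, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

/* Disk address packet for int 0x13, ah = 0x42 (16-byte form). */
struct disk_addr_pkt {
    uint8_t  sz;       /* Size of packet = 0x10 */
    uint8_t  res;      /* Reserved, must be 0 */
    uint16_t blk_cnt;  /* Number of 512-byte blocks to transfer */
    uint16_t buf_off;  /* Offset half of the destination address */
    uint16_t buf_seg;  /* Segment half of the destination address */
    uint64_t blk_num;  /* Starting block number */
} __attribute__((packed));

/* Hypothetical helper: build a packet that loads `count` blocks,
   starting at block `first_block`, into seg:off. */
struct disk_addr_pkt make_packet(uint16_t seg, uint16_t off,
                                 uint16_t count, uint64_t first_block)
{
    struct disk_addr_pkt p;
    memset(&p, 0, sizeof p);
    p.sz = (uint8_t)sizeof p;
    p.blk_cnt = count;
    p.buf_off = off;
    p.buf_seg = seg;
    p.blk_num = first_block;
    return p;
}

/* Hypothetical helper: number of 512-byte blocks needed to hold
   nbytes, rounding up. */
uint16_t blocks_for(uint32_t nbytes)
{
    return (uint16_t)((nbytes + 511u) / 512u);
}
```

For a second stage loaded directly after the boot sector, `make_packet(0, 0x7e00, n, 1)` would describe the read, since block 0 is the boot sector itself.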
;;;
;;; two-stage.s
;;; Illustrates a two-stage loader, where the first stage invokes the BIOS
;;; to load the second stage.
;;;
bits 16
org 0x7c00
start:
origin: equ 0x7c00
blk_count: equ (end - loaded_code) / 512 + 1
; -----------------------------------------------------------------------------
; First stage loader
; Reset disk
mov ah, 0x0 ; Subfunction reset
mov dl, 0x80 ; Disk number
int 0x13
; Load blocks
mov ah, 0x42 ; Int 0x13, subfunction Extended Read
mov dl, 0x80 ; Drive num
mov si, disk_packet ; Packet address
int 0x13
jmp loaded_code
; ----------------------------------------------------------------------------
; Begin "pseudo-data" section
string: db "Hello, world!"
strlen: equ $-string
screen_addr: equ 0xb8000
align 2
disk_packet: db 0x10 ; Packet size
db 0 ; Reserved
dw blk_count ; Block count
dd loaded_code ; Addr. to load
dq 1 ; Starting block (64 bits, so the packet is 0x10 bytes)
; Pad remainder with 0 bytes
times 510 - ($ - $$) db 0
; Write boot signature at end
dw 0xaa55
; -----------------------------------------------------------------------------
; Begin second-stage loader
loaded_code:
; Set 80x25 text mode
mov ah, 0x0
mov al, 0x3
int 0x10
; Print text
mov si, 0 ; Memory index/cursor position
print:
; Print character
mov ah, 0x0a ; Subfunction = write char
mov al, byte [si + string]
mov bh, 0 ; Page = 0
mov cx, 1 ; Write count = 1
int 0x10
; Move cursor
inc si
mov ah, 0x02 ; Subfunction = set cursor pos.
mov bh, 0 ; Page = 0
mov dh, 0 ; Cursor row = 0
mov dx, si ; Cursor col = si
mov dh, 0
int 0x10
cmp si, strlen
jne print
; Infinite loop
forever: jmp forever
end:
; Pad so there's a good number of blocks used in the disk
times 1024 * 1024 db 0
Entering 32-bit protected mode
Now that we have a lot more room for code (and data), we can work on switching the system to 32-bit protected mode. The basic steps for entering 32-bit protected mode are
Disable interrupts. We don’t want an interrupt to fire while we are changing the system mode, as the interrupt handler won’t work correctly.
Enable the A20 line, to allow for the larger address space. (Remember that in 16-bit mode, we can only access a 20-bit address space.)
Load the Global Descriptor Table (GDT) with segment offsets. In 32-bit mode, instead of the values in the segment registers being used directly, as segment addresses, they are indexes into a table, the GDT, where each entry in the table contains information about that segment.
Switch to 32-bit mode, by setting the low bit of register CR0.
See http://www.osdever.net/tutorials/view/the-world-of-protected-mode for a tutorial on entering 32-bit protected mode.
Disabling interrupts
There are several situations where we want to ensure that interrupts do not interrupt our code. The simplest way to get this is to disable them entirely. Whether or not interrupts are enabled is controlled by the interrupt flag IF, which can be cleared with the cli instruction, and set (re-enabled) with sti. So to turn off interrupts temporarily we do
cli
; Remainder of code here
Note that this leaves the non-maskable interrupts still enabled. NMIs are interrupts which are so important they should never be disabled. They are typically fired by RAM errors and other unrecoverable hardware errors. Disabling NMIs is more complex:
in al, 0x70
or al, 0x80 ; Set bit 7
out 0x70, al
Re-enabling NMIs is
in al, 0x70
and al, 0x7f ; Clear bit 7
out 0x70, al
Fortunately, it’s usually OK to leave NMIs enabled.
Enable A20 line
In 16-bit mode, addresses are 20 bits wide. Bits are numbered starting at 0, so each address has bits 0 through 19. The A20 line is the “hidden” 21st bit (bit 20) of the address bus. Old software expects there to be only lines 0-19, so the A20 line is normally disabled at startup and has to be explicitly enabled to gain access to the extended memory.
There are several ways to enable A20, all of them weird. The system designers didn’t have an easy way to add an extension like this, so they had to find some other part of the system which had an unused port available. In order from least to most sketchy, these methods are
Check to see it’s already enabled. Some systems start with A20 enabled. This can be done by comparing the values at two addresses which map to the same physical address if A20 is disabled, but to different addresses if it is enabled. E.g.,
mov ax, 0
mov es, ax ; Extra segment = 0
not ax
mov ds, ax ; Data segment = 0xffff
mov di, 0x7dfe ; es:di points to the "valid bootloader"
mov si, 0x7e0e ; bytes at the end of the bootloader; ds:si is 1MB higher
inc byte [es:di] ; Change the byte at 0x7dfe...
mov al, byte [ds:si] ; ...then read the byte 1MB above it
cmp al, byte [es:di] ; Equal means both addresses hit the same byte
jne a20_enabled
; A20 not enabled, so enable it...
a20_enabled:
...
BIOS function: Interrupt 0x15 has a subfunction which can be used to enable the A20 line. Set ax = 0x2401 and trigger interrupt 0x15. Int 0x15 has a few other A20-related functions: subfunction 0x2403 can be used to check whether the BIOS supports this operation, and if it does, 0x2402 can be used to check the current status of the A20 line.
Keyboard controller: The original, and craziest, way uses a spare port on the keyboard controller. Disable interrupts, then send keyboard commands 0xad, 0xd0, read a character, send commands 0xd1, 0xae, and re-enable interrupts.
Fast A20: Fast A20 support is available on newer computers, and uses bit 1 on port 0x92:
in al, 0x92
or al, 2 ; Set bit 1
out 0x92, al
Checking bit 1 will also tell you if the A20 is already enabled. The downside to this method is that there is no way to check in advance whether it is supported! And if it isn’t, writing to the port may do something completely different, like clearing the screen, or crashing the system. Furthermore, despite the name, the “Fast” A20 may in fact take a while to have effect, so you should do a loop afterwards which checks the A20 status and doesn’t continue until it’s enabled.
QEMU supports all of these methods, so we’ll probably go with the Fast A20 method, although a real operating system would need to use all the methods in succession, as no one method is a sure thing.
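The memory-comparison check in the first method relies on how real-mode addresses wrap around at 1MB when A20 is off. A small C model of that address arithmetic may make it concrete; the function name `real_mode_address` is made up for this illustration, and it only models the address calculation, not real hardware.

```c
#include <stdint.h>

/* Physical address produced by a real-mode segment:offset pair.
   With A20 disabled, address bit 20 is forced to 0, so anything
   at or above 1MB wraps around to low memory. */
uint32_t real_mode_address(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;
    if (!a20_enabled)
        linear &= ~(UINT32_C(1) << 20);  /* Mask off bit 20 */
    return linear;
}
```

With these definitions, 0xffff:0x7e0e lands on 0x7dfe when A20 is disabled (the same byte as 0x0000:0x7dfe) and on 0x107dfe when it is enabled, which is exactly the difference the check looks for.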
Install GDT
The GDT is an array of segment descriptors, where each segment descriptor should have the following form:
struct seg_desc {
unsigned short limit; // Segment size (low 16 bits)
unsigned short base_low; // Low 16 bits of segment base address
unsigned char base_mid; // Middle 8 bits of seg. base
unsigned char type : 5; // Segment type, attributes
unsigned char priv : 2; // Privilege level
unsigned char present : 1; // Is segment present?
unsigned char limit_high : 4; // High 4 bits of seg. size
unsigned char attr : 3; // More attributes
unsigned char granularity : 1; // Affects segment size
unsigned char base_high; // High 8 bits of segment base
};
Note that segment bases have 32 bits, spread out over the fields base_low, base_mid, and base_high. Segment limits (sizes) have 20 bits, meaning that the largest segment is 1MB when the limit is counted in bytes (the granularity bit, described below, can change the unit to 4KB pages). The entire structure is 64 bits, or one qword.
Entry 0 in the table is reserved for the null segment; if you try to use the null segment, a processor exception will occur, so we load the first entry of the table with 0:
gdt:
dq 0 ; Null segment
We will define four entries above this in the segment table:
A segment for code, starting at 0x7c00 and extending to TODO
A segment for data, starting at TODO and extending to TODO.
A segment for the stack, overlapping the previous. We’ll set the flags so that the stack grows backwards, from the end of the segment.
A segment for direct access to video memory.
Once we have the GDT setup in memory, we need to load it into the CPU, using the lgdt instruction. lgdt takes the address of a GDT descriptor, which tells the CPU both the address at which the GDT exists, and also how large it is. The low 16 bits are the size of the GDT (in bytes, not in entries!) minus 1, while the high 32 bits are the linear address of the table.
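To make the layout of the value handed to lgdt concrete, here is a hedged C sketch. It assumes the 32-bit form of the descriptor: a 16-bit limit (the table size in bytes minus 1, as described for gdtr later on) followed by a 32-bit base address. The names `gdt_ptr` and `make_gdt_ptr` are invented for the example, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stdint.h>

/* The 6-byte value that lgdt reads from memory (32-bit form). */
struct gdt_ptr {
    uint16_t limit;  /* Size of the table in bytes, minus 1 */
    uint32_t base;   /* Address of the first entry */
} __attribute__((packed));

/* Hypothetical helper: describe a table of `entries` 8-byte
   descriptors located at `table_addr`. */
struct gdt_ptr make_gdt_ptr(uint32_t table_addr, unsigned entries)
{
    struct gdt_ptr p;
    p.limit = (uint16_t)(entries * 8u - 1u);
    p.base = table_addr;
    return p;
}
```

A table with the null segment plus the four segments listed above has 5 entries, so its limit would be 5 * 8 - 1 = 39.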
Enabling protected mode
This is as simple as setting bit 0 of control register 0 (CR0). We cannot modify the control registers directly, so we have to load it into eax, set the bit there, and then load it back:
mov eax, cr0
or eax, 1
mov cr0, eax
After switching modes, it’s important to clear the pipeline; any instructions still in the pipeline are real-mode, and won’t make sense in the new mode. To do this, all we have to do is issue a far jump, a jmp with an explicit segment. This can even be to the same segment we are in, it just has to be there.
jmp 0x08:in_protected_mode
[bits 32]
in_protected_mode:
...
(We use segment 0x8, because the code segment is index 1 in the GDT, and each GDT entry is 8 bytes wide. The segment part of the address is not actually an index into the table, but a direct byte offset from the beginning of the table.)
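The “byte offset” view works because the low three bits of the value are zero here. As a sketch in C: the upper 13 bits of a segment value hold the table index, and the low 3 bits hold two fields not covered above, a table indicator (0 selects the GDT) and a requested privilege level. The function names are hypothetical.

```c
#include <stdint.h>

/* Build the 16-bit value loaded into a segment register.
   index: entry number in the table
   table_indicator: 0 = GDT, 1 = LDT (not used in these notes)
   rpl: requested privilege level, 0-3 */
uint16_t make_selector(unsigned index, unsigned table_indicator, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((table_indicator & 1u) << 2) | (rpl & 3u));
}

/* Recover the table index from a segment value. */
unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}
```

With both low fields zero, entry 1 gives 0x08 and entry 2 gives 0x10, matching index times 8.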
Each segment entry in the table is called a descriptor (a selector is the value we load into a segment register to pick one) and has a rather complex structure, with a total size of 64 bits:
63-56 | 55-52 | 51-48 | 47-40 | 39-32 | 31-16 | 15-0 |
---|---|---|---|---|---|---|
Base 24:31 | Flags | Limit 16:19 | Access | Base 16:23 | Base 0:15 | Limit 0:15 |
The most important fields are the base and limit, which specify the (linear) base address and size of this segment. Both of these fields are split up: the low 16 bits are stored first, and then later the middle 8, and then the high 8 of the base. The total base is 32 bits, while the total limit is 20 bits. (If the granularity flag described below is set, then the limit is not in bytes, but in 4KB pages.)
The Flags bits are GS00, where the G field specifies the units for limit: G = 0 means that limit is in bytes, while G = 1 means that limit is in 4KB units (pages). The S bit should be 0 for 16-bit protected mode, and 1 for 32-bit protected mode. (It’s possible to mix 16- and 32-bit segments, which can be useful when interoperating with 16-bit code.)
The access byte is broken up as
7 | 6,5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|
Pr | Priv | S | Ex | DC | RW | Ac |
Pr: Present, is the segment actually available for use. This should be 1 for any descriptors which are actually in use.
Priv: Privilege, stores the “ring level” of this segment. High-level segments cannot be accessed by code running at lower levels. 0 is the highest (kernel), while 3 is the lowest (user-level code).
S: Segment type. This should be set for code/data segments, and cleared for any system segments.
Ex: Executable. If this bit is not set, the CPU will refuse to execute code located in this segment. Should be set for code segments and cleared (for security) on data/stack segments. Note that code segments forbid writing, so to load a program into memory, you have to first set the segment to non-executable, then load it, then set it to executable only after the load is complete.
DC: “Direction/Conforming”. Has a different meaning depending on whether this is a code segment (Ex = 1) or data segment (Ex = 0)
- For data segments, specifies whether the base of the segment is the beginning (0) or end (1), and correspondingly, whether the limit should be interpreted as positive or negative.
- For code segments, 0 means that code in this segment can only be run from code in the same privilege level, while 1 also allows it to be run from code at a lower privilege level.
RW: For code segments, set this bit to allow reading from the segment. Write access is never allowed for code segments. For data segments, set this bit to allow writing to the segment (reading from data segments is always allowed).
Ac: Accessed. Set by the CPU to 1 whenever the segment is accessed.
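Putting the bit layout and the access byte together, the packing can be sketched in C as below. This is only an illustration of the layout in the tables above; `make_descriptor` is a hypothetical helper, and the example values (a segment with base 0 and a limit of 0xfffff pages) are an assumption, not the segments these notes will end up defining.

```c
#include <stdint.h>

/* Pack base, limit, access byte and flags nibble into one
   64-bit descriptor, following the bit layout above. */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xffffu);             /* Limit 0:15  -> bits 0-15  */
    d |= (uint64_t)(base & 0xffffu) << 16;        /* Base 0:15   -> bits 16-31 */
    d |= (uint64_t)((base >> 16) & 0xffu) << 32;  /* Base 16:23  -> bits 32-39 */
    d |= (uint64_t)access << 40;                  /* Access      -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xfu) << 48;  /* Limit 16:19 -> bits 48-51 */
    d |= (uint64_t)(flags & 0xfu) << 52;          /* Flags       -> bits 52-55 */
    d |= (uint64_t)((base >> 24) & 0xffu) << 56;  /* Base 24:31  -> bits 56-63 */
    return d;
}
```

For instance, a readable code segment with Pr = 1, Priv = 0, S = 1, Ex = 1, DC = 0, RW = 1, Ac = 0 has access byte 0x9a; with flags GS00 = 1100 (0xc), base 0 and limit 0xfffff, the packed value is 0x00cf9a000000ffff. The matching writable data segment uses access byte 0x92.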
Configuring segment registers
We can now setup the segment registers with offsets into the GDT. The address of the GDT is stored in the gdtr register, loaded with lgdt. The value of gdtr stores both the size (minus 1) of the GDT, and its linear address, in the format size:address, where size is 16 bits and address is 32. The address is linear, meaning that if paging is enabled, it will be translated through the page table to get a physical address. (Thus, x86 in 32-bit mode supports segmentation on top of paging.)
Printing to the screen
We now only have one choice as to how to print to the screen: the BIOS interrupts won’t work in 32-bit mode, so we have to write directly to video memory (hence, the extra segment pointing to video RAM). Later, we’ll see that it’s possible to temporarily switch back to real-mode, provided we are careful about how we set things up.
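As a sketch of what writing directly to video memory involves: in 80x25 text mode each screen cell is two bytes, the character in the low byte and a colour attribute in the high byte, laid out row by row. The C code below writes into any buffer with that layout; on the real machine the buffer would be video RAM at 0xb8000 (an assumption about how the segments end up being set up), and the function names are made up for this example.

```c
#include <stdint.h>
#include <stddef.h>

enum { COLS = 80, ROWS = 25 };

/* One screen cell: character in the low byte, attribute in the high byte. */
uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)((uint8_t)ch | ((uint16_t)attr << 8));
}

/* Index of the cell at (row, col), counting cells rather than bytes. */
size_t vga_index(unsigned row, unsigned col)
{
    return (size_t)row * COLS + col;
}

/* Write a NUL-terminated string starting at (row, col),
   stopping at the end of the screen. */
void put_string(volatile uint16_t *screen, unsigned row, unsigned col,
                const char *s, uint8_t attr)
{
    size_t i = vga_index(row, col);
    while (*s && i < (size_t)COLS * ROWS)
        screen[i++] = vga_cell(*s++, attr);
}
```

Attribute 0x07 is light grey on black, so `put_string(screen, 0, 0, "Hello, world!", 0x07)` would reproduce the earlier BIOS output without any interrupts.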
Setting up 32-bit interrupts
After the switch to 32-bit mode, interrupts are still disabled, because the interrupt handlers in the IVT cannot run in 32-bit mode. Instead, before we can re-enable interrupts, we have to create an interrupt descriptor table, the 32-bit analogue to the IVT. Unlike the IVT, which is partly (mostly) setup by the system itself, the IDT is totally under our control, and only needs to contain handlers for hardware interrupts and exceptions; there’s no need for it to handle software interrupts (like BIOS calls) unless we want to. (Handling software interrupts is how a )
Unlike the IVT, which is hard-coded to be located at address 0x0 with a limit (size) of 0x3ff, the location and size of the IDT are up to us, and are stored in the idtr register.
Each entry of the IDT is called a gate, telling where the interrupt service routines (handlers) are located. We have to setup the GDT first, because the addresses used in the IDT will be translated through the GDT.
The idtr register defines where the IDT is located in (linear) memory, and its size in bytes, minus 1. The low 16 bits contain the size, while the high 32 bits define the base address. The first entry of the table is for interrupt 0. The idtr register is loaded by using the lidt instruction, which takes a memory operand in the above format (size:address). (If paging is enabled, the address in idtr is translated through the page table as well!)
In 32-bit mode, each gate is 64-bits wide, structured as
struct idt_gate {
unsigned short offset_low; // Low 16 bits of offset
unsigned short segment; // Code segment selector (into GDT)
unsigned char reserved; // Reserved, must be 0
unsigned char type_attr; // Type and attribute flags
unsigned short offset_high; // High 16-bits of offset
};
The handler is located in the segment indicated by segment in the GDT. The offset into this segment is specified by the combination of offset_high:offset_low.
The type_attr member is broken up as
7 | 6,5 | 4 | 3,2,1,0 |
---|---|---|---|
P | DPL | S | Gate type |
P should be set to 0 for gates that are unused (unused interrupts)
DPL sets the lowest privilege level that is allowed to trigger this gate with an int instruction. The processor will prevent high-privilege interrupts from being called by unprivileged (user) code.
S should be set to 0 for interrupt- and trap-type gates.
Type is one of the following bit patterns:
0101 32-bit task gate
0110 16-bit interrupt gate
0111 16-bit trap gate
1110 32-bit interrupt gate
1111 32-bit trap gate
Task gates ignore the handler given in offset, and instead just cause an immediate task switch.
Interrupt gates and trap gates both call the handler given in offset. The only difference is that for an interrupt gate, the CPU will automatically disable interrupts before calling the handler, and restore the interrupt flag when the handler returns; a trap gate leaves the interrupt flag alone. Interrupt gates are therefore the usual choice for hardware interrupts.
Trap gates are typically used for CPU exceptions and software interrupts (services called by user code).
The IDT is installed by calling the lidt instruction, passing it the address of a value in the size:address format described above.
(64-bit mode uses an extension of the IDT which allows for 64-bit addresses.)
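As an illustration of how a gate is assembled from a handler address, the sketch below fills in the idt_gate fields using fixed-width types. The helper names `make_type_attr` and `make_gate` are hypothetical, the S bit is always left at 0 as required for interrupt and trap gates, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <stdint.h>

struct idt_gate {
    uint16_t offset_low;   /* Low 16 bits of offset */
    uint16_t segment;      /* Code segment selector (into GDT) */
    uint8_t  reserved;     /* Reserved, must be 0 */
    uint8_t  type_attr;    /* Type and attribute flags */
    uint16_t offset_high;  /* High 16 bits of offset */
} __attribute__((packed));

/* Combine P (bit 7), DPL (bits 6,5) and the gate type (bits 3-0).
   Bit 4 (S) stays 0. */
uint8_t make_type_attr(unsigned present, unsigned dpl, unsigned gate_type)
{
    return (uint8_t)(((present & 1u) << 7) | ((dpl & 3u) << 5) | (gate_type & 0xfu));
}

/* Split a 32-bit handler offset across the two offset fields. */
struct idt_gate make_gate(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xffffu);
    g.segment     = selector;
    g.reserved    = 0;
    g.type_attr   = type_attr;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

A present, ring-0, 32-bit interrupt gate (type 1110) comes out as type_attr 0x8e; paired with code selector 0x08 that would be a plausible entry for a hardware interrupt handler.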
Setting up paging
Segmentation is, as mentioned, a somewhat outdated memory management method. We’d prefer to use paging, for its additional flexibility. In order to use paging, we’ll have to setup the page table, and then enable it. Remember also that segmentation is applied on top of paging; it cannot be switched off in 32-bit mode, so we’ll effectively disable it by using segments that cover the entire address space.
Switching back to real mode
There are some services which are only available in BIOS, thus it will be very useful to have routines which can switch back to real mode, and then return to protected mode after they are done. There are several options for doing this:
We can actually switch back to real mode. We may need to setup our memory layout for the first 1MB so that everything we need is there.
Switch to “unreal mode”. Unreal mode is a variant of real mode with a few flags set which allow it to access more than its normal memory limit. This means we don’t need to rearrange things quite as much.
Switch to virtual 8086 mode. In virtual 8086 mode the processor pretends to be in 16-bit mode, using a set of segments/interrupts we configure. This was intended to allow 16-bit programs to run unchanged in 32-bit mode, but still allow the OS to have control over them, but we can use it to run BIOS functions without too much work.
Virtual 8086 mode is interesting, but it’s really intended for 32/64-bit operating systems to run 16-bit programs; it’s not intended for the OS itself to use to talk to BIOS, so it’s rather complex to use it in that way. Instead, we’ll use the first method, switching to real mode and then back.
Switching back to real mode
The basic steps are:
Disable (32-bit) interrupts (cli). It can be useful to also disable the non-maskable interrupts (those not controlled by the IF flag), but this is optional. We only have to do this if we installed a 32-bit IDT; if we left the interrupts disabled after switching to protected mode, they will obviously still be disabled.
Disable paging if in use. We don’t have to throw away all our paging, we just have to make sure that the code we will be running in real mode makes sense: that it is running on a page that maps directly to physical addresses (no translation), and that the GDT (32-bit segment table) and IDT (32-bit interrupt table) are also in pages which do no translation.
If GDT is using a table larger than 16-bits, create a new GDT that is 16-bit compatible.
Far-jump to 16-bit protected mode (this is 16-bit mode but with protection still enabled). We do a far jump, again, to flush the pipeline.
Setup segment registers (according to the 16-bit segment table).
Setup IDT for real-mode. Unless you’ve changed something, the original system IVT should still be located at address 0x0, with limit (size) 0x3ff. These can be loaded using the lidt instruction.
Disable protected mode (PE bit in CR0)
Far-jump to real-mode (again, to flush pipeline)
Reload segment registers
Setup stack pointer
Re-enable interrupts (sti, and NMIs, if disabled)
Whew! Of course, to get back into 32-bit protected mode, we have to reload our GDT, IDT, and 32-bit segments, set the PE bit, and then far-jump to 32-bit mode. A 16-32-16 transition is slow, which makes it something we want to avoid if possible. The only things we absolutely need it for are switching video modes and a few other BIOS functions. Most BIOS services can be replicated by our own code in 32-bit mode, and later in 64-bit mode.
Interrupt handlers and the PIC
There are 256 possible interrupts that can be fired. Which of these do we need to write handlers for, and how do we write them?
Writing an interrupt handler is fairly easy: just end it with the iret instruction rather than the normal ret instruction.
The part of the system which interprets hardware interrupts is called the Programmable Interrupt Controller (PIC). It essentially filters/remaps interrupts as they are received, before running the actual interrupt handler. To add to the complexity of the system, there are actually two PIC chips in the system; one was not enough, but rather than redesign the chip, they just chained two of them together. One is called the master PIC and the other the slave.
Communication with both PICs is done via a pair of ports: one for commands and one for data:
Port | Address |
---|---|
Master Cmd | 0x20 |
Master Data | 0x21 |
Slave Cmd | 0xA0 |
Slave Data | 0xA1 |
Note that between the two PICs, there are a total of 15 possible hardware interrupts. (Each PIC provides 8, and one is used for communication between the two PICs.) One of the most basic operations of the PIC is to determine the mapping from hardware interrupt numbers (0-15) to system interrupt numbers (i.e., entries in the IVT or IDT). (Internally, interrupt 2 is used for inter-PIC communication but the PIC normally remaps hardware interrupt 9 to 2, so if you receive interrupt 2, it was originally 9.)
The master PIC is responsible for interrupts 0-7, while the slave is responsible for 8-15. Each PIC has a vector offset which is added to the (hardware) interrupt number, counted from 0 within that PIC, to get the interrupt index seen by the CPU. This means that the hardware interrupts of the master and those of the slave can be mapped to different parts of the table. Both vector offsets must be multiples of 8.
In 16-bit mode, the default mapping is to map hardware interrupts 0-7 to system interrupts 8 - 0xf, and hardware 8-15 to 0x70 - 0x77. In 32-bit mode, the first 32 system interrupts are reserved, so at a minimum we have to remap hardware interrupts 0-7 to a different part of the table.
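The vector-offset arithmetic can be written down in a few lines of C. This is just a model of the mapping described above; the function name `irq_to_vector` is invented, and the offsets 0x20 and 0x28 used in the note below are one common choice for protected mode rather than anything required.

```c
#include <stdint.h>

/* Map a hardware interrupt number (0-15) to the interrupt index
   seen by the CPU, given the vector offsets of the two PICs.
   Returns -1 for an invalid hardware interrupt number. */
int irq_to_vector(unsigned irq, uint8_t master_offset, uint8_t slave_offset)
{
    if (irq < 8)
        return master_offset + (int)irq;        /* Master: lines 0-7 */
    if (irq < 16)
        return slave_offset + (int)(irq - 8);   /* Slave: lines 8-15 */
    return -1;
}
```

With the 16-bit defaults (0x08 and 0x70), hardware interrupt 1 maps to 0x09 and hardware interrupt 8 to 0x70. With offsets 0x20 and 0x28, the same two interrupts would land on 0x21 and 0x28, clear of the 32 reserved entries.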
In order to remap the PIC, you have to reinitialize it from scratch, essentially “rebooting” that component of the system. Thus, remapping the PIC is a rather complex procedure.
When we start protected mode, we have to reinitialize the PICs by sending the initialize command, 0x11. After this command, we send three initialization words, telling the PIC
Its vector offset
How the master/slave connection is setup
Some additional information
See here for the details.
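The initialization sequence can be sketched as a list of port writes. Rather than performing real port I/O, the C code below only records which byte would go to which port, so the order can be inspected. The concrete values for the second and third words after the offset (0x04 and 0x02 for the master/slave wiring, 0x01 for 8086 mode) are the conventional ones for a PC and are not taken from the text above; the names `port_write` and `pic_remap_sequence` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

struct port_write {
    uint16_t port;
    uint8_t  value;
};

enum {
    PIC1_CMD = 0x20, PIC1_DATA = 0x21,  /* Master */
    PIC2_CMD = 0xa0, PIC2_DATA = 0xa1   /* Slave */
};

/* Fill `out` with the writes needed to reinitialize both PICs
   with new vector offsets. Returns the number of writes (8). */
size_t pic_remap_sequence(uint8_t master_offset, uint8_t slave_offset,
                          struct port_write out[8])
{
    size_t n = 0;
    out[n++] = (struct port_write){ PIC1_CMD,  0x11 };          /* Initialize command */
    out[n++] = (struct port_write){ PIC2_CMD,  0x11 };
    out[n++] = (struct port_write){ PIC1_DATA, master_offset }; /* Vector offsets */
    out[n++] = (struct port_write){ PIC2_DATA, slave_offset };
    out[n++] = (struct port_write){ PIC1_DATA, 0x04 };          /* Slave is on master line 2 */
    out[n++] = (struct port_write){ PIC2_DATA, 0x02 };          /* Slave's cascade identity */
    out[n++] = (struct port_write){ PIC1_DATA, 0x01 };          /* 8086 mode */
    out[n++] = (struct port_write){ PIC2_DATA, 0x01 };
    return n;
}
```

A real implementation would perform each write with an out instruction, and would probably also save the interrupt masks beforehand and restore them afterwards.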
When an interrupt routine ends, before it executes iret, it needs to signal to the PIC that it is finished. This is done by issuing command PIC_EOI, end-of-interrupt, 0x20. This command must be sent to the command port of the PIC which originated the interrupt; because the slave is chained through the master, an interrupt from the slave needs the command on both. So depending on which PIC the interrupt came from, we either do
mov al, 0x20 ; PIC_EOI
out 0x20, al ; Signal master
or
mov al, 0x20 ; PIC_EOI
out 0xa0, al ; Signal slave
out 0x20, al ; Signal master as well
Masking interrupts
The PIC has the ability to mask (temporarily ignore) certain interrupts. Each PIC has a mask register which is 8 bits wide. Each bit of the mask corresponds to one of the interrupt lines connected to that PIC. If a bit is set, then the PIC will ignore any signals on the corresponding line; if it is unset, the corresponding line functions normally.
Masking is done via the data port: read a byte from the data port to get the current mask, and then write the (modified) value back to the data port to set the mask.
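The read-modify-write on the mask can be sketched as pure bit manipulation in C, leaving out the actual port reads and writes. The helper names are hypothetical; `line` is the interrupt number, and only its low three bits matter because each PIC has 8 lines.

```c
#include <stdint.h>

/* Return the mask with the given line ignored (bit set). */
uint8_t mask_line(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask | (1u << (line & 7u)));
}

/* Return the mask with the given line enabled again (bit cleared). */
uint8_t unmask_line(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask & ~(1u << (line & 7u)));
}

/* Data port of the PIC responsible for a hardware interrupt (0-15). */
uint16_t data_port_for_irq(unsigned irq)
{
    return irq < 8 ? 0x21 : 0xa1;
}
```

To mask hardware interrupt 12, for example, we would read the current mask from port 0xa1, pass it through `mask_line(mask, 12)` (which sets bit 4), and write the result back to the same port.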