Course overview
This is “Computer organization and assembly language”, focusing on the low-level operations of the computer and how we can interact with them.
Useful links:
Course website: https://staffwww.fullcoll.edu/aclifton/courses/cs241/index-private.html
Discord invite: https://discord.gg/8wsRZt3j2A
Course outline and grading
My plan for this course is to have
Short open-book quizzes every other Thursday. These are mostly just for you (and me) to gauge your progress.
Group assignments on the alternate Thursdays.
Periodic assignments, usually one every couple weeks.
A midterm exam roughly in the middle of the semester; closed book, notes, computer.
Final exam at the end, open book, open notes, open computer.
Points from assignments and in-class projects go into an “assignments” pool, while points from quizzes, midterm, and final go into a “tests” pool, and at the end of the semester, your grade in the course is the smaller of the two. Thus, you have to keep up both with the assignments and with the tests.
Additional requirements to get a C or better
As this is now effectively an online course, there are a few additional tasks you must complete in order to get a C or better:
Log in to the server and create the cs241 directory (this is “assignment 0”, which does not count as a real assignment).
Sign the syllabus assignment, which I’ll post to Canvas later this week.
Post to the Discord once per week.
Assembly language
How is assembly different from C or C++?
Assembly is lower-level, closer to what the CPU natively understands. What the CPU understands is a stream of pure bytes, but no one wants to program by writing raw byte values, so assembly language is one step above this: the different commands understood by the CPU (called opcodes) are assigned human-readable names like add, mov, etc. These names are called mnemonics.
Compared to C/C++, assembly has fewer “tools” available. There is no “standard assembly library”. If you want string processing, you need to write it yourself.
Assembly is not portable to other types of CPUs (x86 vs. ARM) or to other operating systems (Windows vs. Linux) or even to other assembler programs (YASM vs. MASM vs. GAS).
Normally, the process we go through to compile a C/C++ program looks like this:
C/C++ source code --(compile)--> object code --(link)--> executable
where the object code is a stream of opcodes or instructions, to be run by the CPU. A single statement in C/C++ may compile into a large number of opcodes. Even just a simple assignment
x = y;
may need to do a fair amount of work at the CPU level, depending on where x and y are located in memory, whether or not they have the same type, etc.
Hence, there is a disconnect between the “instructions” we write in our programs, and the actual amount of work done by the CPU, in terms of the number of opcodes it has to execute.
In assembly language, assembly “statements” (operations) are one-to-one with CPU instructions. Each line of code in an assembly-language program is guaranteed to translate into a single CPU instruction. On the one hand, this means that assembly gives us a good handle on how much work the CPU is doing. On the other hand, many of the conveniences we are used to in C/C++ (so used to we don’t even think of them as conveniences, but just as things that are always there!) don’t exist at the CPU level. There are no CPU instructions corresponding to “for-loop”, or “if-else”, or “variable declaration”. We will have to build all of these things up, from a very primitive set of operations.
Because assembly instructions map one-to-one with CPU instructions, every type of CPU has its own assembly language. Assembly on an Intel CPU will be completely different from assembly on an ARM CPU (such as most cellphones run), and again totally different from the AVR CPUs used on Arduino. Assembly is, unlike C/C++, not portable.
Even if we stick to a single CPU type, there is also no guaranteed portability between different assemblers and operating systems. Unlike C/C++, where an international committee decides what is, and is not, “standard C/C++”, assembly is not controlled by anyone. Hence, assembly written for YASM (the assembler program we will use) may not work on GAS or NASM or the Microsoft assembler. Operating systems impose another layer of incompatibility: because there is no “standard assembly library”, assembly programs written for one operating system may not be portable to another. An assembly program written on Windows in the Microsoft assembler (such as exists in Visual Studio) will not run on Linux; not just because the assemblers are different, but also because the operating system interfaces are different.
In this course, we will use the YASM assembler, targeting 64-bit Intel CPUs
(commonly called “x86-64”) on Linux. You can work on your assignments on our
server, fccsci.fullcoll.edu, or on your own computer, but if you work on
your own computer it is your responsibility to ensure that your setup is
identical to the server’s, so that your code still works on the server.
(One way to do this is to set up a virtual machine on your personal computer. I’ve written up some instructions on how to do this.)
Assignments must be submitted on the server, by placing them in a subdirectory
named cs241, usually with directories under that named after the assignment
(e.g., cs241/assign1). You will have to create the cs241 directory
yourself, when you first log in (mkdir cs241).
We’ll use the GDB debugger to debug our assembly programs. In C++ your first
tool for debugging might be to add a few couts where things are going wrong,
but the mere act of adding printing to an assembly program might require
rewriting at least the function in which you are printing, possibly even your
whole program! That is obviously not a viable approach, so we’ll become
quite familiar with GDB, which, fortunately, understands assembly.
Computer organization
Computer organization refers to how the computer is structured internally: how memory, the CPU, I/O devices, etc. are connected together and interoperate. Although we’ll primarily focus on the organization of the computers we’re actually using, sometimes we’ll look at other computer systems (MIPS, ARM, etc.) for comparison. Of course, it’s important to keep these different systems separate in your mind!
Digital circuits
CPUs are implemented in terms of digital circuits, which are built out of logic gates. This is a level even lower than that of assembly language. We’ll look at digital circuits a bit, just to get a feeling for how the CPU “really” works, but mostly we’ll work in assembly language.
Terminology review
Byte: the smallest unit of computer memory that can be individually addressed. For us, a byte will always be 8 bits, but it’s worth bearing in mind that this is not true on all systems; there are weird systems out there where a byte is 10 bits or 7 bits or whatever.
The bit positions in a byte are numbered 0 through 7, from right to left:
Bit value | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
---|---|---|---|---|---|---|---|---|
Bit pos. | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Word: two bytes (16 bits)
When looking at a word as two bytes, we refer to the first byte (the one occupying the first/lower 8 bits) as the “low” byte, and the second as the “high” byte. E.g.,
Word | |
High byte | Low byte |
Similarly, if we were numbering the bit positions in a word, the bits in the low byte would be numbered 0-7, while in the high byte they would be 8-15.
This generalizes to the low and high words of a double-word (dword), low and high dwords of a quad-word (qword), etc. Similarly, in a byte, the bit at position 0 is the “low” bit, while the bit at position 7 is the “high” bit.
Double-word or dword: two words; i.e., four bytes (32 bits)
Quad-word or qword: four words; i.e., 8 bytes (64 bits).
This generalizes to “double-quad-words” (16 bytes), “quad-quad-words” (32 bytes), etc. but we won’t need those very often.
KB: Kilo-byte, where “kilo” here is a binary thousand: 2¹⁰ = 1024 bytes. The capital B at the end indicates that we are measuring bytes; a small b indicates bits.
Kb: Kilo-bit, 1024 bits. We won’t need this very often, but bit-sized quantities are common in communications (e.g., bandwidth is often measured in mega-bits). A kilobit is “approximately” 1 thousand bits.
MB: Megabyte, 2²⁰ = 1024² = 1,048,576 bytes. Note that this is approximately 1 million.
GB: Gigabyte, 2³⁰ = 1024³ = 1,073,741,824 bytes. Note that this value is approximately 1 billion.
(and so forth to TB, PB, etc.)
The difference between a binary “million” (1,048,576) and a decimal million (1,000,000) explains the difference you sometimes see between the capacity of a drive as advertised on the box, vs. the capacity you see in your operating system. Your operating system will use the binary measure, while the box uses decimal (because decimal makes the capacity appear larger). So a drive that is advertised as being “500GB” (= 500 × 1,000,000,000 = 500,000,000,000 bytes) will appear in your operating system as
500,000,000,000 / 1,073,741,824 ≈ 465.66 GB
Number systems
Decimal: base-10 number system that we all know and love. The only digits available are 0-9 (i.e., it’s “base 10” because there are 10 possible digits)
Binary: base-2 number system; i.e., numbers where the only digits available are 0 and 1 (base 2 because there are two possible digits).
Octal: base-8 number system; the only digits available are 0-7. (We won’t use this much.)
Hexadecimal: base-16 number system; the digits available are 0-9, a (= 10), b (= 11), …, f (= 15).
We’ll review binary and hex arithmetic later.
Note that no one of these number systems is “better” or “more correct” than any other. E.g.,
21 == 10101b == 0x15 == 025
decimal binary hex. octal
Internally, the computer stores everything as binary, but that is less relevant, even in assembly language, than you might think! We can add/subtract/etc. binary numbers as easily as any other number, and we don’t have any way of directly accessing the individual bits, so most of the time the fact that the computer is using binary isn’t important to us.
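Both C/C++ and assembly allow us to write numbers in source code in several number systems, just by using a different format. (The exact spellings vary a little between languages: the b suffix for binary is assembler syntax, while a leading 0 for octal is C/C++ syntax.)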
Syntax | Base |
---|---|
21 | Decimal |
10101b | Binary (b at the end) |
0x15 | Hexadecimal (0x at the beginning) |
025 | Octal (leading 0) |
Note that b, 0x, etc. are not part of the number itself; they are just
notational elements used to make the different number formats distinct. The
compiler/assembler does the work of translating the numbers into the internal
format used by the computer.
Internally, all of these numbers are identical. E.g., in C++, you can do
int x = 21;
if(x == 0x15) {
⋮
}
and the if will always come out as true.
Similarly, when we print out a number (via cout or printf) it is normally
printed as decimal, but through various flags we can ask for hexadecimal.
The runtime library does the work of translating the internal representations
back into decimal/hexadecimal. Later on in the semester, we’ll have an assignment
to print out numbers manually (because assembly language doesn’t have a
standard library to do it for us!).
Digital Circuits
The CPU is implemented as a complex set of digital circuits. Digital circuits are built out of logic gates (which, in turn, are built using transistors). In a digital circuit design, we show how logic signals (on/off values) flow from inputs, through logic gates, to outputs. A logic signal is high (on) if there is current flowing through it, and low (off) if there is not (or a very small amount of current).
The basic types of logic gates are:
NOT: Single input, single output gate which inverts its input. If the input is high, the output is low, and vice versa.
The C/C++ operator for NOT is ~ (this is bitwise NOT, which is different from logical NOT, !).
AND: Two input, single output gate: the output is high if and only if both inputs are high, otherwise it is low.
The C/C++ operator for AND is & (this is bitwise AND, which is different from &&, logical AND).
OR: Two input, single output gate: the output is high if either or both of its inputs is high, otherwise (if both inputs are low) it is low.
The C/C++ operator for OR is | (this is bitwise OR, again, different from logical OR, ||).
XOR: Two input, single output gate: the output is high if either of its inputs is high but not both, otherwise (if both inputs are high, or both are low), the output is low. Effectively, the output is high if the inputs are different (one high and one low), and low if they are the same.
The C/C++ operator for XOR is ^ (this is bitwise XOR; there is no logical XOR operator in C/C++). Note that C/C++ does not have an exponentiation operator.
NAND: An AND gate with a NOT attached to the output. That is, the output is low if both inputs are high, and high otherwise.
There is no C/C++ operator for NAND, although you can combine & with ~ to achieve the same effect.
NOR: An OR gate with a NOT attached to the output. The output is high if both inputs are low, and low otherwise.
There is no C/C++ operator for NOR, although you can combine | with ~ to achieve the same effect.
XNOR: An XOR gate with a NOT attached to the output. The output is high if both inputs are the same (both low, or both high), and low otherwise.
There is no C/C++ operator for XNOR, although you can combine ^ with ~ to achieve the same effect.
The first three are probably at least a bit familiar to you. There are a few things to note:
AND and OR can be extended to more than two inputs: an n-input AND gate is high if all of its n inputs are high, and low otherwise. Likewise, an n-input OR gate is high if at least one of its n inputs is high, and low if all its inputs are low.
This diagram illustrates how to build a 3-input AND gate:
Question: If XOR gates are arranged in the same configuration, what does the resulting 3-input, 1-output circuit do?
NAND and NOR gates have the property of universality: all of the other gates can be built from just NAND, or just NOR. In fact, it’s common for circuits to be built using just NAND gates, to simplify manufacturing.
For example, here is a circuit that is equivalent to A OR B implemented using only NAND gates (you should verify that the circuit produces the correct output for all four combinations of inputs A and B):
You can find a full reference of how to translate all the other types of logic gates into both NAND and NOR gates on Wikipedia. Assignment 1 will ask you to translate a circuit using NOT, AND, and OR into one that only uses NAND gates.
Table form of circuits
The behavior of any (stateless) m-input, n-output circuit can also be illustrated using a table showing how each combination of inputs maps to a particular set of outputs. Because each input can be either low (0) or high (1), the table will have 2^m rows and m + n columns. For example, the 3-input AND shown above can be defined via the following table:
Input A | Input B | Input C | Output Q |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 |
0 | 1 | 1 | 0 |
1 | 1 | 1 | 1 |
I.e., the output is high (1) only when all three inputs are high (1).
Circuits in hardware
If you were to try and implement a logic circuit in actual electronic hardware, you’d come up against several issues that were not mentioned above:
Electricity does not just flow from point A to point B (as a logic diagram would suggest), but rather only flows if a closed circuit is present. In order for a circuit to work in the real world, there must be a return connection from the final output of the circuit, back to the source which is powering the inputs. In a real circuit, these connections would of course be present, but in a logic diagram, we omit them, because they do not affect the logic of the circuit, what it actually computes.
Further complicating real-world circuit layouts is the fact that many logic gates require a supply connection: an input that is always high, supplying power to the device.
If you try to buy an OR gate (for example), you’ll discover that you usually can’t buy just one; gates are usually only available on ICs, which will bundle several of the same type of gate together. For example, you can purchase an IC (integrated circuit) with four, or eight, or more NAND gates on a single chip. This makes sense, because in a real circuit design, you would rarely need just one gate. (The chip will have a single supply input, shared by all its gates.)
Ideally, we describe logic circuits as if the signals instantaneously switched from low to high, and vice versa, but in real-world systems this is impossible. The rise time of a circuit is the amount of time it takes for a line to go from low to high. During this transition period, the amount of current flowing through the connection is somewhere in between 0 and 1, which may cause the circuit output to be unpredictable for a brief period.
In order to work around this period of unpredictability, most digital circuits are synchronous: they use a clock to control when computation is performed. A clock is a 0-input, 1-output logic device which outputs a signal which alternates low, high, low, high, … at a regular clock rate. Typically, when the clock signal goes from low to high (the “rising edge” of the clock signal), the rest of the circuit will perform its computation, but the output of the computation will not be read until the rising edge of the next clock cycle. Thus, the output has 1 entire clock cycle to stabilize on the correct value.
In fact, even when the signal is high it will still not be at a constant level; it is simply higher than some threshold value which marks the dividing line between “low” and “high”.
Electrically, a single output cannot be connected to an unlimited number of other devices; there is a limit to the “fan-out” of an output.
Logic gates can be implemented electronically in many different ways, leading to different logic families, each with its own electrical characteristics. E.g., the voltage levels for “low” vs. “high” may be very different for different families. Note also that in most families the “low” level is not 0V, but some voltage level less than the “high” level. E.g., the transistor-transistor-logic (TTL) family uses a low voltage level between 0 and 0.8V (with respect to ground), and a high level between 2 and 5V. An input signal between 0.8 and 2V is in the “unpredictable” range and may be seen as high, or low, or even fluctuating between the two.
Logic circuit problems
Here are some circuits you can try to build, to test your understanding of logic circuits:
Using any gates you like, build a 4-input, 1-output circuit where the output is high if and only if exactly one of the inputs is high.
Using any gates you like, build a 4-input, 1-output circuit where the output is high if and only if exactly two of the inputs are high (but it could be any two).
Using only NAND, build a comparator circuit, a 2-input, 1-output circuit where the output is high if
- Both inputs are low
- The first input is low and the second is high
- Both inputs are high
This is equivalent to checking whether the first input is “less than or equal to” the second, where low is “less than” high.
Note that there are many different possible solutions to these problems. A more advanced course in digital circuits would teach methods for optimizing the design of a circuit, so as to minimize the number of gates used.
Beginning assembly
Here we will write and then de-construct the most basic assembly language program: one that prints the traditional “Hello, world!” message. There are two broad styles in which we can write an asm. program which interacts with the operating system (which is any interesting asm. program):
We can interact with the operating system directly, by calling its system routines. For lack of a better name, I’ll call this the “syscall-style”. This is the most direct method, but is a little unfamiliar. If we use this method, then the entry point of our program will be named _start, and we will first use the system call to print text to a file descriptor (in our case, the standard output), and then use another system call to exit with exit code 0 (success).
If we use the syscall style, our program will be totally self-contained: there will be nothing in the resulting executable other than what we write.
We can use routines from the standard C library like printf and exit. I’ll call this “C-lib-style”. This requires us to link with the C library. This method is obviously much more powerful, as it puts the entire resources of the C standard library in the hands of our program, but it is also much more complex: we have to learn how to call C-style functions (how to pass argument values, how to interpret return values). If we use this method, the entry point of our program will be named main, as you are probably used to. (The C library defines _start for us, using it to do things like initialize the heap, set up internal structures, etc.)
If we use the C-lib style, then our final executable will include not just the code that we write, but also quite a bit of code added by the standard library (the code for any standard library functions we use, plus all of the code for the library’s _start).
We will use the syscall style for now, as it lets us get started faster.
;;;
;;; hello.s
;;; Prints "Hello, world!"
;;;
section .data
msg: db "Hello, world!", 10
MSGLEN: equ $-msg
section .text
;; Program code goes here
global _start
_start:
mov rax, 1 ; Syscall code in rax
mov rdi, 1 ; 1st arg, file desc. to write to
mov rsi, msg ; 2nd arg, addr. of message
mov rdx, MSGLEN ; 3rd arg, num. of chars to print
syscall
;; Terminate process
mov rax, 60 ; Syscall code in rax
mov rdi, 0 ; First parameter in rdi
syscall ; End process
This can be assembled and linked with
asm hello.s
or, manually, with
yasm -g dwarf2 -f elf64 hello.s -l hello.lst
ld -g -o hello hello.o
and then run via the usual
./hello
which will print
Hello, world!
and then exit.
A step-by-step decomposition of this program: Each line has the following form:
label: instruction ; comment
All of these are optional, so only a few lines begin with a label, and many do not have comments. Also, sometimes the instruction is not a proper assembly language instruction, but a pseudo-instruction which modifies the assembler’s state in some way. The colon at the end of a label is optional, but I’ll always try to use it for clarity.
Line | Description |
---|---|
section .data | The data section contains (initialized) global variables and constants. |
msg: db "Hello, world!", 10 | msg defines a label pointing to the string “Hello, world!”, which will be copied verbatim into our assembled program |
MSGLEN: equ $-msg | equ defines a constant named MSGLEN containing the length of the message. $ represents our current location in the program |
section .text | The text section contains the actual executable code of our program |
global _start | We declare the _start label as global so that it is visible outside our program (so that the OS can find it to start our program) |
_start: | This declares _start as a label pointing to the current location in the program |
mov rax, 1 | This loads the value 1 into the register rax, which stores the syscall code. 1 is the syscall code for “write to file” |
mov rdi, 1 | Stores 1 into register rdi. This is the first argument to syscall write, which is the file descriptor (1 is standard output) |
mov rsi, msg | Stores msg, an address, into rsi. This is the 2nd argument, the message to write |
mov rdx, MSGLEN | Stores MSGLEN into rdx. This is the 3rd argument, the length, in bytes, to write |
syscall | This calls the syscall loaded into rax, printing the string |
mov rax, 60 | 60 is the syscall code for “exit process” |
mov rdi, 0 | 1st argument, 0, the exit code (success) |
syscall | Execute the system call |
(Note that the default extension for assembly-language programs is .s.)
The mov instruction, like all Intel-syntax instructions, takes its
operands in the form
mov dest, src
I.e., destination comes first, followed by the source(s). This makes more
sense if you read it as dest = src.
Program sections
A running program, in memory, has its memory space divided into a number of
different “sections”. Although all sections are part of the same address
space, they are used for conceptually different things, and may have different
permissions applied to them by the operating system. For example, it is common
for OSes to set the .text section (where the executable machine code lives)
to read only, as self-modifying code is (usually) either a bug or an exploit.
The normal memory layout of a process looks something like this:
Stack (grows down) |
… |
Heap (grows up) |
.data section (global variables) |
.text section |
The fact that the stack grows down (i.e., “pushing” onto the stack actually decrements the pointer to the top of the stack) will be important later.
Besides the .data section for global variables, there is also the .bss
section, which is used for uninitialized global data. The difference
between the two is that for .data, when your program starts, the operating
system has to copy the contents of the data section from the program on
disk into memory. The .bss section, on the other hand, is for uninitialized
data, so the operating system doesn’t need to copy anything; it just reserves
the total amount of .bss space in the program’s address space.
(There are a few other section types that we won’t use, like .readonly.)
(A syscall style program actually doesn’t have a “heap”; the heap is managed by the C standard library, so if we don’t link with that, we don’t get a heap, and the stack occupies all the upper address space. All of our data must be statically allocated, or allocated on the stack.)
The above program could be made easier to read by defining some more constants, e.g.:
section .data
SYS_write equ 1
SYS_stdout equ 1
SYS_exit equ 60
EXIT_SUCCESS equ 0
equ defines an assembly-time constant; it does not take up any space in
memory when the program is running, or use any space in the resulting executable.
I.e., equ is equivalent to #define in C/C++.
To use these, we simply refer to them by name:
mov rax, SYS_exit
mov rdi, EXIT_SUCCESS
syscall
db stores a sequence of bytes into the resulting executable directly. When
we write
msg db "Hello, world!", 10
two things happen:
The sequence of bytes representing "Hello, world!" followed by 10 is written directly into the executable file.
msg is defined as a label containing the address of the beginning of this sequence of bytes. Note that we do not actually “store” the string into msg, only its address.
Because we are using syscall-style, our strings do not end with a terminating
NUL (0) character. (The string above ends with 10, the ASCII character for
Line Feed; this is what you get when you use the \n character escape
in C/C++.) We have to know the length of the string to pass to the SYS_write
syscall. We could simply count the number of bytes by hand, but that will
break if we ever change the string.
As mentioned, the assembler places the string msg into the resulting
executable file at some address. In fact, everything in our assembly source
file has some address which it will end up at in the resulting executable.
Even things like MSGLEN which take up 0 space theoretically have some
notion of the “current location” in the output file. $ gets the address of
the current location. $-msg subtracts the address msg from the current
address, giving us the length of the string pointed to by msg. Note that this
only works because we define MSGLEN immediately after defining msg; if
there were any other definitions in between which took up space in the file,
the computed length would be wrong.
(This also demonstrates that equ definitions can use limited arithmetic in
their values; the computation is done at assembly-time, not at run-time.)
In all of our programs, we’ll have the .data section first and the .text
section second, but this is just a convention. You can change the order of the
sections, and even interleave them, and your program will still work.
Calling operating system functions
The process for calling a syscall works like this:
1. Set register rax to the code for the syscall you wish to execute. 60 is exit process, 1 is write, etc. You can find a complete reference here.
2. Set rdi, rsi, rdx, r10, r8, and r9 to the 1st, 2nd, 3rd, etc. arguments of the syscall, as needed.
3. Execute the syscall instruction.
Note that steps (1) and (2) can occur in any order, but all of the register
values must be properly set up before executing syscall. If the system call
returns a value, it will be in rax after the syscall returns. (SYS_write, for
example, returns the number of bytes written; SYS_exit never returns.)
Listing files
The -l hello.lst argument to yasm is optional; it instructs YASM to
produce a listing file, a list of the assembly instructions we wrote,
line-by-line, together with their hexadecimal opcodes. Here is the listing
file for the above program:
1 %line 1+1 hello_bare.s
2
3
4
5
6
7 [section .data]
8
9 00000000 48656C6C6F2C20776F- msg db "Hello, world!", 10
10 00000000 726C64210A
11 MSGLEN equ $-msg
12
13 [section .text]
14
15
16
17 [global _start]
18 _start:
19
20 00000000 48C7C001000000 mov rax, 1
21 00000007 48C7C701000000 mov rdi, 1
22 0000000E 48C7C6[00000000] mov rsi, msg
23 00000015 48BA0E000000000000- mov rdx, MSGLEN
24 00000015 00
25 0000001F 0F05 syscall
26
27
28 00000021 48C7C03C000000 mov rax, 60
29 00000028 48C7C700000000 mov rdi, 0
30 0000002F 0F05 syscall
The first column is the original line number, the second is the address within the assembled program relative to the current section (starting at 00000000), the third is the opcode, and the fourth is our original program.
Looking at this, we can see that the opcode for mov rax, 60 is
48C7C03C000000, for mov rdi, 0 it’s 48C7C700000000, and for syscall it’s
0F05. (x86-64 uses varying instruction widths: not all opcodes are the same
number of bytes; some are shorter and some are longer.)
Assembling and linking
The asm script takes care of running the assembler on all the input files,
and then linking them together. (It also correctly detects whether or not you
are defining _start or main as the entry point of your program, and will
link with the C standard library in the latter case.)
If you want to assemble manually, the command is
yasm -g dwarf2 -f elf64 filename.s -l filename.lst
The -g parameter gives the format to use for the debugging info, so that GDB (see next section) can read it.
The -f parameter says to output an x86-64 format object file.
The -l parameter says to output a list file.
To link one (or more) assembled object files together into an executable, there are two options:
If you are not using any C standard library functions, and the entry point of your program is named _start, then use ld:
ld -g -o exe_name object.o files.o ...
If you are using functions from the C standard library, and your entry point is named main, then use gcc:
gcc -o exe_name object.o files.o ...
This is the same command-line you would use to link a C program’s object files. (And, in fact, it can be used to link a program consisting of a mixture of C and assembly!)
(The asm script checks to see if any of your files defines main; if main
is defined, it assumes you want to use C standard library functions.)
Debugging assembly programs
GDB understands assembly; we can run our program inside GDB via
gdb ./hello
We can break at the _start of the program via
break _start
run
and then use the n(ext) command to step through the program line by line.
The values of registers can be printed by name, prefixed with $, e.g.,
print $rax
or change them with
set $rax = 0
You can also use info registers to print all the registers at once.
Note that when I use GDB I use a plugin called GDB dashboard, which shows the contents of the registers on every step. I’ll copy it into all of your home directories on the server, so you’ll have the same setup.
GDB will default to using AT&T syntax for its assembly. You can switch it to Intel syntax by entering the command
set disassembly-flavor intel
You can put this command in your ~/.gdbinit (before the beginning of GDB dashboard’s
scripts, if you’re using that), to make it apply to all your GDB sessions.
Disassembling an existing program
You can use objdump to disassemble a compiled executable. This can be
a useful way of figuring out how to do something in assembly: write it in
C or C++ and then disassemble it. Of course, often the result is not as useful
as you might expect: the compiler may have done some “interesting” things to
your code. Remember, the compiler’s goal is to make the resulting assembly
fast, not easy to understand.
Suppose we compile the traditional “Hello, world” program:
#include <stdio.h>
int main() {
printf("Hello, world!\n");
return 0;
}
with
gcc -c hello.c
producing the hello.o object file. We can then disassemble this with
objdump -d -M intel hello.o
producing the output
hello.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: bf 00 00 00 00 mov edi,0x0
9: e8 00 00 00 00 call e <main+0xe>
e: b8 00 00 00 00 mov eax,0x0
13: 5d pop rbp
14: c3 ret
There’s not much here that’s useful, as it relies on a lot of scaffolding
that the standard library provides (and which hasn’t been linked in yet).
The call in particular is just a placeholder for the call to printf; because
we haven’t linked in the definition of printf, there’s just a do-nothing
call in its place.
Note that we are disassembling the object file, and not the linked
executable. If we link hello.o into an executable:
gcc -o hello hello.o
and then objdump -d -M intel hello we will get a lot more assembly; the
standard library does a lot of setup before running main(), and the executable
contains all that code. On the other hand, it does let us see what the final
main looks like:
0000000000400507 <main>:
400507: 55 push rbp
400508: 48 89 e5 mov rbp,rsp
40050b: bf a4 05 40 00 mov edi,0x4005a4
400510: e8 eb fe ff ff call 400400 <puts@plt>
400515: b8 00 00 00 00 mov eax,0x0
40051a: 5d pop rbp
40051b: c3 ret
40051c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
Here, the do-nothing call has been replaced with a call to a procedure puts:
because our string contains no format specifiers and ends with a newline, the
compiler has substituted the simpler puts for printf. Note that main is a
procedure; it is called from the (standard-library-provided) _start and
must return to it when it’s done, hence it ends with a ret instruction. The
first two instructions, as we will see, are also part of a standard “preamble”
which every procedure starts with. You can look through the rest of the
disassembly and find the library-provided _start procedure, as well
as the puts@plt stub that the call goes through.