Course overview

This is “Computer organization and assembly language”, focusing on the low-level operations of the computer and how we can interact with them.

Useful links:

Course website: https://staffwww.fullcoll.edu/aclifton/courses/cs241/index-private.html
Discord invite: https://discord.gg/8wsRZt3j2A

Course outline and grading

My plan for this course is to have

Short open-book quizzes every other Thursday. These are mostly just for you (and me) to gauge your progress.
Group assignments on the alternate Thursdays.
Periodic assignments, usually one every couple weeks.
A midterm exam roughly in the middle of the semester; closed book, notes, computer.
Final exam at the end, open book, open notes, open computer.

Points from assignments and in-class projects go into an “assignments” pool, while points from quizzes, midterm, and final go into a “tests” pool, and at the end of the semester, your grade in the course is the smaller of the two. Thus, you have to keep up both with the assignments and with the tests.

Additional requirements to get a C or better

As this is now effectively an online course, there are a few additional tasks you must complete in order to get a C or better:

Login to the server and create the cs241 directory (this is “assignment 0”, which does not count as a real assignment)
Sign the syllabus assignment which I’ll post to Canvas later this week.
Post to the Discord once per week.

Assembly language

How is assembly different from C or C++?

Assembly is lower-level, closer to what the CPU natively understands. What the CPU understands is a stream of pure bytes, but no one wants to program by writing raw byte values, so assembly language is one step above this: the different commands understood by the CPU (called opcodes) are assigned human-readable names like add, mov, etc. These names are called mnemonics.
Compared to C/C++, assembly has fewer “tools” available. There is no “standard assembly library”. If you want string processing, you need to write it yourself.
Assembly is not portable to other types of CPUs (x86 vs. ARM) or to other operating systems (Windows vs. Linux) or even to other assembler programs (YASM vs. MASM vs. GAS).

Normally, the process we go through to compile a C/C++ program looks like this:

                    compile                   link
C/C++ source code     -->      object code     -->     executable

where the object code is a stream of opcodes or instructions, to be run by the CPU. A single statement in C/C++ may compile into a large number of opcodes. Even just a simple assignment

x = y;

may need to do a fair amount of work at the CPU-level, depending on where x and y are located in memory, whether or not they have the same type, etc.

Hence, there is a disconnect between the “instructions” we write in our programs, and the actual amount of work done by the CPU, in terms of the number of opcodes it has to execute.

In assembly language, assembly “statements” (operations) are one-to-one with CPU instructions. Each line of code in an assembly-language program is guaranteed to translate into a single CPU instruction. On the one hand, this means that assembly gives us a good handle on how much work the CPU is doing. On the other hand, many of the conveniences we are used to in C/C++ (so used to we don’t even think of them as conveniences, but just as things that are always there!) don’t exist at the CPU level. There are no CPU instructions corresponding to “for-loop”, or “if-else”, or “variable declaration”. We will have to build all of these things up, from a very primitive set of operations.

Because assembly instructions map one-to-one with CPU instructions, every type of CPU has its own assembly language. Assembly on an Intel CPU will be completely different from assembly on an ARM CPU (such as most cellphones run), and again totally different from the AVR CPUs used on Arduino. Assembly is, unlike C/C++, not portable.

Even if we stick to a single CPU type, there is also no guaranteed portability between different assemblers and operating systems. Unlike C/C++, where an international committee decides what is, and is not, “standard C/C++”, assembly is not controlled by anyone. Hence, assembly written for YASM (the assembler program we will use) may not work on GAS or NASM or the Microsoft assembler. Operating systems impose another layer of incompatibility: because there is no “standard assembly library”, assembly programs written for one operating system may not be portable to another. An assembly program written on Windows in the Microsoft assembler (such as exists in Visual Studio) will not run on Linux; not just because the assemblers are different, but also because the operating system interfaces are different.

In this course, we will use the YASM assembler, targeting 64-bit Intel CPUs (commonly called “x86-64”) on Linux. You can work on your assignments on our server, fccsci.fullcoll.edu, or on your own computer, but if you work on your own computer it is your responsibility to ensure that your setup is identical to the server’s, so that your code still works on the server.

(One way to do this is to setup a virtual machine on your personal computer. I’ve written up some instructions on how to do this.)

Assignments must be submitted on the server, by placing them in a subdirectory named cs241, usually with directories under that named after the assignment (e.g., cs241/assign1). You will have to create the cs241 directory yourself, when you first log in (mkdir cs241).

We’ll use the GDB debugger to debug our assembly programs. In C++ your first tool for debugging might be to add a few couts where things are going wrong, but the mere act of adding printing to an assembly program might require rewriting at least the function in which you are printing, possibly even your whole program! So obviously that is not a viable approach, so we’ll become quite familiar with GDB, which, fortunately, understands assembly.

Computer organization

Computer organization refers to how the computer is structured internally: how memory, the CPU, I/O devices, etc. are connected together and interoperate. Although we’ll primarily focus on the organization of the computers we’re actually using, sometimes we’ll look at other computer systems (MIPS, ARM, etc.) for comparison. Of course, it’s important to keep these different systems separate in your mind!

Digital circuits

CPUs are implemented in terms of digital circuits, which are built out of logic gates. This is a level even lower than that of assembly language. We’ll look at digital circuits a bit, just to get a feeling for how the CPU “really” works, but mostly we’ll work in assembly language.

Terminology review

Byte: the smallest unit of computer memory that can be individually addressed. For us, a byte will always be 8 bits, but it’s worth bearing in mind that this is not true on all systems; there are weird systems out there where a byte is 10 bits or 7 bits or whatever.

The bit positions in a byte are numbered 0 through 7, from right to left:

Bit value	0	0	1	0	1	1	0	1
Bit pos.	7	6	5	4	3	2	1	0

Word: two bytes (16 bits)

When looking at a word as two bytes, we refer to the first word (the one occupying the first/lower 8 bits) as the “low” byte, and the second as the “high” byte. E.g.,

Word
High byte	Low byte

Similarly, if we were numbering the bit positions in a word, the bits in the low byte would be numbered 0-7, while in the high byte they would be 8-15.

This generalizes to the low and high words of a double-word (dword), low and high dwords of a quad-word (qword), etc. Similarly, in a byte, the bit at position 0 is the “low” bit, while the bit at position 7 is the “high” bit.

Double-word or dword: two words; i.e., four bytes (32 bits)

Quad-word or qword: four words; i.e., 8 bytes (64 bits).

This generalizes to “double-quad-words” (16 bytes), “quad-quad-words” (32 bytes), etc. but we won’t need those very often.

KB: Kilo-byte, where “kilo” here is a binary thousand: 2¹⁰ = 1024 bytes. The capital B at the end indicates that we are measuring bytes; a small b indicates bits.

Kb: Kilo-bit, 1024 bits. We won’t need this very often, but bit-sized quantities are common in communications (e.g., bandwidth is often measured in mega-bits). A kilobit is “approximately” 1 thousand.

MB: Megabyte, 2²⁰ = 1024² = 1,048,576 bytes. Note that this is approximately 1 million.

GB: Gigabyte, 2³⁰ = 1024³ = 1,073,741,824. Note that this value is approximately 1 billion.

(and so forth to TB, PB, etc.)

The difference between a binary “million” (1,048,576) and a decimal million (1,000,000) explains the difference you sometimes see between the capacity of a drive as advertised on the box, vs. the capacity you see in your operating system. Your operating system will use the binary measure, while the box uses decimal (because decimal makes the capacity appear larger). So a drive that is advertised as being “500GB” (= 500 × 1,000,000,000 = 500,000,000,000 bytes) will appear in your operating system as

    500,000,000,000
    ——————————————— = 465 GB
     1,073,741,824

Number systems

Decimal: base-10 number system that we all know and love. The only digits available are 0-9 (i.e., it’s “base 10” because there are 10 possible digits)

Binary: base-2 number system; i.e., numbers where the only digits available are 0 and 1 (base 2 because there are two possible digits).

Octal: base-8 number system; the only digits available are 0-7. (We won’t use this much.)

Hexadecimal: base-16 number system; the digits available are 0-9, a (= 10), b (= 11), …, f (= 15).

We’ll review binary and hex arithmetic later.

Note that no one of these number systems is “better” or “more correct” than any other. E.g.,

   21   ==   10101b   ==   0x15   ==   025
decimal      binary        hex.       octal

Internally, the computer stores everything as binary, but that is less relevant, even in assembly language, than you might think! We can add/subtract/etc. binary numbers as easily as any other number, and we don’t have any way of directly accessing the individual bits, so most of the time the fact that the computer is using binary isn’t important to us.

Both C/C++ and assembly allow us to write numbers in source-code in any of the above number systems, just by using a different format:

Syntax	Base
`21`	Decimal
`10101b`	Binary (`b` at the end)
`0x15`	Hexadecimal (`0x` at the beginning)
`025`	Octal (leading `0`)

Note that b, 0x, etc. are not part of the number itself; they are just notational elements used to make the different number formats distinct. The compiler/assembler does the work of translating the numbers into the internal format used by the computer.

Internally, all of these numbers are identical. E.g., in C++, you can do

int x = 21;

if(x == 0x15) { 
    ⋮
}

and the if will always come out as true.

Similarly, when we print out a number (via cout or printf) it is normally printed as decimal, but through various flags we can ask for hexadecimal. The runtime library does the work of translating the internal representations back into decimal/hexadecimal. Later on in the semester, we’ll have an assignment to print out numbers manually (because assembly language doesn’t have a standard library to do it for us!).

Digital Circuits

The CPU is implemented as a complex set of digital circuits. Digits circuits are built out of logic gates (which, in turn, are built using transistors). In a digit circuit design, we show how logic signals (on/off values) flow from inputs, through logic gates, to outputs. A logic signal is high (on) if there is current flowing through it, and low (off) if there is not (or a very small amount of current).

The basic types of logic gates are:

NOT: Single input, single output gate which inverts its input. If the input is high, the output is low, and vice versa.

The C/C++ operator for NOT is ~ (this is bitwise NOT, which is different from logical NOT, !).
AND: Two input, single output gate: the output is high if and only if both inputs are high, otherwise it is low.

The C/C++ operator for AND is & (this is bitwise AND, which is different from &&, logical AND).
OR: Two input, single output gate: the output is high if either or both of its inputs is high, otherwise (if both inputs are low) it is low.

The C/C++ operator for OR is | (this is bitwise OR, again, different from logical OR, ||).
XOR: Two input, single output gate: the output is high if either of its inputs is high but not both, otherwise (if both inputs are high, or both are low), the output is low. Effectively, the output is high if the inputs are different (one high and one low), and low if they are the same.

The C/C++ operator for XOR is ^ (this is bitwise XOR; there is no logical XOR operator in C/C++). Note that C/C++ does not have an exponentiation operator.
NAND: An AND gate with a NOT attached to the output. That is, the output is low if both inputs are high, and high otherwise.

There is no C/C++ operator for NAND, although you can combine & with ~ to achieve the same effect.
NOR: An OR gate with a NOT attached to the output. The output is high if both inputs are low, and low otherwise.

There is no C/C++ operator for NOR, although you can combine | with ~ to achieve the same effect.
XNOR: An XOR gate with a NOT attached to the output. The output is high if both inputs are the same (both low, or both high), and low otherwise.

There is no C/C++ operator for XNOR, although you can combine ^ with ~ to achieve the same effect.

The first three are probably at least a bit familiar to you. There are a few things to note:

AND and OR can be extended to more than two inputs: an n-input AND gate is high if all of its n inputs are high, and low otherwise. Likewise, an n-input OR gate is high if at least one of its n inputs is high, and low if all its inputs are low.

This diagram illustrates how to built a 3-input AND gate:

Question: If XOR gates are arranged in the same configuration, what does the resulting 3-input, 1-output circuit do?
NAND and NOR gates have the property of universality: all of the other gates can be built from just NAND, or just NOR. In fact, its common for circuits to be built using just NAND gates, to simplify manufacturing.

For example, here is a circuit that is equivalent to A OR B implemented using only NAND gates (you should verify that the circuit produces the correct output for all four combinations of inputs A and B):

You can find a full reference of how to translate all the other types of logic gates into both NAND and NOR gates on Wikipedia. Assignment 1 will ask you to translate a circuit using NOT, AND, and OR into one that only uses NAND gates.

Table form of circuits

The behavior of any (stateless) m-input, n-output circuit can also be illustrated using a table showing how each combination of inputs maps to a particular set of outputs. Because each input can be either low (0) or high (1), the table will have 2^m rows and m + n columns. For example, the 3-input AND shown above

Logic diagram of a three-input AND circuit

can be defined via the following table:

Inputs			Output
A	B	C	Q
0	0	0	0
1	0	0	0
0	1	0	0
1	1	0	0
0	0	1	0
1	0	1	0
0	1	1	0
1	1	1	1

I.e., the output is high (1) only when all three inputs are high (1).

Circuits in hardware

If you were to try and implement a logic circuit in actual electronic hardware, you’d come up against several issues that were not mentioned above:

Electricity does not just flow from point A to point B (as a logic diagram would suggest), but rather only flows if a closed circuit is present. In order for a circuit to work in the real world, there must be a return connection from the final output of the circuit, back to the source which is powering the inputs. In a real circuit, these connections would of course be present, but in a logic diagram, we omit them, because they do not affect the logic of the circuit, what it actually computes.

Further complicating real-world circuit layouts is the fact that many logic gates requires a supply connection: an input that is always high, supplying power to the device.
If you try to buy an OR gate (for example), you’ll discover that you usually can’t buy just one; gates are usually only available on ICs, which will bundle several of the same type of gate together. For example, you can purchase an IC (integrated circuit) with four, or eight, or more NAND gates on a single chip. This makes sense, because in a real circuit design, you would rarely need just one gate. (The chip will have a single supply input, shared by all its gates.)
Ideally, we describe logic circuits as if the signals instantaneously switched from low to high, and vice versa, but in real-world systems this is impossible. The rise time of a circuit is the amount of time it takes for a line to go from low to high. During this transition period, the amount of current flowing through the connection is somewhere in between 0 and 1, which may cause the circuit output to be unpredictable for a brief period.

In order to work around this period of unpredictability, most digital circuits are synchronous: they use a clock to control when computation is performed. A clock is a 0-input, 1-output logic device which outputs a signal which alternates low, high, low, high, … at a regular clock rate. Typically, when the clock signal goes from low to high (the “rising edge” of the clock signal), the rest of the circuit will perform its computation, but the output of the computation will not be read until the rising edge of the next clock cycle. Thus, the output has 1 entire clock cycle to stabilize on the correct value.

In fact, even when the signal is high it will still not be at a constant level; it is simply higher than some threshold value which marks the dividing line between “low” and “high”.
Electrically, a single output cannot be connected to an unlimited number of other devices; there is a limit to the “fan-out” of an output.
Logic gates can be implemented electronically in many different ways, leading to different logic families, each with its own electrical characteristics. E.g., the voltage levels for “low” vs. “high” may be very different for different families. Note also that in most families the “low” level is not 0V, but some voltage level less than the “high” level. E.g., the transistor-transistor-logic (TTL) family uses a low voltage level between 0 and 0.8V (with respect to ground), and a high 2 and 5V. An input signal between 0.8 and 2V is in the “unpredictable” range and may be seen as high, or low, or even fluctuating between the two.

Logic circuit problems

Here are some circuits you can try to build, to test your understanding of logic circuits:

Using any gates you like, build a 4-input, 1-output circuit where the output is high if and only if exactly one of the inputs is high.
Using any gates you like, build a 4-input, 1-output circuit where the output is high if and only if exactly two of the inputs are high (but it could be any two).
Using only NAND, build a comparator circuit, a 2-input, 1-output circuit where the output is high if
- Both inputs are low
- The first input is low and the second is high
- Both inputs are high
This is equivalent to checking whether the first input is “less than or equal to” the second, where low is “less than” high.

Note that there are many different possible solutions to these problems. A more advanced course in digital circuits would teach methods for optimizing the design of a circuit, so as to minimize the number of gates used.

Beginning assembly

Here we will write and then de-construct the most basic assembly language program: one that prints the traditional “Hello, world!” message. There are two broad styles in which we can write an asm. program which interacts with the operating system (which is any interesting asm. program):

We can interact with the operating system directly, by calling its system routines. For lack of a better name, I’ll call this the “syscall-style”. This is the most direct method, but is a little unfamiliar. If we use this method, then the entry point of our program will be named _start, and we will first use the system call to print text to a file descriptor (in our case, the standard output), and then use another system call to exit with exit code 0 (success).

If we use the syscall style, our program will be totally self-contained: there will be nothing in the resulting executable other than what we write.
We can use routines from the standard C library like printf and exit. I’ll call this “C-lib-style” This requires us to link with the C library. This method is obviously much more powerful, as it puts the entire resources of the C standard library in the hands of our program, but it is also much more complex: we have to learn how to call C-style functions (how to pass argument values, how to interpret return values). If we use this method, the entry point of our program will be named main, as you are probably used to. (The C library defines _start for us, using it to do things like initialize the heap, setup internal structures, etc.)

If we use the C-lib style, then our final executable will include not just the code that we write, but also quite a bit of code added by the standard library (the code for any standard library functions we use, plus all of the code for the library’s _start).

We will use the syscall style for now, as it lets us get started faster.

;;; 
;;; hello.s
;;; Prints "Hello, world!"
;;;

section .data

msg:            db      "Hello, world!", 10
MSGLEN:         equ     $-msg

section .text

;; Program code goes here

global _start
_start:

    mov     rax,    1               ; Syscall code in rax
    mov     rdi,    1               ; 1st arg, file desc. to write to
    mov     rsi,    msg             ; 2nd arg, addr. of message
    mov     rdx,    MSGLEN          ; 3rd arg, num. of chars to print
    syscall

    ;; Terminate process
    mov     rax,    60              ; Syscall code in rax
    mov     rdi,    0               ; First parameter in rdi
    syscall                         ; End process

This can be assembled and linked with

asm hello.s

or, manually, with

yasm -g dwarf2 -f elf64 hello.s -l hello.lst
ld -g -o hello hello.o

and then run via the usual

./hello

which will print

Hello, world!

and then exit.

A step-by-step decomposition of this program: Each line consists of the following form:

label:       instruction       ; comment

All of these are optional, so only a few lines begin with a label, and many do not have comments. Also, sometimes the instruction is not a proper assembly language instruction, but a pseudo-instruction which modifies the assembler’s state in some way. The colon at the end of a label is optional, but I’ll always try to use it for clarity.

Line	Description
`section .data`	The `data` section contains (initialized) global variables and constants.
`msg: db "Hello, world!", 10`	`msg` defines a label pointing to the string “Hello, world!”, which will be copied verbatim into our assembled program
`MSGLEN: equ $-msg`	`equ` defines a constant named `MSGLEN` containing the length of the message. `$` represents our current location in the program
`section .text`	The `text` section contains the actual executable code of our program
`global _start`	We declare the `_start` label as global so that it is visible outside our program (so that the OS can find it to start our program)
`_start:`	This declares `_start` as a label pointing to the current location in the program
`mov rax, 1`	This loads the value 1 into the register `rax`, which stores the syscall code. 1 is the syscall code for “write to file”
`mov rdi, 1`	Stores 1 into register `rdi`. This is the first argument to syscall `write`, which is the file descriptor (1 is standard output)
`mov rsi, msg`	Stores `msg`, an address into `rsi`. This is the 2nd argument, the message to write
`mov rdx, MSGLEN`	Stores `MSGLEN` into `rdx`. This is the 3rd argument, the length, in bytes, to write
`syscall`	This calls the syscall loaded into `rax`, printing the string
`mov rax, 60`	60 is the syscall code for “exit process”
`mov rdi, 0`	1st argument, 0, the exit code (success)
`syscall`	Execute the system call

(Note that the default extension for assembly-language programs is .s.)

The mov instruction, like all Intel-syntax instructions, takes its operands in the form

mov dest, src

I.e., destination comes first, followed by the source(s). This makes more sense if you read it as dest = src.

Program sections

A running program, in memory, has its memory space divided into a number of different “sections”. Although all sections are part of the same address space, they are used for conceptually different things, and may have different permissions applied to them by the operating system. For example, it is common for OSes to set the .text section (where the executable machine code lives) to read only, as self-modifying code is (usually) either a bug or an exploit.

The normal memory layout of a process looks something like this:

Stack (grows down)

…

Heap (grows up)

.data section (global variables)

.text section

The fact that the stack grows down (i.e., “pushing” onto the stack actually decrements the pointer to the top of the stack) will be important later.

Besides the .data section for global variables, there is also the .bss section, which is used for uninitialized global data. The difference between the two is that for .data, when your program starts, the operating system has to copy the contents of the data section from the program on disk in memory. The .bss section, on the other hand, is for uninitialized data, so the operating system doesn’t need to copy anything, it just reserves the total amount of .bss space in the program’s address space. (There are a few other section types that we won’t use like .readonly.)

(A syscall style program actually doesn’t have a “heap”; the heap is managed by the C standard library, so if we don’t link with that, we don’t get a heap, and the stack occupies all the upper address space. All of our data must be statically allocated, or allocated on the stack.)

The above program could be made easier to read by defining some more constants, e.g.:

section .data

SYS_write       equ     1
SYS_stdout      equ     1
SYS_exit        equ     60
EXIT_SUCCESS    equ     0

equ defines an assemble-time constant; it does not take up any space in memory when the program is running, or use any space in the resulting executable. I.e., equ is equivalent to #define in C/C++.

To use these, we simply refer to them by name:

mov     rax,  SYS_exit 
mov     rdi,  EXIT_SUCCESS
syscall

db stores a sequence of bytes into the resulting executable directly. When we write

msg  db  "Hello, world!", 10

two things happen:

The sequence of bytes representing "Hello, world!" followed by 10 is written directly into the executable file.
msg is defined as a label containing the address of the beginning of this sequence of bytes. Note that we do not actually “store” the string into msg, only its address.

Because we are using syscall-style, our strings do not end with a terminating NUL (0) character. (The string above ends with 10, the ASCII character for Line Feed; this is what you get when you use the \n character escape in C/C++.) We have to know the length of the string to pass to the SYS_write syscall. We could simply count the number of bytes by hand, but that will break if we ever change the string.

As mentioned, the assembler places the string msg into the resulting executable file at some address. In fact, everything in our assembly source file has some address which it will end up at in the resulting executable. Even things like MSGLEN which take up 0 space theoretically have some notion of the “current location” in the output file. $ gets the address of the current location. $-msg subtracts the address msg from the current address, giving us the length of the string pointed to by msg. Note that this only works because we define MSGLEN immediately after defining msg; if there were any other definitions in between which took up space in the file, the computed length would be wrong.

(This also demonstrates that equ definitions can use limited arithmetic in their values; the computation is done at assembly-time, not at run-time.)

In all of our programs, we’ll have the .data section first and the .text section second, but this is just a convention. You can change the order of the sections, and even interleave them, and your program will still work.

Calling operating system functions

The process for calling a syscall works like this:

Set register rax to the code for the syscall you wish to execute. 60 is exit process, 1 is write, etc. You can find a complete reference here.
Set rdi, rsi, rdx, r10, r8, and r9 to the 1st, 2nd, 3rd, etc. arguments of the syscall, as needed.
Execute the syscall instruction

Note that steps (1) and (2) can occur in any order, but all of the register values must be properly setup before executing syscall. If the system call returns a value (neither SYS_write nor SYS_exit do), it will be in rax after the syscall returns.

Listing files

The -l noop.lst argument to yasm is optional; it instructs YASM to produce a listing file, a list of the assembly instructions we wrote, line-by-line, together with their hexadecimal opcodes. Here is the listing file for the above program:

     1                                 %line 1+1 hello_bare.s
     2                                 
     3                                 
     4                                 
     5                                 
     6                                 
     7                                 [section .data]
     8                                 
     9 00000000 48656C6C6F2C20776F-    msg db "Hello, world!", 10
    10 00000000 726C64210A         
    11                                 MSGLEN equ $-msg
    12                                 
    13                                 [section .text]
    14                                 
    15                                 
    16                                 
    17                                 [global _start]
    18                                 _start:
    19                                 
    20 00000000 48C7C001000000          mov rax, 1
    21 00000007 48C7C701000000          mov rdi, 1
    22 0000000E 48C7C6[00000000]        mov rsi, msg
    23 00000015 48BA0E000000000000-     mov rdx, MSGLEN
    24 00000015 00                 
    25 0000001F 0F05                    syscall
    26                                 
    27                                 
    28 00000021 48C7C03C000000          mov rax, 60
    29 00000028 48C7C700000000          mov rdi, 0
    30 0000002F 0F05                    syscall

The first column is the original line number, the second is the address within the assembled program relative to the current section (starting at 00000000), the third is the opcode, and the fourth is our original program.

Looking at this, we can see that the opcode for mov rax, 60 is 48C7C03C000000, mov rdi, 0 is 48C7C700000000, and for syscall its 0F05. (x86-64 uses varying instruction widths: not all opcodes are the same number of bytes; some are shorter and some are longer.)

Assembling and linking

The asm script takes care of running the assembler on all the input files, and then linking them together. (It also correctly detects whether or not you are defining _start or main as the entry point of your program, and will link with the C standard library in the latter case.)

If you want to assemble manually, the command is

yasm -g dwarf2 -f elf64 filename.s -l filename.lst

The -g parameter gives the format to use for the debugging info, so that GDB (see next section) can read it.
The -f parameter says to output an x86-64 format object file. The -l parameter says to output a list file.

To link one (or more) assembled object files together into an executable, there are two options:

If you are not using any C standard library functions, and the entry point of your program is named _start, then use ld:
```
 ld -g -o exe_name object.o files.o ...
```
If you are using functions from the C standard library, and your entry point is named main, then use gcc:
```
 gcc -o exe_name object.o files.o ...
```
This is the same command-line you would use to link a C program’s object files. (And, in fact, it can be used to link a program consisting of a mixture of C and assembly!)

(The asm script checks to see if any of your files defines main; if main is defined, it assumes you want to use C standard library functions.)

Debugging assembly programs

GDB understands assembly; we can run our program inside GDB via

gdb ./hello

We can break at the _start of the program via

break _start
run

and then use the n(ext) command to step through the program line by line. The values of registers can be printed by name, prefixed with $, e.g.,

print $rax

or change them with

set $rax = 0

You can also use info registers to print all the registers at once.

Note that when I use GDB I use a plugin called GDB dashboard, which shows the contents of the registers on every step. I’ll copy it into all of your home directories on the server, so you’ll have the same setup.

GDB will default to using AT&T syntax for its assembly. You can switch it to Intel syntax by entering the command

set disassembly-flavor intel

You can put this command in your ~/.gdbinit (before the beginning of GDB dashboard’s scripts, if you’re using that), to make it apply to all your GDB sessions.

Disassembling an existing program

You can use objdump to disassemble a compiled executable. This can be a useful way of figuring out how to do something in assembly: write it in C or C++ and then disassemble it. Of course, often the result is not as useful as you might expect: the compiler may have done some “interesting” things to your code. Remember, the compiler’s goal is to make the resulting assembly fast, not easy to understand.

Suppose we compile the traditional “Hello, world” program:

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

with

gcc -c hello.c

producing the hello.o object file. We can then disassemble this with

objdump -d -M intel hello.o

producing the output


hello.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   bf 00 00 00 00          mov    edi,0x0
   9:   e8 00 00 00 00          call   e <main+0xe>
   e:   b8 00 00 00 00          mov    eax,0x0
  13:   5d                      pop    rbp
  14:   c3                      ret

There’s not much here that’s useful, as it relies on a lot of scaffolding that the standard library provides (and which hasn’t been linked in yet). The call in particular is just a placeholder for the call to printf; because we haven’t linked in the definition of printf, there’s just a do-nothing call in its place.

Note that we are disassembling the object file, and not the linked executable. If we link hello.o into an executable:

gcc -o hello hello.o

and then objdump -d -M intel hello we will get a lot more assembly; the standard library does a lot of setup before running main(), and the executable contains all that code. On the other hand, it does let us see what the final main looks like:

0000000000400507 <main>:
  400507:   55                      push   rbp
  400508:   48 89 e5                mov    rbp,rsp
  40050b:   bf a4 05 40 00          mov    edi,0x4005a4
  400510:   e8 eb fe ff ff          call   400400 <puts@plt>
  400515:   b8 00 00 00 00          mov    eax,0x0
  40051a:   5d                      pop    rbp
  40051b:   c3                      ret    
  40051c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]

Here, the do-nothing call has been replaced with a call to a procedure puts which (presumably) implements printf. Note that main is a procedure; it is called from the (standard-library-provided) _start and must return to it when it’s done, hence it ends with a ret instruction. The first two instructions, as we will see, are also part of a standard “preamble” which every procedure starts with. You can look through the rest of the disassembly and find the library-provided _start procedure, as well as the definition of puts.