Structuring larger projects
Putting all your definitions and code into a single file is rather cumbersome, so today we’re going to look at ways to structure larger projects by using multiple files.
There are a few different elements involved in splitting your project up into multiple files:
You have to make C++ aware of all the different files that are needed to actually complete your project.
You have to make sure that you never define the same thing twice, in two different places (e.g., have the same definition for a function in two different files). “Multiple definition” is an error in C++.
You have to make sure that C++ knows about the parts of your program that are not in the current file.
Let’s start with a simple example. Suppose I want to put my main in one file, and a function that it calls in another. So we have
main.cpp:
#include <iostream>
using namespace std;
int main() {
    cout << "Hello!" << endl;
    greet();
    return 0;
}
greet.cpp:
#include <iostream>
using namespace std;
void greet() {
    cout << "Nice to meet you!" << endl;
}
I can compile these two files by doing
g++ -c main.cpp
g++ -c greet.cpp
and combine them together into a single program using
g++ -o greet main.o greet.o
(Or, doing both things in a single step: g++ -o greet main.cpp greet.cpp.)
However, when I try to compile main.cpp I’ll get an error: greet is not declared. And it isn’t! Notice how, when I compile main.cpp, I make no mention of greet.cpp. At that point in the process, the compiler knows nothing about greet.cpp or what it contains.
Think back; what did we say about functions and the order in which they must be defined…
Either you have to define a function before you use it (and if we’re using multiple files this isn’t possible; we can’t have multiple definitions).
Or you have to put the declaration of the function before you use it.
The second option will work: we can add the declaration for greet() to the top of main.cpp:
#include <iostream>
using namespace std;
void greet();
int main() {
    cout << "Hello!" << endl;
    greet();
    return 0;
}
While a function can only be defined once in a multiple-file project, you can declare it as many times as you need. If we had other files that used greet(), each one would need to contain a declaration for it, to make them aware of the fact that it exists.
Later, when we compile all the different files in the project together, C++ will link up all the usages of greet() with its one-and-only definition, in greet.cpp. If we recompile now, everything will tie up nicely and when we run our program, we’ll see:
Hello!
Nice to meet you!
This illustrates a general principle of multi-file projects:
You can spread your definitions over several files, so long as a declaration is visible everywhere a definition is used.
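As a minimal sketch of this principle, the single file below stands in for a multi-file project: a declaration can be repeated freely, a caller only needs to see a declaration, and there is exactly one definition. The names greeting() and greet_twice() are made up for illustration and are not part of the example above.

```cpp
#include <iostream>
#include <string>

// Declaration: promises that greeting() exists somewhere in the program.
std::string greeting();

// Repeating a declaration is harmless, as long as it matches the first one.
std::string greeting();

// This caller compiles because a declaration of greeting() is visible,
// even though the definition only appears further down.
void greet_twice() {
    std::cout << greeting() << std::endl;
    std::cout << greeting() << std::endl;
}

// The one-and-only definition. A second definition would be an error.
std::string greeting() {
    return "Nice to meet you!";
}
```

In a real project the definition would sit in its own .cpp file, and each file that calls greeting() would carry the declaration.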
Separate compilation
I’ve mentioned that if you want to compile file1.cpp, file2.cpp and file3.cpp into a single executable program, you can do this the explicit way:
g++ -c file1.cpp
g++ -c file2.cpp
g++ -c file3.cpp
g++ -o program file1.o file2.o file3.o
or you can do it all in one step with just
g++ -o program file1.cpp file2.cpp file3.cpp
You might be wondering why we’d ever do it the first way. The answer is, think about what happens if you modify only one of the files. Say you edit file2.cpp and you want to recompile. The first way, you do
g++ -c file2.cpp
g++ -o program file1.o file2.o file3.o
There’s no need to recompile file1.cpp and file3.cpp, because they haven’t changed, so we save a bit of time by telling the compiler to only recompile file2.cpp. The second, shortcut way doesn’t give us that option: it always recompiles everything, even if it hasn’t changed.
You’re unlikely to notice the difference for a small project, but for big projects, recompiling everything may take a long time; only recompiling the things that you know you’ve changed can cut down on compilation time significantly.
Linker order
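Note the order in which I listed the files when I linked them together
g++ -o greet main.o greet.o
i.e., main.o first and greet.o second. A common convention is that if file1.cpp uses functions from file2.cpp, then you put file1.cpp (or file1.o if you compiled it separately) before file2.cpp in the list. For plain source and object files the order doesn’t actually change the result: G++ links in every .o file you list, so greet.o main.o works too. The order becomes a hard requirement once you link against static libraries (.a files). With the GNU linker, only the parts of a library that something earlier on the command line has already asked for are pulled in, and the rest is thrown away. So if a library containing greet() were listed before main.o, its greet() function would be thrown away before the linker moved on to main.o and realized that, nope, we actually needed it. It’s annoying, but them’s the breaks.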
For example, suppose we have three files:
//-------------------------------------------------------------
// main.cpp
void f();
void g();
void h();
int main() {
    f();
    g();
    return 0;
}
// -------------------------------------------------------------
// one.cpp
void h();
void g() {
    h();
}
void f() {
    g();
}
// -------------------------------------------------------------
// two.cpp
void h() {
    return;
}
Following that rule, in what order should I list main.cpp, one.cpp and two.cpp when I compile them?
Include files
Manually copying the declarations to the tops of all the files that need them is cumbersome and error-prone (imagine if we later wanted to change the signature of a function!). Instead, we normally put all the declarations for a single .cpp file (one that doesn’t have a main()) into a header file. E.g., for the above, in addition to greet.cpp we would also create greet.h:
greet.h
void greet();
This file must exist in the same directory as greet.cpp and main.cpp. The entire contents of greet.h is just the declaration of the greet() function. If we had more functions in greet.cpp, we’d add their declarations to greet.h. Now, instead of actually putting the declaration itself at the top of main.cpp, we’re going to do this:
#include <iostream>
#include "greet.h"
using namespace std;
int main() {
    cout << "Hello!" << endl;
    greet();
    return 0;
}
The line #include "greet.h" will pull in the contents of the greet.h file, just as if we had written them at that point in the file. So we get the declaration of greet() like before, but we don’t have to worry about putting it into main.cpp.
A file ending in .h is called a header file; you don’t need to tell G++ about it when you compile because we’re #include-ing it. Sometimes header files are called include files, for obvious reasons.
Considering this, there are a few ways to structure larger projects:
If we know that we’ll never want to reuse any parts of the project, then we can write a single .h file which includes declarations for all the different functions in the project, regardless of what .cpp file they are defined in. Every .cpp will #include this file, so every .cpp file will have access to all the functions from every other file.
If, however, we might want to reuse some part of our system, we might choose to create more than one header file. In this case, we would break up the project into “components”: collections of .cpp files that are all related, and give each component a header file. Other components could include the headers from the component(s) that they need, and we can also use a component in a totally different project by just including its header, and then listing its source files when we compile.
For example, we might have a large project consisting of files main.cpp, utilities.cpp, data.cpp, and user.cpp. We’d put all the function declarations into a single header file named anything we like, say, project.h, and then all of the above files would #include this header.
On the other hand, we might decide that utilities.cpp is something that we want to use in other projects. So then we’d pull its declarations out of project.h and put them into a separate utilities.h. We can then either #include "utilities.h" in any source file that needs it, or #include "utilities.h" within project.h. That way, every other file can remain unchanged, but we still get the ability to use utilities separately.
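Here is a rough sketch of that second layout, squeezed into one file so it can be compiled on its own. The comment banners mark where each real file would begin. The functions trim_spaces() and format_user() are hypothetical stand-ins; the notes don’t say what utilities.cpp or user.cpp actually contain.

```cpp
#include <string>

// ---- utilities.h: declarations for the reusable component ----
#ifndef UTILITIES_H_
#define UTILITIES_H_
std::string trim_spaces(const std::string& text);   // hypothetical utility
#endif

// ---- project.h: declarations for the rest of the project ----
#ifndef PROJECT_H_
#define PROJECT_H_
// In the real file this line would be: #include "utilities.h"
std::string format_user(const std::string& raw_name);   // hypothetical
#endif

// ---- utilities.cpp: would start with #include "utilities.h" ----
std::string trim_spaces(const std::string& text) {
    const std::string::size_type first = text.find_first_not_of(' ');
    if (first == std::string::npos)
        return "";                       // nothing but spaces
    const std::string::size_type last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}

// ---- user.cpp: would start with #include "project.h" ----
std::string format_user(const std::string& raw_name) {
    return "User: " + trim_spaces(raw_name);
}
```

Because project.h pulls in utilities.h, user.cpp doesn’t have to change, while another project could use utilities.h and utilities.cpp without the rest.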
Note that when we do #include <iostream> the exact same thing happens: iostream is a file full of declarations for things like cin, cout, etc. The angle-brackets tell C++ to look for it, not in the current directory (which is what #include "..." means) but rather in whatever the system paths for built-in C++ header files are. But the idea is the same: we are pulling in declarations for things. (We don’t have to tell G++ about these things when we compile, because it automatically links against all the standard C++ stuff.)
Note that while you can do something like
#include "file.cpp"
you almost never should do this. If you’re including anything other than a .h file, you’re probably doing something wrong. Remember these two simple rules:
Don’t ever compile .h files (i.e., don’t use g++ -c file.h). Instead, #include them into the .cpp files that need them.
Don’t ever #include .cpp files. Instead, compile them with G++ and link them together.
If you #include a .cpp file and then also compile it normally, you will get multiply-defined errors (because the things in the file are being defined more than once: once in the file itself, and once everywhere where you included it). With G++ it’s not too hard to just not compile the .cpp, but other tools (like Visual Studio) will automatically compile everything that ends with .cpp.
Include guards
Because the contents of a header file are literally just copied into the source file, replacing the #include "...", there are some problems that can crop up. None of these occur if the include just contains function declarations, but they can occur for includes that contain classes, so it’s a good practice to use include guards on all include files.
An include guard is a special bit of code at the beginning and the ending of a .h file that makes sure that the contents of the file are only loaded once. A header file test.h with include guards looks like this:
#pragma once
/*
* test.h
*/
#ifndef TEST_H_
#define TEST_H_
// Contents of the file go here
#endif
Everything in the include file should go between the #define... and the #endif. Note also that the name TEST_H_ is based on the name of the file: for a file named whatever.h you would replace this with WHATEVER_H_.
There are three components to the include guard:
#pragma once at the top of the file tells C++ that the file should only ever be included once. Most modern compilers support this, but some older compilers don’t, so we add the next bit, around all the declarations.
#ifndef TEST_H_ followed by #define TEST_H_ checks whether a preprocessor macro named TEST_H_ has not been defined yet; if it hasn’t, it defines it and carries on into the contents of the file. If it has, everything from this point is skipped until the matching #endif.
#endif marks the end of the guarded region. If TEST_H_ was already defined, then everything between the #ifndef and this line is completely skipped.
You don’t really have to worry about how it all works, but you should understand why we do it.
This bit of magical code ensures that if we accidentally do
#include "test.h"
...
#include "test.h"
we won’t get multiple definitions; the second #include will effectively be ignored, thanks to the include guard. (It’s unlikely that we would make this obvious mistake. But what we might do is #include "test.h" and then later, include some other thing.h which in turn also includes test.h; the result is the same: test.h is included multiple times.)
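To see the guard at work without needing separate files, the sketch below pastes the contents of a guarded header twice, which is what two #include "test.h" lines amount to after the preprocessor has run. The struct point and the function sum_of_coordinates() are invented for the illustration.

```cpp
// First copy of "test.h": TEST_H_ is not defined yet, so the body is kept
// and TEST_H_ becomes defined.
#ifndef TEST_H_
#define TEST_H_
struct point {
    int x;
    int y;
};
#endif

// Second copy of "test.h": TEST_H_ is already defined, so the preprocessor
// skips everything up to the matching #endif.
#ifndef TEST_H_
#define TEST_H_
struct point {
    int x;
    int y;
};
#endif

// Code that uses the (single) definition of point.
int sum_of_coordinates(point p) {
    return p.x + p.y;
}
```

If you delete the #ifndef/#define/#endif lines, the compiler sees two definitions of point in the same file and rejects it with a redefinition error.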
An Example
Let’s take the example project (Pascal’s triangle) that’s posted to the course website, and split it into multiple files. The example is pretty simple; it only has main() and two functions, so we’ll split it into three files: pascals_triangle.cpp containing main(), triangle.cpp containing the triangle-printing code, and triangle.h containing the function declarations.
pascals_triangle.cpp
/*
  pascals_triangle.cpp
  Prints Pascal's triangle
*/
#include <iostream>
#include "triangle.h"
using namespace std;
int main() {
    int rows;
    cout << "Pascal's triangle!" << endl;
    do {
        cout << "Enter how many rows to print (at least 0): ";
        cin >> rows;
    } while(rows < 0);
    print_pascals_triangle(rows);
    return 0;
}
triangle.h
#pragma once
/*
 * triangle.h
 * Code for printing Pascal's triangle
 */
#ifndef TRIANGLE_H_
#define TRIANGLE_H_
#include <vector>
void print_pascals_triangle(int rows);
void print_row(std::vector<int> row_values);
#endif
triangle.cpp
#include <iostream>
#include <vector>
#include "triangle.h"
using namespace std;
/* print_pascals_triangle(rows)
   Prints the first rows of Pascal's triangle, left-aligned.
   rows: number of rows to print
*/
void print_pascals_triangle(int rows) {
    vector<int> row_values;
    for(int row = 0; row < rows; ++row) {
        // Update for the next row
        vector<int> new_row = row_values;
        new_row.push_back(1); // Last column is always 1
        // Compute the new row from the current row
        for(int c = 1; c < new_row.size()-1; ++c)
            new_row.at(c) = row_values.at(c-1) + row_values.at(c);
        // Copy updated row back into the original
        row_values = new_row;
        // Print the updated row
        print_row(row_values);
    }
}
/* print_row(row_values)
   Prints a vector<int> as a row of the triangle.
   row_values: vector<int> of the values to print
*/
void print_row(vector<int> row_values) {
    for(int value : row_values) {
        cout << value << " ";
    }
    cout << endl;
}
Note that pascals_triangle.cpp does not itself include <vector>, because it never uses vectors directly. You should try to only include things you actually need.
Classes
So functions go in .cpp files while their declarations, if we want to “share” those functions among several files, need to go in a header file (.h) which gets included in the .cpp files. What about class definitions?
Here’s where things get interesting:
Class definitions go in header files. Remember what I said about C++ not allowing multiple definitions? Yeah, that doesn’t apply to classes in quite the same way. The same class can be defined in several different source files (typically because they all included the same header file), as long as each source file sees it only once and every copy is identical. The compiler generally won’t check that the copies match, so keeping a single definition in one header is the way to guarantee it; the program then behaves as if there were one class definition.
However, remember how we can pull the definitions of the methods out of the class itself? If you do that, then those method definitions need to go into a .cpp file. So if you want to pull the method definitions out of a class, you’re going to end up with a header file (containing the skeleton of the class definition) and a source file (containing the actual method definitions).
A common pattern for classes is a pair of files:
A name.h file containing the class definition itself (or possibly multiple class definitions, if they are all related), along with declarations for any related (e.g., friend) functions.
A name.cpp file containing definitions of any methods, along with function definitions. (For special cases, you might have more than one .cpp with a single .h.)
The “base name” of both files is the same: whatever.h and whatever.cpp. The name should be based on the main class defined (if there is one), or on whatever the purpose of the code is.
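A small sketch of the header/source pair for a class is below, again squeezed into one file with comment banners showing where counter.h and counter.cpp would begin. The counter class is a made-up example, not one from the course projects.

```cpp
// ---- counter.h: the class definition ----
#ifndef COUNTER_H_
#define COUNTER_H_
class counter {
  public:
    counter() : count(0) { }   // defined inside the class, so it stays in the header
    void increment();          // only declared here; defined in counter.cpp
    int value() const;         // only declared here; defined in counter.cpp
  private:
    int count;
};
#endif

// ---- counter.cpp: would start with #include "counter.h" ----
// Method definitions pulled out of the class use the counter:: prefix.
void counter::increment() {
    ++count;
}

int counter::value() const {
    return count;
}
```

Any file that wants a counter includes counter.h, and counter.cpp is compiled once and linked in with the rest.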
Note that however you choose to group things, you should try to put related things together. That way, you can just #include the header, compile and link with the .cpp, and get all the related functionality.
An example project structure
Let’s think about the 369 game we worked on before. We’re going to break this up into three “components”:
The board component handles the game board
The player component handles individual player objects
The game component handles the whole game
We made each of these things a class, so we’re going to put the class definitions into header files, and then any methods that are not defined as part of their classes will go into a related .cpp file.
The board component
board.h:
#pragma once
/*
 * board.h
 */
#ifndef BOARD_H_
#define BOARD_H_
#include <vector>
class board {
  public:
    board() {
        // 9x9 grid of squares, all initially unset (false)
        squares = std::vector<std::vector<bool>>(9, std::vector<bool>(9, false));
    }
    void set(int x, int y) {
        if(x >= 0 && x < 9 && y >= 0 && y < 9)
            squares.at(x).at(y) = true;
    }
    bool get(int x, int y) {
        if(x >= 0 && x < 9 && y >= 0 && y < 9)
            return squares.at(x).at(y);
        else
            return false;
    }
    int compute_points(int x, int y);
    bool is_full();
  private:
    std::vector<std::vector<bool>> squares;
};
#endif
board.cpp:
#include "board.h"
int board::compute_points(int x, int y) {
    ...
}
bool board::is_full() {
    for(int y = 0; y < 9; y++)
        for(int x = 0; x < 9; x++)
            if(!squares.at(x).at(y))
                return false;
    return true;
}
The player component
player.h:
#pragma once
/*
 * player.h
 */
#ifndef PLAYER_H_
#define PLAYER_H_
#include <string>
class player {
  public:
    player(std::string n) {
        name = n;
        score = 0;   // every player starts with no points
    }
    void set_name(std::string n) {
        name = n;
    }
    void add_points(int p) {
        score += p;
    }
    int get_score() {
        return score;
    }
  private:
    std::string name;
    int score;
};
#endif
There is no player.cpp, because all of the methods are defined within the class itself.
The game component
game.h:
#pragma once
/*
 * game.h
 */
#ifndef GAME_H_
#define GAME_H_
#include <iostream>
#include <string>
#include <vector>
#include "board.h"
#include "player.h"
using namespace std;
class game {
  public:
    game(int player_count) {
        // Start every player with an empty name; real names are read below
        players = vector<player>(player_count, player{""});
        // Setup player names
        int n = 1;
        for(player& p : players) {
            cout << "Player " << n << " what is your name?";
            string name;
            cin >> name;
            p.set_name(name);
            n++;
        }
        whose_turn = 0;
    }
    int current_player() {
        return whose_turn + 1;
    }
    void end_turn() {
        if(whose_turn == players.size() - 1)
            whose_turn = 0;
        else
            whose_turn++;
    }
    bool game_finished() {
        return b.is_full();
    }
    void get_move();
    int get_winner();
  private:
    board b;
    vector<player> players;
    int whose_turn;
};
#endif
game.cpp:
#include <iostream>
#include "game.h"
using namespace std;
void game::get_move() {
    int x,y;
    do {
        cout << "Player " << current_player() <<
                ": what is your move (x,y)?";
        cin >> x >> y;
    } while(x < 0 || x >= 9 || y < 0 || y >= 9);
    b.set(x,y);
    // Points come from the board, which knows which squares are set
    int points = b.compute_points(x,y);
    players.at(whose_turn).add_points(points);
}
int game::get_winner() {
    int winner = 0, high_score = players.at(0).get_score();
    for(int i = 1; i < players.size(); i++)
        if(players.at(i).get_score() > high_score) {
            winner = i;
            high_score = players.at(i).get_score();
        }
    return winner + 1;
}
Main
Finally, we tie all these together with a single main.cpp which contains main().
main.cpp
#include <iostream>
#include "game.h"
using namespace std;
int main() {
    game g{2}; // Start two player game
    while(!g.game_finished()) {
        g.get_move();
        g.end_turn();
    }
    cout << "Player " << g.get_winner() << " you're the winner!" << endl;
    return 0;
}
To compile all this, we do either
g++ -c board.cpp
g++ -c game.cpp
g++ -c main.cpp
g++ -o 369 main.o game.o board.o
Or we can do all this at once with just
g++ -o 369 main.cpp game.cpp board.cpp
Either way, this will produce the executable 369 which contains all the functions, classes, and methods of any of the above source files, and it will tie all the usages together so that everything works. (There is no player.cpp to compile; player.h is pulled in by game.h.)
Thinking about components
When you structure a large project, you should always try to build your code into reasonably independent “components” (these don’t necessarily have to be classes!). E.g., looking at the previous example, we see that the Board component doesn’t need to know anything about the Player component, and vice versa, but the Game component needs to know about both. Having the Game depend on them is fine, if it really needs to. What you want to avoid is “accidental” dependencies: e.g., if I used some code from Board in Player when I really didn’t need to. The problem with this is now you can’t use Player without also using Board, even if you don’t actually need board.
For really big projects, you might want to put the different components into different subdirectories. E.g., if Player and Board and Game were more complex, we might have this structure:
369/
|
+--- board/
| +---- board.h
| +---- 369_board.cpp
| +---- chess_board.cpp
| +---- checker_board.cpp
|
+--- game/
| +---- game.h
| +---- 369.cpp
| +---- chess.cpp
| +---- checkers.cpp
|
+--- player/
| +---- player.h
| +---- 369.cpp
| +---- chess.cpp
| +---- checkers.cpp
|
+--- 369.cpp
Inside 369.cpp we would have
#include "board/board.h"
#include "player/player.h"
#include "game/game.h"
Inside game.h and game.cpp we have to bring in player.h and board.h, so we’ll need to look for those:
#include "../player/player.h"
#include "../board/board.h"
And then when we compile we’ll have to compile all the .cpp files and link them together.
Namespaces
We can use source files and headers to structure the locations of our declarations and definitions, spreading them out over more than one file to keep things organized. But what about the things themselves, the functions, classes, etc.? Right now they all live in the same “space” so the name f() always refers to a single function f.
An interesting problem arose on Piazza where a student had written
#include <iostream>
using namespace std;
class messages {
...
};
and got an obscure error message. The problem was that inside iostream, there is something already defined named messages which was conflicting with the class definition. Is it really the case that, if someone else has used a name, we can never use it for any of our classes or functions? Actually, no, we brought this situation on ourselves, by doing this:
using namespace std;
The messages declared inside of <iostream> is actually std::messages. If we hadn’t asked C++ to bring in everything in std, we wouldn’t have this problem!
std is what’s called a namespace; while files let us organize our definitions “spatially”, namespaces let us organize names. Namespaces have nothing to do with file organization (although very often everything in a set of files will live in the same namespace). Everything in <iostream>, <vector>, <string> etc. is in the std namespace, even though those are different files on the system.
Using namespaces
Anyway, how do we fix our problem? The solution is not to bring in all of std but just the parts we need:
#include <iostream>
using std::cout;
using std::cin;
using std::endl;
This brings in just cout, cin, and endl so we can use them without putting std:: in front. If we need anything else (e.g., getline) we will have to write it as std::getline. This is a good method to use when there are a few names from std that you’re going to be using a lot, and writing std:: in front of them would get cumbersome. Any other names, which maybe you only use once or twice, can be written as std::whatever.
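Applied to the student’s situation, a sketch might look like the following. The body of the messages class is invented here (the original was elided), and <locale> is included on purpose because that is the standard header where std::messages is declared, to show that the two names can coexist once we stop importing all of std.

```cpp
#include <iostream>
#include <locale>   // declares std::messages; included to show there is no clash
#include <string>

// Bring in only the names we use a lot.
using std::cout;
using std::endl;

// Our own class named messages. It lives in the global namespace, and
// std::messages stays tucked away inside std.
class messages {
  public:
    void add(const std::string& text) {
        last = text;
        ++count;
    }
    int size() const { return count; }
    void print_last() const { cout << last << endl; }
  private:
    std::string last;   // used rarely, so written with std:: in front
    int count = 0;
};
```

Names used once or twice, like std::string here, are simply written out in full.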
We can do the same thing for functions. Maybe your main uses cin and friends, but the rest of your functions do not. Instead of putting using... at the top of your file, put it inside your main:
int main() {
using namespace std;
...
}
(Or, if you want to get more precise, using std::cout;, etc.)
A good rule of thumb is to only use using for some specific thing: cin, cout, string, etc. and to try to use it in the most limited scope possible. E.g., instead of using it over the entire file, do it inside just the method or function that actually needs cin etc. If you need something just once, don’t write using for it, just give the fully qualified name std::whatever. Try to avoid using namespace std;. It’s OK to use it in practices and simple problems, but for anything larger, you should be thinking about what parts of the std namespace you really need, and where you really need them.
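As a small sketch of this rule of thumb, the using declaration below is confined to the one function that benefits from it, and a name needed only once is written fully qualified. The functions describe_score() and name_length() are hypothetical helpers made up for the example.

```cpp
#include <sstream>
#include <string>

// Only this function builds text with a string stream, so the using
// declaration lives inside it and is invisible everywhere else.
std::string describe_score(const std::string& name, int score) {
    using std::ostringstream;
    ostringstream out;
    out << name << ": " << score;
    return out.str();
}

// Here a std name is needed just once, so it is written out in full
// instead of adding another using declaration.
std::string::size_type name_length(const std::string& name) {
    return name.size();
}
```

Outside describe_score(), the unqualified name ostringstream is not available, which is exactly the point.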
For various technical reasons, these kinds of using declarations cannot appear directly within a class, only within its function members. E.g., you can’t do
class thing {
public:
using namespace std; // Error!
using std::cin; // Also error!
};
but you can do
class thing {
public:
void print() {
using namespace std; // OK
using std::cin; // Also OK
...
}
};
Creating Namespaces
For a large project, you might want to think about creating your own namespace(s), to be polite and keep your definitions from stepping on other people’s code. (If we defined a class messages in the global namespace, then other people using our code wouldn’t be able to use that name there.)
Right now, if you don’t create a namespace, everything you declare or define lives in the global namespace. Although you never have to write it this way, the “fully qualified” name of something in the global namespace is ::something. Note that main must be defined in the global namespace; if you put main in a different namespace, the compiler won’t be able to find it.
To create a namespace, use
namespace whatever {
// Definitions and declarations go here
}
Anything you define/declare inside the curly braces is now inside the whatever namespace. To refer to it from outside that namespace, you’ll either have to do using namespace whatever; or refer to it as whatever::.... (Within the whatever namespace, you don’t have to qualify names in whatever.)
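A short sketch of these rules is below. The functions scale(), scale_twice() and demo() are invented for illustration; the point is that whatever::scale and the global scale have the same name and signature without conflicting.

```cpp
namespace whatever {
    int scale(int value) {
        return 2 * value;
    }
    // Inside the namespace, scale needs no whatever:: in front.
    int scale_twice(int value) {
        return scale(scale(value));
    }
}

// A different function with the same name, in the global namespace.
int scale(int value) {
    return 10 * value;
}

int demo() {
    using whatever::scale_twice;      // bring in one name, in this function only
    return scale_twice(1)             // whatever::scale applied twice: 4
         + whatever::scale(1)         // qualified from outside the namespace: 2
         + ::scale(1);                // the global one, fully qualified: 10
}
```

Calling demo() adds up the three results, 4 + 2 + 10.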
You can actually spread the contents of a single namespace over different parts of your file:
namespace whatever {
// Some definitions in whatever
}
// Some things in the global namespace...
namespace whatever {
// More definitions in whatever
}
You can even split a single namespace over more than one file; C++ will grab everything that is inside a particular namespace and group it together.
You can, if you want, put one namespace inside another:
namespace one {
namespace two {
void f() { }
}
}
To refer to the function f we have to write one::two::f.
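Putting reopening and nesting together, a sketch could look like this. The namespace names shapes and solid and all of the functions are made up for the example.

```cpp
namespace shapes {
    int sides_of_square() { return 4; }
}

// Something unrelated, in the global namespace, in between.
int unrelated() { return 0; }

// Reopening shapes: this adds to the same namespace as above.
namespace shapes {
    int sides_of_triangle() { return 3; }

    // A namespace nested inside shapes.
    namespace solid {
        int faces_of_cube() { return 6; }
    }
}

// From outside, each level of nesting adds one more name and ::.
int total() {
    return shapes::sides_of_square()
         + shapes::sides_of_triangle()
         + shapes::solid::faces_of_cube();
}
```

The two shapes blocks could just as well sit in two different files; the names would still end up in the one namespace.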
Generally, you don’t need to worry about creating your own namespaces
until you start to work on really big projects, or libraries. But you
should try to break the habit of writing using namespace std;
at the
top of your files; instead, just bring in the parts that you are going
to use a lot, and only bring them in where you really need them.