Structuring larger projects

Putting all your definitions and code into a single file is rather cumbersome, so today we’re going to look at ways to structure larger projects by using multiple files.

There are a few different elements involved in splitting your projects up into multiple files:

You have to make C++ aware of all the different files that are needed to actually complete your project.
You have to make sure that you never define the same thing twice, in two different places (e.g., have the same definition for a function in two different files). “Multiple definition” is an error in C++.
You have to make sure that C++ knows about the parts of your program that are not in the current file.

Let’s start with a simple example. Suppose I want to put my main in one file, and a function that it calls in another. So we have

main.cpp:

#include <iostream>
using namespace std;

int main() {
    cout << "Hello!" << endl;

    greet();

    return 0;
}

greet.cpp:

#include <iostream>
using namespace std;

void greet() {
    cout << "Nice to meet you!" << endl;
}

I can compile these two files by doing

g++ -c main.cpp
g++ -c greet.cpp

and combine them together into a single program using

g++ -o greet main.o greet.o

(Or, doing both things in a single step: g++ -o greet main.cpp greet.cpp.)

However, when I try to compile main.cpp I’ll get an error: greet is not declared. And it isn’t! Notice how, when I compile main.cpp, I make no mention of greet.cpp. At that point in the process, the compiler knows nothing about greet.cpp or what it contains.

Think back; what did we say about functions and the order in which they must be defined…

Either you have to define a function before you use it (which isn’t possible here: greet() is already defined in greet.cpp, and we can’t define it a second time in main.cpp).
Or you have to put the declaration of the function before you use it.

The second option will work: we can add the declaration for greet() to the top of main.cpp:

#include <iostream>
using namespace std;

void greet();

int main() {
    cout << "Hello!" << endl;

    greet();

    return 0;
}

While a function can only be defined once in a multiple-file project, you can declare it as many times as you need. If we had other files that used greet(), each one would need to contain a declaration for it, to make them aware of the fact that it exists.

Later, when we compile all the different files in the project together, the linker will connect all the uses of greet() with its one-and-only definition, in greet.cpp. If we recompile now, everything will tie up nicely and when we run our program, we’ll see:

Hello!
Nice to meet you!

This illustrates a general principle of multi-file projects:

You can spread your definitions over several files, so long as each thing is defined exactly once and a declaration of it is visible everywhere it is used.
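To see the principle in one place, here is a minimal single-file sketch that mimics the two-file greet setup: the first declaration plays the role of the line at the top of main.cpp, and the definition at the bottom stands in for greet.cpp. The names greet_message() and full_greeting() are hypothetical, and they return strings instead of printing so the result is easy to check.

```cpp
#include <string>

// Declaration: all the compiler needs in order to compile a call.
// In a real project this line would sit at the top of main.cpp.
std::string greet_message();

// A caller that appears before the definition. It compiles because
// the declaration above is visible at this point.
std::string full_greeting() {
    return "Hello! " + greet_message();
}

// Declaring the same function again is harmless...
std::string greet_message();

// ...but it must be defined exactly once. In a real project this
// definition would live in greet.cpp, and the linker would connect
// the call inside full_greeting() to it.
std::string greet_message() {
    return "Nice to meet you!";
}
```

Adding a second definition of greet_message() anywhere, in this file or another one linked with it, would produce a multiple-definition error.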

Separate compilation

I’ve mentioned that if you want to compile file1.cpp, file2.cpp and file3.cpp into a single executable program, you can do this the explicit way:

g++ -c file1.cpp
g++ -c file2.cpp
g++ -c file3.cpp
g++ -o program file1.o file2.o file3.o

or you can do it all in one step with just

g++ -o program file1.cpp file2.cpp file3.cpp

You might be wondering why we’d ever do it the first way. The answer is, think about what happens if you modify only one of the files. Say you edit file2.cpp and you want to recompile. With the first way, you do

g++ -c file2.cpp
g++ -o program file1.o file2.o file3.o

There’s no need to recompile file1 and file3, because they haven’t changed (as long as no header file they include has changed either), so we save a bit of time by telling the compiler to only recompile file2.cpp. The second, shortcut way doesn’t give us that option: it always recompiles everything, even if it hasn’t changed.

You’re unlikely to notice the difference for a small project, but for big projects, recompiling everything may take a long time; only recompiling the things that you know you’ve changed can cut down on compilation time significantly.

Linker order

Note that the order in which I listed the files when I compiled them together

g++ -o greet main.o greet.o

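i.e., main.o first and greet.o second. For plain object files like these (or .cpp files), the order actually doesn’t matter: g++ links every file listed on the command line in full, so you can list them however you like. Order does matter once you start bundling object files into static libraries (.a files). The linker only pulls in the pieces of a library that are needed by the files it has already seen, so if file1.cpp uses functions from a library, the library must come after file1.o (or file1.cpp) in the list. If the library came first, the linker would skip those functions (nothing had asked for them yet), then reach file1.o and complain about undefined references. It’s annoying, but them’s the breaks. A safe habit is to always list files so that each one comes before the things it uses.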

For example, suppose we have three files

//-------------------------------------------------------------
// main.cpp

void f(); 
void g();
void h();

int main() {
  f();
  g();

  return 0;
}

// -------------------------------------------------------------
// one.cpp

void h();

void g() {
  h();
}

void f() {
  g();
}

// -------------------------------------------------------------
// two.cpp

void h() {
  return;
}

Following that rule, in what order would you list main.cpp, one.cpp and two.cpp when you compile them? (Work out which file uses functions from which.)

Include files

Manually copying the declarations to the top of every file that needs them is cumbersome and error-prone (imagine if we later wanted to change the signature of a function: we’d have to find and update every copy). Instead, we normally put all the declarations for a single .cpp file (one that doesn’t have a main()) into a header file.

E.g., for the above, in addition to greet.cpp we would also create greet.h:

greet.h

void greet();

For now, put this file in the same directory as greet.cpp and main.cpp; that’s the first place #include "..." looks for it.

The entire contents of greet.h is just the declaration of the greet() function. If we had more functions in greet.cpp, we’d add their declarations to greet.h. Now, instead of actually putting the declaration itself at the top of main.cpp, we’re going to do this:

#include <iostream>
#include "greet.h"
using namespace std;

int main() {
    cout << "Hello!" << endl;

    greet();

    return 0;
}

The line #include "greet.h" will pull in the contents of the greet.h file, just as if we had written them at that point in the file. So we get the declaration of greet() like before, but we don’t have to worry about putting it into main.cpp.

A file ending in .h is called a header file; you don’t need to tell G++ about it when you compile because we’re #include-ing it. Sometimes header files are called include files, for obvious reasons.

Considering this, there are a few ways to structure larger projects:

If we know that we’ll never want to reuse any parts of the project, then we can write a single .h file which includes declarations for all the different functions in the project, regardless of what .cpp file they are defined in. Every .cpp will #include this file, so every .cpp file will have access to all the functions from every other file.
If, however, we might want to reuse some part of our system, we might choose to create more than one header file. In this case, we would break up the project into “components”: collections of .cpp files that are all related, and give each component a header file. Other components could include the headers from the component(s) that they need, and we can also use a component in a totally different project by just including its header, and then listing its source files when we compile.

For example, we might have a large project consisting of files main.cpp, utilities.cpp, data.cpp, and user.cpp. We’d put all the function declarations into a single header file named anything we like, say, project.h and then all of the above files would #include this header.

On the other hand, we might decide that utilities.cpp is something that we want to use in other projects. So then we’d pull its declarations out of project.h and put them into a separate utilities.h. We can then either

#include "utilities.h" in any source file that needs it, or
#include "utilities.h" within project.h. That way, every other file can remain unchanged, but we still get the ability to use utilities separately.

Note that when we do #include <iostream> the exact same thing happens: iostream is a file full of declarations for things like cin, cout, etc. The angle brackets tell C++ to look for it, not in the current directory (which is where #include "..." looks first), but in whatever the system paths for built-in C++ header files are. But the idea is the same: we are pulling in declarations for things. (We don’t have to tell G++ about these things when we compile, because it automatically links against the standard C++ library.)

Note that while you can do something like

#include "file.cpp"

you almost never should do this. If you’re including anything other than a .h file, you’re probably doing something wrong. Remember these two simple rules:

Don’t ever compile .h files (i.e., don’t use g++ -c file.h). Instead, #include them into the .cpp files that need them.
Don’t ever #include .cpp files. Instead, compile them with G++ and link them together.

If you #include a .cpp file and then also compile it normally, you will get multiple-definition errors (because the things in the file are being defined more than once: once in the file itself, and once everywhere you included it). With G++ it’s not too hard to just not compile the .cpp, but IDEs such as Visual Studio will automatically compile every file in the project that ends with .cpp.

Include guards

Because the contents of a header file are literally just copied into the source file, replacing the #include "...", there are some problems that can crop up. None of these occur if the header just contains function declarations, but they can occur for headers that contain classes, so it’s good practice to use include guards in all header files.

An include guard is a special bit of code at the beginning and the ending of a .h file that makes sure that the contents of the file are only loaded once. A header file test.h with include guards looks like this:

#pragma once
/*
 * test.h
 */
#ifndef TEST_H_
#define TEST_H_

// Contents of the file go here

#endif

Everything in the include file should go between the #define... and the #endif. Note also that the name TEST_H_ is based on the name of the file: for a file named whatever.h you would replace this with WHATEVER_H_.

There are three components to the include guard:

#pragma once at the top of the file tells the compiler that the file should only ever be included once. Most modern compilers support this, but…
Some older compilers don’t, so we add the next bit, around all the declarations:
```
 #ifndef TEST_H_
 #define TEST_H_
```
This bit checks whether a preprocessor macro named TEST_H_ has not been defined yet; if it hasn’t, it defines it and carries on with the rest of the file. If it has, everything from this point is skipped until…
#endif – If TEST_H_ was already defined, then everything between the above and this is completely skipped.

You don’t really have to worry about how it all works, but you should understand why we do it.

This bit of magical code ensures that if we accidentally do

#include "test.h"
...
#include "test.h"

we won’t get multiple definitions; the second #include will effectively be ignored, thanks to the include guard. (It’s unlikely that we would make this obvious mistake. But what we might do is #include "test.h" and then later, include some other thing.h which in turn also includes test.h; the result is the same: test.h is included multiple times.)
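The effect is easy to reproduce in a single file by pasting the same guarded contents twice, which is exactly what two #include "test.h" lines would produce. This is a sketch: struct point and sum_coordinates() are made-up stand-ins for whatever a real test.h would contain.

```cpp
// First copy of test.h's contents, as if pasted by #include "test.h".
#ifndef TEST_H_
#define TEST_H_

struct point {
    int x;
    int y;
};

#endif

// Second copy, as if pasted by a second #include "test.h" (or by some
// other header that includes test.h). TEST_H_ is already defined, so the
// preprocessor skips everything up to #endif. Without the guard, this
// would redefine struct point and the file would not compile.
#ifndef TEST_H_
#define TEST_H_

struct point {
    int x;
    int y;
};

#endif

// Code after both "includes" sees exactly one definition of point.
int sum_coordinates(point p) {
    return p.x + p.y;
}
```

Deleting the #ifndef/#define/#endif lines around the second copy is a quick way to see the redefinition error the guard prevents.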

An Example

Let’s take the example project (Pascal’s triangle) that’s posted to the course website, and split it into multiple files. The example is pretty simple (it only has main() and two functions), so we’ll split it into three files: pascals_triangle.cpp containing main(), triangle.cpp containing the triangle-printing code, and triangle.h containing the function declarations.

`pascals_triangle.cpp`

/*
   pascals_triangle.cpp
   Prints Pascal's triangle
*/
#include <iostream>
#include "triangle.h"
using namespace std;

int main() {
    int rows;

    cout << "Pascal's triangle!" << endl;
    do {
        cout << "Enter how many rows to print (at least 0): ";
        cin >> rows;
    } while(rows < 0);

    print_pascals_triangle(rows);

    return 0;
}

`triangle.h`

#pragma once
/*
 * triangle.h
 * Code for printing Pascal's triangle
 */
#ifndef TRIANGLE_H_
#define TRIANGLE_H_

#include <vector>

void print_pascals_triangle(int rows);
void print_row(std::vector<int> row_values);

#endif

`triangle.cpp`

#include <iostream>
#include <vector>
#include "triangle.h"
using namespace std;

/* print_pascals_triangle(rows)
   Prints the first rows of Pascal's triangle, left-aligned.

   rows: number of rows to print
*/
void print_pascals_triangle(int rows) {
    vector<int> row_values; 

    for(int row = 0; row < rows; ++row) {
        // Update for the next row
        vector<int> new_row = row_values;
        new_row.push_back(1); // Last column is always 1

        // Compute the new row from the current row
        for(int c = 1; c < new_row.size()-1; ++c) 
            new_row.at(c) = row_values.at(c-1) + row_values.at(c);

        // Copy updated row back into the original
        row_values = new_row;

        // Print the updated row
        print_row(row_values);
    }
}

/* print_row(row_values)
   Prints a vector<int> as a row of the triangle. 

   row_values: vector<int> of the values to print
*/
void print_row(vector<int> row_values) {
    for(int value : row_values) {
        cout << value << " ";
    }
    cout << endl;
}

Note that pascals_triangle.cpp does not include <vector> directly, because it never uses vectors itself. You should try to only include things you actually need.

Classes

So functions go in .cpp files while their declarations, if we want to “share” those functions among several files, need to go in a header file (.h) which gets included in the .cpp files. What about class definitions?

Here’s where things get interesting:

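Class definitions go in header files. Remember what I said about C++ not allowing multiple definitions? Classes get an exception: the same class may be defined in several different source files (usually because they all #include the same header), and C++ treats all of those as one class, as long as every copy is identical. Be aware that the compiler doesn’t actually check this for you; if two files end up with different definitions of the same class, the program is broken but you may get no error at all, which is one more reason to keep each class definition in exactly one header. The exception is per source file: defining the same class twice within one .cpp (for example, by including its header twice without include guards) is still an error.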

However, remember how we can pull the definitions of the methods out of the class itself? If you do that, then those method definitions need to go into a .cpp file. So if you want to pull the method definitions out of a class, you’re going to end up with a header file (containing the skeleton of the class definition) and a source file (containing the actual method definitions).

A common pattern for classes is a pair of files:

A name.h file containing the class definition itself (or possibly multiple class definitions, if they are all related), along with declarations for any related (e.g., friend) functions.
A name.cpp file containing definitions of any methods, along with function definitions. (For special cases, you might have more than one .cpp with a single .h.)

The “base name” of both files is the same: whatever.h and whatever.cpp. The name should be based on the main class defined (if there is one), or on whatever the purpose of the code is.

Note that however you choose to group things, you should try to put related things together. That way, you can just #include the header, compile and link with the .cpp, and get all the related functionality.
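Here is a minimal sketch of the name.h / name.cpp pattern, collapsed into one file so it compiles on its own; the comments mark which part would go in the header and which in the source file. The class counter is hypothetical, not part of the course code.

```cpp
// ---- counter.h: the class definition (the "skeleton") ----
#ifndef COUNTER_H_
#define COUNTER_H_

class counter {
  public:
    // Short methods can stay inside the class definition.
    int get() const { return value; }

    // Longer methods are only declared here...
    void add(int amount);
    void reset();

  private:
    int value = 0;
};

#endif

// ---- counter.cpp: the out-of-class method definitions ----
// (In a real project this file would start with #include "counter.h".)

// ...and defined here, qualified with the class name counter::.
void counter::add(int amount) {
    value += amount;
}

void counter::reset() {
    value = 0;
}
```

Any file that wants to use counter only needs the header part; the counter.cpp part is compiled once and linked in.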

An example project structure

Let’s think about the 369 game we worked on before. We’re going to break this up into three “components”:

The board component handles the game board
The player component handles individual player objects
The game component handles the whole game

We made each of these things a class, so we’re going to put the class definitions into header files; any methods that are not defined inside their classes will go into a related .cpp file.

The board component

board.h:

#pragma once
/*
 * board.h
 */
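#ifndef BOARD_H_
#define BOARD_H_

#include <vector>

class board {
  public:
    // Start with a 9x9 grid of unmarked squares. Use parentheses here:
    // with braces, vector<bool>{9, false} would be read as the two-element
    // list {9, false} (and rejected), not as 9 copies of false.
    board() {
        squares = std::vector<std::vector<bool>>(9, std::vector<bool>(9, false));
    }

    void set(int x, int y) {
        if(x >= 0 && x < 9 && y >= 0 && y < 9)
            squares.at(x).at(y) = true;
    }

    bool get(int x, int y) {
        if(x >= 0 && x < 9 && y >= 0 && y < 9)
            return squares.at(x).at(y);
        else
            return false;
    }

    int compute_points(int x, int y);
    bool is_full();

  private:
    std::vector<std::vector<bool>> squares;
};
#endif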

board.cpp:

#include "board.h"

int board::compute_points(int x, int y) {
    ...
}

bool board::is_full() {
    for(int y = 0; y < 9; y++)
        for(int x = 0; x < 9; x++)
            if(!squares.at(x).at(y))
                return false;

    return true;
}

The player component

player.h:

#pragma once
/*
 * player.h
 */
#ifndef PLAYER_H_
#define PLAYER_H_

#include <string>

class player {
  public:
    player(std::string n) {
        name = n;
        score = 0;  // every player starts with no points
    }

    void set_name(std::string n) {
        name = n;
    }

    void add_points(int p) {
        score += p;
    }

    int get_score() {
        return score;
    }

  private:
    std::string name;
    int score;
};

#endif

There is no player.cpp, because all of the methods are defined within the class itself.

The game component

game.h:

#pragma once
/*
 * game.h
 */
#ifndef GAME_H_
#define GAME_H_

#include <iostream>
#include <string>
#include <vector>
#include "board.h"
#include "player.h"

class game {
  public:
    game(int player_count) {
        // Ask each player for their name and add them to the game
        for(int n = 1; n <= player_count; n++) {
            std::cout << "Player " << n << ", what is your name? ";
            std::string name;
            std::cin >> name;
            players.push_back(player{name});
        }

        whose_turn = 0;
    }

    int current_player() {
        return whose_turn + 1;
    }

    void end_turn() {
        if(whose_turn == players.size() - 1)
            whose_turn = 0;
        else
            whose_turn++;
    }

    bool game_finished() {
        return b.is_full();
    }

    void get_move();
    int get_winner();

  private:
    board b;
    std::vector<player> players;
    int whose_turn;
};

#endif

game.cpp:

#include <iostream>
#include "game.h"
using namespace std;

void game::get_move() {
    int x,y;
    do {
        cout << "Player " << current_player() <<
                ": what is your move (x y)? ";

        cin >> x >> y;
    } while(x < 0 || x >= 9 || y < 0 || y >= 9);

    b.set(x,y);

    int points = b.compute_points(x,y);
    players.at(whose_turn).add_points(points);
}

int game::get_winner() {
    int winner = 0, high_score = players.at(0).get_score();
    for(int i = 1; i < players.size(); i++)
        if(players.at(i).get_score() > high_score) {
            winner = i;
            high_score = players.at(i).get_score();
        }

    return winner + 1;
}

Main

Finally, we tie all these together with a single main.cpp which contains main().

main.cpp

#include <iostream>
#include "game.h"
using namespace std;

int main() {
    game g{2}; // Start two player game

    while(!g.game_finished()) {
        g.get_move();
        g.end_turn();
    }

    cout << "Player " << g.get_winner() << " you're the winner!" << endl;

    return 0;
}

To compile all this (remember, there is no player.cpp), we do either

g++ -c board.cpp
g++ -c game.cpp
g++ -c main.cpp
g++ -o 369 main.o game.o board.o

Or we can do all this at once with just

g++ -o 369 main.cpp game.cpp board.cpp

Either way, this will produce the executable 369 which contains all the functions, classes, and methods of any of the above source files, and it will tie all the usages together so that everything works.

Thinking about components

When you structure a large project, you should always try to build your code into reasonably independent “components” (these don’t necessarily have to be classes!). E.g., looking at the previous example, we see that the Board component doesn’t need to know anything about the Player component, and vice versa, but the Game component needs to know about both. Having the Game depend on them is fine, if it really needs to. What you want to avoid is “accidental” dependencies: e.g., if I used some code from Board in Player when I really didn’t need to. The problem with this is that now you can’t use Player without also using Board, even if you don’t actually need Board.

For really big projects, you might want to put the different components into different subdirectories. E.g., if Player and Board and Game were more complex, we might have this structure:

369/
 |
 +--- board/
 |      +---- board.h
 |      +---- 369_board.cpp
 |      +---- chess_board.cpp
 |      +---- checker_board.cpp
 |
 +--- game/
 |      +---- game.h
 |      +---- 369.cpp
 |      +---- chess.cpp
 |      +---- checkers.cpp
 |
 +--- player/
 |      +---- player.h
 |      +---- 369.cpp
 |      +---- chess.cpp
 |      +---- checkers.cpp
 |
 +--- 369.cpp

Inside 369.cpp we would have

#include "board/board.h"
#include "player/player.h"
#include "game/game.h"

Inside game/game.h (and any .cpp file in game/ that uses players or boards directly) we have to bring in player.h and board.h, which now live in sibling directories, so we’ll need to look for those:

#include "../player/player.h"
#include "../board/board.h"

And then when we compile we’ll have to compile all the .cpp files and link them together.

Namespaces

We can use source files and headers to structure the locations of our declarations and definitions, spreading them out over more than one file to keep things organized. But what about the things themselves, the functions, classes, etc.? Right now they all live in the same “space”, so the name f() always refers to a single function f.

An interesting problem arose on Piazza where a student had written

#include <iostream>
using namespace std;

class messages {
  ...
};

and got an obscure error message. The problem was that the standard library already has something named messages (std::messages, which <iostream> pulls in behind the scenes), and it was conflicting with the student’s class. Is it really the case that, if someone else has used a name, we can never use it for any of our classes or functions? Actually, no, we brought this situation on ourselves, by doing this:

using namespace std;

The messages declared inside of <iostream> is actually std::messages. If we hadn’t asked C++ to bring in everything in std, we wouldn’t have this problem!

std is what’s called a namespace; while files let us organize our definitions “spatially”, namespaces let us organize names. Namespaces have nothing to do with file organization (although very often everything in a set of files will live in the same namespace). Everything in <iostream>, <vector>, <string> etc. is in the std namespace, even though those are different files on the system.

Using namespaces

Anyway, how do we fix our problem? The solution is not to bring in all of std but just the parts we need:

#include <iostream>
using std::cout;
using std::cin;
using std::endl;

This brings in just cout, cin, and endl so we can use them without putting std:: in front. If we need anything else (e.g., getline) we will have to write it as std::getline. This is a good method to use when there are a few names from std that you’re going to be using a lot, and writing std:: in front of them would get cumbersome. Any other names, which maybe you only use once or twice, can be written as std::whatever.
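As a sketch of the fix, the class below reuses the name messages from the Piazza example. Because only cout and endl are brought in, our ::messages and the library’s std::messages never meet. The members of this messages class (add, count, print_all) are invented for illustration.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Bring in only the names we use a lot.
using std::cout;
using std::endl;

// No conflict: this is ::messages. The library's std::messages stays
// behind its std:: prefix because we never wrote using namespace std.
class messages {
  public:
    void add(const std::string& text) { items.push_back(text); }

    int count() const { return static_cast<int>(items.size()); }

    void print_all() const {
        for (const std::string& text : items)
            cout << text << endl;
    }

  private:
    // string and vector are only mentioned a couple of times,
    // so they are written with their full std:: names.
    std::vector<std::string> items;
};
```

Putting using namespace std; back at the top of this file would bring back the original problem.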

We can also limit using to a single function. Maybe your main uses cin and friends, but the rest of your functions do not. Instead of putting using... at the top of your file, put it inside your main:

int main() {
  using namespace std;
  ...
}

(Or, if you want to get more precise, using std::cout;, etc.)

A good rule of thumb is to only use using for some specific thing (cin, cout, string, etc.) and to use it in the most limited scope possible. E.g., instead of using it over the entire file, do it inside just the method or function that actually needs cin etc. If you need something just once, don’t write using for it; just write the fully qualified name, std::whatever. Try to avoid using namespace std;. It’s OK to use it in practices and simple problems, but for anything larger, you should be thinking about which parts of the std namespace you really need, and where you really need them.
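A minimal sketch of that rule of thumb, with hypothetical function names: print_report() uses endl several times, so it brings in just that one name for its own body, while report_as_string() needs std::ostringstream only once and simply writes the full name.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes a short report to any output stream (std::cout, a file, or a
// string stream). The using-declaration only applies inside this body.
void print_report(std::ostream& out, const std::string& name, int score) {
    using std::endl;
    out << "Name:  " << name << endl;
    out << "Score: " << score << endl;
}

// Builds the same report as a string. std::ostringstream is needed just
// once here, so it is written fully qualified instead of with a using.
std::string report_as_string(const std::string& name, int score) {
    std::ostringstream buffer;
    print_report(buffer, name, score);
    return buffer.str();
}
```

Calling print_report(std::cout, "Ann", 7) prints the two lines to the console; outside print_report, a plain endl would not compile.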

For various technical reasons, these kinds of using declarations cannot appear directly within a class, only within its member functions. E.g., you can’t do

class thing {
  public:
    using namespace std; // Error!
    using std::cin;      // Also error!
};

but you can do

class thing {
  public:
    void print() {
      using namespace std; // OK
      using std::cin;      // Also OK
      ...
    }
};

Creating Namespaces

For a large project, you might want to think about creating your own namespace(s), to be polite and keep your definitions from stepping on other people’s code. (If we defined a class messages then other people wouldn’t be able to use that name.)

Right now, if you don’t create a namespace, everything you declare or define lives in the global namespace. Although you never have to, the “fully qualified” name of something in the global namespace is ::something. Note that main must be defined in the global namespace; if you put main in a different namespace, the compiler won’t be able to find it.

To create a namespace, use

namespace whatever {
  // Definitions and declarations go here
}

Anything you define/declare inside the curly braces is now inside the whatever namespace. To refer to it from outside that namespace, you’ll either have to do using namespace whatever; or refer to it as whatever::.... (Within the whatever namespace, you don’t have to qualify names in whatever).
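Here is a minimal sketch using a made-up namespace called geometry (in place of whatever) and a made-up function area. Inside geometry, its own names need no prefix; outside, they need geometry::. The global function with the same name shows that the two no longer collide, and that a leading :: always means the global namespace.

```cpp
// A function in the global namespace. Its fully qualified name is ::area.
int area(int side) {
    return side * side;  // area of a square
}

namespace geometry {
    // Same name, different namespace: no conflict with ::area.
    int area(int width, int height) {
        return width * height;  // area of a rectangle
    }

    int double_area(int width, int height) {
        // Inside geometry, names from geometry need no prefix.
        return 2 * area(width, height);
    }

    int square_area(int side) {
        // geometry::area hides the global one in here, so the leading
        // :: is needed to reach ::area explicitly.
        return ::area(side);
    }
}
```

From outside, geometry::area(2, 5) calls the rectangle version and area(3) calls the global one.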

You can actually spread the contents of a single namespace over different parts of your file:

namespace whatever {
  // Some definitions in whatever
}

// Some things in the global namespace...

namespace whatever {
  // More definitions in whatever
}

You can even split a single namespace over more than one file; C++ will grab everything that is inside a particular namespace and group it together.
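A small sketch of reopening a namespace, with hypothetical names: the second namespace shapes block adds to the first rather than replacing it, so code in the second block can still see what the first one defined. The same thing happens when two different files each contribute to shapes.

```cpp
#include <string>

namespace shapes {
    // First part of the namespace.
    int sides_of_triangle() { return 3; }
}

// Something in the global namespace, between the two parts.
std::string label() { return "global"; }

namespace shapes {
    // Reopening shapes adds more members to it.
    int sides_of_square() { return 4; }

    // Earlier members of shapes are still visible here.
    int total_sides() {
        return sides_of_triangle() + sides_of_square();
    }
}
```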

You can, if you want, put one namespace inside another:

namespace one {
  namespace two {
    void f() { }
  }
}

To refer to the function f we have to write one::two::f.
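A sketch of how much qualification is needed from different places, based on the one/two example above. Here f returns a value (unlike the original empty void f()) so the calls can be checked, and g and h are hypothetical additions.

```cpp
namespace one {
    namespace two {
        int f() { return 2; }
    }

    // Inside one, only the remaining part of the path is needed.
    int g() { return two::f() + 1; }
}

// Outside both namespaces, the full path is needed.
int h() { return one::two::f() * 10; }
```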

Generally, you don’t need to worry about creating your own namespaces until you start to work on really big projects, or libraries. But you should try to break the habit of writing using namespace std; at the top of your files; instead, just bring in the parts that you are going to use a lot, and only bring them in where you really need them.