Let’s look at an example, to motivate what we’re going to talk about today. Suppose we want to build an application that deals with employee records. What kind of information will we need to maintain about employees?

We can store this information in several parallel vectors:

vector<int>    employee_id;
vector<string> employee_name;
vector<int>    employee_years;
vector<string> employee_position;
vector<float>  employee_salary;

Here, a particular employee is identified by their row number (different from their ID number). E.g., we might have

employee_id.at(0)       = 1234;
employee_name.at(0)     = "John Smith";
employee_years.at(0)    = 12;
employee_position.at(0) = "CEO";
employee_salary.at(0)   = 140000.00;

When we want all the information about a particular employee, we look at the same row in all these vectors. If we want to, we can write some functions to manipulate a single employee in various ways:

int find_employee_by_id(int id) {
    for(int i = 0; i < employee_id.size(); i++)
        if(employee_id.at(i) == id)
            return i;

    return -1;
}

void promote_employee(int emp, string new_position, float new_salary) {
    employee_position.at(emp) = new_position;
    employee_salary.at(emp) = new_salary;
}

void fire_employee(int emp) {
    employee_id.erase(employee_id.begin() + emp);
    employee_name.erase(employee_name.begin() + emp);
    employee_years.erase(employee_years.begin() + emp);
    employee_position.erase(employee_position.begin() + emp);
    employee_salary.erase(employee_salary.begin() + emp);
}

// etc.

Having to treat every attribute of each employee in a separate vector is cumbersome. What we really want is a way to group all the information that makes up a single employee into one “object”. Then, we can just create a single vector of employee objects and everything will be in the right place. The idea of putting things together that belong together is called the principle of encapsulation.

We’re going to build an entirely new type of our own, like int or float or vector or string. This means that anything we could do with those types, we can do with ours: store them in a vector, create them dynamically via new, pass them to functions as parameters, and return them from functions. Here’s how we start:

class employee {
  public:
    // ...
};

Breaking this down:

Thus, the overall structure of our file now looks like this:

/* opening comment */
#includes
...
using namespace std;

// class definitions...

// function declarations...

int main() {
    ...
}

// function definitions

To describe what makes up an employee, we just list the various values, as if they were variables:

class employee {
  public:

    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

These variable-like things are called data members of the class. Every object that is an employee is guaranteed to have all of these values associated with it.

We can create an employee-variable just like any other type:

employee jsmith;

but how do we access the values inside it? We use the dot . operator:

jsmith.name = "John Smith";

If thing is the name of a class instance, and whatever is the name of a data member of that class, then thing.whatever is the name of thing’s particular whatever. Remember that every instance “owns” its own values, so a different instance will have a distinct whatever.
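To make the “every instance owns its own values” point concrete, here is a minimal sketch. The second employee, mjones, the salary figures other than John Smith's, and the helper salary_gap are all made up for illustration.

```cpp
#include <cassert>
#include <string>
using namespace std;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Illustration only: returns how much more jsmith earns than mjones.
float salary_gap() {
    employee jsmith;
    employee mjones;              // hypothetical second instance

    jsmith.name = "John Smith";
    jsmith.salary = 140000.00f;

    mjones.name = "Mary Jones";   // made-up data
    mjones.salary = 95000.00f;

    // Each instance owns its own salary: changing one leaves the other alone.
    jsmith.salary = 150000.00f;
    assert(mjones.salary == 95000.00f);

    return jsmith.salary - mjones.salary;
}
```

Both jsmith.salary and mjones.salary use the same member name, but they are separate boxes in memory, one per instance.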

Now we have created the new type employee. What can we do with it?

Classes vs. Instances

employee is the class. It doesn’t actually have any data associated with it at runtime, which means that things like this:

employee.salary

make no sense. It makes no sense to ask, ‘what is the salary of employee?’, only to ask, ‘what is the salary of a particular employee?’. We call particular employees (or, more generally, particular objects belonging to some class) the instances of a class. E.g., in the following,

employee jsmith;

we would say that jsmith is an instance of employee. Instances have data associated with them (the data members that we gave in the class definition).

To summarize, when we say

class whatever { ... };

we are defining a new class (type), but to create instances of it, we still need to do

whatever thing;

to create a whatever variable named thing.
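Because employee is a type like any other, its instances can be returned from functions, passed as parameters, and stored in a vector. The sketch below shows each of these. The helpers make_employee, is_veteran, and count_veterans, the 10-year cutoff, and the second employee's data are assumptions made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Hypothetical helper: returns an employee from a function.
employee make_employee(int id, string name, int years) {
    employee e;
    e.id = id;
    e.name = name;
    e.years_of_service = years;
    e.position = "Unassigned"; // made-up default
    e.salary = 0;
    return e;
}

// Hypothetical helper: takes an employee as a parameter.
// The 10-year cutoff is an arbitrary choice for this example.
bool is_veteran(employee e) {
    return e.years_of_service >= 10;
}

// Counts the veterans in a vector of employees.
int count_veterans(vector<employee> staff) {
    int count = 0;
    for(int i = 0; i < (int)staff.size(); i++)
        if(is_veteran(staff.at(i)))
            count++;

    return count;
}

// Usage: two instances of the same class, stored in one vector.
void example() {
    vector<employee> staff;
    staff.push_back(make_employee(1234, "John Smith", 12));
    staff.push_back(make_employee(5678, "New Hire", 0)); // made-up data
    assert(count_veterans(staff) == 1);
}
```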

Functional abstraction vs. data abstraction

We mentioned before that functions are a kind of abstraction; they allow us to give a name to a sequence of statements, and thus “abstract” over the details of how some operation is done. When I use promote_employee I just give it the employee, position, and salary; I don’t have to worry about how it works. Similarly, when I am writing promote_employee, I don’t have to worry about where the employee, or position, or salary came from, I just use them.

Classes provide another kind of abstraction, abstraction over data. Now that I’ve defined employee, I don’t need to care about what goes into an employee. I can just pass them around and manipulate them. In particular, suppose we wanted to add another value to every employee, benefits. We can just add it to employee:

class employee {
  public:

    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
    float benefits;
};

and now every employee gets that data member. Furthermore, any functions we’ve written to deal with employees don’t need to worry about benefits unless they have a particular interest in that value. E.g., when we fire_employee we don’t need to do anything extra (in the parallel vectors version, we’d have another vector for benefits, and we’d have to remember to modify the fire_employee function to deal with it as well).
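Here is one way the earlier functions might look once the parallel vectors are replaced by a single vector of employee objects. This is a sketch under one assumption not in the original: the vector, called staff here, is passed by reference rather than kept as a global. Notice that fire_employee never mentions benefits.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
    float benefits; // the newly added data member
};

int find_employee_by_id(vector<employee>& staff, int id) {
    for(int i = 0; i < (int)staff.size(); i++)
        if(staff.at(i).id == id)
            return i;

    return -1;
}

void promote_employee(vector<employee>& staff, int emp,
                      string new_position, float new_salary) {
    staff.at(emp).position = new_position;
    staff.at(emp).salary = new_salary;
}

// One erase removes every attribute of the employee, benefits included.
void fire_employee(vector<employee>& staff, int emp) {
    staff.erase(staff.begin() + emp);
}

// Usage: fire the employee with ID 1234 and check who is left.
void example() {
    vector<employee> staff(2);
    staff.at(0).id = 1234;
    staff.at(1).id = 5678; // made-up ID
    fire_employee(staff, find_employee_by_id(staff, 1234));
    assert(staff.size() == 1);
    assert(find_employee_by_id(staff, 5678) == 0);
}
```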

Whereas functions let us abstract over behavior, classes let us abstract over data. If there are only two things you really learn in this class, make them be functions and classes. At the highest level, programming is nothing more than creating and working with abstractions.

A good way to think about this is that when we do

employee jsmith;

we are creating a box named jsmith which contains within it, sub-boxes named id, name, years_of_service, etc.

Examples

Let’s look at some more examples of classes at work.

A larger example

Let’s write a class to represent polynomials, e.g.,

$$p = 3 + 2x - 5x^2 + 3x^4$$

We can represent a polynomial as a list of its coefficients, one for each power of \(x\). E.g., the above could be represented as the vector

vector<float> p = {3, 2, -5, 0, 3};

In the polynomial above, there is no term for \(x^3\), so it has a coefficient of 0 (i.e., the term is \(0 x^3\)). As a class, this will just be

class polynomial {
  public:
    vector<float> coeffs;
};

What kind of operations can we perform on a polynomial?

Let’s write functions for some of these:

#include<algorithm> // max
#include<cmath>     // pow
#include<iostream>
#include<vector>

polynomial create(int degree) {
    polynomial output;
    output.coeffs.resize(degree + 1, 0);

    return output;
}

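// normalize comes first because degree calls it
void normalize(polynomial& p) {
    // Drop trailing zero coefficients so the last entry is the leading term
    int i;
    for(i = p.coeffs.size() - 1; i >= 0; i--)
        if(p.coeffs.at(i) != 0)
            break;

    p.coeffs.resize(i + 1);
}

int degree(polynomial p) {
    normalize(p); // p is a copy, so the caller's polynomial is unchanged
    return p.coeffs.size() - 1;
}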

void shift(polynomial& p, int powers) {
    p.coeffs.insert(p.coeffs.begin(), powers, 0);
}

polynomial multiply(polynomial p, float s) {
    // Size the output to match p exactly, trailing zero coefficients included
    polynomial output = create((int)p.coeffs.size() - 1);
    for(int i = 0; i < p.coeffs.size(); i++)
        output.coeffs.at(i) = p.coeffs.at(i) * s;

    return output;
}

polynomial add(polynomial a, polynomial b) {
    int larger = max(degree(a), degree(b));

    polynomial output = create(larger);

    for(int i = 0; i <= larger; i++) { // <= so the highest power is included
        float x = i < a.coeffs.size() ? a.coeffs.at(i) : 0;
        float y = i < b.coeffs.size() ? b.coeffs.at(i) : 0;

        output.coeffs.at(i) = x + y;
    }

    return output;
}

float evaluate(polynomial p, float x) {
    float output = 0; // must start at 0 before accumulating terms
    for(int i = 0; i <= degree(p); i++)
        output += p.coeffs.at(i) * pow(x,i);

    return output;
}

void print(polynomial p) {
    for(int i = 0; i <= degree(p); i++) {
        if(i == 0)
            cout << p.coeffs.at(i) << " ";
        else if(i == 1)
            cout << p.coeffs.at(i) << "x ";
        else
            cout << p.coeffs.at(i) << "x^" << i << " ";
    }

    cout << endl;
}

And now we can use these to do some interesting things:

int main() {
    // create takes the degree, not the number of coefficients
    polynomial a = create(2), b = create(3);

    // a = 1 - 2x + 3x^2
    a.coeffs.at(0) = 1; 
    a.coeffs.at(1) = -2;
    a.coeffs.at(2) = 3;

    // b = 2x + 4x^2 + 8x^3
    b.coeffs.at(0) = 0;
    b.coeffs.at(1) = 2;
    b.coeffs.at(2) = 4;
    b.coeffs.at(3) = 8;

    polynomial c = add(a,b);
    shift(c,1);

    print(c);

    return 0;
}
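As a check on what this program is meant to compute, we can work the result out by hand, assuming add sums matching coefficients and shift(c,1) multiplies by \(x\):

$$c = a + b = (1 - 2x + 3x^2) + (2x + 4x^2 + 8x^3) = 1 + 7x^2 + 8x^3$$

$$x \cdot c = x + 7x^3 + 8x^4$$

So after the shift, the coefficient vector of c should be {0, 1, 0, 7, 8}.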

More encapsulation: member functions

We can use a class to wrap up some values inside a single object. But what about the operations on the class, all those functions above? If they are tied to the polynomial class, shouldn’t they be attached to it somehow? It turns out that the answer is yes. Encapsulation means not just that a class contains the values it needs, but that it should contain everything it needs, values and operations. Let’s see how this works; going back to our dog class:

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'
};

Let’s give this class an operation speak which will print "Woof" to cout:

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'

    void speak() {
        cout << "Woof!" << endl;
    }  
};

We call a function like this, defined inside a class, a “member function”. Just as you can only refer to the name of a particular dog (i.e., you must have a dog instance), so you can only ask a dog instance to speak():

dog puppers;
puppers.name = "Puppers";
// etc.
puppers.speak(); // Prints "Woof!"

We can add all the things a dog can do as member functions of the class, and then we no longer need them as individual functions. We’ve encapsulated all the operations and behaviors of dog into the dog class itself. A class is not just a container for values, rather, it’s a container for everything that concerns that particular type. Ideally, we want a class’s instances to stand on their own, and not rely on external functions for any important behavior. (Some behaviors will live in external functions.)

Let’s modify the dogs-and-owners system to have some more member functions (often called methods):

class dog {
  public:
    ...

    void speak() {
        cout << "Woof!" << endl;
    }

    void sleep() {
        cout << "Snooze-time" << endl;
    }

    vector<dog> have_puppies() {
        if(gender == 'm')
            return vector<dog>(); // Empty vector
        else
            return vector<dog>(4); // 4 puppies
    }
};

What’s going on in have_puppies? What is gender referring to? The answer is that it refers to the gender of the current dog. E.g., when we do

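// Values must follow member order: name, breed (placeholder value), age, color, gender
dog fido = {"Fido", "Labrador", 5, "Black", 'm'};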
vector<dog> puppies = fido.have_puppies();

Within fido.have_puppies(), gender refers to fido.gender. When you call a method on an instance, that instance becomes the current instance, and any data member names used inside the method refer to the current instance’s members. This gives us a way to simplify functions that operate on a particular dog: just make them methods, and we can access the attributes of the current dog easily!

Another way to put it is to imagine that we wrote have_puppies as a function:

vector<dog> have_puppies(dog parent) {
    if(parent.gender == 'm')
        return vector<dog>(); // Empty vector
    else
        return vector<dog>(4); // 4 puppies
}

For a method of a class, the current instance is essentially passed as a hidden, invisible parameter. (Technically, it’s passed as a hidden pointer to the current instance, and that pointer is named this.) Every method gets access to it, and whenever we use gender, name, or any other name of a data member, it will access the current instance.
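To see the hidden parameter at work, the sketch below writes the same method twice: once using the bare member name, and once spelling out the this pointer with the -> operator. The dog class is trimmed to one data member, and have_puppies_explicit and check_same are names made up for this example.

```cpp
#include <cassert>
#include <vector>
using namespace std;

class dog {
  public:
    char gender; // 'm' or 'f'

    // Bare member name: gender means "the current instance's gender".
    vector<dog> have_puppies() {
        if(gender == 'm')
            return vector<dog>(); // Empty vector
        else
            return vector<dog>(4); // 4 puppies
    }

    // Same logic with the hidden pointer written out.
    vector<dog> have_puppies_explicit() {
        if(this->gender == 'm')
            return vector<dog>();
        else
            return vector<dog>(4);
    }
};

// Both spellings give the same number of puppies for any dog.
void check_same(dog d) {
    assert(d.have_puppies().size() == d.have_puppies_explicit().size());
}
```

In everyday code we usually write the bare name; this-> is mostly useful when a parameter or local variable has the same name as a data member.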

Polynomials, continued

Now that we have methods, we can clean up our polynomial class, moving some of the operations inside it.

class polynomial {
  public:
    vector<float> coeffs;

    void normalize() {
        int i;
        for(i = coeffs.size() - 1; i >= 0; i--)
            if(coeffs.at(i) != 0)
                break;

        coeffs.resize(i + 1);
    }

    int degree() {
        normalize();
        return coeffs.size() - 1;
    }

    void shift(int powers) {
        coeffs.insert(coeffs.begin(), powers, 0);
    }

    polynomial multiply(float s) {
        polynomial output = ???

        for(int i = 0; i < coeffs.size(); i++)
            output.coeffs.at(i) = coeffs.at(i) * s;

        return output;    
    }

    polynomial add(polynomial b) {
        int larger = max(degree(), b.degree());

        polynomial output = ???

        for(int i = 0; i <= larger; i++) { // <= so the highest power is included
            float x = i < coeffs.size() ? coeffs.at(i) : 0;
            float y = i < b.coeffs.size() ? b.coeffs.at(i) : 0;

            output.coeffs.at(i) = x + y;
        }

        return output;
    }

    float evaluate(float x) {
        float output = 0; // must start at 0 before accumulating terms
        for(int i = 0; i <= degree(); i++)
            output += coeffs.at(i) * pow(x,i);

        return output;    
    }

    void print() {
        for(int i = 0; i <= degree(); i++) {
            if(i == 0)
                cout << coeffs.at(i) << " ";
            else if(i == 1)
                cout << coeffs.at(i) << "x ";
            else
                cout << coeffs.at(i) << "x^" << i << " ";
        }

        cout << endl;    
    }
};

I’ve left out the create function and put ???s in the lines where it was called. We’ll deal with the proper way to create objects in the next section.

Some general observations here: