Classes: abstracting over data

Let’s look at an example, to motivate what we’re going to talk about today. Suppose we want to build an application that deals with employee records. What kind of information will we need to maintain about employees?

ID number
Name
Years of service
Position
Salary

We can store this information in several parallel vectors:

vector<int>    employee_id;
vector<string> employee_name;
vector<int>    employee_years;
vector<string> employee_position;
vector<float>  employee_salary;

Here, a particular employee is identified by their row number (different from their ID number). E.g., we might have

employee_id.at(0)       = 1234;
employee_name.at(0)     = "John Smith";
employee_years.at(0)    = 12;
employee_position.at(0) = "CEO";
employee_salary.at(0)   = 140000.00;

When we want all the information about a particular employee, we look at the same row in all these vectors. If we want to, we can write some functions to manipulate a single employee in various ways:

int find_employee_by_id(int id) {
    for(int i = 0; i < employee_id.size(); i++)
        if(employee_id.at(i) == id)
            return i;

    return -1;
}

void promote_employee(int emp, string new_position, float new_salary) {
    employee_position.at(emp) = new_position;
    employee_salary.at(emp) = new_salary;
}

void fire_employee(int emp) {
    employee_id.erase(employee_id.begin() + emp);
    employee_name.erase(employee_name.begin() + emp);
    employee_years.erase(employee_years.begin() + emp);
    employee_position.erase(employee_position.begin() + emp);
    employee_salary.erase(employee_salary.begin() + emp);
}

// etc.

Having to treat every attribute of each employee in a separate vector is cumbersome. What we really want is a way to group all the information that makes up a single employee into one “object”. Then, we can just create a single vector of employee objects and everything will be in the right place. The idea of putting things together that belong together is called the principle of encapsulation.
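To make the bookkeeping cost concrete, here is a sketch of what adding a new employee might look like with the parallel vectors. The function name hire_employee and its parameters are hypothetical (they do not appear in the code above); the five vectors are the ones declared earlier.

```cpp
#include <string>
#include <vector>

using std::string;
using std::vector;

// The five parallel vectors from above.
vector<int>    employee_id;
vector<string> employee_name;
vector<int>    employee_years;
vector<string> employee_position;
vector<float>  employee_salary;

// Hypothetical helper: adds one employee and returns their row number.
// Every vector has to grow together; forgetting a single push_back
// would leave the rows misaligned from then on.
int hire_employee(int id, string name, string position, float salary) {
    employee_id.push_back(id);
    employee_name.push_back(name);
    employee_years.push_back(0);  // assumption: a new hire starts at 0 years
    employee_position.push_back(position);
    employee_salary.push_back(salary);

    return static_cast<int>(employee_id.size()) - 1;
}
```

Nothing in the language ties the five vectors together, so keeping them the same length is entirely up to us.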

We’re going to build an entirely new type of our own, like int or float or vector or string. This means that anything we could do with those types, we can do with ours: store them in a vector, create them dynamically via new, pass them to functions as parameters, and return them from functions. Here’s how we start:

class employee {
  public:
    // ...
};

Breaking this down:

class – A “class” is a new type that is built up as a collection of values.
employee – This is the name for the new class. From this point forward, employee will be the name of a new type.
{ ... }; – The body of a class is a block, with a semicolon at the end.
public: – This says that whatever follows will be “public”, meaning, exposed to the rest of the program, outside of the employee class. Later, we’ll see how we can “hide” information inside a class, so that only the class itself can access it.
A class is a definition. This means it goes outside of main and outside of other definitions, and you can’t use an employee until after you’ve defined it. Thus, we usually put class definitions at the top of a file, before the function declarations (functions might take employees as arguments, or return them, so the type needs to be defined before that).

Thus, the overall structure of our file now looks like this:

/* opening comment */
#includes
...
using namespace std;

// class definitions...

// function declarations...

int main() {
    ...
}

// function definitions

To describe what makes up an employee, we just list the various values, as if they were variables:

class employee {
  public:

    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

These variable-like things are called data members of the class. Every object that is an employee is guaranteed to have all of these values associated with it.

We can create an employee-variable just like any other type:

employee jsmith;

but how do we access the values inside it? We use the dot . operator:

jsmith.name = "John Smith";

If thing is the name of a class instance, and whatever is the name of a data member of that class, then thing.whatever is the name of thing’s particular whatever. Remember that every instance “owns” its own values, so a different instance will have a distinct whatever.
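As a small sketch of instances owning their own values, the function below creates two employees and changes them separately. The function name instances_are_independent is made up for this illustration.

```cpp
#include <string>

using std::string;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Hypothetical helper: returns true if the two instances kept
// separate values for the same data members.
bool instances_are_independent() {
    employee a;
    employee b;

    a.name = "John Smith";
    b.name = "Jane Smith";   // changes b's name only; a.name is untouched

    a.salary = 140000.00;
    b.salary = 90000.00;     // likewise, a.salary is still 140000

    return a.name == "John Smith" && a.salary != b.salary;
}
```

Here a.name and b.name are two different strings that happen to share a member name.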

Now we have created the new type employee. What can we do with it?

We can create individual employee objects (statically):
```
 employee jsmith;
 jsmith.id = 1234;
 jsmith.name = "John Smith";
 jsmith.years_of_service = 12;
 jsmith.position = "CEO";
 jsmith.salary = 140000.00;
```
Initially, the employee object jsmith is created with default values: name and position will be the empty string, but for a local variable like this one, id, years_of_service and salary are left uninitialized and hold unspecified values until we assign to them. We can then fill in the details after creating it. Because this is often inconvenient, we can also fill in the details when we create the object, like so:
```
 employee jsmith{
    1234,
    "John Smith",
    12,
    "CEO",
    140000.00
 };
```
Note that the order of values here must correspond exactly to the order in which we listed the data members in the class definition. If instead we assign each data member individually (as above), we can do so in any order.
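Here is a short sketch of brace initialization wrapped in two functions. The names make_ceo and make_blank are hypothetical. The second one uses empty braces, which set the numeric members to 0 and the strings to the empty string.

```cpp
#include <string>

using std::string;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Hypothetical helper: the values are matched to the members in
// declaration order: id, name, years_of_service, position, salary.
employee make_ceo() {
    return employee{1234, "John Smith", 12, "CEO", 140000.00};
}

// Hypothetical helper: empty braces give every member a zero-like
// value (0 for the numbers, "" for the strings).
employee make_blank() {
    return employee{};
}
```

Swapping two values of the same type (say, the name and the position) would still compile, so the order is something we have to check by eye.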

Another way to create an employee and store it in an employee variable is to create the object, and then assign it:
```
 employee jsmith;

 jsmith = employee{
    1234,
    "John Smith",
    12,
    "CEO",
    140000.00     
 };
```
For any class thing, you can create a standalone thing by doing
```
 thing{ member values... }
```
Note that the result is a value; this doesn’t store it anywhere, it just creates a value with the given attributes.
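One place such unnamed values are handy is when passing an employee straight to a function, without first storing it in a variable. The sketch below assumes two hypothetical helpers, is_senior and sample_staff, and the ten-year threshold and second employee are made-up sample data.

```cpp
#include <string>
#include <vector>

using std::string;
using std::vector;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Hypothetical helper: ten years is an arbitrary threshold chosen
// for this example.
bool is_senior(employee emp) {
    return emp.years_of_service >= 10;
}

// Hypothetical helper: builds a small list from unnamed employee
// values. Each employee{...} is created and handed to push_back
// without ever being given a variable name.
vector<employee> sample_staff() {
    vector<employee> staff;
    staff.push_back(employee{1234, "John Smith", 12, "CEO", 140000.00});
    staff.push_back(employee{5678, "Jane Smith", 3, "Engineer", 90000.00}); // made-up data
    return staff;
}
```

With these definitions, is_senior(employee{1234, "John Smith", 12, "CEO", 140000.00}) is true, and no variable was needed to ask the question.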

We can also dynamically create employees:

 employee* jsmith = new employee();
 jsmith->id = 1234;
 jsmith->name = "John Smith";
 jsmith->years_of_service = 12;
 jsmith->position = "CEO";
 jsmith->salary = 140000.00;

 // Or, filling in the values at creation:
 employee* jsmith = new employee{
    1234,
    "John Smith",
    12,
    "CEO",
    140000.00
 };

(Note the similarity with the above: employee{...} creates a new employee object and evaluates to an ordinary, non-dynamic employee value, while new employee{...} creates a dynamically allocated employee and returns an employee*, a pointer to it.)
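A sketch of the full life of a dynamically created employee follows. The function name is hypothetical, and the delete at the end is not shown in the text above; it is included on the assumption that the object should be freed once we are done with it.

```cpp
#include <string>

using std::string;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

// Hypothetical helper: creates an employee dynamically, reads one
// value through the pointer, and frees the object again.
string name_of_dynamic_employee() {
    employee* jsmith = new employee{1234, "John Smith", 12, "CEO", 140000.00};

    string result = jsmith->name;  // jsmith->name means (*jsmith).name

    delete jsmith;                 // each new should be paired with a delete
    return result;
}
```

The copy into result is made before the delete, because the pointer must not be used once the object has been freed.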

We can store employees in a vector:
```
 vector<employee> employees;
```
Assuming this is a global variable, we can use it as the collection of all employees in the system.

We can write functions that take, modify, or return employees:

 int find_employee_by_id(int id) {
     for(int i = 0; i < employees.size(); i++)
         if(employees.at(i).id == id)
             return i;

     return -1;
 }

 void promote_employee(employee& emp, string new_position, float new_salary) {
     emp.position = new_position;
     emp.salary = new_salary;
 }

 void fire_employee(int emp) {
     employees.erase(employees.begin() + emp);
 }
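To see how these functions fit together, here is a self-contained sketch that repeats them and adds one more. The function promote_by_id is hypothetical; it is only meant to show a row number from find_employee_by_id being turned into the employee& that promote_employee expects.

```cpp
#include <string>
#include <vector>

using std::string;
using std::vector;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
};

vector<employee> employees;  // the global collection of all employees

int find_employee_by_id(int id) {
    for(int i = 0; i < static_cast<int>(employees.size()); i++)
        if(employees.at(i).id == id)
            return i;

    return -1;
}

void promote_employee(employee& emp, string new_position, float new_salary) {
    emp.position = new_position;
    emp.salary = new_salary;
}

void fire_employee(int emp) {
    employees.erase(employees.begin() + emp);
}

// Hypothetical helper: looks an employee up by ID and promotes them.
// Returns false if nobody has that ID.
bool promote_by_id(int id, string new_position, float new_salary) {
    int row = find_employee_by_id(id);
    if(row == -1)
        return false;

    // employees.at(row) refers to the element stored in the vector,
    // so the change is made to the stored employee, not to a copy.
    promote_employee(employees.at(row), new_position, new_salary);
    return true;
}
```

Checking for -1 before using the row matters: passing -1 to employees.at() would throw an out-of-range exception.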

Classes vs. Instances

employee is the class. It doesn’t actually have any data associated with it at runtime, which means that things like this:

employee.salary

make no sense. It makes no sense to ask, ‘what is the salary of employee?’, only to ask, ‘what is the salary of a particular employee?’. We call particular employees (or, more generally, particular objects belonging to some class) the instances of a class. E.g., in the following,

employee jsmith;

we would say that jsmith is an instance of employee. Instances have data associated with them (the data members that we gave in the class definition).

To summarize, when we say

class whatever { ... };

we are defining a new class (type), but to create instances of it, we still need to do

whatever thing;

to create a whatever variable named thing.

Functional abstraction vs. data abstraction

We mentioned before that functions are a kind of abstraction; they allow us to give a name to a sequence of statements, and thus “abstract” over the details of how some operation is done. When I use promote_employee I just give it the employee, position, and salary; I don’t have to worry about how it works. Similarly, when I am writing promote_employee, I don’t have to worry about where the employee, or position, or salary came from, I just use them.

Classes provide another kind of abstraction, abstraction over data. Now that I’ve defined employee, I don’t need to care about what goes into an employee. I can just pass them around and manipulate them. In particular, suppose we wanted to add another value to every employee, benefits. We can just add it to employee:

class employee {
  public:

    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
    float benefits;
};

and now every employee gets that data member. Furthermore, any functions we’ve written to deal with employees don’t need to worry about benefits unless they have a particular interest in that value. E.g., when we fire_employee we don’t need to do anything extra (in the parallel vectors version, we’d have another vector for benefits, and we’d have to remember to modify the fire_employee function to deal with it as well).
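The sketch below puts the two cases side by side: fire_employee is exactly as before, while a hypothetical total_compensation function (not part of the original code) is the only one that looks at the new member.

```cpp
#include <string>
#include <vector>

using std::string;
using std::vector;

class employee {
  public:
    int id;
    string name;
    int years_of_service;
    string position;
    float salary;
    float benefits;  // the newly added data member
};

vector<employee> employees;

// Unchanged: erasing the employee removes all of its members at
// once, including benefits.
void fire_employee(int emp) {
    employees.erase(employees.begin() + emp);
}

// Hypothetical function with a particular interest in benefits.
float total_compensation(employee emp) {
    return emp.salary + emp.benefits;
}
```

Only code that names benefits directly had to be written with the new member in mind.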

Whereas functions let us abstract over behavior, classes let us abstract over data. If there are only two things you really learn in this class, make them be functions and classes. At the highest level, programming is nothing more than creating and working with abstractions.

A good way to think about this is that when we do

employee jsmith;

we are creating a box named jsmith which contains within it, sub-boxes named id, name, years_of_service, etc.

Examples

Let’s look at some more examples of classes at work.

A class for dogs. For some reason, let’s suppose we need to represent information about dogs. Maybe we’re making an obedience training application or something. The first question is, what kind of information do we want to associate with each dog? Remember that every instance of a class gets all the data members, so we need to know what kind of values every dog will have, so that we can define the members of the class.
```
 class dog {
   public:
     string name;
     string breed;
     int age;
     string color;
     char gender; // 'm' or 'f'
 };
```

A class for dog owners. It makes sense that we’d need this next, right?

 class owner {
   public:
     string name;
     string address;

     vector<dog> dogs_owned;
 };

Here, we have a class that contains a vector of instances of another class! After we create a dog owner, we can add dogs to them:

 owner jsmith;
 jsmith.name = "Jane Smith";
 jsmith.dogs_owned.push_back(dog{"Fido", "Welsh Corgi", 5, "brown", 'm'});

If we look at jsmith now, we’ll find:

 jsmith.dogs_owned.size() == 1;
 jsmith.dogs_owned.at(0).name == "Fido";
 // etc.
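Functions can walk through the nested vector in the usual way. As a sketch, the hypothetical helper oldest_dog_age below (not part of the original notes) reaches through an owner to the age of each dog.

```cpp
#include <string>
#include <vector>

using std::string;
using std::vector;

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'
};

class owner {
  public:
    string name;
    string address;

    vector<dog> dogs_owned;
};

// Hypothetical helper: returns the age of the owner's oldest dog,
// or -1 if the owner has no dogs.
int oldest_dog_age(owner o) {
    int oldest = -1;
    for(int i = 0; i < static_cast<int>(o.dogs_owned.size()); i++)
        if(o.dogs_owned.at(i).age > oldest)
            oldest = o.dogs_owned.at(i).age;  // owner -> vector -> dog -> age

    return oldest;
}
```

The expression o.dogs_owned.at(i).age reads left to right: start with the owner, pick the vector, pick one dog, pick its age.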

A larger example

Let’s write a class to represent polynomials, e.g.,

$$p = 3 + 2x - 5x^2 + 3x^4$$

We can represent a polynomial as a list of its coefficients, one for each power of $x$. E.g., the above could be represented as the vector

vector<float> p = {3, 2, -5, 0, 3};

In the polynomial above, there is no term for $x^3$, so it has a coefficient of 0 (i.e., the term is $0 x^3$). As a class, this will just be

class polynomial {
  public:
    vector<float> coeffs;
};

What kind of operations can we perform on a polynomial?

We can create a polynomial with a specific degree, although all the coefficients will be 0 to start with.
We can multiply it by a constant value. This has the effect of multiplying every coefficient by that constant.
We can multiply it by a power of $x$, say $x^k$. This has the effect of inserting $k$ 0s at the beginning of the vector (i.e., shifting the coefficients up).
We can ask for the degree of a polynomial. This is the highest power of $x$ that does not have a 0 coefficient. Note that a vector such as
```
 {1, 2, 0, 0}
```
has degree 1, because it represents $1 + 2x$. In fact, we should chop off those higher 0s so that we can just use the size of the vector, which leads us to
We can “normalize” a polynomial. This just means chopping off any 0s at the end, so that the above would become
```
 {1, 2}
```
Then, the degree is just .size() - 1.
We can add two polynomials together. This is done by matching up coefficients with the same powers, and adding them together. E.g.,

$$(1 + 2x + 3x^2) + (2 - 4x - 5x^2) = 3 - 2x - 2x^2$$

We can similarly subtract two polynomials. If the polynomials do not have the same degree, then the shorter one is extended with 0s.
We can multiply polynomials, but that’s complicated, so we’ll skip it.
We can evaluate a polynomial, by plugging in a float for $x$ and seeing what value we get out.
We can print a polynomial, as (for example)
```
 1 + 2x + 3x^2 + 4x^3
```

Let’s write functions for some of these:

#include <algorithm> // max
#include <cmath>     // pow
#include <iostream>  // cout, endl
#include <vector>

polynomial create(int degree) {
    polynomial output;
    output.coeffs.resize(degree + 1, 0);

    return output;
}

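// Chops off any trailing 0 coefficients. Defined before degree(),
// because degree() calls it.
void normalize(polynomial& p) {
    int i;
    for(i = p.coeffs.size() - 1; i >= 0; i--)
        if(p.coeffs.at(i) != 0)
            break;

    p.coeffs.resize(i + 1);
}

// p is passed by value, so normalizing it here does not change the
// caller's polynomial.
int degree(polynomial p) {
    normalize(p);
    return p.coeffs.size() - 1;
}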

void shift(polynomial& p, int powers) {
    // Insert `powers` zeros at the front of the coefficient vector.
    p.coeffs.insert(p.coeffs.begin(), powers, 0);
}

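polynomial multiply(polynomial p, float s) {
    // Size the output from p's actual number of coefficients, so that
    // trailing zeros in p cannot push at(i) out of range.
    polynomial output = create(static_cast<int>(p.coeffs.size()) - 1);
    for(int i = 0; i < p.coeffs.size(); i++)
        output.coeffs.at(i) = p.coeffs.at(i) * s;

    return output;
}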

polynomial add(polynomial a, polynomial b) {
    int larger = max(degree(a), degree(b));

    polynomial output = create(larger);

    for(int i = 0; i <= larger; i++) { // <= so the highest power is included
        float x = i < a.coeffs.size() ? a.coeffs.at(i) : 0;
        float y = i < b.coeffs.size() ? b.coeffs.at(i) : 0;

        output.coeffs.at(i) = x + y;
    }

    return output;
}

float evaluate(polynomial p, float x) {
    float output = 0; // must start at 0 before we accumulate into it
    for(int i = 0; i <= degree(p); i++)
        output += p.coeffs.at(i) * pow(x,i);

    return output;
}

void print(polynomial p) {
    for(int i = 0; i <= degree(p); i++) {
        if(i == 0)
            cout << p.coeffs.at(i) << " ";
        else if(i == 1)
            cout << p.coeffs.at(i) << "x ";
        else
            cout << p.coeffs.at(i) << "x^" << i << " ";
    }

    cout << endl;
}
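Subtraction is mentioned in the list of operations but is not among the functions above. A minimal sketch, assuming a function named subtract (a hypothetical name) that mirrors add and treats missing coefficients in the shorter polynomial as 0:

```cpp
#include <algorithm>
#include <vector>

using std::vector;

class polynomial {
  public:
    vector<float> coeffs;
};

// Hypothetical subtract: returns a - b. The result is not normalized,
// so it may end in 0 coefficients (e.g., when a and b are equal).
polynomial subtract(polynomial a, polynomial b) {
    // The result needs as many coefficients as the longer input.
    int count = static_cast<int>(std::max(a.coeffs.size(), b.coeffs.size()));

    polynomial output;
    output.coeffs.resize(count, 0);

    for(int i = 0; i < count; i++) {
        // A coefficient missing from the shorter polynomial counts as 0.
        float x = i < static_cast<int>(a.coeffs.size()) ? a.coeffs.at(i) : 0;
        float y = i < static_cast<int>(b.coeffs.size()) ? b.coeffs.at(i) : 0;

        output.coeffs.at(i) = x - y;
    }

    return output;
}
```

For example, subtracting $2 - 4x - 5x^2 + 7x^3$ from $1 + 2x + 3x^2$ yields the coefficients {-1, 6, 8, -7}.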

And now we can use these to do some interesting things:

int main() {
    polynomial a = create(3), b = create(4);

    // a = 1 - 2x + 3x^2
    a.coeffs.at(0) = 1; 
    a.coeffs.at(1) = -2;
    a.coeffs.at(2) = 3;

    // b = 2x + 4x^2 + 8x^3
    b.coeffs.at(0) = 0;
    b.coeffs.at(1) = 2;
    b.coeffs.at(2) = 4;
    b.coeffs.at(3) = 8;

    polynomial c = add(a,b);
    shift(c,1);

    print(c);

    return 0;
}
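As an aside, the evaluate function above calls pow once per term. A common alternative is Horner's rule, which rewrites $c_0 + c_1 x + c_2 x^2 + \dots$ as $c_0 + x(c_1 + x(c_2 + \dots))$ and so needs only one multiplication and one addition per coefficient. The sketch below is not part of the original code; the name evaluate_horner is hypothetical.

```cpp
#include <vector>

using std::vector;

class polynomial {
  public:
    vector<float> coeffs;
};

// Hypothetical alternative to evaluate(), using Horner's rule.
// Works from the highest power down to the constant term.
float evaluate_horner(polynomial p, float x) {
    float output = 0;
    for(int i = static_cast<int>(p.coeffs.size()) - 1; i >= 0; i--)
        output = output * x + p.coeffs.at(i);

    return output;
}
```

For the polynomial $3 + 2x - 5x^2 + 3x^4$ from the start of this example, evaluating at $x = 2$ gives $3 + 4 - 20 + 0 + 48 = 35$.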

More encapsulation: member functions

We can use a class to wrap up some values inside a single object. But what about the operations on the class, all those functions above? If they are tied to the polynomial class, shouldn’t they be attached to it somehow? It turns out that the answer is yes. Encapsulation means not just that a class contains the values it needs, but that it should contain everything it needs, values and operations. Let’s see how this works; going back to our dog class:

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'
};

Let’s give this class an operation speak which will print "Woof" to cout:

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'

    void speak() {
        cout << "Woof!" << endl;
    }  
};

We call a function like this, defined inside a class, a “member function”. Just as you can only refer to the name of a particular dog (i.e., you must have a dog instance), so you can only ask a dog instance to speak():

dog puppers;
puppers.name = "Puppers";
// etc.
puppers.speak(); // Prints "Woof!"

We can add all the things a dog can do as member functions of the class, and then we no longer need them as individual functions. We’ve encapsulated all the operations and behaviors of dog into the dog class itself. A class is not just a container for values, rather, it’s a container for everything that concerns that particular type. Ideally, we want a class’s instances to stand on their own, and not rely on external functions for any important behavior. (Some behaviors will live in external functions.)

Let’s modify the dogs-and-owners system to have some more member functions (also called methods):

class dog {
  public:
    ...

    void speak() {
        cout << "Woof!" << endl;
    }

    void sleep() {
        cout << "Snooze-time" << endl;
    }

    vector<dog> have_puppies() {
        if(gender == 'm')
            return vector<dog>(); // Empty vector
        else
            return vector<dog>(4); // 4 puppies
    }
};

What’s going on in have_puppies? What is gender referring to? The answer is, it’s referring to the gender of the current dog. E.g., when we do

dog fido = {"Fido", "Welsh Corgi", 5, "Black", 'm'}; // name, breed, age, color, gender
vector<dog> puppies = fido.have_puppies();

Within fido.have_puppies(), gender refers to fido.gender. When you call a method on an instance, that instance becomes the current instance, and any uses of the data member names within its methods refer to that, the current instance. This gives us a way to simplify functions that operate on a particular dog: just make them methods, and we can access the attributes of the current dog easily!

Another way to put it is to imagine that we wrote have_puppies as a function:

vector<dog> have_puppies(dog parent) {
    if(parent.gender == 'm')
        return vector<dog>(); // Empty vector
    else
        return vector<dog>(4); // 4 puppies
}

For a method of a class, the current instance is essentially passed as a hidden, invisible parameter. (Technically, it’s passed as a hidden pointer to the current instance, and that pointer is named this.) Every method gets access to it, and whenever we use gender, name, or any other name of a data member, it will access the current instance.
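To make the hidden pointer visible, the sketch below writes the same check twice: once using the bare member name and once going through this explicitly. Both method names are hypothetical and exist only for this comparison.

```cpp
#include <string>

using std::string;

class dog {
  public:
    string name;
    string breed;
    int age;
    string color;
    char gender; // 'm' or 'f'

    // Implicit form: gender means the current instance's gender.
    bool is_female() {
        return gender == 'f';
    }

    // Explicit form: the same member, reached through the hidden
    // pointer to the current instance.
    bool is_female_explicit() {
        return this->gender == 'f';
    }
};
```

The two methods always agree; the first is simply shorthand for the second. Calling them on two different dogs gives each call its own current instance.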

Polynomials, continued

Now that we have methods, we can clean up our polynomial class, moving some of the operations inside it.

class polynomial {
  public:
    vector<float> coeffs;

    void normalize() {
        int i;
        for(i = coeffs.size() - 1; i >= 0; i--)
            if(coeffs.at(i) != 0)
                break;

        coeffs.resize(i + 1);
    }

    int degree() {
        normalize();
        return coeffs.size() - 1;
    }

    void shift(int powers) {
        coeffs.insert(coeffs.begin(), powers, 0);
    }

    polynomial multiply(float s) {
        polynomial output = ???

        for(int i = 0; i < coeffs.size(); i++)
            output.coeffs.at(i) = coeffs.at(i) * s;

        return output;    
    }

    polynomial add(polynomial b) {
        int larger = max(degree(), b.degree());

        polynomial output = ???

        for(int i = 0; i <= larger; i++) { // <= so the highest power is included
            float x = i < coeffs.size() ? coeffs.at(i) : 0;
            float y = i < b.coeffs.size() ? b.coeffs.at(i) : 0;

            output.coeffs.at(i) = x + y;
        }

        return output;
    }

    float evaluate(float x) {
        float output = 0; // must start at 0 before we accumulate into it
        for(int i = 0; i <= degree(); i++)
            output += coeffs.at(i) * pow(x,i);

        return output;    
    }

    void print() {
        for(int i = 0; i <= degree(); i++) {
            if(i == 0)
                cout << coeffs.at(i) << " ";
            else if(i == 1)
                cout << coeffs.at(i) << "x ";
            else
                cout << coeffs.at(i) << "x^" << i << " ";
        }

        cout << endl;    
    }
};

I’ve left out the create function and put ???s in the lines where it was called. We’ll deal with the proper way to create objects in the next section.

Some general observations here:

Every function that used to take a single polynomial parameter now doesn’t need to take one: it can just use the current instance directly. Similarly, functions like add that used to take two, now just take one.

When a function used to take a parameter polynomial p, we just remove the parameter, and then remove every use of p in the body of the function (so p.coeffs becomes coeffs, and degree(p) becomes degree()). Effectively, p is being replaced by the current instance.
Notice how degree() refers to normalize()? As with members, methods can call each other on the current instance, implicitly.
multiply can access the coefficients of output directly. This would still be true if coeffs were private: multiply is part of polynomial, so it may use the members of any polynomial, even though output is not the same instance as the current one.
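The last point can be tried on a small scale. The sketch below uses a made-up class named tally (it is not part of the polynomial code) with a genuinely private data member, and a method that reads that member on two other instances besides the current one. It relies on the private: label, the counterpart of public: that hides members from code outside the class.

```cpp
// Hypothetical class, used only to illustrate access between instances.
class tally {
  private:
    int count = 0;  // hidden from code outside the class; starts at 0

  public:
    void increment() {
        count++;
    }

    int value() {
        return count;
    }

    // other and output are different instances from the current one,
    // but their private count is still accessible here, because this
    // function is a member of tally.
    tally combined_with(tally other) {
        tally output;
        output.count = count + other.count;
        return output;
    }
};
```

Outside the class, writing something like a.count would be rejected by the compiler, while a.combined_with(b).value() is fine.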