Overloading insertion and extraction

As we build our own classes, we may want to control how they are printed, and how they are read in. E.g., suppose we have our trusty dog class:

class dog {
  public: 
    string name;
    string color;
    string breed;
    int age;
};

While we could add a .print method, it would be really great if we could do this:

dog fido{"Fido", "Brown", "Corgi", 7};

cout << fido << endl; // Prints information about Fido to cout

Can we just teach C++ what << means for our class dog? Yes, in fact we can. We can overload the << operator for objects of our class’s type (just as we can overload functions, we can also overload operators). In order to do this, we need to know two things:

What is the name of the function that represents the << and >> operators? Once we have a function name, we can create our own function with the same name.
What are the parameters and return value of these functions? We need to know how many inputs it takes, and what it returns, if we want to overload it.

The answer is:

ostream& operator<< (ostream& out, ...) {
  ...
  return out;
}

istream& operator>> (istream& in, &...) {
  ...
  return in;
}

There are a few things that every overload of << and >> for input/output must do:

The return type must be either an ostream (for <<) or an istream (for >>) and it must be returned by reference.
The first parameter must be either an ostream (for <<) or an istream (for >>) and it must be passed by reference.
The second parameter is a value of whatever type we are trying to print/input. For an >> operator it must be passed by reference.
Both overloads must return the istream/ostream that they took in as the first parameter.

Thus, for our dog class, the << operator could look like this:

ostream& operator<< (ostream& out, dog d) {
  out << d.name << ", ";
  out << d.color << " " << d.breed << ", ";
  out << "age " << d.age;
  return out;
}

Note that out is not a typo: I am printing to the out ostream that was passed in as the first argument, NOT to cout. If you print to cout, then your program will work until you try to print to something other than cout. The version above will actually let you print to any ostream: file, stringstream, whatever. We don’t want to tie our printing down to just cout, especially when we can get the ability to print to anything basically for free.
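To see that this really does work with streams other than cout, here is a minimal sketch that sends a dog to a stringstream. The describe helper is hypothetical (it is not part of these notes), and the dog is taken by const reference, which is an assumption made to avoid a copy rather than a requirement.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

class dog {
  public:
    string name;
    string color;
    string breed;
    int age;
};

// Writes to whichever stream is passed in, never to cout directly.
ostream& operator<< (ostream& out, const dog& d) {
    out << d.name << ", " << d.color << " " << d.breed << ", age " << d.age;
    return out;
}

// Hypothetical helper: builds a string by printing into a stringstream.
string describe(const dog& d) {
    ostringstream buffer;
    buffer << d;            // Same operator<<, different kind of ostream
    return buffer.str();
}
```

For the Fido defined earlier, describe(fido) should produce the text Fido, Brown Corgi, age 7, which is exactly what cout << fido would print.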

Input is more difficult, just because we have to account for the fact that the input might not be in the right format. If the input fails then we have to reset the variable we are reading into to its default value. For example, suppose we create a class for two-dimensional points:

class point2d {
  public:
    point2d() { x = y = 0; }

    float x,y;
};

To print a point is relatively easy:

ostream& operator<< (ostream& out, point2d p) {
  out << p.x << "," << p.y;
  return out;
}

This prints the point \( (x,y) \) as x,y, i.e., coordinates with a comma between. To read in a point, we have to check for the comma: if it’s present we ignore it, if it’s not present then we reset p to the “default” point, using the default constructor:

istream& operator>> (istream& in, point2d& p) {
  if((in >> p.x) && 
     in.peek() == ',' &&
     in.ignore(1) &&
     (in >> p.y)) {

  }
  else  
    p = point2d{};

  return in;
}

Writing a correct extraction operator is more difficult than writing a correct insertion operator, just because there are many different ways that things can go wrong: wrong input format, stream ends mid-object, etc. The more complex the object is, the more complex its extraction will be.
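A quick way to exercise an extraction operator is to feed it text through an istringstream. The sketch below repeats the point2d reader and adds two things that are my own assumptions, not part of the notes: a call to setstate so that a missing comma also marks the stream as failed, and a hypothetical parse_point helper that reports success.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

class point2d {
  public:
    point2d() { x = y = 0; }

    float x, y;
};

istream& operator>> (istream& in, point2d& p) {
    if((in >> p.x) &&
       in.peek() == ',' &&
       in.ignore(1) &&
       (in >> p.y)) {
        // Success: p holds both coordinates
    }
    else {
        p = point2d{};               // Reset to the default point
        in.setstate(ios::failbit);   // Assumption: also flag the failure on the stream
    }

    return in;
}

// Hypothetical helper: tries to read text as "x,y" and returns true on success.
bool parse_point(const string& text, point2d& p) {
    istringstream in(text);
    in >> p;
    return !in.fail();
}
```

With this version, parse_point("1.5,2.5", p) succeeds and fills in p, while input such as "1.5 2.5" (no comma) fails and leaves p at (0,0).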

Here I’ve overloaded the operators as functions, outside the class, which means that if they needed access to the class’s private members, they would have to be declared as friend functions.
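As a small sketch of the friend approach, suppose the data members were private. The class name private_dog and its constructor are made up for this example; the only point is the friend declaration inside the class.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

class private_dog {
  public:
    private_dog(string n, int a) { name = n; age = a; }

    // Grants this one function access to the private members below.
    friend ostream& operator<< (ostream& out, const private_dog& d);

  private:
    string name;
    int age;
};

ostream& operator<< (ostream& out, const private_dog& d) {
    out << d.name << ", age " << d.age;   // Allowed only because of the friend declaration
    return out;
}
```

Without the friend line, the operator would fail to compile, because name and age are not visible outside the class.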

Operator overloading

Overloading << and >> is just the beginning: it turns out that we can overload almost all the operators. The full list of overloadable operators is

Operators	Description
`+ - * / %`	Arithmetic
`^ & \| ~ << >>`	Bitwise and shift
`!`	Negation
`= += -= *= /= %= ^= &= \|= >>= <<=`	Assignment
`< > == != <= >=`	Comparison
`&& \|\|`	Logical
`++ --`	Inc/Dec
`,`	Comma
`->* ->`	Pointer-member
`()`	Function call
`[]`	Array element
conversion	Cast
`new delete`	Allocation/Deallocation

This is subject to a few caveats:

You can only overload operators to customize their behavior on your own classes. You can’t add overloads on built-in types, or pointer types.
You can’t add new operators. Just writing operator<|> doesn’t magically create a new <|> operator.
C++ won’t stop you from completely screwing up the types on the operators. In particular, many of the operators should have particular return types, or require some/all arguments to be passed by reference, in order to work correctly.

There are two ways you can overload an operator:

As a normal function, as we did above with << and >>. In this case, you write a normal function whose name is operatorOP where OP is the operator you want to overload. (Note that, like a normal function, you must write a declaration for this operator before you use it.)
As a method on the class for which you are specializing the operator. You can only overload as a method if the first argument to the operator is of the class type. E.g., we can’t overload << and >> for output/input as methods, because their first arguments are the streams. (You can overload them as shift operators, however.)

Some operators must be overloaded as methods. These are the assignment operator, the function call operator, the array element operator, the -> operator, and the conversion (cast) operators.
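To make the two forms concrete, here is a small sketch using a made-up money class (an assumption for illustration only). The += operator is written as a method and + as a normal function built on top of it.

```cpp
class money {
  public:
    int cents;

    // Method form: the left operand is *this, so only the right operand is a parameter.
    money& operator+= (money other) {
        cents += other.cents;
        return *this;
    }
};

// Function form: both operands are ordinary parameters.
money operator+ (money a, money b) {
    a += b;      // a is already a copy, so modifying it is harmless
    return a;
}
```

Note how the method takes one parameter while the function takes two: in the method, the object on the left of the operator is implicit.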

Overloading the assignment operator

The assignment operator can be overloaded, which means that when we write

a = b;

where a and b are of some user-defined type, we can control exactly what happens. Normally, this just copies all the members (public and private) of b into a, but maybe we want something different to happen. It may be that some classes have some internal data that should not be copied; if so, the overloaded = can avoid copying it.

The overloaded = must be defined as a method on the class, and by convention it takes and returns references:

dog& dog::operator= (dog& other);

The purpose of this overload is to copy the contents of other into *this and then return *this. (Returning *this is necessary to make things like a = b = c; work.)

If you don’t overload =, you get a version for free that, as mentioned above, just copies all data members. If that’s what you want, then there’s really no point in overloading it.
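Here is a sketch of the "some data should not be copied" case. The tagged_dog class and its tag_number member are hypothetical; the parameter is taken by const reference, which is a common choice but an assumption relative to the declaration shown above.

```cpp
#include <string>
using namespace std;

class tagged_dog {
  public:
    tagged_dog(string n, int tag) { name = n; tag_number = tag; }

    tagged_dog& operator= (const tagged_dog& other) {
        name = other.name;   // Copy the ordinary data...
        // ...but deliberately leave tag_number alone.
        return *this;        // Returning *this is what makes a = b = c work
    }

    string name;
    int tag_number;   // Stays with the object it was issued to
};
```

After a = b = c, both a and b have c’s name, but every dog keeps its own tag number.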

The copy constructor

The copy constructor is closely related to the assignment overload. Normally, if you define one you’ll define the other. While the assignment operator is used when we explicitly copy an object into another, the copy constructor is used implicitly, when C++ needs to make a copy in the background. For example, if we write a function

void breed(dog a, dog b);

and then pass it two dogs:

dog fido{...};
dog woofy{...};

breed(fido, woofy);

both fido and woofy are passed by copy, so C++ has to make copies of them. Normally, it does the same kind of thing as the default assignment operator: it builds the new object by copying over each data member in turn (the default constructor is not involved), but you can change this if you want.

The copy constructor is just a constructor that takes another object of the class’s type by reference (it can’t take it by copy, because we are in the process of defining how to do that!). For dog, it looks like this:

class dog {
  public:
    dog(dog& other) {
        // Copy other into *this
    }
};

Usually the copy-constructor and the overloaded assignment work in tandem, essentially doing the same thing, just in different situations.

The “big three”

Usually you’ll need to overload the assignment operator at the same time as two other things:

The destructor
The copy constructor

The normal reason for needing to define these is because the class “owns” some resource that will not automatically copy itself. (E.g., a pointer to dynamic memory.) In that case, you need to define all three:

The assignment operator is overloaded to copy the resource into an existing object.
The copy constructor is defined to copy the resource into a new object.
The destructor is defined to free the resource (e.g., delete the dynamic memory).

Thus, these three together are known as the “big three”: if you find yourself writing one, you probably need to write the other two, as well.
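A minimal sketch of the big three for a class that owns dynamic memory might look like the following. The int_buffer class is hypothetical, and its members are public only to keep the example short.

```cpp
#include <algorithm>

class int_buffer {
  public:
    int_buffer(int n) {
        size = n;
        data = new int[n]();   // The () zero-initializes the elements
    }

    // Destructor: free the resource.
    ~int_buffer() { delete[] data; }

    // Copy constructor: copy the resource into a new object.
    int_buffer(const int_buffer& other) {
        size = other.size;
        data = new int[size];
        std::copy(other.data, other.data + size, data);
    }

    // Assignment: copy the resource into an existing object.
    int_buffer& operator= (const int_buffer& other) {
        if(this != &other) {              // Guard against a = a
            int* fresh = new int[other.size];
            std::copy(other.data, other.data + other.size, fresh);
            delete[] data;                // Release what we used to own
            data = fresh;
            size = other.size;
        }
        return *this;
    }

    int size;
    int* data;
};
```

If any one of the three were left out, two buffers could end up sharing (and later double-deleting) the same array, or the array would never be freed.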

The increment/decrement operators

You can overload ++ and --, both the pre- and post-increment versions. The pre- versions look like this (for a class thing)

thing& operator++ (thing& t); // Function 
thing& operator-- (thing& t); // Function
thing& thing::operator++ ();  // Method
thing& thing::operator-- ();  // Method

What about the post-increment version? These have two differences:

They take an extra argument of type int. The value of this argument is always 0, it’s just there to distinguish them from the pre- versions.
They usually don’t return a reference. Instead, they return a copy of the object as it was before the change.

thing operator++ (thing& t, int i); // Function 
thing operator-- (thing& t, int i); // Function
thing thing::operator++ (int i);  // Method
thing thing::operator-- (int i);  // Method
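As a sketch of how the two versions usually fit together, here is a made-up counter class where the post-increment is written in terms of the pre-increment:

```cpp
class counter {
  public:
    int value;

    // Pre-increment: change the object, then return it by reference.
    counter& operator++ () {
        ++value;
        return *this;
    }

    // Post-increment: the unnamed int parameter only marks this as the post- version.
    counter operator++ (int) {
        counter old = *this;   // Remember the old state
        ++(*this);             // Reuse the pre-increment
        return old;            // Return the old state by copy
    }
};
```

This is why the post- version can’t sensibly return a reference: the value it returns (old) is a local copy that no longer matches the object itself.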

Arithmetic operators

You can overload any of the arithmetic operators + - * / %, including the unary (prefix) minus. Normally these are overloaded as functions rather than methods (so that they can take advantage of implicit conversions on both operands). They usually take their arguments by copy.

thing operator+ (thing a, thing b);
thing operator- (thing a, thing b);
thing operator- (thing a);           // Unary minus
thing operator* (thing a, thing b);
thing operator/ (thing a, thing b); 
thing operator% (thing a, thing b);

Note that % is normally only defined on int-like types (whole numbers).

We could use these to build a new numeric type, like complex, which supports all the usual arithmetic operations:

class complex {
  public:
    complex(float r, float i) { re = r; im = i; }
    complex(float r)          { re = r; im = 0; }
    float re, im;
};

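complex operator+ (complex a, complex b) {
    return complex{a.re + b.re, a.im + b.im};
}

complex operator- (complex a) {
    return complex{-a.re, -a.im};
}

complex operator- (complex a, complex b) {
    return a + -b;
}

complex operator* (complex a, complex b) {
    // (a.re + a.im i)(b.re + b.im i), using i*i = -1
    return complex{a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re};
}

complex operator/ (complex a, complex b) {
    // Multiply top and bottom by the conjugate of b
    float denom = b.re * b.re + b.im * b.im;
    return complex{(a.re * b.re + a.im * b.im) / denom,
                   (a.im * b.re - a.re * b.im) / denom};
}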
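The remark about implicit conversions on both operands can be checked with a short sketch. The class is named cplx here (a made-up name, chosen only to stay clear of the standard library’s own complex type), and the product uses the usual formula for multiplying complex numbers.

```cpp
class cplx {
  public:
    cplx(float r, float i) { re = r; im = i; }
    cplx(float r)          { re = r; im = 0; }   // Also converts a float into a cplx
    float re, im;
};

// Written as functions, so either operand may be converted from float.
cplx operator+ (cplx a, cplx b) {
    return cplx{a.re + b.re, a.im + b.im};
}

cplx operator* (cplx a, cplx b) {
    return cplx{a.re * b.re - a.im * b.im,
                a.re * b.im + a.im * b.re};
}
```

With these definitions both z + 1.0f and 1.0f + z compile. If operator+ were a method instead, only z + 1.0f would, because the left operand of a method is never implicitly converted.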

Overloading conversion operators

We’ve already seen how we can define how to convert things into a user-defined type, by defining a constructor. Can we go the other way, and define how elements of a user-defined type are converted into other things? Yes, by overloading the cast operator. When we write something like

float x = 1.2;
int y = int(x); // convert x to int

we are performing a cast, an explicit conversion from float to int. A conversion operator must be written as a method, and it has no declared return type, because the target type is already part of its name:

thing::operator int ();  // Method: converts a thing to an int

This defines how instances of thing are converted into ints. After defining it, we can proceed to write

thing x = ...;
int y = x;
int z = x + 2; // Convert x to int and then add 2
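Here is a minimal sketch of a conversion to int, written as a method on a made-up percent class (the class and its member are assumptions for illustration):

```cpp
class percent {
  public:
    int value;

    // Conversion operator: a method named "operator int", with no declared return type.
    operator int () const { return value; }
};
```

Given percent x{40}, both int y = x and int z = x + 2 compile, and z ends up as 42 because x is converted to int before the addition.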

Overloading comparisons

Traditionally, if you’re going to overload the comparison operators, you’d do it by first overloading < with the actual comparison. All the other comparisons can be defined in terms of <. (For normal use, all of these should return bool.) Although we can write == in terms of < (a == b iff !(a < b || b < a)), it’s usually possible to write a more efficient comparison directly.

bool operator<  (thing a, thing b) { /* compare */ }
bool operator>  (thing a, thing b) { return b < a; }
bool operator<= (thing a, thing b) { return !(a > b); }
bool operator>= (thing a, thing b) { return !(a < b); }
bool operator== (thing a, thing b) { /* compare */ }
bool operator!= (thing a, thing b) { return !(a == b); }

That way we really only have to write one or two “custom” comparison operators; the rest can be written fairly easily.

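Unlike assignment, == and != have no default versions: if you don’t overload them, comparing two objects of your class simply won’t compile. The same is true of the other comparison operators. (Since C++20 you can ask the compiler to generate a member-by-member == by declaring it = default.)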
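Filling in the pattern for a concrete type, here is a sketch using a made-up height class. It assumes inches is always kept below 12, so that comparing feet first and then inches gives the right ordering.

```cpp
class height {
  public:
    int feet;
    int inches;   // Assumed to be in the range 0..11
};

// The two "custom" comparisons
bool operator<  (height a, height b) {
    if(a.feet != b.feet)
        return a.feet < b.feet;
    return a.inches < b.inches;
}
bool operator== (height a, height b) {
    return a.feet == b.feet && a.inches == b.inches;
}

// Everything else is defined in terms of those two
bool operator>  (height a, height b) { return b < a; }
bool operator<= (height a, height b) { return !(a > b); }
bool operator>= (height a, height b) { return !(a < b); }
bool operator!= (height a, height b) { return !(a == b); }
```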

Overloading NOT and `bool` conversion

If your class has some idea of being “good” or “bad”, often you’ll implement a conversion to bool, along with overloading the ! (logical NOT) operator, with the idea that true corresponds to “good” and false corresponds to “bad”. Thus, a user can test whether a given thing is good or bad by just doing

thing x = ...;
if(x) {
    // x must be good
}

Input/output streams overload them to do just this:

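// Simplified: in the standard library these are methods of the stream classes.
// A stream counts as "bad" once an operation on it has failed.
explicit operator bool () const {
    return !fail();
}

bool operator! () const {
    return fail();
}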

Whenever we write something like

while(cin >> i) {...

we are taking advantage of this conversion.
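The same idea can be sketched for a class of our own. The sensor_reading class and the value_or helper below are hypothetical; the explicit keyword is an assumption that keeps the conversion from firing outside of conditions like if and while.

```cpp
class sensor_reading {
  public:
    double value;
    bool valid;

    // true corresponds to "good", false to "bad"
    explicit operator bool () const { return valid; }
    bool operator! () const { return !valid; }
};

// Hypothetical helper: returns the reading if it is good, otherwise a fallback.
double value_or(sensor_reading r, double fallback) {
    if(r)                  // Uses the conversion to bool
        return r.value;
    return fallback;
}
```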

Abstract data types

Classes, together with overloaded operators, give us the power to make new data types. Just like vector and string, we can create a new type to represent some collection of data. Sometimes a description of what a class does, as distinct from how it does it, is called an “abstract data type”. As an example, let’s implement a Set ADT. This is intended to act like a mathematical set, so we can:

Insert and remove elements from it
Test whether it’s empty
Get the size (cardinality) of the set
Test whether a specific element is in it
Given two sets, construct their union, intersection, difference, and symmetric difference.

Note that there are many ways we could implement sets. The ADT is just a description of what a Set can do, but sometimes we’ll refer to a particular version of the Set as an ADT. We’re going to build a version of Set that is built on a vector where all the elements are unique. Let’s think about how to implement all the above operations (bearing in mind that any operation which changes a set or constructs a new set must ensure that the elements are unique):

To insert an element we first check to see if it is already in the set. If it is not, then we simply push_back it.
To remove an element we search the vector for it and then erase it. Because the elements are unique, we know that there will be at most one copy of it, so we can stop as soon as we find it.
To check whether a set is empty, we just call .empty() on the vector.
To get the size of the set, we just call .size() on the vector.
To check whether an element is in the set, we loop over all the elements in the set and see if we find it.
To construct the union of two sets, we loop over both sets and then insert both their elements into a new (initially empty) set. Because we assume that insert won’t add duplicate elements, this is safe.
To construct the intersection, we loop over one set, and then for each element, check to see if it is in the second set. If it is, we add it to a new, initially empty set (note that we don’t have to use insert here, because all the things we are inserting are coming from a single existing set, and thus are guaranteed to be unique).
To construct the set difference \(A \setminus B\) we loop over \(A\) and check to see if its elements are in \(B\) and add them only if they are not. Once again, it’s safe to not use insert here (just push_back).
To construct the symmetric difference, we can just construct the union, and then “subtract” (set difference) the intersection.

Our class will look something like this:

class set {
  public:
    // Constructor: constructs the empty set
    set() { } 

    int size()   { return data.size(); }
    bool empty() { return data.empty(); }

    bool contains(int x) {
        for(int i : data)
            if(i == x)
                return true;

        return false;
    }

    void insert(int x) {
        if(!contains(x))
            data.push_back(x);
    }

    void remove(int x) {
        for(int i = 0; i < data.size(); ++i)
            if(data.at(i) == x) {
                data.erase(data.begin() + i);
                break;
            }
    }

    // Note: union is a reserved word in C++, so the method is named union_with
    set union_with(set& other) {
        set result = *this; // Copy of this set       

        // Use insert (not push_back) for other's data, to avoid duplicates
        for(int i : other.data)
            result.insert(i); 

        return result;
    }

    set intersection(set& other) {
        set result; // Empty
        for(int i : data)
            if(other.contains(i))
                result.data.push_back(i);

        return result;
    }

    set difference(set& other) {
        set result = *this;
        for(int i : other.data)
            result.remove(i);

        return result;
    }

    set symmetric_diff(set& other) {
        set all = union_with(other);
        set common = intersection(other); // Named, so it can be passed by reference
        return all.difference(common);
    }

    friend ostream& operator<<(ostream&, set);

  private:
    vector<int> data;
};

ostream& operator<< (ostream& out, set x) {
    out << "{";
    for(int i = 0; i < x.size(); ++i) {
        if(i > 0)
            out << ",";   // Comma before every element except the first
        out << x.data.at(i);
    }
    out << "}";

    return out;
}

To make things easier, we can overload some operators for the set operations:

set operator| (set a, set b) {
    return a.union_with(b);
}

set operator& (set a, set b) {
    return a.intersection(b);
}

set operator- (set a, set b) {
    return a.difference(b);
}

set operator^ (set a, set b) {
    return a.symmetric_diff(b);
}

// Takes a by reference, so that a += x changes the caller's set
set& operator+= (set& a, int x) {
    a.insert(x);
    return a;
}

And now we can write a quite natural set manipulation program:

int main() {
    set a; 
    a += 1; a += 2; a += 3;

    set b; 
    b += 3; b += 4; b += 5;

    cout << (a & b) << endl;

    return 0; 
}
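To check what each operator produces, here is a compact, self-contained sketch of the same idea. It is a condensed variant rather than the class above: the type is called int_set (a made-up name that avoids any clash with the standard library’s set), the vector is public to keep things short, the operations are written directly as operators, and const references are used, which touches on the optional const-correctness topic. The to_text helper is hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

class int_set {
  public:
    bool contains(int x) const {
        for(int i : data)
            if(i == x)
                return true;
        return false;
    }

    void insert(int x) {
        if(!contains(x))
            data.push_back(x);
    }

    std::vector<int> data;   // Public here only to keep the sketch short
};

int_set& operator+= (int_set& a, int x) {
    a.insert(x);
    return a;
}

// Union: a is passed by copy, so we can insert b's elements into it
int_set operator| (int_set a, const int_set& b) {
    for(int i : b.data)
        a.insert(i);
    return a;
}

// Intersection: elements of a that are also in b
int_set operator& (const int_set& a, const int_set& b) {
    int_set result;
    for(int i : a.data)
        if(b.contains(i))
            result.data.push_back(i);
    return result;
}

// Difference: elements of a that are not in b
int_set operator- (const int_set& a, const int_set& b) {
    int_set result;
    for(int i : a.data)
        if(!b.contains(i))
            result.data.push_back(i);
    return result;
}

// Symmetric difference: the union minus the intersection
int_set operator^ (const int_set& a, const int_set& b) {
    return (a | b) - (a & b);
}

std::ostream& operator<< (std::ostream& out, const int_set& s) {
    out << "{";
    for(std::size_t i = 0; i < s.data.size(); ++i) {
        if(i > 0)
            out << ",";
        out << s.data[i];
    }
    out << "}";
    return out;
}

// Hypothetical helper: prints a set into a string
std::string to_text(const int_set& s) {
    std::ostringstream buffer;
    buffer << s;
    return buffer.str();
}
```

With a holding 1, 2, 3 and b holding 3, 4, 5 (built with += as in main above), this sketch gives {3} for a & b, {1,2,3,4,5} for a | b, {1,2} for a - b, and {1,2,4,5} for a ^ b.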

Optional topics: const correctness. Using initializer_list for more natural set creation. Member initializer lists.