Review of last time

Big-O notation: \(f(n) = O(g(n))\) means that, for very large \(n\), \(f(n)\) grows no faster than some constant multiple of \(g(n)\); at worst, it is roughly proportional to \(g(n)\).

If we have a simple loop over a data structure:

void square_elements(vector<int>& values) {
    for(auto& v : values)
        v *= v;
}

this will take time proportional to the size of the input, so we say that it is a linear time algorithm, or that it is of order \(O(n)\).

vector<int> remove_duplicates(const vector<int>& data) {
  vector<int> out;

  for(int x : data) {
    // See if x is already in out
    bool found = false;
    for(int y : out)
      if(x == y) 
        found = true;              

    if(!found)
      out.push_back(x);      
  }

  return out;
}

To analyze this, we first have to consider whether it will always take the same amount of time for an input of size n, or will some inputs cause it to run faster or slower. We have a nested loop, where the outer loop always runs n times, but the inner loop runs in time proportional to the size of out, which changes as the outer loop runs. Sometimes out grows, sometimes it doesn’t.

The best case is an input that is nothing but duplicates: {1, 1, 1, 1, ...}. The first 1 will be push_back’d into out, but after that, out will never get any additional elements, so the inner loop always takes \(O(1)\) time. This leads to a best-case runtime of \(O(n)\).

The worst case is an input that is all unique elements (no duplicates at all). Every x will be added to out, so out will grow by 1 every time through the loop, and the inner loop does \(0 + 1 + 2 + \cdots + (n-1)\) comparisons in total. This leads to the now-familiar “triangular” time complexity, which is \(O(n^2)\).

If we build a hierarchy of complexities, it looks like this (from best to worst):

Complexity class     Name
\(O(1)\)             Constant
\(O(\log n)\)        Logarithmic
\(O(n)\)             Linear
\(O(n \log n)\)      N-log-N (“linearithmic”)
\(O(n^2)\)           Quadratic
\(O(n^3)\)           Cubic
\(O(n^p)\)           Polynomial
\(O(2^n)\)           Exponential
\(O(n!)\)            Factorial

Templates

Consider a function such as

int max(int a, int b)
{
  return a > b ? a : b;
}

You can think of this function (or any function) as a machine with a number of “holes” in it, where the parameters are. In order for the machine to work, you have to fill in all the holes with things of the appropriate type, and then the machine cranks up and the function runs. The holes allow us to make functions more flexible than just a block of code that always did the same thing; the behavior depends on the values supplied.

If you look at the function above and think for a bit, it should become clear that it is actually a family of functions, all operating on different types but all doing the same thing:

int    max(int a,    int b)    { return a > b ? a : b; }
float  max(float a,  float b)  { return a > b ? a : b; }
double max(double a, double b) { return a > b ? a : b; }
string max(string a, string b) { return a > b ? a : b; }

The body of the function stays the same, only the types of the holes vary. Looking at it this way, what if we added a different kind of hole, one that, instead of filling it in with a value, you fill it in with a type? This is what templates let us do: write one version of this function where the concrete types (int, float, etc.) are replaced by a type variable.

The template-ified version of the above (replacing all four functions and any future max functions with the same body, too) looks like this:

template<typename T>
T max(T a, T b) 
{
  return a > b ? a : b;
}

In order to make use of this function, we have to supply three pieces of information, fill in three holes: the type to use for T, and the values for a and b.

To use the new max, we can write something like this:

cout << max<int>(10,4) << endl;

In angle brackets after the name, we write the type to use for T, i.e., the type arguments to the function. In parentheses, we supply the values for a and b, the value arguments. (Angle brackets for type args, parens for value args.)

In fact, in this case, we don’t have to supply int as the type for T, C++ can figure it out for us if we just write:

cout << max(10,4) << endl;

C++ looks at the value arguments to the function, sees that in the function declaration they are both of type T, sees that in the function call they are both ints, and deduces that T = int is the only way this will work.
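
One caveat: deduction only works if the value arguments agree on a single T. A minimal sketch (assuming only our max template is in scope, not std::max):

// cout << max(10, 4.5) << endl;      // error: T deduced as int from 10 but double from 4.5
cout << max<double>(10, 4.5) << endl; // OK: we choose T = double, and 10 converts to 10.0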

Template function declarations

A templated function definition is its own declaration; this means that template functions must go into .hpp files, never into .cpp files. This also means that the template function declaration/definition must come before any uses of the function; there’s no way to pre-declare a template function and then define it elsewhere. (The reason for this is that the compiler generates the T-specific version of the function when it sees a need for it, e.g., max(10,4), so it needs to have the full definition available.)
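
For example, a header might look like this (a minimal sketch; the file name my_math.hpp is just a placeholder):

// my_math.hpp (hypothetical): the full template definition lives in the
// header, so any .cpp file that calls max can see it and the compiler can
// generate the T-specific version it needs.
#ifndef MY_MATH_HPP
#define MY_MATH_HPP

template<typename T>
T max(T a, T b)
{
  return a > b ? a : b;
}

#endif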

Template classes

Just like a function can have a type-hole in it, so can a class:

template<typename T>
class box {
  public:
    box() {
      data = new T[10];
    }

    ~box() {
      delete[] data;
    }

    T at(int i) {
      return data[i];
    }

  private:
    T* data;
};

When we use this class, we must supply a type for T:

box<int> my_box;

Unlike functions, C++ cannot deduce T in a templated class. (At least in the version of C++ we are using; C++17 added “class template argument deduction” or CTAD which allows this in certain cases.)
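
For the curious, in C++17 a declaration like the following compiles, with the compiler deducing the template argument from the initializer (std::vector here; we won’t rely on this):

vector<int> v1{1, 2, 3}; // what we write in this course
vector v2{1, 2, 3};      // C++17 CTAD: deduced as vector<int>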

You can have more than one template type variable (in both functions and classes). For example, a useful type is

template<typename A, typename B>
struct pair {
  A fst;
  B snd;
};

Functions can normally return only one value, but with this we can write a function that returns (e.g.) both an int and a string:

pair<int,string> my_function() {
  ...
  return pair<int,string>{12, "Hello"};
}

(pair is useful enough that it’s built-in, in the <utility> header, as std::pair. The function make_pair() makes building pairs more convenient.)
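
For instance, a caller can unpack the two results through the pair’s fields. Here is a small sketch using std::pair (whose fields are named first and second) and make_pair; my_function is a stand-in for the function above:

#include <iostream>
#include <string>
#include <utility>
using namespace std;

pair<int,string> my_function() {
  return make_pair(12, string("Hello"));
}

int main() {
  pair<int,string> p = my_function();
  cout << p.first << " " << p.second << endl; // prints: 12 Hello
}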

The Vector Class: extensible arrays

The std::vector class implements an extensible array. This means that it supports a method .push_back(k) which adds k to the end of the array, growing the array by one element. Let’s take a first stab at building vector from scratch:

template<typename T>
class vector {
  public:
    vector(int s = 0, T v = T()) {
      cap = 2*s;
      sz = s;
      data = new T[cap];

      for(int i = 0; i < s; ++i)
        data[i] = v;
    }

    ~vector() { delete[] data; }

    // Need copy ctor, overloaded =

    int size()     { return sz; }
    bool empty()   { return sz == 0; }
    int capacity() { return cap; }

    T at(int i)    { return data[i]; }    

    void pop_back() {
      if(sz > 0)
        --sz;
    }

    void push_back(T v) {
      if(sz < cap)
        data[sz++] = v;
      else {
        // ?
      }
    }

    // insert, erase

  private:
    T* data;
    int sz, cap;
};

This version supports creating a vector with a specific size, filled with some particular value v, or with the default T-value if you don’t specify one. You can also create an empty vector. For example:

vector<int>    vi(10,-1); // Size = 10, filled with -1
vector<string> vs(20);    // Size = 20, filled with empty strings
vector<float>  vf;        // Size = 0, capacity = 0

The capacity of the vector defaults to twice the size so in the above example, the capacities would be 20, 40, and 0.

The pop_back operation is easy: as long as the vector is not empty, just decrement the size. We don’t actually “remove” the element, we just pretend that it isn’t there, and a later push_back will overwrite it.

The push_back operation is easy, provided that there is room left (size() < capacity()). If there isn’t, what do we do?

An extensible array

The easiest way to build an array that can “grow” is to re-allocate a new array, with one extra element, and then copy everything over.

Here’s a sketch:

void push_back(T v) {
  if(sz == cap) {
    cap += 1; // Make room for the new element
    T* new_data = new T[cap];

    for(int i = 0; i < sz; ++i)
        new_data[i] = data[i];

    delete[] data;
    data = new_data;
  }

  // Now sz < cap always
  data[sz++] = v;        
}

What determines the runtime complexity of push_back? (The current size of the vector.) We have to loop over all the elements, so the runtime complexity of push_back is linear in the size of the vector. This isn’t so bad initially, but imagine a million-element vector; we have to copy a million things, just to add one!

What’s a better way of doing this? Allocate more than one “extra” space when we reallocate, and keep track of the number of “unused” entries available. Only when they are all used up do we need to do a full copy. That looks something like this:

void push_back(T v) {
  if(sz == cap) {
    cap += ?; // Make room for the new element
    T* new_data = new T[cap];

    for(int i = 0; i < sz; ++i)
        new_data[i] = data[i];

    delete[] data;
    data = new_data;
  }

  // Now sz < cap always
  data[sz++] = v;        
}

The main question now is, when we do a full reallocation, what is the optimal amount of “scratch space” to allocate? It has to be at least 1 (for the new element), but we’ve already seen that this is bad. What should we use?

We’re going to analyze the behavior of push_back using a technique called amortized analysis. This is different from our normal analysis, because we’re looking at its behavior over many runs of the same function. It’s OK if some runs are bad, so long as they are balanced out by others. We have to use amortized analysis to get an accurate picture of how push_back runs, because simply taking its worst case (\(O(n)\)) and assuming that that happens every time is both incorrect (it can’t happen every time) and too pessimistic.

Let’s look at what happens if we use a fixed amount, like 16 (every time we have to reallocate, we add 16 extra elements).

Let’s analyze the behavior of this algorithm over time, by looking at the cost of each push_back:

adding items...
1-16  are free
17    costs 16
18-32 free
33    costs 32
34-48 free
49    costs 48

Over 49 items added via push_back, the average cost per item is 96 / 49 or about 2. What’s more interesting is if we examine the cost over the ranges 1-17, 18-33, and 34-49:

Range      Avg. cost per item
1 - 17     16/17 ≈ 1
18 - 33    32/16 = 2
34 - 49    48/16 = 3

In other words, each successive block of 16 items costs 16 more than the block before it, so the average cost per push_back keeps growing with the size of the vector. We’ve reduced the constant factor (by spreading out the copies), but adding new elements is still, on average, a linear-time operation. We need a different approach.
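
A quick way to convince yourself of this is to simulate the fixed-increment policy and watch the average cost per push_back grow with the number of items (a throwaway sketch, separate from our vector class; the starting capacity of 16 and increment of 16 are just the values from the example above):

#include <iostream>
using namespace std;

int main() {
  const long long inc = 16;          // extra slots added per reallocation
  long long copies = 0, cap = 16, sz = 0;

  for(long long n = 1; n <= 100000; ++n) {
    if(sz == cap) {
      copies += sz;                  // reallocating copies every existing element
      cap += inc;
    }
    ++sz;                            // the push_back itself

    if(n == 1000 || n == 10000 || n == 100000)
      cout << n << " push_backs, avg. copies per push_back = "
           << double(copies) / n << endl;
  }
}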

We want the average cost per push_back to stay constant. To get that, we need to space the copies out further and further as the array grows, so that copies become less frequent as they become more expensive.

In fact, for an array of size \(n\) we should allocate exactly \(n\) more space, doubling the size of the array, every time we need to grow it. Let’s analyze the behavior of this:

1         cost 1    (array size = 2)
2         free
3         cost 3    (array size = 4)
4         free   
5         cost 5    (array size = 8)
6-8       free
9         cost 9    (array size = 16)
10-16     free
17        cost 17   (array size = 32)
18-32     free
33        cost 33   (array size = 64)
34-64     free   
65        cost 65   
...

Over the first 64 items, the total cost of all the copies is 1 + 3 + 5 + 9 + 17 + 33 = 68, so the average cost per item is about 1. A clearer picture emerges if we look at the cost of each copy relative to the number of items added so far:

Range      Cost of the copy ÷ total items added so far
1          1/1   = 1
2 - 3      3/3   = 1
4 - 5      5/5   = 1
6 - 9      9/9   = 1
10 - 17    17/17 = 1
18 - 33    33/33 = 1
34 - 65    65/65 = 1

Over all the ranges we’ve examined, the time to add new elements on average remains constant. If we fix our implementation, it looks like this:

void push_back(T v) {
  if(sz == cap) {
    // Make room for the new element
    cap = cap == 0 ? 1 : cap * 2; 
    T* new_data = new T[cap];
    ...

We call this kind of analysis over many operations amortized analysis.
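
The same kind of simulation as before, but for the doubling policy, shows the average number of copies per push_back staying below a small constant no matter how large the vector gets (again, just a sketch to check the analysis, not part of the class):

#include <iostream>
using namespace std;

int main() {
  long long copies = 0, cap = 0, sz = 0;

  for(long long n = 1; n <= 1000000; ++n) {
    if(sz == cap) {
      copies += sz;                    // a reallocation copies every existing element
      cap = (cap == 0) ? 1 : cap * 2;  // the doubling policy
    }
    ++sz;                              // the push_back itself

    if(n == 1000 || n == 1000000)
      cout << n << " push_backs, avg. copies per push_back = "
           << double(copies) / n << endl;
  }
}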

Rent-or-buy

This isn’t really an algorithm, but it demonstrates some of the ideas behind amortized analysis: suppose you’re going to a fancy event, and you need a tuxedo. You can rent one for $50 per event, or buy one outright for $300.

If you think this will be the only fancy event you’ll ever go to, then renting is better. But if you get invited to fancy events regularly (but not with any certainty) the question becomes, at what point should you stop renting and just buy?

To figure this out, we’re going to compare the worst-case cost with the optimal cost, the cost if we had known in advance how many fancy-invites we were going to get. This is called the “competitive ratio” (CR).

Suppose we decide to rent until the total cost we’ve spent is greater than or equal to the cost of a purchase. That is, we’ll rent five times and then purchase when we get the sixth invite.

(The worst-case CR is \(2 - r/p\) where r is the rental cost and p is the purchase cost.)

With this strategy, we are never spending more than twice what we would if we had knowledge of the future (the optimal strategy), which, it turns out, is the best we can do with no knowledge. I.e., no other method without future knowledge can do better.

To show this, assume that you decide to rent k times and then purchase on the k+1 invite. The worst case for you is that the invitations stop right after you buy, so there are exactly k+1 events. Your cost would be \(50k + 300\). The optimal cost would be \(\text{min}(300, 50(k+1))\), i.e., the smaller of the purchase price, or the cost of renting k+1 times. So

$$\text{CR} = \frac{50k + 300}{\text{min}(300, 50(k+1))}$$

This ratio is smallest at k = 5 (buying once your rental spending would reach the purchase price), where it equals 550/300 ≈ 1.83; any other choice of k makes it worse. So no matter what you do, the worst-case ratio is always ≥ 1.83.
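
To see where 1.83 comes from, here is a small sketch that evaluates the formula above for each strategy “rent k times, then buy” (using the $50 rental and $300 purchase from the example); the minimum worst-case ratio occurs at k = 5:

#include <algorithm>
#include <iostream>
using namespace std;

int main() {
  const double r = 50, p = 300;       // rental cost, purchase cost

  for(int k = 0; k <= 10; ++k) {
    double cost = r*k + p;            // our cost: k rentals, then a purchase
    double opt  = min(p, r*(k+1));    // optimal cost if we had known the future
    cout << "k = " << k << ": worst-case CR = " << cost / opt << endl;
  }
}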

Vector complexity

How long does it take to access a particular array element (i.e., a[n])? Does it depend on n, or on the size of the array? No; remember that an array access a[n] is just a shorthand for a pointer addition:

*(a + n)

So it’s no more complex than an addition and a pointer dereference. Thus, accessing a particular array element is \(O(1)\), depending on neither the size of the array, nor the element being accessed. Vectors are just arrays internally, so they inherit that behavior. Let’s build a table of the complexities of the operations supported by vectors:

Operation        Complexity class
a.at(n)          \(O(1)\)
a.push_back()    \(O(1)\) amortized, \(O(n)\) worst case
a.pop_back()     \(O(1)\)
a.insert()       \(O(n)\)
a.push_front()   \(O(n)\)
a.erase()        \(O(n)\)
a.clear()        \(O(1)\) (deletes everything)

We can see that vector has some tradeoffs. We can get to any element quickly, and we can add new elements at the end relatively quickly. Splicing new elements into the middle (or at the beginning) requires shifting every element after the insert position up by one slot, which on average is within a constant factor of the size of the whole vector. push_front() is the same as inserting at position 0, so it always takes time proportional to the size of the vector. Deleting the entire contents of the vector (i.e., resetting the size to 0) is fast, because we can just do sz = 0;.
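
To see where the \(O(n)\) for insert comes from, here is a sketch of what an insert member function for our toy vector might look like (hypothetical; the real std::vector::insert takes an iterator rather than an index):

// Insert v at position i, shifting everything after it up by one.
void insert(int i, T v) {
  push_back(v);   // grow by one (and reallocate if needed); the value
                  // pushed here is just a placeholder, overwritten below

  // Shift data[i .. sz-2] up into data[i+1 .. sz-1]...
  for(int j = sz - 1; j > i; --j)
    data[j] = data[j-1];

  // ...and drop the new element into the hole at position i.
  data[i] = v;
}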

A vector that supports push-front

Can we build a vector that supports amortized constant-time push_front, without sacrificing push_back or any of the other properties? Yes, but at a cost: additional space. Now, when we reallocate, we’ll need to leave some unused space at both the beginning and the end. When one side runs out of room, we have a choice: re-center the elements now (shifting them over to rebalance the free space between the two ends), or keep using only the space that’s left and wait for the next reallocation to re-center everything.

As it turns out, we want to do both, switching between them depending on how close to “full” the vector is. When there’s still a lot of empty space available, it makes sense to spend some time re-centering. When there isn’t much space left, it’s very expensive to move everything, so we should just wait for the (impending) reallocation to recenter us. We can set a threshold for what percentage-used triggers the no-recentering policy.

An interesting thing to note is that this approach assumes that front and back inserts are equally likely. If they are not, we can improve by building an adaptive algorithm. For example, the vector could keep track of how many push_front’s you did, relative to how many push_back’s. If there was a bias in one direction or the other, we could skew the re-centering step to allow more space on that side.

(For a fuller discussion, see http://larshagencpp.github.io/blog/2016/05/22/devector)
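
Here is a minimal sketch of the idea (a hypothetical class, separate from the vector above, without the adaptive policy and without the copy constructor/assignment operator it would really need): elements live in the middle of the buffer, with scratch space at both ends, and every reallocation re-centers them.

template<typename T>
class devector {
  public:
    devector() : data(nullptr), head(0), sz(0), cap(0) { }
    ~devector() { delete[] data; }

    void push_front(T v) {
      if(head == 0)               // no room at the front
        grow_and_recenter();
      data[--head] = v;
      ++sz;
    }

    void push_back(T v) {
      if(head + sz == cap)        // no room at the back
        grow_and_recenter();
      data[head + sz] = v;
      ++sz;
    }

    T at(int i)  { return data[head + i]; }
    int size()   { return sz; }

  private:
    // Double the capacity and copy the elements into the middle of the
    // new buffer, so both ends get fresh scratch space.
    void grow_and_recenter() {
      int new_cap = (cap == 0) ? 2 : cap * 2;
      T* new_data = new T[new_cap];
      int new_head = (new_cap - sz) / 2;

      for(int i = 0; i < sz; ++i)
        new_data[new_head + i] = data[head + i];

      delete[] data;
      data = new_data;
      head = new_head;
      cap = new_cap;
    }

    T* data;
    int head, sz, cap;  // elements occupy data[head .. head+sz-1]
};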

So if you need a data structure that lets you access things at random locations, and lets you add new things to the end quickly-ish, a vector is a good choice. If you need to insert things in the middle, or at the beginning, it may not be. Let’s look at a data structure with some different tradeoffs (slow random access, fast insertion/push_back): the linked list.