Big-O Notation

We refer to the function that an algorithm’s runtime is proportional to as its order and say that an algorithm is \(O(f(n))\) to mean that its order is \(f(n)\) (i.e., is roughly proportional to \(f(n)\)). This is colloquially known as “big-O” notation.

An algorithm is of order \(O(f(n))\) if there exist constants \(k\) and \(n_0\) such that for any input of size \(n > n_0\), the runtime of the algorithm is no more than \(k f(n)\).

Or, mathematically, a function \(f(n)\ \in O(g(n))\) iff

There exist \(k > 0\) and \(n_0\) such that for all \(n > n_0\):

$$f(n) \le k g(n)$$

(\(n_0\) captures the notion of our analysis applying to “big enough” inputs: only when the size of the input is larger than some minimum does the proportionality apply. Similarly, \(k\) captures the proportionality constant.) We are assuming that both \(f(n)\) and \(g(n)\) are strictly positive.

If you want a calculus-style definition, we can also use a limit:

$$f(n) \in O(g(n))\; \text{iff}\; \lim_{n \rightarrow \infty} \frac{f(n)}{g(n)} \in [0, \infty)$$

(That is, the limit must exist and be finite. Strictly speaking this is a slightly narrower condition than the definition above, but it covers every case we’ll meet.)

Intuitively, you should think of \(f(n) \in O(g(n))\) as meaning “\(f(n) \le K g(n)\) for some K and really big n”. (There are other “big-letter” notations which correspond to ≥, <, equal-to, etc.)

Summations of Big-O

There is a summation identity that will make working with big-O notation much easier: if \(f(i) \in O(i^p)\), then

$$\sum_{i=1}^n f(i) \in O(n^{p+1})$$

This says that if you sum up a function which is \(O(n^p)\), the resulting sum will be \(O(n^{p+1})\): summing an \(O(n^2)\) function gives \(O(n^3)\), and so forth.
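For example, since \(i \in O(i^1)\), the identity immediately tells us that

$$\sum_{i=1}^n i \in O(n^2)$$

which agrees with the exact value \(\frac{n(n+1)}{2}\) that we’ll derive later.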

How Big-O works

Let’s analyze how big-O works: what can we conclude from the definition of \(O\)?

Well, for one thing:

$$a_1 n^p + a_2 n^{p-1} + \ldots + a_p n^{1} + a_{p+1} \in O(n^p)$$

that is, we can drop all but the highest degree term in a polynomial.

For example, suppose we want to show that

$$100 n + n^4 \in O(n^4)$$

To show that this is true, all we have to do is figure out appropriate \(k\) and \(n_0\) to make the definition true. If we let \(k = 100, n_0 = 2\) then we have

$$100 n + n^4 \le 100 n^4, \quad n > n_0$$
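As a quick check, at \(n = 3\) (and any larger \(n\)):

$$100 \cdot 3 + 3^4 = 381 \le 100 \cdot 3^4 = 8100$$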

Note that if we try to show that \(100 n + n^4 \in O(n^2)\) we will fail; there is no \(k, n_0\) that will make the definition true, because \(\frac{100 n + n^4}{n^2} \ge n^2\) grows without bound.

Some other properties:

$$c f(n) \in O(f(n)) \quad\mathrm{if}\quad c \ne 0$$

This says that we can ignore constant multiples (makes sense, because we can always fold them into \(k\)).

$$\text{if}\;f(n) \in O(F(n)), \quad g(n) \in O(G(n))$$ $$\text{then}\; f(n) + g(n) \in O(F(n) + G(n))$$

If we add two functions together, the order of the sum is the sum of the orders; but because in a sum we can drop all but the fastest-growing term, we can rewrite this as

$$f(n) + g(n) \in O(\max(F(n), G(n)))$$

On the other hand, if we take a product:

$$\text{if}\;f(n) \in O(F(n)), \quad g(n) \in O(G(n))$$ $$\text{then}\; f(n) g(n) \in O(F(n) G(n))$$

Note that this means that if we have some algorithm which we know is of order \(O(F(n))\), and we run it \(n\) times, then the result is of order \(O(n F(n))\).

Because big-O is a kind of generalized \(\le\), there is something like the transitive property for it:

$$\mathrm{if}\quad f(n) \in O(g(n))\quad\mathrm{and}\quad g(n) \in O(h(n))\quad\mathrm{then}\quad f(n) \in O(h(n))$$

Finally, adding a constant has no effect on a function’s order (because a constant is the lowest order, and hence always drops off).

$$f(n) \pm c \in O(f(n))$$

Question: which grows faster, \(n \log n\) or \(n^2\)? The multiplication rule means that we can factor out a common \(n\), so we are left with the question of which grows faster, \(n\) or \(\log n\). This should hopefully be easier: \(n\) grows faster, so \(n^2\) grows faster than \(n \log n\).

Which grows faster, \(n^{100}\) or \(2^n\)?
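Using the limit form of the definition, both questions can be settled by computing a limit. For the first:

$$\lim_{n \rightarrow \infty} \frac{n \log n}{n^2} = \lim_{n \rightarrow \infty} \frac{\log n}{n} = 0$$

so \(n \log n \in O(n^2)\), but not the other way around. Likewise \(\lim_{n \rightarrow \infty} n^{100}/2^n = 0\): an exponential eventually outgrows any fixed-degree polynomial, no matter how large the exponent.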

Complexity Classes

Complexity class    Name
\(O(1)\)            Constant
\(O(\log n)\)       Logarithmic
\(O(n)\)            Linear
\(O(n \log n)\)     N-log-N (“linearithmic”)
\(O(n^2)\)          Quadratic
\(O(n^3)\)          Cubic
\(O(n^p)\)          Polynomial
\(O(2^n)\)          Exponential
\(O(n!)\)           Factorial

Using asymptotic analysis

Analyzing the runtime (or space usage) of a function in big-O terms, rather than precisely as above, is called asymptotic analysis. Let’s try an example.

Suppose we want to raise a floating point value to a non-negative integer power. That is, we want to compute

$$f^n\qquad n \in \mathbb{N}$$

An easy way to do this is to use a loop that runs \(n\) times, each time multiplying an accumulator by \(f\). That is,

float powi(float f, unsigned int n) {
  float accum = 1.0;

  for(unsigned int i = 1; i <= n; ++i)
    accum *= f;

  return accum;
}

Which of the two arguments determines the runtime of this algorithm? (n, because the loop depends on n). Inside the loop, we do one multiply and one assign (plus the compare and increment we do to update the loop variable). So this algorithm is \(O(n)\) because the loop executes \(n\) times.

However, there’s a faster way to compute a nonnegative integer power.

This is because if n is even then we can rewrite

$$f^n = (f^2)^{n \over 2}$$

and likewise if n is odd:

$$f^n = f (f^{n-1})$$

This leads to the following algorithm:

float powi(float f, unsigned int n) {
  if(n == 0)
    return 1;

  float accum = f;   // running base: accum^n times extra is always the answer
  float extra = 1;   // factors peeled off by the odd steps
  while(n > 1) {
    if(n % 2 == 0) { // even: accum^n = (accum^2)^(n/2)
      accum *= accum;
      n /= 2;
    }
    else { // odd: accum^n = accum * accum^(n-1)
      extra *= accum;
      n -= 1;
    }
  }

  return accum * extra;
}

(In this case, we have to special-case 0 and start the accumulator at f; if we started the accumulator at 1, squaring would never get us anywhere. We also need the separate extra factor: after a squaring step, accum is no longer f itself, so the odd steps must multiply by the current accum and collect those factors separately.)

It should be clear that the runtime still depends on n (the loop termination condition is on n, and we manipulate n, shrinking it each time through the loop), but how can we categorize the runtime? We can start by categorizing the best and worst cases for this algorithm:

Let’s look at the best case. If \(n = 2^p\), then we’re going to divide n by 2 exactly p times; in other words, the loop will run through p iterations. Given n, what mathematical function do we use to get p? (Logarithm). The runtime will be proportional to the log (base 2) of n, so we say this is \(O(\log n)\). (The difference between \(\log_2\) and log-any-other-base is only a constant factor, so we can disregard it.)

What about the worst case: when does it occur? We can find it by starting with small values of n and working up, looking for the values that force the most iterations: 1, 3, 7, and so on.

What do the numbers 1, 3, 7 (the next is 15) have in common? They are all powers of 2, minus one. So the worst case occurs when

$$n = 2^p - 1$$

In this case, we will perform \(p-1\) division steps and \(p-1\) decrement steps, for a total of \(2p - 2\) iterations. Note that this is still a constant multiple of p, so our assessment of this algorithm as being \(O(\log n)\) is correct even in the worst case.
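If you want to see the logarithmic behavior concretely, here is a small, hypothetical test harness (not part of the notes) that re-implements the fast powi with an iteration counter and prints the best-case (\(n = 2^p\)) and worst-case (\(n = 2^p - 1\)) counts:

#include <cstdio>

// Hypothetical harness: the fast powi from above, with an iteration counter added.
float powi_counted(float f, unsigned int n, int& iters) {
  iters = 0;
  if(n == 0)
    return 1;

  float accum = f, extra = 1;
  while(n > 1) {
    ++iters;
    if(n % 2 == 0) { accum *= accum; n /= 2; }
    else           { extra *= accum; n -= 1; }
  }
  return accum * extra;
}

int main() {
  for(unsigned int p = 1; p <= 20; ++p) {
    int best, worst;
    powi_counted(1.0f, 1u << p, best);        // n = 2^p
    powi_counted(1.0f, (1u << p) - 1, worst); // n = 2^p - 1
    std::printf("p = %2u  best = %2d  worst = %2d\n", p, best, worst);
  }
  // Prints best = p and worst = 2p - 2 (0 when p = 1): both grow like log n.
}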

Logarithmic algorithms are common in situations where we can cut the size of the problem in half (or some other fraction) every time, until we have the solution. The number of times we can halve it is simply the \(\log_2\) of the original input.

Another common logarithmic algorithm is binary search. We’ll take a closer look at it later, but the problem this solves is, how do we find a specific value in a sorted array? If the array was unsorted, we would have no choice but to scan through from beginning to end, because we have no idea where the target might be. But if the array is sorted, we can use that information to rule out portions of the array. E.g., suppose we are looking for 12 in an array, and we look at the middle element of the array and find that it is 7:

---------------------------------------------
                   | 7 |                     
---------------------------------------------

What can we say about where 12 must be in the array, if it exists at all? It must be above 7’s location, so we can rule out 7 and everything below it. If 7 truly was the middle value, then we’ve ruled out half the array.

The implementation of binary search looks like this (for a vector of ints):

int binary_search(vector<int> data, int target) {
    int low = 0, high = data.size() - 1;
    while(low <= high) {
        int mid = (low + high) / 2;

        if(target == data[mid])
            return mid;
        else if(target < data[mid])
            high = mid - 1;
        else // target > data[mid]
            low = mid + 1;
    }

    // Not found
    return -1;
}

Analysis: the worst case occurs when the element is not found; this will eventually result in low > high, terminating the loop. Suppose the size of the vector is 31 (\(= 2^5 - 1\)). Then mid = 30/2 = 15, and we have elements 0–14 below and 16–30 above: 15 elements on each side. It doesn’t matter which side we go down; we’ll be splitting 15 elements in half (minus one for the middle), leaving 7, then 3, then 1, then we exit. Five steps to exit, because the original \(n\) was approximately \(2^5\).

Nested loops where the inner loop depends on the outer

One algorithm we’ll look at later is selection sort. This is an algorithm for sorting a list of items that works pretty simply: repeatedly find the smallest element in the not-yet-sorted portion of the list and swap it into place at the front of that portion.

In code, this looks like this:

void sort(vector<int>& data) {
  for(int i = 0; i < data.size() - 1; ++i) {
    // Find smallest in i..data.size-1

    int smallest = data.at(i);
    int smallest_index = i;
    for(int j = i; j < data.size(); ++j) 
      if(data.at(j) < smallest) {
        smallest = data.at(j);
        smallest_index = j;
      }

    // Swap it into place
    swap(data.at(i), data.at(smallest_index));
  }
}

Let’s analyze the runtime of this. The outer loop executes \(n-1\) times (there’s no need to “sort” the list when only one element is left, so we stop at that point). For any particular i set by the outer loop, the inner loop will execute \(n - i\) times. So the sum we have is

$$\sum_{i=0}^{n-1} n - i$$

We can make this easier to read by changing it so that it counts up. That is, if we think about the outer loop running in the other direction, the inner loop executes 0 times, then 1 time, then 2 times, etc. So we really have a sum like this

$$\sum_{i=0}^{n} i = 0 + 1 + 2 + \ldots + (n-1) + n$$

To solve this, let’s look at \(2\sum_{i=0}^n i\): write the sum out twice, once forwards and once backwards, and add the two copies column by column. Each of the \(n+1\) columns sums to \(n\), so \(2\sum_{i=0}^n i = n(n+1)\). Dividing by 2 gives us a result of

$$= \frac{n(n+1)}{2} \in O(n^2)$$
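As a quick sanity check, for \(n = 5\):

$$0 + 1 + 2 + 3 + 4 + 5 = 15 = \frac{5 \cdot 6}{2}$$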

We could also get this more simply by noting that the inner loop does \(O(n)\) work, and if we sum up \(O(n)\) work \(n\) times, the result is \(O(n^2)\) (by our summation-of-big-O identity).

Geometrically, this kind of sum/loop looks like a right triangle. (In fact, it’s half of a rectangle that is n by n-1.) We’ll see it come up a lot later on, so it’s worth remembering that it’s order \(O(n^2)\).

The maximal contiguous subsequence sum problem: three versions

This problem can be solved naively, with a \(O(n^3)\) algorithm, a bit smarter with a \(O(n^2)\) algorithm, or, by being quite clever, with a \(O(n)\) algorithm. We’ll examine each in turn.

The idea of this problem is that we are given a sequence of integers (positives and negatives allowed) and we want to find some range of elements [start..end] that adds up to a larger sum than any other range. If all the elements were positive, the answer would simply be the entire sequence, since adding more elements could only increase the sum; but negative elements are allowed, so it’s possible that expanding a range may in fact make its sum smaller.

(Note that the range [start..start] has one element, not 0.)

E.g., given the sequence

1 -5 2 -1 3

we have subsequences like 1 -5 2, 2 -1 3, -5 2, etc. The subsequence with the largest sum is 2 -1 3 which sums to 4.

Version 1: check every possible subsequence

To do this, we have two loops, for the starting and ending points of the sequences:

int mcss(vector<int> seq) {
  int largest = INT_MIN;

  for(int start = 0; start < seq.size(); ++start) 
    for(int end = start; end < seq.size(); ++end)
      ...

  return largest;
}

and then finally an inner loop that loops over the elements in the range, computing their sum and comparing it to the current maximum:

int mcss(vector<int> seq) {
  int largest = INT_MIN;

  for(int start = 0; start < seq.size(); ++start) 
    for(int end = start; end < seq.size(); ++end) {
      int sum = 0;
      for(int i = start; i <= end; ++i)
        sum += seq.at(i);

      largest = max(largest, sum);
    }

  return largest;
}

What is the order of the runtime of this algorithm? We have three nested loops, each of which depends to some extent on the size of the input, so \(O(n^3)\) would not be a bad first guess, but let’s verify that.

The outer loop definitely executes \(n\) times (where \(n = \)seq.size()). The inner loop depends on where the outer loop starts. As a sum, this looks like

$$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} \ldots$$

because on the first iteration, \(i = 0\) and the inner loop runs all the way through the sequence, but when \(i = n-1\) the inner loop only runs once. That would be the whole story if the body of the inner loop were constant-time, but it contains yet another loop, which depends on both indices, so in actuality we have the triple sum

$$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} \sum_{k=i}^{j} 1$$

How do we solve this? Well, the innermost sum just works out to \(1+j-i\), so that gives us

$$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} (1 + j - i)$$

Using the splitting technique we have three sums:

  1. $$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} 1$$
  2. $$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} j$$
  3. $$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} -i$$

Note that in sum (1), the inner sum is the “simplest sum,” which has an easy solution. In sum (3), the inner sum consists of just \(-i\), which is a constant with respect to the summation variable \(j\), so we can factor it out, leaving that sum as \(\sum_{i=0}^{n-1} -i \sum_{j=i}^{n-1} 1\), which again has an easy inner solution. This leaves us at

  1. $$\sum_{i=0}^{n-1} 1 + n-1 - i$$
  2. $$\sum_{i=0}^{n-1} \sum_{j=i}^{n-1} j$$
  3. $$\sum_{i=0}^{n-1} -i (1 + n-1 - i) = \sum_{i=0}^{n-1} -in + i^2$$

Let’s focus on sum (3). If we split off the \(i^2\) portion, we have

$$\sum_{i=0}^{n-1} i^2$$

Rather than work this out (it’s complicated) I’m just going to give you the answer:

$$\frac{(n-1)n(2n-1)}{6}$$

(You’re welcome to try to work it out yourself!)

Note that if we multiply this out we get

$$\frac{2n^3 - 3n^2 + n}{6}$$

which is of order \(O(n^3)\), confirming our suspicion: this algorithm runs in cubic time.

Quadratic time

We can improve the algorithm by making a simple observation: Suppose we have computed the sum of elements \(i\) through \(j\). To compute the sum of elements \(i\) through \(j+1\), all we have to do is add seq[j+1] to the sum. That is, by reusing previous sums, we can avoid the inner loop. This variant of the algorithm looks like this

int mcss(vector<int> seq) {
  int largest = INT_MIN;

  for(int start = 0; start < seq.size(); ++start) {
    int sum = 0;
    for(int end = start; end < seq.size(); ++end) {
      sum += seq.at(end);
      largest = max(largest, sum);
    }
  }

  return largest;
}

Comparing the two loops in this algorithm, we can see that they are equivalent to the nested loops in the selection sort algorithm, which we already proved was \(O(n^2)\). Getting better!

Linear time

We can actually make a version that is \(O(n)\). The idea is to recognize a simple fact:

A subsequence [0..i] whose sum is negative cannot be a prefix of the maximal subsequence.

The reason this is true is because if the maximal subsequence was [0..j] with \(j > i\), but [0..i] had a negative sum, we could make a larger sum by simply removing the subsequence [0..i] and using [i+1..j] as the subsequence.

Thus, we keep two indexes: the start and end of the potentially-largest subsequence. Whenever the sum becomes negative, we advance the start to skip over it. Otherwise, we advance the end. The implementation looks like this:

int mcss(vector<int> seq) {
  int largest = INT_MIN;
  int current = 0;

  for(int start = 0, end = 0; end < seq.size(); ++end) {
    current += seq.at(end);

    if(current > largest)
      largest = current;

    if(current < 0) {
      // restart the subsequence just past this point
      start = end + 1;
      current = 0;
    }
  }

  return largest;
}

One loop, which can execute at most seq.size() times, making this implementation \(O(n)\). Nice work. This demonstrates how a little attention and effort can dramatically improve the performance of an algorithm.
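A quick usage check (a hypothetical main, assuming one of the mcss versions above is in scope), using the example sequence from earlier:

#include <climits>
#include <iostream>
#include <vector>
using namespace std;

int mcss(vector<int> seq); // any of the versions above

int main() {
  vector<int> seq = {1, -5, 2, -1, 3};
  cout << mcss(seq) << "\n"; // prints 4: the sum of the subsequence 2 -1 3
}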

Recursive (divide and conquer)

It’s also possible to build a recursive, divide-and-conquer solution, by splitting the vector in half, finding the MCSS of the left half (including its starting and ending positions) and of the right half, and then checking to see whether the left MCSS ends at the midpoint and the right MCSS begins just after it. If so, we merge them into a single sequence; otherwise we return the larger of the two. (This simple stitching rule is only a sketch: a full divide-and-conquer solution also has to consider crossing sequences that extend past either half’s own MCSS, which we gloss over here.) This looks something like this:

int mcss(vector<int> seq) {
  int sum, start, end;
  mcss(seq, 0, seq.size()-1, sum, start, end);
  return sum;
}

// This is the actual implementation, finds the MCSS in the range left..right,
// and stores its sum, beginning index, and ending index, into sum, begin, end.
void mcss(vector<int> seq, int left, int right, int& sum, int& begin, int& end) {
  if(left == right) {
    // One element
    sum = seq[left];
    begin = end = left; 
  }
  else {
    // Split in half
    int mid = left + (right - left) / 2; // halves left..mid and mid+1..right are both nonempty

    int lsum, lbegin, lend;
    mcss(seq, left, mid, lsum, lbegin, lend);

    int rsum, rbegin, rend;
    mcss(seq, mid+1, right, rsum, rbegin, rend);

    if(lend == rbegin - 1 && lsum > 0 && rsum > 0) {
      // Stitch together if they meet in the middle and both are positive.
      sum = lsum + rsum;
      begin = lbegin;
      end = rend;
    }
    else if(lsum > rsum) {
      // Use left sum
      sum = lsum;
      begin = lbegin;
      end = lend;
    }
    else {
      // Use right sum
      sum = rsum;
      begin = rbegin;
      end = rend;
    }
  }
}

Amortized analysis

Sometimes just analyzing a single run of an algorithm isn’t enough. We’ve assumed so far that each run of an algorithm is independent of any others: the algorithms have no “internal state” which is maintained between runs. But when we analyze data structures, we often have to contend with the fact that one operation on a structure will change the structure, affecting the runtime of future operations. So the same operation, done twice in a row, might have different runtime complexities each time!

To deal with this situation we use amortized analysis. An amortized analysis analyzes not just a single run of an algorithm, but many runs, and then computes the average (big-O) runtime cost per run.

There are several different methods of performing amortized analysis; here we’ll use the accounting method, and briefly mention the potential method at the end.

A classic example of the accounting method is given by a generalized “stack” data structure. A stack is a data structure where we can push new items onto the top of the stack, or pop the top item off of the stack, removing it. A stack has 0 or more items at all times. To simplify the analysis, we’ll imagine that the stack supports only a single operation called op(k,x), which is both a pop and a push: it pops the top k items off the stack and then pushes x.

To perform a traditional “push”, just set k = 0. (Note that every op ends by pushing exactly one new element, no matter how many it pops.)

An example of genstack, implemented using vector, is shown below:

template<typename T>
class genstack {
  public:
    void op(int k, T x) {
      // Pop the top k items (we assume k <= the current size)...
      for(int i = 0; i < k; ++i)
        data.pop_back();
      // ...then push the new item x.
      data.push_back(x);
    }
  private:
    vector<T> data;
};

The stack is empty when it is first created.

Suppose each individual pop_back and push_back has a cost of 1. Then the worst case for op is when the vector has n items in it and k = n: we have to perform n pop_backs (for a total cost of n) plus one push_back (cost = 1), for a total cost of n+1. Thus, we might analyze the cost of a whole sequence of m successive op calls as requiring \(O(mn)\) time. However, this cannot be right, because there is no way to arrange two successive op calls so that they both require \(O(n)\) time! Either the first pops everything off and the second is \(O(1)\), or vice versa.

To capture this, we use the accounting method and assign the op operation a cost of $2. Note that the first op call must be op(0,x), because the stack is empty. The real cost of this is $1, so we have $1 of credit available to spend after the first op. Credit is “stored” with the item pushed onto the stack.

When the stack has one item, there are two valid op operations we could perform: op(0,x), with a real cost of $1 (we pay $1 for the push and store the extra $1 as credit with the new item), or op(1,x), with a real cost of $2 (the $1 credit stored with the popped item pays for popping it, $1 of our $2 pays for the push, and the remaining $1 is stored with the new item).

This pattern continues for a whole sequence of ops: we always have at least $1 more than the un-credited real cost of the operation, so we can always leave $1 of credit with the newly pushed item.

Once we’ve convinced ourselves that the above setup “pays the bills”, the final analysis is easy: each op has an amortized cost of $2 \(= O(1)\) so a whole sequence of m ops has a total amortized cost of \(O(m)\), not the \(O(mn)\) we originally predicted.
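To see the accounting bound play out, here is a small, hypothetical experiment (not part of the notes’ genstack): perform m random op(k,x) calls directly on a vector, counting $1 for every individual push_back and pop_back, and compare the total to the amortized bound of 2m.

#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

int main() {
  vector<int> data;        // the stack contents
  long long real_cost = 0; // $1 per individual push_back or pop_back
  const int m = 100000;    // number of op(k,x) calls (arbitrary)

  for(int i = 0; i < m; ++i) {
    int k = rand() % (int)(data.size() + 1); // pop anywhere from 0 to everything
    for(int j = 0; j < k; ++j) {
      data.pop_back();
      ++real_cost;
    }
    data.push_back(i);
    ++real_cost;
  }

  cout << "real cost = " << real_cost << ", bound 2m = " << 2LL * m << "\n";
  // real_cost never exceeds 2m: every pop removes an item whose push already
  // banked $1 of credit for it.
}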

(An alternate but similar method for doing amortized analysis is the potential method, in which the entire data structure is considered to have some nonnegative amount of “potential” distributed over it. As in the accounting method, operations may deposit more potential than they really need, with the excess accumulating in the structure, or may use more than they are given, consuming some of the excess ambient potential. As above, the potential is never allowed to drop below 0. The potential method is particularly useful when the data structure is being modified by several different operations, rather than a single operation as above.)

Rent-or-buy

This isn’t really an algorithm, but it demonstrates some of the ideas behind amortized analysis. Suppose you’re going to a fancy event and you need a tuxedo: you can rent one for $50, or buy one outright for $300.

If you think this will be the only fancy event you’ll ever go to, then renting is better. But if you get invited to fancy events regularly (but not with any certainty) the question becomes, at what point should you stop renting and just buy?

To figure this out, we’re going to compare the worst-case cost with the optimal cost, the cost if we had known in advance how many fancy-invites we were going to get. This is called the “competitive ratio” (CR).

Suppose we decide to rent as long as the total we’ve spent on rentals is still less than the purchase price, and buy once it would reach it. With these prices, that means we’ll rent five times and then purchase when we get the sixth invite.

(The worst-case CR is \(2 - r/p\) where r is the rental cost and p is the purchase cost.)

With this strategy, we are never spending more than twice what we would if we had knowledge of the future (the optimal strategy), which, it turns out, is the best we can do with no knowledge. I.e., no other method without future knowledge can do better.

To show this, assume that you decide to rent k times and then purchase on the k+1 invite. Your cost would be \(50k + 300\). The optimal cost would be \(\text{min}(300, 50(k+1))\), i.e., the smaller of the purchase price, or the cost of renting k+1 times. So

$$\text{CR} = \frac{50k + 300}{\text{min}(300, 50(k+1))}$$

Minimizing over \(k\), the best choice is \(k = 5\) (rent until just before the rental total would equal the purchase price), where the ratio is \(550/300 \approx 1.83\). So no matter what you do, the ratio is always ≥ 1.83.
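If you want to check this numerically, here is a short, hypothetical calculation (using the $50/$300 prices) that prints the competitive ratio of every “rent k times, then buy” strategy:

#include <algorithm>
#include <cstdio>

int main() {
  const double r = 50, p = 300; // rental and purchase prices
  for(int k = 0; k <= 10; ++k) {
    double worst = r * k + p;                // we get at least k+1 invites and buy on the last
    double opt   = std::min(p, r * (k + 1)); // the best we could have done with hindsight
    std::printf("k = %2d  CR = %.2f\n", k, worst / opt);
  }
  // The smallest ratio printed is about 1.83, at k = 5.
}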

For a fuller description of amortized analysis, including applications to some data structures which we’ll see later, see https://www.cs.princeton.edu/~fiebrink/423/AmortizedAnalysisExplained_Fiebrink.pdf

The Vector data structure: “growable” arrays

The std::vector class implements an extensible array. This means that it supports a method .push_back(k) which adds k to the end of the array, growing the array by one element. Let’s take a first stab at building vector from scratch:

template<typename T>
class vector {
  public:
    vector(int s = 0, T v = T{}) {
      cap = 2*s;
      sz = s;
      data = new T[cap];

      for(int i = 0; i < s; ++i)
        data[i] = v;
    }

    ~vector() { delete[] data; }

    // Need copy ctor, overloaded =

    int size()     { return sz; }
    bool empty()   { return sz == 0; }
    int capacity() { return cap; }

    T at(int i)    { return data[i]; }    

    void pop_back() {
      if(sz > 0)
        --sz;
    }

    void push_back(T v) {
      if(sz < cap)
        data[sz++] = v;
      else {
        // ?
      }
    }

    // insert, erase

  private:
    T* data;
    int sz, cap;
};

This version supports creating a vector with a specific size, filled with some particular value v (or the default T value, if you don’t specify one). You can also create an empty vector. E.g.,

vector<int>    vi(10,-1); // Size = 10, filled with -1
vector<string> vs(20);    // Size = 20, filled with empty strings
vector<float>  vf;        // Size = 0, capacity = 0

(Note that writing vector<float> vf(); would declare a function, not a vector.) The capacity of the vector defaults to twice the size, so in the above example the capacities would be 20, 40, and 0.

The pop_back operation is easy: as long as the vector is not empty, just decrement the size. We don’t actually “remove” the element, we just pretend that it isn’t there, and a later push_back will overwrite it.

The push_back operation is easy, provided that there is room left (size() < capacity()). If there isn’t, what do we do?

An extensible array

The easiest way to build an array that can “grow” is to re-allocate a new array, with one extra element, and then copy everything over.

Here’s a sketch:

void push_back(T v) {
  if(sz == cap) {
    cap += 1; // Make room for the new element
    T* new_data = new T[cap];

    for(int i = 0; i < sz; ++i)
        new_data[i] = data[i];

    delete[] data;
    data = new_data;
  }

  // Now sz < cap always
  data[sz++] = v;        
}

What determines the runtime complexity of push_back? (The current size of the vector.) We have to loop over all the elements, so the runtime complexity of push_back is linear in the size of the vector. This isn’t so bad initially, but imagine a million-element vector; we have to copy a million things, just to add one!

What’s a better way of doing this? Allocate more than one “extra” space when we reallocate, and keep track of the number of “unused” entries available. Only when they are all used up do we need to do a full copy. This looks something like this

void push_back(T v) {
  if(sz == cap) {
    cap += ?; // Make room for the new element
    T* new_data = new T[cap];

    for(int i = 0; i < sz; ++i)
        new_data[i] = data[i];

    delete[] data;
    data = new_data;
  }

  // Now sz < cap always
  data[sz++] = v;        
}

The main question now is, when we do a full reallocation, what is the optimal amount of “scratch space” to allocate? It has to be at least 1 (for the new element), but we’ve already seen that this is bad. What should we use?

We’re going to analyze the behavior of push_back using a technique called amortized analysis. This is different from our normal analysis, because we’re looking at its behavior over many runs of the same function. It’s OK if some runs are bad, so long as they are balanced out by others. We have to use amortized analysis to get an accurate picture of how push_back runs, because simply taking its worst case (\(O(n)\)) and assuming that that happens every time is both incorrect (it can’t happen every time) and too pessimistic.

Let’s look at what happens if we use a fixed amount, like 16 (every time we have to reallocate, we add 16 extra elements).

Let’s analyze the behavior of this algorithm over time, by looking at the cost of each push_back:

adding items...
1-16  are free
17    costs 16
18-32 free
33    costs 32
34-48 free
49    costs 48

Over the 49 items added via push_back, the average cost per item is 96 / 49, or about 2. What’s more interesting is to examine the cost over the ranges 1–17, 18–33, and 34–49:

Range     Avg cost per item
1 - 17    16/17 ≈ 1
18 - 33   32/16 = 2
34 - 49   48/16 = 3

In other words, for every 16 items we add, the cost increases by 16 as well. We’ve reduced the constant factor (by spreading out the copies) but overall, adding new elements is still a linear-time algorithm. We need a different approach.

We want the avg. cost per add to remain the same. If we want to get that, we’re going to need to space out the copies further and further, as the array grows, so that copies become less frequent, as they become more expensive.

In fact, for an array of size \(n\) we should allocate exactly \(n\) more space, doubling the size of the array, every time we need to grow it. Let’s analyze the behavior of this:

1         cost 1    (array size = 2)
2         free
3         cost 3    (array size = 4)
4         free   
5         cost 5    (array size = 8)
6-8       free
9         cost 9    (array size = 16)
10-16     free
17        cost 17   (array size = 32)
18-32     free
33        cost 33   (array size = 64)
34-64     free   
65        cost 65   
...

The average cost per item now stays bounded no matter how far out we go: over the first 64 items, the total copying-plus-inserting cost is 1+3+5+9+17+33 = 68, or roughly 1 per item. A clearer picture emerges if we look at the cost per range of items (each range ending just after a reallocation):

Range     Avg cost per item
1         1/1   = 1
2-3       3/2   = 1.5
4-5       5/2   = 2.5
6-9       9/4   ≈ 2.3
10-17     17/8  ≈ 2.1
18-33     33/16 ≈ 2.1
34-65     65/32 ≈ 2.0

Over all the ranges we’ve examined, the average time to add a new element stays bounded by a small constant (roughly 2) instead of growing with \(n\). If we fix our implementation, it looks like this:

void push_back(T v) {
  if(sz == cap) {
    // Make room for the new element
    cap = cap == 0 ? 1 : cap * 2; 
    T* new_data = new T[cap];
    ...
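For completeness, here is roughly what the full doubling push_back looks like; everything after the capacity line is unchanged from the earlier sketch:

void push_back(T v) {
  if(sz == cap) {
    // Double the capacity (or go from 0 to 1) instead of adding a fixed amount
    cap = cap == 0 ? 1 : cap * 2;
    T* new_data = new T[cap];

    for(int i = 0; i < sz; ++i)
      new_data[i] = data[i];

    delete[] data;
    data = new_data;
  }

  // Now sz < cap always
  data[sz++] = v;
}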

Vector complexity

How long does it take to access a particular array element (i.e., a[n])? Does it depend on n, or on the size of the array? No; remember that an array access a[n] is just a shorthand for a pointer addition:

*(a + n)

So it’s no more complex than an addition and a pointer dereference. Thus, accessing a particular array element is \(O(1)\), depending on neither the size of the array, nor the element being accessed. Vectors are just arrays internally, so they inherit that behavior. Let’s build a table of the complexities of the operations supported by vectors:

Operation Complexity class
a.at(n) \(O(1)\)
a.push_back() \(O(1)\) amort., \(O(n)\) worst
a.pop_back() \(O(1)\)
a.insert() \(O(n)\)
a.push_front() \(O(n)\)
a.erase() \(O(n)\)
a.clear() \(O(1)\) (deletes everything)

We can see that vector has some tradeoffs. We can get to any element quickly, and we can add new elements at the end relatively quickly. Splicing a new element into the middle (or at the beginning) requires shifting every element after the insert position up by one slot, which, on average, will be within a constant factor of the size of the whole vector. push_front() is the same as inserting at position 0, so it always takes time proportional to the size of the vector. Deleting the entire contents of the vector (i.e., resetting its size to 0) is fast, because we can just do sz = 0;.
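As an illustration of why insert is \(O(n)\), here is a sketch (my own, not part of the class above) of how an index-based insert could be added to the hand-rolled vector: grow if needed, shift every element at or after position i up by one, then drop the new value into the hole. The shifting loop is what costs \(O(n)\).

void insert(int i, T v) {
  if(sz == cap) {
    // Grow exactly as push_back does
    cap = cap == 0 ? 1 : cap * 2;
    T* new_data = new T[cap];
    for(int j = 0; j < sz; ++j)
      new_data[j] = data[j];
    delete[] data;
    data = new_data;
  }

  for(int j = sz; j > i; --j) // shift elements i..sz-1 up by one slot
    data[j] = data[j-1];

  data[i] = v;
  ++sz;
}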

A vector that supports push-front

Can we build a vector that supports amortized constant-time push_front, without sacrificing push_back or any of the other properties? Yes, but at a cost: additional space. Now, when we reallocate, we’ll need to leave some unused space at both the beginning and the end. When we insert new elements, either at the beginning or the end, we have a choice: simply consume the free space on that side, or shift the elements over to re-center them, keeping the free space balanced between the two ends.

As it turns out, we want to do both, switching between them depending on how close to “full” the vector is. When there’s still a lot of empty space available, it makes sense to spend some time re-centering. When there isn’t much space left, it’s very expensive to move everything, so we should just wait for the (impending) reallocation to recenter us. We can set a threshold for what percentage-used triggers the no-recentering policy.
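Here is a minimal sketch of the idea (my own simplification, which only re-centers during a reallocation rather than switching strategies on a threshold): keep track of where the used elements start inside a larger buffer, so that both ends have free space.

template<typename T>
class devector {
  public:
    devector() : data(nullptr), cap(0), start(0), sz(0) {}
    ~devector() { delete[] data; }

    int size() const { return sz; }
    T& at(int i)     { return data[start + i]; }

    void push_back(T v) {
      if(start + sz == cap) // no free space at the end
        regrow();
      data[start + sz] = v;
      ++sz;
    }

    void push_front(T v) {
      if(start == 0)        // no free space at the beginning
        regrow();
      --start;
      data[start] = v;
      ++sz;
    }

  private:
    // Reallocate at (at least) double the capacity and re-center the elements,
    // leaving roughly equal free space at both ends.
    void regrow() {
      int new_cap = cap < 4 ? 4 : 2 * cap;
      T* new_data = new T[new_cap];
      int new_start = (new_cap - sz) / 2;

      for(int i = 0; i < sz; ++i)
        new_data[new_start + i] = data[start + i];

      delete[] data;
      data = new_data;
      cap = new_cap;
      start = new_start;
    }

    T* data;
    int cap, start, sz;
};

Since every regrow at least doubles the capacity and leaves free slots on each side proportional to the number of elements, push_front and push_back are both amortized \(O(1)\), at the cost of the extra unused space.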

An interesting thing to note is that this approach assumes that front and back inserts are equally likely. If they are not, we can improve by building an adaptive algorithm. For example, the vector could keep track of how many push_front‘s you did, relative to how many push_back’s. If there was a bias in one direction or the other, we could skew the re-centering step to allow more space on that side.

(For a fuller discussion, see http://larshagencpp.github.io/blog/2016/05/22/devector)

So if you need a data structure that lets you access things at random locations, and that lets you add new things to the end quickly-ish, a vector is a good choice. If you need to insert things in the middle, or at the beginning, it may not be. Next we’ll look at a data structure with some different tradeoffs (slow random access, fast insertion/push_front): the linked list.