Review of last time
Big-O notation: \(f(n) = O(g(n))\) means that, for very large \(n\), \(f(n)\) grows no faster than (is at worst proportional to) \(g(n)\).
If we have a simple loop over a data structure:
void square_elements(vector<int>& values) {
    for(auto& v : values)
        v *= v;
}
this will take time proportional to the size of the input, so we say that it is a linear time algorithm, or that it is of order \(O(n)\).
vector<int> remove_duplicates(vector<int>& data) {
    vector<int> out;
    for(int x : data) {
        // See if x is already in out
        bool found = false;
        for(int y : out)
            if(x == y)
                found = true;
        if(!found)
            out.push_back(x);
    }
    return out;
}
To analyze this, we first have to consider whether it will always take the same amount of time for an input of size n, or whether some inputs cause it to run faster or slower. We have a nested loop, where the outer loop always runs n times, but the inner loop runs in time proportional to the size of `out`, which changes as the outer loop runs. Sometimes `out` grows, sometimes it doesn't.
The best case is an input that is nothing but duplicates: `{1, 1, 1, 1, ...}`. The first `1` will be `push_back`'d into `out`, but after that, `out` never gets any additional elements, so the inner loop always takes \(O(1)\) time. This leads to a best-case runtime of \(O(n)\).
The worst case is an input whose elements are all unique (no duplicates at all). Every `x` will be added to `out`, so `out` grows by 1 every time through the loop. This leads to the now-familiar "triangular" time complexity, which is \(O(n^2)\).
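As an aside (using `std::unordered_set`, which we haven't covered yet): if we replace the linear scan of `out` with a hash-set membership test, each lookup takes \(O(1)\) time on average, and the whole algorithm runs in expected linear time. A sketch:

```cpp
#include <unordered_set>
#include <vector>
using std::vector;

// Expected O(n): each element is looked up and inserted in O(1) average time.
vector<int> remove_duplicates_fast(const vector<int>& data) {
    std::unordered_set<int> seen;
    vector<int> out;
    for (int x : data) {
        if (seen.insert(x).second)  // .second is true if x was newly inserted
            out.push_back(x);
    }
    return out;
}
```

Because we still push elements in input order, `out` keeps the first occurrence of each value, just like the quadratic version.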
If we build a hierarchy of complexities, it looks like this (from best to worst):
Complexity class | Name |
---|---|
\(O(1)\) | Constant |
\(O(\log n)\) | Logarithmic |
\(O(n)\) | Linear |
\(O(n \log n)\) | N-log N ("linearithmic") |
\(O(n^2)\) | Quadratic |
\(O(n^3)\) | Cubic |
\(O(n^p)\) | Polynomial |
\(O(2^n)\) | Exponential |
\(O(n!)\) | Factorial |
Templates
Consider a function such as
int max(int a, int b)
{
return a > b ? a : b;
}
You can think of this function (or any function) as a machine with a number of "holes" in it, where the parameters are. In order for the machine to work, you have to fill in all the holes with things of the appropriate type, and then the machine cranks up and the function runs. The holes make functions more flexible than a block of code that always does the same thing; the behavior depends on the values supplied.
If you look at the function above and think for a bit, it should become clear that it is actually a family of functions, all operating on different types but all doing the same thing:
int max(int a, int b) { return a > b ? a : b; }
float max(float a, float b) { return a > b ? a : b; }
double max(double a, double b) { return a > b ? a : b; }
string max(string a, string b) { return a > b ? a : b; }
The body of the function stays the same, only the types of the holes vary.
Looking at it this way, what if we added a different kind of hole: one that, instead of being filled in with a value, you fill in with a type? This is what templates let us do: write one version of this function where the concrete types (`int`, `float`, etc.) are replaced by a type variable. The template-ified version of the above (replacing all four functions, and any future `max` functions with the same body, too) looks like this:
template<typename T>
T max(T a, T b)
{
return a > b ? a : b;
}
In order to make use of this function, we have to supply three pieces of information, filling in three holes:

1. We have to supply a type for `T` first; otherwise we don't know what the function is doing.
2. Once we have `T`, we need a thing of type `T` for `a`.
3. We need a second thing of type `T` for `b`.
To use the new `max`, we can write something like this:
cout << max<int>(10,4) << endl;
In angle brackets after the name, we write the type to use for `T`, i.e., the type arguments to the function. In parentheses, we supply the values for `a` and `b`, the value arguments. (Angle brackets for type args, parens for value args.)
In fact, in this case, we don't have to supply `int` as the type for `T`; C++ can figure it out for us if we just write:
cout << max(10,4) << endl;
C++ looks at the value arguments to the function, sees in the function declaration that they are both `T`, sees in the function call that they are both `int`s, and deduces that `T = int` is the only way this will work.
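Deduction only works when the arguments pin down a single `T`; if they disagree (say, one `int` and one `double`), the compiler refuses to guess, and you must supply the type explicitly. A small illustration (renamed `mymax` here to avoid colliding with `std::max`):

```cpp
#include <string>

template<typename T>
T mymax(T a, T b) { return a > b ? a : b; }

int i = mymax(10, 4);        // T deduced as int
// mymax(10, 4.5) would not compile: T cannot be both int and double,
// so we supply the type explicitly:
double d = mymax<double>(10, 4.5);
std::string s = mymax(std::string("apple"), std::string("banana"));
```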
Template function declarations
A templated function definition is its own declaration; this means that template functions must go into `.hpp` files, never into `.cpp` files. This also means that the template function declaration/definition must come before any uses of the function; there's no way to pre-declare a template function and then define it elsewhere. (The reason for this is that the compiler generates the `T`-specific version of the function when it sees a need for it, e.g., `max(10,4)`, so it needs to have the full definition available.)
Template classes
Just like a function can have a type-hole in it, so can a class:
template<typename T>
class box {
public:
    box() {
        data = new T[10];
    }
    ~box() {
        delete[] data;
    }
    T at(int i) {
        return data[i];
    }
private:
    T* data;
};
When we use this class, we must supply a type for `T`:
box<int> my_box;
Unlike with functions, C++ cannot deduce `T` for a templated class. (At least in the version of C++ we are using; C++17 added "class template argument deduction", or CTAD, which allows this in certain cases.)
You can have more than one template type variable (in both functions and classes). For example, a useful type is
template<typename A, typename B>
struct pair {
    A fst;
    B snd;
};
Functions normally can only return one value, but with this we can write a function that returns (e.g.) an `int` and a `string`:

pair<int,string> my_function() {
    ...
    return pair<int,string>{12, "Hello"};
}
(`pair` is useful enough that it's built in, in the `<utility>` header, as `std::pair`. The function `make_pair()` makes building pairs more convenient.)
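For example (a sketch using the real `std::pair`; `lookup` is a made-up function name):

```cpp
#include <string>
#include <utility>

// Return two values at once using the standard pair type.
std::pair<int, std::string> lookup() {
    return std::pair<int, std::string>{12, "Hello"};
}

// make_pair deduces both type arguments from the values.
auto p = std::make_pair(3.14, 'x');  // std::pair<double, char>
```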
The Vector Class: extensible arrays
The `std::vector` class implements an extensible array. This means that it supports a method `.push_back(k)` which adds `k` to the end of the array, growing the array by one element. Let's take a first stab at building `vector` from scratch:
template<typename T>
class vector {
public:
    vector(int s = 0, T v = T()) {
        cap = 2*s;
        sz = s;
        data = new T[cap];
        for(int i = 0; i < s; ++i)
            data[i] = v;
    }
    ~vector() { delete[] data; }
    // Need copy ctor, overloaded =
    int size() { return sz; }
    bool empty() { return sz == 0; }
    int capacity() { return cap; }
    T at(int i) { return data[i]; }
    void pop_back() {
        if(sz > 0)
            --sz;
    }
    void push_back(T v) {
        if(sz < cap)
            data[sz++] = v;
        else {
            // ?
        }
    }
    // insert, erase
private:
    T* data;
    int sz, cap;
};
This version supports creating a vector with a specific size, filled with some particular value `v`, or with the default `T`-value if you don't specify one. You can also create an empty vector. E.g.,

vector<int> vi(10,-1);   // Size = 10, filled with -1
vector<string> vs(20);   // Size = 20, filled with empty strings
vector<float> vf;        // Size = 0, capacity = 0
The capacity of the vector defaults to twice the size, so in the above example the capacities would be 20, 40, and 0.
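To sanity-check the constructor logic, here's a condensed, compilable version of this class (renamed `myvec` to avoid clashing with `std::vector`). Two details are easy to get wrong: the members must be declared `int sz, cap;` (writing `int sz, int cap;` is a syntax error), and an empty vector must be declared without parentheses (`myvec<float> vf();` would declare a *function* returning a `myvec<float>`).

```cpp
#include <string>

template<typename T>
class myvec {
public:
    myvec(int s = 0, T v = T()) : sz(s), cap(2 * s), data(new T[2 * s]) {
        for(int i = 0; i < s; ++i)
            data[i] = v;
    }
    ~myvec() { delete[] data; }
    int size() const { return sz; }
    int capacity() const { return cap; }
    T at(int i) const { return data[i]; }
private:
    int sz, cap;   // note: `int sz, int cap;` would not compile
    T* data;
};

myvec<int> vi(10, -1);     // size 10, capacity 20, filled with -1
myvec<std::string> vs(20); // size 20, capacity 40, filled with ""
myvec<float> vf;           // size 0, capacity 0
```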
The `pop_back` operation is easy: as long as the vector is not empty, just decrement the size. We don't actually "remove" the element; we just pretend that it isn't there, and a later `push_back` will overwrite it.

The `push_back` operation is easy, provided that there is room left (`size() < capacity()`). If there isn't, what do we do?
An extensible array
The easiest way to build an array that can “grow” is to re-allocate a new array, with one extra element, and then copy everything over.
Here’s a sketch:
void push_back(T v) {
    if(sz == cap) {
        cap += 1; // Make room for the new element
        T* new_data = new T[cap];
        for(int i = 0; i < sz; ++i)
            new_data[i] = data[i];
        delete[] data;
        data = new_data;
    }
    // Now sz < cap always
    data[sz++] = v;
}
What determines the runtime complexity of `push_back`? (The current size of the vector.) We have to loop over all the elements, so the runtime complexity of `push_back` is linear in the size of the vector. This isn't so bad initially, but imagine a million-element vector; we have to copy a million things, just to add one!
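We can make this cost concrete by counting copies. With grow-by-one, the k-th push has to copy the k-1 elements already present, so n pushes perform \(0 + 1 + \cdots + (n-1) = n(n-1)/2\) copies in total, the "triangular" quadratic total. (A simulation sketch; `copies_grow_by_one` is a hypothetical helper that counts only element copies, not the insertions themselves.)

```cpp
// Simulate grow-by-one reallocation, counting element copies only.
long copies_grow_by_one(int n) {
    long copies = 0;
    int sz = 0, cap = 0;
    for (int i = 0; i < n; ++i) {
        if (sz == cap) {         // full: reallocate with one extra slot
            cap += 1;
            copies += sz;        // copy everything into the new array
        }
        ++sz;                    // then place the new element
    }
    return copies;
}
```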
What's a better way of doing this? Allocate more than one "extra" slot when we reallocate, and keep track of the number of "unused" entries available. Only when they are all used up do we need to do a full copy. This looks something like this:
void push_back(T v) {
    if(sz == cap) {
        cap += ?; // Make room for the new element
        T* new_data = new T[cap];
        for(int i = 0; i < sz; ++i)
            new_data[i] = data[i];
        delete[] data;
        data = new_data;
    }
    // Now sz < cap always
    data[sz++] = v;
}
The main question now is, when we do a full reallocation, what is the optimal amount of “scratch space” to allocate? It has to be at least 1 (for the new element), but we’ve already seen that this is bad. What should we use?
We're going to analyze the behavior of `push_back` using a technique called amortized analysis. This is different from our normal analysis, because we're looking at its behavior over many runs of the same function. It's OK if some runs are bad, so long as they are balanced out by others. We have to use amortized analysis to get an accurate picture of how `push_back` runs, because simply taking its worst case (\(O(n)\)) and assuming that it happens every time is both incorrect (it can't happen every time) and too pessimistic.
Let's look at what happens if we use a fixed amount, like 16 (every time we have to reallocate, we add 16 extra elements; suppose we also start with a capacity of 16).
Let's analyze the behavior of this algorithm over time, by looking at the cost of each `push_back`:
adding items...
1-16 are free
17 costs 16
18-32 free
33 costs 32
34-48 free
49 costs 48
Over 49 items added via `push_back`, the average cost per item is 96/49, or about 2. What's more interesting is to examine the cost over the ranges 1-17, 18-33, and 34-49:
Range | Avg cost per item |
---|---|
1 - 17 | 16/17 ≈ 1 |
18 - 33 | 32/16 = 2 |
34 - 49 | 48/16 = 3 |
In other words, each successive range of 16 items costs 16 more than the previous one, so the average cost per add keeps growing. We've reduced the constant factor (by spreading out the copies), but adding a new element is still, on average, a linear-time operation. We need a different approach.
We want the avg. cost per add to remain the same. If we want to get that, we’re going to need to space out the copies further and further, as the array grows, so that copies become less frequent, as they become more expensive.
In fact, for an array of size \(n\) we should allocate exactly \(n\) more space, doubling the size of the array, every time we need to grow it. Let’s analyze the behavior of this:
1 cost 1 (array size = 2)
2 free
3 cost 3 (array size = 4)
4 free
5 cost 5 (array size = 8)
6-8 free
9 cost 9 (array size = 16)
10-16 free
17 cost 17 (array size = 32)
18-32 free
33 cost 33 (array size = 64)
34-64 free
65 cost 65
...
The total cost over the first 65 items is 1 + 3 + 5 + 9 + 17 + 33 + 65 = 133, so the average cost per item is 133/65, or about 2. The same picture emerges if we look at the cost per range of items (each range ends at one of the expensive, reallocating adds):
Range | Avg cost per item |
---|---|
1 | 1/1 = 1 |
2 - 3 | 3/2 = 1.5 |
4 - 5 | 5/2 = 2.5 |
6 - 9 | 9/4 ≈ 2.3 |
10 - 17 | 17/8 ≈ 2.1 |
18 - 33 | 33/16 ≈ 2.1 |
34 - 65 | 65/32 ≈ 2.0 |
Over all the ranges we've examined, the average time to add a new element remains constant (about 2 per item), no matter how large the vector grows. If we fix our implementation, it looks like this:
void push_back(T v) {
    if(sz == cap) {
        // Make room for the new element
        cap = cap == 0 ? 1 : cap * 2;
        T* new_data = new T[cap];
        ...
We call this kind of analysis over many operations amortized analysis.
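The same copy-counting exercise confirms this: with doubling, n pushes perform fewer than 2n element copies in total, so the amortized cost per `push_back` is constant. (A simulation sketch; `copies_doubling` is a hypothetical helper that counts only element copies, not the insertions themselves.)

```cpp
// Simulate capacity doubling, counting element copies only.
long copies_doubling(int n) {
    long copies = 0;
    int sz = 0, cap = 0;
    for (int i = 0; i < n; ++i) {
        if (sz == cap) {                     // full: double the capacity
            cap = (cap == 0) ? 1 : cap * 2;
            copies += sz;                    // copy everything over
        }
        ++sz;                                // then place the new element
    }
    return copies;
}
```

For n = 1000, this performs 1 + 2 + 4 + ... + 512 = 1023 copies, versus 499,500 for grow-by-one.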
Rent-or-buy
This isn’t really an algorithm, but it demonstrates some of the ideas behind amortized analysis: suppose you’re going to a fancy event, and you need a tuxedo:
You can rent a tux, for $50.
You can buy a tux, for $300.
If you think this will be the only fancy event you’ll ever go to, then renting is better. But if you get invited to fancy events regularly (but not with any certainty) the question becomes, at what point should you stop renting and just buy?
Clearly, buying after you've rented more than 6 times is kind of silly; you should have bought earlier. (E.g., if you wait until the 11th time to buy, you've spent $500 on rentals, plus $300 on the purchase.)
On the other hand, if you buy too early, you’ll end up spending more, if you never get invited again.
To figure this out, we’re going to compare the worst-case cost with the optimal cost, the cost if we had known in advance how many fancy-invites we were going to get. This is called the “competitive ratio” (CR).
If we buy but only get invited once, then our cost is $300, but the optimal cost would be $50. CR = 300/50 = 6.
If we rent but get invited repeatedly for the rest of our life, then our cost is infinite, but the optimal cost would be $300. CR = infinite.
Suppose we decide to rent until the total cost we’ve spent is greater than or equal to the cost of a purchase. That is, we’ll rent five times and then purchase when we get the sixth invite.
If the number of invites we receive is five or less, then our cost is invites * $50, and optimal is the same, so we’re exactly optimal (CR = 1) in this case.
If the number of invites we receive is exactly six, then our cost is $250 + $300 = $550, but optimal would be $300. CR = 550/300 ≈ 1.83.
If the number of invites is greater than 6, then we again spend $550, but optimal is $300. CR = 1.83.
(The worst-case CR is \(2 - r/p\) where r is the rental cost and p is the purchase cost.)
With this strategy, we are never spending more than twice what we would if we had knowledge of the future (the optimal strategy), which, it turns out, is the best we can do with no knowledge. I.e., no other method without future knowledge can do better.
To show this, assume that you decide to rent k times and then purchase on the (k+1)-th invite. Your cost would be \(50k + 300\). The optimal cost would be \(\min(300, 50(k+1))\), i.e., the smaller of the purchase price and the cost of renting k+1 times. So:

If \(300 \le 50(k+1)\) (i.e., \(k \ge 5\)), then CR \(= (50k + 300)/300 \ge 550/300 \approx 1.83\).

If \(300 > 50(k+1)\) (i.e., \(k \le 4\)), then CR \(= (50k + 300)/(50(k+1)) = 1 + 5/(k+1) \ge 2\).
So no matter what you do, the ratio is always ≥ 1.83.
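We can check this reasoning by brute force. The sketch below (`worst_cr` is a hypothetical helper, not from the lecture) computes the worst-case competitive ratio of each "rent k times, then buy" strategy by trying every possible number of invites, assuming rent = $50 and buy = $300:

```cpp
#include <algorithm>

const int RENT = 50, BUY = 300;

// Worst-case competitive ratio for the strategy
// "rent k times, then buy on invite k+1".
double worst_cr(int k) {
    double worst = 0;
    for (int n = 1; n <= 100; ++n) {  // 100 invites is plenty: cost is flat past k+1
        int cost = (n <= k) ? RENT * n : RENT * k + BUY;
        int optimal = std::min(BUY, RENT * n);
        worst = std::max(worst, double(cost) / optimal);
    }
    return worst;
}
```

The minimum worst-case ratio lands at k = 5 (rent until you've matched the purchase price), where it equals 550/300 ≈ 1.83; both neighbors k = 4 and k = 6 give a ratio of 2.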
Vector complexity
How long does it take to access a particular array element (i.e., `a[n]`)? Does it depend on n, or on the size of the array? No; remember that an array access `a[n]` is just shorthand for a pointer addition:

*(a + n)

So it's no more complex than an addition and a pointer dereference. Thus, accessing a particular array element is \(O(1)\), depending on neither the size of the array nor the element being accessed. Vectors are just arrays internally, so they inherit that behavior. Let's build a table of the complexities of the operations supported by vectors:
Operation | Complexity class |
---|---|
`a.at(n)` | \(O(1)\) |
`a.push_back()` | \(O(1)\) amort., \(O(n)\) worst |
`a.pop_back()` | \(O(1)\) |
`a.insert()` | \(O(n)\) |
`a.push_front()` | \(O(n)\) |
`a.erase()` | \(O(n)\) |
`a.clear()` | \(O(1)\) (deletes everything) |
We can see that `vector` has some tradeoffs. We can get to any element quickly, and we can add new elements at the end relatively quickly. Splicing new elements into the middle (or at the beginning) requires shifting every element after the insert position up by one, which on average is within a constant factor of the size of the whole vector. `push_front()` is the same as inserting at position 0, so it always takes time proportional to the size of the vector. Deleting the entire contents of the vector (i.e., resetting the size to 0) is fast, because we can just do `sz = 0;`.
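You can watch the growth strategy of a real `std::vector` by tracking how often its capacity changes. (The C++ standard only requires amortized constant `push_back`; the growth factor is implementation-defined, commonly 1.5x or 2x, so we only check that reallocations are rare, not their exact count.)

```cpp
#include <vector>

// Count how many times std::vector reallocates over n push_backs.
int count_reallocs(int n) {
    std::vector<int> v;
    int reallocs = 0;
    auto cap = v.capacity();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
        if (v.capacity() != cap) {  // capacity changed: a reallocation happened
            ++reallocs;
            cap = v.capacity();
        }
    }
    return reallocs;
}
```

With any geometric growth factor, 100,000 pushes trigger only a few dozen reallocations, not 100,000.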
A vector that supports push-front
Can we build a vector that supports amortized constant-time `push_front`, without sacrificing `push_back` or any of the other properties? Yes, but at a cost: additional space. Now, when we reallocate, we'll need to leave some unused space at both the end and the beginning. When we insert new elements, either at the beginning or the end, we have a choice:
- Don't do any copying unless absolutely necessary. This allows the vector to "fill up" all the way to the beginning or the end, which triggers a full realloc, upon which we can create a new vector of larger size and copy the current elements into the center of it (leaving some slack space at both the beginning and the end).
- When inserting new elements, try to "re-center" the used part of the vector in the middle of the unused space. This requires some copying for every insert, but means that subsequent inserts will be faster, regardless of which end they are on.
As it turns out, we want to do both, switching between them depending on how close to “full” the vector is. When there’s still a lot of empty space available, it makes sense to spend some time re-centering. When there isn’t much space left, it’s very expensive to move everything, so we should just wait for the (impending) reallocation to recenter us. We can set a threshold for what percentage-used triggers the no-recentering policy.
An interesting thing to note is that this approach assumes that front and back inserts are equally likely. If they are not, we can improve by building an adaptive algorithm. For example, the vector could keep track of how many `push_front`s you did, relative to how many `push_back`s. If there was a bias in one direction or the other, we could skew the re-centering step to allow more space on that side.
(For a fuller discussion, see http://larshagencpp.github.io/blog/2016/05/22/devector)
So if you need a data structure that lets you access things at random locations, and also lets you add new things to the end quickly-ish, a `vector` is a good choice. If you need to insert things in the middle, or at the beginning, it may not be. Let's look at a data structure with some different tradeoffs (slow random access, fast insertion/`push_back`): the linked list.