Review of last time

Logging in to the server. Hostname, port, username, password.

Splitting strings into words. States and inputs, tabular form, state machine diagram.

Implementing string splitting using strings and vectors

The string class

#include <string>
using std::string;

string s = "Andy";

If you declare a string variable without initializing it, it defaults to the empty string:

string e; // e is the empty string

You can initialize a string object from a char* literal. The basic string operations are:

Operation Description
s.at(i), s[i] Access individual characters by index (char)
s.front(), s.back() Access first/last characters
s.length(), s.size() Length of string
s.empty() True if length is 0
s.substr(i) Extract substring starting at i, to the end
s.substr(i,l) Extract substring, starting at i, of length l
s1 + s2 Concatenate two strings to form a new one
s1 < s2, etc. Compare strings alphabetically
s.c_str() Get a char* representation of a string
s.find(c), s.find(s2) Get position of the first occurrence of char c or string s2 in s
s.rfind(c), s.rfind(s2) Get position of last occurrence


s.push_back(c); Add a character c to the end of the string
s.pop_back(); Delete the last character of the string
s.clear(); Delete all characters from string (len. = 0)
s1 += s2;, s1.append(s2); Append s2 to the end of s1, modifying s1
s1.insert(i,s2); Insert s2 into s1, before position i
s.erase(i,l); Erase characters starting at i, of length l
s1.replace(i,l,s2); Replace l characters of s1, starting at i, with s2
s1 = s2; Copy all of s2 into s1

find and rfind will return the special value string::npos if the thing you are searching for cannot be found at all.
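
For example, here’s a quick sketch (an illustration, not from the original notes) of using find together with npos:

#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "hello world";

    size_t i = s.find(' ');                  // position of the first space
    if (i != string::npos)
        cout << s.substr(0, i) << endl;      // prints "hello"

    if (s.find('z') == string::npos)
        cout << "no 'z' in the string" << endl;
    return 0;
}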

Note that strings do not have a nul character at the end! This means that the length of a string is strictly the number of characters in it, and the last character (s.at(s.length() - 1)) is not necessarily nul. (In fact, strings can have nul characters anywhere inside them!)

To read a single word from cin, you can use

string w;
cin >> w; 

This reads a single word because the normal behavior of >> is to skip leading whitespace and then stop at the first whitespace character it sees.

To read an entire line, use

string l;
getline(cin, l);

You can, of course, print strings normally:

string name = "Andy";
cout << "Hello, " << name << endl;

For our purpose, we’re going to read in an entire line (using getline) and then process it one character at a time. We can do the latter using a loop:

for(unsigned i = 0; i < s.length(); ++i)
  // Use s[i] ...

Or we can use the fancy ranged-for-loop:

for(char c : s)
  // Use c ...

Vectors

You can think of a vector as a version of a string that stores any kind of element, not just characters. The operations supported by vectors are somewhat more limited than those supported by strings, just because vectors cannot make any assumptions about what kind of data they are holding.

vector<int>    vi;                                     // empty vector of ints
vector<char>   vc1(10);                                // Vector of 10 chars
vector<char>   vc2(10, 'x');                           // Vector of 10 'x' chars
vector<string> names = {"Bruce", "Richard", "Alfred"}; // vector of three strings

Vectors manage their own storage, so you don’t need to delete them.

To access the elements of a vector by index, use either .at(i) (which is bounds-checked and throws an exception for an invalid index) or [i] (which is unchecked).
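
For example, a quick sketch:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> v = {5, 6, 7};

    cout << v.at(0) << endl; // 5  (bounds-checked: throws if the index is invalid)
    cout << v[1]    << endl; // 6  (unchecked, like an array access)
    v[2] = 8;                // elements can be assigned through [] or .at() as well
    return 0;
}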

The shortcut methods v.front() and v.back() provide easy access to the first and last elements. Be careful: unlike .at(i), these are not checked, so calling them on an empty vector is undefined behavior.

Other vector operations:

Operation Description
v.size() Size of the vector (number of elements)
v.empty() True if size is 0
v.clear() Reset size to 0, deleting all elements
v1 == v2 True if v1 and v2 are identical
v.push_back(e); Add a new element to the end of the vector
v.pop_back(); Erase the last element of the vector
v1 = v2; Copy all elements from v2 into v1

Note that unlike arrays, vectors support copying, via assignment. This means that you can pass vectors as parameters to functions, and return them from functions as well. This makes vectors much easier to work with than arrays, especially since you don’t need to worry about dynamically allocating them. The fact that vectors can grow, via push_back, makes them even more useful; e.g., if you want to read some number of ints in from the user, and you don’t know how many, you can just do

vector<int> data;
int x;
while(cin >> x)
  data.push_back(x);

It’s possible to insert/erase elements in the middle of a vector, but doing so requires the use of an iterator. Here’s an example to get you started:

vector<int> vs = {1, 2, 3, 4};
vs.insert(vs.begin() + 2, 100); // vs = { 1, 2, 100, 3, 4 }
vs.erase(vs.begin() + 1);       // vs = { 1, 100, 3, 4 }

You can also insert the contents of one vector in the middle of another.
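
For example, a small sketch of the range-insert form, which copies the elements of b into a at a given position:

#include <vector>
using std::vector;

int main() {
    vector<int> a = {1, 2, 5, 6};
    vector<int> b = {3, 4};

    // Insert all of b into a, before position 2: a becomes {1, 2, 3, 4, 5, 6}
    a.insert(a.begin() + 2, b.begin(), b.end());
    return 0;
}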

Splitting strings

Our table for states/inputs looks like this:

         Letter                            Space
WORD     Add to cur. word; stay in WORD    Finish word; switch to SPACE
SPACE    Start new word; switch to WORD    Ignore; stay in SPACE

Implementing this as a function, it will take a string as its input, and return a vector<string> as its output:

vector<string> split(string input) {
  string word;            // Current word
  vector<string> output;  // List of words

  const int SPACE = 0;
  const int WORD = 1;
  int state = SPACE;

  for(char c : input) {

  }

  return output;
}

All of our work needs to be done inside the for loop. The current character is c, and the current state is state. Since we have four table entries, we’ll have an if-else for each of the four possibilities:

  for(char c : input) {
    if(state == WORD && c != ' ') {

    }
    else if(state == WORD && c == ' ') {

    }
    else if(state == SPACE && c != ' ') {

    }
    else { // state == SPACE && c == ' '

    }
  }

There’s one last thing we have to think about, and that’s what happens at the end of the input string: if the string ends while we are in the WORD state, the final word has been built up in word but never added to output, so we have to push it onto output after the loop finishes.

Trace through this on the input string "get the rock". Run example.
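
For reference, here is one possible way the filled-in function could look (a sketch; the version developed in class may differ in details):

#include <string>
#include <vector>
using std::string;
using std::vector;

vector<string> split(string input) {
  string word;            // Current word
  vector<string> output;  // List of words

  const int SPACE = 0;
  const int WORD = 1;
  int state = SPACE;

  for(char c : input) {
    if(state == WORD && c != ' ') {
      word.push_back(c);        // Add to current word, stay in WORD
    }
    else if(state == WORD && c == ' ') {
      output.push_back(word);   // Finish word
      word.clear();
      state = SPACE;            // Switch to SPACE
    }
    else if(state == SPACE && c != ' ') {
      word.push_back(c);        // Start new word
      state = WORD;             // Switch to WORD
    }
    else { // state == SPACE && c == ' '
      // Ignore, stay in SPACE
    }
  }

  if(state == WORD)
    output.push_back(word);     // Don't forget the final word!

  return output;
}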

References and pointers

A reference is just another name for an existing variable or location:

int  x = 1; // x is a variable, OK
int& y = x; // y is another name for x

Both x and y refer to the same thing. It’s impossible to distinguish them, because they are just different names for the same object. Any changes to x will be reflected in y, and vice versa.

Because a reference is an “alias” for something, you must initialize a reference variable with another variable/location (or another reference):

int& z = 1; // ERROR
int& q;     // ERROR

(The C++ name for things that you can get a reference to is “lvalue”, as in, values that can be on the left side of an assignment. Temporary objects are rvalues.)

You can, however, get a reference to an element of an array or vector (or string):

int         arr[] = {1, 2, 3};
vector<int> vec   = {5, 6, 7};

int& y = arr[1]; // OK, another name for arr[1]
int& z = vec[2]; // Also OK

With vectors, this can be a little dangerous, because the size of a vector can change, and thus there’s the possibility that the thing the reference is referring to might disappear. But as long as you are careful, everything is OK.

Functions can take references as parameters, in which case the formal parameter becomes “another name” for whatever argument is used when the function is called. Similarly, functions can return references (as long as the thing referred to still exists after the function exits!)

References can cause some weird behavior: Look at this function; can you see any way in which it might print out a 2, instead of a 1?

void f(int& a, int& b) {
  a = 1;
  b = 2;
  cout << a << endl;
}
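
One way this can happen (a sketch): pass the same variable as both arguments, so that a and b become two names for the same int.

#include <iostream>
using namespace std;

void f(int& a, int& b) {
  a = 1;
  b = 2;
  cout << a << endl;
}

int main() {
  int x = 0;
  f(x, x);   // a and b both refer to x, so b = 2 also changes a: prints 2
  return 0;
}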

References have some limitations: once initialized, a reference can never be changed to refer to something else, and there is no such thing as a “null” reference that refers to nothing at all.

Pointers

Pointers remove these limitations, but add some complexity in doing so. A pointer is like a reference that can change what it refers to. However, the simplicity of references comes from the fact that they cannot change; hence, there are no operations on the reference itself. Anything you do to a reference variable is transparently done to the thing on the other end of the reference. Because pointers can change, we now need two different syntaxes: one for manipulating the pointer itself, and one for manipulating the thing it points to.

In other words, while references have no “identity” (they don’t exist on their own and are just aliases for other things), pointers are objects in their own right, and thus we need some way of manipulating them, as opposed to the thing they point to. All of the complexity of pointers springs from this duality: we always have to be clear about what we are doing; are we manipulating the pointer, or the object it points to?

Semantically, pointers work by storing addresses. Every (lvalue) object in our program exists somewhere in the computer’s memory; every location in the computer’s memory has an address, a number. A pointer stores an address of another thing. “Dereferencing” the pointer means going to the address it contains.

Pointer syntax:

Type Description
T* “Pointer to T”
T& “Reference to T”

Expression Description
*p Get object pointed to by p (look at the addr. in p)
&v Get pointer to object v (get the addr. of v)

The connection between the two is that they undo each other: if p == &v, then *p is exactly v. Dereferencing the address of an object gets you the object itself.

Thus, for expressions, * and & are kind of like opposites. & adds a layer of “pointer to-ness” while * removes a layer.

Although pointers are often introduced with dynamic memory, you don’t actually have to do that. Just like with a reference, you can get a pointer to any lvalue:

int  x = 1;
int* y = &x; // y points to x

x++;         // Increments x
(*y)++;      // Also increments x

(Note that ++ has higher precedence than *, so the parentheses are necessary!)

As with references, you can get a pointer to an element of an array, vector, or string:

int         arr[] = {1, 2, 3};
vector<int> vec   = {2, 3, 4};
string      str   = "Hello";

int*  ap = &arr[1]; 
int*  vp = &vec.front(); // Same as vec.at(0)
char* cp = &str[3];

Once again, with strings and vectors, because elements can be removed, you have to be careful to make sure that the pointers still point to something that exists! A pointer that points to a non-existent object is called dangling and using it results in undefined behavior!

If we write ap == vp, will this be true or false? What are we asking? We are asking whether ap and vp point to the same object; we are not asking anything about the values pointed to by ap and vp. Similarly, if I do ap = vp;, what happens? Do any of the values in the array or vector change? No: only the pointer ap changes what it is pointing to. The most important thing about pointers is to get yourself clear on when we are talking about the pointers themselves, and when we are talking about the things pointed to. If you have some pointer variables, and there aren’t any *s in the expression, then you are talking about the pointers. If there are *s then you are probably talking about the objects pointed to.
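
A small sketch of the difference, reusing the arr/vec example from above:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    int         arr[] = {1, 2, 3};
    vector<int> vec   = {2, 3, 4};

    int* ap = &arr[1];
    int* vp = &vec.front();

    cout << (ap == vp) << endl; // 0 (false): they point to different objects
    ap = vp;                    // only the pointer ap changes; arr and vec are untouched
    *ap = 10;                   // now we change the pointed-to object: vec[0] becomes 10
    cout << vec[0] << endl;     // 10
    return 0;
}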

Pointers to array/vector elements

If we have a pointer into an array, vector, or string, we can do some interesting things with it:

vector<int> vec = {1, 2, 3, 1};
int* p1 = &vec[0];
int* p2 = &vec[3];

Consider the following:

 p1 == p2     // True or false?
*p1 == *p2    // True or false?
 p1 <  p2     // True or false?

p1 + 3        // What does this mean?
*(p1 + 3)     // What about this?
p1[3]         // This?
p1+3 == p2    // True or false?
p1++;         // What happens if we do this?

Pointers into an array can have arithmetic done on them (adding/subtracting integers), which causes them to move around within the array. Adding 1 moves the pointer one element forward within the array. Comparisons between pointers effectively compare the indexes within the array that they point to. Note that none of the above operations change the elements in the array!
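
Here is a sketch that spells out the answers to the questions above as assertions:

#include <cassert>
#include <vector>
using std::vector;

int main() {
    vector<int> vec = {1, 2, 3, 1};
    int* p1 = &vec[0];
    int* p2 = &vec[3];

    assert(p1 != p2);        // different elements, so different addresses
    assert(*p1 == *p2);      // but both elements contain 1
    assert(p1 < p2);         // p1 points to an earlier element

    assert(p1 + 3 == p2);    // p1 + 3 points three elements forward
    assert(*(p1 + 3) == 1);  // dereferencing it gives vec[3]
    assert(p1[3] == 1);      // p1[3] is just another way to write *(p1 + 3)

    p1++;                    // p1 now points to vec[1]; the vector itself is unchanged
    assert(*p1 == 2);
    return 0;
}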

Pointers to structure and class instances

There are a few extra things we can do with pointers if we have a struct or a class. Here’s an example struct:

struct thing {
  int a;
  char* b;
  string s;
};
char c = '?';
thing t1 = {1, &c, "Hello"};
thing* tp = &t1;

How do I refer to each of the members of t1, through the pointer tp?

(*tp).a     // == 1
(*tp).b     // == pointer to c
(*tp).s     // == "Hello"

This is a bit cumbersome to type out, so there’s a shortcut:

tp->a
tp->b
tp->s

What if I want to access the actual char pointed to by tp->b?

*((*tp).b)   // == '?'
*(tp->b)     // == '?'

We can also get a pointer to a member of the structure:

int* ip  = &(t1.a);
int* ip2 = &(tp->a); // Same thing: both point to t1.a

Finally, nothing stops us from having multiple pointers to the same structure:

thing* tp2 = tp; 

tp == tp2    // True
tp->a = 2;   // Also changes tp2->a

All of this applies equally to classes, and to methods. If we have a class

class thing {
  public:
    void run() {
      cout << "WHATUP" << endl;
    }
};

and then we have a pointer to an instance of that class

thing* t = ... ;

Then we can use either (*t).run() or t->run() to dereference t and call the run method:

(*t).run();
t->run();    // Same thing

This semester, we’ll use a lot of pointers, but they will usually be pointers to structs or classes, so we’ll use -> a lot. We’ll almost never use &, because we generally won’t need to get pointers to existing objects. We’ll get our pointers by creating objects dynamically, via new. (Remember that new T returns a T*, a pointer to a T.)

The null pointer

nullptr is the value you should use for the null pointer (the pointer that points to nothing at all), not NULL. The null pointer is unique: no other, non-null pointer is == to it, and it is == only to other null pointers. Thus, you can use nullptr to signal that a pointer doesn’t point to anything at all. This gives pointers a bit of power that normal “values” don’t have. E.g., if we have an int variable, there is no way to say that it contains “nothing”; it always contains some int value. But if we have an int pointer, then there are two possibilities: it points to some int, or it is nullptr and points to nothing at all.
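
A quick sketch of using nullptr this way:

#include <iostream>
using namespace std;

int main() {
    int  x = 5;
    int* p = nullptr;             // p points to nothing at all

    if (p == nullptr)
        cout << "p doesn't point to anything yet" << endl;

    p = &x;                       // now p points to x
    if (p != nullptr)
        cout << *p << endl;       // safe to dereference: prints 5
    return 0;
}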

Dynamic memory allocation

I’ve intentionally separated pointers from dynamic memory because students often assume that whenever you have pointers you must also have dynamic allocation somewhere, but this is not the case. You can get a pointer to anything that “hangs around”. Dynamically allocated objects are particularly convenient for this, because they hang around until we delete them, but it’s not required to use them. Dynamic allocation is useful when we don’t know what or how many objects we will need (or how long they need to live) until our program is running.

Stack vs. heap. Activation records. Heap allocation. Lifetime of objects.

new finds space on the heap, big enough for an object of the given type, and then gives you a pointer to that space. (The array-new finds enough space for multiple objects of the same type.) Because the object is allocated on the heap, it will remain “alive” (taking up space) until we delete it.
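
A minimal sketch of the syntax:

int main() {
    int* p = new int(42);   // reserve space for one int on the heap
    *p = 43;                // use the object through the pointer

    int* a = new int[10];   // array-new: space for 10 ints
    a[0] = 1;

    delete p;               // the object's lifetime ends here
    delete[] a;             // array-new must be matched with delete[]
    return 0;
}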

A common mistake (coming from the assumption that pointers have to be used with dynamic allocation) is to write something like this:

int x = 12;
...

// Now we want a pointer to x
int* p = new int();
p = &x;
// carry on with p

What’s wrong with this? We create a new (dynamic) int, but then immediately discard the pointer to it, replacing it with a pointer to x. This means that we have no way to refer to the heap memory we just reserved, and hence no way to delete it. That memory is lost to our program until our program exits; we have created a memory leak. The correct way to do this is

int* p = &x;
// carry on with p

If the thing you want to get a pointer to already exists, then just initialize your pointer to that; there’s no need to allocate any space. Only use new when you want to reserve some additional space on the heap.

Similarly, just because we are done with p does not mean we need to delete it. In this case, because the thing pointed to by p was not created via new, it should not be destroyed via delete.

An object can only be delete-d once; we saw last time that trying to delete something more than once crashes your program with a “double-free” error. This means that if we allocate something, we need to decide what part of our program “owns” it, and that part is responsible for delete-ing it. A pointer that points to an object that it will eventually delete is called an “owning” pointer. It “owns” the object on the other end, and somewhere, that object will be deleted via that pointer. There should only ever be one owning pointer to an object; as more than one would mean that the object would be deleted more than once. There can, of course, be any number of non-owning pointers to an object, because those won’t delete it when they are done.
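
A small sketch of the idea (the names owner and observer are just for illustration):

#include <iostream>
using namespace std;

struct node { int value; };

int main() {
    node* owner    = new node{42}; // owning pointer: will eventually delete the object
    node* observer = owner;        // non-owning pointer: just looks at the same object

    cout << observer->value << endl; // fine, the object is still alive

    delete owner;        // only the owner deletes; observer must never be used after this
    observer = nullptr;  // make it obvious that observer no longer points anywhere
    return 0;
}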

Assignment 1

The first assignment is a kind of C++/CSci 123 review, but also hopefully gets you thinking about the kind of issues that will be important to us. The assignment is to implement an ordered array. A lot of the operations will be similar to those on the bag structure we implemented earlier, but a few are different.

Because an ordered array has a size, a capacity, and its elements, you will probably need three data members: size, capacity, and a dynamic array of elements (you could also use a vector).

The description of how the ordered array works says that the special value -2147483648 cannot be stored in the array. E.g., if you try to insert it, nothing happens, exists(-2147483648) always returns false, etc. This seems arbitrary, but it’s done so that if you want, you can use -2147483648 as a “special” value internally. E.g., if you store the contents of the ordered array in a (dynamic) array, you could implement remove(e) by finding the location of e and then simply putting a -2147483648 there, to mark it as “deleted”. This method will require you to implement every other method to ignore -2147483648 entries. If you use this method, you don’t need to store the size: the size of the array is just the number of entries that are != -2147483648.

Alternatively, you could implement it the way we implemented the bag, by using the front portion of the array, starting at index 0, for the “used” elements, and the second half for unused elements. Using this method, you must store the size of the array, and remove must shift array elements around (we cannot simply swap to the end, because that would un-sort the array!).
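
A rough sketch of the shifting idea, using a vector<int> to stand in for the array (the name remove_sorted and the use of a free function are just for illustration; they are not the assignment’s required interface):

#include <vector>
using std::vector;

void remove_sorted(vector<int>& data, int e) {
    // Find the position of e (a linear search; binary search also works, since data is sorted).
    for (unsigned i = 0; i < data.size(); ++i) {
        if (data[i] == e) {
            // Shift everything after position i one slot to the left...
            for (unsigned j = i; j + 1 < data.size(); ++j)
                data[j] = data[j + 1];
            data.pop_back(); // ...and drop the now-duplicated last element.
            return;
        }
    }
    // e not found: nothing to do.
}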

The assignment asks you to think about how much time the various operations will take, relative to the size of the array. E.g., as the array grows, which operations will get slower? Which will take the same amount of time? We’ll investigate these ideas further in the next few lectures.

Analyzing algorithms

We’re going to talk about analyzing algorithms: how can we quantify the time taken (or memory used) by a given algorithm? We could run it and measure it, but that doesn’t give us enough information: it only tells us how fast it runs, on one particular input, on one particular computer and compiler. What we want is a more general way of describing the behavior of an algorithm, not in terms of “seconds” or “bytes”, but more abstractly.

In fact, what we are going to consider is not “how long” an algorithm takes, but rather the growth rate of its runtime as the size of the input increases. E.g., for one algorithm, doubling the size of the input might double the runtime; for another, it might quadruple it. Clearly, the former is better than the latter, even though we don’t know anything about the actual runtimes.

Before we can do that, we need to review some math:

Review of summation notation

$$\sum_{i=j}^k f(i)$$

This means: add up all the values of \(f(i)\) for every \(i\) from \(j\) to \(k\) (including \(i = k\)). In other words,

$$\sum_{i=j}^k f(i) = f(j) + f(j+1) + \ldots + f(k-1) + f(k)$$

The simplest summation is just

$$\sum_{i=j}^k 1 = 1 + k - j$$

Because it’s just a big sum, there are some things we can do with it:

Factoring

What is

$$\sum_{i=j}^k 2 f(i)$$

This sum works out to

$$\sum_{i=j}^k 2 f(i) = 2 f(j) + 2 f(j+1) + \ldots + 2 f(k-1) + 2 f(k)$$

which we could factor into

$$2 ( f(j) + f(j+1) + \ldots + f(k-1) + f(k) )$$

but the thing inside the parens is our original sum, so we can say that

$$\sum_{i=j}^k 2 f(i) = 2 \sum_{i=j}^k f(i)$$

That is, for any constant c (and by “constant” we mean anything that does not depend on i), we can factor

$$\sum_{i=j}^k c f(i) = c \sum_{i=j}^k f(i)$$

Splitting

What is

$$\sum_{i=j}^k \left( f(i) + g(i) \right)$$

The sum works out to

$$f(j) + g(j) + f(j+1) + g(j+1) + \ldots + f(k-1) + g(k-1) + f(k) + g(k)$$

but we can rearrange the terms into

$$f(j) + f(j+1) + \ldots + f(k-1) + f(k) + g(j) + g(j+1) + \ldots + g(k-1) + g(k)$$

and this is just

$$(\sum_{i=j}^k f(i)) + (\sum_{i=j}^k g(i))$$

In other words, if the body of a summation is itself a sum, we can split it up into multiple summations.

Removing terms

Finally, we can remove a number of terms from a summation by adjusting the ends, either as

$$\sum_{i=j}^k f(i) = f(j) + \sum_{i=j+1}^k f(i)$$

or as

$$\sum_{i=j}^k f(i) = f(k) + \sum_{i=j}^{k-1} f(i)$$

Of course, we could remove more than one term if that was useful, or remove terms from both the beginning and the end. We just have to make sure we adjust the endpoints properly!

Change of variable

Sometimes it’s useful to adjust the endpoints themselves, by pushing them up or down. In order to do this, we have to replace the summation variable \(i\) with a suitably adjusted \(i’\). For example,

$$\sum_{i=0}^k f(i) = \sum_{i’=1}^{k+1} f(i’ - 1)$$

where \(i = i’ - 1\). (If the presence of the final \(k+1\) term in the new sum was problematic, we could use the term-removal technique to strip it off.)

Basics of complexity analysis

Example 1: largest element

int largest(vector<int> values) {
    assert(values.size() > 0);

    int l = values[0];
    for(int i = 1; i < values.size(); i++)
        l = values[i] > l ? values[i] : l;

    return l;
}

We’re first going to do a fine-grained analysis of the work that this function does, in terms of the number of operations. We’re then going to take that and derive an asymptotic bound for the work, as the size of the input (i.e., values.size()) gets large.

How long does this take to run, for a vector of a given size()? Well, that largely depends on things like how long it takes to compare two ints, to assign to an int variable, to increment an int, and to call values.size().

Note that none of these things depend on the size of the vector. Each of them will take some constant, but unknown, amount of time, regardless of how big a vector we give it, so we can represent them as constants: \(C\) for a comparison, \(A\) for an assignment, \(I\) for an increment, and \(S\) for a call to size().

Now, if values.size() == 1, how long will the program take to run? We only do the work outside the loop, since the loop body never executes.

Giving us a total time of \(2C + 2A + S\). If values.size()\(= n\), then we can figure out the total time taken by looking at how much work is involved in each iteration of the loop: a comparison and an assignment in the body, an increment of i, and another loop test (a call to size() plus a comparison).

So we have a base time of \(2C + 2A + S\), plus a per-iteration time of \(2C + A + I + S\) giving us a total time of

$$2C + 2A + S + n(2C + A + I + S)$$

Note that for all but the smallest vectors, the second term is going to dominate the sum, so much so that we could really say that the total time is approximately proportional to \(n\), the size of the vector. A function like this one, which takes time proportional to the size of its input for “big enough” inputs, is generally said to run in “linear time”, meaning that the runtime grows linearly with the size of the input: double the size of the input, and the runtime (roughly) doubles, too.

Note that this classification (linear time) is true regardless of what computer we run this function on. Different computers will have different constants, thus it makes no sense to ask, in the abstract, how many seconds will largest take to run for some input, but we can abstractly ask, how fast will its runtime increase, relative to an increase in the size of its input? In this case, there is a linear correlation.

We generally care more about this kind of classification of an algorithm, than the raw “run time” sum we computed above. This allows us to classify algorithms by how fast their run time grows, as the input size increases. If algorithm A grows more slowly than algorithm B we can say that A is “better” than B.

Example 2: finding an element

bool find(vector<int> values, int v) {
    for(int i = 0; i < values.size(); ++i)
        if(values[i] == v)
            return true;

    return false;
}

We can analyze the work of this function in terms of increments, equality comparisons, and calls to values.size() (there are no simple assignments in this function). Here, however, we run into a problem: the number of times that the loop runs depends on the values inside the vector. That is, even for two vectors with the same size, the function might take dramatically different amounts of time. In this case, instead of just asking for the work of the function, we analyze the best and worst cases. (The function from the previous example has its best and worst cases identical, so there’s no point in looking at them separately.)

In order to determine what the best/worst cases are, we need to figure out how to “rig” the contents of the vector so that a) the loop exits as early as possible (best case) and b) the loop runs all the way to its natural end (worst case). (Note that the one thing we cannot do is choose a particular value for \(n\), the size of the vector. You cannot simply say “the best case is when the vector is empty”, as that would be the best case for any algorithm, and thus doesn’t give us any useful information. Remember: you can never simply pick a value for \(n\)!)

So our best case occurs when the first element of the vector == v, and the worst case occurs when none of the elements of the vector == v. Note that the worst case always occurs when the function returns false.

                        Best case   Worst case
Calls to size(): S      1           \(n+1\)
Increments: I           0           \(n\)
Comparisons (==): C     1           \(n\)

So the total work of the function would be

$$S + C$$

in the best case and

$$S(n+1) + nI + nC$$

in the worst case.

Often you’ll need to determine whether a function has different best/worst cases, or whether they are the same. Look for loops that have the possibility of an early exit, either by return or by break.

Example 3: quadratic time

As an example, take a look at this function, which tries to determine whether any pair of elements, one from each of two vectors, multiplies to equal a particular value:

bool has_product(vector<int> a, vector<int> b, int p) {

    for(int v1 : a)
        for(int v2 : b) 
            if(v1 * v2 == p)
              return true;                

    return false;
}

Once again, this function has both best and worst cases. The best case is simply when the first elements of both vectors multiply to p. The worst case is more interesting, and occurs when the product p cannot be formed by multiplying any two elements. Assuming that a and b have the same size, \(n\): the inner loop runs \(n\) times for each of the \(n\) iterations of the outer loop, so in the worst case we do roughly \(n^2\) multiplications and comparisons.

The runtime of this algorithm increases proportionally to the square of the size of its input. That is, if we double the size of the vectors, the runtime will roughly quadruple. This is much worse than a linear-time algorithm, where doubling the input size only doubles the runtime.

An algorithm like this is said to run in polynomial time, indicating that its runtime is proportional to \(n^p\) for some \(p \ge 2\), where \(n\) is the size of its input. Obviously, \(n^3\) is worse than \(n^2\), and so forth.

Note that our analysis assumes that the input is “large”. For any problem, there will be some small inputs where a theoretically worse algorithm may perform better. The reason why is in the word “proportionally”. If the runtime is “proportional” to \(n^2\) then that means there is some unknown constant \(k\) such that the runtime \(t \approx k n^2\). But \(k\) varies depending on the implementation of the algorithm, so it’s possible for a polynomial algorithm to have a small \(k\), while the corresponding linear algorithm has a larger \(k\). In this case, the “slower” polynomial algorithm will be faster, for inputs up to some size. But there is always going to be a cutoff, a point where the linear version is always faster.

Big-O Notation

We refer to the function that an algorithm’s runtime is proportional to as its order and say that an algorithm is \(O(f(n))\) to mean that its order is \(f(n)\) (i.e., is roughly proportional to \(f(n)\)). This is colloquially known as “big-O” notation.

An algorithm is of order \(O(f(n))\) if there exist constants \(k\) and \(n_0\) such that for any input of size \(n > n_0\), the runtime of the algorithm is no more than \(k f(n)\).

Or, mathematically, a function \(f(n)\ \in O(g(n))\) iff

There exist \(k > 0\) and \(n_0\) such that for all \(n > n_0\):

$$f(n) \le k g(n)$$

(\(n_0\) captures the notion of our analysis applying to “big enough” inputs. Only if the size of the input is larger than some minimum does the proportionality apply. Similarly, \(k\) captures the proportionality.) We are assuming that both \(f(n)\) and \(g(n)\) are strictly positive.

If you want a calculus-style definition, we can also use a limit:

$$f(n) \in O(g(n))\; \text{iff}\; \lim_{n \rightarrow \infty} \frac{f(n)}{g(n)} \in [0, \infty)$$

(That is, the limit must exist and be finite.)
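
For example, using the limit form (the same fact is shown with \(k\) and \(n_0\) below):

$$\lim_{n \rightarrow \infty} \frac{100 n + n^4}{n^4} = 1$$

The limit is finite, so \(100 n + n^4 \in O(n^4)\).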

Intuitively, you should think of \(f(n) \in O(g(n))\) as meaning “\(f(n) \le K g(n)\) for some K and really big n”. (There are other “big-letter” notations which correspond to ≥, <, equal-to, etc.)

Summations of Big-O

There is a summation identity that will make working with big-O notation much easier: if \(f(i) \in O(i^p)\), then

$$\sum_{i=1}^n f(i) \in O(n^{p+1})$$

This basically says that if you sum up a function which is \(O(n^p)\) then the resulting function will be \(O(n^{p+1})\). \(n^2\) becomes \(n^3\) and so forth.
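
For example, summing up the first \(n\) integers (which are \(O(i^1)\)):

$$\sum_{i=1}^n i = \frac{n(n+1)}{2} \in O(n^2)$$

which matches the identity with \(p = 1\).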

How Big-O works

Analyze how big-O works: what can we conclude from the definition of \(O\)?

Well, for one thing:

$$a_1 n^p + a_2 n^{p-1} + \ldots + a_p n^{1} + a_{p+1} \in O(n^p)$$

that is, we can drop all but the highest degree term in a polynomial.

For example, suppose we want to show that

$$100 n + n^4 \in O(n^4)$$

To show that this is true, all we have to do is figure out appropriate \(k\) and \(n_0\) to make the definition true. If we let \(k = 100, n_0 = 2\) then we have

$$100 n + n^4 \le 100 n^4 \quad \text{for all}\; n > n_0$$

Note that if we try to show that \(100 n + n^4 \in O(n^2)\) we will fail; there is no \(k, n_0\) that will make the definition true.

Some other properties:

$$c f(n) \in O(f(n)) \quad\mathrm{if}\quad c \ne 0$$

This says that we can ignore constant multiples (makes sense, because we can always fold them into \(k\)).

$$\text{if}\;f(n) \in O(F(n)), \quad g(n) \in O(G(n))$$ $$\text{then}\; f(n) + g(n) \in O(F(n) + G(n))$$

If we add two functions together, then the order of the sum is the sum of their orders; but because in a sum we can drop all but the fastest-growing term, we can rewrite this as

$$f(n) + g(n) \in O(\max(F(n), G(n)))$$

On the other hand, if we take a product:

$$\text{if}\;f(n) \in O(F(n)), \quad g(n) \in O(G(n))$$ $$\text{then}\; f(n) g(n) \in O(F(n) G(n))$$

Note that this means that if we have some algorithm which we know is of order \(O(F(n))\), and we run it \(n\) times, then the result is of order \(O(n F(n))\).
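
A quick sketch of this rule (the functions here are hypothetical, just for illustration): sum_all does \(O(n)\) work per call, and quadratic calls it \(n\) times, so quadratic is \(O(n \cdot n) = O(n^2)\).

#include <vector>
using std::vector;

// One call does O(n) work: it walks the whole vector once.
long sum_all(const vector<int>& v) {
    long total = 0;
    for (int x : v)     // n iterations
        total += x;
    return total;
}

long quadratic(const vector<int>& v) {
    long total = 0;
    for (unsigned i = 0; i < v.size(); ++i)  // runs n times...
        total += sum_all(v);                 // ...and each call is O(n)
    return total;
}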

Because big-O is a kind of generalized \(\le\), there is something like the transitive property for it:

$$\mathrm{if}\quad f(n) \in O(g(n))\quad\mathrm{and}\quad g(n) \in O(h(n))\quad\mathrm{then}\quad f(n) \in O(h(n))$$

Finally, adding a constant has no effect on a function’s order (because a constant is the lowest order, and hence always drops off).

$$f(n) \pm c \in O(f(n))$$

Question: which grows faster, \(n \log n\) or \(n^2\)? The multiplication rule means that we can factor out a common \(n\), thus we are left with the question of which grows faster, \(n\) or \(\log n\). This should hopefully be easier.

Which grows faster, \(n^{100}\) or \(2^n\)?

Complexity Classes

Complexity class Name
\(O(1)\) Constant
\(O(\log n)\) Logarithmic
\(O(n)\) Linear
\(O(n \log n)\) N-log-N (“linearithmic”)
\(O(n^2)\) Quadratic
\(O(n^3)\) Cubic
\(O(n^p)\) Polynomial
\(O(2^n)\) Exponential
\(O(n!)\) Factorial