Review of last time
- Logging in to the server: hostname, port, username, password.
- Splitting strings into words: states and inputs, tabular form, state machine diagram.
- Implementing string splitting using strings and vectors.
The string class
#include <string>
using std::string;
string s = "Andy";
If you declare a string variable without initializing it, it defaults to the empty string:
string e; // e is the empty string
You can initialize a string object from a char* literal. The basic string operations are:
Operation | Description |
---|---|
s.at(i), s[i] | Access individual characters by index (char) |
s.front(), s.back() | Access first/last characters |
s.length(), s.size() | Length of string |
s.empty() | True if length is 0 |
s.substr(i) | Extract substring starting at i, to the end |
s.substr(i,l) | Extract substring starting at i, of length l |
s1 + s2 | Concatenate two strings to form a new one |
s1 < s2, etc. | Compare strings alphabetically |
s.c_str() | Get a char* representation of a string |
s.find(c), s.find(s2) | Get position of the first occurrence of char c or string s2 in s |
s.rfind(c), s.rfind(s2) | Get position of last occurrence |
s.push_back(c); | Add a character c to the end of the string |
s.pop_back(); | Delete the last character of the string |
s.clear(); | Delete all characters from string (len. = 0) |
s1 += s2;, s1.append(s2); | Append s2 to the end of s1, modifying s1 |
s1.insert(i,s2); | Insert s2 into s1, before position i |
s.erase(i,l); | Erase characters starting at i, of length l |
s1.replace(i,l,s2); | Replace l characters of s1, starting at i, with s2 |
s1 = s2; | Copy all of s2 into s1 |
find and rfind will return the special value string::npos if the thing you are searching for cannot be found at all.
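For example, a quick sketch (assuming the usual includes):

string s = "hello";
size_t i = s.find('l');  // i == 2, the first 'l'
size_t j = s.rfind('l'); // j == 3, the last 'l'
if(s.find('z') == string::npos)
    cout << "no 'z' in s" << endl;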
Note that strings do not have a nul character at the end! This means that the length of a string is strictly the number of characters in it, and the last character (s.at(s.length() - 1)) is not necessarily nul. (In fact, strings can have nul characters anywhere inside them!)
To read a single word from cin, you can use
string w;
cin >> w;
This reads a word because the normal behavior of >> is to skip leading whitespace and then stop at the next whitespace character it sees.
To read an entire line, use
string l;
getline(cin, l);
You can, of course, print strings normally:
string name = "Andy";
cout << "Hello, " << name << endl;
For our purpose, we’re going to read in an entire line (using getline) and then process it one character at a time. We can do the latter using a loop:
for(unsigned i = 0; i < s.length(); ++i)
// Use s[i] ...
Or we can use the fancy ranged-for-loop:
for(char c : s)
// Use c ...
Vectors
You can think of a vector as a version of a string that stores any kind of element, not just characters. The operations supported by vectors are somewhat more limited than those supported by strings, just because vectors cannot make any assumptions about what kind of data they are holding.
#include <vector>
using std::vector;

vector<int> vi; // empty vector of ints
vector<char> vc1(10); // Vector of 10 chars
vector<char> vc2(10, 'x'); // Vector of 10 'x' chars
vector<string> names = {"Bruce", "Richard", "Alfred"}; // vector of three strings
Vectors manage their own storage, so you don’t need to delete them.
To access the elements of a vector by index, use either .at(i) or [i]:
- v.at(i) returns the element at position i (starting at 0 for the first position). If i is less than 0, or larger than the last position, then an out_of_range exception is thrown. (I.e., this is like our checked array.)
- v[i] returns the element at position i. If i is out of range, then the behavior is undefined (same as for accessing out-of-range on a normal array). (Some compilers will still check [] vector accesses when compiling a project with debugging enabled, but you shouldn’t rely on this.)
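A quick sketch of the difference:

vector<int> v = {1, 2, 3};
cout << v.at(1) << endl; // 2
cout << v[1] << endl;    // Also 2
// v.at(5) would throw std::out_of_range; v[5] is undefined behavior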
The shortcut methods v.front() and v.back() provide easy access to the first and last elements. Note that these methods are not range-checked: if the vector is empty, calling them is undefined behavior.
Other vector operations:
Operation | Description |
---|---|
v.size() | Size of the vector (number of elements) |
v.empty() | True if size is 0 |
v.clear() | Reset size to 0, deleting all elements |
v1 == v2 | True if v1 and v2 are identical |
v.push_back(e); | Add a new element to the end of the vector |
v.pop_back(); | Erase the last element of the vector |
v1 = v2; | Copy all elements from v2 into v1 |
Note that unlike arrays, vectors support copying, via assignment. This means that you can pass vectors as parameters to functions, and return them from functions as well. This makes vectors much easier to work with than arrays, especially since you don’t need to worry about dynamically allocating them. The fact that vectors can grow, via push_back, makes them even more useful; e.g., if you want to read some number of ints in from the user, and you don’t know how many, you can just do
vector<int> data;
int x;
while(cin >> x)
data.push_back(x);
It’s possible to insert/erase elements in the middle of a vector, but doing so requires the use of an iterator. Here’s an example to get you started:
vector<int> vs = {1, 2, 3, 4};
vs.insert(vs.begin() + 2, 100); // vs = { 1, 2, 100, 3, 4 }
vs.erase(vs.begin() + 1); // vs = { 1, 100, 3, 4 }
You can also insert the contents of one vector in the middle of another.
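For example, a sketch using the iterator-range overload of insert:

vector<int> a = {1, 2, 5, 6};
vector<int> b = {3, 4};
a.insert(a.begin() + 2, b.begin(), b.end()); // a = { 1, 2, 3, 4, 5, 6 }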
Splitting strings
Our table for states/inputs looks like this:
 | Letter | Space |
---|---|---|
WORD | Add to cur. word; stay in WORD | Finish word; switch to SPACE |
SPACE | Start new word; switch to WORD | Ignore; stay in SPACE |
Implementing this as a function: it will take a string as its input, and return a vector<string> as its output:
vector<string> split(string input) {
string word; // Current word
vector<string> output; // List of words
const int SPACE = 0;
const int WORD = 1;
int state = SPACE;
for(char c : input) {
}
return output;
}
All of our work needs to be done inside the for loop. The current character is c, and the current state is state. Since we have four table entries, we’ll have an if-else for each of the four possibilities:
for(char c : input) {
if(state == WORD && c != ' ') {
}
else if(state == WORD && c == ' ') {
}
else if(state == SPACE && c != ' ') {
}
else { // state == SPACE && c == ' '
}
}
- Taking the first case, what do we need to do to add the character c to the current word? word.push_back(c); To “stay in WORD” we don’t need to do anything at all; if we don’t change the value of state, then we will stay in the same state.
- In the second case, we want to “finish the word”. This requires us to do two things: add the word to the vector of words, and then clear the word, resetting it to empty, so that when we start the next word, there won’t be anything in it: output.push_back(word); word.clear(); state = SPACE;
- Third, we want to “start a new word”. Because we cleared the word in the previous case, this just means adding the current character to it: word.push_back(c); state = WORD;
- Finally, “ignoring” a character means doing literally nothing, so the else case is actually empty!
There’s one last thing we have to think about, and that’s what happens at the end of the input string:
- If state == SPACE, then there is no current word, so we can just return the vector of words.
- If state == WORD, however, then there is still a current word in progress, and because we never saw a space, we haven’t yet “finished” it. We can handle this in two ways:
  - We can add a space to the end of the input string, at the very beginning of the function: input.push_back(' ');
  - We can check the current state at the end, and add the word to the vector if it is still WORD: if(state == WORD) output.push_back(word); return output;

Which method you use is up to you: some people think it’s bad manners to modify the parameter values of a function, but doing so makes the end of the function simpler.
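Putting the pieces together, here is the complete function, assembled from the cases above (using the second approach for the final word):

vector<string> split(string input) {
    string word;           // Current word
    vector<string> output; // List of words
    const int SPACE = 0;
    const int WORD = 1;
    int state = SPACE;
    for(char c : input) {
        if(state == WORD && c != ' ') {
            word.push_back(c);      // Add to current word, stay in WORD
        }
        else if(state == WORD && c == ' ') {
            output.push_back(word); // Finish word
            word.clear();
            state = SPACE;
        }
        else if(state == SPACE && c != ' ') {
            word.push_back(c);      // Start new word
            state = WORD;
        }
        else {
            // state == SPACE && c == ' ': ignore
        }
    }
    if(state == WORD)
        output.push_back(word);     // Finish the trailing word, if any
    return output;
}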
Trace through this on the input string "get the rock".
References and pointers
A reference is just another name for an existing variable or location:
int x = 1; // x is a variable, OK
int& y = x; // y is another name for x
Both x and y refer to the same thing. It’s impossible to distinguish them, because they are just different names for the same object. Any changes to x will be reflected in y, and vice versa.
Because a reference is an “alias” for something, you must initialize a reference variable with another variable/location (or another reference):
int& z = 1; // ERROR
int& q; // ERROR
(The C++ name for things that you can get a reference to is “lvalue”, as in, values that can be on the left side of an assignment. Temporary objects are rvalues.)
You can, however, get a reference to an element of an array or vector (or string):
int arr[] = {1, 2, 3};
vector<int> vec = {5, 6, 7};
int& y = arr[1]; // OK, another name for arr[1]
int& z = vec[2]; // Also OK
With vectors, this can be a little dangerous, because the size of a vector can change, and thus there’s the possibility that the thing the reference is referring to might disappear. But as long as you are careful, everything is OK.
Functions can take references as parameters, in which case the formal parameter becomes “another name” for whatever argument is used when the function is called. Similarly, functions can return references (as long as the thing referred to still exists after the function exits!)
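The classic use of reference parameters is a function that modifies its arguments; here’s a minimal sketch (swap_ints is a hypothetical name):

void swap_ints(int& a, int& b) {
    int tmp = a; // a and b are just other names for the caller's variables
    a = b;
    b = tmp;
}

int x = 1, y = 2;
swap_ints(x, y); // Now x == 2 and y == 1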
References can cause some weird behavior: Look at this function; can you see any way in which it might print out a 2, instead of a 1?
void f(int& a, int& b) {
a = 1;
b = 2;
cout << a << endl;
}
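One way: nothing stops both reference parameters from being bound to the same variable.

int q = 0;
f(q, q); // a and b are both names for q, so b = 2 also changes a; prints 2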
References have some limitations:
- There’s no way to change a reference, to make it refer to something else. Anything you do to the reference is interpreted as an action on the thing it refers to.
- Because of this, there’s (almost) no way to “compare” references to see if they refer to the same thing or not.
- Because you can’t change references, there’s no point in having a reference to nothing; it would always be useless!
Pointers
Pointers remove these limitations, but add some complexity in doing so. A pointer is like a reference that can change what it refers to. However, the simplicity of references comes from the fact that they cannot change; hence, there are no operations on the reference itself. Anything you do to a reference variable is transparently done to the thing on the other end of the reference. Because pointers can change, we now need two different syntaxes:
- We need a way of referring to the thing on the other end of the pointer.
- We need a way of referring to the pointer itself.
In other words, while references have no “identity” (they don’t exist on their own and are just aliases for other things), pointers are objects in their own right, and thus we need some way of manipulating them, as opposed to the thing they point to. All of the complexity of pointers springs from this duality: we always have to be clear about what we are doing; are we manipulating the pointer or the object it points to?
Semantically, pointers work by storing addresses. Every (lvalue) object in our program exists somewhere in the computer’s memory; every location in the computer’s memory has an address, a number. A pointer stores an address of another thing. “Dereferencing” the pointer means going to the address it contains.
Pointer syntax:
Type | Description |
---|---|
T* | “Pointer to T” |
T& | “Reference to T” |

Expression | Description |
---|---|
*p | Get object pointed to by p (look at the addr. in p) |
&v | Get pointer to object v (get the addr. of v) |
The connection between the two is:
- If p has type T*, then *p has type T.
- If v has type T, then &v has type T*.
Thus, for expressions, * and & are kind of like opposites: & adds a layer of “pointer to-ness” while * removes a layer.
Although pointers are often introduced with dynamic memory, you don’t actually have to do that. Just like with a reference, you can get a pointer to any lvalue:
int x = 1;
int* y = &x; // y points to x
x++; // Increments x
(*y)++; // Also increments x
(Note that ++ has higher precedence than *, so the parentheses are necessary!)
As with references, you can get a pointer to an element of an array, vector, or string:
int arr[] = {1, 2, 3};
vector<int> vec = {2, 3, 4};
string str = "Hello";
int* ap = &arr[1];
int* vp = &vec.front(); // Same as vec.at(0)
char* cp = &str[3];
Once again, with strings and vectors, because elements can be removed, you have to be careful to make sure that the pointers still point to something that exists! A pointer that points to a non-existent object is called dangling and using it results in undefined behavior!
If we do ap == vp, will this be true or false? What are we asking? We are asking if ap and vp point to the same thing. We are not asking anything about the values pointed to by ap and vp. Similarly, if I do ap = vp; what happens? Do any of the values in the array or vector change? No: only the pointer ap changes what it is pointing to. The most important thing about pointers is to get yourself clear on when we are talking about the pointers themselves, and when we are talking about the things pointed to. If you have some pointer variables, and there aren’t any *s in the expression, then you are talking about the pointers. If there are *s, then you are probably talking about the objects pointed to.
Pointers to array/vector elements
If we have a pointer into an array, vector, or string, we can do some interesting things with it:
vector<int> vec = {1, 2, 3, 1};
int* p1 = &vec[0];
int* p2 = &vec[3];
Consider the following:
p1 == p2 // True or false?
*p1 == *p2 // True or false?
p1 < p2 // True or false?
p1 + 3 // What does this mean?
*(p1 + 3) // What about this?
p1[3] // This?
p1+3 == p2 // True or false?
p1++; // What happens if we do this?
Pointers into an array can have arithmetic done on them (adding/subtracting integers), and it will cause them to move around within the array. Adding 1 moves the pointer one element forward within the array. Comparisons between pointers effectively compare the indexes within the array that they point to. Note that none of the above operations change the elements in the array!
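For reference, here is a sketch of the answers, worked out as comments (for the vec defined above):

p1 == p2     // false: they point to different elements (0 and 3)
*p1 == *p2   // true: both elements hold the value 1
p1 < p2      // true: p1 points to an earlier element
p1 + 3       // A pointer to the element 3 positions later, i.e., &vec[3]
*(p1 + 3)    // The value of vec[3], i.e., 1
p1[3]        // Exactly equivalent to *(p1 + 3)
p1 + 3 == p2 // true: both point to vec[3]
p1++;        // Moves p1 forward one element, so it points to vec[1]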
Pointers to structure and class instances
There are a few extra things we can do with pointers if we have a struct or a class. Here’s an example struct:
struct thing {
int a;
char* b;
string s;
};
char c = '?';
thing t1 = {1, &c, "Hello"};
thing* tp = &t1;
How do I refer to each of the members of t1?
(*tp).a // == 1
(*tp).b // == pointer to c
(*tp).s // == "Hello"
This is a bit cumbersome to type out, so there’s a shortcut:
tp->a
tp->b
tp->s
What if I want to access the actual char pointed to by tp->b?
*((*tp).b) // == '?'
*(tp->b) // == '?'
We can also get a pointer to a member of the structure:
int* ip = &(t1.a);
ip = &(tp->a); // Same thing
Finally, nothing stops us from having multiple pointers to the same structure:
thing* tp2 = tp;
tp == tp2 // True
tp->a = 2; // Also changes tp2->a
All of this applies equally to classes, and to methods. If we have a class
class thing {
public:
void run() {
cout << "WHATUP" << endl;
}
};
and then we have a pointer to an instance of that class
thing* t = ... ;
Then we can use either (*t).run() or t->run() to dereference t and call the run method:
(*t).run();
t->run(); // Same thing
This semester, we’ll use a lot of pointers, but they will usually be pointers to structs or classes, so we’ll use -> a lot. We’ll almost never use &, because we generally won’t need to get pointers to existing objects. We’ll get our pointers by creating objects dynamically, via new. (Remember that new T returns a T*, a pointer to a T.)
The null pointer
nullptr
is the value you should use for the null pointer (the pointer that
points to nothing at all), not NULL
. The null pointer is unique: no other,
non-null pointer is ==
to it, and it is ==
only to other null pointers. Thus,
you can use nullptr
to signal that a pointer doesn’t point to anything at all.
This gives pointers a bit of power that normal “values” don’t have. E.g., if we have an int variable, there is no way to say that it contains “nothing”; it always contains some int value. But if we have an int pointer, then there are two possibilities:
- It is non-null, so then it points to some int (and hence has a value).
- It is null, and thus has no int value at all.
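For example, here is a sketch of a hypothetical find_value helper that uses nullptr to mean “not found”:

// Return a pointer to the first element equal to v, or nullptr if none exists
int* find_value(vector<int>& vec, int v) {
    for(int& e : vec)
        if(e == v)
            return &e; // Points into the vector; valid until vec changes
    return nullptr;    // No int value to point to at all
}

vector<int> data = {3, 7, 9};
int* p = find_value(data, 7);
if(p != nullptr)
    *p = 0; // Found a 7; overwrite it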
Dynamic memory allocation
I’ve intentionally separated pointers from dynamic memory, because students often assume that whenever you have pointers you must also have dynamic allocation somewhere, but this is not the case. You can get a pointer to anything that “hangs around”. Dynamically allocated objects are particularly convenient for this, because they hang around until we delete them, but it’s not required to use them. Dynamic allocation is useful when we don’t know what or how many objects we will need (or how long they need to live) until our program is running.
Stack vs. heap. Activation records. Heap allocation. Lifetime of objects.
new finds space on the heap, big enough for an object of the given type, and then gives you a pointer to that space. (The array-new finds enough space for multiple objects of the same type.) Because the object is allocated on the heap, it will remain “alive” (taking up space) until we delete it.
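A minimal sketch of the full round trip:

int* p = new int(42); // Reserve heap space for one int, initialized to 42
cout << *p << endl;   // Use the object through the pointer
delete p;             // Release the space; the object no longer exists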
A common mistake (coming from the assumption that pointers have to be used with dynamic allocation) is to write something like this:
int x = 12;
...
// Now we want a pointer to x
int* p = new int();
p = &x;
// carry on with p
What’s wrong with this? We create a new (dynamic) int, but then immediately discard the pointer to it, replacing it with a pointer to x. This means that we have no way to refer to the heap memory we just reserved, and hence no way to delete it. That memory is lost to our program until our program exits; we have created a memory leak. The correct way to do this is
int* p = &x;
// carry on with p
If the thing you want to get a pointer to already exists, then just initialize your pointer to that; there’s no need to allocate any space. Only use new when you want to reserve some additional space on the heap.
Similarly, just because we are done with p does not mean we need to delete it. In this case, because the thing pointed to by p was not created via new, it should not be destroyed via delete.
An object can only be delete-d once; we saw last time that trying to delete something more than once crashes your program with a “double-free” error. This means that if we allocate something, we need to decide what part of our program “owns” it, and that part is responsible for delete-ing it. A pointer that points to an object that it will eventually delete is called an “owning” pointer. It “owns” the object on the other end, and somewhere, that object will be deleted via that pointer. There should only ever be one owning pointer to an object, as more than one would mean that the object would be deleted more than once. There can, of course, be any number of non-owning pointers to an object, because those won’t delete it when they are done.
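A sketch of the distinction, reusing the thing struct from above (the variable names are just for illustration):

thing* owner = new thing;  // Owning pointer: some part of the code must delete it
thing* observer = owner;   // Non-owning pointer: may use the object, never deletes

observer->a = 5;           // Fine: using the object through a non-owning pointer
delete owner;              // Exactly one delete, via the owning pointer
// observer is now dangling; using it would be undefined behavior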
Assignment 1
The first assignment is a kind of C++/CSci 123 review, but also hopefully gets you thinking about the kind of issues that will be important to us. The assignment is to implement an ordered array. A lot of the operations will be similar to those on the bag structure we implemented earlier, but a few are different.
- Like a bag, an ordered array has a maximum size (capacity), set when it is created and then never changed after that. (The ordered_array class adds a method which returns the capacity.)
- Like a bag, we can ask for the current size() of an ordered array, which will always be ≤ its capacity.
- Like a bag, we can use at(i) to access the i-th element of the array. Unlike a bag, if i < 0 or i >= size(), then at(i) should throw std::out_of_range("..."); with a suitable message. at returns a reference, but you don’t have to do anything special to return a reference, so don’t worry about it.
- The combination of at and size means that we can loop over the elements of an ordered array arr with: for(int i = 0; i < arr.size(); ++i) cout << arr.at(i) << endl; The key property of an ordered array is that a loop like this will always process the elements of the array in ascending order. In other words, arr.at(i) <= arr.at(j) whenever i <= j. As elements are added/removed to/from the array, the sorted order of the elements must be maintained.
- New elements can be added to the array with insert(e). If the array is full, insert should do nothing.
- Existing elements can be removed with remove(e). Note that unlike a bag, remove takes an element’s value and not its index. If the element does not exist, nothing happens.
- We can check whether an element exists with exists(e), which returns true if it does, and false otherwise.
Because an ordered array has a size, a capacity, and its elements, you will probably need three data members: size, capacity, and a dynamic array of elements (you could also use a vector).
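Here is a hedged sketch of what the interface might look like (the exact names and types are assumptions based on the description above; follow the actual assignment header):

class ordered_array {
public:
    ordered_array(int capacity)
      : cap(capacity), n(0), data(new int[capacity]) { }
    ~ordered_array() { delete[] data; }

    int capacity() { return cap; }  // Maximum size, fixed at construction
    int size()     { return n; }    // Current size, always <= capacity

    int& at(int i);     // Throws std::out_of_range if i is out of range
    void insert(int e); // Insert e, keeping elements sorted; no-op if full
    void remove(int e); // Remove the element with value e, if present
    bool exists(int e); // True if e occurs in the array

private:
    int cap;   // Capacity
    int n;     // Current size
    int* data; // Dynamic array of elements
};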
The description of how the ordered array works says that the special value -2147483648 cannot be stored in the array. E.g., if you try to insert it, nothing happens, exists(-2147483648) always returns false, etc. This seems arbitrary, but it’s done so that, if you want, you can use -2147483648 as a “special” value internally. E.g., if you store the contents of the ordered array in a (dynamic) array, you could implement remove(e) by finding the location of e and then simply putting a -2147483648 there, to mark it as “deleted”. This method will require you to implement every other method to ignore -2147483648 entries. If you use this method, you don’t need to store the size: the size of the array is just the number of entries that are != -2147483648.
Alternatively, you could implement it the way we implemented the bag, by using the front portion of the array, starting at index 0, for the “used” elements, and the remainder for unused elements. Using this method, you must store the size of the array, and remove must shift array elements around (we cannot simply swap to the end, because that would un-sort the array!).
The assignment asks you to think about how much time the various operations will take, relative to the size of the array. E.g., as the array grows, which operations will get slower? Which will take the same amount of time? We’ll investigate these ideas further in the next few lectures.
Analyzing algorithms
We’re going to talk about analyzing algorithms: how can we quantify the time taken (or memory used) by a given algorithm? We could run it and measure it, but that doesn’t give us enough information: it only tells us how fast it runs, on one particular input, on one particular computer and compiler. What we want is a more general way of describing the behavior of an algorithm, not in terms of “seconds” or “bytes”, but more abstractly.
In fact, what we are going to consider is not “how long” an algorithm takes, but rather the growth rate of its runtime as the size of the input increases. E.g., for one algorithm, doubling the size of the input might double the runtime; for another, it might quadruple it. Clearly, the former is better than the latter, even though we don’t know anything about the actual runtimes.
Before we can do that, we need to review some math:
Review of summation notation
The notation

$$\sum_{i=j}^{k} f(i)$$

means: add up all the values of \(f(i)\) for every \(i\) between \(j\) and \(k\) (including \(i = k\)). In other words,

$$\sum_{i=j}^{k} f(i) = f(j) + f(j+1) + \cdots + f(k)$$
The simplest summation is just

$$\sum_{i=1}^{n} 1 = \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n$$
Because it’s just a big sum, there are some things we can do with it:
Factoring
What is

$$\sum_{i=1}^{n} c \cdot f(i)?$$

This sum works out to

$$c f(1) + c f(2) + \cdots + c f(n)$$

which we could factor into

$$c \left( f(1) + f(2) + \cdots + f(n) \right)$$

but the thing inside the parens is our original sum, so we can say that

$$\sum_{i=1}^{n} c \cdot f(i) = c \sum_{i=1}^{n} f(i)$$

That is, for any constant \(c\) (and by “constant” we mean anything that does not depend on \(i\)), we can factor it out of the summation.
Splitting
What is

$$\sum_{i=1}^{n} \left( f(i) + g(i) \right)?$$

The sum works out to

$$f(1) + g(1) + f(2) + g(2) + \cdots + f(n) + g(n)$$

but we can rearrange the terms into

$$\left( f(1) + f(2) + \cdots + f(n) \right) + \left( g(1) + g(2) + \cdots + g(n) \right)$$

and this is just

$$\sum_{i=1}^{n} f(i) + \sum_{i=1}^{n} g(i)$$

In other words, if the body of a summation is itself a sum, we can split it up into multiple summations.
Removing terms
Finally, we can remove a number of terms from a summation by adjusting the ends, either as

$$\sum_{i=j}^{k} f(i) = f(j) + \sum_{i=j+1}^{k} f(i)$$

or as

$$\sum_{i=j}^{k} f(i) = \left( \sum_{i=j}^{k-1} f(i) \right) + f(k)$$
Of course, we could remove more than one term if that was useful, or remove terms from both the beginning and the end. We just have to make sure we adjust the endpoints properly!
Change of variable
Finally, sometimes it’s useful to adjust the endpoints themselves, by pushing them up or down. In order to do this, we have to replace the summation variable \(i\) with a suitably adjusted \(i'\). For example,

$$\sum_{i=j}^{k} f(i) = \sum_{i'=j+1}^{k+1} f(i' - 1)$$

where \(i = i' - 1\). (If the presence of the final \(k+1\) term in the new sum was problematic, we could use the term-removal technique to strip it off.)
Basics of complexity analysis
Example 1: largest element
int largest(vector<int> values) {
assert(values.size() > 0);
int l = values[0];
for(int i = 1; i < values.size(); i++)
l = values[i] > l ? values[i] : l;
return l;
}
We’re first going to do a fine-grained analysis of the work that this function does, in terms of the number of operations. We’re then going to take that and derive an asymptotic bound for the work, as the size of the input (i.e., values.size()) gets large.
How long does this take to run, for a vector of a given size()? Well, that largely depends on things like:
- How long does it take to compare things?
- How long does assignment take?
- How long does it take to evaluate values.size()?
- How long does it take to increment i?
Note that none of these things depend on the size of the vector. Each of these things will take some constant, but unknown, amount of time, regardless of how big of a vector we give it, so we can represent them as constants:
- Compare: \(C\)
- Assign: \(A\)
- Compute size: \(S\)
- Increment: \(I\)
Now, if values.size() == 1, how long will the program take to run? We do
- 1 compare (in the assert)
- 1 assignment (from values[0] to l)
- 1 assignment (i = 1)
- 1 size compute (in i < values.size())
- 1 compare (i < values.size())
Giving us a total time of \(2C + 2A + S\). If values.size() \(= n\), then we can figure out the total time taken by looking at how much work is involved in the loop body:
- 1 compare (values[i] > l)
- 1 assignment (l = ...)
- 1 increment (i++)
- 1 size lookup (i < values.size())
- 1 compare (i < values.size())
So we have a base time of \(2C + 2A + S\), plus a per-iteration time of \(2C + A + I + S\), giving us a total time of

$$(2C + 2A + S) + n \, (2C + A + I + S)$$
Note that for all but the smallest vectors, the second term is going to dominate the sum, so much so that we could really say that the total time is approximately proportional to \(n\), the size of the vector. A function like this one, that takes time proportional to the size of its input for “big enough” inputs, is generally said to run in “linear time”, meaning that the run time grows linearly with the size of the input: double the size of the input, and the runtime (roughly) doubles, too.
Note that this classification (linear time) is true regardless of what computer we run this function on. Different computers will have different constants; thus it makes no sense to ask, in the abstract, how many seconds largest will take to run for some input, but we can abstractly ask: how fast will its runtime increase, relative to an increase in the size of its input? In this case, there is a linear correlation.
We generally care more about this kind of classification of an algorithm, than the raw “run time” sum we computed above. This allows us to classify algorithms by how fast their run time grows, as the input size increases. If algorithm A grows more slowly than algorithm B we can say that A is “better” than B.
Example 2: finding an element
bool find(vector<int> values, int v) {
for(int i = 0; i < values.size(); ++i)
if(values[i] == v)
return true;
return false;
}
We can analyze the work of this function in terms of increments, equality comparisons, and calls to values.size() (there are no simple assignments in this function).
Here, however, we run into a problem: the number of times that the loop runs depends on the values inside the vector. That is, even for two vectors with the same size, the function might take dramatically different amounts of time. In this case, instead of just asking for the work of the function, we analyze the best and worst cases. (The function from the previous example has identical best and worst cases, so there’s no point in looking at them separately.)
In order to determine what the best/worst cases are, we need to figure out how to “rig” the contents of the vector so that (a) the loop exits as early as possible (best case) and (b) the loop runs all the way to its natural end (worst case). (Note that the one thing we cannot do is choose particular values for \(n\), the size of the vector. You cannot simply say, “the best case is when the vector is empty”, as that would be the best case for any algorithm, and thus doesn’t give us any useful information. Remember: you can never simply pick a value for \(n\)!)
- To make the loop exit as early as possible, we want the if condition values[i] == v to be true the first time through the loop, when i == 0. So we simply say that the first element of the vector is equal to v. (This is the case where the searched-for element is found right away.)
- To make the loop run all the way to the end, the if condition must never be true. Thus, none of the elements of the vector must be equal to v.
So our best case occurs when the first element of the vector == v, and the worst case occurs when none of the elements of the vector == v. Note that the worst case always occurs when the function returns false.
 | Best case | Worst case |
---|---|---|
Calls to size(): S | 1 | \(n+1\) |
Increments: I | 0 | \(n\) |
Comparisons (==): C | 1 | \(n\) |
So the total work of the function would be

$$S + C$$

in the best case and

$$(n+1)S + nI + nC$$

in the worst case.
Often you’ll need to determine whether a function has different best/worst cases, or whether they are the same. Look for loops that have a possibility of an early exit, either by return or by break.
Example 3: quadratic time
As an example, take a look at this function: It tries to determine whether any pair of elements from a pair of vectors multiply to equal a particular value:
bool has_product(vector<int> a, vector<int> b, int p) {
for(int v1 : a)
for(int v2 : b)
if(v1 * v2 == p)
return true;
return false;
}
Once again, this function has both best and worst cases. The best case is simply when the first elements of both vectors multiply to p. The worst case is more interesting, and occurs when the product p cannot be found by multiplying any two elements. Assuming that a and b have the same size, n:
- The outer loop will be executed \(n\) times.
- For each time the outer loop is executed, the inner loop will be executed \(n\) times.
- Thus, the total number of times the inner loop body will be executed is \(n^2\).
The runtime of this algorithm increases proportionally to the square of the size of its input. That is, if we double the size of the vectors, the runtime will roughly quadruple. This is much worse than a linear-time algorithm, where doubling the input size only doubles the runtime.
An algorithm like this is said to run in polynomial time, indicating that its runtime is proportional to \(n^p\) for some power \(p \ge 2\), where \(n\) is its input size. Obviously, \(n^3\) is worse than \(n^2\), and so forth.
Note that our analysis assumes that the input is “large”. For any problem, there will be some small inputs where a theoretically worse algorithm may perform better. The reason why is in the word “proportionally”. If the runtime is “proportional” to \(n^2\) then that means there is some unknown constant \(k\) such that the runtime \(t \approx k n^2\). But \(k\) varies depending on the implementation of the algorithm, so it’s possible for a polynomial algorithm to have a small \(k\), while the corresponding linear algorithm has a larger \(k\). In this case, the “slower” polynomial algorithm will be faster, for inputs up to some size. But there is always going to be a cutoff, a point where the linear version is always faster.
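For example, with hypothetical constants: if a linear algorithm runs in \(t_1 = 10n\) steps and a quadratic one in \(t_2 = 0.1 n^2\) steps, then \(t_2 < t_1\) exactly when \(0.1 n^2 < 10 n\), i.e., when \(n < 100\). Below that cutoff the “slower” quadratic algorithm wins; above it, the linear one is always faster.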
Big-O Notation
We refer to the function that an algorithm’s runtime is proportional to as its order and say that an algorithm is \(O(f(n))\) to mean that its order is \(f(n)\) (i.e., is roughly proportional to \(f(n)\)). This is colloquially known as “big-O” notation.
An algorithm is of order \(O(f(n))\) if there exist constants \(k\) and \(n_0\) such that for any input of size \(n > n_0\), the runtime of the algorithm is no more than \(k f(n)\).
Or, mathematically, a function \(f(n) \in O(g(n))\) iff there exist \(k > 0\) and \(n_0\) such that for all \(n > n_0\):

$$f(n) \le k g(n)$$

(\(n_0\) captures the notion of our analysis applying to “big enough” inputs: only if the size of the input is larger than some minimum does the proportionality apply. Similarly, \(k\) captures proportionality.) We are assuming that both \(f(n)\) and \(g(n)\) are strictly positive.
If you want a calculus-style definition, we can also use a limit:

$$\lim_{n \to \infty} \frac{f(n)}{g(n)} < \infty$$

(That is, the limit must exist and be finite.)
Intuitively, you should think of \(f(n) \in O(g(n))\) as meaning “\(f(n) \le K g(n)\) for some K and really big n”. (There are other “big-letter” notations which correspond to ≥, <, equal-to, etc.)
Summations of Big-O
There is a summation identity that will make working with big-O notation much easier: if \(f(i) \in O(i^p)\), then

$$\sum_{i=1}^{n} f(i) \in O(n^{p+1})$$
This basically says that if you sum up a function which is \(O(n^p)\) then the resulting function will be \(O(n^{p+1})\). \(n^2\) becomes \(n^3\) and so forth.
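For example, taking \(f(i) = i \in O(i^1)\):

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \in O(n^2)$$

which is exactly what the identity predicts: \(p = 1\) becomes \(p + 1 = 2\).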
How Big-O works
Analyze how big-O works: what can we conclude from the definition of \(O\)?
Well, for one thing:

$$a_p n^p + a_{p-1} n^{p-1} + \cdots + a_1 n + a_0 \in O(n^p)$$

that is, we can drop all but the highest-degree term in a polynomial.
For example, suppose we want to show that

$$100 n + n^4 \in O(n^4)$$

To show that this is true, all we have to do is figure out appropriate \(k\) and \(n_0\) to make the definition true. If we let \(k = 100\), \(n_0 = 2\) then we have

$$100 n + n^4 \le 100 n^4 \quad \text{for all } n > 2$$

Note that if we try to show that \(100 n + n^4 \in O(n^2)\) we will fail; there is no \(k, n_0\) that will make the definition true.
Some other properties:
$$O(c \cdot f(n)) = O(f(n))$$

This says that we can ignore constant multiples (makes sense, because we can always fold them into \(k\)).

If we add two functions together, then the order of the sum is just the sum itself:

$$O(f(n) + g(n))$$

but because in a sum we can drop all but the fastest-growing term, we can rewrite this as

$$O(f(n) + g(n)) = O(\max(f(n), g(n)))$$

On the other hand, if we take a product:

$$O(f(n) \cdot g(n)) = O(f(n)) \cdot O(g(n))$$

Note that this means that if we have some algorithm which we know is of order \(O(F(n))\), and we run it \(n\) times, then the result is of order \(O(n F(n))\).

Because big-O is a kind of generalized \(\le\), there is something like the transitive property for it:

$$f(n) \in O(g(n)) \text{ and } g(n) \in O(h(n)) \implies f(n) \in O(h(n))$$

Finally, adding a constant has no effect on a function’s order (because a constant is the lowest order, and hence always drops off):

$$O(f(n) + c) = O(f(n))$$
Question: which grows faster, \(n \log n\) or \(n^2\)? The multiplication rule means that we can factor out a common \(n\), thus we are left with the question of which grows faster, \(n\) or \(\log n\). This should hopefully be easier.
Which grows faster, \(n^{100}\) or \(2^n\)?
Complexity Classes
Complexity class | Name |
---|---|
\(O(1)\) | Constant |
\(O(\log n)\) | Logarithmic |
\(O(n)\) | Linear |
\(O(n \log n)\) | N-log N (“linearithmic”) |
\(O(n^2)\) | Quadratic |
\(O(n^3)\) | Cubic |
\(O(n^p)\) | Polynomial |
\(O(2^n)\) | Exponential |
\(O(n!) \equiv O(n^n)\) | Factorial |