Recursion
I’m going to present pretty much all of the sorting algorithms recursively, so we should probably talk about recursion. Recursion is a really mind-expanding technique, once you get the hang of it. It’s also the foundation for what could be called the “mathematical interpretation” of computer programming, so if you’re a CSci major, you’ll have to get comfortable with it sooner or later. So let’s look at some simple algorithms, both iteratively (using loops) and recursively.
Finding the factorial
The factorial of n is defined as the product \(n (n-1) (n-2) \ldots (2) (1)\), i.e., the product of all positive integers up to and including n. It’s easy to write as a loop:
int factorial_iter(int n) {
int r = 1; // Factorial of 0 is 1
for(int i = 1; i <= n; ++i)
r *= i;
return r;
}
To write this, or any other algorithm, recursively, we have to ask two questions:
What is the smallest case, the case where I can give the answer right away? This is called the “base case”. (Sometimes there might be more than one smallest case, and that’s OK.)
For anything that is not the smallest case, how do I break it down to make it smaller? This is called the recursive case.
For the factorial, the base case is what happens when \(n = 0\): the loop doesn’t run at all, and 1 is returned. So we can start our recursive version with
int fact(int n) {
    if(n == 0)
        return 1;
    else
        ...
}
To construct the recursive case, we need to look at what happens when n > 0. In particular, how can we break \(n!\) down into some \(n'!\), with \(n' < n\)? The most common choice is \(n' = n - 1\).
One way to look at this is to assume that we already have the value of \((n-1)!\), and we want to get \(n!\) from it. That is, assume that fact(n - 1) will work and give us the right answer; we just need to construct the factorial of n from it. How can we do this? \(n! = n \cdot (n-1)!\). So we write our recursive case like this:
int fact(int n) {
if(n == 0)
return 1;
else
return n * fact(n - 1);
}
Let’s take a minute to walk through the process of computing fact(3):
fact(3) = 3 * fact(2)
fact(2) = 2 * fact(1)
fact(1) = 1 * fact(0)
fact(0) = 1
and at this point we can work our way back up, giving the result 3 * 2 * 1 * 1 = 6.
Inductive proof
How do we show that a function does what it is supposed to do? We could test it, running it thousands or millions of times and verifying that its output is what we expect, but this requires us to come up with an independent way to define what the function does (e.g., a different way of computing the factorial), which might itself be incorrect; furthermore, repeated testing can only ever give us statistical confidence that our algorithm is correct. If we want to be sure, then we need a logical, or mathematical, proof that it is correct. For recursive functions, this often takes the form of a proof by induction. An inductive proof is kind of the mathematical equivalent of a recursive function. Like a recursive function, it has base case(s) (one base case, in fact, for every base case in the function), and the base cases are usually easy. It also has inductive case(s) (one for each recursive case in the function), which are somewhat more tricky, but allow us to do something like recursion.
Consider the example above. We want to prove that fact(n) = \(n!\), where \(n!\) is defined as \(n(n-1)(n-2)\ldots(2)(1)\), with \(0! = 1\).
Proof by induction on n (whatever variable we do the recursion on, we say we are doing “proof by induction” on that variable):
Base case: \(n = 0\). Then fact(0) = 1 = \(0!\), and we are done.

Inductive case: here we are trying to prove that fact(n) = \(n!\) for some \(n > 0\). We can break down the left and right sides and see that we actually have
$$n \cdot \mathtt{fact}(n-1) = n! = n (n-1) (n-2) \ldots (2)(1)$$
Dividing through by \(n\) we get
$$\mathtt{fact}(n-1) = (n-1)(n-2) \ldots (2)(1) = (n-1)!$$
In other words, we have reduced the problem of proving that fact(n) = \(n!\) to the problem of proving that fact(n-1) = \((n-1)!\). That doesn’t seem useful, but just as a recursive function can call itself with a smaller argument, in an inductive proof we can reuse the proof itself as an assumption, for a smaller \(n\). We call this assumption the inductive hypothesis, and it looks like this:
$$\text{Assume}\qquad\mathtt{fact}(n') = n'! \qquad \text{for all}\; n' < n$$
If we let \(n' = n-1\) then we have
$$\text{Assume}\qquad\mathtt{fact}(n-1) = (n-1)!$$
which is exactly what we needed above! Substituting this in, we get
$$(n-1)! = (n-1)!$$
and we are done.
Like recursion, the heart of an inductive proof is the act of applying the proof itself as an assumption about “smaller” values (\(n' < n\)). Technically, there are two kinds of inductive proofs:
“Natural number induction” only lets us make the assumption about \(n' = n-1\). That is, we can only make the assumption about an “input” that is one smaller than the original.
“Strong induction” lets us use any \(n' < n\). You can use strong induction anywhere you can use natural number induction, but it isn’t always required.
The integer exponent calculation
Remember when we worked out the runtime complexity of our “optimized” \(O(\log n)\) function for computing \(b^n\)? We can write a recursive version of that as well. Once again, we have to ask
What is the base case? In this case, it’s when \(n = 0\). In that case, \(b^0 = 1\), no matter what b is (there is some debate about \(0^0\)).
What is the recursive case? How do we break down \(b^n\) into some \(b^{n'}\)? Here, we’re going to take our cue from our earlier implementation:
$$b^n = (b^{n/2})^2 \quad\text{if } n \text{ is even}$$
$$b^n = b \cdot b^{n-1} \quad\text{if } n \text{ is odd}$$
This gives us the following definition:
float powi(float b, int n) {
    if(n == 0)
        return 1;
    else if(n % 2 == 0) { // Even
        float fp = powi(b, n / 2);
        return fp * fp;
    }
    else // Odd (plain else, so every path returns a value)
        return b * powi(b, n - 1);
}
This has the same complexity as the loop-based version, and is arguably simpler.
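As with the factorial, it’s worth tracing a call to see where the \(O(\log n)\) comes from; for example, powi(2, 10):

powi(2, 10) = powi(2, 5)^2
powi(2, 5) = 2 * powi(2, 4)
powi(2, 4) = powi(2, 2)^2
powi(2, 2) = powi(2, 1)^2
powi(2, 1) = 2 * powi(2, 0)
powi(2, 0) = 1

Working back up gives 2, 4, 16, 32, and finally \(32^2 = 1024 = 2^{10}\): only six calls for \(n = 10\).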
In this case, if we want to prove that \(\mathtt{powi}(b,n) = b^n\) we’ll need strong induction, because one of the recursive cases shrinks the input by something other than just subtracting 1.
Proof that \(\mathtt{powi}(b,n) = b^n\) by strong induction on \(n\):
Base case: \(n = 0\)
Then by looking at the program we get \(\mathtt{powi}(b,0) = 1 = b^0\) and we are done.
Inductive case: \(n > 0\), prove \(\mathtt{powi}(b,n) = b^n\). Here there are actually two inductive cases, one for each of the two recursive cases in the function. Our inductive hypothesis (assumption) is
$$\mathtt{powi}(b,n') = b^{n'}\qquad \text{for all}\; n' < n$$
Case 1: \(n\) is even. Then replacing the call to powi by its return value we have
$$\mathtt{powi}(b, n / 2)^2 = b^n$$
$$\mathtt{powi}(b, n / 2)^2 = (b^{n/2})^2$$
Taking the square root of both sides:
$$\mathtt{powi}(b, n/2) = b^{n/2}$$
at which point we can apply the IH, with \(n' = n/2\), giving
$$b^{n/2} = b^{n/2}$$
Case 2: \(n\) is odd. Then expanding powi we get
$$b \cdot \mathtt{powi}(b,n-1) = b \cdot b^{n-1}$$
$$\mathtt{powi}(b,n-1) = b^{n-1} \qquad(\text{dividing by } b)$$
$$b^{n-1} = b^{n-1}\qquad\text{by the IH, with } n' = n-1$$
And the proof is complete.
Mutual recursion
Mutual recursion is when we define several recursive functions in terms of each other. For example, consider the following definition of even and odd:
A natural number n is even iff \(n-1\) is odd.
A natural number n is odd iff \(n-1\) is even.
\(1\) is odd, \(0\) is even. (These are our base cases.)
We can then define two functions (predicates) that recursively refer to each other:
bool is_even(int n) {
if(n == 0)
return true;
else if(n == 1)
return false;
else
return is_odd(n - 1);
}
bool is_odd(int n) {
if(n == 0)
return false;
else if(n == 1)
return true;
else
return is_even(n - 1);
}
If we trace out the process of determining is_even(4), we’ll see that it bounces back and forth between is_even and is_odd.
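Concretely:

is_even(4) = is_odd(3)
is_odd(3) = is_even(2)
is_even(2) = is_odd(1)
is_odd(1) = true

so is_even(4) returns true, with control alternating between the two functions the whole way down.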
Binary search
We did a binary search iteratively, but we can do it recursively as well:
There are two base cases: when we find the item, or when the search space is reduced to 0 (indicating that the item is not found).
The recursive case compares the value of the target to the value at the current midpoint, and then reduces the size of the search space (by recursively searching either the left or right sides).
This looks like
template<typename T>
int binary_search(const vector<T>& data, const T& target, int low, int high) {
    if(low > high)
        return -1; // Search space is empty: target not found
    int mid = low + (high - low) / 2; // Why did I do this?
    if(data.at(mid) == target)
        return mid;
    else if(data.at(mid) < target) // Search right
        return binary_search(data, target, mid + 1, high);
    else // Search left
        return binary_search(data, target, low, mid - 1);
}

// Convenience overload that searches the whole vector. (C++ doesn't let a
// default argument refer to another parameter, so we can't just write
// high = data.size() - 1 as a default.)
template<typename T>
int binary_search(const vector<T>& data, const T& target) {
    return binary_search(data, target, 0, (int)data.size() - 1);
}
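A quick usage example (a made-up input, using the whole-vector overload above):

vector<int> v = {2, 3, 5, 7, 11};
int i = binary_search(v, 7); // returns 3
int j = binary_search(v, 4); // returns -1: not found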
Other examples: counting the number of copies of a value in a vector. For any vector-style recursion, we need to keep track of our “starting place” within the vector. This is because we can’t make the vector itself smaller, so we have to put a marker into it showing where we are starting. We can do this in two ways: with an int start parameter, or by using iterators.
// Pass data by const reference, so each recursive call doesn't copy the vector.
template<typename T>
int count(const vector<T>& data, int start, T target) {
    if(start == data.size())
        return 0;
    else
        return (data.at(start) == target) +
               count(data, start + 1, target);
}
With iterators:
template<typename T, typename It>
int count(It start, It finish, T target) {
if(start == finish)
return 0;
else
return (*start == target) +
count(start + 1, finish, target);
}
Iterators are kind of like pointers: *it gives us the element an iterator refers to, and it + 1 refers to the next element.
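With either version, a call might look like this (a made-up example; note that the iterator version can clash with std::count if using namespace std is in effect):

vector<int> v = {1, 7, 3, 7, 7};
int a = count(v, 0, 7);               // index version: a == 3
int b = count(v.begin(), v.end(), 7); // iterator version: b == 3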
Sorting algorithms
A sorting algorithm is a function that takes a sequence of items and somehow
constructs a permutation of them, such that they are ordered in some fashion.
Usually, we want things to be ordered according to the normal comparison operators, so that if a < b then a comes before b in the final permutation.
Still, there are a few things we have to make sure we get right:
Obviously we can’t lose any elements through the process.
There may be duplicates in the original; if so, there should be an equal number of duplicates in the output.
For convenience, we allow sorting an empty sequence (which, when sorted, results in yet another empty sequence).
There are some terms associated with sorting that it’s important to be aware of:
Stability – when the input sequence has elements which compare as equal but which are distinct (e.g., employees with identical names who are otherwise different people), the question arises as to whether, in the output sequence, they occur in the same order as in the original. E.g., if employee “John Smith” #123 came before “John Smith” #234 in the original sequence, then we say that a sort is stable if #123 is still before #234 in the result.
Stability is important when sorting a sequence on multiple criteria. E.g., if we sort based on first name, and then sort again based on last name, a stable sort leaves entries with the same last name ordered by first name; an unstable sort won’t give us the result we want, because the first-name order will be all mixed up.
Adaptability – some input sequences are already partially (or completely) sorted; an adaptable sorting algorithm will run faster (in big-O terms) on partially sorted inputs. The optimal runtime for a completely sorted input is \(O(n)\), the time that it takes to verify that the input is already sorted.
In-Place – an in-place sorting algorithm is one that needs no extra space (i.e., its space complexity is \(O(1)\)) to sort. Some algorithms cannot be used in place, and have to construct a separate output sequence of the same size as the input to write the results into.
Online – some datasets are too large to fit into memory all at once. An online sorting algorithm is one that does not require all of its data to be accessible (in memory) at once: it can sort data which is partially in memory, partially on disk or other “slow” storage, without affecting its time complexity. An offline sorting algorithm, by contrast, requires the entire dataset before it can begin.
Selection sort
We already looked at selection sort, so let’s review it:
To selection sort a list of items, we first find the smallest item in the entire list, and put it at the beginning.
Then we find the smallest item in everything after the first item, and put it second.
Continue until there’s nothing left unsorted.
Effectively, selection sort splits the list into the sorted region at the beginning, and the unsorted region at the end. The sorted region grows, while the unsorted region shrinks.
Selection sort is not stable.
Iteratively, this looks like this:
template<typename T>
void selection_sort(vector<T>& data) { // by reference, so we sort the caller's vector
    for(auto it = begin(data); it != end(data); ++it) {
        // Find smallest element of the unsorted region
        auto smallest = it;
        for(auto jt = it + 1; jt != end(data); ++jt)
            if(*jt < *smallest)
                smallest = jt;
        // Swap it into place
        swap(*it, *smallest);
    }
}
Let’s trace through this on a small example to get a feel for how it works.
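For instance, with the (made-up) input {3, 1, 2}:

{3, 1, 2}   smallest of the whole vector is 1; swap it with 3 → {1, 3, 2}
{1, 3, 2}   smallest of the remaining {3, 2} is 2; swap it with 3 → {1, 2, 3}

after which every element is in its sorted position.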
How can we implement this recursively? Instead of passing around the actual vector, we’re just going to pass around iterators to the beginning and end of the vector. Why we do this will become obvious shortly:
template<typename Iterator>
void selection_sort(Iterator first, Iterator last) {
...
}
Let’s analyze the recursion:
The base case is when last - first <= 1. That means there are zero or one elements, and such a list is always sorted.
The recursive case is when last - first > 1. In that case, we recursively break it down by:
- Finding the minimum of the region, and placing it at the beginning.
- Recursively selection-sorting first+1 through last (because the element at first is now in the correct place).
template<typename Iterator>
void selection_sort(Iterator first, Iterator last) {
    if(last - first <= 1)
        return; // Zero or one elements, nothing to sort
    else {
        // Find minimum
        Iterator smallest = first;
        for(Iterator it = first; it != last; ++it)
            if(*it < *smallest)
                smallest = it;
        // Swap into place
        swap(*smallest, *first);
        // Recursively sort the remainder
        selection_sort(first + 1, last);
    }
}
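Note that the caller now passes a pair of iterators rather than the vector itself; for example (a made-up snippet):

vector<int> v = {3, 1, 2};
selection_sort(v.begin(), v.end()); // v is now {1, 2, 3}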
Let’s draw the recursion tree for this. We won’t trace through the loop, we’ll just assume (for now) that it works correctly.
DIAGRAM
Sorting algorithm analysis
Besides analyzing the general best/worst-case big-O runtime of a sorting algorithm, it’s common to also analyze two other runtime features:
The number of comparisons between elements. This only counts comparisons between the objects being sorted, not the comparison of (e.g.) a loop counter variable.
The number of swaps between elements.
Analyzing the number of comparisons and swaps is useful because these operations lie at the heart of any sorting algorithm: we cannot know whether elements are out of order until we compare them (and, if the elements are complex, comparing them may take a non-trivial amount of time), and we cannot put them in the right order without moving them around – i.e., swapping them.
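To make this concrete, here is a minimal sketch of how we might instrument the recursive selection sort to count both operations (the counter variables and the _counted name are invented for illustration, not part of the original algorithm):

int comparisons = 0; // hypothetical global counters; reset before each run
int swaps = 0;

template<typename Iterator>
void selection_sort_counted(Iterator first, Iterator last) {
    if(last - first <= 1)
        return;
    Iterator smallest = first;
    for(Iterator it = first; it != last; ++it) {
        ++comparisons; // one element-to-element comparison per loop iteration
        if(*it < *smallest)
            smallest = it;
    }
    ++swaps; // one swap per recursion level
    swap(*smallest, *first);
    selection_sort_counted(first + 1, last);
}

Running this on inputs of size n would show roughly \(n^2/2\) comparisons but only \(n-1\) swaps, regardless of the input order.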
Bubble sort
Bubble sort is a sorting algorithm that isn’t really suited for recursive implementation. The idea is to compare adjacent elements in the input (e.g., a[i] and a[i+1]) and swap them if they are out of order. We start at the beginning of the input and walk through it, swapping our way to the end. After one such pass, the largest element will have “bubbled” up to the last element of the array. So then we can make another pass, but we skip the last element this time. Whereas selection sort made smaller and smaller passes starting from the front, bubble sort makes smaller and smaller passes from the end.
Bubble sort is stable.
An implementation looks something like this:
template<typename T>
void bubble_sort(vector<T>& data) {
for(int j = 1; j < data.size(); ++j)
for(int i = 0; i < data.size() - j; ++i)
if(data.at(i) > data.at(i + 1))
swap(data.at(i), data.at(i + 1));
}
As implemented, this function is of order \(O(n^2)\) in both the best and worst cases: it’s a nested loop, with the same “triangular” structure we saw before. We can actually implement a simple optimization: if a full pass of the inner loop performs no swaps, then the input is sorted, and we can stop. If we check for this, it may allow us to exit early.
template<typename T>
void bubble_sort(vector<T>& data) {
for(int j = 1; j < data.size(); ++j) {
bool sorted = true;
for(int i = 0; i < data.size() - j; ++i)
if(data.at(i) > data.at(i + 1)) {
swap(data.at(i), data.at(i + 1));
sorted = false;
}
if(sorted)
return;
}
}
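For instance, tracing the optimized version on the (made-up) input {3, 1, 2}:

Pass 1: compare 3,1 → swap → {1, 3, 2}; compare 3,2 → swap → {1, 2, 3}
Pass 2: compare 1,2 → no swap, so sorted stays true and we return early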
Now, what are the best and worst cases?
The best case is when the input is already sorted. In this case, we make one pass through the vector (to check whether it is sorted) but don’t swap anything, and then we return. So the best case run time efficiency is \(O(n)\).
The worst case is an array that is sorted in descending order. Every element will have to “bubble” the full distance to its proper place, so we’ll never exit early (due to sorted), and the condition in the if statement will be true every time, so we’ll perform a swap for every comparison. If we work out the sum of comparisons over all the passes, we get
$$\sum_{j = 1}^{n-1} (n - j) = (n-1) + (n-2) + \ldots + 1 = \frac{n(n-1)}{2}$$
giving us an order of \(O(n^2)\).
Correctness of bubble sort
If we want to prove the correctness of bubble sort, as usual, it all comes down to choosing the right invariant(s). Here, it’s helpful if we have two, one for each loop:
template<typename T>
void bubble_sort(vector<T>& data) {
for(int j = 1; j < data.size(); ++j) {
bool sorted = true;
for(int i = 0; i < data.size() - j; ++i)
if(data.at(i) > data.at(i + 1)) {
swap(data.at(i), data.at(i + 1));
// INVARIANT: every element in data[0] to data[i] is no larger
// than data[i+1].
sorted = false;
}
// INVARIANT: every element of data[0] to data[size-j-1] is no
// larger than data[size-j], and the elements data[size-j] to
// data[size-1] are in their final sorted positions. (That is,
// data[size-j] is in the right place.)
if(sorted)
return;
}
}
Insertion sort
Suppose we split the input into two “sections”, sorted and unsorted. The sorted section is initially empty. For each element in the unsorted section, we insert it into the sorted section, in its proper sorted position. Because this is an array/vector, inserting an element requires shifting all the elements after it up one step. We could build a version of this that operates “in place”, on a single array (because when the sorted region grows by 1, the unsorted region shrinks by 1, so the total size is always n), but it will be easier to understand if we write it as a function that copies elements from an unsorted vector into a sorted vector, which is then returned.
Insertion sort is stable.
template<typename T>
vector<T> insertion_sort(const vector<T>& data) {
    vector<T> sorted;
    for(int i = 0; i < data.size(); ++i) {
        const T& item = data.at(i);
        // Find the proper place for item in sorted
        int pos = sorted.size(); // Defaults to the end
        for(int j = 0; j < sorted.size(); ++j)
            if(sorted.at(j) > item) { // strict > keeps the sort stable
                pos = j;
                break;
            }
        // Insert the item at pos, by shifting everything forward
        // (increases the size of sorted by 1)
        sorted.insert(sorted.begin() + pos, item);
    }
    return sorted;
}
What are the best and worst cases for this algorithm? It’s a little bit tricky to figure out, because the complexity of the insert is proportional to the number of elements moved (that is, to sorted.size() - pos). So while the loop to find the position can exit early (even on the first iteration), this is bad for the insert, as it implies that pos will be small and thus the runtime of the insert will be rather bad. On the other hand, if the inner loop runs all the way to the end (pos = sorted.size()), then the insert effectively does nothing more than a push_back.
In fact, the inner loop and the insert always balance each other, so that the total work between them is sorted.size(). And sorted.size() increases by one every time through the outer loop, so once again we have our “triangular” loop.
We can write a recursive version with the help of a utility function:
- insert places an element in its proper place in the sorted region, again by shifting everything up (see the sketch below).
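The text doesn’t pin down insert’s exact signature, so here is a minimal sketch, assuming it takes the bounds of the sorted region and the value to place (the parameter names are invented):

// Insert value into the sorted region [start, unsorted), shifting larger
// elements up one slot. Afterwards the sorted region is [start, unsorted+1).
template<typename Iterator, typename T>
void insert(Iterator start, Iterator unsorted, T value) {
    Iterator pos = unsorted;
    while(pos != start && value < *(pos - 1)) {
        *pos = *(pos - 1); // shift up
        --pos;
    }
    *pos = value; // stopping at the first element <= value keeps the sort stable
}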
With this helper in place, the recursive implementation becomes
template<typename Iterator>
void insertion_sort(Iterator start, Iterator unsorted, Iterator finish) {
    if(unsorted == finish)
        return; // No unsorted elements remain: the whole range is sorted
    else {
        // Insert the first unsorted element into the sorted region
        insert(start, unsorted, *unsorted);
        // Recursively sort the remainder of the unsorted region
        insertion_sort(start, unsorted + 1, finish);
    }
}
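The initial call sets the sorted region to be empty, with everything unsorted; for example (a made-up snippet):

vector<int> v = {3, 1, 2};
insertion_sort(v.begin(), v.begin(), v.end()); // sorted region starts empty
// v is now {1, 2, 3}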