Recursion

I’m going to present pretty much all of the sorting algorithms recursively, so we should probably talk about recursion. Recursion is a really mind-expanding technique, once you get the hang of it. It’s also the foundation for what could be called the “mathematical interpretation” of computer programming, so if you’re a CSci major, you’ll have to get comfortable with it sooner or later. So let’s look at some simple algorithms, both iteratively (using loops) and recursively.

Finding the factorial

The factorial of n is defined as the product \(n (n-1) (n-2) \ldots (2) (1)\), i.e., the product of all positive integers up to and including n. It’s easy to write as a loop:

int factorial_iter(int n) {
    int r = 1; // Factorial of 0 is 1
    for(int i = 1; i <= n; ++i)
        r *= i;
    return r;
}

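To write this, or any other algorithm, recursively, we have to ask two questions: what is the base case (the input small enough to answer directly, without recursion), and what is the recursive case (how the answer for a larger input is built from the answer for a smaller one)?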

For the factorial, the base case is what happens when \(n = 0\): the loop doesn’t run at all, and 1 is returned. So we can start our recursive version with

int factorial_rec(int n) {
    if(n == 0)
        return 1;
    else
        ...
}

To construct the recursive case, we need to look at what happens when n > 0. In particular, how can we break \(n!\) down into some \(n'!\) with \(n' < n\)? The most common case is \(n' = n - 1\).

One way to look at this is to assume that we already have the value of \((n-1)!\), and we want to get \(n!\) from it. That is, assume that factorial_rec(n - 1) will work and give us the right answer; we just need to construct the factorial of n from it. How can we do this? \(n! = n (n-1)!\). So we write our recursive case like this:

int fact(int n) {
    if(n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

Let’s take a minute to walk through the process of computing factorial_rec(3):
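The walk-through can also be reproduced mechanically. The following is a minimal sketch, not part of the original notes: factorial_trace is a hypothetical variant of the recursive factorial that takes two extra parameters (log and depth, both assumptions added purely for illustration) and records every call and every return.

```cpp
#include <cassert>
#include <string>

// Hypothetical tracing variant of the recursive factorial. `log` and `depth`
// are additions for illustration; they do not change the computation.
int factorial_trace(int n, std::string& log, int depth = 0) {
    assert(n >= 0); // The recursion only terminates for n >= 0

    std::string indent(2 * depth, ' ');
    log += indent + "factorial_rec(" + std::to_string(n) + ")\n";

    int result;
    if(n == 0)
        result = 1;                                          // Base case
    else
        result = n * factorial_trace(n - 1, log, depth + 1); // Recursive case

    log += indent + "returns " + std::to_string(result) + "\n";
    return result;
}
```

Called with n = 3, the log shows four nested calls (for 3, 2, 1, and 0). Nothing is multiplied on the way down; the products 1, 1, 2, and 6 are produced as the calls return, innermost first.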

Inductive proof

How do we show that a function does what it is supposed to do? We could test it, running it thousands or millions of times and verifying that its output is what we expect, but this requires us to come up with an independent way to define what the function does (e.g., a different way of computing the factorial), which might itself be incorrect, and furthermore, repeated testing can only ever give us a statistical confidence that our algorithm is correct. If we want to be sure, then we need a logical, or mathematical proof that it is correct. For recursive functions, this often takes the form of proof by induction. An inductive proof is kind of the mathematical equivalent to a recursive function. Like a recursive function it has base case(s) (one base case, in fact, for every base case in the function), and the base cases are usually easy. It also has inductive case(s) (one for each recursive case in the function), which are somewhat more tricky, but allow us to do something like recursion.

Consider the example above. We want to prove that fact(n) = \(n!\), where \(n! = n(n-1)(n-2)\ldots(2)(1)\) for \(n > 0\), and \(0! = 1\).

Proof by induction on n (whatever variable we do the recursion on, we say we are doing “proof by induction” on that variable):
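A sketch of how this proof can go, with one case for each branch of fact (the layout below is an illustration, not necessarily the exact form used in lecture):

```latex
\begin{align*}
\textbf{Base case } (n = 0):\quad
  & \mathtt{fact}(0) = 1 = 0! \\[4pt]
\textbf{Inductive case } (n > 0):\quad
  & \text{Assume } \mathtt{fact}(n-1) = (n-1)! \quad \text{(inductive hypothesis).} \\
  & \mathtt{fact}(n) = n \cdot \mathtt{fact}(n-1) = n \cdot (n-1)! = n!
\end{align*}
```

The base case matches the `n == 0` branch of the function, and the inductive case matches the `else` branch: the assumption about \(n - 1\) plays the same role as trusting the recursive call.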

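Like recursion, the heart of an inductive proof is the act of applying the proof itself as an assumption about “smaller” values (\(n' < n\)). Technically, there are two kinds of inductive proofs: ordinary (weak) induction, where the inductive case assumes the claim only for \(n - 1\), and strong induction, where it assumes the claim for every \(n' < n\).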

The integer exponent calculation

Remember when we worked out the runtime complexity of our “optimized” \(O(\log n)\) function for finding \(b^n\)? We can write a recursive version of that as well. Once again, we have to ask what the base case and the recursive cases are.

This gives us the following definition:

float powi(float b, int n) {
    if(n == 0)
        return 1;
    else if(n % 2 == 0) { 
        // Even: b^n = (b^(n/2))^2
        float fp = powi(b, n / 2);
        return fp * fp;
    }
    else // Odd: b^n = b * b^(n-1)
        return b * powi(b, n - 1);
}

This has the same complexity as the loop-based version, and is arguably simpler.

In this case, if we want to prove that \(\mathtt{powi}(b,n) = b^n\) we’ll need strong induction, because one of the recursive cases shrinks the input by something other than 1 (it halves it).

Proof that \(\mathtt{powi}(b,n) = b^n\) by strong induction on \(n\):
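A sketch of how this proof can go, with one case per branch of powi. It assumes \(n \ge 0\) and that the odd branch returns \(b \cdot \mathtt{powi}(b, n-1)\):

```latex
\begin{align*}
\textbf{Base case } (n = 0):\quad
  & \mathtt{powi}(b, 0) = 1 = b^0 \\[4pt]
\textbf{Hypothesis:}\quad
  & \mathtt{powi}(b, n') = b^{n'} \text{ for every } 0 \le n' < n \\[4pt]
\textbf{Even } n > 0:\quad
  & n/2 < n, \text{ so } \mathtt{powi}(b, n/2) = b^{n/2}, \text{ and} \\
  & \mathtt{powi}(b, n) = b^{n/2} \cdot b^{n/2} = b^{n} \\[4pt]
\textbf{Odd } n > 0:\quad
  & n - 1 < n, \text{ so } \mathtt{powi}(b, n-1) = b^{n-1}, \text{ and} \\
  & \mathtt{powi}(b, n) = b \cdot b^{n-1} = b^{n}
\end{align*}
```

The even case is where ordinary induction would fall short: it needs the claim for \(n/2\), not for \(n - 1\).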

Mutual recursion

Mutual recursion is when we define several recursive functions in terms of each other. For example, consider the following definition of even and odd:

We can then define two functions (predicates) that recursively refer to each other:

bool is_odd(int n); // Forward declaration: is_even uses is_odd before it is defined

bool is_even(int n) {
    if(n == 0)
        return true;
    else if(n == 1)
        return false;
    else
        return is_odd(n - 1);
}

bool is_odd(int n) {
    if(n == 0)
        return false;
    else if(n == 1)
        return true;
    else
        return is_even(n - 1);
}

If we trace the evaluation of is_even(4), we’ll see that it bounces back and forth between is_even and is_odd.
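The bouncing can be made visible with a small sketch. The functions is_even_trace and is_odd_trace below are hypothetical copies of the two predicates with an extra log parameter (an assumption added for illustration) that records each call in order.

```cpp
#include <cassert>
#include <string>

// Forward declaration, needed because each function calls the other.
bool is_odd_trace(int n, std::string& log);

// Hypothetical tracing copy of is_even: appends its own call to `log`.
bool is_even_trace(int n, std::string& log) {
    assert(n >= 0); // Negative inputs would never reach a base case
    log += "is_even(" + std::to_string(n) + ") ";
    if(n == 0)
        return true;
    else if(n == 1)
        return false;
    else
        return is_odd_trace(n - 1, log);
}

// Hypothetical tracing copy of is_odd: appends its own call to `log`.
bool is_odd_trace(int n, std::string& log) {
    assert(n >= 0);
    log += "is_odd(" + std::to_string(n) + ") ";
    if(n == 0)
        return false;
    else if(n == 1)
        return true;
    else
        return is_even_trace(n - 1, log);
}
```

Starting from is_even_trace(4, log), the recorded calls are is_even(4), is_odd(3), is_even(2), is_odd(1), and the final answer, true, comes from the n == 1 base case of is_odd.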

We did a binary search iteratively, but we can do it recursively as well:

This looks like

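template<typename T>
int binary_search(const vector<T>& data, 
                  int low, 
                  int high,
                  const T& target) {
    // Initial call: binary_search(data, 0, (int)data.size() - 1, target).
    // (Default arguments may only appear on trailing parameters and cannot
    // refer to another parameter, so low and high have no defaults.)

    if(low > high)
        return -1; // Empty range: target is not present

    int mid = low + (high - low) / 2; // Why did I do this?

    if(data.at(mid) == target)
        return mid;
    else if(data.at(mid) < target) // Search right
        return binary_search(data, mid+1, high, target);
    else // data.at(mid) > target: search left
        return binary_search(data, low, mid-1, target);
}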

Other examples: Counting the number of copies of a value in a vector. For any vector-style recursion, we need to keep track of our “starting place” within the vector. This is because we can’t make the vector itself smaller, so we have to put a marker into it showing where we are starting. We can do this in two ways: with an int start parameter, or by using iterators.

template<typename T>
int count(const vector<T>& data, int start, const T& target) { // By reference: no copy per call
    if(start == (int)data.size())
        return 0;
    else
        return (data.at(start) == target) +
               count(data, start + 1, target);
}

With iterators:

template<typename T, typename It>
int count(It start, It finish, T target) {
    if(start == finish)
        return 0;
    else
        return (*start == target) + 
               count(start + 1, finish, target);
}

Iterators are kind of like pointers.
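To make the comparison concrete, here is a minimal sketch in which the same recursive counting function is handed plain pointers into an array. The name count_range is an assumption, chosen only to avoid clashing with std::count, and count_sevens_with_pointers is a hypothetical helper for the demonstration.

```cpp
#include <vector>

// Recursive count over the half-open range [start, finish).
// Works with anything that supports ==, unary *, and + 1:
// vector iterators and raw pointers both qualify.
template<typename It, typename T>
int count_range(It start, It finish, const T& target) {
    if(start == finish)
        return 0; // Empty range
    else
        return (*start == target) +
               count_range(start + 1, finish, target);
}

// Raw pointers into a plain array play the role of the two iterators.
int count_sevens_with_pointers() {
    int a[] = {7, 3, 7, 1};
    return count_range(a, a + 4, 7); // a points at the first element, a + 4 one past the last
}
```

With a vector v, the call would be count_range(v.begin(), v.end(), 7); the body of the function does not change at all.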

Sorting algorithms

A sorting algorithm is a function that takes a sequence of items and somehow constructs a permutation of them, such that they are ordered in some fashion. Usually, we want things to be ordered according to the normal comparison operators, so that if a < b then a comes before b in the final permutation. Still, there are a few things we have to make sure we get right:

There are some terms associated with sorting that it’s important to be aware of:

Selection sort

We already looked at selection sorting, so let’s look at it again:

Effectively, selection sort splits the list into the sorted region at the beginning, and the unsorted region at the end. The sorted region grows, while the unsorted region shrinks.

Selection sort is not stable.

Iteratively, this looks like this:

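template<typename T>
void selection_sort(vector<T>& data) { // By reference, so the caller's vector is sorted
    if(data.empty())
        return; // Nothing to sort (and end(data)-1 would be invalid)

    for(auto it = begin(data); it != end(data)-1; ++it) {

        // Find smallest
        auto smallest = it;
        for(auto jt = it+1; jt != end(data); ++jt)
            if(*jt < *smallest) 
                smallest = jt;

        // Swap it into place (swap the elements, not the iterators)
        swap(*it, *smallest);
    }
}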

Let’s trace through this on a small example to get a feel for how it works.
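One way to produce such a trace is to record the contents of the vector after every pass of the outer loop. The sketch below is an illustration, not the version above: selection_sort_trace is a hypothetical name, it works on a copy of a vector of ints, and it uses indices rather than iterators to keep the bookkeeping short.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort on a copy of `data`, returning a snapshot of the
// vector after each pass of the outer loop.
std::vector<std::vector<int>> selection_sort_trace(std::vector<int> data) {
    std::vector<std::vector<int>> passes;

    for(std::size_t i = 0; i + 1 < data.size(); ++i) {
        // Find the smallest element of the unsorted region [i, size)
        std::size_t smallest = i;
        for(std::size_t j = i + 1; j < data.size(); ++j)
            if(data[j] < data[smallest])
                smallest = j;

        // Swap it to the front of the unsorted region
        std::swap(data[i], data[smallest]);
        passes.push_back(data);
    }

    return passes;
}
```

For the input 3 1 4 2, the snapshots are 1 3 4 2, then 1 2 4 3, then 1 2 3 4: after pass k, the first k elements are in their final positions.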

How can we implement this recursively? Instead of passing around the actual vector, we’re just going to pass around the iterators to the beginning and end of the vector. Why we do this will become obvious, shortly:

template<typename Iterator>
void selection_sort(Iterator first, Iterator last) {
    ...
}

Let’s analyze the recursion:

template<typename Iterator>
void selection_sort(Iterator first, Iterator last) {
    if(last - first <= 1)
        return; // 0 or 1 elements, nothing to sort
    else {
        // Find minimum
        Iterator smallest = first;
        for(Iterator it = first; it != last; ++it)
            if(*it < *smallest)
                smallest = it;

        // Swap into place
        swap(*smallest, *first);

        // Recursively sort the remainder
        selection_sort(first + 1, last); 
    }
}

Let’s draw the recursion tree for this. We won’t trace through the loop, we’ll just assume (for now) that it works correctly.

DIAGRAM

Sorting algorithm analysis

Besides analyzing the general best/worst-case big-O runtime of a sorting algorithm, it’s common to also analyze two other runtime features: the number of comparisons and the number of swaps it performs.

Analyzing the number of comparisons and swaps is useful because these operations lie at the heart of any sorting algorithm: we cannot know whether elements are out of order until we compare them (and, if the elements are complex, comparing them may take a non-trivial amount of time), and we cannot put them in the right order without moving them around – i.e., swapping them.
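Both quantities are easy to measure by instrumenting a sort. The following sketch counts them for selection sort; SortCounts and counted_selection_sort are hypothetical names, and unlike the version shown earlier, this one skips the swap when the smallest element is already in place (an assumption made so that the swap count reflects actual movement).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Tallies for one run of a sort (hypothetical helper type).
struct SortCounts {
    long comparisons = 0;
    long swaps = 0;
};

// Selection sort that counts element comparisons and swaps.
template<typename T>
SortCounts counted_selection_sort(std::vector<T>& data) {
    SortCounts counts;

    for(std::size_t i = 0; i + 1 < data.size(); ++i) {
        std::size_t smallest = i;
        for(std::size_t j = i + 1; j < data.size(); ++j) {
            ++counts.comparisons; // One element comparison per inner step
            if(data[j] < data[smallest])
                smallest = j;
        }

        if(smallest != i) { // Only count swaps that move something
            std::swap(data[i], data[smallest]);
            ++counts.swaps;
        }
    }

    return counts;
}
```

For n elements, the comparison count is always \(n(n-1)/2\), regardless of the input order, while the number of swaps is at most \(n - 1\). Sorting 3 1 4 2 this way takes 6 comparisons and 3 swaps; sorting 1 2 3 4 takes 6 comparisons and no swaps.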

Bubble sort

Bubble sort is a sorting algorithm that isn’t really suited for recursive implementation. The idea is to compare adjacent elements in the input (e.g., a[i] and a[i+1]) and swap them if they are out of order. We start at the beginning of the input and walk through it, swapping our way to the end. After one such pass, the largest element will have “bubbled” up to the last element of the array. So then we can make another pass, but we skip the last element this time. Whereas selection sort made smaller and smaller passes starting from the front, bubble sort makes smaller and smaller passes from the end.

Bubble sort is stable.

An implementation looks something like this:

template<typename T>
void bubble_sort(vector<T>& data) {

    for(int j = 1; j < data.size(); ++j)
        for(int i = 0; i < data.size() - j; ++i)
            if(data.at(i) > data.at(i + 1)) 
                swap(data.at(i), data.at(i + 1));
}

As implemented, this function is \(O(n^2)\) in both the best and worst cases: it is a nested loop with the same “triangular” structure we saw before. We can actually implement a simple optimization: if a full pass of the inner loop performs no swaps, then the input is already sorted, and we can stop. Checking for this may allow us to exit early.

template<typename T>
void bubble_sort(vector<T>& data) {

    for(int j = 1; j < data.size(); ++j) {
        bool sorted = true;
        for(int i = 0; i < data.size() - j; ++i)
            if(data.at(i) > data.at(i + 1)) {
                swap(data.at(i), data.at(i + 1));
                sorted = false;
            }

        if(sorted)
            return;
    }
}

Now, what are the best and worst cases?
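One way to explore the question is to count comparisons on inputs that look like good and bad candidates. The sketch below instruments the early-exit version; bubble_sort_comparisons is a hypothetical name, and the function returns the number of element comparisons it made.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Early-exit bubble sort that returns how many element comparisons it made.
template<typename T>
long bubble_sort_comparisons(std::vector<T>& data) {
    long comparisons = 0;

    for(std::size_t j = 1; j < data.size(); ++j) {
        bool sorted = true;
        for(std::size_t i = 0; i + j < data.size(); ++i) {
            ++comparisons;
            if(data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
                sorted = false;
            }
        }

        if(sorted)
            break; // A pass with no swaps: the input is in order
    }

    return comparisons;
}
```

On an already-sorted input of 5 elements, the first pass makes 4 comparisons, performs no swaps, and the function stops: \(n - 1\) comparisons, which is \(O(n)\). On a reverse-sorted input of 5 elements, every pass swaps, so all four passes run, making 4 + 3 + 2 + 1 = 10 comparisons: \(n(n-1)/2\), which is \(O(n^2)\).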

Correctness of bubble sort

If we want to prove the correctness of bubble sort, as usual, it all comes down to choosing the right invariant(s). Here, it’s helpful if we have two, one for each loop:

template<typename T>
void bubble_sort(vector<T>& data) {

    for(int j = 1; j < data.size(); ++j) {
        bool sorted = true;
        for(int i = 0; i < data.size() - j; ++i) {
            if(data.at(i) > data.at(i + 1)) {
                swap(data.at(i), data.at(i + 1));
                sorted = false;
            }

            // INVARIANT (holds whether or not we swapped): every element in
            // data[0] to data[i] is no larger than data[i+1].
        }

        // INVARIANT: every element of data[0] to data[size-j-1] is no larger
        // than data[size-j], and every element of data[size-j+1] to
        // data[size-1] is no smaller than data[size-j]. (That is,
        // data[size-j] is in the right place.)

        if(sorted)
            return;
    }
}

Insertion sort

Suppose we split the input into two “sections”, sorted and unsorted. The sorted section is initially empty. For each element in the unsorted section, we insert it into the sorted section, in its proper, sorted, position. Because this is an array/vector, inserting an element requires shifting all the elements after it up one step. Although we can build a version of this that operates “in place”, on a single array (because when the sorted region grows by 1, the unsorted region shrinks by 1, so the total size is always n), it will be easier to understand if we write it as a function that moves elements from an unsorted vector to a sorted vector, which is then returned.

Insertion sort is stable.

template<typename T>
vector<T> insertion_sort(const vector<T>& data) {
    vector<T> sorted;

    for(int i = 0; i < data.size(); ++i) {
        const T& item = data.at(i); 

        // Find the proper place for item in sorted: just before the first
        // element strictly greater than it. (Using > rather than >= keeps
        // equal elements in their original order, so the sort is stable.)
        int pos = sorted.size(); // Defaults to the end
        for(int j = 0; j < sorted.size(); ++j) 
            if(sorted.at(j) > item) {
                pos = j;
                break;
            }

        // Insert the item at pos, by shifting everything forward
        // (Increases the size of sorted by 1)
        sorted.insert(sorted.begin() + pos, item);
    }

    return sorted;
}

What are the best and worst cases for this algorithm? It’s a little bit tricky to figure out because the complexity of the insert is proportional to the number of elements moved (that is, to sorted.size() - pos). So while the loop to find the position can exit early (even in the first iteration), this is bad for the insert, as it implies that pos will be small and thus the runtime of insert rather bad. On the other hand, if the inner loop runs all the way to the end (pos = sorted.size()) then the insert effectively does nothing more than a push_back.

In fact, the inner loop and the insert always balance each other, so that the total between them is sorted.size(). And sorted.size() increases by one every time through the outer loop, so once again we have our “triangular” loop.

We can write a recursive version with the help of two utility functions:

With these two in place, the recursive implementation becomes

template<typename Iterator>
void insertion_sort(Iterator start, Iterator unsorted, Iterator finish) {
    if(unsorted == finish)
        return; // Unsorted region is empty, nothing left to insert
    else {
        // Insert first unsorted element into sorted region
        insert(start, unsorted, *unsorted); 

        // Recursively sort the remainder of the unsorted region.
        insertion_sort(start, unsorted+1, finish); 
    }

}
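For a version that can be compiled and run on its own, here is a minimal sketch of the same recursion. The helper insert_into_sorted is a hypothetical stand-in for the utility functions, which may differ from the ones used in lecture: it assumes the range [start, item) is already sorted and swaps *item backward until it is in place. The name insertion_sort_rec is also an assumption, used to keep it distinct from the functions above.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: [start, item) is sorted; move *item backward
// until [start, item] is sorted. Using < (not <=) keeps equal elements
// in their original order, so the sort stays stable.
template<typename Iterator>
void insert_into_sorted(Iterator start, Iterator item) {
    while(item != start && *item < *(item - 1)) {
        std::swap(*item, *(item - 1));
        --item;
    }
}

// Sorted region: [start, unsorted). Unsorted region: [unsorted, finish).
template<typename Iterator>
void insertion_sort_rec(Iterator start, Iterator unsorted, Iterator finish) {
    if(unsorted == finish)
        return; // Base case: nothing left to insert

    // Grow the sorted region by one element...
    insert_into_sorted(start, unsorted);

    // ...and recursively handle the rest of the unsorted region.
    insertion_sort_rec(start, unsorted + 1, finish);
}
```

For a vector v, the initial call would be insertion_sort_rec(v.begin(), v.begin(), v.end()), which starts with an empty sorted region. Each recursive call moves the boundary between the two regions one step to the right, so the recursion depth equals the number of elements.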