Next up we’re going to take a look at a new data structure, intended to model the mathematical object of graphs. Mathematically, a graph \(G = (V, E)\) consists of two things: a set of vertices (also called nodes) \(V\), and a set of edges \(E\), where each edge is a pair of vertices \((m,n) \in V \times V\).

A graph is called undirected if every edge goes “both ways” (that is, for every \((m,n) \in E\), we also have \((n,m) \in E\)). A directed graph is one in which at least one edge does not go both ways. Usually we assume that every node has an (implicit) edge to itself (i.e., for all \(n \in V\), \((n,n) \in E\)), but sometimes we have explicit self-edges. Some graphs allow you to mix directed and undirected edges.

Some graphs have labeled edges, meaning the edges have some extra information attached to them. Some graphs have extra information attached to each vertex. Usually we assume that the “names” of vertices are integers starting at 0.

At any particular vertex \(n\), we have several attributes: its degree, the number of edges touching it, and, in a directed graph, its in-degree (the number of edges pointing into it) and out-degree (the number of edges pointing out of it).

Some graphs allow more than one edge between a single pair of vertices (called a multigraph), but usually we assume a single edge. If we only allow a single edge between vertices, then the maximum number of edges is \(|V|^2\). If \(|E|\) is close to \(|V|^2\) then we call the graph dense. If \(|E|\) is much smaller than \(|V|^2\) then we call the graph sparse.

A node with only in-edges is called a sink. A node with only out-edges is called a source. A node that has only out-edges to every other node, and no in-edges, is called a universal source; similarly, a node with only in-edges from every other node (and no out-edges) is a universal sink.

The transpose of a graph is another graph that is formed by reversing the directions of all the edges.

A path is a sequence of nodes connected by edges. The length of a path is one less than the number of nodes in it (so that a single node by itself is a path of length 0). Two nodes are connected if there exists at least one path between them.

An undirected graph is called connected if there is a path between any two vertices. A directed graph is called weakly connected if replacing all directed edges with undirected ones would produce a connected graph. A directed graph is called strongly connected if there is a path between all pairs of vertices.

A cycle is a path of nonzero length that begins and ends at the same vertex. A graph that contains no cycles is called acyclic.

A connected graph where there is at most one path between any pair of nodes is actually a tree.

Graph representations

If that’s a graph mathematically, how do we represent them in the computer? There are two general schemes, with different performance characteristics: the adjacency matrix and the adjacency list.

(Often we need to store some attributes per-vertex: this is easy in either representation, as we can just give ourselves a vector of size \(|V|\).)

Adjacency matrix

An adjacency matrix is a two-dimensional, \(|V| \times |V|\) array of bools, where an element is true if there is an edge between the node given by its row and the node given by its column (in the example below, the column is the “from” node and the row is the “to” node). E.g.,

    0  1  2  3  4  (from)
  +---------------
 0|    T
 1|       T
 2|    T
 3|    T  T
 4| T        T

(to)

If we need to attach extra information to edges, we can store it in the array elements by making them something other than bool. Similarly for a multigraph, we can make the elements ints and store the edge count, or store a linked-list of edges if edges have extra data attached to them.

In this representation, we have the following implementations of some common operations: adding, removing, or checking for an edge between two given vertices takes \(O(1)\) time (a single array access); enumerating all the neighbors of a vertex takes \(O(|V|)\) time (scan a full row or column); and the space required is \(O(|V|^2)\), regardless of how many edges actually exist.
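To make these costs concrete, here’s a minimal sketch of an adjacency-matrix class (the class and method names here are ours, chosen to parallel the adjacency-list class we’ll implement later):

#include <vector>
using namespace std;

// A sketch of an adjacency-matrix graph. We use the convention
// matrix[from][to] == true when there is an edge from -> to.
class adj_matrix {
  public:
    adj_matrix(int node_count)
        : matrix(node_count, vector<bool>(node_count, false)) {}

    // O(1): adding or checking an edge is a single array access
    void create_edge(int from, int to) { matrix.at(from).at(to) = true; }
    bool has_edge(int from, int to)    { return matrix.at(from).at(to); }

    vector<vector<bool>> matrix;
};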

Adjacency list

An adjacency list stores a graph as an array (or vector) of linked lists. Each element of the array represents a vertex, and the elements of its list are all its neighbors (or rather, the edges to all its neighbors).

Depending on the algorithm, one or the other representation may perform better. Most algorithms which run on adj. matrix representations require \(O(|V|^2)\) time; however, there is one interesting case where the adj. matrix representation is faster than an adj. list:

Finding universal sources/sinks

Recall that a universal source is a node with indegree 0 and outdegree \(|V|-1\). A naive way to find a universal source would be to check each node in turn, scanning its row and column of the matrix to confirm \(|V|-1\) out-edges and no in-edges; at \(O(|V|)\) per node, this requires \(O(|V|^2)\) time. However, this ignores information which we discover while processing previous nodes.

Suppose we are looking at an entry in the matrix G[m][n]. This entry is true if there is an edge from node m to node n. Either value tells us something useful: if the entry is true, then n has an in-edge, so n cannot be a universal source; if it is false, then m is missing an out-edge, so m cannot be one. Either way, a single O(1) check eliminates one of the two nodes as a candidate.

We start at i = 0, j = 0, looking at G[i][j] (assuming the diagonal entries are all false), with node i as our candidate. At each step, if G[i][j] is false, we eliminate i and make j the new candidate (i = j); if it is true, we eliminate j. Either way, we then advance j. After \(|V|-1\) such eliminations only one candidate remains, and one final \(O(|V|)\) scan of its row and column confirms or rejects it.

This gives us an \(O(|V|)\) algorithm for finding a universal source. It can be easily modified to find a universal sink.
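Here’s a sketch of the elimination idea in code, using the G[m][n] convention above (the function name is ours):

#include <vector>
using namespace std;

// Sketch: find a universal source in O(|V|) time, given an adjacency
// matrix G where G[m][n] == true means there is an edge m -> n.
// Returns the source's index, or -1 if there is no universal source.
int find_universal_source(const vector<vector<bool>>& G) {
    int V = G.size();
    int c = 0; // current candidate

    // One pass: each comparison eliminates either c or j in O(1),
    // so only one candidate survives.
    for(int j = 1; j < V; j++)
        if(!G[c][j]) // c is missing an out-edge to j, so c is out
            c = j;

    // Verify the surviving candidate: out-edges to everyone else,
    // in-edges from no one.
    for(int j = 0; j < V; j++) {
        if(j == c) continue;
        if(!G[c][j] || G[j][c]) return -1;
    }
    return c;
}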

There are two fundamental algorithms that come up constantly when we work with graphs, breadth-first search (BFS) and depth-first search (DFS), and both have to deal with the problem of searching. Specifically, we want to solve this problem:

Given a starting node and an ending node, determine whether a path exists from start to end, and return the path if it exists.

In both algorithms, we begin at the starting node and proceed to search its neighbors, marking a vertex as “explored” when we have exhausted its possibilities. The difference between the two algorithms is in how they handle branching: BFS explores all of a vertex’s neighbors before moving further out, fanning outward level by level (it uses a queue), while DFS follows a single branch as deep as it can go before backtracking to try the next branch (it uses a stack).

Both BFS and DFS rely on us being able to mark vertices as “explored” or “unexplored”. Actually, they both use three colors: “white” for vertices that are totally unexplored, “gray” for vertices we have discovered but are still processing, and “black” for vertices we are completely finished with.

The idea behind BFS is that when we explore a vertex, we add all its neighbors to the end of a queue. We then dequeue the next vertex and explore that. Thus, the neighbors of a vertex at distance \(d\) from the source will not be explored until all other nodes at distance \(d\) have also been explored. This gives BFS the appearance of exploring a continually growing “fringe” around the starting node. At the edge of the fringe are the nodes which are in the queue. Inside the fringe are nodes which were in the queue at one time, but which have since been explored. Beyond the fringe are the nodes which we haven’t explored yet.

We need to keep track of whether we’ve explored a node or not, so we keep an extra vector<bool> to store whether a particular vertex has been explored. (If the graph is acyclic, we don’t need this; it’s only useful if cycles exist.)

A general sketch of a BFS is something like this:

  1. Mark all vertices unexplored

  2. Mark the starting vertex explored, and enqueue it.

  3. While the queue is not empty, dequeue a node \(n\)

  4. If \(n\) is the ending node, we are done; return True.

  5. Otherwise, mark \(n\) explored, and enqueue all its neighbors that are not already explored. Goto 3.

  6. If the queue is empty and we have not found the ending node, then no path from start to end exists, return False.

(Most descriptions of BFS use colors to distinguish explored/unexplored. A vertex is “white” if it is unexplored, “gray” while we are processing it, and “black” after all its neighbors have been enqueued and we are finished with it. But the algorithm works the same if we only distinguish between “white” and “non-white”.)

This explores each vertex at most once (once we’ve explored a vertex, we never revisit it), so the number of queue operations is proportional to \(|V|\); the total running time also depends on how quickly we can enumerate a vertex’s neighbors, which we’ll return to in the complexity analysis below.

Note that if we run the algorithm until it has explored everything, and there are still unexplored nodes, then the graph is not connected. It’s possible to use this to find the connected components of the graph: mark all explored nodes as being in one component, then repeat the search from a still-unexplored node, and continue until there are no unexplored nodes.
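As a sketch, counting components this way might look like the following; the adjacency structure here (a plain vector of neighbor lists) is a stand-in for the graph class we’ll build later:

#include <queue>
#include <vector>
using namespace std;

// Sketch: count the connected components of an undirected graph by
// repeated BFS. adj[n] lists the neighbors of node n.
int count_components(const vector<vector<int>>& adj) {
    vector<bool> explored(adj.size(), false);
    int components = 0;

    for(int start = 0; start < (int)adj.size(); start++) {
        if(explored[start])
            continue; // already swept up by an earlier search

        components++; // `start` seeds a brand-new component

        // Ordinary BFS from start, marking everything reachable
        queue<int> q;
        explored[start] = true;
        q.push(start);
        while(!q.empty()) {
            int n = q.front(); q.pop();
            for(int m : adj[n])
                if(!explored[m]) {
                    explored[m] = true;
                    q.push(m);
                }
        }
    }
    return components;
}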

If we run the search only until we find the target node, then we call it a search algorithm. If, on the other hand, we run it until it visits all (connected) nodes then we call it a graph traversal algorithm, because it traverses the graph. We could, of course, “traverse” all the nodes in the graph by just doing a loop over all the nodes, but this ignores the connections between them. When we traverse a graph, we would like to do it in a way that gives us some information about connectivity.

Example…

This gives us a true/false value indicating whether or not a path exists, but it doesn’t tell us anything about the path itself. Fortunately, it’s easy to modify the algorithm to create a BFS tree which can actually tell us the path from the starting vertex to any other vertex, if we let the algorithm run until all nodes have been processed (i.e., skip the early return in step 4). To do this, we create a vector of parent pointers (pointers to nodes). We refer to the parent of a node \(n\) as \(n.\pi\). We modify the algorithm as follows:

(step 5) Mark \(n\) explored, and enqueue all unexplored neighbors. When we enqueue a neighbor \(m\), set \(m.\pi = n\). (That is, the parent of a vertex is the vertex that we came from when we explored it.)

When we are done, we can look at the ending node (or any node) and follow its parent-pointers back to the starting node.
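A sketch of that walk, assuming the parent pointers are stored as indices in a vector<int> (with -1 playing the role of a null pointer):

#include <algorithm>
#include <vector>
using namespace std;

// Sketch: recover the path to `end` by walking parent "pointers" back
// to the start. parent[start] == -1; unreached nodes are also -1.
vector<int> extract_path(const vector<int>& parent, int start, int end) {
    vector<int> path;
    for(int n = end; n != -1; n = parent.at(n))
        path.push_back(n);

    if(path.back() != start)
        return {}; // we never reached `end`: no path exists

    reverse(path.begin(), path.end()); // we built it backwards
    return path;
}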

Another easy change is to record for each node its distance from the starting node. We write the distance of a node \(n\) as \(n.d\) and make the following changes:

(step 2) Mark the starting vertex explored, set its distance to 0, and enqueue it.

(step 5) Mark \(n\) explored and enqueue all its unexplored neighbors. When we enqueue a neighbor \(m\), set \(m.d = n.d + 1\).

The paths found by breadth-first search are in fact the shortest paths from the starting node to any other node. We can see that this is the case by virtue of how the BFS explores nodes at increasing distances: if the shortest path from start to finish is of length \(d\), it will be found when the fringe is \(d\) steps from the starting node. This means that the distances computed are also the minimum distances from the starting node.

Example…

Complexity analysis

The time taken by the algorithm depends on whether we use the adjacency list or adjacency matrix representation: with an adjacency list, enumerating the neighbors of every dequeued vertex takes \(O(|E|)\) time in total, so the search is \(O(|V| + |E|)\); with an adjacency matrix, finding the neighbors of a single vertex means scanning a whole row in \(O(|V|)\) time, so the search is \(O(|V|^2)\).

For this algorithm, the adjacency list representation is better.

DFS can be thought of as what you’d get if you replaced the queue from BFS with a stack. Because it uses a stack, we can implement it recursively (letting the function call stack do the work) without too much trouble.

Example…

Often it is useful to color nodes by their exploration: “white” is totally unexplored, “gray” is in the process of being explored (i.e., set to gray at the start of visit) and “black” is done being explored (set to black at the end of visit).

As with breadth-first search, an easy change to DFS is to add parent pointers, constructing the DFS-tree, so that we can recover the path from start to finish. Note, however, that unlike BFS, this path is most likely not the shortest path in any sense. It is merely the first path discovered by the search.

Another common change is to record two timestamps for each node: \(u.d\), the “discovery” time, recorded when we first reach \(u\) (and color it gray); and \(u.f\), the “finishing” time, recorded when we are completely done with \(u\) (and color it black).

For any given node \(u\), the values of \(u.d\) and \(u.f\) have some interesting properties: in particular, the intervals \([u.d, u.f]\) of any two nodes are either disjoint or nested one inside the other, like balanced parentheses.

Example…

By examining the colors of neighboring nodes when we visit them, we can extract some interesting information: an edge to a white neighbor is a tree edge (part of the DFS-tree); an edge to a gray neighbor is a back edge, which tells us the graph contains a cycle; and an edge to a black neighbor leads into an already-finished region of the graph (a forward or cross edge).

Directed acyclic graphs and topological sort

A directed acyclic graph is a directed graph which contains no cycles. DAGs occur fairly often when talking about tasks and dependencies. If we have a set of tasks, where tasks can depend on other tasks, we cannot allow cycles, or the tasks could never be completed. But it is possible for one task to depend on multiple other tasks, or for multiple tasks to all depend on the same task. Given such a graph, we want to output all the tasks in dependency order, the order we would have to complete them in so that every task’s dependencies are completed before it. This is called a topological sort of the DAG, and it’s easy to find with DFS:

When we finish visiting a node, add it to the front of a linked list. When the DFS is finished, the linked list contains the nodes in topological order (in reverse finishing-time order.)
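A sketch of this, as free functions over a plain adjacency structure (adj[n] holds the targets of node n’s out-edges); we assume the input really is a DAG, with no cycle check:

#include <list>
#include <vector>
using namespace std;

// Sketch of topological sort via DFS. When a node finishes, we prepend
// it to the list, so every node ends up before everything it points to.
void topo_visit(const vector<vector<int>>& adj, int n,
                vector<bool>& explored, list<int>& order) {
    explored[n] = true;
    for(int m : adj[n])
        if(!explored[m])
            topo_visit(adj, m, explored, order);
    order.push_front(n); // finished with n
}

list<int> topo_sort(const vector<vector<int>>& adj) {
    vector<bool> explored(adj.size(), false);
    list<int> order;
    for(int n = 0; n < (int)adj.size(); n++)
        if(!explored[n])
            topo_visit(adj, n, explored, order);
    return order;
}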

Graph Implementation

There are various ways to implement graphs in C++. The “best” implementations make writing the various graph algorithms straightforward, but that requires a lot of work on the implementor’s part (writing custom iterators and working with templates and such). We’ll write a simple directed graph with weighted edges, where we can ignore the edge weights if we want to.

#include <vector>
#include <list>
#include <queue>
#include <functional>
using namespace std;

class adj_list {
  public:
    struct edge_type {
        float weight;
        int destination;
    };

    adj_list(int node_count) : edges(node_count) {}

    void create_edge(int src, int dest, float w = 1); // defined below
    bool has_edge(int src, int dest);
    void bfs(int start, function<void(int)> visit);
    void dfs(int start, function<void(int)> visit);

    vector<list<edge_type>> edges;

  private:
    void dfs(int start, function<void(int)> visit, vector<bool>& explored);
};

Nodes are identified by the indices in the edges vector, so we can add methods to create new edges and check whether an edge exists:

void adj_list::create_edge(int src, int dest, float w) {
    edges.at(src).push_back(edge_type{w, dest});
}

bool adj_list::has_edge(int src, int dest) {
    for(edge_type& e : edges.at(src))
        if(e.destination == dest)
            return true;

    return false;
}
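Using it looks like this:

adj_list g{5};             // a graph with nodes 0..4
g.create_edge(0, 1);       // edge 0 -> 1, default weight 1
g.create_edge(1, 2, 2.5);  // edge 1 -> 2, weight 2.5

g.has_edge(0, 1);          // true
g.has_edge(1, 0);          // false: edges are directed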

Implementing DFS and BFS over this structure is not too hard:

void adj_list::bfs(int start, function<void(int)> visit) {
    queue<int> q;
    // Note (), not {}: braces would create a one-element vector
    vector<bool> explored(edges.size(), false);

    explored.at(start) = true; // mark nodes as we enqueue them, so
    q.push(start);             // that no node enters the queue twice

    while(!q.empty()) {
        int n = q.front();
        q.pop();

        visit(n);

        for(auto e : edges.at(n))
            if(!explored.at(e.destination)) {
                explored.at(e.destination) = true;
                q.push(e.destination);
            }
    }
}

This version just does a traversal over the entire graph, calling the function-object visit on each node. It does not retain the parent-tree, or keep distances, though adding those is relatively easy.
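For instance, a sketch of a variant that records both (bfs_tree and its two output vectors are our own additions, and would need a matching declaration in the class):

// Sketch: BFS that also records the parent-tree and distances,
// following the modified steps 2 and 5 described earlier.
void adj_list::bfs_tree(int start, vector<int>& parent, vector<int>& dist) {
    parent.assign(edges.size(), -1); // -1: no parent
    dist.assign(edges.size(), -1);   // -1 doubles as "unexplored"

    queue<int> q;
    dist.at(start) = 0;
    q.push(start);

    while(!q.empty()) {
        int n = q.front();
        q.pop();

        for(auto e : edges.at(n))
            if(dist.at(e.destination) == -1) {
                dist.at(e.destination) = dist.at(n) + 1; // m.d = n.d + 1
                parent.at(e.destination) = n;            // m.pi = n
                q.push(e.destination);
            }
    }
}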

void adj_list::dfs(int start, function<void(int)> visit) {
    vector<bool> explored(edges.size(), false); // again (), not {}

    dfs(start, visit, explored);
}

void adj_list::dfs(int start,
                   function<void(int)> visit,
                   vector<bool>& explored) {

    visit(start);
    explored.at(start) = true;

    for(auto e : edges.at(start))
        if(!explored.at(e.destination))
            dfs(e.destination, visit, explored);
}

Once again, we just do a simple traversal, and don’t maintain the parent-tree or the timestamps.
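As a sketch, a DFS that does track colors and timestamps might look like this (the color enum, the method names, and the time counter are illustrative additions to the class); d[n] and f[n] play the roles of n.d and n.f from the discussion above:

enum color { white, gray, black };

void adj_list::dfs_visit(int n, vector<color>& c,
                         vector<int>& d, vector<int>& f, int& time) {
    c.at(n) = gray;   // discovered, but not finished
    d.at(n) = time++; // discovery timestamp

    for(auto e : edges.at(n))
        if(c.at(e.destination) == white)
            dfs_visit(e.destination, c, d, f, time);
        // (a gray destination here would be a back edge: a cycle)

    c.at(n) = black;  // completely finished
    f.at(n) = time++; // finishing timestamp
}

void adj_list::dfs_timestamps(int start, vector<int>& d, vector<int>& f) {
    vector<color> c(edges.size(), white); // everything starts unexplored
    d.assign(edges.size(), 0);
    f.assign(edges.size(), 0);
    int time = 0;
    dfs_visit(start, c, d, f, time);
}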

Weighted graph algorithms

Weighted graphs have many physical applications: we can think of the edges as being (for example) roads between cities, in which case the weights become mileage. If the graph represents a network of pipes, then the weights might be the flow capacity of a given pipe. Given this interpretation, we can ask for some operations that we might like to perform on weighted graphs: for example, finding the lowest-total-weight path between two nodes, or finding a cheapest set of edges that connects all the nodes together (a minimum spanning tree).

We’re going to assume that nodes in our graphs have a structure like this:

struct node {
  float d;      // Shortest distance to this node
  node* parent; // Pointer to shortest-parent node 
};

Shortest path in a weighted graph

What if the edges have weights, how can we find the shortest path(s) in terms of edge weights? (The path weight of a path is just the sum of all the weights along it.)

First of all, when does the shortest path even exist? Example: (graph with neg weight cycles). Does the shortest path exist? No. Example (graph with neg edges, but no neg-weight cycles). Negative weight edges are OK, but neg-weight cycles are not. If a negative-weight cycle exists anywhere along a path from start to finish, then the “shortest” path does not exist (because we could always make it shorter, by taking another trip around the cycle). We’ll also assume in the following discussion that the shortest path from start to finish actually exists.

Assuming there are no negative weight cycles, how would we find the shortest path from start to finish? Well, to start with, what is the longest length (in terms of number of edges) that the shortest path could be, in the worst case? It’s \(|V|-1\). Why? Because that path allows us to visit every vertex once. If we visit any vertex more than once, then we must have a cycle in our path, and without negative-weight cycles, any other cycles can only make a path worse than it would be if we removed them. (Demo)

So in the worst case, the longest possible shortest-path has \(|V|\) nodes and \(|V|-1\) edges and no cycles. Let’s take a more local view.

For the starting node, what is the shortest path to the starting node itself? The empty path, which contains only the starting node and zero edges. (And note that the path-weight of this path is also 0.)

For any of the starting node’s neighbors, can we say that the shortest path is the one via the edge that connects them? Not necessarily:

      4
a ------> b
 \       ^
  \2    /1
   \   /
    V / 
     c

The direct path a -> b is longer, in terms of path weight, than the path a -> c -> b. So we can’t just set the distance to b to 4 and be done with it; we have to be prepared for the possibility that this value will be reduced by a later path. This is different from the BFS, where once we set the distance to a node, it was fixed.

In order to allow for reducing the distance to a node, we use an operation called relax:

// The nodes must be passed by reference, or the update would be lost
void relax(node& a, node& b, float weight) {
  if(a.d + weight < b.d)
    b.d = a.d + weight;
}

We are re-using the .d distance attribute from the BFS. The idea is that if there is an edge from a to b with weight weight, we check it against the current distance to b and see if using this edge would be better: if so, we adjust the distance to b. Note that relax can only ever make a node’s distance smaller: a call to relax can never increase the distance of a node. This seemingly-simple fact will be important later. We’ll also initially set the distance to every node except the starting node to \(\infty\) (with the assumption that \(\infty\) can be used in comparisons like in relax).

Suppose we relax all the edges along the shortest path: does doing so set their distances to their final values? Yes, but only if we relax them in order, from start to finish. And besides, we don’t know what the shortest path is, so we can’t do that. What happens if we relax every edge in the graph? After doing this, at least one additional node (beyond the starting node) will now have its final distance, and, since relax never increases distances, this node is now “done”; we can do more relaxations without worrying about it.

Put another way, after running relax on all edges, at least one node adjacent to the starting node will have its final distance (it may be more than one).

What if we run another “relax all edges” cycle after this? If a shortest path exists, then another cycle will “lock in” the distance of another node. Thus, every time we “relax all edges”, another node gets its final distance. In the worst case, the shortest path has \(|V|\) nodes and \(|V|-1\) edges, so it would take \(|V|-1\) “relax all edges” cycles to finalize the distance at the target node.

This brings us to the Bellman-Ford algorithm:

  1. Set the starting node’s distance to 0, and set every other node’s distance to \(\infty\).

  2. Repeat the following \(|V| - 1\) times:

  3. For every edge

  4. Relax that edge

Note that the order in which we relax edges within a cycle does not matter, because we “relax all edges” enough times that even if we did them in a terrible order, we would still construct the shortest path.

With a simple extension, we can also detect, after the algorithm completes, whether there are any negative-weight cycles. (A nice feature of Bellman-Ford is that it will complete even if there are neg.-weight cycles. The next algorithm we’ll look at, Dijkstra’s algorithm, will go into an infinite loop!)

  1. For each edge a -> b with weight weight, if a.d + weight < b.d then a negative weight cycle exists.

This just checks every edge to see if it’s possible for it to be relaxed even further. Remember that we demonstrated above that the algorithm should have finalized the distances to all nodes if there were no negative-weight cycles. If there are, then it is not possible to finalize the distances, because any path that touches the neg-weight cycle could always be made “shorter”.

Note that we can also easily determine if a shortest path exists: if the target node’s distance is still \(\infty\) after the algorithm completes, then no shortest path exists.

If we want to actually find the shortest path itself, and not just its path length, this can be accomplished with an easy extension to relax:

void relax(node& a, node& b, float weight) {
  if(a.d + weight < b.d) {
    b.d = a.d + weight;
    b.parent = &a; // remember which node the shortest path came from
  }
}

This saves a pointer to the node that we came from in b.parent. The last execution of the if body for a given node both “locks in” its final distance and saves the pointer to the parent that leads back along the shortest path. After completion, every node has both its distance from the starting node and a pointer to a node that leads backwards along the shortest path to the start. (Thus, Bellman-Ford actually finds the shortest paths from the starting node to every other node.)

What is the complexity of this algorithm? It’s easy to determine, because the two loops are independent of each other: \(O(|V| \times |E|)\). (This is assuming the adjacency list representation, where we can iterate over all the edges in \(O(|E|)\) time. In the adjacency matrix, iterating over all edges takes \(O(|V|^2)\) time, which means that the total runtime is actually \(O(|V|^3)\)!)
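Putting the pieces together, here is a sketch of Bellman-Ford over our adj_list class (a hypothetical addition to it); rather than node structs, we use a dist vector for the .d values and a parent vector of indices, with -1 as the null parent:

#include <limits>

// Sketch of Bellman-Ford. Returns false if a negative-weight cycle
// is detected, true otherwise.
bool adj_list::bellman_ford(int start, vector<float>& dist,
                            vector<int>& parent) {
    const float inf = numeric_limits<float>::infinity();
    dist.assign(edges.size(), inf); // step 1: everything infinitely far...
    parent.assign(edges.size(), -1);
    dist.at(start) = 0;             // ...except the start

    // Steps 2-4: |V| - 1 rounds of "relax every edge"
    for(int round = 1; round < (int)edges.size(); round++)
        for(int a = 0; a < (int)edges.size(); a++)
            for(auto e : edges.at(a))
                if(dist.at(a) + e.weight < dist.at(e.destination)) {
                    dist.at(e.destination) = dist.at(a) + e.weight;
                    parent.at(e.destination) = a;
                }

    // Extension: if any edge can still be relaxed, there must be a
    // negative-weight cycle somewhere.
    for(int a = 0; a < (int)edges.size(); a++)
        for(auto e : edges.at(a))
            if(dist.at(a) + e.weight < dist.at(e.destination))
                return false;
    return true;
}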

An example of running the Bellman-Ford algorithm:

Dijkstra’s algorithm

Bellman-Ford is an exhaustive algorithm: it basically tries everything, which allows it to work (or at least, terminate) even in the presence of negative-weight cycles. We can do better if we assume that there are no negative-weight cycles and take a greedy approach. This means that instead of trying everything, we’re going to give priority to the option that “looks” the best. In this case, “best” means smallest distance: the next node to be processed will be the node with the smallest total distance.

Dijkstra’s algorithm is similar to breadth-first search in that it uses a queue, however, it uses a min-priority queue, where items are enqueued with a priority, and the item with the smallest priority is the next one to be dequeued. The priority we’ll use is the node’s current distance estimate. Note that when we relax an edge, this might change the distance to the node on the other end, which in turn might change its position within the heap!

A review of min-heaps: a min-heap is a binary tree (typically stored flattened into an array) in which every node has a smaller priority than its children, so the minimum element is always at the root. insert and extract_min each take \(O(\log n)\) time, and when an element’s priority decreases, a reduce operation can restore the heap property in \(O(\log n)\) time by floating the element up.

When running Dijkstra’s algorithm, each time through the main loop we “lock in” another node. In order for this to work, we must process nodes in the order they occur along the shortest paths.

  1. Set the distance to all nodes other than the start to \(\infty\). Set the starting node’s distance to 0.

  2. Enqueue all nodes. (Note that the only node with a non-infinite distance will be the starting node, so it will be the first node to be dequeued.)

  3. While the queue is not empty:

  4. Let a = q.extract_min()

  5. Relax every outbound edge from a, updating queue priorities as needed

Example…

This algorithm is effectively the original BFS but with path weights substituted for the simple path length “distance” that that algorithm relied on. The same property applies here: At any given time, the “fringe” will be at a certain distance (total path weight) from the starting node, and the algorithm proceeds by expanding the fringe.

The complexity of Dijkstra’s algorithm is complicated by the fact that each time we dequeue an element, we update the distances of its neighbors, which may dramatically restructure the queue. Whenever we relax a node, reducing its distance, we must also use the heap algorithm reduce to adjust its position within the queue, which requires \(O(\log n)\) time in terms of the size of the queue. But the queue is always shrinking, because we remove one node from it each time through the loop, and new nodes are never added into it.

Interestingly, we actually have two choices as to how to implement the heap: a true binary min-heap, where extract_min and reduce each cost \(O(\log |V|)\), giving \(O((|V| + |E|) \log |V|)\) overall; or a plain unsorted array, where “extract-min” is an \(O(|V|)\) scan but reduce is \(O(1)\), giving \(O(|V|^2)\) overall. The heap wins on sparse graphs, while the array can actually be better on very dense ones.

This assumes we are using an adj. list. Using an adj. matrix adds an extra \(|V|\) to find the neighbors of a node.
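Here is a sketch of the algorithm over our adj_list class (the method is our own addition). One caveat: std::priority_queue has no reduce (decrease-key) operation, so instead of adjusting priorities in place as described above, this version pushes a duplicate entry whenever a node’s distance improves and skips stale entries as they are dequeued; a common workaround that computes the same distances:

#include <limits>
#include <queue>
#include <utility>
#include <functional>

// Sketch of Dijkstra's algorithm with "lazy deletion" in place of a
// decrease-key operation.
void adj_list::dijkstra(int start, vector<float>& dist) {
    const float inf = numeric_limits<float>::infinity();
    dist.assign(edges.size(), inf);
    dist.at(start) = 0;

    using entry = pair<float, int>; // (distance estimate, node)
    priority_queue<entry, vector<entry>, greater<entry>> q;
    q.push({0, start});

    while(!q.empty()) {
        auto [d, n] = q.top();
        q.pop();
        if(d > dist.at(n))
            continue; // stale entry: n was already locked in

        for(auto e : edges.at(n)) // relax every outbound edge of n
            if(dist.at(n) + e.weight < dist.at(e.destination)) {
                dist.at(e.destination) = dist.at(n) + e.weight;
                q.push({dist.at(e.destination), e.destination});
            }
    }
}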

Note that Dijkstra’s algorithm does not reliably work if there are any negative edges (it may produce incorrect results even if there are no negative-weight cycles). Dijkstra’s algorithm fails in the presence of negative edges because it is greedy: it commits too early to what appears to be the shortest path. If there are no negative weight edges, then any additional edges will at best leave the path weight unchanged (if the weight is 0), so it makes sense to choose the best option we have right now. But if there are negative edges, any amount of seemingly-bad choices right now could be completely undone by a large negative edge later on. Example:

Weighted undirected graph algorithms

We’ll only look at one algorithm on weighted undirected graphs: Prim’s algorithm for minimum spanning trees.

A minimum spanning tree is a subset of \(E\) that still connects all the vertices, has no cycles, and where the sum of all the edge weights is as small as possible. For example:

There are two general approaches to building an MST: start with all the nodes and no edges, and repeatedly add the lightest edge that doesn’t create a cycle, letting many small trees gradually merge together (Kruskal’s approach); or pick a starting node and grow a single tree outward from it, one edge at a time (Prim’s approach).

The first method requires maintaining a “forest” of separate trees which are only joined into a single tree at the very end, so we’ll look at the second method. This works by picking a starting node (because all nodes must be in the tree, it doesn’t matter which) and then picking an edge to one of its neighbors to add to the tree.

The key idea is to imagine that we already have a partial MST: some vertices and edges that have already been selected to be “in” the MST. We are now looking at new edges to add to the tree. In order to add an edge, it must fulfill two criteria: it must not form a cycle with the edges already in the tree, and adding it must preserve the partial-MST property, so that what we have can still grow into a minimum spanning tree.

We call an edge “safe” if it fulfills both criteria. The MST algorithm can be thought of as just repeatedly finding and adding “safe” edges until a tree is formed.

The first criterion is easy, especially if we grow the tree by edges: we already know when a node is in the tree, so when we add an edge we just have to make sure that one of its endpoints is in the tree, and the other is not.

The second criterion is more difficult to quantify. When does adding a (non-cycle-forming) edge preserve the partial-MST property? Let us look at all the edges that fit the first criterion: they touch one node inside the tree, and one node outside it. Example:

Which of the highlighted edges is “safe”? The one with the minimum weight of all these edges. We call an edge like this a light edge: it’s a major theorem about minimum spanning trees that light edges are always safe.

It might not be obvious why the minimum-weight edge would be safe. Couldn’t we run into a situation like in the shortest-weighted-path algorithm, where we think we have found the smallest weight, but later we discover a better option? As it turns out, no, this cannot happen. Remember that we assume that we already had a partial MST to start with: if we already have a partial MST, then growing it by the light edge is the correct choice. If we don’t have a partial MST, then growing it by a minimum-weight edge might be the wrong choice; but if we start with the smallest possible partial MST (a single node) and grow it safely every time, this situation will never arise.

This brings us to the implementation of Prim’s Algorithm, which, like Dijkstra’s, relies on the use of a min-priority queue to keep track of what to do next.

  1. Set the starting node’s priority to 0, set every other node’s priority to \(\infty\), and set the parent pointers of all other nodes to nullptr.

  2. Enqueue all nodes (again, noting that the starting node will be the first node to be dequeued).

  3. While the queue is not empty:

  4. Let a = q.extract_min()

  5. For every b which is adjacent to a and still in the queue:

  6. If weight(a,b) < b.d, set b.d = weight(a,b) and b.parent = a. (Note that because b.d is b’s priority, this will trigger a reduce heap operation.)

At any given time, the queue contains the nodes which are still outside the tree. A node’s “distance” is repurposed as the minimum weight of all edges that connect it to a node inside the tree. The parent pointers form the tree itself, by connecting all nodes back, eventually, to the starting node.

Example:

The runtime of Prim’s algorithm is \(O(|E| \log |V|)\), again assuming the adj. list representation.
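A sketch in code (again, a hypothetical addition to adj_list), using the same lazy-deletion workaround as our Dijkstra sketch; since adj_list is directed, we assume every undirected edge has been added in both directions:

#include <limits>
#include <queue>
#include <utility>
#include <functional>

// Sketch of Prim's algorithm. parent[] ends up describing the MST:
// each node's parent is the tree node it was connected through.
void adj_list::prim(int start, vector<int>& parent) {
    const float inf = numeric_limits<float>::infinity();
    vector<float> best(edges.size(), inf); // lightest known edge into each node
    vector<bool>  in_tree(edges.size(), false);
    parent.assign(edges.size(), -1);

    using entry = pair<float, int>; // (edge weight, node)
    priority_queue<entry, vector<entry>, greater<entry>> q;
    best.at(start) = 0;
    q.push({0, start});

    while(!q.empty()) {
        int a = q.top().second;
        q.pop();
        if(in_tree.at(a))
            continue;         // stale duplicate entry
        in_tree.at(a) = true; // a joins the tree via parent[a]

        for(auto e : edges.at(a)) // edges from a that leave the tree
            if(!in_tree.at(e.destination) && e.weight < best.at(e.destination)) {
                best.at(e.destination) = e.weight; // a lighter way in
                parent.at(e.destination) = a;
                q.push({e.weight, e.destination});
            }
    }
}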