Thursday, February 28, 2013

Topological Sorting

A topological ordering of a directed graph G with n nodes is a labeling f of G's nodes such that:
(1) the f(v)'s are the set {1, 2, ..., n}
(2) if (u, v) is an edge of G, then f(u) < f(v)

Every directed acyclic graph has a sink vertex - a vertex that has no outgoing edges.

Here is a slick way to do a topological sorting with DFS:

DFS-loop(graph G)
    mark all nodes unexplored
    current_label = n (number of nodes, to keep track of ordering)
    for each v in G
        if v not yet explored
            DFS(G, v)

DFS(graph G, vertex start)
    mark start explored
    for every edge(start, v)
        if v not yet explored
            DFS(G, v)
    set f(start) = current_label
    current_label--

This runs in linear time because we do a constant amount of work at each node and traverse each edge only once, since we keep track of which nodes have and have not been explored.
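
As a rough sketch, here is what the DFS-loop approach above might look like in Java, assuming the graph is stored as adjacency lists (adj.get(v) holds the heads of v's outgoing edges); the class and field names are just for illustration:

import java.util.List;

public class TopologicalSort {
    private final List<List<Integer>> adj; // adj.get(v) = heads of v's outgoing edges
    private final boolean[] explored;
    private final int[] f;                 // f[v] = topological label of v
    private int currentLabel;

    public TopologicalSort(List<List<Integer>> adj) {
        this.adj = adj;
        this.explored = new boolean[adj.size()];
        this.f = new int[adj.size()];
        this.currentLabel = adj.size();    // start at n and count down
    }

    // DFS-loop: run DFS from every not-yet-explored vertex.
    public int[] sort() {
        for (int v = 0; v < adj.size(); v++) {
            if (!explored[v]) {
                dfs(v);
            }
        }
        return f;
    }

    // Label a vertex only after all of its outgoing edges have been explored.
    private void dfs(int start) {
        explored[start] = true;
        for (int v : adj.get(start)) {
            if (!explored[v]) {
                dfs(v);
            }
        }
        f[start] = currentLabel--;
    }
}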

To prove correctness, we need to show that if (u, v) is an edge, then f(u) < f(v).
Case 1: If u is visited by DFS before v, the recursive call corresponding to v finishes before that of u. Since labels are assigned in decreasing order as calls finish, f(v) > f(u).
Case 2: If v is visited before u, there is no path from v to u, because such a path together with the edge (u, v) would form a directed cycle. So the recursive call for v finishes before u's call even starts, and again f(u) < f(v).


Tuesday, February 26, 2013

Depth First Search

This algorithm explores aggressively and backtracks only when necessary. This is similar to how we play a maze game.

To code DFS, we can mimic BFS, but use a Stack instead of a Queue. Another way to code this would be to do it recursively.
DFS(graph G, vertex s)
     mark s as explored
     for every edge (s, v):
          if v unexplored
               DFS(G, v)
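
And a minimal sketch of the stack-based variant in Java (adjacency lists assumed again; the visiting order can differ slightly from the recursive version, but it is still a depth-first exploration):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class IterativeDFS {
    // Marks every vertex reachable from s, using an explicit stack instead of recursion.
    public static boolean[] dfs(List<List<Integer>> adj, int s) {
        boolean[] explored = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(s);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (explored[v]) {
                continue;          // already handled via another edge
            }
            explored[v] = true;
            for (int w : adj.get(v)) {
                if (!explored[w]) {
                    stack.push(w); // last pushed is explored first (LIFO)
                }
            }
        }
        return explored;
    }
}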

Properties
At the end of DFS, every vertex that could have been reached from the starting vertex will have been reached. The running time is also O(n_s + m_s), where n_s and m_s are the numbers of nodes and edges reachable from the starting vertex s.

BFS and Undirected Connectivity

Let G = (V, E) be an undirected graph. The connected components of G are the "pieces" of G. The goal is to compute all connected components. A good application for this is to check whether a network is disconnected, or to see if everything in a graph can be reached from everything else.

To compute all components:
Mark all nodes unexplored
for i=1 to n
     if i not yet explored
          BFS(G, i)

The call BFS(G, i) discovers precisely i's connected component. This algorithm can't miss any nodes because the outer loop walks through every node, so it must find every connected component. If we wanted the number of connected components, we could keep a count variable and increment it each time we actually call BFS.
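
A minimal sketch of that counting idea in Java, assuming the undirected graph is stored as adjacency lists with vertices numbered 0 to n-1 (the names are illustrative):

import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class ConnectedComponents {
    // Returns the number of connected components of an undirected graph.
    public static int count(List<List<Integer>> adj) {
        int n = adj.size();
        boolean[] explored = new boolean[n];
        int components = 0;
        for (int i = 0; i < n; i++) {
            if (!explored[i]) {
                components++;          // i starts a brand-new component
                bfs(adj, i, explored); // marks everything in i's component
            }
        }
        return components;
    }

    private static void bfs(List<List<Integer>> adj, int s, boolean[] explored) {
        Queue<Integer> queue = new ArrayDeque<>();
        explored[s] = true;
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (!explored[w]) {
                    explored[w] = true;
                    queue.add(w);
                }
            }
        }
    }
}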

Breadth First Search


BFS(graph G, vertex s)
[all nodes initially unexplored]
-mark s as explored
let Q = queue data structure, initialized with s
while (Q not empty)
     remove the first node of Q, call it v
     for each edge (v, w):
          if w is unexplored
               mark w as explored
               add w to Q (at the end)

Layer i+1 gets added to the queue while we process layer i. We visit each node at most once and each edge at most twice (for an edge (v, w), we might encounter it once when we're at node v and once at node w).

Using breadth first search, we can implement a shortest path algorithm quite easily. The goal is to compute dist(v), the fewest number of edges on a path from s to v.

The only extra code we need to incorporate is:
initialize dist(v) = {0 if v=s, infinity if v != s}. When considering edge (v, w), if w is unexplored, then set dist(w) = dist(v)+1.
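
A sketch of BFS with that distance bookkeeping in Java (adjacency lists assumed; -1 stands in for infinity to mark unreachable vertices):

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class ShortestPathsBFS {
    // Returns dist[v] = fewest edges on a path from s to v, or -1 if v is unreachable.
    public static int[] distances(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);        // -1 plays the role of "infinity"
        dist[s] = 0;
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v)) {
                if (dist[w] == -1) {  // w unexplored
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }
}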

Monday, February 25, 2013

Minimum Cut Problem

 The minimum cut problem is to determine the cut which has the least number of crossing edges. This is used when you want to break a graph down into two or more pieces. For example, in networks, the minimum cut can help determine where hot spots and weaknesses are.

Another example is image segmentation. An image can be thought of as a graph of pixels. Each pixel has an edge connecting to adjacent pixels, each with an edge weight which is how much you "expect" the two to be of the same object. Thus, by computing a minimum cut, we can essentially take out that object from the image.

Closest Pair Algorithm

Say you are given a set of points in the plane and you want to find the two points closest to one another. The brute force method of doing this would be to compute the distance between every possible pair of points and keep track of the minimum. This will take O(n^2) because there is a quadratic number of pairs.

Let's assume that this problem were on a line, or 1-D. Then we can just sort all the points, O(nlogn) at best, and then run through the line to find the closest pair which must be adjacent points, O(n). So can we find a similar method that will solve the problem in 2-D?
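
As a quick sketch of the 1-D case in Java (sort, then scan adjacent pairs; the method name is just illustrative):

import java.util.Arrays;

public class ClosestPair1D {
    // Smallest gap between any two points on a line: O(n log n) sort plus an O(n) scan.
    public static double closestGap(double[] points) {
        double[] sorted = points.clone();
        Arrays.sort(sorted);
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < sorted.length; i++) {
            best = Math.min(best, sorted[i + 1] - sorted[i]); // closest pair must be adjacent
        }
        return best;
    }
}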

We can sort the points by x-coordinate and by y-coordinate as a preprocessing step, which will just take O(nlogn). Unlike the 1-D case, though, the closest pair need not be adjacent in either sorted order, so sorting alone is not enough. If we wish to apply the divide and conquer method to this problem, we might think to split the problem in half, recursively do work in each half, and combine the solutions of the subproblems to recover the solution to the original problem.

Letting P be the set of points, we can let Q be the left half of P and R be the right half of P, which gives us the sorted lists Q_x, Q_y, R_x, and R_y (each half sorted by x- and by y-coordinate).

Graph Search

The goal of graph search is to find every node that is reachable from a starting node. We don't want to explore anything twice so that we can get O(m+n) running time, linear in the number of edges and vertices.

The generic algorithm for this would be as follows:
Given (graph G, vertex s)
-initially explore s, all other vertices unexplored
-while possible:
    -choose an edge (u, v) with u explored and v unexplored
    -mark v explored
-halt
At the end, if a node v is explored, then G has a path from s to v.

What gives rise to multiple algorithms of this general sort is in the ambiguity in choosing the next edge.

In Breadth First Search (BFS), we explore the graph in layers. The starting node is in layer 0, its neighbors are in layer 1, and in general any node in layer i is connected to the starting node by a path of i edges. BFS is good for computing shortest paths and the connected components of an undirected graph. It can be implemented with a First In First Out data structure, such as a Queue.

Depth First Search (DFS), on the other hand, explores aggressively and backtracks only when necessary. It can be used to compute a topological ordering of a directed acyclic graph as well as the strongly connected components of a directed graph. It can be implemented with a Last In First Out data structure, such as a Stack.

Wednesday, February 20, 2013

Graphs

Graphs consist of vertices and edges. Vertices, also called nodes, can be pictured as the dots while edges are the lines connecting them. Graphs can be directed or undirected. Directed graphs have ordered pairs of endpoints (there is a head and tail) while undirected graphs do not.

Graphs are used ubiquitously. An obvious example is a map. The places are the vertices while the roads connecting them are the edges. Graphs can also be used to set precedence. That is, you can show which things are prerequisites for others, such as course prereqs.

A cut of a graph (V, E) is a partition of the vertices into two non-empty sets A and B. Some edges have both endpoints within A, some have both endpoints within B, and the remaining edges cross from one set to the other.

In a graph with n nodes, each node can go either into set A or into set B. This gives us 2^n cuts of the graph. However, since each set must be non-empty, it is actually 2^n - 2. Meanwhile, a crossing edge of a cut (A, B) is an edge with one endpoint in A and the other in B (for a directed graph, its tail in A and its head in B).

A connected graph with n nodes must have at least n-1 edges. You start out with n distinct pieces, and each added edge joins at most two pieces into one, reducing the count by at most one; to get down to a single piece, you need to do this at least n-1 times. The graph will have the maximum number of edges when every vertex is connected to every other vertex. Then there will be n(n-1)/2 total edges, because this is the number of ways of choosing 2 vertices from n of them.

If we let n be the # of vertices and m be the # of edges, then for a connected graph m is Omega(n) and O(n^2). A graph is "sparse" if it is closer to the lower bound and "dense" if it is closer to the upper bound.

Friday, February 15, 2013

Selecting the i'th smallest element

Finding the ith smallest element may seem difficult at first. The naive way would be to say, "Hey, let's sort the array and then we can just pick out the ith one!" We could use MergeSort, or better yet QuickSort, and have it done in O(nlogn). This is a reduction of our problem: reducing selection to sorting followed by an array lookup.

It turns out that we can actually accomplish this task by slightly modifying QuickSort. Recall that after partitioning, the pivot sits in its "rightful" spot and doesn't get moved in any of the recursive calls later on. So let's say we're looking for the ith element and, after the first partition, the pivot ends up at position k.

If k=i, then we know we found the ith element and we're done! If k>i, then recursively partition (and select within) the left-hand side. Otherwise, recurse on the right side, now looking for the (i-k)th smallest element there.

What is the running time of this selection algorithm? Well it depends on how good the pivots are. If pivots are chosen in the worst possible way every time, then it will be O(n^2). This would be if we chose the smallest or largest element every time, reducing the problem by only one element.

What about the best case? That would be the median: then T(n) ≤ T(n/2) + O(n), so a=1, b=2, d=1 and a < b^d, and the master method gives O(n). A sketch of this randomized selection approach is below.
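
Here is a minimal sketch of randomized selection in Java (0-based rank i assumed; the partition puts the pivot in its rightful spot, and we recurse on only one side):

import java.util.Random;

public class QuickSelect {
    private static final Random RAND = new Random();

    // Returns the i-th smallest element of a (0-based rank), rearranging a in place.
    public static int select(int[] a, int i) {
        return select(a, 0, a.length, i);
    }

    private static int select(int[] a, int start, int end, int i) {
        if (end - start <= 1) {
            return a[start];                        // one element left: it must be the answer
        }
        int k = partition(a, start, end);           // pivot's final (absolute) index
        if (k == i) {
            return a[k];
        } else if (k > i) {
            return select(a, start, k, i);          // answer lies to the left of the pivot
        } else {
            return select(a, k + 1, end, i);        // answer lies to the right of the pivot
        }
    }

    // Partitions a[start, end) around a random pivot; returns the pivot's final index.
    private static int partition(int[] a, int start, int end) {
        swap(a, start, start + RAND.nextInt(end - start)); // move a random pivot to the front
        int pivot = a[start];
        int boundary = start + 1;                   // a[start+1, boundary) holds elements < pivot
        for (int j = start + 1; j < end; j++) {
            if (a[j] < pivot) {
                swap(a, j, boundary++);
            }
        }
        swap(a, start, boundary - 1);               // pivot lands in its rightful spot
        return boundary - 1;
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}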

Analysis of QuickSort

Another cool fact about QuickSort is that any two elements in the array get compared either once or never. If you think about it, the only comparisons that ever occur are the ones between the pivot and every other element in [start, end]. Once these comparisons are over, there are two recursive calls to QuickSort, neither of which includes the pivot at all. Thus, the pivot is compared with any given element at most once.

Wednesday, February 13, 2013

Quicksort

Quicksort is a sorting algorithm that runs in O(nlogn) on average and sorts in place. This saves space because it does not require a temporary array to store values like MergeSort does.

The key concept behind Quicksort is the partitioning around a pivot. Choose one of the elements in the array as a pivot. Then rearrange the array so that everything less than the pivot is to its left and everything greater is to its right. Note that this rearrangement does not mean the entire array has to be sorted after this one partition. However, the pivot does end up in the "right" spot. That is, the pivot will not be moved anymore because it is already in the spot where it will be in the sorted array.

Partitioning can be done in linear time without allocating any extra memory and reduces the problem size.

QuickSort(Array a, length n)

  • if n = 1 return
  • p = ChoosePivot(a, n)
  • Partition a around p
  • QuickSort(first part)
  • QuickSort(second part)
Partitioning
The easy way to partition the array would be to allocate a temporary array, using O(n) extra space; that still partitions around the pivot in O(n) time. The in-place method avoids the extra memory. Assuming the pivot is the first element of the array, you traverse the array while maintaining a partitioned part and an unpartitioned part. One pointer scans the array and another points to the first element of the partitioned part that is larger than the pivot. When the scan finds an element smaller than the pivot, you simply swap it with that boundary element and advance the boundary. When you are finished, swap the element one before the boundary with the pivot to finish partitioning, as sketched below.
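
A minimal sketch of that in-place partition in Java (the pivot is assumed to be the first element of the range; the names are illustrative):

public class Partition {
    // Partitions a[start, end) around the pivot a[start]; returns the pivot's final index.
    public static int partition(int[] a, int start, int end) {
        int pivot = a[start];
        int boundary = start + 1;        // a[start+1, boundary) holds elements < pivot
        for (int j = start + 1; j < end; j++) {
            if (a[j] < pivot) {
                swap(a, j, boundary);    // grow the "less than pivot" region
                boundary++;
            }
        }
        swap(a, start, boundary - 1);    // pivot lands in its rightful spot
        return boundary - 1;
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}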

Finding the Optimal Pivot
So far, we have used the first element in the array as the pivot. What if the array were already sorted, or reverse sorted? Then partitioning would leave the pivot either in the same spot or at the back. Then we'd recursively call QuickSort on the (n-1)-length subarray, and we do this over and over until the array is sorted. That is linear work at each of n levels, which takes O(n^2) time.

What would be a better way? If we want to balance the two subarrays so that they are about equal in length, then we would want to find the median. Thus, if we are lucky enough to be able to get the median as the pivot every time, then the algorithm would be much like MergeSort without the extra memory allocation, running in O(nlogn).

What if finding the median is too hard? What other choices do we have? Well, it turns out that if we split the subarrays no worse than 25%/75%, then we will still have O(nlogn) running time. If we had 100 elements, we would just have to pick an element that is anywhere from the 25th smallest to the 75th smallest. That is half the total elements, so we have a 50% chance of picking such a pivot.

As a side note, this is an instance of adding randomness to our algorithm. Randomness is actually really useful in many applications because it makes things easier, more elegant, and efficient, but more on this later.

In conclusion, we have established that a randomized QuickSort has an average running time of O(nlogn), a blazingly fast algorithm without the extra space allocation.
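
Putting the pieces together, here is a sketch of randomized QuickSort in Java, reusing the Partition.partition method from the sketch above and swapping a random element to the front before each partition:

import java.util.Random;

public class QuickSort {
    private static final Random RAND = new Random();

    public static void sort(int[] a) {
        sort(a, 0, a.length);
    }

    // Sorts a[start, end) in place.
    private static void sort(int[] a, int start, int end) {
        if (end - start <= 1) {
            return;                                   // 0 or 1 elements: already sorted
        }
        int pivotIndex = start + RAND.nextInt(end - start);
        swap(a, start, pivotIndex);                   // random pivot moved to the front
        int k = Partition.partition(a, start, end);   // pivot ends up at index k
        sort(a, start, k);                            // left part: elements less than the pivot
        sort(a, k + 1, end);                          // right part: elements greater than the pivot
    }

    private static void swap(int[] a, int x, int y) {
        int tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }
}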

Monday, February 11, 2013

Meaning behind Master Method

After establishing the running time of an algorithm based on the three parameters a, b, and d, we might be wondering what the meaning behind this method is. Where do these logs and exponents come from?

If we look at the amount of work done at a single level of the recursion tree, we get that level j does at most a^j * c * (n/b^j)^d = c * n^d * (a/b^d)^j work, where a^j is the number of level-j subproblems and c * (n/b^j)^d is the amount of work per level-j subproblem. Thus, the total amount of work is at most c * n^d * [1 + (a/b^d) + (a/b^d)^2 + ... + (a/b^d)^(log_b n)].
Here we finally have the a and b^d that we compared. But what do they represent? Remember, a is the number of subproblems we broke our problem down into. We can think of it as the rate of subproblem proliferation.

Now what about b^d? This is the rate of work shrinkage. How come it isn't just b, the amount we shrank the problem by? If the problem requires a linear amount of work outside the recursion, then halving the input size will halve the amount of work. However, if we have a quadratic amount of work, halving the input size will actually quarter the amount of work. This is where the exponent d comes into play.

These are two forces in opposition. If the rate of subproblem proliferation is greater than the rate of work shrinkage, then we will actually be doing more work. If it is less, then we are effectively reducing the amount of work we are doing by dividing the problem into more subproblems. Otherwise, we will be doing the same amount of work at each level.

So let's break it down into three cases.
  1. a = b^d : the ratio a/b^d is 1, so each of the log_b n + 1 levels contributes the same c * n^d amount of work, and the sum is proportional to n^d log_b n. We can ignore the base of the logarithm because it only changes the result by a constant factor, and thus we get O(n^d log n).
  2. a < b^d : then using the sum of a geometric series with r < 1, we know that this sum is bounded by a constant, independent of the number of terms. Using Big-O notation, the constant is suppressed and thus we have only O(n^d).
  3. a > b^d : since r > 1 in this geometric series, the sum is bounded by a constant times the largest (last) term, n^d * (a/b^d)^(log_b n) = a^(log_b n) = n^(log_b a). With that bit of algebra, the running time is O(n^(log_b a)).
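
Written out as a worked equation (standard master-method algebra, with c the constant hidden in the O(n^d) combine step):

\text{total work} \;\le\; c\, n^{d} \sum_{j=0}^{\log_b n} \left(\frac{a}{b^{d}}\right)^{j}
=
\begin{cases}
O(n^{d}\log n) & \text{if } a = b^{d} \quad (\text{each of the } \log_b n + 1 \text{ levels costs } c\,n^{d})\\
O(n^{d}) & \text{if } a < b^{d} \quad (\text{geometric series dominated by its first term})\\
O\!\left(n^{\log_b a}\right) & \text{if } a > b^{d} \quad (\text{dominated by the last term } a^{\log_b n} = n^{\log_b a})
\end{cases}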

Sunday, February 10, 2013

Master Method for Solving Recurrences

What makes the master method so awesome is that it can be thought of as a "black box" for solving recurrences. This means that you give it some inputs and it will tell you the running time of the algorithm. However, it only works when all subproblems have the same size. For example, in MergeSort each recursive call does work on a subproblem of size n/2.

Base Case:
T(n) ≤ a constant c, for all sufficiently small n
For larger n:
T(n) ≤ aT(n/b) + O(nd) 
  1. a is the number of recursive calls
  2. b is the factor by which the problem size shrinks. It must be greater than one or the problem will never get any smaller
  3. d is the exponent in the running time of the combine step (the work done outside the recursive calls). It can be as small as 0, which indicates the combine step takes constant time.
a, b, d must all be constants and independent of the input size.
So the master method is: T(n) = O(n^d log n) if a = b^d, O(n^d) if a < b^d, and O(n^(log_b a)) if a > b^d.
In the first case, the base of the logarithm doesn't matter because changing the base only changes the result by a constant factor. However, in the third case, the base matters a lot because it could make the difference between, say, linear and quadratic time.

Let's look at MergeSort as an example. We know T(n) = 2T(n/2) + O(n). We have a=2, b=2, and d=1. a = b^d, so we are in case 1. The running time will be O(n^d log n), which is just O(nlogn).

For binary search, we split the problem into two halves of size n/2 each, BUT we only recurse on one of them and just compare the element we're searching for with the center element, which takes constant time. Thus, we have a=1, b=2, d=0, and a = b^d, so the overall runtime will be O(logn).
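
As a small illustration of that recurrence, here is a recursive binary search sketch in Java (the array is assumed sorted; it returns the index of the key, or -1 if the key is absent):

public class BinarySearch {
    // Searches the sorted range a[start, end) for key: T(n) = T(n/2) + O(1) = O(log n).
    public static int search(int[] a, int key) {
        return search(a, key, 0, a.length);
    }

    private static int search(int[] a, int key, int start, int end) {
        if (start >= end) {
            return -1;                             // empty range: key not present
        }
        int mid = start + (end - start) / 2;       // constant-time work outside the recursion
        if (a[mid] == key) {
            return mid;
        } else if (a[mid] < key) {
            return search(a, key, mid + 1, end);   // recurse on the right half only
        } else {
            return search(a, key, start, mid);     // recurse on the left half only
        }
    }
}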

Saturday, February 9, 2013

Strassen's Subcubic Matrix Multiplication Algorithm

A Review on Matrix Multiplication
Given two n x n matrices X and Y and their product matrix Z, we know that Z_ij = (ith row of X) · (jth column of Y). This naive algorithm requires three for loops: two to traverse the Z matrix and one to obtain the values from X and Y for the dot product. This has an overall runtime of O(n^3).
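
For reference, a sketch of that naive triple-loop multiplication in Java (integer matrices assumed for simplicity):

public class MatrixMultiply {
    // Naive O(n^3) multiplication: Z[i][j] = (ith row of X) . (jth column of Y).
    public static int[][] multiply(int[][] x, int[][] y) {
        int n = x.length;
        int[][] z = new int[n][n];
        for (int i = 0; i < n; i++) {           // traverse the rows of Z
            for (int j = 0; j < n; j++) {       // traverse the columns of Z
                for (int k = 0; k < n; k++) {   // dot product of row i of X with column j of Y
                    z[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return z;
    }
}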

If we think back to MergeSort, we can attempt to apply the Divide and Conquer algorithm design paradigm. We can split each matrix into 4 matrices, as such:

X = | A  B |      Y = | E  F |
    | C  D |          | G  H |

X · Y = | AE+BG  AF+BH |
        | CE+DG  CF+DH |


Then, it can be shown that if we split the product matrix into 4 submatrices as well, each quadrant of the product is given by the corresponding combination of submatrix products, exactly as if the blocks were scalar entries.

Then to compute this product, we need to do 8 recursive calls: two matrix products for each of the 4 quadrants, plus O(n^2) work for the additions. By the master method (a=8, b=2, d=2, so a > b^d), this is no better than the naive approach: it also has a runtime of O(n^(log_2 8)) = O(n^3).

Strassen's algorithm makes it possible to solve the problem with only 7 recursive calls. This may seem like a small decrease, but saving one of the eight recursive calls at every level of the recursion compounds, and the total amount of work saved is significant. By the master method (a=7, b=2, d=2, so a > b^d), the runtime is O(n^(log_2 7)) ≈ O(n^2.81), which is subcubic.

The Algorithm
So what is the algorithm? We need 7 recursive calls to find 7 products:
P1 = A(F - H)
P2 = (A + B)H
P3 = (C + D)E
P4 = D(G - E)
P5 = (A + D)(E + H)
P6 = (B - D)(G + H)
P7 = (A - C)(E + F)
Then, it is claimed that
| AE+BG  AF+BH |     | P5+P4-P2+P6   P1+P2        |
| CE+DG  CF+DH |  =  | P3+P4         P1-P3+P5-P7  |

This makes you wonder how in the world Strassen came up with P1 through P7. We have no idea. "How can we make this better?" This is the ultimate question for coming up with better and better algorithms to solve our problems.

Thursday, February 7, 2013

Inversions

The first thing I've learned from Algorithms: Design and Analysis Part I is how to count inversions.

What is an inversion? An inversion is a pair of objects that are out of order. If we are given an array and our goal is to sort it from low to high, an inversion would be two items a_i and a_j such that a_i > a_j but i < j. We can see that they are misplaced.

An array with no inversions is one that is sorted and an array with the maximum number of possible inversions would be sorted in reverse order. Thus, counting inversions is a way of checking how sorted or unsorted something is.

How do we find the number of inversions? Well, the naive and obvious way would be to traverse the array and compare each element to every other element. This algorithm would be O(n^2). Can we do it a better way? Yes. We'll use the divide and conquer method and piggyback on the MergeSort algorithm to do it in O(nlogn). By counting the number of inversions, we will also be sorting it.

By splitting the array in half, we know that:
total inversions = inversions in left + inversions in right + split inversions
A split inversion is a pair whose first element is in the left half and whose second element is in the right half. To find the number of inversions in each half, we'll just recursively call the algorithm on each half. Finding the number of split inversions is the trickier task.

Say we have two halves, [2, 5, 6] and [1, 3, 8]. The split inversions actually get counted during the merge subroutine. Merge keeps two pointers, one to the front of each half, and repeatedly copies the smaller of the two front elements, advancing that pointer. Whenever an element is taken from the second (right) half, it must be smaller than the current front element of the first (left) half, and hence smaller than every element remaining in the left half. So each time the right half is chosen before the left, we just add the number of elements remaining in the left half to the count. (In the example, taking 1 contributes 3 and taking 3 contributes 2, for 5 split inversions.)

Here is my code for this algorithm:
 // Counts inversions while MergeSorting a in place. The count is a long because an
 // array of n elements can have up to n(n-1)/2 inversions, which overflows int for large n.
 long countInversions(int[] a) {
  if (a.length < 2)
   return 0;
  return countInversions(a, 0, a.length, new int[a.length]);
 }

 long countInversions(int[] a, int start, int end, int[] t) {
  if (end - start <= 1)
   return 0;
  int mid = (start + end) / 2;
  long x = countInversions(a, start, mid, t); // inversions entirely within the left half
  long y = countInversions(a, mid, end, t);   // inversions entirely within the right half
  long z = subroutine(a, start, mid, end, t); // split inversions, counted while merging
  return x + y + z;
 }

 // Merges the sorted halves a[start, mid) and a[mid, end) using the temp array t
 // and returns the number of split inversions.
 long subroutine(int[] a, int start, int mid, int end, int[] t) {
  int i = start;
  int j = mid;
  int k = start;
  long count = 0;
  while (i < mid && j < end)
   if (a[i] <= a[j])     // <= so that equal elements are not counted as inversions
    t[k++] = a[i++];
   else {
    t[k++] = a[j++];
    count += (mid - i);  // every element still left in the left half beats a[j]
   }
  // Only one of these copies anything: whichever half still has leftovers.
  System.arraycopy(a, i, t, k, mid - i);
  System.arraycopy(a, j, t, k, end - j);
  System.arraycopy(t, start, a, start, end - start);

  return count;
 }