Wednesday, August 14, 2013

CBV vs CBN

Call by Value - the most common evaluation strategy. Function arguments are reduced to values before the function is applied.
Call by Name - arguments are passed down unevaluated and only reduced when they are actually needed.

Fresh Start for Functional Programming

Starting today, I plan to really focus on functional programming in Scala and commit myself to finishing the 7 week course on Coursera.

Sunday, April 21, 2013

Lists

A fundamental data structure is the immutable linked list. It is composed of two parts:

  1. Nil: the empty list
  2. Cons: a cell containing an element and the rest of the list
trait IntList ...
class Cons(val head: Int, val tail: IntList) extends IntList ...
class Nil extends IntList
The use of val is syntactic sugar for:
class Cons(_head: Int, _tail: IntList) extends IntList {
      val head = _head
      val tail = _tail
}
This saves us from having to write out _head and _tail explicitly.
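To make this concrete, here is a hedged, self-contained sketch in the same spirit (the isEmpty and length members are my own additions for illustration, not part of the lecture snippet):

trait IntList {
  def isEmpty: Boolean
  def head: Int
  def tail: IntList
  def length: Int = if (isEmpty) 0 else 1 + tail.length   // illustrative addition
}

class Cons(val head: Int, val tail: IntList) extends IntList {
  def isEmpty = false   // the val parameters implement the abstract head and tail
}

class Nil extends IntList {
  def isEmpty = true
  def head: Int = throw new NoSuchElementException("Nil.head")
  def tail: IntList = throw new NoSuchElementException("Nil.tail")
}

// usage: the list 1 -> 2 -> Nil
// val xs = new Cons(1, new Cons(2, new Nil))
// xs.length == 2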

Pathological Data Sets and Universal Hashing Motivation

The load of a hash table, usually denoted alpha, is the total number of elements stored divided by the number of buckets. For example, 700 keys spread over 1,000 buckets gives alpha = 0.7. Alpha = O(1) is necessary for operations to run in constant time, and with open addressing we need alpha << 1. So for good performance, we need to control the load.

Secondly, for good performance, we need a good hash function, i.e., one that spreads the data evenly across the buckets. However, a single hash function that spreads out every data set does not exist: for every hash function, there is a pathological data set.

Pathological Data in the Real World
Reference: Crosby and Wallach, USENIX 2003.
You can paralyze several real-world systems (like network intrusion detection) by exploiting badly designed hash functions. Two characteristics of these breakable systems were:
  1. Open source
  2. Overly simplistic hash function designed for speed (easy to reverse engineer a pathological data set)
So what should we do to solve this problem? This is the motivation for universal hashing: pick the hash function at random from a carefully designed family of functions, so that no single data set is pathological for every function we might choose.

Object Oriented Programming

All OOP languages implement dynamic method dispatch. This means that the code invoked by a method call depends on the runtime type of the object that contains the method.

This dynamic dispatch is analogous to calls to higher-order functions. Can we implement objects in terms of higher-order functions? Can we implement higher-order functions in terms of objects?

Traits

A class that wants to inherit from more than one supertype can inherit from traits. A trait is declared much like an abstract class:
trait name {}
Classes and objects can only inherit from one superclass, but many traits. For example,
class Square extends Shapes with Planar with Movable ...
Traits are like interfaces in Java, but are more powerful since they can contain fields and concrete methods. However, they cannot have (value) constructor parameters.
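A hedged sketch of the idea (Shape, Planar, and Movable here are hypothetical traits echoing the example above; Planar and Movable carry fields and concrete methods, which a plain Java interface of that era could not):

trait Shape {
  def area: Double                      // abstract: implementing classes must define it
}

trait Planar {
  def sides: Int
  def describe: String = s"a planar figure with $sides sides"   // concrete method in a trait
}

trait Movable {
  var x: Double = 0.0                   // traits may also contain fields
  var y: Double = 0.0
  def moveBy(dx: Double, dy: Double): Unit = { x += dx; y += dy }
}

class Square(side: Double) extends Shape with Planar with Movable {
  def area: Double = side * side
  def sides: Int = 4
}

// val sq = new Square(2.0)
// sq.moveBy(1.0, 1.0); sq.describe   // "a planar figure with 4 sides"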
 

Saturday, April 13, 2013

Functions returning Functions

In Scala, we can use multiple parameter lists. Before, we had:

def sum(f: Int => Int): (Int, Int) => Int = {
     def sumF(a: Int, b: Int): Int =
          if (a > b) 0
          else f(a) + sumF(a+1, b)
     sumF
}
Now, we can turn that into
def sum(f: Int => Int)(a: Int, b: Int): Int =
     if (a>b) 0 else f(a) + sum(f)(a+1, b) 
This way, we can still refer to sum without supplying the second parameter list, as in sum(cube).

Friday, April 12, 2013

Currying

In the sum function, there is repetition in the parameters a and b which just get passed from one function to another. We can get rid of these with currying. Then, we can rewrite sum as follows:
def sum(f: Int => Int): (Int, Int) => Int = {
     def sumF(a: Int, b: Int): Int =
          if (a > b) 0
          else f(a) + sumF(a+1, b)
     sumF
}
Now sum is a function that returns another function: the returned function applies the given function f to each value in the range and sums the results. We can redefine the three sum functions as:
def sumInts = sum(x => x)
def sumCubes = sum(x => x * x * x)
def sumFacts = sum(factorial)
If we wanted to use sumCubes without explicitly defining it, we could write sum(cube)(1, 10). Here sum(cube) applies sum to the cube function and returns the sum-of-cubes function, which is then applied to the arguments (1, 10). So sum(cube) is equivalent to our definition of sumCubes. Function application associates to the left.



Monday, April 8, 2013

Rewriting the main sum method

We can optimize the main sum method by making it tail-recursive. This can be done with an accumulator as follows:
def sum(f: Int => Int, a: Int, b: Int): Int = {
     def loop(a: Int, acc: Int): Int =
          if (a > b) acc
          else loop(a + 1, f(a) + acc)
     loop(a, 0)
}

Anonymous Functions

An anonymous function is one that does not have a name and is written as a function literal. We can write the cube function as: (x: Int) => x * x * x. Another example is: (x: Int, y: Int) => x + y.

Every anonymous function can be written with a def like normal functions. Therefore we can say that anonymous functions are syntactic sugar.

Using anonymous functions, we can rewrite the three sum functions more succinctly:
     def sumInts(a: Int, b: Int) = sum(x => x, a, b)
     def sumCubes(a: Int, b: Int) = sum(x => x * x * x, a, b)

Higher Order Functions

Functional languages treat functions as first-class values, that is, like any other value. Higher-order functions take other functions as parameters or return functions as results.

Let's make three functions that sum up a range of integers, a range of their cubes, and a range of their factorials.

def sumInts(a: Int, b: Int): Int =
     if (a>b) 0 else a + sumInts(a+1, b)
def sumCubes(a: Int, b: Int): Int =
     if (a>b) 0 else cube(a) + sumCubes(a+1, b)
def sumFactorials(a: Int, b: Int): Int =
     if (a>b) 0 else fact(a) + sumFactorials(a+1, b)

You'll notice that these three functions are nearly identical, except for one thing: what they do at each value of a. We want to factor out the common parts. How can we do this? Higher-order functions!

We can make three functions to do the work at each level. Say,

  1. def id(x: Int):Int = x
  2. def cube(x: Int):Int = x * x * x
  3. def fact(x: Int):Int = if (x==0) 1 else x * fact(x-1)

We can define a single function that can do all the work:
def sum(f: Int => Int, a: Int, b: Int):Int =
     if (a>b) 0
     else f(a) + sum(f, a+1, b)

def sumInts(a: Int, b: Int) = sum(id, a, b)
def sumCubes(a: Int, b: Int) = sum(cube, a, b)
def sumFactorials(a: Int, b: Int) = sum(fact, a, b)

In sum, f has the type of a function that maps Int to Int. As you can see, this makes the code a lot more compact and modular. However, we had to define three auxiliary functions: id, cube, and fact. These can be hidden using anonymous functions.

Friday, April 5, 2013

Tail Recursion

If a function calls itself as its last action, the function's stack frame can be reused. This is called tail recursion. Tail-recursive functions are iterative processes and execute much like a loop.

However, a function like factorial(n) whose last action is n * factorial(n-1) is not tail recursive, because a multiplication is still pending after the recursive call returns.

It's best to refactor a deeply recursive function so that it is tail recursive, because the JVM allows only a limited number of stack frames (on the order of a few thousand) before overflowing.
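For example, here is a hedged sketch of the usual accumulator rewrite of factorial; the @tailrec annotation asks the compiler to verify that the recursive call really is in tail position:

import scala.annotation.tailrec

def factorial(n: Int): BigInt = {
  @tailrec
  def go(k: Int, acc: BigInt): BigInt =
    if (k == 0) acc              // nothing is pending after the recursive call...
    else go(k - 1, acc * k)      // ...because the multiplication happens before it
  go(n, 1)
}

// factorial(5) == 120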

Thursday, April 4, 2013

Blocks and Lexical Scope

In the sqrt example, the user does not need to see the sqrtIter, isGoodEnough, and other auxiliary functions. Thus, we can wrap them in a block inside sqrt so that sqrt is the only visible function.

A block is delimited by braces. The last element in the block is an expression that defines its value. The definitions inside a block are only visible within the block and shadow definitions of the same names outside the block.

Definitions outside the block are visible inside the block, so there is no need to re-pass parameters (such as x for sqrt). Nesting method definitions also makes the code much cleaner. Here is my code for sqrt:

  def abs(x: Double): Double = if (x > 0) x else -x
  def sqrt(x: Double): Double = {
    def isGoodEnough(guess: Double): Boolean =
      abs(guess * guess - x) / x < 0.001
    def improveGuess(guess: Double): Double =
      (guess + x / guess) / 2
    def sqrtIter(guess: Double): Double =
      if (isGoodEnough(guess)) guess
      else sqrtIter(improveGuess(guess))
    sqrtIter(1.0)
  }
Unlike Java, Scala does not require a semicolon at the end of every line. However, you do need one if you want to put more than one statement on a line. This raises the issue of multi-line expressions: either wrap the expression in parentheses, or end the first line with the operator so the compiler knows the expression continues. For example:
a. (long-expression1
        + long-expression2)
b. long-expression1 +
        long-expression2


Conditional Expressions

To choose between two values, we can use a conditional expression if-else which looks like Java's if-else but is used for expressions rather than statements. For example,
def abs(x:Int) = if (x > 0) x else -x
Expressions like && and || use short-circuit evaluation because their right operands are not always evaluated. 

When we define something with def, its right-hand side is evaluated by name: it is re-evaluated each time the name is used. When we define a val (val x = square(2)), the right-hand side is evaluated by value, once, at the point of definition.

Therefore, def x = loop is perfectly fine, but val x = loop will never terminate.

def and(x: Boolean, y: => Boolean): Boolean = if (x) y else false
def or(x: Boolean, y: => Boolean): Boolean = if (x) true else y

We have to remember to include the => to make the second parameter pass by-name. This will let the function use short-circuit evaluation. 
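A quick hedged check in the REPL that the by-name parameter really is skipped (the println block is just an illustrative side effect):

and(false, { println("evaluated!"); true })   // false, and nothing is printed
and(true,  { println("evaluated!"); true })   // true, prints "evaluated!"
or(true,   { println("evaluated!"); true })   // true, and nothing is printed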

Evaluation Strategies and Termination

Call-By-Value termination always implies Call-By-Name termination. However, the converse is not true. For example, given a function {def foo(x: Int, y: Int) = x} and we call foo(1,loop) where loop does not terminate,

  1. In CBV, we need to evaluate loop which does not terminate so this does not terminate
  2. In CBN, we only need the first argument which it will readily return
Scala normally uses Call-By-Value because it is often exponentially more efficient, since it avoids re-evaluating the same argument expressions. However, you can make a parameter use CBN by adding => to its type, such as: {def constOne(x: Int, y: => Int) = 1}

If we call constOne(1+2, loop) we get constOne(3, loop), which evaluates to 1 since the parameter loop is passed by name and never needed. On the other hand, if we call constOne(loop, 1+2), we get an infinite cycle!
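A hedged sketch of that example (loop is deliberately non-terminating, so the second call is left commented out):

def loop: Int = loop                        // never terminates if evaluated
def constOne(x: Int, y: => Int): Int = 1    // y is by-name, so it is never evaluated here

constOne(1 + 2, loop)     // 1: the by-name argument loop is never needed
// constOne(loop, 1 + 2)  // would hang: x is by-value, so loop is evaluated first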

Elements of Programming

Every programming language has three things:
  1. Primitives
  2. Expressions to combine primitives
  3. Ways to abstract expressions, introducing a name for the expression
Scala has an interactive shell, called a REPL (Read, Evaluate, Print, Loop), that one can use to test simple expressions. You evaluate an expression by taking the leftmost operator, evaluating its operands, and then applying the operator to the operands.

You can define a function in Scala as follows: def square(x: Double): Double = x * x. The second Double (the result type) could be omitted.

There are two ways of evaluating a function application. The first is called Call-By-Value.
  1. Evaluate function arguments from left to right
  2. Replace function application by function's RHS, and
  3. Replace the formal parameters in the function by the actual arguments
The Call-By-Name method of evaluating a function is similar but it doesn't evaluate the arguments until the very end. This is beneficial because it will not evaluate arguments that are not used. However, there may be repeated evaluations of the same expressions. 

These schemes of expression evaluation are called substitution models. The model applies to all expressions that do not have side effects (e.g., x++ is a side effect).

Tuesday, April 2, 2013

Why Functional Programming

I watched a keynote talk by Martin Odersky on YouTube about why Scala. Today, the bottleneck in computing is not the hardware but the software. Since hardware makers have firmly decided that the way forward is not higher clock speeds but more cores, we as programmers must deal with more parallelism and concurrency.

Parallelism is about executing a computation on multiple cores simultaneously to make it faster, while concurrency is about structuring a program as multiple threads of control whose executions overlap in time.

So why choose Scala over Java? Scala is much more precise and compact. Whereas in Java, just to write a class with a constructor and one instance variable you repeat that variable three times, in Scala you write it once. What's more, making a Java program run in parallel can take days of work, while in Scala you can often just add .par and be done. Scala is actually shorthand for "scalable language."

There are also more tools to utilize in Scala to give you more control over things. This video was definitely a great motivation for me to learn Scala and functional programming from the Coursera course taught by this keynote speaker.

Resources:
Video Link

Tuesday, March 12, 2013

Bloom Filters

The Bloom filter is a data structure designed in 1970. It is similar to a hash table, but more space-efficient, at the cost of occasional false positives.

A bloom filter supports super fast lookup and insertion, just like a hash table. So why have this data structure?

  1. Pros: It is much more space efficient. 
  2. Cons: You can't store an associated object; you only record whether or not an object is in the data structure. Deletions are not allowed. There is also a small false-positive probability: even if you haven't inserted object x, the filter may claim x has been inserted.
Applications
  1. Early spellcheckers - you add every possible word into the filter. Then you just check word by word if something is in the filter or not. If it is not in the filter, then that is an incorrect word. With small probability, this will say that an incorrectly spelled word is actually correct.
  2. List of forbidden passwords - This could be used to prevent the use of common or too-easily guessed passwords. 
  3. Network routers - There is limited memory and it needs a super-fast data structure. 
To implement this data structure, you use an array of n bits. We also need k hash functions, with k somewhere around 2-5 (you could also take 2 hash functions and form k linear combinations of them). Then, to insert x,
    for i = 1, 2, ..., k
        set A[h_i(x)] = 1
Then for lookup, you just return true if A[h_i(x)] = 1 for i = 1, 2, ..., k. You can't get a false negative, since bits are never reset to 0. However, you can get a false positive if other insertions happen to have set all k of the bits h_i(x) to 1.

For this data structure to be good, we need to see that the probability of a false positive is relatively low, while keeping the space needed small. 
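Here is a hedged sketch of the scheme (the class and method names are mine; the k hash functions are derived as linear combinations of two underlying hashes, as mentioned above):

import scala.collection.mutable

class BloomFilter(numBits: Int, k: Int) {
  private val bits = new mutable.BitSet(numBits)

  // h_i(x) = h1(x) + i * h2(x), mapped into [0, numBits)
  private def indices(x: String): Seq[Int] = {
    val h1 = x.hashCode
    val h2 = scala.util.hashing.MurmurHash3.stringHash(x)
    (0 until k).map(i => Math.floorMod(h1 + i * h2, numBits))
  }

  def insert(x: String): Unit = indices(x).foreach(bits += _)

  // may give a false positive, never a false negative
  def mightContain(x: String): Boolean = indices(x).forall(bits.contains)
}

// val bf = new BloomFilter(numBits = 100000, k = 3)
// bf.insert("algorithms")
// bf.mightContain("algorithms")   // true
// bf.mightContain("scala")        // almost certainly false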

Rotations

Rotations are common to all balanced search trees. The idea is to locally rebalance subtrees at a node in O(1) time without violating the search tree property. When you invoke a rotation, it is always on a parent-child node pair. If it is a right child, you need a left rotation and vice-versa for a left child.

Picture a parent P whose child is X; X has left subtree A and right child Y, and Y has subtrees B and C.
We want to left-rotate the pair (X, Y) so that Y becomes the new child of P. Since X is less than Y, X must become Y's left child. Since C is greater than Y, it stays Y's right child. Since A is less than X, it stays X's left child. Finally, X < B < Y, so B becomes X's right child, which also keeps B inside Y's left subtree.

A right rotation is very similar to this.
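A hedged sketch of a left rotation on immutable nodes (Node and leftRotate are my own names); A and C stay where they are, and B moves from Y's left to X's right, exactly as argued above:

case class Node(key: Int, left: Option[Node] = None, right: Option[Node] = None)

// left-rotate the pair (x, y) where y = x.right: y becomes the new subtree root,
// x becomes y's left child, and y's old left subtree B becomes x's right subtree
def leftRotate(x: Node): Node = x.right match {
  case Some(y) => y.copy(left = Some(x.copy(right = y.left)))
  case None    => x   // no right child, nothing to rotate
}

// val y = Node(5, left = Some(Node(3)), right = Some(Node(7)))   // B = 3, C = 7
// val x = Node(2, left = Some(Node(1)), right = Some(y))         // A = 1
// leftRotate(x)   // root 5, left child 2 (with children 1 and 3), right child 7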

Red Black Trees

There are lots of different search trees for the same set of keys; the height of the tree can range anywhere between log2(n) and n. Since most operations on a search tree are governed by its height, we want to minimize the height. This is where balanced binary trees come in.

We want to ensure that the height is always O(logn), even after insertions and deletions. Some examples are red-black trees, AVL trees, and splay trees. There are also B-trees, which have multiple keys per node.

A red-black tree is the same as a binary search tree but it has a few invariants.

  1. Each node is either red or black.
  2. The root is black.
  3. No 2 reds in a row - if you have a red node, its children must be black. This implies if a child is red, its parent must be black.
  4. Every root-null path has the same number of black nodes.
Every red-black tree with n nodes has a height less than or equal to 2log2(n+1).

Sunday, March 10, 2013

Binary Search Tree

The search tree property is as follows: everything to the left of a node is less than it, and everything to the right is greater. There are many possible trees for the same set of keys, and the height can range anywhere from log2(n) to n. Therefore, the worst-case running time for searching or inserting a key is O(height).

Getting Minimum
You can just traverse left down the tree since everything left is less than what you are at now.

Getting Maximum
This is the same concept as minimum but you move to the right.

Finding Predecessor
Easy Case - If k's left subtree is not empty, just return the max key in its left subtree.
Harder Case - Go up parent pointers until you reach a node whose key is less than k's.

In-Order Traversal
Let R be the root node of the search tree, with subtrees Tl and Tr. Recurse on Tl, then print R's key, then recurse on Tr.

Deletion
First we need to search for the node. Then we have three cases:
(1) k has no children (easy) - Then we can just delete the node without any consequences
(2) k has one child (medium) - The unique child assumes the place of the parent node
(3) k has two children (hard) - We can sneakily use the predecessor: go to k's (non-null) left child and then follow right-child pointers as far as possible; call this node l. Swap k and l. In its new position k has no right child, since we followed right pointers all the way down, so we now know how to delete it (it has 0 or 1 child).

Note:
We can store the size of the subtree at each node. If x has children y and z, then size(x) = size(y) + size(z) + 1. We also have to maintain this field on every insertion and deletion.

Select
The goal is to select the ith order statistic. Start with root x with children y and z.
(1) If i==size(y)+1, then return x's key.
(2) If i<=size(y), then recurse on y.
(3) if i-1 > size(y), then recursively compute (i-size(y)-1)th order statistic on search tree rooted at z.
The running time for this is still O(height).
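A hedged sketch of select on nodes that carry their subtree size (the Tree shape and names are mine; i is 1-indexed and assumed to satisfy 1 <= i <= t.size):

case class Tree(key: Int, left: Option[Tree], right: Option[Tree], size: Int)

def sizeOf(t: Option[Tree]): Int = t.map(_.size).getOrElse(0)

// return the i-th smallest key in t
def select(t: Tree, i: Int): Int = {
  val leftSize = sizeOf(t.left)
  if (i == leftSize + 1) t.key                    // case (1): the root is the i-th smallest
  else if (i <= leftSize) select(t.left.get, i)   // case (2): recurse on the left subtree
  else select(t.right.get, i - leftSize - 1)      // case (3): skip the left subtree and the root
}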

Saturday, March 9, 2013

Balanced Binary Search Trees

We can think about a balanced binary search tree as a dynamic sorted array. If we have a sorted array, what kinds of operations can we perform on it?

  1. Binary search in O(logn) time.
  2. Select element given ith order statistic in O(1)
  3. Computing min/max of array - O(1)
  4. Predecessor/Successor - O(1), just find that element and return one before/after it
  5. Rank - how many keys stored are less than or equal to the key - O(logn)
  6. Output in sorted order - O(n)
Simply using a sorted array would be unacceptable for insertion/deletion because it could use O(n) time. This is where the binary search tree comes in! A BST is like a sorted array, but with logarithmic insertion and deletion. Now, the operations:
  1. Binary search - O(log n)
  2. Select - O(log n)
  3. Min/max - O(log n)
  4. Pred/succ - O(log n)
  5. Rank - O(log n)
  6. Output in sorted order - O(n)
  7. Insert/Deletion - O(log n)
If you only need min/max, you should use a heap instead. If you only need fast insertion and lookup, use a hash table instead. 

Thursday, March 7, 2013

Hash Tables cont.

So how is a hash table implemented? We could use a giant array indexed by key for quick insertion, deletion, and lookup, but it would take up a lot of space. On the other hand, we could use a linked list. This would only take up as much space as there are objects, but the three operations would no longer be O(1). To resolve this, we use a bit of both.

The first way to implement a hash table could be to use an array of linked lists. We use a hash function to determine which index the object goes into. If there is already something in that index, this is called a "collision." To resolve this, we just make a chain at that index of the array.
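A hedged sketch of chaining (the class and method names are mine); each bucket holds a small list of key-value pairs and the hash function is just hashCode modulo the number of buckets:

import scala.collection.mutable

class ChainedHashTable[K, V](numBuckets: Int = 16) {
  private val buckets = Array.fill(numBuckets)(mutable.ListBuffer.empty[(K, V)])

  private def bucketFor(key: K) = buckets(Math.floorMod(key.hashCode, numBuckets))

  def insert(key: K, value: V): Unit = {
    val chain = bucketFor(key)
    val i = chain.indexWhere(_._1 == key)
    if (i >= 0) chain(i) = (key, value)   // key already present: overwrite
    else chain += ((key, value))          // empty bucket or collision: append to the chain
  }

  def lookup(key: K): Option[V] =
    bucketFor(key).collectFirst { case (k, v) if k == key => v }

  def delete(key: K): Unit = {
    val chain = bucketFor(key)
    val i = chain.indexWhere(_._1 == key)
    if (i >= 0) chain.remove(i)
  }
}

// val table = new ChainedHashTable[String, Int]()
// table.insert("answer", 42)
// table.lookup("answer")   // Some(42)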

The "Birthday Paradox" states that given 23 people, there is a 50% chance that two people will share the same birthday. By this principle, we can see that it doesn't take relatively many objects to cause a collision.

The second way to implement a hash table is open addressing. When there is a collision, simply put the object elsewhere: you can probe the next available slot (linear probing), or use a second hash function to determine the offset (double hashing). Implementing deletion with open addressing is much trickier.

Wednesday, March 6, 2013

Hash Tables

A hash table is primarily used to keep track of an evolving set of stuff by using a key. It is good for insertions, deletions, and lookups. It is also sometimes referred to as a "dictionary." A hash table should not be used if you need to keep things in some particular order.


The operations that it supports are all blazingly fast, O(1) running time. However, this is only if it is implemented properly.

Applications
De-duplication: You are given a stream of objects, such as a linear scan through a large file or objects arriving in real time. The goal is to remove duplicates (i.e., keep track of only the unique objects). This could be used to report only unique visitors to a website or to avoid duplicates in search results.

When a new object x arrives, look it up in hash table H. If it is not found, then insert x into H.

The 2-SUM Problem: You are given an array A of n integers and a target sum T. The goal is to determine whether or not there are two numbers x, y in A with x + y = T.

The naive way to do this would be to check all (n choose 2) combinations, an exhaustive search that is O(n^2). A better way would be to sort the array, then for each x in A use binary search to look for T - x; this takes O(nlogn). The best way is to insert every element into a hash table, then for every x look up its complement T - x in O(1). This is O(n) overall.
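A hedged sketch of the hash-set version (twoSum is my own name); it checks pairs at distinct positions, so duplicates are handled:

import scala.collection.mutable

// true if two entries of a (at different positions) sum to target
def twoSum(a: Array[Int], target: Int): Boolean = {
  val seen = mutable.HashSet.empty[Int]
  a.exists { x =>
    val found = seen.contains(target - x)   // is the complement already in the table?
    seen += x
    found
  }
}

// twoSum(Array(3, 8, 5, 1), 9)    // true  (8 + 1)
// twoSum(Array(3, 8, 5, 1), 10)   // false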

Hash tables were first designed for symbol table lookups in compilers. They can also be used to block network traffic, and in search algorithms such as game tree exploration (there are more chess configurations than the size of the web), where you add a configuration to a hash table to avoid exploring it more than once.

Tuesday, March 5, 2013

Implementing a Heap

We can think of a heap as a tree. It is rooted, binary, and as complete as possible: every level is full except possibly the bottom one, which is filled in from left to right.

The heap property is:
(1) At every node x, key[x] <= all keys of x's children.
Consequently, the root node must have the minimum key value.

Heaps

A heap is a data structure that contains objects which have comparable keys. Some uses for a heap include: keeping track of employee records where the social security number is the key, network edges, events where there is a priority, etc.

There are two basic operations for a heap.
(1) Insert - add a new object to the heap
(2) Extract-min - remove an object in the heap with a minimum key value (ties broken arbitrarily). You could also support extract-max instead, or simply negate every key so that extract-min returns the maximum.

The running time for both operations is O(logn) where n is the number of objects in the heap.

Another operation is heapify, which inserts a batch of objects into the heap. This seems like it should run in O(nlogn) but it can actually be done in O(n). Deletion can also be done in O(logn) time.

Applications
(1) One application is heap sort. In selection sort, you compute the minimum over and over, resulting in O(n^2) running time. Instead, you can heapify the set and then extract-min n times to get the elements in order; the running time for this algorithm is O(nlogn).

(2) A heap can also be called a priority queue. It can be used to schedule events and to check which event is going to occur next, using the timestamp as key. This makes it much faster than keeping an unordered list and finding the max each time.

(3) Thirdly, we can use a heap for "median maintenance." You are given a sequence x_1, ..., x_n and at each time step i you must report the median of {x_1, ..., x_i}, using O(log i) time at step i. We can solve this by maintaining two heaps, H_low and H_high, holding the smallest half and the largest half of the elements respectively. H_low supports extract-max while H_high supports extract-min. The key idea is to maintain the invariant that H_low holds roughly the i/2 smallest elements and H_high roughly the i/2 largest.

When a new element arrives, if it is smaller than the max element of H_low, add it there; otherwise add it to H_high. What if there is an imbalance, say H_low has two more elements than H_high? Then simply extract the max of H_low and add it to the other heap.

So how do we compute the median? It is the max of H_low or the min of H_high. If i is even, both are medians; otherwise, the median comes from whichever heap has one more element than the other.
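A hedged sketch of median maintenance with two heaps (names are mine); Scala's PriorityQueue is a max-heap by default, so H_high gets a reversed ordering to act as a min-heap:

import scala.collection.mutable

object MedianMaintenance {
  private val hLow  = mutable.PriorityQueue.empty[Int]                          // max-heap: smaller half
  private val hHigh = mutable.PriorityQueue.empty[Int](Ordering[Int].reverse)   // min-heap: larger half

  def insert(x: Int): Unit = {
    if (hLow.isEmpty || x <= hLow.head) hLow.enqueue(x) else hHigh.enqueue(x)
    // rebalance so the two halves never differ in size by more than one
    if (hLow.size > hHigh.size + 1) hHigh.enqueue(hLow.dequeue())
    else if (hHigh.size > hLow.size + 1) hLow.enqueue(hHigh.dequeue())
  }

  // max of H_low or min of H_high; with an even count this returns the lower median
  def median: Int = if (hLow.size >= hHigh.size) hLow.head else hHigh.head
}

// List(5, 1, 9, 3).foreach(MedianMaintenance.insert)
// MedianMaintenance.median   // 3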

Monday, March 4, 2013

Strongly Connected Graph

A strongly connected graph is a directed graph in which there is a path from every vertex to every other vertex. The strongly connected components (SCCs) of a directed graph G are the equivalence classes of the relation u ~ v <-> there is a path u -> v AND a path v -> u. The ~ relation is indeed an equivalence relation: it is reflexive, symmetric, and transitive.

Computing the strongly connected components allows you to use the for-free primitives (BFS, DFS, etc.) on the components in linear time, which is blazingly fast.

Calling DFS on the right nodes will compute all the SCCs, but on the wrong nodes will just give back the union of multiple SCC's or even the whole graph.

With Kosaraju's Two-Pass Algorithm, we can compute the SCCs in linear time, with just two passes of DFS. First, form a new graph G_rev with all the edges of the original graph reversed. The naive way to implement this would be to walk through the graph and construct a new one; an optimization is to just run DFS on the original graph, traversing arcs backwards.

Secondly, run DFS-loop on G_rev. Thirdly, run DFS-loop on G. The first DFS-loop computes the "magic ordering" of the nodes while the second DFS-loop discovers the SCCs one by one.

DFS-loop(graph G)
     global variable t = 0 (for finishing times in first pass, number of nodes finished so far)
     global variable s = null (current source vertex, for leaders in second pass)
     Assume nodes labeled 1 to n
     for i:=n down to 1
          if i not yet explored
               s := i
               DFS(G, i)

DFS(graph G, node i)
     mark i as explored
     set leader(i) := node s
     for each arc(i,j) in G
          if j not yet explored
               DFS(G, j)
     t++
     set f(i) := t [i's finishing time]
   

Dijkstra's Shortest Path Algorithm

You are given as input a directed graph G = (V, E) where m = |E| and n = |V|. Each edge has a nonnegative length. For each v in V, compute L(v), the length of a shortest path from the source node s to v.

One might ask why use Dijkstra's algorithm if you can already find shortest paths via breadth-first search. However, BFS only works when all the edge lengths are one; it only tells you how many edges it takes to get from s to v.

You could replace each edge e with a directed path of length(e) unit edges, but this can blow up the size of the graph enormously; it is too wasteful and no longer runs in linear time.

You might also think of handling non-unit (or negative) edge lengths by adding a large constant to every edge length. This seems plausible, but it skews the shortest-path results: adding 5 to every edge increases a 10-edge path by 50 while a 5-edge path is only increased by 25.

Initialize:
X = {S} (vertices processed so far)
A[S] = 0 (computed shortest path distances)
B[S] = empty path [completed shortest paths]
Main Loop:
while (X != V) [when not all vertices have been processed]
     - among all edges (v, w) in E, with v in X and w not in X, pick one that minimizes A[v] + l(vw) [Dijkstra's greedy criterion]
     - call this edge (v*, w*)
     - add w* to X
     - set A[w*] := A[v*] + l(v*w*)
     - set B[w*] := B[v*] U (v*, w*)
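A hedged sketch that follows this pseudocode literally (the adjacency-list representation and names are mine); it rescans the frontier every round, so it runs in O(mn) rather than the O(m log n) of a heap-based version, and it only tracks the distances A, not the paths B:

// graph: for each vertex, its outgoing (neighbor, length) pairs
def dijkstra(graph: Map[Int, List[(Int, Int)]], s: Int): Map[Int, Int] = {
  var processed = Set(s)        // X: vertices processed so far
  var dist = Map(s -> 0)        // A: computed shortest-path distances
  val vertices = graph.keySet
  while (processed != vertices) {
    // among edges (v, w) with v in X and w not in X, pick the one minimizing A[v] + l(v, w)
    val candidates = for {
      v <- processed.toSeq
      (w, len) <- graph.getOrElse(v, Nil)
      if !processed.contains(w)
    } yield (w, dist(v) + len)
    if (candidates.isEmpty) return dist        // everything else is unreachable from s
    val (wStar, dStar) = candidates.minBy(_._2)
    processed += wStar
    dist = dist + (wStar -> dStar)
  }
  dist
}

// val g = Map(1 -> List((2, 1), (3, 4)), 2 -> List((3, 2)), 3 -> Nil)
// dijkstra(g, 1)   // Map(1 -> 0, 2 -> 1, 3 -> 3)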

Thursday, February 28, 2013

Topological Sorting

A topological ordering of a directed graph G is a labeling of G's nodes such that:
(1) the f(v)'s are the set {1, 2, ... , n}
(2) if (u, v) in G, then f(u) < f(v)

Every directed acyclic graph has a sink vertex - a vertex that has no outgoing edges.

Here is a slick way to do a topological sorting with DFS:

DFS-loop(graph G)
    mark all nodes unexplored
    current_label = n (number of nodes, to keep track of ordering)
    for each v in G
        if v not yet explored
            DFS(G, v)

DFS(graph G, vertex start)
    mark start explored
    for every edge(start, v)
        if v not yet explored
            DFS(G, v)
    set f(start) = current_label
    current_label--

This will run in linear time because you do a constant amount of time at each node and traverse each edge only once since we keep track of which has or has not been explored.

To prove correctness, we need to show that if (u,v) is an edge, then f(u)<f(v).
Case 1: If u is visited by DFS before v, the recursive call corresponding to v will finish before that of u, so f(v)>f(u).
Case 2: If v is visited before u, there is no path (v, u) because otherwise it would be a directed cycle. Then, the recursive call of v gets popped before u even gets put on the stack. Thus f(u)<f(v).


Tuesday, February 26, 2013

Depth First Search

This algorithm explores aggressively and backtracks only when necessary. This is similar to how we play a maze game.

To code DFS, we can mimic BFS, but use a Stack instead of a Queue. Another way to code this would be to do it recursively.
DFS(graph G, vertex s)
     mark s as explored
     for every edge (s, v):
          if v unexplored
               DFS(G, v)

Properties
At the end of DFS, all vertices that could have been reached will have been reached. The running time is O(n_s + m_s), linear in the number of nodes and edges reachable from s.

BFS and Undirected Connectivity

Let G = (V, E) be an undirected graph. The connected components of G are the "pieces" of G. The goal is to compute all connected components. A good application is checking whether a network is disconnected, or seeing if everything in a graph can be reached from everything else.

To compute all components:
Mark all nodes unexplored
for i=1 to n
     if i not yet explored
          BFS(G, i)

The call to BFS discovers precisely i's connected component. This algorithm can't miss any nodes because it actually walks through every node so it must find every connected component. If we wanted to find the number of connected components, we could keep a count variable and increment it each time we call BFS.

Breadth First Search


BFS(graph G, vertex s)
[all nodes initially unexplored]
-mark s as explored
let Q = queue data structure, initialized with s
while (Q not empty)
     remove the first node of Q, call it v
     for each edge (v, w):
          if w is unexplored
               mark w as explored
               add w to Q (at the end)

Layer n+1 gets added to the queue when we process layer n. We visit each node only once and each edge at most twice (for edge(v, w), we might encounter it when we're at node v and at node w).

Using breadth first search, we can implement a shortest path algorithm quite easily. The goal is to compute dist(v), the fewest number of edges on a path from s to v.

The only extra code we need to incorporate is:
initialize dist(v) = {0 if v=s, infinity if v != s}. When considering edge (v, w), if w is unexplored, then set dist(w) = dist(v)+1.
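A hedged sketch of BFS with this distance bookkeeping (the representation and names are mine):

import scala.collection.mutable

// graph: adjacency lists; returns dist(v) for every v reachable from s
def bfsDistances(graph: Map[Int, List[Int]], s: Int): Map[Int, Int] = {
  val dist = mutable.Map(s -> 0)       // having a distance doubles as being marked explored
  val queue = mutable.Queue(s)
  while (queue.nonEmpty) {
    val v = queue.dequeue()
    for (w <- graph.getOrElse(v, Nil) if !dist.contains(w)) {
      dist(w) = dist(v) + 1            // w sits one layer beyond v
      queue.enqueue(w)
    }
  }
  dist.toMap                           // unreachable vertices are simply absent (conceptually infinity)
}

// val g = Map(1 -> List(2, 3), 2 -> List(4), 3 -> List(4), 4 -> Nil)
// bfsDistances(g, 1)   // Map(1 -> 0, 2 -> 1, 3 -> 1, 4 -> 2)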

Monday, February 25, 2013

Minimum Cut Problem

 The minimum cut problem is to determine the cut which has the least number of crossing edges. This is used when you want to break a graph down into two or more pieces. For example, in networks, the minimum cut can help determine where hot spots and weaknesses are.

Another example is image segmentation. An image can be thought of as a graph of pixels. Each pixel has an edge connecting to adjacent pixels, each with an edge weight which is how much you "expect" the two to be of the same object. Thus, by computing a minimum cut, we can essentially take out that object from the image.

Closest Pair Algorithm

Say you are given a set of points in the plane and you want to find the two points closest to one another. The brute-force method is to compute the distance between every possible pair of points and keep track of the minimum. This takes O(n^2) time because there is a quadratic number of pairs.

Let's assume that this problem were on a line, or 1-D. Then we can just sort all the points, O(nlogn) at best, and then run through the line to find the closest pair which must be adjacent points, O(n). So can we find a similar method that will solve the problem in 2-D?

We can sort the points by x-coordinate and by y-coordinate, which takes O(nlogn). Unlike the 1-D case, sorting alone does not hand us the closest pair, but if we apply divide and conquer we can split the point set in half, recursively solve each half, and combine the solutions of the subproblems to recover the solution to the original problem.

We can let Q be the left half of P and R be the right half of P, which forms Qx, Qy, Rx, and Ry.

Graph Search

The goal of graph search is to find every node that is reachable from a starting node. We don't want to explore anything twice so that we can get O(m+n) running time, linear in the number of edges and vertices.

The generic algorithm for this would be as follows:
Given (graph G, vertex s)
-initially explore s, all other vertices unexplored
-while possible:
    -choose an edge (u, v) with u explored and v unexplored
    -mark v explored
-halt
At the end, if a node v is explored, then G has a path from s to v.

What gives rise to multiple algorithms of this general sort is in the ambiguity in choosing the next edge.

In Breadth First Search (BFS), we explore the graph in layers. The starting node is layer 0 and its neighbors are layer 1; in general, a node in layer i needs i edges to connect it to the starting node. BFS is good for computing shortest paths and the connected components of an undirected graph. It can be implemented with a First In First Out data structure, such as a Queue.

Depth First Search (DFS), on the other hand, explores aggressively and backtracks only when necessary. It can be used to compute a topological ordering of a directed acyclic graph as well as the strongly connected components of a directed graph. It can be implemented with a Last In First Out data structure, such as a Stack.

Wednesday, February 20, 2013

Graphs

Graphs consist of vertices and edges. Vertices, also called nodes, can be pictured as the dots while edges are the lines connecting them. Graphs can be directed or undirected. Directed graphs have ordered pairs of endpoints (there is a head and tail) while undirected graphs do not.

Graphs are used ubiquitously. An obvious example is a map. The places are the vertices while the roads connecting them are the edges. Graphs can also be used to set precedence. That is, you can show which things are prerequisites for others, such as course prereqs.

A cut of a graph (V, E) is a partition of its vertices into two non-empty sets A and B. Each edge then either has both endpoints inside A, both endpoints inside B, or one endpoint in each, crossing between the two sets.

In a graph with n nodes, each node can go either into set A or into set B, giving 2^n ways to split the vertices. Since each set must be non-empty, there are actually 2^n - 2 cuts. A crossing edge of a cut (A, B) is an edge with one endpoint in A and the other in B (for a directed graph, its tail in A and its head in B).

A graph in one piece with n nodes must have at least n-1 edges: you start out with n distinct pieces, and each added edge merges at most two pieces into one, so you need at least n-1 edges to get down to a single piece. The graph has the maximum number of edges when every vertex is connected to every other vertex; then there are n(n-1)/2 edges, the number of ways of choosing 2 vertices out of n.

If we let n be the number of vertices and m the number of edges, we have that m is Omega(n) and O(n^2). A graph is "sparse" if m is closer to the lower bound and "dense" if it is closer to the upper bound.

Friday, February 15, 2013

Selecting the i'th smallest element

Finding the ith smallest element may seem difficult at first. The naive way would be to say, "Hey, let's sort it and then we can just pick out the ith one!" We could use MergeSort, or better yet QuickSort, and have it done in O(nlogn). This is called a reduction: reducing the selection problem to sorting followed by picking.

It turns out that we can actually accomplish this task by slightly changing QuickSort. Recall that after partitioning, the pivot sits in its "rightful" spot and never moves in any of the later recursive calls. So say we're looking for the ith element and, after the first partition, the pivot lands at position k.

If k=i, then we know we found the ith element and we're done! If k>i, then recursively partition the left hand side. Otherwise, recursively partition the right side.

What is the running time of this selection algorithm? It depends on how good the pivots are. If the pivots are chosen in the worst possible way every time, it is O(n^2): that would mean choosing the smallest or largest element each time, shrinking the problem by only one element.

What about the best? This would be the median. Using the master method, we get that this will run in O(n).
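A hedged, compact sketch of the idea (quickSelect is my own name; for brevity it partitions by copying rather than in place, but the recursion mirrors the description above: recurse only on the side that contains the ith element):

import scala.util.Random

// i-th smallest element of a, with i 1-indexed and 1 <= i <= a.length
def quickSelect(a: Vector[Int], i: Int): Int = {
  val pivot = a(Random.nextInt(a.length))
  val (smaller, rest) = a.partition(_ < pivot)      // everything strictly below the pivot
  val (equal, larger) = rest.partition(_ == pivot)  // the pivot's block, then everything above
  if (i <= smaller.length) quickSelect(smaller, i)              // pivot rank k > i: look left
  else if (i <= smaller.length + equal.length) pivot            // k = i: found it
  else quickSelect(larger, i - smaller.length - equal.length)   // k < i: look right
}

// quickSelect(Vector(10, 8, 2, 4), 2)   // 4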

Analysis of QuickSort

Another cool fact about QuickSort is that every two elements in an array will either get compared once or never. If you think about it, the only comparisons that ever occur are the ones between the pivot and every other element in [start, end]. Once these comparisons are over, there are two recursive calls to QuickSort, both of which do not include the pivot at all. Thus, that pivot can only be compared once at most.

Wednesday, February 13, 2013

Quicksort

Quicksort is a sorting algorithm that runs in O(nlogn) on average and sorts in place. This saves space because it does not require a temporary array to store values as in MergeSort.

The key concept behind Quicksort is the partitioning around a pivot. Choose one of the elements in the array as a pivot. Then rearrange the array so that everything less than the pivot is to its left and everything greater is to its right. Note that this rearrangement does not mean the entire array has to be sorted after this one partition. However, the pivot does end up in the "right" spot. That is, the pivot will not be moved anymore because it is already in the spot where it will be in the sorted array.

Partitioning can be done in linear time without allocating any extra memory and reduces the problem size.

QuickSort(Array a, length n)

  • if n = 1 return
  • p = ChoosePivot(a, n)
  • Partition a around p
  • QuickSort(first part)
  • QuickSort(second part)
Partitioning
The easy way to partition the array would be to allocate a temporary array, using O(n) extra space; this still partitions around the pivot in O(n) time. To do it in place, assume the pivot is the first element of the array. Traverse the array keeping a partitioned part and an unpartitioned part: one pointer scans the array while another points to the first element of the partitioned part that is larger than the pivot. When the scan finds an element smaller than the pivot, swap it with the element at that boundary pointer and advance the boundary. When you are finished, swap the element just before the boundary with the pivot to complete the partition.
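A hedged sketch of that in-place partition (the names are mine); boundary always points at the first element of the partitioned part that is larger than the pivot:

// partition a(start until end) around a(start); returns the pivot's final index
def partition(a: Array[Int], start: Int, end: Int): Int = {
  def swap(p: Int, q: Int): Unit = { val t = a(p); a(p) = a(q); a(q) = t }
  val pivot = a(start)
  var boundary = start + 1                  // first partitioned element larger than the pivot
  for (j <- start + 1 until end)
    if (a(j) < pivot) {                     // found something smaller than the pivot:
      swap(j, boundary)                     // move it just left of the boundary
      boundary += 1
    }
  swap(start, boundary - 1)                 // finally, drop the pivot into its rightful spot
  boundary - 1
}

// val xs = Array(3, 8, 2, 5, 1, 4, 7, 6)
// partition(xs, 0, xs.length)   // 2; xs is now Array(1, 2, 3, 5, 8, 4, 7, 6)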

Finding the Optimal Pivot
So far, we have used the first element of the array as the pivot. What if the array were already sorted, or reverse sorted? Then partitioning would leave the pivot either in the same spot or at the end, and we'd recursively call QuickSort on an (n-1)-length subarray, over and over until the array is sorted. This takes O(n^2) time.

What would be a better way? If we want to balance the two subarrays so that they are about equal in length, then we would want to find the median. Thus, if we are lucky enough to be able to get the median as the pivot every time, then the algorithm would be much like MergeSort without the extra memory allocation, running in O(nlogn).

What if finding the median is too hard? What other choices do we have? Well, it is established that if we can split the subarrays even 25%/75%, we still get O(nlogn) running time. If we had 100 elements, we just have to pick a pivot anywhere from the 25th smallest to the 75th smallest. That is half of the elements, so we have a 50% chance of picking such a pivot.

As a side note, this is an instance of adding randomness to our algorithm. Randomness is actually really useful in many applications because it makes things easier, more elegant, and efficient, but more on this later.

In conclusion, we have established that a randomized QuickSort has an average running time of O(nlogn), a blazingly fast algorithm without the extra space allocation.

Monday, February 11, 2013

Meaning behind Master Method

After establishing the running time of an algorithm from the three parameters a, b, and d, we might wonder about the meaning behind this method. Where do these logs and exponents come from?

If we look at the amount of work done at a single level j of the recursion tree, we get

     (number of level-j subproblems) x (work per level-j subproblem) = a^j * c * (n / b^j)^d

where a^j is the number of level-j subproblems and c * (n / b^j)^d is the work per level-j subproblem. Thus, the total amount of work over all levels must be

     c * n^d * [ sum over j = 0 to log_b(n) of (a / b^d)^j ]

Here we finally have the a and b^d that we compared. But what do they represent? Remember, a is the number of subproblems we broke our problem into; we can think of it as the rate of subproblem proliferation.

Now what about b^d? This is the rate of work shrinkage per subproblem. Why isn't it just b, the factor by which the input shrinks? If the problem requires a linear amount of work outside the recursion, then halving the input size halves the work; but if the work is quadratic, halving the input size quarters the work. This is where the exponent d comes into play.

These are two forces in opposition. If the rate of subproblem proliferation is greater than the rate of work shrinkage, then we will actually be doing more work. If it is less, then we are effectively reducing the amount of work we are doing by dividing the problem into more subproblems. Otherwise, we will be doing the same amount of work at each level.

So let's break it down into three cases.
  1. a = b^d : the ratio is 1, so the sum is just the number of levels, about log_b(n), each contributing n^d work. We can ignore the base of the logarithm because it only changes a constant factor, and thus we get O(n^d log n).
  2. a < b^d : using the sum of a geometric series with r < 1, the sum is bounded by a constant independent of the number of terms. Then, using Big-O notation, the constants are suppressed and we get only O(n^d).
  3. a > b^d : since r > 1 in this geometric series, the sum is bounded by a constant times the largest term. With some algebra, the running time works out to O(n^(log_b a)).

Sunday, February 10, 2013

Master Method for Solving Recurrences

What makes the master method so awesome is that it can be thought of as a "black box" for solving recurrences: you give it a few parameters and it tells you the running time of the algorithm. However, it only works when all subproblems have the same size. For example, in MergeSort each recursive call does work on size n/2.

Base Case:
T(n) ≤ a constant c, for all sufficiently small n
For larger n:
T(n) ≤ aT(n/b) + O(n^d)
  1. a is the number of recursive calls
  2. b is the factor by which the problem size shrinks. It must be greater than one or the problem will never get any smaller
  3. d is the exponent in the running time of the combine step. It can be as small as 0, which indicates that combining takes constant time.
a, b, d must all be constants and independent of the input size.
So the master method says:
     T(n) = O(n^d log n)       if a = b^d
     T(n) = O(n^d)             if a < b^d
     T(n) = O(n^(log_b a))     if a > b^d
In the first case, the base of the logarithm doesn't matter because changing the base only changes the result by a constant factor. In the third case, however, the base matters a lot, because it could make the difference between, say, linear and quadratic time.

Let's look at MergeSort as an example. We know T(n) = 2T(n/2) + O(n). We have a = 2, b = 2, and d = 1. Since a = b^d, we are in case 1, and the running time is O(n^d log n) = O(nlogn).

For binary search, we split the problem into two halves of size n/2 each BUT we only recurse into one of them, and we just compare the element we're searching for with the center element, which takes constant time. Thus a = 1, b = 2, d = 0, and a = b^d, so the overall runtime is O(logn).

Saturday, February 9, 2013

Strassen's Subcubic Matrix Multiplication Algorithm

A Review on Matrix Multiplication
Given two n x n matrices X and Y and their product matrix Z, we know that Z_ij = (ith row of X) · (jth column of Y). The naive algorithm uses three nested loops: two to traverse the entries of Z and one to compute each dot product from X and Y. This has an overall runtime of O(n^3).

If we think back to MergeSort, we can attempt to apply the Divide and Conquer algorithm design paradigm. We can split each matrix into 4 matrices, as such:

X = [ A  B ]        Y = [ E  F ]
    [ C  D ]            [ G  H ]

X · Y = [ AE+BG  AF+BH ]
        [ CE+DG  CF+DH ]


Then, it can be proven that if we split the product matrix into 4 submatrices as well, each quadrant is computed from the corresponding submatrices exactly as in ordinary 2 x 2 matrix multiplication, as shown above.

Then to compute this product, we need to do 8 recursive calls: each of the 4 quadrants of the result needs two matrix products. It turns out that this is no better than the naive approach; it also has a runtime of O(n^3).

Strassen's algorithm makes it possible to solve the problem with only 7 recursive calls. This may seem like a small decrease, but saving 1/8 of the work at every level of the recursion adds up to something significant, and it turns out to give a subcubic runtime.

The Algorithm
So what is the algorithm? We need 7 recursive calls to find 7 products:
P1 = A(F - H)
P2 = (A + B)H
P3 = (C + D)E
P4 = D(G - E)
P5 = (A + D)(E + H)
P6 = (B - D)(G + H)
P7 = (A - C)(E + F)
Then, it is claimed that
[ AE+BG  AF+BH ]   [ P6+P5+P4-P2   P1+P2        ]
[ CE+DG  CF+DH ] = [ P3+P4         P1-P3+P5-P7  ]

This makes you wonder how in the world did Strassen come up with P1 through P7. We have no idea. "How can we make this better?" This is the ultimate question for coming up with better and better algorithms to solve our problems.

Thursday, February 7, 2013

Inversions

The first thing I've learned from Algorithms: Design and Analysis Part I is how to count inversions.

What is an inversion? An inversion is a pair of entries that are out of order. If we are given an array that we want sorted from low to high, an inversion is a pair of entries a_i and a_j such that a_i > a_j but i < j. We can see that they are misplaced.

An array with no inversions is one that is sorted and an array with the maximum number of possible inversions would be sorted in reverse order. Thus, counting inversions is a way of checking how sorted or unsorted something is.

How do we find the number of inversions? Well, the naive and obvious way would be to traverse the array and compare each element to every other element. This algorithm would be O(n^2). Can we do it a better way? Yes. We'll use the divide and conquer method and piggyback on the MergeSort algorithm to do it in O(nlogn). By counting the number of inversions, we will also be sorting it.

By splitting the array in half, we know that:
total inversions = inversions in left + inversions in right + split inversions
A split inversion is one in which we have one element on the left half that will be an inversion with an element on the right half. To find the number of inversions in each half, we'll just recursively call the algorithm on each half. Finding the number of split inversions is the trickier task.

Say we have two halves, [2, 5, 6] and [1, 3, 8]. The split inversions actually get counted during the merge subroutine. Merge keeps two pointers, one at the front of each half. It repeatedly copies the smaller element and advances that pointer. Whenever an element is taken from the second half, it must be smaller than the current element of the first half and every element after it, so each time the second half is chosen we add the number of elements remaining in the first half to the count.

Here is my code for this algorithm:
// Counts inversions in a, sorting it as a side effect (piggybacks on MergeSort).
int countInversions(int[] a) {
    return countInversions(a, 0, a.length, new int[a.length]);
}

// Counts inversions in a[start, end), using t as scratch space for merging.
int countInversions(int[] a, int start, int end, int[] t) {
    if (end - start <= 1)
        return 0;                                   // a single element has no inversions
    int mid = (start + end) / 2;
    int x = countInversions(a, start, mid, t);      // inversions entirely in the left half
    int y = countInversions(a, mid, end, t);        // inversions entirely in the right half
    int z = subroutine(a, start, mid, end, t);      // split inversions, counted while merging
    return x + y + z;
}

// Merges the sorted halves a[start, mid) and a[mid, end); returns the split-inversion count.
int subroutine(int[] a, int start, int mid, int end, int[] t) {
    int i = start;
    int j = mid;
    int k = start;
    int count = 0;
    while (i < mid && j < end) {
        if (a[i] <= a[j]) {                         // equal elements are not inversions
            t[k++] = a[i++];
        } else {
            t[k++] = a[j++];
            count += (mid - i);                     // a[j] beats everything left in the first half
        }
    }
    // copy whichever half still has elements (the other copy is a no-op)
    System.arraycopy(a, i, t, k, mid - i);
    System.arraycopy(a, j, t, k, end - j);
    System.arraycopy(t, start, a, start, end - start);

    return count;
}