Chogarithms: April 2013

Sunday, April 21, 2013

Lists

A fundamental data structure is the immutable linked list. It is composed of two parts:

Nil: the empty set
Cons: a cell containing an element and the rest of the list

trait IntList ...
class Cons(val head: Int, val tail: IntList) extends IntList ...
class Nil extends IntList

The use of val is syntactic sugar for:

class Cons(_head: Int, _tail: IntList) extends IntList {
val head = _head
val tail = _tail
}

This saves the use of _head and _tail.

Pathological Data Sets and Universal Hashing Motivation

The load of a hash table is the total number of elements in the hash table divided by total amount of space in the table. This is usually denoted as alpha. Alpha=O(1) is necessary for operations to run in constant time. With open addressing we need alpha << 1. For good performance, we need to control load.

Secondly, for good performance, we need a good hash function, i.e. spreads the data evenly across buckets. However, a hash table that spreads out every data set does not exist. That is, for every hash function, there is a pathological data set.

Pathological Data in the Real World
Reference: Crosby and Wallach, USENIX 2003.
You can paralyze several real-world systems (like network intrusion detection) by exploiting badly designed hash functions. Two characteristics of these breakable systems were:

Open source
Overly simplistic hash function designed for speed (easy to reverse engineer a pathological data set)

So what should we do to solve this problem?

Object Oriented Programming

All OOP languages implement dynamic method dispatch. This means that the code invoked by a method call depends on the runtime type of the object that contains the method.

This dynamic dispatch is analogous to calls to higher-order-functions. Can we implement objects in terms of higher-order-functions? Can we implement higher-order-functions in terms of objects?

Traits

For a class that wants to inherit from more than one superclass, it can use inherit from traits. Traits are like abstract classes defined as:

trait name {}

Classes and objects can only inherit from one superclass, but many traits. For example,

class Square extends Shapes with Planar with Movable ...

Traits are like interfaces in Java, but are more powerful since they can contain fields and concrete methods. However, they cannot have valued parameters.

Saturday, April 13, 2013

Functions returning Functions

In Scala, we can use multiple parameter lists. Before

def sum(f: Int => Int): (Int, Int) => Int = {
def sumF(a: Int, b: Int): Int =
if (a > b) 0
else f(a) + sumF(a+1, b)
sumF
}

Now, we can turn that into

def sum(f: Int => Int)(a: Int, b: Int): Int =
if (a>b) 0 else f(a) + sum(f)(a+1, b)

This way, we can still use sum without the second parameter list such as sum(cube).

Friday, April 12, 2013

Currying

In the sum function, there is repetition in the parameters a and b which just get passed from one function to another. We can get rid of these with currying. Then, we can rewrite sum as follows:

def sum(f: Int => Int): (Int, Int) => Int = {
def sumF(a: Int, b: Int): Int =
if (a > b) 0
else f(a) + sumF(a+1, b)
sumF
}

Now sum is a function that returns another function. This function applies the input function parameter and sums up the result. We can redefine the three sum functions as:

def sumInts = sum(x => x)
def sumCubes = sum(x => x * x * x)
def sumFacts = sum(factorial)

If we wanted to just use the function sumCubes without explicitly defining it, we could do it another way: sum(cube)(1, 10). sum(cube) applies the cube function to the sum function and returns the sum of cubes function. This function is then applied to the parameters (1,10). We see that sum(cube) is equivalent to our definition of sumCubes. Function application associates to the left.

Monday, April 8, 2013

Rewriting the main sum method

We can optimize the main sum method by making it tail-recursive. This can be done with an accumulator as follows:

def sum(f: Int => Int, a: Int, b: Int) = {
def loop(a: Int, acc: Int): Int =
if (a>b) acc
else loop(a+1, f(a) + acc)
loop(a, 0)
}

Anonymous Functions

An anonymous function is one that does not have a name and is treated as a literal. We can rewrite the cubic function as: (x: Int) => x * x * x. Another example is: (x: Int, y: Int) => x + y.

Every anonymous function can be written with a def like normal functions. Therefore we can say that anonymous functions are syntactic sugar.

Using anonymous functions, we can rewrite the three sum functions more succinctly:
def sumInts(a: Int, b: Int) = sum(x => x, a, b)
def sumCubes(a: Int, b: Int) = sum(x => x * x * x, a, b)

Higher Order Functions

Functional languages treat functions as first-class values, that is, like any other value. Higher order functions take other functions as parameters or return value.

Let's make three functions that sum up a range of integers, a range of their cubes, and a range of their factorials.

def sumInts(a: Int, b: Int): Int =
if (a>b) 0 else a + sumInts(a+1, b)
def sumCubes(a: Int, b: Int): Int =
if (a>b) 0 else cube(a) + sumCubes(a+1, b)
def sumFactorials(a: Int, b: Int): Int =
if (a>b) 0 else fact(a) + sumFactorials(a+1, b)

You notice that these three functions are nearly identical, except for one area: what they do at each value of a. We want to factor out the common things. How can we do this? Higher order functions!

We can make three functions to do the work at each level. Say,

def id(x: Int):Int = x
def cube(x: Int):Int = x * x * x
def fact(x: Int):Int = if (x==0) 1 else x * fact(x-1)

We can define a single function that can do all the work:
def sum(f: Int => Int, a: Int, b: Int):Int =
if (a>b) 0
else f(a) + sum(f, a+1, b)

def sumInts(a: Int, b: Int) = sum(id, a, b)
def sumCubes(a: Int, b: Int) = sum(cube, a, b)
def sumFactorials(a: Int, b: Int) = sum(fact, a, b)

In sum, f is the type of function that maps Int to Int. As you can see, this makes the code a lot more compact and breaks it down into more modules. However, we had to define 3 auxilliary functions: id, cube, and fact. These can be hidden using anonymous functions.

Friday, April 5, 2013

Tail Recursion

If an argument calls itself as its last action, the function's stack frame can be reused. This is called tail recursion. These are iterative processes and executive much like a loop.

However, if we have a function such as factorial(n) that calls n*factorial(n-1) last, it does not act as a tail recursive function.

It's best to refactor a function so that it is tail recursive because the JVM will only allow about 1000 stack frames to be used.

Thursday, April 4, 2013

Blocks and Lexical Scope

In the sqrt example, the user does not need to see the sqrtIter, isGoodEnough, and other auxillary functions. Thus, we can wrap them in a block under sqrt so that sqrt is the only visible function.

A block is delimited by braces. The last element in the block is an expression that defines its value. The definitions inside a block are only visible within the block and shadow definitions of the same names outside the block.

Definitions outside the block are clearly visible to the inside of a block so there is no need to redefine parameters (such as x for sqrt). Nesting method definitions makes code much more cleaner as well. Here is my code for sqrt:

def abs(x: Double): Double = if (x > 0) x else -x
def sqrt(x: Double): Double = {
def isGoodEnough(guess: Double): Boolean =
abs(guess * guess - x) / x < 0.001
def improveGuess(guess: Double): Double =
(guess + x / guess) / 2
def sqrtIter(guess: Double): Double =
if (isGoodEnough(guess)) guess
else sqrtIter(improveGuess(guess))
sqrtIter(1.0)
}

Unlike Java, Scala does not require a semicolon at the end of every line. However, you do need it if you want to put more than one statement on one line. This raises the issue of multiline statements. You could wrap them in parentheses or put the operator on the first line. For example:
a. (long-expression1
+ long-expression2)
b. long-expression1 +
long-expression2

Conditional Expressions

To choose between two values, we can use a conditional expression if-else which looks like Java's if-else but is used for expressions rather than statements. For example,

def abs(x:Int) = if (x > 0) x else -x

Expressions like && and || use short-circuit evaluation because their right operands are not always evaluated.

When we define a function, we are using by-name since we evaluate parameters when we use them. When we define a val (val x = square(2)), we are using by-value.

Therefore, def x = loop will work perfectly fine, but val x = loop will not work.

def and(x: Boolean, y: => Boolean): Boolean = if (x) y else false

def or(x: Boolean, y: => Boolean): Boolean = if (x) true else y

We have to remember to include the => to make the second parameter pass by-name. This will let the function use short-circuit evaluation.

Evaluation Strategies and Termination

Call-By-Value termination always implies Call-By-Name termination. However, the converse is not true. For example, given a function {def foo(x: Int, y: Int) = x} and we call foo(1,loop) where loop does not terminate,

In CBV, we need to evaluate loop which does not terminate so this does not terminate
In CBN, we only need the first argument which it will readily return

Scala normally uses Call-By-Value because it is usually exponentially faster to compute. However, you can make a function use CBN by adding a => such as: {def constOne(x: Int, y: => Int) = 1}

If we call const(1+2, loop) we get const(3,loop) which evaluates to 1 since the parameter loop is CBN. On the other hand, if we call const(loop, 1+2), we will get an infinite cycle!

Elements of Programming

Every programming language has three things:

Primitives
Expressions to combine primitives
Ways to abstract expressions, introducing a name for the expression

Scala has an interactive shell that one can use to test simple expressions called a REPL (Read, Evaluate, Print, Loop). You evaluate an expression by taking the leftmost operator, evaluating its operands, and applying the operator on the operands.

You can define a function in Scala as follows: def square(x:Double):Double = x * x. The second Double could be omitted.

There are two ways of evaluating a function application. The first is called Call-By-Value.

Evaluate function arguments from left to right
Replace function application by function's RHS, and
Replace the formal parameters in the function by the actual arguments

The Call-By-Name method of evaluating a function is similar but it doesn't evaluate the arguments until the very end. This is beneficial because it will not evaluate arguments that are not used. However, there may be repeated evaluations of the same expressions.

These schemes of expression evaluation are called substitution models. This can be applied to all expressions that do not have side effects (i.g., x++ is a side effect).

Tuesday, April 2, 2013

Why Functioning Programming

I watched a keynote talk by Martin Odersky on YouTube about why Scala. Today, the problem in computing is not in the hardware but rather the software. Since hardware developers have firmly decided that the way to go with building hardware is not to add more clock cycles but rather increase the number of cores, we as programmers must deal with more parallelism and concurrency.

Parallelism is running a problem across multiple cores, even if they're not running at the same time. However, concurrency is running multiple threads at the same time.

So why is Scala over Java in some sense? Scala is much more precise and compact. Whereas just to construct a class and its constructor with one instance variables, you need to repeat the instance variable three times, Scala only needs to do it once. Whats even more is that to make some Java program run in parallel could take days while in Scala you can just add .par and be done. Scala is actually shorthand for "scalable language."

There are also more tools to utilize in Scala to give you more control over things. This video was definitely a great motivation for me to learn Scala and functional programming from the Coursera course taught by this keynote speaker.

Resources:
Video Link