Wednesday, March 6, 2013

Hash Tables

A hash table is primarily used to keep track of an evolving set of stuff by using a key. It is good for inserting, deleting, and lookups. It is also sometimes referred to as a "dictionary."A hash table should not be used if you need to keep things in some particular order.


The operations that it supports are all blazingly fast, O(1) running time. However, this is only if it is implemented properly.

Applications
De-duplication: You are given a stream of object, such as a linear scan through a large file or objects arriving in real time. The goal is to remove duplicates (only keep track of unique objects). This could be  used to report only unique visitors to a website and avoid duplicates in search results.

When a new object x arrives, look it up in hash table H. If it is not found, then insert x into H.

The 2-SUM Problem: You are given an array of n integers and a target sum T. The goal is to determine whether or not there are two numbers x,y in A with x+y=T.

The naive way to do this would be to check all combinations (n choose 2). This exhaustive search is O(n2). A better way would be to sort the array. Then for each x in array A, use binary search to look for T-x. This will take O(nlogn). The best way would be to insert every element into a hash table. Then for every x, we can just look up its complement, T-x, which is O(1). This will be O(n).

Hash tables was first designed for and used to symbol table lookups for a compiler. It can also be used to block network traffic. Another application would be for search algorithms, such as for game tree explorations. (There are more chess configurations than size of web). You can add a configuration to a hash table to avoid exploring it more than once.

No comments:

Post a Comment