Secondly, for good performance, we need a good hash function, i.e. spreads the data evenly across buckets. However, a hash table that spreads out every data set does not exist. That is, for every hash function, there is a pathological data set.
Pathological Data in the Real World
Reference: Crosby and Wallach, USENIX 2003.
You can paralyze several real-world systems (like network intrusion detection) by exploiting badly designed hash functions. Two characteristics of these breakable systems were:
- Open source
- Overly simplistic hash function designed for speed (easy to reverse engineer a pathological data set)
So what should we do to solve this problem?
No comments:
Post a Comment