1. Hashing
-- Save items in a key-indexed table (index is a function of the key).
-- Hash function: Method for computing array index from key.
-- Classic space-time tradeoff.
-- No space limitation: trivial hash function with key as index.
-- No time limitation: trivial collision resolution with sequential search.
-- Space and time limitations: hashing (the real world).
2. Computing the hash function
-- Idealistic goal: scramble the keys uniformly to produce a table index.
-- Efficiently computable.
-- Each table index equally likely for each key.
-- Practical challenge. Need different approach for each key type.
3. Java hashCode implementation :
-- Integer :
public final class Integer { private final int value; ... public int hashCode() { return value; //using value of the Integer itself } }
-- Boolean :
public final class Boolean { private final boolean value; ... public int hashCode() { if (value) return 1231; else return 1237; } }
-- Double :
public final class Double { private final double value; ... public int hashCode() { long bits = doubleToLongBits(value); //convert to IEEE 64-bit representation; return (int) (bits ^ (bits >>> 32)); //xor most significant 32-bits with least significant 32-bits } }
-- String :
public final class String { private final char[] s; ... public int hashCode() { int hash = 0; // Horner's method to hash string of length L: L multiplies/adds. // Equivalent to h = s[0] · 31^(L–1) + … + s[L – 3] · 31^2 + s[L – 2] · 31^1 + s[L – 1] · 31^0. for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash; } }
-- user-defined types
public final class Transaction implements Comparable<Transaction> { private final String who; private final Date when; private final double amount; ... public int hashCode() { int hash = 17; hash = 31*hash + who.hashCode(); hash = 31*hash + when.hashCode(); hash = 31*hash + ((Double) amount).hashCode(); return hash; } }
-- "Standard" recipe for user-defined types:
-- Combine each significant field using the 31x + y rule.
-- If field is a primitive type, use wrapper type hashCode().
-- If field is null, return 0.
-- If field is a reference type, use hashCode().
-- If field is an array, apply to each entry.
-- Modular hashing:
-- Hash code: An int between -2^31 and 2^31 - 1.
-- Hash function: An int between 0 and M - 1 (for use as array index).
-- Math.abs(key.hashCode()) % M --> 1-in-a-billion bug: hashCode() of "polygenelubricants" is -2^31
-- correct implementation:
private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; }
4. Uniform hashing assumption:
-- Uniform hashing assumption: Each key is equally likely to hash to an integer between 0 and M - 1.
-- Birthday problem: Expect two balls in the same bin after ~ sqrt( π M / 2 ) tosses.
-- Coupon collector: Expect every bin has ≥ 1 ball after ~ M ln M tosses.
-- Load balancing: After M tosses, expect most loaded bin has Θ ( log M / log log M ) balls.
5. Separate chaining hash table
-- Use an array of M < N linked lists.
-- Hash: map key to integer i between 0 and M - 1.
-- Insert: put at front of ith chain (if not already there).
-- Search: need to search only ith chain.
-- Java Implementation
public class SeparateChainingHashST<Key, Value> { private int M = 97; // number of chains private Node[] st = new Node[M]; // array of chains , array doubling and halving code omitted private static class Node { private Object key; private Object val; private Node next; } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; } public Value get(Key key) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null; } public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) { x.val = val; return; } st[i] = new Node(key, val, st[i]); } }
6. Analysis of separate chaining :
-- Proposition: Under uniform hashing assumption, prob. that the number of keys in a list is within a constant factor of N / M is extremely close to 1.
-- Consequence: Number of probes for search/insert is proportional to N / M.
-- M too large ⇒ too many empty chains.
-- M too small ⇒ chains too long.
-- Typical choice: M ~ N / 5 ⇒ constant-time ops.
7. Open addressing
-- When a new key collides, find next empty slot, and put it there.
-- Hash: Map key to integer i between 0 and M-1.
-- Insert: Put at table index i if free; if not try i+1, i+2, etc.
-- Search. Search table index i; if occupied but no match, try i+1, i+2, etc.
-- Array size M must be greater than number of key-value pairs N.
-- Java Implementation
public class LinearProbingHashST<Key, Value> { private int M = 30001; private Value[] vals = (Value[]) new Object[M]; // array doubling and halving code omitted private Key[] keys = (Key[]) new Object[M]; private int hash(Key key) { /* as before */ } public void put(Key key, Value val) { int i; for (i = hash(key); keys[i] != null; i = (i+1) % M) if (keys[i].equals(key)) break; keys[i] = key; vals[i] = val; } public Value get(Key key){ for (int i = hash(key); keys[i] != null; i = (i+1) % M) if (key.equals(keys[i])) return vals[i]; return null; } }
8. Knuth's parking problem
-- Model. Cars arrive at one-way street with M parking spaces. Each desires a random space i : if space i is taken, try i + 1, i + 2, etc. What is mean displacement of a car?
-- Half-full: With M / 2 cars, mean displacement is ~ 3 / 2.
-- Full: With M cars, mean displacement is ~ sqrt( π M / 8 ).
-- Proposition. Under uniform hashing assumption, the average # of probes in a linear probing hash table of size M that contains N = α M keys is: ~1/2(1+1/(1-α ) ) for search hit and ~1/2(1+1/(1-α )^2) for search miss or insert.
-- Parameter choice
-- M too large ⇒ too many empty array entries.
-- M too small ⇒ search time blows up.
-- Typical choice: α = N / M ~ ½.
9. ST implementations summary:
10. Hashing Variations:
-- Two-probe hashing. (separate-chaining variant)
-- Hash to two positions, insert key in shorter of the two chains.
-- Reduces expected length of the longest chain to log log N.
-- Double hashing. (linear-probing variant)
-- Use linear probing, but skip a variable amount, not just 1 each time.
-- Effectively eliminates clustering.
-- Can allow table to become nearly full.
-- More difficult to implement delete.
-- Cuckoo hashing. (linear-probing variant)
-- Hash key to two positions; insert key into either position; if occupied, reinsert displaced key into its alternative position (and recur).
-- Constant worst case time for search.
相关推荐
在Java中运用Hashtables,?贘ava中运用Hashtables
useHashTables.cpp -- code that demonstrates the use of the hash tables
C macros for hash tables and more
基于DHT的结构化P2P中资源搜索协议的比较分析
Among these, cuckoo hashtables provide excellent performance by allowing lookupsto be processed with very few memory accesses (2 to 3 perlookup). Yet, for large tables, cuckoo hash tables remain mem-...
绍在Java中运用Hashtables
Algorithms Lecture 12: Hash Tables [Fa’13]Insanity is repeating the same mistakes and expecting different results.— Narcotics Anonymous (1981)Calvin: There! I finished our secret code! Hobbes: Let’...
使用Golang Hashtables的非常简单,惯用和线程安全的实现。 安装 ❯ go get -u github.com/HotPotatoC/hashtable 用法 设定值 ht := hashtable . New () ht . Set ( "user" , "John" ) 获得价值 ht . Get ( "user" ...
WijsAnalysing the Performance of GPU Hash Tables for State Space ExplorationNathan Cassee Eindhoven University of TechnologyEindhoven, The Netherlands N.W.Cassee@student.tue.nlAnton Wijs Eindhoven ...
Disk Based Hash Tables and Quantified NumbersEdscott Wilson GarciaMarch 24, 2014AbstractThis paper describes the design of Disk Based Hash Tables: n–dimen- sional self–indexed data tables. The ...
16Concurrent Hash Tables: Fast and General(?)!TOBIAS MAIER and PETER SANDERS, Karlsruhe Institute of Technology, GermanyROMAN DEMENTIEV, Intel Deutschland GmbH, GermanyConcurrent hash tables are one ...
Concurrent Hash Tables: Fast and General(?)!Tobias Maier1, Peter Sanders1, and Roman Dementiev21 Karlsruhe Institute of Technology, Karlsruhe, Germany {t.maier,sanders}@kit.edu 2 Intel Deutschland ...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
Goals: This assignment is designed to reinforce the student's understanding of the use of hash tables as searchable containers. Outcomes: Students successfully completing this assignment would master...
GNU库。 截至 2014 年 4 月 9 日,DBH 是一个 GNU 程序,并更名为“GNU libdbh”。 用于创建和管理基于 64 位磁盘的哈希表的库 在键和值之间建立关联,以便可以快速找到给定键的值并将其加载到内存中:64 位支持
哈希表 哈希表的java表示 单独的链接哈希 二次探测哈希 需要确保质数大小的工作; 容易修复 -> 前 20 个素数的数组? 麻省理工学院许可证 (MIT) 版权所有 (c) [年份] [全名] 特此授予任何人免费获得本软件副本和...
You will increase the range of problems you can solve when you learn how to manipulate versatile and popular data structures such as binary trees and hash tables. This book assumes you have a ...
哈希表 哈希表的不同实现(从头开始的代码)
8.Hashing, Hash Tables, and Scatter Tables 9.Trees 10.Search Trees 11.Heaps and Priority Queues 12.Sets, Multisets, and Partitions 13.Garbage Collection and the Other Kind of Heap 14.Algorithmic ...
Hash tables are the fundamental data structure for analytical database workloads, such as aggregation, joining, set filtering and records deduplication. The performance aspects of hash tables differ ...