Suppose we wish to search a linked list of length n, where each element contains a key k along with a hash value h(k). Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key?
首先计算出给定关键字的hash值. 对列表中的每个元素,先验证hash值对不对,再进行字符串的比较.
First, calculate the hash value for the specified keyword. For each item in the list, first check the hash value. If one string share the same value as the target string, compare the string content.
Suppose that a string of r characters is hashed into m slots by treating it as a radix-128 number and then using the division method. The number m is easily represented as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself?
One easy approach would adding the values of each character to get a sum and then take a modulo 128. Another way would be using n-degree equation with each character value acting as a co-efficient for the nth term. Example: S[0] * x^n + S[1] * x^(n-1)+ …. + S[n-1], for better result substitute x = 33 or 37 or 39. This is the famous Horner’s method for finding a hash value.
Consider a version of the division method in which h(k) = k mod m, where m = 2p - 1 and k is a character string interpreted in radix 2p. Show that if string x can be derived from string y by permuting its characters, then x and y hash to the same value. Give an example of an application in which this property would be undesirable in a hash function.
有一个很简单的数论知识.先举个例子
3 * 128^3 mod 127 = 3 * 128^2 * (128 mod 127) mod 127 = 3 * 128^2 mod 127 = 3 mod 127 = 3
无论怎么交换字符串的order,radix的影响都会消失. 因为2p^n mod 2p-1 === 1.
Consider a hash table of size m = 1000 and a corresponding hash function h(k) = ⌊m(k A mod 1)⌋ for . Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.
key | value |
---|---|
61 | 700 |
62 | 318 |
63 | 936 |
64 | 554 |
65 | 172 |
Define a family of hash functions from a finite set U to a finite set B to be ϵ-universal if for all pairs of distinct elements k and l in U,
where the probability is over the choice of the hash function h drawn at random from the family . Show that an ϵ-universal family of hash functions must have
Let
In other words, is the length of linked list of slot i in hash table.
Then we have
also
From above equations:
And finally:
Let U be the set of n-tuples of values drawn from Zp, and let B = Zp, where p is prime. Define the hash function hb : U → B for b in Zp on an input n-tuple [a0, a1, ..., an-1] from U as
and let H={hb:b∈Zp}. Argue that H is ((n−1)/p)-universal according to the definition of ϵ-universal in Exercise 11.3-5. (Hint: See Exercise 31.4-4.)
首先用归纳法证明以下结论: 对于取自集合 的 n 元组 , 满足
的 x (x ∈ Zp) 最多有 n 个。
- 当 n = 1 时, 假设有 x = v 与 x = w (不失一般性, v > w) 同时满足 , 则有 , 其中 a_1 ∈ Zp, (v - w) ∈ Zp, p 为素数, 显然不成立。因此, 当 n = 1 时, 结论成立。
- 当 n = k 时, 至多有 k 个 x (x ∈ Zp) 满足
下面证明当 n = k + 1 时, 结论成立。
不妨设
其中 v > w, 得到
由于(v - w) < p, 不影响取余结果, 上式可变形为
更进一步, 变为
上式说明, 当确定一个 v 可以确定一组取自 的序列 , 根据(0)式, (1)式的解最多有k个。 即: mod p 后, 和 v 具有相同余数的 x 有 k 个 也就是说:对一组确定的 , 具有相同余数的 x 最多有 (k + 1) 个, 因此余数为0的 x 也最多只有 (k + 1) 个
命题得证。
下面证明原始的问题:
由于取模运算, 可以等价的换为 中的元素, 根据上面证得的结果, (2)式最多有 (n - 1) 个 b 满足要求 即, H 中最多有 (n - 1) 个散列函数会导致两个不同输入散列结果相同, 因此
Follow @louis1992 on github to help finish this task.