Functional incorrectness: criterion used for item identity is hash value instead of Eq-identity #24
Comments
I agree this is surprising. Can you provide an example showing this behavior? The hash table implementation used in PriorityQueue is provided by the IndexMap crate, so probably a fix must be implemented at that level |
I think the issue is in your |
I don't think I agree with this.
If 3 returns true, then it found what we were looking for, otherwise, the collision resolution strategy has to be used. |
The tests in that repo seem to pass |
I did a quick inspection in the IndexMap code and it seems to confirm what I said above. Also the implementation looks correct. |
The purpose of a key in a hash table is to have a unique hash value, not to be equal in some other sense. Also, note that The tests pass, because I manually implemented
|
The |
Oh, now maybe I understand what you want to say. Would you like to compute |
I want the hash value of a type to be independent of its identity. |
That is not possible. As stated in the Standard library documentation, if you implement these yourself, it is important that the following property holds:
In other words, if two keys are equal, their hashes must be equal. |
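To make the property concrete, here is a minimal sketch of a type whose `Hash` and `Eq` implementations agree. The `Task` type, its fields, and the `hash_of` helper are hypothetical names chosen for illustration; they do not come from PriorityQueue or IndexMap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical item type: identity is decided by `id` alone.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Task {
    id: u32,
    label: String, // deliberately not part of the identity
}

impl PartialEq for Task {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

impl Eq for Task {}

impl Hash for Task {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash only the field that `eq` compares, so equal tasks
        // always produce equal hashes.
        self.id.hash(state);
    }
}

/// Hypothetical helper: hash a value with a fresh `DefaultHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Two tasks with the same `id` but different labels compare equal and hash equally, which is what the property requires. Nothing here forces the converse: two unequal tasks are still allowed to share a hash value.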
From that property (which is an arbitrary restriction in the Rust standard library), it does not follow, however, that if the hash values of a pair of keys are equal, the keys themselves must be equal (i.e., the converse does not hold). This means you cannot rely on item types being Furthermore, it is a design choice in |
This assumption is never made in PriorityQueue. The check that
The property is not an arbitrary restriction. It follows from how hash tables work, in whatever language they are implemented.
The choice to use a HashMap as the underlying storage is driven by the fact that it is the only way (I am aware of) to implement the priority change function with a complexity
Again, this is not true. IndexMap ( |
An example will come, but it would only prove something about |
I am not defensive. I already published a new version of the Priority Queue to update the documentation and clarify the property that |
@garro95 You're right about The criterion used for item identity in However, suppose the hash value is equal for all items. Will Due to a design decision at the Rust standard library level, it isn't actually correct to implement |
Hmm, in fact if I were to hash more fields than used in |
Yeah, I know (joking LOL, but I was actually quite confident)
Yeah, that is what is called a HashMap. In PriorityQueue, the HashMap key is the PriorityQueue's item, while the value is its priority. Just as a refresher on how HashMaps work, they do the following operations on lookup:
If this last check returns false there must have been a hash collision. HashMaps implement collision resolution strategies. According to the strategy, the HashMap will check whether the other elements with a colliding hash match the provided key, until an empty position is reached, which means the key is not present. If you read this carefully, it will be evident why the property mentioned above must hold. The point is, in other words, there is no explicit comparison between the hashes, but the hash is used as an index in the map. The comparison is only performed on the key using the Now, if you had to deal with HashMaps that did not implement collision resolution strategies and only compared hashes, I am sorry, but those were incorrect implementations of a HashMap.
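The lookup behaviour described above can be checked with a small sketch. It uses the standard library's `HashMap` as a stand-in, which is an assumption: IndexMap and PriorityQueue themselves are not exercised here. The `Colliding` type and the `build_map` function are hypothetical names for illustration.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical key type whose values all produce the same hash.
#[derive(Debug, PartialEq, Eq)]
struct Colliding(u32);

impl Hash for Colliding {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Deliberately degenerate: every value feeds the same byte
        // to the hasher, so every key collides with every other key.
        0u8.hash(state);
    }
}

/// Builds a map in which all keys collide.
fn build_map() -> HashMap<Colliding, &'static str> {
    let mut map = HashMap::new();
    map.insert(Colliding(1), "one");
    map.insert(Colliding(2), "two");
    // Equal (by `Eq`) to an existing key: replaces the stored value
    // instead of adding a new entry.
    map.insert(Colliding(1), "uno");
    map
}
```

Even though every key has the same hash, the map keeps `Colliding(1)` and `Colliding(2)` apart, because the final decision is made by `Eq`. A constant hash is still valid with respect to the property (equal keys trivially hash equally); it only degrades lookups toward linear time.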
That would be a very bad
Again, I don't want to defend anything or anyone, but that's the case with every HashMap implementation, regardless of the language. As I tried to show you above, it's a matter of math.
No, it's the other way around: the equivalence check has to cover at least the fields on which you compute the hash value, so that the property
Something worse can happen in that case: you might be unable to retrieve an item despite having something |
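A minimal sketch of the failure mode, under the assumption that the hash covers a field that `eq` ignores. The `Job` type and the `hash_value` helper are hypothetical; `DefaultHasher::new()` is used only to make the hash values observable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical item type with inconsistent `Hash` and `Eq`.
#[derive(Debug, Clone)]
struct Job {
    id: u32,
    label: String,
}

impl PartialEq for Job {
    fn eq(&self, other: &Self) -> bool {
        // Identity is `id` only.
        self.id == other.id
    }
}

impl Eq for Job {}

impl Hash for Job {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Bug for illustration: `label` is hashed although `eq` ignores it,
        // so two equal jobs can end up with different hashes.
        self.id.hash(state);
        self.label.hash(state);
    }
}

/// Hypothetical helper: hash a value with a fresh `DefaultHasher`.
fn hash_value<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Here two jobs with the same `id` but different labels are equal according to `Eq` yet hash differently, so a hash table may start its search in the wrong place and never reach the stored key. The resulting behaviour of a map holding such keys is unspecified, which is why the sketch stops at comparing the hashes.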
A very surprising defect. PriorityQueue ignores the Eq-implementation of the item type. Whether or not an item is already in the queue should not be determined by an implementation detail leaked from the underlying data structure, namely its hash value. Item identity should be determined by its Eq-identity.
This defect also decreases performance in cases where comparing identities costs less than comparing hash values, which I presume is always.