Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hashCode() implementation for Tuple2 #2803

Merged
merged 1 commit into from
Aug 28, 2024
Merged

Conversation

pivovarit
Copy link
Member

@pivovarit pivovarit commented Aug 22, 2024

The current implementation of the hashCode() method for Tuple2 has a significant flaw that can easily lead to hash collisions, particularly in data structures like LinkedHashMap.

LinkedHashMap<String, Integer> m1 = LinkedHashMap.of("a", 1, "b", 2);
LinkedHashMap<String, Integer> m2 = LinkedHashMap.of("a", 2, "b", 1);
System.out.println(m1.hashCode() == m2.hashCode()); // true

In this case, m1 and m2 should ideally produce different hash codes, but they don't. This is because the current implementation of Tuple#hashCode() simply sums the hash codes of all elements, which can result in identical hash codes for different tuples.

A potential solution is to apply the XOR (^) operator to the hash codes of all elements. XOR is order-sensitive, so it would resolve the issue in the example above, where the order of elements differs between tuples.

However, XOR has its limitations and is only a suitable solution for tuples with up to two elements. This is because XOR is a commutative and associative operation, meaning that the XOR of multiple elements can still result in rather bad collisions:

Tuple3<Integer, Integer, Integer> t1 = Tuple.of(1, 2, 3);
Tuple3<Integer, Integer, Integer> t2 = Tuple.of(1, 3, 2);
System.out.println(t1.hashCode() == t2.hashCode()); // true

The new implementation is not perfect as well:

HashMap<Integer, Integer> hm1 = HashMap.of(0, 1, 1, 2);
HashMap<Integer, Integer> hm2 = HashMap.of(0, 2, 1, 3);
System.out.println(hm1.hashCode() == hm2.hashCode()); // true

But it is imperfect in the same way as standard library:

java.util.HashMap<Integer, Integer> jhm1 = new java.util.HashMap<>();
jhm1.put(0, 1);
jhm1.put(1, 2);

java.util.HashMap<Integer, Integer> jhm2 = new java.util.HashMap<>();
jhm2.put(0, 2);
jhm2.put(1, 3);

System.out.println(jhm1.hashCode() == jhm2.hashCode()); // true

Related: #2733

@pivovarit pivovarit force-pushed the tuple-hashcode branch 3 times, most recently from 232d995 to eb1b388 Compare August 23, 2024 06:43
@pivovarit pivovarit changed the title Improve hashCode() implementation for Tuple* Improve hashCode() implementation for Tuple2 Aug 23, 2024
@pivovarit pivovarit force-pushed the tuple-hashcode branch 3 times, most recently from 1ea8059 to 66b7794 Compare August 23, 2024 07:56
@LionelMeli
Copy link

Hello @pivovarit

I don't think the issue is related to hashCode implementation of tuples.
When you try something like:
Tuple3<Integer, Integer, Integer> t1 = Tuple.of(1, 2, 3); Tuple3<Integer, Integer, Integer> t2 = Tuple.of(1, 3, 2); System.out.println(t1.hashCode() == t2.hashCode()); // false
it's ok (even with similar with Tuple2 type).

If you replace the hastCode implementation for the LinkedHashMap class from
return Collections.hashUnordered(this);
to
return Collections.hashOrdered(this); (at the end the same mechanism like java.util.Objects.hash(Object... values))
Apparently seems to solve the issue (however some test cases failed with this change like shouldEqualsIgnoreOrder())
Same change could be applied for vavr HashMap.

Hope this help you.
Regards.

@pivovarit
Copy link
Member Author

pivovarit commented Aug 24, 2024

I think there are two issues here:

  • the collision from the first post
  • inclusion of ordering into hashCode() for LinkedHashMap

In this change, I attempted the first one by adjusting the hashCode() implementation to align with Java's.

I'm not sure yet what to do with the second one.

Java's LinkedHashMap ignores the ordering:

public static void main(String[] args) {
    LinkedHashMap<String, Integer> m1 = new LinkedHashMap<>();
    m1.put("a", 1);
    m1.put("b", 2);
    LinkedHashMap<String, Integer> m2 = new LinkedHashMap<>();
    m2.put("b", 2);
    m2.put("a", 1);

    System.out.println("m1.hashCode() = " + m1.hashCode()); // 192
    System.out.println("m2.hashCode() = " + m2.hashCode()); // 192

    System.out.println("m1.equals(m2) = " + m1.equals(m2)); // true
}

@LionelMeli
Copy link

Thanks for your clarification.

@pivovarit pivovarit marked this pull request as ready for review August 28, 2024 15:40
@pivovarit pivovarit enabled auto-merge (squash) August 28, 2024 15:44
@pivovarit pivovarit merged commit b1ce608 into master Aug 28, 2024
4 checks passed
@pivovarit pivovarit deleted the tuple-hashcode branch August 28, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants