Improve hashCode() implementation for Tuple2 #2803

pivovarit · 2024-08-22T20:23:20Z

The current implementation of the hashCode() method for Tuple2 has a significant flaw that can easily lead to hash collisions, particularly in data structures like LinkedHashMap.

LinkedHashMap<String, Integer> m1 = LinkedHashMap.of("a", 1, "b", 2);
LinkedHashMap<String, Integer> m2 = LinkedHashMap.of("a", 2, "b", 1);
System.out.println(m1.hashCode() == m2.hashCode()); // true

In this case, m1 and m2 should ideally produce different hash codes, but they don't. This is because the current implementation of Tuple#hashCode() simply sums the hash codes of all elements, which can result in identical hash codes for different tuples.

A potential solution is to apply the XOR (^) operator to the hash codes of all elements. XOR is order-sensitive, so it would resolve the issue in the example above, where the order of elements differs between tuples.

However, XOR has its limitations and is only a suitable solution for tuples with up to two elements. This is because XOR is a commutative and associative operation, meaning that the XOR of multiple elements can still result in rather bad collisions:

Tuple3<Integer, Integer, Integer> t1 = Tuple.of(1, 2, 3);
Tuple3<Integer, Integer, Integer> t2 = Tuple.of(1, 3, 2);
System.out.println(t1.hashCode() == t2.hashCode()); // true

The new implementation is not perfect as well:

HashMap<Integer, Integer> hm1 = HashMap.of(0, 1, 1, 2);
HashMap<Integer, Integer> hm2 = HashMap.of(0, 2, 1, 3);
System.out.println(hm1.hashCode() == hm2.hashCode()); // true

But it is imperfect in the same way as standard library:

java.util.HashMap<Integer, Integer> jhm1 = new java.util.HashMap<>();
jhm1.put(0, 1);
jhm1.put(1, 2);

java.util.HashMap<Integer, Integer> jhm2 = new java.util.HashMap<>();
jhm2.put(0, 2);
jhm2.put(1, 3);

System.out.println(jhm1.hashCode() == jhm2.hashCode()); // true

Related: #2733

LionelMeli · 2024-08-24T14:52:36Z

Hello @pivovarit

I don't think the issue is related to hashCode implementation of tuples.
When you try something like:
Tuple3<Integer, Integer, Integer> t1 = Tuple.of(1, 2, 3); Tuple3<Integer, Integer, Integer> t2 = Tuple.of(1, 3, 2); System.out.println(t1.hashCode() == t2.hashCode()); // false
it's ok (even with similar with Tuple2 type).

If you replace the hastCode implementation for the LinkedHashMap class from
return Collections.hashUnordered(this);
to
return Collections.hashOrdered(this); (at the end the same mechanism like java.util.Objects.hash(Object... values))
Apparently seems to solve the issue (however some test cases failed with this change like shouldEqualsIgnoreOrder())
Same change could be applied for vavr HashMap.

Hope this help you.
Regards.

pivovarit · 2024-08-24T16:37:39Z

I think there are two issues here:

the collision from the first post
inclusion of ordering into hashCode() for LinkedHashMap

In this change, I attempted the first one by adjusting the hashCode() implementation to align with Java's.

I'm not sure yet what to do with the second one.

Java's LinkedHashMap ignores the ordering:

public static void main(String[] args) {
    LinkedHashMap<String, Integer> m1 = new LinkedHashMap<>();
    m1.put("a", 1);
    m1.put("b", 2);
    LinkedHashMap<String, Integer> m2 = new LinkedHashMap<>();
    m2.put("b", 2);
    m2.put("a", 1);

    System.out.println("m1.hashCode() = " + m1.hashCode()); // 192
    System.out.println("m2.hashCode() = " + m2.hashCode()); // 192

    System.out.println("m1.equals(m2) = " + m1.equals(m2)); // true
}

LionelMeli · 2024-08-25T07:17:13Z

Thanks for your clarification.

pivovarit force-pushed the tuple-hashcode branch 3 times, most recently from 232d995 to eb1b388 Compare August 23, 2024 06:43

pivovarit changed the title Improve hashCode() implementation for Tuple* Improve hashCode() implementation for Tuple2 Aug 23, 2024

pivovarit mentioned this pull request Aug 23, 2024

poor implementation of hashCode of LinkedHashMap #2733

Open

pivovarit force-pushed the tuple-hashcode branch 3 times, most recently from 1ea8059 to 66b7794 Compare August 23, 2024 07:56

pivovarit marked this pull request as ready for review August 28, 2024 15:40

Improve hashCode() implementation for Tuple2

9d6e167

pivovarit force-pushed the tuple-hashcode branch from 66b7794 to 9d6e167 Compare August 28, 2024 15:41

pivovarit enabled auto-merge (squash) August 28, 2024 15:44

pivovarit merged commit b1ce608 into master Aug 28, 2024
4 checks passed

pivovarit deleted the tuple-hashcode branch August 28, 2024 15:46

pivovarit mentioned this pull request Aug 28, 2024

Ordered LinkedHashMap#hashCode() implementation #2806

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve hashCode() implementation for Tuple2 #2803

Improve hashCode() implementation for Tuple2 #2803

pivovarit commented Aug 22, 2024 •

edited

Loading

LionelMeli commented Aug 24, 2024

pivovarit commented Aug 24, 2024 •

edited

Loading

LionelMeli commented Aug 25, 2024

Improve hashCode() implementation for Tuple2 #2803

Improve hashCode() implementation for Tuple2 #2803

Conversation

pivovarit commented Aug 22, 2024 • edited Loading

LionelMeli commented Aug 24, 2024

pivovarit commented Aug 24, 2024 • edited Loading

LionelMeli commented Aug 25, 2024

pivovarit commented Aug 22, 2024 •

edited

Loading

pivovarit commented Aug 24, 2024 •

edited

Loading