Let FieldDef lazy-calculate and cache hash code #892

zhaih · 2023-04-05T17:02:29Z

Sometimes _dataSchema could be very complex and makes computing hash code for this class extremely slow. Since all data fields participated in hash code calculation is final, we should able to calculate it at construction time and avoid paying the cost repeatedly.

zackthehuman · 2023-04-05T22:35:06Z

Thanks for the contribution, Patrick! Based on the usage of FieldDef I think we should understand the implications and tradeoffs with how we compute the hashcode.

Here are a few things to consider:

FieldDef is not final, and can be extended. If we precompute the hashcode by calling hashCode() in the constructor, then we'll do it for all subclasses too, even if that isn't the desired behavior.
Caching the hashcode will improve performance as long as the same instance is reused -- do we know if that is the usage pattern? If we're creating new FieldDef or derived class instances frequently, then this caching and eager computation may actually negatively impact performance.
There are classes like Parameter which extend FieldDef, and the above points may apply here.

zhaih · 2023-04-05T23:02:56Z

Hi Zack, thank you for the comment:

For child classes, I think usually people either use super.hashCode() as part of the calculation or just do it from scratch, I don't think either way will be a true blocker
For our use case (and I believe for maybe most use cases using RestLi), the ActionRequestBuilder is the only place we're using it, and the key is get from a ResourceSpecImpl class which stores those FieldsDef as long as the service is running. I can't say that's for every use cases, but I do think the use case should be fairly normal.

Regarding the potential cost to the constructor, it can be solved by lazily initialize the _hashCode field, but maybe need some synchronization to avoid multiple concurrent write (but maybe that's acceptable since the hash code should not change?)

Let me know what you think, thanks!

zackthehuman · 2023-04-06T22:35:15Z

Let me know what you think, thanks!

I chatted with some folks about this internally and our recommendation is to compute the value lazily so that it is computed the first time hashCode() is called. Your point about synchronization is valid, but I'm convinced that it's not a huge problem since it should compute to the same value (as you mentioned).

How does this sound?

zhaih · 2023-04-10T05:34:05Z

Sure, I just changed it to compute lazily

zackthehuman

Change looks good to me. We should look out for any performance regressions due to auto-boxing/unboxing from Integer to int. If that happens, we can revisit this.

zackthehuman · 2023-04-10T21:44:25Z

data/src/main/java/com/linkedin/data/template/FieldDef.java

+      if (_hashCode == null) {
+          // If this method is called by multiple thread, there might be multiple concurrent write
+          // here, but since the hashCode should be the same it is tolerable
+          _hashCode = computeHashCode();
+      }
+      return _hashCode;


Leaving this comment here for my future self and/or other maintainers:

We compute the hash code of DataMap lazily using synchronize(this) and a form of double-checked locking:

public int dataComplexHashCode() { if (_dataComplexHashCode != 0) { return _dataComplexHashCode; } synchronized (this) { if (_dataComplexHashCode == 0) { _dataComplexHashCode = DataComplexHashCode.nextHashCode(); } } return _dataComplexHashCode; }

This approach avoids auto boxing/unboxing between Integer and int but makes the assumption that the computed hash code will never be 0. Doing something like this is also an option we can take in the future if we want to explore this more.

A similar but more general idea maybe just use another boolean to track whether the hashcode is computed. And better maybe using Atomics instead of synchronized block, like

public int hashCode() { if (_computed.get() == false) { int hashCode = computeHashCode(); if (_computed.get() == false) { _hashCode.compareAndSwap(INIT_VALUE, hashCode); _computed.set(true); } } return _hashCode.get(); }

FWIW here there's little guarantee the write will ever be visible to other threads. Have you considered making _hashCode at least volatile:

private volatile Integer _hashCode;

It preserves the original desired behavior of not wanting a full lock but at least makes it memory-safe.

Alternatively, why not just compute it at construction and make it final? Presumably instances of FieldDef aren't dynamically generated, they come from parsed .pdl and .pdsc schemas right? You could just eat the cost exactly once and forget about it, especially since it will happen very early on and likely completely out of the query serving codepath

Hi @PapaCharlie , thanks for chiming in:
For the flushing to memory problem: it's valid but since we'll not set it based on it's previous value, essentially all threads will set it to the same value if they cannot see it. So eventually that's not a safety issue but rather a performance issue (we compute it # of thread times rather than once). I like to compute it in ctor better (and I actually first implemented it in this way) but @zackthehuman is worrying about impacting the performance of the ctor so we changed it to lazy later on.

CHANGELOG.md

zackthehuman

Thanks for the long wait! Let's goooooo!

Sometimes _dataSchema could be very complex and makes computing hash code for this class extremely slow. Since all data fields participated in hash code calculation is final, we should able to calculate it at construction time and avoid paying the cost repeatedly. * Update FieldDef lazily calculate and cache hash code --------- Co-authored-by: Patrick Zhai <[email protected]>

zackthehuman added the performance For identifying performance problems or making improvements label Apr 10, 2023

zackthehuman approved these changes Apr 10, 2023

View reviewed changes

zackthehuman reviewed Apr 10, 2023

View reviewed changes

zhaih changed the title ~~Let FieldDef pre-calculate and cache hash code~~ Let FieldDef lazy-calculate and cache hash code Apr 12, 2023

zackthehuman requested changes Apr 13, 2023

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

zackthehuman approved these changes Apr 15, 2023

View reviewed changes

zhaih force-pushed the master branch from 910dc1f to 8bbabcd Compare September 13, 2023 23:18

Patrick Zhai added 3 commits November 2, 2023 15:34

Let FieldDef pre-calculate and cache hash code

ea85b32

Change to lazily compute

cba8487

update changelog and gradle.properties

784556a

zhaih force-pushed the master branch from ff78234 to 784556a Compare November 2, 2023 22:41

zackthehuman approved these changes Nov 2, 2023

View reviewed changes

zackthehuman merged commit 64faf7e into linkedin:master Nov 3, 2023
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Let FieldDef lazy-calculate and cache hash code #892

Let FieldDef lazy-calculate and cache hash code #892

zhaih commented Apr 5, 2023

zackthehuman commented Apr 5, 2023

zhaih commented Apr 5, 2023

zackthehuman commented Apr 6, 2023

zhaih commented Apr 10, 2023

zackthehuman left a comment

zackthehuman Apr 10, 2023

zhaih Apr 10, 2023

PapaCharlie Nov 13, 2023

zhaih Nov 13, 2023

zackthehuman left a comment

Let FieldDef lazy-calculate and cache hash code #892

Let FieldDef lazy-calculate and cache hash code #892

Conversation

zhaih commented Apr 5, 2023

zackthehuman commented Apr 5, 2023

zhaih commented Apr 5, 2023

zackthehuman commented Apr 6, 2023

zhaih commented Apr 10, 2023

zackthehuman left a comment

Choose a reason for hiding this comment

zackthehuman Apr 10, 2023

Choose a reason for hiding this comment

zhaih Apr 10, 2023

Choose a reason for hiding this comment

PapaCharlie Nov 13, 2023

Choose a reason for hiding this comment

zhaih Nov 13, 2023

Choose a reason for hiding this comment

zackthehuman left a comment

Choose a reason for hiding this comment