Use consistent hash function when hashing cache field #1314
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Object.hash does not result in the same values across ruby invocations.
Noticed this when debugging a caching issue across app restarts. The first request after a restart would not use the cached values. On investigation I found the
String#hash
method does not behave as stated "Returns a hash based on the string’s length, content and encoding.". Following the link to Object#hash we see "The hash value for an object may not be identical across invocations or implementations of Ruby. If you need a stable identifier across Ruby invocations and implementations you will need to generate one with a custom method."This should result in a lot fewer cache misses across processes and result in much more efficient memory usage for the cache servers.
My one reservation with this is the MD5, or SHA1, hexdigest is about an order of magnitude slower than the current Object#hash.
We could consider the xxhash library. It's closer to the Object#hash efficiency (about 1/2 the speed for 64bit hashes). But it's one more dependency. This is of course just the default and users are free to switch, but I'd like to choose the best overall option since I don't think most users will bother to change it (since no one else has noticed the issue to begin with).
To speed up the most common condition where the
updated_at
field is used as the cache_key this uses theto_f
method to get the number of seconds since the Unix Epoch, including fractions. This is almost as fast as.hash
, and much faster thanto_r
(nanoseconds since Epoch).Any thoughts?
All Submissions:
New Feature Submissions:
Bug fixes and Changes to Core Features:
Test Plan:
Reviewer Checklist: