How smart is readyset in deciding when to recalculate the cache? #10
Say I have a query: `SELECT * FROM table1 ORDER BY score LIMIT 10`. If I update the score of a row in table1, it may or may not change the result of that query. Does readyset recalculate the cache for the query every time table1 is changed? Or is it smart enough to figure out that the cache only needs to be recalculated if the top 10 by score actually changes? For example, if an update to table1 only moves the 14th-ranked record up to the 12th position, does readyset recalculate the query cache?
Replies: 2 comments 1 reply
-
Nope! The updated row gets sent to the TopK node, which sees that the new value falls outside the top K and discards it without doing an expensive TopK recalculation. In general, we only have to recalculate the TopK state if an element that was previously in the top K is deleted. @dfwilbanks395 can go into greater depth if need be :)
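To make the behavior described above concrete, here is a toy sketch in Python. It is not ReadySet's actual implementation: the `TopKNode` class, the `upstream` callback, and the `apply_update` helper are all hypothetical names made up for illustration, and the sketch assumes an update reaches the node as a delete of the old row followed by an insert of the new one.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[int, float]  # (id, score); a made-up row shape for this sketch


class TopKNode:
    """Toy operator holding the k lowest-score rows (ORDER BY score LIMIT k)."""

    def __init__(self, k: int, upstream: Callable[[], Iterable[Row]]):
        self.k = k
        self.upstream = upstream  # hypothetical: re-reads every row from the parent
        self.rows: List[Row] = []
        self.recomputes = 0       # counts the expensive full recalculations
        self._recompute()

    def _recompute(self) -> None:
        # Expensive path: scan all upstream rows and keep the k smallest scores.
        self.recomputes += 1
        self.rows = heapq.nsmallest(self.k, self.upstream(), key=lambda r: r[1])

    def on_insert(self, row: Row) -> bool:
        # A row that cannot beat the current k-th row is discarded cheaply.
        if len(self.rows) == self.k and row[1] >= self.rows[-1][1]:
            return False
        self.rows.append(row)
        self.rows.sort(key=lambda r: r[1])
        del self.rows[self.k:]    # evict whatever fell out of the top k
        return True

    def on_delete(self, row: Row) -> bool:
        # Deleting a row outside the top k changes nothing in this node.
        if row not in self.rows:
            return False
        # A top-k row disappeared, so refill from upstream.
        self._recompute()
        return True


def apply_update(table: Dict[int, float], node: TopKNode,
                 row_id: int, new_score: float) -> None:
    """Model an UPDATE as delete-old followed by insert-new."""
    old = (row_id, table.pop(row_id))
    node.on_delete(old)
    table[row_id] = new_score
    node.on_insert((row_id, new_score))


# 20 rows with scores 1.0 .. 20.0, and a cached top 10.
table = {i: float(i) for i in range(1, 21)}
node = TopKNode(10, lambda: table.items())

# Move the 14th-ranked row up to 12th place: both halves of the update
# fall outside the top 10, so the node ignores them.
apply_update(table, node, 14, 11.5)
```

In this sketch, `node.recomputes` only goes up when a row that is currently held in `node.rows` gets deleted, which mirrors the "only recalculate if an element previously in the top K is deleted" rule from the reply.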
-
That's great! How is the TopK node implemented? Do you use a probabilistic data structure such as a count-min sketch? If you do, how do you deal with overcounting and the other errors inherent in probabilistic data structures? If you don't, wouldn't that make the TopK node as big as table1 itself? That would kind of negate the benefit of having a cache.