This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Design Doc: IPRO Metadata Cache Lookup New Design

Jeff Kaufman edited this page Jan 6, 2017 · 1 revision

IPRO Metadata Cache Lookup New Design

Bolian Yin, 2013-01-03

The problem

The current IPRO metadata (MD) cache key is composed from the request URL alone, with no other context information. This does not work when a single URL has multiple rewritten versions. The HTML path already supports multiple rewritten versions by encoding the content variance in the cache key. That variance can come from the user agent (WebP support, mobile/tablet/desktop, screen size, etc.) and from HTML context information (desired image size from CSS or tag attributes).

Proposed Change

First, Examples

The new test in ajax_rewrite_context_test.cc serves as a concrete example.

The following MD cache key examples are referenced in the discussion below.

Resource URL

http://www.example.com/test.xyz (Assume this is actually a JPEG image)

KeyA: Contextless/IPRO MD key

rname/**aj**_<hash-of-rewrite-options>/http://www.example.com/test.xyz@@

This is the key used for the first (and currently the only) MD cache lookup in the IPRO path. Here "aj" is the id for the IPRO path (AjaxRewriteContext).

KeyB: MD key for user-agent-1 (without WebP support)

rname/**ic**_<hash-of-rewrite-options>/http://www.example.com/test.xyz@@

This is the MD key generated in the HTML path for user-agent-1. It points to the JPEG version of the rewritten image. Here "ic" is the image_rewrite_filter id.

KeyBB: HTTP cache key pointed to by KeyB

http://www.example.com/test.xyz.pagespeed.a.ic.<hash>.**jpg**

KeyC: MD key for user-agent-2 (with WebP support)

rname/**ic**_<hash-of-rewrite-options>/http://www.example.com/test.xyz@@**w**

This is the MD key generated in the HTML path for user-agent-2. It points to the WebP version of the rewritten image. Here "ic" is the image_rewrite_filter id.

KeyCC: HTTP cache key pointed to by KeyC

http://www.example.com/test.xyz.pagespeed.a.ic.<hash>.**webp**

Currently, the IPRO path ignores all context-related information. In the warm cache case, it always does a single MD cache lookup with KeyA, and the returned cache entry contains KeyBB regardless of whether the request comes from user-agent-1 or user-agent-2.
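The shape of the three MD keys above can be sketched with one small helper. This is an illustrative sketch only: `metadata_key` and its parameters are hypothetical names, the options hash is left as the same placeholder used in the examples, and the emphasis markers around the filter id and suffix are omitted.

```python
def metadata_key(filter_id: str, options_hash: str, url: str, suffix: str = "") -> str:
    """Compose an MD cache key in the shape shown in the examples.

    Hypothetical helper: filter_id is "aj" for the IPRO path or "ic" for the
    image rewrite filter; suffix carries the context encoding (e.g. "w").
    """
    return f"rname/{filter_id}_{options_hash}/{url}@@{suffix}"


URL = "http://www.example.com/test.xyz"
OPTIONS_HASH = "<hash-of-rewrite-options>"  # placeholder, as in the examples

key_a = metadata_key("aj", OPTIONS_HASH, URL)       # KeyA: contextless IPRO key
key_b = metadata_key("ic", OPTIONS_HASH, URL)       # KeyB: no WebP support
key_c = metadata_key("ic", OPTIONS_HASH, URL, "w")  # KeyC: WebP support
```

KeyB and KeyC differ only in the suffix after "@@", while KeyA differs from both in the filter id, which is why a lookup with KeyA alone cannot tell the two user agents apart.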

Walkthrough

This walkthrough shows the proposed change at work on the example keys (and is how the test in the above CL works).

First Fetch with user-agent-1

The MD cache lookup with KeyA misses. The resource is fetched and rewritten. When the rewrite finishes, the rewritten content is PUT into the HTTP cache under KeyBB, and an entry is PUT into the MD cache under KeyA. That entry contains KeyBB and the nested MD key KeyB.

Second Fetch with user-agent-2

The MD cache lookup with KeyA hits. The cache entry yields the HTTP cache key KeyBB stored during the first fetch, and KeyBB shows that the resource is actually an image. The server computes the "correct" context key for user-agent-2, which is KeyC, and compares it with the nested key in the cache entry, KeyB. They do not match, so a second MD cache lookup is issued with the "correct" key, KeyC, and it misses. The resource is rewritten again, this time to the WebP version. When the rewrite finishes, the content is PUT into the HTTP cache under KeyCC, and the MD cache entry under KeyA is replaced so that it contains KeyCC and the nested MD key KeyC.

Fetch again with user-agent-2

The MD cache lookup with KeyA hits, and the nested key matches the computed "correct" key (both are KeyC). The HTTP cache lookup with KeyCC (contained in the MD cache entry) hits, and the rewritten content is served from cache.

Fetch again with user-agent-1

The MD cache lookup with KeyA hits. The nested key, KeyC, does not match the computed "correct" key, KeyB. The second MD cache lookup, with KeyB, hits, and its entry contains KeyBB. The HTTP cache lookup with KeyBB hits, and the rewritten content is served from cache.
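The four fetches can be replayed against a toy in-memory model. This is a minimal sketch, not the server's implementation: `Server`, `Entry`, `context_key_for`, and `http_key_for` are hypothetical names, and "opts" and "HASH" are placeholder strings. It makes two assumptions that the walkthrough implies but does not state: each rewrite also stores its own MD entry under KeyB or KeyC (otherwise the second lookup in the last fetch could not hit), and the nested key under KeyA is updated whenever an overriding key is used.

```python
from dataclasses import dataclass

URL = "http://www.example.com/test.xyz"
KEY_A = f"rname/aj_opts/{URL}@@"  # "opts" stands in for the options hash


@dataclass
class Entry:
    http_key: str         # HTTP cache key of the rewritten content
    nested_key: str = ""  # only meaningful on the entry stored under KeyA


def context_key_for(supports_webp: bool) -> str:
    """KeyB (no WebP) or KeyC (WebP), following the example key shapes."""
    return f"rname/ic_opts/{URL}@@" + ("w" if supports_webp else "")


def http_key_for(supports_webp: bool) -> str:
    """KeyBB (.jpg) or KeyCC (.webp); "HASH" is a placeholder."""
    return f"{URL}.pagespeed.a.ic.HASH." + ("webp" if supports_webp else "jpg")


class Server:
    def __init__(self) -> None:
        self.md: dict[str, Entry] = {}  # metadata cache
        self.http: dict[str, str] = {}  # HTTP cache
        self.md_lookups = 0
        self.rewrites = 0

    def _md_get(self, key: str):
        self.md_lookups += 1
        return self.md.get(key)

    def _rewrite(self, supports_webp: bool) -> str:
        self.rewrites += 1
        http_key = http_key_for(supports_webp)
        nested = context_key_for(supports_webp)
        self.http[http_key] = "webp bytes" if supports_webp else "jpeg bytes"
        self.md[nested] = Entry(http_key)  # assumption: nested entry is stored too
        self.md[KEY_A] = Entry(http_key, nested_key=nested)
        return self.http[http_key]

    def fetch(self, supports_webp: bool) -> str:
        entry = self._md_get(KEY_A)
        if entry is None:  # first lookup misses: unchanged behavior
            return self._rewrite(supports_webp)
        correct = context_key_for(supports_webp)
        if entry.nested_key != correct:  # overriding key needed
            entry = self._md_get(correct)
            if entry is None:
                return self._rewrite(supports_webp)
            # assumption: record the overriding key as the new nested key
            self.md[KEY_A] = Entry(entry.http_key, nested_key=correct)
        return self.http[entry.http_key]


server = Server()
results = [server.fetch(webp) for webp in (False, True, True, False)]
```

In this model the four fetches cost 1, 2, 1, and 2 MD lookups respectively, and only the first two trigger a rewrite.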

Description of the change

The change makes the IPRO MD cache use the same context encoding as the HTML path (except for the HTML context encoding). The key point is to store a nested partition key (KeyB or KeyC) in the MD cache result for IPRO. The nested key is the key that the HTML path would compose with the relevant suffix encoding (without the HTML context suffix encoding, of course), and it identifies the actual rewritten content that the IPRO MD cache key (KeyA) points to.

Once in the IPRO path, the server does the MD cache lookup with KeyA as usual. On a cache hit, an additional verification step decides whether an overriding key is needed. If it is not, the rewritten content is served from the cached result. If it is, an additional MD cache lookup is issued with the overriding key, and the returned result is used for serving (or the resource is fetched and rewritten if the lookup with the overriding key misses). Nothing changes when the first MD cache lookup misses.

The verification step checks whether the nested key matches the "correct" key that the underlying rewrite would generate for the resource. The underlying rewrite type (image/js/css) can be determined from the extension of the url field in the first MD lookup result. That extension can be trusted because the url is generated by the server with the correct extension appended. If the nested key equals the "correct" key, no overriding key is needed and the first cached result can be used. We expect this to cover the majority of cases for most IPRO urls, unless requests for the different rewritten versions are spread more evenly. If the nested key differs from the "correct" key, the "correct" key is returned to override the nested key. At the end of the IPRO rewriting, the overriding key is saved in the MD cache as the new nested key.
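The verification step might look roughly like the sketch below. The names `rewrite_type`, `overriding_key`, and `correct_key_for` are hypothetical, the extension table is an illustrative guess rather than the server's actual list, and returning no override for an unrecognized extension is an assumption the text does not cover.

```python
import posixpath
from typing import Callable, Optional
from urllib.parse import urlparse

# Illustrative mapping only; the real set of extensions is not given in the text.
REWRITE_TYPE_BY_EXTENSION = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".gif": "image", ".webp": "image",
    ".js": "js",
    ".css": "css",
}


def rewrite_type(rewritten_url: str) -> Optional[str]:
    """Infer the underlying rewrite type from the server-generated extension."""
    extension = posixpath.splitext(urlparse(rewritten_url).path)[1].lower()
    return REWRITE_TYPE_BY_EXTENSION.get(extension)


def overriding_key(
    nested_key: str,
    rewritten_url: str,
    correct_key_for: Callable[[str], str],
) -> Optional[str]:
    """Return None if the first cached result is usable, else the "correct" key.

    correct_key_for is a hypothetical callback that composes the context key
    the underlying rewrite of the given type would use for this request.
    """
    kind = rewrite_type(rewritten_url)
    if kind is None:
        return None  # assumption: unknown types keep the first cached result
    correct = correct_key_for(kind)
    return None if correct == nested_key else correct
```

Passing the key composition in as a callback keeps the check itself independent of how each filter encodes its context.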

Cost of the solution

The second MD lookup is needed only when the requested version differs from the one recorded in the nested key, so its cost is amortized. For example, suppose there are two rewritten versions, x and y, of the same resource, and every 10 requests contain 9 x's and 1 y, as in xxxxxyxxxx. With a warm cache, each of the 10 requests does one lookup with KeyA, and two requests (the y and the x immediately after it) each need a second lookup. That is 12 MD cache lookups in total, or an average of 1.2 per request.
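The arithmetic can be checked with a short counting sketch. It assumes a warm cache and that the nested key is updated each time an overriding key is used; `count_md_lookups` is a hypothetical helper, with each character of the request string standing for the version a request needs.

```python
def count_md_lookups(requests: str, initial_nested: str) -> int:
    """Count MD cache lookups for a sequence of requested versions.

    Every request does one lookup with the contextless key; a request whose
    version differs from the recorded nested key does one more lookup and
    then becomes the new nested key.
    """
    nested = initial_nested
    lookups = 0
    for version in requests:
        lookups += 1           # first lookup, always
        if version != nested:  # nested key mismatch: second lookup
            lookups += 1
            nested = version
    return lookups


pattern = "xxxxxyxxxx"
total = count_md_lookups(pattern, initial_nested="x")
average = total / len(pattern)
```

An alternating pattern such as xyxyxyxyxy is the worst case under the same assumptions: after the first request, every request needs a second lookup.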

Alternatives Considered

Store nested cached result instead of just nested key

This approach avoids the second MD cache lookup, but it is much more complicated to implement and maintain. For example, it is hard to update a nested cached result, and it is hard to update the parent cached result once one or more of the nested results have changed. The (minor) gain does not justify the complexity.

Use property cache to do the first MD cache lookup

The benefit is improved performance, gained by effectively moving the first cache lookup earlier. This requires a much bigger change and some additional mechanism (for example, the property cache may not be readily available for use in the IPRO path). Also, the email discussion showed more disagreement about this approach.
