Gifabol - Caching for airgapped solutions #10736

Open
mattkrick opened this issue Jan 24, 2025 · 0 comments
Comments

@mattkrick
Member

For all deploys, it'd be nice to have the images hosted on our own platform instead of on Tenor.
For airgapped deployments, we need a solution that still lets folks search.

So, for prod:

  • When a query of '' (the empty string) comes in and we fetch the featured GIFs, we need to save those GIFs to S3 and also write them to a table.
  • We can write them to their own bucket subdir: instead of store or build, it'll be gifabol.
  • In PG we need 4 tables: GifabolGif, GifabolTag, GifabolGifTag, and GifabolQueryCache. GifabolGif has id, description, urlNano, urlTiny, and urlOriginal. GifabolTag has id and tag (TEXT UNIQUE). GifabolGifTag is a cross table with a compound PK.
  • It may also be worth caching search results directly, without going through tags. For example, if someone searches for "food", we'd have a table with query, startCursor, endCursor, result, and cachedAt, where result is a TEXT[] of GifabolGif IDs. This gets tricky with pagination because Tenor doesn't give us cursors, just a next string for the request, which is the endCursor for that batch and the startCursor for the next one. Alternatively, we could denormalize it to query, gifId, rank, endCursor, which makes reads easy. To write, we need to know how to compute rank: it's the order of the results as they come back from Tenor, e.g. 1-20 if there was no start cursor. We'd make the endCursor the after value for the following page. That way, when adding new values for a particular query, we'd run `select * from GifabolQueryCache where endCursor = $after order by rank desc limit 1`, where $after is the value that came in via GraphQL (after is the start cursor, next is the end cursor), and number the new rows starting right after that rank. We can even index on endCursor where it is not null and only put the cursor on the row with the biggest rank. When a fresh first-page query comes in and overwrites the cache, it won't push the existing items down; it'll just overwrite the first page. That way we can still fetch the first n items in 1 query. When the 2nd page comes in, it goes right after the first. If a 3rd page never comes in, the ranks still hold. There may be dupes, but that doesn't matter much if they're in the later pages.
  • If the query cache has nothing for the query, we'll need to search by tag: first by exact match, then by prefix (`foo%` => food).
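
A minimal sketch of the denormalized query-cache write path described above, in TypeScript, with an in-memory array standing in for the GifabolQueryCache table. The QueryCache class, its writePage/firstN methods, and the assumption that (query, rank) is the key a fresh page overwrites on are all hypothetical; in Postgres the lookup would be the endCursor select above and the write an upsert on that key.

```typescript
// Hypothetical in-memory model of a GifabolQueryCache row
interface QueryCacheRow {
  query: string
  gifId: string
  rank: number
  endCursor: string | null // only set on the highest-ranked row of a page
}

class QueryCache {
  private rows: QueryCacheRow[] = []

  // Mirrors: select * from GifabolQueryCache where endCursor = $after order by rank desc limit 1
  private findPageEnd(query: string, after: string): QueryCacheRow | undefined {
    return this.rows
      .filter((r) => r.query === query && r.endCursor === after)
      .sort((a, b) => b.rank - a.rank)[0]
  }

  // Writes one page of Tenor results.
  // `after` is the start cursor from GraphQL (null/empty for page 1); `next` is Tenor's end cursor for this batch.
  writePage(query: string, after: string | null, gifIds: string[], next: string | null): boolean {
    let startRank = 1
    if (after) {
      const prev = this.findPageEnd(query, after)
      if (!prev) return false // previous page isn't cached, so we can't compute ranks
      startRank = prev.rank + 1
    }
    gifIds.forEach((gifId, i) => {
      const rank = startRank + i
      const isLast = i === gifIds.length - 1
      const row: QueryCacheRow = {query, gifId, rank, endCursor: isLast ? next : null}
      // Assumed upsert on (query, rank): a fresh page 1 overwrites ranks 1..n in place
      // instead of pushing the later pages down
      const existing = this.rows.findIndex((r) => r.query === query && r.rank === rank)
      if (existing >= 0) this.rows[existing] = row
      else this.rows.push(row)
    })
    return true
  }

  // First n cached results for a query in Tenor order, no cursor needed
  firstN(query: string, n: number): string[] {
    return this.rows
      .filter((r) => r.query === query)
      .sort((a, b) => a.rank - b.rank)
      .slice(0, n)
      .map((r) => r.gifId)
  }
}
```

Writing page 1 of "food" and then page 2 (with after set to page 1's next) yields ranks 1..n in Tenor order; a later fresh page 1 overwrites only the first ranks and leaves page 2's rows where they were, which is the "won't push the items down" behavior.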

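The tag fallback could look roughly like this: a hypothetical searchByTag helper over in-memory GifabolTag and GifabolGifTag rows that tries an exact match and only falls back to a prefix match when nothing matches exactly. Lowercasing the query assumes tags are stored lowercase, which the issue doesn't specify.

```typescript
// Hypothetical in-memory stand-ins for GifabolTag and GifabolGifTag rows
interface GifabolTag {
  id: number
  tag: string
}
interface GifabolGifTag {
  gifId: string
  tagId: number
}

// Exact tag match first; if nothing matches, fall back to a prefix match (like tag LIKE 'foo%')
function searchByTag(tags: GifabolTag[], gifTags: GifabolGifTag[], query: string): string[] {
  const q = query.trim().toLowerCase() // assumes tags are stored lowercase
  if (!q) return []
  let tagIds = tags.filter((t) => t.tag === q).map((t) => t.id)
  if (tagIds.length === 0) {
    tagIds = tags.filter((t) => t.tag.startsWith(q)).map((t) => t.id)
  }
  const tagIdSet = new Set(tagIds)
  const gifIds = gifTags.filter((gt) => tagIdSet.has(gt.tagId)).map((gt) => gt.gifId)
  // Dedupe gifs that carry more than one matching tag, keeping first-seen order
  return Array.from(new Set(gifIds))
}
```

In Postgres the prefix step would be something like tag LIKE $1 || '%', with any % or _ in the user's input escaped; if the database doesn't use the C locale, a btree index with the text_pattern_ops operator class is what lets such prefix LIKE queries use an index.
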
For all deploys, if we don't want URLs pointing to Tenor:

  • We can't write to S3 faster than we can send a URL to the client, so the client is going to get a Tenor URL.
  • When they pick a GIF, we can upload it to our own S3 like we do for embedded URLs. embedUserAsset checks the size, verifies that it's a picture, and stores it under the User subdir in S3. Ideally, we would store it in the gifabol subdir instead; by the time they make a selection, it might already be there. If there's a deterministic way to go from the Tenor URL to our S3 URL, we can use that without an extra server call: check whether the URL starts with CDN_BASE_URL. If it does, use it as-is. If not, convert it using the ID of the GIF and the requested size.
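
A sketch of that client-side check, assuming a hypothetical key layout of gifabol/<gifId>/<size>.gif under CDN_BASE_URL and hypothetical size names matching the urlNano/urlTiny/urlOriginal columns. The real layout would have to match whatever the server writes when it caches featured or selected GIFs.

```typescript
// Hypothetical size names, mirroring the urlNano/urlTiny/urlOriginal columns
type GifSize = 'nano' | 'tiny' | 'original'

// Hypothetical S3 key layout under the gifabol subdir; must match what the server writes
const gifabolKey = (gifId: string, size: GifSize): string => `gifabol/${gifId}/${size}.gif`

function toHostedGifUrl(url: string, gifId: string, size: GifSize, cdnBaseUrl: string): string {
  // Already on our CDN: use it as-is
  if (url.startsWith(cdnBaseUrl)) return url
  // Otherwise derive our URL from the gif ID and requested size, with no server call
  const base = cdnBaseUrl.endsWith('/') ? cdnBaseUrl : `${cdnBaseUrl}/`
  return `${base}${gifabolKey(gifId, size)}`
}
```

The derived URL only resolves once the object has actually been uploaded, so if a selection can arrive before the upload finishes, the client would still need to fall back to the Tenor URL or retry.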