Refactor search sync to try to reduce memory usage and hash ids #662

alfredgrip · 2024-12-20T13:12:08Z

Title speaks for itself.

The idea is to divide the data into batches and send them one at a time. That way the garbage collector should be able to free each batch's objects once they go out of scope.

Current batch size is 1000, which I think should be fine.
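
A minimal sketch of the batching idea, not the actual code in sync.ts. `fetchPage` and `send` are hypothetical stand-ins for the database query and the Meilisearch call, which this thread does not show; only the batch size of 1000 comes from the description above.

```typescript
// Hypothetical callbacks: the real query and upload functions are not shown here.
type FetchPage<T> = (skip: number, take: number) => Promise<T[]>;
type SendBatch<T> = (batch: T[]) => Promise<void>;

const BATCH_SIZE = 1000; // the batch size mentioned in the PR description

async function syncInBatches<T>(
  fetchPage: FetchPage<T>,
  send: SendBatch<T>,
  batchSize: number = BATCH_SIZE,
): Promise<number> {
  let total = 0;
  for (let skip = 0; ; skip += batchSize) {
    // `batch` is scoped to this iteration, so once the loop moves on
    // nothing references the previous batch and it can be collected.
    const batch = await fetchPage(skip, batchSize);
    if (batch.length === 0) break;
    await send(batch);
    total += batch.length;
    if (batch.length < batchSize) break; // last, partial page
  }
  return total;
}
```

Only one batch is referenced at a time, so peak memory should scale with the batch size rather than with the size of the whole table.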

This also hashes ids, which should be a (temporary) fix for #661.
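
The commit messages in this PR say the ids are transformed "to make them meilisearch compatible", first with MD5 and later with base64 encoding. Below is a rough sketch of the base64 variant. The helper names are hypothetical, and the URL-safe alphabet is an assumption on my part: plain base64 output can contain `+`, `/` and `=`, while Meilisearch document ids may only contain alphanumeric characters, hyphens and underscores.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper names; the thread does not show the real ones.

// Meilisearch only accepts document ids made of a-z, A-Z, 0-9, "-" and "_".
const VALID_MEILI_ID = /^[A-Za-z0-9_-]+$/;

// Encode an arbitrary string id with the URL-safe base64 alphabet.
// Assumption: "base64url" (no "+", "/" or "=" padding) is what is needed here.
function toMeiliId(rawId: string): string {
  return Buffer.from(rawId, "utf8").toString("base64url");
}

// Unlike a hash, the encoding is reversible.
function fromMeiliId(meiliId: string): string {
  return Buffer.from(meiliId, "base64url").toString("utf8");
}
```

For example, `toMeiliId("hello")` gives `"aGVsbG8"`, and `fromMeiliId` turns it back into `"hello"`.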

danieladugyan · 2024-12-30T20:58:13Z

There's a lot of code in sync.ts and searchTypes.ts; to be honest, it's a bit hard for me to grasp it all 😅

alfredgrip · 2024-12-31T13:17:45Z

Yeah, searchTypes.ts is a lot...
Essentially, for every index in Meili there is a type for:

1. Which attributes are stored in Meili for that index
2. Which attributes a user can search on
3. Which attributes are returned

Of course, all attributes that can be searched on, or are returned, must be stored in Meilisearch. However, it's not as simple as taking the union of 2. and 3. to get 1., since some attributes (like startDatetime for events) are used purely for sorting and ranking purposes internally by Meili.
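
To make the three kinds of types concrete, here is a small sketch for an events index. It is not the code in searchTypes.ts: every attribute name except startDatetime is made up, and deriving 2. and 3. from 1. with `Pick` is just one way to express the "must be stored" constraint.

```typescript
// 1. Everything stored in Meilisearch for the (hypothetical) events index.
type EventStored = {
  id: string;
  title: string;
  description: string;
  startDatetime: number; // stored only so Meili can sort and rank on it
};

// 2. Attributes a user can search on.
type EventSearchable = Pick<EventStored, "title" | "description">;

// 3. Attributes returned in search results.
type EventReturned = Pick<EventStored, "id" | "title">;

// What is stored but appears in neither 2. nor 3.
type EventSortOnly = Exclude<
  keyof EventStored,
  keyof EventSearchable | keyof EventReturned
>;

// Compiles only because EventSortOnly is exactly "startDatetime".
const sortOnlyAttribute: EventSortOnly = "startDatetime";

// Attribute lists typed against 2. and 3., so a typo or a non-stored
// attribute is a compile error.
const searchableAttributes: Array<keyof EventSearchable> = ["title", "description"];
const displayedAttributes: Array<keyof EventReturned> = ["id", "title"];
```

Because `Pick` only accepts keys of `EventStored`, anything searchable or returned is stored by construction, while `startDatetime` shows why 1. is larger than the union of 2. and 3.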

Then there are objects like const memberMeilisearchConstants: MemberConstantsMeilisearch = {... which wrap everything related to an index in a single object. Here we can specify custom ranking and sorting rules for Meili, such as giving newer members a higher ranking, and tweak how much typo tolerance is allowed.

I know that the file is full of types, but they are there to prevent us developers from accidentally trying to, e.g., define custom ranking rules on an attribute that isn't even stored in Meilisearch.
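
As an illustration of that kind of safety net, here is a sketch of a constants object whose custom ranking rules can only name stored attributes. The type and field names (`MemberStored`, `classYear`, `IndexConstants`) are hypothetical and not the actual shape of MemberConstantsMeilisearch; the built-in rule names and the `attribute:asc` / `attribute:desc` format are Meilisearch's own.

```typescript
// Hypothetical stored attributes for a members index.
type MemberStored = {
  id: string;
  firstName: string;
  lastName: string;
  classYear: number;
};

// Meilisearch's built-in ranking rules.
type BuiltInRule = "words" | "typo" | "proximity" | "attribute" | "sort" | "exactness";

// A custom rule must name a stored attribute, e.g. "classYear:desc".
type CustomRule<T> = `${Extract<keyof T, string>}:${"asc" | "desc"}`;

// Hypothetical wrapper type for everything related to one index.
type IndexConstants<T> = {
  indexName: string;
  rankingRules: Array<BuiltInRule | CustomRule<T>>;
  sortableAttributes: Array<keyof T>;
  typoTolerance: { minWordSizeForTypos: { oneTypo: number; twoTypos: number } };
};

const memberConstants: IndexConstants<MemberStored> = {
  indexName: "members",
  rankingRules: [
    "words",
    "typo",
    "proximity",
    "attribute",
    "sort",
    "exactness",
    "classYear:desc", // rank newer members higher
    // "nickname:desc" would not compile: nickname is not a stored attribute
  ],
  sortableAttributes: ["classYear"],
  typoTolerance: { minWordSizeForTypos: { oneTypo: 4, twoTypos: 8 } },
};
```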

As for sync.ts, it basically just dumps the data and attributes defined in searchTypes.ts to Meilisearch, but does so in batches. When all the data is dumped, it tweaks the rankings based on the values and types also defined in searchTypes.ts.
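
The order of operations can be sketched roughly like this. `SearchIndex` is a stand-in interface written for this example, not the Meilisearch client itself, and `syncIndex` and `IndexSettings` are hypothetical names.

```typescript
// Stand-in for the settings that would be derived from searchTypes.ts.
type IndexSettings = {
  rankingRules: string[];
  sortableAttributes: string[];
};

// Stand-in for a search index client (assumption, not the real client type).
interface SearchIndex<T> {
  addDocuments(docs: T[]): Promise<unknown>;
  updateSettings(settings: IndexSettings): Promise<unknown>;
}

async function syncIndex<T>(
  index: SearchIndex<T>,
  batches: Iterable<T[]>,
  settings: IndexSettings,
): Promise<void> {
  // Phase 1: dump the data, one batch at a time.
  for (const batch of batches) {
    await index.addDocuments(batch);
  }
  // Phase 2: once everything is uploaded, apply ranking and sorting settings.
  await index.updateSettings(settings);
}
```

Passing the batches as an `Iterable` means a generator can produce them lazily, so the caller never has to hold the full data set in memory.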

danieladugyan · 2025-01-06T14:05:57Z

Thanks! I added that as a comment to the top of the file since it helped a lot.

github-actions bot assigned alfredgrip Dec 20, 2024

alfredgrip requested a review from Isak-Kallini December 20, 2024 13:12

alfredgrip changed the title ~~Refactor search sync to try to reduce memory usage~~ Refactor search sync to try to reduce memory usage and hash ids Dec 20, 2024

alfredgrip mentioned this pull request Dec 20, 2024

Duplicate data in search #661

Open

alfredgrip force-pushed the search-sync-fixes branch 2 times, most recently from 57d2ecb to 7371b4c Compare December 20, 2024 14:39

alfredgrip added 2 commits December 30, 2024 21:32

refactor to reduce memory usage

a0eb3f0

hash ids to make them meilisearch compatible

8b84adb

danieladugyan force-pushed the search-sync-fixes branch from 7371b4c to 8b84adb Compare December 30, 2024 20:33

Replace MD5 hashing with base64 encoding

58c41a4

danieladugyan requested review from danieladugyan and removed request for Isak-Kallini December 30, 2024 20:56

danieladugyan reviewed Dec 30, 2024

src/lib/search/searchHelpers.ts (outdated, resolved)

src/lib/search/sync.ts (outdated, resolved)

src/lib/search/searchHelpers.ts (resolved)

alfredgrip and others added 2 commits January 2, 2025 16:18

refactor to reduce duplication

9266eab

Add comment explaining searchTypes

30b6164

danieladugyan merged commit 20a0874 into main Jan 6, 2025
3 checks passed

danieladugyan deleted the search-sync-fixes branch January 6, 2025 14:07

alfredgrip mentioned this pull request Jan 14, 2025

Search not finding any results #652

Open
