Add support for capturing group in tag inclusion. #27

mgao6767 · 2024-09-10T10:40:17Z

Reworked PR. It allows a capturing group in the regular expression that determines item inclusion by tags.
If `tag_include_re` contains a named capture group `kerko`, the captured text is used as the tag instead of the full tag.

For example, suppose a reference in Zotero has the tags "-tag-Topic 1" and "-tag-Topic 2".

In config.toml, set tag_include_re to "-tag-(?P<kerko>.*)".
In the web interface, the facets will be "Topic 1" and "Topic 2" instead of "-tag-Topic 1" and "-tag-Topic 2".

If tag_include_re has no capture group named "kerko" (including when it has no capture groups at all), the full tag is used whenever it matches.

For example, in config.toml, set tag_include_re to "-tag-(?P<anotherName>.*)" or to "-tag-".
In the web interface, the facets will be "-tag-Topic 1" and "-tag-Topic 2".

This makes it possible to keep some tags in Zotero for private use while reserving others specifically for creating facets.
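
The rule described above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not the code in this PR: the helper name `facet_label` is hypothetical, and the use of `re.search` (rather than `re.match`) is an assumption.

```python
import re


def facet_label(tag, tag_include_re):
    """Return the label to show for `tag`, or None if the tag is excluded.

    Hypothetical helper: illustrates the rule only, not Kerko's actual code.
    """
    match = re.search(tag_include_re, tag)  # assumption: search, not match
    if match is None:
        return None  # tag does not satisfy the inclusion pattern
    # Use the capture only if the pattern defines a group named "kerko".
    if "kerko" in match.re.groupindex:
        captured = match.group("kerko")
        if captured is not None:
            return captured
    # No "kerko" group (or it did not participate): keep the full tag.
    return tag


print(facet_label("-tag-Topic 1", r"-tag-(?P<kerko>.*)"))        # Topic 1
print(facet_label("-tag-Topic 1", r"-tag-(?P<anotherName>.*)"))  # -tag-Topic 1
print(facet_label("private note", r"-tag-"))                     # None
```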

In the current implementation, every search opens and loads the Whoosh index and creates a new `Searcher` instance. This commit introduces a singleton searcher to avoid that, except when syncing attachments, which requires the new index. The downside is that after syncing, the Flask app needs to be restarted for the searcher to use the new index.
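
For readers unfamiliar with the pattern, here is a minimal sketch of a lazily created singleton. It is not the code from this commit: `LazySingleton` and its `reset()` method are hypothetical names, and the factory is a stand-in for whatever opens the index.

```python
import threading


class LazySingleton:
    """Create an expensive object once and reuse it (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds the object
        self._instance = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: only the first caller pays the cost.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance

    def reset(self):
        # Drop the cached object so the next get() rebuilds it.
        with self._lock:
            self._instance = None


# Stand-in factory; with Whoosh it might instead be something like
# `lambda: whoosh.index.open_dir(index_path).searcher()` (an assumption).
opened = []
searcher = LazySingleton(lambda: opened.append("open") or object())
first = searcher.get()
second = searcher.get()
print(first is second, len(opened))  # True 1
```

A hook such as `reset()` is one possible way to avoid the restart mentioned above, by calling it after a sync completes; this PR does not claim to do so.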

mgao6767 added 9 commits September 10, 2024 20:27

Add support for capturing group in tag inclusion.

d17f3d2

Cache config to improve performance

63a4d33

Optionally show processing time

208b200

Cache last_sync

a7bc9ff

In the current implementation, every search opens and loads the Whoosh index to retrieve the last sync time. This commit caches the last sync time. The downside is that after syncing, the Flask app needs to be restarted for the cached sync time to be refreshed.

Minor performance improvement

acfcb32

Revert "Minor performance improvement"

7c6c492

This reverts commit acfcb32.

Add caching

f58c9f8

Fix error when cache directory doesn't exist

713cc2c

mgao6767 force-pushed the main branch from d12fb75 to 713cc2c Compare September 24, 2024 11:21

mgao6767 added 8 commits September 24, 2024 21:39

Disable CSRF

05b4610

Allow using tags and regex to define flat facets

07446cd

Update selector labels

1135baf

Update syncing

c733ac4

When the server has little memory and there are many documents, syncing may run out of memory (OOM). This change may reduce that risk.

Updating caching

7b4e58b

Cache papers for 24 hours and reload the searcher index every 12 hours in case the index has been updated.
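
A time-based reload like the one described can be sketched as follows. This is an illustration under stated assumptions rather than the commit's implementation: `TimedValue` is a hypothetical name, and the injectable `clock` exists only to make the example testable.

```python
import time


class TimedValue:
    """Hold a value and reload it once it is older than `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # callable that produces a fresh value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable for testing (assumption)
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Usage with a fake clock: reload the "searcher" every 12 hours.
fake_now = [0.0]
loads = []
searcher = TimedValue(
    loader=lambda: loads.append(1) or f"searcher-{len(loads)}",
    ttl_seconds=12 * 3600,
    clock=lambda: fake_now[0],
)
print(searcher.get())      # searcher-1
fake_now[0] = 6 * 3600
print(searcher.get())      # searcher-1 (still fresh)
fake_now[0] = 12 * 3600
print(searcher.get())      # searcher-2 (reloaded)
```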

Improve rendering speed

7b6924d

Compute the facet HTML in Python instead of in the Jinja template. When facets are nested and number in the hundreds or thousands, this change can reduce rendering time by hundreds of milliseconds.
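
As a rough illustration of the idea, nested facets can be turned into HTML with a small recursive function. The data shape (dicts with `label`, `count`, and optional `children`) and the function name `render_facets` are assumptions made for this sketch; the markup produced by the actual commit may differ.

```python
from html import escape


def render_facets(items):
    """Render a nested facet tree as an HTML list (hypothetical data shape)."""
    if not items:
        return ""
    parts = ["<ul>"]
    for item in items:
        parts.append("<li>")
        # Escape labels, since tags come from user-controlled Zotero data.
        parts.append(f"{escape(item['label'])} ({item['count']})")
        # Recurse into child facets, if any.
        parts.append(render_facets(item.get("children", [])))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)


facets = [
    {"label": "A & B", "count": 2, "children": [{"label": "C", "count": 1}]},
]
print(render_facets(facets))
# <ul><li>A &amp; B (2)<ul><li>C (1)</li></ul></li></ul>
```

The resulting string would be passed to the template as already-safe markup, so the template no longer has to loop over every facet itself.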

Remove inline style to reduce file size

366d318

Wait less time when syncing

f3e7ea4

mgao6767 closed this Sep 28, 2024
