Add support for capturing group in tag inclusion. #27

Closed
wants to merge 17 commits into from

mgao6767
Contributor

Re-worked PR. It allows a capturing group to be used when a regular expression determines item inclusion by tags.
If `tag_include_re` contains a named capture group `kerko`, the captured text is used as the tag instead of the full tag.

For example, suppose a reference in Zotero has the tags "-tag-Topic 1" and "-tag-Topic 2".

  • In config.toml, set `tag_include_re` to "-tag-(?P<kerko>.*)".
  • On the website, the facets will be "Topic 1" and "Topic 2" instead of "-tag-Topic 1" and "-tag-Topic 2".

If `tag_include_re` has no capture group named `kerko`, or no capture groups at all, the full tag is used whenever it matches.

  • In config.toml, set `tag_include_re` to "-tag-(?P<anotherName>.*)" or "-tag-".
  • On the website, the facets will be "-tag-Topic 1" and "-tag-Topic 2".

This lets us keep some tags in Zotero for private use while reserving others specifically for creating facets.
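
The rule above can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: the function name `facet_label` is hypothetical, and the use of `Pattern.match` (anchored at the start of the tag) is an assumption, since the description does not say whether the pattern is matched or searched.

```python
import re


def facet_label(tag, pattern):
    """Return the label to show for `tag`, or None if the tag is excluded.

    Hypothetical helper: `pattern` is the compiled tag_include_re.
    """
    # Assumption: the pattern is applied from the start of the tag.
    match = pattern.match(tag)
    if match is None:
        return None  # tag does not match, so the item tag is not included
    # Use the capture only when a group named "kerko" exists and took part
    # in the match; otherwise fall back to the full tag.
    if "kerko" in pattern.groupindex:
        captured = match.group("kerko")
        if captured is not None:
            return captured
    return tag


with_group = re.compile(r"-tag-(?P<kerko>.*)")
other_group = re.compile(r"-tag-(?P<anotherName>.*)")
print(facet_label("-tag-Topic 1", with_group))
print(facet_label("-tag-Topic 1", other_group))
```

With the first pattern the label is the captured text; with the second, the group name is not `kerko`, so the full tag is kept.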

In the current implementation, every search opens and loads the Whoosh index and creates a new `Searcher` instance. This commit introduces a singleton to eliminate that behaviour, except when syncing attachments, which requires the new index. The downside of this change is that after syncing, the Flask app needs to be restarted for the searcher to use the new index.
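
A minimal sketch of the singleton idea, under stated assumptions: `SearcherHolder` and `open_searcher` are hypothetical names, and the factory stands in for whatever opens the Whoosh index (for example, something along the lines of `whoosh.index.open_dir(path).searcher()`). It is not the implementation in this commit.

```python
import threading


class SearcherHolder:
    """Open the searcher once per process and hand out the same instance."""

    def __init__(self, open_searcher):
        # `open_searcher` is a hypothetical zero-argument factory that
        # opens the index and returns a searcher object.
        self._open_searcher = open_searcher
        self._lock = threading.Lock()
        self._searcher = None

    def get(self):
        # The lock keeps concurrent first requests from opening the
        # index more than once.
        with self._lock:
            if self._searcher is None:
                self._searcher = self._open_searcher()
            return self._searcher
```

Because the instance is never replaced, it keeps reading the index as it was when first opened, which is why a restart is needed after syncing.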
In the current implementation, every search opens and loads the Whoosh index to retrieve the last sync time. This commit caches the last sync time. The downside of this change is that after syncing, the Flask app needs to be restarted for the searcher to use the new index.
When the server has little memory and there are many documents, syncing may cause an out-of-memory (OOM) error. This change may reduce that risk.
Cache papers for 24 hours and reload the searcher index every 12 hours in case the index has been updated.
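
One way to picture the time-based reload is a small cache that re-runs its loader once a time-to-live has elapsed. This is a sketch only: `TTLCache`, `loader`, and `clock` are hypothetical names, and the commit may well use a caching library instead.

```python
import time


class TTLCache:
    """Cache one value and reload it after `ttl_seconds` have passed."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # hypothetical callable that builds the value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Load on first use, then again whenever the entry has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


# Assumed durations, taken from the commit message: 12 hours for the
# searcher, 24 hours for cached papers.
SEARCHER_TTL = 12 * 60 * 60
PAPERS_TTL = 24 * 60 * 60
```

A searcher wrapped this way would pick up an updated index within 12 hours without a restart, at the cost of serving stale results until then.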
Compute the facet HTML in Python instead of in the Jinja template. When facets are nested and number in the hundreds or thousands, this change can reduce rendering time by hundreds of milliseconds.
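
As a rough illustration of building nested facet markup in Python, the sketch below walks a tree of facet items and joins the pieces once. The item shape (`label`, `count`, `children`) and the function name `render_facets` are assumptions for this example, not the data structure used in the project.

```python
from html import escape


def render_facets(items):
    """Render nested facet items as nested <ul> lists (illustrative only)."""
    parts = []

    def walk(nodes):
        parts.append("<ul>")
        for node in nodes:
            # Escape labels, since tag text comes from user data.
            parts.append(f"<li>{escape(node['label'])} ({node['count']})")
            children = node.get("children")
            if children:
                walk(children)
            parts.append("</li>")
        parts.append("</ul>")

    walk(items)
    # A single join avoids repeated string concatenation on large trees.
    return "".join(parts)


facets = [
    {"label": "A & B", "count": 2, "children": [{"label": "C", "count": 1}]},
]
print(render_facets(facets))
```

In a Flask app the resulting string would typically be wrapped in `markupsafe.Markup` before being passed to the template, so that Jinja does not escape it a second time.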
@mgao6767 mgao6767 closed this Sep 28, 2024