Kerko with a large database - Incremental sync? #26
Glad to hear that Kerko is of interest for your project. Regarding the size of your database, I think there might be a few issues:
I'll be happy to work on the above issues 2 & 3 when I get sufficient funding, but there's not much we can do about issue 1. It seems to me that your project requires that all 3 be addressed. I hope this helps!
Thanks a lot for your quick and thorough reply. I will consult and see what happens, especially considering issue 1, which seems to be the bottleneck.
Hi! I've been syncing and testing the app with a 200k-item database, and the main issue I can see is that there are over 30k topics. This generates an index HTML file of about 20 MB, which is of course suboptimal. I am trying to see if there's a way to deactivate the facets (or at least the topics, since the other facets are fine). I have two questions about it.
Thanks for your work!
Thanks very much for your work, @davidlesieur!
@mgao6767 Very interesting! I have used Elasticsearch (and Solr as well) on other projects. However, I feel a separate search server is overkill for most Kerko projects. It would also make deploying Kerko much harder (many users are researchers who can get by with a Python stack, but might be deterred by infrastructure complexity). I'm considering Tantivy, which would get transparently installed along with Python packages. I have also thought about supporting multiple search engines, but that would require introducing more abstractions in Kerko's architecture. This is certainly doable, but I'm not entirely convinced the increased complexity would be worth the effort and maintenance burden. However, had an abstraction layer been built in the first place, it would now be easier to replace Whoosh... Nice dilemma!
Thanks @davidlesieur! I didn't know much about Tantivy, but it seems really great. I'm not a professional programmer. My current implementation uses Whoosh to generate the query, which is then translated into an Elasticsearch query. Not ideal, but at least it works. Perhaps just write adapters for new engines and add a config option to choose which one to use? Please do let me know how it goes with Tantivy. I'm happy to contribute as much as I can.
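The adapter idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: `SearchBackend`, `InMemoryBackend`, `make_backend`, and the `SEARCH_BACKEND` config key are hypothetical names, not part of Kerko, and the toy in-memory backend merely stands in for a real adapter around Whoosh, Elasticsearch, or Tantivy.

```python
from __future__ import annotations

from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface each engine adapter would implement (not Kerko's API)."""

    def index(self, doc_id: str, text: str) -> None: ...

    def search(self, query: str) -> list[str]: ...


class InMemoryBackend:
    """Toy adapter: substring matching over documents kept in a dict."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        # Store lowercased text so matching is case-insensitive.
        self._docs[doc_id] = text.lower()

    def search(self, query: str) -> list[str]:
        # A document matches when it contains every query term as a substring.
        terms = query.lower().split()
        return sorted(
            doc_id
            for doc_id, text in self._docs.items()
            if all(term in text for term in terms)
        )


# Registry mapping a config value to an adapter class (assumed names).
BACKENDS: dict[str, type] = {"memory": InMemoryBackend}


def make_backend(config: dict) -> SearchBackend:
    """Pick the adapter named by the hypothetical SEARCH_BACKEND config key."""
    name = config.get("SEARCH_BACKEND", "memory")
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown search backend: {name!r}") from None
```

With something like this, the rest of the application would only call `make_backend(config)` and the two interface methods, so swapping engines would mean registering another adapter class in `BACKENDS`. Whether that indirection is worth the maintenance cost is exactly the trade-off discussed above.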
@mgao6767 At the moment, I'm still looking for funding in order to get going with replacing Whoosh.
Hi there,
first of all, thanks for this amazing piece of software. For a project I am working on, we need to publish a relatively large Zotero database and make it searchable. Kerko seems to be the best fit for the job, but we may have to index up to 450k items. I am wondering whether you foresee any issues with deploying this on Kerko. I've been syncing some libraries I have (about 5k items) and I see that this takes considerable time. I assume that syncing 450k items would probably take weeks, which is in principle not a big deal, as long as future syncs are incremental. But I am unsure about this...
Looking forward to hearing your opinion on this.
best regards,