Switch to DuckDB #146

Open: bmschmidt wants to merge 44 commits into master

bmschmidt
Member

Once this is merged, we're done with MySQL. With bigrams restored, I think it's pretty close to being ready.

@organisciak
Member

I trust you to do this merge, since you have the freshest understanding of the code. Perhaps loop in HTRC people like @borice?

How does DuckDB perform?

@bmschmidt
Member Author

This is not yet completely ready for review, but close enough that I want to put it in tracking.

I'm still generally finding DuckDB to be about 1.5x faster than MySQL on standard queries on the Rate My Professor bookworm, and much faster on ingest. I just made a major change to the sort code, though: DuckDB now handles the word sorting, the stage that used to be index building in MySQL and often took 6 to 12 hours.
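As a rough illustration of why sorting can stand in for a separate index build: DuckDB stores tables in row groups and keeps min/max statistics (zonemaps) for them, so when rows are physically sorted by word ID, a lookup for one word can skip most groups. The sketch below is a simplified pure-Python model of that idea, not DuckDB's or bookworm's actual code; the (word_id, book_id, count) row layout, the helper names, and ROW_GROUP_SIZE are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical row layout: (word_id, book_id, count). Not the actual bookworm schema.
Row = Tuple[int, int, int]

# Tiny group size so the effect is visible; real engines use much larger row groups.
ROW_GROUP_SIZE = 4


def sort_rows(rows: List[Row]) -> List[Row]:
    """Order rows by word, then book, so each word's rows sit together."""
    return sorted(rows, key=lambda r: (r[0], r[1]))


def build_zonemap(rows: List[Row]) -> List[Tuple[int, int]]:
    """Record the (min, max) word_id of each fixed-size group of rows."""
    zones = []
    for start in range(0, len(rows), ROW_GROUP_SIZE):
        word_ids = [r[0] for r in rows[start:start + ROW_GROUP_SIZE]]
        zones.append((min(word_ids), max(word_ids)))
    return zones


def rows_for_word(rows: List[Row], zones: List[Tuple[int, int]], word_id: int):
    """Return (matching rows, number of row groups that had to be scanned)."""
    matches: List[Row] = []
    scanned = 0
    for i, (lo, hi) in enumerate(zones):
        if word_id < lo or word_id > hi:
            continue  # min/max rules this group out; skip it without reading rows
        scanned += 1
        start = i * ROW_GROUP_SIZE
        matches.extend(r for r in rows[start:start + ROW_GROUP_SIZE] if r[0] == word_id)
    return matches, scanned


# Hypothetical sample data, deliberately inserted out of word order.
rows = [(3, 1, 5), (1, 2, 7), (2, 1, 1), (1, 1, 4),
        (3, 2, 2), (2, 2, 9), (4, 1, 3), (1, 3, 6)]

unsorted_matches, unsorted_scanned = rows_for_word(rows, build_zonemap(rows), 1)
sorted_rows = sort_rows(rows)
sorted_matches, sorted_scanned = rows_for_word(sorted_rows, build_zonemap(sorted_rows), 1)
print(unsorted_scanned, sorted_scanned)  # 2 1
```

With the rows unsorted, word 1 appears in every group's min/max range, so every group is read; after sorting, its rows are contiguous and only one group is scanned. The real savings depend on DuckDB's own storage layout and query planner.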

DuckDB has also just added compression for numeric columns that cuts disk space requirements significantly compared to MySQL. As a rough guess, databases should be about one-third the size they were with MySQL.
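One widely used kind of lightweight numeric compression is bit-packing: each group of integers is stored using only as many bits as its largest value needs, which helps when most values (such as per-book word counts) are small. The sketch below estimates bit-packed size in plain Python. It is not DuckDB's implementation, which may combine several schemes; the 1024-value group size, the sample counts, and the function names are hypothetical, and per-group headers are ignored.

```python
from typing import List


def bits_needed(values: List[int]) -> int:
    """Bits per value when a group is packed at the width of its largest value."""
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return max(1, max(values).bit_length())


def packed_bytes(values: List[int], group_size: int = 1024) -> int:
    """Approximate bit-packed size in bytes, ignoring per-group metadata."""
    total_bits = 0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        total_bits += bits_needed(group) * len(group)
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical small counts: 1024 values whose maximum (7) fits in 3 bits.
counts = [1, 3, 2, 1, 7, 1, 1, 4] * 128
raw = len(counts) * 4  # size if stored as plain 32-bit integers
packed = packed_bytes(counts)
print(raw, packed)  # 4096 384
```

Real ratios depend on the value distribution and on whatever schemes the engine actually chooses per column segment.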
