Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #437

Merged
merged 19 commits into from
Aug 1, 2023
Merged

Improve performance #437

merged 19 commits into from
Aug 1, 2023

Conversation

dometto
Copy link
Member

@dometto dometto commented Jan 13, 2023

In response to gollum/gollum#1940 (comment) I did some profiling and saw that a lot of time was being spent in the Gollum::File.find, iterating over entries in the treemap. Basically, gollum currently iterates over a Hash representation of all the blobs and trees in the repo (at a given commit). In a large repo, this means looping a long time (plus all the inefficiencies derived from instantiating objects associated with each entry in the repo). I decided to hack together another approach that does not rely on a treemap, and it was relatively straightforward to do.

Here are some quick profiling results, loading a page from the nlab-conten wiki referred to by @felixwellen:

  • Rugged:
    *master: 26653ms
    • this branch: 784ms
  • RJGit
    • master: 27704ms
    • this branch: 2279ms

(I used https://github.com/MiniProfiler/rack-mini-profiler to profile)

On a smaller repo (our lotr.git), tested with rugged, the speed increase is not noticeable (but I don't think it's any slower either ;)).

Feedback welcome!

Todo:

  • Fix symlinks (requires work on the adapters)
  • Implement global lookup

lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
Gemfile Outdated Show resolved Hide resolved
@dometto
Copy link
Member Author

dometto commented Jan 24, 2023

I've written a small script that benchmarks the performance of reading (getting formatted_data) and writing pages on a) a wiki with many commits and b) a wiki with many blobs and trees. See gist.

Results:

On master

===== Repo with many trees and blobs
Reading pages
                 user     system      total        real
            20.933988   0.359543  21.293531 ( 22.686614)
Writing pages
                 user     system      total        real
             1.825870   0.224374   2.050244 (  2.227173)
===== Repo with many commits
Reading pages
                 user     system      total        real
             0.347461   0.014023   0.361484 (  0.380127)
Writing pages
                 user     system      total        real
             0.318842   0.433577   0.752419 (  1.229406)

On this branch

===== Repo with many trees and blobs
Reading pages
                 user     system      total        real
             0.124437   0.027632   0.152069 (  0.183316)
Writing pages
                 user     system      total        real
             1.842885   0.350508   2.193393 (  2.913138)
===== Repo with many commits
Reading pages
                 user     system      total        real
             0.303247   0.027656   0.330903 (  0.344811)
Writing pages
                 user     system      total        real
             0.266384   0.352682   0.619066 (  0.777089)

@dometto
Copy link
Member Author

dometto commented Jan 24, 2023

As we can see the changes on this branch improve performance drastically, except for the case of writing files, for some reason. ☹️ I didn't think I touched any code related to the latter, but: since the time increase for writing pages is only small (keep in mind the benchmark runs the write for two pages 5 times), I think the benefits clearly outweigh the damage here. :)

@dometto dometto closed this Mar 28, 2023
@dometto dometto reopened this Mar 28, 2023
Gemfile Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Show resolved Hide resolved
lib/gollum-lib/file.rb Show resolved Hide resolved
lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Show resolved Hide resolved
lib/gollum-lib/file.rb Outdated Show resolved Hide resolved
lib/gollum-lib/file.rb Show resolved Hide resolved
lib/gollum-lib/git_access.rb Show resolved Hide resolved
lib/gollum-lib/page.rb Outdated Show resolved Hide resolved
Copy link
Member

@bartkamphorst bartkamphorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sound. Great speedups!

@dometto dometto merged commit ef19cba into gollum:6.x Aug 1, 2023
4 checks passed
@dometto dometto deleted the improve_performance branch August 1, 2023 14:04
dometto added a commit that referenced this pull request Aug 1, 2023
* Don't use treemap for finding pages and file
* Reimplement global find
* Ensure File.canonical_pat prevents path traversal. Add test.
* GitAccess: improve #tree! performance
* Expect file mode to be an Integer
* Refactor Page and File to use Pathname
* file.rb: remove unneeded version.to_s call (see gollum/gollum#1972)
* Provide compatibility with latest minitest.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants