This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

improve replace performance take 2 (this time I have confirmed substantial improvements in the profiler) #1051

Open
wants to merge 1 commit into master

Conversation

@jedwards1211 (Contributor) commented Sep 11, 2018

@maxbrunsfeld I did some more digging on this. The way changes are communicated to the marker layer doesn't seem to be the main culprit in my profiling results; instead I'm seeing splice calls from within setTextInBufferRange taking the most time:
[profiler screenshot: splice calls inside setTextInBufferRange dominating the profile]

I haven't figured out why each splice operation takes between 0.5 and 1 millisecond, but it means that if 60000 markers get replaced, the splices consume a combined total of around 45 seconds, which is indeed at least how long it took for the editor to respond (I stopped the profiler after about 40 seconds).

When I changed replace to use plain JS string operations and then called setTextInBufferRange once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds, of which roughly 4 seconds appear to be due to the destroy-marker notifications.

[profiler screenshot: the same workload after the change, completing in about 5.5 seconds]

Description of the Change

Instead of calling getTextInBufferRange and setTextInBufferRange for each individual replacement, I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text with a single call to setTextInBufferRange at the end.
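A minimal sketch of that approach, using plain JS strings in place of the TextBuffer API (`replaceAllInText` is a hypothetical helper for illustration, not code from this PR):

```javascript
// Build the replacement text in one pass over the buffer string;
// the caller would then apply it with a single setTextInBufferRange.
function replaceAllInText(text, regex, replacement) {
  // Force the global flag so exec() advances through all matches.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  let result = '';
  let lastIndex = 0;
  let match;
  while ((match = re.exec(text)) !== null) {
    // Copy the unchanged slice before the match, then the replacement.
    result += text.slice(lastIndex, match.index) + replacement;
    lastIndex = match.index + match[0].length;
    if (match[0].length === 0) re.lastIndex++; // guard against zero-width matches
  }
  return result + text.slice(lastIndex);
}
```

For example, `replaceAllInText('foo bar baz qux', /foo/, 'hello')` returns `'hello bar baz qux'`; the point is that no per-match buffer edit (and therefore no per-match marker splice) occurs.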

Alternate Designs

This is really just an experiment for now, intended to contribute to the discussion of text-buffer performance.

Benefits

In my profiling test case (replacing "foo" with "hello" in a buffer with 60000 lines of "foo bar baz qux"), the time to replace is reduced from over 45 seconds to about 5.5 seconds.

Possible Drawbacks

Unfortunately, replacing the entire buffer text clears the selection, so I doubt you will want to merge this PR as-is.

Applicable Issues

#653

@jedwards1211 (Contributor, Author) commented Sep 11, 2018

There is one test failure I haven't figured out yet. I do believe that the profiling results still give an accurate impression of what the speedup would be, though obviously I would need to fix the test failure if this new approach is approved.

@jedwards1211 (Contributor, Author)

Is there some kind of bug with the V8 profiler Atom is using? I don't understand why it's splitting up the one call to replaceAll into a dozen apparently separate calls in the timeline view.

@maxbrunsfeld (Contributor)

Hey, nice job on this investigation @jedwards1211! This is a useful starting point.

> When I changed replace to use plain JS string operations and then called setTextInBufferRange once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds

> I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text

I think it's important that we use individual setTextInRange calls for two reasons:

  • We need to update cursors, selections, and other markers according to the deltas, rather than losing all of their positions.
  • Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.

The problem is that the BufferSearch's search results are also stored as markers, and there is a cost associated with updating markers when editing the buffer. I think that what replaceAll should do is:

  1. Get the ranges of every marker.
  2. Destroy the results marker layer, so that it won't need to be updated during subsequent setTextInRange calls.
  3. Iterate the ranges from step 1 in reverse, and update the buffer. Iterating the ranges in reverse is important so that as we update the buffer, the subsequent ranges don't become stale.
  4. Recompute the marker layer by calling findAndMarkAllInRangeSync, because for a regex search, it's possible that replacing all the matches will create new matches.

☝️ All of this only applies when replacing all markers, though. When replacing just a single marker, we probably want to keep things as they are. Maybe we need to separate those two code paths instead of having them call a common method.
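The reverse-iteration idea in step 3 can be sketched with plain strings, using character offsets in place of Atom's Range objects (`replaceRangesInReverse` is a hypothetical illustration, not the proposed patch):

```javascript
// ranges: [{start, end}] offsets into `text`, sorted ascending.
// Editing from the last range backward only shifts text that lies
// after the ranges we have already processed, so the earlier,
// not-yet-processed offsets remain valid without any bookkeeping.
function replaceRangesInReverse(text, ranges, replacement) {
  for (let i = ranges.length - 1; i >= 0; i--) {
    const {start, end} = ranges[i];
    text = text.slice(0, start) + replacement + text.slice(end);
  }
  return text;
}
```

In Atom itself each iteration would be an individual setTextInRange call; with the results marker layer already destroyed (step 2), none of those edits would pay the per-result marker-update cost.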

@jedwards1211 (Contributor, Author)

Yeah, I figured there are good reasons not to operate on the whole buffer as plain strings.
Do you think anything could be gained from adding the ability to set text in multiple ranges with a single call to text-buffer? Especially considering that there might be many replacements within a given chunk of text (if my understanding of the text buffer's basic structure is correct).

And any idea why splice is taking so long?

Also

> Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.

That's good. For such a file, though, assuming Atom can also handle more markers than fit in memory, do you think it would make more sense to do the replacements first and rebuild the markers afterward?

@jedwards1211 (Contributor, Author)

@maxbrunsfeld sorry to bother again but do you have any idea why splice would take so long?

@maxbrunsfeld (Contributor) commented Oct 8, 2018

@jedwards1211 splice is expensive because it's called a huge number of times, and each call updates a huge number of markers: the markers that represent all the search results. But we don't need to update those markers, because we're about to invalidate them all by replacing the text they're marking. That's why I made the suggestion above.

@jedwards1211 (Contributor, Author)

Oh, I see. What about other kinds of markers, though? I'm guessing error underlines are another marker layer that would slow down calls to splice even if the search-result markers are destroyed.
And git-diff plugins probably use markers as well?

@jedwards1211 (Contributor, Author)

For this reason I'm wondering whether we could get even better performance by devising a way to pass multiple changes to the text buffer at once, so that we only have to recompute the markers for each chunk of text once, rather than once per replacement within that chunk.
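To illustrate the idea, a batched edit could apply all changes in a single pass over each chunk (text-buffer has no such method; `applyEdits` is purely a hypothetical sketch over plain strings):

```javascript
// edits: [{start, end, newText}] sorted by start, non-overlapping.
// A real text-buffer implementation could similarly update each
// marker layer once per affected chunk instead of once per edit.
function applyEdits(text, edits) {
  let result = '';
  let last = 0;
  for (const {start, end, newText} of edits) {
    // Keep the unchanged slice between edits, then the new text.
    result += text.slice(last, start) + newText;
    last = end;
  }
  return result + text.slice(last);
}
```

For example, `applyEdits('foo bar foo', [{start: 0, end: 3, newText: 'hi'}, {start: 8, end: 11, newText: 'yo'}])` returns `'hi bar yo'` after one pass over the string.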

@maxbrunsfeld (Contributor)

Yes, those features all use markers as well. It should be fine, though; updating markers is already really efficient. In this case there's just a huge number of markers that don't need to be updated.

@jedwards1211 (Contributor, Author) commented Oct 8, 2018

Well, it's a bit absurd to imagine, but if for whatever reason there were 60000 diff markers, wouldn't that cause the same performance issue as 60000 search-result markers?
I'd like to eventually make a PR to solve this, but I'd prefer to fix the problem once and for all...
