This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

improve replace performance take 2 (this time I have confirmed substantial improvements in the profiler) #1051

Open
wants to merge 1 commit into master

Conversation

@jedwards1211 (Contributor) commented Sep 11, 2018

@maxbrunsfeld I did some more digging on this. The way changes are communicated to the marker layer doesn't seem to be the main culprit in my profiling results; instead I'm seeing splice calls from within setTextInBufferRange taking the most time:
[profiler screenshot: splice calls inside setTextInBufferRange dominating the profile]

I haven't figured out why each splice operation takes between 0.5 and 1 millisecond, but it means that if 60000 markers get replaced, the splices consume a combined total of around 45 seconds, which is indeed at least how long it took for the editor to respond (I stopped the profiler after about 40 seconds).

When I changed replace to use plain JS string operations and then called setTextInBufferRange once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds, of which roughly 4 seconds appear to be due to the destroy-marker notifications.

[profiler screenshot: the same workload after the change, completing in about 5.5 seconds]

Description of the Change

Instead of calling getTextInBufferRange and setTextInBufferRange for each individual replacement, I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text with a single call to setTextInBufferRange at the end.
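A minimal sketch of that approach, using plain JS strings in place of the TextBuffer API (`replaceAllInText` is a hypothetical helper for illustration, not code from this PR):

```javascript
// Build the replacement text in one pass over the buffer string;
// the caller would then apply it with a single setTextInBufferRange.
function replaceAllInText(text, regex, replacement) {
  // Force the global flag so exec() advances through all matches.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  let result = '';
  let lastIndex = 0;
  let match;
  while ((match = re.exec(text)) !== null) {
    // Copy the unchanged slice before the match, then the replacement.
    result += text.slice(lastIndex, match.index) + replacement;
    lastIndex = match.index + match[0].length;
    if (match[0].length === 0) re.lastIndex++; // guard against zero-width matches
  }
  return result + text.slice(lastIndex);
}
```

For example, `replaceAllInText('foo bar baz qux', /foo/, 'hello')` returns `'hello bar baz qux'`; the point is that no per-match buffer edit (and therefore no per-match marker splice) occurs.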

Alternate Designs

This is really just an experiment for now, intended to contribute to the discussion of text-buffer performance.

Benefits

In my profiling test case (replacing "foo" with "hello" in a buffer with 60000 lines of "foo bar baz qux"), the time to replace is reduced from over 45 seconds to about 5.5 seconds.

Possible Drawbacks

Unfortunately, replacing the entire buffer text clears the selection, so I doubt you will want to merge this PR as-is.

Applicable Issues

#653

@jedwards1211 (Contributor, Author) commented Sep 11, 2018

There is one test failure I haven't figured out yet. I do believe that the profiling results still give an accurate impression of what the speedup would be, though obviously I would need to fix the test failure if this new approach is approved.

@jedwards1211 (Contributor, Author)

Is there some kind of bug with the V8 profiler Atom is using? I don't understand why it's splitting up the one call to replaceAll into a dozen apparently separate calls in the timeline view.

@maxbrunsfeld (Contributor)

Hey, nice job on this investigation @jedwards1211! This is a useful starting point.

> When I changed replace to use plain JS string operations and then called setTextInBufferRange once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds

> I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text

I think it's important that we use individual setTextInRange calls for two reasons:

  • We need to update cursors, selections, and other markers according to the deltas, rather than losing all of their positions.
  • Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.

The problem is that the BufferSearch's search results are also stored as markers, and there is a cost associated with updating markers when editing the buffer. I think that what replaceAll should do is:

  1. Get the ranges of every marker.
  2. Destroy the results marker layer, so that it won't need to be updated during subsequent setTextInRange calls.
  3. Iterate the ranges from step 1 in reverse, and update the buffer. Iterating the ranges in reverse is important so that as we update the buffer, the subsequent ranges don't become stale.
  4. Recompute the marker layer by calling findAndMarkAllInRangeSync, because for a regex search, it's possible that replacing all the matches will create new matches.

☝️ All of this only applies when replacing all markers, though. When replacing just a single marker, we probably want to keep things as they are. Maybe we need to separate those two code paths instead of having them call a common method.
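The reverse-iteration idea in step 3 can be sketched with plain strings, using character offsets in place of Atom's Range objects (`replaceRangesInReverse` is a hypothetical illustration, not the proposed patch):

```javascript
// ranges: [{start, end}] offsets into `text`, sorted ascending.
// Editing from the last range backward only shifts text that lies
// after the ranges we have already processed, so the earlier,
// not-yet-processed offsets remain valid without any bookkeeping.
function replaceRangesInReverse(text, ranges, replacement) {
  for (let i = ranges.length - 1; i >= 0; i--) {
    const {start, end} = ranges[i];
    text = text.slice(0, start) + replacement + text.slice(end);
  }
  return text;
}
```

In Atom itself each iteration would be an individual setTextInRange call; with the results marker layer already destroyed (step 2), none of those edits would pay the per-result marker-update cost.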

@jedwards1211 (Contributor, Author)

Yeah, I figured there are good reasons not to operate on the whole buffer as plain strings.
Do you think anything could be gained from adding the ability to set text in multiple ranges with a single call to text-buffer? Especially considering that there might be many replacements within a given chunk of text (if my understanding of the text buffer's basic structure is correct).

And any idea why splice is taking so long?

Also

> Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.

That's good. For such a file, though, assuming Atom can also handle more markers than fit in memory, do you think it would make more sense to do the replacements first and rebuild the markers afterward?

@jedwards1211 (Contributor, Author)

@maxbrunsfeld sorry to bother again but do you have any idea why splice would take so long?

@maxbrunsfeld (Contributor) commented Oct 8, 2018

@jedwards1211 splice is expensive because it's called a huge number of times, and each call updates a huge number of markers: the markers that represent all the search results. But we don't need to update those markers, because we're about to invalidate them all by replacing the text they're marking. That's why I made the suggestion above.

@jedwards1211 (Contributor, Author)

Oh, I see. What about other kinds of markers, though? I'm guessing error underlines are another marker layer that would slow down calls to splice even if the search-result markers are destroyed.
And git-diff plugins probably use markers as well?

@jedwards1211 (Contributor, Author)

For this reason I'm wondering whether we could get even better performance by devising a way to pass multiple changes to the text buffer at once, so that we only have to recompute the markers for each chunk of text once, rather than once per replacement within that chunk.
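To illustrate the idea, a batched edit could apply all changes in a single pass over each chunk (text-buffer has no such method; `applyEdits` is purely a hypothetical sketch over plain strings):

```javascript
// edits: [{start, end, newText}] sorted by start, non-overlapping.
// A real text-buffer implementation could similarly update each
// marker layer once per affected chunk instead of once per edit.
function applyEdits(text, edits) {
  let result = '';
  let last = 0;
  for (const {start, end, newText} of edits) {
    // Keep the unchanged slice between edits, then the new text.
    result += text.slice(last, start) + newText;
    last = end;
  }
  return result + text.slice(last);
}
```

For example, `applyEdits('foo bar foo', [{start: 0, end: 3, newText: 'hi'}, {start: 8, end: 11, newText: 'yo'}])` returns `'hi bar yo'` after one pass over the string.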

@maxbrunsfeld (Contributor)

Yes, those features all use markers as well. It should be fine, though; updating markers is already really efficient. In this case there's just a huge number of markers that don't need to be updated.

@jedwards1211 (Contributor, Author) commented Oct 8, 2018

Well, it's a bit absurd to imagine, but if for whatever reason there were 60000 diff markers, wouldn't that cause the same performance issue as 60000 search-result markers?
I'd like to eventually make a PR to solve this, but I'd prefer to fix the problem once and for all...
