Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GcRefCell.borrow_mut is too expensive #2773

Closed
tunz opened this issue Apr 2, 2023 · 8 comments
Closed

GcRefCell.borrow_mut is too expensive #2773

tunz opened this issue Apr 2, 2023 · 8 comments
Labels
E-Hard Hard difficulty problem gc Issue related to garbage collection help wanted Extra attention is needed performance Performance related changes and issues

Comments

@tunz
Copy link
Contributor

tunz commented Apr 2, 2023

When Object.borrow_mut() is called, boa tries to root/unroot all the properties. So, even if we just want to set a single element, boa accesses all the other elements. I see that this overhead is taking almost 90% of run time in QuickJS NavierStokes benchmark.

In the following example, running arr[0] = {}; is much slower when there are more elements.

% cat test.js
let arr = [];
for (let i = 0; i < 30000; i++)
  arr.push({});

let arr2 = [];

for (let i = 0; i < 30000; i++)
  arr[0] = {};
% time ./target/release/boa ./test.js
undefined
./target/release/boa ./test.js  11.04s user 0.02s system 99% cpu 11.146 total

% cat test2.js
let arr = [];
for (let i = 0; i < 30000; i++)
  arr.push({});

let arr2 = [];

for (let i = 0; i < 30000; i++)
  arr2[0] = {};
% time ./target/release/boa ./test2.js
undefined
./target/release/boa ./test2.js  6.57s user 0.01s system 98% cpu 6.655 total

I guess this design is for such case that we add a Gc<GcRefCell<T>> object to another Gc<GcRefCell<T>> object.

let v = Gc::new(GcRefCell::new(Vec::new()));
v.borrow_mut().push(Gc::new(GcRefCell::new(123)));

When 123 object is pushed to the v object, it should be unrooted because v is the root now. But, Vec.push does not know that information, and only the GcRefMut returned by borrow_mut knows that. So, what it does is just to root all the children when borrow starts, and unroot all the children again when borrow ends.

I'm not sure what's the best approach to fix this issue. I personally think we should not use Gc in another Gc object. I think we should allow Gc objects to keep GcPointer instead, and manage roots in a single place.

@jedel1043
Copy link
Member

jedel1043 commented Apr 2, 2023

Yeah, that's inherent of our current GC design. We have an open issue (#2631) about investigating other API designs for our GC, which should allow us to solve issues like this.

@jasonwilliams jasonwilliams added the performance Performance related changes and issues label Apr 2, 2023
@jasonwilliams
Copy link
Member

Hey @tunz thanks for raising this, that does indeed sound quite expensive so I’m glad you brought it up. I’d be interested in seeing if we can solve this somehow. I agree gc object within a gc object doesn’t sound ideal.

I think we should allow Gc objects to keep GcPointer instead, and manage roots in a single place.

could you elaborate on this idea a bit more? How would this work with objects that have multiple references?

@tunz
Copy link
Contributor Author

tunz commented Apr 7, 2023

It's not a clear idea yet. I was thinking of this scenario. First, make Gc object not traceable, and introduce another Gc-like traceable type GcSlot which is a pointer from one gc object to another gc object. So, the interface will be like:

#[derive(Trace, Finalize)]
struct Obj {
  GcSlot<i32> v,
}
let num = Gc::new(123);
let v = Gc::new(Obj { v: num.into() });

And, BoaGc has a vector of gc box pointers which is called roots. When Gc object is created, it adds the pointer of a created GcBox object to the roots. When Gc object is dropped, it removes the pointer from the roots (or decrease a reference count). When gc is triggered, it iterates the roots and marks all the referencing objects recursively. So, objects referenced by other multiple objects should be just handled by the mark and sweep process.

But, one thing I'm not sure is safety. The assumption of this approach is that we must create all the traceable objects with Gc wrapper. If a user creates a traceable object directly, then they might access freed objects. I'm not sure if we can allow defining a new type with GcSlot member, but restrict direct object allocation without Gc wrapper at the same time.

@jedel1043
Copy link
Member

Oh I think I got the gist of your idea: to have a Gc type that is Trace, then have a RootedGc type that is !Trace so that you cannot store it within Trace types, but that represents a rooted pointer to GC managed data, right? Then, we'd do conversions between Gc and RootedGc to root and unroot pointers.

I think it should be possible to implement a POC of this design, but as you've said, there's safety concerns about keeping Gc pointers in the stack, since that wouldn't count to the root list and it would cause UAF if the GC is triggered before accessing the pointer.

@jasonwilliams jasonwilliams added E-Hard Hard difficulty problem gc Issue related to garbage collection labels Apr 12, 2023
@jasonwilliams jasonwilliams added the help wanted Extra attention is needed label Jun 7, 2023
@tunz
Copy link
Contributor Author

tunz commented Jun 19, 2023

I did a quick research about other gc implementations written in Rust, and found that shredder has an interesting approach to find roots. It creates a Handle for each Gc, and manages a list of the handles. Then, it iterates and traces all the Gc heap objects, and mark all the children Gc handles as non root. Then, rest of the handles not makred as root are the roots.

I tried a quick prototype for this, but I'm stuck at understanding weak pointer implementation. Especially, the following line.

if !unreachables.strong.is_empty() || unreachables.weak.is_empty() {

Is this condition correct? It looks weird that it runs finalizers for weak objects when unreachables.weak is empty. But, changing it to !unreachables.weak.is_empty() breaks tests, so it's doing something.

@jedel1043
Copy link
Member

@tunz Yeah, that's definitely a bug. I opened #3049 which fixes it. With that, the GC algorithm should make a bit more sense!

tunz added a commit to tunz/boa that referenced this issue Jul 4, 2023
Attempt to address the issue boa-dev#2773.

The existing implementation had an expensive overhead of managing root
counts, especially for mutable borrow of GcRefCell.

Instead of managing the root counts, this change counts the number of
Gc/WeakGc handles located in Gc heap objects and total number of them.
Then, we can find whether there is a root by comparing those numbers.
tunz added a commit to tunz/boa that referenced this issue Jul 4, 2023
Attempt to address the issue boa-dev#2773.

The existing implementation had an expensive overhead of managing root
counts, especially for mutable borrow of GcRefCell.

Instead of managing the root counts, this change counts the number of
Gc/WeakGc handles located in Gc heap objects and total number of them.
Then, we can find whether there is a root by comparing those numbers.
@vultix
Copy link

vultix commented Jul 9, 2023

Here’s a blogpost with a novel rust Garbage Collection mechanism for finding roots: https://coredumped.dev/2022/04/11/implementing-a-safe-garbage-collector-in-rust/

@postmeback
Copy link
Contributor

I believe the reported issue is fixed. It should be good to close.

github-merge-queue bot pushed a commit that referenced this issue Jul 26, 2023
* Find roots when running GC

Attempt to address the issue #2773.

The existing implementation had an expensive overhead of managing root
counts, especially for mutable borrow of GcRefCell.

Instead of managing the root counts, this change counts the number of
Gc/WeakGc handles located in Gc heap objects and total number of them.
Then, we can find whether there is a root by comparing those numbers.

* Fix clippy errors

* Keep reference counts in Box

* Addressing comment

* Fix clippy errors

* Fix typo

* non_root_count includes mark bit

* give a space
@tunz tunz closed this as completed Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-Hard Hard difficulty problem gc Issue related to garbage collection help wanted Extra attention is needed performance Performance related changes and issues
Projects
None yet
Development

No branches or pull requests

5 participants