-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support copying garbage collectors #56819
Comments
A couple of misc thoughts:
|
@Keno Right. APIs like Once the API issues like above are sorted out, we won't need to be concerned about the runtime part that is written in Julia, and this would encourage rewriting more parts of the runtime into Julia. Maintaining a C dialect and using it for runtime sounds inferior to using more Julia for runtime. |
I know this sounds a bit off the wall, but it might be a good idea if it were feasible to build Julia with Ravenbrook MPS. That's a low latency, incremental GC that has been around about 30 years, used to be commercial and was open sourced about 20 years ago. I haven't used it, but from scanning messages, I think in its default form it is precise within objects but conservative for objects used from C. It does copy but it automatically pins any objects referenced from the C stack. Source on github https://github.com/Ravenbrook/mps While the github readme only mentions intel targets, it's clear that it supports macs and ARM and in the past supported every architecture you can think of and might still be coaxed into working with them. It's being used in some commercial products and there's a CommonLisp interpreter that uses it that was used on a super computer project. |
This issue describes problems that we encountered during our attempts to port a copying garbage collector in MMTk to Julia.
By "copying collector", we mean a collector that can move objects during GC. Copying collectors offer several advantages, including reducing memory fragmentation and improving object locality, making them particularly valuable for long-running programs. Note that this issue does not propose any change to the existing Julia mark-sweep collector. With #55256, Julia allows different garbage collector implementations, and we are upstreaming the support for MMTk (https://github.com/mmtk/mmtk-core) as an alternative collector #56288. This issue discusses support required in the runtime to allow an alternative copying garbage collector in Julia (e.g. a copying collector from MMTK).
Most of the problems described in this issue have been discussed internally during the development process. It is helpful to document the problems and allow broader input from the Julia community to refine these ideas, validate the proposed solutions and identify any additional issues or improvements, which makes our upstreaming process smoother.
It is worth noting that our attempts mainly focused on adapting Julia's runtime to support a copying collector. While it is equally important to provide support for copying collectors in Julia's API and FFI, this aspect is not fully explored yet (and I hope we can get more feedback on it from this issue).
TL;DR
Julia's runtime allows direct and hidden references to heap objects, which poses a challenge for adopting a copying garbage collector. When objects are moved, these hidden references become invalid. We propose a set of remedies, including conservative scanning, pinning, and transitive pinning, to address this issue. With the remedies, we had a working copying collector that can move most objects (up to 99%) in our measurement. We also suggest introducing rules to regulate how the runtime accesses heap objects, ensuring compatibility with a copying collector.
1. Problem
Julia runtime/codegen holds references to heap objects without making all reference visible to the garbage collector.
Julia has an assumption that if objects are kept alive properly -- whether through rooting, permanent allocation, or other mechanisms -- the references to those objects remain valid. Thus the collector is not required to see every reference. The collector just needs to keep objects alive accordingly. This assumption holds for non-copying collectors, such as Julia's default mark-and-sweep collector, but it breaks down for copying collectors.
Copying collectors relocate objects during collection. When an object is moved, the collector must update all references it encounters to point to the new location. However, if the runtime or code generation fails to make certain references visible during collection, the collect cannot update those "hidden" references. As a result, these stale references continue to point to the object's old location, which may become corrupted when the memory is reused or zeroed out.
This issue manifests in several ways throughout Julia's runtime and code generation.
1.1. Rooted objects’ children are referenced without explicit rooting
Rooting via mechanisms like
JL_GC_PUSH
informs the collector that an object is a root and that the reference to it is actively used by the runtime. In this case, the collector ensures the root object and its reference are properly handled. However, when an object is rooted, the runtime assumes its children are also transitively rooted and directly references them without additional rooting. This assumption creates issues for a copying collector, as it lacks information about these "hidden" references and cannot update them if the referenced objects are moved.Example: Hidden child references
The following example illustrates this issue. The
codeinst
object is rooted and the collector knows it is being used and needs to be kept alive. But its child object,codeinst->inferred
, is accessed later by the runtime wtihout being explicitly rooted. Since the collector is not aware of this child reference, it cannot update it if the objectcodeinst->inferred
is relocated during GC. This behavior results in a stale reference.julia/src/aotcompile.cpp
Lines 440 to 456 in 7192df7
Example: C/C++ data structures storing hidden references
This issue also occurs when the Julia runtime stores references to Julia heap objects in C/C++ data structures and the data structures are not scanned by the collector. As long as an object is guaranteed to be rooted somewhere, the runtime can store references for later use. One example is
jl_native_code_desc_t
which stores object references in a vector. Another example isjl_uv_associate_julia_struct()
which stores object references to a libuv handle. Both cases result in references that cannot be updated if the corresponding objects are moved, leading to stale references.1.2. Global variables marked with
JL_GLOBALLY_ROOTED
may not be supplied to the collector as roots.This is similar to the issues above, but for global variables. Julia's collector only considers a subset of global variables that are marked as
JL_GLOBALLY_ROOTED
as actual GC roots, and assume these selected roots will transitively reach all the otherJL_GLOBALLY_ROOTED
variables.The following code snippet illustrates how only a subset of
JL_GLOBALLY_ROOTED
variables is supplied as roots to the collector:julia/src/gc-stock.c
Line 2761 in 7192df7
However, for example, the datatype objects listed in the link below are
JL_GLOBALLY_ROOTED
, but are not treated as roots for the collector.julia/src/julia.h
Line 874 in 7192df7
For a copying collector, the collector may move those datatype objects, leaving the global variables invalid.
1.3. Generated code may include references to heap objects.
Generated code may include pointer literals that directly point to heap objects. While Julia makes sure that the referenced objects are 'rooted', the actual pointer literals in the generated code are not made visible to the garbage collector. Thus these references are not updated during GC. We identified at least two methods that turns a heap reference into a pointer literal:
literal_pointer_val
andliteral_static_pointer_val
.2. Solution
The theoratically straightforward solution is to ensure every reference to a heap object in the runtime visible to the collector. Once these references are visible, the collector can update the references during GC, or pin the referenced object to ensure they are not relocated.
Our work so far mostly focuses on pinning objects if they are referenced by the runtime so they will not be copied. An alternative approach, which involves using handles to encapsulate references and allowing the collector to update them dynamically, is also a viable option. However, it would requires a more extensive refactoring on the code base that we have not been able to explore yet.
However, regardless of the chosen approach, the key challenge remains the same: identify the hidden references in the runtime. Though we describe about our approaches mostly with pinning below, some of the pinning uses can be replaced with a handle.
2.1. Approaches
The following are what we found effective so far. With these methods, we were able to achieve object movement rates of 60–99% in the same set of benchmarks.
Opportunistic copying
Unless we use handles and do not use pinning in the entire code base, we cannot use a full copying collector in Julia, such as semi space collector, or collector with a full copying nursery (all nursery objects will be copied and evacuated from the nursery space). The collector needs to support pinning, and only copy unpinned objects while leaving pinned objects in place.
Hidden references on the stack: conservative scanning
We do conservative scanning for Julia stacks (not shadow stacks) and registers. If any potential references to heap objects are identified, those objects are pinned to ensure they are not moved during GC.
This approach is different from traditional conservative stack scanning, where potential references are treated as roots and a transitive closure is performed from those roots, which may potentially cause false retention. Instead, we still use Julia's precise root identification mechanism, and only pin potential references from conservative scanning without treating them as roots.
We must identify the base references for internal references found on the stack or in registers and ensure that the corresponding objects are properly pinned.
Hidden references in the code
Objects referenced directly in generated code are pinned to prevent invalid references during garbage collection. This includes references created through functions such as
literal_pointer_val
andliteral_static_pointer_val
.Hidden references in
JL_GLOBALLY_ROOTED
All global variables marked with
JL_GLOBALLY_ROOTED
need to be supplied to the garbage collector, as opposed to only a subset being supplied for Julia's stock collector. Additionally, we ensure that the collector does not move these objects, as relocating them would invalidate references.Hidden references in native globals and native heap
We identify runtime types (structs or classes) that 1. are stored in native global memory or native heap memory, and 2. include references to Julia heap objects. For these types, any references to Julia heap objects must be pinned when stored. This ensures the collector will not move the objects that these references depend on.
Some runtime types that are frequently pinned or always require pinning include
jl_method_instance_t
,jl_code_instance_t
,jl_datatype_t
. For these types, it is reasonable to allocate them using non-moving allocation, if the collector supports it.Additionally, if a Julia value is referenced in scm, it must be pinned.
GC.@preserve
The
GC.@preserve
macro makes sure the object and all derived pointers from it are valid. To achieve this, objects marked withGC.@preserve
must be transitively pinned so the object and its children objects will not be moved by the collector.However, as we discussed earlier, transitive pinning may lead to a large number of objects being pinned, depending on the number of objects are reachable from the preserved object. This imposes a risk, as abusing the use of
GC.@preserve
may reduce the effectiveness of the copying collector.Handling hashed objects
Julia computes hash codes for objects based on their memory addresses. If the object is copied, its address changes, resulting in an inconsistent hash code. To avoid this, we pin objects that get hashed so its address will not be changed. An alternative approach is to allow an inflated header to record the old address when an object gets moved by the colletor.
2.2. Performance
We had some preliminary performance results with the above approaches with JuliaCI's GC benchmarks. As the benchmarks are relatively simple, and only allocating one or two kinds of objects, they do not cause much fragmentation, thus the performance improvement for a copying collector is hardly seen with those benchmarks. We will try get more performance data on more realistic workloads. With the high ratio of objects that can be moved, we are optimismistic about performance improvement.
About the cost of the approaches,
2.3. Enforcing rules for using heap references in the runtime
We identified and resolved the above issues in Julia
v1.9.2
. We assume new violations may arise as we attempt to upstream our changes and adapt to the most recent commits. Additionally, developers may inadvertently introduce new runtime code that violates the above rules for safe heap reference usage in the context of a copying collector. Without establishing clear and enforceable rules, identifying and fixing breaches could turn into an endless chase in the evolving codebase.To address this, we suggest two potential approaches that align well with Julia:
Using a Clang static analyzer
Julia already uses a Clang static analyzer (
GCChecker
) to examine unrooted objects. It is reasonable to introduce another static analyzer to verify that heap references are safe for a copying collector. The rules for the new static analyzer need to be further discussed. We don't have a workable solution for this approach yet. Our failed attempt was aimed to identify all the unrooted references, which turned out to be impractical. But if we refine and limit the rules (e.g. allow direct heap references in local scopes, as we will discuss below), this could be possible.Introducing new reference types
The issues primarily arise from the direct use of heap references. To mitigate this, we can forbid or limit the uses of direct heap references and introduce new reference types that are safe for runtime use.
jl_value_t*
and cast into the pinned ref type. The compiler's type checks would ensure that only pinned references are used where required, making pinning explicit and enforceable.We could combine these two types to accommodate different use cases.
We may also allow direct heap references in limited cases, such as within a local scope (e.g., stack or register). In these scenarios, we can rely on the garbage collector's conservative scanning which ensure that any heap references on the stack or registers are identified and pinned. Under this relaxed rule, only references stored in code, global variables, or heap-allocated structures would need to use a pinned reference or a handle.
This issue aims to document the challenges we encountered and discussed internally during development, share our practical solutions, and encourage validation, feedback and suggestions from the wider Julia community to help us upstream these changes and build a more robust foundation for adopting a copying collector in Julia.
The text was updated successfully, but these errors were encountered: