-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Immediate arrays that are not scanned by the GC #38
Open
gasche
wants to merge
1
commit into
ocaml:master
Choose a base branch
from
gasche:immediate-array
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,222 @@ | ||
# A custom type of 'immediate arrays' not scanned by the GC | ||
|
||
In the last few days I have heard from several users who would like to | ||
define potentially-large arrays of integers, but are concerned about | ||
the (small) continued performance cost of having the GC scan those | ||
arrays in the major heap. | ||
|
||
My understanding is that this is a situation where: | ||
|
||
- the actual performance cost is probably very small, neglectible | ||
except for some very rare workflows | ||
|
||
- ... but the *perception of cost* is troubling our users and | ||
encouraging them to consider sub-par implementation choices | ||
(see the 'Alternatives' section below). | ||
|
||
I propose to solve this situation by adding specific support in the | ||
Array module for arrays of immediate values, at a *different type* | ||
than the usual type `'a array`, using a restricted submodule in | ||
a style reminiscent of Array.Float (but documented). | ||
|
||
|
||
## Proposed interface | ||
|
||
An `Array.MakeImmediate` functor with the following signature: | ||
|
||
```ocaml | ||
MakeImmediate : | ||
functor | ||
(Elt : sig type t [@@immediate] end) | ||
-> sig | ||
type elt = Elt.t | ||
type t (** arrays of immediate values *) | ||
|
||
val make : int -> elt -> t | ||
val length : t -> int | ||
|
||
val get : t -> int -> elt | ||
val unsafe_get : t -> int -> elt | ||
|
||
val set : t -> int -> elt -> unit | ||
val unsafe_set : t -> int -> elt -> unit | ||
end | ||
``` | ||
|
||
Proposed implementation: a custom block. | ||
|
||
The intention is to provide a minimal interface for a low-level | ||
module, rather than trying (like Float.Array) of copying the whole | ||
interface of Array. I would propose to restrict the scope of this | ||
submodule with this intention, but accept proposals/requests for | ||
extensions when they concern primitives that can be implemented | ||
significantly more efficiently in C -- for example `fill`, | ||
`blit` and `sub`. | ||
|
||
#### Design choice: a functor on immediate types | ||
|
||
This could be specialized to just an `IntArray` module, but I believe | ||
that a functor over any immediate type is significantly more | ||
convenient for end-users. In particular, this allows its usage on | ||
types such as | ||
|
||
```ocaml | ||
type id = Id of int [@@unboxed] | ||
``` | ||
|
||
which improve type-abstraction and safety. | ||
|
||
Note: one could consider a design where we accept an `immediate64` | ||
type as parameter, with an implementation that reverts to a standard | ||
array type on 32-bit machines. `immediate64` is obscure and inconvient | ||
to use from the stdlib, so I dropped the idea; we could provide an | ||
additional `MakeImmediate64` version in the future if we really wanted | ||
to. | ||
|
||
#### Design note: asking for an initial element | ||
|
||
We could skip the demand for an initial element and 0-initialize the | ||
array. I decided against this for two reasons: | ||
|
||
- zero-initialization is not substantially more efficient than our API, | ||
in particular because `memset` cannot be used (the OCaml | ||
representation of 0 is actually a full-word 1, which is not periodic | ||
at the byte level) | ||
|
||
- in the future we could have immediate types for which the value 0 is | ||
invalid, if we allow to "shift" the initial tag of a constructor. | ||
|
||
|
||
## Alternatives | ||
|
||
### Bigarrays | ||
|
||
One can use `(int, int_elt, c_layout) Bigarray.Array1.t` for this purpose. | ||
|
||
Downsides: | ||
|
||
- working with Bigarrays is a lot more code/ceremony (but this could be hidden | ||
by a dedicated library) | ||
|
||
- bigarray access is slightly slower due to performing an extra read | ||
to get the layout | ||
|
||
- it is harder to reason about the pacing of the GC, I would be | ||
slightly worried about allocating lots of small-to-medium immediate | ||
arrays with this implementation | ||
|
||
|
||
### Bytes | ||
|
||
One can use `Bytes.t` with the primitives `{get,set}_int64_ne` for access. | ||
|
||
Downsides: | ||
|
||
- getting an `Int64.t` is more cumbersome if we want to work with `int` values | ||
(again, a library layer can hide this) | ||
|
||
- The stdlib does not expose `unsafe` accessors for 64-wide reads. | ||
(a shortcoming we could fix) | ||
|
||
- Checked accesses are slower and generate more code due to the extra | ||
work to compute the "real" size of the string. | ||
|
||
|
||
### Fake arrays with an abstract tag | ||
|
||
Behold: | ||
|
||
```ocaml | ||
external unsafe_fill_bytes : Bytes.t -> offset:int -> len:int -> char -> unit = "caml_fill_bytes" | ||
|
||
let create_int_array size : int array = | ||
(* We are trying to create an [int array] that is not scanned by the GC. *) | ||
let arr = | ||
(Obj.obj (Obj.new_block Obj.abstract_tag size) : int array) | ||
in | ||
(* At this point the array elements are uninitialized memory, | ||
so they may be unsound as integers. Fill them with odd bytes | ||
to make them valid immediates. *) | ||
unsafe_fill_bytes (Obj.magic arr : Bytes.t) | ||
~offset:0 | ||
~len:(size * (Sys.word_size / 8)) | ||
'\001'; | ||
arr | ||
``` | ||
|
||
Downsides: | ||
|
||
- This uses a "fake array", something that we pretend to the runtime is an array | ||
but has a tag different from `0` or `Double_array_tag`. This currently works | ||
but I am unconvinced that this is a good idea. | ||
|
||
- Using `Abstract_tag` means that comparison, serialization and hashing are unsupported. | ||
This may be fine for some specialized use-cases, but is likely to come back to bite | ||
users in practice. | ||
|
||
- Exposing our array-of-immediate as a fake array severely restricts | ||
our implementation choices. For example it becomes impossible to | ||
switch to a `Custom_tag` block later, because their layout is | ||
incompatible with arrays -- they store non-element data in the first | ||
argument word. | ||
|
||
Upside: | ||
|
||
- Thanks to the use of a fake array, the whole Array module is available to users, | ||
including convenience functions (`for_all`, `map`, `iter2` and what not) and | ||
optimized runtime primitives (`blit`, `fill`, `concat` etc.) | ||
|
||
In this RFC I made the choice to give up on the convenience of reusing | ||
the `'a array` type and all its available functions and primitives, | ||
for the benefits of comparison/hashing/serialization and of avoiding | ||
introducing a non-standard representation in the runtime -- in | ||
comparison, custom blocks are precisely designed for this purpose. | ||
|
||
|
||
## Previous work | ||
|
||
### IntArray in Domainslib | ||
|
||
Domainslib is suffering from the sequential bottleneck of initializing | ||
large arrays, which hurts scalability. They have promising experiments | ||
with a specialized IntArray type that does not need initialization: | ||
|
||
- https://github.com/ocaml-multicore/domainslib/pull/7 | ||
- https://github.com/ocaml-multicore/domainslib/pull/10 | ||
|
||
Our proposed API does not help with this problem as it does not allow | ||
parallel initialization. We could consider exposing an unsafe creation | ||
primitive | ||
|
||
val unsafe_create_uninitialized : int -> t | ||
|
||
which does not initialize the array elements. This is unsafe, as | ||
reading an uninitialized element returns a type-unsound value that | ||
will crash the GC. | ||
|
||
Or: we could change the runtime to use a pool of worker threads to | ||
parallelize operations on large arrays, so that users don't have to do | ||
this. I would actually be more in favor of that approach. | ||
|
||
|
||
### `Obj.set_tag` to disable GC scanning in automated provers | ||
|
||
Some OCaml 4 codebase would use `Obj.set_tag` on arrays after the fact | ||
to disable GC scanning. This is explicitly mentioned in | ||
https://github.com/ocaml/ocaml/pull/1725 which deprecated `set_tag`; | ||
the alternative `Obj.with_tag` is less convincing (it requires | ||
allocating the array twice; above I used `Obj.new_block` instead). In | ||
that PR @stedolan explicit mentions that we could add builtin support | ||
for a `private int array` type of fake arrays using Abstract_tag. | ||
|
||
The use-case mentioned in that PR was in the Zipperposition prover: | ||
https://github.com/sneeuwballen/zipperposition/blob/2e6e100d5334d8e1e742432310171605305dee0e/src/core/Ordering.ml#L103 | ||
|
||
There are also instances of this trick in the Cubicle model-checker | ||
( https://github.com/cubicle-model-checker/cubicle/pull/5#issuecomment-1044548566 ), | ||
and probably some other automated theorem provers. | ||
|
||
Note: these users do not add Obj.set_tag just for fun, so they | ||
certainly *did* measure a noticeable performance benefit to doing | ||
so. (In the intro I said that the scanning by the GC probably does not | ||
matter; it apparently does for automated theorem provers.) |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lthls, @chambart pointed out that this is not true if the layout type of the bigarray is statically known. You get an extra indirection, but this should be okay. On the other hand, you might still run in GC-pacing issues if you have lots of small bigarrays.