Upstream from RelationalAI: Re-implement Salsa via new v2 API #17

ghost · 2020-06-02T19:42:13Z

These two PRs are upstreamed together in this PR from RelataionalAI:
Re-implement Salsa via new v2 API: a69bf5c
MiniSalsa Performance Improvements: 4b431eb

The bulk of the change is described in the first commit. I will reproduce those comments here:

Re-implement Salsa via new v2 API

This PR provides a new, simpler Salsa implementation.

This new package provides a simplified interface, partly due to requirements for the new backend we use in RelationalAI, and partly due to design choices we made better the second time around.

The Salsa package is split into a public interface and various Storage backends. It comes with a DefaultStorage backend which re-implements the existing open-source Salsa package from https://github.com/RelationalAI-oss/Salsa.jl.

Internally at RelationalAI, we built our own backend, which supports the extra requirements we have as a database, including persistence and multi-versioning.

We have changed from a field metaphor for how we represent Inputs to representing them via functions. This means that you can now no-longer reflect over a Salsa input (e.g. haskey) anywhere, even from outside derived functions. This simplifies the conceptual model of Salsa, I think. The user declares an input via @declare_input foo(...) which generates a getter, setter, and deleter for foo: foo, set_foo!, delete_foo!.

We have completely removed the concept of Components, so that any given Salsa runtime state can only have a single instance of any given Input. Inputs are still namespaced perfectly well, through julia's normal Module namespacing mechanism. I had previously imagined one might use Components for encapsulation, allowing you to repeat a set of inputs together in different contexts, but I now realize that this new setup more closely matches a normalized schema representation like one would have in Delve/datalog. (So if you wanted to support something like what i was originally imagining, all of those inputs could simply take an identifier key that represents the "encapsulated instance", e.g. classroom_id.)

Finally, I've added a couple small nitpicks that I noticed when comparing directly with v1 Salsa:

Fixed the spreadsheet example
Copied over and updated a bunch of tests to the new API. I originally didn't copy most of the tests, because i think the old salsa tests were a bit scattered and not systematic. But I think there are some good ones in there that i don't want to lose. I think i've gotten most of the good ones now.

@input

This PR provides a new, simpler Salsa implementation. This new package provides a simplified interface, partly due to requirements for the new backend we use in RelationalAI, and partly due to design choices we made better the second time around. The `Salsa` package is split into a public interface and various `Storage` backends. It comes with a `DefaultStorage` backend which re-implements the existing open-source Salsa package from https://github.com/RelationalAI-oss/Salsa.jl. Internally at RelationalAI, we built our own backend, which supports the extra requirements we have as a database, including persistence and multi-versioning. We have changed from a field metaphor for how we represent Inputs to representing them via functions. This means that you can now no-longer reflect over a Salsa input (e.g. haskey) _anywhere_, even from outside derived functions. This simplifies the conceptual model of Salsa, I think. The user declares an input via `@declare_input foo(...)` which generates a getter, setter, and deleter for foo: `foo, set_foo!, delete_foo!`. We have completely removed the concept of `Component`s, so that any given Salsa runtime state can only have a single instance of any given Input. Inputs are still _namespaced_ perfectly well, through julia's normal Module namespacing mechanism. I had previously imagined one might use Components for encapsulation, allowing you to repeat a set of inputs together in different contexts, but I now realize that this new setup more closely matches a normalized schema representation like one would have in Delve/datalog. (So if you wanted to support something like what i was originally imagining, all of those inputs could simply take an identifier key that represents the "encapsulated instance", e.g. `classroom_id`.) ------------------------------------ This PR also exercises the new Salsa by testing it on our simple `SpreadsheetApp` example in the Salsa package. Everything works including the Early Exit Optimization. Preliminary performance investigations indicate performance seems to be on par with the old Salsa. One of the cool aspects showcased in that app is fully-working Task Parallelism, where multiple threads might attempt to compute a value, and they can share results between them. Currently, if they start at the same time, they'll both compute the result and one will overwrite the other. -------------------------------------- The biggest change to the Salsa API itself is the changes to the `@input` macro, which has become `@declare_input`. I've left some of our design notes we wrote in the source code here in this PR comment, so i could delete them in the code: #= # NOTE: # Options we considered for @input syntax: X @declare_input mymap{Tuple{String,Int},String} # TOO WEIRD X @declare_input mymap :: InputMap{Tuple{String,Int},String} # Looks like a field, but isn't X @declare_input function mymap(x::String, y::Int):: String end # WEIRD: writing an empty method ✔︎ @declare_input mymap(x::String, y::Int) :: String # A little weird, b/c this isn't normal function declaration syntax, but we like it. ------ ❤ ------ mymap(component, x::String, y::Int) :: String set_mymap!(component, x::String, y::Int, v::String) ------ # TODO: Do we want to require users to put the `rt` parameter in the `@declare_input` macro, # just like they have to do for `@derived`? I feel like yes, we probably should? # TODO: Do we want to support varargs? Currently we don't, because we're putting the _value_ at # the end of the setter. We could support this if we put the value at the beginning but # it's not very intuitive. @declare_input mymap(args::String...) :: Int @declare_input mymap(args::Tuple{Vararg{String}}) :: Int ------ function mymap(component, x::String, y::Int) :: String #=...=# end # TODO: Question about whether to put v::String at end or beginning? function set_mymap!(component, x::String, y::Int, v::String) #=...=# end =# For those TODO questions, we decided that: 1. Yeah, we probably should require users to explicitly write the runtime parameter in the `@declare_input` functions, so that the calling signature is clear. Done. 2. No, we don't need to support varargs at all. it doesn't really make sense for an _input_ which is supposed to be a _value store_. If you want to store tuples, just add another input for the `::Tuple`. Done. Upstreamed from RelationalAI

@comnik

This PR contains various constant-factor performance improvements to the MiniSalsa package. The main conceptual improvements are the following: 1. Miscellaneous improvements to the MiniSalsa package that bring it up to feature parity with Salsa: debugging / tracing / etc. 1. We reduced allocations by sharing the vectors for tracing a functions dependencies in a "trace pool" and a freelist. This is needed because each derived function needs its own vector to track dependencies, and vectors have to be heap allocated, but allocs are expensive. We can avoid them by sharing them in a pool with a freelist. 2. We reduced allocations by making the new Salsa Tracing Runtime isbits. We need to generate a new _TracingRuntime per derived function call to support multithreaded task parallelism, so that different derived functions can each have their own vectors in which to record their dependencies, without clobbering each other. - To make them isbits, we have to remove the reference to the trace arrays. To do this, we use an integer index to refer to the trace vector (the int indexes into the pool, and is pushed and popped from the freelist). - Also, to make them isbits, we had to remove the pointer to the Storage, since the Storage object itself is mutable. To do this, we (in an ugly hack) just keep a `Ptr{}` to the storage, since we know that the storage will always survive past the lifetime of the derived functions (since it's referenced by the _TopLevelRuntime). 3. We made some other minor julia-related changes that made small allocations reductions, such as: - Removing return type annotations. E.g. removing `::Any` from `memoized_lookup()` made it 10x faster in a microbenchmark. - Changing from `lock(l) do ... end` lambdas to manual `lock(l) try ... finally unlock(l) end` blocks. Our best guess is that if the lambda is capturing local mutable variables in a closure, the lambda object ends up on the heap. Unsure why that isn't getting compiled away. 4. We realized that DerivedValues were not isbits since they contain an array, so we made reverted this back to what it was in old Salsa: we them mutable and now we edit them in-place, rather than constructing a new one for every derived function. - I can no longer remember if this was helpful or not... I think it should be fine either way, so it maybe makes sense to leave this how it was in old salsa, as we can always re-evaluate it in the future? ----------------------- Co-Authored-By: @comnik Upstreamed from RelationalAI

…orage} Use it to fix up the Spreadsheet example.

ghost · 2020-06-03T17:32:13Z

@comnik: There shouldn't be much to review here; this is just upstreaming the work we've already reviewed internally to the open source repo.

The main change is renaming everything back to Salsa and bumping Salsa to v2.0.0. :)

ghost · 2020-06-04T03:42:00Z

src/default_storage.jl

+    # We use one big dictionary for the inputs, storing them all together, to reduce
+    # allocating a new dictionary for every input.
+    # TODO: Do more performance investigation for the tradeoff between sharing a dictionary
+    # vs having separate dicts per method. It seems like the current decision is exactly
+    # opposite of what would be best: Since DerivedValues are not isbits, they will always
+    # be heap allocated, so there's no reason to strongly type their dict. But inputs can
+    # be isbits, so it's probably worth specailizing them.
+    inputs_map::Dict{Tuple{Type,Tuple},InputValue}
+    derived_function_maps::DerivedFunctionMapType


I've opened #18 for this TODO

ghost · 2020-06-04T03:46:59Z

src/default_storage.jl

+# TODO: I think we can @nospecialize the arguments for compiler performance?
+# TODO: It doesn't seem like this @nospecialize is working... It still seems to be compiling
+# a nospecialization for every argument type. :(


Filed #19 for the nospecialize comments throughout this PR

ghost · 2020-06-04T03:50:15Z

src/default_storage.jl

+                # TODO: Optimization idea:
+                #   - There's no reason to be tracing the Salsa functions during
+                #     the `still_valid` check, since we're not going to use them. We might
+                #     _do_ still want to keep the stack trace though for cycle detection and
+                #     error messages.
+                #   - We might want to consider keeping some toggle on the Trace object
+                #     itself, to allow us to skip recording the deps for this phase.


Opened #20 to track this TODO

Actually test that it is exiting early, as expected.

comnik

LGTM 👍 Goodbye MiniSalsa :)

README.md

test/runtests.jl

Co-authored-by: Nikolas Göbel <[email protected]>

ghost · 2020-06-06T17:44:35Z

Thanks for the review, Niko! :)

After merging #17, we've bumped to a new API. (I meant to bump this as part of that PR :) )

NHDaly added 4 commits June 2, 2020 15:14

Add Salsa.DefaultRuntime() alias for Runtime{EmptyContext,DefaultSt…

140c499

…orage} Use it to fix up the Spreadsheet example.

Julia v1.3 test support: revert newer threading syntax

58edb84

ghost self-assigned this Jun 2, 2020

ghost added enhancement New feature or request performance CPU, RAM, disk utilization and efficiency upstream labels Jun 2, 2020

Update README for Salsa v2.0.0

705f610

ghost requested a review from comnik June 3, 2020 17:31

Address TODO: Use delete_stored_cell_text! on clear.

7fffc66

ghost mentioned this pull request Jun 4, 2020

Investigate Performance: Tradeoff between a shared dict vs separate dict per method. #18

Open

ghost commented Jun 4, 2020

View reviewed changes

NHDaly added 4 commits June 4, 2020 00:05

Salsa tests: Improve "Early Exit Optimization" test.

abac87b

Actually test that it is exiting early, as expected.

Add missing error handling and key deletions tests to new API

3327820

Add test for multiple input methods

6215f47

Tests: rename test module to SalsaTest

b0a8ed6

comnik approved these changes Jun 4, 2020

View reviewed changes

README.md Show resolved Hide resolved

test/runtests.jl Outdated Show resolved Hide resolved

Update test/runtests.jl: remove whitespace

4c59c06

Co-authored-by: Nikolas Göbel <[email protected]>

ghost merged commit 6e4a211 into master Jun 6, 2020

ghost deleted the new-API-v2 branch June 6, 2020 17:44

NHDaly added a commit that referenced this pull request Jul 4, 2020

Bump Salsa to v2.0.0

6188235

After merging #17, we've bumped to a new API. (I meant to bump this as part of that PR :) )

NHDaly assigned NHDaly and unassigned NHDaly Dec 20, 2021

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upstream from RelationalAI: Re-implement Salsa via new v2 API #17

Upstream from RelationalAI: Re-implement Salsa via new v2 API #17

ghost commented Jun 2, 2020 •

edited by ghost

Loading

ghost commented Jun 3, 2020

ghost Jun 4, 2020

ghost Jun 4, 2020

ghost Jun 4, 2020

comnik left a comment

ghost commented Jun 6, 2020

Upstream from RelationalAI: Re-implement Salsa via new v2 API #17

Upstream from RelationalAI: Re-implement Salsa via new v2 API #17

Conversation

ghost commented Jun 2, 2020 • edited by ghost Loading

ghost commented Jun 3, 2020

ghost Jun 4, 2020

Choose a reason for hiding this comment

ghost Jun 4, 2020

Choose a reason for hiding this comment

ghost Jun 4, 2020

Choose a reason for hiding this comment

comnik left a comment

Choose a reason for hiding this comment

ghost commented Jun 6, 2020

ghost commented Jun 2, 2020 •

edited by ghost

Loading