Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What does "compile" mean for ReverseDiff? #91

Open
gdalle opened this issue Nov 6, 2024 · 3 comments · May be fixed by #92
Open

What does "compile" mean for ReverseDiff? #91

gdalle opened this issue Nov 6, 2024 · 3 comments · May be fixed by #92

Comments

@gdalle
Copy link
Collaborator

gdalle commented Nov 6, 2024

Following discussions with the Turing folks, it seems that there are (at least) two conflicting interpretations of the word "compile" when it comes to ReverseDiff and its tape API. To clarify, I will use the terminology from the ReverseDiff docs and sepatate two concepts:

  • the recording of a new tape based on a function execution (using e.g. ReverseDiff.GradientTape)
  • the compilation of an existing tape (using ReverseDiff.compile)

In LogDensityProblems and Turing, if I understand correctly, passing compile = true means "record a tape and promise that it is safe to reuse later". The tape is then always compiled for increased speed.
In DifferentiationInterface on the other hand, passing compile = true means "when you record a tape during preparation, compile it for increased speed". A tape is always recorded during preparation (which is arguably not ideal for functions with value-dependent control flow).

I think we need to settle on a unified standard, and the right way to do that would be in ADTypes. Here are my propositions:

  1. Clarify what the compile argument to AutoReverseDiff means at the moment: what is the interpretation we agree upon?
  2. Add more details to the struct, something like AutoReverseDiff(tape=true/false, compile=true/false) with defaults corresponding to the common interpretation in 1.

What do you think?


Related:

@ChrisRackauckas
Copy link
Member

Clarify what the compile argument to AutoReverseDiff means at the moment: what is the interpretation we agree upon?

It means the latter.

Add more details to the struct, something like AutoReverseDiff(tape=true/false, compile=true/false) with defaults corresponding to the common interpretation in 1.

What's a case for the other interpretation? If you have value-dependent control flow, you'd also have to grab a new tape. I guess in theory you could want to reuse the tape but never JIT compile? But if you know you're going to be reusing the tape, why would you not be compiling it? It seems like a theoretical choice but not one I'd point users to 99% of the time, since you'd have to do something like reuse the same tape 5 times but not more than 6 😅

@willtebbutt
Copy link
Contributor

I agree with Chris here -- specifically that if you've recorded a tape, I really struggle to imagine a realistic situation in which you wouldn't also want to compile it. Consequently, you might as well unify the concept and assert that either:

  1. you re-record the tape at each iteration, or
  2. you record it once, remove type instabilities with the liberal use of function pointers, and then re-use it each time you want a gradient.

@gdalle gdalle linked a pull request Nov 6, 2024 that will close this issue
5 tasks
@gdalle
Copy link
Collaborator Author

gdalle commented Nov 6, 2024

Thank you for your answers. Even though it seems obvious to you two, it didn't seem so obvious to me at the time. I think the reason I picked a different interpretation in DI was two-fold:

  • The ReverseDiff docs are written using the word "record" to describe tape creation, and "compile" to describe tape acceleration, so I thought ADTypes meant the same.
  • There is a warning in the docstring for ReverseDiff.compile which made me think that recording a tape without compiling it might also be worthwhile:

Note that the longer the tape, the more time compilation may take. Very long tapes (i.e. when length(t) is on the order of 10000 elements) can take a very long time to compile.

Anyway, I clarified the AutoReverseDiff docstring in #92 (would love a review) and I'm on my way to fixing DI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants