Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use faster activation functions #1837

Merged
merged 3 commits into from
Feb 5, 2022
Merged

Use faster activation functions #1837

merged 3 commits into from
Feb 5, 2022

Conversation

mcabbott
Copy link
Member

@mcabbott mcabbott commented Jan 16, 2022

This substitutes tanh_fast where possible, since tanh often dominates the forward pass of small networks. Builds on #1761, but in the months that sat waiting, I have mislaid the benchmarks I ran. Best case 2x better forwards, worst case no improvement, modulo noise.

Intersects with #1832, which would also do this to conv layers, but not to RNNs.

Closes #1272; this version is significantly faster, and this PR applies it to more cases.

@mcabbott mcabbott mentioned this pull request Jan 16, 2022
@mcabbott mcabbott added this to the v0.13 milestone Jan 16, 2022
src/layers/basic.jl Outdated Show resolved Hide resolved
@CarloLucibello
Copy link
Member

CarloLucibello commented Jan 18, 2022

It's a pity to pollute the code a bit but I guess the performance increase is worth it.

I wonder what could be a more aesthetically pleasing alternative. Pointing NNlib.sigmod to sigmoid_fast and defining a NNlib.tanh as tanh_fast?

Just for reference, the reason why this PR is switching activation in the forward instead of construction time is here

My thinking is that we may later decide that it's better to skip tanh_fast on the GPU. I can't measure a difference so who knows. To do that, we can add a method in NNlibCUDA like fast_act(::typeof(tanh), ::CuArray) = tanh. Provided you call this like fast = NNlib.fast_act(fun, x) with an example of the array you plan to broadcast it over.

@mcabbott
Copy link
Member Author

I wouldn't claim this as an aesthetic improvement. But it is a performance one.

I'm against less explicit names like NNlib.tanh, I think it's confusing to read packages which don't work how they look like they work, because some Base name means something different. Flux.rand used to be some such trick, and we ripped it out, for explicit rand32.

@CarloLucibello
Copy link
Member

Ok. Should we wait for a final tag of the v0.12 before merging (#1838)? I wouldn't say this PR is breaking though

@mcabbott
Copy link
Member Author

Not so breaking as to demand a release, but if there's one nearby perhaps that's the safe time. So after last 0.12.x tag, before v0.13, as you say.

@ToucheSir
Copy link
Member

Buildkite is failing with weird errors again 😬

@ToucheSir ToucheSir closed this Feb 5, 2022
@ToucheSir ToucheSir reopened this Feb 5, 2022
@ToucheSir
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Feb 5, 2022
@bors
Copy link
Contributor

bors bot commented Feb 5, 2022

try

Build succeeded:

src/layers/basic.jl Outdated Show resolved Hide resolved
src/layers/conv.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2022

Codecov Report

Merging #1837 (0a8fade) into master (8d3b8d3) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1837      +/-   ##
==========================================
+ Coverage   73.85%   73.94%   +0.09%     
==========================================
  Files          28       28              
  Lines        1683     1689       +6     
==========================================
+ Hits         1243     1249       +6     
  Misses        440      440              
Impacted Files Coverage Δ
src/layers/basic.jl 75.00% <100.00%> (+0.19%) ⬆️
src/layers/conv.jl 80.11% <100.00%> (+0.44%) ⬆️
src/layers/recurrent.jl 75.67% <100.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d3b8d3...0a8fade. Read the comment docs.

Co-authored-by: Brian Chen <[email protected]>
Comment on lines +228 to +229
c′ = @. sigmoid_fast(forget) * c + sigmoid_fast(input) * tanh_fast(cell)
h′ = @. sigmoid_fast(output) * tanh_fast(c′)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ought these to get the fast_act treatment too? I'm ok with revisiting too, RNN cell inflexibility is a bit of a long standing issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. If we decide to disable fast_tanh on CuArrays, that will be ignored here. But perhaps revisit if & when... it's a bit more clutter to squeeze that in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a blessing in disguise, as currently plain tanh.(...) will hit the likely obsolete https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/cudnn/activations.jl.

@mcabbott mcabbott merged commit 841afe7 into FluxML:master Feb 5, 2022
@mcabbott mcabbott deleted the fastact branch February 5, 2022 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants