Bottleneck in mapreducedim for convolutional layers #558
This is probably coming from the … Ideally, our mapreducedim kernel would just be fast, but optimising these kinds of GPU kernels is easier said than done. I believe there was also some work on wrapping CUDNN's gradient function, which would do that reduction for us, but that's not hooked up yet.
There is https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionBiasActivationForward to do the whole forward pass (convolution, bias and activation) in one shot. Could we then use https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionBackwardBias and https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionBackwardData for the backward pass?
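For reference, the reduction that `cudnnConvolutionBackwardBias` takes over is a per-channel sum of the output gradient over the spatial and batch dimensions. The following is a minimal CPU sketch in plain Julia. It assumes Flux's column-major WHCN layout, which corresponds to cuDNN's NCHW. `conv_bias_grad` is a hypothetical name, not a Flux, CuArrays or CUDNN function.

```julia
# Hypothetical CPU reference for the conv bias gradient (WHCN layout):
# db[c] is the sum of dy over width, height and batch for channel c.
function conv_bias_grad(dy::AbstractArray{T,4}) where {T}
    W, H, C, N = size(dy)
    db = zeros(T, C)
    # innermost loop runs over w, matching Julia's column-major storage
    for n in 1:N, c in 1:C, h in 1:H, w in 1:W
        db[c] += dy[w, h, c, n]
    end
    return db
end

# Usage: W=2, H=2, C=3, N=2
dy = reshape(Float32.(1:24), 2, 2, 3, 2)
db = conv_bias_grad(dy)
```

The loop is only meant to spell out what is being reduced. A fused library call avoids launching a generic reduction kernel for it.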
Yeah – the CUDNN wrappers were set up here, so it just needs someone to set up the right dispatch on the Flux side.
The slow …
The integration on Flux's side is in #335. It needs a few fixes, though.
I ran the MNIST model with the PR I mentioned.
The bias term has quite low overhead when the batch size is small (around 100), but its share grows with the batch size: at a batch size of 1000 it takes around 28% of the time.
Looking only at the forward pass we currently have:
while enabling
by avoiding the anonymous kernel when applying the bias and activation function. I'll try to make a PR for it.
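One way to apply the bias and activation in a single pass is Julia's dot-broadcast fusion. It merges `x .+ b` and `σ.(...)` into one loop and writes the result into a preallocated output. With GPU array types such as CuArrays, a fused broadcast like this is generally compiled into a single kernel.

This is a minimal sketch, not necessarily the approach of the planned PR. `bias_act!` and this `relu` are hypothetical helpers defined here.

```julia
# Hypothetical helper: ReLU as a plain scalar function so it can be broadcast.
relu(x) = max(zero(x), x)

# Apply bias and activation in one fused broadcast, writing into y in place.
# `x .+ b` and `σ.(...)` fuse into a single loop, so no temporary array is
# allocated for the intermediate x .+ b.
function bias_act!(y::AbstractArray, σ, x::AbstractArray, b::AbstractArray)
    y .= σ.(x .+ b)
    return y
end

# Usage: 2×2 feature map, 2 channels, batch size 1 (WHCN layout);
# the bias is shaped (1, 1, C, 1) so it broadcasts along the channel dimension.
x = reshape(Float32.(-3:4), 2, 2, 2, 1)
b = reshape(Float32[1, -1], 1, 1, 2, 1)
y = similar(x)
bias_act!(y, relu, x, b)
```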
Running the conv network for MNIST from the model-zoo gives the following profile:
The time spent in the mapreduce kernel (https://github.com/JuliaGPU/CuArrays.jl/blob/a3d2650db3eb62f25dcbe18a64ea0a0036caced4/src/mapreduce.jl#L27-L54) seems too large.
This seems to be coming from a call to `sum` following a call to `unbroadcast`. I'm guessing this is from the activation function? The specific call to the mapreduce kernel is:

`Base._mapreducedim!(f::typeof(identity), op::typeof(Base.add_sum), R::CuArray{Float32}, A::CuArray{Float32})`
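A `sum` after `unbroadcast` is how a broadcast's gradient gets reduced back to the shape of a smaller operand, such as a conv bias shaped (1, 1, C, 1). Every dimension along which that operand had size 1 is summed away. Below is a minimal sketch of that idea in plain Julia. `unbroadcast_sketch` is a hypothetical, simplified stand-in, not Flux's actual `unbroadcast`.

```julia
# Reduce a broadcast gradient Δ back to the shape of operand x by summing over
# every dimension along which x was expanded (size 1 in x).
# size(x, i) returns 1 for i > ndims(x), so trailing dims are summed as well.
function unbroadcast_sketch(x::AbstractArray, Δ::AbstractArray)
    size(x) == size(Δ) && return Δ   # no broadcasting happened
    dims = Tuple(i for i in 1:ndims(Δ) if size(x, i) == 1)
    # on a CuArray, a dims-wise sum like this is presumably what reaches _mapreducedim!
    return reshape(sum(Δ; dims=dims), size(x))
end

# Usage: conv bias against a WHCN gradient with W=4, H=4, C=3, N=2
Δ = ones(Float32, 4, 4, 3, 2)
b = zeros(Float32, 1, 1, 3, 1)
g = unbroadcast_sketch(b, Δ)   # each entry sums W*H*N = 32 ones
```

Each output element reduces W·H·N inputs, so the work in this reduction grows linearly with the batch size.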