np.einsum_path vs TensorOperations #156
Hello,

I was wondering if there is much difference between the contraction order given by `numpy.einsum_path` and what is being done inside of TensorOperations? I've looked at the output of `@macroexpand` and it's not super clear to me what is being done. The specific contraction I'm looking at is:

```julia
@macroexpand @tensor K_true[n,m,l] := psi[i,j,k]*P[i,n]*P[j,m]*P[k,l]
```

I'm just trying to understand how things are being multiplied under the hood so I can re-make this on the GPU in a way that takes advantage of sparsity. I've coded what the numpy command outputs and TensorOperations is still much faster. I believe they have the same computational complexity (O(N^4)) and similar memory footprints.
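As it turns out (see the first reply below), without optimization the expression is contracted pairwise from left to right. A minimal hand-written sketch of that decomposition, with made-up array sizes purely for illustration:

```julia
using TensorOperations

psi = randn(50, 50, 50)  # hypothetical sizes, purely for illustration
P   = randn(50, 50)

# Left-to-right pairwise contraction of psi[i,j,k]*P[i,n]*P[j,m]*P[k,l]:
@tensor T1[j, k, n] := psi[i, j, k] * P[i, n]   # contract over i
@tensor T2[k, n, m] := T1[j, k, n] * P[j, m]    # contract over j
@tensor K[n, m, l]  := T2[k, n, m] * P[k, l]    # contract over k
```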
Comments

As far as the contraction you give is concerned, there is actually no optimization of the contraction order happening automatically. In this case, the standard left-to-right multiplication is carried out. Nevertheless, you can ask for an optimized version, with various different measures to optimize over. You can find this in some more detail in this part of the documentation, but if anything is not clear, do let us know!

For the rest, what is going on under the hood can be summarised as taking the given contraction and rewriting it in terms of the pairwise operations that are defined in the interface. In that sense, by defining the functions of the interface for your sparse array type, you could just use TensorOperations as is.

As a small side note, GPU support is already in there, so you could also have a look at the cuTENSOR extension if you want inspiration as to what it requires to define a new backend / support a new type. Hope this helps!
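To make the "optimized version" concrete, here is a minimal sketch of asking `@tensoropt` for an optimized pairwise order by supplying cost hints for the index dimensions; the sizes are made up, and the exact macro syntax may differ between TensorOperations versions:

```julia
using TensorOperations

psi = randn(100, 100, 100)  # hypothetical sizes, purely for illustration
P   = randn(100, 10)

# With the contracted indices (i, j, k) much larger than the open ones
# (n, m, l), the optimizer may pick a cheaper pairwise order than plain
# left-to-right.
@tensoropt (i => 100, j => 100, k => 100, n => 10, m => 10, l => 10) begin
    K_opt[n, m, l] := psi[i, j, k] * P[i, n] * P[j, m] * P[k, l]
end
```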
I guess left-to-right multiplication is still N^4 complexity, so that's fine. I also already use the GPU version, but I'm running into memory issues as my tensors are kind of big (it would be amazing if TensorOperations could automatically distribute an operation across GPUs). It isn't really faster to take advantage of sparsity, but it sure as hell helps the memory out. I'm not sure I can overload the TensorOperations interface to use sparsity since I want to use the GPU. Do you have any recommendations for reducing memory on the GPU? (That psi is 99.9% zero.)

While I'm on the topic of memory, I noticed that TensorOperations is missing out on some optimizations to reduce memory. I've pasted my two functions below. One is a simple one-liner and the other is the exact same contraction but split into the pairwise parts that numpy spits out. For example,

```julia
function K_tensopt(psi, P)
    @tensoropt begin
        K_true[n,m,l] := psi[i,j,k]*P[i,n]*P[j,m]*P[k,l]
    end
end
```

whereas this function is equivalent (and also just the contractions that numpy spits out) but reuses `K1` instead of allocating a third array:

```julia
function K_tensopt_parts(psi, phi)
    @tensor begin
        K1[j,k,n] := psi[i,j,k]*phi[i,n]
    end
    @tensor begin
        K2[k,m,n] := K1[j,k,n]*phi[j,m]
    end
    fill!(K1, 0.0)
    @tensor begin
        K1[n,m,l] = K2[k,m,n]*phi[k,l]
    end
    return K1
end
```
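A quick way to see the allocation difference between the two variants on CPU is just `@time`; the sizes below are made up, the in-place reuse of `K1` requires `P` to be square, and on the GPU one would watch `CUDA.memory_status()` instead:

```julia
using TensorOperations

# Assumes K_tensopt and K_tensopt_parts from the snippets above are defined.
psi = randn(50, 50, 50)   # hypothetical sizes
P   = randn(50, 50)       # square, so K1 can be reused as the output

K_tensopt(psi, P); K_tensopt_parts(psi, P)   # warm up / compile first

@time K_tensopt(psi, P)        # allocates all intermediates plus the result
@time K_tensopt_parts(psi, P)  # reuses K1, so one fewer large allocation
```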
Thanks for showing that, that's quite interesting! We've never really considered being able to reuse memory from the temporary intermediate objects within the same contraction. The biggest struggle with that is that the […]

Considering the issue with memory usage: this is definitely an active and ongoing topic of interest. We rewrote part of the machinery to be able to experiment with different backends and allocation strategies, specifically to be able to address this issue. I would recommend having a look at AllocationKit.jl, where I have put some more information as well. They are primarily focused on CPUs at the moment, but it might be the case that one of these approaches also works for reducing the GPU memory pressure.

In any case, if you do figure it out, or have some more questions, definitely don't hesitate to contact me; I am also very interested, both in working with GPUs and in dealing with memory pressure.
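One memory-saving pattern that already works today, independent of any allocator machinery, is writing into a preallocated output with `=` instead of `:=`, so that repeated evaluations (e.g. inside a loop) do not reallocate the result. A minimal sketch with made-up sizes:

```julia
using TensorOperations

psi = randn(50, 50, 50)  # hypothetical sizes, purely for illustration
P   = randn(50, 50)

# `:=` allocates a fresh output array on every call ...
@tensor K[n, m, l] := psi[i, j, k] * P[i, n] * P[j, m] * P[k, l]

# ... whereas `=` overwrites an existing buffer of the right size.
K_buf = zeros(50, 50, 50)
@tensor K_buf[n, m, l] = psi[i, j, k] * P[i, n] * P[j, m] * P[k, l]
```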