[ITensors] Reducing allocations in contraction (or inner) #78

Open

emstoudenmire opened this issue Oct 10, 2022 · 1 comment

@emstoudenmire (Contributor)

Is your feature request related to a problem? Please describe.

Based on a Discourse discussion here https://itensor.discourse.group/t/evaluating-overlaps-of-mpss-in-parallel/451/
it seems that the tensor contraction backend, in this case called through the inner function, can generate a lot of "garbage", that is perform a large number of allocations. In the user's case, this resulted either in a measureable slowdown of multithreaded performance, or when disabling GC (GC.enable(false)) led to a spike in memory usage followed by a delay after GC was re-enabled.

It should be noted that the calculation performed by the user was itself rather demanding, with something like a thousand inner products of length N=100 MPS being performed all at the same time. The overall speed of this was actually quite good, and the only issue here is how effectively it can be parallelized by multithreading.
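For concreteness, here is a minimal sketch of the kind of workload described above (the system size, bond dimension, and number of states are illustrative, not the user's actual values), including a quick way to see how much a single call to inner allocates:

```julia
using ITensors

# Illustrative parameters (assumptions, not taken from the report).
N = 100
sites = siteinds("S=1/2", N)
states = [randomMPS(sites; linkdims=20) for _ in 1:100]

# Warm up, then count bytes allocated by a single overlap <psi1|psi2>.
psi1, psi2 = states[1], states[2]
inner(psi1, psi2)
println("bytes allocated by one inner call: ", @allocated inner(psi1, psi2))

# Compute many overlaps in parallel. If each inner call allocates heavily,
# GC pressure limits how well this scales with the number of threads.
results = zeros(ComplexF64, length(states))
Threads.@threads for i in eachindex(states)
    results[i] = inner(states[1], states[i])
end
```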

Describe the solution you'd like

This is more of a "placeholder" issue to remind us to investigate allocations in the contraction engine. (Unless the allocations are in the inner function itself, though I doubt that given the simplicity of that function.)
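One way to start narrowing this down (a sketch of a possible check, not something done in the report) is to count allocations for a bare ITensor contraction, bypassing inner entirely; if the bulk of the allocations already show up here, the issue is in the contraction backend rather than in inner:

```julia
using ITensors

# Contract two dense ITensors directly and count allocated bytes.
# The dimensions here are arbitrary placeholders.
i, j, k = Index(100, "i"), Index(100, "j"), Index(100, "k")
A = randomITensor(i, j)
B = randomITensor(j, k)

A * B  # warm up / compile
println("bytes allocated by one contraction: ", @allocated A * B)
```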

Describe alternatives you've considered

Considered disabling GC or other Julia-level workarounds outside of ITensor, but my current best guess is that there are simply a lot of allocations happening at the contraction level.

Additional context

Forum discussion:
https://itensor.discourse.group/t/evaluating-overlaps-of-mpss-in-parallel/451

@emstoudenmire (Contributor, Author)

Note to also test this in the C++ version using an OpenMP-parallelized for loop.

mtfishman transferred this issue from ITensor/ITensors.jl Oct 25, 2024