qsim 0.16 requires more RAM for a simple Hadamards circuit compared to qsim 0.12 #612
Comments
It could also be due to the older version of Cirq used with the qsimcirq 0.13 that I tested. I ran that test on an older version of cuQuantum Appliance, which ships an older Cirq.
I tried qsimcirq 0.16.3 + Cirq 0.14, and it still OOM-ed. As such, the code that causes the increased memory cost is very likely in the qsim repo.
Thanks for raising this issue! The root cause has several parts:
The appropriate solution to this likely involves passing the buffer qsim uses in its C++ layer up to the Python level so we can recycle it for
I see, it's a relief knowing the cause.
Why is the return type in qsim/pybind_interface/pybind_main.cpp (line 520 in 72a96d5) py::array_t<float> instead of complex?
(Digression: I suppose a full solution to the issue requires changing pybind_main.cpp to expose the buffer, and thus requires a recompilation. Unfortunately, I can't recompile qsim from scratch because I'm using cuQuantum Appliance, which has custom modifications over qsim, especially the multi-GPU backend. This seems to require #601 to be solved.)
@sergeisakov would know best, but I believe this is because qsim stores the state as a float array for vector operations. The real and imaginary components of each complex value are stored as separate floats and recombined once the result surfaces in Python.
Yes, this is correct.
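To illustrate the layout described above, here is a minimal sketch using only the Python standard library. It assumes a simple interleaved ordering (real, imaginary, real, imaginary, ...), which is an assumption made for illustration; the buffer contents and the helper name to_complex are hypothetical and not taken from qsim.

```python
from array import array

# Hypothetical flat buffer of 32-bit floats holding four complex amplitudes.
# Assumed layout (for illustration only): [re0, im0, re1, im1, ...].
flat = array("f", [0.5, 0.0, 0.0, 0.5, -0.5, 0.0, 0.0, -0.5])


def to_complex(buf):
    """Pair consecutive floats (real, imaginary) into Python complex numbers."""
    if len(buf) % 2 != 0:
        raise ValueError("expected an even number of floats")
    return [complex(buf[i], buf[i + 1]) for i in range(0, len(buf), 2)]


amplitudes = to_complex(flat)
print(amplitudes)
```

Note that this sketch builds a new list, so it copies the data. With NumPy, a float32 array with this interleaved layout can instead be reinterpreted in place with ndarray.view(np.complex64), which does not allocate a second buffer.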
Why do you think this issue is related to #601?
NVM, that was a digression comment specific to my use case. I was mistaken that the issue in that comment is related to #601, though, because the cuQuantum Appliance would still contain (proprietary? [1]) modifications over qsim, even if there is a dynamic link to CUDA and cuQuantum. [1] I can't seem to find the string
Yes, it seems the cuQuantum Appliance contains proprietary modifications over qsim.
A note on performance. Just running
sim = cirq.Simulator()
sim.simulate(qc_cirq)
doesn't really perform any simulation. Running, say,
sim = cirq.Simulator()
result = sim.simulate(qc_cirq)
print(result.state_vector()[0])
performs the simulation, and it takes 130 seconds on an a2-highgpu-1g (not just 2 seconds). Running that with
I see.
I believe the OOM issue on a2-highgpu-1g happens even on deep circuits, because of the post-simulation creation of
Yes, the OOM issue happens even on deep circuits. I think this issue should be fixed at the Cirq level. Cirq has another line, state_vector = state_vector.copy(), that allocates memory in addition to buffer = np.empty_like(state_vector). So Cirq allocates two buffers, and it does not recycle the buffer that it gets from the C++ layer.
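A rough back-of-the-envelope sketch of why two extra buffers matter. The qubit count below (32) is a hypothetical value chosen for illustration, since the thread does not state the circuit size; the 8 bytes per amplitude assumes single-precision complex values (two 32-bit floats), and the helper names are made up for this example.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=8):
    """Size of one full state vector: 2**n amplitudes, 8 bytes each (assumed complex64)."""
    return (2 ** num_qubits) * bytes_per_amplitude


def peak_bytes(num_qubits, extra_buffers):
    """Memory held when the original vector and `extra_buffers` same-sized buffers coexist."""
    return state_vector_bytes(num_qubits) * (1 + extra_buffers)


GIB = 2 ** 30
n = 32  # hypothetical qubit count, not taken from the thread

for extra in (0, 1, 2):
    print(f"{n} qubits, {extra} extra buffer(s): {peak_bytes(n, extra) / GIB:.0f} GiB")
```

Under these assumptions, a single state vector fits comfortably in 85 GB of host RAM, but the vector plus two same-sized buffers does not, which is consistent with the failure appearing only after the simulation itself has finished.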
I tried commenting out the
I suppose, in addition to reusing the buffer from the C++ layer, there needs to be a flag in
I think it should be safe to do so.
Yes, something like that.
I was running the following code on a cuQuantum Appliance 23.03 Docker instance, on an a2-highgpu-1g with 85 GB of RAM:
The code ran just fine with cirq.Simulator(), taking 2 s in total. It ran fine with qsimcirq.QSimSimulator() for qsimcirq 0.12, taking ~19 s in total (probably could be optimized to ~2 s), and ran fine for qsimcirq 0.13, taking ~1 min 8 s. But for qsimcirq 0.14 and 0.16, I got an out-of-memory error. The line where the error happens is the construction of the final state vector after the simulation has finished.
The most I can bisect is between releases, and so it is due to change(s) between 0.13 and 0.14. Any idea what could be the cause?