Add Flag to avoid copying state vector #8054
@sergeisakov @mhucka , here is a first go at implementing my proposed temporary solution to #8041 in Cirq, and #893 in qsim. |
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff           @@
    ##             main    #8054   +/-   ##
    =======================================
      Coverage   99.61%   99.61%
    =======================================
      Files        1110     1110
      Lines      100561   100616   +55
    =======================================
    + Hits       100175   100230   +55
      Misses        386      386

☔ View full report in Codecov by Sentry.
|
|
Update from Cirq Cynq 2026-05-13: we verified with the user that they had at least 8 TB available on the system (but probably not 16 TB), and have requested an example that can be used to reproduce the failure and to test the before/after behavior of this PR. |
|
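For scale, here is a minimal sketch of how much memory a dense state vector needs. It assumes single-precision complex amplitudes (complex64, 8 bytes each), which is an assumption about the simulator's default rather than something stated in this thread. The helper name `state_vector_bytes` is hypothetical.

```python
def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Bytes needed for one dense state vector of 2**num_qubits amplitudes.

    bytes_per_amplitude=8 assumes complex64; use 16 for complex128.
    """
    return (2 ** num_qubits) * bytes_per_amplitude


GIB = 2 ** 30
for n in (30, 34, 35, 40):
    one_vector = state_vector_bytes(n) / GIB
    # One extra full copy doubles the footprint while both arrays are alive.
    print(
        f"{n} qubits: {one_vector:,.0f} GiB for one vector, "
        f"{2 * one_vector:,.0f} GiB with one extra copy"
    )
```

Under that assumption every additional qubit doubles the footprint, and a single extra copy of the final state costs as much as adding one more qubit.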
I am not able to confirm that this change helps with the memory needs of QSimSimulator or with the initial OOM problem reported in quantumlib/qsim#893. My test files are available in a placeholder PR, quantumlib/qsim#1076.

First, there is the oom_qsim_simulation.py script, which checks at what number of qubits the simulation runs out of memory. I ran it on a 65GB Debian box, and the OOM onset happened at 34 qubits both before and with this change.

Next, I ran an example qsim_simulation.py script under valgrind for the baseline qsim / cirq combination and for qsim and cirq using this change (and the new

    valgrind --tool=massif python qsim_simulation.py

The extracted memory use during python execution was quite similar in both scenarios; most importantly, they showed the same peak memory use, which would be the determining factor for running out of memory. Below is the collected memory usage for two simulations with 20 and 24 qubits.

In summary, I don't see evidence that this change improves things or that it is worthwhile to complicate the code with an extra argument and an indirect buffer property.
|
|
I can confirm that this change helps. However, there is an additional problem. I ran a slightly modified script

    import gc
    import sys

    import cirq
    import qsimcirq

    if len(sys.argv) != 2:
        print(f"usage: {__file__} num_qubits")
        sys.exit(0)

    nqubits = int(sys.argv[1])
    q = cirq.LineQubit.range(nqubits)

    # Three moments: Hadamards on all qubits, then CX on even and odd neighbor pairs.
    m1 = cirq.Moment(cirq.H.on_each(q))
    m2 = cirq.Moment(cirq.CX(qi, qj) for qi, qj in zip(q[0::2], q[1::2]))
    m3 = cirq.Moment(cirq.CX(qi, qj) for qi, qj in zip(q[1::2], q[2::2]))
    circuit = cirq.Circuit(5 * [m1, m2, m3])

    options = qsimcirq.QSimOptions(max_fused_gate_size=4, cpu_threads=45, verbosity=1)
    sim = qsimcirq.QSimSimulator(qsim_options=options)

    gc.disable()
    gc.collect()

    state_vector = sim.simulate(program=circuit).state_vector()

on a If I run this script with 35 qubits and without this change, it crashes while copying the state vector when Once that is done, it crashes during the call to

    state_vector = sim.simulate(program=circuit).state_vector()

with

    state_vector = sim.simulate(program=circuit)._get_merged_sim_state().target_tensor.reshape(-1)

Then it works with 35 qubits. I think an additional flag should be introduced to bypass Here are some benchmarks for 34 qubits:
|
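The `target_tensor.reshape(-1)` workaround in the comment above relies on NumPy returning a view rather than a copy when it reshapes a contiguous array. A minimal sketch of that distinction follows. It uses a small stand-in array, not qsimcirq's actual tensor, and the shape and dtype are assumptions for illustration only.

```python
import numpy as np

# Stand-in for a simulator's internal tensor: 10 qubits, one axis per qubit.
num_qubits = 10
tensor = np.zeros((2,) * num_qubits, dtype=np.complex64)
tensor[(0,) * num_qubits] = 1.0  # the |00...0> state

# Reshaping a C-contiguous array to 1-D returns a view: no new buffer.
flat_view = tensor.reshape(-1)

# An explicit copy allocates a second buffer of the same size.
flat_copy = tensor.reshape(-1).copy()

shares_view = np.shares_memory(flat_view, tensor)
shares_copy = np.shares_memory(flat_copy, tensor)
print(f"view shares memory with tensor: {shares_view}")
print(f"copy shares memory with tensor: {shares_copy}")
```

The trade-off is that a view aliases the simulator's buffer, so writing to it changes the underlying state; a flag that skips the copy would expose callers to that.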
…vector call, preventing additional memory overhead
|
@sergeisakov I just pushed a few commits implementing your suggested changes. I added a flag |
|
@pavoljuhas - to check memory use with the last version |
|
The somewhat messy code in StateVectorTrialResult.final_state_vector, with its extra copy of the state vector array, dates back to the normalization patching in #6522 and #6556. I think we should revisit these and see whether a precision fix is possible with an in-place modification of SimulationState.target_tensor that would not need an extra state vector copy. The final_state_vector code could then go back to its state before #6522, i.e., as in Sergei's comment. |
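To illustrate the in-place idea, here is a minimal NumPy sketch of normalizing an array without allocating a second full-size buffer. This is not Cirq's code and not a proposed patch; `normalize_in_place` is a hypothetical helper, and it assumes a C-contiguous complex tensor. It also does not address whatever precision concern motivated #6522, since the dot product here accumulates in the array's own precision.

```python
import numpy as np


def normalize_in_place(tensor: np.ndarray) -> np.ndarray:
    """Scale `tensor` to unit 2-norm, reusing its existing buffer.

    Hypothetical helper; assumes a C-contiguous complex array.
    """
    flat = tensor.reshape(-1)  # a view for contiguous input, no copy
    # vdot conjugates its first argument, so this is sum(|a_i|^2).
    norm_sq = np.vdot(flat, flat).real
    # Keep the divisor in the matching real dtype so the in-place
    # division does not upcast the array.
    real_type = tensor.real.dtype.type
    tensor /= real_type(np.sqrt(norm_sq))
    return tensor


rng = np.random.default_rng(0)
state = (rng.standard_normal(16) + 1j * rng.standard_normal(16)).astype(np.complex64)
state = state.reshape((2,) * 4)

address_before = state.ctypes.data
normalize_in_place(state)
address_after = state.ctypes.data
print(f"same buffer: {address_before == address_after}")
print(f"norm after: {np.linalg.norm(state.reshape(-1)):.6f}")
```

The design question raised in the comment is whether mutating the simulation state this way is acceptable; the sketch only shows that the arithmetic itself needs no extra copy.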

resolves #8041