
Add Flag to avoid copying state vector#8054

Open
jkalsi1 wants to merge 12 commits into quantumlib:main from jkalsi1:main

Conversation

@jkalsi1

@jkalsi1 jkalsi1 commented Apr 21, 2026

Member

resolves #8041

@jkalsi1 jkalsi1 requested review from a team and vtomole as code owners April 21, 2026 18:22
@jkalsi1 jkalsi1 requested a review from 95-martin-orion April 21, 2026 18:22
@github-actions github-actions Bot added the size: M 50< lines changed <250 label Apr 21, 2026
@jkalsi1

jkalsi1 commented Apr 21, 2026

Member Author

@sergeisakov @mhucka, here is a first attempt at implementing my proposed temporary solution to #8041 in Cirq and #893 in qsim.

@codecov

codecov Bot commented Apr 21, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (f4c35b0) to head (bca9c5c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8054   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files        1110     1110           
  Lines      100561   100616   +55     
=======================================
+ Hits       100175   100230   +55     
  Misses        386      386           


@mhucka

mhucka commented May 13, 2026

Contributor

Update from Cirq Cynq 2026-05-13: we verified with the user that they had at least 8 TB of memory available on the system (but probably not 16 TB), and we have requested an example that reproduces the failure and can be used to test the before/after behavior of this PR.

@pavoljuhas

pavoljuhas commented May 28, 2026

Collaborator

I am not able to confirm this change helps with the memory needs of QSimSimulator and the initial OOM problem reported in quantumlib/qsim#893. My test files are available in a placeholder PR quantumlib/qsim#1076.

First, there is the oom_qsim_simulation.py script, which checks at what number of qubits the simulation runs out of memory. I ran it on a 65 GB Debian box, and the OOM onset occurred at 34 qubits both before and with this change.

Next, I ran an example qsim_simulation.py script under valgrind for the baseline qsim / cirq combination and for qsim and cirq using this change (and the new should_preserve_initial_state set to False):

valgrind --tool=massif python qsim_simulation.py

The memory use extracted during Python execution was quite similar in both scenarios; most importantly, both showed the same peak memory use, which is the determining factor for running out of memory.
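For a quick cross-check of this kind of peak comparison without valgrind, a rough sketch along the following lines might help. It uses Python's `tracemalloc` with a NumPy array standing in for the state vector; the function name `peak_bytes` and the `copy_buffer` argument are made up for illustration and are not part of Cirq or qsim. Note that `tracemalloc` only sees allocations made through Python and NumPy, so unlike massif it would miss memory allocated inside the qsim C++ code.

```python
import tracemalloc

import numpy as np


def peak_bytes(copy_buffer: bool, n_amplitudes: int = 1 << 20) -> int:
    """Return the peak traced memory while holding a toy state vector.

    `copy_buffer` is a hypothetical stand-in for the copy-or-not choice
    discussed in this PR; it is not a Cirq or qsim argument.
    """
    tracemalloc.start()
    try:
        state = np.zeros(n_amplitudes, dtype=np.complex64)
        # Either duplicate the array or keep working on the same memory.
        buffer = state.copy() if copy_buffer else state
        buffer[0] = 1.0
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


state_nbytes = (1 << 20) * np.dtype(np.complex64).itemsize
peak_with_copy = peak_bytes(copy_buffer=True)
peak_without_copy = peak_bytes(copy_buffer=False)
print(f"state vector size: {state_nbytes} bytes")
print(f"peak with copy:    {peak_with_copy} bytes")
print(f"peak without copy: {peak_without_copy} bytes")
```

In this toy setting the copy shows up as roughly a doubling of the peak; whether the same holds for a full QSimSimulator run depends on where the remaining allocations happen.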

Below is the collected memory usage for two simulations with 20 and 24 qubits.

In summary, I don't see evidence that this change improves things, or that it is worth complicating the code with an extra argument and an indirect buffer property.

[Figure: plot_memory_use. Collected memory usage for the two simulations with 20 and 24 qubits.]

@sergeisakov


I can confirm that this change helps. However, there is an additional problem.

I ran a slightly modified script

import gc
import sys

import cirq
import qsimcirq

if len(sys.argv) != 2:
    print(f"usage: {__file__} num_qubits")
    sys.exit(0)

nqubits = int(sys.argv[1])

q = cirq.LineQubit.range(nqubits)
m1 = cirq.Moment(cirq.H.on_each(q))
m2 = cirq.Moment(cirq.CX(qi, qj) for qi, qj in zip(q[0::2], q[1::2]))
m3 = cirq.Moment(cirq.CX(qi, qj) for qi, qj in zip(q[1::2], q[2::2]))
circuit = cirq.Circuit(5 * [m1, m2, m3])

options = qsimcirq.QSimOptions(max_fused_gate_size=4, cpu_threads=45, verbosity=1)
sim = qsimcirq.QSimSimulator(qsim_options=options)

gc.disable()
gc.collect()

state_vector = sim.simulate(program=circuit).state_vector()

on a c3d-standard-90 instance in Google Cloud. This machine has 360 GB of memory, which is enough for 35 qubits.
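The memory arithmetic behind that statement can be sketched as follows, assuming 8 bytes per amplitude (single-precision complex, which is an assumption here about the simulator's storage format). The helper name `state_vector_bytes` is made up for this illustration.

```python
GIB = 2**30


def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Size of a dense state vector: 2**num_qubits amplitudes.

    The default of 8 bytes per amplitude assumes complex64 storage.
    """
    return (2**num_qubits) * bytes_per_amplitude


one_copy = state_vector_bytes(35)
two_copies = 2 * one_copy
print(f"one copy:   {one_copy / GIB:.0f} GiB")
print(f"two copies: {two_copies / GIB:.0f} GiB")
```

Under that assumption, a single 35-qubit state vector fits in 360 GB, but a second full-size copy does not, which is consistent with the crashes described next.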

If I run this script with 35 qubits and without this change, it crashes while copying the state vector when _BufferedStateVector is created. If I run it with this change, it still crashes at the same place. The problem is that the should_preserve_initial_state flag is True by default, so I need to modify qsimcirq to set it to False.

Once that is done, it crashes during the call to state_vector(), which in turn calls final_state_vector (a method of StateVectorTrialResult in state_vector_simulator.py). Here, an additional vector is allocated when the normalized state is computed. To bypass that, I can replace

state_vector = sim.simulate(program=circuit).state_vector()

with

state_vector = sim.simulate(program=circuit)._get_merged_sim_state().target_tensor.reshape(-1)

Then it works with 35 qubits.

I think an additional flag should be introduced to bypass final_state_vector. In addition to allocating the new state, it is actually very slow (at the very least, ret_norm = ret / norm should be replaced with ret_norm = ret * (1.0 / norm)).
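As a small sanity check of the suggested replacement, the sketch below compares the two expressions on a random complex64 vector. The variable names `ret` and `norm` mirror the snippet quoted above; the test data and sizes are arbitrary, and the sketch only checks that the results agree, not how fast either form is.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 1 << 16
ret = (rng.standard_normal(size) + 1j * rng.standard_normal(size)).astype(np.complex64)
norm = np.linalg.norm(ret)

# Original form: divide every amplitude by the norm.
by_division = ret / norm
# Suggested form: compute the reciprocal once, then multiply.
by_reciprocal = ret * (1.0 / norm)

max_abs_diff = float(np.max(np.abs(by_division - by_reciprocal)))
print(f"largest absolute difference: {max_abs_diff:.3e}")
print(f"norm after scaling: {np.linalg.norm(by_reciprocal):.6f}")
```

The two results can differ in the last bits because of rounding, so any precision-sensitive caller would need to be checked separately.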

Here are some benchmarks for 34 qubits:

  • qsim simulation time: 120.5 seconds.
  • final_state_vector time: 75.8 seconds.
  • Time to copy the state vector (if should_preserve_initial_state = True): 17.5 seconds.

@jkalsi1

jkalsi1 commented May 28, 2026

Member Author

@sergeisakov I just pushed a few commits implementing your suggested changes. I added a flag normalize (defaults to True) in state_vector(), bypassing the call to final_state_vector if normalize == False. I also implemented your performance enhancement of replacing ret_norm = ret / norm with ret_norm = ret * (1.0 / norm).
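To illustrate the intended behavior of such a flag, here is a toy sketch. `ToyTrialResult` is a hypothetical stand-in written for this example; it is not the Cirq class and does not reproduce the actual diff in this PR.

```python
import numpy as np


class ToyTrialResult:
    """Hypothetical stand-in for a trial result; not the Cirq implementation."""

    def __init__(self, target_tensor: np.ndarray) -> None:
        self._target_tensor = target_tensor

    def state_vector(self, normalize: bool = True) -> np.ndarray:
        # For a contiguous tensor, reshape(-1) returns a view, not a copy.
        flat = self._target_tensor.reshape(-1)
        if not normalize:
            # Skip normalization and hand back the existing memory.
            return flat
        norm = np.linalg.norm(flat)
        # The product allocates a second array of the same size.
        return flat * (1.0 / norm)


tensor = np.full((2, 2, 2), 2.0, dtype=np.complex64)
result = ToyTrialResult(tensor)
raw = result.state_vector(normalize=False)
normalized = result.state_vector()
print(np.shares_memory(raw, tensor))
print(np.shares_memory(normalized, tensor))
```

With `normalize=False` the caller gets the simulator's memory as-is and takes responsibility for any normalization it needs.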

@pavoljuhas pavoljuhas self-assigned this Jun 10, 2026
@pavoljuhas

Collaborator

@pavoljuhas - to check memory use with the latest version

@pavoljuhas pavoljuhas added the priority/before-1.7 Finish before the Cirq 1.7 release label Jun 11, 2026
@pavoljuhas

Collaborator

The somewhat messy code in StateVectorTrialResult.final_state_vector, with its extra copy of the state vector array, dates back to the normalization patching in #6522 and #6556.

I think we should revisit these and see whether a precision fix is possible with an in-place modification of SimulationState.target_tensor, which would not need an extra state vector copy. The final_state_vector code could then go back to its state before #6522, i.e., as in Sergei's comment.
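A minimal sketch of what an in-place scaling step could look like with NumPy is shown below. The helper `normalize_in_place` is hypothetical and only demonstrates that the scaled amplitudes are written into the existing array; it does not address the precision concerns behind #6522 and #6556, and the norm computation itself may still use some temporary memory.

```python
import numpy as np


def normalize_in_place(target_tensor: np.ndarray) -> np.ndarray:
    """Scale a tensor to unit 2-norm, writing into its own memory.

    Hypothetical helper for illustration; not part of Cirq.
    """
    norm = np.linalg.norm(target_tensor)
    # Cast the reciprocal to the tensor's own dtype so the result type matches.
    scale = target_tensor.dtype.type(1.0 / norm)
    # out= makes NumPy store the product in the existing array.
    np.multiply(target_tensor, scale, out=target_tensor)
    return target_tensor


tensor = np.full((2, 2, 2), 2.0, dtype=np.complex64)
address_before = tensor.ctypes.data
normalized = normalize_in_place(tensor)
print(normalized is tensor)
print(tensor.ctypes.data == address_before)
print(float(np.linalg.norm(tensor)))
```

The trade-off is that the caller's tensor is modified, so this only fits code paths where the unnormalized values are no longer needed.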

@pavoljuhas pavoljuhas added priority/after-1.7 Leave for after the Cirq 1.7 release and removed priority/before-1.7 Finish before the Cirq 1.7 release labels Jun 26, 2026


Development

Successfully merging this pull request may close these issues.

Provide a way to avoid copying state vector from Cirq

4 participants