Skip to content

Gradient estimation notebook#1549

Open
ofriwienner wants to merge 12 commits into
Classiq:mainfrom
ofriwienner:gradient-estimation
Open

Gradient estimation notebook#1549
ofriwienner wants to merge 12 commits into
Classiq:mainfrom
ofriwienner:gradient-estimation

Conversation

@ofriwienner

Copy link
Copy Markdown

No description provided.

@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions

github-actions Bot commented Apr 15, 2026

Copy link
Copy Markdown

🔥 New notebook just dropped!

@amir-naveh , @TomerGoldfriend — come check out this shiny new addition to our repo.

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to move the utility functions to an adjacent python file, to keep the notebook clean. If there is some specific utility that it is valuable for the reader to read the definition, you can put it along the lines.


Reply via ReviewNB

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • spelling issues according to the claude:
  • Cell 6: "Classicaly" → "Classically"
  • Cell 6: "in specific point" → "at a specific point"
  • Cell 7: "if there are more than one coordinates" → "if there is more than one coordinate"
  • Cell 8: "explaination" → "explanation" (appears multiple times)
  • Cell 8: "miss some" → "misses some"
  • Cell 8: "fill in this details" → "fill in these details"
  • Cell 8: "resultion" → "resolution"
  • Cell 9: "repetetive" → "repetitive"
  • Cell 9: "beggining" → "beginning"
  • Cell 18: "complitly" → "completely"
  • Cell 25: "unefficient" → "inefficient"
  • Cell 40: "explaination" → "explanation"
  • Cell 46: "drasticly" → "drastically"
  • Cell 59: "sucess" → "success"

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #53.    def compute_success_rate(pc, analytic_derivatives, reject_underresolution=False):

I think you can simplify this function using the dataframe results


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stick to the format under the algorithms directory. See for example simon\deutch-josza. Specifically for the beginning of the notebook - what is the input, output, complexity etc.


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. \ket is not parsed well in our documentation. switch to |...\rangle instead.
  2. In step 1 you talked about few coordinates, but the rest of the formulas were only for a single variable
  3. "if there are more than one coordinates" - fix grammer


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exact version of the algorithm - I think 'exact' is misleading, because you still get an approximation to the gradient. Maybe "Full" version or "General" version is more suitable.


Reply via ReviewNB

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 - There are several approximations to the gradient - i.e backwards, forwards, symmetric etc. It would be nice to write something about what version you use here, and if it can work with the other versions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 - "so doing a similar trick we can define... " - I think that the equation is not correct, or I miss something

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 - what is the tradeoff in l? why shouldn't one take it to be very small?

It will be good if you can describe the gradient accuracy (in a formula) in terms of the problem parameters

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 - In general - this explanation is quite cumbersome. Try to think how to simplify it, consider drawing it

@ofriwienner ofriwienner Apr 25, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 - Fixed

2 - I tried to add it, but I feel that it hurts the flow, and doesn't add much. What do you think?

3 - I think it is correct. The paper use the convertion formula delta_masured = N/m * grad(f) (see page 2 in the middle of the left column). The -N/2 make delta a signed value instead of unsigned. I think it is more clear this way (because we don't assume that the gradient is positive). The paper notes about it (page 2 in the end of the left column), but I think it is more confusing that way.

4 - TODO

5 - TODO

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repetetive -> repetitive (do syntax and grammer check on all of the text)

"Phase Kickback (as in the cited paper) and Prepare State " -> I guess you meant 'direct phase'


Reply via ReviewNB

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 - "A technical note: " - the comment is not needed

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explain what is the input model for f assumed for phase kickback vs the direct case


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #14.    def main(x: Output[QNum[n]], ancilla: Output[QNum[n0, SIGNED, n0]]):

consider not defining ancilla as an output. then only define it locally and clean it after the usage


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It caused me some problems. I created a temporary cell (search for "Ancilla not as output"), and for now kept the original cells.

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #44.    # Pay attention that we use a statevector simulator here

now you can use the new function sample and calculate_state_vector,it will save a lot of boilerplate code


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for the context manager is to use es.set_measured_state_filter("ancilla", lambda v: v == 0).

If I fix the previous problem with the ancilla, I'll be able to get rid of it.

I want to discuss the different ways to execute the code and the best practices, because it confuses me.

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #25.        numeric_value = -0.5**n0

I think it will be much simpler and robust to have an prepare_phase_gradient_state function that works with a QArray so the type of the qnum will not matter. you can see https://arxiv.org/pdf/2409.04643 page 10.


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the prepare_probabilities function

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these part is a bit overexplained. I would plot the corrected phase at first place


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on the problem -> depending on 'f' specifically, not?


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but I think that if I write “depending on the function f” it will sound as if it depends on the mathematical function (ie linear, quadratic, etc). I think a better way to write it is “depending on the oracle f” maybe?

Changed to this for now.

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #18.        prepare_complex_amplitudes(magnitudes=magnitudes, phases=phases, out=x)

this function is not scallable. Why didn't you used the phase function directly? in the case f is a polynomial, it should be quite straight forward. The point is that it is sometimes beneficial to load f directly to the phase and not pass through the digital encoding (which will sometimes requires additional qubits or depth as well).


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in all places, except from the multi-coordinate case - TODO this one.

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set the file name to the conventions (small case)


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again here I think all the phase verification is redundant


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is beneficial to show that the linear approximation does not happen in this step, only in the next step (the QFT).

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #6.    def main(x: Output[QNum[n]]):

why didn't you define x as a signed integer? then you save the ugly post processing


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #23.    majority_state = pc[0].state

1 - I am not sure whether the results are sorted always. Better to work with the dataframe and sort it

2 - Consider plotting a graph with the parsed gradient on the x axis and the probability on the y axis, and a dotted vertical line on the analytical gradient. (then do it for the next examples as well)


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explain about it in the description of the algorithm already


Reply via ReviewNB

@@ -0,0 +1,2289 @@
{

@orsa-classiq orsa-classiq Apr 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused about the m and l parameters.

I would work with the problem parameters |grad(f)| (maximal size of the gradient) and epsilon (required accuracy for the result). Say you know both of them, how would you choose l, m and the number of qubits? Write it in an equation. Assuming that you have an accurate phase oracle. Then you can also consider the case you do the calculation with a digital oracle (phase kickback) then you have additional source of error, and what will be the required number of qubits for this calculation (not a must though).

It seemed to me in your description that the choices are not coupled. They should all be derived from the problem parameters eps and |grad|


Reply via ReviewNB

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We will explore quantum algorithm --> a quantum algorithm.
  • "Output: ..." --> "... at the origin encoded on a quantum variable of size n".
  • Can you add some keywords? see how it is done in other notebooks in the Algorithms directory.

Add a References section at the end and refer in the intro to the original paper.


Reply via ReviewNB

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your figure and equations it seems like the separation in the x axis is 1. But you use the approximation f(x)=f(0)+x \grad f(0), which is true for small x.

Try to combine the two figures somehow... maybe side by side. Or even two curves on the same plot, but with some arbitrary shift between them.


Reply via ReviewNB

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think N_0 is not defined yet. You can mention that for some scenarios we have more parameters to tune. Or even write explicitly what scenario, if it takes few words to indicate it. Another option is not to mention this parameter now.

  • "Let's define: " --> We assume the values of the following parameters (or some approximation of them) are given:

Reply via ReviewNB

Comment thread algorithms/gradient_estimation/gradient_estimation.ipynb
@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use to talk about three different encodings of expression. Phase encoding e^{if(x)}|x>, Amplitude encoding f(x)|x> (sometimes we need post-selection/ancilla), and Digital encoding |f(x)>.

If I understood correctly, here you have as an input a digital encoding, and you apply phase-kick-back to transform it to a phase encoding, right?

If yes, then mention it: "The first step of the algorithm is to prepare the state: ..., given the oracle |x>--> |f(x)>


Reply via ReviewNB

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show the mathematics here (or in an appendix if this is too long) on why this gives a phase kick-back.

I think you should add that we need to post-select on the ancilla being zero.


Reply via ReviewNB

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    p = params()

We typically try to avoid using helper python files. It depends how heavy is the calculation and critical for understanding the idea of the algorithm. I do not know what is the situation here... so it is just a thought for now.


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I had the helpers in the top of the file, and Or suggested to move them to a separate file.

I use helpers in many places to avoid repetitive 'non interesting' code, but I can write the code explicitly if it makes more sense. Let's discuss it in person.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes, keep it as is and let's discuss.

@@ -0,0 +1,2122 @@
{

@TomerGoldfriend TomerGoldfriend May 20, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #24.        prepare_state(probabilities=probabilities, bound=0.01, out=ancilla)
  • why did you use a bound of 0.01?
  • You can do:
allocate(ancilla)
ancilla ^= -1/N_0
# or
prepare_basis_state([1]*N_0, ancilla)

  • The variable name ancilla is confusing here, because it typically means this is a qubit which returns to zero.

  • Use the new calculate_state_vector instead of the ExecutionSession.
  • Try to sort the dataframe, the result will be more visible I think.

Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I didnt like ^= -1/N_0 because it assumes that the ancilla is signed fractional, and if it's not it won't work. I changed to prepare_basis_state.
  2. The variable name ancilla is confusing here, because it typically means this is a qubit which returns to zero.

As you also wrote before, we can do post selection where the ancilla is zero, and we can also do the inverse QFT and return it to be explicitly zero. I didn't do it because I felt it isn't necessary and just waste of resources, but if it is more accurate I can do it. The name ancilla is used in the paper.

3. I really liked the calculate_state_vector :)

4. Done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 . hmmm so you can uncompute the ancilla with inverse QFT to the ancilla. Interesting... OK, let's discuss this in person as well.

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the enumeration in the headings.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want not at the origin, but at some other point x_0?

It should be written somewhere that the idea of the algorithm is to first encode the function in the phase, and that the result is an inplace encoding of the gradient. The idea of inplace shall be mentioned.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this explanation a bit clearer, let's discuss.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice table. I would put 2^n instead of N, to show that the rows are related.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the dataframe warning somehow (not as an output, but really do not raise it)?


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have an output of a runtime error, why is it? can we avoid it (not as an output, but really not raising it)


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again a runtime error


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that we can simplify the code of the second 2d example (see comment below), I think we can remove the remove the first, simple usecase, and stay only with the second one.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #20.    

I think it is possible to work with a QSruct here, or a QArray[QNum[n, SIGNED, 0]],2]. Let's discuss this in person.


Reply via ReviewNB

@@ -0,0 +1,2180 @@
{

@TomerGoldfriend TomerGoldfriend May 24, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and PDEs ---> remove the and.


Reply via ReviewNB

N: int,
x0: float = 0.0,
d: int = 1,
) -> Callable:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QCallable

@@ -0,0 +1,1989 @@
{

@TomerGoldfriend TomerGoldfriend Jun 4, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the plot, if you like, you can put the first marker as "o" and the second one as ".", and it shows nicely the fit.


Reply via ReviewNB

@@ -0,0 +1,1989 @@
{

@TomerGoldfriend TomerGoldfriend Jun 4, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write that you will define now the quantum function that calculate the gradient, for 1d and for multi-d cases.


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -0,0 +1,1989 @@
{

@TomerGoldfriend TomerGoldfriend Jun 4, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #14.    @qfunc

The two qfuncs have the same docstring, no?

Write that the second one is for multi-dimensional.

(BTW, does the algorithm works for different discretization on each axis?)


Reply via ReviewNB

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works. The current implementation for phase_oracle uses the same l for all coordinates, but it is just for convenience, and it will work for arbitrary l in each axis.

Comment thread algorithms/gradient_estimation/gradient_estimation.ipynb
Comment thread algorithms/gradient_estimation/gradient_estimation.ipynb
Comment thread algorithms/gradient_estimation/gradient_estimation.ipynb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants