Introduction to Weak Measurements and Weak Values

Introduction
Weak measurement is increasingly acknowledged as one of the most promising research tools in quantum mechanics. Tasks believed to be self-contradictory by nature, such as 'determining a particle's state between two measurements', prove to be perfectly possible with the aid of this technique. The same holds for the theory within which weak measurement was conceived, namely the two-state vector formalism. Within this framework several new questions can be raised, for which weak measurement again turns out to be the most suitable tool for seeking answers. In the following we very briefly present some of the techniques, problems and paradoxes arising from the theory.
Weak measurements can reveal some information about the amplitudes of a quantum state without collapsing the state into eigenvectors. This is done by a weak coupling between the measurement device and the system. Weak measurements generalize ordinary quantum (projective) measurements: following the weak coupling the state vector is not collapsed but biased by a small angle, and the measurement device does not show a clear eigenvalue, but a superposition of several values. One question is immediately raised: what type of information can weak measurement reveal?
While trying to answer the above question we realize that weak measurement theory presents a plethora of strange quantum phenomena, not yet completely understood. In this paper we shall present and discuss some of the paradoxes and peculiarities of the theory.
This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY-3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Weak measurements: the one-state vector formalism
We start with a short review of ordinary strong measurement, then we define the notion of weak measurement as a generalization of strong measurement. Next we describe some properties of weak measurement.

Strong quantum measurements
Let $|\psi\rangle$ be a pure vector state. Consider the standard formalism of quantum measurement theory. We differentiate between the outcome of the measurement $m$, the probability to get that measurement outcome $p(m)$, and the state of the system after the measurement $|\psi_m\rangle$ [1][2][3][4].
We can describe the measurement process using the formalism of vector operators. Let $\{M_m\}$ be a set of linear operators satisfying $\sum_m M_m^\dagger M_m = \hat I$. We then say that the set defines a quantum measurement, where the probability is defined by
$$p(m) = \langle\psi|M_m^\dagger M_m|\psi\rangle \qquad (1)$$
and the (non-normalized) vector state after the measurement is
$$|\psi_m\rangle = M_m|\psi\rangle. \qquad (2)$$
If the $M_m$ are also projective operators, i.e. $M_m^\dagger M_m = M_m$, then we say that the measurement is projective.

Example 1
Consider the state
$$|\psi\rangle = \frac{1}{\sqrt 2}\left(|0\rangle + |1\rangle\right) \qquad (3)$$
where $|0\rangle$ and $|1\rangle$ are the eigenvectors of the Pauli spin matrix in the z-direction
$$\hat\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (4)$$
Measuring the state using the projectors $M_0 = |0\rangle\langle 0|$, $M_1 = |1\rangle\langle 1|$ gives the result 0 or 1 respectively, each with probability $1/2$. Following the measurement, the state of the system is $|0\rangle$ or $|1\rangle$ respectively.
In some cases it is enough to know the probabilities of the measurement outcomes, e.g. when the final state is not so important. In such cases it is enough to demand the existence of a set of positive operators $\hat E_m$ such that
$$\sum_m \hat E_m = \hat I \qquad (5)$$
where the probability to get $m$ is
$$p(m) = \langle\psi|\hat E_m|\psi\rangle. \qquad (6)$$
This is known as a positive operator-valued measure (POVM). As a general scheme for implementing a POVM one can couple the system with an ancilla which represents the environment, evolve the whole system and ancilla using a unitary operator, and then measure the ancilla space using a projective measurement. This produces a series of positive operators on the original system. The technique is frequently used in quantum operations and is also known as the operator-sum representation (see also [4, chapter 8]).
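As a minimal sketch of Example 1, the snippet below evaluates the outcome probabilities $\langle\psi|M_m^\dagger M_m|\psi\rangle$ for the two projectors. It assumes the equal superposition $(|0\rangle+|1\rangle)/\sqrt2$, which is consistent with the stated probabilities of 1/2; the helper name `probability` is hypothetical and chosen only for illustration.

```python
import math

def probability(M, psi):
    """Return <psi| M^dagger M |psi>, i.e. the squared norm of M|psi>."""
    M_psi = [sum(M[r][c] * psi[c] for c in range(len(psi))) for r in range(len(M))]
    return sum(abs(x) ** 2 for x in M_psi)

# Assumed state: equal superposition of |0> and |1>.
psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Projectors M0 = |0><0| and M1 = |1><1| written as 2x2 matrices.
M0 = [[1, 0], [0, 0]]
M1 = [[0, 0], [0, 1]]

p0 = probability(M0, psi)
p1 = probability(M1, psi)
print(p0, p1)
```

Because projectors are a special case of positive operators, the same two matrices also serve as the elements $\hat E_m$ of a (trivial) POVM, and the two probabilities sum to one.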

Weak measurements: definition
There are many motivations for generalizing quantum projective measurements into weak measurements: maintaining the initial state while gradually accumulating information [5], determining the system's state in between two strong measurements [6] and revealing unusual weak values [7] are just three examples; others will be described below.
Weak measurement should be treated as a generalization of quantum strong measurement. In weak measurement theory both the system and the measurement device are quantum systems [8]. Weak measurement consists of two steps. In the first step we weakly couple the quantum measurement device to the quantum system. In the second step we strongly measure the measurement device. The collapsed state of the measurement device is referred to as the outcome of the weak measurement process. For a measurement to be weak, the standard deviation of the measurement outcome should be larger than the difference between the eigenvalues of the system. We will now describe this process in detail. The procedure resembles the strong measurement process described in [1]; here, however, we use a very weak entanglement between the system and the measurement device (see [8]).
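Let $|\phi_d\rangle$ denote the wave function of the measurement device. When represented in the position basis it will be written as
$$|\phi_d\rangle = |\phi\rangle = \int \phi(x)\,|x\rangle\,dx \qquad (7)$$
where $x$ is the position variable of the measuring needle.
Let $\hat X_d$ be the position operator such that $\hat X_d|x\rangle = x|x\rangle$ (here we write $\hat X$ to distinguish the operator from its eigenvector $|x\rangle$ and eigenvalue $x$, and likewise for $\hat P$; the subscript $d$ denotes the measuring device). We will also assume that $\phi(x)$ behaves normally around 0 with some variance $\Delta = \sigma^2$:
$$\phi(x) = (2\pi\sigma^2)^{-1/4}\,e^{-x^2/4\sigma^2}. \qquad (8)$$
We will later (strongly) measure $|\phi_d\rangle$, i.e. collapse the device's needle to get a value which is the weak measurement's outcome. Let $S$ denote our system to be measured. Suppose $\hat A$ is a Hermitian operator on the system $S$. Suppose $\hat A$ has $N$ eigenvectors $|a_j\rangle$ such that $\hat A|a_j\rangle = a_j|a_j\rangle$.
Consider the general state vector $|\psi\rangle$ expressed in the eigenbasis of $\hat A$:
$$|\psi\rangle = \sum_j \alpha_j\,|a_j\rangle. \qquad (9)$$
Consider the interaction Hamiltonian $\hat H_{int}$ [8, chapter 7]
$$\hat H_{int} = g(t)\,\hat A\otimes\hat P_d. \qquad (10)$$
Here $g(t)$ is a coupling impulse function satisfying
$$\int_0^T g(t)\,dt = 1 \qquad (11)$$
where $T$ is the coupling time and $\hat P_d$ is the operator conjugate to $\hat X_d$ such that $[\hat X_d, \hat P_d] = i\hbar$. We shall start the measuring process with the vector
$$|\psi\rangle\otimes|\phi(x)\rangle \qquad (12)$$
in the tensor space of the two systems. Then we apply the Hamiltonian
$$e^{-i\hat H T/\hbar}\,|\psi\rangle\otimes|\phi(x)\rangle. \qquad (13)$$
It is easy to see that on each of the vectors $|a_j\rangle\otimes|\phi(x)\rangle$ the Hamiltonian $\hat H$ takes $\hat X_d$ to $\hat X_d + a_j$ (Heisenberg evolution)
$$\hat X_d(T) - \hat X_d(0) = \int_0^T \frac{i}{\hbar}\,[\hat H_{int}, \hat X_d]\,dt = a_j \qquad (14)$$
(see [9, section 8.4]). The corresponding transformation of the coordinates of the wave function is
$$e^{-i\hat H T/\hbar}\,|\psi\rangle\otimes|\phi(x)\rangle = \sum_j \alpha_j\,|a_j\rangle\otimes|\phi(x-a_j)\rangle. \qquad (15)$$
The above wave functions $\phi$ have high variance and therefore they overlap each other. The higher the variance, the weaker the measurement process. If these normal wave functions do not overlap then the measurement is strong. Therefore we can control the process by the choice of the variance.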

Example 2
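Take $\hat A = 2\hat S_z$ and let $\hat H_{int} = \hat A\otimes\hat P_d$. Write $|\psi\rangle$ in the basis of eigenvectors of $\hat S_z$, i.e. $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$. In units where $\hbar = 1$ the eigenvalues of $\hat A$ are $+1$ for $|0\rangle$ and $-1$ for $|1\rangle$. Following the weak coupling the system and the measuring device are entangled
$$\alpha\,|0\rangle\otimes|\phi(x-1)\rangle + \beta\,|1\rangle\otimes|\phi(x+1)\rangle \qquad (16)$$
where the above functions $\phi$ are two normal functions overlapping each other, due to their high variance.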

Biasing the initial vector
In Example 2 we can write the entangled system and the measurement device as (taking the needle to be shifted by $+1$ for $|0\rangle$ and by $-1$ for $|1\rangle$, in units where $\hbar = 1$)
$$(2\pi\sigma^2)^{-1/4}\int \left[e^{-(x-1)^2/4\sigma^2}\,\alpha|0\rangle + e^{-(x+1)^2/4\sigma^2}\,\beta|1\rangle\right]\otimes|x\rangle\,dx. \qquad (17)$$
We will now strongly measure the needle of the measuring device. Suppose the needle collapses to the vector $|x_0\rangle$; then our system is now in the state
$$(2\pi\sigma^2)^{-1/4}\left[e^{-(x_0-1)^2/4\sigma^2}\,\alpha|0\rangle + e^{-(x_0+1)^2/4\sigma^2}\,\beta|1\rangle\right]\otimes|x_0\rangle \qquad (18)$$
without the integration. The eigenvalue $x_0$ could be anywhere around $-1$ or $1$, or even further away, especially if $\sigma$ is big enough, i.e. the measurement is very weak.
If the needle collapses to a value $x_0$ around 1, this means that the amplitude to post-select $|0\rangle$ is a little higher than the amplitude to post-select $|1\rangle$, and vice versa. So the collapse of the needle biases the system's vector. However, if $\sigma$ is very big with respect to the difference between the eigenvalues of $\hat S_z$, then the bias will be very small and the resulting system vector will be very similar to the original vector.
To sum up, the system's vector is biased a little in a direction that corresponds to the needle's outcome value. This is a generalization of strong measurements. On the one hand the information we get, that is the value of the needle, is very vague, and on the other hand the system's vector does not collapse but is only biased a little. As the measurement gets stronger we obtain a clearer value, very close to one of the eigenvalues of the system, and an almost complete collapse of the system's vector, i.e. a strong bias in the direction that corresponds to the needle's value. Note that there is always a correspondence between the value of the needle and the direction of the bias. This is the reason we can look at the weak measurement process as a generalization of the strong measurement one.
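The biasing effect can be illustrated numerically. The sketch below assumes that the needle Gaussians of Example 2 are centred at $+1$ for $|0\rangle$ and $-1$ for $|1\rangle$ (units with $\hbar = 1$) and that the needle amplitude is $\phi(x) \propto e^{-x^2/4\sigma^2}$; the function name `biased_state` is hypothetical.

```python
import math

def biased_state(alpha, beta, x0, sigma):
    """Normalised system state after the needle is found at position x0.

    Assumes needle Gaussians centred at +1 (for |0>) and -1 (for |1>).
    """
    w0 = alpha * math.exp(-(x0 - 1) ** 2 / (4 * sigma ** 2))
    w1 = beta * math.exp(-(x0 + 1) ** 2 / (4 * sigma ** 2))
    norm = math.sqrt(abs(w0) ** 2 + abs(w1) ** 2)
    return w0 / norm, w1 / norm

alpha = beta = 1 / math.sqrt(2)

# Weak measurement: wide needle, the state is only slightly biased towards |0>.
weak = biased_state(alpha, beta, x0=1.0, sigma=10.0)
# Strong measurement: narrow needle, the state is essentially collapsed onto |0>.
strong = biased_state(alpha, beta, x0=1.0, sigma=0.1)

print(weak, strong)
```

With the wide needle the two amplitudes stay almost equal, while with the narrow needle the amplitude of $|1\rangle$ becomes negligible, which is the transition from weak to strong measurement described above.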

Computing the average of all eigenvalues
Let $\hat A$ be a Hermitian operator on the system $S$. Suppose $|a_j\rangle$ are eigenvectors of $\hat A$ with eigenvalues $a_j$. We will show that using weak measurements it is easy to compute the average of all eigenvalues [10].
Weakly couple the state $|\psi\rangle = \sum_j \alpha_j|a_j\rangle$ to a measuring device which has a normal distribution centered around 0. Our system is now described by
$$\sum_j \alpha_j\,|a_j\rangle\otimes|\phi(x-a_j)\rangle. \qquad (19)$$
Looking at the needle, the probability density to get $x$ is
$$p(x) = (2\pi\sigma^2)^{-1/2}\sum_j |\alpha_j|^2\,e^{-(x-a_j)^2/2\sigma^2} \qquad (20)$$
which is a mixture of normal distributions with several modes. Now, if $\sigma$ is big enough with respect to the variance of the eigenvalues, it is simple to show that
$$p(x) \approx (2\pi\sigma^2)^{-1/2}\,e^{-\left(x-\sum_j|\alpha_j|^2 a_j\right)^2/2\sigma^2} \qquad (21)$$
where the above normal distribution is centered around $\sum_j|\alpha_j|^2 a_j$, which is the average over all eigenvalues (use the fact that $e^{y/\sigma^2}\approx 1 + y/\sigma^2$ for $\sigma\gg\sqrt y$). We can then use the distribution of the needle to sample an estimate for the average or the sum of all eigenvalues.
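A short numerical check of this claim, under assumed values: the needle density is taken to be the mixture $\sum_j|\alpha_j|^2\,\mathcal N(x; a_j, \sigma^2)$, and the weights, eigenvalues and $\sigma$ below are hypothetical numbers chosen for illustration. The code compares the mixture with a single normal density centred at the average eigenvalue.

```python
import math

def gaussian(x, mean, sigma):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def needle_density(x, probs, eigvals, sigma):
    """Needle density after weak coupling: a mixture of shifted Gaussians."""
    return sum(p * gaussian(x, a, sigma) for p, a in zip(probs, eigvals))

probs = [0.2, 0.5, 0.3]       # |alpha_j|^2 (hypothetical)
eigvals = [-1.0, 0.0, 2.0]    # a_j (hypothetical)
sigma = 20.0                  # wide needle, i.e. weak measurement
average = sum(p * a for p, a in zip(probs, eigvals))

# Riemann sum over +-10 sigma for the mean of the needle distribution.
step = 0.1
grid = [-200.0 + step * k for k in range(4001)]
needle_mean = sum(x * needle_density(x, probs, eigvals, sigma) for x in grid) * step

# Largest pointwise gap between the mixture and the single Gaussian, relative to the peak.
peak = gaussian(average, average, sigma)
max_gap = max(abs(needle_density(x, probs, eigvals, sigma) - gaussian(x, average, sigma))
              for x in grid)
print(average, needle_mean, max_gap / peak)
```

The mean of the needle equals the average eigenvalue for any $\sigma$; what the weak regime adds is that the whole distribution becomes nearly indistinguishable from a single Gaussian around that average.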

The uncertainty relation
The argument for the uncertainty principle follows a discussion in [8]. Given a large ensemble of identical particles, suppose we measure part of them using an observable $\hat X$ and the other part with a conjugate observable $\hat P$. We will show that the uncertainties $\Delta X$ and $\Delta P$ satisfy the inequality $\Delta X\,\Delta P \geq \hbar/2$ (see [4, p.89] and also subsection 3.9).
Suppose we want to weakly measure the position $x$ of a particle. We shall use a probe particle as a measuring device. Consider
$$\hat H_{int} = g(t)\,\hat X\otimes\hat X_d \qquad (22)$$
where $\hat X_d$ is the position coordinate of the probe particle. This coupling shifts the probe's momentum $\hat P_d$ by an amount proportional to $x$, so that $x$ is read off from the probe's momentum.
Hence the uncertainty $\Delta P_d$ of the probe's momentum sets a lower bound for the uncertainty $\Delta X$ of the system
$$\Delta X \geq \Delta P_d. \qquad (23)$$
Alternatively, suppose we want to weakly measure the momentum $p$ of the particle. Consider
$$\hat H_{int} = g(t)\,\hat P\otimes\hat P_d. \qquad (24)$$
Now the probe's position $\hat X_d$ is shifted by $p$. Hence the uncertainty $\Delta X_d$ of the coordinate of the probe sets a lower bound for the uncertainty $\Delta P$ of the momentum of the system
$$\Delta P \geq \Delta X_d. \qquad (25)$$
Now, given a large ensemble of identical particles, some will be weakly measured by $\hat X$, the others by $\hat P$. By the above two inequalities
$$\Delta X\,\Delta P \geq \Delta P_d\,\Delta X_d \geq \hbar/2. \qquad (26)$$
Hence the uncertainty principle for the system is deduced from the uncertainty for the probes. However, the probes are measured by strong measurements and clearly satisfy the uncertainty principle.

Physical realizations
We end this section with a few remarks regarding the realization of weak measurements. Many practical experiments such as [11][12][13][14][15] have been performed over the years to implement weak measurements. Although the setups are different, most of them share a common basis: inserting a small bias into the wave function (in position, polarization, frequency etc.) which hardly affects its time evolution. When this bias is later detected, it reveals some information regarding the wave function and makes it possible to calculate the weak value in retrospect. In [12], for example, the weak measurement is accomplished with a thin piece of birefringent calcite that slightly changes the polarization of the photons passing through it by introducing a phase shift between the ordinary and extraordinary components of polarization. This phase shift depends on their transverse momentum, and therefore, upon separating the rays using a beam displacer and detecting them with a CCD camera, their trajectories can be reconstructed.
Up to now the experiments have verified all theoretical predictions; nevertheless, much effort is invested in producing more accurate and broader experimental setups.

Weak measurement and post-selection: the two-state vector formalism
In this section we define the notion of weak measurement with post-selection. Introducing post-selection into the theory of weak measurement results in many strange, interesting and sometimes puzzling phenomena such as huge, negative and even complex weak values, superoscillations, amplification, negative probabilities etc. We start with some definitions.

Definitions
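Let $\hat A$ be a Hermitian operator on the system $S$, let $|\psi_{in}\rangle$ and $|\psi_{fin}\rangle$ denote two state vectors in the Hilbert space of $S$, and $|\phi(x)\rangle$ the state vector of the needle. Suppose we weakly measure an ensemble of particles prepared in the state $|\psi_{in}\rangle$ using the Hamiltonian $\hat H = \hat A\otimes\hat P_d$. We will now post-select the final vector $|\psi_{fin}\rangle$. The amplitude to get $|\psi_{fin}\rangle$ will be
$$\langle\psi_{fin}|\,e^{-i\hat H T/\hbar}\,|\psi_{in}\rangle\otimes|\phi(x)\rangle.$$
It means that we start with several copies of our system, all of which are weakly measured by $\hat H$. We then select only those that are found to be in the direction of $|\psi_{fin}\rangle$ [8,16,17].
Practically and computationally, post-selection can be viewed as the result of a strong measurement. If $\{|\psi_{fin,i}\rangle\}_i$ is an orthonormal basis for the Hilbert space of $S$, where $|\psi_{fin,0}\rangle = |\psi_{fin}\rangle$, then consider the two operators
$$M_1 = |\psi_{fin}\rangle\langle\psi_{fin}|\otimes\hat I, \qquad M_0 = \sum_{i\neq 0}|\psi_{fin,i}\rangle\langle\psi_{fin,i}|\otimes\hat I.$$
We will use the above two operators to strongly measure $|\psi_w\rangle = e^{-i\hat H T/\hbar}|\psi_{in}\rangle\otimes|\phi(x)\rangle$. We will only look at the vector states for the outcome 1 (see subsection 2.1):
$$M_1|\psi_w\rangle = |\psi_{fin}\rangle\langle\psi_{fin}|\,e^{-i\hat H T/\hbar}\,|\psi_{in}\rangle\otimes|\phi(x)\rangle.$$
We assume now that $\hat P_d$ is distributed around 0 with low variance. Therefore $\hat X_d$ will have high variance and the measurement will be weak. Hence, absorbing the factor $T/\hbar$ into the coupling and expanding to first order, the above vector is
$$M_1|\psi_w\rangle \approx |\psi_{fin}\rangle\langle\psi_{fin}|\left(1 - i\hat A\otimes\hat P_d\right)|\psi_{in}\rangle\otimes|\phi(x)\rangle = \langle\psi_{fin}|\psi_{in}\rangle\,|\psi_{fin}\rangle\otimes\left(1 - i\langle\hat A\rangle_w\hat P_d\right)|\phi(x)\rangle \approx \langle\psi_{fin}|\psi_{in}\rangle\,|\psi_{fin}\rangle\otimes e^{-i\langle\hat A\rangle_w\hat P_d}|\phi(x)\rangle$$
where [18]:
$$\langle\hat A\rangle_w = \frac{\langle\psi_{fin}|\hat A|\psi_{in}\rangle}{\langle\psi_{fin}|\psi_{in}\rangle}. \qquad (32)$$
The operator $e^{-i\langle\hat A\rangle_w\hat P_d}$ shifts the needle's wave function to $\phi(x-\langle\hat A\rangle_w)$. We can now compute $p(1)$, i.e. the probability to get the outcome 1, by the post-selection due to strong measurement. For a real weak value, and as a density over the needle position $x$,
$$p(1) \approx |\langle\psi_{fin}|\psi_{in}\rangle|^2\,(2\pi\sigma^2)^{-1/2}\,e^{-(x-\langle\hat A\rangle_w)^2/2\sigma^2}.$$
Note that the normal probability distribution we get is centered around the weak value $\langle\hat A\rangle_w$. Note also that $|\langle\psi_{fin}|\psi_{in}\rangle|^2$ multiplies the normal distribution. This factor could be very small if $|\psi_{fin}\rangle$ and $|\psi_{in}\rangle$ are almost orthogonal, and therefore the post-selection process requires the use of many particles. Following the weak coupling, the tensor is described by Equation 15. The post-selection procedure reshuffles the Gaussian functions using new amplitudes that depend on $|\psi_{in}\rangle$ and $|\psi_{fin}\rangle$. This is the reason why all those exponents gather around a new value when we apply the post-selection.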
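The statement that post-selection makes the shifted Gaussians gather around the weak value can be checked without any first-order approximation. The sketch below assumes a spin observable with eigenvalues $\pm1$, hypothetical pre- and post-selected states chosen so that the weak value is 2, and a needle amplitude $\phi(x)\propto e^{-x^2/4\sigma^2}$; all names are illustrative.

```python
import math

def gaussian_amp(x, centre, sigma):
    """Needle amplitude phi(x - centre); |phi|^2 is normal with variance sigma^2."""
    return (2 * math.pi * sigma ** 2) ** -0.25 * math.exp(-(x - centre) ** 2 / (4 * sigma ** 2))

def weak_value(eigvals, psi_in, psi_fin):
    """<psi_fin|A|psi_in> / <psi_fin|psi_in> for A diagonal in the chosen basis."""
    num = sum(complex(f).conjugate() * a * i for f, a, i in zip(psi_fin, eigvals, psi_in))
    den = sum(complex(f).conjugate() * i for f, i in zip(psi_fin, psi_in))
    return num / den

theta = math.radians(30)
eigvals = [1.0, -1.0]                               # eigenvalues of the measured observable
psi_in = [math.cos(theta), math.sin(theta)]         # hypothetical pre-selection
psi_fin = [math.cos(theta), -math.sin(theta)]       # hypothetical post-selection
a_w = weak_value(eigvals, psi_in, psi_fin).real     # equals 1 / cos(2 * theta)

sigma = 20.0

def needle(x):
    """Exact post-selected needle amplitude: sum_j conj(f_j) * alpha_j * phi(x - a_j)."""
    return sum(f * i * gaussian_amp(x, a, sigma) for f, a, i in zip(psi_fin, eigvals, psi_in))

step = 0.5
grid = [-400.0 + step * k for k in range(1601)]
weights = [needle(x) ** 2 for x in grid]
needle_mean = sum(x * w for x, w in zip(grid, weights)) / sum(weights)
print(a_w, needle_mean)
```

The needle's mean lands close to the weak value even though that value lies outside the eigenvalue range $[-1, 1]$; the residual difference shrinks as $\sigma$ grows.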

Strange weak values
If we use post-selection it is easy to see that the needle's distribution could be centered at a value which is far from any of the eigenvalues of the operator $\hat A$ [8]. [...] ([8, chapter 16]), which is more than twice the largest eigenvalue of $\hat A$.
In Example 3 the strange value of the needle was in the range of the variance of the needle, i.e. close to the eigenvalues of $\hat A$. However, it is possible to construct an example where the strange value goes very far. From Equation 32 it can be noted that if $|\psi_{fin}\rangle$ and $|\psi_{in}\rangle$ are almost orthogonal this value could be large. In [19] a weak measurement of the spin of a particle gave a value of $100\,\hbar/2$.
The following example is a version of an example introduced in [20]. It shows that by post-selecting a very rare vector we could get a very strange weak value. In other words, these strange weak values appear with very low probability.
Consider $e^{-i\hat S_z\otimes\hat L/\hbar}$, where the momentum operator $\hat L$ represents a small shift of the needle's wave function. Let $|\psi_{in}\rangle = \alpha|0\rangle + \beta|1\rangle$. Then consider the equation of operators on the needle's space [...] We shall now post-select two vectors $\hat U|0\rangle$ and $\hat U|1\rangle$, where [...] First compute (as an operator equation on the needle's space) [...] If we now post-select $|0\rangle$ the momentum operator on the needle is [...] where [...] Else, if we post-select $|1\rangle$ the momentum operator on the needle is [...] Suppose $\alpha$ is very close to $\beta$. If we post-select $\hat U|0\rangle$ then $\hat L_0$ could be high above the eigenvalues of $\hat S_z$; however, this post-selection will have a very low probability $|\alpha-\beta|^2$. Alternatively, the probability to post-select $\hat U|1\rangle$ is high, $|\alpha+\beta|^2$, but $\hat L_1$ is close to 0. To sum up, we could have a very low probability to get a distribution function (for the needle) which is shifted very far from the eigenvalues of the measured operator. The post-selection process sums the shifted normal wave functions with some weights. Such sums could accumulate around a 'strange' value.

Super-oscillations
Consider the ensemble of particles [...] where $|\alpha|^2 + |\beta|^2 = 1$. Let $\hat A = \hat S_z/N$; then following the post-selection it is easy to see that the wave function of the needle is [8]: [...] If we apply these operators on the function $|\phi(x)\rangle$ one by one, we can see a superposition of shifted functions which are also weighted by polynomials in $\alpha$ and $\beta$. This resembles a random walk. In general, such expressions give rise to super-oscillations [21,22]. Consider the following function
$$f(x) = \left(\frac{1+a}{2}\,e^{ix/N} + \frac{1-a}{2}\,e^{-ix/N}\right)^N$$
where $a > 1$. If we take a very small $x$ ($x < \sqrt N$):
$$f(x) \approx e^{iax}.$$
The peculiar thing is that originally the expression for $f(x)$ contained only long wavelengths (of order $N > 1$), yet for small $x$ it behaves like a wave with wavelength $1/a$, which could be much smaller. Such super-oscillations do exist; however, they last only for a very small length ($x < \sqrt N$), since their length is exponentially proportional to their energy [22].
Super-oscillations in weak measurement theory could exist if $\hat P_d$ is small, and indeed we demand that $\hat P_d$ is distributed around 0 with a very low variance (weak interaction). Then the global shift of the needle, which is the weak value, corresponds to some super-oscillation.
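A quick numerical illustration of super-oscillation, assuming the standard example function $f(x) = \left(\tfrac{1+a}{2}e^{ix/N} + \tfrac{1-a}{2}e^{-ix/N}\right)^N$ from the super-oscillation literature; the parameter values are hypothetical. Every Fourier component of $f$ has wavenumber at most 1 in magnitude, yet near the origin $f$ follows $e^{iax}$ with $a > 1$.

```python
import cmath

def f(x, a, n):
    """Assumed superoscillatory function ((1+a)/2 e^{ix/n} + (1-a)/2 e^{-ix/n})^n."""
    term = (1 + a) / 2 * cmath.exp(1j * x / n) + (1 - a) / 2 * cmath.exp(-1j * x / n)
    return term ** n

a, n = 2.0, 1000

# Inside the super-oscillating region (x well below sqrt(n) ~ 31.6).
x_small = 0.5
error_small = abs(f(x_small, a, n) - cmath.exp(1j * a * x_small))

# Far outside that region the function grows far beyond unit modulus.
x_large = 200.0
modulus_large = abs(f(x_large, a, n))

print(error_small, modulus_large)
```

The fast oscillation is confined to a region where the function is exponentially small compared with its values elsewhere, which is the price paid for oscillating faster than any Fourier component.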

The interpretation of post-selection
There is a question about the meaning of the weak value that we get following the post-selection [23]. Could it be that the measurement depends on a post-selected vector, i.e. a future event? Before the post-selection any measurement outcome looks random in between the eigenvalues of $\hat A$. Following the post-selection these results (more accurately, a small part of the results) seem to accumulate around a value. It looks as if this value suggests a physical entity; however, the information concerning this entity is presented to us only a posteriori.
In [8] and [16] it was claimed that the correct way to interpret those weak values is in terms of a time-symmetric theory. Quantum theory should be thought of as a time-symmetric theory. A pre-selected state is no more physical than a post-selected state. Any physical theory should consist of an initial condition, an evolutionary process and a final condition. The evolutionary process could be looked at as a process that evolves the initial condition forward in time, but also as a process that evolves the final condition backwards in time. Looking at the quantum process as most physicists do, from the initial condition forward in time, inserts an artificial time direction into the theory, a direction which does not exist on its own in quantum theory. The evolution process of the quantum theory follows the Aharonov-Bergmann-Lebowitz formula [24]: given a non-degenerate operator $\hat A$ with eigenvectors $|a_i\rangle$ and eigenvalues $a_i$, the probability to measure the eigenvalue $a_j$ is
$$\Pr(a_j\,|\,\psi_{in},\psi_{fin}) = \frac{|\langle\psi_{fin}|a_j\rangle\langle a_j|\psi_{in}\rangle|^2}{\sum_i|\langle\psi_{fin}|a_i\rangle\langle a_i|\psi_{in}\rangle|^2}.$$
It is easy to write the known S-matrices using such a formula while keeping the initial and final states. Therefore the Aharonov-Bergmann-Lebowitz formula generalizes the computation of amplitudes in the ordinary pre-selected, time-asymmetric quantum theory. Note that the probability to get some value $a_j$ could be 1 even if $|\psi_{in}\rangle$ is not an eigenvector of $\hat A$. In the denominator it could indeed be that all of the summands besides the $j$th vanish, and therefore the quotient is 1 (see Example 4). A quick look at Equation 1 shows that this could not happen for strong projective measurements.
Whenever the Aharonov-Bergmann-Lebowitz formula produces an eigenvalue with certainty we can interpret it as an element of reality. This value is certain under the initial and final conditions that we picked. A related discussion regarding the different definitions for 'elements of reality' appears in [10].
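The Aharonov-Bergmann-Lebowitz rule for a non-degenerate observable is simple to evaluate. The following sketch uses hypothetical spin-1/2 states; the function names are illustrative. It shows an outcome that is certain although the pre-selected state is not an eigenvector, and it checks that exchanging the pre- and post-selected states leaves the probabilities unchanged.

```python
import math

def inner(u, v):
    """Inner product <u|v> with complex conjugation of the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def abl_probabilities(psi_in, psi_fin, eigvecs):
    """ABL probabilities |<fin|a_j><a_j|in>|^2 / sum_i |<fin|a_i><a_i|in>|^2."""
    weights = [abs(inner(psi_fin, a) * inner(a, psi_in)) ** 2 for a in eigvecs]
    total = sum(weights)
    return [w / total for w in weights]

basis = [[1, 0], [0, 1]]                        # eigenvectors |0>, |1> of sigma_z
plus_x = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # pre-selection, not an eigenvector of sigma_z
zero = [1, 0]                                   # post-selection

certain = abl_probabilities(plus_x, zero, basis)

# Time symmetry: swapping the pre- and post-selected states gives the same probabilities.
psi_a = [0.6, 0.8]
forward = abl_probabilities(psi_a, plus_x, basis)
backward = abl_probabilities(plus_x, psi_a, basis)
print(certain, forward, backward)
```

In the first case the certainty comes from the post-selected state alone, which is exactly the situation a purely pre-selected description cannot express.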

Complex weak values
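The weak value $\langle\hat A\rangle_w$ is a matrix coefficient and could have complex values. How do complex values affect the needle's wave function?
Theorem [Jozsa] [25]: Consider $\hat H_{int} = g(t)\,\hat A\otimes\hat P_d$ and let $|\alpha\rangle$ be the pointer's wave function following the post-selection (in units where $\hbar = 1$)
$$|\alpha\rangle = e^{-ig\langle\hat A\rangle_w\hat P_d}\,|\phi\rangle.$$
If $\langle\hat A\rangle_w = a + ib$ and $\hat M$ is any observable of the pointer, then
$$\langle\hat M\rangle_f = \langle\hat M\rangle_i + iga\,\langle[\hat P_d,\hat M]\rangle_i + gb\left(\langle\{\hat P_d,\hat M\}\rangle_i - 2\langle\hat P_d\rangle_i\langle\hat M\rangle_i\right)$$
where
$$\langle\hat M\rangle_i = \langle\phi|\hat M|\phi\rangle, \qquad \langle\hat M\rangle_f = \frac{\langle\alpha|\hat M|\alpha\rangle}{\langle\alpha|\alpha\rangle}.$$
Proof: The proof is a simple use of the definition of $|\alpha\rangle$. The computation of $\langle\hat M\rangle_f$ is carried out to first order in $g$, where $g$ is the coupling impulse function (Equation 11). Note also that $e^{-ig\langle\hat A\rangle_w\hat P_d}$ is non-unitary (its exponent has a complex argument) and $\langle\alpha|\alpha\rangle = 1 + 2gb\,\langle\hat P_d\rangle_i$.
Corollary: Real values of $\langle\hat A\rangle_w$ shift the needle's expectation value of $\hat X_d$, and imaginary values of $\langle\hat A\rangle_w$ shift the needle's expectation value of $\hat P_d$.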

Amplification
We can use weak measurements for an amplification process [26][27][28]. If $|\psi_{fin}\rangle$ is almost orthogonal to $|\psi_{in}\rangle$ then $|\langle\hat A\rangle_w|$ could be as large as we want. We could use this to amplify signals. There is a trade-off between the amplification and the complexity of the post-selection process.
When the two states are almost orthogonal it is hard to post-select $|\psi_{fin}\rangle$, but once $|\psi_{fin}\rangle$ has been post-selected the amplification factor is high. In [27] weak measurements were used to amplify a small momentum perturbation in a Sagnac interferometer. In the following we will briefly review the argument.
Let the initial vector be [...] where $\eta$ is some phase differentiating between the two beams. Let the final vector be [...] Suppose we add a small amount of momentum $k$ to the device. We want to measure its effect on the phase difference between the two beams. We can take any observable $\hat A$ that differentiates between the two paths and measure [...] The interaction Hamiltonian is [...] where the wave function of the needle will be written in the momentum representation.
It is easy to compute [...] Since $\langle\hat A\rangle_w$ is imaginary, we will look at the shift of $\hat X_d$. By the above theorem of Jozsa (for the Hamiltonian of Equation 58, where $\hat X_d$ is used instead of $\hat P_d$): [...] Hence for a very low $\eta$ the value of $\langle\hat X_d\rangle_i$ will be shifted away. Note, however, that for such small angles it will also be hard to post-select, since $\langle\psi_{fin}|\psi_{in}\rangle = \sin^2(\eta/2)$. Compare now the shift of the weak value of $\hat X_d$ with the phase shift we usually measure on a Sagnac interferometer, as functions of the small momentum $k$ added to the system. This will be the amplification factor of the signal. It looks as if this factor could be very large and even unlimited. In [28] it was shown that if we use the Jozsa expansion to all orders of $g$ we can see the bound on the amplification factor.
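The trade-off between amplification and post-selection probability can be made concrete with a toy model. The sketch below does not use the Sagnac states of [27]; it assumes hypothetical spin-1/2 states whose overlap is $\sin\epsilon$ and takes $\hat\sigma_z$ as the measured observable.

```python
import math

def weak_value_and_probability(epsilon):
    """Weak value of sigma_z and post-selection probability for nearly orthogonal states.

    Hypothetical states: |psi_in> = (cos t, sin t) and |psi_fin> = (cos t, -sin t)
    with t = pi/4 - epsilon/2, so that <psi_fin|psi_in> = cos(2t) = sin(epsilon).
    """
    t = math.pi / 4 - epsilon / 2
    overlap = math.cos(t) ** 2 - math.sin(t) ** 2      # <psi_fin|psi_in>
    numerator = math.cos(t) ** 2 + math.sin(t) ** 2    # <psi_fin|sigma_z|psi_in>
    return numerator / overlap, overlap ** 2

results = {}
for eps in (0.5, 0.1, 0.01):
    a_w, prob = weak_value_and_probability(eps)
    results[eps] = (a_w, prob)
    # The product a_w^2 * prob stays fixed: larger amplification means rarer post-selection.
    print(eps, a_w, prob, a_w ** 2 * prob)
```

In this toy model the weak value grows like $1/\epsilon$ while the fraction of surviving events falls like $\epsilon^2$, so the amplification is paid for by discarding most of the ensemble.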

Violation of the product rule
It is easy to see that a sum of weak values is a weak value of the sum of the measurement operators. However, such a theorem is not true for the product of weak measurements. In standard quantum theory, if a state is an eigenvector of two operators then the product of the eigenvalues is the eigenvalue of the product of the operators. But as we saw above, weak values could be certain even if the initial state was not an eigenvector. The following example clarifies this fact. [...] These operators are degenerate, and therefore to compute probabilities we use a version of the Aharonov-Bergmann-Lebowitz formula for degenerate operators. We conclude that measuring $\hat P_A$ yields 1 with probability 1 [8]. The same is true for $\hat P_B$: measuring $\hat P_B$ yields 1 with probability 1. However, measuring the product $\hat P_A\hat P_B$ yields 0 with probability 1. Hence the product rule is not true in the two-state vector formalism.
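The numbers quoted here can be reproduced with the standard three-box configuration, which is assumed in the sketch below since the states are not shown in the text: pre-selection $(|A\rangle+|B\rangle+|C\rangle)/\sqrt3$ and post-selection $(|A\rangle+|B\rangle-|C\rangle)/\sqrt3$. The degenerate Aharonov-Bergmann-Lebowitz rule is taken in the form $|\langle\psi_{fin}|\hat P|\psi_{in}\rangle|^2$ divided by the sum of this term and the analogous term for $\hat I - \hat P$; the helper names are hypothetical.

```python
import math

s = 1 / math.sqrt(3)
psi_in = [s, s, s]      # assumed pre-selection  (|A> + |B> + |C>)/sqrt(3)
psi_fin = [s, s, -s]    # assumed post-selection (|A> + |B> - |C>)/sqrt(3)

def amplitude(diag):
    """<psi_fin| D |psi_in> for an operator D diagonal in the box basis (real amplitudes)."""
    return sum(f * d * i for f, d, i in zip(psi_fin, diag, psi_in))

def abl_yes(diag):
    """ABL probability that the projector with diagonal `diag` yields the outcome 1."""
    yes = abs(amplitude(diag)) ** 2
    no = abs(amplitude([1 - d for d in diag])) ** 2
    return yes / (yes + no)

def weak_value(diag):
    """Weak value <psi_fin| D |psi_in> / <psi_fin|psi_in>."""
    return amplitude(diag) / amplitude([1, 1, 1])

P_A, P_B, P_C = [1, 0, 0], [0, 1, 0], [0, 0, 1]
P_A_P_B = [a * b for a, b in zip(P_A, P_B)]   # product of the projectors (the zero operator)

print(abl_yes(P_A), abl_yes(P_B), abl_yes(P_A_P_B))
print(weak_value(P_A), weak_value(P_B), weak_value(P_C))
```

Each of $\hat P_A$ and $\hat P_B$ yields 1 with certainty while their product yields 0 with certainty, and the three weak values $1, 1, -1$ still sum to one.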

Negative probabilities
Consider a projective operator $\hat P$. Its eigenvalues are 0 or 1, therefore its expectation value on a general state $|\psi\rangle$ could be anywhere between 0 and 1. We could think of such a value as the probability to be in the state defined by the projection. Projection operators which are weakly measured could have other values. In Example 4 above $\langle\hat P_C\rangle_w = -1$. What does it mean that the weak value of a projection operator $\hat P$ is negative? We could generalize the above argument, saying that there is a negative probability to be projected into the state defined by $\hat P$ [29], [30]. Let's dwell on this argument.
The initial wave function in Example 4 represents a superposition of three position states. Suppose we have a positive particle that could be in one of three positions, and the initial state is a superposition of those three states. Trying to verify if the particle is in position A is like measuring the projector $\hat P_A$. The measuring device for each of the projectors could measure the transverse momentum of an electron while passing near the corresponding position
$$\hat H_{int} = \hat X_d\otimes\hat P_A. \qquad (64)$$
If the positive particle is in one of the above positions then we expect a deflection of the electron towards that direction. However, computing the weak values of the projectors we get
$$\langle\hat P_A\rangle_w = 1, \qquad \langle\hat P_B\rangle_w = 1, \qquad \langle\hat P_C\rangle_w = -1.$$
A negative value of $\hat P_C$ corresponds to a deflection of the momentum wave function in the opposite direction, away from the particle. This looks very strange, as if the electron is avoiding the positive charge. However, experiments do show that the deflection is away from the positive particle [11]. This also looks as if the particle has changed its charge from positive to negative. We could use this result to say that a negative probability could be interpreted as a positive probability for an opposite physical event.
Negative probability could also be used to explain paradoxical results that follow from the violation of the product rule. Consider the following example.

Example 5: The Cheshire cat experiment
A Cheshire cat experiment is a case where we can separate quantum variables by using projection operators. In this case we separate a cat from its grin [31]. Let [...] By simple computation we can see that $\langle\hat P_A\rangle_w = 0$, $\langle\hat P_B\rangle_w = 1$, so the cat is in position B. Also $\langle\hat S_z\rangle_w = 1$, so the cat is grinning. However, $\langle\hat S_z\hat P_B\rangle_w = 0$ and $\langle\hat S_z\hat P_A\rangle_w = 1$, so the grin is in position A. Hence the cat is in position B and its grin is in position A. This strange result is an outcome of the violation of the product rule.
The Cheshire cat experiment could be interpreted using negative probabilities, i.e. negative values of projective operators. We compute the weak values of all the products of the above projectors, where $\hat P_0$ and $\hat P_1$ project onto the grin and the frown respectively:
$$\langle\hat P_A\hat P_0\rangle_w = \tfrac12, \qquad \langle\hat P_A\hat P_1\rangle_w = -\tfrac12, \qquad \langle\hat P_B\hat P_0\rangle_w = \tfrac12, \qquad \langle\hat P_B\hat P_1\rangle_w = \tfrac12.$$
Note that although some of the values are negative, the cat's position and the grin's position have positive real values and therefore positive probability. Note that $\langle\hat P_B\rangle_w = \langle\hat P_B(\hat P_0+\hat P_1)\rangle_w = 1$, and therefore the cat is in position B. Also $\langle(\hat P_A+\hat P_B)\hat P_0\rangle_w = 1$, and therefore the cat is grinning. Moreover, $\langle\hat P_A\hat P_0\rangle_w - \langle\hat P_A\hat P_1\rangle_w = 1$. This should be interpreted as the probability of a grin in position A, minus the probability of a frown in position A. If we interpret the probability of a frown as a negative probability of a grin, then we can conclude that the grin is in position A.
Observe that although some of the probabilities above were negative, in each question having a real physical meaning the negative probabilities disappeared, and we were left with positive probability. Feynman used exactly this property in his discussion on negative probabilities [29].
Negative probabilities were also used as an alternative approach to Bell's inequalities [32]. For a general review on extended probabilities see [33].
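The weak values quoted for the Cheshire cat can be reproduced by a simple pre- and post-selected pair. The states below are a hypothetical choice constructed to match the quoted numbers and are not necessarily those of [31]; the basis ordering and variable names are likewise assumptions, with 0 standing for the grin and 1 for the frown.

```python
# Basis order: |A,0>, |A,1>, |B,0>, |B,1>  (0 = grin, 1 = frown).
psi_in = [0.5, 0.5, 0.5, 0.5]      # hypothetical pre-selected state
psi_fin = [0.5, -0.5, 0.5, 0.5]    # hypothetical post-selected state
overlap = sum(f * i for f, i in zip(psi_fin, psi_in))

def weak_value(diag):
    """Weak value of an operator that is diagonal in the basis above (real amplitudes)."""
    return sum(f * d * i for f, d, i in zip(psi_fin, diag, psi_in)) / overlap

w = {
    "A0": weak_value([1, 0, 0, 0]),
    "A1": weak_value([0, 1, 0, 0]),
    "B0": weak_value([0, 0, 1, 0]),
    "B1": weak_value([0, 0, 0, 1]),
}

cat_in_A = w["A0"] + w["A1"]                            # weak value of P_A
cat_in_B = w["B0"] + w["B1"]                            # weak value of P_B
grin_total = (w["A0"] + w["B0"]) - (w["A1"] + w["B1"])  # weak value of S_z = P_0 - P_1
grin_in_A = w["A0"] - w["A1"]                           # weak value of S_z P_A
grin_in_B = w["B0"] - w["B1"]                           # weak value of S_z P_B
print(w, cat_in_A, cat_in_B, grin_total, grin_in_A, grin_in_B)
```

Only the joint value for a frown in position A is negative; every physically meaningful combination built from the four joint values is 0 or 1.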

Paradoxes
There are several paradoxes that stem from weak measurements with post-selection. Some of them were already discussed above, like the Cheshire cat paradox or the quantum shell game. There is also the faster-than-light paradox [34] and many others. Here we shall discuss a paradox that concerns the two-slit experiment. We will show that it is possible to describe the path of the particle without disturbing the interference pattern [12,13]. The pattern of the interference will be measured by a weak coupling while the path will be revealed a posteriori. The possibility of measuring the average trajectory without disturbing the interference does not in itself contradict the mathematical formulation of the Heisenberg principle. This formulation of the principle states that on a large ensemble of identical particles, while measuring some of them with a variable $X$ and the others with a variable $P$ conjugate to $X$, the product of the uncertainties satisfies $\Delta X\,\Delta P \geq \hbar/2$ [3,4]. It is generally accepted that the correct interpretation of the Heisenberg principle is that the measurements of $X$ do disturb the measurements of $P$. Therefore we can say that weak measurement theory questions the interpretation that the mere disturbance is the core of the principle.
Let $\hat A$ denote some weakly measured observable on a system $S$ with initial state $|\psi\rangle$. Let $|1\rangle, \ldots, |l\rangle$ denote an orthonormal basis for the system. Suppose $|\psi\rangle = \sum_k c_k|k\rangle$. Couple weakly the system to the measurement device. Now suppose we want to post-select using the above basis. In fact, this is a strong projective measurement using the projectors $|k\rangle\langle k|$. We shall repeat the process $N$ times for some large $N$, each time registering the weak value
$$\langle\hat A\rangle_w^{(k)} = \frac{\langle k|\hat A|\psi\rangle}{\langle k|\psi\rangle}$$
for the post-selected state $|k\rangle$, which occurs with probability $|c_k|^2$. Averaging over the outcomes gives $\sum_k |c_k|^2\,\langle\hat A\rangle_w^{(k)} = \sum_k \langle\psi|k\rangle\langle k|\hat A|\psi\rangle = \langle\psi|\hat A|\psi\rangle$. We can therefore use the post-selected states to compute the average $\langle\psi|\hat A|\psi\rangle$ (see also subsection 2.4). However, the operator $\hat A$ and the measurement operator of the post-selection could represent two non-commuting observables. Consider for example the two-slit experiment. Suppose we place a series of momentum detectors at the screen edge of the experiment. The detectors can identify the transverse momentum of the particle and therefore identify the slit through which it passed. This is in fact a post-selection. We can identify a posteriori the path of the particle. Suppose now that $\hat A$ enables a weak measurement of the projection of the position variable of the particle onto a small area between the two slits and the screen (we have already used weak projectors in the sections above). The average of a projection operator could be considered as a density or amplitude. We could use the above process to compute $\langle\psi|\hat A|\psi\rangle$, which corresponds to the amplitude of the wave function of the particles in a small area. The measurement is weak and therefore does not interfere with the wave pattern. We can continuously change the position of $\hat A$, therefore $\hat A$ is like a 'weak screen' in between the two slits and the 'strong screen'. Therefore we can see the whole pattern of the interference and at the same time have a list of all trajectories, which is a paradox.
It seems that by using the two-state vector formalism it is possible to know the average trajectory of the particle without disturbing the interference. Notice, however, that a trajectory is known a posteriori from the point of view of the one-state vector formalism [12].
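The claim that post-selected weak values average to the ordinary expectation value can be verified directly. The sketch below assumes ideal weak measurements (no back-action), a hypothetical Hermitian matrix and state, and post-selection in the standard basis with probabilities $|\langle k|\psi\rangle|^2$; the helper names are illustrative.

```python
def inner(u, v):
    """Inner product <u|v> with complex conjugation of the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply(A, v):
    """Matrix-vector product A|v>."""
    return [sum(A[r][c] * v[c] for c in range(len(v))) for r in range(len(A))]

# Hypothetical Hermitian observable and normalised state.
A = [[1, 2, 0],
     [2, 0, 1j],
     [0, -1j, 3]]
psi = [0.5, 0.5j, 0.5 + 0.5j]

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # post-selection basis |k>
A_psi = apply(A, psi)

average = 0
for k_vec in basis:
    amplitude = inner(k_vec, psi)                      # <k|psi>
    weak_value = inner(k_vec, A_psi) / amplitude       # <k|A|psi> / <k|psi>
    average += abs(amplitude) ** 2 * weak_value        # weight by post-selection probability

expectation = inner(psi, A_psi)                        # <psi|A|psi>
print(average, expectation)
```

Individual weak values are complex in general, but their probability-weighted sum reproduces the real expectation value, which is why the basis used for post-selection need not commute with $\hat A$.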