In answer to your first question, I wrote myself some notes some time ago about my understanding of how it worked. The notation is probably a bit different (I've tried to bring it more into line, but it's easy to miss bits), but attempts to explain that choice of the state |Ψ0⟩. There also seem to be some factors of 12 floating around in places.
When we first study phase estimation, we're usually thinking about it in respect to use in some particular algorithm, such as Shor's algorithm. This has a specific goal: getting the best t-bit approximation to the eigenvalue. You either do, or you don't, and the description of phase estimation is specifically tuned to give as high a success probability as possible.
In HHL, we are trying to produce some state
|ϕ⟩=∑jβjλj|λj⟩,
where
|b⟩=∑jβj|λj⟩, making use of phase estimation. The accuracy of the approximation of this will depend far more critically on an accurate estimation of the eigenvalues that are close to 0 rather than those that are far from 0. An obvious step therefore, is to attempt to modify the phase estimation protocol so that rather than using `bins' of fixed width
2π/T for approximating the phases of
e−iAt (
T=2t and
t is number of qubits in phase estimation register), we might rather specify a set of
ϕy for
y∈{0,1}t to act as the centre of each bin so that we can have vastly increased accuracy close to 0 phase. More generally, you might specify a trade-off function for how tolerant you might be of errors as a function of the phase
ϕ. The precise nature of this function can then be tuned to a given application, and the particular figure of merit which you will use to determine success. In the case of Shor's algorithm, our figure of merit was simply this binning protocol -- we were successful if the answer was in the correct bin, and unsuccessful outside it. This is not going to be the case in HHL, whose success is more reasonably captured by a continuous measure such as the fidelity. So, for the general case, we shall designate a cost function
C(ϕ,ϕ′) which specifies a penalty for answers
ϕ′ if the true phase is
ϕ.
Recall that the standard phase estimation protocol worked by producing an input state that was the uniform superposition of all basis states |x⟩ for x∈{0,1}t. This state was used to control the sequential application of multiple controlled-U gates, which are followed up by an inverse Fourier transform. Imagine we could replace the input state with some other state
|Ψ0⟩=∑x∈{0,1}tαx|x⟩,
and then the rest of the protocol could work as before. For now, we will ignore the question of how hard it is to produce the new state
|Ψ0⟩, as we are just trying to convey the basic concept. Starting from this state, the use of the controlled-
U gates (targeting an eigenvector of
U of eigenvalue
ϕ), produces the state
∑x∈{0,1}tαxeiϕx|x⟩.
Applying the inverse Fourier transform yields
1T−−√∑x,y∈{0,1}teix(ϕ−2πyM)αx|y⟩.
The probability of getting an answer
y (i.e.
ϕ′=2πy/T) is
1T∣∣∣∣∑x∈{0,1}teix(ϕ−2πyT)αx∣∣∣∣2
so the expected value of the cost function, assuming a random distribution of the
ϕ, is
C¯=12πT∫2π0dϕ∑y∈{0,1}t∣∣∣∣∑x∈{0,1}teix(ϕ−2πyT)αx∣∣∣∣2C(ϕ,2πy/T),
and our task is to select the amplitudes
αx that minimise this for any specific realisation of
C(ϕ,ϕ′). If we make the simplifying assumption that
C(ϕ,ϕ′) is only a function of
ϕ−ϕ′, then we can make a change of variable in the integration to give
C¯=12π∫2π0dϕ∣∣∣∣∑x∈{0,1}teixϕαx∣∣∣∣2C(ϕ),
As we noted, the most useful measure is likely to be a fidelity measure. Consider we have a state
|+⟩ and we wish to implement the unitary
Uϕ=|0⟩⟨0|+eiϕ|1⟩⟨1|, but instead we implement
Uϕ′=|0⟩⟨0|+eiϕ′|1⟩⟨1|. The fidelity measures how well this achieves the desired task,
F=∣∣⟨+|U†ϕ′U|+⟩∣∣2=cos2(ϕ−ϕ′2),
so we take
C(ϕ−ϕ′)=sin2(ϕ−ϕ′2),
since in the ideal case
F=1, so the error, which is what we want to minimise, can be taken as
1−F.
This will certainly be the correct function for evaluating any
Ut, but for the more general task of modifying the amplitudes, not just the phases, the effects of inaccuracies propagate through the protocol in a less trivial manner, so it is difficult to prove optimality, although the function
C(ϕ−ϕ′) will already provide some improvement over the uniform superposition of states. Proceeding with this form, we have
C¯=12π∫2π0dϕ∣∣∣∣∑x∈{0,1}teixϕαx∣∣∣∣2sin2(12ϕ),
The integral over
ϕ can now be performed, so we want to minimise the function
12∑x,y=0T−1αxα⋆y(δx,y−12δx,y−1−12δx,y+1).
This can be succinctly expressed as
min⟨Ψ0|H|Ψ0⟩
where
H=12∑x,y=0T−1(δx,y−12δx,y−1−12δx,y+1)|x⟩⟨y|.
The optimal choice of
|Ψ0⟩ is the minimum eigenvector of the matrix
H,
αx=2T+1−−−−−√sin((x+1)πT+1),
and
C¯ is the minimum eigenvalue
C¯=12−12cos(πT+1).
Crucially, for large
T,
C¯ scales as
1/T2 rather than the
1/T that we would have got from the uniform coupling choice
αx=1/T−−√. This yields a significant benefit for the error analysis.
If you want to get the same |Ψ0⟩ as reported in the HHL paper, I believe you have to add the terms −14(|0⟩⟨T−1|+|T−1⟩⟨0|) to the Hamiltonian. I have no justification for doing so, however, but this is probably my failing.