
Stern-Gerlach Revisited

In this section we revisit the discussion of the Stern-Gerlach experiment and show that the observed behaviour of that system can be perfectly described using the mathematical framework developed so far.

In the previous discussion of the Stern-Gerlach experiment we saw that what was being measured was the component of spin angular momentum along a particular direction in space and that there were only ever two possible outcomes to such a measurement. Thus, it would appear that spin states of the electron live in a 2-dimensional state space; they are qubits, and it will be useful to employ the \(\ket{\mathbf{n};\pm}\) notation for states labeled by a particular direction in space. With respect to some (arbitrarily chosen) coordinate axes a Stern-Gerlach apparatus may be arranged to measure the component of spin in the \(z\)-direction and the corresponding spin states could be denoted \(\ket{z;\pm}\). We would be inclined to posit that these are eigenstates of an observable \(S_z\) corresponding to eigenvalues \(\hbar/2\) and \(-\hbar/2\) respectively. That is,
\begin{equation}
S_z\ket{z;\pm}=\pm\frac{\hbar}{2}\ket{z;\pm}
\end{equation}
and so the matrix representation of \(S_z\) in this basis is
\begin{equation}
\mathbf{S}_z=\frac{\hbar}{2}\begin{pmatrix}1&0\\0&-1\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_z
\end{equation}
where we have recalled the definition of the Pauli matrix \(\sigma_z\).

Indeed, the discussion in Qubit Mechanics I suggests that we should define spin observables for a general orientation, \(\mathbf{n}\), in space according to
\begin{equation}
\mathbf{S}_\mathbf{n}=\frac{\hbar}{2}\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation}
with corresponding eigenstates \(\ket{\mathbf{n};\pm}\). So in particular we would have
\begin{equation}
\mathbf{S}_x=\frac{\hbar}{2}\begin{pmatrix}0&1\\1&0\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_x
\end{equation}
and
\begin{equation}
\mathbf{S}_y=\frac{\hbar}{2}\begin{pmatrix}0&-i\\i&0\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_y
\end{equation}
with respective orthonormal eigenstates,
\begin{align}
\ket{x;+}&=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}=\frac{1}{\sqrt{2}}\left(\ket{z;+}+\ket{z;-}\right)\\
\ket{x;-}&=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}=\frac{1}{\sqrt{2}}\left(\ket{z;+}-\ket{z;-}\right)
\end{align}
and
\begin{align}
\ket{y;+}&=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\i\end{pmatrix}=\frac{1}{\sqrt{2}}\left(\ket{z;+}+i\ket{z;-}\right)\\
\ket{y;-}&=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\-i\end{pmatrix}=\frac{1}{\sqrt{2}}\left(\ket{z;+}-i\ket{z;-}\right).
\end{align}
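These eigenvalue relations are easy to check numerically. The following sketch (using numpy, not part of the original notes; \(\hbar\) is set to 1 for illustration) confirms that the stated columns are orthonormal eigenvectors of \(\mathbf{S}_x\) and \(\mathbf{S}_y\) with eigenvalues \(\pm\hbar/2\):

```python
# Sketch: verify the eigenvectors of S_x and S_y stated above (hbar = 1).
import numpy as np

hbar = 1.0
Sx = (hbar / 2) * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = (hbar / 2) * np.array([[0, -1j], [1j, 0]])

x_plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
x_minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
y_plus = np.array([1, 1j]) / np.sqrt(2)
y_minus = np.array([1, -1j]) / np.sqrt(2)

assert np.allclose(Sx @ x_plus, (hbar / 2) * x_plus)
assert np.allclose(Sx @ x_minus, -(hbar / 2) * x_minus)
assert np.allclose(Sy @ y_plus, (hbar / 2) * y_plus)
assert np.allclose(Sy @ y_minus, -(hbar / 2) * y_minus)
assert np.isclose(np.vdot(x_plus, x_minus), 0)   # each pair is orthonormal
assert np.isclose(np.vdot(y_plus, y_minus), 0)
```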

Stern-Gerlach Explained

We now have all the quantum mechanical machinery in place to understand the Stern-Gerlach experiments. Recall the basic setup, which we referred to as SG1. We assume that the spin state of an atom entering SG1 is in some arbitrary state \(\ket{\psi}\) in a 2-dimensional state space. The measuring device in SG1 corresponds to the observable \(S_z\) whose spectral decomposition is
\begin{equation}
S_z=\frac{\hbar}{2}\ket{z;+}\bra{z;+}-\frac{\hbar}{2}\ket{z;-}\bra{z;-}
\end{equation}
and therefore the probability of measuring a particle to be spin up is \(p(z;+)\) given by
\begin{equation}
p(z;+)=\braket{\psi|z;+}\braket{z;+|\psi}
\end{equation}
in other words it is the squared modulus of the probability amplitude \(\braket{z;+|\psi}\) for finding \(\ket{\psi}\) in the state \(\ket{z;+}\). Likewise, the probability amplitude for finding \(\ket{\psi}\) in the state \(\ket{z;-}\) is \(\braket{z;-|\psi}\) corresponding to the probability \(p(z;-)=|\braket{z;-|\psi}|^2\).

Now let us consider SG2. In this case, we retain only the atoms emerging with spin state \(\ket{z;+}\) from the initial \(S_z\) measuring device then subject these atoms to a second \(S_z\) device. In this case the amplitudes for measuring the \(z\)-component of spin up and down are respectively \(\braket{z;+|z;+}=1\) and \(\braket{z;-|z;+}=0\) so we are sure to confirm that the atom has spin state up.

If instead of passing the atoms retained from the first \(S_z\) device in SG2 into a second \(S_z\) device we pass them instead into an \(S_x\) device, as in SG3, then the relevant amplitudes are
\begin{equation}
\braket{x;+|z;+}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}+\frac{1}{\sqrt{2}}\braket{z;-|z;+}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(x\)-component of the spin to be up and
\begin{equation}
\braket{x;-|z;+}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}-\frac{1}{\sqrt{2}}\braket{z;-|z;+}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(x\)-component of the spin to be down. That is, we find that there is an equal probability of \(1/2\) for the \(x\)-component of the spin to be up or down.

In SG4 we retain the atoms measured to be spin down by an \(S_x\) device which took as input atoms measured to be spin up by an \(S_z\) device. These atoms are then passed to another \(S_z\) measuring device. The relevant amplitudes are now
\begin{equation}
\braket{z;+|x;-}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}-\frac{1}{\sqrt{2}}\braket{z;+|z;-}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(z\)-component of the spin to be up and
\begin{equation}
\braket{z;-|x;-}=\frac{1}{\sqrt{2}}\braket{z;-|z;+}-\frac{1}{\sqrt{2}}\braket{z;-|z;-}=-\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(z\)-component of the spin to be down. That is, we find that there is an equal probability of \(1/2\) for the \(z\)-component of the spin to be up or down. The quantum mechanical formalism makes it clear that there is no ‘memory’ that the atoms had previously, before the \(S_x\) measurement, been found with probability 1 to have \(z\)-component of their spin up!
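The SG3 and SG4 probabilities above can be reproduced with a few lines of linear algebra. A minimal numerical sketch (numpy, not part of the original notes):

```python
# Sketch: the SG3 and SG4 probabilities as squared moduli of amplitudes.
import numpy as np

z_plus = np.array([1, 0], dtype=complex)
z_minus = np.array([0, 1], dtype=complex)
x_plus = (z_plus + z_minus) / np.sqrt(2)
x_minus = (z_plus - z_minus) / np.sqrt(2)

# SG3: atoms prepared in |z;+>, x-component of spin measured
p_x_plus = abs(np.vdot(x_plus, z_plus)) ** 2
p_x_minus = abs(np.vdot(x_minus, z_plus)) ** 2
# SG4: atoms prepared in |x;->, z-component measured again
p_z_plus = abs(np.vdot(z_plus, x_minus)) ** 2
p_z_minus = abs(np.vdot(z_minus, x_minus)) ** 2

assert np.isclose(p_x_plus, 0.5) and np.isclose(p_x_minus, 0.5)
assert np.isclose(p_z_plus, 0.5) and np.isclose(p_z_minus, 0.5)  # no 'memory'
```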

Qubit Mechanics I

Two-level systems, quantum mechanical systems whose state space is \(\CC^2\), are relatively simple yet still rich enough to exhibit most of the peculiarities of the quantum world. Moreover, they are physically important – we will consider nuclear magnetic resonance and the ammonia maser as examples.

Throughout we will assume, unless explicitly stated otherwise, that the column vectors and matrices representing state vectors and observables respectively are with respect to the standard basis of \(\CC^2\).

Properties of Pauli matrices

It’s straightforward to verify that the three Pauli matrices,
\begin{equation}
\boldsymbol{\sigma}_1=\begin{pmatrix}0&1\\1&0\end{pmatrix}\qquad\boldsymbol{\sigma}_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\qquad\boldsymbol{\sigma}_3=\begin{pmatrix}1&0\\0&-1\end{pmatrix},
\end{equation}
each square to the identity,
\begin{equation}
\boldsymbol{\sigma}_i^2=\mathbf{I},\qquad i=1,2,3
\end{equation}
and that they are all traceless,
\begin{equation}
\tr\boldsymbol{\sigma}_i=0,\qquad i=1,2,3.
\end{equation}
From these two facts it follows that each Pauli matrix has two eigenvalues \(\pm1\). We can compute the commutators and find
\begin{equation}
[\boldsymbol{\sigma}_i,\boldsymbol{\sigma}_j]=2i\epsilon_{ijk}\boldsymbol{\sigma}_k.
\end{equation}
Likewise the anti-commutators are
\begin{equation}
\{\boldsymbol{\sigma}_i,\boldsymbol{\sigma}_j\}=2\delta_{ij}\mathbf{I},
\end{equation}
and since the product of any pair of operators is one-half the sum of the anti-commutator and the commutator we have
\begin{equation}
\boldsymbol{\sigma}_i\boldsymbol{\sigma}_j=\delta_{ij}\mathbf{I}+i\epsilon_{ijk}\boldsymbol{\sigma}_k.
\end{equation}
A simple consequence of this is the rather useful relation,
\begin{equation}
(\mathbf{u}\cdot\boldsymbol{\sigma})(\mathbf{v}\cdot\boldsymbol{\sigma})=(\mathbf{u}\cdot\mathbf{v})\mathbf{I}+i(\mathbf{u}\times\mathbf{v})\cdot\boldsymbol{\sigma}.
\end{equation}
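This product identity is easy to spot-check numerically for random real vectors (a sketch using numpy, not part of the original notes):

```python
# Sketch: check (u.sigma)(v.sigma) = (u.v) I + i (u x v).sigma numerically.
import numpy as np

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def dot_sigma(w):
    # w . sigma as a 2x2 matrix
    return np.einsum('i,ijk->jk', np.asarray(w, dtype=complex), sigma)

rng = np.random.default_rng(0)
u, v = rng.standard_normal(3), rng.standard_normal(3)

lhs = dot_sigma(u) @ dot_sigma(v)
rhs = np.dot(u, v) * np.eye(2) + 1j * dot_sigma(np.cross(u, v))
assert np.allclose(lhs, rhs)
```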

Hermitian and Unitary Operators on \(\CC^2\)

Any \(2\times2\) matrix \(\mathbf{M}\) representing a linear operator on \(\CC^2\) can be expressed as a linear combination of the identity matrix and the three Pauli matrices,
\begin{equation}
\mathbf{M}=m_0\mathbf{I}+\mathbf{m}\cdot\boldsymbol{\sigma}
\end{equation}
where \(m_0\) and the components \(m_1,m_2,m_3\) of the vector \(\mathbf{m}\) are complex numbers and \(\boldsymbol{\sigma}\) is the vector with components the Pauli matrices, \(\boldsymbol{\sigma}=(\boldsymbol{\sigma}_1,\boldsymbol{\sigma}_2,\boldsymbol{\sigma}_3)\). It follows that
\begin{equation}
m_0=\frac{1}{2}\tr\mathbf{M},\quad m_i=\frac{1}{2}\tr(\mathbf{M}\boldsymbol{\sigma}_i).
\end{equation}
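These trace formulas can be verified for an arbitrary complex \(2\times2\) matrix (a numerical sketch, not part of the original notes):

```python
# Sketch: recover m_0 and m_i from the trace formulas and rebuild M.
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

m0 = 0.5 * np.trace(M)
m = [0.5 * np.trace(M @ s) for s in sigma]
M_rebuilt = m0 * np.eye(2) + sum(mi * s for mi, s in zip(m, sigma))
assert np.allclose(M, M_rebuilt)
```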

The condition for a matrix \(\mathbf{Q}\) to be Hermitian is that \(\mathbf{Q}^\dagger=\mathbf{Q}\). Thus we must have
\begin{equation}
\mathbf{Q}^\dagger=q_0^*\mathbf{I}+\mathbf{q}^*\cdot\boldsymbol{\sigma}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}=\mathbf{Q},
\end{equation}
where \(\mathbf{q}^*\) indicates the vector whose components are the complex conjugate of the vector \(\mathbf{q}\). It follows immediately that any Hermitian operator, that is, any qubit observable \(Q\), can be represented by a matrix
\begin{equation}
\mathbf{Q}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}
\end{equation}
where \(q_0\) and the components of the vector \(\mathbf{q}\) are all real.

The condition for a matrix \(\mathbf{U}\) to be unitary is \(\mathbf{U}^\dagger\mathbf{U}=\mathbf{U}\mathbf{U}^\dagger=\mathbf{I}\).

Theorem Any qubit unitary transformation \(U\) can, up to a choice of phase, be represented by a matrix \(\mathbf{U}\) given by
\begin{equation}
\mathbf{U}=\exp{(-i\theta\mathbf{n}\cdot\boldsymbol{\sigma})}
\end{equation}
where \(\theta\) and \(\mathbf{n}\) can be interpreted respectively as an angle and a unit vector in 3-dimensional space.

Proof We begin with the general form
\begin{equation}
\mathbf{U}=u_0\mathbf{I}+\mathbf{u}\cdot\boldsymbol{\sigma}
\end{equation}
in which \(u_0\) and \(\mathbf{u}\) are an arbitrary complex number and complex valued vector respectively. We will impose the condition \(\mathbf{U}^\dagger\mathbf{U}=\mathbf{I}\) but observe that this condition leaves an overall choice of phase unconstrained. Using this flexibility we can take \(u_0\) to be real, and then
\begin{align*}
\mathbf{U}^\dagger\mathbf{U}&=(u_0\mathbf{I}+\mathbf{u}^*\cdot\boldsymbol{\sigma})(u_0\mathbf{I}+\mathbf{u}\cdot\boldsymbol{\sigma})\\
&=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+(\mathbf{u}^*\cdot\boldsymbol{\sigma})(\mathbf{u}\cdot\boldsymbol{\sigma})\\
&=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+\mathbf{u}\cdot\mathbf{u}^*\mathbf{I}-i(\mathbf{u}\times\mathbf{u}^*)\cdot\boldsymbol{\sigma}=\mathbf{I}.
\end{align*}
Similarly, we have,
\begin{equation*}
\mathbf{U}\mathbf{U}^\dagger=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+\mathbf{u}\cdot\mathbf{u}^*\mathbf{I}+i(\mathbf{u}\times\mathbf{u}^*)\cdot\boldsymbol{\sigma}=\mathbf{I}.
\end{equation*}
This means that we must have
\begin{equation}
\mathbf{u}\times\mathbf{u}^*=0,\label{eq:condition one}
\end{equation}
\begin{equation}
u_0\Real\mathbf{u}=0\label{eq:condition two}
\end{equation}
and
\begin{equation}
u_0^2+\mathbf{u}^*\cdot\mathbf{u}=1.\label{eq:condition three}
\end{equation}
Equation \eqref{eq:condition one} implies that
\begin{equation}
(\Real\mathbf{u}+i\Imag\mathbf{u})\times(\Real\mathbf{u}-i\Imag\mathbf{u})=-2i\Real\mathbf{u}\times\Imag\mathbf{u}=0
\end{equation}
that is, \(\Real\mathbf{u}\parallel\Imag\mathbf{u}\) so that we must be able to write \(\mathbf{u}=\alpha\mathbf{v}\) for some complex number \(\alpha\) and real vector \(\mathbf{v}\). The second condition, \eqref{eq:condition two}, tells us that either \(u_0=0\) or \(\mathbf{u}\) is pure imaginary. Together, then, \eqref{eq:condition one} and \eqref{eq:condition two} imply that either \(\mathbf{u}=i\mathbf{v}\) or \(u_0=0\) and \(\mathbf{u}=\alpha\mathbf{v}\). In the latter case \eqref{eq:condition three} then implies that \(|\alpha|^2|\mathbf{v}|^2=1\) so we can write
\begin{align*}
\mathbf{U}&=\alpha\mathbf{v}\cdot\boldsymbol{\sigma}\\
&=e^{i\phi}|\alpha||\mathbf{v}|\frac{\mathbf{v}}{|\mathbf{v}|}\cdot\boldsymbol{\sigma}
\end{align*}
which up to a choice of phase has the form
\begin{equation}
\mathbf{U}=i\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation}
for some real unit vector \(\mathbf{n}\).
In the case that \(u_0\neq0\), since
\begin{equation}
u_0^2+\mathbf{v}\cdot\mathbf{v}=1
\end{equation}
we can write \(u_0=\cos\theta\) and \(\mathbf{v}=-\sin\theta\mathbf{n}\) for some angle \(\theta\) and a (real) unit vector \(\mathbf{n}\), that is, in either case, we have that up to an overall phase,
\begin{equation}
\mathbf{U}=\cos\theta\mathbf{I}-i\sin\theta\mathbf{n}\cdot\boldsymbol{\sigma}.
\end{equation}
Finally, observing that \((\mathbf{n}\cdot\boldsymbol{\sigma})^2=\mathbf{I}\) the desired matrix exponential can be written as
\begin{align*}
\exp{(-i\theta\mathbf{n}\cdot\boldsymbol{\sigma})}&=\mathbf{I}-i\theta\mathbf{n}\cdot\boldsymbol{\sigma}+\frac{i^2\theta^2}{2!}(\mathbf{n}\cdot\boldsymbol{\sigma})^2-\frac{i^3\theta^3}{3!}(\mathbf{n}\cdot\boldsymbol{\sigma})^3+\dots\\
&=\left(1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}+\dots\right)\mathbf{I}-i\left(\theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}+\dots\right)\mathbf{n}\cdot\boldsymbol{\sigma}\\
&=\cos\theta\mathbf{I}-i\sin\theta\mathbf{n}\cdot\boldsymbol{\sigma}
\end{align*}\(\blacksquare\)
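The closed form just derived is simple to confirm numerically against a general-purpose matrix exponential (a sketch using numpy and scipy, not part of the original notes; the particular \(\theta\) and \(\mathbf{n}\) are arbitrary choices):

```python
# Sketch: exp(-i theta n.sigma) = cos(theta) I - i sin(theta) n.sigma.
import numpy as np
from scipy.linalg import expm

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

theta = 0.7
n = np.array([1.0, 2.0, 2.0]) / 3.0           # a unit vector, |n| = 1
n_sigma = np.einsum('i,ijk->jk', n, sigma)    # n . sigma

U_exp = expm(-1j * theta * n_sigma)
U_closed = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * n_sigma

assert np.allclose(U_exp, U_closed)
assert np.allclose(U_closed.conj().T @ U_closed, np.eye(2))  # unitary, as claimed
```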

Unitary Operators on \(\CC^2\) and Rotations in \(\RR^3\)

The \(2\times2\) unitary matrices form a group under the usual matrix multiplication. This is the Lie group \(U(2)\). The \(2\times2\) unitary matrices with determinant 1 form the special unitary group \(SU(2)\). The previous theorem tells us that a general element of \(SU(2)\), which we’ll denote \(\mathbf{U}(\mathbf{n},\theta)\), can be written as
\begin{equation}
\mathbf{U}(\mathbf{n},\theta)=\exp{\left(-i\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)}.
\end{equation}
That is, the phase has been chosen such that \(\det\mathbf{U}(\mathbf{n},\theta)=1\).

The group of rotations in 3-dimensional space, denoted \(SO(3)\), consists of all \(3\times3\) orthogonal matrices with determinant 1. The reason for choosing the half-angle \(\theta/2\) is made clear in the following result.

Theorem There is a 2-to-1 group homomorphism from \(SU(2)\) to \(SO(3)\), \(\mathbf{U}(\mathbf{n},\theta)\mapsto\mathbf{R}(\mathbf{n},\theta)\) where \(\mathbf{R}(\mathbf{n},\theta)\) is the rotation through an angle \(\theta\) about the axis \(\mathbf{n}\).

Proof If we denote by \(H\) the set of traceless, Hermitian, \(2\times2\) matrices then it is not difficult to see that this is a real 3-dimensional vector space for which the Pauli matrices are a basis. The map \(f:\RR^3\rightarrow H\) given by \(f(\mathbf{v})=\mathbf{v}\cdot\boldsymbol{\sigma}\) is then an isomorphism of vector spaces. Defining an inner product on \(H\) according to
\begin{equation}
(\mathbf{M},\mathbf{N})_H=\frac{1}{2}\tr(\mathbf{M}\mathbf{N})
\end{equation}
this isomorphism becomes an isometry of vector spaces since,
\begin{equation*}
(f(\mathbf{v}),f(\mathbf{w}))_H=\frac{1}{2}\tr((\mathbf{v}\cdot\boldsymbol{\sigma})(\mathbf{w}\cdot\boldsymbol{\sigma}))=\mathbf{v}\cdot\mathbf{w}=(\mathbf{v},\mathbf{w})_{\RR^3},
\end{equation*}
that is, \(H\) and \(\RR^3\) are isometric. Now to any \(2\times2\) unitary matrix \(\mathbf{U}\) we can associate a linear operator \(T_\mathbf{U}\) on \(H\) such that \(T_\mathbf{U}\mathbf{M}=\mathbf{U}\mathbf{M}\mathbf{U}^\dagger\). This is clearly an isometry on \(H\) and so the corresponding linear operator on \(\RR^3\), \(\mathbf{R}_\mathbf{U}\equiv f^{-1}\circ T_\mathbf{U}\circ f\), is such that
\begin{equation}
(\mathbf{R}_\mathbf{U}\mathbf{v})\cdot\boldsymbol{\sigma}=\mathbf{U}(\mathbf{v}\cdot\boldsymbol{\sigma})\mathbf{U}^\dagger\label{eq:rotation from unitary}
\end{equation}
for any \(\mathbf{v}\in\RR^3\) and must be an isometry, that is, an orthogonal operator. In fact, since
\begin{align*}
\tr\left((\mathbf{R}_\mathbf{U}\mathbf{e}_1\cdot\boldsymbol{\sigma})(\mathbf{R}_\mathbf{U}\mathbf{e}_2\cdot\boldsymbol{\sigma})
(\mathbf{R}_\mathbf{U}\mathbf{e}_3\cdot\boldsymbol{\sigma})\right)&=\tr\left((R_\mathbf{U})_1^i(R_\mathbf{U})_2^j(R_\mathbf{U})_3^k\boldsymbol{\sigma}_i\boldsymbol{\sigma}_j\boldsymbol{\sigma}_k\right)\\
&=2i\epsilon_{ijk}(R_\mathbf{U})_1^i(R_\mathbf{U})_2^j(R_\mathbf{U})_3^k\\
&=2i\det{\mathbf{R}_\mathbf{U}}
\end{align*}
and also
\begin{align*}
\tr\left((\mathbf{R}_\mathbf{U}\mathbf{e}_1\cdot\boldsymbol{\sigma})(\mathbf{R}_\mathbf{U}\mathbf{e}_2\cdot\boldsymbol{\sigma})
(\mathbf{R}_\mathbf{U}\mathbf{e}_3\cdot\boldsymbol{\sigma})\right)&=\tr\left(\mathbf{U}\boldsymbol{\sigma}_1\mathbf{U}^\dagger\mathbf{U}\boldsymbol{\sigma}_2\mathbf{U}^\dagger\mathbf{U}\boldsymbol{\sigma}_3\mathbf{U}^\dagger\right)\\
&=\tr(\boldsymbol{\sigma}_1\boldsymbol{\sigma}_2\boldsymbol{\sigma}_3)\\
&=2i
\end{align*}
we see that \(\mathbf{R}_\mathbf{U}\in SO(3)\). Also we observe that given two unitary matrices \(\mathbf{U}_1\) and \(\mathbf{U}_2\),
\begin{align*}
\mathbf{R}_{\mathbf{U}_1\mathbf{U}_2}\mathbf{v}\cdot\boldsymbol{\sigma}&=(\mathbf{U}_1\mathbf{U}_2)\mathbf{v}\cdot\boldsymbol{\sigma}(\mathbf{U}_1\mathbf{U}_2)^\dagger\\
&=\mathbf{U}_1\left(\mathbf{R}_{\mathbf{U}_2}\mathbf{v}\cdot\boldsymbol{\sigma}\right)\mathbf{U}_1^\dagger\\
&=(\mathbf{R}_{\mathbf{U}_1}\mathbf{R}_{\mathbf{U}_2}\mathbf{v})\cdot\boldsymbol{\sigma}
\end{align*}
So defining the map \(\Phi:SU(2)\rightarrow SO(3)\) such that
\begin{equation}
\Phi(\mathbf{U}(\mathbf{n},\theta))=\mathbf{R}_\mathbf{U},
\end{equation}
we have a group homomorphism. The kernel of this map consists of unitary matrices \(\mathbf{U}(\mathbf{n},\theta)\) such that
\begin{equation*}
\mathbf{U}(\mathbf{n},\theta)(\mathbf{v}\cdot\boldsymbol{\sigma})=(\mathbf{v}\cdot\boldsymbol{\sigma})\mathbf{U}(\mathbf{n},\theta)
\end{equation*}
for any vector \(\mathbf{v}\). It follows that \(\mathbf{U}(\mathbf{n},\theta)\) must be a multiple of the identity matrix and since \(\det\mathbf{U}(\mathbf{n},\theta)=1\) it can only be \(\pm\mathbf{I}\). Thus, \(\ker\Phi=\{\mathbf{I},-\mathbf{I}\}\) and so the homomorphism is 2-to-1. To confirm the nature of the spatial rotation corresponding to \(\mathbf{U}(\mathbf{n},\theta)\), choose \(\mathbf{v}=\mathbf{n}\) in \eqref{eq:rotation from unitary} to see that \(\mathbf{R}_\mathbf{U}\mathbf{n}=\mathbf{n}\) so that \(\mathbf{R}_\mathbf{U}\) is a rotation about the axis \(\mathbf{n}\). To determine the angle \(\gamma\) of rotation we note that if \(\mathbf{m}\) is a unit vector perpendicular to \(\mathbf{n}\) then \(\cos\gamma=(\mathbf{R}_\mathbf{U}\mathbf{m})\cdot\mathbf{m}\) and we have
\begin{align*}
\cos\gamma&=(\mathbf{R}_\mathbf{U}\mathbf{m})\cdot\mathbf{m}\\
&=\frac{1}{2}\tr\left(\left(\cos\frac{\theta}{2}\mathbf{I}-i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)(\mathbf{m}\cdot\boldsymbol{\sigma})\right.\\
&\qquad\times\left.\left(\cos\frac{\theta}{2}\mathbf{I}+i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)(\mathbf{m}\cdot\boldsymbol{\sigma})\right)\\
&=\frac{1}{2}\tr\left(\left(\cos\frac{\theta}{2}\mathbf{m}\cdot\boldsymbol{\sigma}-\sin\frac{\theta}{2}(\mathbf{m}\times\mathbf{n})\cdot\boldsymbol{\sigma}\right)\right.\\
&\qquad\times\left.\left(\cos\frac{\theta}{2}\mathbf{m}\cdot\boldsymbol{\sigma}+\sin\frac{\theta}{2}(\mathbf{m}\times\mathbf{n})\cdot\boldsymbol{\sigma}\right)\right)\\
&=\cos^2\frac{\theta}{2}-\sin^2\frac{\theta}{2}\\
&=\cos\theta
\end{align*}
so that the unitary operator \(\mathbf{U}(\mathbf{n},\theta)\) corresponds to a spatial rotation about the axis \(\mathbf{n}\) through an angle \(\theta\). We therefore denote the rotation \(\mathbf{R}(\mathbf{n},\theta)\).\(\blacksquare\)
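The whole construction can be checked numerically: build \(\mathbf{R}_\mathbf{U}\) from \(\mathbf{U}(\mathbf{n},\theta)\) via \((\mathbf{R}_\mathbf{U})_{ij}=\frac{1}{2}\tr(\boldsymbol{\sigma}_i\mathbf{U}\boldsymbol{\sigma}_j\mathbf{U}^\dagger)\), which follows from \(\mathbf{U}\boldsymbol{\sigma}_j\mathbf{U}^\dagger=\sum_i(\mathbf{R}_\mathbf{U})_{ij}\boldsymbol{\sigma}_i\), and confirm it is a rotation by \(\theta\) about \(\mathbf{n}\) (a sketch using numpy and scipy, not part of the original notes):

```python
# Sketch: the rotation R_U induced by U(n, theta) in SU(2).
import numpy as np
from scipy.linalg import expm

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

theta = 1.2
n = np.array([0.0, 0.6, 0.8])                 # unit axis
U = expm(-0.5j * theta * np.einsum('i,ijk->jk', n, sigma))

# (R_U)_ij = (1/2) tr(sigma_i U sigma_j U^dagger)
R = np.array([[0.5 * np.trace(sigma[i] @ U @ sigma[j] @ U.conj().T).real
               for j in range(3)] for i in range(3)])

assert np.allclose(R.T @ R, np.eye(3))                 # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)               # in SO(3)
assert np.allclose(R @ n, n)                           # the axis n is fixed
assert np.isclose(np.trace(R), 1 + 2 * np.cos(theta))  # rotation angle is theta
```

Note that replacing \(U\) by \(-U\) leaves \(R\) unchanged, which is the 2-to-1 feature of the homomorphism.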

 

The Bloch sphere revisited

We have seen that any qubit observable, \(Q\), can be represented as a matrix

\begin{equation*}
\mathbf{Q}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}
\end{equation*}

where \(q_0\in\RR\) and \(\mathbf{q}\in\RR^3\). Recall that, on the Bloch sphere, a general qubit state, \(\ket{\psi}\), is given by

\begin{equation*}
\ket{\psi}=\cos(\theta/2)\ket{0}+e^{i\phi}\sin(\theta/2)\ket{1}.
\end{equation*}

It can be useful to denote this state vector by \(\ket{\mathbf{n};+}\), where \(\mathbf{n}\) is the unit vector with polar coordinates \((1,\theta,\phi)\), that is,
\begin{equation*}
\ket{\mathbf{n};+}=\cos(\theta/2)\ket{0}+e^{i\phi}\sin(\theta/2)\ket{1}
\end{equation*}
where
\begin{equation*}
\mathbf{n}=(\sin\theta\cos\phi,\sin\theta\sin\phi,\cos\theta)
\end{equation*}
and
\begin{equation*}
\ket{\mathbf{n};-}=\sin(\theta/2)\ket{0}-e^{i\phi}\cos(\theta/2)\ket{1}
\end{equation*}
corresponding to the antipodal point on the Bloch sphere (\(\theta\mapsto\pi-\theta\) and \(\phi\mapsto\pi+\phi\)). Indeed, \(\ket{\mathbf{n};\pm}\) are precisely the eigenvectors of the observable \(\mathbf{n}\cdot\boldsymbol{\sigma}\),
\begin{equation}
(\mathbf{n}\cdot\boldsymbol{\sigma})\ket{\mathbf{n};\pm}=\pm\ket{\mathbf{n};\pm}.
\end{equation}
For example,
\begin{align*}
(\mathbf{n}\cdot\boldsymbol{\sigma})\ket{\mathbf{n};+}&=\begin{pmatrix}\cos\theta&e^{-i\phi}\sin\theta\\ e^{i\phi}\sin\theta&-\cos\theta\end{pmatrix}\begin{pmatrix}\cos\frac{\theta}{2}\\e^{i\phi}\sin\frac{\theta}{2}\end{pmatrix}\\
&=\begin{pmatrix}\cos\theta\cos\frac{\theta}{2}+\sin\theta\sin\frac{\theta}{2}\\
e^{i\phi}(\sin\theta\cos\frac{\theta}{2}-\cos\theta\sin\frac{\theta}{2})\end{pmatrix}\\
&=\begin{pmatrix}\cos\frac{\theta}{2}\\e^{i\phi}\sin\frac{\theta}{2}\end{pmatrix}=\ket{\mathbf{n};+}
\end{align*}
Note that \(\braket{\mathbf{n};+|\mathbf{n};-}=0\), that is, \(\ket{\mathbf{n};+}\) and \(\ket{\mathbf{n};-}\) are orthogonal as state vectors in the Hilbert space \(\CC^2\), even though \(\mathbf{n}\) and \(-\mathbf{n}\) are certainly not orthogonal vectors in \(\RR^3\)!
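These eigenvector claims can be confirmed numerically (a sketch using numpy, not part of the original notes; the angles \(\theta,\phi\) are arbitrary sample values):

```python
# Sketch: |n;+> and |n;-> are the +1/-1 eigenvectors of n.sigma.
import numpy as np

theta, phi = 0.9, 2.3
n_sigma = np.array([[np.cos(theta), np.sin(theta) * np.exp(-1j * phi)],
                    [np.sin(theta) * np.exp(1j * phi), -np.cos(theta)]])

ket_plus = np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])
ket_minus = np.array([np.sin(theta / 2),
                      -np.exp(1j * phi) * np.cos(theta / 2)])

assert np.allclose(n_sigma @ ket_plus, ket_plus)      # eigenvalue +1
assert np.allclose(n_sigma @ ket_minus, -ket_minus)   # eigenvalue -1
assert np.isclose(np.vdot(ket_plus, ket_minus), 0)    # orthogonal in C^2
```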

We’ve seen that there is a 2-to-1 homomorphism from \(SU(2)\) to \(SO(3)\) such that \(\mathbf{U}(\mathbf{n},\theta)\mapsto\mathbf{R}(\mathbf{n},\theta)\) where
\begin{equation*}
\mathbf{U}(\mathbf{n},\theta)=\exp\left(-i\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)=\cos\frac{\theta}{2}\mathbf{I}-i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation*}
and the rotation \(\mathbf{R}(\mathbf{n},\theta)\) is such that
\begin{equation*}
(\mathbf{R}(\mathbf{n},\theta)\mathbf{v})\cdot\boldsymbol{\sigma}=\mathbf{U}(\mathbf{n},\theta)\mathbf{v}\cdot\boldsymbol{\sigma}\mathbf{U}(\mathbf{n},\theta)^\dagger,
\end{equation*}
which we confirmed was a rotation of \(\theta\) about the axis \(\mathbf{n}\). This means that for an arbitrary vector \(\mathbf{v}\in\RR^3\),
\begin{equation}
\mathbf{R}(\mathbf{n},\theta)\mathbf{v}=\cos\theta\mathbf{v}+(1-\cos\theta)(\mathbf{v}\cdot\mathbf{n})\mathbf{n}+\sin\theta\mathbf{n}\times\mathbf{v}.
\end{equation}
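This Rodrigues-type formula can be checked against the defining conjugation \((\mathbf{R}\mathbf{v})\cdot\boldsymbol{\sigma}=\mathbf{U}(\mathbf{v}\cdot\boldsymbol{\sigma})\mathbf{U}^\dagger\) (a sketch using numpy and scipy, not part of the original notes; the axis, angle and test vector are arbitrary choices):

```python
# Sketch: compare R(n, theta) v from the formula with the SU(2) conjugation.
import numpy as np
from scipy.linalg import expm

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def dot_sigma(w):
    return np.einsum('i,ijk->jk', np.asarray(w, dtype=complex), sigma)

theta = 0.8
n = np.array([2.0, -1.0, 2.0]) / 3.0          # unit axis
v = np.array([0.3, -1.2, 0.5])                # an arbitrary vector

U = expm(-0.5j * theta * dot_sigma(n))
conj = U @ dot_sigma(v) @ U.conj().T
Rv_conj = np.array([0.5 * np.trace(sigma[i] @ conj).real for i in range(3)])

Rv_formula = (np.cos(theta) * v
              + (1 - np.cos(theta)) * np.dot(v, n) * n
              + np.sin(theta) * np.cross(n, v))
assert np.allclose(Rv_conj, Rv_formula)
```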

In terms of the state vector notation \(\ket{\mathbf{n};\pm}\) relating unit vectors in \(\RR^3\) to states on the Bloch sphere we have that
\begin{equation}
\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}=\ket{\mathbf{R}(\mathbf{m},\alpha)\mathbf{n};+}
\end{equation}
since
\begin{align*}
\left((\mathbf{R}(\mathbf{m},\alpha)\mathbf{n})\cdot\boldsymbol{\sigma}\right)\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}&=\mathbf{U}(\mathbf{m},\alpha)(\mathbf{n}\cdot\boldsymbol{\sigma})\ket{\mathbf{n};+}\\
&=\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}
\end{align*}

Thus, as we would have anticipated, the unitary operator \(\mathbf{U}(\mathbf{m},\alpha)\) rotates the Bloch sphere state \(\ket{\mathbf{n};+}\) by an angle \(\alpha\) around the axis \(\mathbf{m}\).
 

Basic Postulates and Mathematical Framework

Quantum mechanics plays out in the mathematical context of Hilbert spaces. These may be finite or infinite dimensional. A finite dimensional Hilbert space is in fact nothing other than a complex vector space equipped with an Hermitian inner product. In infinite dimensions the space needs some extra technical attributes but for the time being our focus will be on finite dimensional spaces. We’ll state a series of postulates in terms of (general) Hilbert spaces safe in the knowledge that they will require little or no modification when we come to consider infinite dimensions.

State vectors

Postulate [State vectors in state space] Everything that can be said about the state of a physical system is encoded in a mathematical object called a state vector belonging to a Hilbert space also called a state space. Commonly used notation for such a vector is \(\ket{\psi}\). Conversely every non-zero vector \(\ket{\psi}\) in the Hilbert space corresponds to (everything that can be said about) a possible state of the system.

In fact, any non-zero multiple of a vector \(\ket{\psi}\) contains precisely the same information about a given state of the system and so most often we restrict attention to normalised states, that is, those of unit length, \(\braket{\psi|\psi}=1\). But normalisation only fixes state vectors up to a phase factor and the equivalence class in state space of all normalised vectors differing only by a phase is called a ray. Thus, to be precise, we say that physical states are in one-to-one correspondence with rays in state space.

Remarkable richness and physical relevance is already found in a 2-dimensional state space, the inhabitants of which are called qubits (their \(n\)-dimensional counterparts are called qudits). Mathematically, such a state space is just \(\CC^2\) and the standard basis would be provided by the pair of vectors
\begin{equation*}
\begin{pmatrix}1\\0\end{pmatrix}\qquad\qquad\begin{pmatrix}0\\1\end{pmatrix}
\end{equation*}
In quantum information contexts these are typically denoted by \(\ket{0}\) and \(\ket{1}\) respectively,
\begin{equation}
\ket{0}=\begin{pmatrix}1\\0\end{pmatrix}\qquad\qquad\ket{1}=\begin{pmatrix}0\\1\end{pmatrix}
\end{equation}
An arbitrary state vector in this state space would have the form,
\begin{equation}
\ket{\psi}=a\ket{0}+b\ket{1}
\end{equation}
with \(a,b\in\CC\) and normalisation requiring that \(|a|^2+|b|^2=1\). This means that we could write a general state vector as
\begin{equation*}
\ket{\psi}=e^{i\gamma}\left(\cos(\theta/2)\ket{0}+e^{i\varphi}\sin(\theta/2)\ket{1}\right)
\end{equation*}
but for different \(\gamma\) values these are just all the members of the same ray and so we can write the most general state as
\begin{equation}
\ket{\psi}=\cos(\theta/2)\ket{0}+e^{i\varphi}\sin(\theta/2)\ket{1}.
\end{equation}
Given the assumption of unit length, we can represent qubits as points on a unit sphere, called the Bloch sphere.

In the Bloch sphere diagram we have illustrated an arbitrary qubit, \(\ket{\psi}\), as well as another possible pair of basis vectors,
\begin{equation}
\ket{+}\equiv\frac{1}{\sqrt{2}}\left(\ket{0}+\ket{1}\right)
\end{equation}
and
\begin{equation}
\ket{-}\equiv\frac{1}{\sqrt{2}}\left(\ket{0}-\ket{1}\right).
\end{equation}

Observables

Physical quantities of a quantum mechanical system which can be measured, such as position or momentum, are called observables.

Postulate [Observables] The observables of a quantum mechanical system corresponding to a state space, \(\mathcal{H}\), are represented by self-adjoint (Hermitian) operators on \(\mathcal{H}\).

In two dimensions, \(\mathcal{H}=\CC^2\), the Pauli matrices,

\begin{equation*}
\boldsymbol{\sigma}_1=\begin{pmatrix}0&1\\1&0\end{pmatrix}\qquad\boldsymbol{\sigma}_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\qquad\boldsymbol{\sigma}_3=\begin{pmatrix}1&0\\0&-1\end{pmatrix},
\end{equation*}

are examples of (qubit) observables. In due course we’ll see their relationship to spin.

Recall that the eigenvalues of Hermitian operators are real, that eigenvectors with distinct eigenvalues are orthogonal and that there exists an orthonormal basis of eigenvectors of such operators. If the state space, \(\mathcal{H}\), has dimension \(d\) then a quantum mechanical observable, \(O\), may have \(r\leq d\) distinct eigenvalues \(\lambda_i\), each with geometric multiplicity \(d_i\) such that \(\sum_{i=1}^rd_i=d\). Denoting the corresponding orthonormal basis of eigenvectors, \(\ket{i,j}\), with \(i=1,\dots,r\) and \(j=1,\dots,d_i\), then,
\begin{equation*}
O\ket{i,j}=\lambda_i\ket{i,j},\quad i=1,\dots,r,\; j=1,\dots,d_i.
\end{equation*}
If we denote by \(P_{\lambda_i}\) the projector onto the eigenspace, \(V_{\lambda_i}\), so that
\begin{equation*}
P_{\lambda_i}=\sum_{j=1}^{d_i}\ket{i,j}\bra{i,j},
\end{equation*}
then \(O\) has the spectral decomposition,
\begin{equation*}
O=\sum_{i=1}^r \lambda_iP_{\lambda_i}.
\end{equation*}
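The spectral decomposition is easy to verify numerically for a random Hermitian observable (a sketch using numpy, not part of the original notes):

```python
# Sketch: rebuild a Hermitian observable from O = sum_i lambda_i P_i.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
O = (A + A.conj().T) / 2                      # Hermitian by construction

eigvals, eigvecs = np.linalg.eigh(O)          # real eigenvalues, orthonormal columns
O_rebuilt = sum(lam * np.outer(vec, vec.conj())   # rank-1 projectors |i><i|
                for lam, vec in zip(eigvals, eigvecs.T))

assert np.allclose(O, O_rebuilt)
assert np.allclose(eigvecs.conj().T @ eigvecs, np.eye(4))  # orthonormal eigenbasis
```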

Time Development of State Vectors

Leaving aside for the moment the question of how we extract some meaningful information from these state vectors, the next postulate deals with the question of how state vectors change in time. For this we restrict attention to closed systems, that is, (idealised) systems isolated from their environment.

Postulate [Unitary time evolution] If the state of a closed system at time \(t_1\) is represented by a state vector \(\ket{\psi(t_1)}\) then at a later time \(t_2\) the state is represented by a state vector \(\ket{\psi(t_2)}\) related to \(\ket{\psi(t_1)}\) by a unitary operator \(U(t_2,t_1)\) such that
\begin{equation}
\ket{\psi(t_2)}=U(t_2,t_1)\ket{\psi(t_1)}
\end{equation}
The unitary operator \(U(t_2,t_1)\) is a property of the given physical system and describes the time evolution of any possible state of the system from time \(t_1\) to time \(t_2\).

Since \(U(t_2,t_1)\) is unitary we have that
\begin{equation*}
U^\dagger(t_2,t_1)U(t_2,t_1)=\id
\end{equation*}
but of course \(U(t,t)=\id\) and if \(t_1{<}t{<}t_2\) then \(U(t_2,t_1)=U(t_2,t)U(t,t_1)\) so we must have
\begin{equation*}
U^\dagger(t_2,t_1)=U(t_1,t_2).
\end{equation*}

Starting from some fixed time \(t_0\) let us consider the time development of a state \(\ket{\psi(t_0)}\) to some later time \(t\),
\begin{equation*}
\ket{\psi(t)}=U(t,t_0)\ket{\psi(t_0)}.
\end{equation*}
Differentiating this with respect to \(t\), and using \(\ket{\psi(t_0)}=U^\dagger(t,t_0)\ket{\psi(t)}\), we obtain
\begin{equation*}
\frac{\partial}{\partial t}\ket{\psi(t)}=\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0)\ket{\psi(t)},
\end{equation*}
or, defining
\begin{equation*}
\Lambda(t,t_0)\equiv\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0),
\end{equation*}
we have
\begin{equation*}
\frac{\partial}{\partial t}\ket{\psi(t)}=\Lambda(t,t_0)\ket{\psi(t)}.
\end{equation*}
The operator \(\Lambda\) is actually independent of \(t_0\) since,
\begin{align*}
\Lambda(t,t_0)&=\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0)\\
&=\frac{\partial U(t,t_0)}{\partial t}U(t_0,t_1)U^\dagger(t_0,t_1)U^\dagger(t,t_0)\\
&=\frac{\partial\left(U(t,t_0)U(t_0,t_1)\right)}{\partial t}\left(U(t,t_0)U(t_0,t_1)\right)^\dagger\\
&=\frac{\partial U(t,t_1)}{\partial t}U^\dagger(t,t_1)\\
&=\Lambda(t,t_1)
\end{align*}
where we have used \(U(t,t_0)U(t_0,t_1)=U(t,t_1)\) and the fact that \(U(t_0,t_1)\) is independent of \(t\),
so we may as well write it simply as \(\Lambda(t)\). Moreover, \(\Lambda(t)\) is anti-Hermitian as can be seen by differentiating \(U(t,t_0)U^\dagger(t,t_0)=\id\) to obtain
\begin{equation*}
\Lambda(t)+\Lambda^\dagger(t)=0.
\end{equation*}
Thus if we define a new operator \(H(t)\) according to
\begin{equation}
H(t)=i\hbar\Lambda(t)
\end{equation}
where \(\hbar\) is the reduced Planck constant, then \(H(t)\) is an Hermitian operator with units of energy and the time development equation becomes
\begin{equation}
i\hbar\frac{\partial}{\partial t}\ket{\psi(t)}=H(t)\ket{\psi(t)}.
\end{equation}
The operator \(H(t)\) is interpreted as the Hamiltonian of the closed system, the energy observable, and the time development equation in this form is called the Schrödinger equation.

Because the Hamiltonian is an Hermitian operator it has a spectral decomposition (dropping the explicit reference to potential time dependence)
\begin{equation}
H=\sum_{i=1}^rE_iP_{E_i}
\end{equation}
where \(E_i\) are the (real) energy eigenvalues and \(P_{E_i}\) is a projector onto the energy eigenspace corresponding to the eigenvalue \(E_i\),
\begin{equation}
P_{E_i}=\sum_{j=1}^{d_i}\ket{E_i,j}\bra{E_i,j}
\end{equation}
where \(\ket{E_i,j}\) are energy eigenstates and \(d_i\) is the degeneracy of the energy eigenvalue \(E_i\).

The typical situation is that for a given closed system we know the Hamiltonian \(H(t)\), perhaps by analogy with a corresponding classical system. In this case, at least in principle, we can compute the corresponding unitary operator \(U(t,t_0)\) by solving the differential equation
\begin{equation}
\frac{\partial U(t,t_0)}{\partial t}=-\frac{i}{\hbar}H(t)U(t,t_0).
\end{equation}
There are three cases to consider.

The simplest situation is that the Hamiltonian is time independent since then it is straightforward to confirm that the solution is
\begin{equation}
U(t,t_0)=\exp\left[-\frac{i}{\hbar}H(t-t_0)\right].
\end{equation}
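As a quick numerical sanity check (a sketch assuming numpy and scipy, with a hypothetical two-level Hamiltonian and \(\hbar=1\)), the matrix exponential indeed gives a unitary solution of the Schrödinger equation with the expected composition property:

```python
import numpy as np
from scipy.linalg import expm

hbar = 1.0  # natural units

# A hypothetical time-independent two-level Hamiltonian
H = np.array([[1.0, 0.5],
              [0.5, -1.0]], dtype=complex)

def U(t, t0):
    # U(t, t0) = exp(-i H (t - t0) / hbar)
    return expm(-1j * H * (t - t0) / hbar)

t, t0 = 2.0, 0.0
Ut = U(t, t0)

# Unitarity: U U^dagger = id
assert np.allclose(Ut @ Ut.conj().T, np.eye(2))

# Schrodinger equation dU/dt = -(i/hbar) H U, checked by finite differences
dt = 1e-6
dU = (U(t + dt, t0) - U(t - dt, t0)) / (2 * dt)
assert np.allclose(dU, -1j / hbar * H @ Ut, atol=1e-6)

# Composition property: U(t, t1) U(t1, t0) = U(t, t0)
assert np.allclose(U(t, 1.0) @ U(1.0, t0), Ut)
```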

The second case is that the Hamiltonian is time dependent but the Hamiltonians at two different times commute, that is, \([H(t_1),H(t_2)]=0\). In this case we claim that the solution is
\begin{equation}
U(t,t_0)=\exp\left[-\frac{i}{\hbar}\int_{t_0}^tds\,H(s)\right].
\end{equation}
To see this first define
\begin{equation*}
R(t)=-\frac{i}{\hbar}\int_{t_0}^tds\,H(s),
\end{equation*}
so that \(R'(t)=-(i/\hbar)H(t)\) and note that
\begin{equation*}
[R'(t),R(t)]=\left[-\frac{i}{\hbar}H(t),-\frac{i}{\hbar}\int_{t_0}^tds\,H(s)\right]=-\frac{1}{\hbar^2}\int_{t_0}^tds\,[H(t),H(s)]=0.
\end{equation*}
That \(R'(t)\) and \(R(t)\) commute means that we can write the derivative,
\begin{align*}
\frac{d}{dt}\exp R(t)&=\frac{d}{dt}\left[\id+R(t)+\frac{1}{2!}R(t)R(t)+\frac{1}{3!}R(t)R(t)R(t)+\dots\right]\\
&=R'+\frac{1}{2!}(R'R+RR')+\frac{1}{3!}(R'RR+RR'R+RRR')+\dots
\end{align*}
as
\begin{equation*}
\frac{d}{dt}\exp R(t)=R'\left(\id+R+\frac{1}{2!}R^2+\dots\right)=R'(t)\exp R(t),
\end{equation*}
confirming the result.

The third case is the most general situation, in which the Hamiltonians at two different times do not commute. In this case the best we can do is write the differential equation for \(U(t,t_0)\) as an integral equation,
\begin{equation*}
U(t,t_0)=\id-\frac{i}{\hbar}\int_{t_0}^tdt_1\,H(t_1)U(t_1,t_0)
\end{equation*}
and then, expressing \(U(t_1,t_0)\) as
\begin{equation*}
U(t_1,t_0)=\id-\frac{i}{\hbar}\int_{t_0}^{t_1}dt_2\,H(t_2)U(t_2,t_0)
\end{equation*}
iterate once to obtain,
\begin{equation*}
U(t,t_0)=\id+\left(-\frac{i}{\hbar}\right)\int_{t_0}^tdt_1\,H(t_1)+\left(-\frac{i}{\hbar}\right)^2\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)U(t_2,t_0).
\end{equation*}
Continuing in this way we obtain a formal series,
\begin{align*}
U(t,t_0)=\id+\left(-\frac{i}{\hbar}\right)\int_{t_0}^tdt_1\,H(t_1)&+\left(-\frac{i}{\hbar}\right)^2\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)\\
&+\left(-\frac{i}{\hbar}\right)^3\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)\int_{t_0}^{t_2}dt_3H(t_3)\\
&+\dots
\end{align*}
the right hand side of which is called a time-ordered exponential.
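The need for time ordering can be seen numerically: composing many short-time propagators, latest times to the left, approximates the time-ordered exponential, and for a Hamiltonian with \([H(t_1),H(t_2)]\neq0\) this differs from the naive exponential of \(\int H\). A sketch assuming numpy/scipy, with a hypothetical Hamiltonian \(H(t)=\cos(t)\,\sigma_x+\sin(t)\,\sigma_z\) and \(\hbar=1\):

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H(t):
    # A hypothetical Hamiltonian with [H(t1), H(t2)] != 0
    return np.cos(t) * sx + np.sin(t) * sz

# Approximate the time-ordered exponential by composing many
# short-time propagators, latest times on the left
t0, t1, steps = 0.0, 1.0, 2000
dt = (t1 - t0) / steps
U = np.eye(2, dtype=complex)
for k in range(steps):
    tk = t0 + (k + 0.5) * dt
    U = expm(-1j * H(tk) * dt) @ U

# The naive exponential of the integral of H(t) over [0, 1]:
# int cos = sin(1), int sin = 1 - cos(1)
Hint = np.sin(1.0) * sx + (1.0 - np.cos(1.0)) * sz
U_naive = expm(-1j * Hint)

# The two disagree because the Hamiltonians at different times do not commute
print(np.linalg.norm(U - U_naive))  # noticeably nonzero
```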

Measurement

Information is extracted from a quantum system through the process of measurement and, in contrast to classical physics, the process of measurement is incorporated into the theoretical framework.

Postulate [General measurement] To the \(i\)th possible outcome of a measurement of a quantum system in a state \(\ket{\psi}\) there corresponds a measurement operator \(M_i\) such that the probability that the \(i\)th outcome occurs is \(p(i)\) where
\begin{equation}
p(i)=\bra{\psi}M^\dagger_iM_i\ket{\psi}
\end{equation}
and if this occurs then the state of the system after the measurement is given by,
\begin{equation}
\frac{M_i\ket{\psi}}{\sqrt{\bra{\psi}M_i^\dagger M_i\ket{\psi}}}.
\end{equation}
The measurement operators satisfy
\begin{equation}
\sum_iM_i^\dagger M_i=\id
\end{equation}
expressing the fact that the probabilities sum to 1.
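The postulate translates directly into code. A minimal sketch assuming numpy, with hypothetical measurement operators given by the computational-basis projectors of a qubit:

```python
import numpy as np

def measure(psi, Ms, rng=np.random.default_rng()):
    """Apply a general measurement {M_i} to the state psi.

    Returns the outcome index and the normalised post-measurement state."""
    # p(i) = <psi| M_i^dag M_i |psi>
    probs = [np.real(psi.conj() @ (M.conj().T @ M) @ psi) for M in Ms]
    assert np.isclose(sum(probs), 1.0)  # sum_i M_i^dag M_i = id
    i = rng.choice(len(Ms), p=probs)
    post = Ms[i] @ psi / np.sqrt(probs[i])
    return i, post

# Hypothetical example: projectors onto |0> and |1>
M0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
M1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # the state |+>
outcome, post = measure(psi, [M0, M1])
# p(0) = p(1) = 1/2; the post-measurement state is |0> or |1>
```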

Distinguishing States by General Measurements

Suppose we have a two-dimensional state space and we are given one of two states, \(\ket{0}\) or \(\ket{1}\), at random. There is a measurement which can distinguish between these two states with certainty. Defining the measurement operators \(M_0=\ket{0}\bra{0}\) and \(M_1=\ket{1}\bra{1}\), we have \(M_0+M_1=\id\), and if we receive \(\ket{0}\) then \(p(0)=1\), that is, \(p(0\,|\,\text{receive } \ket{0})=1\), and similarly \(p(1\,|\,\text{receive } \ket{1})=1\), so that the probability of successfully identifying the received state is
\begin{equation*}
P_S=p(\text{receive }\ket{0})p(0\,|\,\text{receive }\ket{0})+p(\text{receive }\ket{1})p(1\,|\,\text{receive }\ket{1})=1.
\end{equation*}
Of course this is a perfect situation. More realistic is that we must decide what kind of measurement to perform and how to infer from a given measurement outcome the identity of the original state. So, for example, if we (correctly) chose the \(\{M_0,M_1\}\) measurement but inferred from a 0 outcome the state \(\ket{1}\) and vice versa, then the probability of successful identification would be 0. If instead we chose a measurement based on basis elements \(\ket{+}\) and \(\ket{-}\), and inferred from a \(+\) outcome the state \(\ket{0}\) and from a \(-\) outcome the state \(\ket{1}\), then since
\begin{equation*}
p(0\,|\,\text{receive} \ket{0})=p(+\,|\,\text{receive} \ket{0})=\braket{0|+}\braket{+|0}=\frac{1}{2}
\end{equation*}
and
\begin{equation*}
p(1\,|\,\text{receive} \ket{1})=p(-\,|\,\text{receive} \ket{1})=\braket{1|-}\braket{-|1}=\frac{1}{2}
\end{equation*}
the probability of successfully identifying the received state is \(1/2\).
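The two strategies above can be checked directly; a minimal sketch assuming numpy:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

def p(outcome_ket, state_ket):
    # Born-rule probability of the projective outcome |o><o| on the state
    return abs(outcome_ket.conj() @ state_ket) ** 2

# Strategy 1: measure in {|0>, |1>} and infer the matching state
P_S1 = 0.5 * p(ket0, ket0) + 0.5 * p(ket1, ket1)

# Strategy 2: measure in {|+>, |->}, infer |0> from + and |1> from -
P_S2 = 0.5 * p(plus, ket0) + 0.5 * p(minus, ket1)

print(P_S1, P_S2)  # 1.0 and 0.5
```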

We can generalise this discussion as follows. Suppose we receive, with equal probability, one of \(N\) states \(\{\ket{\phi_1},\dots,\ket{\phi_N}\}\) from a \(d\)-dimensional subspace \(U\subset\mathcal{H}\) of a state space \(\mathcal{H}\). We investigate the probability of successfully distinguishing these \(N\) states based on a measurement corresponding to \(n\) measurement operators \(\{M_1,\dots,M_n\}\). We need a rule which encodes how we infer one of the \(N\) given states from one of the \(n\) measurement outcomes. We can express this as a surjective map \(f\) from the set of outcomes, \(\{1,\dots,n\}\), to the given states, \(\{1,\dots,N\}\). Then the probability of success is given by
\begin{equation}
P_S=\sum_{i=1}^Np(\text{receive} \ket{\phi_i})\times\left(\sum_{j:f(j)=i}p(j\,|\,\text{receive} \ket{\phi_i})\right).
\end{equation}
Now if by \(P_U\) we denote the orthogonal projector onto the subspace \(U\) to which the \(N\) states \(\ket{\phi_i}\) belong then we can write
\begin{equation}
p(j\,|\,\text{receive} \ket{\phi_i})=\braket{\phi_i|M_j^\dagger M_j|\phi_i}=\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}.
\end{equation}
But \(M_j^\dagger M_j\) is a positive operator and therefore so is \(P_UM_j^\dagger M_jP_U\) and since the \(\ket{\phi_i}\) are assumed to be normalised we can say that \(\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\leq\tr P_UM_j^\dagger M_jP_U\). Thus, noting that \(\tr P_U=d\), we obtain
\begin{align*}
P_S&=\sum_{i=1}^Np(\text{receive} \ket{\phi_i})\times\left(\sum_{j:f(j)=i}\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\right)\\
&=\frac{1}{N}\sum_{i=1}^N\sum_{j:f(j)=i}\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\\
&\leq\frac{1}{N}\sum_{i=1}^N\sum_{j:f(j)=i}\tr P_UM_j^\dagger M_jP_U\\
&=\frac{1}{N}\tr P_U\left(\sum_jM_j^\dagger M_j\right)P_U\\
&=\frac{d}{N}.
\end{align*}
That is, the probability of success is bounded from above according to \(P_S\leq d/N\). If \(N\leq d\) and the states \(\{\ket{\phi_1},\dots,\ket{\phi_N}\}\) are orthogonal then it is possible to distinguish the states with certainty. Indeed, defining operators \(M_i=\ket{\phi_i}\bra{\phi_i}\) for \(i=1,\dots,N\) and \(M_{N+1}=\sqrt{\id-\sum_{i=1}^NM_i}\), we have the appropriate measurement to be combined with the trivial inference map \(f(i)=i\) for \(i=1,\dots,N\) and, for example, \(f(N+1)=1\).
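The bound can also be saturated by non-orthogonal states. A standard example (not discussed above) is the three symmetric "trine" qubit states, with \(d=2\) and \(N=3\); a numerical check assuming numpy:

```python
import numpy as np

# Three "trine" qubit states, equally spaced in the real plane
phis = [np.array([np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3)])
        for k in range(3)]

# Measurement operators M_k = sqrt(2/3) |phi_k><phi_k|
Ms = [np.sqrt(2 / 3) * np.outer(phi, phi) for phi in phis]

# Completeness: sum_k M_k^dag M_k = id
S = sum(M.conj().T @ M for M in Ms)
assert np.allclose(S, np.eye(2))

# Success probability with equal priors and the inference rule f(k) = k
P_S = sum((1 / 3) * phis[k] @ (Ms[k].conj().T @ Ms[k]) @ phis[k]
          for k in range(3))
print(P_S)  # 2/3 = d/N with d = 2, N = 3
```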

Let’s now focus on the case that we have two states \(\ket{\phi_1}\) and \(\ket{\phi_2}\) belonging to a two-dimensional subspace \(U\). We already know that if the states are orthogonal then in principle it is possible to distinguish them with certainty. Let us then consider the case that they are not orthogonal. We will show that in this case we cannot reliably distinguish the two states. To see this suppose on the contrary that it were indeed possible. Then we must have a measurement with operators \(M_i\) and an inference rule \(f\) such that
\begin{equation*}
\sum_{j:f(j)=1}p(j\,|\,\text{receive} \ket{\phi_1})=\sum_{j:f(j)=1}\braket{\phi_1|M_j^\dagger M_j|\phi_1}=1
\end{equation*}
and
\begin{equation*}
\sum_{j:f(j)=2}p(j\,|\,\text{receive} \ket{\phi_2})=\sum_{j:f(j)=2}\braket{\phi_2|M_j^\dagger M_j|\phi_2}=1
\end{equation*}
So defining \(E_1\equiv\sum_{j:f(j)=1}M_j^\dagger M_j\) and \(E_2\equiv\sum_{j:f(j)=2}M_j^\dagger M_j\) and noting that \(E_1+E_2=\id\), we have \(\braket{\phi_1|E_2|\phi_1}=0\) so that \(\sqrt{E_2}\ket{\phi_1}=0\) and hence \(E_2\ket{\phi_1}=0\). Now we can form an orthonormal basis \(\{\ket{\phi_1},\ket{\tilde{\phi_1}}\}\) for \(U\) such that \(\ket{\phi_2}=\alpha\ket{\phi_1}+\beta\ket{\tilde{\phi_1}}\) with \(|\alpha|^2+|\beta|^2=1\) and, since the states are not orthogonal, \(\alpha\neq0\) so that \(|\beta|<1\). It follows that
\begin{equation*}
\braket{\phi_2|E_2|\phi_2}=|\beta|^2\braket{\tilde{\phi_1}|E_2|\tilde{\phi_1}}\leq|\beta|^2<1,
\end{equation*}
since \(E_2\leq\id\), contradicting the assumption that \(\braket{\phi_2|E_2|\phi_2}=1\).


Projective measurement

The preceding discussion of measurement is rather abstract – we have conspicuously not mentioned what is being measured.  Let us now consider the more familiar projective measurement corresponding to the measurement of a particular observable.

Postulate [Projective measurement] The eigenvalues \(\lambda_i\) of the Hermitian operator, \(O\), representing a quantum mechanical observable are the possible outcomes of any experiment carried out on the system to establish the value of the observable. In this case the measurement operators are the orthogonal projectors, \(P_{\lambda_i}\), of the spectral decomposition of the Hermitian operator \(O\). That is, \(O=\sum_i\lambda_iP_{\lambda_i}\) where \(P_{\lambda_i}\) is the projector onto the eigenspace corresponding to the eigenvalue \(\lambda_i\). If a system is in a state \(\ket{\psi}\) then a measurement of an observable represented by the operator \(O\) will obtain a value \(\lambda_i\) with a probability
\begin{equation}
p(i)=\braket{\psi|P_{\lambda_i}|\psi}
\end{equation}
and subsequently the system will be in a state
\begin{equation}
\frac{P_{\lambda_i}\ket{\psi}}{\sqrt{p(i)}}.
\end{equation}

Note that in contrast to general measurements, if we repeat a projective measurement of the same observable then we are guaranteed to get the same outcome.

We sometimes speak of measuring in (or along) a basis. Suppose \(\{\ket{i}\}\) is an orthonormal basis for the Hilbert space describing our system. If the system is initially in a state \(\ket{\psi}\) and we make a measurement in the basis \(\{\ket{i}\}\) then with probability \(p(i)=|\braket{i|\psi}|^2\) the measurement results in the system being in the state \(\ket{i}\). The measurement operators in this case are the one-dimensional projectors \(\ket{i}\bra{i}\).

Expectation values and uncertainty relations

The expectation value of the operator \(O\) when the system is in a state \(\ket{\psi}\), that is, the expected value of a (projective) measurement of the observable represented by \(O\) when the system is described by the state vector \(\ket{\psi}\), is given by
\begin{equation*}
\mathbf{E}_{\psi}[O]=\sum_ip(i)\lambda_i=\sum_i\braket{\psi|P_i|\psi}\lambda_i=\sum_i\braket{\psi|\lambda_iP_i|\psi}=\braket{\psi|O|\psi}.
\end{equation*}
We typically denote this expectation value \(\braket{O}_{\psi}\), thus
\begin{equation}
\braket{O}_{\psi}=\braket{\psi|O|\psi}.
\end{equation}

If a system is in an eigenstate of an observable \(O\) then when we measure this property we are sure to obtain the eigenvalue corresponding to that eigenstate. If though the system is in some arbitrary state \(\ket{\psi}\) then there will be some uncertainty in the value obtained. We denote the uncertainty of the Hermitian operator \(O\) in the state \(\ket{\psi}\) by \(\Delta_{\psi}O\), defined by
\begin{equation}
\Delta_{\psi}O\equiv\left|\left(O-\braket{O}_{\psi}\id\right)\ket{\psi}\right|
\end{equation}

It is not difficult to see that the uncertainty \(\Delta_{\psi}O\) vanishes if and only if \(\ket{\psi}\) is an eigenstate of \(O\).

We would expect there to be a relationship between the uncertainty \(\Delta_{\psi}O\) and the usual statistical standard deviation, \(\sqrt{\mathbf{E}_{\psi}[O^2]-\mathbf{E}_{\psi}[O]^2}\) and indeed we have,
\begin{align*}
(\Delta_{\psi}O)^2&=\left|\left(O-\braket{O}_{\psi}\id\right)\ket{\psi}\right|^2\\
&=\braket{\psi|\left(O-\braket{O}_{\psi}\id\right)^2|\psi}\\
&=\braket{\psi|O^2-2\braket{O}_{\psi}O+\braket{O}_{\psi}^2\id|\psi}\\
&=\braket{O^2}_{\psi}-\braket{O}_{\psi}^2.
\end{align*}
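The identity \((\Delta_{\psi}O)^2=\braket{O^2}_{\psi}-\braket{O}_{\psi}^2\) is easy to verify numerically; a sketch assuming numpy, with \(O=\sigma_z\) and a hypothetical qubit state \(\ket{\psi}=\cos(\theta/2)\ket{0}+\sin(\theta/2)\ket{1}\):

```python
import numpy as np

# Observable O = sigma_z (hbar/2 factor dropped) and a generic qubit state
O = np.array([[1, 0], [0, -1]], dtype=complex)
theta = 0.7
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)

expO = np.real(psi.conj() @ O @ psi)         # <O>_psi
expO2 = np.real(psi.conj() @ (O @ O) @ psi)  # <O^2>_psi

# Uncertainty as the length of (O - <O> id)|psi>
delta = np.linalg.norm((O - expO * np.eye(2)) @ psi)

# (Delta O)^2 = <O^2> - <O>^2; here this equals sin^2(theta)
assert np.isclose(delta ** 2, expO2 - expO ** 2)
```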

Geometrically, the orthogonal projection of \(O\ket{\psi}\) on the 1-dimensional subspace, \(U_{\psi}\), spanned by \(\ket{\psi}\) is \(P_{\psi}O\ket{\psi}\) where \(P_{\psi}\equiv\ket{\psi}\bra{\psi}\) and therefore \(P_{\psi}O\ket{\psi}=\braket{O}_{\psi}\ket{\psi}\). Furthermore, the component of \(O\ket{\psi}\) in the orthogonal complement of \(U_{\psi}\), \(U_{\psi}^\perp\), is \((\id-P_{\psi})O\ket{\psi}\), the length of which is just \(\Delta_{\psi}O\).

Theorem (The Uncertainty Principle) In a state \(\ket{\psi}\) the uncertainties in any pair of Hermitian operators, \(A\) and \(B\), satisfy the relation
\begin{equation}
\Delta_{\psi}A\Delta_{\psi}B\geq\left|\braket{\psi|\frac{1}{2i}[A,B]|\psi}\right|
\end{equation}

Proof This is simply an application of the Cauchy-Schwarz inequality. We define two new operators, \(\tilde{A}=A-\braket{A}_{\psi}\id\) and \(\tilde{B}=B-\braket{B}_{\psi}\id\) and states \(\ket{a}=\tilde{A}\ket{\psi}\) and \(\ket{b}=\tilde{B}\ket{\psi}\). Then Cauchy-Schwarz tells us that
\begin{equation*}
\braket{a|a}\braket{b|b}\geq\left|\braket{a|b}\right|^2,
\end{equation*}
from which, observing that \(\braket{a|a}=(\Delta_{\psi}A)^2\) and \(\braket{b|b}=(\Delta_{\psi}B)^2\),
\begin{equation*}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left|\braket{a|b}\right|^2.
\end{equation*}
Now, \(\braket{a|b}=\braket{\psi|\tilde{A}\tilde{B}|\psi}\), and observe that we can write,
\begin{equation*}
\braket{\psi|\tilde{A}\tilde{B}|\psi}=\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}+i\,\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}
\end{equation*}
where \(\{\tilde{A},\tilde{B}\}=\tilde{A}\tilde{B}+\tilde{B}\tilde{A}\) is the anti-commutator of \(\tilde{A}\) and \(\tilde{B}\). Since \(\{\tilde{A},\tilde{B}\}\) is Hermitian while \([\tilde{A},\tilde{B}]\) is anti-Hermitian, these two terms are respectively the real and imaginary parts of \(\braket{a|b}\). Therefore we have
\begin{equation*}
\left|\braket{a|b}\right|^2=\left(\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}\right)^2+\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation*}
and can write the uncertainty relation as
\begin{equation}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left(\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}\right)^2+\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation}
from which it immediately follows that
\begin{equation}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation}
which, since \([\tilde{A},\tilde{B}]=[A,B]\), is just the squared version of the desired result.\(\blacksquare\)
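The inequality can be spot-checked on random states; a sketch assuming numpy, with \(A=\sigma_x\) and \(B=\sigma_z\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_x
B = np.array([[1, 0], [0, -1]], dtype=complex)  # sigma_z

for _ in range(100):
    # A random normalised qubit state
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi /= np.linalg.norm(psi)

    def delta(O):
        # Uncertainty of O in the state psi
        exp = np.real(psi.conj() @ O @ psi)
        return np.linalg.norm((O - exp * np.eye(2)) @ psi)

    # Right hand side |<psi| (1/2i)[A, B] |psi>|
    comm = A @ B - B @ A
    rhs = abs(psi.conj() @ comm @ psi / (2j))
    assert delta(A) * delta(B) >= rhs - 1e-12
```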

It is of interest to establish under what conditions the uncertainty relation is saturated. As can be seen from the proof, this requires the Cauchy-Schwarz inequality to be saturated and the term involving the anti-commutator to vanish. We recall that saturation of Cauchy-Schwarz is equivalent to the linear dependence of the two vectors, \(\ket{b}=\alpha\ket{a}\) for some \(\alpha\in\CC\). The anti-commutator term came from the real part of \(\braket{a|b}\) so we require \(\braket{a|b}+\braket{b|a}=0\). That is, using \(\ket{b}=\alpha\ket{a}\), \((\alpha+\alpha^*)\braket{a|a}=0\), so that \(\alpha\) must be pure imaginary, \(\ket{b}=it\ket{a}\) for \(t\in\RR\). In terms of the original operators and state this is the condition,
\begin{equation}
(B-\braket{B}_{\psi}\id)\ket{\psi}=it(A-\braket{A}_{\psi}\id)\ket{\psi}
\end{equation}
from which we see that \(|t|=\Delta_{\psi}B/\Delta_{\psi}A\) and which can be rewritten as an eigenvalue equation involving a non-Hermitian operator \((B-itA)\),
\begin{equation}
(B-itA)\ket{\psi}=(\braket{B}_\psi-it\braket{A}_{\psi})\ket{\psi}.
\end{equation}

The Stern-Gerlach Experiment

Recall that a current circulating in a closed loop induces a magnetic dipole moment (a quantity that determines the torque which the loop will experience in an external magnetic field). This is a vector quantity, \(\boldsymbol{\mu}\), given by \(\boldsymbol{\mu}=I\mathbf{A}\), where \(I\) is the current in the loop and \(\mathbf{A}\) is the oriented area enclosed by the loop, the orientation given by the right hand rule to be consistent with the direction of the current. The torque \(\boldsymbol{\tau}\) experienced by such a current loop in a magnetic field \(\mathbf{B}\) is then given by \(\boldsymbol{\tau}=\boldsymbol{\mu}\times\mathbf{B}\). This turning force works to align the magnetic moment with the magnetic field.

More generally a rotating charge distribution results in a magnetic moment and, if the distribution has mass, an angular momentum which must be related to the magnetic moment. Indeed, consider a ring of charge with radius \(r\) that has a uniform charge distribution and total charge \(Q\). We assume the ring is rotating about an axis perpendicular to the plane of the ring and going through its centre. If the tangential velocity is \(v\), then the current in the loop is given by \(I=\lambda v\) where \(\lambda\) is the charge density, that is, \(\lambda=Q/2\pi r\). Thus we have that the magnitude \(\mu\) of \(\boldsymbol{\mu}\) is given by,
\begin{equation}
\mu=IA=\frac{Q}{2\pi r}v\pi r^2=\frac{Q}{2}rv.
\end{equation}
Now if the mass of the ring is \(M\), then recalling that the angular momentum is given by \(\mathbf{L}=M\mathbf{r}\times\mathbf{v}\), then
\begin{equation}
\boldsymbol{\mu}=\frac{Q}{2M}\mathbf{L}
\end{equation}
In particular, for a single electron with charge \(-e\) and mass \(m_e\),
\begin{equation}
\boldsymbol{\mu}=\frac{-e}{2m_e}\mathbf{L}.
\end{equation}
The ratio of the magnetic moment to the angular momentum is called the gyromagnetic ratio and denoted \(\gamma\). It depends only on the total charge and total mass. Generally we have,
\begin{equation}
\gamma=\frac{\mu}{L}=\frac{Q}{2M},
\end{equation}
and in the case of a single electron,
\begin{equation}
\gamma=\frac{-e}{2m_e}.
\end{equation}

It might be thought that the motion of electrons inside an atom and the motion of protons within a nucleus would account for, respectively, observed atomic and nuclear magnetism. However this is not found to be the case. Rather, such particles possess a wholly intrinsic angular momentum, quite distinct from the usual spatial, or orbital angular momentum, called spin and it is only when this extra contribution is incorporated that agreement with experiment is achieved.

The gyromagnetic ratio associated with spin is different to that associated with spatial or orbital angular momentum. For example, for the electron, this ratio, denoted \(\gamma_e\), is given by,
\begin{equation}
\gamma_e=-\frac{e}{m_e},
\end{equation}
and we have a relationship between the magnetic moment and spin angular momentum, \(\mathbf{S}\), given by,
\begin{equation}
\boldsymbol{\mu}=-g\mu_B\frac{\mathbf{S}}{\hbar},
\end{equation}
where we have introduced the so called “g-factor”, which for an electron is 2, and the Bohr magneton,
\begin{equation}
\mu_B=\frac{e\hbar}{2m_e}.
\end{equation}
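Numerically, using the CODATA values of the constants, the Bohr magneton comes out at about \(9.274\times10^{-24}\,\mathrm{J/T}\):

```python
# Bohr magneton mu_B = e hbar / (2 m_e), SI units
e = 1.602176634e-19      # elementary charge, C (exact)
hbar = 1.054571817e-34   # reduced Planck constant, J s
m_e = 9.1093837015e-31   # electron mass, kg

mu_B = e * hbar / (2 * m_e)
print(mu_B)  # ~9.274e-24 J/T
```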
Note that despite the motivation in terms of moments induced by current loops, the spin induced magnetic moment has nothing to do with electric charge. Indeed, a neutron carries no charge yet possesses a magnetic moment with gyromagnetic ratio given by,
\begin{equation}
\gamma_n=-3.83\frac{q_p}{2m_p},
\end{equation}
where \(q_p\) and \(m_p\) are respectively the charge and mass of a proton.

The Stern-Gerlach experiment probes the mysterious intrinsic angular momentum of an electron. Silver atoms have forty-seven electrons. Forty-six of them completely fill the \(n=1,2,3\) and \(4\) energy levels, leaving a solitary \(n=5\) electron with zero orbital angular momentum. In the Stern-Gerlach apparatus silver is vaporised in an oven and then collimated to create a beam of such atoms, which is directed through a magnetic field behind which is a detector screen.
The potential energy, \(U\), of a magnetic moment \(\boldsymbol{\mu}\) in a magnetic field \(\mathbf{B}\) is given by \(U=-\boldsymbol{\mu}\cdot\mathbf{B}\) and the corresponding force is thus,
\begin{equation}
\mathbf{F}=-\nabla U=\nabla(\boldsymbol{\mu}\cdot\mathbf{B}).
\end{equation}
Thus the force points in the direction for which \(\boldsymbol{\mu}\cdot\mathbf{B}\) increases fastest. In the Stern-Gerlach setup the magnetic field is highly inhomogeneous and, to a good approximation,
\begin{equation}
\mathbf{F}=\mu_z\frac{\partial B_z}{\partial z}\mathbf{e}_z.
\end{equation}
Note that in the arrangement depicted, \(\partial B_z/\partial z\) is negative. Now, thanks to the high temperature of the oven generating the beam of silver atoms we would expect, reasoning classically, that the distribution of the magnetic moments of the silver atoms passing through the apparatus would be isotropic. In particular, the component of the magnetic moment in the \(z\)-direction would be expected to be \(\mu_z=|\boldsymbol{\mu}|\cos\theta\) with no preferred angle \(\theta\) between the moment and the \(z\)-axis. Thus we would expect a continuous spread of deflections, with the upper and lower bounds corresponding respectively to \(-|\boldsymbol{\mu}|\) and \(|\boldsymbol{\mu}|\). In fact what is observed is that the atoms are deflected either up or down with nothing in between. It is as if all the atoms have either a fixed positive \(\mu_z\), corresponding to the lower screen distribution, or a fixed negative \(\mu_z\), corresponding to the upper screen distribution. We therefore conclude that the dipole moment, and therefore the spin angular momentum, of an electron is quantized. The two values of \(\mu_z\) can be calculated and lead to a determination of the two possible values of \(S_z\),
\begin{equation}
S_z=\pm\frac{\hbar}{2}.
\end{equation}
The Stern-Gerlach experiment effectively measures the component of the spin angular momentum of the beam electrons along a particular direction in space and finds that it can take just two discrete values, which we call up and down. Startling though this is, the mystery certainly doesn't stop here.

We’ll now consider a series of thought experiments, involving two or more Stern-Gerlach experiments in series. A single such experiment, which we’ll subsequently refer to as SG1, will be represented schematically as

[Figure: schematic of SG1, a single \(\mathbf{e}_z\)-machine with two output beams]

The label \(\mathbf{e}_z\) on this machine indicates that the beam of electrons entering from the left will be subjected to a measurement of the electron spin in the \(z\)-direction. From such a machine two beams may emerge, in this case corresponding respectively to the \(z\)-component of spin ‘up’, \(S_z=\hbar/2\), and ‘down’, \(S_z=-\hbar/2\).

Let us now consider the following experiment, SG2

[Figure: schematic of SG2, two \(\mathbf{e}_z\)-machines in series]

We begin by sending a beam of electrons of undetermined spin (i.e. silver atoms produced in an oven) to an \(\mathbf{e}_z\)-machine. Of the two beams emerging from this machine we discard the spin down beam passing only the spin up beam into another \(\mathbf{e}_z\)-machine. From this machine only one beam emerges corresponding to the spin up atoms with \(S_z=\hbar/2\). So if electrons are already in a \(S_z=\hbar/2\) state then another measurement of the \(z\)-component of the spin of the electrons is certain to find that \(S_z=\hbar/2\). There will be no electrons found with \(S_z=-\hbar/2\). In some sense these two states, spin up and spin down are ‘orthogonal’, an electron in a spin up state has no ‘component’ of spin in the spin down state of the same direction.

Now consider replacing the second machine above with an \(\mathbf{e}_x\)-machine whose two outputs correspond respectively to \(S_x=\hbar/2\) and \(S_x=-\hbar/2\), SG3.

[Figure: schematic of SG3, an \(\mathbf{e}_z\)-machine followed by an \(\mathbf{e}_x\)-machine]

Intuitively we think of the \(x\) and \(z\) directions as being orthogonal and indeed if we were dealing here with measurements of the orbital angular momentum of some object then of course there could be no component in the \(x\)-direction of \(z\)-oriented angular momentum. The result of this spin measurement however is that we find about half the electrons entering the second apparatus emerge from the \(S_x=\hbar/2\) output and half from the \(S_x=-\hbar/2\) output. Thus in the quantum world, we conclude that if we measure the \(x\)-component of spin, \(S_x\), of a particle known to have \(z\)-component of spin, \(S_z=\hbar/2\), then we will measure \(S_x\) to be either \(S_x=\hbar/2\) or \(S_x=-\hbar/2\) with equal probability.

Finally, let us consider taking the apparatus of the previous experiment and directing the \(S_x=-\hbar/2\) beam through a \(\mathbf{e}_z\)-machine, SG4.

[Figure: schematic of SG4, an \(\mathbf{e}_z\)-machine, an \(\mathbf{e}_x\)-machine, then a final \(\mathbf{e}_z\)-machine]

One might perhaps think that having, in our first machine, selected only atoms carrying spin \(S_z=\hbar/2\) to enter the second machine, only such atoms would emerge through the final \(\mathbf{e}_z\)-machine. However, we find that about half emerge from the \(S_z=\hbar/2\) output and half from the \(S_z=-\hbar/2\) output. It's as if the intervening \(\mathbf{e}_x\)-machine has scrambled any memory of the output of the first machine.
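All four thought experiments reduce to Born-rule probabilities between the \(\ket{z;\pm}\) and \(\ket{x;\pm}\) states; a minimal sketch assuming numpy, with \(\ket{x;\pm}=(\ket{z;+}\pm\ket{z;-})/\sqrt{2}\):

```python
import numpy as np

up_z = np.array([1, 0], dtype=complex)                # |z;+>
dn_z = np.array([0, 1], dtype=complex)                # |z;->
up_x = np.array([1, 1], dtype=complex) / np.sqrt(2)   # |x;+>
dn_x = np.array([1, -1], dtype=complex) / np.sqrt(2)  # |x;->

def prob(out, state):
    # Born-rule probability |<out|state>|^2
    return abs(out.conj() @ state) ** 2

# SG2: a z-up beam into another e_z machine -> always z-up
assert np.isclose(prob(up_z, up_z), 1.0)

# SG3: a z-up beam into an e_x machine -> 50/50
assert np.isclose(prob(up_x, up_z), 0.5)
assert np.isclose(prob(dn_x, up_z), 0.5)

# SG4: the x-down beam into a final e_z machine -> 50/50 again;
# the intermediate x measurement erases the earlier z result
assert np.isclose(prob(up_z, dn_x), 0.5)
assert np.isclose(prob(dn_z, dn_x), 0.5)
```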