Space and time become spacetime

Physicists at the beginning of the 20th century were thus faced with a conundrum. They had Newton’s theory of mechanics, working in perfect harmony with Galileo’s principle of relativity, tested over two centuries and never once found wanting. Maxwell’s theory of electrodynamics was, by comparison, the new kid on the block. But its experimental confirmation, particularly thanks to the work of Heinrich Hertz (1857-1894), who proved the existence of electromagnetic waves travelling at the speed of light, was compelling. Maxwell’s theory pointed to the future of physics; it signalled a radical departure from the ‘action at a distance’ concept implicit in the classical interpretation of the interaction of physical bodies. But what was to be made of its inconsistency with Galilean relativity and the null result of Michelson-Morley?

Both Hendrik Lorentz (1853-1928) and Henri Poincaré (1854-1912) made significant contributions to the solution of this puzzle but it was Albert Einstein (1879-1955) in his 1905 paper “On the Electrodynamics of Moving Bodies” who had the clarity and audacity of vision to see that what was required was nothing less than a radically new understanding of the relationship between space and time. His solution was as simple as it was bold. He declared that the laws of physics, including Maxwell’s equations, are indeed valid in all inertial frames of reference. In particular, this means that no matter how fast a light source is travelling, the light always travels at the same speed \(c\). The Galilean transformations between inertial frames were no longer tenable but, thanks to Lorentz, their replacement, the Lorentz transformations, were already known. They were part of a theory, “Lorentz Aether Theory”, which Einstein’s bold insight swept aside. Einstein was able to show that the Lorentz transformations were a natural consequence of the fundamental principle of the constancy of the speed of light. The aether was now redundant.

The Galilean transformations assume that time is absolute, that observers in uniform motion relative to one another agree on the rate at which time passes and so always agree on the time interval between two given events. It was this assumption of an absolute time, such a deeply intuitive notion, which Einstein had the brilliance to dispense with. Subsequent notes in this series will discuss in more detail the derivation and remarkable consequences of the Lorentz transformations, but it’s worth having a look at them now to get a qualitative sense of their departure from the Galilean paradigm.
\begin{align*}
x’&=\frac{x-vt}{\sqrt{1-(v/c)^2}}\\
y’&=y\\
z’&=z\\
t’&=\frac{t-vx/c^2}{\sqrt{1-(v/c)^2}}
\end{align*}Notice how the spatial and time coordinates have become intertwined. Notice also that in the limit \(c\to\infty\) the Lorentz transformations become the Galilean transformations. Over the course of the next few notes we’ll come to appreciate the speed of light as Nature’s speed limit. We’ll also see how Newtonian mechanics had to be modified to become consistent with this new principle of relativity.
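As a quick sanity check of that limit, here is a minimal Python sketch (the function names and sample numbers are my own) comparing the two transformations at everyday and at relativistic speeds:

```python
import numpy as np

C = 3e8  # speed of light, m/s

def lorentz(x, t, v):
    """Lorentz transform of the event (x, t) into a frame moving at speed v."""
    gamma = 1.0 / np.sqrt(1.0 - (v / C) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / C**2)

def galilean(x, t, v):
    """Galilean transform of the same event."""
    return x - v * t, t

# An event 1 km away, 1 s from now, viewed from a jet at 300 m/s:
print(lorentz(1e3, 1.0, 300.0))   # ~ (700.0, 1.0), indistinguishable from...
print(galilean(1e3, 1.0, 300.0))  # (700.0, 1.0)

# The same event viewed from a frame moving at 0.9c:
print(lorentz(1e3, 1.0, 0.9 * C))  # wildly different: space and time mix
```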

To this day special relativity is regarded as the correct geometric setting for all of physics except gravity. Three of the four known fundamental forces, the electromagnetic force and the strong and weak nuclear forces, are understood in terms of quantum field theories, a framework in which quantum mechanics and special relativity are successfully reconciled. Gravity though is specifically excluded in special relativity. Being an action at a distance theory, Newtonian gravity had no place in the new framework and it took Einstein 11 years to complete his monumental general theory of relativity which established gravity as curvature of spacetime. En route, in 1907, Einstein made a crucial observation regarding its nature. As I’ve already mentioned, gravity is a somewhat peculiar force in that it accelerates all masses equally. This led Einstein to the realisation that in free fall gravity is no longer perceptible. Nowadays this effect is familiar to us from footage of astronauts floating weightless in the International Space Station (ISS). Note that the space station isn’t in some sort of zero-gravity environment. On the contrary, the earth’s pull up there is only about 10% less than we experience on the ground. The ISS is simply falling. It is in free-fall, but doesn’t come crashing down to earth since it has just the required velocity perpendicular to ‘down’ to ensure that as fast as it’s falling, the earth is curving away from it so it maintains its orbit. In fact, though pretty thin, the atmosphere at the space station’s altitude creates a drag which requires periodic re-boosts to maintain this crucial balance. During these, the ISS is not in free-fall, an effect vividly demonstrated in footage filmed by its crew members.


The boosts ‘turn on’ gravity momentarily. This is in fact the crucial point. If you are in a windowless spaceship, there is no way to tell the difference between the spaceship being at rest on earth or being in deep space, far from any massive gravitation-inducing bodies, with its boosters on to provide an acceleration equal to that induced by earth’s gravity. The feeling of weight will be identical in both cases, just as the weightlessness experienced in free fall is no different from that which would be experienced in deep empty space. These are both examples of Einstein’s principle of equivalence upon which he based the general theory of relativity.

Though that is the beginning of a story for another day, we should now recall our definition of an inertial frame of reference as one in which a free test particle would have a constant velocity. We previously brushed over the issue of gravity. Now we see that something like the ISS is an excellent approximation to an inertial frame of reference. In fact, to be precise we should restrict attention to local reference frames. That is, windowless spaceships small enough that tidal effects due to the non-uniformity of the gravitational pull are not perceptible. 1 With respect to a local frame of reference in free-fall such free test particles really do exist! Thus, real world inertial frames of reference, those to which Einstein’s special relativity applies, are local free-fall frames, sometimes called free-float frames.

But if inertial frames of reference are really free-fall frames then where does that leave our earth-bound ‘inertial’ frames? In particular, are we entitled to use special relativity in analysing particle trajectories at the LHC? Fortunately the answer is yes. Since we are in any case interested in understanding the behaviour of objects moving at or near light speed, over the relevant time scales gravity isn’t an issue. To see this we note that in a laboratory on earth in a time \(t\) a particle falls a distance \((1/2)gt^2\), where \(g\approx10\text{ms}^{-2}\) is the acceleration due to gravity. So if the smallest displacement we can detect is of the order of a micrometer, \(10^{-6}\text{m}\), (the best spatial resolution of the tracking devices at the LHC), then that corresponds to a falling time of the order of \(10^{-4}\text{s}\). That doesn’t sound like long but near light speed particles can cover distances of tens of kilometres in that time so no deviation from inertial, straight line, motion could be detected in a realistically proportioned earth-bound laboratory.
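The arithmetic behind this estimate fits in a few lines of Python (a sketch of the numbers above, nothing more):

```python
import math

g = 10.0      # m/s^2, acceleration due to gravity
c = 3e8       # m/s, speed of light
res = 1e-6    # m, best spatial resolution of the LHC trackers

t_fall = math.sqrt(2 * res / g)  # time to fall one micrometre: (1/2) g t^2 = res
print(f"fall time ~ {t_fall:.1e} s")                 # ~ 4.5e-04 s
print(f"distance at ~c: {c * t_fall / 1e3:.0f} km")  # over 100 km, far bigger than any lab
```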

To summarise then, we can say that the known laws of physics are invariant under Lorentz transformations and these transformations relate inertial frames which are best understood as local free-fall frames in uniform relative motion with respect to one another. An earth-bound laboratory is a reasonable approximation of a free fall frame when considering motion at or near light speed since over sensible distances the relevant time frames are so short that gravity may reasonably be ignored. Newtonian physics is invariant under Galilean transformations. That physics and those transformations are the low speed approximations of relativistic mechanics and Lorentz transformations respectively. In either context gravity doesn’t have to be excluded and earth-bound laboratories are reasonable approximations of inertial frames of reference when earth’s rotational motion is irrelevant.

Notes:

  1. If two balls are in free fall together towards the earth and are a certain horizontal distance apart they will tend to move closer together since they are both being pulled towards the centre of the earth. Likewise two balls in free fall with a certain initial vertical separation will tend to move further apart since the pull on the closer of the two is greater than on the other.

The Michelson-Morley Experiment

Towards the end of the 19th century, it had become generally accepted that Maxwell’s equations, as presented by James Clerk Maxwell (1831-1879) in his 1865 paper “A dynamical theory of the electromagnetic field”, were the correct and unifying description of the physics of electricity and magnetism. Light was by then understood to be electromagnetic waves, with Maxwell’s equations specifying their speed in vacuum to be a universal constant of nature, \(c=299,792,458\text{ms}^{-1}\approx3\times10^8\text{ms}^{-1}\). The equations make it clear that the speed of light does not depend on the speed of the source. It was therefore assumed that light waves must propagate through some kind of material medium, a ‘luminiferous aether’, just as sound waves propagate, independent of the speed of their source, through air. Consistent with this belief, Maxwell’s equations are not invariant under Galilean transformations. The presumption was that they held only in those frames which happen to be at rest with respect to the mysterious aether — only in such a preferred frame would light travel in all directions at speed \(c\). But this state of affairs should then present an opportunity to detect the relative motion between earth and the aether. The most famous such attempt was the Michelson-Morley experiment of 1887. 1 Here is a schematic of the optical interferometer used in their experiment.

[Schematic of the Michelson-Morley interferometer]

Sodium light was split into two beams travelling at right-angles to one another. After travelling (approximately equal) distances \(L=11\text{m}\), each beam is reflected back to the beam splitter where they are recombined and directed towards a detector ready to observe interference fringes. The apparatus was mounted on a bed of mercury allowing it to be smoothly rotated. If by some miracle (the earth’s velocity relative to the sun is \(30\text{kms}^{-1}\) and \(200\text{kms}^{-1}\) relative to the centre of the Milky Way) the apparatus was at rest in the aether, then no shift in the observed interference fringes would be expected as the apparatus is rotated. Considering the more likely scenario of the interferometer travelling with some velocity \(v\) relative to the aether’s rest frame and aligned at an angle \(\theta\) to this direction, we consider the following schematic.

[Schematic of the beam paths when the interferometer moves at speed \(v\), one arm at angle \(\theta\) to the direction of motion]

We can write down the following pairs of equations for the outward and return legs of the round trips made by each of the two beams.
\begin{align*}
c^2{t_1}^2&=(L-vt_1\sin\theta)^2+v^2{t_1}^2\cos^2\theta,\\
c^2{t_2}^2&=(L+vt_2\sin\theta)^2+v^2{t_2}^2\cos^2\theta,
\end{align*}
and
\begin{align*}
c^2{T_1}^2&=(L+vT_1\cos\theta)^2+v^2{T_1}^2\sin^2\theta,\\
c^2{T_2}^2&=(L-vT_2\cos\theta)^2+v^2{T_2}^2\sin^2\theta.
\end{align*}

From these we calculate (for example)
\begin{equation*}
(c^2-v^2){t_1}^2+2Lv\sin\theta t_1-L^2=0
\end{equation*}
so that
\begin{equation*}
t_1=\frac{-2Lv\sin\theta+2L\sqrt{v^2\sin^2\theta+(c^2-v^2)}}{2(c^2-v^2)}
\end{equation*}
and then
\begin{equation*}
t_1=\frac{-2Lv\sin\theta+2L\sqrt{c^2-v^2\cos^2\theta}}{2(c^2-v^2)}
\end{equation*}and similarly for \(t_2\), \(T_1\) and \(T_2\). In this way we find
the respective total round trip path lengths to be
\begin{equation*}
c(t_1+t_2)=\frac{2L\sqrt{1-(v\cos\theta/c)^2}}{1-(v/c)^2},
\end{equation*}and
\begin{equation*}
c(T_1+T_2)=\frac{2L\sqrt{1-(v\sin\theta/c)^2}}{1-(v/c)^2},
\end{equation*}and the path difference, which we’ll call \(\Delta(\theta)\), to be
\begin{equation*}
\Delta(\theta)=\frac{2L}{1-(v/c)^2}\left(\sqrt{1-(v\sin\theta/c)^2}-\sqrt{1-(v\cos\theta/c)^2}\right).
\end{equation*}If the apparatus is rotated through \(90^\circ\) then the path difference is \(-\Delta(\theta)\) so the expected fringe shift between the two orientations will be a fraction \(2\Delta(\theta)/\lambda\) of a wavelength where \(\lambda=589\times10^{-9}\text{m}\) is the wavelength of sodium light. Assuming the apparatus starts off with an orientation of \(\theta=45^\circ\) to the direction of relative motion, in which case \(\Delta(\pi/4)=0\), and assuming the aether is at rest relative to the sun so the relative velocity is \(v=30\text{kms}^{-1}\) we can plot the expected shifts.
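The plotted values are easy to reproduce; here is a short Python sketch of the computation (the function name is mine):

```python
import numpy as np

L, v, c, lam = 11.0, 30e3, 3e8, 589e-9  # arm length, speed, light speed, wavelength
beta = v / c

def Delta(theta):
    """Path difference between the two arms at orientation theta."""
    return (2 * L / (1 - beta**2)) * (np.sqrt(1 - (beta * np.sin(theta))**2)
                                      - np.sqrt(1 - (beta * np.cos(theta))**2))

# Fringe shift 2*Delta/lambda on rotating the apparatus through 90 degrees:
for deg in (0, 22.5, 45, 67.5, 90):
    print(f"theta = {deg:4.1f} deg: shift = {2 * Delta(np.radians(deg)) / lam:+.3f}")
# The extremes, theta = 0 and 90 deg, give shifts of about +/- 0.37,
# in agreement with the estimate 2 L v^2 / (c^2 lambda).
```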

The greatest shift is expected to occur between two alignments in which one arm is parallel and the other perpendicular to the direction of motion. In this case we have a fringe shift of approximately \(2Lv^2/(c^2\lambda)\approx0.37\). In fact Michelson and Morley found nothing of the sort, observing fringe shifts no bigger than 0.01 of a wavelength which translates to a relative velocity of less than \(5\text{kms}^{-1}\) 2. The extraordinarily slim chance that the aether and earth frames just happened to be comoving at the time of the experiment was eliminated by repeating it at three month intervals. The result was the same.

A somewhat bizarre but theoretically possible explanation for the Michelson-Morley result, that the earth somehow drags the aether with it, is ruled out by the well established phenomenon of stellar aberration. The gist of the issue here is familiar to anyone who’s noticed that when cycling through falling snow, the snowflakes seem to fall towards us from somewhere in the sky in front of us rather than, as we observe when stationary, straight down. If somehow the clouds producing the snow were moving with us this apparent shift in the source of the snow wouldn’t occur. Analogously, due to earth’s motion in orbit about the sun, the apparent positions of stars in the sky are shifted. This is stellar aberration and, if the aether were dragged along with earth, it wouldn’t be observed — but it is.

Notes:

  1. Albert Abraham Michelson (1852-1931) was an esteemed experimenter who in 1907 became the first American to win a Nobel Prize in science.
  2. Subsequent, more accurate, measurements reduced this to less than \(1\text{kms}^{-1}\).

The principle of relativity

Special relativity was introduced by Einstein in 1905 to reconcile inconsistencies between Newtonian mechanics and Maxwell’s electromagnetism. It turned out that Newton’s theory had to be brought into line with Maxwell’s, with the reformulated mechanics then able to correctly describe motion approaching or at light speed. Just as significantly, the theory demanded a radical reappraisal of the relationship between space and time. The notion that our local geometry is purely spatial with time somehow distinct, absolute and universal, had to be abandoned. Space and time, though different in character, had to be regarded as combining to play equally important roles in a richer spacetime geometry.

The theory rests upon the simple yet profound principle of relativity:

The laws of physics are identical in all inertial frames of reference.

The giants of physics, Galileo, Newton, Maxwell and Einstein, have each had a hand in shaping our understanding of this principle. Here I’ll try to place our current understanding of its meaning in the context of its historical development.

Galileo’s ship

In his 1632 “Dialogue Concerning the Two Chief World Systems”, Galileo Galilei (1564-1642) described a picturesque scene below deck of a ship featuring “small winged creatures”, fish, dripping bottles as well as a game of catch and some jumping about to illustrate a phenomenon well known to us all. Stuff behaves the same whether we and our immediate environment are stationary or moving uniformly. Sitting in an aeroplane, windows shuttered and headphones on, are you parked on the runway or cruising at 30,000 feet? You have no way of distinguishing between these two possibilities. It’s only if the plane hits turbulence, its velocity suddenly changing, that you appreciate the importance of keeping your seat belt fastened and realise you’re perhaps further from the ground than you’d like! Galileo had identified that the way things move, mechanics, is identical whether our frame of reference, a laboratory in which we can measure distances and times, is stationary or moving with constant velocity. The laws of physics, as far as they were then understood, do not and cannot distinguish between frames of reference in uniform relative motion. This was Galileo’s principle of relativity.

Galilean invariance of Newton’s laws

Isaac Newton (1642-1727) formalised the laws of mechanics in terms of his three laws of motion and law of universal gravity. These were presented, along with a great deal else, in his monumental “Principia” published in 1687. The first law of motion states that every body continues in its state of rest or uniform motion in a straight line unless compelled by some external force to change that state. Explicitly, this says that a free particle (one not acted on by any force) has constant velocity. But also, implicitly, that there exists a frame of reference in which this is the case. Such a reference frame is precisely what is meant by an inertial frame of reference. In other words, the validity of Newton’s first law tests whether or not we are in an inertial frame. Furthermore, given one inertial frame, Galileo’s principle of relativity tells us that any other frame of reference moving uniformly with respect to it is also an inertial frame. One obvious question is where (on earth!) are these free particles — nothing escapes gravity! There are of course special situations, a puck on an ice rink, particles with no mass, in which gravity can clearly be ignored, but Newton posits that, quite generally, were it not for gravity the natural state of all things would be rest or uniform motion. Some justification for setting aside gravity in this way is provided by the following observation. Recall that Newton’s law of gravity says that the gravitational force exerted on a point mass \(m\) by another point mass \(M\) is given by
\begin{equation}
F=G\frac{Mm}{r^2}
\end{equation}where \(G\) is the gravitational constant and \(r\) is the distance separating the point masses. Now normally, in accordance with Newton’s second law, \(F=ma\), the acceleration due to an applied force is inversely proportional to the mass. In the case of gravity though, since the force itself is proportional to the mass on which it acts, it accelerates all bodies equally regardless of their mass and so can be regarded as a kind of overlay upon the existing physics.

Mathematically, a frame of reference may be regarded as a coordinate system. To specify the location of a particle we need four coordinates, \(x,y,z,t\). Three, \(x,y,z\), to specify its where and one, \(t\), to specify its when. Let’s call this coordinate system \(S\) and assume it corresponds to an inertial frame of reference. Of course any other coordinate system which is simply spatially translated and/or rotated with respect to \(S\) also corresponds to an inertial reference frame. More interesting though would be one which was also in relative motion with respect to \(S\). Let’s call the corresponding coordinate system \(S’\). To keep things simple let’s focus on the relative motion and assume that we’ve arranged that at \(t=0\) the coordinate systems are aligned with \(S’\) moving with a velocity \(\mathbf{v}=(v,0,0)\), that is, with speed \(v\) in the positive \(x\)-direction relative to \(S\).

[The coordinate systems \(S\) and \(S’\), with \(S’\) moving at speed \(v\) along the shared \(x\)-axis]

Then at some time \(t\) the spatial coordinates of a point with respect to \(S’\) are related to its coordinates in \(S\) according to the simple equations,
\begin{align}
x’&=x-vt\nonumber\\
y’&=y\label{eq:Galilean_space}\\
z’&=z.\nonumber
\end{align}Notice that we’ve implicitly assumed that there is a single, absolute time. That is, time in \(S’\) is assumed to be the same as time in \(S\),
\begin{equation}
t’=t.\label{eq:Galilean_time}\\
\end{equation}Together, the equations \eqref{eq:Galilean_space} and \eqref{eq:Galilean_time} are called the Galilean transformations relating the inertial coordinate systems \(S\) and \(S’\).

An immediate consequence of the Galilean transformations is that velocities add. That is, if \(u_x\) and \(u’_x\) are the \(x\)-components of the velocities \(\mathbf{u}\) and \(\mathbf{u}’\) of a particle as measured in \(S\) and \(S’\) respectively then,
\begin{equation*}
u_x=\frac{dx}{dt}=\frac{dx’}{dt’}+v=u’_x+v,\\
\end{equation*}so that, together with the obvious relations for the \(y\)- and \(z\)-components, \(\mathbf{u}=\mathbf{u}’+\mathbf{v}\). In particular there is no notion of absolute rest. As Galileo had observed, one person’s state of rest is another’s uniform motion, it is a matter of perspective.

Coordinate systems related by Galilean transformations are the mathematical abstraction of inertial frames of reference and Galileo’s principle of relativity set in this context is the statement that the mathematical expression of the laws of physics should be invariant under Galilean transformations. Take Newton’s second law as an example, \(\mathbf{F}=m\mathbf{a}\), now expressed in terms of 3-dimensional vectors, \(\mathbf{F}=(F_x,F_y,F_z)\) and \(\mathbf{a}=(a_x,a_y,a_z)\). If \(\mathbf{a}’\) is the acceleration as measured in \(S’\) then we have, say for its \(x\)-component, \(a’_x\),
\begin{equation*}
a’_x=\frac{d^2x’}{dt’^2}=\frac{d^2x}{dt^2}=a_x,\\
\end{equation*}and similarly for the \(y\)- and \(z\)-components, so \(\mathbf{a}’=\mathbf{a}\), that is, acceleration is invariant under Galilean transformations. To confirm that Newton’s second law is true in all inertial frames of reference then becomes the mathematical problem of checking, on a case by case basis, that all forces of interest are also invariant under Galilean transformations. In fact, most forces encountered in Newtonian dynamics depend only on relative position, relative velocity and time. So, since each of these is invariant under Galilean transformations, so are the forces. In particular, this is the case for the force of gravity between two objects since it is inversely proportional to the square of their separation.

In most cases a frame of reference fixed to earth, such as the room in which you’re sitting, is a good approximation to an inertial frame. Technically though it isn’t; consider, for example, earth’s rotation about its axis, which subjects every point on the surface to a radially directed acceleration. When working with such noninertial frames the acceleration of the frame manifests itself in the form of ‘fictitious’ forces — in the case of rotating frames, the Coriolis and centrifugal forces. Incidentally, it is a feature of such fictitious forces that, like gravity, they are always proportional to the mass of the object whose motion is being studied. Could it be that gravity is also somehow a fictitious force? This idea turns out to have considerable legs!

Galilean relativity in action: What happens when you drop a soccer ball with a table-tennis ball sitting on top?

If you’ve never tried it you should — seeing the table-tennis ball ping high in the air is pretty dramatic. Understanding this behaviour provides a nice example of the power of translating between inertial frames of reference using the Galilean transformations. It will be intuitively obvious that a table-tennis ball hitting a soccer ball will simply bounce back with essentially the same speed but in the opposite direction leaving the football unmoved. Now, when we drop the football with the table-tennis ball on top, for a split second after the football hits the ground, we have the two balls colliding with each other with equal and opposite velocities. Schematically, and imagined horizontally, the situation we wish to understand is this:
[The football moving right at speed \(v\), the table-tennis ball moving left at speed \(v\)]
Now let us consider the situation from the perspective of a frame of reference moving to the right with velocity \(v\). In this frame the football is at rest and, thanks to the way velocities add in Galilean transformations, we know that the table-tennis ball is on a collision course travelling at a speed of \(2v\). As already mentioned we know what happens in this situation: the table-tennis ball simply bounces back travelling in the opposite direction with the same speed \(2v\) and the football remains at rest. To understand the original problem we simply translate this outcome back to the original frame of reference to find the football still travelling at \(v\) but the table-tennis ball travelling at \(3v\). In other words, when we drop a football with a table-tennis ball on top, the table-tennis ball bounces back up at three times the speed at which the pair hit the ground!
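The frame-hopping argument is mechanical enough to script. Here is a minimal Python sketch (assuming, as above, a perfectly elastic bounce off a much heavier ball, with velocities measured upwards):

```python
def bounce_speed(v):
    """Table-tennis ball on a football, both hitting the ground at speed v.
    Returns the table-tennis ball's upward speed after the bounce."""
    # Just after the football rebounds: football moves up at +v, the
    # table-tennis ball still moves down at -v. Hop to the football's rest frame.
    u_tt = -v - v        # table-tennis ball's velocity in that frame: -2v
    u_tt_after = -u_tt   # elastic bounce off the (much heavier) football: +2v
    return u_tt_after + v  # hop back to the ground frame: 2v + v = 3v

print(bounce_speed(5.0))  # 15.0 -- three times the impact speed
```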

Stern-Gerlach Revisited

In this section we revisit the discussion of the Stern-Gerlach experiment and show that the observed behaviour of that system can be perfectly described using the mathematical framework developed so far.

In the previous discussion of the Stern-Gerlach experiment we saw that what was being measured was the component of spin angular momentum along a particular direction in space and that there were only ever two possible outcomes to such a measurement. Thus, it would appear that spin states of the electron live in a 2-dimensional state space, they are qubits, and it will be useful to employ the \(\ket{\mathbf{n};\pm}\) notation for states labelled by a particular direction in space. With respect to some (arbitrarily chosen) coordinate axes a Stern-Gerlach apparatus may be arranged to measure the component of spin in the \(z\)-direction and the corresponding spin states could be denoted \(\ket{z;\pm}\). We would be inclined to posit that these are eigenstates of an observable \(S_z\) corresponding to eigenvalues \(\hbar/2\) and \(-\hbar/2\) respectively. That is,
\begin{equation}
S_z\ket{z;\pm}=\pm\frac{\hbar}{2}\ket{z;\pm}
\end{equation}
and so the matrix representation of \(S_z\) in this basis is
\begin{equation}
\mathbf{S}_z=\frac{\hbar}{2}\begin{pmatrix}1&0\\0&-1\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_z
\end{equation}
where we have recalled the definition of the Pauli matrix \(\sigma_z\).

Indeed, the discussion in Qubit Mechanics I suggests that we should define spin observables for a general orientation, \(\mathbf{n}\), in space according to \begin{equation}
\mathbf{S}_\mathbf{n}=\frac{\hbar}{2}\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation}
with corresponding eigenstates \(\ket{\mathbf{n};\pm}\). So in particular we would have
\begin{equation}
\mathbf{S}_x=\frac{\hbar}{2}\begin{pmatrix}0&1\\1&0\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_x
\end{equation}
and
\begin{equation}
\mathbf{S}_y=\frac{\hbar}{2}\begin{pmatrix}0&-i\\i&0\end{pmatrix}=\frac{\hbar}{2}\boldsymbol{\sigma}_y
\end{equation}
with respective orthonormal eigenstates,
\begin{eqnarray}
\ket{x;+}&=&\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}&=&\frac{1}{\sqrt{2}}\left(\ket{z;+}+\ket{z;-}\right)\\
\ket{x;-}&=&\frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}&=&\frac{1}{\sqrt{2}}\left(\ket{z;+}-\ket{z;-}\right)
\end{eqnarray}
and
\begin{eqnarray}
\ket{y;+}&=&\frac{1}{\sqrt{2}}\begin{pmatrix}1\\i\end{pmatrix}&=&\frac{1}{\sqrt{2}}\left(\ket{z;+}+i\ket{z;-}\right)\\
\ket{y;-}&=&\frac{1}{\sqrt{2}}\begin{pmatrix}1\\-i\end{pmatrix}&=&\frac{1}{\sqrt{2}}\left(\ket{z;+}-i\ket{z;-}\right).
\end{eqnarray}
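These eigenstates are easily verified numerically; a quick numpy sketch (units chosen so that \(\hbar=1\)):

```python
import numpy as np

hbar = 1.0
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

Sx, Sy = hbar / 2 * sx, hbar / 2 * sy

x_plus  = np.array([1, 1]) / np.sqrt(2)
y_minus = np.array([1, -1j]) / np.sqrt(2)

# S_x |x;+> = +hbar/2 |x;+> and S_y |y;-> = -hbar/2 |y;->
print(np.allclose(Sx @ x_plus, +hbar / 2 * x_plus))    # True
print(np.allclose(Sy @ y_minus, -hbar / 2 * y_minus))  # True
```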

Stern-Gerlach Explained

We now have all the quantum mechanical machinery in place to understand the Stern-Gerlach experiments. Recall the basic setup, which we referred to as SG1. We assume that the spin state of an atom entering SG1 is in some arbitrary state \(\ket{\psi}\) in a 2-dimensional state space. The measuring device in SG1 corresponds to the observable \(S_z\) whose spectral decomposition is
\begin{equation}
S_z=\frac{\hbar}{2}\ket{z;+}\bra{z;+}-\frac{\hbar}{2}\ket{z;-}\bra{z;-}
\end{equation}
and therefore the probability of measuring a particle to be spin up is \(p(z;+)\) given by
\begin{equation}
p(z;+)=\braket{\psi|z;+}\braket{z;+|\psi}
\end{equation}
in other words it is the squared modulus of the probability amplitude \(\braket{z;+|\psi}\) for finding \(\ket{\psi}\) in the state \(\ket{z;+}\). Likewise, the probability amplitude for finding \(\ket{\psi}\) in the state \(\ket{z;-}\) is \(\braket{z;-|\psi}\) corresponding to the probability \(p(z;-)=|\braket{z;-|\psi}|^2\).

Now let us consider SG2. In this case, we retain only the atoms emerging with spin state \(\ket{z;+}\) from the initial \(S_z\) measuring device then subject these atoms to a second \(S_z\) device. In this case the amplitudes for measuring the \(z\)-component of spin up and down are respectively \(\braket{z;+|z;+}=1\) and \(\braket{z;-|z;+}=0\) so we are sure to confirm that the atom has spin state up.

If instead of passing the atoms retained from the first \(S_z\) device in SG2 into a second \(S_z\) device we pass them instead into an \(S_x\) device, as in SG3, then the relevant amplitudes are
\begin{equation}
\braket{x;+|z;+}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}+\frac{1}{\sqrt{2}}\braket{z;-|z;+}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(x\)-component of the spin to be up and
\begin{equation}
\braket{x;-|z;+}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}-\frac{1}{\sqrt{2}}\braket{z;-|z;+}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(x\)-component of the spin to be down. That is, we find that there is an equal probability of \(1/2\) for the \(x\)-component of the spin to be up or down.

In SG4 we retain the atoms measured to be spin down by a \(S_x\) device which took as input atoms measured to be spin up by a \(S_z\) device. These atoms are then passed to another \(S_z\) measuring device. The relevant amplitudes are now
\begin{equation}
\braket{z;+|x;-}=\frac{1}{\sqrt{2}}\braket{z;+|z;+}-\frac{1}{\sqrt{2}}\braket{z;+|z;-}=\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(z\)-component of the spin to be up and
\begin{equation}
\braket{z;-|x;-}=\frac{1}{\sqrt{2}}\braket{z;-|z;+}-\frac{1}{\sqrt{2}}\braket{z;-|z;-}=-\frac{1}{\sqrt{2}}
\end{equation}
for finding the \(z\)-component of the spin to be down. That is, we find that there is an equal probability of \(1/2\) for the \(z\)-component of the spin to be up or down. The quantum mechanical formalism makes it clear that there is no ‘memory’ that the atoms had previously, before the \(S_x\) measurement, been found with probability 1 to have \(z\)-component of their spin up!
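The whole sequence of measurements boils down to a handful of amplitudes; here is a minimal numpy sketch of SG4:

```python
import numpy as np

z_plus  = np.array([1, 0], dtype=complex)
z_minus = np.array([0, 1], dtype=complex)
x_minus = (z_plus - z_minus) / np.sqrt(2)

def prob(out, state):
    """Probability of finding 'state' in 'out': |<out|state>|^2."""
    return abs(np.vdot(out, state)) ** 2

# SG4: atoms leave the first S_z device as |z;+>, the S_x device then
# selects |x;->, and a final S_z measurement is made.
print(prob(x_minus, z_plus))   # 0.5 -- half survive the S_x selection
print(prob(z_plus, x_minus))   # 0.5 -- the final S_z is again 50/50:
print(prob(z_minus, x_minus))  # 0.5    no 'memory' of the earlier z;+ result
```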

Qubit mechanics I

Two-level systems, quantum mechanical systems whose state space is \(\CC^2\), are relatively simple yet still rich enough to exhibit most of the peculiarities of the quantum world. Moreover, they are physically important – we will consider nuclear magnetic resonance and the ammonia maser as examples.

Throughout we will assume, unless explicitly stated otherwise, that the column vectors and matrices representing state vectors and observables respectively are with respect to the standard basis of \(\CC^2\).

Properties of Pauli matrices

It’s straightforward to verify that the three Pauli matrices,
\begin{equation}
\boldsymbol{\sigma}_1=\begin{pmatrix}0&1\\1&0\end{pmatrix}\qquad\boldsymbol{\sigma}_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\qquad\boldsymbol{\sigma}_3=\begin{pmatrix}1&0\\0&-1\end{pmatrix},
\end{equation}
each square to the identity,
\begin{equation}
\boldsymbol{\sigma}_i^2=\mathbf{I},\qquad i=1,2,3
\end{equation}
and that they are all traceless,
\begin{equation}
\tr\boldsymbol{\sigma}_i=0,\qquad i=1,2,3.
\end{equation}
From these two facts it follows that each Pauli matrix has two eigenvalues \(\pm1\). We can compute the commutators and find
\begin{equation}
[\boldsymbol{\sigma}_i,\boldsymbol{\sigma}_j]=2i\epsilon_{ijk}\boldsymbol{\sigma}_k.
\end{equation}
Likewise the anti-commutators are
\begin{equation}
\{\boldsymbol{\sigma}_i,\boldsymbol{\sigma}_j\}=2\delta_{ij}\mathbf{I},
\end{equation}
and since the product of any pair of operators is one-half the sum of the anti-commutator and the commutator we have
\begin{equation}
\boldsymbol{\sigma}_i\boldsymbol{\sigma}_j=\delta_{ij}\mathbf{I}+i\epsilon_{ijk}\boldsymbol{\sigma}_k.
\end{equation}
A simple consequence of this is the rather useful relation,
\begin{equation}
(\mathbf{u}\cdot\boldsymbol{\sigma})(\mathbf{v}\cdot\boldsymbol{\sigma})=(\mathbf{u}\cdot\mathbf{v})\mathbf{I}+i(\mathbf{u}\times\mathbf{v})\cdot\boldsymbol{\sigma}.
\end{equation}
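Each of these identities can be spot-checked numerically. A short numpy sketch of the last, and most useful, relation (with arbitrarily chosen real vectors):

```python
import numpy as np

I = np.eye(2, dtype=complex)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def dot(v):
    """v . sigma for a 3-vector v."""
    return sum(v[i] * sigma[i] for i in range(3))

u = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 1.0, 0.25])

lhs = dot(u) @ dot(v)
rhs = np.dot(u, v) * I + 1j * dot(np.cross(u, v))
print(np.allclose(lhs, rhs))  # True: (u.sigma)(v.sigma) = (u.v)I + i(u x v).sigma
```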

Hermitian and Unitary operators on \(\CC^2\)

Any \(2\times2\) matrix \(\mathbf{M}\) representing a linear operator on \(\CC^2\) can be expressed as a linear combination of the identity matrix and the three Pauli matrices,
\begin{equation}
\mathbf{M}=m_0\mathbf{I}+\mathbf{m}\cdot\boldsymbol{\sigma}
\end{equation}
where \(m_0\) and the components \(m_1,m_2,m_3\) of the vector \(\mathbf{m}\) are complex numbers and \(\boldsymbol{\sigma}\) is the vector with components the Pauli matrices, \(\boldsymbol{\sigma}=(\boldsymbol{\sigma}_1,\boldsymbol{\sigma}_2,\boldsymbol{\sigma}_3)\). It follows that
\begin{equation}
m_0=\frac{1}{2}\tr\mathbf{M},\quad m_i=\frac{1}{2}\tr(\mathbf{M}\boldsymbol{\sigma}_i).
\end{equation}

The condition for a matrix \(\mathbf{Q}\) to be Hermitian is that \(\mathbf{Q}^\dagger=\mathbf{Q}\). Thus we must have
\begin{equation}
\mathbf{Q}^\dagger=q_0^*\mathbf{I}+\mathbf{q}^*\cdot\boldsymbol{\sigma}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}=\mathbf{Q},
\end{equation}
where \(\mathbf{q}^*\) indicates the vector whose components are the complex conjugate of the vector \(\mathbf{q}\). It follows immediately that any Hermitian operator, that is, any qubit observable \(Q\), can be represented by a matrix
\begin{equation}
\mathbf{Q}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}
\end{equation}
where \(q_0\) and the components of the vector \(\mathbf{q}\) are all real.

The condition for a matrix \(\mathbf{U}\) to be unitary is \(\mathbf{U}^\dagger\mathbf{U}=\mathbf{U}\mathbf{U}^\dagger=\mathbf{I}\).

Theorem Any qubit unitary transformation \(U\) can, up to a choice of phase, be represented by a matrix \(\mathbf{U}\) given by
\begin{equation}
\mathbf{U}=\exp{(-i\theta\mathbf{n}\cdot\boldsymbol{\sigma})}
\end{equation}
where \(\theta\) and \(\mathbf{n}\) can be interpreted respectively as an angle and a unit vector in 3-dimensional space.

Proof We begin with the general form
\begin{equation}
\mathbf{U}=u_0\mathbf{I}+\mathbf{u}\cdot\boldsymbol{\sigma}
\end{equation}
in which \(u_0\) and \(\mathbf{u}\) are an arbitrary complex number and complex valued vector respectively. We will impose the condition \(\mathbf{U}^\dagger\mathbf{U}=\mathbf{I}\) but observe that this condition leaves an overall choice of phase unconstrained. Using this flexibility we can take \(u_0\) to be real. Then
\begin{align*}
\mathbf{U}^\dagger\mathbf{U}&=(u_0\mathbf{I}+\mathbf{u}^*\cdot\boldsymbol{\sigma})(u_0\mathbf{I}+\mathbf{u}\cdot\boldsymbol{\sigma})\\
&=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+(\mathbf{u}^*\cdot\boldsymbol{\sigma})(\mathbf{u}\cdot\boldsymbol{\sigma})\\
&=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+\mathbf{u}\cdot\mathbf{u}^*\mathbf{I}-i(\mathbf{u}\times\mathbf{u}^*)\cdot\boldsymbol{\sigma}=\mathbf{I}.
\end{align*}
Similarly, we have,
\begin{equation*}
\mathbf{U}\mathbf{U}^\dagger=u_0^2\mathbf{I}+2u_0\Real\mathbf{u}\cdot\boldsymbol{\sigma}+\mathbf{u}\cdot\mathbf{u}^*\mathbf{I}+i(\mathbf{u}\times\mathbf{u}^*)\cdot\boldsymbol{\sigma}=\mathbf{I}.
\end{equation*}
This means that we must have
\begin{equation}
\mathbf{u}\times\mathbf{u}^*=0,\label{eq:condition one}
\end{equation}
\begin{equation}
u_0\Real\mathbf{u}=0\label{eq:condition two}
\end{equation}
and
\begin{equation}
u_0^2+\mathbf{u}^*\cdot\mathbf{u}=1.\label{eq:condition three}
\end{equation}
Equation \eqref{eq:condition one} implies that
\begin{equation}
(\Real\mathbf{u}+i\Imag\mathbf{u})\times(\Real\mathbf{u}-i\Imag\mathbf{u})=-2i\Real\mathbf{u}\times\Imag\mathbf{u}=0
\end{equation}
that is, \(\Real\mathbf{u}\parallel\Imag\mathbf{u}\) so that we must be able to write \(\mathbf{u}=\alpha\mathbf{v}\) for some complex number \(\alpha\) and real vector \(\mathbf{v}\). The second condition, \eqref{eq:condition two}, tells us that either \(u_0=0\) or \(\mathbf{u}\) is pure imaginary. So that together, \eqref{eq:condition one} and \eqref{eq:condition two} imply that either \(\mathbf{u}=i\mathbf{v}\) or \(u_0=0\) and \(\mathbf{u}=\alpha\mathbf{v}\). In the latter case \eqref{eq:condition three} then implies that \(|\alpha|^2|\mathbf{v}|^2=1\) so we can write
\begin{align*}
\mathbf{U}&=\alpha\mathbf{v}\cdot\boldsymbol{\sigma}\\
&=e^{i\phi}|\alpha||\mathbf{v}|\frac{\mathbf{v}}{|\mathbf{v}|}\cdot\boldsymbol{\sigma}
\end{align*}
which up to a choice of phase has the form
\begin{equation}
\mathbf{U}=i\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation}
for some real unit vector \(\mathbf{n}\).
In the case that \(u_0\neq0\), since
\begin{equation}
u_0^2+\mathbf{v}\cdot\mathbf{v}=1
\end{equation}
we can write \(u_0=\cos\theta\) and \(\mathbf{v}=-\sin\theta\mathbf{n}\) for some angle \(\theta\) and a (real) unit vector \(\mathbf{n}\), that is, in either case, we have that up to an overall phase,
\begin{equation}
\mathbf{U}=\cos\theta\mathbf{I}-i\sin\theta\mathbf{n}\cdot\boldsymbol{\sigma}.
\end{equation}
Finally, observing that \((\mathbf{n}\cdot\boldsymbol{\sigma})^2=\mathbf{I}\) the desired matrix exponential can be written as
\begin{align*}
\exp{(-i\theta\mathbf{n}\cdot\boldsymbol{\sigma})}&=\mathbf{I}-i\theta\mathbf{n}\cdot\boldsymbol{\sigma}+\frac{i^2\theta^2}{2!}(\mathbf{n}\cdot\boldsymbol{\sigma})^2-\frac{i^3\theta^3}{3!}(\mathbf{n}\cdot\boldsymbol{\sigma})^3+\dots\\
&=\left(1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}+\dots\right)\mathbf{I}-i\left(\theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}+\dots\right)\mathbf{n}\cdot\boldsymbol{\sigma}\\
&=\cos\theta\mathbf{I}-i\sin\theta\mathbf{n}\cdot\boldsymbol{\sigma}
\end{align*}\(\blacksquare\)
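The closed form agrees with a general-purpose matrix exponential routine; a quick check with scipy, for an arbitrarily chosen axis and angle:

```python
import numpy as np
from scipy.linalg import expm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

theta = 0.7
n = np.array([1.0, 2.0, 2.0]) / 3.0  # a unit vector
n_sigma = sum(n[i] * sigma[i] for i in range(3))

U_series = expm(-1j * theta * n_sigma)
U_closed = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * n_sigma
print(np.allclose(U_series, U_closed))  # True
```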

Unitary Operators on \(\CC^2\) and Rotations in \(\RR^3\)

The \(2\times2\) unitary matrices form a group under the usual matrix multiplication. This is the Lie group \(U(2)\). The \(2\times2\) unitary matrices with determinant 1 form the special unitary group \(SU(2)\). The previous theorem tells us that a general element of \(SU(2)\), which we’ll denote \(\mathbf{U}(\mathbf{n},\theta)\), can be written as
\begin{equation}
\mathbf{U}(\mathbf{n},\theta)=\exp{\left(-i\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)}.
\end{equation}
That is, the phase has been chosen such that \(\det\mathbf{U}(\mathbf{n},\theta)=1\).

The group of rotations in 3-dimensional space, denoted \(SO(3)\), consists of all \(3\times3\) orthogonal matrices with determinant 1. The reason for choosing the half-angle \(\theta/2\) is made clear in the following result.

Theorem There is a 2-to-1 group homomorphism from \(SU(2)\) to \(SO(3)\), \(\mathbf{U}(\mathbf{n},\theta)\mapsto\mathbf{R}(\mathbf{n},\theta)\) where \(\mathbf{R}(\mathbf{n},\theta)\) is the rotation through an angle \(\theta\) about the axis \(\mathbf{n}\).

Proof If we denote by \(H\) the set of traceless, Hermitian, \(2\times2\) matrices then it is not difficult to see that this is a real 3-dimensional vector space for which the Pauli matrices are a basis. The map \(f:\RR^3\mapto H\) given by \(f(\mathbf{v})=\mathbf{v}\cdot\boldsymbol{\sigma}\) is then an isomorphism of vector spaces. Defining an inner product on \(H\) according to
\begin{equation}
(\mathbf{M},\mathbf{N})_H=\frac{1}{2}\tr(\mathbf{M}\mathbf{N})
\end{equation}
this isomorphism becomes an isometry of vector spaces since,
\begin{equation*}
(f(\mathbf{v}),f(\mathbf{w}))_H=\frac{1}{2}\tr((\mathbf{v}\cdot\boldsymbol{\sigma})(\mathbf{w}\cdot\boldsymbol{\sigma}))=\mathbf{v}\cdot\mathbf{w}=(\mathbf{v},\mathbf{w})_{\RR^3},
\end{equation*}
that is, \(H\) and \(\RR^3\) are isometric. Now to any \(2\times2\) unitary matrix \(\mathbf{U}\) we can associate a linear operator \(T_\mathbf{U}\) on \(H\) such that \(T_\mathbf{U}\mathbf{M}=\mathbf{U}\mathbf{M}\mathbf{U}^\dagger\). This is clearly an isometry on \(H\) and so the corresponding linear operator on \(\RR^3\), \(f^{-1}\circ T_\mathbf{U}\circ f\), which we’ll denote \(\mathbf{R}_\mathbf{U}\), is such that
\begin{equation}
\mathbf{R}_\mathbf{U}\mathbf{v}\cdot\boldsymbol{\sigma}=\mathbf{U}\mathbf{v}\cdot\boldsymbol{\sigma}\mathbf{U}^\dagger\label{eq:rotation from unitary}
\end{equation}
for any \(\mathbf{v}\in\RR^3\) and must be an isometry, that is, an orthogonal operator. In fact, since
\begin{align*}
\tr\left((\mathbf{R}_\mathbf{U}\mathbf{e}_1\cdot\boldsymbol{\sigma})(\mathbf{R}_\mathbf{U}\mathbf{e}_2\cdot\boldsymbol{\sigma})
(\mathbf{R}_\mathbf{U}\mathbf{e}_3\cdot\boldsymbol{\sigma})\right)&=\tr\left((R_\mathbf{U})_1^i(R_\mathbf{U})_2^j(R_\mathbf{U})_3^k\boldsymbol{\sigma}_i\boldsymbol{\sigma}_j\boldsymbol{\sigma}_k\right)\\
&=2i\epsilon_{ijk}(R_\mathbf{U})_1^i(R_\mathbf{U})_2^j(R_\mathbf{U})_3^k\\
&=2i\det{\mathbf{R}_\mathbf{U}}
\end{align*}
and also
\begin{align*}
\tr\left((\mathbf{R}_\mathbf{U}\mathbf{e}_1\cdot\boldsymbol{\sigma})(\mathbf{R}_\mathbf{U}\mathbf{e}_2\cdot\boldsymbol{\sigma})
(\mathbf{R}_\mathbf{U}\mathbf{e}_3\cdot\boldsymbol{\sigma})\right)&=\tr\left(\mathbf{U}\boldsymbol{\sigma}_1\mathbf{U}^\dagger\mathbf{U}\boldsymbol{\sigma}_2\mathbf{U}^\dagger\mathbf{U}\boldsymbol{\sigma}_3\mathbf{U}^\dagger\right)\\
&=\tr(\boldsymbol{\sigma}_1\boldsymbol{\sigma}_2\boldsymbol{\sigma}_3)\\
&=2i
\end{align*}
we see that \(\mathbf{R}_\mathbf{U}\in SO(3)\). Also we observe that given two unitary matrices \(\mathbf{U}_1\) and \(\mathbf{U}_2\),
\begin{align*}
\mathbf{R}_{\mathbf{U}_1\mathbf{U}_2}\mathbf{v}\cdot\boldsymbol{\sigma}&=(\mathbf{U}_1\mathbf{U}_2)\mathbf{v}\cdot\boldsymbol{\sigma}(\mathbf{U}_1\mathbf{U}_2)^\dagger\\
&=\mathbf{U}_1\left(\mathbf{R}_{\mathbf{U}_2}\mathbf{v}\cdot\boldsymbol{\sigma}\right)\mathbf{U}_1^\dagger\\
&=(\mathbf{R}_{\mathbf{U}_1}\mathbf{R}_{\mathbf{U}_2}\mathbf{v})\cdot\boldsymbol{\sigma}
\end{align*}
So defining the map \(\Phi:SU(2)\mapto SO(3)\) such that
\begin{equation}
\Phi(\mathbf{U}(\mathbf{n},\theta))=\mathbf{R}_\mathbf{U},
\end{equation}
we have a group homomorphism. The kernel of this map consists of unitary matrices \(\mathbf{U}(\mathbf{n},\theta)\) such that
\begin{equation*}
\mathbf{U}(\mathbf{n},\theta)(\mathbf{v}\cdot\boldsymbol{\sigma})=(\mathbf{v}\cdot\boldsymbol{\sigma})\mathbf{U}(\mathbf{n},\theta)
\end{equation*}
for any vector \(\mathbf{v}\). It follows that \(\mathbf{U}(\mathbf{n},\theta)\) must be a multiple of the identity matrix and since \(\det\mathbf{U}(\mathbf{n},\theta)=1\) it can only be \(\pm\mathbf{I}\). Thus, \(\ker\Phi=\{\mathbf{I},-\mathbf{I}\}\) and so the homomorphism is 2-to-1. To confirm the nature of the spatial rotation corresponding to \(\mathbf{U}(\mathbf{n},\theta)\), choose \(\mathbf{v}=\mathbf{n}\) in \eqref{eq:rotation from unitary} to see that \(\mathbf{R}_\mathbf{U}\mathbf{n}=\mathbf{n}\) so that \(\mathbf{R}_\mathbf{U}\) is a rotation about the axis \(\mathbf{n}\). To determine the angle \(\gamma\) of rotation we note that if \(\mathbf{m}\) is a unit vector perpendicular to \(\mathbf{n}\) then \(\cos\gamma=(\mathbf{R}_\mathbf{U}\mathbf{m})\cdot\mathbf{m}\) and we have
\begin{align*}
\cos\gamma&=(\mathbf{R}_\mathbf{U}\mathbf{m})\cdot\mathbf{m}\\
&=\frac{1}{2}\tr\left(\left(\cos\frac{\theta}{2}\mathbf{I}-i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)(\mathbf{m}\cdot\boldsymbol{\sigma})\right.\\
&\qquad\times\left.\left(\cos\frac{\theta}{2}\mathbf{I}+i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)(\mathbf{m}\cdot\boldsymbol{\sigma})\right)\\
&=\frac{1}{2}\tr\left(\left(\cos\frac{\theta}{2}\mathbf{m}\cdot\boldsymbol{\sigma}-\sin\frac{\theta}{2}(\mathbf{m}\times\mathbf{n})\cdot\boldsymbol{\sigma}\right)\right.\\
&\qquad\times\left.\left(\cos\frac{\theta}{2}\mathbf{m}\cdot\boldsymbol{\sigma}+\sin\frac{\theta}{2}(\mathbf{m}\times\mathbf{n})\cdot\boldsymbol{\sigma}\right)\right)\\
&=\cos^2\frac{\theta}{2}-\sin^2\frac{\theta}{2}\\
&=\cos\theta
\end{align*}
so that the unitary operator \(\mathbf{U}(\mathbf{n},\theta)\) corresponds to a spatial rotation about the axis \(\mathbf{n}\) through an angle \(\theta\). We therefore denote the rotation \(\mathbf{R}(\mathbf{n},\theta)\).\(\blacksquare\)
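The theorem can be tested numerically: recover \(\mathbf{R}_\mathbf{U}\) from the conjugation formula via \((R_\mathbf{U})_{ij}=\frac{1}{2}\tr(\boldsymbol{\sigma}_i\mathbf{U}\boldsymbol{\sigma}_j\mathbf{U}^\dagger)\) and compare it with the standard axis-angle (Rodrigues) form of the rotation. A sketch:

```python
import numpy as np
from scipy.linalg import expm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def dot(v):
    return sum(v[i] * sigma[i] for i in range(3))

def R_from_U(U):
    """(R_U)_ij = (1/2) tr(sigma_i U sigma_j U^dagger)."""
    R = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            R[i, j] = 0.5 * np.trace(sigma[i] @ U @ sigma[j] @ U.conj().T).real
    return R

theta, n = 1.2, np.array([0.0, 0.6, 0.8])
U = expm(-0.5j * theta * dot(n))

# Rodrigues' rotation of a test vector about n, for comparison
v = np.array([1.0, -2.0, 0.5])
rot_v = (np.cos(theta) * v + (1 - np.cos(theta)) * np.dot(v, n) * n
         + np.sin(theta) * np.cross(n, v))
print(np.allclose(R_from_U(U) @ v, rot_v))   # True
print(np.allclose(R_from_U(-U), R_from_U(U)))  # True: U and -U give the same rotation
```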

 

The Bloch sphere revisited

We have seen that any qubit observable, \(Q\), can be represented as a matrix

\begin{equation*}
\mathbf{Q}=q_0\mathbf{I}+\mathbf{q}\cdot\boldsymbol{\sigma}
\end{equation*}

where \(q_0\in\RR\) and \(\mathbf{q}\in\RR^3\). Recall the Bloch sphere,

in which a general qubit state, \(\ket{\psi}\), is given by,

\begin{equation*}
\ket{\psi}=\cos(\theta/2)\ket{0}+e^{i\phi}\sin(\theta/2)\ket{1}.
\end{equation*}

It can be useful to denote this state vector by \(\ket{\mathbf{n};+}\), where \(\mathbf{n}\) is the unit vector with polar coordinates \((1,\theta,\phi)\), that is,
\begin{equation*}
\ket{\mathbf{n};+}=\cos(\theta/2)\ket{0}+e^{i\phi}\sin(\theta/2)\ket{1}.
\end{equation*}
where
\begin{equation*}
\mathbf{n}=(\sin\theta\cos\phi,\sin\theta\sin\phi,\cos\theta)
\end{equation*}
and
\begin{equation*}
\ket{\mathbf{n};-}=\sin(\theta/2)\ket{0}-e^{i\phi}\cos(\theta/2)\ket{1}
\end{equation*}
corresponding to the antipodal point on the Bloch sphere (\(\theta\mapto\pi-\theta\) and \(\phi\mapto\pi+\phi\)). Indeed, \(\ket{\mathbf{n};\pm}\) are precisely the eigenvectors of the observable \(\mathbf{n}\cdot\boldsymbol{\sigma}\),
\begin{equation}
(\mathbf{n}\cdot\boldsymbol{\sigma})\ket{\mathbf{n};\pm}=\pm\ket{\mathbf{n};\pm}.
\end{equation}
For example,
\begin{align*}
(\mathbf{n}\cdot\boldsymbol{\sigma})\ket{\mathbf{n};+}&=\begin{pmatrix}\cos\theta&e^{-i\phi}\sin\theta\\ e^{i\phi}\sin\theta&-\cos\theta\end{pmatrix}\begin{pmatrix}\cos\frac{\theta}{2}\\e^{i\phi}\sin\frac{\theta}{2}\end{pmatrix}\\
&=\begin{pmatrix}\cos\theta\cos\frac{\theta}{2}+\sin\theta\sin\frac{\theta}{2}\\
e^{i\phi}(\sin\theta\cos\frac{\theta}{2}-\cos\theta\sin\frac{\theta}{2})\end{pmatrix}\\
&=\begin{pmatrix}\cos\frac{\theta}{2}\\e^{i\phi}\sin\frac{\theta}{2}\end{pmatrix}=\ket{\mathbf{n};+}
\end{align*}
Note of course that \(\braket{\mathbf{n};+|\mathbf{n};-}=0\), that is, \(\ket{\mathbf{n};+}\) and \(\ket{\mathbf{n};-}\) are orthogonal as state vectors in the Hilbert space \(\CC^2\) though of course \(\mathbf{n}\) and \(-\mathbf{n}\) are certainly not orthogonal vectors in \(\RR^3\)!

We’ve seen that there is a 2-to-1 homomorphism from \(SU(2)\) to \(SO(3)\) such that \(\mathbf{U}(\mathbf{n},\theta)\mapsto\mathbf{R}(\mathbf{n},\theta)\) where
\begin{equation*}
\mathbf{U}(\mathbf{n},\theta)=\exp\left(-i\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}\right)=\cos\frac{\theta}{2}\mathbf{I}-i\sin\frac{\theta}{2}\mathbf{n}\cdot\boldsymbol{\sigma}
\end{equation*}
and the rotation \(\mathbf{R}(\mathbf{n},\theta)\) is such that
\begin{equation*}
(\mathbf{R}(\mathbf{n},\theta)\mathbf{v})\cdot\boldsymbol{\sigma}=\mathbf{U}(\mathbf{n},\theta)\mathbf{v}\cdot\boldsymbol{\sigma}\mathbf{U}(\mathbf{n},\theta)^\dagger,
\end{equation*}
which we confirmed was a rotation of \(\theta\) about the axis \(\mathbf{n}\). This means that for an arbitrary vector \(\mathbf{v}\in\RR^3\),
\begin{equation}
\mathbf{R}(\mathbf{n},\theta)\mathbf{v}=\cos\theta\mathbf{v}+(1-\cos\theta)(\mathbf{v}\cdot\mathbf{n})\mathbf{n}+\sin\theta\mathbf{n}\times\mathbf{v}.
\end{equation}

In terms of the state vector notation \(\ket{\mathbf{n};\pm}\) relating unit vectors in \(\RR^3\) to states on the Bloch sphere we have, up to an overall phase, that
\begin{equation}
\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}=\ket{\mathbf{R}(\mathbf{m},\alpha)\mathbf{n};+}
\end{equation}
since
\begin{align*}
\left((\mathbf{R}(\mathbf{m},\alpha)\mathbf{n})\cdot\boldsymbol{\sigma}\right)\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}&=\mathbf{U}(\mathbf{m},\alpha)\mathbf{n}\cdot\boldsymbol{\sigma}\ket{\mathbf{n};+}\\
&=\mathbf{U}(\mathbf{m},\alpha)\ket{\mathbf{n};+}
\end{align*}

Thus, as would have been anticipated, the unitary operator \(\mathbf{U}(\mathbf{m},\alpha)\) rotates the Bloch sphere state \(\ket{\mathbf{n};+}\) by an angle \(\alpha\) around the axis \(\mathbf{m}\).
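A numerical check of this statement (a sketch; the equality of kets holds up to the usual unphysical global phase, so we test that the overlap has unit modulus):

```python
import numpy as np
from scipy.linalg import expm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def dot(v):
    return sum(v[i] * sigma[i] for i in range(3))

def ket_plus(n):
    """|n;+> from the polar angles of the unit vector n."""
    theta, phi = np.arccos(n[2]), np.arctan2(n[1], n[0])
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def rodrigues(axis, alpha, v):
    """Rotate v through alpha about the unit vector 'axis'."""
    return (np.cos(alpha) * v + (1 - np.cos(alpha)) * np.dot(v, axis) * axis
            + np.sin(alpha) * np.cross(axis, v))

m, alpha = np.array([0.0, 1.0, 0.0]), 0.9
n = np.array([0.6, 0.0, 0.8])

lhs = expm(-0.5j * alpha * dot(m)) @ ket_plus(n)   # U(m, alpha) |n;+>
rhs = ket_plus(rodrigues(m, alpha, n))             # |R(m, alpha) n;+>
print(np.isclose(abs(np.vdot(lhs, rhs)), 1.0))     # True
```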
 

Basic Postulates and Mathematical Framework

Quantum mechanics plays out in the mathematical context of Hilbert spaces. These may be finite or infinite dimensional. A finite dimensional Hilbert space is in fact nothing other than a complex vector space equipped with an Hermitian inner product. In infinite dimensions the space needs some extra technical attributes but for the time being our focus will be on finite dimensional spaces. We’ll state a series of postulates in terms of (general) Hilbert spaces safe in the knowledge that they will require little or no modification when we come to consider infinite dimensions.

State vectors

Postulate [State vectors in state space] Everything that can be said about the state of a physical system is encoded in a mathematical object called a state vector belonging to a Hilbert space also called a state space. Commonly used notation for such a vector is \(\ket{\psi}\). Conversely every non-zero vector \(\ket{\psi}\) in the Hilbert space corresponds to (everything that can be said about) a possible state of the system.

In fact, any non-zero multiple of a vector \(\ket{\psi}\) contains precisely the same information about a given state of the system and so most often we restrict attention to normalised states, that is, those of unit length, \(\braket{\psi|\psi}=1\). But normalisation only fixes state vectors up to a phase factor and the equivalence class in state space of all normalised vectors differing only by a phase is called a ray. Thus, to be precise, we say that physical states are in one-to-one correspondence with rays in state space.

Remarkable richness and physical relevance is already found in a 2-dimensional state space, the inhabitants of which are called qubits (their \(n\)-dimensional counterparts are called qudits). Mathematically, such a state space is just \(\CC^2\) and the standard basis would be provided by the pair of vectors
\begin{equation*}
\begin{pmatrix}1\\0\end{pmatrix}\qquad\qquad\begin{pmatrix}0\\1\end{pmatrix}
\end{equation*}
In quantum information contexts these are typically denoted by \(\ket{0}\) and \(\ket{1}\) respectively,
\begin{equation}
\ket{0}=\begin{pmatrix}1\\0\end{pmatrix}\qquad\qquad\ket{1}=\begin{pmatrix}0\\1\end{pmatrix}
\end{equation}
An arbitrary state vector in this state space would have the form,
\begin{equation}
\ket{\psi}=a\ket{0}+b\ket{1}
\end{equation}
with \(a,b\in\CC\) and normalisation requiring that \(|a|^2+|b|^2=1\). This means that we could write a general state vector as
\begin{equation*}
\ket{\psi}=e^{i\gamma}\left(\cos(\theta/2)\ket{0}+e^{i\varphi}\sin(\theta/2)\ket{1}\right)
\end{equation*}
but for different \(\gamma\) values these are just all the members of the same ray and so we can write the most general state as
\begin{equation}
\ket{\psi}=\cos(\theta/2)\ket{0}+e^{i\varphi}\sin(\theta/2)\ket{1}.
\end{equation}
Given the assumption of unit length, we can represent qubits as points on a unit sphere, called the Bloch sphere.

In this diagram we have illustrated an arbitrary qubit, \(\ket{\psi}\), as well as another possible pair of basis vectors,
\begin{equation}
\ket{+}\equiv\frac{1}{\sqrt{2}}\left(\ket{0}+\ket{1}\right)
\end{equation}
and
\begin{equation}
\ket{-}\equiv\frac{1}{\sqrt{2}}\left(\ket{0}-\ket{1}\right).
\end{equation}

Observables

Physical quantities of a quantum mechanical system which can be measured, such as position or momentum, are called observables.

Postulate [Observables] The observables of a quantum mechanical system corresponding to a state space, \(\mathcal{H}\), are represented by self-adjoint (Hermitian) operators on \(\mathcal{H}\).

In two dimensions, \(\mathcal{H}=\CC^2\), the Pauli matrices,

\begin{equation*}
\boldsymbol{\sigma}_1=\begin{pmatrix}0&1\\1&0\end{pmatrix}\qquad\boldsymbol{\sigma}_2=\begin{pmatrix}0&-i\\i&0\end{pmatrix}\qquad\boldsymbol{\sigma}_3=\begin{pmatrix}1&0\\0&-1\end{pmatrix},
\end{equation*}

are examples of (qubit) observables. In due course we’ll see their relationship to spin.

Recall that the eigenvalues of Hermitian operators are real, that eigenvectors with distinct eigenvalues are orthogonal and that there exists an orthonormal basis of eigenvectors of such operators. If the state space, \(\mathcal{H}\), has dimension \(d\) then a quantum mechanical observable, \(O\), may have \(r\leq d\) distinct eigenvalues \(\lambda_i\), each with geometric multiplicity \(d_i\) such that \(\sum_{i=1}^rd_i=d\). Denoting the corresponding orthonormal basis of eigenvectors, \(\ket{i,j}\), with \(i=1,\dots,r\) and \(j=1,\dots,d_i\), then,
\begin{equation*}
O\ket{i,j}=\lambda_i\ket{i,j},\quad i=1,\dots,r,\; j=1,\dots,d_i.
\end{equation*}
If we denote by \(P_{\lambda_i}\) the projector onto the eigenspace, \(V_{\lambda_i}\), so that
\begin{equation*}
P_{\lambda_i}=\sum_{j=1}^{d_i}\ket{i,j}\bra{i,j},
\end{equation*}
then \(O\) has the spectral decomposition,
\begin{equation*}
O=\sum_{i=1}^r \lambda_iP_{\lambda_i}.
\end{equation*}
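numpy's Hermitian eigensolver hands us exactly these ingredients. A sketch reconstructing a (randomly generated) observable from its spectral decomposition:

```python
import numpy as np

# A random 4x4 Hermitian 'observable'
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
O = (A + A.conj().T) / 2

evals, evecs = np.linalg.eigh(O)  # columns of evecs are orthonormal eigenvectors

# Rebuild O from its spectral decomposition: sum_i lambda_i |i><i|
O_rebuilt = sum(lam * np.outer(vec, vec.conj())
                for lam, vec in zip(evals, evecs.T))
print(np.allclose(O, O_rebuilt))  # True
```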

Time Development of State Vectors

Leaving aside for the moment the question of how we extract some meaningful information from these state vectors, the next postulate deals with the question of how state vectors change in time. For this we restrict attention to closed systems, that is, (idealised) systems isolated from their environment.

Postulate [Unitary time evolution] If the state of a closed system at time \(t_1\) is represented by a state vector \(\ket{\psi(t_1)}\) then at a later time \(t_2\) the state is represented by a state vector \(\ket{\psi(t_2)}\) related to \(\ket{\psi(t_1)}\) by a unitary operator \(U(t_2,t_1)\) such that
\begin{equation}
\ket{\psi(t_2)}=U(t_2,t_1)\ket{\psi(t_1)}
\end{equation}
The unitary operator \(U(t_2,t_1)\) is a property of the given physical system and describes the time evolution of any possible state of the system from time \(t_1\) to time \(t_2\).

Since \(U(t_2,t_1)\) is unitary we have that
\begin{equation*}
U^\dagger(t_2,t_1)U(t_2,t_1)=\id
\end{equation*}
but of course \(U(t,t)=\id\) and if \(t_1{<}t{<}t_2\) then \(U(t_2,t_1)=U(t_2,t)U(t,t_1)\) so we must have
\begin{equation*}
U^\dagger(t_2,t_1)=U(t_1,t_2).
\end{equation*}

Starting from some fixed time \(t_0\) let us consider the time development of a state \(\ket{\psi(t_0)}\) to some later time \(t\),
\begin{equation*}
\ket{\psi(t)}=U(t,t_0)\ket{\psi(t_0)}.
\end{equation*}
Differentiating this with respect to \(t\) we obtain
\begin{equation*}
\frac{\partial}{\partial t}\ket{\psi(t)}=\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0)\ket{\psi(t)},
\end{equation*}
or, defining
\begin{equation*}
\Lambda(t,t_0)\equiv\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0),
\end{equation*}
we have
\begin{equation*}
\frac{\partial}{\partial t}\ket{\psi(t)}=\Lambda(t,t_0)\ket{\psi(t)}.
\end{equation*}
The operator \(\Lambda\) is actually independent of \(t_0\) since,
\begin{align*}
\Lambda(t,t_0)&=\frac{\partial U(t,t_0)}{\partial t}U^\dagger(t,t_0)\\
&=\frac{\partial U(t,t_0)}{\partial t}U(t_0,t_1)U^\dagger(t_0,t_1)U^\dagger(t,t_0)\\
&=\frac{\partial \left(U(t,t_0)U(t_0,t_1)\right)}{\partial t}\left(U(t,t_0)U(t_0,t_1)\right)^\dagger\\
&=\frac{\partial U(t,t_1)}{\partial t}U^\dagger(t,t_1)\\
&=\Lambda(t,t_1)
\end{align*}
so we may as well write it simply as \(\Lambda(t)\). Moreover, \(\Lambda(t)\) is anti-Hermitian as can be seen by differentiating \(U(t,t_0)U^\dagger(t,t_0)=\id\) to obtain
\begin{equation*}
\Lambda(t)+\Lambda^\dagger(t)=0.
\end{equation*}
Thus if we define a new operator \(H(t)\) according to
\begin{equation}
H(t)=i\hbar\Lambda(t)
\end{equation}
where \(\hbar\) is the reduced Planck constant, then \(H(t)\) is an Hermitian operator with units of energy and the time development equation becomes
\begin{equation}
i\hbar\frac{\partial}{\partial t}\ket{\psi(t)}=H(t)\ket{\psi(t)}.
\end{equation}
The operator \(H(t)\) is interpreted as the Hamiltonian of the closed system, the energy observable, and the time development equation in this form is called the Schrödinger equation.

Because the Hamiltonian is an Hermitian operator it has a spectral decomposition (dropping the explicit reference to potential time dependence)
\begin{equation}
H=\sum_{i=1}^rE_iP_{E_i}
\end{equation}
where \(E_i\) are the (real) energy eigenvalues and \(P_{E_i}\) is a projector onto the energy eigenspace corresponding to the eigenvalue \(E_i\),
\begin{equation}
P_{E_i}=\sum_{j=1}^{d_i}\ket{E_i,j}\bra{E_i,j}
\end{equation}
where \(\ket{E_i,j}\) are energy eigenstates and \(d_i\) is the degeneracy of the energy eigenvalue \(E_i\).

The typical situation is that for a given closed system we know the Hamiltonian \(H(t)\), perhaps by analogy with a corresponding classical system. In this case, at least in principle, we can compute the corresponding unitary operator \(U(t,t_0)\) by solving the differential equation
\begin{equation}
\frac{\partial U(t,t_0)}{\partial t}=-\frac{i}{\hbar}H(t)U(t,t_0),
\end{equation}
with the initial condition \(U(t_0,t_0)=\id\). There are three cases to consider.

The simplest situation is that the Hamiltonian is time independent since then it is straightforward to confirm that the solution is
\begin{equation}
U(t,t_0)=\exp\left[-\frac{i}{\hbar}H(t-t_0)\right].
\end{equation}
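As a quick numerical illustration, here is a minimal sketch (in Python, with NumPy and SciPy assumed; the Hamiltonian is a toy \(2\times2\) Hermitian matrix of my own choosing, in units where \(\hbar=1\)):

import numpy as np
from scipy.linalg import expm

hbar = 1.0                            # units in which hbar = 1
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])           # a toy 2x2 Hermitian Hamiltonian
t, t0 = 2.0, 0.0

U = expm(-1j * H * (t - t0) / hbar)   # U(t, t0) = exp(-i H (t - t0) / hbar)

assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary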

The second case is that the Hamiltonian is time dependent but the Hamiltonians at two different times commute, that is, \([H(t_1),H(t_2)]=0\). In this case we claim that the solution is
\begin{equation}
U(t,t_0)=\exp\left[-\frac{i}{\hbar}\int_{t_0}^tds\,H(s)\right].
\end{equation}
To see this first define
\begin{equation*}
R(t)=-\frac{i}{\hbar}\int_{t_0}^tds\,H(s),
\end{equation*}
so that \(R'(t)=-(i/\hbar)H(t)\) and note that
\begin{equation*}
[R'(t),R(t)]=\left[-\frac{i}{\hbar}H(t),-\frac{i}{\hbar}\int_{t_0}^tds\,H(s)\right]=-\frac{1}{\hbar^2}\int_{t_0}^tds\,[H(t),H(s)]=0.
\end{equation*}
That \(R'(t)\) and \(R(t)\) commute then means that we can write the derivative,
\begin{align*}
\frac{d}{dt}\exp R(t)&=\frac{d}{dt}\left[\id+R(t)+\frac{1}{2!}R(t)R(t)+\frac{1}{3!}R(t)R(t)R(t)+\dots\right]\\
&=R'+\frac{1}{2!}(R'R+RR')+\frac{1}{3!}(R'RR+RR'R+RRR')+\dots
\end{align*}
as
\begin{equation*}
\frac{d}{dt}\exp R(t)=R’\left(\id+R+\frac{1}{2!}R^2+\dots\right)=R'(t)\exp R(t),
\end{equation*}
confirming the result.

The third case is the most general situation in which Hamiltonians at two different times do not commute. In this case the best we can do is recast the differential equation for \(U(t,t_0)\) as an integral equation (which incorporates the initial condition \(U(t_0,t_0)=\id\)),
\begin{equation*}
U(t,t_0)=\id-\frac{i}{\hbar}\int_{t_0}^tdt_1\,H(t_1)U(t_1,t_0)
\end{equation*}
and then, expressing \(U(t_1,t_0)\) as
\begin{equation*}
U(t_1,t_0)=\id-\frac{i}{\hbar}\int_{t_0}^{t_1}dt_2\,H(t_2)U(t_2,t_0)
\end{equation*}
iterate once to obtain,
\begin{equation*}
U(t,t_0)=\id+\left(-\frac{i}{\hbar}\right)\int_{t_0}^tdt_1\,H(t_1)+\left(-\frac{i}{\hbar}\right)^2\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)U(t_2,t_0).
\end{equation*}
Continuing in this way we obtain a formal series,
\begin{align*}
U(t,t_0)=\id+\left(-\frac{i}{\hbar}\right)\int_{t_0}^tdt_1\,H(t_1)&+\left(-\frac{i}{\hbar}\right)^2\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)\\
&+\left(-\frac{i}{\hbar}\right)^3\int_{t_0}^tdt_1H(t_1)\int_{t_0}^{t_1}dt_2H(t_2)\int_{t_0}^{t_2}dt_3H(t_3)\\
&+\dots
\end{align*}
the right hand side of which is called a time-ordered exponential.
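Numerically, the time-ordered exponential can be approximated by composing many short-time propagators, with later times acting to the left. A minimal sketch (Python assumed; the time-dependent Hamiltonian is a toy choice with \([H(t_1),H(t_2)]\neq0\)):

import numpy as np
from scipy.linalg import expm

hbar = 1.0
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H(t):
    # toy Hamiltonian; [H(t1), H(t2)] != 0 in general
    return np.cos(t) * sx + np.sin(t) * sz

def U_time_ordered(t, t0, steps=2000):
    # product of short-time propagators, later times to the LEFT
    dt = (t - t0) / steps
    U = np.eye(2, dtype=complex)
    for k in range(steps):
        s = t0 + (k + 0.5) * dt       # midpoint of the k-th slice
        U = expm(-1j * H(s) * dt / hbar) @ U
    return U

U = U_time_ordered(1.0, 0.0)
assert np.allclose(U.conj().T @ U, np.eye(2))   # still unitary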

Measurement

Information is extracted from a quantum system through the process of measurement and, in contrast to classical physics, the process of measurement is incorporated into the theoretical framework.

Postulate [General measurement] To the \(i\)th possible outcome of a measurement of a quantum system in a state \(\ket{\psi}\) there corresponds a measurement operator \(M_i\) such that the probability that the \(i\)th outcome occurs is \(p(i)\) where
\begin{equation}
p(i)=\bra{\psi}M^\dagger_iM_i\ket{\psi}
\end{equation}
and if this occurs then the state of the system after the measurement is given by,
\begin{equation}
\frac{M_i\ket{\psi}}{\sqrt{\bra{\psi}M_i^\dagger M_i\ket{\psi}}}.
\end{equation}
The measurement operators satisfy
\begin{equation}
\sum_iM_i^\dagger M_i=\id
\end{equation}
expressing the fact that the probabilities sum to 1.
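To make the postulate concrete, here is a minimal simulation sketch (Python assumed; the measurement operators are simply the qubit projectors \(\ket{0}\bra{0}\) and \(\ket{1}\bra{1}\), and the state is an arbitrary choice):

import numpy as np

# measurement operators: projectors onto |0> and |1>,
# satisfying sum_i M_i^dagger M_i = identity
M = [np.array([[1, 0], [0, 0]], dtype=complex),
     np.array([[0, 0], [0, 1]], dtype=complex)]

psi = np.array([1, 1j], dtype=complex) / np.sqrt(2)   # an arbitrary normalised state

probs = [float(np.real(psi.conj() @ Mi.conj().T @ Mi @ psi)) for Mi in M]
assert np.isclose(sum(probs), 1.0)                    # probabilities sum to 1

i = np.random.choice(len(M), p=probs)                 # sample an outcome
post = M[i] @ psi / np.sqrt(probs[i])                 # post-measurement state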

Distinguishing States by General Measurements

Suppose we have a two-dimensional state space and we are given one of two orthonormal states, \(\ket{0}\) or \(\ket{1}\), at random. There is a measurement which can distinguish these two states with certainty: define the measurement operators \(M_0=\ket{0}\bra{0}\) and \(M_1=\ket{1}\bra{1}\), so that \(M_0+M_1=\id\). Assuming we receive \(\ket{0}\), then \(p(0)=1\), that is, \(p(0\,|\,\text{receive} \ket{0})=1\), and similarly \(p(1\,|\,\text{receive} \ket{1})=1\), so that the probability of successfully identifying the received state is
\begin{equation*}
P_S=p(\text{receive} \ket{0})p(0\,|\,\text{receive} \ket{0})+p(\text{receive} \ket{1})p(1\,|\,\text{receive} \ket{1})=1.
\end{equation*}
Of course this is a perfect situation. More realistic is that we must decide what kind of measurement to perform and how to infer from a given measurement outcome the identity of the original state. So, for example, if we (correctly) chose the \(\{M_0,M_1\}\) measurement but inferred from a 0 outcome the state \(\ket{1}\) and vice versa, then the probability of successful identification would be 0. If instead we chose a measurement based on the basis elements \(\ket{\pm}=(\ket{0}\pm\ket{1})/\sqrt{2}\), and inferred from a \(+\) outcome the state \(\ket{0}\) and from a \(-\) outcome the state \(\ket{1}\), then since
\begin{equation*}
p(0\,|\,\text{receive} \ket{0})=p(+\,|\,\text{receive} \ket{0})=\braket{0|+}\braket{+|0}=\frac{1}{2}
\end{equation*}
and
\begin{equation*}
p(1\,|\,\text{receive} \ket{1})=p(-\,|\,\text{receive} \ket{1})=\braket{1|-}\braket{-|1}=\frac{1}{2}
\end{equation*}
the probability of successfully identifying the received state is \(1/2\).

We can generalise this discussion as follows. Suppose we receive, with equal probability, one of \(N\) states \(\{\ket{\phi_1},\dots,\ket{\phi_N}\}\) from a \(d\)-dimensional subspace \(U\subset\mathcal{H}\) of a state space \(\mathcal{H}\). We investigate the probability of successfully distinguishing these \(N\) states based on a measurement corresponding to \(n\) measurement operators \(\{M_1,\dots,M_n\}\). We need a rule which encodes how we infer one of the \(N\) given states from one of the \(n\) measurement outcomes. We can express this as a surjective map \(f\) from the set of outcomes, \(\{1,\dots,n\}\), to the given states, \(\{1,\dots,N\}\). Then the probability of success is given by
\begin{equation}
P_S=\sum_{i=1}^Np(\text{receive} \ket{\phi_i})\times\left(\sum_{j:f(j)=i}p(j\,|\,\text{receive} \ket{\phi_i})\right).
\end{equation}
Now if by \(P_U\) we denote the orthogonal projector onto the subspace \(U\) to which the \(N\) states \(\ket{\phi_i}\) belong then we can write
\begin{equation}
p(j\,|\,\text{receive} \ket{\phi_i})=\braket{\phi_i|M_j^\dagger M_j|\phi_i}=\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}.
\end{equation}
But \(M_j^\dagger M_j\) is a positive operator and therefore so is \(P_UM_j^\dagger M_jP_U\) and since the \(\ket{\phi_i}\) are assumed to be normalised we can say that \(\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\leq\tr P_UM_j^\dagger M_jP_U\). Thus, noting that \(\tr P_U=d\), we obtain
\begin{align*}
P_S&=\sum_{i=1}^Np(\text{receive} \ket{\phi_i})\times\left(\sum_{j:f(j)=i}\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\right)\\
&=\frac{1}{N}\sum_{i=1}^N\sum_{j:f(j)=i}\braket{\phi_i|P_UM_j^\dagger M_jP_U|\phi_i}\\
&\leq\frac{1}{N}\sum_{i=1}^N\sum_{j:f(j)=i}\tr P_UM_j^\dagger M_jP_U\\
&=\frac{1}{N}\tr P_U\left(\sum_jM_j^\dagger M_j\right)P_U\\
&=\frac{d}{N}.
\end{align*}
That is, the probability of success is bounded from above according to \(P_S\leq d/N\). If \(N\leq d\) and the states \(\{\ket{\phi_1},\dots,\ket{\phi_N}\}\) are orthogonal then it is possible to distinguish the states with certainty. Indeed, defining operators \(M_i=\ket{\phi_i}\bra{\phi_i}\) for \(i=1,\dots,N\) and \(M_{N+1}=\sqrt{\id-\sum_{i=1}^NM_i}\) then we have the appropriate measurement to be combined with the trivial inference map \(f(i)=i\) for \(i=1,\dots,N\) and, for example, \(f(N+1)=1\).
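This construction is easily checked numerically. A minimal sketch (Python with NumPy/SciPy assumed; without much loss of generality the \(N\) orthogonal states are taken to be standard basis vectors of a \(d\)-dimensional space):

import numpy as np
from scipy.linalg import sqrtm

d, N = 3, 2                                   # N orthogonal states in a d-dimensional space
phi = [np.eye(d)[:, i] for i in range(N)]     # take them to be standard basis vectors

M = [np.outer(p, p.conj()) for p in phi]      # M_i = |phi_i><phi_i|
M.append(sqrtm(np.eye(d) - sum(M)))           # M_{N+1} completes the measurement

assert np.allclose(sum(Mi.conj().T @ Mi for Mi in M), np.eye(d))
for i, p in enumerate(phi):                   # receiving phi_i, outcome i is certain
    assert np.isclose(np.real(p.conj() @ M[i].conj().T @ M[i] @ p), 1.0)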

Let’s now focus on the case that we have two states \(\ket{\phi_1}\) and \(\ket{\phi_2}\) belonging to a two-dimensional subspace \(U\). We already know that if the states are orthogonal then in principle it is possible to distinguish them with certainty. Let us then consider the case that they are not orthogonal. We will show that in this case we cannot reliably distinguish the two states. To see this suppose on the contrary that it were indeed possible. Then we must have a measurement with operators \(M_i\) and an inference rule \(f\) such that
\begin{equation*}
\sum_{j:f(j)=1}p(j\,|\,\text{receive} \ket{\phi_1})=\sum_{j:f(j)=1}\braket{\phi_1|M_j^\dagger M_j|\phi_1}=1
\end{equation*}
and
\begin{equation*}
\sum_{j:f(j)=2}p(j\,|\,\text{receive} \ket{\phi_2})=\sum_{j:f(j)=2}\braket{\phi_2|M_j^\dagger M_j|\phi_2}=1
\end{equation*}
So defining \(E_1\equiv\sum_{j:f(j)=1}M_j^\dagger M_j\) and \(E_2\equiv\sum_{j:f(j)=2}M_j^\dagger M_j\) and noting that \(E_1+E_2=\id\) we have \(\braket{\phi_1|E_2|\phi_1}=0\) so that \(\sqrt{E_2}\ket{\phi_1}=0\). Now we can form an orthonormal basis for \(U\) as \(\{\ket{\phi_1},\ket{\tilde{\phi_1}}\}\) such that \(\ket{\phi_2}=\alpha\ket{\phi_1}+\beta\ket{\tilde{\phi_1}}\) with \(|\alpha|^2+|\beta|^2=1\) and, since the two states are not orthogonal, \(\alpha\neq0\) so that \(|\beta|<1\). Then, since \(E_2\leq\id\),
\begin{equation*}
\braket{\phi_2|E_2|\phi_2}=|\beta|^2\braket{\tilde{\phi_1}|E_2|\tilde{\phi_1}}\leq|\beta|^2<1,
\end{equation*}
contradicting our assumption that \(\ket{\phi_2}\) is identified with certainty.


Projective measurement

The preceding discussion of measurement is rather abstract: we have conspicuously not mentioned what is being measured. Let us now consider the more familiar projective measurement corresponding to the measurement of a particular observable.

Postulate [Projective measurement] The eigenvalues \(\lambda_i\) of the Hermitian operator, \(O\), representing a quantum mechanical observable are the possible outcomes of any experiment carried out on the system to establish the value of the observable. In this case the measurement operators are the orthogonal projectors, \(P_{\lambda_i}\), of the spectral decomposition of the Hermitian operator \(O\). That is, \(O=\sum_i\lambda_iP_{\lambda_i}\) where \(P_{\lambda_i}\) is the projector onto the eigenspace corresponding to the eigenvalue \(\lambda_i\). If a system is in a state \(\ket{\psi}\) then a measurement of an observable represented by the operator \(O\) will obtain a value \(\lambda_i\) with a probability
\begin{equation}
p(i)=\braket{\psi|P_{\lambda_i}|\psi}
\end{equation}
and subsequently the system will be in the state
\begin{equation}
\frac{P_{\lambda_i}\ket{\psi}}{\sqrt{p(i)}}.
\end{equation}

Note that in contrast to general measurements, if we repeat a projective measurement of the same observable then we are guaranteed to get the same outcome.

We sometimes speak of measuring in (or along) a basis. Suppose \(\{\ket{i}\}\) is an orthonormal basis for the Hilbert space describing our system. If the system is initially in a state \(\ket{\psi}\) and we make a measurement in the basis \(\{\ket{i}\}\) then with probability \(p(i)=|\braket{i|\psi}|^2\) the measurement results in the system being in the state \(\ket{i}\). The measurement operators in this case are the one-dimensional projectors \(\ket{i}\bra{i}\).

Expectation values and uncertainty relations

The expectation value of the operator \(O\) when the system is in a state \(\ket{\psi}\), that is, the expected value of a (projective) measurement of the observable represented by the operator \(O\) when the system is described by the state vector \(\ket{\psi}\), is given by
\begin{equation*}
\mathbf{E}_{\psi}[O]=\sum_ip(i)\lambda_i=\sum_i\braket{\psi|P_{\lambda_i}|\psi}\lambda_i=\sum_i\braket{\psi|\lambda_iP_{\lambda_i}|\psi}=\braket{\psi|O|\psi}.
\end{equation*}
We typically denote this expectation value \(\braket{O}_{\psi}\), thus
\begin{equation}
\braket{O}_{\psi}=\braket{\psi|O|\psi}.
\end{equation}

If a system is in an eigenstate of an observable \(O\) then when we measure this property we are sure to obtain the eigenvalue corresponding to that eigenstate. If though the system is in some arbitrary state \(\ket{\psi}\) then there will be some uncertainty in the value obtained. We denote the uncertainty of the Hermitian operator \(O\) in the state \(\ket{\psi}\) by \(\Delta_{\psi}O\), defined by
\begin{equation}
\Delta_{\psi}O\equiv\left|\left(O-\braket{O}_{\psi}\id\right)\ket{\psi}\right|
\end{equation}

It is not difficult to see that the uncertainty \(\Delta_{\psi}O\) vanishes if and only if \(\ket{\psi}\) is an eigenstate of \(O\).

We would expect there to be a relationship between the uncertainty \(\Delta_{\psi}O\) and the usual statistical standard deviation, \(\sqrt{\mathbf{E}_{\psi}[O^2]-\mathbf{E}_{\psi}[O]^2}\) and indeed we have,
\begin{align*}
(\Delta_{\psi}O)^2&=\left|\left(O-\braket{O}_{\psi}\id\right)\ket{\psi}\right|^2\\
&=\braket{\psi|\left(O-\braket{O}_{\psi}\id\right)^2|\psi}\\
&=\braket{\psi|O^2-2\braket{O}_{\psi}O+\braket{O}_{\psi}^2\id|\psi}\\
&=\braket{O^2}_{\psi}-\braket{O}_{\psi}^2.
\end{align*}
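This identity is easy to verify numerically; a minimal sketch (Python with NumPy assumed, taking \(O=\sigma_x\) and a normalised state of my own choosing):

import numpy as np

O = np.array([[0, 1], [1, 0]], dtype=complex)               # sigma_x as the observable
psi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)   # a normalised state

exp_O  = np.real(psi.conj() @ O @ psi)                      # <O>_psi
exp_O2 = np.real(psi.conj() @ O @ O @ psi)                  # <O^2>_psi
residual = (O - exp_O * np.eye(2)) @ psi                    # (O - <O> id)|psi>
assert np.isclose(np.linalg.norm(residual)**2, exp_O2 - exp_O**2)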

Geometrically, the orthogonal projection of \(O\ket{\psi}\) on the 1-dimensional subspace, \(U_{\psi}\), spanned by \(\ket{\psi}\) is \(P_{\psi}O\ket{\psi}\) where \(P_{\psi}\equiv\ket{\psi}\bra{\psi}\) and therefore \(P_{\psi}O\ket{\psi}=\braket{O}_{\psi}\ket{\psi}\). Furthermore, the component of \(O\ket{\psi}\) in the orthogonal complement of \(U_{\psi}\), \(U_{\psi}^\perp\), is \((\id-P_{\psi})O\ket{\psi}\), the length of which is just \(\Delta_{\psi}O\).

Theorem (The Uncertainty Principle) In a state \(\ket{\psi}\) the uncertainties in any pair of Hermitian operators, \(A\) and \(B\), satisfy the relation
\begin{equation}
\Delta_{\psi}A\Delta_{\psi}B\geq\left|\braket{\psi|\frac{1}{2i}[A,B]|\psi}\right|
\end{equation}

Proof This is simply an application of the Cauchy-Schwarz inequality. We define two new operators, \(\tilde{A}=A-\braket{A}_{\psi}\id\) and \(\tilde{B}=B-\braket{B}_{\psi}\id\) and states \(\ket{a}=\tilde{A}\ket{\psi}\) and \(\ket{b}=\tilde{B}\ket{\psi}\). Then Cauchy-Schwarz tells us that
\begin{equation*}
\braket{a|a}\braket{b|b}\geq\left|\braket{a|b}\right|^2,
\end{equation*}
from which, observing that \(\braket{a|a}=(\Delta_{\psi}A)^2\) and \(\braket{b|b}=(\Delta_{\psi}B)^2\),
\begin{equation*}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left|\braket{a|b}\right|^2.
\end{equation*}
Now, \(\braket{a|b}=\braket{\psi|\tilde{A}\tilde{B}|\psi}\), and observe that we can write,
\begin{equation*}
\braket{\psi|\tilde{A}\tilde{B}|\psi}=\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}+i\,\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi},
\end{equation*}
where \(\{\tilde{A},\tilde{B}\}=\tilde{A}\tilde{B}+\tilde{B}\tilde{A}\) is the anti-commutator of \(\tilde{A}\) and \(\tilde{B}\). Since \(\tilde{A}\) and \(\tilde{B}\) are Hermitian, \(\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}\) is real while \(\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\) is pure imaginary, so the two terms above are respectively the real and imaginary parts of \(\braket{a|b}\). Therefore we have
\begin{equation*}
\left|\braket{a|b}\right|^2=\left(\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}\right)^2+\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation*}
and can write the uncertainty relation as
\begin{equation}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left(\frac{1}{2}\braket{\psi|\{\tilde{A},\tilde{B}\}|\psi}\right)^2+\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation}
from which it immediately follows that
\begin{equation}
(\Delta_{\psi}A)^2(\Delta_{\psi}B)^2\geq\left(\frac{1}{2i}\braket{\psi|[\tilde{A},\tilde{B}]|\psi}\right)^2
\end{equation}
which, noting that \([\tilde{A},\tilde{B}]=[A,B]\) since the shifts by multiples of the identity commute with everything, is just the squared version of the desired result.\(\blacksquare\)

It is of interest to establish under what conditions the uncertainty relation is saturated. As can be seen from the proof, this requires the Cauchy-Schwarz inequality to be saturated and the term involving the anti-commutator to vanish. We recall that saturation of Cauchy-Schwarz is equivalent to the linear dependence of the two vectors, \(\ket{b}=\alpha\ket{a}\), for some \(\alpha\in\CC\). The anti-commutator term came from the real part of \(\braket{a|b}\) so we require \(\braket{a|b}+\braket{b|a}=0\). That is, using \(\ket{b}=\alpha\ket{a}\), \((\alpha+\alpha^*)\braket{a|a}=0\), so that \(\alpha\) must be pure imaginary, \(\ket{b}=it\ket{a}\) for \(t\in\RR\). In terms of the original operators and state this is the condition,
\begin{equation}
(B-\braket{B}_{\psi}\id)\ket{\psi}=it(A-\braket{A}_{\psi}\id)\ket{\psi}
\end{equation}
from which we see that \(|t|=\Delta_{\psi}B/\Delta_{\psi}A\) and which can be rewritten as an eigenvalue equation involving a non-Hermitian operator \((B-itA)\),
\begin{equation}
(B-itA)\ket{\psi}=(\braket{B}_\psi-it\braket{A}_{\psi})\ket{\psi}.
\end{equation}
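A numerical spot-check of the theorem, in the same spirit (Python assumed, with \(A=\sigma_x\), \(B=\sigma_y\) and a random normalised state):

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

rng = np.random.default_rng(0)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = v / np.linalg.norm(v)                     # a random normalised state

def uncertainty(O, psi):
    mean = np.real(psi.conj() @ O @ psi)
    return np.linalg.norm((O - mean * np.eye(2)) @ psi)

comm = sx @ sy - sy @ sx                        # [sigma_x, sigma_y] = 2i sigma_z
rhs = abs(psi.conj() @ (comm / 2j) @ psi)       # |<psi| [A,B]/2i |psi>|
assert uncertainty(sx, psi) * uncertainty(sy, psi) >= rhs - 1e-12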

The Stern-Gerlach Experiment

Recall that a current circulating in a closed loop induces a magnetic dipole moment (a quantity that determines the torque which the loop will experience in an external magnetic field). This is a vector quantity, \(\boldsymbol{\mu}\), given by \(\boldsymbol{\mu}=I\mathbf{A}\), where \(I\) is the current in the loop and \(\mathbf{A}\) is the oriented area enclosed by the loop, the orientation given by the right hand rule to be consistent with the direction of the current. The torque \(\boldsymbol{\tau}\) experienced by such a current loop in a magnetic field \(\mathbf{B}\) is then given by \(\boldsymbol{\tau}=\boldsymbol{\mu}\times\mathbf{B}\). This turning force works to align the magnetic moment with the magnetic field.

More generally a rotating charge distribution results in a magnetic moment and, if the distribution has mass, an angular momentum which must be related to the magnetic moment. Indeed, consider a ring of charge with radius \(r\) that has a uniform charge distribution and total charge \(Q\). We assume the ring is rotating about an axis perpendicular to the plane of the ring and going through its centre. If the tangential velocity is \(v\), then the current in the loop is given by \(I=\lambda v\) where \(\lambda\) is the (linear) charge density, that is, \(\lambda=Q/2\pi r\). Thus we have that the magnitude \(\mu\) of \(\boldsymbol{\mu}\) is given by,
\begin{equation}
\mu=IA=\frac{Q}{2\pi r}v\pi r^2=\frac{Q}{2}rv.
\end{equation}
Now if the mass of the ring is \(M\), then, recalling that the angular momentum is given by \(\mathbf{L}=M\mathbf{r}\times\mathbf{v}\), we have
\begin{equation}
\boldsymbol{\mu}=\frac{Q}{2M}\mathbf{L}
\end{equation}
In particular, for a single electron with charge \(-e\) and mass \(m_e\),
\begin{equation}
\boldsymbol{\mu}=\frac{-e}{2m_e}\mathbf{L}.
\end{equation}
The ratio of the magnetic moment to the angular momentum is called the gyromagnetic ratio and denoted \(\gamma\). It depends only on the total charge and total mass. Generally we have,
\begin{equation}
\gamma=\frac{\mu}{L}=\frac{Q}{2M},
\end{equation}
and in the case of a single electron,
\begin{equation}
\gamma=\frac{-e}{2m_e}.
\end{equation}

It might be thought that the motion of electrons inside an atom and the motion of protons within a nucleus would account for, respectively, observed atomic and nuclear magnetism. However this is not found to be the case. Rather, such particles possess a wholly intrinsic angular momentum, quite distinct from the usual spatial, or orbital angular momentum, called spin and it is only when this extra contribution is incorporated that agreement with experiment is achieved.

The gyromagnetic ratio associated with spin is different to that associated with spatial or orbital angular momentum. For example, for the electron, this ratio, denoted \(\gamma_e\), is given by,
\begin{equation}
\gamma_e=-\frac{e}{m_e},
\end{equation}
and we have a relationship between the magnetic moment and spin angular momentum, \(\mathbf{S}\), given by,
\begin{equation}
\boldsymbol{\mu}=-g\mu_B\frac{\mathbf{S}}{\hbar},
\end{equation}
where we have introduced the so-called “g-factor”, which for an electron is 2, and the Bohr magneton,
\begin{equation}
\mu_B=\frac{e\hbar}{2m_e}.
\end{equation}
Note that despite the motivation in terms of moments induced by current loops, the spin induced magnetic moment has nothing to do with electric charge. Indeed, a neutron carries no charge yet possesses a magnetic moment with gyromagnetic ratio given by,
\begin{equation}
\gamma_n=-3.83\frac{q_p}{2m_p},
\end{equation}
where \(q_p\) and \(m_p\) are respectively the charge and mass of a proton.
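Plugging in numbers is instructive. A back-of-envelope sketch (Python assumed; the constants are standard CODATA values):

e    = 1.602176634e-19     # elementary charge, C
m_e  = 9.1093837015e-31    # electron mass, kg
hbar = 1.054571817e-34     # reduced Planck constant, J s

mu_B    = e * hbar / (2 * m_e)   # Bohr magneton: about 9.274e-24 J/T
gamma_e = -e / m_e               # electron spin gyromagnetic ratio: about -1.759e11 C/kg
print(mu_B, gamma_e)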

The Stern-Gerlach experiment probes the mysterious intrinsic angular momentum of an electron. Silver atoms have forty-seven electrons. Forty-six of them completely fill the \(n=1,2,3\) and \(4\) energy levels, leaving a solitary \(n=5\) electron with zero orbital angular momentum. In the Stern-Gerlach apparatus silver is vaporised in an oven and collimated to create a beam of such atoms which is directed through a magnetic field behind which is a detector screen.
The potential energy, \(U\), of a magnetic moment \(\boldsymbol{\mu}\) in a magnetic field \(\mathbf{B}\) is given by \(U=-\boldsymbol{\mu}\cdot\mathbf{B}\) and the corresponding force is thus,
\begin{equation}
\mathbf{F}=-\nabla U=\nabla(\boldsymbol{\mu}\cdot\mathbf{B}).
\end{equation}
Thus the force points in the direction for which \(\boldsymbol{\mu}\cdot\mathbf{B}\) increases fastest. In the Stern-Gerlach setup the magnetic field is highly inhomogeneous and, to a good approximation,
\begin{equation}
\mathbf{F}=\mu_z\frac{\partial B_z}{\partial z}\mathbf{e}_z.
\end{equation}
Note that in this arrangement \(\partial B_z/\partial z\) is negative. Now, thanks to the high temperature of the oven generating the beam of silver atoms we would expect, reasoning classically, that the distribution of the magnetic moments of the silver atoms passing through the apparatus would be isotropic. In particular, the component of the magnetic moment in the \(z\)-direction would be expected to be \(\mu_z=|\boldsymbol{\mu}|\cos\theta\) with no preferred angle \(\theta\) between the moment and the \(z\)-axis. Thus we would expect a spread of deflections, with the upper and lower bounds corresponding respectively to \(-|\boldsymbol{\mu}|\) and \(|\boldsymbol{\mu}|\), so that the atoms arrive at the screen in a single continuous band. In fact what is observed is quite different: the atoms are deflected either up or down with nothing in between. It is as if all the atoms have either a fixed positive \(\mu_z\), corresponding to the lower screen distribution, or a fixed negative \(\mu_z\), corresponding to the upper screen distribution. We therefore conclude that the dipole moment, and therefore the spin angular momentum, of an electron is quantised. The two values of \(\mu_z\) can be calculated and lead to a determination of the two possible values of \(S_z\),
\begin{equation}
S_z=\pm\frac{\hbar}{2}.
\end{equation}
The Stern-Gerlach experiment is effectively measuring, for the electrons in a beam, the component of spin angular momentum along a particular direction in space, and finds that it can take just two discrete values, which we call up and down. Startling though this is, the mystery certainly doesn’t stop here.

We’ll now consider a series of thought experiments, involving two or more Stern-Gerlach experiments in series. A single such experiment, which we’ll subsequently refer to as SG1, will be represented schematically as

[schematic: SG1, a single \(\mathbf{e}_z\)-machine]

The label \(\mathbf{e}_z\) on this machine indicates that the beam of electrons entering from the left will be subjected to a measurement of the electron spin in the \(z\)-direction. From such a machine two beams may emerge, in this case corresponding respectively to the \(z\)-component of spin ‘up’, \(S_z=\hbar/2\), and ‘down’, \(S_z=-\hbar/2\).

Let us now consider the following experiment, SG2

[schematic: SG2, two \(\mathbf{e}_z\)-machines in series]

We begin by sending a beam of electrons of undetermined spin (i.e. silver atoms produced in an oven) to an \(\mathbf{e}_z\)-machine. Of the two beams emerging from this machine we discard the spin down beam, passing only the spin up beam into another \(\mathbf{e}_z\)-machine. From this machine only one beam emerges, corresponding to the spin up atoms with \(S_z=\hbar/2\). So if electrons are already in an \(S_z=\hbar/2\) state then another measurement of the \(z\)-component of the spin of the electrons is certain to find that \(S_z=\hbar/2\). There will be no electrons found with \(S_z=-\hbar/2\). In some sense these two states, spin up and spin down, are ‘orthogonal’: an electron in a spin up state has no ‘component’ of spin in the spin down state of the same direction.

Now consider replacing the second machine above with an \(\mathbf{e}_x\)-machine whose two outputs correspond respectively to \(S_x=\hbar/2\) and \(S_x=-\hbar/2\), SG3.

[schematic: SG3, an \(\mathbf{e}_z\)-machine followed by an \(\mathbf{e}_x\)-machine]

Intuitively we think of the \(x\) and \(z\) directions as being orthogonal and indeed if we were dealing here with measurements of the orbital angular momentum of some object then of course there could be no component in the \(x\)-direction of \(z\)-oriented angular momentum. The result of this spin measurement however is that we find about half the electrons entering the second apparatus emerge from the \(S_x=\hbar/2\) output and half from the \(S_x=-\hbar/2\) output. Thus in the quantum world, we conclude that if we measure the \(x\)-component of spin, \(S_x\), of a particle known to have \(z\)-component of spin, \(S_z=\hbar/2\), then we will measure \(S_x\) to be either \(S_x=\hbar/2\) or \(S_x=-\hbar/2\) with equal probability.

Finally, let us consider taking the apparatus of the previous experiment and directing the \(S_x=-\hbar/2\) beam through an \(\mathbf{e}_z\)-machine, SG4.

[schematic: SG4, an \(\mathbf{e}_z\)-machine, an \(\mathbf{e}_x\)-machine, then an \(\mathbf{e}_z\)-machine]

One might perhaps think that having, in our first machine, selected only atoms carrying spin \(S_z=\hbar/2\) to enter the second machine, only such atoms would emerge through the final \(\mathbf{e}_z\)-machine. However, we find that about half emerge from the \(S_z=\hbar/2\) output and half from the \(S_z=-\hbar/2\) output. It’s as if the intervening \(\mathbf{e}_x\)-machine has scrambled any memory of the output of the first machine.
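The arithmetic behind this whole series of thought experiments fits in a few lines. The following sketch (Python assumed) models the machines as projectors onto the \(\pm\hbar/2\) eigenspaces of \(S_z\) and \(S_x\) and reproduces the 50/50 splits of SG3 and SG4:

import numpy as np

hbar = 1.0
Sz = hbar / 2 * np.array([[1, 0], [0, -1]], dtype=complex)
Sx = hbar / 2 * np.array([[0, 1], [1, 0]], dtype=complex)

def projectors(S):
    # projectors onto the +hbar/2 and -hbar/2 eigenspaces of S
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]                 # put +hbar/2 first
    return [np.outer(vecs[:, i], vecs[:, i].conj()) for i in order]

Pz_up, Pz_dn = projectors(Sz)
Px_up, Px_dn = projectors(Sx)

up_z = np.array([1, 0], dtype=complex)             # S_z = +hbar/2 state

# SG3: measure S_x on an S_z-up beam
print(np.real(up_z.conj() @ Px_up @ up_z))         # 0.5

# SG4: select S_x = -hbar/2, then measure S_z again
psi = Px_dn @ up_z
psi = psi / np.linalg.norm(psi)
print(np.real(psi.conj() @ Pz_up @ psi))           # 0.5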

The Hodge Dual

In this section we will assume \(V\) is a real \(n\)-dimensional vector space with a symmetric non-degenerate inner product (metric), \(g(\cdot,\cdot):V\times V\mapto\RR\). In such a vector space we can always choose an orthonormal basis, \(\{e_i\}\), and we know from the classification result for such inner products that these spaces are characterised up to isometry by a pair of integers, \((n,s)\), where \(s\) is the number of \(e_i\) such that \(g(e_i,e_i)=-1\).

We have seen that the dimensions of the spaces \(\Lambda^r(V)\) are given by the binomial coefficients, \({n \choose r}\). In particular, simply by virtue of having the same dimension, this means that the spaces \(\Lambda^r(V)\) and \(\Lambda^{n-r}(V)\) are isomorphic. In fact, as we shall see, the metric allows us to establish an essentially natural isomorphism between these spaces called Hodge duality.

Take any pair of pure \(r\)-vectors in \(\Lambda^r(V)\), \(\alpha=v_1\wedge\dots\wedge v_r\) and \(\beta=w_1\wedge\dots\wedge w_r\), with \(v_i,w_i\in V\). Then we can define an inner product on \(\Lambda^r(V)\) as
\begin{equation}
(\alpha,\beta)=\det(g(v_i,w_j)),
\end{equation}
where \(g(v_i,w_j)\) is regarded as the \(ij\)th entry of an \(r\times r\) matrix, and extended bilinearly to the whole of \(\Lambda^r(V)\). Since the determinant of a matrix and its transpose are identical, the inner product is symmetric. Given our orthonormal basis, \(\{e_i\}\), of \(V\), consider the inner product of the corresponding basis elements, \(e_{i_1}\wedge\dots\wedge e_{i_r}\), where \(1\leq i_1<\dots<i_r\leq n\). Distinct basis elements are orthogonal, while
\begin{equation*}
(e_{i_1}\wedge\dots\wedge e_{i_r},e_{i_1}\wedge\dots\wedge e_{i_r})=g(e_{i_1},e_{i_1})\cdots g(e_{i_r},e_{i_r})=\pm1,
\end{equation*}
so these basis elements are themselves orthonormal with respect to this inner product.

Example Take the single basis vector of \(\Lambda^n(V)\) to be \(\sigma=e_1\wedge\dots\wedge e_n\), then \((\sigma,\sigma)=(-1)^s\).

Now whenever we have a symmetric non-degenerate inner product on some space \(U\), there is a natural isomorphism, \(U\cong U^*\), which associates to every linear functional, \(f\), on \(U\) a unique vector, \(v_f\in U\), such that \(f(u)=(v_f,u)\) for all \(u\in U\). Choose a normalised basis vector, \(\sigma\), for \(\Lambda^n(V)\) and notice that to any \(\lambda\in\Lambda^r(V)\) is associated a linear functional on \(\Lambda^{n-r}(V)\), \(f_\lambda\), according to \(\lambda\wedge\mu=f_\lambda(\mu)\sigma\). But to \(f_\lambda\) we can uniquely associate an element of \(\Lambda^{n-r}(V)\), call it \(\star\lambda\), according to \(f_\lambda(\mu)=(\star\lambda,\mu)\). \(\star\lambda\) is called the Hodge dual of \(\lambda\) and we may write,
\begin{equation}
\lambda\wedge\mu=(\star\lambda,\mu)\sigma.
\end{equation}
As a map, \(\star:\Lambda^r(V)\mapto\Lambda^{n-r}(V)\) is clearly linear.

Example Consider the 2-dimensional vector space \(\RR^2\) with the usual inner (scalar) product which we’ll here denote \(g(\cdot,\cdot)\). Denoting its standard basis vectors by \(\mathbf{e}_1\) and \(\mathbf{e}_2\), we have \(g(\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\) and a basis for \(\Lambda^2(\RR^2)\) is \(\mathbf{e}_1\wedge\mathbf{e}_2\) with \((\mathbf{e}_1\wedge\mathbf{e}_2,\mathbf{e}_1\wedge\mathbf{e}_2)=1\). Clearly, we must then have
\begin{equation}
\star1=\mathbf{e}_1\wedge\mathbf{e}_2,
\end{equation}
and
\begin{equation}
\star(\mathbf{e}_1\wedge\mathbf{e}_2)=1.
\end{equation}
\(\star\mathbf{e}_1\) must be such that \((\star\mathbf{e}_1,\mathbf{e}_1)=0\) and \((\star\mathbf{e}_1,\mathbf{e}_2)=1\), that is,
\begin{equation}
\star\mathbf{e}_1=\mathbf{e}_2,
\end{equation}
and \(\star\mathbf{e}_2\) must be such that \((\star\mathbf{e}_2,\mathbf{e}_1)=-1\) and \((\star\mathbf{e}_2,\mathbf{e}_2)=0\), so
\begin{equation}
\star\mathbf{e}_2=-\mathbf{e}_1.
\end{equation}
Notice that if we had chosen \(\mathbf{e}_2\wedge\mathbf{e}_1=-\mathbf{e}_1\wedge\mathbf{e}_2\) as the basis for \(\Lambda^2(\RR^2)\) then \(\star1=-\mathbf{e}_1\wedge\mathbf{e}_2\), \(\star(-\mathbf{e}_1\wedge\mathbf{e}_2)=1\), \(\star\mathbf{e}_1=-\mathbf{e}_2\) and \(\star\mathbf{e}_2=\mathbf{e}_1\).

Given two bases of a vector space \(V\), \(\{e_i\}\) and \(\{f_i\}\), we say that they share the same orientation if the determinant of the change of basis matrix relating them is positive. Bases of \(V\) thus belong to one of two equivalence classes. From a slightly different perspective, given the bases \(\{e_i\}\) and \(\{f_i\}\) we can form the vectors \(e_1\wedge\dots\wedge e_n\) and \(f_1\wedge\dots\wedge f_n\), both of which belong to the 1-dimensional space \(\Lambda^n(V)\), and so we must have
\begin{equation}
f_1\wedge\dots\wedge f_n=ce_1\wedge\dots\wedge e_n.
\end{equation}
We know that we must be able to express the \(f_i\) in terms of the \(e_i\) as \(f_i=T_i^je_j\) where \(T_i^j\) are the elements of the change of basis linear operator defined by \(Te_i=f_i\). But we know that,
\begin{equation}
f_1\wedge\dots\wedge f_n=T^{\wedge n}(e_1\wedge\dots\wedge e_n)=(\det T)\,e_1\wedge\dots\wedge e_n,
\end{equation}
so \(c=\det T\). In other words given a basis \(\{e_i\}\) of \(V\), another basis \(f_i\) shares the same orientation if the corresponding top exterior powers are related by a positive constant. The Hodge dual thus depends on both the metric and the orientation of a given vector space.

Example Consider the 3-dimensional space \(\RR^3\) equipped with the usual inner product, with standard basis vectors \(\mathbf{e}_1\), \(\mathbf{e}_2\) and \(\mathbf{e}_3\) and \(\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3\) as our preferred top exterior product. Then,
\begin{align}
\star1&=\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3\\
\star\mathbf{e}_1&=\mathbf{e}_2\wedge\mathbf{e}_3\\
\star\mathbf{e}_2&=\mathbf{e}_3\wedge\mathbf{e}_1\\
\star\mathbf{e}_3&=\mathbf{e}_1\wedge\mathbf{e}_2\\
\star(\mathbf{e}_1\wedge\mathbf{e}_2)&=\mathbf{e}_3\\
\star(\mathbf{e}_2\wedge\mathbf{e}_3)&=\mathbf{e}_1\\
\star(\mathbf{e}_3\wedge\mathbf{e}_1)&=\mathbf{e}_2\\
\star(\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3)&=1.
\end{align}
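For \(\RR^3\) these duals can be computed directly from the Levi-Civita symbol, representing the 2-vector \(\star v\) of a 1-vector \(v\) by its antisymmetric component array, \((\star v)_{jk}=\sum_i v_i\epsilon_{ijk}\). A minimal sketch (Python assumed):

import numpy as np
from itertools import permutations

def perm_sign(p):
    # sign of a permutation by counting inversions
    s = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                s = -s
    return s

# Levi-Civita symbol on three indices
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    eps[p] = perm_sign(p)

v = np.array([1.0, 2.0, 3.0])               # a 1-vector in R^3
star_v = np.einsum('i,ijk->jk', v, eps)     # components of the 2-vector star(v)

e1 = np.array([1.0, 0.0, 0.0])
assert np.einsum('i,ijk->jk', e1, eps)[1, 2] == 1.0   # star(e1) = e2 ^ e3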

Let us now establish some general properties of the Hodge dual. We take an orthonormal basis of the \(n\)-dimensional space \(V\) to be \(\{e_i\}\) with top exterior form \(\sigma=ae_1\wedge\dots\wedge e_n\) with \(a=\pm1\). Then consider the pure \(r\)-vector \(e_I=e_1\wedge\dots\wedge e_r\) (no loss of generality will be incurred choosing \(I=(1,\dots,r)\)); we must then have that
\begin{equation}
\star e_I=ce_{r+1}\wedge\dots\wedge e_n=ce_J,
\end{equation}
for \(c=\pm1\) and \(J=(r+1,\dots,n)\). Of course \(c\) depends on our original choice \(a\) according to,
\begin{equation}
c=a(e_J,e_J).
\end{equation}
Consider now \(\star e_J\); clearly
\begin{equation}
\star e_J=de_I,
\end{equation}
for some \(d=\pm1\) but since \(e_J\wedge e_I=(-1)^{r(n-r)}e_I\wedge e_J\), we have,
\begin{equation}
d=a(-1)^{r(n-r)}(e_I,e_I).
\end{equation}
We may therefore conclude that,
\begin{equation}
\star\star e_I=(-1)^{r(n-r)}(e_I,e_I)(e_J,e_J)e_I,
\end{equation}
but assuming \((\sigma,\sigma)=(-1)^s\) this is then,
\begin{equation}
\star\star e_I=(-1)^{r(n-r)+s}e_I,
\end{equation}
and by linearity we may conclude that for any \(\lambda\in\Lambda^r(V)\),
\begin{equation}
\star\star\lambda=(-1)^{r(n-r)+s}\lambda.
\end{equation}

Notice that for \(\lambda,\mu\in\Lambda^r(V)\), \(\lambda\wedge\star\mu=(\star\lambda,\star\mu)\sigma=(\star\mu,\star\lambda)\sigma=\mu\wedge\star\lambda\), that is,
\begin{equation}
\lambda\wedge\star\mu=\mu\wedge\star\lambda.
\end{equation}
But \(\mu\wedge\star\lambda=(-1)^{r(n-r)}\star\lambda\wedge\mu=(-1)^{r(n-r)}(\star\star\lambda,\mu)\sigma=(-1)^{r(n-r)}(-1)^{r(n-r)+s}(\lambda,\mu)\sigma=(-1)^s(\lambda,\mu)\sigma\), that is,
\begin{equation}
\lambda\wedge\star\mu=\mu\wedge\star\lambda=(-1)^s(\lambda,\mu)\sigma.
\end{equation}

The Determinant Revisited

Suppose \(L:V\mapto V\) is a linear operator and consider the tensor product map \(L^{\otimes r}=L\otimes\dots\otimes L:T^r(V)\mapto T^r(V)\). Then clearly \(L^{\otimes r}\circ A=A\circ L^{\otimes r}\) so that \(L^{\otimes r}|_{\Lambda^r(V)}:\Lambda^r(V)\mapto\Lambda^r(V)\). This restriction is typically denoted \(L^{\wedge r}\). Now, as we’ve already observed, if \(V\) is an \(n\)-dimensional vector space, then \(\dim\Lambda^n(V)=1\). So \(L^{\wedge n}\) is multiplication by a scalar. Choosing a basis, \(\{e_i\}\), of \(V\), then \(e_1\wedge\dots\wedge e_n\) is the single basis element of \(\Lambda^n(V)\), and if we write \(Le_i=L_i^je_j\), then
\begin{equation}
L^{\wedge n}(e_1\wedge\dots\wedge e_n)=d_Le_1\wedge\dots\wedge e_n,
\end{equation}
where \(d_L\) is some scalar. But we also have,
\begin{equation}
L^{\wedge n}(e_1\wedge\dots\wedge e_n)=L_1^{i_1}\cdots L_n^{i_n}e_{i_1}\wedge\dots\wedge e_{i_n}.
\end{equation}
Now, the right hand side here is only non-zero when the set of indices \(\{i_1,\dots,i_n\}\) is precisely \(\{1,2,\dots,n\}\) and in this case
\begin{equation}
L_1^{i_1}\cdots L_n^{i_n}e_{i_1}\wedge\dots\wedge e_{i_n}=\sum_{\sigma\in S_n}\sgn(\sigma)L_1^{\sigma_1}\cdots L_n^{\sigma_n}e_1\wedge\dots\wedge e_n,
\end{equation}
in which we see precisely our original definition of the determinant, so that \(d_L=\det L\).
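The permutation sum is easy to test against a library determinant. A minimal sketch (Python assumed; the operator is stored as the matrix with \((j,i)\) entry \(L_i^j\), though since \(\det M=\det M^T\) the convention doesn’t actually matter here):

import numpy as np
from itertools import permutations

def perm_sign(p):
    # sign of a permutation by counting inversions
    s = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                s = -s
    return s

def det_via_wedge(L):
    # sum over sigma in S_n of sgn(sigma) L_1^{sigma(1)} ... L_n^{sigma(n)}
    n = L.shape[0]
    return sum(perm_sign(p) * np.prod([L[p[i], i] for i in range(n)])
               for p in permutations(range(n)))

L = np.random.default_rng(1).normal(size=(4, 4))
assert np.isclose(det_via_wedge(L), np.linalg.det(L))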

Tensor Symmetries in Coordinate Representation

If \(T^{i_1\dots i_r}\) are the components of an \((r,0)\) tensor, \(T\), with respect to some basis then the symmetrization of \(T\), \(S(T)\), has components which are conventionally denoted, \(T^{(i_1\dots i_r)}\). That is, by definition,
\begin{equation}
T^{(i_1\dots i_r)}=\frac{1}{r!}\sum_{\sigma\in S_r}T^{i_{\sigma(1)}\dots i_{\sigma(r)}}.
\end{equation}
Similarly, the antisymmetrization of \(T\), \(A(T)\), has components which are conventionally denoted, \(T^{[i_1\dots i_r]}\). That is, by definition,
\begin{equation}
T^{[i_1\dots i_r]}=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)T^{i_{\sigma(1)}\dots i_{\sigma(r)}}.
\end{equation}
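In components these operations amount to averaging over axis permutations of the component array, with signs in the antisymmetric case. A minimal sketch (Python assumed):

import numpy as np
from itertools import permutations
from math import factorial

def perm_sign(p):
    # sign of a permutation by counting inversions
    s = 1
    for a in range(len(p)):
        for b in range(a + 1, len(p)):
            if p[a] > p[b]:
                s = -s
    return s

def symmetrize(T):
    r = T.ndim
    return sum(np.transpose(T, p) for p in permutations(range(r))) / factorial(r)

def antisymmetrize(T):
    r = T.ndim
    return sum(perm_sign(p) * np.transpose(T, p)
               for p in permutations(range(r))) / factorial(r)

T = np.random.default_rng(2).normal(size=(3, 3, 3))
S, A = symmetrize(T), antisymmetrize(T)
assert np.allclose(S, np.transpose(S, (1, 0, 2)))    # S is totally symmetric
assert np.allclose(A, -np.transpose(A, (1, 0, 2)))   # A is totally antisymmetric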