Differential forms

The cotangent space

As discussed, at each point \(a\) of our space \(\RR^n\) there is a tangent space, \(T_a(\RR^n)\), consisting of the tangent vectors at \(a\), or, intuitively, the “arrows at \(a\)”. Let us now consider the dual space, \(T_a(\RR^n)^*\), consisting of linear functionals on \(T_a(\RR^n)\). We call this the cotangent space at \(a\), or the space of 1-forms at \(a\), and define the differential of a function \(f:\RR^n\mapto\RR\), \(df\), to be the element of \(T_a(\RR^n)^*\) such that for any tangent vector \(\mathbf{v}\in T_a(\RR^n)\),
\begin{equation}
df(\mathbf{v})=\mathbf{v}(f).
\end{equation}
With respect to the coordinate basis of \(T_a(\RR^n)\) we then have,
\begin{equation}
df\left(\sum_{i=1}^{n}v^i\left.\frac{\partial}{\partial x^i}\right|_a\right)=\sum_{i=1}^{n}v^i\left.\frac{\partial f}{\partial x^i}\right|_a
\end{equation}
In particular, we see that the differentials, \(dx^i\), of the coordinate functions, \(x^i\), are dual basis vectors to the \(\partial/\partial x^i\),
\begin{equation}
dx^i(\partial/\partial x^j)=\delta^i_j.
\end{equation}
Thus any 1-form, \(\alpha\), at \(a\) may be written in the form,
\begin{equation}
\alpha=\sum_{i=1}^n\alpha_idx^i.
\end{equation}
In particular, the differential \(df\) is just,
\begin{equation}
df=\sum_{i=1}^n\frac{\partial f}{\partial x^i}dx^i.
\end{equation}
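As a concrete illustration (a sketch assuming SymPy is available, with an arbitrary test function, point and tangent vector), we can compute the components \(\partial f/\partial x^i\) of \(df\) and evaluate \(df(\mathbf{v})\) at a point:
```python
# A minimal sketch (assuming SymPy is available). We take a hypothetical function
# f(x, y, z) = x**2*y + z, compute the components of df in the dx^i basis, and
# evaluate df(v) = sum_i v^i df/dx^i|_a for a tangent vector v at a point a.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + z

coords = [x, y, z]
partials = [sp.diff(f, c) for c in coords]    # components of df: [2*x*y, x**2, 1]
a = {x: 1, y: 2, z: 0}                        # the point a
v = [3, -1, 2]                                # components of a tangent vector at a

df_of_v = sum(vi * p.subs(a) for vi, p in zip(v, partials))
print(partials, df_of_v)                      # [2*x*y, x**2, 1] 13
```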

Just as we defined a smooth vector field on \(\RR^n\) as a smooth assignment of a tangent vector to each point of \(\RR^n\), we define a smooth 1-form field, \(\alpha\), on \(\RR^n\) as a smooth assignment of a 1-form to each point of \(\RR^n\),
\begin{equation}
\alpha=\sum_{i=1}^n\alpha_idx^i
\end{equation}
where the \(\alpha_i\) are smooth functions, \(\alpha_i:\RR^n\mapto\RR\).

Tensors on \(\RR^n\)

At any point \(a\in\RR^n\) we have defined the tangent space, \(T_a(\RR^n)\), and its dual space of 1-forms, \(T_a(\RR^n)^*\). From these two spaces we may then, using the machinery developed in the notes on multilinear algebra, build tensors of rank \((r,s)\) living in,
\begin{equation}
\underbrace{T_a(\RR^n)\otimes\dots\otimes T_a(\RR^n)}_r\otimes\underbrace{T_a(\RR^n)^*\otimes\dots\otimes T_a(\RR^n)^*}_s,
\end{equation}
where \(r\) is the contravariant rank and \(s\) the covariant rank. Assigning, in a smooth manner, a rank \((r,s)\) tensor to every point of \(\RR^n\) we then have a smooth tensor field on \(\RR^n\). An important example is the rank \((0,2)\) tensor field called the metric tensor,
\begin{equation}
g=\sum_{i,j=1}^ng_{ij}dx^i\otimes dx^j
\end{equation}
where the smooth functions \(g_{ij}\) are such that at every point \(a\in\RR^n\), \(g_{ij}(a)=g_{ji}(a)\) and \(\det(g_{ij}(a))\neq0\). Such a tensor then provides a symmetric non-degenerate bilinear form \((\,,\,)\) on the tangent space \(T_a(\RR^n)\) according to the definition,
\begin{equation}
(v,w)=g(v,w)
\end{equation}
so that, in terms of a coordinate basis \(\{\partial/\partial x^i\}\),
\begin{equation}
\left(\frac{\partial}{\partial x^i},\frac{\partial}{\partial x^j}\right)=g_{ij}.
\end{equation}
Note that in terms of another coordinate basis \(\{\partial/\partial y^i\}\) we have
\begin{equation}
\left(\frac{\partial}{\partial y^i},\frac{\partial}{\partial y^j}\right)=\left(\sum_s\frac{\partial x^s}{\partial y^i}\frac{\partial}{\partial x^s},\sum_t\frac{\partial x^t}{\partial y^j}\frac{\partial}{\partial x^t}\right)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}.
\end{equation}
Writing \(g_{ij}(x)\) for the components of the metric tensor with respect to coordinates \(x^i\) and \(g_{ij}(y)\) for the components with respect to another coordinate system \(y^i\), we have
\begin{equation}
g_{ij}(y)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}(x)
\end{equation}

Example In \(\RR^3\) the Euclidean metric, with respect to the Cartesian coordinate basis, is given by \(g_{ij}=\delta_{ij}\). In matrix form we have
\begin{equation}
(g_{ij}(x,y,z))=\begin{pmatrix}
1&0&0\\
0&1&0\\
0&0&1
\end{pmatrix}
\end{equation}
Recall that cylindrical coordinates, \((\rho,\varphi,z)\), are related to Cartesians according to,
\begin{equation}
x=\rho\cos\varphi,\quad y=\rho\sin\varphi,\quad z=z,
\end{equation}
so that, for example,
\begin{align*}
g_{11}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\rho}\right)^2+\left(\frac{\partial y}{\partial\rho}\right)^2+\left(\frac{\partial z}{\partial\rho}\right)^2\\
&=\cos^2\varphi+\sin^2\varphi\\
&=1,
\end{align*}
\begin{align*}
g_{12}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\rho}\right)\left(\frac{\partial x}{\partial\varphi}\right)+\left(\frac{\partial y}{\partial\rho}\right)\left(\frac{\partial y}{\partial\varphi}\right)+\left(\frac{\partial z}{\partial\rho}\right)\left(\frac{\partial z}{\partial\varphi}\right)\\
&=-\rho\sin\varphi\cos\varphi+\rho\sin\varphi\cos\varphi\\
&=0,
\end{align*}
whilst
\begin{align*}
g_{22}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\varphi}\right)^2+\left(\frac{\partial y}{\partial\varphi}\right)^2+\left(\frac{\partial z}{\partial\varphi}\right)^2\\
&=\rho^2\sin^2\varphi+\rho^2\cos^2\varphi\\
&=\rho^2.
\end{align*}
Computing in this way we establish the metric in cylindrical polar coordinates as,
\begin{equation}
(g_{ij}(\rho,\varphi,z))=\begin{pmatrix}
1&0&0\\
0&\rho^2&0\\
0&0&1
\end{pmatrix}
\end{equation}
Recall that spherical polar coordinates, \((r,\theta,\varphi)\), are related to Cartesian coordinates by,
\begin{equation}
x=r\cos\varphi\sin\theta,\quad y=r\sin\varphi\sin\theta,\quad z=r\cos\theta,
\end{equation}
so that, for example,
\begin{align*}
g_{11}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial r}\right)^2+\left(\frac{\partial y}{\partial r}\right)^2+\left(\frac{\partial z}{\partial r}\right)^2\\
&=\cos^2\varphi\sin^2\theta+\sin^2\varphi\sin^2\theta+\cos^2\theta\\
&=1,
\end{align*}
\begin{align*}
g_{12}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial r}\right)\left(\frac{\partial x}{\partial\theta}\right)+\left(\frac{\partial y}{\partial r}\right)\left(\frac{\partial y}{\partial\theta}\right)+\left(\frac{\partial z}{\partial r}\right)\left(\frac{\partial z}{\partial\theta}\right)\\
&=r\cos^2\varphi\sin\theta\cos\theta+r\sin^2\varphi\sin\theta\cos\theta-r\cos\theta\sin\theta\\
&=0,
\end{align*}
whilst,
\begin{align*}
g_{22}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial\theta}\right)^2+\left(\frac{\partial y}{\partial\theta}\right)^2+\left(\frac{\partial z}{\partial\theta}\right)^2\\
&=r^2\cos^2\varphi\cos^2\theta+r^2\sin^2\varphi\cos^2\theta+r^2\sin^2\theta\\
&=r^2.
\end{align*}
Computing in this way we establish the metric in spherical polar coordinates as,
\begin{equation}
(g_{ij}(r,\theta,\varphi))=\begin{pmatrix}
1&0&0\\
0&r^2&0\\
0&0&r^2\sin^2\theta
\end{pmatrix}
\end{equation}
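As a check of these computations (a sketch assuming SymPy is available), we can apply the transformation law \(g_{ij}(y)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}(x)\) with \(g_{st}=\delta_{st}\) to recover the cylindrical and spherical components:
```python
# A sketch (assuming SymPy is available) of the transformation law
# g_ij(y) = sum_{s,t} (dx^s/dy^i)(dx^t/dy^j) g_st(x), with g_st = delta_st.
import sympy as sp

def metric_from_cartesian(cart_exprs, new_coords):
    # J has entries dx^s/dy^i, so (J^T J)_ij = sum_s (dx^s/dy^i)(dx^s/dy^j).
    J = sp.Matrix([[sp.diff(xs, yi) for yi in new_coords] for xs in cart_exprs])
    return sp.simplify(J.T * J)

rho, phi, z = sp.symbols('rho phi z', positive=True)
r, theta = sp.symbols('r theta', positive=True)

print(metric_from_cartesian([rho*sp.cos(phi), rho*sp.sin(phi), z], [rho, phi, z]))
# Matrix([[1, 0, 0], [0, rho**2, 0], [0, 0, 1]])

print(metric_from_cartesian(
    [r*sp.cos(phi)*sp.sin(theta), r*sp.sin(phi)*sp.sin(theta), r*sp.cos(theta)],
    [r, theta, phi]))
# Matrix([[1, 0, 0], [0, r**2, 0], [0, 0, r**2*sin(theta)**2]])
```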

Example Minkowski space is \(\RR^4\) endowed with the Lorentzian metric. Given coordinates, \((x^0=t,x^1=x,x^2=y,x^3=z)\) (units chosen so that the speed of light \(c=1\)), then
\begin{equation}
g_{ij}=\begin{cases}
-1&i=j=0\\
1&i=j=1,2,3\\
0&\text{otherwise}
\end{cases}
\end{equation}
This is sometimes written as \(ds^2=-dt^2+dx^2+dy^2+dz^2\).

The gradient

Recall that if we have a non-degenerate inner product, \((\,,\,)\), on a vector space \(V\) then there is a natural isomorphism \(V^*\cong V\) under which each \(f\in V^*\) corresponds to a vector \(v_f\in V\) satisfying \(f(v)=(v_f,v)\) for all \(v\in V\).

Definition Given a non-degenerate inner product, \((\,,\,)\), on the tangent space, \(T_a(\RR^n)\), the gradient vector of a function \(f:\RR^n\mapto\RR\) is defined to be the vector \(\textbf{grad}f=\nabla f\) such that
\begin{equation}
(\nabla f,v)=df(v)
\end{equation}
for all \(v\in T_a(\RR^n)\).

Suppose our space is \(\RR^3\), then in terms of the coordinate basis we have
\begin{equation*}
\left(\sum_i(\nabla f)^i\frac{\partial}{\partial x^i},\frac{\partial}{\partial x^j}\right)=\sum_i(\nabla f)^ig_{ij}=\frac{\partial f}{\partial x^j}
\end{equation*}
so that
\begin{equation}
(\nabla f)^i=\sum_jg^{ij}(x)\frac{\partial f}{\partial x^j}
\end{equation}
where \(g^{ij}(x)\) is the inverse of the matrix of the (Euclidean) metric tensor, \((g_{ij}(x))\), in terms of the coordinate system, \(x^i\). Thus in Cartesian coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial x}\mathbf{e}_x+\frac{\partial f}{\partial y}\mathbf{e}_y+\frac{\partial f}{\partial z}\mathbf{e}_z.
\end{equation}
In cylindrical coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial\rho}\frac{\partial}{\partial\rho}+\frac{1}{\rho^2}\frac{\partial f}{\partial\varphi}\frac{\partial}{\partial\varphi}+\frac{\partial f}{\partial z}\frac{\partial}{\partial z}
\end{equation}
or, in terms of the unit vectors, \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\),
\begin{equation}
\nabla f=\frac{\partial f}{\partial\rho}\mathbf{e}_\rho+\frac{1}{\rho}\frac{\partial f}{\partial\varphi}\mathbf{e}_\varphi+\frac{\partial f}{\partial z}\mathbf{e}_z.
\end{equation}
Finally, in spherical coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial r}\frac{\partial}{\partial r}+\frac{1}{r^2}\frac{\partial f}{\partial\theta}\frac{\partial}{\partial\theta}+\frac{1}{r^2\sin^2\theta}\frac{\partial f}{\partial\varphi}\frac{\partial}{\partial\varphi}
\end{equation}
or, in terms of the unit vectors, \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\),
\begin{equation}
\nabla f=\frac{\partial f}{\partial r}\mathbf{e}_r+\frac{1}{r}\frac{\partial f}{\partial\theta}\mathbf{e}_\theta+\frac{1}{r\sin\theta}\frac{\partial f}{\partial\varphi}\mathbf{e}_\varphi.
\end{equation}
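Here is a small sketch (assuming SymPy is available, with an arbitrary test function \(f\)) of the general recipe \((\nabla f)^i=\sum_jg^{ij}\partial f/\partial x^j\) applied in spherical coordinates:
```python
# A sketch (assuming SymPy is available): (grad f)^i = sum_j g^{ij} df/dy^j in
# spherical coordinates, for a hypothetical scalar field f(r, theta, phi).
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
coords = [r, theta, phi]

g = sp.diag(1, r**2, r**2 * sp.sin(theta)**2)   # the spherical metric from above
g_inv = g.inv()

f = r**2 * sp.cos(theta)                        # an arbitrary test function
grad = [sp.simplify(sum(g_inv[i, j] * sp.diff(f, coords[j]) for j in range(3)))
        for i in range(3)]
print(grad)   # [2*r*cos(theta), -sin(theta), 0], the components on d/dr, d/dtheta, d/dphi
```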

In \(\RR^n\) with the Euclidean metric the Cauchy-Schwarz inequality tells us that for any unit vector \(\mathbf{v}\) at a point \(a\in\RR^n\),
\begin{equation}
\mathbf{v}(f)=(\nabla f,\mathbf{v})\leq\norm{\nabla f}
\end{equation}
so that the greatest rate of change of \(f\) at some point is in the direction of \(\nabla f\). By a level surface we will mean the surface specified by the points \(x\in\RR^n\) such that \(f(x)=c\) for some \(c\in\RR\). Consider a curve, \(\gamma(t)\), in this level surface. Then a tangent vector, \(\mathbf{v}_{\gamma(t_0)}\), to this curve at \(\gamma(t_0)\) is such that \(\mathbf{v}_{\gamma(t_0)}(f)=0\), that is, \((\nabla f,\mathbf{v}_{\gamma(t_0)})=0\). The gradient vector is orthogonal to the level surface.

Vector fields and integral curves

Vector fields

Definition A vector field \(\mathbf{v}\) on the space \(\RR^n\) is an assignment of a tangent vector \(\mathbf{v}_a\in T_a(\RR^n)\) to each point \(a\in\RR^n\). Since each tangent space \(T_a(\RR^n)\) has a coordinate basis \({\partial/\partial x^i|_a}\), at each point \(a\) we can write
\begin{equation}
\mathbf{v}_a=\sum_{i=1}^nv^i(a)\left.\frac{\partial}{\partial x^i}\right|_a
\end{equation}
or,
\begin{equation}
\mathbf{v}=\sum_{i=1}^nv^i\frac{\partial}{\partial x^i}
\end{equation}
where the \(v^i\) are functions \(v^i:\RR^n\mapto\RR\). The vector field is said to be smooth if the functions \(v^i\) are smooth.

Example On the space \(\RR^2\setminus\{0\}\) we can visualise the vector field defined by,
\begin{equation}
\mathbf{v}=\frac{-y}{\sqrt{x^2+y^2}}\frac{\partial}{\partial x}+\frac{x}{\sqrt{x^2+y^2}}\frac{\partial}{\partial y},
\end{equation}
as a field of unit-length arrows circulating anticlockwise about the origin.

Example On the space \(\RR^2\) the vector field defined as
\begin{equation}
\mathbf{v}=x\frac{\partial}{\partial x}-y\frac{\partial}{\partial y}
\end{equation}
can be visualised as a field of arrows flowing in towards the origin along the \(y\)-axis and out along the \(x\)-axis.

The vector fields on \(\RR^n\) and the derivations of the (algebra of) smooth functions on \(\RR^n\) are isomorphic as vector spaces. Note that a derivation, \(X\), of \(C^\infty(\RR^n)\) is a linear map \(X:C^\infty(\RR^n)\mapto C^\infty(\RR^n)\) such that the Leibniz rule,
\begin{equation}
X(fg)=(Xf)g+f(Xg)
\end{equation}
is satisfied for all \(f,g\in C^\infty(\RR^n)\).

Vector fields and ODEs — integral curves

Consider a fluid in motion such that its “flow” is independent of time. A single particle would trace out a path in space — a curve, \(\gamma(t)\) say, parameterised by time. The velocity of such a particle, say at \(\gamma(0)\), is the tangent vector \(d\gamma(t)/dt|_0\). The “flow” of the whole system could be modelled by a 1-parameter family of maps \(\phi_t:\RR^3\mapto\RR^3\) such that \(\phi_t(a)\) is the location of a particle a time \(t\) after it was located at the point \(a\); in other words, \(t\mapsto\phi_t(a)\) is the curve \(\gamma\) such that \(\gamma(0)=a\) and \(\gamma(t)=\phi_t(a)\). Since the flow is stationary we have that
\begin{equation}
\phi_{s+t}(a)=\phi_s(\phi_t(a))=\phi_t(\phi_s(a)).
\end{equation}
Also,
\begin{equation}
\phi_{-t}(\phi_t(a))=a,
\end{equation}
where we understand \(\phi_{-t}(a)\) to mean the location of a particle a time \(t\) before it was at \(a\). So, understanding \(\phi_0\) to be the identity map and \(\phi_{-t}=\phi_t^{-1}\), we have a 1-parameter group of maps which, assuming they are smooth, are each diffeomorphisms, \(\phi_t:\RR^3\mapto\RR^3\), collectively called a flow. Given such a flow we obtain a velocity (vector) field as
\begin{equation}
\mathbf{v}_a=\left.\frac{d\phi_t(a)}{dt}\right|_0.
\end{equation}
An individual curve \(\gamma\) in the flow is then called an integral curve through \(a\) of the vector field \(\mathbf{v}\). All this generalises to \(n\)-dimensions. A flow \(\phi_t:\RR^n\mapto\RR^n\) gives rise to a (velocity) vector field on \(\RR^n\). Conversely, suppose we have some vector field, \(\mathbf{v}\), then we can wonder about the existence of integral curves through the points of our space having the vectors of the vector field as tangents. Such an integral curve would have to satisfy,
\begin{equation}
\mathbf{v}_{\gamma(0)}(f)=\left.\frac{df(\gamma(t))}{dt}\right|_0
\end{equation}
for any function, \(f\), so that considering in turn the coordinate functions, \(x^i\), we have a system of differential equations,
\begin{equation}
\frac{dx^i(t)}{dt}=v^i(x^1(t),\dots,x^n(t))
\end{equation}
where the \(v^i\) are the components of the vector field and \(x^i(t)=x^i(\gamma(t))\). The theorem on the existence and uniqueness of the solution of this system of equations and hence of the corresponding integral curves and flow is the following.

Theorem If \(\mathbf{v}\) is a smooth vector field defined on \(\RR^n\) then for each point \(a\in\RR^n\) there is a curve \(\gamma:I\mapto\RR^n\) (\(I\) an open interval in \(\RR\) containing 0) such that \(\gamma(0)=a\) and
\begin{equation}
\frac{d\gamma(t)}{dt}=\mathbf{v}_{\gamma(t)}
\end{equation}
for all \(t\in I\) and any two such curves are equal on the intersection of their domains. Furthermore, there is a neighbourhood \(U_a\) of \(a\) and an interval \(I_\epsilon=(-\epsilon,\epsilon)\) such that for all \(t\in I_\epsilon\) and \(b\in U_a\) there is a curve \(t\mapsto\phi_t(b)\) satisfying
\begin{equation}
\frac{d\phi_t(b)}{dt}=\mathbf{v}_{\phi_t(b)}
\end{equation}
which is a flow on \(U_a\) — a local flow.
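As an illustration (a sketch assuming NumPy and SciPy are available), we can trace an integral curve of the rotational vector field from the earlier example numerically by solving the system \(dx^i/dt=v^i(x)\):
```python
# A sketch (assuming NumPy and SciPy are available): the integral curve through
# a = (1, 0) of the rotational field v = (-y, x)/sqrt(x^2 + y^2), obtained by
# solving dx^i/dt = v^i(x) numerically. The exact curve is (cos t, sin t).
import numpy as np
from scipy.integrate import solve_ivp

def v(t, p):
    x, y = p
    norm = np.hypot(x, y)
    return [-y / norm, x / norm]

sol = solve_ivp(v, (0.0, np.pi), [1.0, 0.0])
print(sol.y[:, -1])              # approximately (-1, 0): half a circuit of the unit circle
print(np.hypot(*sol.y[:, -1]))   # the radius stays approximately 1
```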

Linear vector fields on \(\RR^n\)

Suppose we have a linear transformation \(X\) of the vector space \(\RR^n\). Then to any point \(a\in\RR^n\) we can associate an element \(Xa\) which we can understand as a vector in \(T_a(\RR^n)\). The previous theorem tells us that at any point \(a\) we can find a solution to the system of differential equations,
\begin{equation}
\frac{d\gamma(t)}{dt}=X(\gamma(t)),
\end{equation}
valid in some open interval around 0 with \(\gamma(0)=a\). Let’s construct this solution explicitly. We seek a power series solution
\begin{equation}
\gamma(t)=\sum_{k=0}^\infty t^ka_k
\end{equation}
such that \(a_0=a\) and where we understand \(\gamma(t)=(x^1(\gamma(t)),\dots,x^n(\gamma(t)))\) and \(a_k=(x^1(a_k),\dots,x^n(a_k))\). Plugging the power series into the differential equation we obtain,
\begin{equation}
\sum_{k=1}^\infty kt^{k-1}a_k=\sum_{k=0}^\infty t^kXa_k,
\end{equation}
from which we obtain the recurrence relation,
\begin{equation}
a_{k+1}=\frac{1}{k+1}Xa_k,
\end{equation}
which itself leads to,
\begin{equation}
a_k=\frac{1}{k!}X^k(a_0)=\frac{1}{k!}X^ka,
\end{equation}
so that,
\begin{equation}
\gamma(t)=\sum_{k=0}^\infty\frac{t^kX^k}{k!}a=\exp(tX)a,
\end{equation}
where we’ve introduced the matrix exponential which, as we’ve already mentioned, converges for any matrix \(X\). It’s not difficult to show that this solution is unique. In this case the flow defined by \(\phi_t=\exp(tX)\) is defined on the whole of \(\RR^n\) for all times \(t\).
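A quick numerical sketch (assuming NumPy and SciPy are available), for a hypothetical choice of \(X\), checks both the defining equation and the group property of the flow:
```python
# A sketch (assuming NumPy and SciPy are available): the flow phi_t = exp(tX) of a
# hypothetical linear vector field X, checked against d(gamma)/dt = X gamma(t) and
# against the group property phi_{s+t} = phi_s o phi_t.
import numpy as np
from scipy.linalg import expm

X = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # a rotation generator
a = np.array([1.0, 0.0])

def gamma(t):
    return expm(t * X) @ a

t, s, h = 0.7, 0.3, 1e-6
dgamma = (gamma(t + h) - gamma(t - h)) / (2 * h)           # numerical d(gamma)/dt
print(np.allclose(dgamma, X @ gamma(t), atol=1e-6))        # True
print(np.allclose(gamma(s + t), expm(s * X) @ gamma(t)))   # True
```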

Tangent vectors

As we’ve already indicated we choose to view \(\RR^n\) as a space of points, \(x=(x^1,\dots,x^n)\), with “arrows” emerging from a point \(x\) in space living in the tangent space \(T_x(\RR^n)\) at \(x\). This is just another copy of \(\RR^n\), now viewed as vector space. For example, at some point \(a\in\RR^n\) we might have vectors \(\mathbf{v}_a,\mathbf{w}_a\in T_a(\RR^n)\) such that \(\mathbf{v}_a+\mathbf{w}_a=(\mathbf{v}+\mathbf{w})_a\). Given two distinct points \(a,b\in\RR^n\), \(T_a(\RR^n)\) and \(T_b(\RR^n)\) are distinct copies of \(\RR^n\). Without some further mechanism by which we could transport a vector from \(T_a(\RR^n)\) to a vector in \(T_b(\RR^n)\) there can be no meaning attached to the sum of a vector in \(T_a(\RR^n)\) and a vector in \(T_b(\RR^n)\).

Working with the space \(\RR^n\) we can safely think of the tangent space at each point as the collection of all arrows at the point. But suppose our space was the surface of a sphere. In this case we have the idea of tangent vectors at a point living in a tangent plane within an ambient space. But what if there was no ambient space? We’re anticipating here the generalisation of the tools of calculus to spaces far more general than \(\RR^n\). With this in mind we’ll consider here more sophisticated characterisations of the notion of tangent vectors. Specifically, we’ll avoid, as far as possible, explicitly exploiting the fact that our underlying space of points, \(\RR^n\), is itself a vector space. Instead we’ll rely on the fact that at any point we have a valid coordinate system through which we can access a vector space structure.

A tangent vector as an equivalence class of curves

A smooth curve in \(\RR^n\) is a smooth map \(\gamma:(\alpha,\beta)\mapto\RR^n\) which we denote simply by \(\gamma(t)\). With respect to some coordinate system \(x^i\), two curves, \(\gamma(t)\) and \(\tilde{\gamma}(t)\), are said to be tangent at a point \(\gamma(t_0)=a=\tilde{\gamma}(t_0)\) if
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{equation}
for \(i=1,\dots,n\). Curves are tangent regardless of the coordinate system used. Indeed, suppose instead of \(x^i\) we used some other coordinate system \(y^i\) such that \(y^i=y^i(x^1,\dots,x^n)\) then
\begin{align*}
\left.\frac{dy^i(\gamma(t))}{dt}\right|_{t_0}&=\sum_{j=1}^{n}\frac{\partial y^i}{\partial x^j}\left.\frac{dx^j(\gamma(t))}{dt}\right|_{t_0}\\
&=\sum_{j=1}^{n}\frac{\partial y^i}{\partial x^j}\left.\frac{dx^j(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{dy^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{align*}
One definition of a tangent vector, \(\mathbf{v}\), at a point \(a\) is then as the equivalence class \([\gamma]\) of curves tangent to one another at the point \(a\). We can define addition and scalar multiplication by \(\mathbf{v}_1+\mathbf{v}_2=[\gamma_1+\gamma_2]\) and \(c\mathbf{v}=[c\gamma]\). These definitions are clearly exploiting the vector space structure of our space \(\RR^n\) but can easily be tweaked not to do so. The tangent vectors so defined form a real vector space, the tangent space \(T_a(\RR^n)\). This is clearly equivalent to our intuitive notion of vectors as arrows at a point but is applicable even when our space of points is more general than \(\RR^n\).

We can now introduce the directional derivative of a (smooth) function \(f:\RR^n\mapto\RR\), at a point \(a=\gamma(t_0)\), in the direction of a tangent vector \(\mathbf{v}\) according to the definition,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0},
\end{equation}
where \(\mathbf{v}\) is the tangent vector corresponding to the equivalence class of curves \([\gamma]\). Note that this does not depend on the representative of the equivalence class chosen since with respect to any coordinate system \(x^i\),
\begin{align*}
\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}\\
&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{df(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{align*}
Note also that this corresponds to the usual definition of a directional derivative in \(\RR^n\) as,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{d}{dt}f(a+t\mathbf{v})\right|_{t=0},
\end{equation}
by considering the curve in \(\RR^n\) through the point \(a\) defined according to \(t\mapto a+t\mathbf{v}\).
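The equivalence of these two expressions is easy to check numerically. The following sketch (assuming NumPy is available, with an arbitrary test function \(f\)) compares \(\frac{d}{dt}f(a+t\mathbf{v})|_{t=0}\) with \(\sum_iv^i\,\partial f/\partial x^i(a)\):
```python
# A sketch (assuming NumPy is available): the directional derivative D_v f(a) computed
# as (d/dt) f(a + t v)|_0 and as sum_i v^i df/dx^i(a), for a hypothetical f.
import numpy as np

def f(p):
    x, y = p
    return x**2 * y + np.sin(y)

a = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
h = 1e-6

along_curve = (f(a + h * v) - f(a - h * v)) / (2 * h)            # d/dt f(a + t v)|_0
partials = np.array([(f(a + h * e) - f(a - h * e)) / (2 * h)
                     for e in np.eye(2)])                        # (df/dx, df/dy) at a
print(along_curve, partials @ v)    # both approximately 2*1*2*0.5 + (1 + cos 2)*(-1)
```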

Directional derivatives and derivations

Let us regard the directional derivative, \(D_\mathbf{v}\), as a map, \(D_\mathbf{v}:C^\infty(\RR^n)\mapto\RR\), at any point in \(\RR^n\). Then directional derivatives are examples of derivations according to the following definition.

Definition A map \(X:C^\infty(\RR^n)\mapto\RR\) is called a derivation at a point \(a\in\RR^n\) if it is linear over \(\RR\) and satisfies the Leibniz rule,
\begin{equation}
X(fg)=X(f)g(a)+f(a)X(g).
\end{equation}

To see that \(D_\mathbf{v}\) is a derivation at some point \(a=\gamma(t_0)\in\RR^n\) with \(\mathbf{v}=[\gamma]\),
\begin{align*}
D_\mathbf{v}(f+g)(a)&=\left.\frac{d}{dt}((f+g)(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}+\left.\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_{\mathbf{v}}f(a)+D_{\mathbf{v}}g(a),
\end{align*}
and for \(c\in\RR\),
\begin{align*}
D_\mathbf{v}(cf)(a)&=\left.\frac{d}{dt}((cf)(\gamma(t)))\right|_{t_0}\\
&=c\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}\\
&=cD_\mathbf{v}(f)(a),
\end{align*}
and
\begin{align*}
D_\mathbf{v}(fg)(a)&=\left.\frac{d}{dt}(f(\gamma(t))g(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}g(\gamma(t))\right|_{t_0}+\left.f(\gamma(t))\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_\mathbf{v}f(a)g(a)+f(a)D_\mathbf{v}g(a)
\end{align*}

The Leibniz rule is what really captures the essence of differentiation. Let’s consider some of its consequences. Suppose \(f\) is the constant function, \(f(x)=1\). Then for any derivation \(X\), \(X(f)=0\). This follows since \(f=ff\) and by the Leibniz rule, \(X(f)=X(ff)=X(f)f(a)+f(a)X(f)=2X(f)\), so \(Xf=0\). It follows immediately, by linearity of derivations, that \(Xf=0\) for any constant function \(f(x)=c\). Another consequence is that if \(f(a)=g(a)=0\) then \(X(fg)=0\) since \(X(fg)=X(f)g(a)+f(a)X(g)=0\).

It’s straightforward to verify that derivations at a point \(a\in\RR^n\) form a real vector space which we denote by \(\mathcal{D}_a(\RR^n)\). For any coordinate system \(x^i\) the partial derivatives \(\partial/\partial x^i\) are easily seen to be derivations and we’ll now demonstrate that the partial derivatives \(\partial/\partial x^i\) at \(a\) provide a basis for \(\mathcal{D}_a(\RR^n)\). Indeed, from Taylor’s theorem, we know that for any smooth function \(f\), in the neighbourhood of a point \(a\in\RR^n\) and in terms of coordinates \(x^i\),
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\left.(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^n(x^i-a^i)(x^j-a^j)\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt.
\end{equation}
Consider applying the derivation \(X\) to \(f\) at \(a\). Writing \(h_{ij}(x)\) for the integral appearing in the final term, a smooth function of \(x\), we have
\begin{equation}
Xf=X(f(a))+\sum_{i=1}^n\left.X(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^nX\left((x^i-a^i)(x^j-a^j)h_{ij}(x)\right),
\end{equation}
but \(X(f(a))=0\) since \(f(a)\) is a constant, \(X(x^i-a^i)=X(x^i)\) by linearity, and \(X((x^i-a^i)(x^j-a^j)h_{ij}(x))=0\) since, as observed above, a derivation annihilates any product of two functions which both vanish at \(a\), and here \((x^i-a^i)\) and \((x^j-a^j)h_{ij}(x)\) both vanish at \(a\). Thus we have that,
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
In other words \(X=\sum_{i=1}^nX(x^i)\partial/\partial x^i\). Notice that if \(y^i\) is any other coordinate system valid in the neighbourhood of \(a\) then
\begin{equation}
Xy^j=\left.\sum_{i=1}^nX(x^i)\frac{\partial y^j}{\partial x^i}\right|_a
\end{equation}
so that
\begin{align*}
Xf&=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a\\
&=\sum_{i,j=1}^n\left.X(x^i)\frac{\partial y^j}{\partial x^i}\frac{\partial f}{\partial y^j}\right|_a\\
&=\sum_{j=1}^nX(y^j)\left.\frac{\partial f}{\partial y^j}\right|_a,
\end{align*}
and the result does not depend on the chosen coordinate system. That the coordinate partials are linearly independent follows since applying \(\sum_ic^i\partial/\partial x^i=0\) to the coordinate functions \(x^i\) in turn yields \(c^i=0\) for all \(i\). It follows from what we’ve observed here that every derivation is in fact the directional derivative along some curve, as we now show.

So, to any tangent vector \(\mathbf{v}\) is associated the directional derivative \(D_\mathbf{v}\) which is a derivation. Are all derivations directional derivatives? The answer is yes. If we have a derivation \(X\) at a point \(a\) then we know that for any smooth function in a neighbourhood of \(a\), in terms of coordinates \(x^i\),
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
We also know that the directional derivative of \(f\) in the direction of a tangent vector \(\mathbf{v}=[\gamma]\) at \(a=\gamma(t_0)\) is, again in terms of local coordinates \(x^i\),
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}=\sum_{i=1}^n\left.\frac{dx^i(\gamma(t))}{dt}\frac{\partial f}{\partial x^i}\right|_{t_0}.
\end{equation}
So, if we choose a curve \(\gamma\) such that \(\gamma(t_0)=a\) and
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=X(x^i),
\end{equation}
then \(Xf=D_{\mathbf{v}}f(a)\). Thus, we can just take \(\gamma(t)=(a^1+X(x^1)(t-t_0),\dots,a^n+X(x^n)(t-t_0))\) (even though we’re explicitly relying on the vector space structure of \(\RR^n\) there is nothing essentially different required in the more general setting). Finally, we can ask whether each tangent vector corresponds to a unique derivation. This follows since if \(D_\mathbf{v}=D_\mathbf{w}\), where \(\mathbf{v}=[\gamma_1]\) and \(\mathbf{w}=[\gamma_2]\), then applying this in turn to each coordinate function, \(x^i\), we obtain,
\begin{equation*}
\left.\frac{dx^i(\gamma_1(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\gamma_2(t))}{dt}\right|_{t_0}
\end{equation*}
so \(\gamma_1\) and \(\gamma_2\) are in the same equivalence class, or, \(\mathbf{v}=\mathbf{w}\). We have therefore proved the following important theorem.

Theorem The vector spaces \(T_a(\RR^n)\) and \(\mathcal{D}_a(\RR^n)\) are isomorphic and under this isomorphism tangent vectors \(v\) map to derivations \(D_v\).

Under this isomorphism the standard basis vectors \(\mathbf{e}_i\) at \(a\) map to the partial derivatives \(\partial/\partial x^i\) at \(a\) and indeed in the more general setting of differential geometry, which the treatment here anticipates, it is usual to treat those partials as the basis “vectors” of the tangent space \(T_a(\RR^n)\). The basis \(\partial/\partial x^i\) is called the coordinate basis. Suppose our space is the plane \(\RR^2\); then the Cartesian coordinate basis would be \((\partial/\partial x,\partial/\partial y)\) corresponding respectively to the standard basis vectors \((\mathbf{e}_x,\mathbf{e}_y)\). If we choose to work with polar coordinates, the coordinate basis would be \((\partial/\partial r,\partial/\partial\theta)\) with
\begin{align*}
\frac{\partial}{\partial r}&=\frac{\partial x}{\partial r}\frac{\partial}{\partial x}+\frac{\partial y}{\partial r}\frac{\partial}{\partial y}\\
&=\cos\theta\frac{\partial}{\partial x}+\sin\theta\frac{\partial}{\partial y}
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial\theta}&=\frac{\partial x}{\partial\theta}\frac{\partial}{\partial x}+\frac{\partial y}{\partial\theta}\frac{\partial}{\partial y}\\
&=-r\sin\theta\frac{\partial}{\partial x}+r\cos\theta\frac{\partial}{\partial y}
\end{align*}
Note that if in \(T_a(\RR^2)\) we adopt the usual Euclidean metric (a non-degenerate, symmetric, positive definite inner product, \((\,,\,)\)), such that the standard basis is orthonormal, \((\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\), then the polar basis vectors, \(\partial/\partial r\) and \(\partial/\partial\theta\), of \(T_a(\RR^2)\) are not orthonormal. The corresponding normalised basis vectors would be \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{equation}

In the case of cylindrical coordinates we find
\begin{align}
\frac{\partial}{\partial\rho}&=\cos\varphi\frac{\partial}{\partial x}+\sin\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial\varphi}&=-\rho\sin\varphi\frac{\partial}{\partial x}+\rho\cos\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial z}&=\frac{\partial}{\partial z}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\) defined by
\begin{equation}
\mathbf{e}_\rho=\frac{\partial}{\partial\rho}=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y,
\end{equation}
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{\rho}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_z=\frac{\partial}{\partial z}=\mathbf{e}_z.
\end{equation}

In the case of spherical coordinates we find
\begin{align}
\frac{\partial}{\partial r}&=\cos\varphi\sin\theta\frac{\partial}{\partial x}+\sin\varphi\sin\theta\frac{\partial}{\partial y}+\cos\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\theta}&=r\cos\varphi\cos\theta\frac{\partial}{\partial x}+r\sin\varphi\cos\theta\frac{\partial}{\partial y}-r\sin\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\varphi}&=-r\sin\varphi\sin\theta\frac{\partial}{\partial x}+r\cos\varphi\sin\theta\frac{\partial}{\partial y}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\varphi\sin\theta\mathbf{e}_x+\sin\varphi\sin\theta\mathbf{e}_y+\cos\theta\mathbf{e}_z,
\end{equation}
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=\cos\varphi\cos\theta\mathbf{e}_x+\sin\varphi\cos\theta\mathbf{e}_y-\sin\theta\mathbf{e}_z,
\end{equation}
and
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{equation}
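The normalising factors \(1\), \(r\) and \(r\sin\theta\) are just the Euclidean lengths of the coordinate basis vectors, which we can confirm with a short sketch (assuming SymPy is available):
```python
# A sketch (assuming SymPy is available): the spherical coordinate basis vectors
# d/dr, d/dtheta, d/dphi are the columns of the Jacobian of (r, theta, phi) -> (x, y, z);
# their squared Euclidean lengths are 1, r**2 and r**2*sin(theta)**2, which give the
# normalising factors used above.
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
cart = sp.Matrix([r*sp.cos(phi)*sp.sin(theta),
                  r*sp.sin(phi)*sp.sin(theta),
                  r*sp.cos(theta)])
J = cart.jacobian([r, theta, phi])   # column i is the basis vector d/dy^i in Cartesian components

for i, name in enumerate(['d/dr', 'd/dtheta', 'd/dphi']):
    col = J[:, i]
    print(name, sp.simplify(col.T * col)[0])   # 1, r**2, r**2*sin(theta)**2
```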

Inverse Function Theorem

From elementary calculus we recall that a continuous function is invertible if and only if it is strictly increasing or strictly decreasing over the interval on which the inverse is required. We can see how this arises by looking at the linear approximation to \(f\) in the neighbourhood of some point \(x=a\), \(f(x)\approx f(a)+f'(a)\cdot(x-a)\). Clearly, to be able to invert this and express, at least locally, \(x\) in terms of \(f(x)\) we must have \(f'(a)\neq0\).

As we’ve seen, we can similarly approximate the function \(f:\RR^n\mapto\RR^m\) in the neighbourhood of a point \(a\in\RR^n\) as \(f(x)\approx f(a)+J_f(a)(x-a)\) which tells us that for \(f\) to be invertible in the neighbourhood of some point will certainly require the Jacobian matrix to be invertible at that point. In particular we must have \(n=m\), in which case the determinant of this matrix is called the Jacobian determinant of the map \(f\). We now state the important inverse function theorem.

Theorem (Inverse function theorem) Suppose \(f:\RR^n\mapto\RR^n\) is smooth on some open subset of \(\RR^n\). If \(\det\mathbf{J}_f(a)\neq0\) at some \(a\) in that subset then there exists an open neighbourhood \(U\) of \(a\) such that \(V=f(U)\) is open and \(f:U\mapto V\) is a diffeomorphism. In this case, if \(x\in U\) and \(y=f(x)\) then \(J_{f^{-1}}(y)=(J_f(x))^{-1}\).

Note that if \(f:U\mapto V\) is a diffeomorphism of open sets then we may form the identity function \(f\circ f^{-1}\) on \(V\). Clearly, for all \(y\in V\), \(J_{f\circ f^{-1}}(y)=\id_V\) but by the chain rule we have \(\id_V=J_{f\circ f^{-1}}(y)=J_f(x)J_{f^{-1}}(y)\) for any \(y=f(x)\in V\) and so \(J_f(x)\) is invertible at all points \(x\in U\).

Example In one dimension, the function \(f(x)=x^3\) is invertible with \(f^{-1}(x)=x^{1/3}\). Notice though that, \(f'(x)=3x^2\), so that, \(f'(0)=0\), and the hypothesis of the inverse function theorem is violated. The point is that \(f^{-1}\) is not differentiable at \(f(0)=0\).

A useful consequence of the inverse function theorem is the following. If \(U\subset\RR^n\) is some open subset of \(\RR^n\) on which a map \(f:U\mapto\RR^n\) is smooth and for which the Jacobian determinant \(\det\mathbf{J}_f(x)\neq0\) for all \(x\in U\) then \(f(U)\) is open and if \(f\) is injective then \(f:U\mapto f(U)\) is a diffeomorphism. To see this, note that since at every \(x\in U\), \(\det\mathbf{J}_f(x)\neq0\), the inverse function theorem tells us that we have open sets which we can call \(U_x\) and \(V_x\) such that \(x\in U_x\) and \(V_x=f(U_x)\) open in \(f(U)\) so that, since \(f(x)\in V_x\subset f(U)\), \(f(U)\) is open. If \(f\) is injective then, since by the theorem \(f:U_x\mapto V_x\) is a diffeomorphism for every \(x\in U\) and since \(f(U)\) is open we conclude that the inverse \(f^{-1}\) is smooth on \(f(U)\) so that indeed \(f:U\mapto f(U)\) is a diffeomorphism.

A coordinate system, \((y^1,\dots,y^n)\), for some subset \(U\) of points of \(\RR^n\) is simply a map
\begin{equation}
(x^1,\dots,x^n)\mapsto(y^1(x^1,\dots,x^n),\dots,y^n(x^1,\dots,x^n)), \label{map:coordmap}
\end{equation}
allowing us to (re-)coordinatize points \(x=(x^1,\dots,x^n)\in U\). Intuitively, for the \(y^i\) to be good coordinates, the map \eqref{map:coordmap} should be a diffeomorphism — points should be uniquely identified and we should be able to differentiate at will. Using the inverse function theorem we can test this by examining the Jacobian of the transformation.

Example Consider the coordinate transformation maps of the previous section. For polar coordinates in the plane the map \((r,\theta)\mapsto(r\cos\theta,r\sin\theta)\) defined on the open set \((0,\infty)\times\RR\) is smooth with Jacobian determinant \(r\), which is non-zero everywhere on the domain. Thus the inverse function theorem tells us that the restriction of this map to any open subset on which it is injective is a diffeomorphism onto its image. We could restrict, for example, to \((0,\infty)\times(0,2\pi)\) and the polar coordinates map is then a diffeomorphism onto the complement of the non-negative \(x\)-axis. For cylindrical coordinates the Jacobian determinant is \(\rho\). Restricting to \((0,\infty)\times(0,2\pi)\times\RR\) the cylindrical coordinates map is a diffeomorphism onto the complement of the half plane \(\{(x,y,z):y=0,\,x\geq0\}\). In the case of spherical polar coordinates the Jacobian determinant is \(r^2\sin\theta\) so restricting to \((0,\infty)\times(0,\pi)\times(0,2\pi)\) we have a diffeomorphism onto the image.

Vector differentiation

Recall that the derivative, \(f'(t)\), of a scalar function of one real variable, \(f(t)\), is defined to be
\begin{equation}
f'(t)=\lim_{h\mapto 0}\frac{f(t+h)-f(t)}{h}.\label{def:one-dim deriv}
\end{equation}
We can also consider functions taking values in \(\RR^n\), \(f:\RR\mapto\RR^n\). In the definition of the derivative we then explicitly make use of the vector space nature of \(\RR^n\), and though we won’t do so in general, it can be useful in this context to denote the image under \(f\) of some \(x\in\RR\) using bold face, \(\mathbf{f}(x)\):
\begin{equation}
\mathbf{f}'(x)=\frac{d\mathbf{f}}{dx}=\lim_{h\mapto 0}\frac{\mathbf{f}(x+h)-\mathbf{f}(x)}{h}.
\end{equation}
The vector \(\mathbf{f}(x)\) is nothing but the vector corresponding to the element \(f(x)\in\RR^n\) with respect to the standard basis in \(\RR^n\). The following product rules follow from this definition in the same way as the scalar function product rule,
\begin{align}
\frac{d}{dx}\left(c(x)\mathbf{f}(x)\right)&=c\frac{d\mathbf{f}}{dx}+\frac{dc}{dx}\mathbf{f},\\
\frac{d}{dx}\left(\mathbf{f}(x)\cdot\mathbf{g}(x)\right)&=\mathbf{f}\cdot\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\cdot\mathbf{g},\\
\frac{d}{dx}\left(\mathbf{f}(x)\times\mathbf{g}(x)\right)&=\mathbf{f}\times\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\times\mathbf{g},
\end{align}
where \(c:\RR\mapto\RR\) and \(g:\RR\mapto\RR^n\) with \(\mathbf{g}(x)\) the vector representation of \(g(x)\) with respect to the standard basis.

More generally, we can consider vector valued functions \(f:\RR^n\mapto\RR^m\) such that points \(x\in\RR^n\) are mapped to points \(f(x)=(f^1(x),\dots,f^m(x))\) of \(\RR^m\) where we have here introduced the component functions, \(f^i:\RR^n\mapto\RR\), of \(f\). Such a function \(f\) is said to be differentiable at \(a\) if there exists a linear map \(J_f(a):\RR^n\mapto\RR^m\) such that
\begin{equation}
\lim_{h\mapto0}\frac{|f(a+h)-f(a)-J_f(a)h|}{|h|}=0\label{eq:genderiv}
\end{equation}
where \(|\cdot|\) denotes the appropriate norm (on \(\RR^m\) in the numerator and on \(\RR^n\) in the denominator). In this case, \(J_f(a)\) is called the derivative (sometimes total derivative) of \(f\) at \(a\). Introducing \(R(h)\in\RR^m\) as \(R(h)=f(a+h)-f(a)-J_f(a)h\) we can interpret \eqref{eq:genderiv} as saying that
\begin{equation}
f(a+h)=f(a)+J_f(a)h+R(h)
\end{equation}
where the “remainder” \(R(h)\) is such that \(\lim_{h\mapto0}|R(h)|/|h|=0\), and so we can interpret \(J_f(a)\) as linearly approximating \(f(a+h)-f(a)\) near \(a\). Perhaps not surprisingly it turns out that, if \(f\) is differentiable at \(a\), then with respect to the standard bases of \(\RR^n\) and \(\RR^m\) the matrix of the linear map \(J_f(a)\), \(\mathbf{J}_f(a)\), has elements given by the partial derivatives,
\begin{equation}
{J_f(a)}_i^j=\frac{\partial f^j}{\partial x^i}(a).
\end{equation}
To see this, note that, if it exists, the \(i\)th partial derivative of \(f^j\) at \(a\) is given by
\begin{equation}
\partial_if^j(a)=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}.
\end{equation}
where \(e_i\) is the \(i\)th standard basis element of \(\RR^n\). Now, recalling the definition of the remainder \(R(h)\in\RR^m\), we have that, with respect to the standard basis of \(\RR^m\), the \(j\)th component of \(R(\epsilon e_i)\) is \(R^j(\epsilon e_i)=f^j(a+\epsilon e_i)-f^j(a)-{J_f(a)}_i^j\epsilon\). Therefore we can write
\begin{align*}
\partial_if^j(a)&=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}=\lim_{\epsilon\mapto0}\frac{{J_f(a)}_i^j\epsilon+R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j+\lim_{\epsilon\mapto0}\frac{R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j.
\end{align*}
The converse also holds. That is, if all the component functions of a function \(f:\RR^n\mapto\RR^m\) are differentiable at a point \(a\in\RR^n\), then \(f\) is differentiable at \(a\). Thus, we have that a function \(f:\RR^n\mapto\RR^m\) is differentiable at \(a\) if and only if all its component functions are differentiable at \(a\). In this case, with respect to the standard bases of \(\RR^n\) and \(\RR^m\), the matrix of the derivative of \(f\), \(\mathbf{J}_f(a)\), is the matrix of partial derivatives of the component functions at \(a\). This matrix is called the Jacobian matrix of \(f\) at \(a\).

A function \(f:\RR^n\mapto\RR^m\) is said to be smooth if all its component functions are smooth. A smooth function \(f\) between open sets of \(\RR^n\) and \(\RR^m\) is called a diffeomorphism if it is bijective and its inverse function is also smooth. We will consider the invertibility of functions in the section on the inverse function theorem.

The derivative of a composition of maps \(f:\RR^n\mapto\RR^m\) and \(g:\RR^m\mapto\RR^p\), \(g\circ f\), at a point \(a\in\RR^n\), that is, the generalisation of the familiar chain rule, is then given by the matrix product of the respective Jacobian matrices,
\begin{equation}
J_{g\circ f}(a)=J_g(f(a))J_f(a).
\end{equation}
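A small sketch (assuming SymPy is available, with hypothetical maps \(f\) and \(g\)) verifies the chain rule for Jacobians symbolically:
```python
# A sketch (assuming SymPy is available) checking J_{g o f}(a) = J_g(f(a)) J_f(a)
# for hypothetical maps f, g: R^2 -> R^2.
import sympy as sp

x, y, u, v = sp.symbols('x y u v')

f = sp.Matrix([x**2 + y, x*y])          # f(x, y)
g = sp.Matrix([sp.sin(u), u + v**2])    # g(u, v)

Jf = f.jacobian([x, y])
Jg = g.jacobian([u, v])

comp = g.subs({u: f[0], v: f[1]})       # g o f
lhs = comp.jacobian([x, y])
rhs = Jg.subs({u: f[0], v: f[1]}) * Jf
print(sp.simplify(lhs - rhs))           # the zero matrix
```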

Example Suppose \(f:\RR^n\mapto\RR^m\) is a linear map whose matrix representation with respect to the standard bases of \(\RR^n\) and \(\RR^m\) is \(\mathbf{f}\). Then \(f\) maps \(\mathbf{x}\mapsto\mathbf{f}\mathbf{x}\) so clearly \(\mathbf{J}_f=\mathbf{f}\).

Example
As we discussed earlier, the relationship between polar and Cartesian coordinates can be described through a map from \(\RR^2\mapto\RR^2\) given by
\begin{equation}
\begin{pmatrix}r\\\theta\end{pmatrix}\mapsto\begin{pmatrix}r\cos\theta\\r\sin\theta\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times[0,2\pi)\subset\RR^2\). We typically write the components of this map as \(x(r,\theta)=r\cos\theta\) and \(y(r,\theta)=r\sin\theta\). The Jacobian matrix for the polar coordinate map is then
\begin{equation}
\begin{pmatrix}
\partial x/\partial r&\partial x/\partial\theta\\
\partial y/\partial r&\partial y/\partial\theta
\end{pmatrix}=
\begin{pmatrix}
\cos\theta&-r\sin\theta\\
\sin\theta& r\cos\theta
\end{pmatrix}.
\end{equation}
Likewise, cylindrical coordinates are related to Cartesian coordinates through a map from \(\RR^3\mapto\RR^3\) given by
\begin{equation}
\begin{pmatrix}
\rho\\\phi\\z\end{pmatrix}\mapsto\begin{pmatrix}
\rho\cos\phi\\\rho\sin\phi\\z
\end{pmatrix}.
\end{equation}
In this case the domain is taken to be \((0,\infty)\times[0,2\pi)\times\RR\subset\RR^3\) and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\cos\phi&-\rho\sin\phi&0\\
\sin\phi&\rho\cos\phi&0\\
0&0&1
\end{pmatrix}.
\end{equation}
For spherical polar coordinates the \(\RR^3\mapto\RR^3\) map is
\begin{equation}
\begin{pmatrix}
r\\\theta\\\phi
\end{pmatrix}\mapsto\begin{pmatrix}
r\sin\theta\cos\phi\\
r\sin\theta\sin\phi\\
r\cos\theta
\end{pmatrix},
\end{equation}
where we take the domain to be \((0,\infty)\times(0,\pi)\times[0,2\pi)\subset\RR^3\); the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\sin\theta\cos\phi&r\cos\theta\cos\phi&-r\sin\theta\sin\phi\\
\sin\theta\sin\phi&r\cos\theta\sin\phi&r\sin\theta\cos\phi\\
\cos\theta&-r\sin\theta&0\\
\end{pmatrix}.
\end{equation}
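As a check (a sketch assuming SymPy is available), the Jacobian matrices and their determinants can be computed directly:
```python
# A sketch (assuming SymPy is available): the Jacobian matrices of the three
# coordinate maps above and their determinants, r, rho and r**2*sin(theta).
import sympy as sp

r, theta, rho, phi, z = sp.symbols('r theta rho phi z', positive=True)

maps = {
    'polar':       (sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)]),      [r, theta]),
    'cylindrical': (sp.Matrix([rho*sp.cos(phi), rho*sp.sin(phi), z]),   [rho, phi, z]),
    'spherical':   (sp.Matrix([r*sp.sin(theta)*sp.cos(phi),
                               r*sp.sin(theta)*sp.sin(phi),
                               r*sp.cos(theta)]),                       [r, theta, phi]),
}

for name, (m, coords) in maps.items():
    J = m.jacobian(coords)
    print(name, sp.simplify(J.det()))
# polar r
# cylindrical rho
# spherical r**2*sin(theta)
```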

Partial derivatives and some applications

Some Multivariable Functions

The most familiar examples of multivariable functions are those taking values in \(\RR\). These are also called scalar fields — they assign a scalar to each point in space. One example, a function \(f:\RR^2\mapto\RR\), is
\begin{equation*}
f(x,y)=x^2+y^2.
\end{equation*}
We say that it has ‘level curves’ — the set of points \((x,y)\) such that \(f(x,y)=r^2\) — which are circles of radius \(r\). An analogous example this time of a function \(f:\RR^3\mapto\RR\) is
\begin{equation*}
f(x,y,z)=x^2+y^2+z^2,
\end{equation*}
and in this case a ‘level surface’, specified by the points \((x,y,z)\) such that \(f(x,y,z)=r^2\), is a sphere of radius \(r\).

The curvilinear coordinates provide important examples of functions taking values in \(\RR^2\) and \(\RR^3\). Take polar coordinates first. The function mapping a point’s polar coordinates to its Cartesian coordinates is the function \(f:(0,\infty)\times[0,2\pi)\mapto\RR^2\) given by
\begin{equation*}
f(r,\theta)=(r\cos\theta,r\sin\theta).
\end{equation*}
The function mapping a point’s cylindrical coordinates to its Cartesian coordinates is a function \((0,\infty)\times[0,2\pi)\times\RR\mapto\RR^3\) which we could write as
\begin{equation*}
f(\rho,\varphi,z)=(\rho\cos\varphi,\rho\sin\varphi,z).
\end{equation*}
The function mapping a point’s spherical coordinates to its Cartesian coordinates is a function \((0,\infty)\times(0,\pi)\times[0,2\pi)\mapto\RR^3\) which we could write as
\begin{equation*}
f(r,\theta,\varphi)=(r\cos\varphi\sin\theta,r\sin\varphi\sin\theta,r\cos\theta).
\end{equation*}
Note that in each of these functions the domain has been restricted to ensure the function is one-to-one.

Definition of partial derivative

If \(f:\RR^n\mapto\RR\) is a real valued function on \(\RR^n\) we define the partial derivative of \(f\) with respect to \(x^i\) as,
\begin{equation}
\frac{\partial f}{\partial x^i}=\lim_{\epsilon\mapto0}\frac{f(x^1,\dots,x^i+\epsilon,\dots,x^n)-f(x^1,\dots,x^i,\dots,x^n) }{\epsilon}.
\end{equation}
Thus, a small change \(\Delta x^i\) in the \(x^i\) coordinate leads to an increment in the value of the function given by,
\begin{equation}
\Delta f\approx\frac{\partial f}{\partial x^i}\Delta x^i.
\end{equation}
More generally, we have
\begin{equation}
\Delta f\approx\sum_{i=1}^n\frac{\partial f}{\partial x^i}\Delta x^i=\partial_if\Delta x^i,
\end{equation}
where we’ve introduced the notation,
\begin{equation}
\partial_if=\frac{\partial f}{\partial x^i}.
\end{equation}
In this notation, second order partials are represented as
\begin{equation}
\partial_{ij}f=\frac{\partial^2 f}{\partial x^i\partial x^j}.
\end{equation}
An extremely important property of partial derivatives, at least for the smooth functions we will be dealing with, is that the order in which we take them is irrelevant: \(\partial_{ij}f=\partial_{ji}f\). A function \(f:\RR^n\mapto\RR\) is said to be smooth if all higher order partials exist and are continuous. We denote the set of smooth functions \(f:\RR^n\mapto\RR\) by \(C^\infty(\RR^n)\).
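The increment approximation is easy to test numerically. Here is a sketch (assuming NumPy is available, with an arbitrary test function) comparing \(\Delta f\) with \(\sum_i\partial_if\,\Delta x^i\):
```python
# A sketch (assuming NumPy is available): the increment approximation
# Delta f ~ sum_i (df/dx^i) Delta x^i for a hypothetical f(x, y) = x**2*y + sin(y).
import numpy as np

def f(x, y):
    return x**2 * y + np.sin(y)

x0, y0 = 1.0, 2.0
dx, dy = 1e-3, -2e-3

exact = f(x0 + dx, y0 + dy) - f(x0, y0)
approx = (2*x0*y0) * dx + (x0**2 + np.cos(y0)) * dy   # df/dx = 2xy, df/dy = x**2 + cos(y)
print(exact, approx)   # agree to roughly 1e-6
```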

Leibniz’ rule

Partial differentiation can be useful in evaluating certain integrals, a technique informally known as ‘differentiating under the integral sign’. Suppose \(F(x,t)=\int f(x,t)\,dt\), then
\begin{equation*}
\frac{\partial F}{\partial t}=f(x,t),
\end{equation*}
so that
\begin{equation*}
\frac{\partial^2 F(x,t)}{\partial x\partial t}=\frac{\partial f(x,t)}{\partial x},
\end{equation*}
which upon integrating yields
\begin{equation*}
\frac{\partial F(x,t)}{\partial x}=\int \frac{\partial f(x,t)}{\partial x}\,dt.
\end{equation*}
More generally, if
\begin{equation*}
I(x)=\int_{u(x)}^{v(x)}f(x,t)\,dt=F(x,v(x))-F(x,u(x)),
\end{equation*}
then \(\partial I/\partial v=f(x,v(x))\), \(\partial I/\partial u=-f(x,u(x))\) and
\begin{align*}
\frac{\partial I}{\partial x}&= \int^v\frac{\partial f(x,t)}{\partial x}\,dt-\int^u\frac{\partial f(x,t)}{\partial x}\,dt\\
&=\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{align*}
so that
\begin{equation}
\frac{dI}{dx}=f(x,v(x))\frac{dv}{dx}-f(x,u(x))\frac{du}{dx}+\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{equation}
which is called Leibniz’ rule.

Example
If
\begin{equation}
\phi(\alpha)=\int_\alpha^{\alpha^2}\frac{\sin\alpha x}{x}\,dx
\end{equation}
then by Leibniz’ rule we have,
\begin{align*}
\phi'(\alpha)&=\frac{\sin\alpha^3}{\alpha^2}\cdot2\alpha-\frac{\sin\alpha^2}{\alpha}+\int_\alpha^{\alpha^2}\cos\alpha x\,dx\\
&=2\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}+\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}\\
&=\frac{3\sin\alpha^3-2\sin\alpha^2}{\alpha}.
\end{align*}
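We can sanity-check this result numerically (a sketch assuming NumPy and SciPy are available), comparing a finite-difference estimate of \(\phi'(\alpha)\) with the formula just derived:
```python
# A sketch (assuming NumPy and SciPy are available) checking the example above:
# a finite-difference estimate of phi'(alpha) against (3 sin(alpha^3) - 2 sin(alpha^2))/alpha.
import numpy as np
from scipy.integrate import quad

def phi(alpha):
    return quad(lambda x: np.sin(alpha * x) / x, alpha, alpha**2)[0]

alpha, h = 1.3, 1e-5
numerical = (phi(alpha + h) - phi(alpha - h)) / (2 * h)
formula = (3 * np.sin(alpha**3) - 2 * np.sin(alpha**2)) / alpha
print(numerical, formula)   # the two values agree to several decimal places
```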

Taylor expansion and stationary points

The Taylor expansion for \(f:\RR^n\mapto\RR\) about a point \(a\) is
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\partial_if(a)(x^i-a^i)+\frac{1}{2}\sum_{i,j=1}^n\partial_{ij}f(a)(x^i-a^i)(x^j-a^j)+\dots,
\end{equation}
where for clarity (here and below) we’re not employing the summation convention for repeated indices.

The stationary points of a function \(f\) may be analysed with the help of the Taylor expansion as follows. At any stationary point, \(a\), the first partial derivatives must be zero. To try to determine the nature of the stationary point we consider the approximation of the function given by the Taylor expansion about the point,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}}^\mathsf{T}\mathbf{M}\Delta\mathbf{x}\label{eq:1st order taylor}
\end{equation}
where \(\Delta\mathbf{x}^\mathsf{T}=(x^1-a^1,\dots,x^n-a^n)\), \(\Delta\mathbf{x}\) the corresponding column vector and \(\mathbf{M}\) is the matrix with elements \(M_{ij}=\partial_{ij}f(a)\). Since \(\mathbf{M}\) is a real symmetric matrix it is diagonalisable through a similarity transformation by an orthogonal matrix \(\mathbf{O}\). That is, \(\mathbf{O}^\mathsf{T}\mathbf{M}\mathbf{O}\) is diagonal with diagonal elements the eigenvalues of \(\mathbf{M}\). Thus we have
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}'}^\mathsf{T}\mathbf{M}'\Delta\mathbf{x}',
\end{equation}
where \(\Delta\mathbf{x}'=\mathbf{O}^\mathsf{T}\Delta\mathbf{x}\) and \(M'_{ij}=\delta_{ij}\lambda_i\) with \(\lambda_i\) the eigenvalues of \(\mathbf{M}\). That is,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}\sum_i(\Delta x'^i)^2\lambda_i,
\end{equation}
from which we conclude the following:

  1. If \(\lambda_i>0\) for all \(i\) then the stationary point at \(a\) is a minimum.
  2. If \(\lambda_i<0\) for all \(i\) then the stationary point at \(a\) is a maximum.
  3. If at least one \(\lambda_i>0\) and at least one \(\lambda_i<0\) then the stationary point at \(a\) is a saddle point (a stationary point which is not an extremum).
  4. If some \(\lambda_i=0\) and the non-zero \(\lambda_i\) all have the same sign then the test is inconclusive.

For a function of two real variables, \(f:\RR^2\mapto\RR\), the eigenvalues of \(\mathbf{M}\) are obtained from
\begin{equation*}
\det\begin{pmatrix}
\partial_{xx}f-\lambda & \partial_{xy}f\\
\partial_{xy}f & \partial_{yy}f-\lambda\\
\end{pmatrix}=\lambda^2-(\partial_{xx}f+\partial_{yy}f)\lambda+\partial_{xx}f\partial_{yy}f-\partial_{xy}f^2=0.
\end{equation*}
That is,
\begin{equation*}
\lambda=\tfrac{1}{2}\left((\partial_{xx}f+\partial_{yy}f)\pm\sqrt{(\partial_{xx}f-\partial_{yy}f)^2+4\partial_{xy}f^2}\right),
\end{equation*}
so that for both eigenvalues to be positive (a minimum) we need \(\partial_{xx}f>0\), \(\partial_{yy}f>0\) and \(\partial_{xx}f\partial_{yy}f-\partial_{xy}f^2>0\). For both to be negative (a maximum) we need \(\partial_{xx}f<0\), \(\partial_{yy}f<0\) and \(\partial_{xx}f\partial_{yy}f-\partial_{xy}f^2>0\). For a saddle point we need \(\partial_{xx}f\) and \(\partial_{yy}f\) to have opposite signs or \(\partial_{xx}f\partial_{yy}f<\partial_{xy}f^2\).
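The eigenvalue test is easy to automate. The following sketch (assuming SymPy is available, applied to a hypothetical function) finds the stationary points and classifies them by the eigenvalues of \(\mathbf{M}\):
```python
# A sketch (assuming SymPy is available): stationary points of the hypothetical
# function f(x, y) = x**3 - 3x + y**2 classified by the eigenvalues of M.
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2

stationary = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
M = sp.hessian(f, (x, y))

for pt in stationary:
    eigs = list(M.subs(pt).eigenvals())
    kind = ('minimum' if all(e > 0 for e in eigs) else
            'maximum' if all(e < 0 for e in eigs) else 'saddle point')
    print(pt, eigs, kind)
# (x, y) = (-1, 0): eigenvalues -6 and 2, a saddle point
# (x, y) = (1, 0):  eigenvalues 6 and 2, a minimum
```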

Taylor’s Theorem

Let us further refine our index notation for partials so that for any \(m\)-tuple \(I=(i_1,\dots,i_m)\), with \(|I|=m\), we define
\begin{equation}
\partial_I=\frac{\partial^m}{\partial x^{i_1}\cdots\partial x^{i_m}}
\end{equation}
and
\begin{equation}
(x-a)^I=(x^{i_1}-a^{i_1})\cdots(x^{i_m}-a^{i_m})
\end{equation}
With this notation we can state Taylor’s Theorem. This says that for a function \(f:\RR^n\mapto\RR\) appropriately differentiable near a point \(a\) we have, for all \(x\) near \(a\),
\begin{equation}
f(x)=P_k(x)+R_k(x),
\end{equation}
where
\begin{equation}
P_k(x)=f(a)+\sum_{m=1}^k\frac{1}{m!}\sum_{I:|I|=m}(x-a)^I\partial_If(a),
\end{equation}
is the \(k\)th-order Taylor polynomial of \(f\) at \(a\) and
\begin{equation}
R_k(x)=\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\int_0^1(1-t)^k\partial_If(a+t(x-a))dt
\end{equation}
is the \(k\)th remainder term. To see why this is true we use induction. For \(k=0\), \(P_0(x)=f(a)\) and the remainder term is
\begin{align*}
R_0(x)&=\sum_{i=1}^n(x^i-a^i)\int_0^1\partial_if(a+t(x-a))dt\\
&=\int_0^1\frac{d}{dt}f(a+t(x-a))dt\\
&=f(x)-f(a).
\end{align*}
Now assume the result for some \(k\) and use integration by parts on the integral in the remainder term,
\begin{align*}
\int_0^1(1-t)^k\partial_If(a+t(x-a))dt&=\left.\left(-\frac{(1-t)^{k+1}}{k+1}\partial_If(a+t(x-a))\right)\right\rvert_0^1\\
&\quad+\int_0^1\frac{(1-t)^{k+1}}{k+1}\frac{d}{dt}\partial_If(a+t(x-a))dt\\
&=\frac{1}{k+1}\partial_If(a)\\
&\quad+\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt
\end{align*}
Now observe that
\begin{equation*}
P_k(x)+\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\partial_If(a)=P_{k+1}(x)
\end{equation*}
and that
\begin{align*}
&\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+1}\sum_{i=1}^n(x-a)^I(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+2}(x-a)^I\int_0^1(1-t)^{k+1}\partial_If(a+t(x-a))dt\\
&=R_{k+1}(x).
\end{align*}
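As a small illustration (a sketch assuming SymPy is available, for an arbitrary test function and the case \(k=2\), with the Taylor polynomial written in the equivalent gradient/Hessian form), we can check that \(f-P_2\) vanishes to third order at \(a\):
```python
# A sketch (assuming SymPy is available): the k = 2 case for an arbitrary test function,
# with P_2 written in gradient/Hessian form (equivalent to the multi-index sum above),
# and a check that f - P_2 vanishes to third order along the line x = y = s.
import sympy as sp

x, y, s = sp.symbols('x y s')
vec = sp.Matrix([x, y])
f = sp.exp(x) * sp.cos(y)

at0 = {x: 0, y: 0}
grad = sp.Matrix([sp.diff(f, c) for c in (x, y)]).subs(at0)
H = sp.hessian(f, (x, y)).subs(at0)

P2 = f.subs(at0) + (grad.T * vec)[0] + sp.Rational(1, 2) * (vec.T * H * vec)[0]
print(sp.expand(P2))                      # 1 + x + x**2/2 - y**2/2

remainder = (f - P2).subs({x: s, y: s})
print(sp.limit(remainder / s**3, s, 0))   # finite (-1/3), so the error is O(s**3)
```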

Differentials

Recall that the total differential \(df\) of a function \(f:\RR^n\mapto\RR\) is defined to be
\begin{equation}
df=\sum_i\partial_if\,dx^i.
\end{equation}
Though the relation to the infinitesimal increment is clear, there is no approximation intended here. Later we will formally define \(df\) as an object belonging to the dual space of the tangent space at a point, a “differential form”, but for the time being it is safe to think of it either as a small change in \(f\) or as the kind of object we are used to integrating.

When working with partial derivatives it is always wise to indicate clearly which variables are being held constant. Thus,
\begin{equation}
\left(\frac{\partial\phi}{\partial x}\right)_{y,z},
\end{equation}
means the partial derivative of \(\phi\), regarded as a function of \(x\), \(y\) and \(z\), with respect to \(x\), holding \(y\) and \(z\) constant. The following example demonstrates how differentials naturally ‘spit out’ all partial derivatives simultaneously.

Example Suppose \(w=x^3y-z^2t\), \(xy=zt\), and we wish to calculate
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}.
\end{equation*}
We could either proceed directly, using the chain rule together with \(\left(\partial z/\partial y\right)_{x,t}=x/t\) obtained from \(xy=zt\),
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2zt\left(\frac{\partial z}{\partial y}\right)_{x,t}=x^3-2zt\cdot\frac{x}{t}=x^3-2xz,
\end{equation*}
or take differentials,
\begin{equation*}
dw=3x^2y\,dx+x^3\,dy-2zt\,dz-z^2\,dt,
\end{equation*}
\begin{equation*}
y\,dx+x\,dy=t\,dz+z\,dt,
\end{equation*}
then substituting for \(dz\), since \(x\), \(y\) and \(t\) are being treated as the independent variables, to get,
\begin{equation*}
dw=(3x^2y-2yz)\,dx+(x^3-2xz)\,dy+z^2\,dt,
\end{equation*}
from which we obtain all the partials at once,
\begin{equation*}
\left(\frac{\partial w}{\partial x}\right)_{y,t}=3x^2y-2yz\quad
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2xz\quad
\left(\frac{\partial w}{\partial t}\right)_{x,y}=z^2.\\
\end{equation*}

A differential of the form \(\sum_ig_i\,dx^i\) is said to be exact if there exists a function \(f\) such that \(df=\sum_ig_i\,dx^i\). This turns out to be an important attribute and in many situations is equivalent to the condition that \(\partial_ig_j=\partial_jg_i\) for all pairs \(i,j\).

Example Consider the differential, \((x+y^2)\,dx+(2xy+3y^2)\,dy\). This certainly satisfies the condition so let us try to identify an \(f\) such that it is equal to \(df\). Integrating \((x+y^2)\) with respect to \(x\) treating \(y\) as constant, we find our candidate must have the form, \(x^2/2+xy^2+c(y)\), where \(c(y)\) is some function of \(y\). Now differentiating this with respect to \(y\) we get \(2xy+c'(y)\) and this must be equal to \(2xy+3y^2\). Therefore \(f\) must have the form \(f(x,y)=x^2/2+xy^2+y^3+c\) where \(c\) is an arbitrary constant.
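The same calculation can be carried out mechanically (a sketch assuming SymPy is available):
```python
# A sketch (assuming SymPy is available) checking exactness of the differential above
# and recovering f up to a constant.
import sympy as sp

x, y = sp.symbols('x y')
g1, g2 = x + y**2, 2*x*y + 3*y**2

print(sp.diff(g1, y) - sp.diff(g2, x))   # 0, so the differential is exact

f = sp.integrate(g1, x)                   # x**2/2 + x*y**2, still missing a c(y)
c = sp.integrate(sp.simplify(g2 - sp.diff(f, y)), y)
print(f + c)                              # x**2/2 + x*y**2 + y**3
```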

The Vector Space \(\RR^n\)

Points and coordinates

We’ll be considering the generalisation to \(n\)- (particularly \(n=3\)) dimensions of the familiar notions of single variable differential and integral calculus. This will all be further generalised when we come to the discussion of calculus on manifolds and with that goal in mind we’ll try to take a little more care than is perhaps strictly necessary in setting the stage.

Our space will be \(\RR^n\), \(n\)-dimensional Euclidean space. Forget for the moment that this is a vector space and consider it simply as a space of points, \(n\)-tuples such as \(a=(a^1,\dots,a^n)\). The (Cartesian) coordinates on \(\RR^n\) will be denoted by \(x^1,\dots,x^n\) so that the \(x^i\) coordinate of the point \(a\) is \(a^i\). When talking about a general (variable) point in \(\RR^n\) we’ll denote it by \(x\) with \(x=(x^1,\dots,x^n)\). Beware though that in two and three dimensions we’ll sometimes also denote the Cartesian coordinates as \(x,y\) and \(x,y,z\) respectively. The \(x^i\)s in \(\RR^n\) and the \(x\), \(y\), and \(z\) in \(\RR^2\) and \(\RR^3\) are best thought of as coordinate functions. So, for example, \(x^i:\RR^n\mapto\RR\) is such that \(x^i(a)=a^i\). When discussing general (variable) points we’re therefore abusing notation — using the same symbol to denote coordinate functions and coordinates. In other words we might come across a notationally undesirable equation such as \(x^i(x)=x^i\). Context should make it clear what is intended. It’s worth noting here that the whole of \(\RR^n\) can be covered by a single Cartesian coordinate system.

When we come to consider more general spaces, for example the surface of a sphere, this will not be the case. In such cases we’ll still assign (Cartesian) coordinates to points in our space through coordinate maps which effectively identify coordinate patches of the general space of points with pieces of \(\RR^n\). Where these patches overlap, the coordinate maps must, in a precise mathematical sense, be compatible — we must be able to consistently “sew” the patches together. Spaces which can, in this way, be treated as “locally Euclidean” are important because we can do calculus on functions on these spaces just as we can for functions on \(\RR^n\). We simply exploit the vector space properties of \(\RR^n\) via the coordinate maps. Crucial in this regard is the fact that \(\RR^n\) is a normed vector space, the norm, \(|a|\), of a point \(a=(a^1,\dots,a^n)\) being given by \(|a|=\sqrt{(a^1)^2+\cdots+(a^n)^2}\) so that the distance between two points is given by \(d(a,b)\) where
\begin{equation}
d(a,b)=\sqrt{(b^1-a^1)^2+\cdots+(b^n-a^n)^2}.
\end{equation}
As a vector space, \(\RR^n\) has a standard set of basis vectors, \(e_1,\dots,e_n\), with \(e_i\) typically regarded as a column vector with a 1 in the \(i\)th row and zeros everywhere else.

Vectors and the choice of scalar product

Standard treatments of vector calculus exploit the fact that \(\RR^3\), say, can be simultaneously thought of as a space of points and of vectors. There’s no need to distinguish since we can always “parallel transport” a vector at some point in space back to the origin or, for that matter, to any other point. In such treatments the usual scalar product is typically taken for granted. Personally, I’ve found that this leads to a certain amount of confusion as to the real role of the scalar product, particularly when it comes to, say, discussions of the geometry of spacetime in special relativity. In that case the space is \(\RR^4\) as per the previous section but the scalar product of tangent vectors is crucially not the Euclidean scalar product.

For this reason, we’ll take some care to distinguish the intuitive notion of vectors as “arrows in space” from the underlying space of points. To each point, \(x\), in \(\RR^n\) will be associated a vector space, the tangent space at \(x\), \(T_x(\RR^n)\). This is the space containing all the arrows at the point \(x\) and is, of course, a copy of the vector space \(\RR^n\). In other words our intuitive notion of an arrow between two points \(a\) and \(b\) is treated as an object within the tangent space at \(a\). When dealing with tangent vectors we’ll use boldface. Thus, the standard set of basis vectors in a tangent space, \(T_x(\RR^n)\), will be denoted \(\mathbf{e}_1,\dots,\mathbf{e}_n\), with \(\mathbf{e}_i\) a column vector with 1 in the \(i\)th row and zeros everywhere else. The basis vector \(\mathbf{e}_i\) can be regarded as pointing from \(x\) in the direction of increasing coordinate \(x^i\).

The usual scalar product, also called the dot product, of two vectors \(\mathbf{u}=(u^1,\dots,u^n)\) and \(\mathbf{v}=(v^1,\dots,v^n)\) of \(\RR^n\), is given by
\begin{equation}
\mathbf{u}\cdot\mathbf{v}=\sum_{i=1}^nu^iv^i.
\end{equation}
The dot product is a non-degenerate, symmetric, positive-definite inner product on \(\RR^n\) and allows us to define the length of any vector \(\mathbf{v}\) as
\begin{equation}
|\mathbf{v}|=\sqrt{\mathbf{v}\cdot\mathbf{v}}.
\end{equation}
Thanks to the Cauchy-Schwarz inequality, the angle, \(\theta\), between two non-zero vectors \(\mathbf{u}\) and \(\mathbf{v}\) may be defined as
\begin{equation}
\cos\theta=\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}||\mathbf{v}|}.
\end{equation}
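As a concrete illustration, here is a small numerical sketch (assuming numpy is available) of the dot product, norm and angle formulas:

```python
# Minimal sketch of the dot product, norm and angle formulas in R^3 (assumes numpy).
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 0.0])

dot = np.dot(u, v)                          # sum_i u^i v^i = 2
norm_u = np.sqrt(np.dot(u, u))              # |u| = 3
norm_v = np.sqrt(np.dot(v, v))              # |v| = 2

theta = np.arccos(dot / (norm_u * norm_v))  # cos(theta) = u.v / (|u||v|) = 1/3
print(dot, norm_u, norm_v, np.degrees(theta))   # angle is roughly 70.5 degrees
```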
As we’ve mentioned, the Minkowski space-time of special relativity is, as a space of points, \(\RR^4\). However a different choice of scalar product, in this case called a metric, is made, namely, \(\mathbf{u}\cdot\mathbf{v}=-u^0v^0+u^1v^1+u^2v^2+u^3v^3\).

In \(\RR^3\), recall that given two vectors \(\mathbf{u}\) and \(\mathbf{v}\), their vector product, \(\mathbf{u}\times\mathbf{v}\), with respect to Cartesian basis vectors is defined as,
\begin{equation}
\mathbf{u}\times\mathbf{v}=(u^2v^3-u^3v^2)\mathbf{e}_1-(u^1v^3-u^3v^1)\mathbf{e}_2+(u^1v^2-u^2v^1)\mathbf{e}_3,
\end{equation}
which can be conveniently remembered as a determinant,
\begin{equation}
\mathbf{u}\times\mathbf{v}=\det\begin{pmatrix}
\mathbf{e}_1&\mathbf{e}_2&\mathbf{e}_3\\
u^1&u^2&u^3\\
v^1&v^2&v^3
\end{pmatrix}.
\end{equation}
Alternatively, using the summation convention,
\begin{equation}
(\mathbf{u}\times\mathbf{v})^i=\epsilon^i_{jk}u^jv^k,
\end{equation}
where \(\epsilon^i_{jk}=\delta^{il}\epsilon_{ljk}\) is the Levi-Civita symbol. Note that the distinction between upper and lower indices is not important here but in more general contexts it will become so and therefore here we choose to take more care than is really necessary. The Levi-Civita symbol is given by
\begin{align*}
\epsilon_{123}=\epsilon_{231}=\epsilon_{312}&=1\\
\epsilon_{213}=\epsilon_{132}=\epsilon_{321}&=-1
\end{align*}
and zero in all other cases. Recall that the Levi-Civita symbol satisfies the useful relations,
\begin{equation}
\epsilon_{ijk}\epsilon_{ipq}=\delta_{jp}\delta_{kq}-\delta_{jq}\delta_{kp}
\end{equation}
and
\begin{equation}
\epsilon_{ijk}\epsilon_{ijq}=2\delta_{kq},
\end{equation}
where summation over repeated indices is understood.
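These identities are easy to verify by brute force; the following sketch (numpy assumed) checks them by summing over all index values:

```python
# Brute-force check of the epsilon-delta identities quoted above (assumes numpy).
import numpy as np

eps = np.zeros((3, 3, 3))
for p, sign in [((0, 1, 2), 1), ((1, 2, 0), 1), ((2, 0, 1), 1),
                ((1, 0, 2), -1), ((0, 2, 1), -1), ((2, 1, 0), -1)]:
    eps[p] = sign

delta = np.eye(3)

# epsilon_{ijk} epsilon_{ipq} = delta_{jp} delta_{kq} - delta_{jq} delta_{kp}
lhs = np.einsum('ijk,ipq->jkpq', eps, eps)
rhs = np.einsum('jp,kq->jkpq', delta, delta) - np.einsum('jq,kp->jkpq', delta, delta)
print(np.allclose(lhs, rhs))                                     # True

# epsilon_{ijk} epsilon_{ijq} = 2 delta_{kq}
print(np.allclose(np.einsum('ijk,ijq->kq', eps, eps), 2*delta))  # True
```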

Geometrically, \(|\mathbf{u}\times\mathbf{v}|\) is the area of the parallelogram with adjacent sides \(\mathbf{u}\) and \(\mathbf{v}\), \(|\mathbf{u}||\mathbf{v}|\sin\theta\), and its direction is normal to the plane of those vectors. Of the two possible normal directions, the right hand rule gives the correct one.

The combination \((\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}\) is called the triple product. It is the (signed) volume of the parallelepiped with base area \(|\mathbf{u}\times\mathbf{v}|\) and height \(\mathbf{w}\cdot\hat{\mathbf{n}}\), where \(\hat{\mathbf{n}}=\mathbf{u}\times\mathbf{v}/|\mathbf{u}\times\mathbf{v}|\). It has the property that permuting the three vectors cyclically doesn’t affect its value,
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=(\mathbf{v}\times\mathbf{w})\cdot\mathbf{u}=(\mathbf{w}\times\mathbf{u})\cdot\mathbf{v},
\end{equation*}
and also that
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}).
\end{equation*}
Both of these follow immediately from the observation that
\begin{equation}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\delta^{il}\epsilon_{ljk}u^jv^kw^i.
\end{equation}

A useful formula relating the cross and scalar products is
\begin{equation}
\mathbf{u}\times(\mathbf{v}\times\mathbf{w})=(\mathbf{u}\cdot\mathbf{w})\mathbf{v}-(\mathbf{u}\cdot\mathbf{v})\mathbf{w}.
\end{equation}
This relationship is established as follows.
\begin{align*}
(\mathbf{u}\times(\mathbf{v}\times\mathbf{w}))^i&=\epsilon^i_{jk}u^j(\mathbf{v}\times\mathbf{w})^k\\
&=\epsilon^i_{jk}\epsilon^k_{lm}u^jv^lw^m\\
&=\epsilon^k_{ij}\epsilon^k_{lm}u^jv^lw^m\\
&=(\delta_{il}\delta_{jm}-\delta_{im}\delta_{jl})u^jv^lw^m\\
&=(\mathbf{u}\cdot\mathbf{w})v^i-(\mathbf{u}\cdot\mathbf{v})w^i
\end{align*}
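These identities are easily spot-checked numerically; the sketch below (numpy assumed) tests the determinant form of the triple product, its cyclic symmetry, and the formula just derived on random vectors:

```python
# Numerical sanity checks of the cross and triple product identities (assumes numpy).
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 3))   # three random vectors in R^3

# Triple product as a determinant: (u x v).w = det with rows u, v, w
print(np.isclose(np.dot(np.cross(u, v), w), np.linalg.det(np.array([u, v, w]))))

# Cyclic symmetry: (u x v).w = (v x w).u
print(np.isclose(np.dot(np.cross(u, v), w), np.dot(np.cross(v, w), u)))

# u x (v x w) = (u.w) v - (u.v) w
print(np.allclose(np.cross(u, np.cross(v, w)), np.dot(u, w)*v - np.dot(u, v)*w))
```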

A First Look at Curvilinear Coordinate Systems

In \(\RR^2\), a point whose Cartesian coordinates are \((x,y)\) could also be identified by its polar coordinates, \((r,\theta)\), where \(r\) is the length of the point’s position vector and \(\theta\) the angle between the position vector and the \(x\)-axis (as given by the vector \((1,0)\)). In fact what we are doing here is putting a subset of points of \(\RR^2\), namely \(\RR^2\) minus the origin (since the polar coordinates of the origin are not well defined), into 1-1 correspondence with a subset of points, \((0,\infty)\times[0,2\pi)\), of another copy of \(\RR^2\). We have a pair of coordinate functions, \(r:\RR^2\mapto\RR\) and \(\theta:\RR^2\mapto\RR\), such that \(r(x,y)=r=\sqrt{x^2+y^2}\) and \(\theta(x,y)=\theta=\tan^{-1}(y/x)\), with the inverse tangent understood to be taken in the quadrant determined by the signs of \(x\) and \(y\). Note again the unfortunate notation here — \(r\) and \(\theta\) are being used to denote coordinate functions as well as the coordinates (real numbers) themselves.
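As an aside, the quadrant issue is exactly what the two-argument arctangent handles in practice; here is an illustrative Python sketch (numpy assumed) of the coordinate maps in both directions:

```python
# Illustrative sketch of the polar coordinate maps (assumes numpy).
import numpy as np

def to_polar(x, y):
    r = np.hypot(x, y)                       # r = sqrt(x^2 + y^2)
    theta = np.arctan2(y, x) % (2*np.pi)     # angle in [0, 2*pi), correct in every quadrant
    return r, theta

def to_cartesian(r, theta):
    return r*np.cos(theta), r*np.sin(theta)

print(to_polar(-1.0, 1.0))                   # (sqrt(2), 3*pi/4)
print(to_cartesian(*to_polar(-1.0, 1.0)))    # recovers (-1.0, 1.0) up to rounding
```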

Coordinates at a point give rise to basis vectors for the tangent space at that point. We’ll discuss this more rigorously later, but the basic idea is simple. Take polar coordinates as an example. If we invert the 1-1 coordinate maps, \(r=r(x,y)\) and \(\theta=\theta(x,y)\) to obtain functions \(x=x(r,\theta)\) and \(y=y(r,\theta)\) then we may consider the two coordinate curves through any point \(P(x,y)\) obtained by holding in turn \(r\) and \(\theta\) fixed whilst allowing the other to vary. The tangent vectors at \(P\) to these curves are then the basis vectors corresponding to the coordinates being varied. Let’s consider some particular examples, for which the construction is geometrically straightforward.

In the case of \(\RR^2\), consider polar coordinates at a point \(P(x,y)\).
Then we have \(x=r\cos\theta\) and \(y=r\sin\theta\). Corresponding to the \(r\) and \(\theta\) coordinates are basis vectors \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) at \(P\), pointing respectively in the directions obtained by increasing the \(r\)-coordinate holding the \(\theta\)-coordinate fixed and increasing the \(\theta\)-coordinate holding the \(r\)-coordinate fixed. We can use the scalar product to compute the relationship between the Cartesian and polar basis vectors according to,
\begin{equation}
\mathbf{e}_r=(\mathbf{e}_r\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_r\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=(\mathbf{e}_\theta\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_\theta\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
which, assuming \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) to be of unit length, result in the relations,
\begin{align}
\mathbf{e}_r&=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y\\
\mathbf{e}_\theta&=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{align}

In three dimensional space, the cylindrical coordinates of a point, \((\rho,\varphi,z)\), are related to its Cartesian coordinates by,
\begin{equation}
x=\rho\cos\varphi,\quad y=\rho\sin\varphi,\quad z=z,
\end{equation}

and it’s not difficult to check that the unit basis vectors defined at any point by the cylindrical coordinate system are related to the Cartesian basis vectors as,
\begin{align}
\mathbf{e}_\rho&=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y\\
\mathbf{e}_z&=\mathbf{e}_z.
\end{align}

The spherical polar coordinates of a point, \((r,\theta,\varphi)\), are related to its Cartesian coordinates by,
\begin{equation}
x=r\cos\varphi\sin\theta,\quad y=r\sin\varphi\sin\theta,\quad z=r\cos\theta.
\end{equation}
To relate the unit basis vectors of the spherical polar coordinate system to the Cartesian basis vectors it is easiest to first express them in terms of the cylindrical basis vectors as,
\begin{align*}
\mathbf{e}_r&=\sin\theta\mathbf{e}_\rho+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\mathbf{e}_\rho-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=\mathbf{e}_\varphi,
\end{align*}
so that,
\begin{align}
\mathbf{e}_r&=\sin\theta\cos\varphi\mathbf{e}_x+\sin\theta\sin\varphi\mathbf{e}_y+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\cos\varphi\mathbf{e}_x+\cos\theta\sin\varphi\mathbf{e}_y-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{align}
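We’ll derive coordinate basis vectors properly later, but as a preview, the unit vectors above are just the tangents to the coordinate curves, normalised by the usual scale factors \(1\), \(r\) and \(r\sin\theta\). A small sympy sketch (sympy assumed available) makes this concrete:

```python
# Sketch: unit spherical basis vectors as normalised tangents to coordinate curves (assumes sympy).
import sympy as sp

r, theta, phi = sp.symbols('r theta varphi', positive=True)
pos = sp.Matrix([r*sp.cos(phi)*sp.sin(theta),
                 r*sp.sin(phi)*sp.sin(theta),
                 r*sp.cos(theta)])

# Tangent vectors to the coordinate curves, each divided by its scale factor (1, r, r*sin(theta))
for q, scale in [(r, 1), (theta, r), (phi, r*sp.sin(theta))]:
    e = sp.simplify(pos.diff(q) / scale)
    print(q, list(e))
# e_r, e_theta and e_varphi come out as the unit vectors listed above.
```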

Time dilation and length contraction

In the previous post we learnt that if in one frame two clocks are synchronised a distance \(D\) apart, then in another frame in which these clocks are moving along the line joining them with speed \(v\), the clock in front lags the clock behind by a time of \(Dv/c^2\). Let’s now think more about the contrasting perspectives of Alice, riding a train, and Bob, track side, thinking in particular about their respective clock and length readings.

The following picture sums up Alice’s perspective:

Here and below, clocks in either Alice’s or Bob’s frame are denoted by the rounded rectangles with times displayed within. Lengths and times in Alice’s train frame will be denoted by primed symbols, those in Bob’s track frame, unprimed. Above we see that Alice records the two events to occur simultaneously at a time we’ve taken to be 0. We’re free to take the time on Bob’s clock at the rear of the carriage to read 0 at this time in Alice’s frame, but then, since from Alice’s perspective Bob’s clocks are approaching her with speed \(v\), we know that Bob’s clock at the front of the carriage, when Alice’s clock there is showing 0, must be already showing a later time which we denote \(T\). This is unprimed as it’s a time displayed by a clock in Bob’s frame. From our discussion of the relativity of simultaneity we know \(T\) must be given by
\begin{equation}
T=\frac{Dv}{c^2}\label{eq:tracktime}
\end{equation}
where \(D\) is the separation of those two clocks in Bob’s frame as measured in Bob’s frame. Alice measures the length of her carriage to be \(L’\). We call the length of an object measured in its rest frame its proper length so both \(L’\) and \(D\) are proper lengths, whereas \(D’\), the distance between Bob’s clocks as measured by Alice is not proper. Note that, of course, \(L’=D’\).

Now let’s consider Bob’s perspective. We now need two pictures corresponding to two different times in Bob’s frame.

From Bob’s perspective, at time 0 the rear of the carriage is located at his rear clock and the carriage clock there also shows zero. At the later time \(T\), the front of the carriage is located at his front clock and the carriage clock there shows 0. Bob sees Alice’s clocks travel with speed \(v\) towards him so we know that the front clock lags the clock to the rear by a time given by
\begin{equation*}
\frac{L’v}{c^2}
\end{equation*}
where \(L’\) is the distance between Alice’s clocks, the length of the carriage, as measured in Alice’s frame. Thus, in the first of the two Bob frame snapshots, the rear carriage clock shows 0 whilst the front carriage clock shows \(-T’\), and in the second snapshot, the rear carriage clock shows \(T’\) while the front carriage clock shows 0 where
\begin{equation}
T’=\frac{L’v}{c^2}.\label{eq:traintime}
\end{equation}

Now, we’re going to be interested in the ratio \(T’/T\), the fraction of track frame time recorded by train frame clocks – a ratio of a moving clock time to a stationary clock time. We see immediately from \eqref{eq:tracktime} and \eqref{eq:traintime} that this is the same as \(L’/D\). But recall that \(L’=D’\) so we have
\begin{equation*}
\frac{T’}{T}=\frac{D’}{D}.
\end{equation*}
\(D’/D\) is the ratio of a measurement of a length moving with speed \(v\), \(D’\), to a measurement of a length at rest, \(D\). This ratio must therefore also be equal to \(L/L’\), the ratio of the length of the carriage as viewed from Bob’s perspective to the (rest-frame) length of the carriage as measured by Alice. So in fact we have
\begin{equation}
\frac{T’}{T}=\frac{D’}{D}=\frac{L}{L’}.
\end{equation}
Now recall that \(D=\gamma^2L\), where \(\gamma=1/\sqrt{1-(v/c)^2}\), from which it follows that
\begin{equation}
{L’}^2=\gamma^2L^2
\end{equation}
or,
\begin{equation}
L=\frac{1}{\gamma}L’.
\end{equation}
This is length contraction! Recall that \(\gamma{>}1\) so that the length of the carriage as measured by Bob is smaller than the carriage’s proper length measured by Alice. It follows also that
\begin{equation}
T’=\frac{1}{\gamma}T.
\end{equation}
This is time dilation! Whilst stationary clocks record a time \(T\), clocks in motion record a shorter time \(T’\) — moving clocks run slow.
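To put some numbers to this (an illustrative sketch only, with values chosen purely for convenience): at \(v=0.6c\) we have \(\gamma=1.25\), so a 100 m carriage is measured by Bob to be 80 m long, and while Bob’s clocks tick off 1 s, Alice’s clocks record 0.8 s.

```python
# Minimal numeric illustration of length contraction and time dilation (illustrative values only).
import math

c = 299_792_458.0                  # speed of light in m/s
v = 0.6 * c                        # relative speed of the train (arbitrary illustrative value)
gamma = 1.0 / math.sqrt(1.0 - (v/c)**2)   # Lorentz factor; here gamma is about 1.25

L_prime = 100.0                    # proper length of the carriage measured by Alice, in metres
L = L_prime / gamma                # length contraction: Bob measures about 80 m

T = 1.0                            # an interval recorded by Bob's (stationary) clocks, in seconds
T_prime = T / gamma                # time dilation: Alice's moving clocks record about 0.8 s

print(gamma, L, T_prime)           # roughly 1.25 80.0 0.8, up to floating point
```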

The relativity of simultaneity

Following Mermin again, we’ll see how the invariance of the speed of light in all inertial frames leads directly to the relativity of simultaneity. Alice rides a train. In one of the carriages it is arranged to have two photons of light emitted from the center of the carriage, one traveling towards the front and the other towards the back. The events \(E_f\) and \(E_r\) are respectively the photon reaching the front and rear of the carriage. In Alice’s frame of reference these events occur simultaneously — we don’t even have to refer to clocks in Alice’s frame since we know that light travels at the same speed in all directions.

Now consider the situation from the perspective of a track-side observer, Bob. From his perspective Alice’s train is traveling with a velocity \(v\). Three events take place: first, the photons are emitted from the center 1 of the carriage; then, since they still travel at light speed \(c\) in all directions, he ‘sees’, that is, the clocks in his latticework record, the event \(E_r\) occurring before the event \(E_f\). Schematically we have:
This is of course as we’d expect: as the left traveling photon heads to the back of the carriage, the back of the carriage is traveling towards it with speed \(v\), while as the right traveling photon heads towards the front of the carriage, the front is traveling away from it with speed \(v\). Let’s say that in Bob’s frame the length of the train carriage is \(L\). If \(T_r\) is the elapsed time in Bob’s frame between the photons being emitted and the left traveling photon reaching the back of the carriage then we have
\begin{equation}
cT_r=\frac{1}{2}L-vT_r
\label{eq:reartime}
\end{equation}
and after a time \(T_f\) the right traveling photon has covered a distance \(cT_f\) given by
\begin{equation}
cT_f=\frac{1}{2}L+vT_f.
\label{eq:fronttime}
\end{equation}
These individual times are not what we’re interested in though. We’re interested in the time difference, let’s call it \(\Delta T\), between the events \(E_r\) and \(E_f\) as observed by Bob, for which we obtain,
\begin{equation*}
c\Delta T=v(T_r+T_f).
\end{equation*}
But the total distance traveled by the photons is \(D=cT_r+cT_f\), the spatial separation Bob observes between the two events. So finally we obtain
\begin{equation*}
\Delta T=\frac{Dv}{c^2}.
\end{equation*}

Two events, \(E_r\) and \(E_f\), which are simultaneous in Alice’s inertial frame of reference, are not simultaneous in Bob’s frame, moving with velocity \(v\) in the direction pointing from \(E_f\) to \(E_r\) relative to Alice’s. In Bob’s frame, the event \(E_r\) occurs a time \(Dv/c^2\) before the event \(E_f\), where \(D\) is the spatial separation of the events as seen by Bob.

Alice’s clocks, that is those synchronised in her frame, will record the events \(E_r\) and \(E_f\) occurring at the same time. Moreover they will also record the fact that Bob’s clocks show the event \(E_f\) occurring a time \(Dv/c^2\) after \(E_r\). Alice’s explanation for this fact will be that Bob’s clocks aren’t properly synchronised. Bob on the other hand says the events aren’t simultaneous and says that Alice’s clocks cannot, therefore, be properly synchronised.

The rule about simultaneous events in one frame not being simultaneous in another can be stated in terms of clocks thus:

If in one frame two clocks are synchronised a distance \(D\) apart, then in another frame, in which these clocks are moving along the line joining them with speed \(v\), the clock in front lags the clock behind by a time of \(Dv/c^2\).

It will be useful in the next post, where we consider some consequences of the relativity of simultaneity, to have a relation between the two lengths \(L\) and \(D\) in Bob’s frame. From \eqref{eq:reartime},
\begin{equation*}
T_r=\frac{L}{2}\frac{1}{c+v},
\end{equation*}
and from \eqref{eq:fronttime},
\begin{equation*}
T_f=\frac{L}{2}\frac{1}{c-v},
\end{equation*}
so that
\begin{align*}
\Delta T&=\frac{L}{2}\frac{2v}{c^2-v^2}\\
&=\frac{L\gamma^2v}{c^2}
\end{align*}
where we have introduced the Lorentz factor, \(\gamma\), defined as
\begin{equation*}
\gamma=\frac{1}{\sqrt{1-(v/c)^2}}.
\end{equation*}
Notice that for \(v{<}c\) we have \(\gamma{>}1\). Combining this last expression with our earlier result \(\Delta T=Dv/c^2\), we conclude that \(L\) and \(D\) are related according to
\begin{equation}
D=\gamma^2L.
\end{equation}
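The little calculation above is easily reproduced symbolically; the following sketch (sympy assumed) solves the two light-travel-time equations and confirms both \(\Delta T=L\gamma^2v/c^2\) and \(D=\gamma^2L\):

```python
# Symbolic re-derivation of Delta T and D = gamma^2 L from the light-travel-time equations (assumes sympy).
import sympy as sp

c, v, L, Tr, Tf = sp.symbols('c v L T_r T_f', positive=True)

Tr_sol = sp.solve(sp.Eq(c*Tr, L/2 - v*Tr), Tr)[0]   # photon meeting the rear of the carriage
Tf_sol = sp.solve(sp.Eq(c*Tf, L/2 + v*Tf), Tf)[0]   # photon meeting the front of the carriage

gamma = 1/sp.sqrt(1 - (v/c)**2)

dT = sp.simplify(Tf_sol - Tr_sol)
print(sp.simplify(dT - L*gamma**2*v/c**2))           # 0, i.e. Delta T = L*gamma^2*v/c^2

D = sp.simplify(c*(Tr_sol + Tf_sol))                 # spatial separation of the two events
print(sp.simplify(D - gamma**2*L))                   # 0, i.e. D = gamma^2*L
```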

Notes:

  1. We are assuming here, of course, that the center remains the center. Even though, as we’ll soon see, the length of something does change depending on the frame of reference, both the front half and the back half change by the same amount, so the midpoint remains the midpoint!

Velocity addition in special relativity — sometimes \(1+1\neq2\)

There’s a great little book on special relativity by the physicist N. David Mermin in which he gets to the heart of the astonishing consequences of Einstein’s special relativity in a particularly elegant fashion and with only very basic mathematics. In this and the following note we’ll closely follow Mermin’s treatment. The crucial fact of life which we have to come to terms with is that whether or not two events which are spatially separated happen at the same time is a matter of perspective. This flies in the face of our intuition 1. We’re wired to think of time as a kind of universal clock and that we and the rest of the universe march forward with its tick-tock relentlessly and in unison.

Let us begin by reconsidering the relativity of velocities. Our intuition, and Galilean relativity, tells us that if you are riding a train and throw a ball in the direction of travel then to someone stationary with respect to the tracks the speed of the ball is simply the sum of the train’s speed and the speed with which the ball  leaves your hand. But thanks to special relativity we know that, at least for light, this isn’t the case. A photon (particle of light) emitted from a moving train moves at light speed \(c\) with respect to the train and with respect to the tracks. This surely has consequences for the relativity of motion in general.

Following Mermin we employ the neat device of measuring the velocity of an object by racing it against a photon. With a corrected velocity addition rule as our goal, we conduct this race on a train carriage.


The particle (the black dot), whose velocity \(v\) we seek, sets off from the back of the carriage towards the front in a race with a photon which, as we know, travels at speed \(c\). We arrange that the front of the carriage is mirrored so that once the photon reaches the front it’s reflected back. The point at which the particle and photon meet is recorded (perhaps a mark is made on the floor of the carriage – this is a gedanken experiment!). Writing \(f\) for the fraction of the carriage length between the meeting point and the front, the particle has travelled a fraction \(1-f\) of the length of the carriage whilst the photon has travelled \(1+f\) times the length of the carriage. The ratio of those distances must equal the ratio of the velocities, that is,
\begin{equation}
\frac{1-f}{1+f}=\frac{v}{c},
\end{equation}
which we can rewrite as an equation for \(f\),
\begin{equation}
f=\frac{c-v}{c+v}\label{eq:f1}.
\end{equation}
The velocity is thus established in an entirely unambiguous manner. This may strike you as a somewhat indirect approach to measuring speed but notice that we’ve avoided measuring either time or distance. As we’ll soon see, in special relativity such measurements are rather more subtle than we might imagine.

Now let’s consider the same race but from the perspective of the track frame relative to which the train carriage is travelling (left to right) with velocity \(u\).


We’re after the correct rule for adding the velocity \(v\), of the particle relative to the train, to the velocity \(u\), of the train relative to the track, to give the velocity \(w\), of the particle relative to the track. To facilitate the calculations we’ll allow ourselves to use some lengths and times. However their values aren’t important — as we’ll see they fall out of the final equation. We’re really just using their ‘existence’. As indicated in the diagram, after time \(T_0\) the photon is a distance \(D\) in front of the particle, that is,
\begin{equation}
D=cT_0-wT_0,
\end{equation}
but this distance is then also the sum of the distances covered respectively by the photon and particle in time \(T_1\),
\begin{equation}
D=cT_1+wT_1.
\end{equation}
So we can write the ratio of the times as
\begin{equation}
\frac{T_1}{T_0}=\frac{c-w}{c+w}\label{eq:time-ratio1}.
\end{equation}
If the length of the carriage in the track frame is \(L\) then we also have that the distance covered by the photon in time \(T_0\) is
\begin{equation}
cT_0=L+uT_0
\end{equation}
and in time \(T_1\) is
\begin{equation}
cT_1=fL-uT_1.
\end{equation}
Combining these we eliminate \(L\) to obtain another expression for the ratio of times,
\begin{equation}
\frac{T_1}{T_0}=f\frac{(c-u)}{(c+u)}\label{eq:time-ratio2}.
\end{equation}
The two equations, \eqref{eq:time-ratio1} and \eqref{eq:time-ratio2} provide us with a second equation for \(f\),
\begin{equation}
f=\frac{(c+u)}{(c-u)}\frac{c-w}{c+w},
\end{equation}
which in combination with the first, \eqref{eq:f1}, leads to
\begin{equation}
\frac{c-w}{c+w}=\frac{c-u}{c+u}\frac{c-v}{c+v}\label{eq:velocity-addition1},
\end{equation}
which expresses the velocity \(w\) of the particle in the track frame in terms of the velocity \(u\) of the train in the track frame and the velocity \(v\) of the particle in the train frame. With a bit more work this can be rewritten as
\begin{equation}
w=\frac{u+v}{1+uv/c^2}\label{eq:velocity-addition2},
\end{equation}
which should be compared to the Galilean addition rule, \(w=u+v\).
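In units where \(c=1\) the rule is a one-liner; here is a minimal sketch (plain Python) together with the checks that make the point:

```python
# Relativistic velocity addition in units where c = 1 (all speeds are fractions of light speed).
def add_velocities(u, v, c=1.0):
    """Speed of the particle in the track frame, given train speed u and particle speed v."""
    return (u + v) / (1.0 + u*v/c**2)

print(add_velocities(0.5, 0.5))   # 0.8, rather than the Galilean 1.0
print(add_velocities(0.5, 1.0))   # 1.0: light still travels at c in the track frame
print(add_velocities(1.0, 1.0))   # 1.0: the extreme "1 + 1 = 1" case mentioned below
```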

Here’s a plot, with velocities in units of \(c\), comparing the Galilean with the special relativity velocity addition for an object fired at a speed \(v\) from a train carriage moving at half the speed of light:
Equation \eqref{eq:velocity-addition2} ensures that no matter how fast the particle travels with respect to the train (assuming it’s less than light speed), its velocity with respect to the track is always less than light speed. In the extreme case of a particle traveling at light speed with respect to a train which is also travelling at light speed, \(1+1=1\)!

Events, observers and measurements

In special relativity we often read that such and such an inertial observer measures the time between two events or such and such an inertial observer measures the distance between two events. On the face of it such assertions seem reasonably clear and straightforward and indeed very often their perspicuity is simply taken for granted. But as we’ll see their meanings in relativity are not what we’d expect and therefore it’s important to establish early on exactly what is meant by an ‘observer’, an ‘event’, and what constitutes a measurement.

The adjective ‘inertial’ in ‘inertial observer’ has been dealt with already — whatever or whoever constitutes an observer should be in free-fall. Let’s also be clear that by an ‘event’ we mean a happening, somewhere, sometime, corresponding to a point in spacetime — perhaps a photon of light leaving an emitter or being absorbed by a detector, perhaps a particle passing through a particular point in space, perhaps a time being recorded by a clock at a particular point in space. Events, points in spacetime, are real; they care nothing for coordinate systems, frames of reference, etc.

When we introduced the idea of a frame of reference we vaguely mentioned a laboratory in which lengths and times could be measured. Let’s be more concrete now and imagine an inertial frame of reference as a freely floating 3-dimensional latticework of rods and clocks with one node designated as the origin.

All the rods have the same length but the clocks at each node are rather special. Like all good clocks they can of course keep time. In addition though they are programmed with their respective locations with respect to the origin, so in particular they ‘know’ their distance from the origin. Furthermore they are sophisticated recording devices ready to detect any event and record its location and time for future inspection. In particular, this allows them all to be synchronized with the clock at the origin in the following way. A flash of light is sent out from the origin just as the clock there is set to 0. The spherical light front spreads out at the same speed \(c\) in all directions. As each clock in the lattice detects this light it sets its time equal to its distance from the origin divided by \(c\) and is then ‘in sync’ with the clock at the origin. We should imagine this latticework to be ‘fine-grained’ enough to ensure that to any required accuracy a clock is located ‘at’ the spatial location of any event. This is a crucial point. The time assigned to an event, with respect to an inertial frame of reference, is always that of one of the inertial frame’s clocks at the event. The spacetime location of the event is then given by the spatial coordinates of the clock there together with the clock’s time at the moment the event happens and is recorded along with a description of what took place. This would then constitute a ‘measurement’ and the inertial ‘observer’ carrying out the measurement should be thought of as the whole latticework. An observer is better thought of as the all-seeing eye of the entire inertial frame than as somebody located at some specific point in space with a pair of binoculars and a notepad! If we do speak of an observer as a person, and it is convenient and usual to do so, then we really mean such an intelligent latticework of rods and detecting clocks with respect to which that person is at rest.

Shortly we’ll see that when two or more events at different points in space occur simultaneously with respect to one inertial observer, with respect to another they generally occur at different times. Let’s be clear though that if two or more things happen at the same place at the same time then that’s an event and as such its reality is independent of any frame of reference. All observers must agree that it took place even if they assign to it different spacetime coordinates. Sometimes this is obvious. Consider two particles colliding somewhere. Then obviously the collision either took place or it didn’t and the question is merely what spacetime coordinates should be assigned to the location in spacetime of the collision. But other times it might seem a little more confusing. We might say that an observer, let’s call ‘her’ Alice, records that two spatially separated events, for example photons arriving at two different places, occur at the same time. Recall that this really means that at each location a clock records a time corresponding to the event there and these times turn out to be the same, let’s say 2pm. Now the clock striking 2 at a location just as the event takes place there is itself an event and so will be confirmed by any other inertial observer. Let’s call Bob our other observer. He will assign his own times to the two events, and, as we’ll see, he’ll find that his clocks record different times. However, recall that clocks don’t just tell time — they also record the event — so Bob will certainly confirm that Alice’s clocks both struck 2pm as the photons arrived at those points in spacetime but Bob will conclude that Alice’s clocks aren’t synchronised since from his perspective these two events did NOT occur simultaneously!

Notes:

  1. It’s worth remarking that if we reverse the roles of space and time the corresponding conclusion is not at all surprising. We are entirely comfortable with the fact that whether or not two events which take place at different times occur at the same place is a matter of perspective.