Monthly Archives: September 2017

Tangent vectors

As we’ve already indicated, we choose to view \(\RR^n\) as a space of points, \(x=(x^1,\dots,x^n)\), with “arrows” emerging from a point \(x\) in space living in the tangent space \(T_x(\RR^n)\) at \(x\). This is just another copy of \(\RR^n\), now viewed as a vector space. For example, at some point \(a\in\RR^n\) we might have vectors \(\mathbf{v}_a,\mathbf{w}_a\in T_a(\RR^n)\) such that \(\mathbf{v}_a+\mathbf{w}_a=(\mathbf{v}+\mathbf{w})_a\). Given two distinct points \(a,b\in\RR^n\), \(T_a(\RR^n)\) and \(T_b(\RR^n)\) are distinct copies of \(\RR^n\). Without some further mechanism by which we could transport a vector from \(T_a(\RR^n)\) to a vector in \(T_b(\RR^n)\) there can be no meaning attached to the sum of a vector in \(T_a(\RR^n)\) and a vector in \(T_b(\RR^n)\).

Working with the space \(\RR^n\) we can safely think of the tangent space at each point as the collection of all arrows at the point. But suppose our space was the surface of a sphere. In this case we have the idea of tangent vectors at a point living in a tangent plane within an ambient space. But what if there was no ambient space? We’re anticipating here the generalisation of the tools of calculus to spaces far more general than \(\RR^n\). With this in mind we’ll consider here more sophisticated characterisations of the notion of tangent vectors. Specifically, we’ll avoid, as far as possible, explicitly exploiting the fact that our underlying space of points, \(\RR^n\), is itself a vector space. Instead we’ll rely on the fact that at any point we have a valid coordinate system through which we can access a vector space structure.

A tangent vector as an equivalence class of curves

A smooth curve in \(\RR^n\) is a smooth map \(\gamma:(\alpha,\beta)\mapto\RR^n\) which we denote simply by \(\gamma(t)\). With respect to some coordinate system \(x^i\), two curves, \(\gamma(t)\) and \(\tilde{\gamma}(t)\), are said to be tangent at a point \(\gamma(t_0)=a=\tilde{\gamma}(t_0)\) if
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{equation}
for \(i=1,\dots,n\). Tangency is independent of the coordinate system used. Indeed, suppose instead of \(x^i\) we used some other coordinate system \(y^i\) such that \(y^i=y^i(x^1,\dots,x^n)\); then
\begin{align*}
\left.\frac{dy^i(\gamma(t))}{dt}\right|_{t_0}&=\sum_{j=1}^{n}\left.\frac{\partial y^i}{\partial x^j}\frac{dx^j(\gamma(t))}{dt}\right|_{t_0}\\
&=\sum_{j=1}^{n}\left.\frac{\partial y^i}{\partial x^j}\frac{dx^j(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{dy^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}.
\end{align*}
One definition of a tangent vector, \(\mathbf{v}\), at a point \(a\) is then as the equivalence class \([\gamma]\) of curves tangent to one another at the point \(a\). We can define addition and scalar multiplication by \(\mathbf{v}_1+\mathbf{v}_2=[\gamma_1+\gamma_2]\) and \(c\mathbf{v}=[c\gamma]\). These definitions are clearly exploiting the vector space structure of our space \(\RR^n\) but can easily be tweaked not to do so. The tangent vectors so defined form a real vector space, the tangent space \(T_a(\RR^n)\). This is clearly equivalent to our intuitive notion of vectors as arrows at a point but is applicable even when our space of points is more general than \(\RR^n\).
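As a quick numerical sanity check (a sketch, not part of the original argument; the curves, coordinate change and tolerances are arbitrary choices), two curves tangent at a point in Cartesian coordinates remain tangent when their components are expressed in polar coordinates:

```python
import math

# Two curves in R^2, tangent at t0 = 0: both pass through (1, 0) with velocity (1, 1).
gamma  = lambda t: (1 + t,        t)
gammat = lambda t: (1 + t + t**2, t + t**3)

def polar(p):
    # Polar coordinates (r, theta) of a point p = (x, y).
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def ddt(f, t0=0.0, h=1e-6):
    # Component-wise central finite difference of a curve at t0.
    a, b = f(t0 + h), f(t0 - h)
    return tuple((u - v) / (2 * h) for u, v in zip(a, b))

# Velocities of the polar-coordinate representations of both curves at t0 = 0.
v1 = ddt(lambda t: polar(gamma(t)))
v2 = ddt(lambda t: polar(gammat(t)))
print(v1, v2)  # the two velocities agree to finite-difference accuracy
```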

We can now introduce the directional derivative of a (smooth) function \(f:\RR^n\mapto\RR\), at a point \(a=\gamma(t_0)\), in the direction of a tangent vector \(\mathbf{v}\) according to the definition,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0},
\end{equation}
where \(\mathbf{v}\) is the tangent vector corresponding to the equivalence class of curves \([\gamma]\). Note that this does not depend on the representative of the equivalence class chosen since with respect to any coordinate system \(x^i\),
\begin{align*}
\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}\\
&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{df(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{align*}
Note also that this corresponds to the usual definition of a directional derivative in \(\RR^n\) as,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{d}{dt}f(a+t\mathbf{v})\right|_{t=0},
\end{equation}
by considering the curve in \(\RR^n\) through the point \(a\) defined according to \(t\mapto a+t\mathbf{v}\).
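A minimal numerical sketch of this definition (the function \(f\), point \(a\) and vector \(\mathbf{v}\) below are arbitrary choices): differentiating \(f\) along the straight-line curve \(t\mapsto a+t\mathbf{v}\) reproduces the familiar sum \(\sum_i v^i\,\partial f/\partial x^i(a)\).

```python
import math

# f(x, y) = x^2 y + sin(y); tangent vector v = (2, -1) at a = (1, 3),
# represented by the curve t |-> a + t v.
f = lambda x, y: x**2 * y + math.sin(y)
a, v = (1.0, 3.0), (2.0, -1.0)

def directional(f, a, v, h=1e-6):
    # D_v f(a) = d/dt f(a + t v) at t = 0, by central difference.
    fp = f(a[0] + h * v[0], a[1] + h * v[1])
    fm = f(a[0] - h * v[0], a[1] - h * v[1])
    return (fp - fm) / (2 * h)

# Compare with sum_i v^i (partial f / partial x^i)(a), partials computed by hand:
# f_x = 2xy, f_y = x^2 + cos(y).
exact = v[0] * (2 * a[0] * a[1]) + v[1] * (a[0]**2 + math.cos(a[1]))
print(directional(f, a, v), exact)
```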

Directional derivatives and derivations

Let us regard the directional derivative, \(D_\mathbf{v}\), as a map, \(D_\mathbf{v}:C^\infty(\RR^n)\mapto\RR\), at any point in \(\RR^n\). Then directional derivatives are examples of derivations according to the following definition.

Definition A map \(X:C^\infty(\RR^n)\mapto\RR\) is called a derivation at a point \(a\in\RR^n\) if it is linear over \(\RR\) and satisfies the Leibniz rule,
\begin{equation}
X(fg)=X(f)g(a)+f(a)X(g).
\end{equation}

To see that \(D_\mathbf{v}\) is a derivation at some point \(a=\gamma(t_0)\in\RR^n\) with \(\mathbf{v}=[\gamma]\),
\begin{align*}
D_\mathbf{v}(f+g)(a)&=\left.\frac{d}{dt}((f+g)(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}+\left.\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_{\mathbf{v}}f(a)+D_{\mathbf{v}}g(a),
\end{align*}
and for \(c\in\RR\),
\begin{align*}
D_\mathbf{v}(cf)(a)&=\left.\frac{d}{dt}((cf)(\gamma(t)))\right|_{t_0}\\
&=c\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}\\
&=cD_\mathbf{v}(f)(a),
\end{align*}
and
\begin{align*}
D_\mathbf{v}(fg)(a)&=\left.\frac{d}{dt}(f(\gamma(t))g(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}g(\gamma(t))\right|_{t_0}+\left.f(\gamma(t))\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_\mathbf{v}f(a)g(a)+f(a)D_\mathbf{v}g(a)
\end{align*}

The Leibniz rule is what really captures the essence of differentiation. Let’s consider some of its consequences. Suppose \(f\) is the constant function, \(f(x)=1\). Then for any derivation \(X\), \(X(f)=0\). This follows since \(f=ff\) and by the Leibniz rule, \(X(f)=X(ff)=X(f)f(a)+f(a)X(f)=2X(f)\), so \(Xf=0\). It follows immediately, by linearity of derivations, that \(Xf=0\) for any constant function \(f(x)=c\). Another consequence is that if \(f(a)=g(a)=0\) then \(X(fg)=0\) since \(X(fg)=X(f)g(a)+f(a)X(g)=0\).

It’s straightforward to verify that derivations at a point \(a\in\RR^n\) form a real vector space which we denote by \(\mathcal{D}_a(\RR^n)\). For any coordinate system \(x^i\) the partial derivatives \(\partial/\partial x^i\) are easily seen to be derivations, and we’ll now demonstrate that, at \(a\), they provide a basis for \(\mathcal{D}_a(\RR^n)\). Indeed, from Taylor’s theorem, we know that for any smooth function \(f\), in the neighbourhood of a point \(a\in\RR^n\) and in terms of coordinates \(x^i\),
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\left.(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^n(x^i-a^i)(x^j-a^j)\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt.
\end{equation}
Now apply the derivation \(X\) to \(f\), using linearity and the Leibniz rule term by term:
\begin{equation}
Xf=X(f(a))+\sum_{i=1}^n\left.X(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^nX\left((x^i-a^i)(x^j-a^j)\right)\left.\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt\right|_{x=a},
\end{equation}
where the Leibniz-rule term in which \(X\) acts on the integral has been dropped since its cofactor \((x^i-a^i)(x^j-a^j)\) vanishes at \(a\). Now \(X(f(a))=0\) since \(f(a)\) is a constant, \(X(x^i-a^i)=X(x^i)\) by linearity, and \(X((x^i-a^i)(x^j-a^j))=0\) since both \((x^i-a^i)\) and \((x^j-a^j)\) are 0 at \(a\). Thus we have that,
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
In other words \(X=\sum_{i=1}^nX(x^i)\partial/\partial x^i\). Notice that if \(y^i\) is any other coordinate system valid in the neighbourhood of \(a\) then
\begin{equation}
Xy^j=\left.\sum_{i=1}^nX(x^i)\frac{\partial y^j}{\partial x^i}\right|_a
\end{equation}
so that
\begin{align*}
Xf&=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a\\
&=\sum_{i,j=1}^n\left.X(x^i)\frac{\partial y^j}{\partial x^i}\frac{\partial f}{\partial y^j}\right|_a\\
&=\sum_{j=1}^nX(y^j)\left.\frac{\partial f}{\partial y^j}\right|_a,
\end{align*}
and the result does not depend on the chosen coordinate system. That the coordinate partials are linearly independent follows since applying \(\sum_ic^i\partial/\partial x^i=0\) to the coordinate functions \(x^j\) in turn yields \(c^j=0\) for all \(j\).

So, to any tangent vector \(\mathbf{v}\) is associated the directional derivative \(D_\mathbf{v}\) which is a derivation. Are all derivations directional derivatives? The answer is yes. If we have a derivation \(X\) at a point \(a\) then we know that for any smooth function in a neighbourhood of \(a\), in terms of coordinates \(x^i\),
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
We also know that the directional derivative of \(f\) in the direction of a tangent vector \(\mathbf{v}=[\gamma]\) at \(a=\gamma(t_0)\) is, again in terms of local coordinates \(x^i\),
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}=\sum_{i=1}^n\left.\frac{dx^i(\gamma(t))}{dt}\frac{\partial f}{\partial x^i}\right|_{t_0}.
\end{equation}
So, if we choose a curve \(\gamma\) such that \(\gamma(t_0)=a\) and
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=X(x^i),
\end{equation}
then \(Xf=D_{\mathbf{v}}f(a)\). Thus, we can just take \(\gamma(t)=(a^1+X(x^1)(t-t_0),\dots,a^n+X(x^n)(t-t_0))\) (even though we’re explicitly relying on the vector space structure of \(\RR^n\) there is nothing essentially different required in the more general setting). Finally, we can ask whether each tangent vector corresponds to a unique derivation. This follows since if \(D_\mathbf{v}=D_\mathbf{w}\), where \(\mathbf{v}=[\gamma_1]\) and \(\mathbf{w}=[\gamma_2]\), then applying this in turn to each coordinate function, \(x^i\), we obtain,
\begin{equation*}
\left.\frac{dx^i(\gamma_1(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\gamma_2(t))}{dt}\right|_{t_0}
\end{equation*}
so \(\gamma_1\) and \(\gamma_2\) are in the same equivalence class, or, \(\mathbf{v}=\mathbf{w}\). We have therefore proved the following important theorem.

Theorem The vector spaces \(T_a(\RR^n)\) and \(\mathcal{D}_a(\RR^n)\) are isomorphic and under this isomorphism tangent vectors \(\mathbf{v}\) map to derivations \(D_{\mathbf{v}}\).

Under this isomorphism the standard basis vectors \(\mathbf{e}_i\) at \(a\) map to the partial derivatives \(\partial/\partial x^i\) at \(a\), and indeed in the more general setting of differential geometry, which the treatment here anticipates, it is usual to treat those partials as the basis “vectors” of the tangent space \(T_a(\RR^n)\). The basis \(\partial/\partial x^i\) is called the coordinate basis. If our space is the plane \(\RR^2\), the Cartesian coordinate basis is \((\partial/\partial x,\partial/\partial y)\), corresponding respectively to the standard basis vectors \((\mathbf{e}_x,\mathbf{e}_y)\). If we choose to work with polar coordinates, the coordinate basis is \((\partial/\partial r,\partial/\partial\theta)\) with
\begin{align*}
\frac{\partial}{\partial r}&=\frac{\partial x}{\partial r}\frac{\partial}{\partial x}+\frac{\partial y}{\partial r}\frac{\partial}{\partial y}\\
&=\cos\theta\frac{\partial}{\partial x}+\sin\theta\frac{\partial}{\partial y}
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial\theta}&=\frac{\partial x}{\partial\theta}\frac{\partial}{\partial x}+\frac{\partial y}{\partial\theta}\frac{\partial}{\partial y}\\
&=-r\sin\theta\frac{\partial}{\partial x}+r\cos\theta\frac{\partial}{\partial y}
\end{align*}
Note that if in \(T_a(\RR^n)\) we adopt the usual Euclidean metric (a non-degenerate, symmetric, positive definite inner product, \((\,,\,)\)), such that the standard basis is orthonormal, \((\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\), then the polar basis vectors, \(\partial/\partial r\) and \(\partial/\partial\theta\) of \(T_a(\RR^2)\) are not orthonormal. The corresponding normalised basis vectors would be \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{equation}
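We can check numerically that the polar coordinate basis behaves as claimed: \(\partial/\partial r\) has unit length, \(\partial/\partial\theta\) has length \(r\), and the two are orthogonal (a small sketch; the sample point is an arbitrary choice):

```python
import math

def polar_basis(r, theta):
    # Coordinate basis vectors of T_a(R^2) in polar coordinates, expressed
    # in the standard (Cartesian) basis at the point with coordinates (r, theta).
    d_r     = (math.cos(theta),      math.sin(theta))       # partial/partial r
    d_theta = (-r * math.sin(theta), r * math.cos(theta))   # partial/partial theta
    return d_r, d_theta

dot  = lambda u, v: u[0]*v[0] + u[1]*v[1]
norm = lambda u: math.sqrt(dot(u, u))

r, theta = 2.5, 0.7
d_r, d_theta = polar_basis(r, theta)
# |d/dr| = 1 and |d/dtheta| = r, with the two orthogonal, so dividing
# d/dtheta by r recovers the unit vector e_theta.
print(norm(d_r), norm(d_theta), dot(d_r, d_theta))
```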

In the case of cylindrical coordinates we find
\begin{align}
\frac{\partial}{\partial\rho}&=\cos\varphi\frac{\partial}{\partial x}+\sin\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial\varphi}&=-\rho\sin\varphi\frac{\partial}{\partial x}+\rho\cos\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial z}&=\frac{\partial}{\partial z}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\) defined by
\begin{equation}
\mathbf{e}_\rho=\frac{\partial}{\partial\rho}=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y,
\end{equation}
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{\rho}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_z=\frac{\partial}{\partial z}=\mathbf{e}_z.
\end{equation}

In the case of spherical coordinates we find
\begin{align}
\frac{\partial}{\partial r}&=\cos\varphi\sin\theta\frac{\partial}{\partial x}+\sin\varphi\sin\theta\frac{\partial}{\partial y}+\cos\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\theta}&=r\cos\varphi\cos\theta\frac{\partial}{\partial x}+r\sin\varphi\cos\theta\frac{\partial}{\partial y}-r\sin\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\varphi}&=-r\sin\varphi\sin\theta\frac{\partial}{\partial x}+r\cos\varphi\sin\theta\frac{\partial}{\partial y}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\varphi\sin\theta\mathbf{e}_x+\sin\varphi\sin\theta\mathbf{e}_y+\cos\theta\mathbf{e}_z,
\end{equation}
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=\cos\varphi\cos\theta\mathbf{e}_x+\sin\varphi\cos\theta\mathbf{e}_y-\sin\theta\mathbf{e}_z,
\end{equation}
and
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{equation}
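A quick numerical check (the sample angles below are arbitrary) that the normalised spherical basis vectors above form an orthonormal frame:

```python
import math

def spherical_frame(theta, phi):
    # Normalised spherical basis vectors e_r, e_theta, e_phi in the standard basis.
    e_r  = (math.cos(phi)*math.sin(theta), math.sin(phi)*math.sin(theta), math.cos(theta))
    e_th = (math.cos(phi)*math.cos(theta), math.sin(phi)*math.cos(theta), -math.sin(theta))
    e_ph = (-math.sin(phi), math.cos(phi), 0.0)
    return e_r, e_th, e_ph

dot = lambda u, v: sum(a*b for a, b in zip(u, v))

frame = spherical_frame(1.1, 2.3)
# Gram matrix of the frame: all pairwise inner products.
gram = [[dot(u, v) for v in frame] for u in frame]
print(gram)  # close to the identity matrix: the frame is orthonormal
```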

Vector differentiation

Recall that the derivative, \(f'(t)\), of a scalar function of one real variable, \(f(t)\), is defined to be
\begin{equation}
f'(t)=\lim_{h\mapto 0}\frac{f(t+h)-f(t)}{h}.\label{def:one-dim deriv}
\end{equation}
We can also consider functions taking values in \(\RR^n\), \(f:\RR\mapto\RR^n\). The definition of the derivative then makes explicit use of the vector space structure of \(\RR^n\), and, though we won’t do so in general, it can be useful in this context to denote the image under \(f\) of some \(x\in\RR\) using bold face, \(\mathbf{f}(x)\):
\begin{equation}
\mathbf{f}'(x)=\frac{d\mathbf{f}}{dx}=\lim_{h\mapto 0}\frac{\mathbf{f}(x+h)-\mathbf{f}(x)}{h}.
\end{equation}
The vector \(\mathbf{f}(x)\) is nothing but the vector corresponding to the element \(f(x)\in\RR^n\) with respect to the standard basis in \(\RR^n\). The following product rules follow from this definition in the same way as the scalar function product rule,
\begin{align}
\frac{d}{dx}\left(c(x)\mathbf{f}(x)\right)&=c\frac{d\mathbf{f}}{dx}+\frac{dc}{dx}\mathbf{f},\\
\frac{d}{dx}\left(\mathbf{f}(x)\cdot\mathbf{g}(x)\right)&=\mathbf{f}\cdot\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\cdot\mathbf{g},\\
\frac{d}{dx}\left(\mathbf{f}(x)\times\mathbf{g}(x)\right)&=\mathbf{f}\times\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\times\mathbf{g},
\end{align}
where \(c:\RR\mapto\RR\) and \(g:\RR\mapto\RR^n\) with \(\mathbf{g}(x)\) the vector representation of \(g(x)\) with respect to the standard basis.
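These product rules are easy to test numerically; the following sketch checks the cross-product rule for arbitrarily chosen \(\mathbf{f}\) and \(\mathbf{g}\), comparing a central finite difference of \(\mathbf{f}\times\mathbf{g}\) with the right-hand side of the rule:

```python
import math

# f(x) = (x, x^2, sin x) and g(x) = (cos x, x, 1), with derivatives by hand.
f  = lambda x: (x, x*x, math.sin(x))
g  = lambda x: (math.cos(x), x, 1.0)
fp = lambda x: (1.0, 2*x, math.cos(x))      # f'
gp = lambda x: (-math.sin(x), 1.0, 0.0)     # g'

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

x0, h = 0.8, 1e-6
# Left side: d/dx (f x g) at x0 by central difference.
lhs = tuple((a - b) / (2*h) for a, b in zip(cross(f(x0 + h), g(x0 + h)),
                                            cross(f(x0 - h), g(x0 - h))))
# Right side: f x g' + f' x g at x0.
rhs = tuple(a + b for a, b in zip(cross(f(x0), gp(x0)), cross(fp(x0), g(x0))))
print(lhs, rhs)
```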

More generally, we can consider vector valued functions \(f:\RR^n\mapto\RR^m\) such that points \(x\in\RR^n\) are mapped to points \(f(x)=(f^1(x),\dots,f^m(x))\) of \(\RR^m\) where we have here introduced the component functions, \(f^i:\RR^n\mapto\RR\), of \(f\). Such a function \(f\) is said to be differentiable at \(a\) if there exists a linear map \(J_f(a):\RR^n\mapto\RR^m\) such that
\begin{equation}
\lim_{h\mapto0}\frac{|f(a+h)-f(a)-J_f(a)h|}{|h|}=0\label{eq:genderiv}
\end{equation}
where \(|\cdot|\) denotes the appropriate norm (on \(\RR^m\) in the numerator and on \(\RR^n\) in the denominator). In this case, \(J_f(a)\) is called the derivative (sometimes total derivative) of \(f\) at \(a\). Introducing \(R(h)\in\RR^m\) as \(R(h)=f(a+h)-f(a)-J_f(a)h\) we can interpret \eqref{eq:genderiv} as saying that
\begin{equation}
f(a+h)=f(a)+J_f(a)h+R(h)
\end{equation}
where the “remainder” \(R(h)\) is such that \(\lim_{h\mapto0}|R(h)|/|h|=0\), and so we can interpret \(J_f(a)\) as linearly approximating \(f(a+h)-f(a)\) near \(a\). Perhaps not surprisingly it turns out that, if \(f\) is differentiable at \(a\), then with respect to the standard bases of \(\RR^n\) and \(\RR^m\) the matrix of the linear map \(J_f(a)\), \(\mathbf{J}_f(a)\), has elements given by the partial derivatives,
\begin{equation}
{J_f(a)}_i^j=\frac{\partial f^j}{\partial x^i}(a).
\end{equation}
To see this, note that, if it exists, the \(i\)th partial derivative of \(f^j\) at \(a\) is given by
\begin{equation}
\partial_if^j(a)=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon},
\end{equation}
where \(e_i\) is the \(i\)th standard basis element of \(\RR^n\). Now, recalling the definition of the remainder \(R(h)\in\RR^m\), we have that, with respect to the standard basis of \(\RR^m\), the \(j\)th component of \(R(\epsilon e_i)\) is \(R^j(\epsilon e_i)=f^j(a+\epsilon e_i)-f^j(a)-{J_f(a)}_i^j\epsilon\). Therefore we can write
\begin{align*}
\partial_if^j(a)&=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}=\lim_{\epsilon\mapto0}\frac{{J_f(a)}_i^j\epsilon+R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j+\lim_{\epsilon\mapto0}\frac{R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j.
\end{align*}
The converse also holds. That is, if all the component functions of a function \(f:\RR^n\mapto\RR^m\) are differentiable at a point \(a\in\RR^n\), then \(f\) is differentiable at \(a\). Thus, we have that a function \(f:\RR^n\mapto\RR^m\) is differentiable at \(a\) if and only if all its component functions are differentiable at \(a\). In this case, with respect to the standard bases of \(\RR^n\) and \(\RR^m\), the matrix of the derivative of \(f\), \(\mathbf{J}_f(a)\), is the matrix of partial derivatives of the component functions at \(a\). This matrix is called the Jacobian matrix of \(f\) at \(a\).
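The identification of the derivative with the matrix of partial derivatives can be illustrated numerically. The map \(f(x,y)=(x^2-y,\,xy)\) below is an arbitrary example; its Jacobian computed by central finite differences matches the analytic partials:

```python
# f : R^2 -> R^2, f(x, y) = (x^2 - y, x y). Its Jacobian entries are the
# partial derivatives of the component functions: row j, column i holds
# partial f^j / partial x^i.
def f(x, y):
    return (x*x - y, x*y)

def jacobian_fd(f, a, h=1e-6):
    # Jacobian of f at the point a by central finite differences.
    n = len(a)
    m = len(f(*a))
    J = [[0.0]*n for _ in range(m)]
    for i in range(n):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        fp, fm = f(*ap), f(*am)
        for j in range(m):
            J[j][i] = (fp[j] - fm[j]) / (2*h)
    return J

a = (1.5, -0.5)
J = jacobian_fd(f, a)
# Analytic partials: d f^1 = (2x, -1) = (3, -1), d f^2 = (y, x) = (-0.5, 1.5).
print(J)
```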

A function \(f:\RR^n\mapto\RR^m\) is said to be smooth if all its component functions are smooth. A smooth function \(f\) between open sets of \(\RR^n\) and \(\RR^m\) is called a diffeomorphism if it is bijective and its inverse function is also smooth. We will consider the invertibility of functions in the section on the inverse function theorem.

The derivative of the composition of maps \(f:\RR^n\mapto\RR^m\) and \(g:\RR^m\mapto\RR^p\), \(g\circ f\), at a point \(a\in\RR^n\), that is, the generalisation of the familiar chain rule, is then given by the matrix product of the respective Jacobian matrices,
\begin{equation}
J_{g\circ f}(a)=J_g(f(a))J_f(a).
\end{equation}
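As an illustration (the maps chosen here are arbitrary), we can verify the chain rule for the polar-to-Cartesian map \(f\) followed by \(g(x,y)=x^2+y^2\): since \(g(f(r,\theta))=r^2\), the product \(J_g(f(a))J_f(a)\) should come out as \((2r,\,0)\).

```python
import math

# f : (r, theta) |-> (r cos theta, r sin theta), g : (x, y) |-> x^2 + y^2.
f = lambda r, th: (r*math.cos(th), r*math.sin(th))

r, th = 1.7, 0.4
# Jacobian of f in the standard bases.
Jf = [[math.cos(th), -r*math.sin(th)],
      [math.sin(th),  r*math.cos(th)]]
# Jacobian of g at f(r, theta) is the row vector (2x, 2y).
x, y = f(r, th)
Jg = [2*x, 2*y]
# Matrix product J_g(f(a)) J_f(a): a 1x2 row vector.
Jgf = [Jg[0]*Jf[0][0] + Jg[1]*Jf[1][0],
       Jg[0]*Jf[0][1] + Jg[1]*Jf[1][1]]
print(Jgf)  # should equal (2r, 0), the Jacobian of (r, theta) |-> r^2
```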

Example Suppose \(f:\RR^n\mapto\RR^m\) is a linear map whose matrix representation with respect to the standard bases of \(\RR^n\) and \(\RR^m\) is \(\mathbf{f}\). Then \(f\) maps \(\mathbf{x}\mapsto\mathbf{f}\mathbf{x}\) so clearly \(\mathbf{J}_f=\mathbf{f}\).

Example
As we discussed earlier, the relationship between polar and Cartesian coordinates can be described through a map from \(\RR^2\mapto\RR^2\) given by
\begin{equation}
\begin{pmatrix}r\\\theta\end{pmatrix}\mapsto\begin{pmatrix}r\cos\theta\\r\sin\theta\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times[0,2\pi)\subset\RR^2\). We typically write the components of this map as \(x(r,\theta)=r\cos\theta\) and \(y(r,\theta)=r\sin\theta\). The Jacobian matrix for the polar coordinate map is then
\begin{equation}
\begin{pmatrix}
\partial x/\partial r&\partial x/\partial\theta\\
\partial y/\partial r&\partial y/\partial\theta
\end{pmatrix}=
\begin{pmatrix}
\cos\theta&-r\sin\theta\\
\sin\theta& r\cos\theta
\end{pmatrix}.
\end{equation}
Likewise, cylindrical coordinates are related to Cartesian coordinates through a map from \(\RR^3\mapto\RR^3\) given by
\begin{equation}
\begin{pmatrix}
\rho\\\phi\\z\end{pmatrix}\mapsto\begin{pmatrix}
\rho\cos\phi\\\rho\sin\phi\\z
\end{pmatrix}.
\end{equation}
In this case the domain is taken to be \((0,\infty)\times[0,2\pi)\times\RR\subset\RR^3\) and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\cos\phi&-\rho\sin\phi&0\\
\sin\phi&\rho\cos\phi&0\\
0&0&1
\end{pmatrix}.
\end{equation}
For spherical polar coordinates the \(\RR^3\mapto\RR^3\) map is
\begin{equation}
\begin{pmatrix}
r\\\theta\\\phi
\end{pmatrix}\mapsto\begin{pmatrix}
r\sin\theta\cos\phi\\
r\sin\theta\sin\phi\\
r\cos\theta
\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times(0,\pi)\times[0,2\pi)\subset\RR^3\), and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\sin\theta\cos\phi&r\cos\theta\cos\phi&-r\sin\theta\sin\phi\\
\sin\theta\sin\phi&r\cos\theta\sin\phi&r\sin\theta\cos\phi\\
\cos\theta&-r\sin\theta&0\\
\end{pmatrix}.
\end{equation}
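The spherical Jacobian matrix above can be checked against central finite differences (a sketch; the sample point is an arbitrary choice):

```python
import math

def sph(r, th, ph):
    # Spherical map (r, theta, phi) -> (x, y, z).
    return (r*math.sin(th)*math.cos(ph), r*math.sin(th)*math.sin(ph), r*math.cos(th))

def jac_analytic(r, th, ph):
    # The Jacobian matrix of the spherical map, written out by hand.
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [[st*cp, r*ct*cp, -r*st*sp],
            [st*sp, r*ct*sp,  r*st*cp],
            [ct,   -r*st,     0.0]]

def jac_fd(pt, h=1e-6):
    # Jacobian of sph at pt by central finite differences.
    J = [[0.0]*3 for _ in range(3)]
    for i in range(3):
        p, m = list(pt), list(pt)
        p[i] += h
        m[i] -= h
        fp, fm = sph(*p), sph(*m)
        for j in range(3):
            J[j][i] = (fp[j] - fm[j]) / (2*h)
    return J

pt = (2.0, 0.9, 1.2)
Ja, Jn = jac_analytic(*pt), jac_fd(pt)
print(Ja, Jn)  # the two matrices agree to finite-difference accuracy
```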

Time dilation and length contraction

In the previous post we learnt that if in one frame two clocks are synchronised a distance \(D\) apart, then in another frame in which these clocks are moving along the line joining them with speed \(v\), the clock in front lags the clock behind by a time of \(Dv/c^2\). Let’s now think more about the contrasting perspectives of Alice, riding a train, and Bob, track side, thinking in particular about their respective clock and length readings.

The following picture sums up Alice’s perspective:

Here and below, clocks in either Alice’s or Bob’s frame are denoted by the rounded rectangles with times displayed within. Lengths and times in Alice’s train frame will be denoted by primed symbols, those in Bob’s track frame, unprimed. Above we see that Alice records the two events to occur simultaneously at a time we’ve taken to be 0. We’re free to take the time on Bob’s clock at the rear of the carriage to read 0 at this time in Alice’s frame, but then, since from Alice’s perspective Bob’s clocks are approaching her with speed \(v\), we know that Bob’s clock at the front of the carriage, when Alice’s clock there is showing 0, must be already showing a later time which we denote \(T\). This is unprimed as it’s a time displayed by a clock in Bob’s frame. From our discussion of the relativity of simultaneity we know \(T\) must be given by
\begin{equation}
T=\frac{Dv}{c^2}\label{eq:tracktime}
\end{equation}
where \(D\) is the separation of the two clocks as measured in Bob’s frame. Alice measures the length of her carriage to be \(L'\). We call the length of an object measured in its rest frame its proper length, so both \(L'\) and \(D\) are proper lengths, whereas \(D'\), the distance between Bob’s clocks as measured by Alice, is not. Note that, of course, \(L'=D'\).

Now let’s consider Bob’s perspective. We now need two pictures corresponding to two different times in Bob’s frame.

From Bob’s perspective, at time 0 the rear of the carriage is located at his rear clock and the carriage clock there also shows zero. At the later time \(T\), the front of the carriage is located at his front clock and the carriage clock there shows 0. Bob sees Alice’s clocks travel with speed \(v\) towards him so we know that the front clock lags the clock to the rear by a time given by
\begin{equation*}
\frac{L'v}{c^2}
\end{equation*}
where \(L'\) is the distance between Alice’s clocks, the length of the carriage, as measured in Alice’s frame. Thus, in the first of the two Bob-frame snapshots, the rear carriage clock shows 0 whilst the front carriage clock shows \(-T'\), and in the second snapshot, the rear carriage clock shows \(T'\) while the front carriage clock shows 0, where
\begin{equation}
T'=\frac{L'v}{c^2}.\label{eq:traintime}
\end{equation}

Now, we’re going to be interested in the ratio \(T'/T\), the fraction of track-frame time recorded by train-frame clocks, a ratio of a moving clock time to a stationary clock time. We see immediately from \eqref{eq:tracktime} and \eqref{eq:traintime} that this is the same as \(L'/D\). But recall that \(L'=D'\) so we have
\begin{equation*}
\frac{T'}{T}=\frac{D'}{D}.
\end{equation*}
\(D'/D\) is the ratio of a measurement of a length moving with speed \(v\), \(D'\), to a measurement of a length at rest, \(D\). This ratio must therefore also be equal to \(L/L'\), the ratio of the length of the carriage as viewed from Bob’s perspective to the (rest-frame) length of the carriage as measured by Alice. So in fact we have
\begin{equation}
\frac{T'}{T}=\frac{D'}{D}=\frac{L}{L'}.
\end{equation}
Now recall that \(D=\gamma^2L\), where \(\gamma=1/\sqrt{1-(v/c)^2}\), from which it follows that
\begin{equation}
{L'}^2=\gamma^2L^2
\end{equation}
or,
\begin{equation}
L=\frac{1}{\gamma}L'.
\end{equation}
This is length contraction! Recall that \(\gamma{>}1\) so that the length of the carriage as measured by Bob is smaller than the carriage’s proper length measured by Alice. It follows also that
\begin{equation}
T'=\frac{1}{\gamma}T.
\end{equation}
This is time dilation! Whilst stationary clocks record a time \(T\), clocks in motion record a shorter time \(T'\) — moving clocks run slow.
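Putting numbers in (an illustrative sketch; the speed, length and interval are arbitrary choices): at \(v=0.6c\) we have \(\gamma=1.25\), so a carriage of proper length \(100\,\mathrm{m}\) is measured by Bob to be \(80\,\mathrm{m}\) long, and while Bob’s clocks record \(10\,\mathrm{s}\), Alice’s clocks record only \(8\,\mathrm{s}\).

```python
import math

c = 299_792_458.0  # speed of light, m/s

def gamma(v):
    # Lorentz factor 1 / sqrt(1 - (v/c)^2).
    return 1.0 / math.sqrt(1.0 - (v/c)**2)

v, L_prime = 0.6*c, 100.0   # speed and proper carriage length L'
g = gamma(v)                # gamma = 1.25 at v = 0.6c
L = L_prime / g             # contracted length seen by Bob: L = L'/gamma
T = 10.0                    # an interval on stationary (track-frame) clocks, s
T_prime = T / g             # the shorter interval recorded by the moving clocks
print(g, L, T_prime)
```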

The relativity of simultaneity

Following Mermin again, we’ll see how the invariance of the speed of light in all inertial frames leads directly to the relativity of simultaneity. Alice rides a train. In one of the carriages it is arranged to have two photons of light emitted from the center of the carriage, one traveling towards the front and the other towards the back. The events \(E_f\) and \(E_r\) are respectively the photon reaching the front and rear of the carriage. In Alice’s frame of reference these events occur simultaneously — we don’t even have to refer to clocks in Alice’s frame since we know that light travels at the same speed in all directions.

Now consider the situation from the perspective of a track-side observer, Bob. From his perspective Alice’s train is traveling with a velocity \(v\). Three events take place: first, the photons are emitted from the center of the carriage [1]. Since they still travel at light speed \(c\) in all directions he ‘sees’, that is, the clocks in his latticework record, the event \(E_r\) occurring before the event \(E_f\). Schematically we have:
This is of course as we’d expect: as the left-traveling photon heads to the back of the carriage, the back of the carriage travels towards it with speed \(v\), while as the right-traveling photon heads towards the front of the carriage, the front travels away from it with speed \(v\). Let’s say that in Bob’s frame the length of the train carriage is \(L\). If \(T_r\) is the elapsed time in Bob’s frame between the photons being emitted and the left-traveling photon reaching the back of the carriage then we have
\begin{equation}
cT_r=\frac{1}{2}L-vT_r
\label{eq:reartime}
\end{equation}
and after a time \(T_f\) the right-traveling photon covers a distance \(cT_f\) given by
\begin{equation}
cT_f=\frac{1}{2}L+vT_f.
\label{eq:fronttime}
\end{equation}
These individual times are not what we’re interested in though. We’re interested in the time difference, let’s call it \(\Delta T\), between the events \(E_r\) and \(E_f\) as observed by Bob, for which we obtain,
\begin{equation*}
c\Delta T=v(T_r+T_f).
\end{equation*}
But the total distance traveled by the photons is \(D=cT_r+cT_f\), the spatial separation Bob observes between the two events. So finally we obtain
\begin{equation*}
\Delta T=\frac{Dv}{c^2}.
\end{equation*}

Two events, \(E_r\) and \(E_f\), which are simultaneous in Alice’s inertial frame of reference, are not simultaneous in Bob’s frame, moving with velocity \(v\) in the direction pointing from \(E_f\) to \(E_r\) relative to Alice’s. In Bob’s frame, the event \(E_r\) occurs a time \(Dv/c^2\) before the event \(E_f\), where \(D\) is the spatial separation of the events as seen by Bob.

Alice’s clocks, that is those synchronised in her frame, will record the events \(E_r\) and \(E_f\) occurring at the same time. Moreover they will also record the fact that Bob’s clocks show the event \(E_f\) occurring a time \(Dv/c^2\) after \(E_r\). Alice’s explanation for this fact will be that Bob’s clocks aren’t properly synchronised. Bob on the other hand says the events aren’t simultaneous and says that Alice’s clocks cannot, therefore, be properly synchronised.

The rule about simultaneous events in one frame not being simultaneous in another can be stated in terms of clocks thus:

If in one frame two clocks are synchronised a distance \(D\) apart, then in another frame, in which these clocks are moving along the line joining them with speed \(v\), the clock in front lags the clock behind by a time of \(Dv/c^2\).

It will be useful in the next post, where we consider some consequences of the relativity of simultaneity, to have a relation between the two lengths \(L\) and \(D\) in Bob’s frame. From \eqref{eq:reartime},
\begin{equation*}
T_r=\frac{L}{2}\frac{1}{c+v},
\end{equation*}
and from \eqref{eq:fronttime},
\begin{equation*}
T_f=\frac{L}{2}\frac{1}{c-v},
\end{equation*}
so that
\begin{align*}
\Delta T&=\frac{L}{2}\frac{2v}{c^2-v^2}\\
&=\frac{L\gamma^2v}{c^2}
\end{align*}
where we have introduced the Lorentz factor, \(\gamma\), defined as
\begin{equation*}
\gamma=\frac{1}{\sqrt{1-(v/c)^2}}.
\end{equation*}
Notice that for \(v{<}c\), \(\gamma{>}1\), and, combining with our previous expression for \(\Delta T\), we conclude that \(L\) and \(D\) are related according to
\begin{equation}
D=\gamma^2L.
\end{equation}
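The relations derived above are easy to verify numerically (the carriage length and speed below are arbitrary choices): computing \(T_r\) and \(T_f\) directly gives back both \(\Delta T=Dv/c^2\) and \(D=\gamma^2L\).

```python
c = 299_792_458.0  # speed of light, m/s

# Carriage length L in Bob's frame and train speed v.
L, v = 50.0, 0.8*c

# Photon flight times to the rear and front, from the two equations above.
T_r = (L/2) / (c + v)
T_f = (L/2) / (c - v)

D = c*(T_r + T_f)          # total distance covered: the events' spatial separation
dT = v*(T_r + T_f) / c     # from c dT = v (T_r + T_f)
gamma2 = 1.0 / (1.0 - (v/c)**2)   # gamma^2
print(D, gamma2*L, dT, D*v/c**2)  # D == gamma^2 L and dT == D v / c^2
```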

Notes:

  1. We are assuming here of course that the center remains the center; even though, as we’ll soon see, the length of something does change depending on the frame of reference, both the front half and the back half would change by the same amount!