Category Archives: Differential Geometry

Differential forms

The cotangent space

As discussed, at each point \(a\) of our space \(\RR^n\) there is a tangent space, \(T_a(\RR^n)\), consisting of the tangent vectors at \(a\), or, intuitively, the “arrows at \(a\)”. Let us now consider the dual space, \(T_a(\RR^n)^*\), consisting of linear functionals on \(T_a(\RR^n)\). We call this the cotangent space at \(a\), or the space of 1-forms at \(a\), and define the differential of a function \(f:\RR^n\mapto\RR\), \(df\), to be the element of \(T_a(\RR^n)^*\) such that for any tangent vector \(\mathbf{v}\in T_a(\RR^n)\),
\begin{equation}
df(\mathbf{v})=\mathbf{v}(f).
\end{equation}
With respect to the coordinate basis of \(T_a(\RR^n)\) we then have,
\begin{equation}
df\left(\sum_{i=1}^{n}v^i\left.\frac{\partial}{\partial x^i}\right|_a\right)=\sum_{i=1}^{n}v^i\left.\frac{\partial f}{\partial x^i}\right|_a
\end{equation}
In particular, we see that the differentials, \(dx^i\), of the coordinate functions, \(x^i\), are dual basis vectors to the \(\partial/\partial x^i\),
\begin{equation}
dx^i(\partial/\partial x^j)=\delta^i_j.
\end{equation}
Thus any 1-form, \(\alpha\), at \(a\) may be written in the form,
\begin{equation}
\alpha=\sum_{i=1}^n\alpha_idx^i.
\end{equation}
In particular, the differential \(df\) is just,
\begin{equation}
df=\sum_{i=1}^n\frac{\partial f}{\partial x^i}dx^i.
\end{equation}
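For example, for the function \(f(x,y)=x^2y\) on \(\RR^2\) we have \(df=2xy\,dx+x^2\,dy\), so that acting on a tangent vector \(\mathbf{v}=v^1\partial/\partial x+v^2\partial/\partial y\) at a point \((x,y)\),
\begin{equation*}
df(\mathbf{v})=2xy\,v^1+x^2\,v^2=\mathbf{v}(f).
\end{equation*}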

Just as we defined a smooth vector field on \(\RR^n\) as a smooth assignment of a tangent vector to each point of \(\RR^n\), we define a smooth 1-form field, \(\alpha\), on \(\RR^n\) as a smooth assignment of a 1-form to each point of \(\RR^n\),
\begin{equation}
\alpha=\sum_{i=1}^n\alpha_idx^i
\end{equation}
where the \(\alpha_i\) are smooth functions, \(\alpha_i:\RR^n\mapto\RR\).

Tensors on \(\RR^n\)

At any point \(a\in\RR^n\) we have defined the tangent space, \(T_a(\RR^n)\), and its dual space of 1-forms, \(T_a(\RR^n)^*\). From these two spaces we may then, using the machinery developed in the notes on multilinear algebra, build tensors of rank \((r,s)\) living in,
\begin{equation}
\underbrace{T_a(\RR^n)\otimes\dots\otimes T_a(\RR^n)}_r\otimes\underbrace{T_a(\RR^n)^*\otimes\dots\otimes T_a(\RR^n)^*}_s,
\end{equation}
where \(r\) is the contravariant rank and \(s\) the covariant rank. Assigning, in a smooth manner, a rank \((r,s)\) tensor to every point of \(\RR^n\) we then have a smooth tensor field on \(\RR^n\). An important example is the rank \((0,2)\) tensor field called the metric tensor,
\begin{equation}
g=\sum_{i,j=1}^ng_{ij}dx^i\otimes dx^j
\end{equation}
where the smooth functions \(g_{ij}\) are such that at every point \(a\in\RR^n\), \(g_{ij}(a)=g_{ji}(a)\) and \(\det(g_{ij}(a))\neq0\). Such a tensor then provides a symmetric non-degenerate bilinear form \((\,,\,)\) on the tangent space \(T_a(\RR^n)\) according to the definition,
\begin{equation}
(v,w)=g(v,w)
\end{equation}
so that, in terms of a coordinate basis \(\{\partial/\partial x^i\}\),
\begin{equation}
\left(\frac{\partial}{\partial x^i},\frac{\partial}{\partial x^j}\right)=g_{ij}.
\end{equation}
Note that in terms of another coordinate basis \(\{\partial/\partial y^i\}\) we have
\begin{equation}
\left(\frac{\partial}{\partial y^i},\frac{\partial}{\partial y^j}\right)=\left(\sum_s\frac{\partial x^s}{\partial y^i}\frac{\partial}{\partial x^s},\sum_t\frac{\partial x^t}{\partial y^j}\frac{\partial}{\partial x^t}\right)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}.
\end{equation}
Writing \(g_{ij}(x)\) for the components of the metric tensor with respect to coordinates \(x^i\) and \(g_{ij}(y)\) for the components with respect to another coordinate system \(y^i\), we have
\begin{equation}
g_{ij}(y)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}(x)
\end{equation}

Example In \(\RR^3\) the Euclidean metric, with respect to the Cartesian coordinate basis, is given by \(g_{ij}=\delta_{ij}\). In matrix form we have
\begin{equation}
(g_{ij}(x,y,z))=\begin{pmatrix}
1&0&0\\
0&1&0\\
0&0&1
\end{pmatrix}
\end{equation}
Recall that cylindrical coordinates, \((\rho,\varphi,z)\), are related to Cartesians according to,
\begin{equation}
x=\rho\cos\varphi,\quad y=\rho\sin\varphi,\quad z=z,
\end{equation}
so that, for example,
\begin{align*}
g_{11}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\rho}\right)^2+\left(\frac{\partial y}{\partial\rho}\right)^2+\left(\frac{\partial z}{\partial\rho}\right)^2\\
&=\cos^2\varphi+\sin^2\varphi\\
&=1,
\end{align*}
\begin{align*}
g_{12}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\rho}\right)\left(\frac{\partial x}{\partial\varphi}\right)+\left(\frac{\partial y}{\partial\rho}\right)\left(\frac{\partial y}{\partial\varphi}\right)+\left(\frac{\partial z}{\partial\rho}\right)\left(\frac{\partial z}{\partial\varphi}\right)\\
&=-\rho\sin\varphi\cos\varphi+\rho\sin\varphi\cos\varphi\\
&=0,
\end{align*}
whilst
\begin{align*}
g_{22}(\rho,\varphi,z)&=\left(\frac{\partial x}{\partial\varphi}\right)^2+\left(\frac{\partial y}{\partial\varphi}\right)^2+\left(\frac{\partial z}{\partial\varphi}\right)^2\\
&=\rho^2\sin^2\varphi+\rho^2\cos^2\varphi\\
&=\rho^2.
\end{align*}
Computing in this way we establish the metric in cylindrical polar coordinates as,
\begin{equation}
(g_{ij}(\rho,\varphi,z))=\begin{pmatrix}
1&0&0\\
0&\rho^2&0\\
0&0&1
\end{pmatrix}
\end{equation}
Recall that spherical polar coordinates, \((r,\theta,\varphi)\), are related to Cartesian coordinates by,
\begin{equation}
x=r\cos\varphi\sin\theta,\quad y=r\sin\varphi\sin\theta,\quad z=r\cos\theta,
\end{equation}
so that, for example,
\begin{align*}
g_{11}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial r}\right)^2+\left(\frac{\partial y}{\partial r}\right)^2+\left(\frac{\partial z}{\partial r}\right)^2\\
&=\cos^2\varphi\sin^2\theta+\sin^2\varphi\sin^2\theta+\cos^2\theta\\
&=1,
\end{align*}
\begin{align*}
g_{12}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial r}\right)\left(\frac{\partial x}{\partial\theta}\right)+\left(\frac{\partial y}{\partial r}\right)\left(\frac{\partial y}{\partial\theta}\right)+\left(\frac{\partial z}{\partial r}\right)\left(\frac{\partial z}{\partial\theta}\right)\\
&=r\cos^2\varphi\sin\theta\cos\theta+r\sin^2\varphi\sin\theta\cos\theta-r\cos\theta\sin\theta\\
&=0,
\end{align*}
whilst,
\begin{align*}
g_{22}(r,\theta,\varphi)&=\left(\frac{\partial x}{\partial\theta}\right)^2+\left(\frac{\partial y}{\partial\theta}\right)^2+\left(\frac{\partial z}{\partial\theta}\right)^2\\
&=r^2\cos^2\varphi\cos^2\theta+r^2\sin^2\varphi\cos^2\theta+r^2\sin^2\theta\\
&=r^2.
\end{align*}
Computing in this way we establish the metric in spherical polar coordinates as,
\begin{equation}
(g_{ij}(r,\theta,\varphi))=\begin{pmatrix}
1&0&0\\
0&r^2&0\\
0&0&r^2\sin^2\theta
\end{pmatrix}
\end{equation}
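As a quick consistency check, here is a minimal sketch (assuming sympy is available) which recovers the spherical polar metric components from the transformation law \(g_{ij}(y)=\sum_{s,t}\frac{\partial x^s}{\partial y^i}\frac{\partial x^t}{\partial y^j}g_{st}(x)\), with \(g_{st}(x)=\delta_{st}\) the Euclidean metric in Cartesian coordinates.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

# Cartesian coordinates as functions of the spherical coordinates (r, theta, phi)
X = sp.Matrix([r * sp.cos(phi) * sp.sin(theta),
               r * sp.sin(phi) * sp.sin(theta),
               r * sp.cos(theta)])

J = X.jacobian([r, theta, phi])   # J[s, i] = dx^s / dy^i

# g_ij(y) = sum_s (dx^s/dy^i)(dx^s/dy^j), since g_st(x) = delta_st
g = sp.simplify(J.T * J)
print(g)                          # expect diag(1, r**2, r**2*sin(theta)**2)
```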

Example Minkowski space is \(\RR^4\) endowed with the Lorentzian metric. Given coordinates, \((x^0=t,x^1=x,x^2=y,x^3=z)\) (units chosen so that the speed of light \(c=1\)), then
\begin{equation}
g_{ij}=\begin{cases}
-1&i=j=0\\
1&i=j=1,2,3\\
0&\text{otherwise}
\end{cases}
\end{equation}
This is sometimes written as \(ds^2=-dt^2+dx^2+dy^2+dz^2\).

The gradient

Recall that if we have a non-degenerate inner product, \((\,,\,)\), on a vector space \(V\) then there is a natural isomorphism \(V^*\cong V\) under which any \(f\in V^*\) corresponds to the vector \(v_f\in V\) such that for all \(v\in V\), \(f(v)=(v_f,v)\).

Definition Given a non-degenerate inner product, \((\,,\,)\), on the tangent space, \(T_a(\RR^n)\), the gradient vector of a function \(f:\RR^n\mapto\RR\), written \(\textbf{grad}f=\nabla f\), is defined to be the vector such that
\begin{equation}
(\nabla f,v)=df(v)
\end{equation}
for all \(v\in T_a(\RR^n)\).

Suppose our space is \(\RR^3\), then in terms of the coordinate basis we have
\begin{equation*}
\left(\sum_i(\nabla f)^i\frac{\partial}{\partial x^i},\frac{\partial}{\partial x^j}\right)=\sum_i(\nabla f)^ig_{ij}=\frac{\partial f}{\partial x^j}
\end{equation*}
so that
\begin{equation}
(\nabla f)^i=\sum_jg^{ij}(x)\frac{\partial f}{\partial x^j}
\end{equation}
where \(g^{ij}(x)\) is the inverse of the matrix of the (Euclidean) metric tensor, \((g_{ij}(x))\), in terms of the coordinate system, \(x^i\). Thus in Cartesian coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial x}\mathbf{e}_x+\frac{\partial f}{\partial y}\mathbf{e}_y+\frac{\partial f}{\partial z}\mathbf{e}_z.
\end{equation}
In cylindrical coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial\rho}\frac{\partial}{\partial\rho}+\frac{1}{\rho^2}\frac{\partial f}{\partial\varphi}\frac{\partial}{\partial\varphi}+\frac{\partial f}{\partial z}\frac{\partial}{\partial z}
\end{equation}
or, in terms of the unit vectors, \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\),
\begin{equation}
\nabla f=\frac{\partial f}{\partial\rho}\mathbf{e}_\rho+\frac{1}{\rho}\frac{\partial f}{\partial\varphi}\mathbf{e}_\varphi+\frac{\partial f}{\partial z}\mathbf{e}_z.
\end{equation}
Finally, in spherical coordinates we have
\begin{equation}
\nabla f=\frac{\partial f}{\partial r}\frac{\partial}{\partial r}+\frac{1}{r^2}\frac{\partial f}{\partial\theta}\frac{\partial}{\partial\theta}+\frac{1}{r^2\sin^2\theta}\frac{\partial f}{\partial\varphi}\frac{\partial}{\partial\varphi}
\end{equation}
or, in terms of the unit vectors, \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\),
\begin{equation}
\nabla f=\frac{\partial f}{\partial r}\mathbf{e}_r+\frac{1}{r}\frac{\partial f}{\partial\theta}\mathbf{e}_\theta+\frac{1}{r\sin\theta}\frac{\partial f}{\partial\varphi}\mathbf{e}_\varphi.
\end{equation}
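The coordinate-basis expressions above follow directly from \((\nabla f)^i=\sum_jg^{ij}\partial f/\partial x^j\). A small sketch of this computation in spherical coordinates (again assuming sympy is available):

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
f = sp.Function('f')(r, theta, phi)

g = sp.diag(1, r**2, r**2 * sp.sin(theta)**2)   # metric in spherical coordinates
g_inv = g.inv()
coords = (r, theta, phi)

# (grad f)^i = sum_j g^{ij} df/dx^j
grad = [sp.simplify(sum(g_inv[i, j] * sp.diff(f, coords[j]) for j in range(3)))
        for i in range(3)]
print(grad)
# expect [df/dr, (1/r**2) df/dtheta, (1/(r**2*sin(theta)**2)) df/dphi]
```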

In \(\RR^n\) with the Euclidean metric the Cauchy-Schwarz inequality tells us that for any unit vector \(\mathbf{v}\) at a point \(a\in\RR^n\),
\begin{equation}
\mathbf{v}(f)=(\nabla f,\mathbf{v})\leq\norm{\nabla f}
\end{equation}
so that the greatest rate of change of \(f\) at some point is in the direction of \(\nabla f\). By a level surface we will mean the surface specified by the points \(x\in\RR^n\) such that \(f(x)=c\) for some \(c\in\RR\). Consider a curve, \(\gamma(t)\), in this level surface. Then a tangent vector, \(\mathbf{v}_{\gamma(t_0)}\), to this curve at \(\gamma(t_0)\) is such that \(\mathbf{v}_{\gamma(t_0)}(f)=0\), that is, \((\nabla f,\mathbf{v}_{\gamma(t_0)})=0\). The gradient vector is orthogonal to the level surface.
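For example, with \(f(x,y,z)=x^2+y^2+z^2\) on \(\RR^3\) with the Euclidean metric, the level surfaces are spheres centred on the origin and
\begin{equation*}
\nabla f=2x\,\mathbf{e}_x+2y\,\mathbf{e}_y+2z\,\mathbf{e}_z,
\end{equation*}
which points radially outward and so is orthogonal to the level surface through each point.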

Vector fields and integral curves

Vector fields

Definition A vector field \(\mathbf{v}\) on the space \(\RR^n\) is an assignment of a tangent vector \(\mathbf{v}_a\in T_a(\RR^n)\) to each point \(a\in\RR^n\). Since each tangent space \(T_a(\RR^n)\) has a coordinate basis \(\{\partial/\partial x^i|_a\}\), at each point \(a\) we can write
\begin{equation}
\mathbf{v}_a=\sum_{i=1}^nv^i(a)\left.\frac{\partial}{\partial x^i}\right|_a
\end{equation}
or,
\begin{equation}
\mathbf{v}=\sum_{i=1}^nv^i\frac{\partial}{\partial x^i}
\end{equation}
where the \(v^i\) are functions \(v^i:\RR^n\mapto\RR\). The vector field is said to be smooth if the functions \(v^i\) are smooth.

Example On the space \(\RR^2-\{0\}\) we can visualise the vector field defined by,
\begin{equation}
\mathbf{v}=\frac{-y}{\sqrt{x^2+y^2}}\frac{\partial}{\partial x}+\frac{x}{\sqrt{x^2+y^2}}\frac{\partial}{\partial y},
\end{equation}
as a field of unit vectors circulating anticlockwise about the origin.

Example On the space \(\RR^2\) the vector field defined as
\begin{equation}
\mathbf{v}=x\frac{\partial}{\partial x}-y\frac{\partial}{\partial y}
\end{equation}
can be visualised as a saddle-like pattern of arrows, flowing outward along the \(x\)-axis and inward along the \(y\)-axis.
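For a quick picture, a minimal sketch along the following lines (assuming numpy and matplotlib are available) plots the arrows of this field on a grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# sample the field v = x d/dx - y d/dy on a grid
xs, ys = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
vx, vy = xs, -ys

plt.quiver(xs, ys, vx, vy)
plt.gca().set_aspect('equal')
plt.title('v = x d/dx - y d/dy')
plt.show()
```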

The vector fields on \(\RR^n\) and the derivations of the (algebra of) smooth functions on \(\RR^n\) are isomorphic as vector spaces. Note that a derivation, \(X\), of \(C^\infty(\RR^n)\) is a linear map \(X:C^\infty(\RR^n)\mapto C^\infty(\RR^n)\) such that the Leibniz rule,
\begin{equation}
X(fg)=(Xf)g+f(Xg)
\end{equation}
is satisfied for all \(f,g\in C^\infty(\RR^n)\).

Vector fields and ODEs — integral curves

Consider a fluid in motion such that its “flow” is independent of time. A single particle would then trace out a path in space — a curve, \(\gamma(t)\) say, parameterised by time. The velocity of such a particle, say at \(\gamma(0)\), is the tangent vector \(d\gamma(t)/dt|_0\). The “flow” of the whole system could be modelled by a 1-parameter family of maps \(\phi_t:\RR^3\mapto\RR^3\) such that \(\phi_t(a)\) is the location of a particle a time \(t\) after it was located at the point \(a\); in other words, \(t\mapsto\phi_t(a)\) is the curve \(\gamma\) such that \(\gamma(0)=a\) and \(\gamma(t)=\phi_t(a)\). Since the flow is stationary we have that
\begin{equation}
\phi_{s+t}(a)=\phi_s(\phi_t(a))=\phi_t(\phi_s(a)).
\end{equation}
Also,
\begin{equation}
\phi_{-t}(\phi_t(a))=a,
\end{equation}
where we understand \(\phi_{-t}(a)\) to mean the location of a particle a time \(t\) before it was at \(a\). So, understanding \(\phi_0\) to be the identity map and \(\phi_{-t}=\phi_t^{-1}\) we have a 1-parameter group of maps, which, assuming they are smooth, are each diffeomorphisms, \(\phi_t:\RR^3\mapto\RR^3\), collectively called a flow. Given such a flow we obtain a velocity (vector) field as
\begin{equation}
\mathbf{v}_a=\left.\frac{d\phi_t(a)}{dt}\right|_0.
\end{equation}
An individual curve \(\gamma\) in the flow is then called an integral curve through \(a\) of the vector field \(\mathbf{v}\). All this generalises to \(n\)-dimensions. A flow \(\phi_t:\RR^n\mapto\RR^n\) gives rise to a (velocity) vector field on \(\RR^n\). Conversely, suppose we have some vector field, \(\mathbf{v}\), then we can wonder about the existence of integral curves through the points of our space having the vectors of the vector field as tangents. Such an integral curve would have to satisfy,
\begin{equation}
\mathbf{v}_{\gamma(0)}(f)=\left.\frac{df(\gamma(t))}{dt}\right|_0
\end{equation}
for any function, \(f\), so that considering in turn the coordinate functions, \(x^i\), we have a system of differential equations,
\begin{equation}
\frac{dx^i(t)}{dt}=v^i(x^1(t),\dots,x^n(t))
\end{equation}
where the \(v^i\) are the components of the vector field and \(x^i(t)=x^i(\gamma(t))\). The theorem on the existence and uniqueness of the solution of this system of equations and hence of the corresponding integral curves and flow is the following.

Theorem If \(\mathbf{v}\) is a smooth vector field defined on \(\RR^n\) then for each point \(a\in\RR^n\) there is a curve \(\gamma:I\mapto\RR^n\) (\(I\) an open interval in \(\RR\) containing 0) such that \(\gamma(0)=a\) and
\begin{equation}
\frac{d\gamma(t)}{dt}=\mathbf{v}_{\gamma(t)}
\end{equation}
for all \(t\in I\) and any two such curves are equal on the intersection of their domains. Furthermore, there is a neighbourhood \(U_a\) of \(a\) and an interval \(I_\epsilon=(-\epsilon,\epsilon)\) such that for all \(t\in I_\epsilon\) and \(b\in U_a\) there is a curve \(t\mapsto\phi_t(b)\) satisfying
\begin{equation}
\frac{d\phi_t(b)}{dt}=\mathbf{v}_{\phi_t(b)}
\end{equation}
which is a flow on \(U_a\) — a local flow.
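As a concrete illustration, the integral curves of the earlier example field \(\mathbf{v}=x\,\partial/\partial x-y\,\partial/\partial y\) are found by solving \(dx/dt=x\), \(dy/dt=-y\), giving \(\gamma(t)=(x_0e^t,y_0e^{-t})\), curves lying on the hyperbolas \(xy=\text{const}\). A minimal sketch of this computation (assuming sympy is available):

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.Function('x'), sp.Function('y')

# the system dx/dt = x, dy/dt = -y for the field v = x d/dx - y d/dy
sols = sp.dsolve([sp.Eq(x(t).diff(t), x(t)),
                  sp.Eq(y(t).diff(t), -y(t))])
print(sols)   # expect x(t) = C1*exp(t), y(t) = C2*exp(-t) (constant names may differ)
```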

Linear vector fields on \(\RR^n\)

Suppose we have a linear transformation \(X\) of the vector space \(\RR^n\). Then to any point \(a\in\RR^n\) we can associate an element \(Xa\) which we can understand as a vector in \(T_a(\RR^n)\). The previous theorem tells us that at any point \(a\) we can find a solution to the system of differential equations,
\begin{equation}
\frac{d\gamma(t)}{dt}=X(\gamma(t)),
\end{equation}
valid in some open interval around 0 with \(\gamma(0)=a\). Let’s construct this solution explicitly. We seek a power series solution
\begin{equation}
\gamma(t)=\sum_{k=0}^\infty t^ka_k
\end{equation}
such that \(a_0=a\) and where we understand \(\gamma(t)=(x^1(\gamma(t)),\dots,x^n(\gamma(t)))\) and \(a_k=(x^1(a_k),\dots,x^n(a_k))\). Plugging the power series into the differential equation we obtain,
\begin{equation}
\sum_{k=1}^\infty kt^{k-1}a_k=\sum_{k=0}^\infty t^kXa_k,
\end{equation}
from which we obtain the recurrence relation,
\begin{equation}
a_{k+1}=\frac{1}{k+1}Xa_k,
\end{equation}
which itself leads to,
\begin{equation}
a_k=\frac{1}{k!}X^k(a_0)=\frac{1}{k!}X^ka,
\end{equation}
so that,
\begin{equation}
\gamma(t)=\sum_{k=0}^\infty\frac{t^kX^k}{k!}a=\exp(tX)a,
\end{equation}
where we’ve introduced the matrix exponential which, as we’ve already mentioned, converges for any matrix \(X\). It’s not difficult to show that this solution is unique. In this case the flow defined by \(\phi_t=\exp(tX)\) is defined on the whole of \(\RR^n\) for all times \(t\).
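For instance, taking \(X\) to be the rotation generator with matrix \(\begin{pmatrix}0&-1\\1&0\end{pmatrix}\), the flow \(\exp(tX)\) is rotation through angle \(t\) and the integral curves are circles about the origin. A small check (assuming sympy is available):

```python
import sympy as sp

t = sp.symbols('t', real=True)
X = sp.Matrix([[0, -1],
               [1,  0]])

phi_t = sp.simplify((t * X).exp())   # the flow phi_t = exp(tX)
print(phi_t)                         # expect [[cos(t), -sin(t)], [sin(t), cos(t)]]
```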

Tangent vectors

As we’ve already indicated we choose to view \(\RR^n\) as a space of points, \(x=(x^1,\dots,x^n)\), with “arrows” emerging from a point \(x\) in space living in the tangent space \(T_x(\RR^n)\) at \(x\). This is just another copy of \(\RR^n\), now viewed as vector space. For example, at some point \(a\in\RR^n\) we might have vectors \(\mathbf{v}_a,\mathbf{w}_a\in T_a(\RR^n)\) such that \(\mathbf{v}_a+\mathbf{w}_a=(\mathbf{v}+\mathbf{w})_a\). Given two distinct points \(a,b\in\RR^n\), \(T_a(\RR^n)\) and \(T_b(\RR^n)\) are distinct copies of \(\RR^n\). Without some further mechanism by which we could transport a vector from \(T_a(\RR^n)\) to a vector in \(T_b(\RR^n)\) there can be no meaning attached to the sum of a vector in \(T_a(\RR^n)\) and a vector in \(T_b(\RR^n)\).

Working with the space \(\RR^n\) we can safely think of the tangent space at each point as the collection of all arrows at the point. But suppose our space was the surface of a sphere. In this case we have the idea of tangent vectors at a point living in a tangent plane within an ambient space. But what if there was no ambient space? We’re anticipating here the generalisation of the tools of calculus to spaces far more general than \(\RR^n\). With this in mind we’ll consider here more sophisticated characterisations of the notion of tangent vectors. Specifically, we’ll avoid, as far as possible, explicitly exploiting the fact that our underlying space of points, \(\RR^n\), is itself a vector space. Instead we’ll rely on the fact that at any point we have a valid coordinate system through which we can access a vector space structure.

A tangent vector as an equivalence class of curves

A smooth curve in \(\RR^n\) is a smooth map \(\gamma:(\alpha,\beta)\mapto\RR^n\) which we denote simply by \(\gamma(t)\). With respect to some coordinate system \(x^i\), two curves, \(\gamma(t)\) and \(\tilde{\gamma}(t)\), are said to be tangent at a point \(\gamma(t_0)=a=\tilde{\gamma}(t_0)\) if
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{equation}
for \(i=1,\dots,n\). Whether or not two curves are tangent at a point does not depend on the coordinate system used. Indeed, suppose instead of \(x^i\) we used some other coordinate system \(y^i\) such that \(y^i=y^i(x^1,\dots,x^n)\), then
\begin{align*}
\left.\frac{dy^i(\gamma(t))}{dt}\right|_{t_0}&=\sum_{j=1}^{n}\frac{\partial y^i}{\partial x^j}\left.\frac{dx^j(\gamma(t))}{dt}\right|_{t_0}\\
&=\sum_{j=1}^{n}\frac{\partial y^i}{\partial x^j}\left.\frac{dx^j(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{dy^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{align*}
One definition of a tangent vector, \(\mathbf{v}\), at a point \(a\) is then as the equivalence class \([\gamma]\) of curves tangent to one another at the point \(a\). We can define addition and scalar multiplication by \(\mathbf{v}_1+\mathbf{v}_2=[\gamma_1+\gamma_2]\) and \(c\mathbf{v}=[c\gamma]\). These definitions are clearly exploiting the vector space structure of our space \(\RR^n\) but can easily be tweaked not to do so. The tangent vectors so defined form a real vector space, the tangent space \(T_a(\RR^n)\). This is clearly equivalent to our intuitive notion of vectors as arrows at a point but is applicable even when our space of points is more general than \(\RR^n\).

We can now introduce the directional derivative of a (smooth) function \(f:\RR^n\mapto\RR\), at a point \(a=\gamma(t_0)\), in the direction of a tangent vector \(\mathbf{v}\) according to the definition,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0},
\end{equation}
where \(\mathbf{v}\) is the tangent vector corresponding to the equivalence class of curves \([\gamma]\). Note that this does not depend on the representative of the equivalence class chosen since with respect to any coordinate system \(x^i\),
\begin{align*}
\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}\\
&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{df(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{align*}
Note also that this corresponds to the usual definition of a directional derivative in \(\RR^n\) as,
\begin{equation}
D_vf(a)=\frac{d}{dt}f(a+t\mathbf{v})|_{t=0},
\end{equation}
by considering the curve in \(\RR^n\) through the point \(a\) defined according to \(t\mapsto a+t\mathbf{v}\).
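For example, taking \(f(x,y)=x^2y\) on \(\RR^2\), the point \(a=(1,1)\) and the tangent vector \(\mathbf{v}\) with components \((1,2)\), the curve \(t\mapsto a+t\mathbf{v}\) gives
\begin{equation*}
D_{\mathbf{v}}f(a)=\left.\frac{d}{dt}\left[(1+t)^2(1+2t)\right]\right|_{t=0}=\left.\frac{\partial f}{\partial x}\right|_av^1+\left.\frac{\partial f}{\partial y}\right|_av^2=2+2=4.
\end{equation*}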

Directional derivatives and derivations

Let us regard the directional derivative, \(D_\mathbf{v}\), as a map, \(D_\mathbf{v}:C^\infty(\RR^n)\mapto\RR\), at any point in \(\RR^n\). Then directional derivatives are examples of derivations according to the following definition.

Definition A map \(X:C^\infty(\RR^n)\mapto\RR\) is called a derivation at a point \(a\in\RR^n\) if it is linear over \(\RR\) and satisfies the Leibniz rule,
\begin{equation}
X(fg)=X(f)g(a)+f(a)X(g).
\end{equation}

To see that \(D_\mathbf{v}\) is a derivation at some point \(a=\gamma(t_0)\in\RR^n\) with \(\mathbf{v}=[\gamma]\),
\begin{align*}
D_\mathbf{v}(f+g)(a)&=\left.\frac{d}{dt}((f+g)(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}+\left.\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_{\mathbf{v}}f(a)+D_{\mathbf{v}}g(a),
\end{align*}
and for \(c\in\RR\),
\begin{align*}
D_\mathbf{v}(cf)(a)&=\left.\frac{d}{dt}((cf)(\gamma(t)))\right|_{t_0}\\
&=c\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}\\
&=cD_\mathbf{v}(f)(a),
\end{align*}
and
\begin{align*}
D_\mathbf{v}(fg)(a)&=\left.\frac{d}{dt}(f(\gamma(t))g(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}g(\gamma(t))\right|_{t_0}+\left.f(\gamma(t))\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_\mathbf{v}f(a)g(a)+f(a)D_\mathbf{v}g(a)
\end{align*}

The Leibniz rule is what really captures the essence of differentiation. Let’s consider some of its consequences. Suppose \(f\) is the constant function, \(f(x)=1\). Then for any derivation \(X\), \(X(f)=0\). This follows since \(f=ff\) and by the Leibniz rule, \(X(f)=X(ff)=X(f)f(a)+f(a)X(f)=2X(f)\), so \(Xf=0\). It follows immediately, by linearity of derivations, that \(Xf=0\) for any constant function \(f(x)=c\). Another consequence is that if \(f(a)=g(a)=0\) then \(X(fg)=0\) since \(X(fg)=X(f)g(a)+f(a)X(g)=0\).

It’s straightforward to verify that derivations at a point \(a\in\RR^n\) form a real vector space which we denote by \(\mathcal{D}_a(\RR^n)\). For any coordinate system \(x^i\) the partial derivatives \(\partial/\partial x^i\) are easily seen to be derivations and we’ll now demonstrate that the partial derivatives \(\partial/\partial x^i\) at \(a\) provide a basis for \(\mathcal{D}_a(\RR^n)\). Indeed, from Taylor’s theorem, we know that for any smooth function \(f\), in the neighbourhood of a point \(a\in\RR^n\) and in terms of coordinates \(x^i\),
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\left.(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^n(x^i-a^i)(x^j-a^j)\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt.
\end{equation}
Consider applying the derivation \(X\) to \(f\) at \(a\), \(X(f)\).
\begin{equation}
Xf=X(f(a))+\sum_{i=1}^n\left.X((x^i-a^i))\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^nX((x^i-a^i)(x^j-a^j))\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt
\end{equation}
but \(X(f(a))=0\) since \(f(a)\) is a constant and \(X((x^i-a^i)(x^j-a^j))=0\) since both \((x^i-a^i)\) and \((x^j-a^j)\) are 0 at \(a\). Thus we have that,
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
In other words \(X=\sum_{i=1}^nX(x^i)\partial/\partial x^i\). Notice that if \(y^i\) is any other coordinate system valid in the neighbourhood of \(a\) then
\begin{equation}
Xy^j=\left.\sum_{i=1}^nX(x^i)\frac{\partial y^j}{\partial x^i}\right|_a
\end{equation}
so that
\begin{align*}
Xf&=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a\\
&=\sum_{i=1}^n\sum_{j=1}^n\left.X(x^i)\frac{\partial y^j}{\partial x^i}\frac{\partial f}{\partial y^j}\right|_a\\
&=\sum_{j=1}^nX(y^j)\left.\frac{\partial f}{\partial y^j}\right|_a,
\end{align*}
and the result does not depend on the chosen coordinate system. That the coordinate partials are linearly independent follows since applying \(\sum_ic^i\partial/\partial x^i=0\) to the coordinate functions \(x^j\) in turn yields \(c^j=0\) for all \(j\).

So, to any tangent vector \(\mathbf{v}\) is associated the directional derivative \(D_\mathbf{v}\) which is a derivation. Are all derivations directional derivatives? The answer is yes. If we have a derivation \(X\) at a point \(a\) then we know that for any smooth function in a neighbourhood of \(a\), in terms of coordinates \(x^i\),
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
We also know that the directional derivative of \(f\) in the direction of a tangent vector \(\mathbf{v}=[\gamma]\) at \(a=\gamma(t_0)\) is, again in terms of local coordinates \(x^i\),
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}=\sum_{i=1}^n\left.\frac{dx^i(\gamma(t))}{dt}\frac{\partial f}{\partial x^i}\right|_{t_0}.
\end{equation}
So, if we choose a curve \(\gamma\) such that \(\gamma(t_0)=a\) and
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=X(x^i),
\end{equation}
then \(Xf=D_{\mathbf{v}}f(a)\). Thus, we can just take \(\gamma(t)=(a^1+X(x^1)(t-t_0),\dots,a^n+X(x^n)(t-t_0))\) (even though we’re explicitly relying on the vector space structure of \(\RR^n\) there is nothing essentially different required in the more general setting). Finally, we can ask whether each tangent vector corresponds to a unique derivation. This follows since if \(D_\mathbf{v}=D_\mathbf{w}\), where \(\mathbf{v}=[\gamma_1]\) and \(\mathbf{w}=[\gamma_2]\), then applying this in turn to each coordinate function, \(x^i\), we obtain,
\begin{equation*}
\left.\frac{dx^i(\gamma_1(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\gamma_2(t))}{dt}\right|_{t_0}
\end{equation*}
so \(\gamma_1\) and \(\gamma_2\) are in the same equivalence class, or, \(\mathbf{v}=\mathbf{w}\). We have therefore proved the following important theorem.

Theorem The vector spaces \(T_a(\RR^n)\) and \(\mathcal{D}_a(\RR^n)\) are isomorphic and under this isomorphism tangent vectors \(v\) map to derivations \(D_v\).

Under this isomorphism the standard basis vectors \(\mathbf{e}_i\) at \(a\) map to the partial derivatives \(\partial/\partial x^i\) at \(a\) and indeed in the more general setting of differential geometry, which the treatment here anticipates, it is usual to treat those partials as the basis “vectors” of the tangent space \(T_a(\RR^n)\). The basis \(\partial/\partial x^i\) is called the coordinate basis. Suppose our space is the plane \(\RR^2\); then the Cartesian coordinate basis would be \((\partial/\partial x,\partial/\partial y)\) corresponding respectively to the standard basis vectors \((\mathbf{e}_x,\mathbf{e}_y)\). If we choose to work with polar coordinates, the coordinate basis would be \((\partial/\partial r,\partial/\partial\theta)\) with
\begin{align*}
\frac{\partial}{\partial r}&=\frac{\partial x}{\partial r}\frac{\partial}{\partial x}+\frac{\partial y}{\partial r}\frac{\partial}{\partial y}\\
&=\cos\theta\frac{\partial}{\partial x}+\sin\theta\frac{\partial}{\partial y}
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial\theta}&=\frac{\partial x}{\partial\theta}\frac{\partial}{\partial x}+\frac{\partial y}{\partial\theta}\frac{\partial}{\partial y}\\
&=-r\sin\theta\frac{\partial}{\partial x}+r\cos\theta\frac{\partial}{\partial y}
\end{align*}
Note that if in \(T_a(\RR^n)\) we adopt the usual Euclidean metric (a non-degenerate, symmetric, positive definite inner product, \((\,,\,)\)), such that the standard basis is orthonormal, \((\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\), then the polar basis vectors, \(\partial/\partial r\) and \(\partial/\partial\theta\) of \(T_a(\RR^2)\) are not orthonormal. The corresponding normalised basis vectors would be \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{equation}

In the case of cylindrical coordinates we find
\begin{align}
\frac{\partial}{\partial\rho}&=\cos\varphi\frac{\partial}{\partial x}+\sin\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial\varphi}&=-\rho\sin\varphi\frac{\partial}{\partial x}+\rho\cos\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial z}&=\frac{\partial}{\partial z}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\) defined by
\begin{equation}
\mathbf{e}_\rho=\frac{\partial}{\partial\rho}=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y,
\end{equation}
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{\rho}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_z=\frac{\partial}{\partial z}=\mathbf{e}_z.
\end{equation}

In the case of spherical coordinates we find
\begin{align}
\frac{\partial}{\partial r}&=\cos\varphi\sin\theta\frac{\partial}{\partial x}+\sin\varphi\sin\theta\frac{\partial}{\partial y}+\cos\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\theta}&=r\cos\varphi\cos\theta\frac{\partial}{\partial x}+r\sin\varphi\cos\theta\frac{\partial}{\partial y}-r\sin\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\varphi}&=-r\sin\varphi\sin\theta\frac{\partial}{\partial x}+r\cos\varphi\sin\theta\frac{\partial}{\partial y}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\varphi\sin\theta\mathbf{e}_x+\sin\varphi\sin\theta\mathbf{e}_y+\cos\theta\mathbf{e}_z,
\end{equation}
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=\cos\varphi\cos\theta\mathbf{e}_x+\sin\varphi\cos\theta\mathbf{e}_y-\sin\theta\mathbf{e}_z,
\end{equation}
and
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{equation}
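A quick check (assuming sympy is available) that these spherical unit vectors are indeed orthonormal with respect to the Euclidean dot product:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi')

e_r     = sp.Matrix([sp.cos(phi)*sp.sin(theta), sp.sin(phi)*sp.sin(theta),  sp.cos(theta)])
e_theta = sp.Matrix([sp.cos(phi)*sp.cos(theta), sp.sin(phi)*sp.cos(theta), -sp.sin(theta)])
e_phi   = sp.Matrix([-sp.sin(phi),              sp.cos(phi),                0])

frame = [e_r, e_theta, e_phi]
G = sp.Matrix(3, 3, lambda i, j: sp.simplify(frame[i].dot(frame[j])))
print(G)   # expect the 3x3 identity matrix
```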

Inverse Function Theorem

From elementary calculus we recall that a continuous function on an interval is invertible there if and only if it is strictly increasing or decreasing on that interval. We can see how this arises by looking at the linear approximation to \(f\) in the neighbourhood of some point \(x=a\), \(f(x)\approx f(a)+f'(a)\cdot(x-a)\). Clearly, to be able to invert this and express, at least locally, \(x\) in terms of \(f(x)\) we must have \(f'(a)\neq0\).

As we’ve seen, we can similarly approximate the function \(f:\RR^n\mapto\RR^m\) in the neighbourhood of a point \(a\in\RR^n\) as \(f(x)\approx f(a)+J_f(a)(x-a)\), which tells us that for \(f\) to be invertible in the neighbourhood of some point, with differentiable inverse, the Jacobian matrix must certainly be invertible at that point. In particular we must have \(n=m\), in which case the determinant of this matrix is called the Jacobian determinant of the map \(f\). We now state the important inverse function theorem.

Theorem (Inverse function theorem) Suppose \(f:\RR^n\mapto\RR^n\) is smooth on some open subset of \(\RR^n\). If \(\det\mathbf{J}_f(a)\neq0\) at some \(a\) in that subset, then there exists an open neighbourhood \(U\) of \(a\) such that \(V=f(U)\) is open and \(f:U\mapto V\) is a diffeomorphism. In this case, if \(x\in U\) and \(y=f(x)\) then \(J_{f^{-1}}(y)=(J_f(x))^{-1}\).

Note that if \(f:U\mapto V\) is a diffeomorphism of open sets then we may form the identity function \(f\circ f^{-1}\) on \(V\). Clearly, for all \(y\in V\), \(J_{f\circ f^{-1}}(y)=\id_V\) but by the chain rule we have \(\id_V=J_{f\circ f^{-1}}(y)=J_f(x)J_{f^{-1}}(y)\) for any \(y=f(x)\in V\) and so \(J_f(x)\) is invertible at all points \(x\in U\).

Example In one dimension, the function \(f(x)=x^3\) is invertible with \(f^{-1}(x)=x^{1/3}\). Notice though that, \(f'(x)=3x^2\), so that, \(f'(0)=0\), and the hypothesis of the inverse function theorem is violated. The point is that \(f^{-1}\) is not differentiable at \(f(0)=0\).

A useful consequence of the inverse function theorem is the following. If \(U\subset\RR^n\) is some open subset of \(\RR^n\) on which a map \(f:U\mapto\RR^n\) is smooth and for which the Jacobian determinant \(\det\mathbf{J}_f(x)\neq0\) for all \(x\in U\) then \(f(U)\) is open and if \(f\) is injective then \(f:U\mapto f(U)\) is a diffeomorphism. To see this, note that since at every \(x\in U\), \(\det\mathbf{J}_f(x)\neq0\), the inverse function theorem tells us that we have open sets which we can call \(U_x\) and \(V_x\) such that \(x\in U_x\) and \(V_x=f(U_x)\) open in \(f(U)\) so that, since \(f(x)\in V_x\subset f(U)\), \(f(U)\) is open. If \(f\) is injective then, since by the theorem \(f:U_x\mapto V_x\) is a diffeomorphism for every \(x\in U\) and since \(f(U)\) is open we conclude that the inverse \(f^{-1}\) is smooth on \(f(U)\) so that indeed \(f:U\mapto f(U)\) is a diffeomorphism.

A coordinate system, \((y^1,\dots,y^n)\), for some subset \(U\) of points of \(\RR^n\) is simply a map
\begin{equation}
(x^1,\dots,x^n)\mapsto(y^1(x^1,\dots,x^n),\dots,y^n(x^1,\dots,x^n)), \label{map:coordmap}
\end{equation}
allowing us to (re-)coordinatize points \(x=(x^1,\dots,x^n)\in U\). Intuitively, for the \(y^i\) to be good coordinates, the map \eqref{map:coordmap} should be a diffeomorphism — points should be uniquely identified and we should be able to differentiate at will. Using the inverse function theorem we can test this by examining the Jacobian of the transformation.

Example Consider the coordinate transformation maps of the previous section. For polar coordinates in the plane the map \((r,\theta)\mapsto(r\cos\theta,r\sin\theta)\) defined on the open set \((0,\infty)\times\RR\) is smooth with Jacobian determinant \(r\), which is non-zero everywhere on the domain. Thus the inverse function theorem tells us that the restriction of this map to any open subset on which it is injective is a diffeomorphism onto its image. We could restrict, for example, to \((0,\infty)\times(0,2\pi)\) and the polar coordinates map is then a diffeomorphism onto the complement of the non-negative \(x\)-axis. For cylindrical coordinates the Jacobian is \(\rho\). Restricting to \((0,\infty)\times(0,2\pi)\times\RR\) the cylindrical polar coordinates map is a diffeomorphism onto the complement of the half-plane \(\{y=0,\,x\geq0\}\) (the \(xz\) half-plane with non-negative \(x\)). In the case of spherical polar coordinates the Jacobian is \(r^2\sin\theta\) so restricting to \((0,\infty)\times(0,\pi)\times(0,2\pi)\) we have a diffeomorphism onto the image.

Vector differentiation

Recall that the derivative, \(f'(t)\), of a scalar function of one real variable, \(f(t)\), is defined to be
\begin{equation}
f'(t)=\lim_{h\mapto 0}\frac{f(t+h)-f(t)}{h}.\label{def:one-dim deriv}
\end{equation}
We can also consider functions taking values in \(\RR^n\), \(f:\RR\mapto\RR^n\). In the definition of the derivative we’ll then explicitly make use of the vector space nature of \(\RR^n\) and, though we won’t do so in general, it can be useful in this context to denote the image under \(f\) of some \(x\in\RR\) using bold face, \(\mathbf{f}(x)\):
\begin{equation}
\mathbf{f}'(x)=\frac{d\mathbf{f}}{dx}=\lim_{h\mapto 0}\frac{\mathbf{f}(x+h)-\mathbf{f}(x)}{h}.
\end{equation}
The vector \(\mathbf{f}(x)\) is nothing but the vector corresponding to the element \(f(x)\in\RR^n\) with respect to the standard basis in \(\RR^n\). The following product rules follow from this definition in the same way as the scalar function product rule,
\begin{align}
\frac{d}{dx}\left(c(x)\mathbf{f}(x)\right)&=c\frac{d\mathbf{f}}{dx}+\frac{dc}{dx}\mathbf{f},\\
\frac{d}{dx}\left(\mathbf{f}(x)\cdot\mathbf{g}(x)\right)&=\mathbf{f}\cdot\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\cdot\mathbf{g},\\
\frac{d}{dx}\left(\mathbf{f}(x)\times\mathbf{g}(x)\right)&=\mathbf{f}\times\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\times\mathbf{g},
\end{align}
where \(c:\RR\mapto\RR\) and \(g:\RR\mapto\RR^n\) with \(\mathbf{g}(x)\) the vector representation of \(g(x)\) with respect to the standard basis.

More generally, we can consider vector valued functions \(f:\RR^n\mapto\RR^m\) such that points \(x\in\RR^n\) are mapped to points \(f(x)=(f^1(x),\dots,f^m(x))\) of \(\RR^m\) where we have here introduced the component functions, \(f^i:\RR^n\mapto\RR\), of \(f\). Such a function \(f\) is said to be differentiable at \(a\) if there exists a linear map \(J_f(a):\RR^n\mapto\RR^m\) such that
\begin{equation}
\lim_{h\mapto0}\frac{|f(a+h)-f(a)-J_f(a)h|}{|h|}=0\label{eq:genderiv}
\end{equation}
where \(|\cdot|\) is the appropriate length (for \(\RR^m\) in the numerator and \(\RR^n\) in the denominator). In this case, \(J_f(a)\) is called the derivative (sometimes total derivative) of \(f\) at \(a\). Introducing \(R(h)\in\RR^m\) as \(R(h)=f(a+h)-f(a)-J_f(a)h\) we can interpret \eqref{eq:genderiv} as saying that
\begin{equation}
f(a+h)=f(a)+J_f(a)h+R(h)
\end{equation}
where the “remainder” \(R(h)\) is such that \(\lim_{h\mapto0}|R(h)|/|h|=0\) and so we can interpret \(J_f(a)\) as linearly approximating \(f(a+h)-f(a)\) near \(a\). Perhaps not surprisingly it turns out that, if \(f\) is differentiable at \(a\), then, with respect to the standard bases of \(\RR^n\) and \(\RR^m\), the matrix of the linear map \(J_f(a)\), \(\mathbf{J}_f(a)\), has elements given by the partial derivatives,
\begin{equation}
{J_f(a)}_i^j=\frac{\partial f^j}{\partial x^i}(a).
\end{equation}
To see this, note that, if it exists, the \(i\)th partial derivative of \(f^j\) at \(a\) is given by
\begin{equation}
\partial_if^j(a)=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}.
\end{equation}
where \(e_i\) is the \(i\)th standard basis element of \(\RR^n\). Now, recalling the definition of the remainder \(R(h)\in\RR^m\), we have that, with respect to the standard basis of \(\RR^m\), the \(j\)th component of \(R(\epsilon e_i)\) is \(R^j(\epsilon e_i)=f^j(a+\epsilon e_i)-f^j(a)-{J_f(a)}_i^j\epsilon\). Therefore we can write
\begin{align*}
\partial_if^j(a)&=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}=\lim_{\epsilon\mapto0}\frac{{J_f(a)}_i^j\epsilon+R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j+\lim_{\epsilon\mapto0}\frac{R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j.
\end{align*}
The converse also holds. That is, if all the component functions of a function \(f:\RR^n\mapto\RR^m\) are differentiable at a point \(a\in\RR^n\), then \(f\) is differentiable at \(a\). Thus, we have that a function \(f:\RR^n\mapto\RR^m\) is differentiable at \(a\) if and only if all its component functions are differentiable at \(a\). In this case, with respect to the standard bases of \(\RR^n\) and \(\RR^m\), the matrix of the derivative of \(f\), \(\mathbf{J}_f(a)\), is the matrix of partial derivatives of the component functions at \(a\). This matrix is called the Jacobian matrix of \(f\) at \(a\).

A function \(f:\RR^n\mapto\RR^m\) is said to be smooth if all its component functions are smooth. A smooth function \(f\) between open sets of \(\RR^n\) and \(\RR^m\) is called a diffeomorphism if it is bijective and its inverse function is also smooth. We will consider the invertibility of functions in the section on the inverse function theorem.

The derivative of a composition of maps \(f:\RR^n\mapto\RR^m\) and \(g:\RR^m\mapto\RR^p\), \(g\circ f\), at a point \(a\in\RR^n\), that is, the generalisation of the familiar chain rule, is then given by the matrix product of the respective Jacobian matrices,
\begin{equation}
J_{g\circ f}(a)=J_g(f(a))J_f(a).
\end{equation}

Example Suppose \(f:\RR^n\mapto\RR^m\) is a linear map whose matrix representation with respect to the standard bases of \(\RR^n\) and \(\RR^m\) is \(\mathbf{f}\). Then \(f\) maps \(\mathbf{x}\mapsto\mathbf{f}\mathbf{x}\) so clearly \(\mathbf{J}_f=\mathbf{f}\).

Example
As we discussed earlier, the relationship between polar and cartesian coordinates can be described through a map from \(\RR^2\mapto\RR^2\) given by
\begin{equation}
\begin{pmatrix}r\\\theta\end{pmatrix}\mapsto\begin{pmatrix}r\cos\theta\\r\sin\theta\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times[0,2\pi)\subset\RR^2\). We typically write the components of this map as \(x(r,\theta)=r\cos\theta\) and \(y(r,\theta)=r\sin\theta\). The Jacobian matrix for the polar coordinate map is then
\begin{equation}
\begin{pmatrix}
\partial x/\partial r&\partial x/\partial\theta\\
\partial y/\partial r&\partial y/\partial\theta
\end{pmatrix}=
\begin{pmatrix}
\cos\theta&-r\sin\theta\\
\sin\theta& r\cos\theta
\end{pmatrix}.
\end{equation}
Likewise, cylindrical coordinates are related to Cartesian coordinates through a map from \(\RR^3\mapto\RR^3\) given by
\begin{equation}
\begin{pmatrix}
\rho\\\phi\\z\end{pmatrix}\mapsto\begin{pmatrix}
\rho\cos\phi\\\rho\sin\phi\\z
\end{pmatrix}.
\end{equation}
In this case the domain is taken to be \((0,\infty)\times[0,2\pi)\times\RR\subset\RR^3\) and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\cos\phi&-\rho\sin\phi&0\\
\sin\phi&\rho\cos\phi&0\\
0&0&1
\end{pmatrix}.
\end{equation}
For spherical polar coordinates the \(\RR^3\mapto\RR^3\) map is
\begin{equation}
\begin{pmatrix}
r\\\theta\\\phi
\end{pmatrix}\mapsto\begin{pmatrix}
r\sin\theta\cos\phi\\
r\sin\theta\sin\phi\\
r\cos\theta
\end{pmatrix},
\end{equation}
we take the domain to be \((0,\infty)\times(0,\pi)\times[0,2\pi)\subset\RR^3\) and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\sin\theta\cos\phi&r\cos\theta\cos\phi&-r\sin\theta\sin\phi\\
\sin\theta\sin\phi&r\cos\theta\sin\phi&r\sin\theta\cos\phi\\
\cos\theta&-r\sin\theta&0\\
\end{pmatrix}.
\end{equation}
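These matrices are easy to check by machine. A minimal sketch (assuming sympy is available) for the spherical case, which also gives the Jacobian determinant \(r^2\sin\theta\):

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)

# the spherical-to-Cartesian map
F = sp.Matrix([r*sp.sin(theta)*sp.cos(phi),
               r*sp.sin(theta)*sp.sin(phi),
               r*sp.cos(theta)])

J = F.jacobian([r, theta, phi])
print(J)                        # the Jacobian matrix above
print(sp.simplify(J.det()))     # expect r**2*sin(theta)
```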

Partial derivatives and some applications

Some Multivariable Functions

The most familiar examples of multivariable functions are those taking values in \(\RR\). These are also called scalar fields — they assign a scalar to each point in space. One example, a function \(f:\RR^2\mapto\RR\), is
\begin{equation*}
f(x,y)=x^2+y^2.
\end{equation*}
We say that it has ‘level curves’ — the set of points \((x,y)\) such that \(f(x,y)=r^2\) — which are circles of radius \(r\). An analogous example this time of a function \(f:\RR^3\mapto\RR\) is
\begin{equation*}
f(x,y,z)=x^2+y^2+z^2,
\end{equation*}
and in this case a ‘level surface’, specified by the points \((x,y,z)\) such that \(f(x,y,z)=r^2\), is a sphere of radius \(r\).

The curvilinear coordinates provide important examples of functions taking values in \(\RR^2\) and \(\RR^3\). Take the polar coordinates first. The function mapping a point’s polar coordinates to its Cartesian coordinates is given by \(f:(0,\infty)\times[0,2\pi)\mapto\RR^2\)
\begin{equation*}
f(r,\theta)=(r\cos\theta,r\sin\theta).
\end{equation*}
The function mapping a point’s cylindrical coordinates to its Cartesian coordinates is a function \((0,\infty)\times[0,2\pi)\times\RR\mapto\RR^3\) which we could write as
\begin{equation*}
f(\rho,\varphi,z)=(\rho\cos\varphi,\rho\sin\varphi,z).
\end{equation*}
The function mapping a point’s spherical coordinates to its Cartesian coordinates is a function \((0,\infty)\times(0,\pi)\times[0,2\pi)\mapto\RR^3\) which we could write as
\begin{equation*}
f(r,\theta,\varphi)=(r\cos\varphi\sin\theta,r\sin\varphi\sin\theta,r\cos\theta).
\end{equation*}
Note that in each of these functions the domain has been restricted to ensure the function is one-to-one.

Definition of partial derivative

If \(f:\RR^n\mapto\RR\) is a real valued function on \(\RR^n\) we define the partial derivative of \(f\) with respect to \(x^i\) as,
\begin{equation}
\frac{\partial f}{\partial x^i}=\lim_{\epsilon\mapto0}\frac{f(x^1,\dots,x^i+\epsilon,\dots,x^n)-f(x^1,\dots,x^i,\dots,x^n) }{\epsilon}.
\end{equation}
Thus, a small change \(\Delta x^i\) in the \(x^i\) coordinate leads to an increment in the value of the function given by,
\begin{equation}
\Delta f\approx\frac{\partial f}{\partial x^i}\Delta x^i.
\end{equation}
More generally, we have
\begin{equation}
\Delta f\approx\sum_{i=1}^n\frac{\partial f}{\partial x^i}\Delta x^i=\partial_if\Delta x^i,
\end{equation}
where we’ve introduced the notation,
\begin{equation}
\partial_if=\frac{\partial f}{\partial x^i}.
\end{equation}
In this notation, second order partials are represented as
\begin{equation}
\partial_{ij}f=\frac{\partial^2 f}{\partial x^i\partial x^j}.
\end{equation}
An extremely important property of partial derivatives is that, provided the second order partials exist and are continuous, the order in which we take them is irrelevant: \(\partial_{ij}f=\partial_{ji}f\). A function \(f:\RR^n\mapto\RR\) is said to be smooth if all higher order partials exist and are continuous. We denote the set of smooth functions \(f:\RR^n\mapto\RR\) by \(C^\infty(\RR^n)\).
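For a concrete instance of this symmetry, a quick check with a computer algebra system (a sketch assuming sympy is available):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x*y) * sp.sin(x + y**2)

# mixed partials taken in the two possible orders
print(sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)))   # expect 0
```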

Leibniz’ rule

Partial differentiation can be useful in evaluating certain integrals, a technique informally known as ‘differentiating under the integral sign’. Suppose \(F(x,t)=\int f(x,t)\,dt\), then
\begin{equation*}
\frac{\partial F}{\partial t}=f(x,t),
\end{equation*}
so that
\begin{equation*}
\frac{\partial^2 F(x,t)}{\partial x\partial t}=\frac{\partial f(x,t)}{\partial x},
\end{equation*}
which upon integrating yields
\begin{equation*}
\frac{\partial F(x,t)}{\partial x}=\int \frac{\partial f(x,t)}{\partial x}\,dt.
\end{equation*}
More generally, if
\begin{equation*}
I(x)=\int_{u(x)}^{v(x)}f(x,t)\,dt=F(x,v(x))-F(x,u(x)),
\end{equation*}
then \(\partial I/\partial v=f(x,v(x))\), \(\partial I/\partial u=-f(x,u(x))\) and
\begin{align*}
\frac{\partial I}{\partial x}&= \int^v\frac{\partial f(x,t)}{\partial x}\,dt-\int^u\frac{\partial f(x,t)}{\partial x}\,dt\\
&=\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{align*}
so that
\begin{equation}
\frac{dI}{dx}=f(x,v(x))\frac{dv}{dx}-f(x,u(x))\frac{du}{dx}+\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{equation}
which is called Leibniz’ rule.

Example
If
\begin{equation}
\phi(\alpha)=\int_\alpha^{\alpha^2}\frac{\sin\alpha x}{x}\,dx
\end{equation}
then by Leibniz’ rule we have,
\begin{align*}
\phi'(\alpha)&=\frac{\sin\alpha^3}{\alpha^2}\cdot2\alpha-\frac{\sin\alpha^2}{\alpha}+\int_\alpha^{\alpha^2}\cos\alpha x\,dx\\
&=2\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}+\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}\\
&=\frac{3\sin\alpha^3-2\sin\alpha^2}{\alpha}.
\end{align*}
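The result can be checked directly (a sketch assuming sympy is available): integrate first, then differentiate, and compare with the expression obtained from Leibniz’ rule.

```python
import sympy as sp

a, x = sp.symbols('alpha x', positive=True)

phi = sp.integrate(sp.sin(a*x)/x, (x, a, a**2))   # Si(alpha**3) - Si(alpha**2)
lhs = sp.diff(phi, a)                             # differentiate the integral directly
rhs = (3*sp.sin(a**3) - 2*sp.sin(a**2)) / a       # the result from Leibniz' rule

print(sp.simplify(lhs - rhs))                     # expect 0
```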

Taylor expansion and stationary points

The Taylor expansion for \(f:\RR^n\mapto\RR\) about a point \(a\) is
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\partial_if(a)(x^i-a^i)+\frac{1}{2}\sum_{i,j=1}^n\partial_{ij}f(a)(x^i-a^i)(x^j-a^j)+\dots,
\end{equation}
where for clarity (here and below) we’re not employing the summation convention for repeated indices.

The stationary points of a function \(f\) may be analysed with the help of the Taylor expansion as follows. At any stationary point, \(a\), the first partial derivatives must be zero. To try to determine the nature of the stationary point we consider the approximation of the function given by the Taylor expansion about the point,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}}^\mathsf{T}\mathbf{M}\Delta\mathbf{x}\label{eq:1st order taylor}
\end{equation}
where \(\Delta\mathbf{x}^\mathsf{T}=(x^1-a^1,\dots,x^n-a^n)\), \(\Delta\mathbf{x}\) the corresponding column vector and \(\mathbf{M}\) is the matrix with elements \(M_{ij}=\partial_{ij}f(a)\). Since \(\mathbf{M}\) is a real symmetric matrix it is diagonalisable through a similarity transformation by an orthogonal matrix \(\mathbf{O}\). That is, \(\mathbf{O}^\mathsf{T}\mathbf{M}\mathbf{O}\) is diagonal with diagonal elements the eigenvalues of \(\mathbf{M}\). Thus we have
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}'}^\mathsf{T}\mathbf{M}'\Delta\mathbf{x}',
\end{equation}
where \(\Delta\mathbf{x}'=\mathbf{O}^\mathsf{T}\Delta\mathbf{x}\) and \(M'_{ij}=\delta_{ij}\lambda_i\) with \(\lambda_i\) the eigenvalues of \(\mathbf{M}\). That is,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}\sum_i(\Delta x'^i)^2\lambda_i,
\end{equation}
from which we conclude the following:

  1. If \(\lambda_i>0\) for all \(i\) then the stationary point at \(a\) is a minimum.
  2. If \(\lambda_i<0\) for all \(i\) then the stationary point at \(a\) is a maximum.
  3. If at least one \(\lambda_i>0\) and at least one \(\lambda_i<0\) then the stationary point at \(a\) is a saddle point (a stationary point which is not an extremum).
  4. If some \(\lambda_i=0\) and the non-zero \(\lambda_i\) all have the same sign then the test is inconclusive.

For a function of two real variables, \(f:\RR^2\mapto\RR\), the eigenvalues of \(\mathbf{M}\) are obtained from
\begin{equation*}
\det\begin{pmatrix}
\partial_{xx}f-\lambda & \partial_{xy}f\\
\partial_{xy}f & \partial_{yy}f-\lambda\\
\end{pmatrix}=\lambda^2-(\partial_{xx}f+\partial_{yy}f)\lambda+\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2=0.
\end{equation*}
That is,
\begin{equation*}
\lambda=\frac{1}{2}\left[(\partial_{xx}f+\partial_{yy}f)\pm\sqrt{(\partial_{xx}f-\partial_{yy}f)^2+4(\partial_{xy}f)^2}\right],
\end{equation*}
so that for both eigenvalues to be positive we need \(\partial_{xx}f>0\), \(\partial_{yy}f>0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For both to be negative we need \(\partial_{xx}f<0\), \(\partial_{yy}f<0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For a saddle point we need \(\partial_{xx}f\) and \(\partial_{yy}f\) to have opposite signs or \(\partial_{xx}f\partial_{yy}f<(\partial_{xy}f)^2\).
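As an illustration of this classification, here is a minimal sketch (assuming sympy is available) which finds the stationary points of \(f(x,y)=x^3-3x+y^2\) and inspects the eigenvalues of the matrix of second partials at each.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2

grad = [sp.diff(f, v) for v in (x, y)]
points = sp.solve(grad, [x, y], dict=True)   # stationary points (1, 0) and (-1, 0)

for p in points:
    M = sp.hessian(f, (x, y)).subs(p)        # matrix of second partials at p
    print(p, M.eigenvals())
# at (1, 0) the eigenvalues are 6 and 2 (a minimum);
# at (-1, 0) they are -6 and 2 (a saddle point)
```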

Taylor’s Theorem

Further refining our index notation for partials, for any \(m\)-tuple \(I=(i_1,\dots,i_m)\), with \(|I|=m\), we define
\begin{equation}
\partial_I=\frac{\partial^m}{\partial x^{i_1}\cdots\partial x^{i_m}}
\end{equation}
and
\begin{equation}
(x-a)^I=(x^{i_1}-a^{i_1})\cdots(x^{i_m}-a^{i_m})
\end{equation}
we can state Taylor’s Theorem. This says that for a function \(f:\RR^n\mapto\RR\) appropriately differentiable near a point \(a\) we have for all \(x\) near \(a\),
\begin{equation}
f(x)=P_k(x)+R_k(x),
\end{equation}
where
\begin{equation}
P_k(x)=f(a)+\sum_{m=1}^k\frac{1}{m!}\sum_{I:|I|=m}(x-a)^I\partial_If(a),
\end{equation}
is the \(k\)th-order Taylor polynomial of \(f\) at \(a\) and
\begin{equation}
R_k(x)=\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\int_0^1(1-t)^k\partial_If(a+t(x-a))dt
\end{equation}
is the \(k\)th remainder term. To see why this is true we use induction. For \(k=0\), \(P_0(x)=f(a)\) and the \(k\)th remainder term is
\begin{align*}
R_0(x)&=\sum_{i=1}^n(x^i-a^i)\int_0^1\partial_if(a+t(x-a))dt\\
&=\int_0^1\frac{d}{dt}f(a+t(x-a))dt\\
&=f(x)-f(a).
\end{align*}
Now assume the result for some \(k\) and use integration by parts on the integral in the remainder term,
\begin{align*}
\int_0^1(1-t)^k\partial_If(a+t(x-a))dt&=\left.\left(-\frac{(1-t)^{k+1}}{k+1}\partial_If(a+t(x-a))\right)\right\rvert_0^1\\
&+\int_0^1\frac{(1-t)^{k+1}}{k+1}\frac{d}{dt}\partial_If(a+t(x-a))dt\\
&=\frac{1}{k+1}\partial_If(a)\\
&+\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt
\end{align*}
Now observe that
\begin{equation*}
P_k(x)+\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\partial_If(a)=P_{k+1}(x)
\end{equation*}
and that
\begin{align*}
&\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+1}\sum_{i=1}^n(x-a)^I(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+2}(x-a)^I\int_0^1(1-t)^{k+1}\partial_If(a+t(x-a))dt\\
&=R_{k+1}(x).
\end{align*}

Differentials

Recall that the total differential \(df\) of a function \(f:\RR^n\mapto\RR\) is defined to be
\begin{equation}
df=\partial_if dx^i.
\end{equation}
Though the relation to the infinitesimal increment is clear, there is no approximation intended here. Later we will formally define \(df\) as an object belonging to the dual space of the tangent space at a point, a “differential form”, but for the time being it is safe to think of it either as a small change in \(f\) or as the kind of object we are used to integrating.

When working with partial derivatives it is always wise to indicate clearly which variables are being held constant. Thus,
\begin{equation}
\left(\frac{\partial\phi}{\partial x}\right)_{y,z},
\end{equation}
means the partial derivative of \(\phi\), regarded as a function of \(x\), \(y\) and \(z\), with respect to \(x\), holding \(y\) and \(z\) constant. The following example demonstrates how differentials naturally ‘spit out’ all partial derivatives simultaneously.

Example Suppose \(w=x^3y-z^2t\), \(xy=zt\), and we wish to calculate
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}.
\end{equation*}
We could either proceed directly, using the chain rule,
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2zt\left(\frac{\partial z}{\partial y}\right)_{x,t}=x^3-2zt\cdot\frac{x}{t}=x^3-2xz,
\end{equation*}
where we have used \(\left(\partial z/\partial y\right)_{x,t}=x/t\), obtained from \(xy=zt\) with \(x\) and \(t\) held constant,
or take differentials,
\begin{equation*}
dw=3x^2y\,dx+x^3\,dy-2zt\,dz-z^2\,dt,
\end{equation*}
\begin{equation*}
y\,dx+x\,dy=t\,dz+z\,dt,
\end{equation*}
then substituting for \(dz\), since \(x\), \(y\) and \(t\) are being treated as the independent variables, to get,
\begin{equation*}
dw=(3x^2y-2yz)\,dx+(x^3-2xz)\,dy+z^2\,dt,
\end{equation*}
from which we obtain all the partials at once,
\begin{equation*}
\left(\frac{\partial w}{\partial x}\right)_{y,t}=3x^2y-2yz\quad
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2xz\quad
\left(\frac{\partial w}{\partial t}\right)_{x,y}=z^2.\\
\end{equation*}

A differential of the form \(g_idx^i\) is said to be exact if there exists a function \(f\) such that \(df=g_idx^i\). This turns out to be an important attribute and in many situations is equivalent to the condition that \(\partial_ig_j=\partial_jg_i\) for all pairs \(i,j\).

Example Consider the differential, \((x+y^2)\,dx+(2xy+3y^2)\,dy\). This certainly satisfies the condition so let us try to identify an \(f\) such that it is equal to \(df\). Integrating \((x+y^2)\) with respect to \(x\) treating \(y\) as constant, we find our candidate must have the form, \(x^2/2+xy^2+c(y)\), where \(c(y)\) is some function of \(y\). Now differentiating this with respect to \(y\) we get \(2xy+c'(y)\) and this must be equal to \(2xy+3y^2\). Therefore \(f\) must have the form \(f(x,y)=x^2/2+xy^2+y^3+c\) where \(c\) is an arbitrary constant.
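This kind of calculation is easy to confirm by machine. A minimal sketch (assuming sympy is available) which checks the symmetry condition and recovers \(f\) for the differential above:

```python
import sympy as sp

x, y = sp.symbols('x y')
g1, g2 = x + y**2, 2*x*y + 3*y**2

print(sp.simplify(sp.diff(g1, y) - sp.diff(g2, x)))    # expect 0, so the differential is exact

f = sp.integrate(g1, x)                                # x**2/2 + x*y**2, up to a function of y
f += sp.integrate(sp.simplify(g2 - sp.diff(f, y)), y)  # add the missing y**3
print(sp.expand(f))                                    # expect x**2/2 + x*y**2 + y**3
```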

The Vector Space \(\RR^n\)

Points and coordinates

We’ll be considering the generalisation to \(n\)- (particularly \(n=3\)) dimensions of the familiar notions of single variable differential and integral calculus. This will all be further generalised when we come to the discussion of calculus on manifolds and with that goal in mind we’ll try to take a little more care than is perhaps strictly necessary in setting the stage.

Our space will be \(\RR^n\), \(n\)-dimensional Euclidean space. Forget for the moment that this is a vector space and consider it simply as a space of points, \(n\)-tuples such as \(a=(a^1,\dots,a^n)\). The (Cartesian) coordinates on \(\RR^n\) will be denoted by \(x^1,\dots,x^n\) so that the \(x^i\) coordinate of the point \(a\) is \(a^i\). When talking about a general (variable) point in \(\RR^n\) we’ll denote it by \(x\) with \(x=(x^1,\dots,x^n)\). Beware though that in two and three dimensions we’ll sometimes also denote the Cartesian coordinates as \(x,y\) and \(x,y,z\) respectively. The \(x^i\)s in \(\RR^n\) and the \(x\), \(y\), and \(z\) in \(\RR^2\) and \(\RR^3\) are best thought of as coordinate functions. So, for example, \(x^i:\RR^n\mapto\RR\) is such that \(x^i(a)=a^i\). When discussing general (variable) points we’re therefore abusing notation — using the same symbol to denote coordinate functions and coordinates. In other words we might come across a notationally undesirable equation such as \(x^i(x)=x^i\). Context should make it clear what is intended. It’s worth noting here that the whole of \(\RR^n\) can be covered by a single Cartesian coordinate system.

When we come to consider more general spaces, for example the surface of a sphere, this will not be the case. In such cases we’ll still assign coordinates to points in our space through coordinate maps which effectively identify coordinate patches of the general space of points with pieces of \(\RR^n\). Where these patches overlap, the coordinate maps must, in a precise mathematical sense, be compatible — we must be able to consistently “sew” the patches together. Spaces which can, in this way, be treated as “locally Euclidean” are important because we can do calculus on functions on these spaces just as we can for functions on \(\RR^n\). We simply exploit the vector space properties of \(\RR^n\) via the coordinate maps. Crucial in this regard is the fact that \(\RR^n\) is a normed vector space, the norm, \(|a|\), of a point \(a=(a^1,\dots,a^n)\) being given by \(|a|=\sqrt{(a^1)^2+\cdots+(a^n)^2}\) so that the distance between two points is given by \(d(a,b)\) where
\begin{equation}
d(a,b)=\sqrt{(b^1-a^1)^2+\cdots+(b^n-a^n)^2}.
\end{equation}
As a vector space, \(\RR^n\) has a standard set of basis vectors, \(e_1,\dots,e_n\), with \(e_i\) typically regarded as a column vector with 1 in the \(i\)th row and zeros everywhere else.

Vectors and the choice of scalar product

Standard treatments of vector calculus exploit the fact that \(\RR^3\), say, can be simultaneously thought of as a space of points and a space of vectors. There’s no need to distinguish since we can always “parallel transport” a vector at some point in space back to the origin or, for that matter, to any other point. In such treatments the usual scalar product is typically taken for granted. Personally, I’ve found that this leads to a certain amount of confusion as to the real role of the scalar product, particularly when it comes to, say, discussions of the geometry of spacetime in special relativity. In that case the space is \(\RR^4\), as per the previous section, but the scalar product of tangent vectors is crucially not the Euclidean scalar product.

For this reason, we’ll take some care to distinguish the intuitive notion of vectors as “arrows in space” from the underlying space of points. To each point, \(x\), in \(\RR^n\) will be associated a vector space, the tangent space at \(x\), \(T_x(\RR^n)\). This is the space containing all the arrows at the point \(x\) and is, of course, a copy of the vector space \(\RR^n\). In other words our intuitive notion of an arrow between two points \(a\) and \(b\) is treated as an object within the tangent space at \(a\). When dealing with tangent vectors we’ll use boldface. Thus, the standard set of basis vectors in a tangent space, \(T_x(\RR^n)\), will be denoted \(\mathbf{e}_1,\dots,\mathbf{e}_n\), with \(\mathbf{e}_i\) a column vector with 1 in the \(i\)th row and zeros everywhere else. The basis vector \(\mathbf{e}_i\) can be regarded as pointing from \(x\) in the direction of increasing coordinate \(x^i\).

The usual scalar product, also called the dot product, of two vectors \(\mathbf{u}=(u^1,\dots,u^n)\) and \(\mathbf{v}=(v^1,\dots,v^n)\) of \(\RR^n\), is given by
\begin{equation}
\mathbf{u}\cdot\mathbf{v}=\sum_{i=1}^nu^iv^i.
\end{equation}
The dot product is a non-degenerate, symmetric, positive-definite inner product on \(\RR^n\) and allows us to define the length of any vector \(\mathbf{v}\) as
\begin{equation}
|\mathbf{v}|=\sqrt{\mathbf{v}\cdot\mathbf{v}}.
\end{equation}
Thanks to the Cauchy-Schwarz inequality, \(|\mathbf{u}\cdot\mathbf{v}|\le|\mathbf{u}||\mathbf{v}|\), the angle, \(\theta\), between two non-zero vectors \(\mathbf{u}\) and \(\mathbf{v}\) may be defined as
\begin{equation}
\cos\theta=\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}||\mathbf{v}|}.
\end{equation}
As we’ve mentioned, the Minkowski space-time of special relativity is, as a space of points, \(\RR^4\). However a different choice of scalar product, in this case called a metric, is made, namely, \(\mathbf{u}\cdot\mathbf{v}=-u^0v^0+u^1v^1+u^2v^2+u^3v^3\).
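
The following short numpy snippet (an illustrative sketch, not anything from the text) computes a Euclidean dot product and the corresponding angle, and then, for contrast, evaluates the Minkowski scalar product of two 4-vectors using the matrix \(\mathrm{diag}(-1,1,1,1)\).

```python
# Euclidean dot product, norms and angle in R^3.
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, 1.0])

dot = np.dot(u, v)                                   # sum_i u^i v^i
theta = np.arccos(dot / (np.linalg.norm(u) * np.linalg.norm(v)))
print(dot, np.degrees(theta))

# Minkowski scalar product on R^4: u.v = -u^0 v^0 + u^1 v^1 + u^2 v^2 + u^3 v^3.
eta = np.diag([-1.0, 1.0, 1.0, 1.0])
p = np.array([2.0, 1.0, 0.0, 0.0])
q = np.array([3.0, 0.0, 1.0, 0.0])
print(p @ eta @ q)                                   # here -6.0
```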

In \(\RR^3\), recall that given two vectors \(\mathbf{u}\) and \(\mathbf{v}\), their vector product, \(\mathbf{u}\times\mathbf{v}\), with respect to Cartesian basis vectors is defined as,
\begin{equation}
\mathbf{u}\times\mathbf{v}=(u^2v^3-u^3v^2)\mathbf{e}_1-(u^1v^3-u^3v^1)\mathbf{e}_2+(u^1v^2-u^2v^1)\mathbf{e}_3,
\end{equation}
which can be conveniently remembered as a determinant,
\begin{equation}
\mathbf{u}\times\mathbf{v}=\det\begin{pmatrix}
\mathbf{e}_1&\mathbf{e}_2&\mathbf{e}_3\\
u^1&u^2&u^3\\
v^1&v^2&v^3
\end{pmatrix}.
\end{equation}
Alternatively, using the summation convention,
\begin{equation}
(\mathbf{u}\times\mathbf{v})^i=\epsilon^i_{jk}u^jv^k,
\end{equation}
where \(\epsilon^i_{jk}=\delta^{il}\epsilon_{ljk}\) and \(\epsilon_{ljk}\) is the Levi-Civita symbol. Note that the distinction between upper and lower indices is not important here, but in more general contexts it will become so, and we therefore choose to take a little more care than is strictly necessary. The Levi-Civita symbol is given by
\begin{align*}
\epsilon_{123}=\epsilon_{231}=\epsilon_{312}&=1\\
\epsilon_{213}=\epsilon_{132}=\epsilon_{321}&=-1
\end{align*}
and zero in all other cases. Recall that the Levi-Civita symbol satisfies the useful relations,
\begin{equation}
\epsilon_{ijk}\epsilon_{ipq}=\delta_{jp}\delta_{kq}-\delta_{jq}\delta_{kp}
\end{equation}
and
\begin{equation}
\epsilon_{ijk}\epsilon_{ijq}=2\delta_{kq},
\end{equation}
where summation over repeated indices is understood.
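
Both \(\epsilon\)–\(\delta\) identities, and the component formula for the cross product, are easy to confirm numerically. Here is a small numpy sketch (all indices Euclidean, so upper versus lower placement is immaterial):

```python
# Build the Levi-Civita symbol and check the identities quoted above.
import numpy as np
from itertools import permutations

# eps[i, j, k] = +1 / -1 for even / odd permutations of (0, 1, 2), 0 otherwise.
eps = np.zeros((3, 3, 3))
for i, j, k in permutations(range(3)):
    eps[i, j, k] = np.sign(np.linalg.det(np.eye(3)[[i, j, k]]))

delta = np.eye(3)

# epsilon_ijk epsilon_ipq = delta_jp delta_kq - delta_jq delta_kp
lhs = np.einsum('ijk,ipq->jkpq', eps, eps)
rhs = np.einsum('jp,kq->jkpq', delta, delta) - np.einsum('jq,kp->jkpq', delta, delta)
assert np.allclose(lhs, rhs)

# epsilon_ijk epsilon_ijq = 2 delta_kq
assert np.allclose(np.einsum('ijk,ijq->kq', eps, eps), 2 * delta)

# (u x v)^i = epsilon_ijk u^j v^k agrees with numpy's built-in cross product.
u, v = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
assert np.allclose(np.einsum('ijk,j,k->i', eps, u, v), np.cross(u, v))
print("Levi-Civita identities and cross product formula verified")
```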

Geometrically, \(|\mathbf{u}\times\mathbf{v}|\) is the area of the parallelogram with adjacent sides \(\mathbf{u}\) and \(\mathbf{v}\), \(|\mathbf{u}||\mathbf{v}|\sin\theta\), and its direction is normal to the plane of those vectors. Of the two possible normal directions, the right-hand rule gives the correct one.

The combination \((\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}\) is called the triple product. It is the (signed) volume of the parallelepiped with base area \(|\mathbf{u}\times\mathbf{v}|\) and height \(\mathbf{w}\cdot\hat{\mathbf{n}}\), where \(\hat{\mathbf{n}}=(\mathbf{u}\times\mathbf{v})/|\mathbf{u}\times\mathbf{v}|\). It has the property that permuting the three vectors cyclically doesn’t affect its value,
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=(\mathbf{v}\times\mathbf{w})\cdot\mathbf{u}=(\mathbf{w}\times\mathbf{u})\cdot\mathbf{v},
\end{equation*}
and also that
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}).
\end{equation*}
Both of these follow immediately from the invariance of the Levi-Civita symbol under cyclic permutation of its indices together with the observation that
\begin{equation}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\delta^{il}\epsilon_{ljk}u^jv^kw^i.
\end{equation}
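
A quick numerical spot-check of these triple-product identities (again just an illustrative numpy sketch):

```python
import numpy as np

u = np.array([1.0, -2.0, 0.5])
v = np.array([3.0, 1.0, -1.0])
w = np.array([0.0, 2.0, 4.0])

t_uvw = np.dot(np.cross(u, v), w)
t_vwu = np.dot(np.cross(v, w), u)
t_wuv = np.dot(np.cross(w, u), v)
t_u_vw = np.dot(u, np.cross(v, w))

# Cyclic permutations agree, and (u x v).w = u.(v x w).
assert np.isclose(t_uvw, t_vwu) and np.isclose(t_uvw, t_wuv) and np.isclose(t_uvw, t_u_vw)
print(t_uvw)
```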

A useful formula relating the cross and scalar products is
\begin{equation}
\mathbf{u}\times(\mathbf{v}\times\mathbf{w})=(\mathbf{u}\cdot\mathbf{w})\mathbf{v}-(\mathbf{u}\cdot\mathbf{v})\mathbf{w}.
\end{equation}
This relationship is established as follows.
\begin{align*}
(\mathbf{u}\times(\mathbf{v}\times\mathbf{w}))^i&=\epsilon^i_{jk}u^j(\mathbf{v}\times\mathbf{w})^k\\
&=\epsilon^i_{jk}\epsilon^k_{lm}u^jv^lw^m\\
&=\epsilon^k_{ij}\epsilon^k_{lm}u^jv^lw^m\\
&=(\delta_{il}\delta_{jm}-\delta_{im}\delta_{jl})u^jv^lw^m\\
&=(\mathbf{u}\cdot\mathbf{w})v^i-(\mathbf{u}\cdot\mathbf{v})w^i
\end{align*}
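
The identity can also be spot-checked numerically for a few random vectors, as in this numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    u, v, w = rng.standard_normal((3, 3))      # three random vectors in R^3
    lhs = np.cross(u, np.cross(v, w))
    rhs = np.dot(u, w) * v - np.dot(u, v) * w
    assert np.allclose(lhs, rhs)
print("u x (v x w) = (u.w) v - (u.v) w holds for the sampled vectors")
```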

A First Look at Curvilinear Coordinate Systems

In \(\RR^2\), a point whose Cartesian coordinates are \((x,y)\) could also be identified by its polar coordinates, \((r,\theta)\), where \(r\) is the length of the point’s position vector and \(\theta\) the angle between the position vector and the \(x\)-axis (as given by the vector \((1,0)\)). In fact what we are doing here is putting a subset of points of \(\RR^2\), \(\RR^2\) minus the origin (since the polar coordinates of the origin are not well defined), into 1-1 correspondence with a subset of points, \((0,\infty)\times[0,2\pi)\), of another copy of \(\RR^2\). We have a pair of coordinate functions, \(r:\RR^2\mapto\RR\) and \(\theta:\RR^2\mapto\RR\), such that \(r(x,y)=r=\sqrt{x^2+y^2}\) and \(\theta(x,y)=\theta=\tan^{-1}(y/x)\) (with the inverse tangent understood to be taken in the quadrant appropriate to the point \((x,y)\)). Note again the unfortunate notation here — \(r\) and \(\theta\) are being used to denote coordinate functions as well as the coordinates (real numbers) themselves.

Coordinates at a point give rise to basis vectors for the tangent space at that point. We’ll discuss this more rigorously later, but the basic idea is simple. Take polar coordinates as an example. If we invert the 1-1 coordinate maps, \(r=r(x,y)\) and \(\theta=\theta(x,y)\) to obtain functions \(x=x(r,\theta)\) and \(y=y(r,\theta)\) then we may consider the two coordinate curves through any point \(P(x,y)\) obtained by holding in turn \(r\) and \(\theta\) fixed whilst allowing the other to vary. The tangent vectors at \(P\) to these curves are then the basis vectors corresponding to the coordinates being varied. Let’s consider some particular examples, for which the construction is geometrically straightforward.

In the case of \(\RR^2\), consider polar coordinates at a point \(P(x,y)\).
Then we have \(x=r\cos\theta\) and \(y=r\sin\theta\). Corresponding to the \(r\) and \(\theta\) coordinates are basis vectors \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) at \(P\), pointing respectively in the directions obtained by increasing the \(r\)-coordinate holding the \(\theta\)-coordinate fixed and increasing the \(\theta\)-coordinate holding the \(r\)-coordinate fixed. We can use the scalar product to compute the relationship between the Cartesian and polar basis vectors according to,
\begin{equation}
\mathbf{e}_r=(\mathbf{e}_r\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_r\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=(\mathbf{e}_\theta\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_\theta\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
which, assuming \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) to be of unit length, result in the relations,
\begin{align}
\mathbf{e}_r&=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y\\
\mathbf{e}_\theta&=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{align}
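
The same relations drop out of the “differentiate along the coordinate curves” recipe described above. As a sympy sketch (dividing the \(\theta\)-tangent by its length \(r\) to obtain a unit vector):

```python
# Tangent vectors to the r- and theta-coordinate curves through (x, y).
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
pos = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])   # (x, y) = (r cos(theta), r sin(theta))

e_r = pos.diff(r)          # [cos(theta), sin(theta)]   -- already unit length
e_th = pos.diff(th) / r    # [-sin(theta), cos(theta)]  -- |d pos/d theta| = r

print(e_r.T, e_th.T)
```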

In three-dimensional space, the cylindrical coordinates of a point, \((\rho,\varphi,z)\), are related to its Cartesian coordinates by,
\begin{equation}
x=\rho\cos\varphi,\quad y=\rho\sin\varphi,\quad z=z,
\end{equation}

and it’s not difficult to check that the unit basis vectors defined at any point by the cylindrical coordinate system are related to the Cartesian basis vectors as,
\begin{align}
\mathbf{e}_\rho&=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y\\
\mathbf{e}_z&=\mathbf{e}_z.
\end{align}

The spherical polar coordinates of a point, \((r,\theta,\varphi)\), are related to its Cartesian coordinates by,
\begin{equation}
x=r\cos\varphi\sin\theta,\quad y=r\sin\varphi\sin\theta,\quad z=r\cos\theta.
\end{equation}
To relate the unit basis vectors of the spherical polar coordinate system to the Cartesian basis vectors it is easiest to first express them in terms of the cylindrical basis vectors as,
\begin{align*}
\mathbf{e}_r&=\sin\theta\mathbf{e}_\rho+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\mathbf{e}_\rho-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=\mathbf{e}_\varphi,
\end{align*}
so that,
\begin{align}
\mathbf{e}_r&=\sin\theta\cos\varphi\mathbf{e}_x+\sin\theta\sin\varphi\mathbf{e}_y+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\cos\varphi\mathbf{e}_x+\cos\theta\sin\varphi\mathbf{e}_y-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{align}
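
As with the polar case, these expressions can be verified by differentiating the position vector \((x,y,z)\) with respect to each spherical coordinate; the tangents along \(\theta\) and \(\varphi\) have lengths \(r\) and \(r\sin\theta\) respectively, which are the factors divided out to obtain unit vectors. A sympy sketch (the cylindrical case is checked in exactly the same way):

```python
# Sketch: recover the spherical-polar unit basis vectors by differentiating
# the position vector along each coordinate (r, theta, phi taken positive).
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
pos = sp.Matrix([r*sp.cos(ph)*sp.sin(th),
                 r*sp.sin(ph)*sp.sin(th),
                 r*sp.cos(th)])

# The unit vectors claimed above, written out in Cartesian components.
e_r  = sp.Matrix([sp.sin(th)*sp.cos(ph), sp.sin(th)*sp.sin(ph),  sp.cos(th)])
e_th = sp.Matrix([sp.cos(th)*sp.cos(ph), sp.cos(th)*sp.sin(ph), -sp.sin(th)])
e_ph = sp.Matrix([-sp.sin(ph), sp.cos(ph), 0])

# d pos/dr is already unit length; d pos/d theta and d pos/d phi have
# lengths r and r*sin(theta), which are divided out below.
assert pos.diff(r) == e_r
assert pos.diff(th) / r == e_th
assert pos.diff(ph) / (r*sp.sin(th)) == e_ph
print("spherical basis vectors verified")
```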