
Inverse Function Theorem

From elementary calculus we recall that a continuous function is invertible if and only if it is monotonically increasing or decreasing over the interval of the required inverse. We can see how this arises by looking at the linear approximation to \(f\) in the neighbourhood of some point \(x=a\), \(f(x)\approx f(a)+f'(a)\cdot(x-a)\). Clearly, to be able to invert this and express, at least locally, \(x\) in terms of \(f(x)\) we must have \(f'(a)\neq0\).

As we’ve seen, we can similarly approximate the function \(f:\RR^n\mapto\RR^m\) in the neighbourhood of a point \(a\in\RR^n\) as \(f(x)\approx f(a)+J_f(a)(x-a)\), which tells us that for \(f\) to be invertible in the neighbourhood of some point the Jacobian matrix must certainly be invertible at that point. In particular we must have \(n=m\), in which case the determinant of this matrix is called the Jacobian determinant of the map \(f\). We now state the important inverse function theorem.

Theorem (Inverse function theorem) Suppose \(f:\RR^n\mapto\RR^n\) is smooth on some open subset of \(\RR^n\). If \(\det\mathbf{J}_f(a)\neq0\) at some point \(a\) of that subset then there exists an open neighbourhood \(U\) of \(a\) such that \(V=f(U)\) is open and \(f:U\mapto V\) is a diffeomorphism. In this case, if \(x\in U\) and \(y=f(x)\) then \(\mathbf{J}_{f^{-1}}(y)=(\mathbf{J}_f(x))^{-1}\).

Note that if \(f:U\mapto V\) is a diffeomorphism of open sets then we may form the identity function \(f\circ f^{-1}\) on \(V\). Clearly, for all \(y\in V\), \(J_{f\circ f^{-1}}(y)=\id_V\) but by the chain rule we have \(\id_V=J_{f\circ f^{-1}}(y)=J_f(x)J_{f^{-1}}(y)\) for any \(y=f(x)\in V\) and so \(J_f(x)\) is invertible at all points \(x\in U\).

Example In one dimension, the function \(f(x)=x^3\) is invertible with \(f^{-1}(x)=x^{1/3}\). Notice though that, \(f'(x)=3x^2\), so that, \(f'(0)=0\), and the hypothesis of the inverse function theorem is violated. The point is that \(f^{-1}\) is not differentiable at \(f(0)=0\).

A useful consequence of the inverse function theorem is the following. If \(U\subset\RR^n\) is an open subset on which a map \(f:U\mapto\RR^n\) is smooth with Jacobian determinant \(\det\mathbf{J}_f(x)\neq0\) for all \(x\in U\), then \(f(U)\) is open, and if in addition \(f\) is injective then \(f:U\mapto f(U)\) is a diffeomorphism. To see this, note that since \(\det\mathbf{J}_f(x)\neq0\) at every \(x\in U\), the inverse function theorem provides open sets \(U_x\ni x\) and \(V_x=f(U_x)\) such that \(f:U_x\mapto V_x\) is a diffeomorphism. Then \(f(U)=\bigcup_{x\in U}V_x\) is a union of open sets and so is open. If \(f\) is injective then the local inverses agree wherever their domains overlap, so they patch together to a single inverse \(f^{-1}\) which is smooth on each \(V_x\) and hence on all of \(f(U)\), so that indeed \(f:U\mapto f(U)\) is a diffeomorphism.

A coordinate system, \((y^1,\dots,y^n)\), for some subset \(U\) of points of \(\RR^n\) is simply a map
\begin{equation}
(x^1,\dots,x^n)\mapsto(y^1(x^1,\dots,x^n),\dots,y^n(x^1,\dots,x^n)), \label{map:coordmap}
\end{equation}
allowing us to (re-)coordinatize points \(x=(x^1,\dots,x^n)\in U\). Intuitively, for the \(y^i\) to be good coordinates, the map \eqref{map:coordmap} should be a diffeomorphism — points should be uniquely identified and we should be able to differentiate at will. Using the inverse function theorem we can test this by examining the Jacobian of the transformation.

Example Consider the coordinate transformation maps of the previous section. For polar coordinates in the plane the map \((r,\theta)\mapsto(r\cos\theta,r\sin\theta)\) defined on the open set \((0,\infty)\times\RR\) is smooth with Jacobian determinant \(r\), which is non-zero everywhere on the domain. Thus the inverse function theorem tells us that the restriction of this map to any open subset on which it is injective is a diffeomorphism onto its image. We could restrict, for example, to \((0,\infty)\times(0,2\pi)\) and the polar coordinates map is then a diffeomorphism onto the complement of the non-negative \(x\)-axis. For cylindrical coordinates the Jacobian determinant is \(\rho\). Restricting to \((0,\infty)\times(0,2\pi)\times\RR\) the cylindrical polar coordinates map is a diffeomorphism onto the complement of the half plane \(\{(x,0,z):x\geq0\}\), that is, the \(xz\) half plane corresponding to non-negative \(x\)-values. In the case of spherical polar coordinates the Jacobian determinant is \(r^2\sin\theta\) so restricting to \((0,\infty)\times(0,\pi)\times(0,2\pi)\) we have a diffeomorphism onto the image.
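For readers who like to check such computations mechanically, here is a minimal sketch assuming Python with sympy (neither is part of the post itself); it simply reproduces the three Jacobian determinants just quoted.

```python
# Symbolic check of the Jacobian determinants quoted above (a sketch using sympy).
import sympy as sp

r, theta, rho, phi = sp.symbols('r theta rho phi', positive=True)
z = sp.symbols('z')

# Polar coordinates in the plane: (r, theta) -> (x, y).
polar = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])
print(sp.simplify(polar.jacobian([r, theta]).det()))            # r

# Cylindrical coordinates: (rho, phi, z) -> (x, y, z).
cyl = sp.Matrix([rho*sp.cos(phi), rho*sp.sin(phi), z])
print(sp.simplify(cyl.jacobian([rho, phi, z]).det()))           # rho

# Spherical polar coordinates: (r, theta, phi) -> (x, y, z).
sph = sp.Matrix([r*sp.cos(phi)*sp.sin(theta),
                 r*sp.sin(phi)*sp.sin(theta),
                 r*sp.cos(theta)])
print(sp.simplify(sph.jacobian([r, theta, phi]).det()))         # r**2*sin(theta)
```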

Partial derivatives and some applications

Some Multivariable Functions

The most familiar examples of multivariable functions are those taking values in \(\RR\). These are also called scalar fields — they assign a scalar to each point in space. One example, a function \(f:\RR^2\mapto\RR\), is
\begin{equation*}
f(x,y)=x^2+y^2.
\end{equation*}
We say that it has ‘level curves’ — the set of points \((x,y)\) such that \(f(x,y)=r^2\) — which are circles of radius \(r\). An analogous example this time of a function \(f:\RR^3\mapto\RR\) is
\begin{equation*}
f(x,y,z)=x^2+y^2+z^2,
\end{equation*}
and in this case a ‘level surface’, specified by the points \((x,y,z)\) such that \(f(x,y,z)=r^2\), is a sphere of radius \(r\).

The curvilinear coordinates provide important examples of functions taking values in \(\RR^2\) and \(\RR^3\). Take the polar coordinates first. The function mapping a point’s polar coordinates to its Cartesian coordinates is given by \(f:(0,\infty)\times[0,2\pi)\mapto\RR^2\)
\begin{equation*}
f(r,\theta)=(r\cos\theta,r\sin\theta).
\end{equation*}
The function mapping a point’s cylindrical coordinates to its Cartesian coordinates is a function \((0,\infty)\times[0,2\pi)\times\RR\mapto\RR^3\) which we could write as
\begin{equation*}
f(\rho,\varphi,z)=(\rho\cos\varphi,\rho\sin\varphi,z).
\end{equation*}
The function mapping a point’s spherical coordinates to its Cartesian coordinates is a function \((0,\infty)\times(0,\pi)\times[0,2\pi)\mapto\RR^3\) which we could write as
\begin{equation*}
f(r,\theta,\varphi)=(r\cos\varphi\sin\theta,r\sin\varphi\sin\theta,r\cos\theta).
\end{equation*}
Note that in each of these functions the domain has been restricted to ensure the function is one-to-one.

Definition of partial derivative

If \(f:\RR^n\mapto\RR\) is a real valued function on \(\RR^n\) we define the partial derivative of \(f\) with respect to \(x^i\) as,
\begin{equation}
\frac{\partial f}{\partial x^i}=\lim_{\epsilon\to0}\frac{f(x^1,\dots,x^i+\epsilon,\dots,x^n)-f(x^1,\dots,x^i,\dots,x^n)}{\epsilon}.
\end{equation}
Thus, a small change \(\Delta x^i\) in the \(x^i\) coordinate leads to an increment in the value of the function given by,
\begin{equation}
\Delta f\approx\frac{\partial f}{\partial x^i}\Delta x^i.
\end{equation}
More generally, we have
\begin{equation}
\Delta f\approx\sum_{i=1}^n\frac{\partial f}{\partial x^i}\Delta x^i=\partial_if\Delta x^i,
\end{equation}
where we’ve introduced the notation,
\begin{equation}
\partial_if=\frac{\partial f}{\partial x^i}.
\end{equation}
In this notation, second order partials are represented as
\begin{equation}
\partial_{ij}f=\frac{\partial^2 f}{\partial x^i\partial x^j}.
\end{equation}
An extremely important property of partial derivatives is that, whenever the second order partials are continuous (as they are for the smooth functions we will typically deal with), the order in which we take partial derivatives is irrelevant: \(\partial_{ij}f=\partial_{ji}f\). A function \(f:\RR^n\mapto\RR\) is said to be smooth if all higher order partials exist and are continuous. We denote the set of smooth functions \(f:\RR^n\mapto\RR\) by \(C^\infty(\RR^n)\).
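As an informal numerical illustration (not part of the original argument), the following Python sketch approximates partials by central differences and exhibits the symmetry of the mixed partials for a smooth test function; the helper names are just illustrative.

```python
# A sketch: central differences approximate partial derivatives, and the mixed
# partials d_xy f and d_yx f agree for a smooth test function.
import math

def f(x, y):
    return math.exp(x * y) + math.sin(x + 2 * y)   # an arbitrary smooth test function

def d(g, i, x, y, h=1e-5):
    """Central-difference approximation to the partial of g with respect to argument i."""
    if i == 0:
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.3, -0.7
dxy = d(lambda x, y: d(f, 1, x, y), 0, x0, y0)   # d/dx of df/dy
dyx = d(lambda x, y: d(f, 0, x, y), 1, x0, y0)   # d/dy of df/dx
print(dxy, dyx)   # the two agree to within the finite-difference error
```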

Leibnitz’ rule

Partial differentiation can be useful in evaluating certain integrals, a technique informally known as ‘differentiating under the integral sign’. Suppose \(F(x,t)=\int f(x,t)\,dt\), then
\begin{equation*}
\frac{\partial F}{\partial t}=f(x,t),
\end{equation*}
so that
\begin{equation*}
\frac{\partial^2 F(x,t)}{\partial x\partial t}=\frac{\partial f(x,t)}{\partial x},
\end{equation*}
which upon integrating yields
\begin{equation*}
\frac{\partial F(x,t)}{\partial x}=\int \frac{\partial f(x,t)}{\partial x}\,dt.
\end{equation*}
More generally, if
\begin{equation*}
I(x)=\int_{u(x)}^{v(x)}f(x,t)\,dt=F(x,v(x))-F(x,u(x)),
\end{equation*}
then \(\partial I/\partial v=f(x,v(x))\), \(\partial I/\partial u=-f(x,u(x))\) and
\begin{align*}
\frac{\partial I}{\partial x}&= \int^v\frac{\partial f(x,t)}{\partial x}\,dt-\int^u\frac{\partial f(x,t)}{\partial x}\,dt\\
&=\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{align*}
so that
\begin{equation}
\frac{dI}{dx}=f(x,v(x))\frac{dv}{dx}-f(x,u(x))\frac{du}{dx}+\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{equation}
which is called Leibnitz’ rule.

Example
If
\begin{equation}
\phi(\alpha)=\int_\alpha^{\alpha^2}\frac{\sin\alpha x}{x}\,dx
\end{equation}
then by Leibnitz’ rule we have,
\begin{align*}
\phi'(\alpha)&=\frac{\sin\alpha^3}{\alpha^2}\cdot2\alpha-\frac{\sin\alpha^2}{\alpha}+\int_\alpha^{\alpha^2}\cos\alpha x\,dx\\
&=2\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}+\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}\\
&=\frac{3\sin\alpha^3-2\sin\alpha^2}{\alpha}.
\end{align*}
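A symbolic check of this example is easy to set up; the following sketch assumes sympy (not used in the post) and simply compares the derivative of the integral with the closed form just obtained.

```python
# A sketch using sympy: differentiate phi(alpha) directly and compare with the
# result of Leibnitz' rule computed above.
import sympy as sp

a, x = sp.symbols('alpha x', positive=True)
phi = sp.integrate(sp.sin(a * x) / x, (x, a, a**2))     # sympy expresses this via Si
lhs = sp.diff(phi, a)
rhs = (3 * sp.sin(a**3) - 2 * sp.sin(a**2)) / a
print(sp.simplify(lhs - rhs))                           # expect 0
```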

Taylor expansion and stationary points

The Taylor expansion for \(f:\RR^n\mapto\RR\) about a point \(a\) is
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\partial_if(a)(x^i-a^i)+\frac{1}{2}\sum_{i,j=1}^n\partial_{ij}f(a)(x^i-a^i)(x^j-a^j)+\dots,
\end{equation}
where for clarity (here and below) we’re not employing the summation convention for repeated indices.

The stationary points of a function \(f\) may be analysed with the help of the Taylor expansion as follows. At any stationary point, \(a\), the first partial derivatives must be zero. To try to determine the nature of the stationary point we consider the approximation of the function given by the Taylor expansion about the point,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}}^\mathsf{T}\mathbf{M}\Delta\mathbf{x}\label{eq:1st order taylor}
\end{equation}
where \(\Delta\mathbf{x}^\mathsf{T}=(x^1-a^1,\dots,x^n-a^n)\), \(\Delta\mathbf{x}\) the corresponding column vector and \(\mathbf{M}\) is the matrix with elements \(M_{ij}=\partial_{ij}f(a)\). Since \(\mathbf{M}\) is a real symmetric matrix it is diagonalisable through a similarity transformation by an orthogonal matrix \(\mathbf{O}\). That is, \(\mathbf{O}^\mathsf{T}\mathbf{M}\mathbf{O}\) is diagonal with diagonal elements the eigenvalues of \(\mathbf{M}\). Thus we have
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}'}^\mathsf{T}\mathbf{M}'\Delta\mathbf{x}',
\end{equation}
where \(\Delta\mathbf{x}'=\mathbf{O}^\mathsf{T}\Delta\mathbf{x}\) and \(M'_{ij}=\delta_{ij}\lambda_i\) with \(\lambda_i\) the eigenvalues of \(\mathbf{M}\). That is,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}\sum_i(\Delta x'^i)^2\lambda_i,
\end{equation}
from which we conclude the following:

  1. If \(\lambda_i>0\) for all \(i\) then the stationary point at \(a\) is a minimum.
  2. If \(\lambda_i<0\) for all \(i\) then the stationary point at \(a\) is a maximum.
  3. If at least one \(\lambda_i>0\) and at least one \(\lambda_i<0\) then the stationary point at \(a\) is a saddle point (a stationary point which is not an extremum).
  4. If some \(\lambda_i=0\) and the non-zero \(\lambda_i\) all have the same sign then the test is inconclusive.
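In practice the classification amounts to inspecting the eigenvalues of the Hessian \(\mathbf{M}\). A minimal numerical sketch, assuming numpy (the helper name classify is just illustrative):

```python
# A sketch: classify a stationary point from the eigenvalues of its symmetric Hessian.
import numpy as np

def classify(hessian, tol=1e-12):
    lam = np.linalg.eigvalsh(hessian)           # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "minimum"
    if np.all(lam < -tol):
        return "maximum"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point"
    return "inconclusive"

# f(x, y) = x**2 - y**2 has a stationary point at the origin with Hessian diag(2, -2).
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))    # saddle point
```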

For a function of two real variables, \(f:\RR^2\mapto\RR\), the eigenvalues of \(\mathbf{M}\) are obtained from
\begin{equation*}
\det\begin{pmatrix}
\partial_{xx}f-\lambda & \partial_{xy}f\\
\partial_{xy}f & \partial_{yy}f-\lambda\\
\end{pmatrix}=\lambda^2-(\partial_{xx}f+\partial_{yy}f)\lambda+\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2=0.
\end{equation*}
That is,
\begin{equation*}
\lambda=\frac{1}{2}\left((\partial_{xx}f+\partial_{yy}f)\pm\sqrt{(\partial_{xx}f-\partial_{yy}f)^2+4(\partial_{xy}f)^2}\right),
\end{equation*}
so that for both eigenvalues to be positive we need \(\partial_{xx}f>0\), \(\partial_{yy}f>0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For both to be negative we need \(\partial_{xx}f<0\), \(\partial_{yy}f<0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For a saddle point we need the eigenvalues to have opposite signs, that is, \(\partial_{xx}f\partial_{yy}f<(\partial_{xy}f)^2\).

Taylor’s Theorem

Further refining our index notation for partials such that for any \(m\)-tuple \(I=(i_1,\dots,i_m)\) and \(|I|=m\) we define
\begin{equation}
\partial_I=\frac{\partial^m}{\partial x^{i_1}\cdots\partial x^{i_m}}
\end{equation}
and
\begin{equation}
(x-a)^I=(x^{i_1}-a^{i_1})\cdots(x^{i_m}-a^{i_m})
\end{equation}
we can state Taylor’s Theorem. This says that for a function \(f:\RR^n\mapto\RR\) appropriately differentiable near a point \(a\) we have for all \(x\) near \(a\),
\begin{equation}
f(x)=P_k(x)+R_k(x),
\end{equation}
where
\begin{equation}
P_k(x)=f(a)+\sum_{m=1}^k\frac{1}{m!}\sum_{I:|I|=m}(x-a)^I\partial_If(a),
\end{equation}
is the \(k\)th-order Taylor polynomial of \(f\) at \(a\) and
\begin{equation}
R_k(x)=\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\int_0^1(1-t)^k\partial_If(a+t(x-a))dt
\end{equation}
is the \(k\)th remainder term. To see why this is true we use induction. For \(k=0\), \(P_0(x)=f(a)\) and the remainder term is
\begin{align*}
R_0(x)&=\sum_{i=1}^n(x^i-a^i)\int_0^1\partial_if(a+t(x-a))dt\\
&=\int_0^1\frac{d}{dt}f(a+t(x-a))dt\\
&=f(x)-f(a).
\end{align*}
Now assume the result for some \(k\) and use integration by parts on the integral in the remainder term,
\begin{align*}
\int_0^1(1-t)^k\partial_If(a+t(x-a))dt&=\left.\left(-\frac{(1-t)^{k+1}}{k+1}\partial_If(a+t(x-a))\right)\right\rvert_0^1\\
&+\int_0^1\frac{(1-t)^{k+1}}{k+1}\frac{d}{dt}\partial_If(a+t(x-a))dt\\
&=\frac{1}{k+1}\partial_If(a)\\
&+\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt.
\end{align*}
Now observe that
\begin{equation*}
P_k(x)+\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\partial_If(a)=P_{k+1}(x)
\end{equation*}
and that
\begin{align*}
&\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+1}\sum_{i=1}^n(x-a)^I(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+2}(x-a)^I\int_0^1(1-t)^{k+1}\partial_If(a+t(x-a))dt\\
&=R_{k+1}(x).
\end{align*}
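The following sketch, assuming sympy, builds the second-order Taylor polynomial of a particular function of two variables directly from the formula for \(P_k\) above, implementing the multi-index sum naively as a sum over all \(m\)-tuples of indices.

```python
# A sketch: the k-th order Taylor polynomial P_k built from the multi-index formula.
import sympy as sp
from itertools import product

x, y = sp.symbols('x y')
f = sp.exp(x) * sp.cos(y)           # an arbitrary smooth example
a = (0, 0)                          # expansion point
vars_, k = (x, y), 2

P = f.subs(dict(zip(vars_, a)))
for m in range(1, k + 1):
    for I in product(range(len(vars_)), repeat=m):            # all m-tuples of indices
        term = sp.diff(f, *[vars_[i] for i in I]).subs(dict(zip(vars_, a)))
        for i in I:
            term *= (vars_[i] - a[i])
        P += term / sp.factorial(m)

print(sp.expand(P))                  # 1 + x + x**2/2 - y**2/2
```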

Differentials

Recall that the total differential \(df\) of a function \(f:\RR^n\mapto\RR\) is defined to be
\begin{equation}
df=\partial_if dx^i.
\end{equation}
Though the relation to the infinitesimal increment is clear, there is no approximation intended here. Later we will formally define \(df\) as an object belonging to the dual space of the tangent space at a point, a “differential form”, but for the time being it is safe to think of it either as a small change in \(f\) or as the kind of object we are used to integrating.

When working with partial derivatives it is always wise to indicate clearly which variables are being held constant. Thus,
\begin{equation}
\left(\frac{\partial\phi}{\partial x}\right)_{y,z},
\end{equation}
means the partial derivative of \(\phi\), regarded as a function of \(x\), \(y\) and \(z\), with respect to \(x\), holding \(y\) and \(z\) constant. The following example demonstrates how differentials naturally ‘spit out’ all partial derivatives simultaneously.

Example Suppose \(w=x^3y-z^2t\), \(xy=zt\), and we wish to calculate
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}.
\end{equation*}
We could either proceed directly, using the chain rule together with \(\left(\partial z/\partial y\right)_{x,t}=x/t\), which follows from \(xy=zt\),
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2zt\left(\frac{\partial z}{\partial y}\right)_{x,t}=x^3-2xz,
\end{equation*}
or take differentials,
\begin{equation*}
dw=3x^2y\,dx+x^3\,dy-2zt\,dz-z^2\,dt,
\end{equation*}
\begin{equation*}
y\,dx+x\,dy=t\,dz+z\,dt,
\end{equation*}
then substituting for \(dz\), since \(x\), \(y\) and \(t\) are being treated as the independent variables, to get,
\begin{equation*}
dw=(3x^2y-2yz)\,dx+(x^3-2xz)\,dy+z^2\,dt,
\end{equation*}
from which we obtain all the partials at once,
\begin{equation*}
\left(\frac{\partial w}{\partial x}\right)_{y,t}=3x^2y-2yz\quad
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2xz\quad
\left(\frac{\partial w}{\partial t}\right)_{x,y}=z^2.\\
\end{equation*}

A differential of the form \(g_i\,dx^i\) is said to be exact if there exists a function \(f\) such that \(df=g_i\,dx^i\). This turns out to be an important attribute and in many situations (on simply connected domains, for example) is equivalent to the condition that \(\partial_ig_j=\partial_jg_i\) for all pairs \(i,j\).

Example Consider the differential, \((x+y^2)\,dx+(2xy+3y^2)\,dy\). This certainly satisfies the condition so let us try to identify an \(f\) such that it is equal to \(df\). Integrating \((x+y^2)\) with respect to \(x\) treating \(y\) as constant, we find our candidate must have the form, \(x^2/2+xy^2+c(y)\), where \(c(y)\) is some function of \(y\). Now differentiating this with respect to \(y\) we get \(2xy+c'(y)\) and this must be equal to \(2xy+3y^2\), so \(c'(y)=3y^2\) and \(c(y)=y^3\) up to a constant. Therefore \(f\) must have the form \(f(x,y)=x^2/2+xy^2+y^3+c\) where \(c\) is an arbitrary constant.
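The same calculation can be mechanised; here is a sketch assuming sympy which checks the exactness condition and reconstructs the potential up to a constant.

```python
# A sketch using sympy: check exactness of g1*dx + g2*dy and rebuild the potential f.
import sympy as sp

x, y = sp.symbols('x y')
g1, g2 = x + y**2, 2*x*y + 3*y**2

print(sp.simplify(sp.diff(g1, y) - sp.diff(g2, x)))     # 0, so the condition holds

f = sp.integrate(g1, x)                                  # x**2/2 + x*y**2, up to c(y)
c = sp.integrate(sp.simplify(g2 - sp.diff(f, y)), y)     # whatever g2 is still missing
print(f + c)                                             # x**2/2 + x*y**2 + y**3
```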

The Vector Space \(\RR^n\)

Points and coordinates

We’ll be considering the generalisation to \(n\)- (particularly \(n=3\)) dimensions of the familiar notions of single variable differential and integral calculus. This will all be further generalised when we come to the discussion of calculus on manifolds and with that goal in mind we’ll try to take a little more care than is perhaps strictly necessary in setting the stage.

Our space will be \(\RR^n\), \(n\)-dimensional Euclidean space. Forget for the moment that this is a vector space and consider it simply as a space of points, \(n\)-tuples such as \(a=(a^1,\dots,a^n)\). The (Cartesian) coordinates on \(\RR^n\) will be denoted by \(x^1,\dots,x^n\) so that the \(x^i\) coordinate of the point \(a\) is \(a^i\). When talking about a general (variable) point in \(\RR^n\) we’ll denote it by \(x\) with \(x=(x^1,\dots,x^n)\). Beware though that in two and three dimensions we’ll sometimes also denote the Cartesian coordinates as \(x,y\) and \(x,y,z\) respectively. The \(x^i\)s in \(\RR^n\) and the \(x\), \(y\), and \(z\) in \(\RR^2\) and \(\RR^3\) are best thought of as coordinate functions. So, for example, \(x^i:\RR^n\mapto\RR\) is such that \(x^i(a)=a^i\). When discussing general (variable) points we’re therefore abusing notation — using the same symbol to denote coordinate functions and coordinates. In other words we might come across a notationally undesirable equation such as \(x^i(x)=x^i\). Context should make it clear what is intended. It’s worth noting here that every point of \(\RR^n\) can be specified by a single Cartesian coordinate system.

When we come to consider more general spaces, for example the surface of a sphere, this will not be the case. In such cases we’ll still assign (Cartesian) coordinates to points in our space through coordinate maps which effectively identify coordinate patches of the general space of points with pieces of \(\RR^n\). Where these patches overlap, the coordinate maps must, in a precise mathematical sense, be compatible — we must be able to consistently “sew” the patches together. Spaces which can, in this way, be treated as “locally Euclidean” are important because we can do calculus on functions on these spaces just as we can for functions on \(\RR^n\). We simply exploit the vector space properties of \(\RR^n\) via the coordinate maps. Crucial in this regard is the fact that \(\RR^n\) is a normed vector space, the norm, \(|a|\), of a point \(a=(a^1,\dots,a^n)\) being given by \(|a|=\sqrt{(a^1)^2+\cdots+(a^n)^2}\) so that the distance between two points is given by \(d(a,b)\) where
\begin{equation}
d(a,b)=\sqrt{(b^1-a^1)^2+\cdots+(b^n-a^n)^2}.
\end{equation}
As a vector space, \(\RR^n\) has a standard set of basis vectors, \(e_1,\dots,e_n\), with \(e_i\) typically regarded as a column vector with 1 in the \(i\)th row and zeros everywhere else.

Vectors and the choice of scalar product

Standard treatments of vector calculus exploit the fact that, \(\RR^3\) say, can be simultaneously thought of as a space of points and of vectors. There’s no need to distinguish since we can always “parallel transport” a vector at some point in space back to the origin or, for that matter, to any other point. In such treatments the usual scalar product is typically taken for granted. Personally, I’ve found that this leads to a certain amount of confusion as to the real role of the scalar product, particularly when it comes to, say, discussions of the geometry of spacetime in special relativity. In that case the space is \(\RR^4\) as per the previous section but the scalar product of tangent vectors is crucially not the Euclidean scalar product.

For this reason, we’ll take some care to distinguish the intuitive notion of vectors as “arrows in space” from the underlying space of points. To each point, \(x\), in \(\RR^n\) will be associated a vector space, the tangent space at \(x\), \(T_x(\RR^n)\). This is the space containing all the arrows at the point \(x\) and is, of course, a copy of the vector space \(\RR^n\). In other words our intuitive notion of an arrow between two points \(a\) and \(b\) is treated as an object within the tangent space at \(a\). When dealing with tangent vectors we’ll use boldface. Thus, the standard set of basis vectors in a tangent space, \(T_x(\RR^n)\), will be denoted \(\mathbf{e}_1,\dots,\mathbf{e}_n\), with \(\mathbf{e}_i\) a column vector with 1 in the \(i\)th row and zeros everywhere else. The basis vector \(\mathbf{e}_i\) can be regarded as pointing from \(x\) in the direction of increasing coordinate \(x^i\).

The usual scalar product, also called the dot product, of two vectors \(\mathbf{u}=(u^1,\dots,u^n)\) and \(\mathbf{v}=(v^1,\dots,v^n)\) of \(\RR^n\), is given by
\begin{equation}
\mathbf{u}\cdot\mathbf{v}=\sum_{i=1}^nu^iv^i.
\end{equation}
The dot product is a non-degenerate, symmetric, positive-definite inner product on \(\RR^n\) and allows us to define the length of any vector \(\mathbf{v}\) as
\begin{equation}
|\mathbf{v}|=\sqrt{\mathbf{v}\cdot\mathbf{v}}.
\end{equation}
Thanks to the Cauchy-Schwarz Theorem the angle, \(\theta\), between two vectors \(\mathbf{u}\) and \(\mathbf{v}\) may be defined as
\begin{equation}
\cos\theta=\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}||\mathbf{v}|}.
\end{equation}
As we’ve mentioned, the Minkowski space-time of special relativity is, as a space of points, \(\RR^4\). However a different choice of scalar product, in this case called a metric, is made, namely, \(\mathbf{u}\cdot\mathbf{v}=-u^0v^0+u^1v^1+u^2v^2+u^3v^3\).

In \(\RR^3\), recall that given two vectors \(\mathbf{u}\) and \(\mathbf{v}\), their vector product, \(\mathbf{u}\times\mathbf{v}\), with respect to Cartesian basis vectors is defined as,
\begin{equation}
\mathbf{u}\times\mathbf{v}=(u^2v^3-u^3v^2)\mathbf{e}_1-(u^1v^3-u^3v^1)\mathbf{e}_2+(u^1v^2-u^2v^1)\mathbf{e}_3,
\end{equation}
which can be conveniently remembered as a determinant,
\begin{equation}
\mathbf{u}\times\mathbf{v}=\det\begin{pmatrix}
\mathbf{e}_1&\mathbf{e}_2&\mathbf{e}_3\\
u^1&u^2&u^3\\
v^1&v^2&v^3
\end{pmatrix}.
\end{equation}
Alternatively, using the summation convention,
\begin{equation}
(\mathbf{u}\times\mathbf{v})^i=\epsilon^i_{jk}u^jv^k,
\end{equation}
where \(\epsilon^i_{jk}=\delta^{il}\epsilon_{ljk}\) is the Levi-Civita symbol. Note that the distinction between upper and lower indices is not important here but in more general contexts it will become so and therefore here we choose to take more care than is really necessary. The Levi-Civita symbol is given by
\begin{align*}
\epsilon_{123}=\epsilon_{231}=\epsilon_{312}&=1\\
\epsilon_{213}=\epsilon_{132}=\epsilon_{321}&=-1\\
\end{align*}
and zero in all other cases. Recall that the Levi-Civita symbol satisfies the useful relations,
\begin{equation}
\epsilon_{ijk}\epsilon_{ipq}=\delta_{jp}\delta_{kq}-\delta_{jq}\delta_{kp}
\end{equation}
and
\begin{equation}
\epsilon_{ijk}\epsilon_{ijq}=2\delta_{kq},
\end{equation}
where summation over repeated indices is understood.
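Both contraction identities are finite statements and can be checked by brute force; here is a small Python sketch (purely illustrative, not part of the text).

```python
# A brute-force check of the two Levi-Civita contraction identities in three dimensions.
def eps(i, j, k):
    """Levi-Civita symbol (indices 0, 1, 2)."""
    return {(0, 1, 2): 1, (1, 2, 0): 1, (2, 0, 1): 1,
            (1, 0, 2): -1, (0, 2, 1): -1, (2, 1, 0): -1}.get((i, j, k), 0)

def delta(i, j):
    return 1 if i == j else 0

R = range(3)
for j in R:
    for k in R:
        for p in R:
            for q in R:
                lhs = sum(eps(i, j, k) * eps(i, p, q) for i in R)
                assert lhs == delta(j, p) * delta(k, q) - delta(j, q) * delta(k, p)

for k in R:
    for q in R:
        assert sum(eps(i, j, k) * eps(i, j, q) for i in R for j in R) == 2 * delta(k, q)

print("both identities verified")
```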

Geometrically, \(|\mathbf{u}\times\mathbf{v}|\) is the area of the parallelogram with adjacent sides \(\mathbf{u}\) and \(\mathbf{v}\), \(|\mathbf{u}||\mathbf{v}|\sin\theta\), and its direction is normal to the plane of those vectors. Of the two possible normal directions, the right hand rule gives the correct one.

The combination \((\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}\) is called the triple product. It is the volume of the parallelepiped with base area \(|\mathbf{u}\times\mathbf{v}|\) and height \(\mathbf{w}\cdot\hat{\mathbf{n}}\), where \(\hat{\mathbf{n}}=\mathbf{u}\times\mathbf{v}/|\mathbf{u}\times\mathbf{v}|\). It has the property that permuting the three vectors cyclically doesn’t affect its value,
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=(\mathbf{v}\times\mathbf{w})\cdot\mathbf{u}=(\mathbf{w}\times\mathbf{u})\cdot\mathbf{v},
\end{equation*}
and also that
\begin{equation*}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\mathbf{u}\cdot(\mathbf{v}\times\mathbf{w}).
\end{equation*}
Both of these follow immediately from the observation that
\begin{equation}
(\mathbf{u}\times\mathbf{v})\cdot\mathbf{w}=\delta^{il}\epsilon_{ljk}u^jv^kw^i.
\end{equation}

A useful formula relating the cross and scalar products is
\begin{equation}
\mathbf{u}\times(\mathbf{v}\times\mathbf{w})=(\mathbf{u}\cdot\mathbf{w})\mathbf{v}-(\mathbf{u}\cdot\mathbf{v})\mathbf{w}.
\end{equation}
This relationship is established as follows.
\begin{align*}
(\mathbf{u}\times(\mathbf{v}\times\mathbf{w}))^i&=\epsilon^i_{jk}u^j(\mathbf{v}\times\mathbf{w})^k\\
&=\epsilon^i_{jk}\epsilon^k_{lm}u^jv^lw^m\\
&=\epsilon^k_{ij}\epsilon^k_{lm}u^jv^lw^m\\
&=(\delta_{il}\delta_{jm}-\delta_{im}\delta_{jl})u^jv^lw^m\\
&=(\mathbf{u}\cdot\mathbf{w})v^i-(\mathbf{u}\cdot\mathbf{v})w^i
\end{align*}
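A quick numerical sanity check of these identities is possible with numpy (an illustration only, using randomly chosen vectors).

```python
# A sketch: check u x (v x w) = (u.w)v - (u.v)w and the cyclic symmetry of the
# scalar triple product for random vectors in R^3.
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 3))           # three random vectors in R^3

assert np.allclose(np.cross(u, np.cross(v, w)), np.dot(u, w) * v - np.dot(u, v) * w)
assert np.allclose(np.dot(np.cross(u, v), w), np.dot(np.cross(v, w), u))
assert np.allclose(np.dot(np.cross(u, v), w), np.dot(u, np.cross(v, w)))

print("identities hold for these vectors")
```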

A First Look at Curvilinear Coordinate Systems

In \(\RR^2\), a point whose Cartesian coordinates are \((x,y)\) could also be identified by its polar coordinates, \((r,\theta)\), where \(r\) is the length of the point’s position vector and \(\theta\) the angle between the position vector and the \(x\)-axis (as given by the vector \((1,0)\)). In fact what we are doing here is putting a subset of points of \(\RR^2\), \(\RR^2\) minus the origin (since the polar coordinates of the origin are not well defined) into 1-1 correspondence with a subset of points, \((0,\infty)\times[0,2\pi)\), of another copy of \(\RR^2\). We have a pair of coordinate functions, \(r:\RR^2\mapto\RR\) and \(\theta:\RR^2\mapto\RR\), such that \(r(x,y)=r=\sqrt{x^2+y^2}\) and \(\theta(x,y)=\theta=\tan^{-1}(y/x)\) (with the branch of \(\tan^{-1}\) chosen according to the quadrant). Note again the unfortunate notation here — \(r\) and \(\theta\) are being used to denote coordinate functions as well as the coordinates (real numbers) themselves.

Coordinates at a point give rise to basis vectors for the tangent space at that point. We’ll discuss this more rigorously later, but the basic idea is simple. Take polar coordinates as an example. If we invert the 1-1 coordinate maps, \(r=r(x,y)\) and \(\theta=\theta(x,y)\) to obtain functions \(x=x(r,\theta)\) and \(y=y(r,\theta)\) then we may consider the two coordinate curves through any point \(P(x,y)\) obtained by holding in turn \(r\) and \(\theta\) fixed whilst allowing the other to vary. The tangent vectors at \(P\) to these curves are then the basis vectors corresponding to the coordinates being varied. Let’s consider some particular examples, for which the construction is geometrically straightforward.

In the case of \(\RR^2\), consider polar coordinates at a point \(P(x,y)\).
Then we have \(x=r\cos\theta\) and \(y=r\sin\theta\). Corresponding to the \(r\) and \(\theta\) coordinates are basis vectors \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) at \(P\), pointing respectively in the directions obtained by increasing the \(r\)-coordinate holding the \(\theta\)-coordinate fixed and increasing the \(\theta\)-coordinate holding the \(r\)-coordinate fixed. We can use the scalar product to compute the relationship between the Cartesian and polar basis vectors according to,
\begin{equation}
\mathbf{e}_r=(\mathbf{e}_r\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_r\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=(\mathbf{e}_\theta\cdot\mathbf{e}_x)\mathbf{e}_x+(\mathbf{e}_\theta\cdot\mathbf{e}_y)\mathbf{e}_y,
\end{equation}
which, assuming \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) to be of unit length, result in the relations,
\begin{align}
\mathbf{e}_r&=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y\\
\mathbf{e}_\theta&=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{align}

In three dimensional space, the cylindrical coordinates of a point, \((\rho,\varphi,z)\), are related to its Cartesian coordinates by,
\begin{equation}
x=\rho\cos\varphi,\quad y=\rho\sin\varphi,\quad z=z,
\end{equation}

and it’s not difficult to check that the unit basis vectors defined at any point by the cylindrical coordinate system are related to the Cartesian basis vectors as,
\begin{align}
\mathbf{e}_\rho&=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y\\
\mathbf{e}_z&=\mathbf{e}_z.
\end{align}

The spherical polar coordinates of a point, \((r,\theta,\varphi)\), are related to its Cartesian coordinates by,
\begin{equation}
x=r\cos\varphi\sin\theta,\quad y=r\sin\varphi\sin\theta,\quad z=r\cos\theta.
\end{equation}
To relate the unit basis vectors of the spherical polar coordinate system to the Cartesian basis vectors it is easiest to first express them in terms of the cylindrical basis vectors as,
\begin{align*}
\mathbf{e}_r&=\sin\theta\mathbf{e}_\rho+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\mathbf{e}_\rho-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=\mathbf{e}_\varphi,
\end{align*}
so that,
\begin{align}
\mathbf{e}_r&=\sin\theta\cos\varphi\mathbf{e}_x+\sin\theta\sin\varphi\mathbf{e}_y+\cos\theta\mathbf{e}_z\\
\mathbf{e}_\theta&=\cos\theta\cos\varphi\mathbf{e}_x+\cos\theta\sin\varphi\mathbf{e}_y-\sin\theta\mathbf{e}_z\\
\mathbf{e}_\varphi&=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{align}
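These basis vectors can also be read off as the columns of the Jacobian of the spherical coordinate map, normalised by the scale factors \(1\), \(r\) and \(r\sin\theta\). A sketch assuming sympy (with \(\theta\in(0,\pi)\) understood):

```python
# A sketch using sympy: the unit spherical basis vectors are the Jacobian columns of
# (r, theta, varphi) -> (x, y, z) divided by the scale factors 1, r, r*sin(theta),
# and they form an orthonormal frame at each point.
import sympy as sp

r, th, ph = sp.symbols('r theta varphi', positive=True)
X = sp.Matrix([r*sp.cos(ph)*sp.sin(th), r*sp.sin(ph)*sp.sin(th), r*sp.cos(th)])
J = X.jacobian([r, th, ph])

E = sp.Matrix.hstack(J[:, 0], J[:, 1] / r, J[:, 2] / (r * sp.sin(th)))
sp.pprint(sp.simplify(E))        # columns are e_r, e_theta, e_varphi as above
sp.pprint(sp.simplify(E.T * E))  # the identity matrix: the frame is orthonormal
```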

Velocity addition in special relativity — sometimes \(1+1\neq2\)

There’s a great little book on special relativity by the physicist N. David Mermin in which he gets to the heart of the astonishing consequences of Einstein’s special relativity in a particularly elegant fashion and with only very basic mathematics. In this and the following note we’ll closely follow Mermin’s treatment. The crucial fact of life which we have to come to terms with is that whether or not two events which are spatially separated happen at the same time is a matter of perspective. This flies in the face of our intuition [1]. We’re wired to think of time as a kind of universal clock and that we and the rest of the universe march forward with its tick-tock relentlessly and in unison.

Let us begin by reconsidering the relativity of velocities. Our intuition, and Galilean relativity, tells us that if you are riding a train and throw a ball in the direction of travel then to someone stationary with respect to the tracks the speed of the ball is simply the sum of the train’s speed and the speed with which the ball  leaves your hand. But thanks to special relativity we know that, at least for light, this isn’t the case. A photon (particle of light) emitted from a moving train moves at light speed \(c\) with respect to the train and with respect to the tracks. This surely has consequences for the relativity of motion in general.

Following Mermin we employ the neat device of measuring the velocity of an object by racing it against a photon. With a corrected velocity addition rule as our goal we conduct this race on a train carriage.


The particle, black dot, whose velocity \(v\) we seek, sets off from the back of the carriage towards the front in a race with a photon which, as we know, travels at speed \(c\). We arrange that the front of the carriage is mirrored so that once the photon reaches the front it’s reflected back. The point at which the particle and photon meet is recorded (perhaps a mark is made on the floor of the carriage – this is a gedanken experiment!). At that point the particle has travelled a fraction \(1-f\) of the length of the carriage whilst the photon has travelled \(1+f\) times the length of the carriage. The ratio of those distances must be proportional to the ratio of the velocities, that is,
\begin{equation}
\frac{1-f}{1+f}=\frac{v}{c},
\end{equation}
which we can rewrite as an equation for \(f\),
\begin{equation}
f=\frac{c-v}{c+v}\label{eq:f1}.
\end{equation}
The velocity is thus established in an entirely unambiguous manner. This may strike you as a somewhat indirect approach to measuring speed but notice that we’ve avoided measuring either time or distance. As we’ll soon see, in special relativity such measurements are rather more subtle than we might imagine.

Now let’s consider the same race but from the perspective of the track frame relative to which the train carriage is travelling (left to right) with velocity \(u\).


We’re after the correct rule for adding the velocity \(v\), of the particle relative to the train, to the velocity \(u\), of the train relative to the track, to give the velocity \(w\), of the particle relative to the track. To facilitate the calculations we’ll allow ourselves to use some lengths and times. However their values aren’t important — as we’ll see they fall out of the final equation. We’re really just using their ‘existence’. As indicated in the diagram, after time \(T_0\) the photon is a distance \(D\) in front of the particle, that is,
\begin{equation}
D=cT_0-wT_0,
\end{equation}
but this distance is then also the sum of the distances covered respectively by the photon and particle in time \(T_1\),
\begin{equation}
D=cT_1+wT_1.
\end{equation}
So we can write the ratio of the times as
\begin{equation}
\frac{T_1}{T_0}=\frac{c-w}{c+w}\label{eq:time-ratio1}.
\end{equation}
If the length of the carriage in the track frame is \(L\) then we also have that the distance covered by the photon in time \(T_0\) is
\begin{equation}
cT_0=L+uT_0
\end{equation}
and in time \(T_1\) is
\begin{equation}
cT_1=fL-uT_1.
\end{equation}
Combining these we eliminate \(L\) to obtain another expression for the ratio of times,
\begin{equation}
\frac{T_1}{T_0}=f\frac{(c-u)}{(c+u)}\label{eq:time-ratio2}.
\end{equation}
The two equations, \eqref{eq:time-ratio1} and \eqref{eq:time-ratio2} provide us with a second equation for \(f\),
\begin{equation}
f=\frac{(c+u)}{(c-u)}\frac{c-w}{c+w},
\end{equation}
which in combination with the first, \eqref{eq:f1}, leads to
\begin{equation}
\frac{c-w}{c+w}=\frac{c-u}{c+u}\frac{c-v}{c+v}\label{eq:velocity-addition1},
\end{equation}
which expresses the velocity \(w\) of the particle in the track frame in terms of the velocity \(u\) of the train in the track frame and the velocity \(v\) of the particle in the train frame. With a bit more work this can be rewritten as
\begin{equation}
w=\frac{u+v}{1+uv/c^2}\label{eq:velocity-addition2},
\end{equation}
which should be compared to the Galilean addition rule, \(w=u+v\).

[Plot: Galilean versus special-relativistic velocity addition, with velocities in units of \(c\), for an object fired at a speed \(v\) from a train carriage moving at half the speed of light.]
Equation \eqref{eq:velocity-addition2} ensures that no matter how fast the particle travels with respect to the train (assuming it’s less than light speed), its velocity with respect to the track is always less than light speed. In the extreme case of a particle traveling at light speed with respect to a train which is also travelling at light speed, \(1+1=1\)!
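For completeness, here is a tiny Python sketch of the addition rule \eqref{eq:velocity-addition2}, with velocities measured in units of \(c\) (the function name is just illustrative); it makes the extreme cases above explicit.

```python
# A sketch: relativistic velocity addition, velocities in units of c.
def add_velocities(u, v):
    """Combine v (particle relative to the train) with u (train relative to the track)."""
    return (u + v) / (1 + u * v)

print(add_velocities(0.5, 0.5))   # 0.8, not the Galilean 1.0
print(add_velocities(0.5, 1.0))   # 1.0: a photon moves at c in every frame
print(add_velocities(1.0, 1.0))   # 1.0: the '1 + 1 = 1' of the extreme case
```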

Events, observers and measurements

In special relativity we often read that such and such an inertial observer measures the time between two events or such and such an inertial observer measures the distance between two events. On the face of it such assertions seem reasonably clear and straightforward and indeed very often their perspicuity is simply taken for granted. But as we’ll see their meanings in relativity are not what we’d expect and therefore it’s important to establish early on exactly what is meant by an ‘observer’, an ‘event’, and what constitutes a measurement.

The adjective ‘inertial’ in ‘inertial observer’ has been dealt with already — whatever or whoever constitutes an observer should be in free-fall. Let’s also be clear that by an ‘event’ we mean a happening, somewhere, sometime, corresponding to a point in spacetime — perhaps a photon of light leaving an emitter or being absorbed by a detector, perhaps a particle passing through a particular point in space, perhaps a time being recorded by a clock at a particular point in space. Events, points in spacetime, are real, they care nothing for coordinate systems, frames of reference etc.

When we introduced the idea of a frame of reference we vaguely mentioned a laboratory in which lengths and times could be measured. Let’s be more concrete now and imagine an inertial frame of reference as a freely floating 3-dimensional latticework of rods and clocks with one node designated as the origin.

All the rods have the same length but the clocks at each node are rather special. Like all good clocks they can of course keep time. In addition though they are programmed with their respective locations with respect to the origin, so in particular they ‘know’ their distance from the origin. Furthermore they are sophisticated recording devices ready to detect any event and record its location and time for future inspection. In particular, this allows them all to be synchronized with the clock at the origin in the following way. A flash of light is sent out from the origin just as the clock there is set to 0. The spherical light front spreads out at the same speed \(c\) in all directions. As each clock in the lattice detects this light it sets its time equal to its distance from the origin divided by \(c\) and is then ‘in sync’ with the clock at the origin. We should imagine this latticework to be ‘fine-grained’ enough to ensure that to any required accuracy a clock is located ‘at’ the spatial location of any event. This is a crucial point. The time assigned to an event, with respect to an inertial frame of reference, is always that of one of the inertial frame’s clocks at the event. The spacetime location of the event is then given by the spatial coordinates of the clock there together with the clock’s time at the moment the event happens and is recorded along with a description of what took place. This would then constitute a ‘measurement’ and the inertial ‘observer’ carrying out the measurement should be thought of as the whole latticework. An observer is better thought of as the all-seeing eye of the entire inertial frame than as somebody located at some specific point in space with a pair of binoculars and a notepad! If we do speak of an observer as a person, and it is convenient and usual to do so, then we really mean such an intelligent latticework of rods and detecting clocks with respect to which that person is at rest.

Shortly we’ll see that when two or more events at different points in space occur simultaneously with respect to one inertial observer, with respect to another they generally occur at different times. Let’s be clear though that if two or more things happen at the same place at the same time then that’s an event and as such its reality is independent of any frame of reference. All observers must agree that it took place even if they assign to it different spacetime coordinates. Sometimes this is obvious. Consider two particles colliding somewhere. Then obviously the collision either took place or it didn’t and the question is merely what spacetime coordinates should be assigned to the location in spacetime of the collision. But other times it might seem a little more confusing. We might say that an observer, let’s call ‘her’ Alice, records that two spatially separated events, for example photons arriving at two different places, occur at the same time. Recall that this really means that at each location a clock records a time corresponding to the event there and these times turn out to be the same, let’s say 2pm. Now the clock striking 2 at a location just as the event takes place there is itself an event and so will be confirmed by any other inertial observer. Let’s call Bob our other observer. He will assign his own times to the two events, and, as we’ll see, he’ll find that his clocks record different times. However, recall that clocks don’t just tell time — they also record the event — so Bob will certainly confirm that Alice’s clocks both struck 2pm as the photons arrived at those points in spacetime but Bob will conclude that Alice’s clocks aren’t synchronised since from his perspective these two events did NOT occur simultaneously!

Notes:

  1. It’s worth remarking that if we reverse the roles of space and time the corresponding conclusion is not at all surprising. We are entirely comfortable with the fact that whether or not two events which take place at different times occur at the same place is a matter of perspective.

The Hodge Dual

In this section we will assume \(V\) is a real \(n\)-dimensional vector space with a symmetric non-degenerate inner product (metric), \(g(\cdot,\cdot):V\times V\mapto\RR\). In such a vector space we can always choose an orthonormal basis, \(\{e_i\}\), and know from the classification result, Theorem~\ref{in prod class}, that such spaces are characterised up to isometry by a pair of integers, \((n,s)\), where \(s\) is the number of \(e_i\) such that \(g(e_i,e_i)=-1\).

We have seen that the dimensions of the spaces \(\Lambda^r(V)\) are given by the binomial coefficients, \({n \choose r}\). In particular, simply by virtue of having the same dimension, this means that the spaces \(\Lambda^r(V)\) and \(\Lambda^{n-r}(V)\) are isomorphic. In fact, as we shall see, the metric allows us to establish an essentially natural isomorphism between these spaces called Hodge duality.

Take any pair of pure \(r\)-vectors in \(\Lambda^r(V)\), \(\alpha=v_1\wedge\dots\wedge v_r\) and \(\beta=w_1\wedge\dots\wedge w_r\), with \(v_i,w_i\in V\). Then we can define an inner product on \(\Lambda^r(V)\) as
\begin{equation}
(\alpha,\beta)=\det(g(v_i,w_j)),
\end{equation}
where \(g(v_i,w_j)\) is regarded as the \(ij\)th entry of an \(r\times r\) matrix, and extended bilinearly to the whole of \(\Lambda^r(V)\). Since the determinant of a matrix and its transpose are identical, the inner product is symmetric. Given our orthonormal basis, \(\{e_i\}\), of \(V\), consider the inner product of the corresponding basis elements, \(e_{i_1}\wedge\dots\wedge e_{i_r}\), where \(1\leq i_1<\dots<i_r\leq n\). For two distinct such elements the matrix of inner products has a zero row, so distinct basis elements are orthogonal, while \((e_{i_1}\wedge\dots\wedge e_{i_r},e_{i_1}\wedge\dots\wedge e_{i_r})=g(e_{i_1},e_{i_1})\cdots g(e_{i_r},e_{i_r})=\pm1\).

Example Take the single basis vector of \(\Lambda^n(V)\) to be \(\sigma=e_1\wedge\dots\wedge e_n\), then \((\sigma,\sigma)=(-1)^s\).

Now whenever we have a symmetric non-degenerate inner product on some space \(U\), there is a natural isomorphism, \(U\cong U^*\), which associates to every linear functional, \(f\), on \(U\) a unique vector, \(v_f\in U\), such that \(f(u)=(v_f,u)\) for all \(u\in U\). Choose a normalised basis vector, \(\sigma\), for \(\Lambda^n(V)\) and notice that to any \(\lambda\in\Lambda^r(V)\) is associated a linear functional on \(\Lambda^{n-r}(V)\), \(f_\lambda\), according to \(\lambda\wedge\mu=f_\lambda(\mu)\sigma\). But to \(f_\lambda\) we can uniquely associate an element of \(\Lambda^{n-r}(V)\), call it \(\star\lambda\), according to \(f_\lambda(\mu)=(\star\lambda,\mu)\). \(\star\lambda\) is called the Hodge dual of \(\lambda\) and we may write,
\begin{equation}
\lambda\wedge\mu=(\star\lambda,\mu)\sigma.
\end{equation}
As a map, \(\star:\Lambda^r(V)\mapto\Lambda^{n-r}(V)\) is clearly linear.

Example Consider the 2-dimensional vector space \(\RR^2\) with the usual inner (scalar) product which we’ll here denote \(g(\cdot,\cdot)\). Denoting its standard basis vectors by \(\mathbf{e}_1\) and \(\mathbf{e}_2\), we have \(g(\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\) and a basis for \(\Lambda^2(\RR^2)\) is \(\mathbf{e}_1\wedge\mathbf{e}_2\) with \((\mathbf{e}_1\wedge\mathbf{e}_2,\mathbf{e}_1\wedge\mathbf{e}_2)=1\). Clearly, we must then have
\begin{equation}
\star1=\mathbf{e}_1\wedge\mathbf{e}_2,
\end{equation}
and
\begin{equation}
\star(\mathbf{e}_1\wedge\mathbf{e}_2)=1.
\end{equation}
\(\star\mathbf{e}_1\) must be such that \((\star\mathbf{e}_1,\mathbf{e}_1)=0\) and \((\star\mathbf{e}_1,\mathbf{e}_2)=1\), that is,
\begin{equation}
\star\mathbf{e}_1=\mathbf{e}_2,
\end{equation}
and \(\star\mathbf{e}_2\) must be such that \((\star\mathbf{e}_2,\mathbf{e}_1)=-1\) and \((\star\mathbf{e}_2,\mathbf{e}_2)=0\), so
\begin{equation}
\star\mathbf{e}_2=-\mathbf{e}_1.
\end{equation}
Notice that if we had chosen \(\mathbf{e}_2\wedge\mathbf{e}_1=-\mathbf{e}_1\wedge\mathbf{e}_2\) as the basis for \(\Lambda^2(\RR^2)\) then \(\star1=-\mathbf{e}_1\wedge\mathbf{e}_2\), \(\star(-\mathbf{e}_1\wedge\mathbf{e}_2)=1\), \(\star\mathbf{e}_1=-\mathbf{e}_2\) and \(\star\mathbf{e}_2=\mathbf{e}_1\).

Given two bases of a vector space \(V\), \(\{e_i\}\) and \(\{f_i\}\), we say that they share the same orientation if the determinant of the change of basis matrix relating them is positive. Bases of \(V\) thus belong to one of two equivalence classes. From a slightly different perspective, given the bases \(\{e_i\}\) and \(\{f_i\}\) we can form the vectors \(e_1\wedge\dots\wedge e_n\) and \(f_1\wedge\dots\wedge f_n\) both of which belong to the 1-dimensional space \(\Lambda^n(V)\) and so we must have
\begin{equation}
f_1\wedge\dots\wedge f_n=ce_1\wedge\dots\wedge e_n.
\end{equation}
We know that we must be able to express the \(f_i\) in terms of the \(e_i\) as \(f_i=T_i^je_j\) where \(T_i^j\) are the elements of the change of basis linear operator defined by \(Te_i=f_i\). But we know that,
\begin{equation}
f_1\wedge\dots\wedge f_n=T^{\wedge n}(e_1\wedge\dots\wedge e_n)=(\det T)\,e_1\wedge\dots\wedge e_n,
\end{equation}
so \(c=\det T\). In other words given a basis \(\{e_i\}\) of \(V\), another basis \(f_i\) shares the same orientation if the corresponding top exterior powers are related by a positive constant. The Hodge dual thus depends on both the metric and the orientation of a given vector space.

Example Consider the 3-dimensional space \(\RR^3\) equipped with the usual inner product, with standard basis vectors \(\mathbf{e}_1\), \(\mathbf{e}_2\) and \(\mathbf{e}_3\) and \(\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3\) as our preferred top exterior product. Then,
\begin{align}
\star1&=\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3\\
\star\mathbf{e}_1&=\mathbf{e}_2\wedge\mathbf{e}_3\\
\star\mathbf{e}_2&=\mathbf{e}_3\wedge\mathbf{e}_1\\
\star\mathbf{e}_3&=\mathbf{e}_1\wedge\mathbf{e}_2\\
\star(\mathbf{e}_1\wedge\mathbf{e}_2)&=\mathbf{e}_3\\
\star(\mathbf{e}_2\wedge\mathbf{e}_3)&=\mathbf{e}_1\\
\star(\mathbf{e}_3\wedge\mathbf{e}_1)&=\mathbf{e}_2\\
\star(\mathbf{e}_1\wedge\mathbf{e}_2\wedge\mathbf{e}_3)&=1.
\end{align}
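In this Euclidean, three-dimensional case the star operation can be written concretely in components with the Levi-Civita symbol. A numpy sketch (illustrative only; the component conventions are spelled out in the comments), which sends a vector to its dual 2-vector and back:

```python
# A sketch: the Hodge star on R^3 (Euclidean metric, orientation e1^e2^e3) in components.
import numpy as np

eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1, -1           # the Levi-Civita symbol

def star_vector(v):
    """Dual 2-vector of a vector: antisymmetric components B[j, k] = eps[i, j, k] v[i]."""
    return np.einsum('ijk,i->jk', eps, v)

def star_bivector(B):
    """Dual vector of an antisymmetric 2-vector: v[i] = (1/2) eps[i, j, k] B[j, k]."""
    return 0.5 * np.einsum('ijk,jk->i', eps, B)

e1 = np.array([1.0, 0.0, 0.0])
print(star_vector(e1))                   # components of e2^e3: B[1, 2] = 1, B[2, 1] = -1
print(star_bivector(star_vector(e1)))    # back to e1, since star.star = +1 here
```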

Let us now establish some general properties of the Hodge dual. We take an orthonormal basis of the \(n\)-dimensional space \(V\) to be \(\{e_i\}\) with top exterior form \(\sigma=ae_1\wedge\dots\wedge e_n\) with \(a=\pm1\). Then consider the pure \(r\)-vector \(e_I=e_1\wedge\dots\wedge e_r\) (no loss of generality will be incurred choosing \(I=(1,\dots,r)\)), we must then have that
\begin{equation}
\star e_I=ce_{r+1}\wedge\dots\wedge e_n=ce_J,
\end{equation}
for \(c=\pm1\) and \(J=(r+1,\dots,n)\). Of course \(c\) depends on our original choice \(a\) according to,
\begin{equation}
c=a(e_J,e_J).
\end{equation}
Consider now, \(\star e_J\), clearly
\begin{equation}
\star e_J=de_I,
\end{equation}
for some \(d=\pm1\) but since \(e_J\wedge e_I=(-1)^{r(n-r)}e_I\wedge e_J\), we have,
\begin{equation}
d=a(-1)^{r(n-r)}(e_I,e_I).
\end{equation}
We may therefore conclude that,
\begin{equation}
\star\star e_I=(-1)^{r(n-r)}(e_I,e_I)(e_J,e_J)e_I,
\end{equation}
but assuming \((\sigma,\sigma)=(-1)^s\) this is then,
\begin{equation}
\star\star e_I=(-1)^{r(n-r)+s}e_I,
\end{equation}
and by linearity we may conclude that for any \(\lambda\in\Lambda^r(V)\),
\begin{equation}
\star\star\lambda=(-1)^{r(n-r)+s}\lambda.
\end{equation}

Notice that for \(\lambda,\mu\in\Lambda^r(V)\), \(\lambda\wedge\star\mu=(\star\lambda,\star\mu)\sigma=(\star\mu,\star\lambda)\sigma=\mu\wedge\star\lambda\), that is,
\begin{equation}
\lambda\wedge\star\mu=\mu\wedge\star\lambda.
\end{equation}
But \(\mu\wedge\star\lambda=(-1)^{r(n-r)}\star\lambda\wedge\mu=(-1)^s(\lambda,\mu)\sigma\), that is,
\begin{equation}
\lambda\wedge\star\mu=\mu\wedge\star\lambda=(-1)^s(\lambda,\mu)\sigma.
\end{equation}

The Determinant Revisited

Suppose \(L:V\mapto V\) is a linear operator and consider the tensor product map \(L^{\otimes r}=L\otimes\dots\otimes L:T^r(V)\mapto T^r(V)\). Then clearly \(L^{\otimes r}\circ A=A\circ L^{\otimes r}\) so that \(L^{\otimes r}|_{\Lambda^r(V)}:\Lambda^r(V)\mapto\Lambda^r(V)\). This restriction is typically denoted \(L^{\wedge r}\). Now, as we’ve already observed, if \(V\) is an \(n\)-dimensional vector space, then \(\dim\Lambda^n(V)=1\). So any \(L^{\wedge n}\) is multiplication by a scalar. Choosing a basis, \(\{e_i\}\), of \(V\), then \(e_1\wedge\dots\wedge e_n\) is the single basis element of \(\Lambda^n(V)\), and if we write, \(Le_i=L_i^je_j\), then
\begin{equation}
L^{\wedge n}(e_1\wedge\dots\wedge e_n)=d_Le_1\wedge\dots\wedge e_n,
\end{equation}
where \(d_L\) is some scalar. But we also have,
\begin{equation}
L^{\wedge n}(e_1\wedge\dots\wedge e_n)=L_1^{i_1}\cdots L_n^{i_n}e_{i_1}\wedge\dots\wedge e_{i_n}.
\end{equation}
Now, the right hand side here is only non-zero when the set of indices \(\{i_1,\dots,i_n\}\) is precisely \(\{1,2,\dots,n\}\) and in this case
\begin{equation}
L_1^{i_1}\cdots L_n^{i_n}e_{i_1}\wedge\dots\wedge e_{i_n}=\sum_{\sigma\in S_n}\sgn(\sigma)L_1^{\sigma_1}\cdots L_n^{\sigma_n}e_1\wedge\dots\wedge e_n,
\end{equation}
in which we see precisely our original definition of the determinant,
so that \(d_L=\det L\).
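As a concrete check, the signed sum over permutations appearing here can be computed directly and compared with a library determinant; a sketch assuming numpy:

```python
# A sketch: the determinant as a signed sum over permutations, compared with numpy.
import numpy as np
from itertools import permutations

def sgn(p):
    """Sign of a permutation given as a tuple, by counting inversions."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def det_leibniz(L):
    n = L.shape[0]
    return sum(sgn(p) * np.prod([L[p[i], i] for i in range(n)]) for p in permutations(range(n)))

L = np.random.default_rng(1).standard_normal((4, 4))
print(det_leibniz(L), np.linalg.det(L))      # the two values agree
```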

Tensor Symmetries in Coordinate Representation

If \(T^{i_1\dots i_r}\) are the components of an \((r,0)\) tensor, \(T\), with respect to some basis then the symmetrization of \(T\), \(S(T)\), has components which are conventionally denoted, \(T^{(i_1\dots i_r)}\). That is, by definition,
\begin{equation}
T^{(i_1\dots i_r)}=\frac{1}{r!}\sum_{\sigma\in S_r}T^{i_{\sigma(1)}\dots i_{\sigma(r)}}.
\end{equation}
Similarly, the antisymmetrization of \(T\), \(A(T)\), has components which are conventionally denoted, \(T^{[i_1\dots i_r]}\). That is, by definition,
\begin{equation}
T^{[i_1\dots i_r]}=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)T^{i_{\sigma(1)}\dots i_{\sigma(r)}}.
\end{equation}
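A direct implementation of these two formulas, for components stored as a numpy array, might look as follows (a sketch; the helper name symmetrize is illustrative).

```python
# A sketch: (anti)symmetrised components of an (r,0) tensor held as an r-dimensional array.
import math
import numpy as np
from itertools import permutations

def sgn(p):
    """Sign of a permutation given as a tuple, by counting inversions."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def symmetrize(T, antisymmetric=False):
    r = T.ndim
    out = np.zeros_like(T)
    for p in permutations(range(r)):
        out = out + (sgn(p) if antisymmetric else 1) * np.transpose(T, p)
    return out / math.factorial(r)

T = np.random.default_rng(2).standard_normal((3, 3, 3))
S, A = symmetrize(T), symmetrize(T, antisymmetric=True)       # T^{(ijk)} and T^{[ijk]}
print(np.allclose(S, np.transpose(S, (1, 0, 2))), np.allclose(A, -np.transpose(A, (1, 0, 2))))
```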

Skew-Symmetric Tensors and the Exterior Algebra

A tensor, \(T\in T^r(V)\), is called skew-symmetric if \(P_\sigma(T)=\sgn(\sigma)T\) for all \(\sigma\in S_r\). The subspace in \(T^r(V)\) of all skew-symmetric tensors will be denoted \(\Lambda^r(V)\).

Define on \(T^r(V)\) the linear operator,
\begin{equation}
A=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)P_\sigma.
\end{equation}
This is called the antisymmetrization on \(T^r(V)\). For any \(T\in T^r(V)\), \(A(T)\) is skew-symmetric, since for any \(\tau\in S_r\),
\begin{align*}
P_\tau\left(\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)P_\sigma(T)\right)&=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)P_{\tau\sigma}(T)\\
&=\sgn(\tau)\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\tau\sigma)P_{\tau\sigma}(T)\\
&=\sgn(\tau)\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)P_{\sigma}(T)\\
&=\sgn(\tau)A(T).
\end{align*}
Conversely, suppose \(T\) is skew-symmetric, then
\begin{equation}
A(T)=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)P_\sigma(T)=\frac{1}{r!}\sum_{\sigma\in S_r}\sgn(\sigma)^2T=T,
\end{equation}
so that \(\img A=\Lambda^r(V)\) and \(A^2=A\), so that \(A\) is a projector onto \(\Lambda^r(V)\).

If \(\{e_i\}\) is a basis for the \(n\)-dimensional vector space, \(V\), then all pure tensors of the form, \(e_{i_1}\otimes\dots\otimes e_{i_r}\), form a basis of \(T^r(V)\). A standard notation is to write,
\begin{equation}
A(e_{i_1}\otimes\dots\otimes e_{i_r})=e_{i_1}\wedge\dots\wedge e_{i_r}.
\end{equation}
The symbol \(\wedge\) is called the exterior or wedge product. Since by definition, two pure tensors,
\begin{equation*}
e_{i_1}\otimes\dots\otimes e_{i_j}\otimes\dots\otimes e_{i_k}\otimes\dots\otimes e_{i_r},
\end{equation*}
and
\begin{equation*}
e_{i_1}\otimes\dots\otimes e_{i_k}\otimes\dots\otimes e_{i_j}\otimes\dots\otimes e_{i_r},
\end{equation*}
differing only by the interchange of the pair \(e_{i_j}\) and \(e_{i_k}\), are related by a permutation \(\sigma\) with \(\sgn(\sigma)=-1\), we have,
\begin{equation*}
e_{i_1}\wedge\dots\wedge e_{i_j}\wedge\dots\wedge e_{i_k}\wedge\dots\wedge e_{i_r}=-e_{i_1}\wedge\dots\wedge e_{i_k}\wedge\dots\wedge e_{i_j}\wedge\dots\wedge e_{i_r}.
\end{equation*}
In particular, if \(i_j=i_k\) for some \(j\neq k\), then \(e_{i_1}\wedge\dots\wedge e_{i_r}=0\). It also follows that \(\Lambda^r(V)\) is spanned by tensors of the form, \(e_{i_1}\wedge\dots\wedge e_{i_r}\), such that \(1\leq i_1<\dots<i_r\leq n\). But these are also clearly linearly independent, since distinct \(e_{i_1}\wedge\dots\wedge e_{i_r}\) are linear combinations of non-intersecting subsets of basis elements of \(T^r(V)\). It follows then that,
\begin{equation}
\dim\Lambda^r(V)={n\choose r},
\end{equation}
with \(\dim\Lambda^n(V)=1\). We define,
\begin{equation}
\Lambda(V)=\bigoplus_{r=0}^n\Lambda^r(V),
\end{equation}
(\(\dim\Lambda(V)=2^n\)) and introduce a multiplication according to \(T_1\wedge T_2=A(T_1\otimes T_2)\) for any \(T_1\in\Lambda^r(V)\) and \(T_2\in\Lambda^s(V)\). Then for any \(T_1\in T^r(V)\), \(T_2\in T^s(V)\) and \(T_3\in T^t(V)\),
\begin{align*}
(T_1\wedge T_2)\wedge T_3&=A(A(T_1\otimes T_2)\otimes T_3)\\
&=A\left(\frac{1}{(r+s)!}\sum_{\sigma\in S_{r+s}}\sgn(\sigma)P_\sigma(T_1\otimes T_2)\otimes T_3\right)\\
&=\frac{1}{(r+s)!}\sum_{\sigma\in S_{r+s}}\sgn(\sigma)A(P_\sigma(T_1\otimes T_2)\otimes T_3)\\
&=\frac{1}{(r+s)!}\sum_{\sigma\in S_{r+s}}\sgn(\sigma)^2A(T_1\otimes T_2\otimes T_3)\\
&=A(T_1\otimes T_2\otimes T_3),
\end{align*}
and similarly for \(T_1\wedge(T_2\wedge T_3)=A(T_1\otimes T_2\otimes T_3)\). We conclude, therefore, that the wedge product is associative. Also, for any \(T_1\in T^r(V)\) and \(T_2\in T^s(V)\), we have, \(A(T_1\otimes T_2)=(-1)^{rs}A(T_2\otimes T_1)\) so, in particular, \(T_1\wedge T_2=(-1)^{rs}T_2\wedge T_1\), for any \(T_1\in\Lambda^r(V)\) and \(T_2\in\Lambda^s(V)\).

As with the symmetric algebra, let us now realise \(\Lambda(V)\) as a quotient of the tensor algebra \(T(V)\).

Definition The exterior algebra on the vector space \(V\) over the field \(K\) is the quotient \(T(V)/J\) of the tensor algebra \(T(V)\) by the ideal \(J\) generated by the elements \(v\otimes v\) for all \(v\in V\).

As in the symmetric algebra case, we define \(J^r=T^r(V)\cap J\) so that \(J=\bigoplus_{r=0}^\infty J^r\). Then defining, \(\tilde{\Lambda}(V)=T(V)/J\), it follows as before that, \(\tilde{\Lambda}(V)=\bigoplus_{r=0}^\infty T^r(V)/J^r\). Thus we define, \(\tilde{\Lambda}^r(V)=T^r(V)/J^r\), and seek to relate this to \(\Lambda^r(V)\) defined above.

An alternative definition of the ideal \(J\), is as the ideal generated by the elements, \(u\otimes v+v\otimes u\), for any \(u,v\in V\). The equivalence of these definitions amounts to observing that for any \(u,v\in V\),
\begin{equation*}
(u+v)\otimes(u+v)-u\otimes u-v\otimes v=u\otimes v+v\otimes u.
\end{equation*}
Then, by an argument similar to the one we used in the symmetric case, this ideal is equivalent to the ideal generated by \(T-\sgn(\sigma)P_\sigma(T)\) for all \(T\in T^r(V)\) and any \(\sigma\in S_r\). Once again abusing notation, we’ll denote the product of two elements of \(\tilde{\Lambda}(V)\), \((T_1+J)(T_2+J)\), as \(T_1\wedge T_2\) rather than \(T_1\otimes T_2+J\). Thus the image in, \(\tilde{\Lambda}(V)\), of some pure tensor, \(v_1\otimes\dots\otimes v_r\in T^r(V)\), is denoted \(v_1\wedge\dots\wedge v_r\). Then since, \(T-\sgn(\sigma)P_\sigma(T)\in J\), it follows that, \(T_1\wedge T_2=(-1)^{rs}T_2\wedge T_1\), for any \(T_1\in\tilde{\Lambda}^r(V)\) and \(T_2\in\tilde{\Lambda}^s(V)\).

Just as in the symmetric case, skew-symmetric tensors and the exterior algebra inherit universal properties from the tensor product and tensor algebra respectively. The proofs follow those of the symmetric case.

Proposition If \(\iota\) is the \(r\)-linear function, \(\iota:V\times\dots\times V\mapto\tilde{\Lambda}^r(V)\), defined as \(\iota(v_1,\dots,v_r)=v_1\wedge\dots\wedge v_r\), then \((\tilde{\Lambda}^r(V),\iota)\) has the following universal mapping property: whenever \(f:V\times\dots\times V\mapto W\) is an alternating\footnote{We already met alternating forms in our earlier discussion of determinants, the only generalisation here is that the target space is another vector space.} \(r\)-linear function with values in a vector space \(W\) there exists a unique linear map \(L:\tilde{\Lambda}^r(V)\mapto W\) such that \(f=L\iota\).

Consequently, the space of linear maps \(\mathcal{L}(\tilde{\Lambda}^r(V),W)\) is isomorphic to the vector space of alternating \(r\)-linear functions from \(V\times\dots\times V\) to \(W\) and in particular \(\tilde{\Lambda}^r(V)^*\), the dual space of \(\tilde{\Lambda}^r(V)\), is isomorphic to the space of all alternating \(r\)-linear forms on \(V\times\dots\times V\).

Given a basis, \(\{e_i\}\), of \(V\), the pure tensors \(e_{i_1}\otimes\dots\otimes e_{i_r}\) form a basis of \(T^r(V)\), and so the \(e_{i_1}\wedge\dots\wedge e_{i_r}\) span \(\tilde{\Lambda}^r(V)\). In fact it's clear that this space is already spanned by the set of \(e_{i_1}\wedge\dots\wedge e_{i_r}\) with \(1\leq i_1<i_2<\dots<i_r\leq n\). Arguing just as in the symmetric case, one finds that \(J^r=\ker A\), so that \(T^r(V)=\Lambda^r(V)\oplus J^r\) and \(\tilde{\Lambda}^r(V)\cong\Lambda^r(V)\), and indeed \(\tilde{\Lambda}(V)\cong\Lambda(V)\).

Remark In the case of \(r=2\), we clearly have \(A+S=\id_{T^2(V)}\) and \(AS=0\), so that
\begin{equation}
T^2(V)=S^2(V)\oplus\Lambda^2(V).
\end{equation}
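
For \(r=2\) this is just the familiar decomposition of a square array of components into its symmetric and antisymmetric parts. A quick numerical illustration (a throwaway Python/numpy sketch of mine):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 3))      # components of a generic element of T^2(V), n = 3

sym  = (T + T.T) / 2                 # S(T), lies in S^2(V)
skew = (T - T.T) / 2                 # A(T), lies in Lambda^2(V)

assert np.allclose(sym + skew, T)    # A + S = id on T^2(V)
assert np.allclose(sym, sym.T) and np.allclose(skew, -skew.T)
\end{verbatim}

The dimensions also add up: \({n+1\choose 2}+{n\choose 2}=n^2\).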

Remark The elements of \(\Lambda^r(V)\) are called \(r\)-vectors. An \(r\)-vector which can be written \(v_1\wedge\dots\wedge v_r\) for some \(v_i\in V\) will be called a pure \(r\)-vector.

Symmetric Tensors and the Symmetric Algebra

In this and the next section we will identify the symmetric and skew-symmetric tensors within \(T(V)\) and demonstrate that, with a suitably defined multiplication, each forms an associative algebra in its own right. In both cases we’ll then realise these algebras as quotients of \(T(V)\).

For any permutation \(\sigma\in S_r\), denote by \(P_\sigma:T^r(V)\mapto T^r(V)\) the linear operator defined on pure tensors by \(P_\sigma(v_1\otimes\dots\otimes v_r)=v_{\sigma(1)}\otimes\dots\otimes v_{\sigma(r)}\). A tensor, \(T\in T^r(V)\), is called symmetric if \(P_\sigma(T)=T\) for all \(\sigma\in S_r\). The subspace in \(T^r(V)\) of all symmetric tensors will be denoted \(S^r(V)\).

Consider the linear operator, \(S\), on \(T^r(V)\), defined as,
\begin{equation}
S=\frac{1}{r!}\sum_{\sigma\in S_r}P_\sigma.
\end{equation}
This is called the symmetrization on \(T^r(V)\). For any permutation, \(\sigma\in S_r\), \(P_\sigma S=S\), so for any \(T\in T^r(V)\), \(S(T)\) is symmetric. Conversely, it is clear that if \(T\) is symmetric, then \(S(T)=T\). Thus \(\img S=S^r(V)\) and \(S^2=S\), so \(S\) is a projector onto \(S^r(V)\).

Example Consider a 2-dimensional vector space \(V\) over \(\CC\) with basis \(\{e_1,e_2\}\). Then on the natural basis of \(T^3(V)\), we have
\begin{align*}
S(e_1\otimes e_1\otimes e_1)&=e_1\otimes e_1\otimes e_1\\
S(e_1\otimes e_1\otimes e_2)&=\frac{1}{3}(e_1\otimes e_1\otimes e_2+e_1\otimes e_2\otimes e_1+e_2\otimes e_1\otimes e_1)\\
S(e_1\otimes e_2\otimes e_1)&=\frac{1}{3}(e_1\otimes e_1\otimes e_2+e_1\otimes e_2\otimes e_1+e_2\otimes e_1\otimes e_1)\\
S(e_2\otimes e_1\otimes e_1)&=\frac{1}{3}(e_1\otimes e_1\otimes e_2+e_1\otimes e_2\otimes e_1+e_2\otimes e_1\otimes e_1)\\
S(e_1\otimes e_2\otimes e_2)&=\frac{1}{3}(e_1\otimes e_2\otimes e_2+e_2\otimes e_1\otimes e_2+e_2\otimes e_2\otimes e_1)\\
S(e_2\otimes e_1\otimes e_2)&=\frac{1}{3}(e_1\otimes e_2\otimes e_2+e_2\otimes e_1\otimes e_2+e_2\otimes e_2\otimes e_1)\\
S(e_2\otimes e_2\otimes e_1)&=\frac{1}{3}(e_1\otimes e_2\otimes e_2+e_2\otimes e_1\otimes e_2+e_2\otimes e_2\otimes e_1)\\
S(e_2\otimes e_2\otimes e_2)&=e_2\otimes e_2\otimes e_2
\end{align*}
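
The table above can be generated mechanically. Here is a minimal sketch (Python/numpy; the function name symmetrize is my own) of the symmetrization operator acting on a basis tensor of \(T^3(V)\) with \(n=2\), together with a check that \(S\) is a projector.

\begin{verbatim}
import itertools, math
import numpy as np

def symmetrize(T):
    # S(T) = (1/r!) sum_sigma P_sigma(T), for an order-r component array T
    r = T.ndim
    out = sum(np.transpose(T, p) for p in itertools.permutations(range(r)))
    return out / math.factorial(r)

n = 2
e = np.eye(n)

# components of e_1 (x) e_1 (x) e_2
T = np.einsum('i,j,k->ijk', e[0], e[0], e[1])
ST = symmetrize(T)

# the three permutations of (1,1,2) each appear with coefficient 1/3
print(ST[0, 0, 1], ST[0, 1, 0], ST[1, 0, 0])   # 1/3 each; all other entries are 0

# S is a projector onto S^r(V): S(S(T)) = S(T)
assert np.allclose(symmetrize(ST), ST)
\end{verbatim}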

Let us consider the dimension of \(S^r(V)\). If \(\{e_i\}\) is a basis of the \(n\)-dimensional vector space, \(V\), then all pure tensors of the form, \(e_{i_1}\otimes\dots\otimes e_{i_r}\), form a basis of \(T^r(V)\). A standard notation is to write,
\begin{equation}
S(e_{i_1}\otimes\dots\otimes e_{i_r})=e_{i_1}\dots e_{i_r}.
\end{equation}
The tensors, \(e_{i_1}\dots e_{i_r}\), clearly span \(S^r(V)\), but, as the Example makes clear, \(e_{i_1}\dots e_{i_r}=e_{j_1}\dots e_{j_r}\) whenever \((j_1,\dots,j_r)\) is a permutation of \((i_1,\dots,i_r)\). In other words, \(e_{i_1}\dots e_{i_r}\) only depends on the number of times each \(e_i\) appears in the product, so we can write \(e_{i_1}\dots e_{i_r}=e_1^{a_1}\dots e_n^{a_n}\) where \(a_i\) is the multiplicity of \(e_i\) in \(e_{i_1}\dots e_{i_r}\), and \(a_1+\dots+a_n=r\). It is then clear that the tensors, \(e_1^{a_1}\dots e_n^{a_n}\), are linearly independent — for distinct \(n\)-tuples, \((a_1,\dots,a_n)\) and \((b_1,\dots, b_n)\), \(e_1^{a_1}\dots e_n^{a_n}\) and \(e_1^{b_1}\dots e_n^{b_n}\) are linear combinations of non-intersecting subsets of basis elements of \(T^r(V)\). Thus the \(e_1^{a_1}\dots e_n^{a_n}\) are a basis for \(S^r(V)\) and so to determine the dimension of \(S^r(V)\), we must count the number of distinct \(n\)-tuples \((a_1,\dots,a_n)\), \(a_i\in\ZZ_{\geq0}\), such that \(a_1+\dots+a_n=r\). A nice way of understanding this counting problem is through Feller’s `stars and bars’. Suppose \(r=8\) and \(n=5\) so that we wish to determine the dimension of the space \(S^8(V)\) where \(V\) is a \(5\)-dimensional vector space. Then each valid \(5\)-tuple corresponds to a diagram such as,
\begin{equation*}
|\star\star||\star\star|\star\star\star|\star|,
\end{equation*}
in which, reading left to right, the number of stars between the \(i\)th pair of bars corresponds to \(a_i\), so in particular, this example corresponds to \((2,0,2,3,1)\). We therefore need to count the number of possible arrangements of \(n-1\) bars and \(r\) stars, or in other words, the number of ways of choosing \(r\) star locations from the \(n+r-1\) possible locations. Thus we have that for an \(n\)-dimensional vector space \(V\),
\begin{equation}
\dim S^r(V)={n+r-1\choose r}.
\end{equation}
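
This count is easy to sanity-check by brute force for small \(n\) and \(r\); a throwaway Python sketch (the enumeration below is mine, not part of the text):

\begin{verbatim}
import itertools, math

def num_monomials(n, r):
    # number of n-tuples (a_1, ..., a_n) of non-negative integers with a_1 + ... + a_n = r
    return sum(1 for a in itertools.product(range(r + 1), repeat=n) if sum(a) == r)

n, r = 5, 8
assert num_monomials(n, r) == math.comb(n + r - 1, r)   # both give 495 for n = 5, r = 8
\end{verbatim}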

On the space, \(S(V)=\bigoplus_{r=0}^\infty S^r(V)\), we can define a multiplication according to, \(T_1\cdot T_2=S(T_1\otimes T_2)\), for any \(T_1\in S^r(V)\) and \(T_2\in S^s(V)\). Equipped with this multiplication \(S(V)\) becomes a commutative associative algebra. That the product is commutative is clear. That it is associative follows since, for any \(T_1\in S^r(V)\), \(T_2\in S^s(V)\) and \(T_3\in S^t(V)\),
\begin{align*}
(T_1\cdot T_2)\cdot T_3&=S(S(T_1\otimes T_2)\otimes T_3)\\
&=S\left(\frac{1}{(r+s)!}\sum_{\sigma\in S_{r+s}}P_\sigma(T_1\otimes T_2)\otimes T_3\right)\\
&=\frac{1}{(r+s)!}\sum_{\sigma\in S_{r+s}}S(P_\sigma(T_1\otimes T_2)\otimes T_3)\\
&=S(T_1\otimes T_2\otimes T_3),
\end{align*}
and similarly for \(T_1\cdot(T_2\cdot T_3)=S(T_1\otimes T_2\otimes T_3)\).

As already discussed, the tensor product provides a multiplication on the space \(T(V)=\bigoplus_{r=0}^\infty T^r(V)\) such that it becomes an associative algebra with identity. Moreover, by virtue of its universal mapping property we should expect to be able to realise the algebra \(S(V)\) as a quotient of \(T(V)\) by a certain ideal\footnote{Recall that a subspace \(I\) of an algebra \(A\) is an ideal of \(A\) if for all \(a\in A\) and all \(i\in I\), \(ai\in I\) and \(ia\in I\). In this case the space \(A/I\) is an algebra with the multiplication inherited from \(A\) according to \((a+I)\cdot(b+I)=ab+I\).}, \(I\). Indeed, if \(\pi:T(V)\mapto T(V)/I\) is the quotient map, then for \(u,v\in V\) we’ll want that \(\pi(u\otimes v)=\pi(v\otimes u)\) in \(T(V)/I\), that is, we’ll want \(u\otimes v-v\otimes u\in I\). We are led, therefore, to the following definition.

Definition The symmetric algebra on the vector space \(V\) over the field \(K\) is the quotient \(T(V)/I\) of the tensor algebra \(T(V)\) by the ideal \(I\) generated by the elements \(u\otimes v-v\otimes u\) for all \(u,v\in V\).

Let us denote by \(\tilde{S}(V)\) the symmetric algebra as defined here. Then defining \(I^r=T^r(V)\cap I\), it’s not difficult to see that \(I=\bigoplus_{r=0}^\infty I^r\). In fact, \(\tilde{S}(V)=\bigoplus_{r=0}^\infty T^r(V)/I^r\), which we can see by observing that the linear map defined as \(\sum_r(T_r+I^r)\mapsto \sum_rT_r+I\), where \(T_r\in T^r(V)\), is clearly surjective and is injective since if \(\sum_rT_r\in I\) then each \(T_r\in I^r\). Thus, setting, \(\tilde{S}^r(V)=T^r(V)/I^r\), so that \(\tilde{S}(V)=\bigoplus_{r=0}^\infty\tilde{S}^r(V)\), we will want to establish that \(\tilde{S}^r(V)\) and \(S^r(V)\) are isomorphic, from which, it will immediately follow that \(\tilde{S}(V)\) and \(S(V)\) are isomorphic.

There is an alternative description of the ideal, \(I\). Let us denote by, \(I’\), the ideal generated by all elements, \(T-P_\sigma(T)\), for any tensor, \(T\in T^r(V)\), and any permutation, \(\sigma\in S_r\). Now for any pure tensor, \(v_1\otimes\dots\otimes v_r\),
\begin{equation*}
v_1\otimes\dots\otimes v_r-v_{\sigma(1)}\otimes\dots\otimes v_{\sigma(r)},
\end{equation*}
can be written as a sum of terms of the form
\begin{equation*}
v_1\otimes\dots\otimes v_{i_1}\otimes v_{i_1+1}\otimes\dots\otimes v_r-v_1\otimes\dots\otimes v_{i_1+1}\otimes v_{i_1}\otimes\dots\otimes v_r,\end{equation*}
in which only neighbouring factors are transposed. Since each of these clearly belongs to the ideal, \(I\), as originally defined, it follows that \(I’\subseteq I\). The reverse inclusion is obvious so we have \(I’=I\).
In particular, \(T(V)/I\) is commutative since for any pure tensors, \(T_1,T_2\in T(V)\), \(T_2\otimes T_1=P_\sigma(T_1\otimes T_2)\), for some permutation, \(\sigma\), so that, \(T_1\otimes T_2-T_2\otimes T_1=T_1\otimes T_2-P_\sigma(T_1\otimes T_2)\in I\). That is, \((T_1+I)(T_2+I)=T_1\otimes T_2+I=T_2\otimes T_1+I=(T_2+I)(T_1+I)\).

Abusing notation, for any \(v_1\otimes\dots\otimes v_r\in T^r(V)\), \(v_i\in V\), let us write its image in \(\tilde{S}^r(V)\), via the quotient map, \(\pi\), as \(v_1\cdots v_r\). Recall that the tensor product was defined via a universal mapping property. In particular, whenever we have an \(r\)-linear function \(f:V\times\dots\times V\mapto W\), where \(W\) is some vector space, then there is a unique linear mapping \(L:T^r(V)\mapto W\) such that \(f=L\iota\) where \(\iota \) was the \(r\)-linear function \(\iota(v_1,\dots,v_r)=v_1\otimes\dots\otimes v_r\). This leads to the following result for \(\tilde{S}^r(V)\).

Proposition If \(\iota\) is the \(r\)-linear function, \(\iota:V\times\dots\times V\mapto\tilde{S}^r(V)\), defined as, \(\iota(v_1,\dots,v_r)=v_1\cdots v_r\), then \((\tilde{S}^r(V),\iota)\) has the following universal mapping property: whenever \(f:V\times\dots\times V\mapto W\) is a symmetric \(r\)-linear function with values in a vector space \(W\) there exists a unique linear map \(L:\tilde{S}^r(V)\mapto W\) such that \(f=L\iota\).

Proof From the universal mapping property of the tensor product we have a map \(L’:T^r(V)\mapto W\) such that on pure tensors \(L'(v_1\otimes\dots\otimes v_r)=f(v_1,\dots,v_r)\). But since \(f(v_1,\dots,v_i,v_{i+1},\dots,v_r)=f(v_1,\dots,v_{i+1},v_i,\dots,v_r)\) it is clear that for any \(T\in I^r\), \(L'(T)=0\), and so \(L’\) factorises as, \(L’=L\pi\), where \(\pi\) is the quotient map \(\pi:T^r(V)\mapto T^r(V)/I^r\) and \(L:T^r(V)/I^r\mapto W\) is the desired linear map.\(\blacksquare\)

As a consequence, we have that the space of linear maps \(\mathcal{L}(\tilde{S}^r(V),W)\) is isomorphic to the vector space of symmetric \(r\)-linear functions from \(V\times\dots\times V\) to \(W\) and in particular that \(\tilde{S}^r(V)^*\), the dual space of \(\tilde{S}^r(V)\), is isomorphic to the space of all symmetric \(r\)-linear forms on \(V\times\dots\times V\).

Recall that the tensor algebra, \(T(V)\), has a universal mapping property whereby whenever \(f:V\mapto A\) is a linear map from \(V\) into an associative algebra \(A\) with identity there exists a unique algebra homomorphism, \(F:T(V)\mapto A\), with \(F(1)=1\) and such that \(F(v)=f(v)\) with \(F(v_1\otimes\dots\otimes v_r)=f(v_1)\cdots f(v_r)\). This leads to the following result for \(\tilde{S}(V)\).

Proposition If \(\iota\) is the linear map embedding \(V\) in \(T(V)\) then \((\tilde{S}(V),\iota)\) has the following universal mapping property: whenever \(f:V\mapto A\) is a linear map from \(V\) into a commutative associative algebra \(A\) with identity, there exists a unique algebra homomorphism, \(F:\tilde{S}(V)\mapto A\), with \(F(1)=1\) such that \(F(v)=f(v)\) with \(F(v_1\cdots v_r)=f(v_1)\cdots f(v_r)\).

Proof From the universal mapping property of the tensor algebra we have an algebra homomorphism \(F’:T(V)\mapto A\) such that \(F'(v)=f(v)\) and since \(A\) is commutative we have \(F'(u\otimes v-v\otimes u)=0\), so \(I\subseteq\ker F’\) and \(F’\) factorises as \(F’=F\pi\) where \(\pi\) is the quotient map \(\pi:T(V)\mapto T(V)/I\) and \(F:T(V)/I\mapto A\) is the desired algebra homomorphism.\(\blacksquare\)

Now if \(\{e_i\}\) is a basis of \(V\), then \(r\)-fold (tensor) products of the \(e_i\), \(e_{i_1}\otimes\cdots\otimes e_{i_r}\), span \(T^r(V)\). But since \(T(V)/I\) is commutative, this means that the elements, \(e_1^{a_1}\cdots e_n^{a_n}\), such that \(a_1+\dots+a_n=r\), must span \(\tilde{S}^r(V)\). Now recall that \(S^r(V)\) was defined as the image of the symmetrization operator, \(S\), on \(T^r(V)\), and that \(S\) is a projector. This means that \(T^r(V)=\ker S\oplus\img S=\ker S\oplus S^r(V)\). Clearly any element of \(I^r\) belongs to \(\ker S\), so \(I^r\subseteq\ker S\). But if there were some \(T\in\ker S\) such that \(T\notin I^r\) then \(\pi(T)\neq0\) and we must be able to express \(\pi(T)\) as a linear combination of the \(e_1^{a_1}\cdots e_n^{a_n}\). Thus, using these same linear coefficients, we can choose a tensor, \(T’\in T^r(V)\), as a linear combination of pure tensors of the form,
\begin{equation*}
\underbrace{e_1\otimes\dots\otimes e_1}_{a_1}\otimes\dots\otimes\underbrace{e_n\otimes\dots\otimes e_n}_{a_n},
\end{equation*}
each tensor in this linear combination corresponding to a distinct \(n\)-tuple, \((a_1,\dots,a_n)\), such that \(\pi(T)=\pi(T’)\). Then, \(T-T’\in I^r\), so \(S(T)=S(T’)\).
But \(S(T’)\) cannot be zero, since the symmetrizations of the distinct pure tensors in the linear combination \(T’\) are non-zero linear combinations of non-intersecting subsets of basis elements of \(T^r(V)\). Thus, \(S(T)\neq0\), contradicting our initial assumption. It follows that \(\ker S=I^r\) and we have established that \begin{equation}
T^r(V)=S^r(V)\oplus I^r.
\end{equation}
In particular, this means that \(\dim T^r(V)/I^r=\dim S^r(V)\), so that the elements, \(e_1^{a_1}\cdots e_n^{a_n}\), such that \(a_1+\dots+a_n=r\), are a basis for \(\tilde{S}^r(V)\) and of course that \(\tilde{S}^r(V)\cong S^r(V)\), the isomorphism \(T+I^r\mapsto S(T)\) mapping basis elements to basis elements in the way already anticipated by our abuse of notation. This clearly extends to a (grade preserving) algebra isomorphism \(\tilde{S}(V)\cong S(V)\).

Component Representation of Tensors

Let \(\{e_i\}\) be a basis for \(V\) with \(\{e^i\}\) the dual basis of \(V^*\). Then any tensor, \(T\), of type \((r,s)\) can be expressed as the linear combination,
\begin{equation}
T=\sum_{\substack{i_1,\dots,i_r\\j_1,\dots,j_s}}T^{i_1\dots i_r}_{j_1\dots j_s}e_{i_1}\otimes\dots\otimes e_{i_r}\otimes e^{j_1}\otimes\dots\otimes e^{j_s},
\end{equation}
or, employing the summation convention,
\begin{equation}
T=T^{i_1\dots i_r}_{j_1\dots j_s}e_{i_1}\otimes\dots\otimes e_{i_r}\otimes e^{j_1}\otimes\dots\otimes e^{j_s},
\end{equation}
with the \(T^{i_1\dots i_r}_{j_1\dots j_s}\) the components of \(T\) with respect to the chosen basis of \(V\). In physics literature it is common for the collection of components, \(T^{i_1\dots i_r}_{j_1\dots j_s}\), to be actually referred to as “the” tensor. If we were to choose another basis for \(V\), say \(\{e’_i\}\), related to the first according to, \(e’_i=A_i^je_j\), then the new dual basis, \(\{e’^i\}\), is related to the old one by, \(e’^i=(A^{-1})^i_je^j\) (\(e’^i(e’_j)=(A^{-1})^i_ke^k(A_j^le_l)=(A^{-1})^i_kA_j^l\delta_l^k=\delta_j^i\)). With respect to this new pair of dual bases, the tensor \(T\) is given by
\begin{equation}
T=T^{i_1\dots i_r}_{j_1\dots j_s}(A^{-1})^{k_1}_{i_1}\cdots(A^{-1})^{k_r}_{i_r}A_{l_1}^{j_1}\cdots A_{l_s}^{j_s}e’_{k_1}\otimes\dots\otimes e’_{k_r}\otimes e’^{l_1}\otimes\dots\otimes e’^{l_s},
\end{equation}
so that the components of \(T\) with respect to the new basis, \({T’}^{k_1\dots k_r}_{l_1\dots l_s}\) say, are given by
\begin{equation}
{T’}^{k_1\dots k_r}_{l_1\dots l_s}=T^{i_1\dots i_r}_{j_1\dots j_s}(A^{-1})^{k_1}_{i_1}\cdots(A^{-1})^{k_r}_{i_r}A_{l_1}^{j_1}\cdots A_{l_s}^{j_s}.
\end{equation}
When treating tensors “as” their components the question naturally arises of how to distinguish between the components of a single tensor with respect to different bases. The usual approach, sometimes called kernel-index notation, keeps a “kernel” letter indicating the tensor, with primes on the indices indicating that the components are taken with respect to another basis. For example, in the case of a vector \(v\), \(v^i\) and \(v^{i’}\) denote the same vector expressed with respect to two different bases, the components being related according to \(v^{i’}=(A^{-1})^{i’}_iv^i\).

A vector is sometimes defined as an object whose components transform in this way, that is, contravariantly (with the inverse of the matrix relating the basis vectors) 1. Likewise, a covariant vector, with components \(v_i\) with respect to the dual basis \(e^i\), transforms according to \(v_{i’}=A_{i’}^{i}v_i\). More generally, tensors of type \((r,s)\) are then defined to be objects whose components, \(T^{i_1\dots i_r}_{j_1\dots j_s}\), carrying \(r\) upper and \(s\) lower indices, transform as you’d expect based on the `upstairs’ or `downstairs’ position of those indices, as
\begin{equation}
T^{i’_1\dots i’_r}_{j’_1\dots j’_s}=T^{i_1\dots i_r}_{j_1\dots j_s}(A^{-1})^{i’_1}_{i_1}\cdots(A^{-1})^{i’_r}_{i_r}A_{j’_1}^{j_1}\cdots A_{j’_s}^{j_s}.
\end{equation}
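
In practice this transformation law is conveniently implemented with numpy's einsum. The sketch below (an illustration of mine, not from the text) transforms the components of a \((1,1)\) tensor and checks, first, that the contraction of a contravariant with a covariant vector is basis independent and, second, that for a \((1,1)\) tensor the transformation is just a similarity transformation of the component matrix.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))     # A[i, j] = A_i^j, so e'_i = A_i^j e_j (invertible almost surely)
A_inv = np.linalg.inv(A).T          # A_inv[i, j] = (A^{-1})^i_j in the notation of the text

v = rng.standard_normal(n)          # contravariant components v^i
a = rng.standard_normal(n)          # covariant components a_i
v_new = np.einsum('ki,i->k', A_inv, v)   # v'^k = (A^{-1})^k_i v^i
a_new = np.einsum('kj,j->k', A, a)       # a'_k = A_k^j a_j
assert np.isclose(v_new @ a_new, v @ a)  # the contraction v^i a_i is basis independent

T = rng.standard_normal((n, n))     # components T^i_j of a (1,1) tensor
T_new = np.einsum('ki,lj,ij->kl', A_inv, A, T)   # T'^k_l = (A^{-1})^k_i A_l^j T^i_j
assert np.allclose(T_new, A_inv @ T @ np.linalg.inv(A_inv))
\end{verbatim}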

Recall the notion of contraction. If we have a tensor of type \((r,s)\), \(T^{i_1\dots i_r}_{j_1\dots j_s}\), then contraction over the \(a\)th upper and \(b\)th lower index corresponds to forming a new \((r-1,s-1)\) tensor, \(S\), with components
\begin{equation}
S^{i_1\dots i_{a-1}i_{a+1}\dots i_r}_{j_1\dots j_{b-1}j_{b+1}\dots j_s}=T^{i_1\dots i_{a-1}ki_{a+1}\dots i_r}_{j_1\dots j_{b-1}kj_{b+1}\dots j_s},
\end{equation}
the repeated index \(k\) being summed. We say we have contracted over the \((i_a,j_b)\) pair of indices.
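
In terms of component arrays, contraction is again a one-liner with einsum (the arrays below are examples of mine):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n = 3
T = rng.standard_normal((n, n, n))       # components T^i_{jk} of a (1,2) tensor

S = np.einsum('iik->k', T)               # contract i_1 with j_1: S_k = T^i_{ik}
assert S.shape == (n,)

M = rng.standard_normal((n, n))          # for a (1,1) tensor the contraction is just the trace
assert np.isclose(np.einsum('ii->', M), np.trace(M))
\end{verbatim}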

If the underlying vector space is equipped with a symmetric, non-degenerate inner product then this inner product can be regarded as a \((0,2)\) tensor. This is called the metric tensor, conventionally denoted \(g\). With respect to a given basis it has components, \(g_{ij}\), which are of course the elements of what we previously called the Gram matrix. The inner product provides us with a natural isomorphism \(V\mapto V^*\) such that \(v\mapsto \alpha_v\) with \(\alpha_v(w)=(v,w)\) for any \(w\in V\), that is, to uniquely associate a covariant vector with each contravariant vector and vice versa. In terms of a basis \(e_i\) of \(V\) with dual basis \(e^i\) of \(V^*\), we have \(e_i\mapsto\alpha_{e_i}\) which we could write as \(\alpha_{e_i}=\alpha_{ij}e^j\) with the \(\alpha_{ij}\) determined by \(\alpha_{e_i}(e_j)=g_{ij}=\alpha_{ik}e^k(e_j)=\alpha_{ij}\). So an arbitrary vector \(v^ie_i\) is mapped to \(v^ig_{ij}e^j\), or in other words, by applying the metric tensor to the contravariant vector \(v^i\) we obtain the covariant vector \(v_i\) given by
\begin{equation}
v_i=g_{ij}v^j.
\end{equation}
In the other direction, we have the inverse map, \(V^*\mapto V\), which we’ll write as \(e^i\mapsto g^{ij}e_j\) with the \(g^{ij}\) determined by \(v^ig_{ij}e^j\mapsto v^ig_{ij}g^{jk}e_k=v^ie_i\). That is,
\begin{equation}
g_{ij}g^{jk}=\delta_i^k,
\end{equation}
that is, \(g^{ij}\) is the inverse of the matrix \(g_{ij}\), and given a covariant vector \(v_i\) we obtain a contravariant vector \(v^i\) as
\begin{equation}
v^i=g^{ij}v_j.
\end{equation}
What we have here then is a way of raising and lowering indices which of course generalises to arbitrary tensors.
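
Concretely, with the metric and its inverse stored as matrices, lowering and raising are matrix multiplications. A short sketch (the Minkowski-style example metric is my choice; any symmetric non-degenerate \(g\) will do):

\begin{verbatim}
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])   # example metric components g_{ij}
g_inv = np.linalg.inv(g)               # g^{ij}, so that g_{ij} g^{jk} = delta_i^k

v_up = np.array([2.0, 1.0, 0.0, -3.0]) # contravariant components v^i
v_down = g @ v_up                      # lower the index: v_i = g_{ij} v^j
assert np.allclose(g_inv @ v_down, v_up)   # raising it again recovers v^i

# the same works index by index on higher-rank tensors, e.g. T_{ij} = g_{ik} g_{jl} T^{kl}
T_up = np.outer(v_up, v_up)
T_down = np.einsum('ik,jl,kl->ij', g, g, T_up)
\end{verbatim}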

Let us note here that in physical applications vectors and tensors very often arise as the values of vector or tensor fields at points of some space. Thus we might have, for example, a vector \(V(x)\) or tensor \(T(x)\) at some point \(x\). The components of \(V(x)\) or \(T(x)\) obviously depend on a choice of basis vectors. This in turn corresponds to a choice of coordinate system and, as explained in the appendix on vector calculus, for non-cartesian coordinate systems the corresponding basis vectors depend on the point in space, \(x\), so that the change of basis matrices relating components of \(V(x)\) or \(T(x)\) for different coordinate systems will also depend on \(x\).

Notes:

  1. In physics contexts there is often a restriction placed on the kinds of basis change considered. For example, it is typical to see vectors defined as objects whose components transform contravariantly with respect to spatial rotations.