Monthly Archives: March 2016

Norms

Unless otherwise stated, an inner product space will here be either real orthogonal or complex Hermitian. Thus, we can assume that the inner product of a vector, \(v\), with itself, \((v,v)\), is real. If the space \(V\) has a positive definite inner product, then \((v,v)>0\) for any non-zero \(v\) and we can define the length or norm of a vector \(v\) to be
\begin{equation}
\norm{v}=\sqrt{(v,v)}.
\end{equation}
This is a genuine norm, since, as defined, \(\norm{av}=\abs{a}\norm{v}\), \(\norm{v}=0\) implies \(v=0\) and, as we’ll see shortly, \(\norm{v+w}\leq\norm{v}+\norm{w}\).
If, on the other hand, \(V\) has only a non-degenerate inner product, there can exist non-zero vectors, \(v\), with \((v,v)\leq0\). In this case, we could define
\begin{equation}
\norm{v}=\sqrt{|(v,v)|},
\end{equation}
but should note that this is, of course, no longer properly called a norm.

In the positive definite case, we have the important Cauchy-Schwarz inequality.

Theorem (Cauchy-Schwarz inequality) In a positive definite inner product space, \(V\), for any vectors \(v,w\in V\),\begin{equation}
\abs{(v,w)}\leq\norm{v}\norm{w},\label{equ:Cauchy-Schwarz}
\end{equation}
with equality if and only if \(v\) and \(w\) are linearly dependent.

Proof The statement is trivially true when \((v,w)=0\). Assuming \((v,w)\neq0\), so that \(v\neq0\) and \(w\neq0\), for any scalar \(a\) we consider the inner product of the vector \(v-aw\) with itself. Then,\begin{equation*}
(v-aw,v-aw)=\norm{v}^2-a(v,w)-a^*(w,v)+\abs{a}^2\norm{w}^2\geq0,
\end{equation*}
with equality if and only if \(v=aw\), so that choosing
\begin{equation*}
a=\frac{\norm{v}^2}{(v,w)}
\end{equation*}
we have
\begin{equation*}
\norm{v}^2-2\norm{v}^2+\frac{\norm{v}^4\norm{w}^2}{\abs{(v,w)}^2}\geq0
\end{equation*}
so that
\begin{equation*}
\abs{(v,w)}^2\leq\norm{v}^2\norm{w}^2
\end{equation*}
from which the result follows after taking square roots.\(\blacksquare\)
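
As a quick numerical sanity check of the inequality (not a substitute for the proof), it can be tested on random vectors; the following is a minimal Python sketch assuming the standard Hermitian inner product on \(\CC^5\), computed with numpy's vdot, which conjugates its first argument.

import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lhs = abs(np.vdot(v, w))                     # |(v,w)|, first argument conjugated
rhs = np.linalg.norm(v) * np.linalg.norm(w)  # ||v|| ||w||
assert lhs <= rhs                            # Cauchy-Schwarz

# equality when the vectors are linearly dependent
assert np.isclose(abs(np.vdot(v, 2j * v)), np.linalg.norm(v) * np.linalg.norm(2j * v))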

This means that we can define the angle, \(\theta\), \(0\leq\theta\leq\pi/2\) between two vectors, \(v\) and \(w\), in a real orthogonal or complex Hermitian space through,
\begin{equation}
\cos\theta=\frac{\abs{(v,w)}}{\norm{v}\norm{w}}.
\end{equation}
The triangle inequality,
\begin{equation}
\norm{v+w}\leq\norm{v}+\norm{w},\label{triangle-standard}
\end{equation}
follows by considering the square of the left hand side and using the Cauchy-Schwarz inequality,
\begin{align*}
\norm{v+w}^2&=\norm{v}^2+\norm{w}^2+2\Real(v,w)\\
&\leq\norm{v}^2+\norm{w}^2+2\abs{(v,w)}\\
&\leq\norm{v}^2+\norm{w}^2+2\norm{v}\norm{w}\\
&=\left(\norm{v}+\norm{w}\right)^2,
\end{align*}
then taking the square root.

Similarly, a couple of variants of the triangle inequality can be obtained by considering the square of the difference of norms,
\begin{align*}
\left(\norm{v}-\norm{w}\right)^2&=\norm{v}^2+\norm{w}^2-2\norm{v}\norm{w}\\
&\leq\norm{v}^2+\norm{w}^2-2\abs{(v,w)}\\
&\leq\norm{v}^2+\norm{w}^2-2\abs{\Real(v,w)}\\
&\leq\norm{v}^2+\norm{w}^2-2\Real(v,w)\\
&=\norm{v-w}^2
\end{align*}
so that taking square roots we obtain
\begin{equation}
\abs{\norm{v}-\norm{w}}\leq\norm{v-w},\label{triangle-variant1}
\end{equation}
or alternatively
\begin{align*}
\left(\norm{v}-\norm{w}\right)^2&=\norm{v}^2+\norm{w}^2-2\norm{v}\norm{w}\\
&\leq\norm{v}^2+\norm{w}^2-2\abs{(v,w)}\\
&\leq\norm{v}^2+\norm{w}^2-2\abs{\Real(v,w)}\\
&\leq\norm{v}^2+\norm{w}^2+2\Real(v,w)\\
&=\norm{v+w}^2
\end{align*}
so that taking square roots we obtain
\begin{equation}
\abs{\norm{v}-\norm{w}}\leq\norm{v+w}.\label{triangle-variant2}
\end{equation}
A further simple consequence of the definition of the norm in terms of the positive definite inner product is the parallelogram identity,
\begin{equation}
\norm{v+w}^2+\norm{v-w}^2=2\left(\norm{v}^2+\norm{w}^2\right),\label{equ:parallelogram}
\end{equation}
which in \(\RR^2\) expresses the fact that the sum of the squared lengths of the diagonals of a parallelogram is equal to twice the sum of the squared lengths of the sides.

We have seen that specifying a positive definite symmetric or Hermitian inner product, \((\cdot,\cdot)\), on a vector space, \(V\), over respectively \(\RR\) or \(\CC\) implies the existence of a norm \(\norm{\cdot}\) on \(V\). That is, real orthogonal and complex Hermitian spaces in which the inner product is positive definite are normed vector spaces. In the other direction, given a normed vector space, \(V\), over \(\RR\), in which the norm, \(\norm{\cdot}\), satisfies the parallelogram identity, \eqref{equ:parallelogram}, we can define a positive definite symmetric inner product as
\begin{equation}
(u,v)=\frac{1}{4}\left(\norm{u+v}^2-\norm{u-v}^2\right).\label{equ:norm inner product}
\end{equation}
Notice that this definition ensures that \(\norm{v}=\sqrt{(v,v)}\). Also, since by the parallelogram identity,
\begin{equation*}
\frac{1}{4}\left(\norm{u+v}^2-\norm{u-v}^2\right)=\frac{1}{2}\left(\norm{u+v}^2-\norm{u}^2-\norm{v}^2\right),
\end{equation*}
and
\begin{equation*}
\frac{1}{4}\left(\norm{u-v}^2-\norm{u+v}^2\right)=\frac{1}{2}\left(\norm{u-v}^2-\norm{u}^2-\norm{v}^2\right),
\end{equation*}
applying the triangle inequality, which the norm satisfies by definition, to the right hand sides then leads to the Cauchy-Schwarz inequality, \(\abs{(u,v)}\leq\norm{u}\norm{v}\). To confirm that \eqref{equ:norm inner product} defines a genuine inner product on \(V\) we must check \eqref{inprod-linear}. Using the parallelogram identity we have,
\begin{align*}
\norm{u+v+w}^2&=2(\norm{u+v}^2+\norm{w}^2)-\norm{u+v-w}^2\\
&=2(\norm{u+v}^2+\norm{w}^2)-(2(\norm{u-w}^2+\norm{v}^2)-\norm{u-v-w}^2)\\
&=2\norm{u+v}^2-2\norm{u-w}^2+2(\norm{w}^2-\norm{v}^2)+\norm{u-v-w}^2,
\end{align*}
and
\begin{align*}
\norm{u-v-w}^2&=2(\norm{u-v}^2+\norm{w}^2)-\norm{u-v+w}^2\\
&=2(\norm{u-v}^2+\norm{w}^2)-(2(\norm{u+w}^2+\norm{v}^2)-\norm{u+v+w}^2)\\
&=2\norm{u-v}^2-2\norm{u+w}^2+2(\norm{w}^2-\norm{v}^2)+\norm{u+v+w}^2.
\end{align*}
Rearranging each of these for \(\norm{u+v+w}^2-\norm{u-v-w}^2\) and averaging the two resulting expressions, we find
\begin{equation*}
\norm{u+v+w}^2-\norm{u-v-w}^2=\norm{u+v}^2-\norm{u-v}^2+\norm{u+w}^2-\norm{u-w}^2
\end{equation*}
so \((u,v+w)=(u,v)+(u,w)\). Next, observe that the definition makes it clear that, \((u,-v)=-(u,v)\), so that,
\((u,nv)=n(u,v)\), for any integer \(n\). It is then easy to see that we must have, \((u,qv)=q(u,v)\), for any \(q\in\QQ\), so that for any \(a\in\RR\) and \(q\in\QQ\),
\begin{align*}
\abs{(u,av)-a(u,v)}&=\abs{(u,(a-q)v)-(a-q)(u,v)}\\
&\leq\abs{(u,(a-q)v)}+\abs{(a-q)}\abs{(u,v)}\\
&\leq2\abs{a-q}\norm{u}\norm{v},
\end{align*}
by Cauchy-Schwarz. Since \(q\in\QQ\) may be chosen to make \(\abs{a-q}\) arbitrarily small, we conclude that \((u,av)=a(u,v)\) for any \(a\in\RR\).
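
As a small illustration (a sketch assuming the Euclidean norm on \(\RR^4\) and numpy), the inner product recovered from the norm via \eqref{equ:norm inner product} agrees with the usual dot product:

import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4)
v = rng.standard_normal(4)

norm = np.linalg.norm
# the inner product recovered from the norm alone via polarization
recovered = 0.25 * (norm(u + v)**2 - norm(u - v)**2)
assert np.isclose(recovered, u @ v)   # agrees with the usual dot product on R^4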

In the case of a normed vector space, \(V\), over \(\CC\), in which the norm once again satisfies the parallelogram identity we can define a positive definite Hermitian inner product as
\begin{equation}
(u,v)=\frac{1}{4}\left(\norm{u+v}^2-\norm{u-v}^2-i\norm{u+iv}^2+i\norm{u-iv}^2\right).
\end{equation}
The proof that this defines a genuine inner product on \(V\) proceeds as for the real case, the only difference being that the real and imaginary parts are treated separately.
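
Again purely as a numerical sketch (assuming the standard Hermitian inner product on \(\CC^4\) and numpy, whose vdot conjugates its first argument, matching our convention of sesquilinearity in the first slot):

import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

norm = np.linalg.norm
# complex polarization: recover (u,v) from the norm alone
recovered = 0.25 * (norm(u + v)**2 - norm(u - v)**2
                    - 1j * norm(u + 1j * v)**2 + 1j * norm(u - 1j * v)**2)
assert np.isclose(recovered, np.vdot(u, v))   # np.vdot conjugates its first argument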

Classification of Inner Product Spaces

Definition Two vectors \(v,w\in V\) are said to be orthogonal if \((v,w)=0\). If \(U\) is a subspace of \(V\) then the orthogonal complement of \(U\), denoted \(U^\perp\), is defined as
\begin{equation}
U^\perp=\{v\in V\mid(u,v)=0\;\forall u\in U\}.
\end{equation}

It is clear that \(U^\perp\) is also a subspace of \(V\) and that if the restriction of the inner product to \(U\) is non-singular, then \(U\cap U^\perp=\{0\}\). In fact, in this case we have the following result.

Theorem If \(U\) is a subspace of an inner product space \(V\) such that the restriction of the inner product to \(U\) is non-degenerate, then \(V=U\oplus U^\perp\).

Proof Since we already know that \(U\cap U^\perp=\{0\}\), we need only demonstrate that any \(v\in V\) can be written as \(v=u+v’\) with \(u\in U\) and \(v’\in U^\perp\). To this end suppose \(e_1,\dots,e_r\) is a basis of \(U\). Then we must find numbers \(c^i\) and some \(v’\in U^\perp\) such that
\begin{equation}
v=c^1e_1+\dots+c^re_r+v’.\label{equ:orthog comp intermediate}
\end{equation}
Now define the matrix \(\mathbf{M}\) through the matrix elements, \(M_{ij}=(e_i,e_j)\). Then taking successive inner products of \eqref{equ:orthog comp intermediate} with the basis elements of \(U\), we get the system of equations,
\begin{eqnarray*}
(e_1,v)&=&M_{11}c^1+\dots+M_{1r}c^r\\
\vdots\quad&\vdots&\qquad\quad\vdots\\
(e_r,v)&=&M_{r1}c^1+\dots+M_{rr}c^r,
\end{eqnarray*}
and since the restriction of the inner product to \(U\) is non-degenerate, \(\mathbf{M}\) is non-singular. Thus there is a unique solution for the \(c^i\), so any \(v\in V\) can be expressed in the form, \eqref{equ:orthog comp intermediate}, and the result follows.\(\blacksquare\)
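
To illustrate the construction in the proof numerically (a sketch assuming a positive definite inner product on \(\RR^4\), so the restriction to any subspace is automatically non-degenerate; the subspace and vector are arbitrary examples), the coefficients \(c^i\) are obtained by solving the linear system with the Gram matrix \(\mathbf{M}\):

import numpy as np

# U is the 2-dimensional subspace of R^4 spanned by the columns of E
E = np.array([[1., 0.],
              [1., 1.],
              [0., 2.],
              [0., 1.]])
v = np.array([3., 1., 4., 1.])

M = E.T @ E                       # Gram matrix, M_ij = (e_i, e_j)
c = np.linalg.solve(M, E.T @ v)   # solve for the coefficients c^i

u = E @ c                         # the component of v lying in U
v_perp = v - u                    # what is left lies in the orthogonal complement
assert np.allclose(E.T @ v_perp, 0)   # (e_i, v') = 0 for every basis vector of U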

Remark Recall that a direct sum decomposition, \(V=U_1\oplus U_2\), determines projectors, \(P_1,P_2\in\mathcal{L}(V)\), \(P_i^2=P_i\), such that, \(P_1+P_2=\id_V\) and \(P_iP_j=0\) when \(i\neq j\), and, \(\img P_i=U_i\), \(\ker P_1=U_2\) and \(\ker P_2=U_1\). In the context of inner product spaces orthogonal projections are natural. These are the projections corresponding to an orthogonal direct sum decomposition, such as \(V=U\oplus U^\perp\), that is, projections whose image and kernel are orthogonal.

A non-zero vector \(v\) of an inner product space \(V\) is said to be a null vector if \((v,v)=0\). All vectors are null in symplectic geometries. In the case of orthogonal or Hermitian geometries, aside from the trivial case of a zero inner product, not all vectors can be null. Indeed, suppose on the contrary that every \(v\in V\) was such that \((v,v)=0\). Then for every pair of vectors \(u,v\in V\), we have, in the case of a symmetric inner product,
\begin{equation*}
0=(u+v,u+v)=(u,u)+(v,v)+2(u,v)=2(u,v),
\end{equation*}
so \((u,v)=0\) implying the inner product is zero. In the case of an Hermitian inner product,
\begin{equation*}
0=(u+v,u+v)=(u,u)+(v,v)+2\Real(u,v)=2\Real(u,v),
\end{equation*}
and
\begin{equation*}
0=(u+iv,u+iv)=(u,u)+(v,v)-2\Imag(u,v)=-2\Imag(u,v),
\end{equation*}
so also in this case, \((u,v)=0\), contradicting our assumption that the inner product is non-zero.

Theorem Any finite dimensional inner product space, \(V\), over \(\RR\) or \(\CC\), can be decomposed into a direct sum \(V=V_1\oplus\dots\oplus V_r\) where the subspaces \(V_i\) are pairwise orthogonal. In the case of symmetric or Hermitian inner products they are \(1\)-dimensional. In the case of an anti-symmetric inner product the \(V_i\) may be either \(1\)-dimensional, in which case the restriction of the inner product to \(V_i\) is degenerate, or \(2\)-dimensional, in which case the restriction is non-degenerate.

Proof The proof is by induction on the dimension of \(V\). The case of \(\dim V=1\) is trivial so consider \(\dim V\geq 2\). We assume the inner product is not zero, since in this trivial case there is nothing to prove. In the case of symmetric or Hermitian inner products, as already observed, we can choose a non-null vector \(u\) from \(V\). If \(U=\Span(u)\) then the restriction of the inner product of \(V\) to \(U\) is certainly non-degenerate so we have \(V=U\oplus U^\perp\) by the previous result. Thus, if \(V\) is \(n\) dimensional then \(U^\perp\) is \(n-1\) dimensional and by induction we can therefore assume that \(U^\perp\) has the desired decomposition and the result follows. In the case of an anti-symmetric inner product, there must exist two vectors, \(v_1\) and \(v_2\) say, such that \((v_1,v_2)\neq0\). Calling the subspace spanned by these vectors \(U\), the restriction of the inner product to \(U\) is non-degenerate and the result follows as before.\(\blacksquare\)

Given this orthogonal decomposition we can use what we already know about the classification of low dimensional inner product spaces to complete the classification in general. To that end, we consider two \(n\)-dimensional vector spaces, \(V\) and \(V’\), with inner products, \((\cdot,\cdot)\) and \((\cdot,\cdot)’\), and orthogonal decompositions \(\oplus_{i=1}^rV_i\) and \(\oplus_{i=1}^{r’}{V_i}’\) respectively.

Define the subspace, \(V_0=\oplus_{i=1}^{r_0}V_i\), the sum of the degenerate subspaces of the orthogonal decomposition, and \(V^\perp=\{v\in V\mid(v’,v)=0\;\forall v’\in V\}\), sometimes called the radical of the inner product \((\cdot,\cdot)\). Clearly \(V_0\subseteq V^\perp\) and conversely, by virtue of the decomposition, we know that any \(v\in V^\perp\) can be written uniquely as a sum \(v=\sum_{i=1}^rv_i\) with each \(v_i\in V_i\). Suppose it were the case that \(v_k\neq0\) for some \(k>r_0\). Then, in the case of symmetric or Hermitian inner products we’d have \((v_k,v_k)\neq0\) and in the anti-symmetric case there’d be some vector \({v’}_k\in V_k\) such that \(({v’}_k,v_k)\neq0\) so in either case we contradict \(v\in V^\perp\) and conclude that \(V_0=V^\perp\). But we also have that if \(\{e_i\}\) is a basis of \(V\) in which the Gram matrix is \(\mathbf{G}\) then \(V^\perp\) consists of those \(v=v^ie_i\in V\) such that \((e_j,v)=0\) for each \(j=1,\dots,n\), that is, such that, \(\sum_{i=1}^nG_{ji}v^i=0\), for each \(j=1,\dots,n\). But this is just the condition for \(v\in\ker L_\mathbf{G}\), so \(V^\perp=\ker L_\mathbf{G}\) and \(\dim V^\perp=n-\rank\mathbf{G}\). Thus, we may conclude that \(r_0=\dim V^\perp=n-\rank\mathbf{G}\). Moreover, since the Gram matrices of isometric inner product spaces have the same rank, if our two inner product spaces \(V\) and \(V’\) are isometric we may further conclude that \(r_0={r_0}’\) where \({r_0}’\) is the number of degenerate subspaces in the orthogonal decomposition of \(V’\).

Now let us consider the case of \(V\) and \(V’\) having anti-symmetric inner products. We know that aside from \(1\)-dimensional degenerate subspaces all the remaining subspaces in their respective orthogonal decompositions are \(2\)-dimensional non-degenerate and that any two such spaces are isometric. Thus, if \(V\) and \(V’\) have \(r_0={r_0}’\) then they must both have \(r_0\) degenerate \(1\)-dimensional subspaces and \((n-r_0)/2\) non-degenerate \(2\)-dimensional subspaces in their respective decompositions, which we may order such that in both cases non-degenerate precede degenerate subspaces. Therefore they must be isometric, since we can construct an isometry \(f:V\mapto V’\) as the direct sum \(f=\oplus_{i=1}^rf_i\) of isometries \(f_i:V_i\mapto {V_i}’\) which we know must exist from the discussion in Low Dimensional Inner Product Spaces. Conversely, if \(V\) and \(V’\) are isometric we know that \(r_0={r_0}’\). Thus, we conclude that two vector spaces equipped with antisymmetric inner products are isometric if and only if the vector spaces and their respective radicals have the same dimension. It should be clear that precisely the same statement can be made for two complex vector spaces with symmetric inner products.

In the orthogonal decompositions of real vector spaces equipped with symmetric, or complex vector spaces equipped with Hermitian, inner products, aside from the degenerate subspaces, there are in each case two possibilities for the remaining \(1\)-dimensional subspaces. They may be either positive or negative definite. Denote by \(r_+\) and \(r_-\) respectively the number of positive and negative definite subspaces of \(V\). If \(V_+\) and \(V_-\) are the respective direct sums of these subspaces then it is clear that \(V_+\) and \(V_-\) are respectively positive and negative definite, that \(r_+=\dim V_+\) and \(r_-=\dim V_-\) and that we can write \(V=V_+\oplus V_-\oplus V_0\). The triple \((r_+,r_-,r_0)\) is called the signature of the inner product. Define the same primed quantities, \({r_+}’\), \({V’}_+\), \({r_-}’\) and \({V’}_-\) for \(V’\), whose decomposition can then be written as, \(V’={V’}_+\oplus {V’}_-\oplus {V’}_0\). Now suppose \(V\) and \(V’\) are isometric. We know that they must have the same dimension and that \(r_0={r_0}’\). If \(f:V\mapto V’\) is an isometry, then for any \(v\in V\), by virtue of the decomposition of \(V’\), we can write \(f(v)=f(v)_++f(v)_-+f(v)_0\) where \(f(v)_+\in {V’}_+\), \(f(v)_-\in {V’}_-\) and \(f(v)_0\in {V’}_0\). We consider the restriction \(f|_{V_+}\) of \(f\) to \(V_+\) and note that if \({P’}_+:V’\mapto V’\) is the projection operator onto the subspace \({V’}_+\), then \({P’}_+\circ f|_{V_+}:V_+\mapto {V’}_+\) is linear. Now suppose \(r_+>{r_+}’\), then there must exist some non-zero \(v\in V_+\) such that \({P’}_+\circ f|_{V_+}(v)=0\), so that for this \(v\), \(f(v)_+=0\) and we have \(f(v)=f(v)_-+f(v)_0\). But notice that then \((v,v)=(f(v),f(v))’=(f(v)_-,f(v)_-)’\leq0\), contradicting the fact that \((v,v)>0\) for any non-zero \(v\in V_+\). Similarly, if \(r_+<{r_+}’\), then we must have \(r_->{r_-}’\), and we would again arrive at a contradiction by considering the restriction \(f|_{V_-}\). So we conclude that isometry of \(V\) and \(V’\) implies \(r_+={r_+}’\), \(r_-={r_-}’\) and \(r_0={r_0}’\), that is, they have the same signature. Conversely, if two vector spaces \(V\) and \(V’\) have the same signature, then their orthogonal decompositions can be appropriately ordered such that an isometry, \(f:V\mapto V’\), can be constructed as the direct sum, \(f=\oplus f_i\), of isometries, \(f_i:V_i\mapto {V_i}’\), which we know must exist from the discussion in Low Dimensional Inner Product Spaces. Thus, we conclude that two real vector spaces equipped with symmetric inner products or two complex vector spaces equipped with Hermitian inner products are isometric if and only if they share the same signature.

Let us summarise the above discussion in the following

Theorem Symplectic spaces and complex orthogonal spaces are characterised up to isometry by the pair of integers \((n,r_0)\) where \(n\) is the dimension of the space and \(r_0\) is the dimension of the radical of the inner product. Real orthogonal spaces and complex Hermitian spaces are characterised up to isometry by their signature, \((r_+,r_-,r_0)\), where \(r_+\) and \(r_-\) are the dimensions of the subspaces upon which the restriction of the inner product is respectively positive and negative definite.
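
By Sylvester's law of inertia the signature of a real symmetric Gram matrix can be read off from the signs of its eigenvalues, since diagonalising by an orthogonal matrix is in particular a congruence. The following is a minimal numpy sketch (the helper name and the tolerance for detecting zero eigenvalues are arbitrary choices):

import numpy as np

def signature(G, tol=1e-10):
    """Signature (r_plus, r_minus, r_0) of a real symmetric Gram matrix G."""
    eigenvalues = np.linalg.eigvalsh(G)
    r_plus = int(np.sum(eigenvalues > tol))
    r_minus = int(np.sum(eigenvalues < -tol))
    return r_plus, r_minus, len(eigenvalues) - r_plus - r_minus

# the Gram matrix of Minkowski space M^4 in the standard basis
assert signature(np.diag([1., -1., -1., -1.])) == (1, 3, 0)
# a degenerate example: a rank 1 symmetric form on R^3
assert signature(np.diag([1., 0., 0.])) == (1, 0, 2)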

Theorem and Theorem tell us that for orthogonal and Hermitian spaces we can always find an orthogonal basis. Such a basis is particularly useful when the inner product is non-degenerate. In this case, any vector \(v\) may be expressed as \(v=v^je_j\), but then \((e_i,v)=(e_i,v^je_j)=v^i(e_i,e_i)\) so
\begin{equation}
v=\sum_{i=1}^n\frac{(e_i,v)}{(e_i,e_i)}e_i.
\end{equation}
A vector \(v\) is said to be normalised if \((v,v)=\pm1\), and a set of normalised vectors \(\{e_i\}\) is said to be orthonormal if they are orthogonal to one another, that is, \((e_i,e_j)=0\) whenever \(i\neq j\) and \((e_i,e_i)=\pm1\). So given any orthogonal basis of a non-degenerate inner product space we can always choose an orthonormal basis. If the inner product is positive definite then with respect to an orthonormal basis any vector has the convenient decomposition,
\begin{equation}
v=\sum_{i=1}^n(e_i,v)e_i.
\end{equation}

It should be clear that for real orthogonal and complex Hermitian spaces we can always find a basis in which the Gram matrix has the form,
\begin{equation}
\mathbf{G}=\begin{pmatrix}
\mathbf{I}_{r_+}&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&-\mathbf{I}_{r_-}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{0}
\end{pmatrix},
\end{equation}
and for complex orthogonal spaces, a basis in which the Gram matrix has the form,
\begin{equation}
\mathbf{G}=\begin{pmatrix}
\mathbf{I}_{n-r_0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}
\end{pmatrix}.
\end{equation}

Clearly there’s no such thing as an orthogonal basis for a symplectic space. However, Theorem and Theorem do make it clear that we can always choose a basis, \(\{e_1,f_1,\dots,e_{(n-r_0)/2},f_{(n-r_0)/2},e_{(n-r_0)/2+1},\dots,e_{(n+r_0)/2}\}\), such that, \((e_i,f_i)=-(f_i,e_i)=1\) for \(i=1,\dots,(n-r_0)/2\), are the only non-zero inner products of basis elements. Reordering, we obtain the symplectic basis,
\begin{equation}
\{e_1,\dots,e_{(n-r_0)/2},f_1,\dots,f_{(n-r_0)/2},e_{(n-r_0)/2+1},\dots,e_{(n+r_0)/2}\},
\end{equation}
in terms of which, the Gram matrix has the form,
\begin{equation}
\mathbf{G}=\begin{pmatrix}
\mathbf{0}&\mathbf{I}_{(n-r_0)/2}&\mathbf{0}\\
-\mathbf{I}_{(n-r_0)/2}&\mathbf{0}&\mathbf{0}\\
\mathbf{0}&\mathbf{0}&\mathbf{0}
\end{pmatrix}
\end{equation}

Hermitian, real orthogonal and real symplectic geometries arise quite naturally together, in the following way. Suppose we have a complex vector space \(V\) on which is defined the Hermitian inner product \((\cdot,\cdot):V\times V\mapto\CC\). We consider its realification, \(V_\RR\), on which we define two inner products, \(g(v,w)=\Real(v,w)\) and \(\omega(v,w)=\Imag(v,w)\). It is clear that \(g\) is symmetric and \(\omega\) is antisymmetric and, since \(g(v,v)=(v,v)\), that \(g\) is positive definite if and only if \((\cdot,\cdot)\) is positive definite. We also have the following relations,
\begin{eqnarray}
g(v,w)=g(iv,iw)=\omega(v,iw)=-\omega(iv,w)\label{orthsympl1}\\
\omega(v,w)=\omega(iv,iw)=-g(v,iw)=g(iv,w).\label{orthsympl2}
\end{eqnarray}
Conversely, if on \(V_\RR\), there are defined inner products, \(g\) and \(\omega\), respectively symmetric and antisymmetric, which satisfy the relations \eqref{orthsympl1} and \eqref{orthsympl2}, then the inner product on \(V\) defined as \((v,w)=g(v,w)+i\omega(v,w)\) is Hermitian.

Consider, in particular, \(\CC^n\) with the standard (orthonormal) basis \(\{\mathbf{e}_i\}\) and Hermitian inner product,
\begin{equation*}
(\mathbf{v},\mathbf{w})=\sum_{i=1}^n{v_i}^*w_i.
\end{equation*}
Its realification is \(\RR^{2n}\) with basis \(\{\mathbf{e}_1,\dots,\mathbf{e}_n,i\mathbf{e}_1,\dots,i\mathbf{e}_n\}\) which is orthonormal with respect to \(g\) and symplectic with respect to \(\omega\).
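
The following sketch (assuming the standard Hermitian inner product on \(\CC^3\), computed with numpy's vdot, and randomly chosen vectors) checks a few of the relations \eqref{orthsympl1} and \eqref{orthsympl2} numerically:

import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

h = np.vdot(v, w)      # Hermitian inner product, conjugate-linear in the first slot
g = h.real             # the symmetric inner product on the realification
omega = h.imag         # the antisymmetric inner product on the realification

assert np.isclose(g, np.vdot(1j * v, 1j * w).real)    # g(v,w) = g(iv,iw)
assert np.isclose(omega, -np.vdot(v, 1j * w).real)    # omega(v,w) = -g(v,iw)
assert np.isclose(omega, np.vdot(1j * v, w).real)     # omega(v,w) = g(iv,w)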

The Hermitian inner product, or more precisely its absolute value, measures the extent to which two vectors are parallel or linearly dependent over \(\CC\) while \(g\) measures this over \(\RR\). Thus, \(\omega\) measures the extent to which the linear dependence of two vectors is due to extending the base field from \(\RR\) to \(\CC\).

The upshot of this strong relationship, particularly between complex Hermitian and real orthogonal inner products, is that we can develop their theory largely in parallel.

Low Dimensional Inner Product Spaces

We consider the different flavours of inner product in turn.

Antisymmetric Clearly, for a \(1\)-dimensional vector space the only possible antisymmetric inner product is the zero inner product. In the \(2\)-dimensional case, suppose first that the skew-symmetric inner product is degenerate. Then there exists, in our \(2\)-dimensional space \(V\), a non-zero vector \(v\) such that \((u,v)=0\) \(\forall u\in V\) and we can extend \(v\) to a basis \(\{v,v’\}\) of \(V\). Now consider the inner product of two arbitrary elements, \(av+bv’\) and \(cv+dv’\), where \(a\), \(b\), \(c\) and \(d\) are elements of the base field (\(\RR\) or \(\CC\)). We have
\begin{equation*}
(av+bv’,cv+dv’)=ac(v,v)+ad(v,v’)+bc(v’,v)+bd(v’,v’)=0,
\end{equation*}
so the only degenerate antisymmetric inner product on a \(2\)-dimensional vector space is the zero inner product. So consider the non-degenerate case. There must exist two vectors, \(v_1\) and \(v_2\), say, such that \((v_1,v_2)\neq0\). In particular this means they are linearly independent and so we can take them to be a basis of \(V\). If \((v_1,v_2)=a\) then the map \(f:V\mapto K^2\) defined by \(f(v_1)=a\mathbf{e}_1\), \(f(v_2)=\mathbf{e}_2\) is clearly an isometry between \(V\) and the symplectic space \(K^2\) of Example. That is, any symplectic geometry on a 2-dimensional vector space is isometric to either the trivial, zero case, or that of Example.

Symmetric Consider first a \(1\)-dimensional vector space, \(V\), over \(\RR\). If \(v\in V\) is non-zero but \((v,v)=0\) then we have the trivial case. So suppose \((v,v)=a\) for some non-zero \(a\in\RR\). Either \(a>0\), in which case the inner product is positive definite and we have isometry with the inner product space \(\RR\) equipped with the inner product given by simple multiplication, \((x,y)=xy\) for \(x,y\in\RR\), or \(a<0\), in which case the inner product is negative definite and we have isometry with \(\RR\) equipped with the inner product, \((x,y)=-xy\). As already observed, these two cases are not isometric. If, however, \(V\) is over \(\CC\) then any non-degenerate inner product space is simply isometric to \(\CC\) with the inner product \((x,y)=xy\).

Hermitian Any non-trivial \(1\)-dimensional Hermitian inner product space \(V\) must be such that \((v,v)\neq0\) for some \(v\in V\). In this case \((v,v)=a\) with \(a\) some non-zero real number. Similar to the real symmetric case we have two cases, positive or negative definite, each clearly isometric to \(\CC\) equipped respectively with the Hermitian inner product \((x,y)=x^*y\) or \((x,y)=-x^*y\).

Definitions and Examples

Definition An inner product space is a vector space \(V\) over \(K=\RR\) or \(\CC\) equipped with an inner product, \((\cdot,\cdot):V\times V\mapto K\), associating a number \((v,w)\in K\) to every pair of vectors \(v,w\in V\). For \(u,v,w\in V\) and \(a,b\in K\), this inner product must satisfy the linearity condition,
\begin{equation}
(u,av+bw)=a(u,v)+b(u,w),\label{inprod-linear}
\end{equation}
together with one of three possible symmetry properties,
\begin{equation}
(v,w)=(w,v),\label{inprod-symmetric}
\end{equation}
in which case the inner product is called symmetric and the space will be said to have an orthogonal geometry,
\begin{equation}
(v,w)=(w,v)^*,\label{inprod-hermitian}
\end{equation}
in which case the inner product is called Hermitian and the space will be said to have an Hermitian geometry, or
\begin{equation}
(v,w)=-(w,v),\label{inprod-symplectic}
\end{equation}
in which case the inner product is called antisymmetric and the space will be said to have a symplectic geometry.
If the inner product further satisfies the condition,
\begin{equation}
(v,v)\geq 0\text{ and }(v,v)=0 \iff v=0\label{inprod-posdef},
\end{equation}
then it is said to be positive definite, or, if it satisfies the weaker condition,
\begin{equation}
(v,w)=0\:\,\forall v\in V\implies w=0,\label{inprod-non sing}
\end{equation}
then it is said to be non-singular or non-degenerate.

A series of remarks relating to this definition are in order.

Remark When \(K=\CC\) and the inner product is Hermitian, we have,
\begin{equation}
(au+bv,w)=a^*(u,w)+b^*(v,w).
\end{equation}
The inner product is then said to be sesquilinear with respect to the first argument. Notice also that \((v,v)\in\RR\), \(\forall v\in V\).

Remark In all cases, aside from complex Hermitian, the inner product is bilinear.

Remark When \(K=\RR\), an Hermitian inner product is simply symmetric so when considering Hermitian geometries we only consider vector spaces over \(\CC\).

Remark A negative definite inner product is of course such that \((v,v)\leq0\) and \((v,v)=0\) if and only if \(v=0\).

Remark In a space with symplectic geometry we have \((v,v)=0\), \(\forall v\in V\). In particular, such a space can never be positive (or negative) definite, but may be non-degenerate.

Remark The three symmetry properties described in the definition ensure that \((v,w)=0\) if and only if \((w,v)=0\).

If \(\{e_i\}\) is a basis for \(V\), then we can define a matrix \(\mathbf{G}\) with components \(G_{ij}=(e_i,e_j)\). \(\mathbf{G}\) is called the Gram matrix or matrix of the inner product with respect to the basis \(\{e_i\}\). Symmetric, Hermitian and anti-symmetric inner products then correspond respectively to Gram matrices of that type (recall that \(\mathbf{G}\) is Hermitian if \(\mathbf{G}=\mathbf{G}^\dagger\), where \(\mathbf{G}^\dagger\) is the complex conjugate of the transpose).

If we change basis according to \(e’_i=P_i^je_j\), then \(G’_{ij}=(e’_i,e’_j)=(P_i^ke_k,P_j^le_l)=(P_i^k)^*G_{kl}P_j^l\). So when \(K=\RR\) we have \(\mathbf{G}’=\mathbf{P}^\mathsf{T}\mathbf{G}\mathbf{P}\), and for \(K=\CC\), \(\mathbf{G}’=\mathbf{P}^\dagger\mathbf{G}\mathbf{P}\). In either case the matrices \(\mathbf{G}’\) and \(\mathbf{G}\) are said to be congruent. Note that congruent matrices have the same rank so it makes sense to define the rank of an inner product as the rank of its corresponding Gram matrix in some basis.

Let us consider the property of non-degeneracy in terms of the Gram matrix. Given a basis \(\{e_i\}\) of \(V\), we can define a linear map \(L_\mathbf{G}\) in the usual way such that for any \(v=v^ie_i\in V\), \(L_\mathbf{G}v=\sum_{i,j=1}^nG_{ji}v^ie_j\). Then the rank of \(\mathbf{G}\) is just \(n-\dim\ker L_\mathbf{G}\). A vector \(v\in\ker L_\mathbf{G}\) if and only if \(\sum G_{ji}v^i=0\) for each \(j=1,\dots,n\). But notice that non-degeneracy is equivalent to the statement that \((e_j,v)=0\) for all \(j=1,\dots,n\) implies \(v=0\), that is, \(G_{ji}v^i=0\) for each \(j=1,\dots,n\) implies \(v=0\). So we see non-degeneracy is equivalent to \(\ker L_\mathbf{G}\) being trivial, that is, the Gram matrix, \(\mathbf{G}\), being of full rank.
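
As a small illustration (a numpy sketch; the form is an arbitrary degenerate example), the Gram matrix of a degenerate symmetric inner product fails to have full rank:

import numpy as np

# Gram matrix of the degenerate symmetric inner product (v,w) = v^1 w^1 on R^2
G = np.array([[1., 0.],
              [0., 0.]])
non_degenerate = np.linalg.matrix_rank(G) == G.shape[0]
assert not non_degenerate   # the kernel of L_G is non-trivial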

A real symmetric inner product space, with a positive definite inner product, is also called a Euclidean vector space.

Example The standard example of a Euclidean vector space is \(\RR^n\) with the inner product of any pair of vectors \(\mathbf{v},\mathbf{w}\in\RR^n\) given by the usual dot or scalar product,
\begin{equation}
(\mathbf{v},\mathbf{w})=\mathbf{v}\cdot\mathbf{w}=\sum_{i=1}^nv^iw^i.
\end{equation}

A real symmetric inner product space with a non-singular inner product is called a pseudo-Euclidean space.

Example The Minkowski space, sometimes denoted, \(\MM^4\), of special relativity is an important example of a pseudo-Euclidean space. It is \(\RR^4\) equipped with the inner product,
\begin{equation}
(v,w)=v^0w^0-\sum_{i=1}^3v^iw^i.
\end{equation}
The (four-)vectors of \(\MM^4\) are conventionally indexed from \(0\), with the 0-component being the ‘time-like’ component and the others being the ‘space-like’ components.

Example \(\CC^n\) has a natural Hermitian geometry when equipped with the inner product defined on any pair of vectors \(v,w\in\CC^n\) by,
\begin{equation}
(v,w)=\sum_{i=1}^n{v^i}^*w^i.
\end{equation}

Example For a simple example of a symplectic geometry on \(K^2\), consider the inner product defined on the standard basis vectors by \((\mathbf{e}_1,\mathbf{e}_2)=1=-(\mathbf{e}_2,\mathbf{e}_1)\) so that for any \(\mathbf{v}\) and \(\mathbf{w}\) in \(K^2\) we have,
\begin{equation}
(\mathbf{v},\mathbf{w})=\det(\mathbf{v},\mathbf{w})=v^1w^2-v^2w^1,
\end{equation}
that is, in the case of \(K=\RR\), the signed area of the parallelogram spanned by \(\mathbf{v}\) and \(\mathbf{w}\).

We’ll see shortly that this symplectic geometry is in fact the only non-degenerate possibility for a 2-dimensional space, up to a notion of equivalence we now define.

Definition An isometry between inner product spaces \(V\) and \(W\) is a linear isomorphism, \(f:V\mapto W\), which preserves the values of the inner products. That is,
\begin{equation}
(u,v)_V=(f(u),f(v))_W,
\end{equation}
for all \(u,v\in V\), where \((\cdot,\cdot)_V\) and \((\cdot,\cdot)_W\) are the inner products on \(V\) and \(W\) respectively. If such an isometry exists, the inner product spaces are said to be isometric.

Remark Clearly, isometric inner product spaces have Gram matrices of the same rank.

Remark Isomorphic spaces equipped with the trivial, zero inner product, are trivially isometric.

Example On \(\RR\), multiplication defines a symmetric inner product, \((x,y)=xy\), which is clearly positive definite. We could also define a negative definite symmetric inner product as, \((x,y)=-xy\). These two inner product spaces cannot be isometric since any automorphism of \(\RR\) is of the form \(f(x)=ax\), \(a\in\RR\), \(a\neq0\), and for \(f\) to be an isometry we’d need, \(x^2=-a^2x^2\), for all \(x\in\RR\). Indeed, this shows us why, on \(\CC\), the inner product spaces with symmetric inner products \((x,y)=xy\) and \((x,y)=-xy\), \(x,y\in\CC\) are isometric – just consider the automorphism \(f(x)=ix\). Staying with \(\CC\), the inner product spaces with Hermitian inner products \((x,y)=x^*y\), the positive definite case, and \((x,y)=-x^*y\), the negative definite case, are not isometric since any automorphism of \(\CC\) is of the form \(f(x)=ax\), \(a\in\CC\), \(a\neq0\), and we’d need, \(\abs{x}^2=-\abs{a}^2\abs{x}^2\), for all \(x\in\CC\).

Finite dimensional vector spaces are classified up to isomorphism in terms of an integer, \(n\), their dimension. In other words two vector spaces are isomorphic if and only if they have the same dimension. Similarly, we would like to classify inner product spaces up to isometry. The dimension will clearly be one of the ‘labels’ of these equivalence classes since in particular isometric spaces are isomorphic. The question is, what other data, related to the inner product, is required to characterise isometric spaces? We’ll see that the key to answering this question lies in expressing a given inner product space as a direct sum of low dimensional ones – the type of inner product determines the structural data needed to characterise the decomposition. We begin, therefore, by considering in detail, low dimensional inner product spaces up to isometry.

Applications of the Jordan Normal Form

We have already met the characteristic polynomial, \(p_T(x)\), of a linear operator \(T\) on a vector space \(V\), which, when the underlying field is algebraically closed, factors as \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(n_i\) the dimension of the generalised eigenspace of the eigenvalue \(\lambda_i\). It is an example of an annihilating polynomial of \(T\), that is, a polynomial \(f\) such that \(f(T)=0\). The minimal polynomial of \(T\) is the annihilating polynomial of \(T\) of least degree.

Under the assumption of an algebraically closed field, we know that \(T\) may be represented by a matrix \(\mathbf{T}\) in Jordan normal form. For a given eigenvalue, \(\lambda_i\), let us denote by \(k_i\) the maximal order of the Jordan blocks associated with \(\lambda_i\). Then the minimal polynomial of \(T\) is given by
\begin{equation}
m_T(x)=\prod_{i=1}^r(x-\lambda_i)^{k_i}.
\end{equation}
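
As an illustrative sketch of this recipe (using sympy for exact arithmetic; the helper name and the test matrix are just examples), each exponent \(k_i\) can be found as the power at which the kernel of \((T-\lambda_i\id_V)^k\) stops growing:

from sympy import Matrix, eye, symbols

x = symbols('x')

# a matrix already in Jordan normal form: blocks of sizes 2 and 1 for the eigenvalue 1
T = Matrix([[1, 1, 0],
            [0, 1, 0],
            [0, 0, 1]])

def minimal_polynomial_of(T):
    n = T.rows
    m = 1
    for lam in T.eigenvals():
        # k_i is the power at which ker (T - lam I)^k stops growing
        k = 1
        while ((T - lam * eye(n)) ** k).rank() != ((T - lam * eye(n)) ** (k + 1)).rank():
            k += 1
        m *= (x - lam) ** k
    return m

assert (minimal_polynomial_of(T) - (x - 1)**2).expand() == 0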

Computing powers of a matrix…
Solving differential equations…

The Real Jordan Normal Form

We can use the Jordan normal form of a linear operator defined over \(\CC\) to establish what might be called a ‘real Jordan normal form’ for a linear operator, \(T\), on a vector space, \(V\), over \(\RR\). The idea is to use the basis corresponding to the Jordan normal form of \(T\)’s complexification, \(T_\CC:V_\CC\mapto V_\CC\), to identify a distinguished basis for \(V\) in which the matrix representation of \(T\) has a ‘nice’ form.

Recall that the complexification, \(V_\CC\), of \(V\) consists of pairs \((x,y)\) such that \(x,y\in V\) and \((a+ib)(x,y)=(ax-by,ay+bx)\) for \(a,b\in\RR\) and that \(T_\CC\) acts on \(V_\CC\) according to \(T_\CC(x,y)=(Tx,Ty)\). This operator may have both real and complex eigenvalues, with its real eigenvalues corresponding to eigenvalues of \(T\) and complex eigenvalues, the ‘missing’ eigenvalues of \(T\), coming in conjugate pairs. Clearly Jordan bases for the generalised eigenspaces of \emph{real} eigenvalues of \(T_\CC\) can be identified as images, through the natural inclusion, \(v\mapsto(v,0)\) of \(V\) in \(V_\CC\), of the Jordan bases of the generalised eigenspaces in \(V\) of those eigenvalues of \(T\).

So we focus on the conjugate pairs, \((\lambda_i,{\lambda_i}^*)\), of complex eigenvalues of \(T_\CC\). As we’ve seen, the number and size of the Jordan blocks corresponding to an eigenvalue \(\lambda_i\) are determined by the dimensions of the spaces \(\ker(T_\CC-\lambda_i\id_{V_{\CC}})^k\) for \(k\leq\dim V\). But taking complex conjugates, we see that \((T_\CC-\lambda_i\id_{V_{\CC}})^kv=0\) is equivalent to \((T_\CC-{\lambda_i}^*\id_{V_{\CC}})^kv^*=0\). That is, the map \(v\mapsto v^*\) determines a one-to-one correspondence between \(\ker(T_\CC-\lambda_i\id_{V_{\CC}})^k\) and \(\ker(T_\CC-{\lambda_i}^*\id_{V_{\CC}})^k\). In other words, there is a one-to-one correspondence between the Jordan blocks corresponding to conjugate eigenvalues \(\lambda_i\) and \({\lambda_i}^*\) of \(T_\CC\). So if \((x_j,y_j)\), \(j=1,\dots,n_i\), is a basis for the generalised eigenspace of \(\lambda_i\) in \(V_\CC\) then \((x_j,-y_j)\), \(j=1,\dots,n_i\), is a basis for the generalised eigenspace of \({\lambda_i}^*\) in \(V_\CC\). But this means that the \(x_j,y_j\in V\) are linearly independent in \(V\). So we may consider the matrix representation of the restriction of \(T\) to \(\Span(x_1,y_1,\dots,x_{n_i},y_{n_i})\) with respect to this basis. It is not difficult to see that starting with the \(n_i\times n_i\) Jordan matrix of the restriction of \(T_\CC\) to the generalised eigenspace of \(\lambda_i\), this is just the \(2n_i\times2n_i\) matrix obtained by replacing
\begin{equation*}
\lambda_i\mapsto\begin{pmatrix}
a_i&b_i\\
-b_i&a_i
\end{pmatrix}
\end{equation*}
where \(\lambda_i=a_i+ib_i\), on the diagonal and
\begin{equation*}
1\mapsto\begin{pmatrix}
1&0\\
0&1
\end{pmatrix}
\end{equation*}
on the superdiagonal.

In summary, we have obtained the following

Theorem (Real Jordan normal form) For any linear operator \(T\in\mathcal{L}(V)\) on a finite dimensional real vector space \(V\), there is a basis for \(V\) such that the matrix representation of \(T\), \(\mathbf{T}\), has the form
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{T}_1& &\mathbf{0}\\
&\ddots& \\
\mathbf{0}& &\mathbf{T}_m
\end{pmatrix}
\end{equation*}
where each \(\mathbf{T}_i\) has the form
\begin{equation*}
\mathbf{T}_i=\begin{pmatrix}
\boldsymbol{\lambda}_i&\mathbf{1}& &\mathbf{0}\\
&\ddots&\ddots& \\
& &\ddots&\mathbf{1}\\
\mathbf{0}& & &\boldsymbol{\lambda}_i
\end{pmatrix},
\end{equation*}
with \(\boldsymbol{\lambda_i}\) and \(\mathbf{1}\) being simply the eigenvalue \(\lambda_i\) and the number \(1\) respectively in the case that \(\lambda_i\) is a (real) eigenvalue of \(T\) and
\begin{equation*}
\boldsymbol{\lambda}_i=\begin{pmatrix}
a_i&b_i\\
-b_i&a_i
\end{pmatrix}
\quad\text{and}\quad\mathbf{1}=\begin{pmatrix}
1&0\\
0&1
\end{pmatrix}
\end{equation*}
for each complex conjugate pair, \((\lambda_i,{\lambda_i}^*)\), \(\lambda_i=a_i+ib_i\), of (complex) eigenvalues of the complexified operator \(T_\CC\).
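
As a small numerical illustration (a numpy sketch; the matrix is an arbitrary example), a real \(2\times2\) matrix with a conjugate pair of eigenvalues \(a\pm ib\) has the single \(2\times2\) block above as its real Jordan normal form, and in particular shares its trace and determinant (a necessary, though of course not sufficient, check):

import numpy as np

# a real 2x2 matrix with no real eigenvalues
A = np.array([[1., -2.],
              [1., -1.]])

lam = np.linalg.eigvals(A)[0]   # one of the conjugate pair a +/- ib
a, b = lam.real, abs(lam.imag)

# the corresponding real Jordan block
J = np.array([[a, b],
              [-b, a]])

# A and J are similar, so in particular they share trace and determinant
assert np.isclose(np.trace(A), np.trace(J))
assert np.isclose(np.linalg.det(A), np.linalg.det(J))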

The Jordan Normal Form

Thus far, we have seen that given an \(n\)-dimensional vector space \(V\) defined over an algebraically closed field, then for any linear operator, \(T\in\mathcal{L}(V)\), a basis of \(V\) may be chosen such that the matrix representation of \(T\) is upper triangular. In certain optimal cases we know that a basis can be chosen so that the matrix representation is diagonal with entries given by the eigenvalues of \(T\) each appearing a number of times given by its geometric multiplicity. In these ‘good’ cases then, the linear operators are characterised by \(n\) continuous parameters, \(\lambda_i\), the eigenvalues.

The obvious question is whether we can extend such a characterisation beyond the particularly nice, diagonalisable, cases. The problem we have is clear: in general the characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) does not have the property that \(\dim V_{\lambda_i}=n_i\). In other words, the eigenspaces may be a little too small and so prevent us from achieving a perfect direct sum decomposition \(V=V_{\lambda_1}\oplus\dots\oplus V_{\lambda_r}\). However, by slightly generalising the notion of eigenspace we can achieve such a decomposition, and hence, a useful characterisation of general linear operators.

Whereas an eigenvector \(v\) of \(T\) with eigenvalue \(\lambda\) is such that \((T-\lambda\id_V)v=0\), a generalised eigenvector \(v\) of \(\lambda\) is such that \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\). The set of all such vectors corresponding to a given eigenvalue is then called the generalised eigenspace with eigenvalue \(\lambda\). This definition is actually equivalent to the following.

Definition The generalised eigenspace with eigenvalue \(\lambda\) of a linear operator \(T\in\mathcal{L}(V)\) is
\begin{equation}
V_{[\lambda]}=\left\{v\in V\mid(T-\lambda\id_V)^{\dim V}v=0\right\}.
\end{equation}

To see the equivalence we need to confirm that any \(v\) such that \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\) is in \(\ker(T-\lambda\id_V)^{\dim V}\). Now in general, for any \(T\in\mathcal{L}(V)\), we have an obvious chain of inclusions,
\begin{equation*}
\{0\}=\ker T^0\subseteq\ker T^1\subseteq\dotsm\subseteq\ker T^k\subseteq\ker T^{k+1}\subseteq\dotsm.
\end{equation*}
Observe that if for some \(m\geq 1\), \(\ker T^m=\ker T^{m+1}\), then all subsequent spaces in the chain are also equal (for \(k\in\ZZ_{>0}\), if \(v\in\ker T^{m+k+1}\) then \(0=T^{m+1}(T^kv)\) so \(T^kv\in\ker T^{m+1}=\ker T^{m}\) so \(v\in\ker T^{k+m}\)). Thus the inclusions are strict until they are all equalities. In particular, if the chain hasn’t stopped growing by \(\ker T^{\dim V}\) then it certainly does at that point since we are limited by the dimension of \(V\). Thus, the definitions are indeed equivalent.

Just how big are the generalised eigenspaces? Before answering this question let’s establish that, in common with eigenspaces, generalised eigenspaces corresponding to distinct eigenvalues are distinct.

Proposition For any pair of distinct eigenvalues \(\lambda\neq\mu\), \(V_{[\lambda]}\cap V_{[\mu]}=\{0\}\).

Proof Suppose on the contrary that there was a non-zero \(v\in V\) such that \(v\in V_{[\lambda]}\) and \(v\in V_{[\mu]}\). Then, let \(k\) be the smallest integer such that \((T-\lambda\id_V)^kv=0\). It follows that \((T-\lambda\id_V)^{k-1}v\neq 0\) must be a genuine eigenvector of \(T\), call it \(v_\lambda\). But \(v\in V_{[\mu]}\), so \((T-\mu\id_V)^lv=0\) for some \(l\geq 1\), and thus
\begin{align*}
(T-\mu\id_V)^lv_\lambda&=(T-\mu\id_V)^l(T-\lambda\id_V)^{k-1}v\\
&=(T-\lambda\id_V)^{k-1}(T-\mu\id_V)^lv\\
&=0,
\end{align*}
so \(v_\lambda\in V_{[\mu]}\). But for any such eigenvector \(v_\lambda\) and all \(j\geq 1\), \((T-\mu\id_V)^jv_\lambda=(\lambda-\mu)^jv_\lambda\neq 0\) so \(v_\lambda\notin V_{[\mu]}\) and we have established a contradiction.\(\blacksquare\)

In particular, it follows from this that for any \(\mu\neq\lambda\), the restriction of \((T-\mu\id_V)\) to \(V_{[\lambda]}\) is one-to-one. That is, \(\lambda\) is the only eigenvalue of the restriction \(T|_{V_{[\lambda]}}\) of \(T\) to \(V_{[\lambda]}\). So the characteristic polynomial of \(T|_{V_{[\lambda]}}\) must be \((x-\lambda)^{\dim V_{[\lambda]}}\). Now, if the characteristic polynomial of \(T\) is \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) then since \(V_{[\lambda_i]}\) is a \(T\)-invariant subspace of \(V\) we know that \(p_T(x)=(x-\lambda_i)^{\dim V_{[\lambda_i]}}p_{T’}(x)\) where \(T’:V/V_{[\lambda_i]}\mapto V/V_{[\lambda_i]}\) is the linear operator induced from \(T\). Thus, if we can show that \(\lambda_i\) is not a root of \(p_{T’}(x)\) then we have established that \(\dim V_{[\lambda_i]}=n_i\). That is, we will have established that the dimensions of the generalised eigenspaces are precisely the algebraic multiplicities of the corresponding eigenvalues. So assume, on the contrary, that \(\lambda_i\) is a root of \(p_{T’}(x)\), then there must exist a non-zero \(v+V_{[\lambda_i]}\in V/V_{[\lambda_i]}\) such that \(T’(v+V_{[\lambda_i]})=\lambda_i(v+V_{[\lambda_i]})\). That is, there exists some \(v\notin V_{[\lambda_i]}\) such that \((T-\lambda_i\id_V)v\in V_{[\lambda_i]}\). But then \((T-\lambda_i\id_V)^{\dim V+1}v=0\), so that \(v\in\ker(T-\lambda_i\id_V)^{\dim V+1}=\ker(T-\lambda_i\id_V)^{\dim V}=V_{[\lambda_i]}\) and we have a contradiction.

Now consider the sum, \(U=\sum_{i=1}^rV_{[\lambda_i]}\). We would like to establish this to be a direct sum. That is, we must demonstrate the uniqueness of the expansion of an arbitrary element \(u\in U\) as a sum \(u=v_1+\dots+v_r\) of vectors \(v_i\in V_{[\lambda_i]}\). This is equivalent to showing that if \(v_1+\dots+v_r=0\) then \(v_i=0\) for all \(i=1,\dots,r\), which we may establish by applying in turn the linear operators \(\prod_{j\neq i}(T-\lambda_j\id_V)^{\dim V}\) for \(i=1,\dots,r\) to \(v_1+\dots+v_r=0\). For example, applying
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V},
\end{equation}
every term except the first is annihilated and we get
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V}v_1=0,
\end{equation}
and since, as observed above, each \(T-\lambda_j\id_V\) with \(j\neq1\) restricts to a one-to-one operator on \(V_{[\lambda_1]}\), we deduce \(v_1=0\).

Thus, the sum is indeed direct and since each summand, \(V_{[\lambda_i]}\), has dimension \(\dim V_{[\lambda_i]}=n_i\) and \(\sum n_i=\dim V\) we have established the following theorem.

Theorem If \(\lambda_1,\dots,\lambda_r\) are the distinct eigenvalues of a linear operator \(T\in\mathcal{L}(V)\) with characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\), then
\begin{equation}
V=\bigoplus_{i=1}^rV_{[\lambda_i]},
\end{equation}
and \(\dim V_{[\lambda_i]}=n_i\).
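
As a quick check of the theorem on a small example (a sympy sketch with an arbitrarily chosen \(2\times2\) matrix), the dimension of each generalised eigenspace, computed as \(\dim\ker(T-\lambda\id_V)^{\dim V}\), equals the algebraic multiplicity:

from sympy import Matrix, eye

# an arbitrary 2x2 example with a single eigenvalue of algebraic multiplicity 2
A = Matrix([[5, 1],
            [-1, 3]])
n = A.rows
for lam, multiplicity in A.eigenvals().items():
    generalised_dim = n - ((A - lam * eye(n)) ** n).rank()
    assert generalised_dim == multiplicity   # dim V_[lambda] equals n_lambda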

From this theorem we see that any linear operator \(T\in\mathcal{L}(V)\) can be represented by a matrix in block diagonal form with each block corresponding to distinct eigenvalues of \(T\) and the size of each block equal to the algebraic multiplicity of the corresponding eigenvalue. Define \(T_i\) to be the restriction of \(T\) to the generalised eigenspace \(V_{[\lambda_i]}\). Since we are working over an algebraically closed field, each block, which is just a matrix representation of each \(T_i\), can be made upper triangular and since \(T_i\) has only \(\lambda_i\) as an eigenvalue, the diagonal elements of each upper triangular block are the eigenvalue corresponding to that block. If we are going to do better than this we’ll have to find a general way of choosing ‘nice’ bases for the generalised eigenspaces. By Cayley-Hamilton we know that each \(T_i\) satisfies \((T_i-\lambda_i\id_{V_{[\lambda_i]}})^{n_i}=0\). So \(T_i=\lambda_i\id_{V_{[\lambda_i]}}+N_i\) where \(N_i^{n_i}=0\). That is, \(T_i\) is a sum of a multiple of the identity operator and a nilpotent operator.

Thus our task is really to find a basis that expresses the matrix of any nilpotent operator in some sort of canonical way. For a non-zero vector \(v\), let us define \(m(v)\) to be the largest non-negative integer such that \(N^{m(v)}v\neq 0\). This is sometimes called the height of the vector \(v\) with respect to the operator \(N\). The following result establishes the existence and uniqueness of such a basis.

Theorem If \(N\in\mathcal{L}(V)\) is nilpotent then there exist vectors \(v_1,\dots,v_k\in V\) such that
\begin{equation}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_k,Nv_k,\dots,N^{m(v_k)}v_k\}\label{equ:nilpotent_basis}
\end{equation}
is a basis of \(V\). Here \(k=\dim\ker N\) and the \(m(v_i)\) are unique up to permutation.

Proof If \(\dim V=1\) then the result is trivial. We will prove it for general dimension by induction on the dimension of \(V\). The ‘trick’, as it were, is to observe that since the linear operator \(N\) has a non-trivial kernel, \(\img N\) is strictly contained within \(V\) and so we may assume by the induction hypothesis that the result holds for this subspace. That is, we may assume we have vectors, \(u_1,\dots,u_j\in\img N\), such that,
\begin{equation*}
\{u_1,Nu_1,\dots,N^{m(u_1)}u_1,\dots,u_j,Nu_j,\dots,N^{m(u_j)}u_j\},
\end{equation*}
is a basis of \(\img N\). Let us define \(v_i\), \(1\leq i\leq j\) to be such that \(u_i=Nv_i\). Then \(m(u_i)=m(v_i)-1\) and our basis of \(\img N\) is,
\begin{equation*}
\{Nv_1,N^2v_1,\dots,N^{m(v_1)}v_1,\dots,Nv_j,N^2v_j,\dots,N^{m(v_j)}v_j\}.
\end{equation*}
The vectors \(\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}\) are linearly independent and belong to \(\ker N\), indeed they are a basis for the kernel of the restriction to \(\img N\) of \(N\), \(\ker N|_{\img N}\). Define the vectors \(v_{j+1},\dots,v_k\) as the vectors which extend those \(j\) vectors to a basis of \(\ker N\). With \(v_1,\dots,v_k\) so defined, let us consider the set of vectors
\begin{equation*}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_j,Nv_j,\dots,N^{m(v_j)}v_j,v_{j+1},\dots,v_k\}.
\end{equation*}
Notice that there are \(\dim\img N+\dim\ker N=\dim V\) of them, so if we can show they are linearly independent then they are a basis for \(V\) as specified in the theorem (\(m(v_i)=0\) for \(j+1\leq i\leq k\)).
So let us suppose we have,
\begin{align*}
0&=a_{1,0}v_1+\dots+a_{1,m(v_1)}N^{m(v_1)}v_1+\dots\\
&+a_{j,0}v_j+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j\\
&+a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k.
\end{align*}
Applying \(N\) to this we obtain
\begin{equation*}
a_{1,0}u_1+\dots+a_{1,m(u_1)}N^{m(u_1)}u_1+\dots+a_{j,0}u_j+\dots+a_{j,m(u_j)}N^{m(u_j)}u_j=0.
\end{equation*}
By the induction hypothesis, this implies \(a_{r,s}=0\) for \(1\leq r\leq j\), \(0\leq s\leq m(v_r)-1\). Thus, we are reduced to considering
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j+
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0.
\end{equation*}
But the \(v_{j+1},\dots,v_k\) were defined to be precisely an extension of the basis \begin{equation*}
\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}
\end{equation*}
of \(\ker N|_{\img N}\) to a basis of \(\ker N\). That is
\begin{equation*}
\ker N=\ker N|_{\img N}\oplus\Span(v_{j+1},\dots,v_k),
\end{equation*}
so we must have
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j=0
\end{equation*}
and
\begin{equation*}
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0
\end{equation*}
and hence all the coefficients \(a_{i,j}\) must be 0. We have thus established the existence of \(k=\dim\ker N\) vectors \(v_i\) such that \eqref{equ:nilpotent_basis} is a basis of \(V\). For uniqueness, observe that \(\dim\ker N=|\{j\mid m(v_j)\geq 0\}|=k\), that is, the number of \(v_i\) whose height is greater than or equal to \(0\) is \(\dim\ker N\). Likewise, \(\dim\ker N^2=\dim\ker N+|\{j\mid m(v_j)\geq 1\}|\), so that the number of \(v_i\) whose height is greater than or equal to \(1\) is \(\dim\ker N^2-\dim\ker N\). Continuing in this fashion we obtain a partition\(^1\) of \(n\) which may be represented as a Young diagram such as (in the case of \(n=11\))

[Young diagram of a partition of \(11\)]
In such a diagram the length of the first row is \(k\) with the length of successive rows given by \(\dim\ker N^i-\dim\ker N^{i-1}\). But note that the number of boxes in the \(i\)th column is then just \(m(v_i)+1\), so the columns record the collection of heights. Thus the set of heights is completely determined by \(N\) and so the \(m(v_i)\) are indeed unique up to permutation.\(\blacksquare\)
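
The bookkeeping in this last step is easy to automate. The sketch below (numpy, with an arbitrarily chosen nilpotent matrix) recovers the row lengths of the Young diagram from the kernel dimensions and then reads off the block sizes \(m(v_i)+1\) as the column lengths:

import numpy as np

# a nilpotent operator on R^5 with chains of lengths 3 and 2
N = np.zeros((5, 5))
N[0, 1] = N[1, 2] = 1.0
N[3, 4] = 1.0

def nullity(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

n = N.shape[0]
kernel_dims = [nullity(np.linalg.matrix_power(N, i)) for i in range(n + 1)]

# row lengths of the Young diagram: how much each successive kernel grows
rows = [kernel_dims[i] - kernel_dims[i - 1] for i in range(1, n + 1)
        if kernel_dims[i] > kernel_dims[i - 1]]
# column lengths = the heights m(v_i) + 1, i.e. the Jordan block sizes of N
blocks = [sum(1 for r in rows if r > j) for j in range(rows[0])]

assert rows == [2, 2, 1] and blocks == [3, 2]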

Let’s consider what the matrix representation of some nilpotent operator \(N\) looks like with respect to this basis. In fact, we’ll make a small cosmetic tweak and flip the order of the basis elements corresponding to each \(v_i\). That is, we consider the basis
\begin{equation*}
(N^{m(v_1)}v_1,\dots,Nv_1,v_1,\dots,N^{m(v_k)}v_k,\dots,Nv_k,v_k).
\end{equation*}
Then the matrix representation of \(N\) with respect to this basis is block diagonal, with \(k\) blocks of sizes \(m(v_i)+1\), \(1\leq i\leq k\), each of the form
\begin{equation*}
\begin{pmatrix}
0&1&&\mathbf{0}\\
&\ddots&\ddots&\\
& &\ddots&1\\
\mathbf{0}& & & 0
\end{pmatrix}.
\end{equation*}
We have arrived at the following result.

Theorem (Jordan normal form) For any linear operator \(T\in\mathcal{L}(V)\) on a finite dimensional vector space \(V\), over an algebraically closed field \(K\), there is a basis for \(V\) such that the matrix representation of \(T\), \(\mathbf{T}\), has the form
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{T}_1& &\mathbf{0}\\
&\ddots& \\
\mathbf{0}& &\mathbf{T}_m
\end{pmatrix}
\end{equation*}
where each \(\mathbf{T}_i\), called a Jordan block, is an upper triangular matrix of the form
\begin{equation*}
\mathbf{T}_i=\begin{pmatrix}
\lambda_i&1& &\mathbf{0}\\
&\ddots&\ddots& \\
& &\ddots&1\\
\mathbf{0}& & &\lambda_i
\end{pmatrix},
\end{equation*}
with some eigenvalue \(\lambda_i\) of \(T\) on its diagonal. \(\mathbf{T}\) is called the Jordan normal form of \(T\) and is unique up to rearranging the order of the blocks.

In the language of group actions and orbits we have the following.

The orbits of the action of \(\text{GL}(V)\) on \(\mathcal{L}(V)\) where \(V\) is an \(n\)-dimensional vector space over an algebraically closed field are characterised by a set of \(n\) continuous parameters, the eigenvalues, together with some discrete parameters, partitions of the multiplicities of repeated eigenvalues.

Example Consider the matrix,
\begin{equation*}
\mathbf{A}=\begin{pmatrix}
2&-1&0\\
1&0&0\\
1&-1&1
\end{pmatrix}.
\end{equation*}
We form the characteristic equation,
\begin{align*}
\det(\mathbf{A}-\lambda\mathbf{I}_3)&=\det\begin{pmatrix}
2-\lambda&-1&0\\
1&-\lambda&0\\
1&-1&1-\lambda
\end{pmatrix}\\
&=(1-\lambda)\det\begin{pmatrix}
2-\lambda&-1\\
1&-\lambda
\end{pmatrix}\\
&=-(\lambda-1)^3.
\end{align*}
So we have one eigenvalue, \(\lambda=1\), with algebraic multiplicity \(3\). Consider then \(\mathbf{A}-\lambda\mathbf{I}_3\),
\begin{equation*}
\mathbf{A}-\lambda\mathbf{I}_3=\begin{pmatrix}
1&-1&0\\
1&-1&0\\
1&-1&0
\end{pmatrix}.
\end{equation*}
Clearly this has rank \(1\) so \(\dim\ker(\mathbf{A}-\lambda\mathbf{I}_3)=2\). We then know immediately that \(\dim\ker(\mathbf{A}-\lambda\mathbf{I}_3)^2=3\) (the kernels must keep growing until they reach the algebraic multiplicity \(3\)), and thus we have two blocks and the Jordan normal form is
\begin{equation*}
\begin{pmatrix}
1&0&0\\
0&1&1\\
0&0&1
\end{pmatrix}.
\end{equation*}
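
For what it's worth, this computation can also be checked with a computer algebra system; sympy's jordan_form returns a matrix \(\mathbf{P}\) and the Jordan form \(\mathbf{J}\) with \(\mathbf{A}=\mathbf{P}\mathbf{J}\mathbf{P}^{-1}\) (the ordering of the blocks may differ from the above):

from sympy import Matrix

A = Matrix([[2, -1, 0],
            [1,  0, 0],
            [1, -1, 1]])
P, J = A.jordan_form()      # J is the Jordan normal form, A = P J P^{-1}
assert A == P * J * P.inv()
print(J)                    # a 1x1 and a 2x2 Jordan block, both with eigenvalue 1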

Notes:

  1. Recall that a partition of some positive integer \(n\) is simply a way of writing \(n\) as a sum of positive integers \(n_i\), typically written as a tuple \((n_1,\dots,n_r)\) such that \(n_i\geq n_{i+1}\). A visual representation of a partition is given by its Young diagram. For example the partition \((2,1)\) of \(3\) would correspond to a diagram with two boxes in its first row and a single box in its second.

The Cayley-Hamilton Theorem

As a vector space, \(\mathcal{L}(V)\) is \(n^2\)-dimensional, so there must exist some relationship between the \(n^2+1\) operators \(\id_V,T,\dots,T^{n^2}\). In fact, the following result, known as the Cayley-Hamilton theorem, guarantees a relationship between the powers of \(T\) up to \(T^n\).

Theorem (Cayley-Hamilton) Every linear operator \(T:V\mapto V\) satisfies its own characteristic equation, \(p_T(T)=0\). Equivalently, every \(n\times n\) matrix \(\mathbf{A}\) satisfies its own characteristic equation \(p_\mathbf{A}(\mathbf{A})=0\).

Proof When \(T\) is diagonalisable the result is obvious since by choosing the union of bases of the eigenspaces, \(V_{\lambda_i}\), as a basis of \(V\), any basis element is clearly annihilated by the product \((T-\lambda_1)\dots(T-\lambda_n)\). More generally, even if \(T\) is not diagonalisable, we know that we can always construct a basis \(v_i\), as in the discussion following Theorem, such that its matrix representation is upper triangular. As already observed, defining \(W_0=\{0\}\) and \(W_i=\Span(v_1,\dots,v_i)\) (\(W_n=V\)) for \(1\leq i\leq n\), \(W_i\) is \(T\)-invariant and \((T-\lambda_i\id_V)W_i\subseteq W_{i-1}\). Indeed,
\begin{align*}
(T-\lambda_n\id_V)V&\subseteq W_{n-1}\\
(T-\lambda_{n-1}\id_V)(T-\lambda_n\id_V)V&\subseteq W_{n-2}\\
&\vdots \\
\prod_{i=1}^n(T-\lambda_i\id_V)V&\subseteq W_0=\{0\},
\end{align*}
that is, \(p_T(T)=0\).\(\blacksquare\)

Another way to see the Cayley-Hamilton result is as follows. Choose a basis \(\{e_i\}\) of \(V\) in terms of which \(Te_i=T_i^je_j\), the \(T^i_j\) being the components of the matrix representation of \(T\). We could write this as \((\delta_i^jT-T_i^j\id_V)e_j=0\), for each \(i\), or as the matrix equation,
\begin{equation*}
(T\mathbf{I}_{n}-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=
\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=0.
\end{equation*}
The matrix \(\mathbf{S}(T)\), defined by
\begin{equation*}
\mathbf{S}(T)=\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}
\end{equation*}
exists in \(\text{Mat}_n(\text{End}(V))\), and as such would appear unlikely to be amenable to the techniques developed thus far for matrices over fields. In fact, we can regard \(\mathbf{S}(T)\) as a matrix over the commutative ring \(K[T]\), of polynomials in the symbol \(T\), with the obvious action on \(V\). As such, the standard definition and results from the theory of determinants, as described in Determinants, do indeed apply. In particular, we have
\begin{equation*}
\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=p_\mathbf{T}(T),
\end{equation*}
and
\begin{equation*}
\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\mathbf{I}_n.
\end{equation*}
So
\begin{equation*}
0=\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})
\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}=p_\mathbf{T}(T)\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix},
\end{equation*}
and the result is established.
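
A direct numerical check of the theorem is straightforward (a sympy sketch with an arbitrary \(2\times2\) matrix, evaluating \(p_\mathbf{A}\) at \(\mathbf{A}\) by Horner's scheme):

from sympy import Matrix, eye, symbols, zeros

x = symbols('x')
A = Matrix([[1, 2],
            [3, 4]])

p = A.charpoly(x)          # characteristic polynomial det(x I - A)
coeffs = p.all_coeffs()    # leading coefficient first; here [1, -tr A, det A]

# evaluate p at the matrix A by Horner's scheme
p_of_A = zeros(*A.shape)
for c in coeffs:
    p_of_A = p_of_A * A + c * eye(A.rows)

assert p_of_A == zeros(*A.shape)   # Cayley-Hamilton: p_A(A) = 0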

Diagonalisable Linear Operators

From now on, unless otherwise stated, we assume our vector spaces are defined over an algebraically closed field such as \(\CC\). Recall that in this case any linear operator on an \(n\)-dimensional vector space has a characteristic polynomial which factors as \(\prod_{i=1}^r(x-\lambda_i)^{n_i}\). This means in particular that any linear operator has at least one eigenvector and indeed, at least one per distinct eigenvalue.

Proposition A set of eigenvectors \(v_1,\dots,v_r\) corresponding to distinct eigenvalues \(\lambda_1,\dots,\lambda_r\) of a linear operator \(T\) is linearly independent.

Proof By assumption we have \(Tv_i=\lambda_iv_i\) for \(1\leq i\leq r\). Suppose there are numbers \(c^1,\dots,c^r\) such that \(c^iv_i=0\). Then we must have,
\begin{align*}
0&=(T-\lambda_2\id_V)\dots(T-\lambda_r\id_V)c^iv_i\\
&=c^1(T-\lambda_2\id_V)\dots(T-\lambda_r\id_V)v_1\\
&=c^1(\lambda_1-\lambda_2)\dots(\lambda_1-\lambda_r)v_1,
\end{align*}
which in turn means \(c^1=0\). In the same way we show \(c^2=\dots=c^r=0\), so proving linear independence.\(\blacksquare\)

Definition A linear operator \(T\in\mathcal{L}(V)\) is said to be diagonalisable if there is some basis of \(V\) with respect to which the matrix representation of \(T\) is diagonal.

Suppose the factorisation of the characteristic polynomial is ‘nice’ in the sense that
\(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\). That is, the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. Then, as follows from the Proposition, \(\sum_{\lambda_i}V_{\lambda_i}\) is a direct sum and so by equation, \(\dim(\sum_{\lambda_i}V_{\lambda_i})=\sum_i n_i=n\), and in any basis which is the union of bases for the \(V_{\lambda_i}\) the matrix representation of \(T\) is diagonal. The converse is obvious so we have demonstrated the following

Corollary A linear operator \(T\) is diagonalisable if and only if \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\).

Suppose \(V\) is an \(n\)-dimensional vector space and \(T:V\mapto V\) a diagonalisable linear operator. If \(\{e_i\}\) is a basis of \(V\) with respect to which \(T\) has matrix representation \(\mathbf{T}\) such that \(Te_i=T_i^je_j\) and if \(\{v_i\}\) is a basis for \(V\) which is a union of eigenspace bases, such that \(Tv_i=\lambda_iv_i\) (not all \(\lambda_i\) necessarily distinct) then we may relate the two bases as \(v_i=P_i^je_j\) where \(P_i^j\) are the components of an invertible matrix \(\mathbf{P}\) and since \(Tv_i=P_i^jT_j^k{P^{-1}}_k^lv_l=\lambda_iv_i\) we see that the similarity transformation by \(\mathbf{P}\) diagonalises \(\mathbf{T}\). In particular, any \(n\times n\) matrix, \(\mathbf{A}\), over an algebraically closed field \(K\) is a linear operator on \(K^n\). In terms of the standard basis, \(\{\mathbf{e}_i\}\), of \(K^n\) we have \(\mathbf{A}\mathbf{e}_i=A_i^j\mathbf{e}_j\) and if \(\mathbf{A}\) is diagonalisable then there must exist a basis, \(\{\mathbf{v}_i\}\), of \(K^n\) such that \(\mathbf{A}\mathbf{v}_i=\lambda_i\mathbf{v}_i\) and an invertible matrix \(\mathbf{P}\) such that \(\mathbf{v}_i=P_i^j\mathbf{e}_j\). Note that \(\mathbf{P}\) is precisely the matrix whose \(i\)th column is the \(i\)th vector \(\mathbf{v}_i\). A diagonalisable matrix is diagonalised by the matrix whose columns are its eigenvectors.

Example (The Pauli matrices) Extremely important in quantum mechanics, the Pauli matrices, \(\sigma_x\), \(\sigma_y\) and \(\sigma_z\) are given by
\begin{equation}
\sigma_x=\begin{pmatrix}
0&1\\1&0
\end{pmatrix}
\quad
\sigma_y=\begin{pmatrix}
0&-i\\i&0
\end{pmatrix}
\quad
\sigma_z=\begin{pmatrix}
1&0\\0&-1
\end{pmatrix}.
\end{equation}
It is not difficult to see that each has the pair of eigenvalues \(\pm1\) so \(\sigma_x\) and \(\sigma_y\) are in fact similar to \(\sigma_z\) with the similarity transformation matrices given by
\begin{equation*}
\begin{pmatrix}
1&1\\
1&-1
\end{pmatrix}
\quad\text{and}\quad\begin{pmatrix}
1&1\\
i&-i
\end{pmatrix}
\end{equation*}
for \(\sigma_x\) and \(\sigma_y\) respectively.
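
These claims are easy to verify numerically; a minimal numpy sketch (using exactly the transformation matrices above, whose columns are unnormalised eigenvectors):

import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

P = np.array([[1, 1], [1, -1]], dtype=complex)   # columns: eigenvectors of sigma_x
Q = np.array([[1, 1], [1j, -1j]])                # columns: eigenvectors of sigma_y

assert np.allclose(np.linalg.inv(P) @ sigma_x @ P, sigma_z)
assert np.allclose(np.linalg.inv(Q) @ sigma_y @ Q, sigma_z)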

The Trace

Another basis independent property of linear operators is the trace.

Definition The trace of an \(n\times n\) matrix \(\mathbf{A}\), \(\tr\mathbf{A}\), is defined to be
\begin{equation}
\tr\mathbf{A}=\sum_{i=1}^nA_i^i=A_i^i.
\end{equation}

As \(\tr\mathbf{A}\mathbf{B}=(AB)^i_i=A^i_jB^j_i=(BA)^i_i=\tr\mathbf{B}\mathbf{A}\) it follows that \(\tr\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\tr\mathbf{A}\) so it makes sense to define the trace of any linear operator \(T:V\mapto V\), \(\tr T\), as the trace of any matrix representation of \(T\).

Working over an algebraically closed field \(K\), since any matrix \(\mathbf{A}\in\text{Mat}_n(K)\) is similar to an upper triangular matrix, we have \(\tr\mathbf{A}=\sum_{i=1}^n\lambda_i\) and \(\det\mathbf{A}=\prod_{i=1}^n\lambda_i\) (not all \(\lambda_i\) necessarily distinct), quantities which are in fact encoded as particular coefficients in the characteristic polynomial,
\begin{equation}
p_\mathbf{A}(x)=x^n-\tr\mathbf{A}x^{n-1}+e_2x^{n-2}-\dots+(-1)^{n-1}e_{n-1}x+(-1)^n\det\mathbf{A}.
\end{equation}
The coefficients \(e_1=\tr\mathbf{A},e_2,\dots,e_{n-1},e_n=\det\mathbf{A}\) are called the elementary symmetric functions of the eigenvalues.

There is a nice relationship between the trace and determinant. It can be shown that the matrix exponential,
\begin{equation}
\exp\mathbf{A}=\sum_{i=0}^\infty\frac{1}{i!}\mathbf{A}^i=\mathbf{I}_n+\mathbf{A}+\frac{1}{2!}\mathbf{A}^2+\dots,
\end{equation}
converges for any \(n\times n\) matrix over \(K\). Consider then, the function on \(\RR\) defined as,
\begin{equation*}
f(t)=\det\exp\mathbf{A}t.
\end{equation*}
Differentiating, we find that
\begin{equation*}
\frac{df(t)}{dt}=\tr\mathbf{A}f(t),
\end{equation*}
so that, since \(f(0)=1\), \(\ln f(t)=t\tr\mathbf{A}\) and in particular we have the following relationship between the determinant and trace,
\begin{equation}
\det\exp\mathbf{A}=\exp\tr\mathbf{A}.
\end{equation}
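
This identity is also easy to check numerically for a particular matrix (a sketch assuming scipy's expm for the matrix exponential; the matrix itself is arbitrary):

import numpy as np
from scipy.linalg import expm

A = np.array([[0.3, 1.2],
              [-0.7, 0.5]])
# det(exp A) = exp(tr A)
assert np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))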