Category Archives: Linear Operators

Applications of the Jordan Normal Form

We have already met the characteristic polynomial, \(p_T(x)\), of a linear operator \(T\) on a vector space \(V\), which, when the underlying field is algebraically closed, factors as \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(n_i\) the dimension of the generalised eigenspace of the eigenvalue \(\lambda_i\). It is an example of an annihilating polynomial of \(T\), that is, a polynomial \(f\) such that \(f(T)=0\). The minimal polynomial of \(T\) is the annihilating polynomial of \(T\) of least degree.

Under the assumption of an algebraically closed field, we know that \(T\) may be represented by a matrix \(\mathbf{T}\) in Jordan normal form. For a given eigenvalue, \(\lambda_i\), let us denote by \(k_i\) the maximal order of the Jordan blocks associated with \(\lambda_i\). Then the minimal polynomial of \(T\) is given by
\begin{equation}
m_T(x)=\prod_{i=1}^r(x-\lambda_i)^{k_i}.
\end{equation}
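
For concreteness, here is a small computational sketch (assuming the sympy library is available, and using a hypothetical matrix with the single eigenvalue \(2\) whose largest Jordan block has order \(2\)) showing how the Jordan form determines the minimal polynomial:

```python
# Sketch: reading off the minimal polynomial from the Jordan form with sympy.
# The matrix A is a hypothetical example: eigenvalue 2 with Jordan blocks of
# sizes 2 and 1, so the maximal block order k is 2.
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([
    [2, 1, 0],
    [0, 2, 0],
    [0, 0, 2],
])

P, J = A.jordan_form()                    # J is the Jordan normal form of A
print(J)                                  # blocks of sizes 2 and 1

# Characteristic polynomial: (x - 2)**3
print(sp.factor(A.charpoly(x).as_expr()))

# Minimal polynomial: (x - 2)**2, the exponent being the largest block size;
# indeed (A - 2I)**2 = 0 while A - 2I != 0.
print(sp.simplify((A - 2*sp.eye(3))**2))
```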

Computing powers of a matrix…
Solving differential equations…

The Real Jordan Normal Form

We can use the Jordan normal form of a linear operator defined over \(\CC\) to establish what might be called a ‘real Jordan normal form’ for a linear operator, \(T\), on a vector space, \(V\), over \(\RR\). The idea is to use the basis corresponding to the Jordan normal form of \(T\)’s complexification, \(T_\CC:V_\CC\mapto V_\CC\), to identify a distinguished basis for \(V\) in which the matrix representation of \(T\) has a ‘nice’ form.

Recall that the complexification, \(V_\CC\), of \(V\) consists of pairs \((x,y)\) such that \(x,y\in V\) and \((a+ib)(x,y)=(ax-by,ay+bx)\) for \(a,b\in\RR\), and that \(T_\CC\) acts on \(V_\CC\) according to \(T_\CC(x,y)=(Tx,Ty)\). This operator may have both real and complex eigenvalues, its real eigenvalues corresponding to eigenvalues of \(T\) and its complex eigenvalues, the ‘missing’ eigenvalues of \(T\), coming in conjugate pairs. Clearly a Jordan basis for the generalised eigenspace of a real eigenvalue of \(T_\CC\) can be identified as the image, under the natural inclusion \(v\mapsto(v,0)\) of \(V\) in \(V_\CC\), of a Jordan basis of the generalised eigenspace of that eigenvalue of \(T\) in \(V\).

So we focus on the conjugate pairs, \((\lambda_i,{\lambda_i}^*)\), of complex eigenvalues of \(T_\CC\). As we’ve seen, the number and size of the Jordan blocks corresponding to an eigenvalue \(\lambda_i\) are determined by the dimensions of the spaces \(\ker(T_\CC-\lambda_i\id_{V_{\CC}})^k\) for \(k\leq\dim V\). But taking complex conjugates, we see that \((T_\CC-\lambda_i\id_{V_{\CC}})^kv=0\) is equivalent to \((T_\CC-{\lambda_i}^*\id_{V_{\CC}})^kv^*=0\). That is, the map \(v\mapsto v^*\) determines a one-to-one correspondence between \(\ker(T_\CC-\lambda_i\id_{V_{\CC}})^k\) and \(\ker(T_\CC-{\lambda_i}^*\id_{V_{\CC}})^k\). In other words, there is a one-to-one correspondence between the Jordan blocks corresponding to the conjugate eigenvalues \(\lambda_i\) and \({\lambda_i}^*\) of \(T_\CC\). So if \((x_j,y_j)\), \(j=1,\dots,n_i\), is a basis for the generalised eigenspace of \(\lambda_i\) in \(V_\CC\) then \((x_j,-y_j)\), \(j=1,\dots,n_i\), is a basis for the generalised eigenspace of \({\lambda_i}^*\) in \(V_\CC\). Together these are \(2n_i\) linearly independent vectors in \(V_\CC\), and since their sums and differences are (up to a factor) the vectors \((x_j,0)\) and \((0,y_j)\), the \(x_j,y_j\) are linearly independent in \(V\). So we may consider the matrix representation of the restriction of \(T\) to \(\Span(x_1,y_1,\dots,x_{n_i},y_{n_i})\) with respect to this basis. It is not difficult to see that, starting with the \(n_i\times n_i\) Jordan matrix of the restriction of \(T_\CC\) to the generalised eigenspace of \(\lambda_i\), this is just the \(2n_i\times2n_i\) matrix obtained by replacing
\begin{equation*}
\lambda_i\mapsto\begin{pmatrix}
a_i&b_i\\
-b_i&a_i
\end{pmatrix}
\end{equation*}
where \(\lambda_i=a_i+ib_i\), on the diagonal and
\begin{equation*}
1\mapsto\begin{pmatrix}
1&0\\
0&1
\end{pmatrix}
\end{equation*}
on the superdiagonal.

In summary, we have obtained the following

Theorem (Real Jordan normal form) For any linear operator \(T\in\mathcal{L}(V)\) on a finite dimensional real vector space \(V\), there is a basis for \(V\) such that the matrix representation of \(T\), \(\mathbf{T}\), has the form
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{T}_1& &\mathbf{0}\\
&\ddots& \\
\mathbf{0}& &\mathbf{T}_m
\end{pmatrix}
\end{equation*}
where each \(\mathbf{T}_i\) has the form
\begin{equation*}
\mathbf{T}_i=\begin{pmatrix}
\boldsymbol{\lambda}_i&\mathbf{1}& &\mathbf{0}\\
&\ddots&\ddots& \\
& &\ddots&\mathbf{1}\\
\mathbf{0}& & &\boldsymbol{\lambda}_i
\end{pmatrix},
\end{equation*}
with \(\boldsymbol{\lambda}_i\) and \(\mathbf{1}\) being simply the eigenvalue \(\lambda_i\) and the number \(1\) respectively in the case that \(\lambda_i\) is a (real) eigenvalue of \(T\) and
\begin{equation*}
\boldsymbol{\lambda}_i=\begin{pmatrix}
a_i&b_i\\
-b_i&a_i
\end{pmatrix}
\quad\text{and}\quad\mathbf{1}=\begin{pmatrix}
1&0\\
0&1
\end{pmatrix}
\end{equation*}
for each complex conjugate pair, \((\lambda_i,{\lambda_i}^*)\), \(\lambda_i=a_i+ib_i\), of (complex) eigenvalues of the complexified operator \(T_\CC\).
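
A minimal numerical sketch of the \(2\times2\) block structure (assuming numpy, and using a hypothetical real matrix with eigenvalues \(1\pm i\)): the basis built from the real and imaginary parts of a complex eigenvector exhibits the block described in the theorem.

```python
# Sketch (assuming numpy): for a real matrix with a complex conjugate pair of
# eigenvalues a +/- ib, the real basis (Re v, Im v) built from a complex
# eigenvector v exhibits the 2x2 block [[a, b], [-b, a]].
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, 2.0]])          # eigenvalues 1 +/- i

evals, evecs = np.linalg.eig(A)
idx = np.argmax(evals.imag)          # pick the eigenvalue with b > 0
lam, v = evals[idx], evecs[:, idx]
a, b = lam.real, lam.imag

P = np.column_stack([v.real, v.imag])     # columns are x = Re v and y = Im v
block = np.linalg.inv(P) @ A @ P

print(np.round(block, 10))                # approximately [[a, b], [-b, a]]
print(a, b)
```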

The Jordan Normal Form

Thus far, we have seen that given an \(n\)-dimensional vector space \(V\) defined over an algebraically closed field, then for any linear operator, \(T\in\mathcal{L}(V)\), a basis of \(V\) may be chosen such that the matrix representation of \(T\) is upper triangular. In certain optimal cases we know that a basis can be chosen so that the matrix representation is diagonal with entries given by the eigenvalues of \(T\) each appearing a number of times given by its geometric multiplicity. In these ‘good’ cases then, the linear operators are characterised by \(n\) continuous parameters, \(\lambda_i\), the eigenvalues.

The obvious question is whether we can extend such a characterisation beyond the particularly nice, diagonalisable, cases. The problem we face is clear: in general the characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) does not have the property that \(\dim V_{\lambda_i}=n_i\). In other words, the eigenspaces may be a little too small, preventing us from achieving a perfect direct sum decomposition \(V=V_{\lambda_1}\oplus\dots\oplus V_{\lambda_r}\). However, by slightly generalising the notion of eigenspace we can achieve such a decomposition, and hence, a useful characterisation of general linear operators.

Whereas an eigenvector \(v\) of \(T\) with eigenvalue \(\lambda\) is such that \((T-\lambda\id_V)v=0\), a generalised eigenvector \(v\) of \(T\) with eigenvalue \(\lambda\) is such that \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\). The set of all such vectors corresponding to a given eigenvalue is then called the generalised eigenspace with eigenvalue \(\lambda\). This definition is actually equivalent to the following.

Definition The generalised eigenspace with eigenvalue \(\lambda\) of a linear operator \(T\in\mathcal{L}(V)\) is
\begin{equation}
V_{[\lambda]}=\left\{v\in V\mid(T-\lambda\id_V)^{\dim V}v=0\right\}.
\end{equation}

To see the equivalence we need to confirm that any \(v\) such that \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\) is in \(\ker(T-\lambda\id_V)^{\dim V}\). Now in general, for any \(T\in\mathcal{L}(V)\), we have an obvious chain of inclusions,
\begin{equation*}
\{0\}=\ker T^0\subseteq\ker T^1\subseteq\dotsm\subseteq\ker T^k\subseteq\ker T^{k+1}\subseteq\dotsm.
\end{equation*}
Observe that if for some \(m\geq 1\), \(\ker T^m=\ker T^{m+1}\), then all subsequent spaces in the chain are also equal (for \(k\in\ZZ_{>0}\), if \(v\in\ker T^{m+k+1}\) then \(0=T^{m+1}(T^kv)\) so \(T^kv\in\ker T^{m+1}=\ker T^{m}\) so \(v\in\ker T^{k+m}\)). Thus the inclusions are strict until they become equalities, and they remain equalities thereafter. In particular, since the dimensions are bounded by \(\dim V\), the chain must have stopped growing by \(\ker T^{\dim V}\). Thus, the definitions are indeed equivalent.
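
A quick sketch (assuming sympy, with a hypothetical \(4\times4\) matrix) illustrating the chain of kernels growing strictly and then stabilising:

```python
# Sketch (assuming sympy): the chain ker T^0 ⊆ ker T^1 ⊆ ... grows strictly
# and then stabilises; here T = A - I for a 4x4 matrix A with eigenvalue 1.
import sympy as sp

A = sp.Matrix([
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
T = A - sp.eye(4)

dims = [len((T**k).nullspace()) for k in range(5)]
print(dims)   # [0, 2, 4, 4, 4]: strict growth, then constant
```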

Just how big are the generalised eigenspaces? Before answering this question let’s establish that, in common with eigenspaces, generalised eigenspaces corresponding to distinct eigenvalues intersect trivially.

Proposition For any pair of distinct eigenvalues \(\lambda\neq\mu\), \(V_{[\lambda]}\cap V_{[\mu]}=\{0\}\).

Proof Suppose on the contrary that there was a non-zero \(v\in V\) such that \(v\in V_{[\lambda]}\) and \(v\in V_{[\mu]}\). Then, let \(k\) be the smallest integer such that \((T-\lambda\id_V)^kv=0\). It follows that \((T-\lambda\id_V)^{k-1}v\neq 0\) must be a genuine eigenvector of \(T\), call it \(v_\lambda\). But \(v\in V_{[\mu]}\), so \((T-\mu\id_V)^lv=0\) for some \(l\geq 1\), and thus
\begin{align*}
(T-\mu\id_V)^lv_\lambda&=(T-\mu\id_V)^l(T-\lambda\id_V)^{k-1}v\\
&=(T-\lambda\id_V)^{k-1}(T-\mu\id_V)^lv\\
&=0,
\end{align*}
so \(v_\lambda\in V_{[\mu]}\). But for any such eigenvector \(v_\lambda\) and all \(j\geq 1\), \((T-\mu\id_V)^jv_\lambda=(\lambda-\mu)^jv_\lambda\neq 0\) so \(v_\lambda\notin V_{[\mu]}\) and we have established a contradiction.\(\blacksquare\)

In particular, it follows from this that for any \(\mu\neq\lambda\), the restriction of \((T-\mu\id_V)\) to \(V_{[\lambda]}\) is one-to-one. That is, \(\lambda\) is the only eigenvalue of the restriction \(T|_{V_{[\lambda]}}\) of \(T\) to \(V_{[\lambda]}\). So the characteristic polynomial of \(T|_{V_{[\lambda]}}\) must be \((x-\lambda)^{\dim V_{[\lambda]}}\). Now, if the characteristic polynomial of \(T\) is \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) then since \(V_{[\lambda_i]}\) is a \(T\)-invariant subspace of \(V\) we know that \(p_T(x)=(x-\lambda_i)^{\dim V_{[\lambda_i]}}p_{T’}(x)\) where \(T’:V/V_{[\lambda_i]}\mapto V/V_{[\lambda_i]}\) is the linear operator induced from \(T\). Thus, if we can show that \(\lambda_i\) is not a root of \(p_{T’}(x)\) then we have established that \(\dim V_{[\lambda_i]}=n_i\). That is, we will have established that the dimensions of the generalised eigenspaces are precisely the algebraic multiplicities of the corresponding eigenvalues. So assume, on the contrary, that \(\lambda_i\) is a root of \(p_{T’}(x)\). Then there must exist a non-zero \(v+V_{[\lambda_i]}\in V/V_{[\lambda_i]}\) such that \(T'(v+V_{[\lambda_i]})=\lambda_i(v+V_{[\lambda_i]})\). That is, there exists some \(v\notin V_{[\lambda_i]}\) such that \((T-\lambda_i\id_V)v\in V_{[\lambda_i]}\). But then \((T-\lambda_i\id_V)^{\dim V+1}v=0\), so \(v\in V_{[\lambda_i]}\) (the kernels having stabilised by the \(\dim V\)th power) and we have a contradiction.

Now consider the sum, \(U=\sum_{i=1}^rV_{[\lambda_i]}\). We would like to establish this to be a direct sum. That is, we must demonstrate the uniqueness of the expansion of an arbitrary element \(u\in U\) as a sum \(u=v_1+\dots+v_r\) of vectors \(v_i\in V_{[\lambda_i]}\). This is equivalent to showing that if \(v_1+\dots+v_r=0\) then \(v_i=0\) for all \(i=1,\dots,r\), which we may establish by applying in turn the linear operators \(\prod_{j\neq i}(T-\lambda_j\id_V)^{\dim V}\) for \(i=1,\dots,r\) to \(v_1+\dots+v_r=0\). For example, applying
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V},
\end{equation}
every term \(v_i\) with \(i\geq 2\) is annihilated, since \((T-\lambda_i\id_V)^{\dim V}v_i=0\), and we are left with
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V}v_1=0.
\end{equation}
But, as noted above, the restriction of each \(T-\lambda_j\id_V\), \(j\neq 1\), to \(V_{[\lambda_1]}\) is one-to-one, so we deduce \(v_1=0\).

Thus, the sum is indeed direct and since each summand, \(V_{[\lambda_i]}\), has dimension \(\dim V_{[\lambda_i]}=n_i\) and \(\sum n_i=\dim V\) we have established the following theorem.

Theorem If \(\lambda_1,\dots,\lambda_r\) are the distinct eigenvalues of a linear operator \(T\in\mathcal{L}(V)\) with characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\), then
\begin{equation}
V=\bigoplus_{i=1}^rV_{[\lambda_i]},
\end{equation}
and \(\dim V_{[\lambda_i]}=n_i\).

From this theorem we see that any linear operator \(T\in\mathcal{L}(V)\) can be represented by a matrix in block diagonal form, with each block corresponding to a distinct eigenvalue of \(T\) and the size of each block equal to the algebraic multiplicity of the corresponding eigenvalue. Define \(T_i\) to be the restriction of \(T\) to the generalised eigenspace \(V_{[\lambda_i]}\). Since we are working over an algebraically closed field, each block, which is just a matrix representation of the corresponding \(T_i\), can be made upper triangular, and since \(T_i\) has only \(\lambda_i\) as an eigenvalue, the diagonal elements of each upper triangular block are all equal to the eigenvalue corresponding to that block. If we are going to do better than this we’ll have to find a general way of choosing ‘nice’ bases for the generalised eigenspaces. By Cayley-Hamilton we know that each \(T_i\) satisfies \((T_i-\lambda_i\id_{V_{[\lambda_i]}})^{n_i}=0\). So \(T_i=\lambda_i\id_{V_{[\lambda_i]}}+N_i\) where \(N_i^{n_i}=0\). That is, \(T_i\) is the sum of a multiple of the identity operator and a nilpotent operator.

Thus our task is really to find a basis that expresses the matrix of any nilpotent operator in some sort of canonical way. For a non-zero vector \(v\), let us define \(m(v)\) to be the largest non-negative integer such that \(N^{m(v)}v\neq 0\). This is sometimes called the height of the vector \(v\) with respect to the operator \(N\). The following result establishes the existence of a basis built from such chains, and the uniqueness of the associated heights.

Theorem If \(N\in\mathcal{L}(V)\) is nilpotent then there exist vectors \(v_1,\dots,v_k\in V\) such that
\begin{equation}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_k,Nv_k,\dots,N^{m(v_k)}v_k\}\label{equ:nilpotent_basis}
\end{equation}
is a basis of \(V\). Here \(k=\dim\ker N\) and the \(m(v_i)\) are unique up to permutation.

Proof If \(\dim V=1\) then the result is trivial. We will prove it for general dimension by induction on the dimension of \(V\). The ‘trick’, as it were, is to observe that since the linear operator \(N\) has a non-trivial kernel, \(\img N\) is strictly contained within \(V\) and so we may assume by the induction hypothesis that the result holds for this subspace. That is, we may assume we have vectors, \(u_1,\dots,u_j\in\img N\), such that,
\begin{equation*}
\{u_1,Nu_1,\dots,N^{m(u_1)}u_1,\dots,u_j,Nu_j,\dots,N^{m(u_j)}u_j\},
\end{equation*}
is a basis of \(\img N\). Let us define \(v_i\), \(1\leq i\leq j\) to be such that \(u_i=Nv_i\). Then \(m(u_i)=m(v_i)-1\) and our basis of \(\img N\) is,
\begin{equation*}
\{Nv_1,N^2v_1,\dots,N^{m(v_1)}v_1,\dots,Nv_j,N^2v_j,\dots,N^{m(v_j)}v_j\}.
\end{equation*}
The vectors \(\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}\) are linearly independent and belong to \(\ker N\), indeed they are a basis for the kernel of the restriction to \(\img N\) of \(N\), \(\ker N|_{\img N}\). Define the vectors \(v_{j+1},\dots,v_k\) as the vectors which extend those \(j\) vectors to a basis of \(\ker N\). With \(v_1,\dots,v_k\) so defined, let us consider the set of vectors
\begin{equation*}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_j,Nv_j,\dots,N^{m(v_j)}v_j,v_{j+1},\dots,v_k\}.
\end{equation*}
Notice that there are \(\dim\img N+\dim\ker N=\dim V\) of them, so if we can show they are linearly independent then they are a basis for \(V\) as specified in the theorem (\(m(v_i)=0\) for \(j+1\leq i\leq k\)).
So let us suppose we have,
\begin{align*}
0&=a_{1,0}v_1+\dots+a_{1,m(v_1)}N^{m(v_1)}v_1+\dots\\
&+a_{j,0}v_j+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j\\
&+a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k.
\end{align*}
Applying \(N\) to this we obtain
\begin{equation*}
a_{1,0}u_1+\dots+a_{1,m(u_1)}N^{m(u_1)}u_1+\dots+a_{j,0}u_j+\dots+a_{j,m(u_j)}N^{m(u_j)}u_j=0.
\end{equation*}
By the induction hypothesis, this implies \(a_{r,s}=0\) for \(1\leq r\leq j\), \(0\leq s\leq m(v_r)-1\). Thus, we are reduced to considering
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j+
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0.
\end{equation*}
But the \(v_{j+1},\dots,v_k\) were defined to be precisely an extension of the basis \begin{equation*}
\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}
\end{equation*}
of \(\ker N|_{\img N}\) to a basis of \(\ker N\). That is
\begin{equation*}
\ker N=\ker N|_{\img N}\oplus\Span(v_{j+1},\dots,v_k),
\end{equation*}
so we must have
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j=0
\end{equation*}
and
\begin{equation*}
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0
\end{equation*}
and hence all the coefficients \(a_{i,j}\) must be 0. We have thus established the existence of \(k=\dim\ker N\) vectors \(v_i\) such that \eqref{equ:nilpotent_basis} is a basis of \(V\). For uniqueness, observe that \(\dim\ker N=|\{i\mid m(v_i)\geq 0\}|=k\), that is, the number of \(v_i\) whose height is greater than or equal to \(0\) is \(\dim\ker N\). Likewise, \(\dim\ker N^2=\dim\ker N+|\{i\mid m(v_i)\geq 1\}|\), so that the number of \(v_i\) whose height is greater than or equal to \(1\) is \(\dim\ker N^2-\dim\ker N\). Continuing in this fashion we obtain a partition (see Note 1 below) of \(n\) which may be represented as a Young diagram such as (in the case of \(n=11\))

[Young diagram of a partition of 11]
In such a diagram the length of the first row is \(k\), with the length of the \(i\)th row given by \(\dim\ker N^i-\dim\ker N^{i-1}\). But note that the number of boxes in the \(i\)th column is then just \(m(v_i)+1\), so the columns encode the collection of heights. Thus the set of heights is completely determined by \(N\) and so the \(m(v_i)\) are indeed unique up to permutation.\(\blacksquare\)
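
The proof translates directly into a computation: the row lengths \(\dim\ker N^i-\dim\ker N^{i-1}\) give the partition, and its conjugate gives the block sizes. A sketch (assuming sympy, with a hypothetical nilpotent matrix whose blocks have sizes \(2\), \(2\) and \(1\)):

```python
# Sketch (assuming sympy): recovering the Young diagram of a nilpotent N from
# the dimensions dim ker N^k, and from it the Jordan block sizes m(v_i)+1.
import sympy as sp

# Hypothetical nilpotent N with Jordan blocks of sizes 2, 2 and 1.
N = sp.zeros(5, 5)
N[0, 1] = 1
N[2, 3] = 1

dims = [len((N**k).nullspace()) for k in range(6)]          # [0, 3, 5, 5, 5, 5]
rows = [dims[i] - dims[i - 1] for i in range(1, 6) if dims[i] > dims[i - 1]]
print(rows)   # [3, 2]: row lengths of the Young diagram

# Column lengths (the conjugate partition) are the block sizes m(v_i)+1.
cols = [sum(1 for r in rows if r >= j) for j in range(1, max(rows) + 1)]
print(cols)   # [2, 2, 1]
```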

Let’s consider what the matrix representation of some nilpotent operator \(N\) looks like with respect to this basis. In fact, we’ll make a small cosmetic tweak and flip the order of the basis elements corresponding to each \(v_i\). That is, we consider the basis
\begin{equation*}
(N^{m(v_1)}v_1,\dots,Nv_1,v_1,\dots,N^{m(v_k)}v_k,\dots,Nv_k,v_k).
\end{equation*}
Then the matrix representation of \(N\) with respect to this basis is block diagonal, with \(k\) blocks of sizes \(m(v_i)+1\), \(1\leq i\leq k\), each of the form
\begin{equation*}
\begin{pmatrix}
0&1&&\mathbf{0}\\
&\ddots&\ddots&\\
& &\ddots&1\\
\mathbf{0}& & & 0
\end{pmatrix}.
\end{equation*}
We have arrived at the following result.

Theorem (Jordan normal form) For any linear operator \(T\in\mathcal{L}(V)\) on a finite dimensional vector space \(V\), over an algebraically closed field \(K\), there is a basis for \(V\) such that the matrix representation of \(T\), \(\mathbf{T}\), has the form
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{T}_1& &\mathbf{0}\\
&\ddots& \\
\mathbf{0}& &\mathbf{T}_m
\end{pmatrix}
\end{equation*}
where each \(\mathbf{T}_i\), called a Jordan block, is an upper triangular matrix of the form
\begin{equation*}
\mathbf{T}_i=\begin{pmatrix}
\lambda_i&1& &\mathbf{0}\\
&\ddots&\ddots& \\
& &\ddots&1\\
\mathbf{0}& & &\lambda_i
\end{pmatrix},
\end{equation*}
with some eigenvalue \(\lambda_i\) of \(T\) on its diagonal. \(\mathbf{T}\) is called the Jordan normal form of \(T\) and is unique up to rearranging the order of the blocks.

In the language of group actions and orbits we have the following.

The orbits of the action of \(\text{GL}(V)\) on \(\mathcal{L}(V)\) where \(V\) is an \(n\)-dimensional vector space over an algebraically closed field are characterised by a set of \(n\) continuous parameters, the eigenvalues, together with some discrete parameters, partitions of the multiplicities of repeated eigenvalues.

Example Consider the matrix,
\begin{equation*}
\mathbf{A}=\begin{pmatrix}
2&-1&0\\
1&0&0\\
1&-1&1
\end{pmatrix}.
\end{equation*}
We form the characteristic equation,
\begin{align*}
\det(\mathbf{A}-\lambda\mathbf{I}_3)&=\det\begin{pmatrix}
2-\lambda&-1&0\\
1&-\lambda&0\\
1&-1&1-\lambda
\end{pmatrix}\\
&=(1-\lambda)\det\begin{pmatrix}
2-\lambda&-1\\
1&-\lambda
\end{pmatrix}\\
&=-(\lambda-1)^3.
\end{align*}
So we have one eigenvalue, \(\lambda=1\), with algebraic multiplicity \(3\). Consider then \(\mathbf{A}-\lambda\mathbf{I}_3\),
\begin{equation*}
\mathbf{A}-\lambda\mathbf{I}_3=\begin{pmatrix}
1&-1&0\\
1&-1&0\\
1&-1&0
\end{pmatrix}.
\end{equation*}
Clearly this has rank \(1\) so \(\dim\ker(\mathbf{A}-\lambda\mathbf{I}_3)=2\). We then know immediately that \(\dim\ker(\mathbf{A}-\lambda\mathbf{I}_3)^2=3\) and thus we have two blocks, of sizes \(1\) and \(2\), and the Jordan normal form is
\begin{equation*}
\begin{pmatrix}
1&0&0\\
0&1&1\\
0&0&1
\end{pmatrix}.
\end{equation*}
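
This example is easily checked by machine; a sketch assuming sympy:

```python
# Sketch (assuming sympy): checking the worked example above.
import sympy as sp

A = sp.Matrix([
    [2, -1, 0],
    [1,  0, 0],
    [1, -1, 1],
])

print(sp.factor(A.charpoly(sp.symbols('x')).as_expr()))   # (x - 1)**3
print((A - sp.eye(3)).rank())                             # 1, so dim ker = 2

P, J = A.jordan_form()
print(J)    # one 1x1 and one 2x2 Jordan block, eigenvalue 1
```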

Notes:

  1. Recall that a partition of some positive integer \(n\) is simply a way of writing \(n\) as a sum of positive integers \(n_i\), typically written as a tuple \((n_1,\dots,n_r)\) such that \(n_i\geq n_{i+1}\). A visual representation of a partition is given by its Young diagram. For example the partition \((2,1)\) of 3 would correspond to the diagram with a row of two boxes above a row of one box.

The Cayley-Hamilton Theorem

As a vector space \(\mathcal{L}(V)\) is \(n^2\)-dimensional, so there must exist some linear relationship among the \(n^2+1\) operators \(\id_V,T,\dots,T^{n^2}\). In fact, the following result, known as the Cayley-Hamilton theorem, guarantees a relationship among the powers of \(T\) up to \(T^n\).

Theorem (Cayley-Hamilton) Every linear operator \(T:V\mapto V\) satisfies its own characteristic equation, \(p_T(T)=0\). Equivalently, every \(n\times n\) matrix \(\mathbf{A}\) satisfies its own characteristic equation \(p_\mathbf{A}(\mathbf{A})=0\).

Proof When \(T\) is diagonalisable the result is obvious since, choosing the union of bases of the eigenspaces, \(V_{\lambda_i}\), as a basis of \(V\), any basis element is clearly annihilated by the product \((T-\lambda_1\id_V)\dots(T-\lambda_n\id_V)\). More generally, even if \(T\) is not diagonalisable, we know that we can always construct a basis \(v_i\), as in the discussion accompanying the theorem on triangularisation, such that its matrix representation is upper triangular. As already observed, defining \(W_0=\{0\}\) and \(W_i=\Span(v_1,\dots,v_i)\) (\(W_n=V\)) for \(1\leq i\leq n\), \(W_i\) is \(T\)-invariant and \((T-\lambda_i\id_V)W_i\subseteq W_{i-1}\). Indeed,
\begin{align*}
(T-\lambda_n\id_V)V&\subseteq W_{n-1}\\
(T-\lambda_{n-1}\id_V)(T-\lambda_n\id_V)V&\subseteq W_{n-2}\\
&\vdots \\
\prod_{i=1}^n(T-\lambda_i\id_V)V&\subseteq W_0=\{0\},
\end{align*}
that is, \(p_T(T)=0\).\(\blacksquare\)

Another way to see the Cayley-Hamilton result is as follows. Choose a basis \(\{e_i\}\) of \(V\) in terms of which \(Te_i=T_i^je_j\), the \(T_i^j\) being the components of the matrix representation of \(T\). We could write this as \((\delta_i^jT-T_i^j\id_V)e_j=0\), or as the matrix equation,
\begin{equation*}
(T\mathbf{I}_{n}-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=
\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=0.
\end{equation*}
The matrix \(\mathbf{S}(T)\), defined by
\begin{equation*}
\mathbf{S}(T)=\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}
\end{equation*}
exists in \(\text{Mat}_n(\text{End}(V))\), and as such would appear unlikely to be amenable to the techniques developed thus far for matrices over fields. In fact, we can regard \(\mathbf{S}(T)\) as a matrix over the commutative ring \(K[T]\), of polynomials in the symbol \(T\), with the obvious action on \(V\). As such, the standard definition and results from the theory of determinants, as described in Determinants, do indeed apply. In particular, we have
\begin{equation*}
\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=p_\mathbf{T}(T),
\end{equation*}
and
\begin{equation*}
\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\mathbf{I}_n.
\end{equation*}
So
\begin{equation*}
0=\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})
\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}=p_\mathbf{T}(T)\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix},
\end{equation*}
and the result is established.
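
A numerical sanity check of Cayley-Hamilton is straightforward (a sketch assuming numpy; note that np.poly uses the convention \(\det(x\mathbf{I}_n-\mathbf{A})\), which differs from ours only by an overall sign and so does not affect \(p_\mathbf{A}(\mathbf{A})=0\)):

```python
# Sketch (assuming numpy): a numerical check that p_A(A) = 0 for a random A.
import numpy as np
from numpy.linalg import matrix_power

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

# np.poly returns the coefficients of det(x I - A), highest power first.
coeffs = np.poly(A)
pA = sum(c * matrix_power(A, n - k) for k, c in enumerate(coeffs))

print(np.allclose(pA, np.zeros((n, n))))   # True (up to rounding error)
```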

Diagonalisable Linear Operators

From now on, unless otherwise stated, we assume our vector spaces are defined over an algebraically closed field such as \(\CC\). Recall that in this case any linear operator on an \(n\)-dimensional vector space has a characteristic polynomial which factors as \(\prod_{i=1}^r(x-\lambda_i)^{n_i}\). This means in particular that any linear operator has at least one eigenvector and indeed, at least one per distinct eigenvalue.

Proposition A set of eigenvectors \(v_1,\dots,v_r\) corresponding to distinct eigenvalues \(\lambda_1,\dots,\lambda_r\) of a linear operator \(T\) are linearly independent.

Proof By assumption we have \(Tv_i=\lambda_iv_i\) for \(1\leq i\leq r\). Suppose there are numbers \(c^1,\dots,c^r\) such that \(c^iv_i=0\) (summation implied). Then we must have,
\begin{align*}
0&=(T-\lambda_2\id_V)\dots(T-\lambda_r\id_V)c^iv_i\\
&=c^1(T-\lambda_2\id_V)\dots(T-\lambda_r\id_V)v_1\\
&=c^1(\lambda_1-\lambda_2)\dots(\lambda_1-\lambda_r)v_1,
\end{align*}
which, the eigenvalues being distinct, in turn means \(c^1=0\). In the same way we show \(c^2=\dots=c^r=0\), so proving linear independence.\(\blacksquare\)

Definition A linear operator \(T\in\mathcal{L}(V)\) is said to be diagonalisable if there is some basis of \(V\) with respect to which the matrix representation of \(T\) is diagonal.

Suppose the factorisation of the characteristic polynomial is ‘nice’ in the sense that
\(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\). That is, the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. Then, as follows from the Proposition, \(\sum_{\lambda_i}V_{\lambda_i}\) is a direct sum, so, by the dimension formula for direct sums, \(\dim(\sum_{\lambda_i}V_{\lambda_i})=\sum_i n_i=n\), and in any basis which is the union of bases for the \(V_{\lambda_i}\) the matrix representation of \(T\) is diagonal. The converse is obvious so we have demonstrated the following

Corollary A linear operator \(T\) is diagonalisable if and only if \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\).

Suppose \(V\) is an \(n\)-dimensional vector space and \(T:V\mapto V\) a diagonalisable linear operator. If \(\{e_i\}\) is a basis of \(V\) with respect to which \(T\) has matrix representation \(\mathbf{T}\), such that \(Te_i=T_i^je_j\), and if \(\{v_i\}\) is a basis for \(V\) which is a union of eigenspace bases, such that \(Tv_i=\lambda_iv_i\) (not all \(\lambda_i\) necessarily distinct), then we may relate the two bases as \(v_i=P_i^je_j\), where the \(P_i^j\) are the components of an invertible matrix \(\mathbf{P}\), and since \(Tv_i=P_i^jT_j^k{P^{-1}}_k^lv_l=\lambda_iv_i\) we see that the similarity transformation \(\mathbf{P}\) diagonalises \(\mathbf{T}\). In particular, any \(n\times n\) matrix, \(\mathbf{A}\), over an algebraically closed field \(K\) defines a linear operator on \(K^n\). In terms of the standard basis, \(\{\mathbf{e}_i\}\), of \(K^n\) we have \(\mathbf{A}\mathbf{e}_i=A_i^j\mathbf{e}_j\) and if \(\mathbf{A}\) is diagonalisable then there must exist a basis, \(\{\mathbf{v}_i\}\), of \(K^n\) such that \(\mathbf{A}\mathbf{v}_i=\lambda_i\mathbf{v}_i\) and an invertible matrix \(\mathbf{P}\) such that \(\mathbf{v}_i=P_i^j\mathbf{e}_j\). Note that \(\mathbf{P}\) is precisely the matrix whose \(i\)th column is the \(i\)th vector \(\mathbf{v}_i\). A diagonalisable matrix is diagonalised by the matrix whose columns are its eigenvectors.
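
As a sketch (assuming numpy), the following checks this for a simple diagonalisable matrix: the matrix of eigenvectors returned by the library, taken as columns, diagonalises it.

```python
# Sketch (assuming numpy): P, whose columns are eigenvectors of A, diagonalises A.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

evals, P = np.linalg.eig(A)          # columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P

print(np.round(D, 10))               # diag(1, 2)
print(evals)
```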

Example (The Pauli matrices) Extremely important in quantum mechanics, the Pauli matrices, \(\sigma_x\), \(\sigma_y\) and \(\sigma_z\) are given by
\begin{equation}
\sigma_x=\begin{pmatrix}
0&1\\1&0
\end{pmatrix}
\quad
\sigma_y=\begin{pmatrix}
0&-i\\i&0
\end{pmatrix}
\quad
\sigma_z=\begin{pmatrix}
1&0\\0&-1
\end{pmatrix}.
\end{equation}
It is not difficult to see that each has the pair of eigenvalues \(\pm1\) so \(\sigma_x\) and \(\sigma_y\) are in fact similar to \(\sigma_z\) with the similarity transformation matrices given by
\begin{equation*}
\begin{pmatrix}
1&1\\
1&-1
\end{pmatrix}
\quad\text{and}\quad\begin{pmatrix}
1&1\\
i&-i
\end{pmatrix}
\end{equation*}
for \(\sigma_x\) and \(\sigma_y\) respectively.
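
These claims are easily verified numerically; a sketch assuming numpy:

```python
# Sketch (assuming numpy): the stated similarity transformations bring
# sigma_x and sigma_y to sigma_z.
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

Px = np.array([[1, 1], [1, -1]], dtype=complex)
Py = np.array([[1, 1], [1j, -1j]])

print(np.allclose(np.linalg.inv(Px) @ sx @ Px, sz))   # True
print(np.allclose(np.linalg.inv(Py) @ sy @ Py, sz))   # True
```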

The Trace

Another basis independent property of linear operators is the trace.

Definition The trace of an \(n\times n\) matrix \(\mathbf{A}\), \(\tr\mathbf{A}\) is defined to be
\begin{equation}
\tr\mathbf{A}=\sum_{i=1}^nA_i^i=A_i^i.
\end{equation}

As \(\tr\mathbf{A}\mathbf{B}=(AB)^i_i=A^i_jB^j_i=(BA)^i_i=\tr\mathbf{B}\mathbf{A}\) it follows that \(\tr\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\tr\mathbf{A}\) so it makes sense to define the trace of any linear operator \(T:V\mapto V\), \(\tr T\), as the trace of any matrix representation of \(T\).

Working over an algebraically closed field \(K\), since any matrix \(\mathbf{A}\in\text{Mat}_n(K)\) is similar to an upper triangular matrix, we have \(\tr\mathbf{A}=\sum_{i=1}^n\lambda_i\) and \(\det\mathbf{A}=\prod_{i=1}^n\lambda_i\) (not all \(\lambda_i\) necessarily distinct), quantities which are in fact encoded as particular coefficients in the characteristic polynomial,
\begin{equation}
p_\mathbf{A}(x)=x^n-\tr\mathbf{A}x^{n-1}+e_2x^{n-2}-\dots+(-1)^{n-1}e_{n-1}x+(-1)^n\det\mathbf{A}.
\end{equation}
The coefficients \(e_1=\tr\mathbf{A},e_2,\dots,e_{n-1},e_n=\det\mathbf{A}\) are the elementary symmetric functions of the eigenvalues.

There is a nice relationship between the trace and determinant. It can be shown that the matrix exponential,
\begin{equation}
\exp\mathbf{A}=\sum_{i=0}^\infty\frac{1}{i!}\mathbf{A}^i=\mathbf{I}_n+\mathbf{A}+\frac{1}{2!}\mathbf{A}^2+\dots,
\end{equation}
converges for any \(n\times n\) matrix over \(\RR\) or \(\CC\). Consider then the function on \(\RR\) defined as,
\begin{equation*}
f(t)=\det\exp\mathbf{A}t.
\end{equation*}
Differentiating, we find that
\begin{equation*}
\frac{df(t)}{dt}=\tr\mathbf{A}f(t),
\end{equation*}
so that, since \(f(0)=\det\mathbf{I}_n=1\), \(\ln f(t)=t\tr\mathbf{A}\) and in particular we have the following relationship between the determinant and trace,
\begin{equation}
\det\exp\mathbf{A}=\exp\tr\mathbf{A}.
\end{equation}
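
A quick numerical check of this identity (a sketch assuming numpy and scipy, with a hypothetical random matrix):

```python
# Sketch (assuming numpy and scipy): checking det(exp A) = exp(tr A).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

lhs = np.linalg.det(expm(A))
rhs = np.exp(np.trace(A))
print(np.isclose(lhs, rhs))   # True
```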

Eigenvectors, Eigenvalues and Diagonalisation

We have seen that given any pair of vector spaces \(V,W\), then up to choice of bases, the linear transformations between them are completely determined by their rank, that is, by an integer \(r\leq\min(\dim V,\dim W)\). We turn now to the task of establishing a similar result for linear operators \(T:V\mapto V\). That is, for a given vector space \(V\), we wish to characterise the linear operators \(\mathcal{L}(V)\) up to a choice of basis for \(V\).

Suppose \(T:V\mapto V\) is a linear operator and \(U\) a \(T\)-invariant subspace of \(V\). If \(\{v_1,\dots,v_s\}\) is a basis of \(U\) we can extend it to a basis of \(V\) and then with respect to this basis the matrix representation of \(T\), \(\mathbf{T}\), will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}’
\end{pmatrix},
\end{equation}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\), the restriction of \(T\) to \(U\) and \(\mathbf{T}’\) is the matrix representation of the induced map \(T’:V/U\mapto V/U\), such that \(T'(v+U)=Tv+U\), with respect to the basis \(\{v_{s+1}+U,\dots,v_n+U\}\) of \(V/U\). If we were able to do better and decompose \(V\) as a direct sum of two invariant subspaces, \(V=U_1\oplus U_2\) say, then if \(\{v_1,\dots, v_s\}\) is a basis of \(U_1\) and \(\{v_{s+1},\dots,v_n\}\) is a basis of \(U_2\) then \(\{v_1,\dots,v_n\}\) is a basis of \(V\) and with respect to this basis the matrix representation of \(T\) will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t}_1 & \mathbf{0}\\
\mathbf{0}&\mathbf{t}_2
\end{pmatrix},
\end{equation}
where \(\mathbf{t}_1\) is the matrix representation of \(T|_{U_1}\) with respect to the basis \(\{v_1,\dots, v_s\}\) and \(\mathbf{t}_2\) is the matrix representation of \(T|_{U_2}\) with respect to the basis \(\{v_{s+1},\dots, v_n\}\). So, assuming our goal is to find a basis for \(V\) such that the matrix of a given linear operator \(T\) assumes the simplest possible form, it is clear that finding \(T\)-invariant subspaces will play an important role. This motivates the following definition.

Definition A non-zero vector \(v\in V\) is an eigenvector of a linear operator \(T:V\mapto V\) if
\begin{equation}
Tv=\lambda v,
\end{equation}
for some \(\lambda\in K\). In this case, \(\lambda\) is called the eigenvalue associated with \(v\). The subspace of \(V\) defined by
\begin{equation}
V_\lambda=\{v\in V\mid Tv=\lambda v\},
\end{equation}
is called the eigenspace of \(\lambda\) and its dimension, \(\dim V_\lambda\), the geometric multiplicity of \(\lambda\).

Clearly \(V_\lambda=\ker(T-\lambda\id_V)\) which is non-trivial if and only if \(\det(T-\lambda\id_V)=0\). That is, having chosen a basis for \(V\) such that the matrix representation of \(T\) is \(\mathbf{T}\), \(\lambda\) is an eigenvalue of \(T\) if and only if \(\det(\mathbf{T}-\lambda\mathbf{I}_n)=0\). This motivates the following definition.

Definition The characteristic polynomial of an \(n\times n\) matrix \(\mathbf{A}\) is
\begin{equation}
p_\mathbf{A}(x)=\det(\mathbf{A}-x\mathbf{I}_n),
\end{equation}
with \(p_\mathbf{A}(x)=0\) the characteristic equation.

Notice that since similar matrices clearly have the same characteristic polynomial it makes sense to define the characteristic polynomial \(p_T(x)\) for any \(T\in\mathcal{L}(V)\) as the characteristic polynomial of any matrix representation of \(T\).

Let us now consider a few simple examples of the calculation of eigenvalues and eigenvectors. Given an \(n\times n\) matrix \(\mathbf{A}\), the general procedure is first to compute its eigenvalues \(\lambda\) as the roots of the characteristic equation, \(\det(\mathbf{A}-\lambda\mathbf{I}_n)=0\), and then for each eigenvalue in turn solve \((\mathbf{A}-\lambda\mathbf{I}_n)\mathbf{v}=0\) for the corresponding eigenvectors \(\mathbf{v}\).
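
As a sketch of this two-step procedure (assuming sympy, and using the matrix of the first example below):

```python
# Sketch (assuming sympy): eigenvalues from the characteristic equation,
# then eigenvectors from the null space of A - lambda*I.
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 1],
               [0, 2]])

# Step 1: eigenvalues as roots of det(A - lambda*I) = 0.
p = (A - lam * sp.eye(2)).det()
eigenvalues = sp.solve(sp.Eq(p, 0), lam)
print(eigenvalues)                      # [1, 2]

# Step 2: for each eigenvalue, solve (A - lambda*I) v = 0.
for ev in eigenvalues:
    print(ev, (A - ev * sp.eye(2)).nullspace())
```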

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&2
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)(2-\lambda)\) so has two eigenvalues, \(1\) and \(2\). For the eigenvalue \(1\), a corresponding eigenvector \(\mathbf{v}\), must satisfy,
\begin{equation}
\begin{pmatrix}
0&1\\
0&1
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix},
\end{equation}
while for the eigenvalue \(2\) we have
\begin{equation}
\begin{pmatrix}
-1&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
1
\end{pmatrix}.
\end{equation}
So in this case there are two distinct eigenvalues with each corresponding eigenspace one-dimensional.

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&1
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)^2\) so has just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
tells us that the single corresponding eigenvector is
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue and its corresponding eigenspace is one-dimensional.

Example The identity matrix,
\begin{equation}
\begin{pmatrix}
1&0\\
0&1
\end{pmatrix},
\end{equation}
also has characteristic polynomial \((1-\lambda)^2\) and so just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&0\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
though has two possible solutions
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}
0\\
1
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue but its corresponding eigenspace is two-dimensional.

The algebraic multiplicity of an eigenvalue \(\lambda\) is the multiplicity of \(\lambda\) as a root of the characteristic polynomial. From the previous examples we might guess that the algebraic multiplicity is greater than or equal to the geometric multiplicity. This is indeed the case, for if the geometric multiplicity of an eigenvalue \(\lambda\) of a linear operator \(T\) is \(s\) and we take \(\{v_1,\dots,v_s\}\) as a basis for \(V_\lambda\) we can extend it to a basis of \(V\). In this basis the matrix representation of \(T\) must have the form,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\lambda\mathbf{I}_s & \mathbf{*}\\
\mathbf{0}&\mathbf{T}’
\end{pmatrix},
\end{equation*}
so the characteristic polynomial of \(T\) has the form, \(p_T(x)=p_\mathbf{T}(x)=(\lambda-x)^s\det(\mathbf{T}’-x\mathbf{I}_{n-s})\). So \((\lambda-x)^s\), which is just the characteristic polynomial of \(T|_{V_\lambda}\), divides \(p_T(x)\), from which it follows that the algebraic multiplicity is greater than or equal to the geometric multiplicity.

As already discussed, more generally, for any \(T\)-invariant subspace \(U\) of \(V\), we may choose a basis \(\{v_1,\dots,v_p\}\) of \(U\) and extend it to a basis \(\{v_1,\dots,v_p,v_{p+1},\dots,v_n\}\) of \(V\). Then with respect to this basis \(T\) has the matrix representation,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}’
\end{pmatrix}.
\end{equation*}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\) and \(\mathbf{T}’\) is the matrix representation of \(T’\) with respect to the basis \(\{v_{p+1}+U,\dots,v_n+U\}\) of \(V/U\). So we have \(p_T(x)=p_{T|_U}(x)\det(\mathbf{T}’-x\mathbf{I}_{n-p})=p_{T|_U}(x)p_{T’}(x)\).

In the previous examples we did not specify the field and indeed in each case we could have been working over either \(\RR\) or \(\CC\). In the following example the choice matters.

Example The matrix,
\begin{equation}
\begin{pmatrix}
0&1\\
-1&0
\end{pmatrix},
\end{equation}
has characteristic polynomial \(\lambda^2+1\) and so regarded as a matrix over \(\RR\) it has no eigenvalues. Over \(\CC\), of course, there are two eigenvalues, \(\pm i\).

The characteristic polynomial, \(p_\mathbf{A}(x)\), of a matrix \(\mathbf{A}\) with \(r\) distinct eigenvalues, is said to factor over the field \(K\) if it can be written as
\begin{equation}
p_\mathbf{A}(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i},
\end{equation}
for some \(\lambda_i\in K\) (\(\lambda_i\neq\lambda_j\) if \(i\neq j\)), with the algebraic multiplicities, \(n_i\), such that \(\sum_i n_i=n\). Clearly, the characteristic polynomial of the matrix of the last example does not factor over \(\RR\) (though it does over \(\CC\)), whilst those of the previous three, which happen to be upper triangular, do. Generally, it is not difficult to see that for any upper triangular matrix, \(\mathbf{A}\), the characteristic polynomial has the particularly simple form \(p_\mathbf{A}(x)=(A_1^1-x)\dots(A_n^n-x)\), that is, the characteristic polynomial of any upper triangular matrix factors and the diagonal elements are precisely the eigenvalues.

Theorem If \(\mathbf{A}\) is an \(n\times n\) matrix over \(K\), then \(p_\mathbf{A}(x)\) factors if and only if \(\mathbf{A}\) is similar to an upper triangular matrix.

Proof The ‘if’ direction we have already observed. Conversely, suppose \(p_\mathbf{A}(x)\) factors. If \(n=1\) then there is nothing to prove. For \(n>1\), there is some \(\lambda\) such that \(p_\mathbf{A}(\lambda)=0\). That is, there is some non-zero \(\mathbf{v}_1\in K^n\) such that \(\mathbf{A}\mathbf{v}_1=\lambda\mathbf{v}_1\). Extend this to a basis of \(K^n\), \(\mathbf{v}_1,\dots,\mathbf{v}_n\). Now consider the matrix representation of \(L_\mathbf{A}\) with respect to this basis,
\begin{equation}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}’
\end{pmatrix}.
\end{equation}
This is, of course, just the similarity transformation of \(\mathbf{A}\) by the appropriate change of basis matrix. Its characteristic polynomial is clearly \((x-\lambda)p_{\mathbf{A}’}(x)\), and given the initial assumption that \(p_\mathbf{A}(x)\) factors it follows that \(p_{\mathbf{A}’}(x)\) factors. So we can repeat the process for the matrix \(\mathbf{A}’\), finding a similarity transformation \(\mathbf{P}\) such that
\begin{equation*}
\mathbf{P}^{-1}\mathbf{A}’\mathbf{P}=\begin{pmatrix}
\lambda’&\mathbf{*}\\
\mathbf{0}&\mathbf{A}”
\end{pmatrix},
\end{equation*}
that is,
\begin{equation*}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}^{-1}
\end{pmatrix}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}’
\end{pmatrix}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}
\end{pmatrix}
=
\begin{pmatrix}
\lambda & * & \mathbf{*}\\
0 & \lambda’ & \mathbf{*}\\
\mathbf{0} & \mathbf{0} & \mathbf{A}”
\end{pmatrix},
\end{equation*}
and so on, arriving finally at the desired upper triangular matrix.\(\blacksquare\)

From the perspective of linear operators, the construction works as follows. Since \(p_T(x)\) factors we know that \(T\) has at least one non-zero eigenvector, \(v_1\) say, with corresponding eigenvalue \(\lambda_1\). Defining \(W_1\equiv\Span(v_1)\) then since \(W_1\) is \(T\)-invariant we can define \(T’:V/W_1\mapto V/W_1\) as \(T'(v+W_1)=Tv+W_1\). As already observed, the characteristic polynomial of \(T\) can then be written as \(p_T(x)=p_{T|_{W_1}}(x)p_{T’}(x)=(x-\lambda_1)p_{T’}(x)\) and since by assumption \(p_T(x)\) factors, so too must \(p_{T’}(x)\). Thus \(T’\) has a non-zero eigenvector, \(v_2+W_1\) say, and if \(\lambda_2\) is the corresponding eigenvalue we have
\begin{equation*}
T'(v_2+W_1)=\lambda_2(v_2+W_1).
\end{equation*}
This means that \(Tv_2-\lambda_2v_2\in W_1\), so defining \(W_2=\Span(v_1,v_2)\), we have \(Tv_2\in W_2\) and so \(W_2\) is \(T\)-invariant. Also, since \(v_2+W_1\) is non-zero, \(v_1\) and \(v_2\) are linearly independent. Continuing in this way we construct a basis, \(\{v_1,\dots,v_n\}\), of \(V\) with the property that for any \(1\leq k\leq n\), \(W_k\) is \(T\)-invariant. The corresponding matrix representation is upper triangular and we note for future reference that \((T-\lambda_k\id_V)W_k\subseteq W_{k-1}\).

The field \(\CC\) is said to be algebraically closed since, by the fundamental theorem of algebra, every polynomial in \(\CC[x]\) splits into linear factors. Clearly any matrix over an algebraically closed field \(K\) is similar to an upper triangular matrix from which its eigenvalues can then be read. Note that the field \(\RR\) is not algebraically closed (\(\CC\) is actually the algebraic closure of \(\RR\)).
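
In computational practice such a triangularisation is usually obtained via the (complex) Schur decomposition, which even provides a unitary change of basis; a sketch assuming scipy:

```python
# Sketch (assuming scipy): over C every matrix is similar (indeed unitarily
# similar) to an upper triangular matrix; the complex Schur decomposition
# A = Z T Z^* exhibits such a triangularisation.
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # no real eigenvalues

T, Z = schur(A, output='complex')
print(np.round(T, 10))                      # upper triangular, diagonal +/- i
print(np.allclose(Z @ T @ Z.conj().T, A))   # True
```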

Determinants

If we focus on linear operators which are endomorphisms, that is, operators \(T\in\mathcal{L}(V)\) for some vector space \(V\), then basis independent properties are those of the orbits of the group action of \(\text{GL}(V)\) on \(\mathcal{L}(V)\) which maps \(T\mapto P^{-1}TP\). In other words, properties of square \(n\times n\) matrices \(\mathbf{A}\) which are also properties of all similar (also called conjugate) matrices \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\) for some invertible \(n\times n\) matrix \(\mathbf{P}\). The change of basis matrix \(\mathbf{P}\) here is called a similarity transformation. One such attribute is the determinant.

Definition The determinant, \(\det\mathbf{A}\), of an \(n\times n\) matrix \(\mathbf{A}\) is defined to be
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_{\sigma_{1}}^1A_{\sigma_{2}}^2\dots A_{\sigma_{n}}^n\label{eq:def det}
\end{equation}
where the sum is over all permutations \(\sigma\) of the set \(\{1, 2,\dots, n\}\), \(\sigma_i\) being the image of \(i\) under a given permutation, \(S_n\) the set of all such permutations (the symmetric group on \(n\) symbols) and \(\sgn(\sigma)\) the signature of \(\sigma\) (\(+1\) whenever the reordering given by \(\sigma\) can be achieved by interchanging two entries an even number of times, and \(-1\) otherwise).

This is sometimes expressed in terms of the Levi-Civita symbol \(\epsilon_{i_1i_2\dots i_n}\), defined as
\begin{equation}
\epsilon_{i_1i_2\dots i_n}=\begin{cases}
\sgn(i_1,i_2,\dots,i_n)&(i_1,i_2,\dots,i_n)\in S_n\\
0&(i_1,i_2,\dots,i_n)\notin S_n.
\end{cases}\label{def:Levi-Cevita}
\end{equation}
so that the determinant may be written \(\det\mathbf{A}=\epsilon_{i_1i_2\dots i_n}A_{i_1}^1A_{i_2}^2\dots A_{i_n}^n\), with summation over repeated indices implied.
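
As a sketch (assuming Python's standard itertools, with numpy only for comparison), the permutation-sum definition can be implemented directly, though of course it is hopelessly inefficient for large \(n\):

```python
# Sketch: the permutation-sum definition of the determinant, checked against numpy.
from itertools import permutations
import numpy as np

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det(A):
    n = len(A)
    return sum(sign(p) * np.prod([A[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[2.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [1.0, -1.0, 1.0]])
print(det(A), np.linalg.det(A))   # both 1.0 (up to rounding)
```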

For any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&2&\dots&n\\
\sigma_1&\sigma_2&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there is an inverse permutation, let’s call it \(\tau\), given by
\begin{equation*}
\tau=\begin{pmatrix}
\sigma_1&\sigma_2&\dots&\sigma_n\\
1&2&\dots&n
\end{pmatrix}
\end{equation*}
for which it is clear that \(\sgn(\sigma)=\sgn(\tau)\). This leads us to the observation that we could as well write the determinant as
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_1^{\sigma_{1}}A_2^{\sigma_{2}}\dots A_n^{\sigma_{n}}
\end{equation}
and that indeed
\begin{equation}
\det\mathbf{A}^\mathsf{T}=\det\mathbf{A}
\end{equation}
Note also that if the matrix is upper (or lower) triangular then \(\det\mathbf{A}=A_1^1\dots A_n^n\), and also that if two columns (or rows) are identical then \(\det\mathbf{A}=0\).

The effect of elementary row operations on the determinant is as follows. Interchanging rows changes the sign. Multiplying a row by a number multiplies the corresponding determinant by that number. Adding a multiple of one row to another leaves the determinant unchanged. The first two of these are obvious. For the third notice that for any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_i&\dots&\sigma_j&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there will also be a permutation \(\sigma’\) given by
\begin{equation*}
\sigma’=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_j&\dots&\sigma_i&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
with \(\sgn(\sigma’)=-\sgn(\sigma)\). Thus if the elementary row operation transforms \(\mathbf{A}\) to \(\tilde{\mathbf{A}}\), the matrix \(\mathbf{A}\) but with the \(i\)th row replaced by row \(i\) plus \(c\) times row \(j\), then
\begin{align*}
\det\tilde{\mathbf{A}} = \sum_{\sigma\in S_n}\sgn(\sigma) \bigl( &A_{\sigma_{1}}^1\dots A_{\sigma_{i}}^i\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \\
+ &cA_{\sigma_{1}}^1\dots A_{\sigma_{i}}^j\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \bigr)
\end{align*}
and we see that the term involving the constant \(c\) will be cancelled by the term corresponding to the permutation \(\sigma’\). The determinant of the identity matrix being of course 1, we see that if \(\mathbf{E}\) is any elementary row matrix, then \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\). An entirely analogous discussion applies of course to elementary column matrices. So if an \(n\times n\) matrix \(\mathbf{A}\) is invertible then since it can be written as a product of elementary matrices, its determinant is non-zero. Conversely if the determinant is non-zero then the reduced row echelon form of the matrix has non-zero determinant so it must have full rank and therefore be invertible. We have therefore proved the following

Theorem An \(n\times n\) matrix \(\mathbf{A}\) is invertible if and only if \(\det\mathbf{A}\neq0\).

Another important property of determinants is contained in the following

Theorem For any \(n\times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\)
\begin{equation}
\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}.
\end{equation}

Proof Consider first the case that \(\mathbf{A}\) is not invertible. Then \(\rank L_\mathbf{A}<n\), which means \(\rank L_\mathbf{A}L_\mathbf{B}<n\), so \(\mathbf{A}\mathbf{B}\) is not invertible and both sides are zero. Similarly, if \(\mathbf{B}\) is not invertible, then the kernel of \(L_\mathbf{B}\), and therefore that of \(L_\mathbf{A}L_\mathbf{B}\), is non-trivial, so again both sides are zero. Finally, if both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible then both can be expressed as products of elementary matrices and the result follows by repeated use of \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\).\(\blacksquare\)

Another perspective on determinants comes from regarding an \(n\times n\) matrix as an \(n\)-tuple of \(n\)-dimensional column vectors. There is then an obvious isomorphism \(\text{Mat}_n(K)\cong K^n\times\dots\times K^n\) and the determinant may be regarded as a function \(\det:K^n\times\dots\times K^n\mapto K\). As such, it follows from the definition that it is multilinear in the sense that it is linear in each column, \(v_i\) say, of the matrix. That is,

\begin{equation*}
\det(v_1,\dots,cv_i,\dots,v_n)=c\det(v_1,\dots,v_i,\dots,v_n),
\end{equation*}
and
\begin{equation*}
\det(v_1,\dots,u_i+v_i,\dots,v_n)=\det(v_1,\dots,u_i,\dots,v_n)+\det(v_1,\dots,v_i,\dots,v_n).
\end{equation*}
Definition A multilinear function \(f:K^n\times\dots\times K^n\mapto K\) is a volume form on \(K^n\) if it is alternating. That is, whenever \(i\neq j\) and \(v_i=v_j\), \(f(v_1,\dots,v_n)=0\).

Clearly, the determinant is an example of a volume form, and we may ask if there are any others. Consider the standard basis \(\{\mathbf{e}_i\}\) of \(K^n\), then any element of \(K^n\times\dots\times K^n\) has the form
\begin{equation*}
(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)
\end{equation*}
so if \(f\) is a volume form then
\begin{equation*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)=A_1^{i_1}A_2^{i_2}\dots A_n^{i_n}f(\mathbf{e}_{i_1},\dots,\mathbf{e}_{i_n}).
\end{equation*}
But \(f\) is alternating, so this is zero unless \((i_1,\dots,i_n)\) is some permutation of \((1,\dots,n)\) and moreover it is not difficult to see that \(f(\mathbf{e}_{\sigma(1)},\dots,\mathbf{e}_{\sigma(n)})=\sgn(\sigma)f(\mathbf{e}_1,\dots,\mathbf{e}_n)\) so
\begin{align*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)&=\sum_{\sigma\in S_n}A_1^{\sigma_1}A_2^{\sigma_2}\dots A_n^{\sigma_n}\sgn(\sigma)f(\mathbf{e}_1,\dots,\mathbf{e}_n)\\
&=\det(v_1,\dots,v_n)f(\mathbf{e}_1,\dots,\mathbf{e}_n).
\end{align*}
So any volume form is determined by \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)\) and indeed the set of volume forms is a \(1\)-dimensional vector space. In other words, the determinant is uniquely specified by being multilinear, alternating, and having \(\det\mathbf{I}_n=1\) (\(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=1\)), where \(\mathbf{I}_n\) is the \(n\times n\) identity matrix.

A rather neat demonstration of the fact that \(\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}\) may now be given. Define \(f:\text{Mat}_n(K)\mapto K\) as \(f(\mathbf{B})=\det\mathbf{A}\mathbf{B}\). Then \(f\) is a volume form, which is clear having observed, as above, that the \(j\)th column of \(\mathbf{A}\mathbf{B}\), let’s denote this \((AB)_j\), is a linear sum of the columns of \(\mathbf{A}\) with coefficients from the \(j\)th column of \(\mathbf{B}\), that is,
\begin{equation*}
(AB)_j=B_j^1(A)_1+B_j^2(A)_2+\dots+B_j^n(A)_n.
\end{equation*}
This means that
\begin{equation*}
f(\mathbf{B})=\det\mathbf{B}f(\mathbf{e}_1,\dots,\mathbf{e}_n),
\end{equation*}
but by the definition we have \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=f(\mathbf{I}_n)=\det\mathbf{A}\).

Clearly, \(\det{\mathbf{A}}^{-1}=(\det\mathbf{A})^{-1}\) so \(\det\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\det\mathbf{A}\) and it therefore makes sense to define the determinant of any linear operator \(T:V\mapto V\) as \(\det\mathbf{T}\) where \(\mathbf{T}\) is the matrix representation of \(T\) with respect to any basis. Thus, we may summarise what we have learnt about the invertibility of linear operators in the following theorem.

Theorem For any linear operator \(T\in\mathcal{L}(V)\) the following are equivalent:

  1. \(T\) is invertible;
  2. \(\det T\neq0\);
  3. \(\ker T=\{0\}\);
  4. \(\img T=V\).

There is another expression for the determinant which may be derived, called Laplace’s formula,
\begin{equation}
\det\mathbf{A} = \sum_{j=1}^n (-1)^{i+j} A_j^i M_{i,j} = \sum_{i=1}^n (-1)^{i+j} A_j^i M_{i,j},\label{eq:Laplace’s formula}
\end{equation}
where the expansion is along the \(i\)th row and \(j\)th column respectively. Here \(M_{i,j}\) is defined to be the determinant of the \((n-1)\times (n-1)\) matrix obtained by removing the \(i\)th row and \(j\)th column of the matrix \(\mathbf{A}\). It is called a minor. With the adjugate matrix, \(\adj\mathbf{A}\), defined as,
\begin{equation}
(\adj\mathbf{A})_j^i=(-1)^{i+j}M_{j,i},
\end{equation}
we then have
\begin{equation*}
((\adj A)A)_j^i=(\adj A)_k^iA_j^k=\sum_{k=1}^n(-1)^{i+k}M_{k,i}A_j^k.
\end{equation*}
In the last expression, if \(i=j\) then we simply have \(\det\mathbf{A}\), while if \(i\neq j\) then the expression is precisely that of the determinant of a matrix derived from \(\mathbf{A}\) by replacing the \(i\)th column by the \(j\)th column and as such is zero. Thus we see that,
\begin{equation}
(\adj\mathbf{A})\mathbf{A}=\mathbf{A}\adj\mathbf{A}=(\det\mathbf{A})\mathbf{I}_n,
\end{equation}
which is called Cramer’s rule, and which means in particular that
\begin{equation}
\mathbf{A}^{-1}=\frac{1}{\det\mathbf{A}}\adj\mathbf{A},\label{eq:Cramer’s rule}
\end{equation}
if \(\det\mathbf{A}\neq 0\).
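
A sketch (assuming numpy) building the adjugate from cofactors, as a hypothetical illustration of these formulas:

```python
# Sketch (assuming numpy): the adjugate from cofactors, and the inverse via
# adj(A) A = det(A) I.
import numpy as np

def minor(A, i, j):
    """Determinant of A with row i and column j removed."""
    return np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))

def adjugate(A):
    n = A.shape[0]
    C = np.array([[(-1) ** (i + j) * minor(A, i, j) for j in range(n)]
                  for i in range(n)])
    return C.T            # the adjugate is the transpose of the cofactor matrix

A = np.array([[2.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [1.0, -1.0, 1.0]])

adjA = adjugate(A)
print(np.allclose(adjA @ A, np.linalg.det(A) * np.eye(3)))       # True
print(np.allclose(adjA / np.linalg.det(A), np.linalg.inv(A)))    # True
```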