The Jordan Normal Form

Thus far, we have seen that if \(V\) is an \(n\)-dimensional vector space over an algebraically closed field, then for any linear operator \(T\in\mathcal{L}(V)\) a basis of \(V\) may be chosen such that the matrix representation of \(T\) is upper triangular. In certain optimal cases we know that a basis can be chosen so that the matrix representation is diagonal, with entries given by the eigenvalues of \(T\), each appearing a number of times given by its geometric multiplicity. In these ‘good’ cases, then, the linear operators are characterised by \(n\) continuous parameters, the eigenvalues \(\lambda_i\).

The obvious question is whether we can extend such a characterisation beyond the particularly nice diagonalisable cases. The problem we face is clear: in general the characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) does not have the property that \(\dim V_{\lambda_i}=n_i\). In other words, the eigenspaces may be a little too small, and so prevent us from achieving a perfect direct sum decomposition \(V=V_{\lambda_1}\oplus\dots\oplus V_{\lambda_r}\). However, by slightly generalising the notion of an eigenspace we can achieve such a decomposition and, hence, a useful characterisation of general linear operators.

Whereas an eigenvector \(v\) of \(T\) with eigenvalue \(\lambda\) satisfies \((T-\lambda\id_V)v=0\), a generalised eigenvector \(v\) of \(T\) with eigenvalue \(\lambda\) satisfies \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\). The set of all such vectors corresponding to a given eigenvalue is then called the generalised eigenspace with eigenvalue \(\lambda\). This definition is in fact equivalent to the following.

Definition The generalised eigenspace with eigenvalue \(\lambda\) of a linear operator \(T\in\mathcal{L}(V)\) is
\begin{equation}
V_{[\lambda]}=\left\{v\in V\mid(T-\lambda\id_V)^{\dim V}v=0\right\}.
\end{equation}
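As a concrete illustration of the definition, here is a small numerical sketch using Python's sympy (the \(2\times 2\) matrix is a hypothetical example, not one from the text): for a single block of this kind the ordinary eigenspace is strictly smaller than the generalised eigenspace.

```python
from sympy import Matrix, eye

# Hypothetical 2x2 example: eigenvalue 2 with algebraic multiplicity 2.
A = Matrix([[2, 1],
            [0, 2]])
lam, n = 2, 2

# Ordinary eigenspace: ker(A - 2I) is one-dimensional...
eigenspace = (A - lam * eye(n)).nullspace()
# ...but the generalised eigenspace ker(A - 2I)^(dim V) is all of V.
gen_eigenspace = ((A - lam * eye(n)) ** n).nullspace()

print(len(eigenspace), len(gen_eigenspace))  # 1 2
```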

To see the equivalence we need to confirm that any \(v\) such that \((T-\lambda\id_V)^kv=0\) for some \(k\geq 1\) is in \(\ker(T-\lambda\id_V)^{\dim V}\). Now in general, for any \(T\in\mathcal{L}(V)\), we have an obvious chain of inclusions,
\begin{equation*}
\{0\}=\ker T^0\subseteq\ker T^1\subseteq\dotsm\subseteq\ker T^k\subseteq\ker T^{k+1}\subseteq\dotsm.
\end{equation*}
Observe that if \(\ker T^m=\ker T^{m+1}\) for some \(m\geq 1\), then all subsequent spaces in the chain are also equal (for \(k\in\ZZ_{>0}\), if \(v\in\ker T^{m+k+1}\) then \(0=T^{m+1}(T^kv)\), so \(T^kv\in\ker T^{m+1}=\ker T^{m}\) and thus \(v\in\ker T^{m+k}\)). Thus the inclusions are strict until the first equality, after which they are all equalities. In particular, since each strict inclusion increases the dimension by at least one, the chain must have stopped growing by \(\ker T^{\dim V}\). Thus, the definitions are indeed equivalent.
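The behaviour of this chain of kernels is easy to watch numerically; the following sketch (a hypothetical \(4\times 4\) ‘shift’ matrix, using sympy) shows the dimensions growing strictly and then freezing once they reach \(\dim V\).

```python
from sympy import zeros

# Hypothetical nilpotent example: the 4x4 shift matrix, N e_{i+1} = e_i.
n = 4
N = zeros(n, n)
for i in range(n - 1):
    N[i, i + 1] = 1

# dim ker N^k for k = 0, 1, ..., n+1: strictly increasing, then frozen.
dims = [n - (N ** k).rank() for k in range(n + 2)]
print(dims)  # [0, 1, 2, 3, 4, 4]
```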

Just how big are the generalised eigenspaces? Before answering this question let’s establish that, in common with eigenspaces, generalised eigenspaces corresponding to distinct eigenvalues intersect trivially.

Proposition For any pair of distinct eigenvalues \(\lambda\neq\mu\), \(V_{[\lambda]}\cap V_{[\mu]}=\{0\}\).

Proof Suppose on the contrary that there were a non-zero \(v\in V\) such that \(v\in V_{[\lambda]}\) and \(v\in V_{[\mu]}\). Let \(k\) be the smallest positive integer such that \((T-\lambda\id_V)^kv=0\). It follows that \((T-\lambda\id_V)^{k-1}v\neq 0\) must be a genuine eigenvector of \(T\) with eigenvalue \(\lambda\); call it \(v_\lambda\). But \(v\in V_{[\mu]}\), so \((T-\mu\id_V)^lv=0\) for some \(l\geq 1\), and thus
\begin{align*}
(T-\mu\id_V)^lv_\lambda&=(T-\mu\id_V)^l(T-\lambda\id_V)^{k-1}v\\
&=(T-\lambda\id_V)^{k-1}(T-\mu\id_V)^lv\\
&=0,
\end{align*}
so \(v_\lambda\in V_{[\mu]}\). But for any such eigenvector \(v_\lambda\) and all \(j\geq 1\), \((T-\mu\id_V)^jv_\lambda=(\lambda-\mu)^jv_\lambda\neq 0\), so \(v_\lambda\notin V_{[\mu]}\) and we have established a contradiction.\(\blacksquare\)
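The proposition can be checked numerically via the dimension formula \(\dim(U\cap W)=\dim U+\dim W-\dim(U+W)\); a sketch using sympy on a hypothetical \(2\times 2\) matrix with two distinct eigenvalues:

```python
from sympy import Matrix, eye

# Hypothetical 2x2 example with the two distinct eigenvalues 1 and 2.
A = Matrix([[1, 1],
            [0, 2]])
n = 2

U = ((A - 1 * eye(n)) ** n).nullspace()  # basis of V_[1]
W = ((A - 2 * eye(n)) ** n).nullspace()  # basis of V_[2]

# dim(U ∩ W) = dim U + dim W - dim(U + W)
dim_sum = Matrix.hstack(*(U + W)).rank()
print(len(U) + len(W) - dim_sum)  # 0
```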

In particular, it follows from this that for any \(\mu\neq\lambda\), the restriction of \((T-\mu\id_V)\) to \(V_{[\lambda]}\) is one-to-one. That is, \(\lambda\) is the only eigenvalue of the restriction \(T|_{V_{[\lambda]}}\) of \(T\) to \(V_{[\lambda]}\), so the characteristic polynomial of \(T|_{V_{[\lambda]}}\) must be \((x-\lambda)^{\dim V_{[\lambda]}}\). Now, if the characteristic polynomial of \(T\) is \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) then, since \(V_{[\lambda_i]}\) is a \(T\)-invariant subspace of \(V\), we know that \(p_T(x)=(x-\lambda_i)^{\dim V_{[\lambda_i]}}p_{T’}(x)\), where \(T’:V/V_{[\lambda_i]}\to V/V_{[\lambda_i]}\) is the linear operator induced from \(T\). Thus, if we can show that \(\lambda_i\) is not a root of \(p_{T’}(x)\) then we have established that \(\dim V_{[\lambda_i]}=n_i\). That is, we will have established that the dimensions of the generalised eigenspaces are precisely the algebraic multiplicities of the corresponding eigenvalues. So assume, on the contrary, that \(\lambda_i\) is a root of \(p_{T’}(x)\). Then there must exist a non-zero \(v+V_{[\lambda_i]}\in V/V_{[\lambda_i]}\) such that \(T'(v+V_{[\lambda_i]})=\lambda_i(v+V_{[\lambda_i]})\). That is, there exists some \(v\notin V_{[\lambda_i]}\) such that \((T-\lambda_i\id_V)v\in V_{[\lambda_i]}\). But then \((T-\lambda_i\id_V)^{\dim V+1}v=0\), so \(v\in\ker(T-\lambda_i\id_V)^{\dim V}=V_{[\lambda_i]}\) by the stabilisation of the kernel chain, and we have a contradiction.

Now consider the sum \(U=\sum_{i=1}^rV_{[\lambda_i]}\). We would like to establish this to be a direct sum. That is, we must demonstrate the uniqueness of the expansion of an arbitrary element \(u\in U\) as a sum \(u=v_1+\dots+v_r\) of vectors \(v_i\in V_{[\lambda_i]}\). This is equivalent to showing that if \(v_1+\dots+v_r=0\) then \(v_i=0\) for all \(i=1,\dots,r\), which we may establish by applying in turn the linear operators \(\prod_{j\neq i}(T-\lambda_j\id_V)^{\dim V}\) for \(i=1,\dots,r\) to \(v_1+\dots+v_r=0\). For example, applying
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V},
\end{equation}
we get
\begin{equation}
(T-\lambda_2\id_V)^{\dim V}\cdots(T-\lambda_r\id_V)^{\dim V}v_1=0.
\end{equation}
But each factor \((T-\lambda_j\id_V)\) with \(j\neq 1\) restricts to a one-to-one operator on \(V_{[\lambda_1]}\) (as \(\lambda_j\) is not an eigenvalue of \(T|_{V_{[\lambda_1]}}\)), so the left-hand side can vanish only if \(v_1=0\).

Thus, the sum is indeed direct and since each summand, \(V_{[\lambda_i]}\), has dimension \(\dim V_{[\lambda_i]}=n_i\) and \(\sum n_i=\dim V\) we have established the following theorem.

Theorem If \(\lambda_1,\dots,\lambda_r\) are the distinct eigenvalues of a linear operator \(T\in\mathcal{L}(V)\) with characteristic polynomial \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\), then
\begin{equation}
V=\bigoplus_{i=1}^rV_{[\lambda_i]},
\end{equation}
and \(\dim V_{[\lambda_i]}=n_i\).
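The theorem is easy to test numerically; a sketch using sympy on a hypothetical \(3\times 3\) operator (the matrix and the helper `gen_eig_dim` are illustrative assumptions, not from the text):

```python
from sympy import Matrix, eye

# Hypothetical operator with characteristic polynomial (x-1)^2 (x-2):
# lambda = 1 has a one-dimensional eigenspace, yet its generalised
# eigenspace has the full algebraic multiplicity n_1 = 2.
A = Matrix([[1, 1, 0],
            [0, 1, 0],
            [0, 0, 2]])
n = 3

def gen_eig_dim(lam):
    # dim V_[lam] = dim ker (A - lam I)^(dim V)
    return len(((A - lam * eye(n)) ** n).nullspace())

dims = (len((A - eye(n)).nullspace()), gen_eig_dim(1), gen_eig_dim(2))
print(dims)  # (1, 2, 1): the generalised eigenspace dimensions sum to dim V
```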

From this theorem we see that any linear operator \(T\in\mathcal{L}(V)\) can be represented by a matrix in block diagonal form, with each block corresponding to a distinct eigenvalue of \(T\) and the size of each block equal to the algebraic multiplicity of the corresponding eigenvalue. Define \(T_i\) to be the restriction of \(T\) to the generalised eigenspace \(V_{[\lambda_i]}\). Since we are working over an algebraically closed field, each block, which is just a matrix representation of \(T_i\), can be made upper triangular, and since \(T_i\) has only \(\lambda_i\) as an eigenvalue, the diagonal elements of each upper triangular block are the eigenvalue corresponding to that block. If we are going to do better than this we’ll have to find a general way of choosing ‘nice’ bases for the generalised eigenspaces. By Cayley–Hamilton we know that each \(T_i\) satisfies \((T_i-\lambda_i\id_{V_{[\lambda_i]}})^{n_i}=0\). So \(T_i=\lambda_i\id_{V_{[\lambda_i]}}+N_i\) where \(N_i^{n_i}=0\). That is, \(T_i\) is the sum of a multiple of the identity operator and a nilpotent operator.
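The nilpotency of \(T_i-\lambda_i\id\) on the corresponding generalised eigenspace can be checked directly; a sketch using sympy on a hypothetical matrix:

```python
from sympy import Matrix, eye, zeros

# Hypothetical check that T_i - lambda_i id is nilpotent on V_[lambda_i]:
# here lambda = 2 has algebraic multiplicity n_i = 2, so (A - 2I)^2
# should annihilate every generalised eigenvector with eigenvalue 2.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 5]])
N_power = (A - 2 * eye(3)) ** 2

basis = ((A - 2 * eye(3)) ** 3).nullspace()  # basis of V_[2]
annihilated = all(N_power * v == zeros(3, 1) for v in basis)
print(annihilated)  # True
```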

Thus our task is really to find a basis that expresses the matrix of any nilpotent operator in some sort of canonical way. Let us define \(m(v)\) to be the largest non-negative integer such that \(N^{m(v)}v\neq 0\). This is sometimes called the height of the vector \(v\) with respect to the operator \(N\). The following result establishes the existence of such a basis and its uniqueness in an appropriate sense.

Theorem If \(N\in\mathcal{L}(V)\) is nilpotent then there exist vectors \(v_1,\dots,v_k\in V\) such that
\begin{equation}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_k,Nv_k,\dots,N^{m(v_k)}v_k\}\label{equ:nilpotent_basis}
\end{equation}
is a basis of \(V\). Here \(k=\dim\ker N\) and the \(m(v_i)\) are unique up to permutation.

Proof If \(\dim V=1\) then the result is trivial. We will prove it for general dimension by induction on the dimension of \(V\). The ‘trick’, as it were, is to observe that since the linear operator \(N\) has a non-trivial kernel, \(\img N\) is strictly contained within \(V\) and so we may assume by the induction hypothesis that the result holds for this subspace. That is, we may assume we have vectors, \(u_1,\dots,u_j\in\img N\), such that,
\begin{equation*}
\{u_1,Nu_1,\dots,N^{m(u_1)}u_1,\dots,u_j,Nu_j,\dots,N^{m(u_j)}u_j\},
\end{equation*}
is a basis of \(\img N\). Let us define \(v_i\), \(1\leq i\leq j\) to be such that \(u_i=Nv_i\). Then \(m(u_i)=m(v_i)-1\) and our basis of \(\img N\) is,
\begin{equation*}
\{Nv_1,N^2v_1,\dots,N^{m(v_1)}v_1,\dots,Nv_j,N^2v_j,\dots,N^{m(v_j)}v_j\}.
\end{equation*}
The vectors \(\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}\) are linearly independent and belong to \(\ker N\); indeed they are a basis for the kernel of the restriction of \(N\) to \(\img N\), \(\ker N|_{\img N}\) (expanding any \(w\in\img N\) with \(Nw=0\) in the basis above and applying \(N\), the independence of that basis forces all coefficients to vanish except those of the vectors \(N^{m(v_i)}v_i\)). Define the vectors \(v_{j+1},\dots,v_k\) to be vectors which extend these \(j\) vectors to a basis of \(\ker N\). With \(v_1,\dots,v_k\) so defined, let us consider the set of vectors
\begin{equation*}
\{v_1,Nv_1,\dots,N^{m(v_1)}v_1,\dots,v_j,Nv_j,\dots,N^{m(v_j)}v_j,v_{j+1},\dots,v_k\}.
\end{equation*}
Notice that there are \(\dim\img N+\dim\ker N=\dim V\) of them, so if we can show they are linearly independent then they are a basis for \(V\) as specified in the theorem (\(m(v_i)=0\) for \(j+1\leq i\leq k\)).
So let us suppose we have,
\begin{align*}
0&=a_{1,0}v_1+\dots+a_{1,m(v_1)}N^{m(v_1)}v_1+\dots\\
&+a_{j,0}v_j+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j\\
&+a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k.
\end{align*}
Applying \(N\) to this we obtain
\begin{equation*}
a_{1,0}u_1+\dots+a_{1,m(u_1)}N^{m(u_1)}u_1+\dots+a_{j,0}u_j+\dots+a_{j,m(u_j)}N^{m(u_j)}u_j=0.
\end{equation*}
By the induction hypothesis, this implies \(a_{r,s}=0\) for \(1\leq r\leq j\), \(0\leq s\leq m(v_r)-1\). Thus, we are reduced to considering
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j+
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0.
\end{equation*}
But the \(v_{j+1},\dots,v_k\) were defined to be precisely an extension of the basis \begin{equation*}
\{N^{m(v_1)}v_1,\dots,N^{m(v_j)}v_j\}
\end{equation*}
of \(\ker N|_{\img N}\) to a basis of \(\ker N\). That is
\begin{equation*}
\ker N=\ker N|_{\img N}\oplus\Span(v_{j+1},\dots,v_k),
\end{equation*}
so we must have
\begin{equation*}
a_{1,m(v_1)}N^{m(v_1)}v_1+\dots+a_{j,m(v_j)}N^{m(v_j)}v_j=0
\end{equation*}
and
\begin{equation*}
a_{j+1,0}v_{j+1}+\dots+a_{k,0}v_k=0
\end{equation*}
and hence all the coefficients \(a_{i,j}\) must be \(0\). We have thus established the existence of \(k=\dim\ker N\) vectors \(v_i\) such that \eqref{equ:nilpotent_basis} is a basis of \(V\). For uniqueness, observe that \(\dim\ker N=|\{i\mid m(v_i)\geq 0\}|=k\); that is, the number of \(v_i\) whose height is greater than or equal to \(0\) is \(\dim\ker N\). Likewise, \(\dim\ker N^2=\dim\ker N+|\{i\mid m(v_i)\geq 1\}|\), so that the number of \(v_i\) whose height is at least \(1\) is \(\dim\ker N^2-\dim\ker N\). Continuing in this fashion we obtain a partition\(^1\) of \(n=\dim V\) which may be represented as a Young diagram such as (in the case of \(n=11\))

[Young diagram of a partition of \(11\)]
In such a diagram the length of the first row is \(k\), with the lengths of successive rows given by \(\dim\ker N^i-\dim\ker N^{i-1}\). But note that the number of boxes in the column corresponding to \(v_i\) is then just \(m(v_i)+1\). Thus the collection of heights \(m(v_i)\) is completely determined by \(N\), and so the \(m(v_i)\) are indeed unique up to permutation.\(\blacksquare\)
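The uniqueness argument is effectively an algorithm: the kernel dimensions give the rows of the Young diagram, and its column heights are the block sizes \(m(v_i)+1\). A sketch using sympy (the helper `jordan_block_sizes` and the test matrix are illustrative assumptions):

```python
from sympy import zeros

def jordan_block_sizes(N):
    """Block sizes (the heights m(v_i) + 1) of a nilpotent matrix N,
    read off the Young diagram whose i-th row has length
    dim ker N^i - dim ker N^(i-1)."""
    n = N.shape[0]
    d = [0]  # d[i] = dim ker N^i
    while d[-1] < n:
        d.append(n - (N ** len(d)).rank())
    rows = [d[i] - d[i - 1] for i in range(1, len(d))]
    # the column heights of the diagram are the block sizes
    return [sum(1 for r in rows if r > c) for c in range(rows[0])]

# Hypothetical example on a 4-dimensional space: blocks of sizes 3 and 1.
N = zeros(4, 4)
N[0, 1] = 1
N[1, 2] = 1
print(jordan_block_sizes(N))  # [3, 1]
```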

Let’s consider what the matrix representation of some nilpotent operator \(N\) looks like with respect to this basis. In fact, we’ll make a small cosmetic tweak and flip the order of the basis elements corresponding to each \(v_i\). That is, we consider the basis
\begin{equation*}
(N^{m(v_1)}v_1,\dots,Nv_1,v_1,\dots,N^{m(v_k)}v_k,\dots,Nv_k,v_k).
\end{equation*}
Then the matrix representation of \(N\) with respect to this basis is block diagonal, with \(k\) blocks of sizes \(m(v_i)+1\), \(1\leq i\leq k\), each of the form
\begin{equation*}
\begin{pmatrix}
0&1&&\mathbf{0}\\
&\ddots&\ddots&\\
& &\ddots&1\\
\mathbf{0}& & & 0
\end{pmatrix}.
\end{equation*}
We have arrived at the following result.

Theorem (Jordan normal form) For any linear operator \(T\in\mathcal{L}(V)\) on a finite dimensional vector space \(V\), over an algebraically closed field \(K\), there is a basis for \(V\) such that the matrix representation of \(T\), \(\mathbf{T}\), has the form
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{T}_1& &\mathbf{0}\\
&\ddots& \\
\mathbf{0}& &\mathbf{T}_m
\end{pmatrix}
\end{equation*}
where each \(\mathbf{T}_i\), called a Jordan block, is an upper triangular matrix of the form
\begin{equation*}
\mathbf{T}_i=\begin{pmatrix}
\lambda_i&1& &\mathbf{0}\\
&\ddots&\ddots& \\
& &\ddots&1\\
\mathbf{0}& & &\lambda_i
\end{pmatrix},
\end{equation*}
with some eigenvalue \(\lambda_i\) of \(T\) on its diagonal. \(\mathbf{T}\) is called the Jordan normal form of \(T\) and is unique up to rearranging the order of the blocks.

In the language of group actions and orbits we have the following.

The orbits of the action of \(\text{GL}(V)\) by conjugation on \(\mathcal{L}(V)\), where \(V\) is an \(n\)-dimensional vector space over an algebraically closed field, are characterised by a set of \(n\) continuous parameters, the eigenvalues, together with some discrete parameters, partitions of the multiplicities of repeated eigenvalues.

Example Consider the matrix,
\begin{equation*}
\mathbf{A}=\begin{pmatrix}
2&-1&0\\
1&0&0\\
1&-1&1
\end{pmatrix}.
\end{equation*}
We form the characteristic equation,
\begin{align*}
\det(\mathbf{A}-\lambda\mathbf{I}_3)&=\det\begin{pmatrix}
2-\lambda&-1&0\\
1&-\lambda&0\\
1&-1&1-\lambda
\end{pmatrix}\\
&=(1-\lambda)\det\begin{pmatrix}
2-\lambda&-1\\
1&-\lambda
\end{pmatrix}\\
&=-(\lambda-1)^3.
\end{align*}
So we have one eigenvalue, \(\lambda=1\), with algebraic multiplicity \(3\). Consider then \(\mathbf{A}-\mathbf{I}_3\),
\begin{equation*}
\mathbf{A}-\mathbf{I}_3=\begin{pmatrix}
1&-1&0\\
1&-1&0\\
1&-1&0
\end{pmatrix}.
\end{equation*}
Clearly this has rank \(1\), so \(\dim\ker(\mathbf{A}-\mathbf{I}_3)=2\) and there are \(k=2\) Jordan blocks. We then know immediately that \(\dim\ker(\mathbf{A}-\mathbf{I}_3)^2=3\), so the Young diagram has rows of lengths \(2\) and \(1\); the two blocks therefore have sizes \(2\) and \(1\) and the Jordan normal form is
\begin{equation*}
\begin{pmatrix}
1&0&0\\
0&1&1\\
0&0&1
\end{pmatrix}.
\end{equation*}
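The example can also be checked mechanically; the following sketch uses sympy's `jordan_form` (sympy may order the two blocks differently from the display above):

```python
from sympy import Matrix

A = Matrix([[2, -1, 0],
            [1,  0, 0],
            [1, -1, 1]])

P, J = A.jordan_form()  # returns P, J with A = P * J * P^(-1)
print(J)  # eigenvalue 1 throughout; one 2x2 block and one 1x1 block
```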

Notes:

  1. Recall that a partition of some positive integer \(n\) is simply a way of writing \(n\) as a sum of positive integers \(n_i\), typically written as a tuple \((n_1,\dots,n_r)\) such that \(n_i\geq n_{i+1}\). A visual representation of a partition is given by its Young diagram. For example, the partition \((2,1)\) of \(3\) would correspond to a diagram with two boxes in the first row and one box in the second.