The Cayley-Hamilton Theorem

As a vector space, \(\mathcal{L}(V)\) is \(n^2\)-dimensional, so there must exist some linear relationship between the \(n^2+1\) operators \(\id_V,T,\dots,T^{n^2}\). In fact, the following result, known as the Cayley-Hamilton theorem, guarantees a relationship among the powers of \(T\) up to \(T^n\).
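Concretely, the dimension count says only that there are scalars \(c_0,\dots,c_{n^2}\), not all zero, such that
\begin{equation*}
c_0\id_V+c_1T+\dots+c_{n^2}T^{n^2}=0;
\end{equation*}
the theorem improves on this by exhibiting an explicit such relation of degree only \(n\), with coefficients supplied by the characteristic polynomial.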

Theorem (Cayley-Hamilton) Every linear operator \(T:V\to V\) satisfies its own characteristic equation, \(p_T(T)=0\). Equivalently, every \(n\times n\) matrix \(\mathbf{A}\) satisfies its own characteristic equation, \(p_\mathbf{A}(\mathbf{A})=0\).
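By way of illustration, the \(n=2\) case can be checked by hand. For a generic matrix,
\begin{equation*}
\mathbf{A}=\begin{pmatrix}
a & b\\
c & d
\end{pmatrix},\qquad
p_\mathbf{A}(\lambda)=\det(\lambda\mathbf{I}_2-\mathbf{A})=\lambda^2-(a+d)\lambda+(ad-bc),
\end{equation*}
and a direct computation confirms that
\begin{equation*}
\mathbf{A}^2-(a+d)\mathbf{A}+(ad-bc)\mathbf{I}_2
=\begin{pmatrix}
a^2+bc & ab+bd\\
ca+dc & cb+d^2
\end{pmatrix}
-\begin{pmatrix}
a^2+ad & ab+bd\\
ca+dc & ad+d^2
\end{pmatrix}
+(ad-bc)\mathbf{I}_2
=0.
\end{equation*}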

Proof When \(T\) is diagonalisable the result is clear: choosing the union of bases of the eigenspaces \(V_{\lambda_i}\) as a basis of \(V\), every basis element is an eigenvector and so, the factors commuting, is annihilated by the product \((T-\lambda_1\id_V)\dots(T-\lambda_n\id_V)\). More generally, even if \(T\) is not diagonalisable, we know that we can always construct a basis \(v_i\), as in the discussion following Theorem, such that the matrix representation of \(T\) is upper triangular. As already observed, defining \(W_0=\{0\}\) and \(W_i=\Span(v_1,\dots,v_i)\) for \(1\leq i\leq n\) (so that \(W_n=V\)), each \(W_i\) is \(T\)-invariant and \((T-\lambda_i\id_V)W_i\subseteq W_{i-1}\), since upper triangularity means \(Tv_i\) equals \(\lambda_iv_i\) plus terms in \(W_{i-1}\). Applying these inclusions in succession,
\begin{align*}
(T-\lambda_n\id_V)V&\subseteq W_{n-1}\\
(T-\lambda_{n-1}\id_V)(T-\lambda_n\id_V)V&\subseteq W_{n-2}\\
&\vdots \\
\prod_{i=1}^n(T-\lambda_i\id_V)V&\subseteq W_0=\{0\},
\end{align*}
that is, \(p_T(T)=0\).\(\blacksquare\)
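The mechanism is already visible in the simplest non-diagonalisable example, a single \(2\times2\) Jordan block. If
\begin{equation*}
Tv_1=\lambda v_1,\qquad Tv_2=\lambda v_2+v_1,
\end{equation*}
then \((T-\lambda\id_V)V\subseteq W_1=\Span(v_1)\) and \((T-\lambda\id_V)W_1=\{0\}\), so that \(p_T(T)=(T-\lambda\id_V)^2=0\) even though \(T-\lambda\id_V\neq0\).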

Another way to see the Cayley-Hamilton result is as follows. Choose a basis \(\{e_i\}\) of \(V\) in terms of which \(Te_i=T^j_ie_j\), the \(T^i_j\) being the components of the matrix representation \(\mathbf{T}\) of \(T\). We can write this as \((\delta^i_jT-T^i_j\id_V)e_i=0\), or as the matrix equation,
\begin{equation*}
(T\mathbf{I}_{n}-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=
\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=0.
\end{equation*}
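For \(n=2\), for instance, the first row of this equation reads
\begin{equation*}
(T-T^1_1)e_1-T^2_1e_2=Te_1-(T^1_1e_1+T^2_1e_2)=0,
\end{equation*}
which is just the statement \(Te_1=T^j_1e_j\); the transpose in \(\mathbf{T}^\mathsf{T}\) is what makes each row encode the action of \(T\) on the corresponding basis vector.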
The matrix \(\mathbf{S}(T)\), defined by
\begin{equation*}
\mathbf{S}(T)=\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}
\end{equation*}
is an element of \(\text{Mat}_n(\text{End}(V))\), a matrix whose entries are themselves operators, and as such might seem beyond the reach of the techniques developed thus far for matrices over fields. In fact, we can regard \(\mathbf{S}(T)\) as a matrix over the commutative ring \(K[T]\) of polynomials in the symbol \(T\), with the obvious action on \(V\). Since the standard definitions and results from the theory of determinants, as described in Determinants, hold over any commutative ring, they do indeed apply here. In particular, since a matrix and its transpose have the same characteristic polynomial, we have
\begin{equation*}
\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=p_\mathbf{T}(T),
\end{equation*}
and
\begin{equation*}
\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\mathbf{I}_n.
\end{equation*}
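Taking \(n=2\) again as an illustration, the adjugate is
\begin{equation*}
\adj(T\mathbf{I}_2-\mathbf{T}^\mathsf{T})=\begin{pmatrix}
T-T^2_2 & T^2_1\\
T^1_2 & T-T^1_1
\end{pmatrix},
\end{equation*}
and multiplying out gives
\begin{equation*}
\adj(T\mathbf{I}_2-\mathbf{T}^\mathsf{T})(T\mathbf{I}_2-\mathbf{T}^\mathsf{T})
=\bigl((T-T^1_1)(T-T^2_2)-T^1_2T^2_1\bigr)\mathbf{I}_2
=p_\mathbf{T}(T)\mathbf{I}_2,
\end{equation*}
the off-diagonal entries cancelling precisely because the entries commute in \(K[T]\).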
So
\begin{equation*}
0=\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})
\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}=p_\mathbf{T}(T)\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix},
\end{equation*}
so that \(p_\mathbf{T}(T)e_i=0\) for every basis vector \(e_i\), that is, \(p_\mathbf{T}(T)=0\), and the result is established.