Eigenvectors, Eigenvalues and Diagonalisation

We have seen that, given any pair of vector spaces \(V,W\), the linear transformations between them are, up to a choice of bases, completely determined by their rank, that is, by an integer \(r\) with \(0\leq r\leq\min(\dim V,\dim W)\). We turn now to the task of establishing a similar result for linear operators \(T:V\mapto V\). That is, for a given vector space \(V\), we wish to characterise the linear operators in \(\mathcal{L}(V)\) up to a choice of basis for \(V\).

Suppose \(T:V\mapto V\) is a linear operator and \(U\) a \(T\)-invariant subspace of \(V\). If \(\{v_1,\dots,v_s\}\) is a basis of \(U\), we can extend it to a basis of \(V\), and with respect to this basis the matrix representation \(\mathbf{T}\) of \(T\) will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\), the restriction of \(T\) to \(U\), and \(\mathbf{T}'\) is the matrix representation of the induced map \(T':V/U\mapto V/U\), \(T'(v+U)=Tv+U\), with respect to the basis \(\{v_{s+1}+U,\dots,v_n+U\}\) of \(V/U\). If we could do better and decompose \(V\) as a direct sum of two invariant subspaces, say \(V=U_1\oplus U_2\), then choosing a basis \(\{v_1,\dots, v_s\}\) of \(U_1\) and a basis \(\{v_{s+1},\dots,v_n\}\) of \(U_2\), the set \(\{v_1,\dots,v_n\}\) is a basis of \(V\), and with respect to this basis the matrix representation of \(T\) will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t}_1 & \mathbf{0}\\
\mathbf{0}&\mathbf{t}_2
\end{pmatrix},
\end{equation}
where \(\mathbf{t}_1\) is the matrix representation of \(T|_{U_1}\) with respect to the basis \(\{v_1,\dots, v_s\}\) and \(\mathbf{t}_2\) is the matrix representation of \(T|_{U_2}\) with respect to the basis \(\{v_{s+1},\dots, v_n\}\). So, assuming our goal is to find a basis for \(V\) such that the matrix of a given linear operator \(T\) assumes the simplest possible form, it is clear that finding \(T\)-invariant subspaces will play an important role.
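Before making this precise, here is a small numerical sanity check (a sketch assuming NumPy is available; the matrices are invented for illustration): an operator is built from a block upper triangular model, and changing to a basis whose first two vectors span the invariant subspace recovers the block form.
\begin{verbatim}
import numpy as np

# Model matrix in a basis adapted to U = span(v1, v2): block upper
# triangular, with t = T|_U top-left and T' bottom-right.
T_model = np.array([[2.0, 1.0, 5.0],
                    [0.0, 2.0, 7.0],
                    [0.0, 0.0, 3.0]])

# Columns of P are the adapted basis vectors in standard coordinates.
P = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

B = P @ T_model @ np.linalg.inv(P)   # the same operator, standard basis

# Changing back to the adapted basis recovers the block structure.
print(np.round(np.linalg.inv(P) @ B @ P, 10))
\end{verbatim}
This motivates the following definition.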

Definition A non-zero vector \(v\in V\) is an eigenvector of a linear operator \(T:V\mapto V\) if
\begin{equation}
Tv=\lambda v,
\end{equation}
for some \(\lambda\in K\). In this case, \(\lambda\) is called the eigenvalue associated with \(v\). The subspace of \(V\) defined by
\begin{equation}
V_\lambda=\{v\in V\mid Tv=\lambda v\},
\end{equation}
is called the eigenspace of \(\lambda\) and its dimension, \(\dim V_\lambda\), the geometric multiplicity of \(\lambda\).

Clearly \(V_\lambda=\ker(T-\lambda\id_V)\), which is non-trivial if and only if \(\det(T-\lambda\id_V)=0\). That is, having chosen a basis for \(V\) such that the matrix representation of \(T\) is \(\mathbf{T}\), \(\lambda\) is an eigenvalue of \(T\) if and only if \(\det(\mathbf{T}-\lambda\mathbf{I}_n)=0\). This motivates the following definition.

Definition The characteristic polynomial of an \(n\times n\) matrix \(\mathbf{A}\) is
\begin{equation}
p_\mathbf{A}(x)=\det(\mathbf{A}-x\mathbf{I}_n),
\end{equation}
with \(p_\mathbf{A}(x)=0\) the characteristic equation.

Notice that similar matrices have the same characteristic polynomial, since \(\det(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}-x\mathbf{I}_n)=\det(\mathbf{P}^{-1}(\mathbf{A}-x\mathbf{I}_n)\mathbf{P})=\det(\mathbf{A}-x\mathbf{I}_n)\), so it makes sense to define the characteristic polynomial \(p_T(x)\) of any \(T\in\mathcal{L}(V)\) as the characteristic polynomial of any matrix representation of \(T\).
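This can also be checked symbolically; the following SymPy sketch (the matrices are arbitrary choices for illustration) confirms the invariance for a particular pair.
\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 1], [0, 2]])
P = sp.Matrix([[1, 2], [1, 3]])               # any invertible matrix

pA = (A - x * sp.eye(2)).det()                # char. poly. of A
pB = (P.inv() * A * P - x * sp.eye(2)).det()  # char. poly. of P^{-1}AP
print(sp.expand(pA - pB))                     # 0: the polynomials agree
\end{verbatim}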

Let us now consider a few simple examples of the calculation of eigenvalues and eigenvectors. Given an \(n\times n\) matrix \(\mathbf{A}\), the general procedure is first to compute its eigenvalues \(\lambda\) as the roots of the characteristic equation, \(\det(\mathbf{A}-\lambda\mathbf{I}_n)=0\), and then for each eigenvalue in turn solve \((\mathbf{A}-\lambda\mathbf{I}_n)\mathbf{v}=0\) for the corresponding eigenvectors \(\mathbf{v}\).
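By way of illustration, here is that two-step procedure carried out symbolically, in a sketch assuming SymPy (the matrix is an arbitrary example):
\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1], [1, 2]])

# Step 1: eigenvalues as roots of the characteristic equation.
p = (A - x * sp.eye(2)).det()
eigenvalues = sp.roots(p, x)        # {3: 1, 1: 1} (root: multiplicity)

# Step 2: for each eigenvalue solve (A - lambda I)v = 0.
for lam in eigenvalues:
    vecs = (A - lam * sp.eye(2)).nullspace()
    print(lam, [list(v) for v in vecs])
\end{verbatim}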

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&2
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)(2-\lambda)\) and so two eigenvalues, \(1\) and \(2\). For the eigenvalue \(1\), a corresponding eigenvector \(\mathbf{v}\) must satisfy,
\begin{equation}
\begin{pmatrix}
0&1\\
0&1
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is, up to scale,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix},
\end{equation}
while for the eigenvalue \(2\) we have
\begin{equation}
\begin{pmatrix}
-1&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is, up to scale,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
1
\end{pmatrix}.
\end{equation}
So in this case there are two distinct eigenvalues with each corresponding eigenspace one-dimensional.
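The same example can be checked numerically; a sketch assuming NumPy, whose eig routine normalises eigenvectors to unit length:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
w, V = np.linalg.eig(A)
print(w)   # [1. 2.]
print(V)   # columns proportional to (1, 0) and (1, 1)
\end{verbatim}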

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&1
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)^2\) and so just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
tells us that, up to scale, there is just one corresponding eigenvector,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue and its corresponding eigenspace is one-dimensional.

Example The identity matrix,
\begin{equation}
\begin{pmatrix}
1&0\\
0&1
\end{pmatrix},
\end{equation}
also has characteristic polynomial \((1-\lambda)^2\) and so just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&0\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
though imposes no constraint at all, so that every non-zero vector is an eigenvector; two linearly independent solutions are
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}
0\\
1
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue but its corresponding eigenspace is two-dimensional.
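The contrast between the last two examples is easily verified; in this SymPy sketch both matrices have the single eigenvalue \(1\), but the kernels of \(\mathbf{A}-\mathbf{I}_2\) have different dimensions.
\begin{verbatim}
import sympy as sp

shear    = sp.Matrix([[1, 1], [0, 1]])
identity = sp.eye(2)
for A in (shear, identity):
    # The eigenspace of 1 is the kernel of A - I; count its basis vectors.
    print(len((A - sp.eye(2)).nullspace()))   # prints 1, then 2
\end{verbatim}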

The algebraic multiplicity of an eigenvalue \(\lambda\) is the multiplicity of \(\lambda\) as a root of the characteristic polynomial. From the previous examples we might guess that the algebraic multiplicity is greater than or equal to the geometric multiplicity. This is indeed the case, for if the geometric multiplicity of an eigenvalue \(\lambda\) of a linear operator \(T\) is \(s\) and we take \(\{v_1,\dots,v_s\}\) as a basis for \(V_\lambda\) we can extend it to a basis of \(V\). In this basis the matrix representation of \(T\) must have the form,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\lambda\mathbf{I}_s & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation*}
so the characteristic polynomial of \(T\) has the form, \(p_T(x)=p_\mathbf{T}(x)=(\lambda-x)^s\det(\mathbf{T}'-x\mathbf{I}_{n-s})\). So \((\lambda-x)^s\), which is just the characteristic polynomial of \(T|_{V_\lambda}\), divides \(p_T(x)\), from which it follows that the algebraic multiplicity is greater than or equal to the geometric multiplicity.

As already discussed, more generally, for any \(T\)-invariant subspace \(U\) of \(V\), we may choose a basis \(\{v_1,\dots,v_p\}\) of \(U\) and extend it to a basis \(\{v_1,\dots,v_p,v_{p+1},\dots,v_n\}\) of \(V\). Then with respect to this basis \(T\) has the matrix representation,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation*}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\) and \(\mathbf{T}'\) is the matrix representation of \(T'\) with respect to the basis \(\{v_{p+1}+U,\dots,v_n+U\}\) of \(V/U\). So we have \(p_T(x)=p_{T|_U}(x)\det(\mathbf{T}'-x\mathbf{I}_{n-p})=p_{T|_U}(x)p_{T'}(x)\).
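This factorisation is easily verified for any particular block upper triangular matrix, as in the following SymPy sketch (the entries are arbitrary):
\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
T = sp.Matrix([[2, 1, 7],
               [4, 3, 4],
               [0, 0, 5]])            # block upper triangular

t, Tp = T[:2, :2], T[2:, 2:]          # the diagonal blocks
lhs = (T - x * sp.eye(3)).det()
rhs = (t - x * sp.eye(2)).det() * (Tp - x * sp.eye(1)).det()
print(sp.expand(lhs - rhs))           # 0
\end{verbatim}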

In the previous examples we did not specify the field and indeed in each case we could have been working over either \(\RR\) or \(\CC\). In the following example the choice matters.

Example The matrix,
\begin{equation}
\begin{pmatrix}
0&1\\
-1&0
\end{pmatrix},
\end{equation}
has characteristic polynomial \(\lambda^2+1\) and so regarded as a matrix over \(\RR\) it has no eigenvalues. Over \(\CC\), of course, there are two eigenvalues, \(\pm i\).
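Numerical libraries typically work over \(\CC\); a NumPy sketch of this example accordingly returns the complex pair.
\begin{verbatim}
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
w, _ = np.linalg.eig(A)
print(w)   # [0.+1.j 0.-1.j]: no real eigenvalues, two complex ones
\end{verbatim}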

The characteristic polynomial, \(p_\mathbf{A}(x)\), of a matrix \(\mathbf{A}\) with \(r\) distinct eigenvalues, is said to factor over the field \(K\) if it can be written as
\begin{equation}
p_\mathbf{A}(x)=\prod_{i=1}^r(\lambda_i-x)^{n_i},
\end{equation}
for some distinct \(\lambda_i\in K\) (\(\lambda_i\neq\lambda_j\) if \(i\neq j\)), where \(n_i\) is the algebraic multiplicity of \(\lambda_i\) and \(\sum_{i=1}^r n_i=n\). Clearly, the characteristic polynomial of the matrix of the last example does not factor over \(\RR\) (though it does over \(\CC\)), whilst those of the previous three examples, which happen to be upper triangular, do. Generally, it is not difficult to see that for any upper triangular matrix \(\mathbf{A}\) the characteristic polynomial has the particularly simple form \(p_\mathbf{A}(x)=(A_1^1-x)\dots(A_n^n-x)\); that is, the characteristic polynomial of any upper triangular matrix factors, and its diagonal elements are precisely its eigenvalues.
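A quick symbolic confirmation of this for an upper triangular example (a SymPy sketch; the entries are arbitrary):
\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 4, 5],
               [0, 2, 6],
               [0, 0, 3]])
p = (A - x * sp.eye(3)).det()
print(sp.factor(p))   # -(x - 1)*(x - 2)*(x - 3), i.e. (1-x)(2-x)(3-x)
\end{verbatim}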

Theorem If \(\mathbf{A}\) is an \(n\times n\) matrix over \(K\), then \(p_\mathbf{A}(x)\) factors if and only if \(\mathbf{A}\) is similar to an upper triangular matrix.

Proof The `if' direction we have already observed. Conversely, suppose \(p_\mathbf{A}(x)\) factors. If \(n=1\) there is nothing to prove. For \(n>1\), there is some \(\lambda\in K\) such that \(p_\mathbf{A}(\lambda)=0\), that is, there is some non-zero \(\mathbf{v}_1\in K^n\) such that \(\mathbf{A}\mathbf{v}_1=\lambda\mathbf{v}_1\). Extend this to a basis \(\{\mathbf{v}_1,\dots,\mathbf{v}_n\}\) of \(K^n\). Now consider the matrix representation of \(L_\mathbf{A}\) with respect to this basis,
\begin{equation}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}'
\end{pmatrix}.
\end{equation}
This is, of course, just the similarity transformation of \(\mathbf{A}\) by the appropriate change of basis matrix. Its characteristic polynomial is clearly \((\lambda-x)p_{\mathbf{A}'}(x)\), and given the initial assumption that \(p_\mathbf{A}(x)\) factors, it follows that \(p_{\mathbf{A}'}(x)\) factors. So we can repeat the process for the matrix \(\mathbf{A}'\), finding a similarity transformation \(\mathbf{P}\) such that
\begin{equation*}
\mathbf{P}^{-1}\mathbf{A}'\mathbf{P}=\begin{pmatrix}
\lambda'&\mathbf{*}\\
\mathbf{0}&\mathbf{A}''
\end{pmatrix},
\end{equation*}
that is,
\begin{equation*}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}^{-1}
\end{pmatrix}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}'
\end{pmatrix}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}
\end{pmatrix}
=
\begin{pmatrix}
\lambda & * & \mathbf{*}\\
0 & \lambda' & \mathbf{*}\\
\mathbf{0} & \mathbf{0} & \mathbf{A}''
\end{pmatrix},
\end{equation*}
and so on, arriving finally at the desired upper triangular matrix.\(\blacksquare\)
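Over \(\CC\) the numerical counterpart of this theorem is the Schur decomposition, available for instance in SciPy (this sketch assumes SciPy; that the change of basis can be taken unitary is a bonus of the numerical algorithm, not something the theorem requires):
\begin{verbatim}
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
T, Z = schur(A, output='complex')   # A = Z T Z^H, T upper triangular
print(np.round(T, 10))              # diagonal entries are +/- i
print(np.allclose(Z @ T @ Z.conj().T, A))   # True
\end{verbatim}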

From the perspective of linear operators, the construction works as follows. Since \(p_T(x)\) factors we know that \(T\) has at least one eigenvector, \(v_1\) say, with corresponding eigenvalue \(\lambda_1\). Defining \(W_1\equiv\Span(v_1)\), then since \(W_1\) is \(T\)-invariant we can define \(T':V/W_1\mapto V/W_1\) by \(T'(v+W_1)=Tv+W_1\). As already observed, the characteristic polynomial of \(T\) can then be written as \(p_T(x)=p_{T|_{W_1}}(x)p_{T'}(x)=(\lambda_1-x)p_{T'}(x)\) and since by assumption \(p_T(x)\) factors, so too must \(p_{T'}(x)\). Thus \(T'\) has a non-zero eigenvector, \(v_2+W_1\) say, and if \(\lambda_2\) is the corresponding eigenvalue we have
\begin{equation*}
T'(v_2+W_1)=\lambda_2(v_2+W_1).
\end{equation*}
This means that \(Tv_2-\lambda_2v_2\in W_1\), so, defining \(W_2\equiv\Span(v_1,v_2)\), we have \(Tv_2\in W_2\) and hence \(W_2\) is \(T\)-invariant. Also, since the coset \(v_2+W_1\) is non-zero, \(v_1\) and \(v_2\) are linearly independent. Continuing in this way we construct a basis, \(\{v_1,\dots,v_n\}\), of \(V\) with the property that each subspace \(W_k\equiv\Span(v_1,\dots,v_k)\), \(1\leq k\leq n\), is \(T\)-invariant. The corresponding matrix representation is upper triangular and we note for future reference that \((T-\lambda_k\id_V)W_k\subseteq W_{k-1}\).
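The construction just described can be followed step by step numerically. The sketch below, assuming NumPy and working over \(\CC\) so that an eigenvector always exists, finds an eigenvector, extends it to a basis and recurses on the induced map; the function name triangularise is our own invention.
\begin{verbatim}
import numpy as np

def triangularise(A):
    """Return P such that P^{-1} A P is upper triangular (over C)."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    # An eigenvector v1 spans an invariant line W1...
    _, V = np.linalg.eig(A)
    v1 = V[:, [0]]
    # ...which we extend to a basis of C^n (QR keeps the span of v1 first).
    Q, _ = np.linalg.qr(np.hstack([v1, np.eye(n, dtype=complex)]))
    B = np.linalg.inv(Q) @ A @ Q     # first column is (lambda, 0, ..., 0)
    # Recurse on the matrix of the induced map on the quotient.
    P2 = np.eye(n, dtype=complex)
    P2[1:, 1:] = triangularise(B[1:, 1:])
    return Q @ P2

A = np.array([[0, 1], [-1, 0]], dtype=complex)
P = triangularise(A)
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # upper triangular
\end{verbatim}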

The field \(\CC\) is said to be algebraically closed since, by the fundamental theorem of algebra, every polynomial in \(\CC[x]\) splits into linear factors. Clearly then, any matrix over an algebraically closed field \(K\) is similar to an upper triangular matrix, from which its eigenvalues can be read off. Note that the field \(\RR\) is not algebraically closed (\(\CC\) is in fact the algebraic closure of \(\RR\)).