Diagonalisable Linear Operators

From now on, unless otherwise stated, we assume our vector spaces are defined over an algebraically closed field such as \(\CC\). Recall that in this case any linear operator on an \(n\)-dimensional vector space has a characteristic polynomial which factors as \(\prod_{i=1}^r(x-\lambda_i)^{n_i}\). This means in particular that any linear operator has at least one eigenvector and indeed, at least one per distinct eigenvalue.

Proposition Eigenvectors \(v_1,\dots,v_r\) corresponding to distinct eigenvalues \(\lambda_1,\dots,\lambda_r\) of a linear operator \(T\) are linearly independent.

Proof By assumption we have \(Tv_i=\lambda_iv_i\) for \(1\leq i\leq r\). Suppose there are numbers \(c^1,\dots,c^r\) such that \(c^iv_i=0\) (summation implied). Applying the operator \((T-\lambda_2)\cdots(T-\lambda_r)\) annihilates every term but the first, since the factor \(T-\lambda_j\) kills \(v_j\), while \((T-\lambda_j)v_1=(\lambda_1-\lambda_j)v_1\). Thus,
\begin{align*}
0&=(T-\lambda_2)\cdots(T-\lambda_r)c^iv_i\\
&=c^1(\lambda_1-\lambda_2)\cdots(\lambda_1-\lambda_r)v_1,
\end{align*}
and since the \(\lambda_i\) are distinct, each factor \(\lambda_1-\lambda_j\) is nonzero, so \(c^1=0\). In the same way we show \(c^2=\dots=c^r=0\), proving linear independence.\(\blacksquare\)
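As a quick numerical illustration of the Proposition (a sketch, not part of the text), consider the ad hoc matrix \(A=\begin{psmallmatrix}4&1\\2&3\end{psmallmatrix}\), which has eigenvectors \((1,1)\) and \((1,-2)\) for the distinct eigenvalues \(5\) and \(2\); the determinant of the matrix having these as columns is nonzero, so they are linearly independent:

```python
A = [[4, 1], [2, 3]]
v1, v2 = (1, 1), (1, -2)

# Check the eigenvector equations A v = lambda v componentwise.
assert [A[0][0]*v1[0] + A[0][1]*v1[1],
        A[1][0]*v1[0] + A[1][1]*v1[1]] == [5*v1[0], 5*v1[1]]
assert [A[0][0]*v2[0] + A[0][1]*v2[1],
        A[1][0]*v2[0] + A[1][1]*v2[1]] == [2*v2[0], 2*v2[1]]

# Determinant of the matrix with columns v1, v2: nonzero means independent.
det = v1[0]*v2[1] - v2[0]*v1[1]
print(det)  # -3, nonzero
```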

Definition A linear operator \(T\in\mathcal{L}(V)\) is said to be diagonalisable if there is some basis of \(V\) with respect to which the matrix representation of \(T\) is diagonal.

Suppose the factorisation of the characteristic polynomial is ‘nice’ in the sense that
\(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\); that is, the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. Then, as follows from the Proposition, \(\sum_{\lambda_i}V_{\lambda_i}\) is a direct sum, and since the dimension of a direct sum is the sum of the dimensions of its summands, \(\dim(\sum_{\lambda_i}V_{\lambda_i})=\sum_i n_i=n\). In any basis which is a union of bases of the \(V_{\lambda_i}\), the matrix representation of \(T\) is then diagonal. The converse is obvious, so we have demonstrated the following

Corollary A linear operator \(T\) is diagonalisable if and only if \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\).
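Both multiplicities really can differ, which is what the Corollary rules out. A sketch of the standard counterexample: the Jordan block \(J=\begin{psmallmatrix}2&1\\0&2\end{psmallmatrix}\) has \(p_J(x)=(x-2)^2\), so the algebraic multiplicity of \(\lambda=2\) is \(2\), but \(\dim V_2=\dim\ker(J-2I)=1\), so \(J\) is not diagonalisable. The helper `rank2` below is ad hoc for \(2\times2\) matrices:

```python
def rank2(A, eps=1e-12):
    """Rank of a 2x2 matrix: 2 if det is nonzero, 1 if nonzero but singular, else 0."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    if abs(det) > eps:
        return 2
    return 1 if any(abs(A[i][j]) > eps for i in range(2) for j in range(2)) else 0

J = [[2, 1], [0, 2]]   # Jordan block, char. poly (x - 2)^2
lam = 2
M = [[J[0][0] - lam, J[0][1]],
     [J[1][0], J[1][1] - lam]]        # J - lambda*I
geometric = 2 - rank2(M)              # dim ker(J - lam*I) by rank-nullity
print(geometric)  # 1, strictly less than the algebraic multiplicity 2
```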

Suppose \(V\) is an \(n\)-dimensional vector space and \(T:V\to V\) a diagonalisable linear operator. Let \(\{e_i\}\) be a basis of \(V\) with respect to which \(T\) has matrix representation \(\mathbf{T}\), so that \(Te_i=T_i^je_j\), and let \(\{v_i\}\) be a basis of \(V\) which is a union of eigenspace bases, so that \(Tv_i=\lambda_iv_i\) (not all \(\lambda_i\) necessarily distinct). The two bases are related by \(v_i=P_i^je_j\), where the \(P_i^j\) are the components of an invertible matrix \(\mathbf{P}\), and since \(Tv_i=P_i^jT_j^k{P^{-1}}_k^lv_l=\lambda_iv_i\), we see that the similarity transformation \(\mathbf{P}\) diagonalises \(\mathbf{T}\).

In particular, any \(n\times n\) matrix \(\mathbf{A}\) over an algebraically closed field \(K\) is a linear operator on \(K^n\). In terms of the standard basis \(\{\mathbf{e}_i\}\) of \(K^n\) we have \(\mathbf{A}\mathbf{e}_i=A_i^j\mathbf{e}_j\), and if \(\mathbf{A}\) is diagonalisable then there must exist a basis \(\{\mathbf{v}_i\}\) of \(K^n\) such that \(\mathbf{A}\mathbf{v}_i=\lambda_i\mathbf{v}_i\) and an invertible matrix \(\mathbf{P}\) such that \(\mathbf{v}_i=P_i^j\mathbf{e}_j\). Note that \(\mathbf{P}\) is precisely the matrix whose \(i\)th column is the \(i\)th eigenvector \(\mathbf{v}_i\): a diagonalisable matrix is diagonalised by the matrix whose columns are its eigenvectors.
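The last observation can be checked numerically on a small example (a sketch, not part of the text). The matrix \(A=\begin{psmallmatrix}4&1\\2&3\end{psmallmatrix}\) below is chosen ad hoc, with eigenvalues \(5\) and \(2\); the helpers `matmul` and `inv2` are bare-hands \(2\times2\) routines:

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Invert a 2x2 matrix via the adjugate formula."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [[ A[1][1]/det, -A[0][1]/det],
            [-A[1][0]/det,  A[0][0]/det]]

A = [[4, 1], [2, 3]]   # eigenvalues 5 and 2
P = [[1, 1], [1, -2]]  # columns: the eigenvectors (1,1) and (1,-2)

D = matmul(inv2(P), matmul(A, P))
print(D)  # approximately diag(5, 2), up to rounding
```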

Example (The Pauli matrices) Extremely important in quantum mechanics, the Pauli matrices, \(\sigma_x\), \(\sigma_y\) and \(\sigma_z\) are given by
\begin{equation}
\sigma_x=\begin{pmatrix}
0&1\\1&0
\end{pmatrix}
\quad
\sigma_y=\begin{pmatrix}
0&-i\\i&0
\end{pmatrix}
\quad
\sigma_z=\begin{pmatrix}
1&0\\0&-1
\end{pmatrix}.
\end{equation}
It is not difficult to see that each has the pair of eigenvalues \(\pm1\) so \(\sigma_x\) and \(\sigma_y\) are in fact similar to \(\sigma_z\) with the similarity transformation matrices given by
\begin{equation*}
\begin{pmatrix}
1&1\\
1&-1
\end{pmatrix}
\quad\text{and}\quad\begin{pmatrix}
1&1\\
i&-i
\end{pmatrix}
\end{equation*}
for \(\sigma_x\) and \(\sigma_y\) respectively.
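As a numerical check (a sketch, not part of the text), we can verify that conjugating \(\sigma_x\) and \(\sigma_y\) by these matrices does yield \(\sigma_z\). Python's built-in complex numbers suffice; the \(2\times2\) helpers `matmul` and `inv2` are ad hoc for this illustration:

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Invert a 2x2 matrix via the adjugate formula."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [[ A[1][1]/det, -A[0][1]/det],
            [-A[1][0]/det,  A[0][0]/det]]

sigma_x = [[0, 1], [1, 0]]
sigma_y = [[0, -1j], [1j, 0]]

P_x = [[1, 1], [1, -1]]    # columns: eigenvectors of sigma_x
P_y = [[1, 1], [1j, -1j]]  # columns: eigenvectors of sigma_y

D_x = matmul(inv2(P_x), matmul(sigma_x, P_x))  # expect diag(1, -1) = sigma_z
D_y = matmul(inv2(P_y), matmul(sigma_y, P_y))  # expect diag(1, -1) = sigma_z
print(D_x, D_y)
```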