The Cayley-Hamilton Theorem

As a vector space, \(\mathcal{L}(V)\) is \(n^2\)-dimensional, so there must exist some linear relationship between the \(n^2+1\) operators \(\id_V,T,\dots,T^{n^2}\). In fact, the following result, known as the Cayley-Hamilton theorem, guarantees a linear relationship among the powers of \(T\) up to \(T^n\).

Theorem (Cayley-Hamilton) Every linear operator \(T:V\mapto V\) satisfies its own characteristic equation, \(p_T(T)=0\). Equivalently, every \(n\times n\) matrix \(\mathbf{A}\) satisfies its own characteristic equation \(p_\mathbf{A}(\mathbf{A})=0\).

Proof When \(T\) is diagonalisable the result is obvious: choosing the union of bases of the eigenspaces, \(V_{\lambda_i}\), as a basis of \(V\), any basis element is clearly annihilated by the product \((T-\lambda_1\id_V)\dots(T-\lambda_n\id_V)\). More generally, even if \(T\) is not diagonalisable, we know that we can always construct a basis \(v_i\), as in the discussion following Theorem, such that the matrix representation of \(T\) is upper triangular. As already observed, defining \(W_0=\{0\}\) and \(W_i=\Span(v_1,\dots,v_i)\) (\(W_n=V\)) for \(1\leq i\leq n\), each \(W_i\) is \(T\)-invariant and \((T-\lambda_i\id_V)W_i\subseteq W_{i-1}\). Indeed,
\begin{align*}
(T-\lambda_n\id_V)V&\subseteq W_{n-1}\\
(T-\lambda_{n-1}\id_V)(T-\lambda_n\id_V)V&\subseteq W_{n-2}\\
&\vdots \\
\prod_{i=1}^n(T-\lambda_i\id_V)V&\subseteq W_0=\{0\},
\end{align*}
that is, \(p_T(T)=0\).\(\blacksquare\)
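
As a quick numerical sanity check (not part of the proof), the theorem can be verified for a particular matrix. The sketch below assumes NumPy is available; np.poly returns the coefficients of the monic polynomial \(\det(x\mathbf{I}_n-\mathbf{A})\), which differs from \(\det(\mathbf{A}-x\mathbf{I}_n)\) at most by an overall sign, so evaluating it at \(\mathbf{A}\) itself should still give the zero matrix, up to rounding error.
```python
import numpy as np

# A minimal numerical check of Cayley-Hamilton for a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Coefficients of det(xI - A), highest power first.
coeffs = np.poly(A)

# Evaluate p_A(A) by Horner's scheme, substituting the matrix A for x.
p_of_A = np.zeros_like(A)
for c in coeffs:
    p_of_A = p_of_A @ A + c * np.eye(4)

print(np.allclose(p_of_A, 0))  # True, up to floating point error
```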

Another way to see the Cayley-Hamilton result is as follows. Choose a basis \(\{e_i\}\) of \(V\) in terms of which \(Te_i=T_i^je_j\), the \(T_i^j\) being the components of the matrix representation of \(T\). We could write this as \((\delta_i^jT-T_i^j\id_V)e_j=0\), or as the matrix equation,
\begin{equation*}
(T\mathbf{I}_{n}-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=
\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=0.
\end{equation*}
The matrix \(\mathbf{S}(T)\), defined by
\begin{equation*}
\mathbf{S}(T)=\begin{pmatrix}
T-T^1_1 & \dots & -T^n_1\\
\vdots & \ddots & \vdots\\
-T^1_n & \dots & T-T^n_n
\end{pmatrix}
\end{equation*}
exists in \(\text{Mat}_n(\text{End}(V))\), and as such would appear unlikely to be amenable to the techniques developed thus far for matrices over fields. In fact, we can regard \(\mathbf{S}(T)\) as a matrix over the commutative ring \(K[T]\), of polynomials in the symbol \(T\), with the obvious action on \(V\). As such, the standard definition and results from the theory of determinants, as described in Determinants, do indeed apply. In particular, we have
\begin{equation*}
\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=p_\mathbf{T}(T),
\end{equation*}
and
\begin{equation*}
\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\mathbf{I}_n.
\end{equation*}
So
\begin{equation*}
0=\adj(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})
\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}
=\det(T\mathbf{I}_n-\mathbf{T}^\mathsf{T})\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix}=p_\mathbf{T}(T)\begin{pmatrix}
e_1\\
\vdots\\
e_n
\end{pmatrix},
\end{equation*}
and the result is established.

Diagonalisable Linear Operators

From now on, unless otherwise stated, we assume our vector spaces are defined over an algebraically closed field such as \(\CC\). Recall that in this case any linear operator on an \(n\)-dimensional vector space has a characteristic polynomial which factors as \(\prod_{i=1}^r(x-\lambda_i)^{n_i}\). This means in particular that any linear operator has at least one eigenvector and indeed, at least one per distinct eigenvalue.

Proposition A set of eigenvectors \(v_1,\dots,v_r\) corresponding to distinct eigenvalues \(\lambda_1,\dots,\lambda_r\) of a linear operator \(T\) are linearly independent.

Proof By assumption we have \(Tv_i=\lambda_iv_i\) for \(1\leq i\leq r\). Suppose there are scalars \(c^1,\dots,c^r\) such that \(c^iv_i=0\). Applying the operator \((T-\lambda_2)\dots(T-\lambda_r)\), every term but the first is annihilated, so
\begin{align*}
0&=(T-\lambda_2)\dots(T-\lambda_r)c^iv_i\\
&=c^1(T-\lambda_2)\dots(T-\lambda_r)v_1\\
&=c^1(\lambda_1-\lambda_2)\dots(\lambda_1-\lambda_r)v_1,
\end{align*}
which, the \(\lambda_i\) being distinct, means \(c^1=0\). In the same way we show \(c^2=\dots=c^r=0\), so proving linear independence.\(\blacksquare\)

Definition A linear operator \(T\in\mathcal{L}(V)\) is said to be diagonalisable if there is some basis of \(V\) with respect to which the matrix representation of \(T\) is diagonal.

Suppose the factorisation of the characteristic polynomial is ‘nice’ in the sense that
\(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\). That is, the geometric multiplicity of each eigenvalue equals its algebraic multiplicity. Then, as follows from the Proposition, \(\sum_{\lambda_i}V_{\lambda_i}\) is a direct sum and so by equation, \(\dim(\sum_{\lambda_i}V_{\lambda_i})=\sum_i n_i=n\), and in any basis which is the union of bases for the \(V_{\lambda_i}\) the matrix representation of \(T\) is diagonal. The converse is obvious so we have demonstrated the following

Corollary A linear operator \(T\) is diagonalisable if and only if \(p_T(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i}\) with \(\dim V_{\lambda_i}=n_i\) for all \(i\).

Suppose \(V\) is an \(n\)-dimensional vector space and \(T:V\mapto V\) a diagonalisable linear operator. Let \(\{e_i\}\) be a basis of \(V\) with respect to which \(T\) has matrix representation \(\mathbf{T}\), so that \(Te_i=T_i^je_j\), and let \(\{v_i\}\) be a basis for \(V\) which is a union of eigenspace bases, so that \(Tv_i=\lambda_iv_i\) (not all \(\lambda_i\) necessarily distinct). Then we may relate the two bases as \(v_i=P_i^je_j\), where the \(P_i^j\) are the components of an invertible matrix \(\mathbf{P}\), and since \(Tv_i=P_i^jT_j^k{P^{-1}}_k^lv_l=\lambda_iv_i\) we see that the similarity transformation by \(\mathbf{P}\) diagonalises \(\mathbf{T}\). In particular, any \(n\times n\) matrix \(\mathbf{A}\) over an algebraically closed field \(K\) is a linear operator on \(K^n\). In terms of the standard basis \(\{\mathbf{e}_i\}\) of \(K^n\) we have \(\mathbf{A}\mathbf{e}_i=A_i^j\mathbf{e}_j\), and if \(\mathbf{A}\) is diagonalisable then there must exist a basis \(\{\mathbf{v}_i\}\) of \(K^n\) such that \(\mathbf{A}\mathbf{v}_i=\lambda_i\mathbf{v}_i\) and an invertible matrix \(\mathbf{P}\) such that \(\mathbf{v}_i=P_i^j\mathbf{e}_j\). Note that \(\mathbf{P}\) is precisely the matrix whose \(i\)th column is the \(i\)th vector \(\mathbf{v}_i\): a diagonalisable matrix is diagonalised by the matrix whose columns are its eigenvectors.
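
A minimal NumPy sketch of the last remark (the matrix chosen is just a convenient example): the matrix \(\mathbf{P}\) whose columns are the eigenvectors returned by the eigensolver satisfies \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\mathrm{diag}(\lambda_1,\dots,\lambda_n)\).
```python
import numpy as np

# A diagonalisable matrix with eigenvalues 1 and 2.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigvals, P = np.linalg.eig(A)

D = np.linalg.inv(P) @ A @ P             # similarity transformation by P
print(np.allclose(D, np.diag(eigvals)))  # True: P diagonalises A
```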

Example (The Pauli matrices) Extremely important in quantum mechanics, the Pauli matrices, \(\sigma_x\), \(\sigma_y\) and \(\sigma_z\) are given by
\begin{equation}
\sigma_x=\begin{pmatrix}
0&1\\1&0
\end{pmatrix}
\quad
\sigma_y=\begin{pmatrix}
0&-i\\i&0
\end{pmatrix}
\quad
\sigma_z=\begin{pmatrix}
1&0\\0&-1
\end{pmatrix}.
\end{equation}
It is not difficult to see that each has the pair of eigenvalues \(\pm1\) so \(\sigma_x\) and \(\sigma_y\) are in fact similar to \(\sigma_z\) with the similarity transformation matrices given by
\begin{equation*}
\begin{pmatrix}
1&1\\
1&-1
\end{pmatrix}
\quad\text{and}\quad\begin{pmatrix}
1&1\\
i&-i
\end{pmatrix}
\end{equation*}
for \(\sigma_x\) and \(\sigma_y\) respectively.
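
The claimed similarity transformations are easy to check directly; a quick NumPy verification (not part of the argument):
```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# The similarity transformation matrices quoted above: their columns are
# eigenvectors for the eigenvalues +1 and -1 respectively.
P_x = np.array([[1, 1], [1, -1]], dtype=complex)
P_y = np.array([[1, 1], [1j, -1j]], dtype=complex)

print(np.allclose(np.linalg.inv(P_x) @ sigma_x @ P_x, sigma_z))  # True
print(np.allclose(np.linalg.inv(P_y) @ sigma_y @ P_y, sigma_z))  # True
```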

The Trace

Another basis independent property of linear operators is the trace.

Definition The trace of an \(n\times n\) matrix \(\mathbf{A}\), \(\tr\mathbf{A}\) is defined to be
\begin{equation}
\tr\mathbf{A}=\sum_{i=1}^nA_i^i=A_i^i.
\end{equation}

As \(\tr\mathbf{A}\mathbf{B}=(AB)^i_i=A^i_jB^j_i=(BA)^i_i=\tr\mathbf{B}\mathbf{A}\) it follows that \(\tr\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\tr\mathbf{A}\) so it makes sense to define the trace of any linear operator \(T:V\mapto V\), \(\tr T\), as the trace of any matrix representation of \(T\).

Working over an algebraically closed field \(K\), since any matrix \(\mathbf{A}\in\text{Mat}_n(K)\) is similar to an upper triangular matrix, we have \(\tr\mathbf{A}=\sum_{i=1}^n\lambda_i\) and \(\det\mathbf{A}=\prod_{i=1}^n\lambda_i\) (not all \(\lambda_i\) necessarily distinct), quantities which are in fact encoded as particular coefficients in the characteristic polynomial,
\begin{equation}
p_\mathbf{A}(x)=x^n-\tr\mathbf{A}x^{n-1}+e_2x^{n-2}-\dots+(-1)^{n-1}e_{n-1}x+(-1)^n\det\mathbf{A}.
\end{equation}
The coefficients \(e_1=\tr\mathbf{A},e_2,\dots,e_{n-1},e_n=\det\mathbf{A}\) are called the elementary symmetric functions of the eigenvalues \(\lambda_1,\dots,\lambda_n\).

There is a nice relationship between the trace and determinant. It can be shown that the matrix exponential,
\begin{equation}
\exp\mathbf{A}=\sum_{i=0}^\infty\frac{1}{i!}\mathbf{A}^i=\mathbf{I}_n+\mathbf{A}+\frac{1}{2!}\mathbf{A}^2+\dots,
\end{equation}
converges for any \(n\times n\) matrix over \(\RR\) or \(\CC\). Consider then the function on \(\RR\) defined as,
\begin{equation*}
f(t)=\det\exp\mathbf{A}t.
\end{equation*}
Differentiating, and using the expansion \(\det(\mathbf{I}_n+\epsilon\mathbf{B})=1+\epsilon\tr\mathbf{B}+O(\epsilon^2)\), we find that
\begin{equation*}
\frac{df(t)}{dt}=\tr\mathbf{A}f(t),
\end{equation*}
so that, since \(f(0)=1\), \(\ln f(t)=t\tr\mathbf{A}\). In particular, setting \(t=1\), we have the following relationship between the determinant and trace,
\begin{equation}
\det\exp\mathbf{A}=\exp\tr\mathbf{A}.
\end{equation}
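
The identity is easy to test numerically. The sketch below assumes SciPy is available for the matrix exponential; any convergent implementation of the series above would do.
```python
import numpy as np
from scipy.linalg import expm  # matrix exponential

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

lhs = np.linalg.det(expm(A))  # det(exp A)
rhs = np.exp(np.trace(A))     # exp(tr A)
print(np.isclose(lhs, rhs))   # True
```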

Eigenvectors, Eigenvalues and Diagonalisation

We have seen that given any pair of vector spaces \(V,W\), then up to choice of bases, the linear transformations between them are completely determined by their rank, that is, by an integer \(r\leq\min(\dim V,\dim W)\). We turn now to the task of establishing a similar result for linear operators \(T:V\mapto V\). That is, for a given vector space \(V\), we wish to characterise the elements of \(\mathcal{L}(V)\) up to a choice of basis for \(V\).

Suppose \(T:V\mapto V\) is a linear operator and \(U\) a \(T\)-invariant subspace of \(V\). If \(\{v_1,\dots,v_s\}\) is a basis of \(U\) we can extend it to a basis of \(V\) and then with respect to this basis the matrix representation of \(T\), \(\mathbf{T}\), will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\), the restriction of \(T\) to \(U\), and \(\mathbf{T}'\) is the matrix representation of the induced map \(T':V/U\mapto V/U\), such that \(T'(v+U)=Tv+U\), with respect to the basis \(\{v_{s+1}+U,\dots,v_n+U\}\) of \(V/U\). If we were able to do better and decompose \(V\) as a direct sum of two invariant subspaces, \(V=U_1\oplus U_2\) say, then if \(\{v_1,\dots, v_s\}\) is a basis of \(U_1\) and \(\{v_{s+1},\dots,v_n\}\) is a basis of \(U_2\) then \(\{v_1,\dots,v_n\}\) is a basis of \(V\) and with respect to this basis the matrix representation of \(T\) will have the form,
\begin{equation}
\mathbf{T}=\begin{pmatrix}
\mathbf{t}_1 & \mathbf{0}\\
\mathbf{0}&\mathbf{t}_2
\end{pmatrix},
\end{equation}
where \(\mathbf{t}_1\) is the matrix representation of \(T|_{U_1}\) with respect to the basis \(\{v_1,\dots, v_s\}\) and \(\mathbf{t}_2\) is the matrix representation of \(T|_{U_2}\) with respect to the basis \(\{v_{s+1},\dots, v_n\}\). So, assuming our goal is to find a basis for \(V\) such that the matrix of a given linear operator \(T\) assumes the simplest possible form, it is clear that finding \(T\)-invariant subspaces will play an important role. This motivates the following definition.

Definition A non-zero vector \(v\in V\) is an eigenvector of a linear operator \(T:V\mapto V\) if
\begin{equation}
Tv=\lambda v,
\end{equation}
for some \(\lambda\in K\). In this case, \(\lambda\) is called the eigenvalue associated with \(v\). The subspace of \(V\) defined by
\begin{equation}
V_\lambda=\{v\in V\mid Tv=\lambda v\},
\end{equation}
is called the eigenspace of \(\lambda\) and its dimension, \(\dim V_\lambda\), the geometric multiplicity of \(\lambda\).

Clearly \(V_\lambda=\ker(T-\lambda\id_V)\) which is non-trivial if and only if \(\det(T-\lambda\id_V)=0\). That is, having chosen a basis for \(V\) such that the matrix representation of \(T\) is \(\mathbf{T}\), \(\lambda\) is an eigenvalue of \(T\) if and only if \(\det(\mathbf{T}-\lambda\mathbf{I}_n)=0\). This motivates the following definition.

Definition The characteristic polynomial of an \(n\times n\) matrix \(\mathbf{A}\) is
\begin{equation}
p_\mathbf{A}(x)=\det(\mathbf{A}-x\mathbf{I}_n),
\end{equation}
with \(p_\mathbf{A}(x)=0\) the characteristic equation.

Notice that since similar matrices clearly have the same characteristic polynomial it makes sense to define the characteristic polynomial \(p_T(x)\) for any \(T\in\mathcal{L}(V)\) as the characteristic polynomial of any matrix representation of \(T\).

Let us now consider a few simple examples of the calculation of eigenvalues and eigenvectors. Given an \(n\times n\) matrix \(\mathbf{A}\), the general procedure is first to compute its eigenvalues \(\lambda\) as the roots of the characteristic equation, \(\det(\mathbf{A}-\lambda\mathbf{I}_n)=0\), and then for each eigenvalue in turn solve \((\mathbf{A}-\lambda\mathbf{I}_n)\mathbf{v}=0\) for the corresponding eigenvectors \(\mathbf{v}\).

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&2
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)(2-\lambda)\) so has two eigenvalues, \(1\) and \(2\). For the eigenvalue \(1\), a corresponding eigenvector \(\mathbf{v}\), must satisfy,
\begin{equation}
\begin{pmatrix}
0&1\\
0&1
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix},
\end{equation}
while for the eigenvalue \(2\) we have
\begin{equation}
\begin{pmatrix}
-1&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
that is,
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
1
\end{pmatrix}.
\end{equation}
So in this case there are two distinct eigenvalues with each corresponding eigenspace one-dimensional.
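
The same computation can be done numerically; a NumPy eigensolver returns normalised eigenvectors, so they agree with those above only up to scale (a sketch, not part of the example):
```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 2.0]])

# Each column of `vecs` is a normalised eigenvector for the corresponding
# entry of `vals`; the ordering is not guaranteed.
vals, vecs = np.linalg.eig(A)
for lam, v in zip(vals, vecs.T):
    print(lam, v / v[0])  # rescaling the first component to 1 recovers (1,0) and (1,1)
```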

Example The matrix,
\begin{equation}
\begin{pmatrix}
1&1\\
0&1
\end{pmatrix},
\end{equation}
has characteristic polynomial \((1-\lambda)^2\) so has just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&1\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
tells us that the single corresponding eigenvector is
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue and its corresponding eigenspace is one-dimensional.

Example The identity matrix,
\begin{equation}
\begin{pmatrix}
1&0\\
0&1
\end{pmatrix},
\end{equation}
also has characteristic polynomial \((1-\lambda)^2\) and so just one eigenvalue, \(1\). The corresponding eigenvector equation,
\begin{equation}
\begin{pmatrix}
0&0\\
0&0
\end{pmatrix}
\mathbf{v}=0
\end{equation}
though has two possible solutions
\begin{equation}
\mathbf{v}=\begin{pmatrix}
1\\
0
\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}
0\\
1
\end{pmatrix}.
\end{equation}
So in this case there is just one eigenvalue but its corresponding eigenspace is two-dimensional.

The algebraic multiplicity of an eigenvalue \(\lambda\) is the multiplicity of \(\lambda\) as a root of the characteristic polynomial. From the previous examples we might guess that the algebraic multiplicity is greater than or equal to the geometric multiplicity. This is indeed the case, for if the geometric multiplicity of an eigenvalue \(\lambda\) of a linear operator \(T\) is \(s\) and we take \(\{v_1,\dots,v_s\}\) as a basis for \(V_\lambda\) we can extend it to a basis of \(V\). In this basis the matrix representation of \(T\) must have the form,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\lambda\mathbf{I}_s & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation*}
so the characteristic polynomial of \(T\) has the form \(p_T(x)=p_\mathbf{T}(x)=(\lambda-x)^s\det(\mathbf{T}'-x\mathbf{I}_{n-s})\). So \((\lambda-x)^s\), which is just the characteristic polynomial of \(T|_{V_\lambda}\), divides \(p_T(x)\), from which it follows that the algebraic multiplicity is greater than or equal to the geometric multiplicity.
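
In computations the geometric multiplicity of \(\lambda\) is \(\dim\ker(\mathbf{A}-\lambda\mathbf{I}_n)=n-\rank(\mathbf{A}-\lambda\mathbf{I}_n)\); for the second and third examples above, both with the single eigenvalue \(1\), this gives \(1\) and \(2\) respectively. A sketch assuming NumPy (the helper name is just for illustration):
```python
import numpy as np

def geometric_multiplicity(A, lam):
    """dim ker(A - lam*I) = n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # algebraic multiplicity 2, geometric multiplicity 1
Id = np.eye(2)                         # algebraic multiplicity 2, geometric multiplicity 2

print(geometric_multiplicity(J, 1.0))   # 1
print(geometric_multiplicity(Id, 1.0))  # 2
```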

As already discussed, more generally, for any \(T\)-invariant subspace \(U\) of \(V\), we may choose a basis \(\{v_1,\dots,v_p\}\) of \(U\) and extend it to a basis \(\{v_1,\dots,v_p,v_{p+1},\dots,v_n\}\) of \(V\). Then with respect to this basis \(T\) has the matrix representation,
\begin{equation*}
\mathbf{T}=\begin{pmatrix}
\mathbf{t} & \mathbf{*}\\
\mathbf{0}&\mathbf{T}'
\end{pmatrix},
\end{equation*}
where \(\mathbf{t}\) is the matrix representation of \(T|_U\) and \(\mathbf{T}'\) is the matrix representation of \(T'\) with respect to the basis \(\{v_{p+1}+U,\dots,v_n+U\}\) of \(V/U\). So we have \(p_T(x)=p_{T|_U}(x)\det(\mathbf{T}'-x\mathbf{I}_{n-p})=p_{T|_U}(x)p_{T'}(x)\).

In the previous examples we did not specify the field and indeed in each case we could have been working over either \(\RR\) or \(\CC\). In the following example the choice matters.

Example The matrix,
\begin{equation}
\begin{pmatrix}
0&1\\
-1&0
\end{pmatrix},
\end{equation}
has characteristic polynomial \(\lambda^2+1\) and so regarded as a matrix over \(\RR\) it has no eigenvalues. Over \(\CC\), of course, there are two eigenvalues, \(\pm i\).

The characteristic polynomial, \(p_\mathbf{A}(x)\), of a matrix \(\mathbf{A}\) with \(r\) distinct eigenvalues, is said to factor over the field \(K\) if it can be written as
\begin{equation}
p_\mathbf{A}(x)=\prod_{i=1}^r(x-\lambda_i)^{n_i},
\end{equation}
for some \(\lambda_i\in K\) (\(\lambda_i\neq\lambda_j\) if \(i\neq j\)) and algebraic multiplicities \(n_i\) such that \(\sum_i n_i=n\). Clearly, the characteristic polynomial of the matrix of the last example does not factor over \(\RR\) (though it does over \(\CC\)), whilst those of the previous three, which happen to be upper triangular, do. Generally, it is not difficult to see that for any upper triangular matrix, \(\mathbf{A}\), the characteristic polynomial has the particularly simple form \(p_\mathbf{A}(x)=(A_1^1-x)\dots(A_n^n-x)\); that is, the characteristic polynomial of any upper triangular matrix factors and its diagonal elements are precisely its eigenvalues.

Theorem If \(\mathbf{A}\) is an \(n\times n\) matrix over \(K\), then \(p_\mathbf{A}(x)\) factors if and only if \(\mathbf{A}\) is similar to an upper triangular matrix.

Proof The ‘if’ direction we’ve already observed. Conversely, suppose \(p_\mathbf{A}(x)\) factors. If \(n=1\) then there’s nothing to prove. For \(n>1\), there is some \(\lambda\) such that \(p_\mathbf{A}(\lambda)=0\). That is, there is some \(\mathbf{v}_1\in K^n\) such that \(\mathbf{A}\mathbf{v}_1=\lambda\mathbf{v}_1\). Extend this to a basis \(\mathbf{v}_1,\dots,\mathbf{v}_n\) of \(K^n\). Now consider the matrix representation of \(L_\mathbf{A}\) with respect to this basis,
\begin{equation}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}’
\end{pmatrix}.
\end{equation}
This is, of course, just the similarity transformation of \(\mathbf{A}\) by the appropriate change of basis matrix. Its characteristic polynomial is clearly \((x-\lambda)p_{\mathbf{A}'}(x)\), and given the initial assumption that \(p_\mathbf{A}(x)\) factors it follows that \(p_{\mathbf{A}'}(x)\) factors. So we can repeat the process for the matrix \(\mathbf{A}'\), finding a similarity transformation \(\mathbf{P}\) such that
\begin{equation*}
\mathbf{P}^{-1}\mathbf{A}'\mathbf{P}=\begin{pmatrix}
\lambda'&\mathbf{*}\\
\mathbf{0}&\mathbf{A}''
\end{pmatrix},
\end{equation*}
that is,
\begin{equation*}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}^{-1}
\end{pmatrix}
\begin{pmatrix}
\lambda & \mathbf{*}\\
\mathbf{0} & \mathbf{A}'
\end{pmatrix}
\begin{pmatrix}
1 & \mathbf{0}\\
\mathbf{0} & \mathbf{P}
\end{pmatrix}
=
\begin{pmatrix}
\lambda & * & \mathbf{*}\\
0 & \lambda' & \mathbf{*}\\
\mathbf{0} & \mathbf{0} & \mathbf{A}''
\end{pmatrix},
\end{equation*}
and so on, arriving finally at the desired upper triangular matrix.\(\blacksquare\)

From the perspective of linear operators, the construction works as follows. Since \(p_T(x)\) factors we know that \(T\) has at least one non-zero eigenvector, \(v_1\) say, with corresponding eigenvalue \(\lambda_1\). Defining \(W_1\equiv\Span(v_1)\) then since \(W_1\) is \(T\)-invariant we can define \(T':V/W_1\mapto V/W_1\) as \(T'(v+W_1)=Tv+W_1\). As already observed, the characteristic polynomial of \(T\) can then be written as \(p_T(x)=p_{T|_{W_1}}(x)p_{T'}(x)=(x-\lambda_1)p_{T'}(x)\) and since by assumption \(p_T(x)\) factors, so too must \(p_{T'}(x)\). Thus \(T'\) has a non-zero eigenvector, \(v_2+W_1\) say, and if \(\lambda_2\) is the corresponding eigenvalue we have
\begin{equation*}
T'(v_2+W_1)=\lambda_2(v_2+W_1).
\end{equation*}
This means that \(Tv_2-\lambda_2v_2\in W_1\), so defining \(W_2=\Span(v_1,v_2)\), we have \(Tv_2\in W_2\) and so \(W_2\) is \(T\)-invariant. Also, since \(v_2+W_1\) is non-zero, \(v_1\) and \(v_2\) are linearly independent. Continuing in this way we construct a basis, \(\{v_1,\dots,v_n\}\), of \(V\) with the property that for any \(1\leq k\leq n\), \(W_k\) is \(T\)-invariant. The corresponding matrix representation is upper triangular and we note for future reference that \((T-\lambda_k\id_V)W_k\subseteq W_{k-1}\).
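
Numerically, an upper triangular matrix similar to a given one is usually obtained from the complex Schur decomposition, which uses a unitary change of basis rather than the quotient-space construction sketched above, but likewise places the eigenvalues on the diagonal. A sketch assuming SciPy is available:
```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Complex Schur form: A = Q T Q^{-1} with Q unitary and T upper triangular.
T, Q = schur(A, output='complex')

print(np.allclose(Q @ T @ Q.conj().T, A))  # True: A is similar to T
print(np.allclose(np.tril(T, -1), 0))      # True: T is upper triangular
print(np.allclose(np.sort(np.diag(T)),     # the diagonal of T holds the eigenvalues of A
                  np.sort(np.linalg.eigvals(A))))
```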

The field \(\CC\) is said to be algebraically closed since, by the fundamental theorem of algebra, every polynomial in \(\CC[x]\) splits into linear factors. Clearly any matrix over an algebraically closed field \(K\) is similar to an upper triangular matrix from which its eigenvalues can then be read. Note that the field \(\RR\) is not algebraically closed (\(\CC\) is actually the algebraic closure of \(\RR\)).

Determinants

If we focus on linear operators which are endomorphisms, that is, operators \(T\in\mathcal{L}(V)\) for some vector space \(V\), then basis independent properties are those of the orbits of the group action of \(\text{GL}(V)\) on \(\mathcal{L}(V)\) which maps \(T\mapsto P^{-1}TP\). In other words, they are properties of square \(n\times n\) matrices \(\mathbf{A}\) which are also properties of all similar (also called conjugate) matrices \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\) for some invertible \(n\times n\) matrix \(\mathbf{P}\). The change of basis matrix \(\mathbf{P}\) here is called a similarity transformation. One such attribute is the determinant.

Definition The determinant, \(\det\mathbf{A}\), of an \(n\times n\) matrix \(\mathbf{A}\) is defined to be
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_{\sigma_{1}}^1A_{\sigma_{2}}^2\dots A_{\sigma_{n}}^n\label{eq:def det}
\end{equation}
where the sum is over all permutations \(\sigma\) of the set \(\{1, 2,\dots, n\}\), \(\sigma_i\) being the image of \(i\) under a given permutation, \(S_n\) the set of all such permutations (the symmetric group on \(n\) letters) and \(\sgn(\sigma)\) the signature of \(\sigma\) (\(+1\) whenever the reordering given by \(\sigma\) can be achieved by successively interchanging two entries an even number of times, and \(-1\) otherwise).

This is sometimes expressed in terms of the Levi-Civita symbol \(\epsilon_{i_1i_2\dots i_n}\), defined as
\begin{equation}
\epsilon_{i_1i_2\dots i_n}=\begin{cases}
\sgn(i_1,i_2,\dots,i_n)&\text{if }(i_1,i_2,\dots,i_n)\text{ is a permutation of }(1,2,\dots,n)\\
0&\text{otherwise.}
\end{cases}\label{def:Levi-Cevita}
\end{equation}

For any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&2&\dots&n\\
\sigma_1&\sigma_2&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there is an inverse permutation, let’s call it \(\tau\), given by
\begin{equation*}
\tau=\begin{pmatrix}
\sigma_1&\sigma_2&\dots&\sigma_n\\
1&2&\dots&n
\end{pmatrix}
\end{equation*}
for which it is clear that \(\sgn(\sigma)=\sgn(\tau)\). This leads us to the observation that we could as well write the determinant as
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_1^{\sigma_{1}}A_2^{\sigma_{2}}\dots A_n^{\sigma_{n}}
\end{equation}
and that indeed
\begin{equation}
\det\mathbf{A}^\mathsf{T}=\det\mathbf{A}
\end{equation}
Note also that if the matrix is upper (or lower) triangular then \(\det\mathbf{A}=A_1^1\dots A_n^n\), and that if two columns (or rows) are identical then \(\det\mathbf{A}=0\).

The effect of elementary row operations on the determinant is as follows. Interchanging rows changes the sign. Multiplying a row by a number multiplies the corresponding determinant by that number. Adding a multiple of one row to another leaves the determinant unchanged. The first two of these are obvious. For the third notice that for any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_i&\dots&\sigma_j&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there will also be a permutation \(\sigma'\) given by
\begin{equation*}
\sigma'=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_j&\dots&\sigma_i&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
with \(\sgn(\sigma')=-\sgn(\sigma)\). Thus if the elementary row operation transforms \(\mathbf{A}\) to \(\tilde{\mathbf{A}}\), the matrix \(\mathbf{A}\) but with the \(i\)th row replaced by row \(i\) plus \(c\) times row \(j\), then
\begin{align*}
\det\tilde{\mathbf{A}} = \sum_{\sigma\in S_n}\sgn(\sigma) \bigl( &A_{\sigma_{1}}^1\dots A_{\sigma_{i}}^i\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \\
+ &cA_{\sigma_{1}}^1\dots A_{\sigma_{i}}^j\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \bigr)
\end{align*}
and we see that each term involving the constant \(c\) is cancelled by the corresponding term for the permutation \(\sigma'\), so \(\det\tilde{\mathbf{A}}=\det\mathbf{A}\). The determinant of the identity matrix being of course 1, we see that if \(\mathbf{E}\) is any elementary row matrix, then \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\). An entirely analogous discussion applies of course to elementary column matrices. So if an \(n\times n\) matrix \(\mathbf{A}\) is invertible then, since it can be written as a product of elementary matrices, its determinant is non-zero. Conversely, if the determinant is non-zero then the reduced row echelon form of the matrix has non-zero determinant, so it must have full rank and therefore be invertible. We have therefore proved the following

Theorem An \(n\times n\) matrix \(\mathbf{A}\) is invertible if and only if \(\det\mathbf{A}\neq0\).
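
The row-operation properties above also give the standard way of computing a determinant in practice: reduce to upper triangular form using row interchanges (each of which flips the sign) and row additions (which change nothing), then multiply the diagonal entries. A minimal sketch, assuming NumPy and a square array; the function name is just for illustration.
```python
import numpy as np

def det_by_elimination(A):
    """Determinant via Gaussian elimination with partial pivoting.

    Row additions leave the determinant unchanged, each row interchange
    flips its sign, and the determinant of the resulting upper triangular
    matrix is the product of its diagonal entries.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        pivot = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting
        if np.isclose(A[pivot, k], 0.0):
            return 0.0                            # singular: zero column below the diagonal
        if pivot != k:
            A[[k, pivot]] = A[[pivot, k]]         # row interchange: sign flips
            sign = -sign
        A[k+1:, k:] -= np.outer(A[k+1:, k] / A[k, k], A[k, k:])  # row additions
    return sign * np.prod(np.diag(A))

B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(det_by_elimination(B), np.linalg.det(B))  # both 8.0
```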

Another important property of determinants is contained in the following

Theorem For any \(n\times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\)
\begin{equation}
\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}.
\end{equation}

Proof Consider first the case that \(\mathbf{A}\) is not invertible. Then \(\rank L_\mathbf{A}<n\), which means \(\rank L_\mathbf{A}L_\mathbf{B}<n\), so \(\mathbf{A}\mathbf{B}\) is not invertible and both sides are zero. Similarly, if \(\mathbf{B}\) is not invertible, then the kernel of \(L_\mathbf{B}\), and therefore that of \(L_\mathbf{A}L_\mathbf{B}\), is non-trivial, so again both sides are zero. Finally, if both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible then both can be expressed as products of elementary matrices and the result follows by repeated use of \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\).\(\blacksquare\)

Another perspective on determinants comes from regarding an \(n\times n\) matrix as an \(n\)-tuple of \(n\)-dimensional column vectors. There is then an obvious isomorphism \(\text{Mat}_n(K)\cong K^n\times\dots\times K^n\) and the determinant may be regarded as a function \(\det:K^n\times\dots\times K^n\mapto K\). As such, it follows from the definition that it is multilinear in the sense that it is linear in each column, \(v_i\) say, of the matrix. That is,

\begin{equation*}
\det(v_1,\dots,cv_i,\dots,v_n)=c\det(v_1,\dots,v_i,\dots,v_n),
\end{equation*}
and
\begin{equation*}
\det(v_1,\dots,u_i+v_i,\dots,v_n)=\det(v_1,\dots,u_i,\dots,v_n)+\det(v_1,\dots,v_i,\dots,v_n).
\end{equation*}
Definition A multilinear function \(f:K^n\times\dots\times K^n\mapto K\) is a volume form on \(K^n\) if it is alternating. That is, whenever \(i\neq j\) and \(v_i=v_j\), \(f(v_1,\dots,v_n)=0\).

Clearly, the determinant is an example of a volume form, and we may ask if there are any others. Consider the standard basis \(\{\mathbf{e}_i\}\) of \(K^n\), then any element of \(K^n\times\dots\times K^n\) has the form
\begin{equation*}
(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)
\end{equation*}
so if \(f\) is a volume form then
\begin{equation*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)=A_1^{i_1}A_2^{i_2}\dots A_n^{i_n}f(\mathbf{e}_{i_1},\dots,\mathbf{e}_{i_n}).
\end{equation*}
But \(f\) is alternating, so this is zero unless \((i_1,\dots,i_n)\) is some permutation of \((1,\dots,n)\) and moreover it is not difficult to see that \(f(\mathbf{e}_{\sigma(1)},\dots,\mathbf{e}_{\sigma(n)})=\sgn(\sigma)f(\mathbf{e}_1,\dots,\mathbf{e}_n)\) so
\begin{align*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)&=\sum_{\sigma\in S_n}A_1^{\sigma_1}A_2^{\sigma_2}\dots A_n^{\sigma_n}\sgn(\sigma)f(\mathbf{e}_1,\dots,\mathbf{e}_n)\\
&=\det(v_1,\dots,v_n)f(\mathbf{e}_1,\dots,\mathbf{e}_n).
\end{align*}
So any volume form is determined by \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)\) and indeed the set of volume forms is a \(1\)-dimensional vector space. In other words, the determinant is uniquely specified by being multilinear, alternating, and having \(\det\mathbf{I}_n=1\) (\(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=1\)), where \(\mathbf{I}_n\) is the \(n\times n\) identity matrix.

A rather neat demonstration of the fact that \(\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}\) may now be given. Define \(f:\text{Mat}_n(K)\mapto K\) as \(f(\mathbf{B})=\det\mathbf{A}\mathbf{B}\). Then \(f\) is a volume form, which is clear having observed, as above, that the \(j\)th column of \(\mathbf{A}\mathbf{B}\), let’s denote this \((AB)_j\), is a linear sum of the columns of \(\mathbf{A}\) with coefficients from the \(j\)th column of \(\mathbf{B}\), that is,
\begin{equation*}
(AB)_j=B_j^1(A)_1+B_j^2(A)_2+\dots+B_j^n(A)_n.
\end{equation*}
This means that
\begin{equation*}
f(\mathbf{B})=\det\mathbf{B}f(\mathbf{e}_1,\dots,\mathbf{e}_n),
\end{equation*}
but by the definition we have \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=f(\mathbf{I}_n)=\det\mathbf{A}\).

Clearly, for invertible \(\mathbf{A}\), \(\det(\mathbf{A}^{-1})=(\det\mathbf{A})^{-1}\), so \(\det\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\det\mathbf{A}\) and it therefore makes sense to define the determinant of any linear operator \(T:V\mapto V\) as \(\det\mathbf{T}\) where \(\mathbf{T}\) is the matrix representation of \(T\) with respect to any basis. Thus, we may summarise what we have learnt about the invertibility of linear operators in the following theorem.

Theorem For any linear operator \(T\in\mathcal{L}(V)\) the following are equivalent:

  1. \(T\) is invertible;
  2. \(\det T\neq0\);
  3. \(\ker T=\{0\}\);
  4. \(\img T=V\).

There is another expression for the determinant which may be derived, called Laplace’s formula,
\begin{equation}
\det(A) = \sum_{j=1}^n (-1)^{i+j} A_j^i M_{i,j} = \sum_{i=1}^n (-1)^{i+j} A_j^i M_{i,j},\label{eq:Laplace’s formula}
\end{equation}
where the expansion is along the \(i\)th row and \(j\)th column respectively. Here \(M_{i,j}\) is defined to be the determinant of the \((n-1)\times (n-1)\) matrix obtained by removing the \(i\)th row and \(j\)th column of the matrix \(\mathbf{A}\). It is called a minor. With the adjugate matrix, \(\adj\mathbf{A}\), defined as,
\begin{equation}
\adj A_j^i=(-1)^{i+j}M_{j,i},
\end{equation}
we then have
\begin{equation*}
((\adj A)A)_j^i=(\adj A)_k^iA_j^k=\sum_{k=1}^n(-1)^{i+k}M_{k,i}A_j^k.
\end{equation*}
In the last expression, if \(i=j\) then we simply have \(\det\mathbf{A}\), while if \(i\neq j\) then the expression is precisely that of the determinant of a matrix derived from \(\mathbf{A}\) by replacing the \(i\)th column by the \(j\)th column and as such is zero. Thus we see that,
\begin{equation}
\adj \mathbf{A}\mathbf{A}=\mathbf{A}\adj\mathbf{A}=\det\mathbf{A}\mathbf{I}_n,
\end{equation}
which is called Cramer’s rule, and which means in particular that
\begin{equation}
\mathbf{A}^{-1}=\frac{1}{\det\mathbf{A}}\adj\mathbf{A},\label{eq:Cramer’s rule}
\end{equation}
if \(\det\mathbf{A}\neq 0\).
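
A direct, if very inefficient, implementation of the Laplace expansion and the adjugate makes the relation \(\adj\mathbf{A}\,\mathbf{A}=\det\mathbf{A}\,\mathbf{I}_n\) easy to check on small matrices. A sketch assuming NumPy; cofactor expansion costs \(O(n!)\), so this is only sensible for tiny \(n\), and the helper names are just for illustration.
```python
import numpy as np

def laplace_det(A):
    """Laplace expansion of the determinant along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** j * A[0, j] * minor(A, 0, j) for j in range(n))

def minor(A, i, j):
    """Determinant of A with row i and column j removed."""
    return laplace_det(np.delete(np.delete(A, i, axis=0), j, axis=1))

def adjugate(A):
    """(adj A)[i, j] = (-1)^(i+j) * minor of A at (j, i)."""
    n = A.shape[0]
    return np.array([[(-1) ** (i + j) * minor(A, j, i) for j in range(n)]
                     for i in range(n)])

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

print(np.isclose(laplace_det(A), np.linalg.det(A)))              # True
print(np.allclose(adjugate(A) @ A, laplace_det(A) * np.eye(3)))  # True: adj(A) A = det(A) I
```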

The Dual Vector Space

Definition Let \(V\) be a vector space over a field \(K\). Then the dual space of \(V\) is \(V^*=\mathcal{L}(V,K)\). That is, the space of linear functions or functionals on \(V\).

If \(\{e_i\}\) is a basis of an \(n\)-dimensional vector space \(V\), then we can define elements \(e^i\) in \(V^*\) according to \(e^i(e_j)=\delta^i_j\) with the obvious linear extension to the whole of \(V\). It is not difficult to see that the \(e^i\) form a basis of \(V^*\). Indeed if \(c_ie^i=0\) then \(c_ie^i(e_j)=c_j=0\) for all \(j\) and if \(\varphi\in V^*\) then we can write \(\varphi=\varphi(e_i)e^i\) since on any basis element, \(e_j\), \(\varphi(e_i)e^i(e_j)=\varphi(e_j)\). Given a basis \(\{e_i\}\) of \(V\), the basis \(\{e^i\}\) so defined is called the dual basis of \(V^*\).

As \(\dim V^*=\dim V\) we have \(V^*\cong V\). Note that this isomorphism is not basis independent. In other words it is not canonical. By contrast, the vector spaces \(V\) and \(V^{**}\) are canonically isomorphic. Indeed, consider the map \(V\mapto V^{**}\) which takes \(v\mapsto\tilde{v}\) where \(\tilde{v}(\varphi)=\varphi(v)\) for any \(\varphi\in V^*\). Then it is not difficult to see that \(\tilde{v}\) so defined is certainly a linear function on \(V^*\) and that the map \(v\mapsto\tilde{v}\) is linear. It is injective since if \(v\neq0\) then we can extend \(v\) to a basis of \(V\) and so define on \(V\) a linear function \(\varphi\) such that \(\varphi(v)\neq0\) so \(\tilde{v}(\varphi)\neq0\) and thus \(\tilde{v}\neq0\). Since \(V\) is finite dimensional the map is an isomorphism.

Given a choice of basis, vectors \(v\) in \(V\) were identified with the column vectors, \(\mathbf{v}\), of their components. Similarly, given a choice of basis of the dual space, linear functionals \(\varphi\in V^*\) can naturally be identified with row vectors \(\boldsymbol{\varphi}\) of their components. Then \(\varphi(v)\) is simply the matrix product of the \(1\times n\) row vector \(\boldsymbol{\varphi}\) with the \(n\times 1\) column vector \(\mathbf{v}\), that is, \(\varphi(v)=\boldsymbol{\varphi}\mathbf{v}\).

We’ve seen that when we make a change of basis in \(V\), with \(e'_i=P_i^je_j\), then \(\mathbf{v}'=\mathbf{P}^{-1}\mathbf{v}\). The corresponding dual bases, \(\{e^i\}\) and \(\{e'^j\}\), must be related as, say, \(e'^i=Q^i_je^j\), where the \(Q^i_j\) are the elements of an invertible matrix \(\mathbf{Q}\). Then we have \(e'^i(e'_j)=\delta^i_j=(Q^i_ke^k)(P^l_je_l)=Q^i_kP^k_j\). That is, \(\mathbf{Q}=\mathbf{P}^{-1}\). Now for any \(\varphi\in V^*\) we have the alternative expansions \(\varphi=\varphi_ie^i=\varphi'_ie'^i\), from which it follows that \(\boldsymbol{\varphi}'=\boldsymbol{\varphi}\mathbf{P}\). The components of vectors in the dual space thus transform with the change of basis matrix and so are said to transform covariantly.
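
Concretely, if the columns of \(\mathbf{P}\) hold the components of the new basis vectors \(e'_i\) in the old basis, then the components of the new dual basis vectors are the rows of \(\mathbf{P}^{-1}\), and the pairing \(\varphi(v)\) is unchanged under the simultaneous transformations \(\mathbf{v}'=\mathbf{P}^{-1}\mathbf{v}\) and \(\boldsymbol{\varphi}'=\boldsymbol{\varphi}\mathbf{P}\). A small NumPy illustration; the particular \(\mathbf{P}\), \(\boldsymbol{\varphi}\) and \(\mathbf{v}\) are arbitrary choices.
```python
import numpy as np

# Columns of P are the components of the new basis vectors e'_i
# with respect to the old basis (an arbitrary invertible matrix).
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
P_inv = np.linalg.inv(P)

# Rows of P^{-1} are the components of the new dual basis vectors:
# e'^i(e'_j) = (row i of P^{-1}) . (column j of P) = delta^i_j.
print(np.allclose(P_inv @ P, np.eye(3)))      # True

phi = np.array([[2.0, -1.0, 3.0]])            # a functional as a 1 x 3 row vector
v = np.array([[1.0], [4.0], [2.0]])           # a vector as a 3 x 1 column vector

v_new = P_inv @ v       # contravariant: v' = P^{-1} v
phi_new = phi @ P       # covariant:     phi' = phi P

print(np.allclose(phi @ v, phi_new @ v_new))  # True: phi(v) is basis independent
```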

Suppose \(V\) and \(W\) are respectively \(n\) and \(m\)-dimensional vector spaces over \(K\), with \(T\in\mathcal{L}(V,W)\). Then we can define the dual map \(T^*\), \(T^*:W^*\mapto V^*\) as \(T^*\omega=\omega T\). Having chosen bases for \(V\) and \(W\), \(\{e_i\}\) and \(\{f_i\}\) respectively, so that with respect to these bases \(\mathbf{T}\) is the matrix representation of \(T\), let us consider the matrix representation of \(T^*\) with respect to the dual bases \(\{e^i\}\) and \(\{f^i\}\). We have,
\begin{equation*}
(T^*f^i)(e_j)=f^i(Te_j)=f^i(T_j^kf_k)=T_j^i
\end{equation*}
but also
\begin{equation*}
(T^*f^i)(e_j)=((T^*)^i_ke^k)(e_j)=(T^*)^i_j,
\end{equation*}
that is \(\mathbf{T}^*=\mathbf{T}\). Note that if we’d chosen to identify elements of the dual space with column vectors of components then we would have found that the matrix of the dual map with respect to the dual bases was the transpose of the matrix of the original map.

Definition If \(U\) is a subspace of \(V\) then the annihilator of \(U\), \(U^\circ\) is defined to be
\begin{equation}
U^\circ=\{\varphi\in V^*\mid\varphi(u)=0\;\forall u\in U\}.
\end{equation}

Given a subspace \(U\) of \(V\) any element \(\varphi\in V^*\) has a restriction \(\varphi|_U\). This defines a linear operator \(\pi_U:V^*\mapto U^*\) as \(\pi_U(\varphi)=\varphi|_U\). Notice that \(\ker\pi_U=U^\circ\) and also that \(\img\pi_U=U^*\) so, since \(\dim V=\dim V^*\), we have
\begin{equation}
\dim U+\dim U^\circ=\dim V.\label{equ:annihilator dimension}
\end{equation}

Now suppose \(T\in\mathcal{L}(V,W)\), so that \(T^*\in\mathcal{L}(W^*,V^*)\), and consider \(\ker T^*\). That is, we consider elements \(\omega\in W^*\) such that \(T^*\omega=\omega T=0\). It is not difficult to see that this is precisely the annihilator of \(\img T\), that is
\begin{equation}
\ker T^*=(\img T)^\circ.
\end{equation}
Having observed that they have the same matrix representation (albeit modulo a different convention with regard to row and column vectors), we know that \(T\) and \(T^*\) have the same rank. But we can see this in a basis free fashion as follows. By the rank-nullity theorem we have \(\dim\img T^*=\dim W-\dim\ker T^*\). That is, \(\dim\img T^*=\dim W-\dim(\img T)^\circ\), but by \eqref{equ:annihilator dimension}, this just says that
\begin{equation}
\dim\img T^*=\dim\img T.
\end{equation}
Finally, maintaining a certain symmetry, we have
\begin{equation}
\img T^*=(\ker T)^\circ.
\end{equation}
To see this, consider \(\varphi\in\img T^*\), that is \(\varphi=\omega T\) for some \(\omega\in W^*\). Now if \(v\in\ker T\) then clearly \(\varphi(v)=\omega(Tv)=0\) so \(\varphi\in(\ker T)^\circ\). Thus we certainly have \(\img T^*\subseteq(\ker T)^\circ\). But \(\dim\ker T=\dim V-\dim\img T\) from the rank-nullity theorem, and by \eqref{equ:annihilator dimension} \(\dim\ker T=\dim V-\dim(\ker T)^\circ\), so indeed \(\dim\img T^*=\dim\img T=\dim(\ker T)^\circ\) and hence \(\img T^*=(\ker T)^\circ\).

Realification and Complexification

In certain circumstances it turns out to be useful to pass from a complex vector space to a ‘corresponding’ real vector space or vice versa. The former is called realification and is really nothing more than restricting scalar multiplication to reals in the original complex vector space. If \(V\) is a vector space over \(\CC\), then its realification, \(V_\RR\), is the set \(V\) with vector addition and scalar multiplication by reals inherited unchanged from \(V\) (and the complex multiplication ‘forgotten’). If \(\{e_1,\dots,e_n\}\) is a basis of \(V\), then consider the vectors \(\{e_1,\dots,e_n,ie_1,\dots,ie_n\}\). It’s not difficult to see that these vectors form a basis of \(V_\RR\) so \(\dim V_\RR=2\dim V\). If \(T:V\mapto W\) is a linear transformation of complex vector spaces then we may also view it as a linear transformation, \(T_\RR:V_\RR\mapto W_\RR\), of real vector spaces. If the matrix representation of \(T\) with respect to bases \(\{e_i\}\) and \(\{f_i\}\) of \(V\) and \(W\) respectively, is \(\mathbf{T}=\mathbf{A}+i\mathbf{B}\), with \(\mathbf{A}\) and \(\mathbf{B}\) both real matrices, then with respect to the bases \(\{e_1,\dots,e_n,ie_1,\dots,ie_n\}\) and \(\{f_1,\dots,f_n,if_1,\dots,if_n\}\), of respectively \(V_\RR\) and \(W_\RR\), \(T_\RR\) has the matrix representation,
\begin{equation}
\begin{pmatrix}
\mathbf{A}&-\mathbf{B}\\
\mathbf{B}&\mathbf{A}
\end{pmatrix}.
\end{equation}
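
In coordinates this says that, stacking the real and imaginary parts of the components of a vector, \(T_\RR\) acts via the block matrix above. A quick NumPy check with an arbitrary complex matrix:
```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A, B = T.real, T.imag

# Matrix of the realified map with respect to the basis {e_i, i e_i}.
T_real = np.block([[A, -B],
                   [B,  A]])

z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = T @ z

# Acting on (Re z, Im z) gives (Re w, Im w).
print(np.allclose(T_real @ np.concatenate([z.real, z.imag]),
                  np.concatenate([w.real, w.imag])))  # True
```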

This process of realification produces a rather special kind of real vector space, since it comes equipped with a particular linear operator which we’ll denote \(\iota\in\mathcal{L}(V_\RR)\), given by \(\iota v=iv\) and such that \(\iota^2=-\id_{V_\RR}\). This is the canonical example of a complex structure, a linear operator \(\iota\in\mathcal{L}(V)\) on any real vector space \(V\) such that \(\iota^2=-\id_V\).

Given a real vector space \(V\) equipped with a complex structure \(\iota\) we may form a complex vector space structure on the set \(V\) by inheriting vector addition from \(V\) and defining complex scalar multiplication by \((a+ib)v=av+b\iota v\). It is not difficult to see that if \(\iota\) is the canonical complex structure on a real vector space \(V_\RR\) which is the realification of a complex vector space \(V\) then we just recover in this way the original complex vector space \(V\).

How might we proceed if we start from a real vector space without a complex structure? In particular, we would like to understand, in a basis free sense, the fact that whilst a real \(m\times n\) matrix \(\mathbf{A}\) is of course a linear transformation from \(\RR^n\) to \(\RR^m\), it may also be regarded as linear transformation from \(\CC^n\) to \(\CC^m\).

For any real vector space \(V\) we consider the (exterior) direct sum space \(V\oplus V\). We can then define a complex structure on \(V\oplus V\), \(\iota:V\oplus V\mapto V\oplus V\), by \(\iota(v,v')=(-v',v)\). Then \(\iota^2=-\id_{V\oplus V}\), \(\iota\) is clearly an isomorphism, and it allows us to write any \((v,v')\) in the suggestive form
\begin{equation*}
(v,v')=(v,0)+\iota(v',0).
\end{equation*}
The complexification of \(V\), \(V_\CC\), is defined to be the set \(V\oplus V\) equipped with the obvious vector addition and with complex multiplication given by
\begin{equation*}
(a+ib)(v,v')\equiv(a+b\iota)(v,v')=(av-bv',av'+bv).
\end{equation*}
That \(V_\CC\) so defined is indeed a complex vector space is then obvious, given the properties of \(\iota\). It should also be clear that if \(V\) is an \(n\)-dimensional real vector space, with basis \(\{e_i\}\) say, then the \((e_i,0)\) span \(V_\CC\). To see that they are also linearly independent over \(\CC\), suppose
\begin{equation*}
(a^1+ib^1)(e_1,0)+\dots+(a^n+ib^n)(e_n,0)=0.
\end{equation*}
This is true if and only if, \((a^ie_i,b^ie_i)=0\), itself true if and only if \(a^i=b^i=0\) for all \(i\).

Now suppose \(T\in\mathcal{L}(V,W)\) is a linear transformation of real vector spaces. Define \(T_\CC:V_\CC\mapto W_\CC\) by \(T_\CC(v,v')=(Tv,Tv')\). Then since \(T_\CC(\iota(v,v'))=\iota(Tv,Tv')\) it is clear that \(T_\CC\in\mathcal{L}(V_\CC,W_\CC)\). This is the unique linear map such that the diagram,
\begin{equation*}
\begin{CD}
V @>T>> W\\
@VV V @VV V\\
V_\CC @>T_\CC>> W_\CC
\end{CD}
\end{equation*}
in which the vertical maps are the standard inclusions (e.g.\ \(v\mapsto(v,0)\) of \(V\) into \(V_\CC\)), commutes. Given bases of \(V\) and \(W\), in which the matrix representation of \(T\) is \(\mathbf{T}\), it is clear that the matrix representation of \(T_\CC\) with respect to the bases obtained by the natural inclusions is identical.

Note that on \(V_\CC\) there is a natural notion of complex conjugation. For any \(v=(x,y)\in V_\CC\), \(x,y\in V\), we define \(v^*=(x,-y)\). For any \(T\in\mathcal{L}(V,W)\) we have \(T_\CC((x,y)^*)=(T_\CC(x,y))^*\), that is, the complexification \(T_\CC\) commutes with this conjugation. In fact it is not difficult to show that a linear transformation in \(\mathcal{L}(V_\CC,W_\CC)\) is a complexification of a linear transformation in \(\mathcal{L}(V,W)\) if and only if it commutes with complex conjugation. Finally, note that \(\mathcal{L}(V,W)_\CC\cong\mathcal{L}(V_\CC,W_\CC)\), with the identification \((S,T)\mapsto S_\CC+iT_\CC\), for \(S,T\in\mathcal{L}(V,W)\), and that therefore complex conjugation is also well defined in \(\mathcal{L}(V_\CC,W_\CC)\).

Sums, Intersections and Projections

While the intersection, \(U_1\cap U_2\), of two subspaces of a vector space \(V\) is again a subspace, the union, \(U_1\cup U_2\), is not unless \(U_1\subseteq U_2\) or \(U_2\subseteq U_1\). The sum \(U_1+U_2\), defined as \(\{u_1+u_2\mid u_1\in U_1,u_2\in U_2\}\), is the ‘smallest’ vector space containing \(U_1\cup U_2\). Just how large is it? It’s not difficult to see that if we take \(u_1,\dots,u_d\) to be a basis of \(U_1\cap U_2\) and extend it to a basis \(u_1,\dots,u_d,v_1,\dots,v_r\) of \(U_1\) and a basis \(u_1,\dots,u_d,w_1,\dots,w_s\) of \(U_2\), then \(u_1,\dots,u_d,v_1,\dots,v_r,w_1,\dots,w_s\) is a basis of \(U_1+U_2\) and we have
\begin{equation}
\dim(U_1\cap U_2)+\dim(U_1+U_2)=\dim(U_1)+\dim(U_2).
\end{equation}

Example In 3-dimensional space, if we consider two distinct planes which pass through the origin, so a pair of 2-dimensional subspaces, then their sum is of course the 3-dimensional space whilst their intersection is a 1-dimensional line, 1+3=2+2.
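
Given bases of two subspaces as the columns of matrices \(\mathbf{B}_1\) and \(\mathbf{B}_2\), the formula lets one compute \(\dim(U_1\cap U_2)\) from ranks, since \(\dim(U_1+U_2)\) is the rank of the concatenated matrix \((\mathbf{B}_1\,|\,\mathbf{B}_2)\). A sketch of the example above, assuming NumPy; the two planes are arbitrary choices.
```python
import numpy as np

# Two distinct planes through the origin in R^3, basis vectors as columns.
B1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])    # the xy-plane
B2 = np.array([[1.0, 0.0],
               [0.0, 0.0],
               [0.0, 1.0]])    # the xz-plane

dim_U1 = np.linalg.matrix_rank(B1)
dim_U2 = np.linalg.matrix_rank(B2)
dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))  # dim(U1 + U2)
dim_int = dim_U1 + dim_U2 - dim_sum                   # dim(U1 intersect U2)

print(dim_U1, dim_U2, dim_sum, dim_int)               # 2 2 3 1, i.e. 1 + 3 = 2 + 2
```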

If \(U_1\), \(U_2\) and \(U_3\) are subspaces of a vector space \(V\) then notice that in general \(U_1\cap(U_2+U_3)\) and \((U_1\cap U_2)+(U_1\cap U_3)\) are not equal, intersection is not distributive over addition. Indeed, consider the subspaces \(U_1=\Span(\mathbf{e}_1+\mathbf{e}_2)\), \(U_2=\Span(\mathbf{e}_1)\) and \(U_3=\Span(\mathbf{e}_2)\) of \(\RR^2\). Then \(U_1\cap(U_2+U_3)=U_1\) but \(U_1\cap U_2=0=U_1\cap U_3\). Rather, we have the following equality,
\begin{equation}
U_1\cap(U_2+(U_1\cap U_3))=U_1\cap U_2+U_1\cap U_3.
\end{equation}
That \(U_1\cap(U_2+(U_1\cap U_3))\subset U_1\cap U_2+U_1\cap U_3\) follows since if \(v\in U_1\cap(U_2+(U_1\cap U_3))\) then \(v=u_1=u_2+u_{13}\) where \(u_1\in U_1\), \(u_2\in U_2\) and \(u_{13}\in (U_1\cap U_3)\). But then \(u_2\in U_1\) so indeed \(v\in U_1\cap U_2+U_1\cap U_3\). The reverse inclusion is immediate.

Definition A sum \(U=\sum_{i=1}^r U_i\) of vector subspaces \(U_i\subseteq V\) is direct if every \(u\in U\) can be written uniquely as \(u=u_1+\dots+u_r\) for some \(u_i\in U_i\).

Lemma The sum \(U_1+U_2\) of any pair of subspaces is direct if and only if \(U_1\cap U_2=\{0\}\).

Proof If the sum is direct but there is some non-zero \(v\in U_1\cap U_2\), we could write the zero vector in two ways as \(0=0+0\) and \(0=v+(-v)\), contradicting directness. Conversely, if \(U_1\cap U_2=\{0\}\) and \(v=u_1+u_2\) as well as \(v=u’_1+u’_2\) then we must have \(u_1-u’_1=u’_2-u_2=0\) so the decomposition is unique. \(\blacksquare\)

More generally, the following theorem gives three equivalent criteria for a sum of arbitrary length to be direct.

Theorem A sum \(U=\sum_{i=1}^rU_i\) of vector subspaces \(U_i\subseteq V\) is direct if and only if one of the following three equivalent criteria holds:

  1. For each \(i\), \(U_i\cap\left(\sum_{j\neq i}U_j\right)=0\).
  2. If \(u_1+\dots+u_r=0\), \(u_i\in U_i\), then \(u_i=0\).
  3. Every \(u\in U\) can be written uniquely as \(u=u_1+\dots+u_r\) for some \(u_i\in U_i\).

Proof Suppose \((2)\) is false, so that \(u_1+\dots+u_r=0\) with, say, \(u_1\neq0\). Then \(-u_1=u_2+\dots+u_r\in\sum_{j\neq1}U_j\), which contradicts \((1)\), so \((1)\) implies \((2)\). Suppose \((3)\) were false, so that we had \(u=u_1+\dots+u_r\) and \(u=u'_1+\dots+u'_r\) with not all \(u_i=u'_i\). Then subtracting these implies \((2)\) is false, so \((2)\) implies \((3)\). Finally, suppose \((1)\) is false. Then we have some non-zero \(u\) such that \(u\in U_i\) and \(u=u_1+\dots+u_{i-1}+u_{i+1}+\dots+u_r\) with \(u_j\in U_j\), giving two distinct decompositions of \(u\) (one with \(i\)th component \(u\), one with \(i\)th component \(0\)). This implies \((3)\) is false, so \((3)\) implies \((1)\).\(\blacksquare\)

Remark If \(\{e_1,\dots,e_n\}\) is a basis of \(V\) then clearly \(V=\sum_{i=1}^n\Span(e_i)\) is a direct sum.

Remark A situation which sometimes arises is that we know \(V=\sum_{i=1}^rU_i\) and also that \(\sum_{i=1}^r\dim U_i=\dim V\). Then choosing bases for each \(U_i\) we obtain a collection of vectors which certainly span \(V\). But since \(\sum_{i=1}^r\dim U_i=\dim V\) there must be \(\dim V\) of these vectors so they are a basis for \(V\) and we may conclude that the sum \(V=\sum_{i=1}^rU_i\) is direct.

If \(V=U+W\) is a direct sum of subspaces \(U\) and \(W\) these subspaces are said to be complementary. Given a subspace \(U\) of \(V\), a complementary subspace \(W\) always exists: just take a basis \(u_1,\dots,u_r\) for \(U\) and extend it to a basis \(u_1,\dots,u_r,w_1,\dots,w_{n-r}\) of \(V\); then \(W=\Span(w_1,\dots,w_{n-r})\) is a complementary subspace. Note that defining, for example, \(W'=\Span(w_1+u_1,w_2,\dots,w_{n-r})\) we obtain another subspace, also complementary to \(U\) but not equal to \(W\). Aside from the trivial cases of \(\{0\}\) and \(V\), complements of subspaces are not unique.

Example In \(\RR^2\), consider the subspace \(\Span(\mathbf{e}_1)\). Then \(\Span(\mathbf{e}_2)\) and \(\Span(\mathbf{e}_1+\mathbf{e}_2)\) are both examples of complementary subspaces.

Given two arbitrary vector spaces \(U\) and \(W\) their external direct sum, \(U\oplus W\), is defined to be the product set \(U\times W\) with vector space structure given by
\begin{align}
(u_1,w_1)+(u_2,w_2)&=(u_1+u_2,w_1+w_2)&c(u,w)=(cu,cw).
\end{align}
Now suppose \(U\) and \(W\) are in fact subspaces of some vector space \(V\) and consider the map \(\pi:U\oplus W\mapto V\) defined as \(\pi(u,w)=u+w\). This is clearly a linear transformation, with \(\ker\pi=\{(u,-u)\mid u\in U\cap W\}\) and \(\img\pi=U+W\). Thus in the case that \(U+W\) is a direct sum we have \(U+W\cong U\oplus W\) and in fact, abusing notation, we write in this case \(U+W=U\oplus W\). Furthermore, applying the rank-nullity theorem to the map \(\pi\), we obtain
\begin{equation}
\dim(U\oplus W)=\dim(U)+\dim(W)\label{equ:dim of sum}
\end{equation}

Example If \(U_1\) and \(U_2\) are subspaces of a vector space \(V\) with a non-zero intersection, \(U_0=U_1\cap U_2\), then we can write \(U_1=U_0\oplus U_1'\) and \(U_2=U_0\oplus U_2'\) for some subspaces \(U_1'\) and \(U_2'\) of \(U_1\) and \(U_2\) respectively. Consider the sum \(U_0+U_1'+U_2'\). It is not difficult to see that \(U_1+U_2=U_0+U_1'+U_2'\). In fact, since \(U_1\cap U_2'=0\), as follows since if \(u\in U_1\cap U_2'\) then \(u\in U_0\) and \(u\in U_2'\) contradicting \(U_2=U_0\oplus U_2'\), we can write \(U_1+U_2=(U_0\oplus U_1')+U_2'=(U_0\oplus U_1')\oplus U_2'\), that is,
\begin{equation}
U_1+U_2=U_0\oplus U_1'\oplus U_2'.
\end{equation}

Example If \(T\in\mathcal{L}(V)\), then note that whilst \(\dim V=\dim\ker T+\dim\img T\), it is not generally the case that \(\ker T\cap\img T=0\), so we cannot in general describe \(V\) as the direct sum of the kernel and image of a linear operator. Consider for example \(V=\RR^2\) and the linear operator
\begin{equation*}
T=\begin{pmatrix}0&1\\0&0\end{pmatrix}
\end{equation*}
is such that
\begin{equation*}
\ker T=\img T=\Span\left(\begin{pmatrix}1\\0\end{pmatrix}\right).
\end{equation*}

If \(W\) is a subspace of a vector space \(V\) then we can form the quotient space \(V/W\) which is just the quotient group equipped with the obvious scalar multiplication. If \(w_1,\dots,w_r\) is a basis for \(W\) then we can extend it to a basis \(w_1,\dots,w_r,v_{1},\dots,v_{n-r}\) for \(V\) and \(v_1+W,\dots,v_{n-r}+W\) is then a basis for \(V/W\). In particular, we have
\begin{equation}
\dim V/W=\dim V-\dim W.
\end{equation}
Now if \(T:V\mapto V\), we could define a linear operator \(T':V/W\mapto V/W\), as
\begin{equation}
T'(v+W)=Tv+W.
\end{equation}
But for this to be well-defined, \(W\) must be \(T\)-invariant, that is \(Tw\in W\) for any \(w\in W\). For if \(v+W=v'+W\), then \(v-v'=w\) for some \(w\in W\), and we require \(T'(v+W)=T'(v'+W)\). But
\begin{equation*}
T'(v+W)=Tv+W=T(v'+w)+W,
\end{equation*}
so we must have \(Tw\in W\).

Quotients of vector spaces also allow us to factorise certain linear maps in the following sense. Suppose \(T:V\mapto W\) is a linear transformation, and \(U\) is a subspace of \(V\) such that \(U\subseteq\ker T\). Define \(\pi:V\mapto V/U\) as \(\pi v=v+U\) (\(\pi\) is clearly linear with kernel \(\ker\pi=U\)). Then there exists a linear map \(T':V/U\mapto W\) such that \(T=T'\pi\). That is, we have ‘factorised’ \(T\) as \(T'\pi\). Indeed, \(T'\) must clearly be defined as \(T'(v+U)=Tv\). This is well defined since if \(v+U=v'+U\), then \(v-v'\in U\), so that \(T(v-v')=0\), or \(Tv=Tv'\), and hence \(T'(v+U)=T'(v'+U)\). In such a situation, \(T\) is said to factorise through \(V/U\).

Definition A projection on \(V\) is a linear operator \(P\in\mathcal{L}(V)\) such that
\begin{equation}
P^2=P.
\end{equation}

Proposition There is a one-to-one correspondence between projections \(P\), pairs of linear transformations \(P,Q\in\mathcal{L}(V)\) such that
\begin{equation}
P+Q=\id_V\qquad\text{and}\qquad PQ=0, \label{equ:proj op alter}
\end{equation}
and direct sum decompositions
\begin{equation}
V=U\oplus W. \label{equ:proj op decomp}
\end{equation}

Proof If \(P\) is a projection then we can define \(Q=\id_V-P\) and \eqref{equ:proj op alter} is obvious. Given operators \(P\) and \(Q\) satisfying \eqref{equ:proj op alter}, we can define subspaces \(U\) and \(W\) of \(V\) as \(U=PV\) and \(W=QV\). Then \(P+Q=\id_V\) implies that \(V=U+W\). That this sum is direct follows since if an element \(v\) belonged to both \(U\) and \(W\) then \(v=Pv_1=Qv_2\) for some \(v_1,v_2\in V\), which then means that \(v=Pv_1=P^2v_1=PQv_2=0\), so \(V=U\oplus W\). Clearly \(\img Q\subseteq\ker P\) and conversely if \(v\in\ker P\) then \(v=Pv+Qv=Qv\) so \(\ker P\subseteq\img Q\) and \(\ker P=\img Q\). Likewise, \(\ker Q=\img P\), so we have
\begin{equation}
V=\img P\oplus\ker P=\ker Q\oplus\img Q.
\end{equation}
Given a direct sum decomposition \(V=U\oplus W\) any \(v\in V\) can be expressed uniquely as \(v=u+w\) with \(u\in U\), \(w\in W\) and we can therefore define a linear operator \(P\) by \(Pv=u\). So defined, \(P\) is clearly a projection. \(\blacksquare\)

Thus, we cannot speak of the projection onto some subspace \(U\), only of a projection: there are as many projections onto a subspace \(U\) as there are complements of \(U\). However, note that if \(V=U\oplus W\) with \(P\) the projector onto \(W\), then \(U=\ker P\) and \(V/\ker P\cong W\), with \(v+\ker P\mapsto Pv\). So it is the case that all complements of \(U\) are isomorphic.

Example In the case of \(\RR^2\), for the direct sum decomposition \(\RR^2=\Span(\mathbf{e}_1)\oplus\Span(\mathbf{e}_2)\), the corresponding projections are
\begin{equation}
\mathbf{P}=\begin{pmatrix} 1&0\\0&0\end{pmatrix}\quad\text{and}\quad\mathbf{Q}=\begin{pmatrix} 0&0\\0&1\end{pmatrix},
\end{equation}
with \(\img\mathbf{P}=\Span(\mathbf{e}_1)\) and \(\img\mathbf{Q}=\Span(\mathbf{e}_2)\). For the direct sum decomposition, \(\RR^2=\Span(\mathbf{e}_1)\oplus\Span(\mathbf{e}_1+\mathbf{e}_2)\), the corresponding projections are
\begin{equation}
\mathbf{P}=\begin{pmatrix} 1&-1\\0&0\end{pmatrix}\quad\text{and}\quad\mathbf{Q}=\begin{pmatrix} 0&1\\0&1\end{pmatrix},
\end{equation}
with \(\img\mathbf{P}=\Span(\mathbf{e}_1)\) and \(\img\mathbf{Q}=\Span(\mathbf{e}_1+\mathbf{e}_2)\).
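
Given bases for the two summands, the corresponding projection can be assembled explicitly: if the columns of \(\mathbf{M}\) list a basis of \(U\) followed by a basis of \(W\), then \(\mathbf{M}\,\mathrm{diag}(\mathbf{I}_s,\mathbf{0})\,\mathbf{M}^{-1}\) is the projection onto \(U\) along \(W\). A sketch assuming NumPy, reproducing the second pair of projections above; the helper name is just for illustration.
```python
import numpy as np

def projection_onto(U_basis, W_basis):
    """Projection onto span(U_basis) along span(W_basis).

    The columns of U_basis and W_basis together must form a basis of the space.
    """
    M = np.hstack([U_basis, W_basis])
    D = np.diag([1.0] * U_basis.shape[1] + [0.0] * W_basis.shape[1])
    return M @ D @ np.linalg.inv(M)

e1 = np.array([[1.0], [0.0]])
e2 = np.array([[0.0], [1.0]])

P = projection_onto(e1, e1 + e2)  # onto span(e1) along span(e1+e2)
Q = projection_onto(e1 + e2, e1)  # onto span(e1+e2) along span(e1)

print(P)  # [[ 1. -1.] [ 0.  0.]]
print(Q)  # [[ 0.  1.] [ 0.  1.]]
print(np.allclose(P @ P, P), np.allclose(P + Q, np.eye(2)), np.allclose(P @ Q, 0))  # True True True
```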

Remark Recalling the earlier Example, we note that a projection is an example of a linear operator for which the vector space does decompose as the direct sum of its kernel and image.

More generally, if we have linear operators \(P_1,\dots,P_r\) such that \(P_iP_j=0\) whenever \(i\neq j\) and \(P_1+\dots+P_r=\id_V\) then they are projectors and defining \(U_i=P_iV\),
\begin{equation}
V=U_1\oplus\dots\oplus U_r.
\end{equation}
Note that to check that this sum is really direct it is not enough to check that \(U_i\cap U_j=\{0\}\) whenever \(i\neq j\).[1] We confirm uniqueness directly. We have \(v=(P_1+\dots+P_r)v=w_1+\dots+w_r\), say, with \(w_i=P_iv\in U_i\), and suppose we also have \(v=u_1+\dots+u_r\) with \(u_i\in U_i\). Then applying \(P_i\) to both expressions we obtain \(u_i=w_i\), so the decomposition \(v=w_1+\dots+w_r\) is unique and the sum is direct. If we define \(U_{(i)}=\oplus_{j\neq i}U_j\) and \(P_{(i)}=\sum_{j\neq i}P_j\) then \(V=U_i\oplus U_{(i)}\), \(P_i+P_{(i)}=\id_V\) and \(P_iP_{(i)}=0\). So \(\ker P_i=\img P_{(i)}\), \(\img P_i=\ker P_{(i)}\) and \(V=\img P_i\oplus\ker P_i=\ker P_{(i)}\oplus\img P_{(i)}\).

If \(P_1\) and \(P_2\) are projections, which do not necessarily sum to the identity \(\id_V\), then it is natural to ask under what circumstances their sum (or difference) is also a projection.

Theorem Suppose \(P_1\) and \(P_2\) are projections onto subspaces \(U_1\) and \(U_2\) of a vector space \(V\) with \(W_1\) and \(W_2\) the respective complementary subspaces. Then,

  1. \(P=P_1+P_2\) is a projection if and only if \(P_1P_2=P_2P_1=0\) in which case \(\img P=U_1\oplus U_2\) and \(\img(\id_V-P)=W_1\cap W_2\).
  2. \(P=P_1-P_2\) is a projection if and only if \(P_1P_2=P_2P_1=P_2\) in which case \(\img P=U_1\cap W_2\) and \(\img(\id_V-P)=W_1\oplus U_2\).
  3. If \(P_1P_2=P_2P_1=P\), then \(P\) is a projection such that \(\img P=U_1\cap U_2\) and \(\img(\id_V-P)=W_1+W_2\).

Proof If \(P=P_1+P_2\) is a projection then \(P^2=P\), so that \(P_1P_2+P_2P_1=0\). Multiplying by \(P_1\) from the left we obtain \(P_1P_2+P_1P_2P_1=0\), and multiplying from the right we obtain \(P_1P_2P_1+P_2P_1=0\), so that \(P_1P_2=P_2P_1\) and hence \(P_1P_2=0\). That \(P_1P_2=P_2P_1=0\) implies \(P^2=P\) is clear. Assuming \(P\) is indeed a projection, consider \(\img P\). If \(v\in\img P\), then \(v=Pv=P_1v+P_2v\in U_1+U_2\). Conversely, if \(v\in U_1+U_2\), then \(v=u_1+u_2\) for some \(u_1\in U_1\) and \(u_2\in U_2\). Then \(Pv=P_1u_1+P_2u_2=v\), since \(P_1P_2=P_2P_1=0\), hence \(v\in\img P\). If \(v\in U_1\cap U_2\), then \(v=P_1v=P_1P_2v=0\), so \(U_1\cap U_2=0\) and \(\img P=U_1\oplus U_2\). Now if \(v\in\img(\id_V-P)\), then \(v=v-Pv\), and it is clear that \(P_1v=0=P_2v\) so \(v\in W_1\cap W_2\). Conversely, if \(v\in W_1\cap W_2\), then
\begin{equation*}
(\id_V-P)v=v-(P_1+P_2)v=(\id_V-P_1)v-P_2v=(\id_V-P_2)v=v
\end{equation*}
so \(v\in\img(\id_V-P)\). The other statements are proved similarly.\(\blacksquare\)
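A quick numerical check of the second statement (a numpy sketch with projectors chosen purely for illustration): taking \(P_1\) to project onto \(\Span(\mathbf{e}_1,\mathbf{e}_2)\) along \(\Span(\mathbf{e}_3)\) and \(P_2\) onto \(\Span(\mathbf{e}_1)\) along \(\Span(\mathbf{e}_2,\mathbf{e}_3)\), we have \(P_1P_2=P_2P_1=P_2\), and \(P_1-P_2\) is the projection onto \(U_1\cap W_2=\Span(\mathbf{e}_2)\).

```python
import numpy as np

P1 = np.diag([1., 1., 0.])   # projects onto span(e1, e2) along span(e3)
P2 = np.diag([1., 0., 0.])   # projects onto span(e1) along span(e2, e3)

assert np.allclose(P1 @ P2, P2) and np.allclose(P2 @ P1, P2)

P = P1 - P2
assert np.allclose(P @ P, P)   # P is a projection
print(P)                       # diag(0, 1, 0): image is U1 n W2 = span(e2)
print(np.eye(3) - P)           # diag(1, 0, 1): image is W1 (+) U2
```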

Suppose that a subspace \(U\) of \(V\) is invariant with respect to some \(T\in\mathcal{L}(V)\), and consider a direct sum decomposition \(V=U\oplus W\). Corresponding to any such decomposition we have a projection \(P\) such that \(\img P=U\) and \(\ker P=W\). It's not difficult to see that for any such projection we have \(PTP=TP\). Conversely, if \(PTP=TP\) for some projection \(P\) onto a subspace \(U\), then any \(u\in U\) satisfies \(u=Pu\), so that \(Tu=TPu=PTPu=PTu\), that is, \(Tu\in\img P=U\), so \(U\) is \(T\)-invariant.

Theorem If \(V=U\oplus W\), for some subspaces \(U,W\subset V\), and \(P\) is the corresponding projection, then a necessary and sufficient condition for those subspaces to be invariant with respect to some linear operator \(T\in\mathcal{L}(V)\) is that \(T\) commutes with \(P\), \(TP=PT\).

Proof Assuming \(U\) and \(W\) are \(T\)-invariant we know that \(PTP=TP\), but also \((\id_V-P)T(\id_V-P)=T(\id_V-P)\), since \(\id_V-P\) is the projection onto \(W\) along \(U\). Expanding the latter gives \(PTP=PT\), so \(TP=PT\). In the other direction, if \(TP=PT\), then for any \(u\in U\), \(u=Pu\) so that \(Tu=TPu=PTu\in U\). Likewise, for any \(w\in W\), \(w=(\id_V-P)w\) so that \(Tw=T(\id_V-P)w=(\id_V-P)Tw\in W\).\(\blacksquare\)
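The theorem is easily illustrated numerically (a numpy sketch with matrices chosen for the purpose): a block diagonal operator preserves both \(U=\Span(\mathbf{e}_1,\mathbf{e}_2)\) and \(W=\Span(\mathbf{e}_3)\) and commutes with the projection onto \(U\), whereas an operator mapping \(\mathbf{e}_3\) partly into \(U\) preserves only \(U\), satisfying \(PTP=TP\) but not \(PT=TP\).

```python
import numpy as np

P = np.diag([1., 1., 0.])            # projection onto U = span(e1, e2) along W = span(e3)

T_block = np.array([[1., 2., 0.],
                    [3., 4., 0.],
                    [0., 0., 5.]])   # preserves both U and W
T_mixing = np.array([[1., 2., 7.],
                     [3., 4., 0.],
                     [0., 0., 5.]])  # preserves U but maps e3 partly into U

assert np.allclose(T_block @ P, P @ T_block)         # both summands invariant
assert np.allclose(P @ T_mixing @ P, T_mixing @ P)   # U is still invariant ...
assert not np.allclose(T_mixing @ P, P @ T_mixing)   # ... but W is not
```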

Notes:

  1. For example, for the space \(\RR^2\) of all pairs \((x,y)\) we could define three subspaces \(U_1=\{(x,0)\mid x\in\RR\}\), \(U_2=\{(0,x)\mid x\in\RR\}\), and \(U_3=\{(x,x)\mid x\in\RR\}\). Clearly \(U_i\cap U_j=\{0\}\) whenever \(i\neq j\) but it is equally clear that we couldn’t express an arbitrary element \((x,y)\in\RR^2\) uniquely in terms of elements of \(U_1\), \(U_2\) and \(U_3\).

Rank and Solutions of Linear Systems

To actually compute the rank of a given linear transformation one would choose bases, express the transformation as a matrix and then use elementary row operations. Recall there are three types of elementary row operations which may be performed on an \(m\times n\) matrix \(A\), with corresponding elementary matrices obtained by carrying out the same operations on the identity matrix of the appropriate size. They are

Row interchange Interchanging the \(i\)th and \(j\)th rows. For example interchanging the 1st and 3rd rows of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6\\
7&8&9&10\\
11&12&13&14
\end{pmatrix}
\end{equation*}
we get
\begin{equation*}
\begin{pmatrix}
7&8&9&10\\
3&4&5&6\\
1&2&3&4\\
11&12&13&14
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
0&0&1&0\\
0&1&0&0\\
1&0&0&0\\
0&0&0&1
\end{pmatrix}
\end{equation*}

Row multiplication Multiplying the \(i\)th row by some number. For example multiplying the 1st row of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6
\end{pmatrix}
\end{equation*}
by 2 we get
\begin{equation*}
\begin{pmatrix}
2&4&6&8\\
3&4&5&6
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
2&0\\
0&1
\end{pmatrix}
\end{equation*}

Row addition Replacing the \(i\)th row by the \(i\)th row plus some number times the \(j\)th row. For example replacing the 2nd row of
\begin{equation*}
\begin{pmatrix}
1&2&3\\
4&5&6\\
7&8&9
\end{pmatrix}
\end{equation*}
by the second row minus 4 times the 1st row we get
\begin{equation*}
\begin{pmatrix}
1&2&3\\
0&-3&-6\\
7&8&9
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
1&0&0\\
-4&1&0\\
0&0&1
\end{pmatrix}
\end{equation*}
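In code, each elementary row operation is indeed just a left multiplication by the corresponding elementary matrix. Here is a small numpy sketch (an illustrative aside) reproducing the row addition example above:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Elementary matrix for "replace row 2 by row 2 minus 4 times row 1",
# obtained by applying that operation to the 3x3 identity.
E = np.eye(3)
E[1, 0] = -4.

print(E @ A)
# [[ 1.  2.  3.]
#  [ 0. -3. -6.]
#  [ 7.  8.  9.]]

# The inverse elementary matrix undoes the operation.
print(np.linalg.inv(E) @ E @ A)   # recovers A
```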

Elementary row matrices are of course invertible, their inverses corresponding to the undoing of the original elementary row operation. Elementary row operations can be used to solve linear systems \(\mathbf{A}\mathbf{x}=\mathbf{b}\) by reducing the augmented matrix \(\mathbf{A}\mathbf{|}\mathbf{b}\) to row echelon form (all nonzero rows lie above any rows of zeroes, and the leading coefficient of each nonzero row is 1 and lies strictly to the right of the leading coefficient of the row above it) or reduced row echelon form (row echelon form in which each leading coefficient is the only non-zero entry in its column). If we reduce a matrix \(\mathbf{A}\) to reduced row echelon form using elementary row operations and record each step as an elementary row matrix \(\mathbf{E}_i\) then we obtain an expression for \(\mathbf{A}\) as
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\mathbf{A}'
\end{equation}
for some number \(a\) of row operations and \(\mathbf{A}'\) in reduced row echelon form.
The rank of the original matrix can already be recovered at this point as the number of non-zero rows of \(\mathbf{A}'\). Indeed, a square matrix is invertible if and only if it can be expressed as a product of elementary row matrices. In the more general case, by simplifying further using elementary column operations (defined in the obvious way), and recording the steps with the corresponding elementary column matrices \(\mathbf{F}_i\), we arrive at a matrix of the form \eqref{equ:fully reduced matrix} and a factorisation of \(\mathbf{A}\) of the form,
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}\mathbf{F}_b^{-1}\dots\mathbf{F}_1^{-1},
\end{equation}
exactly as we established in \eqref{matrix factorization}.
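In practice one rarely carries out this reduction by hand. As an illustrative sketch (using Python's sympy, an assumption of this aside rather than anything in the text), the rank of the \(4\times 4\) matrix from the row interchange example above can be read off from its reduced row echelon form:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [3, 4, 5, 6],
            [7, 8, 9, 10],
            [11, 12, 13, 14]])

A_rref, pivots = A.rref()   # reduced row echelon form and pivot column indices
print(A_rref)
print(len(pivots))          # number of non-zero rows of A' = rank(A) = 2
print(A.rank())             # agrees
```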

Here’s another perspective on the equivalence of the dimension of the row and column spaces of a matrix. Observe first that if an \(m\times n\) matrix \(\mathbf{A}\) is the product of an \(m\times r\) matrix \(\mathbf{B}\) and an \(r\times n\) matrix \(\mathbf{C}\), \(\mathbf{A}=\mathbf{B}\mathbf{C}\), then the \(i\)th row of \(\mathbf{A}\) is just a linear combination of the \(r\) rows of \(\mathbf{C}\) with coefficients from the \(i\)th row of \(\mathbf{B}\),
\begin{align*}
\begin{pmatrix}
A_1^i&\dots&A^i_n
\end{pmatrix}&=B^i_1
\begin{pmatrix}
C^1_1&\dots&C^1_n
\end{pmatrix}\\
&+B^i_2
\begin{pmatrix}
C^2_1&\dots&C^2_n
\end{pmatrix}\\
&+\dots\\
&+B^i_r
\begin{pmatrix}
C^r_1&\dots&C^r_n
\end{pmatrix}.
\end{align*}
Similarly the \(j\)th column of \(\mathbf{A}\) is a linear combination of the \(r\) columns of \(\mathbf{B}\) with coefficients from the \(j\)th column of \(\mathbf{C}\),
\begin{equation*}
\begin{pmatrix}
A^1_j\\
\vdots\\
A^m_j
\end{pmatrix}=C^1_j
\begin{pmatrix}
B^1_1\\
\vdots\\
B^m_1
\end{pmatrix}
+C^2_j
\begin{pmatrix}
B^1_2\\
\vdots\\
B^m_2
\end{pmatrix}
+\dots
+C^r_j
\begin{pmatrix}
B^1_r\\
\vdots\\
B^m_r
\end{pmatrix}.
\end{equation*}
Now suppose that the dimension of the space spanned by the columns of an \(m\times n\) matrix is \(r\). This means that we can find \(r\) column vectors \(b_1,\dots,b_r\) in terms of which each column, \(a_i\), of \(A\) can be written as
\begin{equation*}
a_i=c_{1i}b_1+\dots+c_{ri}b_r.
\end{equation*}
But in so doing we have constructed a factorisation \(\mathbf{A}=\mathbf{B}\mathbf{C}\) of the kind just discussed, with \(\mathbf{B}\) the \(m\times r\) matrix whose columns are \(b_1,\dots,b_r\) and \(\mathbf{C}\) the \(r\times n\) matrix of coefficients \(c_{ij}\), and so we know that every row of \(\mathbf{A}\) is a linear combination of the \(r\) rows of \(\mathbf{C}\). In other words, the dimension of the row space of \(\mathbf{A}\) is at most \(r\). If the row space actually had dimension less than \(r\), then running the same reasoning starting from the rows of \(\mathbf{A}\) would show that the column space had dimension less than \(r\), contradicting our initial assumption. Thus we have established that the dimensions of the row and column spaces of an \(m\times n\) matrix \(\mathbf{A}\) are identical.
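A small numerical illustration of this argument (a numpy sketch with hypothetical matrices \(\mathbf{B}\) and \(\mathbf{C}\)): any \(\mathbf{A}=\mathbf{B}\mathbf{C}\) with \(\mathbf{B}\) of size \(m\times r\) and \(\mathbf{C}\) of size \(r\times n\) has row and column spaces of dimension at most \(r\), and indeed of equal dimension.

```python
import numpy as np

B = np.array([[1., 0.],
              [2., 1.],
              [0., 3.]])          # m x r, columns b_1, b_2
C = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 2.]])  # r x n, the coefficients c_ij
A = B @ C                         # m x n, rank at most r = 2

print(np.linalg.matrix_rank(A))    # 2: dimension of the column space
print(np.linalg.matrix_rank(A.T))  # 2: dimension of the row space
```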

Recall that we noted that the solution set of any system of homogeneous equations in \(n\) variables over a field \(K\) is a subspace of \(K^n\). What is the dimension of this subspace? Writing the system of equations in the form \(\mathbf{A}\mathbf{x}=0\), where \(\mathbf{A}\) is some \(m\times n\) matrix, we can carry out elementary row operations on \(\mathbf{A}\) to transform the system into echelon form. In so doing we produce a bijection, indeed a vector space isomorphism, between the solution set, let's call it \(S\), and \(K^{n-r}\), where \(r\) is the rank of \(\mathbf{A}\), in other words the number of “steps” of the echelon form. Think of this as arising as follows: for any \(n\)-tuple \((x_1,\dots,x_n)\in S\) we discard the \(x_j\) where \(j\) runs through the pivot columns, that is, the column indices corresponding to the beginnings of “stairs”. The remaining \(n-r\) free components may be chosen arbitrarily and then determine the discarded pivot components uniquely by back substitution, so this map is indeed a linear bijection. We now have the necessary machinery to make good on our claim that the abstract theory of vector spaces illuminates the theory of solutions of simultaneous linear equations in \(n\) unknowns just as pictures of intersecting planes do in the case of 3 unknowns.
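Before stating the theorem, here is a quick computational check of this dimension count (a sympy sketch with an arbitrarily chosen matrix, included only as an illustration):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [3, 4, 5, 6],
            [7, 8, 9, 10]])   # m = 3 equations in n = 4 unknowns

n = A.cols
basis = A.nullspace()         # a basis of the solution space S
print(A.rank())               # 2
print(len(basis))             # 2 = n - rank(A)
assert len(basis) == n - A.rank()
```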

Theorem (Rouché-Capelli) A system of linear equations in \(n\) variables, \(\mathbf{A}\mathbf{x}=\mathbf{b}\), has a solution if and only if the rank of its coefficient matrix, \(\mathbf{A}\), is equal to the rank of its augmented matrix, \(\mathbf{A}\mathbf{|}\mathbf{b}\). If a solution exists and \(\rank(\mathbf{A})=n\) the solution is unique. If a solution exists and \(\rank(\mathbf{A}){<}n\) there are infinitely many solutions. In either of these cases the set of solutions is of the form
\begin{equation*}
\{\mathbf{p}+\mathbf{s}\mid\mathbf{s}\in S\}
\end{equation*}
where \(S=\{\mathbf{s}\in K^n\mid\mathbf{A}\mathbf{s}=0\}\) is an \((n-\rank(\mathbf{A}))\)-dimensional subspace of \(K^n\) and \(\mathbf{p}\) is any particular solution, \(\mathbf{A}\mathbf{p}=\mathbf{b}\).

Proof We must have \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})\geq\rank(\mathbf{A})\). If \(\rank(\mathbf{A}){<}\rank(\mathbf{A}\mathbf{|}\mathbf{b})\) then the row echelon form of the augmented matrix contains a row whose only non-zero entry lies in the last column, corresponding to an impossible equation, so the system has no solutions. Conversely, if \(\mathbf{A}\mathbf{x}=\mathbf{b}\) has no solution then \(\mathbf{b}\) does not belong to the column space of \(\mathbf{A}\), so that \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})>\rank(\mathbf{A})\). Let us therefore assume \(\rank(\mathbf{A})=\rank(\mathbf{A}\mathbf{|}\mathbf{b})\). If \(\rank(\mathbf{A})=n\) then \(S\) is the zero vector space and the solution is unique: if \(\mathbf{p}\) and \(\mathbf{p}'\) were two solutions we would have \(\mathbf{p}-\mathbf{p}'\in S=\{0\}\), that is, \(\mathbf{p}=\mathbf{p}'\). More generally, as observed above, the dimension of \(S\) is \(n-\rank(\mathbf{A})\), and given any particular solution \(\mathbf{p}\) of \(\mathbf{A}\mathbf{x}=\mathbf{b}\), so too is \(\mathbf{p}+\mathbf{s}\) for any \(\mathbf{s}\in S\). Moreover, if \(\mathbf{p}'\) is any other solution then \(\mathbf{p}'-\mathbf{p}\in S\), in other words, \(\mathbf{p}'=\mathbf{p}+\mathbf{s}\) for some \(\mathbf{s}\in S\).\(\blacksquare\)
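A worked illustration of the theorem (a sympy sketch; the system is an arbitrary example, not one taken from the text):

```python
from sympy import Matrix, linsolve, symbols

A = Matrix([[1, 2, 3],
            [2, 4, 6]])
b_good = Matrix([1, 2])   # consistent right hand side
b_bad = Matrix([1, 3])    # inconsistent right hand side

print(A.rank(), A.row_join(b_good).rank())   # 1 1 -> solutions exist
print(A.rank(), A.row_join(b_bad).rank())    # 1 2 -> no solution

x, y, z = symbols('x y z')
print(linsolve((A, b_good), x, y, z))
# x = 1 - 2*y - 3*z with y, z free: a particular solution plus the
# 2-dimensional solution space of A x = 0, as in the theorem.
```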

Change of Basis

Suppose \(V\) is an \(n\)-dimensional vector space with basis \(\{e_i\}\). Then we may identify \(v\in V\) with the column vector, \(\mathbf{v}\), of its components with respect to this basis,
\begin{equation*}
\mathbf{v}=
\begin{pmatrix}
v^1\\
\vdots\\
v^n
\end{pmatrix}.
\end{equation*}
If we have an alternative basis for \(V\), \(\{e'_i\}\), then with respect to this basis \(v\) will be represented by a different column vector, \(\mathbf{v}'\) say,
\begin{equation*}
\mathbf{v}'=
\begin{pmatrix}
v'^1\\
\vdots\\
v'^n
\end{pmatrix},
\end{equation*}
where \(v'^i\) are the components of \(v\) with respect to this alternative basis. But \(\{e_i\}\) and \(\{e'_i\}\) must be related via an invertible matrix \(\mathbf{P}\), which we'll call the change of basis matrix, according to \(e'_i=P_i^je_j\). Then, since \(v=v'^ie'_i=v'^iP_i^je_j=v^je_j\), we have \(v^j=P_i^jv'^i\), that is, \(\mathbf{v}=\mathbf{P}\mathbf{v}'\), and so
\(\mathbf{v}'=\mathbf{P}^{-1}\mathbf{v}\). We say that the components of a vector transform with the inverse of the change of basis matrix, that is, they transform contravariantly 1.
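For instance, returning to the basis \(\mathbf{e}_1\), \(\mathbf{e}_1+\mathbf{e}_2\) of \(\RR^2\) used earlier (a numpy sketch, included only as an illustration):

```python
import numpy as np

# Columns of P are the new basis vectors e'_1 = e1, e'_2 = e1 + e2
# expressed in the old basis.
P = np.array([[1., 1.],
              [0., 1.]])

v = np.array([3., 5.])            # components in the old basis
v_new = np.linalg.inv(P) @ v      # components transform contravariantly
print(v_new)                      # [-2.  5.]: v = -2 e'_1 + 5 e'_2

assert np.allclose(P @ v_new, v)  # reconstruct v from the new components
```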

Now suppose that in addition we have an \(m\)-dimensional space \(W\) with bases \(\{f_i\}\) and \(\{f'_i\}\) related by a change of basis matrix \(\mathbf{Q}\) according to \(f'_i=Q_i^jf_j\). Let us consider how linear transformations and their matrix representations are affected by a change of bases. If a linear transformation \(T\in\mathcal{L}(V,W)\) is represented with respect to the bases \(\{e_i\}\) and \(\{f_i\}\) by the matrix \(\mathbf{T}\) with components \(T_i^j\), consider its representation, \(\mathbf{T}'\) say, with respect to the alternative bases \(\{e'_i\}\) and \(\{f'_i\}\). Since any \(v\in V\) can be written either as \(v=v^ie_i=v^i{P^{-1}}_i^je'_j\) or \(v=v'^ie'_i\), and any \(w\in W\) either as \(w=w^if_i=w^i{Q^{-1}}_i^jf'_j\) or \(w=w'^if'_i\), we have
\begin{equation*}
w'^j={Q^{-1}}_i^jw^i={Q^{-1}}_i^jT_k^iv^k={Q^{-1}}_i^jT_k^iP_l^kv'^l
\end{equation*}
as well as \(w'^j={T'}_i^j v'^i\). So we must have,
\begin{equation*}
{T'}_i^j={Q^{-1}}_k^jT_l^kP_i^l.
\end{equation*}
That is, \(\mathbf{T}'=\mathbf{Q}^{-1}\mathbf{T}\mathbf{P}\). The matrices \(\mathbf{T}'\) and \(\mathbf{T}\) are said to be equivalent. Correspondingly, two linear transformations, \(T:V\mapto W\) and \(T':V\mapto W\), are said to be equivalent if there exist automorphisms \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\) such that \(T'=Q^{-1}TP\).
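Numerically (a numpy sketch with arbitrarily chosen invertible \(\mathbf{P}\) and \(\mathbf{Q}\)), passing to an equivalent matrix amounts to forming \(\mathbf{Q}^{-1}\mathbf{T}\mathbf{P}\), and, as the Lemma that follows asserts, this leaves the rank unchanged:

```python
import numpy as np

T = np.array([[1., 2., 3.],
              [4., 5., 6.]])     # represents a map K^3 -> K^2

P = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])     # change of basis in K^3
Q = np.array([[2., 0.],
              [1., 1.]])         # change of basis in K^2

T_new = np.linalg.inv(Q) @ T @ P
print(np.linalg.matrix_rank(T), np.linalg.matrix_rank(T_new))   # 2 2
```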

Lemma If \(T:V\mapto W\) is a linear transformation and \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\) then
\begin{equation}
\dim\ker T=\dim\ker QTP \qquad\text{and}\qquad \dim\img T=\dim\img QTP. \label{equ:rank conservation}
\end{equation}

Proof \(P\) induces an isomorphism \(\ker QTP\cong\ker T\): if \(u\in\ker QTP\), then \(QTPu=0\iff TPu=0\), since \(Q\) is invertible, that is, \(Pu\in\ker T\). So \(P\) maps \(\ker QTP\) into \(\ker T\), and since \(P^{-1}\) likewise maps \(\ker T\) into \(\ker QTP\), this restriction of \(P\) is an isomorphism. Similarly, \(Q\) induces an isomorphism \(\img T\cong\img QTP\), which follows in any case from the isomorphism of kernels by the rank-nullity theorem. \(\blacksquare\)

This result tells us, in particular, that equivalent linear transformations share the same rank. For an \(m\times n\) matrix \(\mathbf{A}\), the rank, \(\rank(\mathbf{A})\), is defined to be that of the corresponding linear transformation, \(L_\mathbf{A}\), that is, \(\rank(\mathbf{A})=\rank(L_\mathbf{A})\). Regarding \(L_\mathbf{A}\) as a map \(L_\mathbf{A}:K^n\mapto K^m\) and taking \(\{\mathbf{e}_i\}\) to be the standard basis of \(K^n\), the vectors \(L_\mathbf{A}\mathbf{e}_1,L_\mathbf{A}\mathbf{e}_2,\dots,L_\mathbf{A}\mathbf{e}_n\) span \(\img L_\mathbf{A}\). But \(L_\mathbf{A}\mathbf{e}_i\) is simply the \(i\)th column of \(\mathbf{A}\), so we see that the rank of the matrix \(\mathbf{A}\) is just the dimension of the space spanned by its columns. What is the dimension of the row space?

Let us denote by \(\{k_i\}\), \(1\leq i\leq d\), a basis of \(\ker L_\mathbf{A}\), and extend this to a basis, \(e_1,\dots,e_{n-d},k_1,\dots,k_d\), of \(K^n\). Then, as we already saw in the proof of the rank-nullity theorem, the elements \(f_i=L_\mathbf{A}e_i\in K^m\), \(1\leq i\leq n-d\), are linearly independent and so can be extended to a basis \(f_1,\dots,f_m\) of \(K^m\). So we have constructed new bases for \(K^n\) and \(K^m\) respectively with respect to which the matrix representation of \(L_\mathbf{A}\) has the particularly simple form
\begin{equation}\tilde{\mathbf{A}}=
\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}\label{equ:fully reduced matrix}
\end{equation}
where \(\mathbf{I}_{n-d}\) is the \((n-d)\times(n-d)\) identity matrix and \(\mathbf{0}_{p,q}\) the \(p\times q\) zero matrix. The rank of this matrix is of course \(n-d\), simply the number of \(1\)s. From \eqref{equ:rank conservation}, we know that equivalent linear transformations have isomorphic kernels and images, so equivalent matrices have the same rank. Moreover, since \(\tilde{\mathbf{A}}\) represents \(L_\mathbf{A}\) with respect to the new bases, it is related to \(\mathbf{A}\) by the corresponding change of basis matrices. In other words, any matrix \(\mathbf{A}\) may be factorised as
\begin{equation}
\mathbf{A}=\mathbf{Q}\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}
\mathbf{P}^{-1}=\mathbf{Q}\tilde{\mathbf{A}}\mathbf{P}^{-1},\label{matrix factorization}
\end{equation}
for some invertible matrices \(\mathbf{P}\) and \(\mathbf{Q}\) (the change of basis matrices), and if \(\mathbf{A}\) is equivalent to another matrix of the same form as \(\tilde{\mathbf{A}}\), say \(\mathbf{B}\), then \(\mathbf{B}=\tilde{\mathbf{A}}\). Given this factorisation, it is clear that \(\rank\mathbf{A}^\mathsf{T}=\rank\mathbf{A}\), from which it follows that the dimensions of the row and column spaces of any matrix are equal.
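This last conclusion is also easy to spot-check numerically (a numpy sketch with a randomly generated matrix, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # rank 3 (generically)

print(np.linalg.matrix_rank(A))    # 3
print(np.linalg.matrix_rank(A.T))  # 3: the rank of the transpose agrees
```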

To summarise, for any pair of vector spaces \(V\) and \(W\), of dimensions \(n\) and \(m\) respectively, the linear transformations, \(\mathcal{L}(V,W)\), are determined, up to change of the respective bases, by their rank which is bounded above by \(\min(n,m)\).

A nice way to restate this conclusion is in the language of group actions and their orbits. Recall that if \(G\) is a group and \(X\) some set then an action of \(G\) on \(X\) is a (group) homomorphism from \(G\) to \(S_X\), the group of all permutations of the elements of \(X\). Now \(\text{GL}(V)\times\text{GL}(W)\), with the obvious group structure, has an action on the space \(\mathcal{L}(V, W)\) defined by \((Q,P)T=QTP^{-1}\) for \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\). Recall also that the action of a group on a set \(X\) partitions \(X\) into orbits, where the orbit of some \(x\in X\) is defined to be the subset \(\{gx\mid g\in G\}\) of \(X\). In our case we see that the orbits of the action of \(\text{GL}(V)\times\text{GL}(W)\) on \(\mathcal{L}(V, W)\) are precisely the sets of linear transformations of a given rank. The headline, as it were, is therefore the following.

The orbits of the action of \(\text{GL}(V)\times\text{GL}(W)\) on \(\mathcal{L}(V, W)\) are the sets of elements of \(\mathcal{L}(V,W)\) of a given rank and thus are in bijection with the set \(\{d\mid 0\leq d\leq \min(\dim V,\dim W)\}\).

Notes:

  1. If we assemble the old and new basis vectors into row vectors \(\mathbf{e}\) and \(\mathbf{e}'\) respectively then \(\mathbf{e}'=\mathbf{e}\mathbf{P}\) but \(\mathbf{v}'=\mathbf{P}^{-1}\mathbf{v}\).