Rank and Solutions of Linear Systems

To actually compute the rank of a given linear transformation one would choose bases, express the transformation as a matrix and then use elementary row operations. Recall there are three types of elementary row operations which may be performed on an \(m\times n\) matrix \(A\), with corresponding elementary matrices obtained by carrying out the same operations on the identity matrix of the appropriate size. They are

Row interchange Interchanging the \(i\)th and \(j\)th rows. For example, interchanging the 1st and 3rd rows of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6\\
7&8&9&10\\
11&12&13&14
\end{pmatrix}
\end{equation*}
we get
\begin{equation*}
\begin{pmatrix}
7&8&9&10\\
3&4&5&6\\
1&2&3&4\\
11&12&13&14
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
0&0&1&0\\
0&1&0&0\\
1&0&0&0\\
0&0&0&1
\end{pmatrix}
\end{equation*}

Row multiplication Multiplying the \(i\)th row by some number. For example, multiplying the 1st row of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6
\end{pmatrix}
\end{equation*}
by 2 we get
\begin{equation*}
\begin{pmatrix}
2&4&6&8\\
3&4&5&6
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
2&0\\
0&1
\end{pmatrix}
\end{equation*}

Row addition Replacing the \(i\)th row by the \(i\)th row plus some number times the \(j\)th row. For example, replacing the 2nd row of
\begin{equation*}
\begin{pmatrix}
1&2&3\\
4&5&6\\
7&8&9
\end{pmatrix}
\end{equation*}
by the second row minus 4 times the 1st row we get
\begin{equation*}
\begin{pmatrix}
1&2&3\\
0&-3&-6\\
7&8&9
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
1&0&0\\
-4&1&0\\
0&0&1
\end{pmatrix}
\end{equation*}
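The three operations above can be checked numerically. The following sketch (using NumPy, a choice made for this illustration rather than anything prescribed by the text) builds each elementary matrix by applying the operation to the identity and verifies that left multiplication reproduces the examples:

```python
import numpy as np

# The three example matrices from the text.
A1 = np.array([[1, 2, 3, 4],
               [3, 4, 5, 6],
               [7, 8, 9, 10],
               [11, 12, 13, 14]])
A2 = np.array([[1, 2, 3, 4],
               [3, 4, 5, 6]])
A3 = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

# Row interchange: swap rows 1 and 3 of the 4x4 identity.
E_swap = np.eye(4, dtype=int)
E_swap[[0, 2]] = E_swap[[2, 0]]

# Row multiplication: scale the 1st row of the 2x2 identity by 2.
E_scale = np.array([[2, 0],
                    [0, 1]])

# Row addition: replace row 2 by row 2 minus 4 times row 1.
E_add = np.array([[1, 0, 0],
                  [-4, 1, 0],
                  [0, 0, 1]])

print(E_swap @ A1)   # rows 1 and 3 interchanged
print(E_scale @ A2)  # first row doubled
print(E_add @ A3)    # second row is now [0, -3, -6]
```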

Elementary row matrices are of course invertible, their inverses corresponding to undoing the original elementary row operation. Elementary row operations can be used to solve linear systems \(\mathbf{A}\mathbf{x}=\mathbf{b}\) by reducing the augmented matrix \(\mathbf{A}\mathbf{|}\mathbf{b}\) to row echelon form (all nonzero rows lie above any rows of all zeroes, and the leading coefficient of each nonzero row is 1 and lies strictly to the right of the leading coefficient of the row above it) or reduced row echelon form (row echelon form in which each leading coefficient is the only non-zero entry in its column). If we reduce a matrix \(\mathbf{A}\) to reduced row echelon form using elementary row operations, recording each step as an elementary row matrix \(\mathbf{E}_i\), then we obtain an expression for \(\mathbf{A}\) as
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\mathbf{A}'
\end{equation}
for some number \(a\) of row operations and \(\mathbf{A}'\) in reduced row echelon form.
The rank of the original matrix can already be recovered at this point as the number of non-zero rows of \(\mathbf{A}'\). Indeed, a matrix is invertible if and only if it can be expressed as a product of elementary row matrices. In the more general case, by simplifying further using elementary column operations (defined in the obvious way), and recording the steps with the corresponding elementary column matrices \(\mathbf{F}_i\), we arrive at a factorisation of \(\mathbf{A}\) of the form
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}\mathbf{F}_b^{-1}\dots\mathbf{F}_1^{-1},
\end{equation}
exactly as we established earlier.
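In practice the reduction, and hence the rank, is easily computed. A minimal sketch (using SymPy, a choice made for this illustration; the \(3\times 3\) example matrix is arbitrary, with its third row a combination of the first two):

```python
import sympy as sp

# An example matrix of rank 2: row 3 = 2*(row 2) - row 1.
A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])

# rref() returns the reduced row echelon form together with
# the indices of the pivot columns.
A_rref, pivots = A.rref()

# The rank is the number of non-zero rows of the reduced form.
rank = len(pivots)
print(A_rref)
print(rank)  # 2
```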

Here is another perspective on the equivalence of the dimensions of the row and column spaces of a matrix. Observe first that if an \(m\times n\) matrix \(\mathbf{A}\) is the product of an \(m\times r\) matrix \(\mathbf{B}\) and an \(r\times n\) matrix \(\mathbf{C}\), \(\mathbf{A}=\mathbf{B}\mathbf{C}\), then the \(i\)th row of \(\mathbf{A}\) is just a linear combination of the \(r\) rows of \(\mathbf{C}\) with coefficients from the \(i\)th row of \(\mathbf{B}\),
\begin{align*}
\begin{pmatrix}
A_1^i&\dots&A^i_n
\end{pmatrix}&=B^i_1
\begin{pmatrix}
C^1_1&\dots&C^1_n
\end{pmatrix}\\
&+B^i_2
\begin{pmatrix}
C^2_1&\dots&C^2_n
\end{pmatrix}\\
&+\dots\\
&+B^i_r
\begin{pmatrix}
C^r_1&\dots&C^r_n
\end{pmatrix}.
\end{align*}
Similarly the \(j\)th column of \(\mathbf{A}\) is a linear combination of the \(r\) columns of \(\mathbf{B}\) with coefficients from the \(j\)th column of \(\mathbf{C}\),
\begin{equation*}
\begin{pmatrix}
A^1_j\\
\vdots\\
A^m_j
\end{pmatrix}=C^1_j
\begin{pmatrix}
B^1_1\\
\vdots\\
B^m_1
\end{pmatrix}
+C^2_j
\begin{pmatrix}
B^1_2\\
\vdots\\
B^m_2
\end{pmatrix}
+\dots
+C^r_j
\begin{pmatrix}
B^1_r\\
\vdots\\
B^m_r
\end{pmatrix}.
\end{equation*}
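These two observations can be checked numerically. A quick sketch (using NumPy for illustration; the matrices \(\mathbf{B}\) and \(\mathbf{C}\) are arbitrary):

```python
import numpy as np

B = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3x2
C = np.array([[1, 0, 2],
              [0, 1, 3]])     # 2x3
A = B @ C                     # 3x3 product

# Row i of A is a combination of the rows of C,
# with coefficients taken from row i of B.
i = 1
assert np.array_equal(A[i], B[i, 0] * C[0] + B[i, 1] * C[1])

# Column j of A is a combination of the columns of B,
# with coefficients taken from column j of C.
j = 2
assert np.array_equal(A[:, j], C[0, j] * B[:, 0] + C[1, j] * B[:, 1])
print("both identities hold")
```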
Now suppose that the dimension of the space spanned by the columns of an \(m\times n\) matrix \(\mathbf{A}\) is \(r\). This means that we can find \(r\) column vectors \(b_1,\dots,b_r\) in terms of which each column \(a_i\) of \(\mathbf{A}\) can be written as
\begin{equation*}
a_i=c_{1i}b_1+\dots+c_{ri}b_r.
\end{equation*}
But in so doing we have constructed precisely a factorisation \(\mathbf{A}=\mathbf{B}\mathbf{C}\) of the kind just discussed, and as such we know that every row of \(\mathbf{A}\) can be expressed as a linear combination of the \(r\) rows of \(\mathbf{C}\). Moreover, if the rows of \(\mathbf{A}\) could be expressed as linear combinations of fewer than \(r\) vectors, then running the same reasoning starting from the dimension of the row space would imply that the column space has dimension less than \(r\), contradicting our initial assumption. Thus we have established that the dimensions of the row and column spaces of an \(m\times n\) matrix \(\mathbf{A}\) are identical.
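Such a factorisation can be produced explicitly: take \(\mathbf{B}\) to be the pivot columns of \(\mathbf{A}\) and \(\mathbf{C}\) the non-zero rows of its reduced row echelon form. A sketch (using SymPy for illustration; the example matrix is arbitrary):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 1, 1]])

R, pivots = A.rref()
r = len(pivots)

# B: the pivot columns of A span the column space;
# C: the non-zero rows of the RREF hold the coefficients.
B = A[:, list(pivots)]
C = R[:r, :]

assert B * C == A                    # a rank factorisation A = BC
assert A.rank() == A.T.rank() == r   # row rank equals column rank
print(r)  # 2
```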

Recall that we noted that the solution set of any system of homogeneous equations in \(n\) variables over a field \(K\) is a subspace of \(K^n\). What is the dimension of this subspace? Writing the system of equations in the form \(\mathbf{A}\mathbf{x}=0\), where \(\mathbf{A}\) is some \(m\times n\) matrix, we can carry out elementary row operations on \(\mathbf{A}\) to transform the system of equations into echelon form. In so doing we produce a bijection, indeed a vector space isomorphism, between the solution set, let us call it \(S\), and \(K^{n-r}\), where \(r\) is the rank of \(\mathbf{A}\), in other words the number of ``steps'' of the echelon form. Think of this as arising as follows: for any \(n\)-tuple \((x_1,\dots,x_n)\in S\) we discard the \(x_j\) where \(j\) runs through the column indices corresponding to the beginnings of ``stairs''; the discarded coordinates are determined by the remaining \(n-r\) free ones, which may be chosen arbitrarily. We now have the necessary machinery to make good on our claim that the abstract theory of vector spaces illuminates the theory of solutions of simultaneous linear equations in \(n\) unknowns just as pictures of intersecting planes do in the case of 3 unknowns.
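The count \(\dim S = n - r\) can be verified directly. A sketch (using SymPy for illustration; the homogeneous system is arbitrary):

```python
import sympy as sp

# A homogeneous system Ax = 0 in n = 4 variables, with rank 2.
A = sp.Matrix([[1, 2, 0, 1],
               [0, 0, 1, 3]])

r = A.rank()             # the number of "steps" of the echelon form
basis = A.nullspace()    # a basis of the solution subspace S

print(len(basis))        # dim S = n - r = 4 - 2 = 2
```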

Theorem (Rouché-Capelli) A system of linear equations in \(n\) variables, \(\mathbf{A}\mathbf{x}=\mathbf{b}\), has a solution if and only if the rank of its coefficient matrix, \(\mathbf{A}\), is equal to the rank of its augmented matrix, \(\mathbf{A}\mathbf{|}\mathbf{b}\). If a solution exists and \(\rank(\mathbf{A})=n\), the solution is unique. If a solution exists and \(\rank(\mathbf{A}){<}n\), there are infinitely many solutions. In either case the set of solutions is of the form
\begin{equation*}
\{\mathbf{s}+\mathbf{p}\mid\mathbf{s}\in S,\mathbf{A}\mathbf{p}=\mathbf{b}\}
\end{equation*}
where \(S=\{\mathbf{s}\in K^n\mid\mathbf{A}\mathbf{s}=0\}\) is an \((n-\rank(\mathbf{A}))\)-dimensional subspace of \(K^n\) and \(\mathbf{p}\) is a particular solution.

Proof We must have \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})\geq\rank(\mathbf{A})\). If \(\rank(\mathbf{A}){<}\rank(\mathbf{A}\mathbf{|}\mathbf{b})\) then the row echelon form of the augmented matrix contains a row corresponding to an impossible equation, so the system has no solutions. Conversely, if \(\mathbf{A}\mathbf{x}=\mathbf{b}\) has no solution then \(\mathbf{b}\) does not belong to the column space of \(\mathbf{A}\), so that \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})>\rank(\mathbf{A})\). Let us therefore assume \(\rank(\mathbf{A})=\rank(\mathbf{A}\mathbf{|}\mathbf{b})\). In the case that \(\rank(\mathbf{A})=n\), \(S\) is the zero vector space and the solution of the system is unique: if \(\mathbf{p}\) and \(\mathbf{p}'\) were two particular solutions of the original system then we must have \(\mathbf{p}-\mathbf{p}'\in S=\{0\}\), that is, \(\mathbf{p}=\mathbf{p}'\). More generally, as observed above, the dimension of \(S\) is \(n-\rank(\mathbf{A})\), and given any particular solution \(\mathbf{p}\) of \(\mathbf{A}\mathbf{x}=\mathbf{b}\), so too is \(\mathbf{p}+\mathbf{s}\) for any \(\mathbf{s}\in S\). Moreover, if \(\mathbf{p}'\) is any other solution then we must have \(\mathbf{p}'-\mathbf{p}\in S\), in other words, \(\mathbf{p}'=\mathbf{s}+\mathbf{p}\) for some \(\mathbf{s}\in S\).
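The theorem can be illustrated on a small underdetermined system. A sketch (using SymPy for illustration; the system of two equations in three unknowns is arbitrary):

```python
import sympy as sp

A = sp.Matrix([[1, 1, 1],
               [1, 2, 3]])
b = sp.Matrix([6, 14])

# Consistency: rank(A) must equal the rank of the augmented matrix (A|b).
aug = A.row_join(b)
assert A.rank() == aug.rank() == 2

# rank(A) = 2 < n = 3, so the solution set is p + S with dim S = n - rank(A) = 1.
p, params = A.gauss_jordan_solve(b)
print(len(params))          # one free parameter
print(len(A.nullspace()))   # dim S = 1
```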