Category Archives: Vector Spaces

The Dual Vector Space

Definition Let \(V\) be a vector space over a field \(K\). Then the dual space of \(V\) is \(V^*=\mathcal{L}(V,K)\). That is, the space of linear functions or functionals on \(V\).

If \(\{e_i\}\) is a basis of an \(n\)-dimensional vector space \(V\), then we can define elements \(e^i\) in \(V^*\) according to \(e^i(e_j)=\delta^i_j\) with the obvious linear extension to the whole of \(V\). It is not difficult to see that the \(e^i\) form a basis of \(V^*\). Indeed if \(c_ie^i=0\) then \(c_ie^i(e_j)=c_j=0\) for all \(j\) and if \(\varphi\in V^*\) then we can write \(\varphi=\varphi(e_i)e^i\) since on any basis element, \(e_j\), \(\varphi(e_i)e^i(e_j)=\varphi(e_j)\). Given a basis \(\{e_i\}\) of \(V\), the basis \(\{e^i\}\) so defined is called the dual basis of \(V^*\).

As \(\dim V^*=\dim V\) we have \(V^*\cong V\). Note that this isomorphism is not basis independent. In other words it is not canonical. By contrast, the vector spaces \(V\) and \(V^{**}\) are canonically isomorphic. Indeed, consider the map \(V\mapto V^{**}\) which takes \(v\mapsto\tilde{v}\) where \(\tilde{v}(\varphi)=\varphi(v)\) for any \(\varphi\in V^*\). Then it is not difficult to see that \(\tilde{v}\) so defined is certainly a linear function on \(V^*\) and that the map \(v\mapsto\tilde{v}\) is linear. It is injective since if \(v\neq0\) then we can extend \(v\) to a basis of \(V\) and so define on \(V\) a linear function \(\varphi\) such that \(\varphi(v)\neq0\) so \(\tilde{v}(\varphi)\neq0\) and thus \(\tilde{v}\neq0\). Since \(V\) is finite dimensional the map is an isomorphism.

Given a choice of basis, vectors \(v\) in \(V\) were identified with the column vectors, \(\mathbf{v}\), of their components. Similarly, given a choice of basis of the dual space, linear functionals \(\varphi\in V^*\) can naturally be identified with row vectors \(\boldsymbol{\varphi}\) of their components. Then \(\varphi(v)\) is simply the matrix product of the \(1\times n\) row vector \(\boldsymbol{\varphi}\) with the \(n\times 1\) column vector \(\mathbf{v}\), that is, \(\varphi(v)=\boldsymbol{\varphi}\mathbf{v}\).

We’ve seen that when we make a change of basis in \(V\), with \(e’_i=P_i^je_j\), then \(\mathbf{v}’=\mathbf{P}^{-1}\mathbf{v}\). The corresponding dual bases, \(\{e^i\}\) and \(\{e’^j\}\), must be related as, say, \(e’^i=Q^i_je^j\), where \(Q^i_j\) are elements of an invertible matrix \(\mathbf{Q}\). Then we have \(e’^i(e’_j)=\delta^i_j=(Q^i_ke^k)(P^l_je_l)=Q^i_kP^k_j\). That is, \(\mathbf{Q}=\mathbf{P}^{-1}\). Now for any \(\varphi\in V^*\) we have the alternative expansions \(\varphi=\varphi_ie^i=\varphi’_ie’^i\), from which it follows that \(\boldsymbol{\varphi}’=\boldsymbol{\varphi}\mathbf{P}\). The components of vectors in the dual space thus transform with the change of basis matrix and so are said to transform covariantly.
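
As a quick numerical check of these transformation rules (not part of the original argument; it assumes a Python environment with numpy and uses randomly generated components), the following sketch verifies that vector components transform with \(\mathbf{P}^{-1}\), dual components with \(\mathbf{P}\), and that the pairing \(\varphi(v)\) is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
P = rng.normal(size=(n, n))      # change of basis matrix (invertible with probability 1)
v = rng.normal(size=(n, 1))      # components of a vector, as a column
phi = rng.normal(size=(1, n))    # components of a functional, as a row

v_new = np.linalg.solve(P, v)    # v' = P^{-1} v : contravariant
phi_new = phi @ P                # phi' = phi P  : covariant

# The pairing phi(v) does not depend on the choice of basis:
print(np.allclose(phi @ v, phi_new @ v_new))   # True
```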

Suppose \(V\) and \(W\) are respectively \(n\) and \(m\)-dimensional vector spaces over \(K\), with \(T\in\mathcal{L}(V,W)\). Then we can define the dual map \(T^*\), \(T^*:W^*\mapto V^*\) as \(T^*\omega=\omega T\). Having chosen bases for \(V\) and \(W\), \(\{e_i\}\) and \(\{f_i\}\) respectively, so that with respect to these bases \(\mathbf{T}\) is the matrix representation of \(T\), let us consider the matrix representation of \(T^*\) with respect to the dual bases \(\{e^i\}\) and \(\{f^i\}\). We have,
\begin{equation*}
(T^*f^i)(e_j)=f^i(Te_j)=f^i(T_j^kf_k)=T_j^i
\end{equation*}
but also
\begin{equation*}
(T^*f^i)(e_j)=((T^*)^i_ke^k)(e_j)=(T^*)^i_j,
\end{equation*}
that is \(\mathbf{T}^*=\mathbf{T}\). Note that if we’d chosen to identify elements of the dual space with column vectors of components then we would have found that the matrix of the dual map with respect to the dual bases was the transpose of the matrix of the original map.
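
The convention dependence noted above is easy to see numerically. The following small sketch (an illustration only, assuming numpy) computes the components of \(T^*\omega\) both as a row vector acted on from the right by \(\mathbf{T}\) and as a column vector acted on by \(\mathbf{T}^\mathsf{T}\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 3
T = rng.normal(size=(m, n))       # matrix of T : V -> W in the chosen bases
omega = rng.normal(size=(1, m))   # a functional on W, as a row vector

# Row-vector convention: T* acts by right multiplication by the same matrix T,
row_components = omega @ T        # components of T* omega in the dual basis of V*
# whereas in the column-vector convention the matrix of T* is the transpose:
col_components = T.T @ omega.T

print(np.allclose(row_components, col_components.T))   # True
```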

Definition If \(U\) is a subspace of \(V\) then the annihilator of \(U\), \(U^\circ\) is defined to be
\begin{equation}
U^\circ=\{\varphi\in V^*\mid\varphi(u)=0\;\forall u\in U\}.
\end{equation}

Given a subspace \(U\) of \(V\), any element \(\varphi\in V^*\) has a restriction \(\varphi|_U\). This defines a linear map \(\pi_U:V^*\mapto U^*\) as \(\pi_U(\varphi)=\varphi|_U\). Notice that \(\ker\pi_U=U^\circ\) and also that \(\img\pi_U=U^*\) (any functional on \(U\) may be extended to \(V\) by extending a basis of \(U\) to a basis of \(V\)), so, by the rank-nullity theorem and since \(\dim V=\dim V^*\), we have
\begin{equation}
\dim U+\dim U^\circ=\dim V.\label{equ:annihilator dimension}
\end{equation}
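
For a concrete instance of \eqref{equ:annihilator dimension} (purely illustrative and assuming numpy), take \(V=\RR^4\) and \(U\) the column space of a matrix \(\mathbf{M}\); a functional, written as a row vector, annihilates \(U\) exactly when it is killed by right multiplication against \(\mathbf{M}\).

```python
import numpy as np

# U = column space of M, a 2-dimensional subspace of V = R^4
M = np.array([[1., 0.],
              [2., 1.],
              [0., 1.],
              [1., 1.]])

dim_V = M.shape[0]
dim_U = np.linalg.matrix_rank(M)

# A functional phi (a row vector) lies in U° exactly when phi @ M = 0, so dim U°
# is the dimension of the solution space of that homogeneous system:
dim_U_ann = dim_V - np.linalg.matrix_rank(M.T)

print(dim_U, dim_U_ann, dim_U + dim_U_ann == dim_V)   # 2 2 True
```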

Now suppose \(T\in\mathcal{L}(V,W)\), so that \(T^*\in\mathcal{L}(W^*,V^*)\), and consider \(\ker T^*\). That is, we consider elements \(\omega\in W^*\) such that \(T^*\omega=\omega T=0\). It is not difficult to see that this is precisely the annihilator of \(\img T\), that is
\begin{equation}
\ker T^*=(\img T)^\circ.
\end{equation}
Having observed that they have the same matrix representation (albeit modulo a different convention with regard to row and column vectors), we know that \(T\) and \(T^*\) have the same rank. But we can see this in a basis free fashion as follows. By the rank-nullity theorem we have \(\dim\img T^*=\dim W-\dim\ker T^*\). That is, \(\dim\img T^*=\dim W-\dim(\img T)^\circ\), but by \eqref{equ:annihilator dimension}, this just says that
\begin{equation}
\dim\img T^*=\dim\img T.
\end{equation}
Finally, maintaining a certain symmetry, we have
\begin{equation}
\img T^*=(\ker T)^\circ.
\end{equation}
To see this, consider \(\varphi\in\img T^*\), that is \(\varphi=\omega T\) for some \(\omega\in W^*\). Now if \(v\in\ker T\) then clearly \(\varphi(v)=\omega(Tv)=0\) so \(\varphi\in(\ker T)^\circ\). Thus we certainly have \(\img T^*\subseteq(\ker T)^\circ\). But \(\dim\ker T=\dim V-\dim\img T\) from the rank-nullity theorem, while \eqref{equ:annihilator dimension} gives \(\dim\ker T=\dim V-\dim(\ker T)^\circ\), so \(\dim(\ker T)^\circ=\dim\img T=\dim\img T^*\) and the inclusion is in fact an equality.

Realification and Complexification

In certain circumstances it turns out to be useful to pass from a complex vector space to a ‘corresponding’ real vector space or vice versa. The former is called realification and is really nothing more than restricting scalar multiplication to reals in the original complex vector space. If \(V\) is a vector space over \(\CC\), then its realification, \(V_\RR\), is the set \(V\) with vector addition and scalar multiplication by reals inherited unchanged from \(V\) (and the complex multiplication ‘forgotten’). If \(\{e_1,\dots,e_n\}\) is a basis of \(V\), then consider the vectors \(\{e_1,\dots,e_n,ie_1,\dots,ie_n\}\). It’s not difficult to see that these vectors form a basis of \(V_\RR\) so \(\dim V_\RR=2\dim V\). If \(T:V\mapto W\) is a linear transformation of complex vector spaces then we may also view it as a linear transformation, \(T_\RR:V_\RR\mapto W_\RR\), of real vector spaces. If the matrix representation of \(T\) with respect to bases \(\{e_i\}\) and \(\{f_i\}\) of \(V\) and \(W\) respectively, is \(\mathbf{T}=\mathbf{A}+i\mathbf{B}\), with \(\mathbf{A}\) and \(\mathbf{B}\) both real matrices, then with respect to the bases \(\{e_1,\dots,e_n,ie_1,\dots,ie_n\}\) and \(\{f_1,\dots,f_n,if_1,\dots,if_n\}\), of respectively \(V_\RR\) and \(W_\RR\), \(T_\RR\) has the matrix representation,
\begin{equation}
\begin{pmatrix}
\mathbf{A}&-\mathbf{B}\\
\mathbf{B}&\mathbf{A}
\end{pmatrix}.
\end{equation}
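
The block form above is easily checked numerically. Here is a minimal sketch (assuming numpy; the matrices are randomly generated) comparing \(T\) acting on \(a+ib\) with the block matrix acting on the stacked real components \((a,b)\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 3
T = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))   # complex matrix of T
A, B = T.real, T.imag

# Matrix of T_R with respect to the bases {e_i, i e_i} and {f_i, i f_i}:
T_R = np.block([[A, -B],
                [B,  A]])

# Compare T acting on v = a + i b with T_R acting on the stacked components (a, b):
a, b = rng.normal(size=n), rng.normal(size=n)
w = T @ (a + 1j * b)
print(np.allclose(T_R @ np.concatenate([a, b]),
                  np.concatenate([w.real, w.imag])))   # True
```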

This process of realification produces a rather special kind of real vector space, since it comes equipped with a particular linear operator which we’ll denote \(\iota\in\mathcal{L}(V_\RR)\), given by \(\iota v=iv\) and such that \(\iota^2=-\id_{V_\RR}\). This is the canonical example of a complex structure, a linear operator \(\iota\in\mathcal{L}(V)\) on any real vector space \(V\) such that \(\iota^2=-\id_V\).

Given a real vector space \(V\) equipped with a complex structure \(\iota\) we may form a complex vector space structure on the set \(V\) by inheriting vector addition from \(V\) and defining complex scalar multiplication by \((a+ib)v=av+b\iota v\). It is not difficult to see that if \(\iota\) is the canonical complex structure on a real vector space \(V_\RR\) which is the realification of a complex vector space \(V\) then we just recover in this way the original complex vector space \(V\).

How might we proceed if we start from a real vector space without a complex structure? In particular, we would like to understand, in a basis free sense, the fact that whilst a real \(m\times n\) matrix \(\mathbf{A}\) is of course a linear transformation from \(\RR^n\) to \(\RR^m\), it may also be regarded as a linear transformation from \(\CC^n\) to \(\CC^m\).

For any real vector space \(V\) we consider the (exterior) direct sum space \(V\oplus V\). We can then define a complex structure on \(V\oplus V\), \(\iota:V\oplus V\mapto V\oplus V\), by \(\iota(v,v’)=(-v’,v)\). Then \(\iota^2=-\id_{V\oplus V}\), so \(\iota\) is clearly an isomorphism, and it allows us to write any \((v,v’)\) in the suggestive form
\begin{equation*}
(v,v’)=(v,0)+\iota(v’,0).
\end{equation*}
The complexification of \(V\), \(V_\CC\), is defined to be the set \(V\oplus V\) equipped with the obvious vector addition and with complex multiplication given by
\begin{equation*}
(a+ib)(v,v’)\equiv(a+b\iota)(v,v’)=(av-bv’,av’+bv).
\end{equation*}
That \(V_\CC\) so defined is indeed a complex vector space is then obvious, given the properties of \(\iota\). It should also be clear that if \(V\) is an \(n\) dimensional real vector space, with basis \(\{e_i\}\) say, then the \((e_i,0)\) span \(V_\CC\). To see that they are also linearly independent over \(\CC\), suppose
\begin{equation*}
(a^1+ib^1)(e_1,0)+\dots+(a^n+ib^n)(e_n,0)=0.
\end{equation*}
This is true if and only if, \((a^ie_i,b^ie_i)=0\), itself true if and only if \(a^i=b^i=0\) for all \(i\).

Now, suppose \(T\in\mathcal{L}(V,W)\), is a linear transformation of real vector spaces. Define \(T_\CC:V_\CC\mapto W_\CC\) by \(T_\CC(v,v’)=(Tv,Tv’)\). Then since \(T_\CC(\iota(v,v’))=\iota(Tv,Tv’)\) it is clear that \(T_\CC\in\mathcal{L}(V_\CC,W_\CC)\). This is the unique linear map such that the diagram,
\begin{equation*}
\begin{CD}
V @>T>> W\\
@VV V @VV V\\
V_\CC @>T_\CC>> W_\CC
\end{CD}
\end{equation*}
in which the vertical maps are the standard inclusions (e.g.\ \(v\mapsto(v,0)\) of \(V\) into \(V_\CC\)), commutes. Given bases of \(V\) and \(W\), in which the matrix representation of \(T\) is \(\mathbf{T}\), it is clear that the matrix representation of \(T_\CC\) with respect to the bases obtained by the natural inclusions is identical.

Note that on \(V_\CC\) there is a natural notion of complex conjugation. For any \(v=(x,y)\in V_\CC\), \(x,y\in V\), we define \(v^*=(x,-y)\). For any \(T\in\mathcal{L}(V,W)\) we have \(T_\CC\left((x,y)^*\right)=(T_\CC(x,y))^*\), that is, the complexification \(T_\CC\) commutes with this conjugation. In fact it is not difficult to show that a linear transformation in \(\mathcal{L}(V_\CC,W_\CC)\) is a complexification of a linear transformation in \(\mathcal{L}(V,W)\) if and only if it commutes with complex conjugation. Finally, note that \(\mathcal{L}(V,W)_\CC\cong\mathcal{L}(V_\CC,W_\CC)\), with the identification \((S,T)\mapsto S_\CC+iT_\CC\) for \(S,T\in\mathcal{L}(V,W)\), and that therefore complex conjugation is also well defined in \(\mathcal{L}(V_\CC,W_\CC)\).
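
Concretely, for \(V=\RR^n\) the complexification of a real matrix is just the same matrix acting on \(\CC^n\), and commuting with conjugation is exactly the statement that its entries are real. A small sketch (illustrative only, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.normal(size=(3, 3))                       # a real matrix, so T_C acts on C^3
z = rng.normal(size=3) + 1j * rng.normal(size=3)  # an element (x, y) of the complexification

# The complexification of a real map commutes with complex conjugation:
print(np.allclose((T @ z).conj(), T @ z.conj()))  # True

# A genuinely complex matrix generally does not, so it is not a complexification:
S = T + 1j * rng.normal(size=(3, 3))
print(np.allclose((S @ z).conj(), S @ z.conj()))  # False (generically)
```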

Sums, Intersections and Projections

While the intersection, \(U_1\cap U_2\), of two subspaces of a vector space \(V\) is again a subspace, the union, \(U_1\cup U_2\), is not unless \(U_1\subseteq U_2\) or \(U_2\subseteq U_1\). The sum \(U_1+U_2\) defined as \(\{u_1+u_2\mid u_1\in U_1,u_2\in U_2\}\) is the ‘smallest’ vector space containing \(U_1\cup U_2\). Just how large is it? It’s not difficult to see that if we take \(u_1,\dots,u_d\) to be a basis of \(U_1\cap U_2\) and extend it to a basis \(u_1,\dots,u_d,v_1,\dots,v_r\) of \(U_1\) and a basis \(u_1,\dots,u_d,w_1,\dots,w_s\) of \(U_2\), then \(u_1,\dots,u_d,v_1,\dots,v_r,w_1,\dots,w_s\) is a basis of \(U_1+U_2\) and we have
\begin{equation}
\dim(U_1\cap U_2)+\dim(U_1+U_2)=\dim(U_1)+\dim(U_2).
\end{equation}

Example In 3-dimensional space, if we consider two distinct planes which pass through the origin, so a pair of 2-dimensional subspaces, then their sum is of course the 3-dimensional space whilst their intersection is a 1-dimensional line, in accordance with \(1+3=2+2\).
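
The same count can be carried out numerically by computing ranks of spanning sets; the following sketch (an illustration, assuming numpy) uses the \(xy\)- and \(xz\)-planes.

```python
import numpy as np

# Two planes through the origin in R^3, each given by a spanning pair of columns:
U1 = np.array([[1., 0., 0.], [0., 1., 0.]]).T   # the xy-plane
U2 = np.array([[1., 0., 0.], [0., 0., 1.]]).T   # the xz-plane

dim_U1 = np.linalg.matrix_rank(U1)
dim_U2 = np.linalg.matrix_rank(U2)
dim_sum = np.linalg.matrix_rank(np.hstack([U1, U2]))   # dim(U1 + U2)
dim_int = dim_U1 + dim_U2 - dim_sum                    # dim(U1 ∩ U2) from the formula

print(dim_int, dim_sum)   # 1 3, matching 1 + 3 = 2 + 2
```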

If \(U_1\), \(U_2\) and \(U_3\) are subspaces of a vector space \(V\) then notice that in general \(U_1\cap(U_2+U_3)\) and \((U_1\cap U_2)+(U_1\cap U_3)\) are not equal; that is, intersection is not distributive over addition. Indeed, consider the subspaces \(U_1=\Span(\mathbf{e}_1+\mathbf{e}_2)\), \(U_2=\Span(\mathbf{e}_1)\) and \(U_3=\Span(\mathbf{e}_2)\) of \(\RR^2\). Then \(U_1\cap(U_2+U_3)=U_1\) but \(U_1\cap U_2=0=U_1\cap U_3\). Rather, we have the following equality,
\begin{equation}
U_1\cap(U_2+(U_1\cap U_3))=U_1\cap U_2+U_1\cap U_3.
\end{equation}
That \(U_1\cap(U_2+(U_1\cap U_3))\subseteq U_1\cap U_2+U_1\cap U_3\) follows since if \(v\in U_1\cap(U_2+(U_1\cap U_3))\) then \(v=u_1=u_2+u_{13}\) where \(u_1\in U_1\), \(u_2\in U_2\) and \(u_{13}\in (U_1\cap U_3)\). But then \(u_2=u_1-u_{13}\in U_1\), so indeed \(v=u_2+u_{13}\in U_1\cap U_2+U_1\cap U_3\). The reverse inclusion is immediate.

Definition A sum \(U=\sum_{i=1}^r U_i\) of vector subspaces \(U_i\subseteq V\) is direct if every \(u\in U\) can be written uniquely as \(u=u_1+\dots+u_r\) for some \(u_i\in U_i\).

Lemma The sum \(U_1+U_2\) of any pair of subspaces is direct if and only if \(U_1\cap U_2=\{0\}\).

Proof If the sum is direct but there is some non-zero \(v\in U_1\cap U_2\), we could write the zero vector in two ways as \(0=0+0\) and \(0=v+(-v)\), contradicting directness. Conversely, if \(U_1\cap U_2=\{0\}\) and \(v=u_1+u_2\) as well as \(v=u’_1+u’_2\) then \(u_1-u’_1=u’_2-u_2\in U_1\cap U_2=\{0\}\) so the decomposition is unique. \(\blacksquare\)

More generally, the following theorem gives three equivalent criteria for a sum of arbitrary length to be direct.

Theorem A sum \(U=\sum_{i=1}^rU_i\) of vector subspaces \(U_i\subseteq V\) is direct if and only if one of the following three equivalent criteria holds:

  1. For each \(i\), \(U_i\cap\left(\sum_{j\neq i}U_j\right)=0\).
  2. If \(u_1+\dots+u_r=0\), \(u_i\in U_i\), then \(u_i=0\).
  3. Every \(u\in U\) can be written uniquely as \(u=u_1+\dots+u_r\) for some \(u_i\in U_i\)

Proof Suppose \((2)\) is false, so that \(u_1+\dots+u_r=0\) with, say, \(u_1\neq0\). Then \(-u_1=u_2+\dots+u_r\) is a non-zero element of \(U_1\cap\left(\sum_{j\neq 1}U_j\right)\), which contradicts \((1)\), so \((1)\) implies \((2)\). Suppose \((3)\) were false so that we had \(u=u_1+\dots+u_r\) and \(u=u’_1+\dots+u’_r\), with not all \(u_i=u’_i\). Then subtracting these implies \((2)\) is false, so \((2)\) implies \((3)\). Finally suppose \((1)\) is false. Then we have some non-zero \(u\) such that \(u\in U_i\) and \(u=u_1+\dots+u_{i-1}+u_{i+1}+\dots+u_r\), giving two distinct decompositions of \(u\), so \((3)\) is false and \((3)\) implies \((1)\).\(\blacksquare\)

Remark If \(\{e_1,\dots,e_n\}\) is a basis of \(V\) then clearly \(V=\sum_{i=1}^n\Span(e_i)\) is a direct sum.

Remark A situation which sometimes arises is that we know \(V=\sum_{i=1}^rU_i\) and also that \(\sum_{i=1}^r\dim U_i=\dim V\). Then choosing bases for each \(U_i\) we obtain a collection of vectors which certainly span \(V\). But since \(\sum_{i=1}^r\dim U_i=\dim V\) there must be \(\dim V\) of these vectors so they are a basis for \(V\) and we may conclude that the sum \(V=\sum_{i=1}^rU_i\) is direct.

If \(V=U+W\) is a direct sum of subspaces \(U\) and \(W\), these subspaces are said to be complementary. Given a subspace \(U\) of \(V\), a complementary subspace \(W\) always exists. Just take a basis for \(U\), \(u_1,\dots,u_r\), and extend it to a basis \(u_1,\dots,u_r,w_1,\dots,w_{n-r}\) of \(V\); then \(W=\Span(w_1,\dots,w_{n-r})\) is a complementary subspace. Note that defining, for example, \(W’=\Span(w_1+u_1,w_2,\dots,w_{n-r})\) we obtain another subspace, also complementary to \(U\) but not equal to \(W\). Aside from the trivial cases of \(\{0\}\) and \(V\), complements of subspaces are not unique.

Example In \(\RR^2\), consider the subspace \(\Span(\mathbf{e}_1)\). Then \(\Span(\mathbf{e}_2)\) and \(\Span(\mathbf{e}_1+\mathbf{e}_2)\) are both examples of complementary subspaces.

Given two arbitrary vector spaces \(U\) and \(W\) their external direct sum, \(U\oplus W\), is defined to be the product set \(U\times W\) with vector space structure given by
\begin{align}
(u_1,w_1)+(u_2,w_2)&=(u_1+u_2,w_1+w_2)&c(u,w)=(cu,cw).
\end{align}
Now suppose \(U\) and \(W\) are in fact subspaces of some vector space \(V\) and consider the map \(\pi:U\oplus W\mapto V\) defined as \(\pi(u,w)=u+w\). This is clearly a linear transformation, with \(\ker\pi=\{(u,-u)\mid u\in U\cap W\}\) and \(\img\pi=U+W\). Thus in the case that \(U+W\) is a direct sum we have \(U+W\cong U\oplus W\) and in fact, abusing notation, we write in this case \(U+W=U\oplus W\). Furthermore, applying the rank-nullity theorem to the map \(\pi\), we obtain
\begin{equation}
\dim(U\oplus W)=\dim(U)+\dim(W)\label{equ:dim of sum}
\end{equation}

Example If \(U_1\) and \(U_2\) are subspaces of a vector space \(V\) with a non-zero intersection, \(U_0=U_1\cap U_2\), then we can write \(U_1=U_0\oplus U_1’\) and \(U_2=U_0\oplus U_2’\) for some subspaces \(U_1’\) and \(U_2’\) of \(U_1\) and \(U_2\) respectively. Consider the sum \(U_0+U_1’+U_2’\). It is not difficult to see that \(U_1+U_2=U_0+U_1’+U_2’\). In fact \(U_1\cap U_2’=0\): if \(u\in U_1\cap U_2’\) then \(u\in U_1\cap U_2=U_0\) as well as \(u\in U_2’\), and \(U_0\cap U_2’=0\) since the sum \(U_2=U_0\oplus U_2’\) is direct, so \(u=0\). We can therefore write \(U_1+U_2=(U_0\oplus U_1’)+U_2’=(U_0\oplus U_1’)\oplus U_2’\), that is,
\begin{equation}
U_1+U_2=U_0\oplus U_1’\oplus U_2’.
\end{equation}

Example If \(T\in\mathcal{L}(V)\), then note that whilst \(\dim V=\dim\ker T+\dim\img T\) it is not generally the case that \(\ker T\cap\img T=0\) so we cannot in general describe \(V\) as the direct sum of the kernel and image of a linear operator. Consider for example \(V=\RR^2\) and the linear operator defined (with respect to the standard basis) as
\begin{equation*}
T=\begin{pmatrix}0&1\\0&0\end{pmatrix}
\end{equation*}
is such that
\begin{equation*}
\ker T=\img T=\Span(\mathbf{e}_1).
\end{equation*}
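
A quick check of this example (assuming numpy):

```python
import numpy as np

T = np.array([[0., 1.],
              [0., 0.]])
e1, e2 = np.array([1., 0.]), np.array([0., 1.])

print(np.allclose(T @ e1, 0))    # True: e1 spans ker T
print(np.allclose(T @ e2, e1))   # True: e1 also spans img T

rank = np.linalg.matrix_rank(T)
print(rank, T.shape[1] - rank)   # 1 1: rank-nullity holds, but ker T + img T is not direct
```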

If \(W\) is a subspace of a vector space \(V\) then we can form the quotient space \(V/W\) which is just the quotient group equipped with the obvious scalar multiplication. If \(w_1,\dots,w_r\) is a basis for \(W\) then we can extend it to a basis \(w_1,\dots,w_r,v_{1},\dots,v_{n-r}\) for \(V\) and \(v_1+W,\dots,v_{n-r}+W\) is then a basis for \(V/W\). In particular, we have
\begin{equation}
\dim V/W=\dim V-\dim W.
\end{equation}
Now if \(T:V\mapto V\), we could define a linear operator \(T’:V/W\mapto V/W\), as
\begin{equation}
T'(v+W)=Tv+W.
\end{equation}
But for this to be well-defined, \(W\) must be \(T\)-invariant, that is \(Tw\in W\) for any \(w\in W\). For if \(v+W=v’+W\), then \(v-v’=w\) for some \(w\in W\), and we require \(T'(v+W)=T'(v’+W)\). But
\begin{equation*}
T'(v+W)=Tv+W=T(v'+w)+W=Tv'+Tw+W,
\end{equation*}
and for this to equal \(T'(v'+W)=Tv'+W\) we must have \(Tw\in W\).

Quotients of vector spaces also allow us to factorise certain linear maps in the following sense. Suppose \(T:V\mapto W\) is a linear transformation, and \(U\) is a subspace of \(V\) such that \(U\subseteq\ker T\). Define \(\pi:V\mapto V/U\) as \(\pi v=v+U\) (\(\pi\) is clearly linear with kernel \(\ker\pi=U\)). Then there exists a linear map \(T’:V/U\mapto W\) such that \(T=T’\pi\). That is, we have ‘factorised’ \(T\) as \(T’\pi\). Indeed, \(T’\) must clearly be defined as \(T’(v+U)=Tv\). This is well defined since if \(v+U=v’+U\) then \(v-v’\in U\), so that \(T(v-v’)=0\) or \(Tv=Tv’\), that is, \(T’(v+U)=T’(v’+U)\). In such a situation, \(T\) is said to factorise through \(V/U\).

Definition A projection on \(V\) is a linear operator \(P\in\mathcal{L}(V)\) such that
\begin{equation}
P^2=P.
\end{equation}

Proposition There is a one-to-one correspondence between projections \(P\), pairs of linear transformations \(P,Q\in\mathcal{L}(V)\) such that
\begin{equation}
P+Q=\id_V\qquad\text{and}\qquad PQ=0, \label{equ:proj op alter}
\end{equation}
and direct sum decompositions
\begin{equation}
V=U\oplus W. \label{equ:proj op decomp}
\end{equation}

Proof If \(P\) is a projection then we can define \(Q=\id_V-P\) and \eqref{equ:proj op alter} is obvious. Given operators \(P\) and \(Q\) satisfying \eqref{equ:proj op alter}, we can define subspaces \(U\) and \(W\) of \(V\) as \(U=PV\) and \(W=QV\). Then \(P+Q=\id_V\) implies that \(V=U+W\). That this sum is direct follows since if an element \(v\) belonged to both \(U\) and \(W\) then \(v=Pv_1=Qv_2\) for some \(v_1,v_2\in V\) which then means that \(Pv_1=PQv_2=0\), so \(v=0\) and \(V=U\oplus W\). Clearly \(\img Q\subseteq\ker P\) and conversely if \(v\in\ker P\) then \(v=Pv+Qv=Qv\) so \(\ker P\subseteq\img Q\) and \(\ker P=\img Q\). Likewise, \(\ker Q=\img P\), so we have
\begin{equation}
V=\img P\oplus\ker P=\ker Q\oplus\img Q.
\end{equation}
Given a direct sum decomposition \(V=U\oplus W\) any \(v\in V\) can be expressed uniquely as \(v=u+w\) with \(u\in U\), \(w\in W\) and we can therefore define a linear operator \(P\) by \(Pv=u\). So defined, \(P\) is clearly a projection. \(\blacksquare\)

Thus we cannot speak of the projection onto some subspace \(U\), only of a projection onto \(U\). There are as many projections onto a subspace \(U\) as there are complements of \(U\). However, note that if \(V=U\oplus W\) with \(P\) the projector onto \(W\) then \(U=\ker P\) and \(V/\ker P\cong W\) via \(v+\ker P\mapsto Pv\). So it is the case that all complements of \(U\) are isomorphic.

Example In the case of \(\RR^2\), for the direct sum decomposition \(\RR^2=\Span(\mathbf{e}_1)\oplus\Span(\mathbf{e}_2)\), the corresponding projections are
\begin{equation}
\mathbf{P}=\begin{pmatrix} 1&0\\0&0\end{pmatrix}\quad\text{and}\quad\mathbf{Q}=\begin{pmatrix} 0&0\\0&1\end{pmatrix},
\end{equation}
with \(\img\mathbf{P}=\Span(\mathbf{e}_1)\) and \(\img\mathbf{Q}=\Span(\mathbf{e}_2)\). For the direct sum decomposition, \(\RR^2=\Span(\mathbf{e}_1)\oplus\Span(\mathbf{e}_1+\mathbf{e}_2)\), the corresponding projections are
\begin{equation}
\mathbf{P}=\begin{pmatrix} 1&-1\\0&0\end{pmatrix}\quad\text{and}\quad\mathbf{Q}=\begin{pmatrix} 0&1\\0&1\end{pmatrix},
\end{equation}
with \(\img\mathbf{P}=\Span(\mathbf{e}_1)\) and \(\img\mathbf{Q}=\Span(\mathbf{e}_1+\mathbf{e}_2)\).
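
These matrices can be checked directly; the sketch below (assuming numpy) verifies that both are idempotent, that they sum to the identity with product zero, and shows how a vector is split by the second, oblique, decomposition.

```python
import numpy as np

# Projections for the decomposition R^2 = Span(e1) ⊕ Span(e1 + e2):
P = np.array([[1., -1.],
              [0.,  0.]])
Q = np.array([[0., 1.],
              [0., 1.]])

print(np.allclose(P @ P, P), np.allclose(Q @ Q, Q))          # True True: both idempotent
print(np.allclose(P + Q, np.eye(2)), np.allclose(P @ Q, 0))  # True True: P + Q = id, PQ = 0

v = np.array([3., 5.])
print(P @ v, Q @ v)   # [-2.  0.] and [5. 5.]: the pieces along Span(e1) and Span(e1+e2)
```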

Remark Recalling the earlier Example, we note that a projection is an example of a linear operator for which the vector space does decompose as the direct sum of its kernel and image.

More generally, if we have linear operators \(P_1,\dots,P_r\) such that \(P_iP_j=0\) whenever \(i\neq j\) and \(P_1+\dots+P_r=\id_V\) then they are projectors (since \(P_i=P_i(P_1+\dots+P_r)=P_i^2\)) and, defining \(U_i=P_iV\),
\begin{equation}
V=U_1\oplus\dots\oplus U_r.
\end{equation}
Note that to check that this sum is really direct it is not enough to check that \(U_i\cap U_j=\{0\}\) whenever \(i\neq j\). 1 We confirm uniqueness directly. We have, \(v=(P_1+\dots+P_r)v=w_1+\dots+w_r\), say, and suppose we also have \(v=u_1+\dots+u_r\). Then applying \(P_i\) to both expressions we obtain \(u_i=w_i\) so the decomposition \(v=w_1+\dots+w_r\) is unique and the sum is direct. If we define \(U_{(i)}=\oplus_{j\neq i}U_j\) and \(P_{(i)}=\sum_{j\neq i}P_j\) then \(V=U_i\oplus U_{(i)}\), \(P_i+P_{(i)}=\id_V\) and \(P_iP_{(i)}=0\). So \(\ker P_i\cong\img P_{(i)}\), \(\img P_i\cong\ker P_{(i)}\) and \(V=\img P_i\oplus\ker P_i=\ker P_{(i)}\oplus\img P_{(i)}\).

If \(P_1\) and \(P_2\) are projections, which do not necessarily sum to the identity \(\id_V\), then it is natural to ask under what circumstances their sum (or difference) is also a projection.

Theorem Suppose \(P_1\) and \(P_2\) are projections onto subspaces \(U_1\) and \(U_2\) of a vector space \(V\) with \(W_1\) and \(W_2\) the respective complementary subspaces. Then,

  1. \(P=P_1+P_2\) is a projection if and only if \(P_1P_2=P_2P_1=0\) in which case \(\img P=U_1\oplus U_2\) and \(\img(\id_V-P)=W_1\cap W_2\).
  2. \(P=P_1-P_2\) is a projection if and only if \(P_1P_2=P_2P_1=P_2\) in which case \(\img P=U_1\cap U_2\) and \(\img(\id_V-P)=W_1\oplus W_2\).
  3. If \(P_1P_2=P_2P_1=P\), then \(P\) is a projection such that \(\img P=U_1\cap U_2\) and \(\img(\id_V-P)=W_1+W_2\)

Proof If \(P=P_1+P_2\) is a projection, then, \(P^2=P\), so that, \(P_1P_2+P_2P_1=0\). Multiplying by \(P_1\) from the left we obtain, \(P_1P_2+P_1P_2P_1=0\). Multiplying from the right we obtain, \(P_1P_2P_1+P_2P_1=0\), so that, \(P_1P_2=P_2P_1\), and hence \(P_1P_2=0\). That \(P_1P_2=P_2P_1=0\) implies \(P^2=P\) is clear. Assuming \(P\) is indeed a projection, consider \(\img P\). If \(v\in\img P\), then \(v=Pv=P_1v+P_2v\in U_1+U_2\). Conversely, if \(v\in U_1+U_2\), then \(v=u_1+u_2\) for some \(u_1\in U_1\) and \(u_2\in U_2\). Then, \(Pv=P_1u_1+P_2u_2\), since \(P_1P_2=0\), and so, \(Pv=v\), hence \(v\in\img P\). If \(v\in U_1\cap U_2\), then \(v=P_1v=P_1P_2v=0\), so \(U_1\cap U_2=0\) and \(\img P=U_1\oplus U_2\). Now if \(v\in\img(\id_V-P)\), then, \(v=v-Pv\), and it is clear that \(P_1v=0=P_2v\) so \(v\in W_1\cap W_2\). Conversely, if \(v\in W_1\cap W_2\), then
\begin{equation*}
(\id_V-P)v=v-(P_1+P_2)v=(\id_V-P_1)v-P_2v=(\id_V-P_2)v=v
\end{equation*}
so \(v\in\img(\id_V-P)\). The other statements are proved similarly.\(\blacksquare\)

Suppose that with respect to some \(T\in\mathcal{L}(V)\) the subspace \(U\) of \(V\) is \(T\)-invariant and consider a direct sum \(V=U\oplus W\). Corresponding to any such direct sum decomposition we have a projection \(P\) such that \(\img P=U\) and \(\ker P=W\). It’s not difficult to see that for any such projection we have \(PTP=TP\). Conversely, if \(PTP=TP\) for some projection \(P\) onto a subspace \(U\), then any \(u\in U\) is such that \(u=Pu\) so that \(Tu=TPu=PTPu=PTu\), that is, \(Tu\in U\), so \(U\) is \(T\)-invariant.

Theorem If \(V=U\oplus W\), for some subspaces \(U,W\subset V\), and \(P\) is the corresponding projection, then a necessary and sufficient condition for those subspaces to be invariant with respect to some linear operator \(T\in\mathcal{L}(V)\) is that \(T\) commutes with \(P\), \(TP=PT\).

Proof Assuming \(U\) and \(W\) are \(T\)-invariant then we know that \(PTP=TP\) but also \((\id_V-P)T(\id_V-P)=T(\id_V-P)\). From the latter it follows that \(PTP=PT\) so \(TP=PT\). In the other direction, if \(TP=PT\), then for any \(u\in U\), \(u=Pu\) so that \(Tu=TPu=PTu\in\img P=U\). Likewise, for any \(w\in W=\ker P\), \(PTw=TPw=0\) so \(Tw\in W\).\(\blacksquare\)

Notes:

  1. For example, for the space \(\RR^2\) of all pairs \((x,y)\) we could define three subspaces \(U_1=\{(x,0)\mid x\in\RR\}\), \(U_2=\{(0,x)\mid x\in\RR\}\), and \(U_3=\{(x,x)\mid x\in\RR\}\). Clearly \(U_i\cap U_j=\{0\}\) whenever \(i\neq j\) but it is equally clear that we couldn’t express an arbitrary element \((x,y)\in\RR^2\) uniquely in terms of elements of \(U_1\), \(U_2\) and \(U_3\).

Rank and Solutions of Linear Systems

To actually compute the rank of a given linear transformation one would choose bases, express the transformation as a matrix and then use elementary row operations. Recall there are three types of elementary row operations which may be performed on an \(m\times n\) matrix \(A\), with corresponding elementary matrices obtained by carrying out the same operations on the identity matrix of the appropriate size. They are

Row interchange Interchanging the \(i\)th and \(j\)th rows. For example interchanging the 1st and 3rd rows of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6\\
7&8&9&10\\
11&12&13&14
\end{pmatrix}
\end{equation*}
we get
\begin{equation*}
\begin{pmatrix}
7&8&9&10\\
3&4&5&6\\
1&2&3&4\\
11&12&13&14
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
0&0&1&0\\
0&1&0&0\\
1&0&0&0\\
0&0&0&1
\end{pmatrix}
\end{equation*}

Row multiplication Multiplying the \(i\)th row by some non-zero number. For example multiplying the 1st row of
\begin{equation*}
\begin{pmatrix}
1&2&3&4\\
3&4&5&6
\end{pmatrix}
\end{equation*}
by 2 we get
\begin{equation*}
\begin{pmatrix}
2&4&6&8\\
3&4&5&6
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
2&0\\
0&1
\end{pmatrix}
\end{equation*}

Row addition Replacing the \(i\)th row by the \(i\)th row plus some number times the \(j\)th row. For example replacing the 2nd row of
\begin{equation*}
\begin{pmatrix}
1&2&3\\
4&5&6\\
7&8&9
\end{pmatrix}
\end{equation*}
by the second row minus 4 times the 1st row we get
\begin{equation*}
\begin{pmatrix}
1&2&3\\
0&-3&-6\\
7&8&9
\end{pmatrix}
\end{equation*}
which is also obtained by multiplying from the left by
\begin{equation*}
\begin{pmatrix}
1&0&0\\
-4&1&0\\
0&0&1
\end{pmatrix}
\end{equation*}

Elementary row matrices are of course invertible, their inverses corresponding to the undoing of the original elementary row operation. Elementary row operations can be used to solve linear systems \(\mathbf{A}\mathbf{x}=\mathbf{b}\) by reducing the augmented matrix \(\mathbf{A}\mathbf{|}\mathbf{b}\) to row echelon form (all nonzero rows are above any rows of all zeroes and the leading coefficient of a nonzero row is 1 and is always strictly to the right of the leading coefficient of the row above it) or reduced row echelon form (row echelon form but with each leading coefficient being the only non-zero entry in its column). If we reduce a matrix \(\textbf{A}\) to reduced row echelon form using elementary row operations and record each step as an elementary row matrix \(\mathbf{E}_i\) then we obtain an expression for \(\textbf{A}\) as
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\mathbf{A}’
\end{equation}
for some number \(a\) of row operations and \(\mathbf{A}’\) in reduced row echelon form.
The rank of the original matrix can already be recovered at this point as the number of non-zero rows of \(\mathbf{A}’\). Indeed, a square matrix is invertible if and only if it can be expressed as a product of elementary row matrices. In the more general case, by simplifying further using elementary column operations (defined in the obvious way), and recording the steps with the corresponding elementary column matrices \(\mathbf{F}_i\), we arrive at a matrix of the form \eqref{equ:fully reduced matrix} and a factorisation of \(\mathbf{A}\) of the form,
\begin{equation}
\mathbf{A}=\mathbf{E}_1^{-1}\dots\mathbf{E}_a^{-1}\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}\mathbf{F}_b^{-1}\dots\mathbf{F}_1^{-1},
\end{equation}
exactly as we established in \eqref{matrix factorization}.
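
The following is a small illustrative sketch (assuming numpy; it is not a robust library routine and glosses over questions of numerical tolerance) which reduces a matrix to reduced row echelon form while recording the elementary matrices, reads off the rank as the number of non-zero rows, and then undoes the recorded operations to recover the matrix, as in the factorisation above.

```python
import numpy as np

def rref_with_elementary_matrices(A, tol=1e-12):
    """Reduce A to reduced row echelon form, recording the elementary matrices,
    so that Es[-1] @ ... @ Es[0] @ A equals the returned R."""
    R = A.astype(float).copy()
    m, n = R.shape
    Es, row = [], 0
    for col in range(n):
        if row == m:
            break
        pivot = row + int(np.argmax(np.abs(R[row:, col])))   # partial pivoting
        if abs(R[pivot, col]) < tol:
            continue
        if pivot != row:                                     # row interchange
            E = np.eye(m); E[[row, pivot]] = E[[pivot, row]]
            Es.append(E); R = E @ R
        E = np.eye(m); E[row, row] = 1.0 / R[row, col]       # row multiplication
        Es.append(E); R = E @ R
        for r in range(m):                                   # row additions clear the column
            if r != row and abs(R[r, col]) > tol:
                E = np.eye(m); E[r, row] = -R[r, col]
                Es.append(E); R = E @ R
        row += 1
    return R, Es

A = np.array([[1., 2., 3., 4.],
              [3., 4., 5., 6.],
              [7., 8., 9., 10.]])
R, Es = rref_with_elementary_matrices(A)

rank = int(np.sum(np.any(np.abs(R) > 1e-9, axis=1)))   # number of non-zero rows of the RREF
print(rank, rank == np.linalg.matrix_rank(A))          # 2 True

# Undoing the recorded operations recovers A = E_1^{-1} ... E_a^{-1} A':
B = R
for E in reversed(Es):
    B = np.linalg.inv(E) @ B
print(np.allclose(B, A))                               # True
```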

Here’s another perspective on the equivalence of the dimension of the row and column spaces of a matrix. Observe first that if an \(m\times n\) matrix \(\mathbf{A}\) is the product of an \(m\times r\) matrix \(\mathbf{B}\) and an \(r\times n\) matrix \(\mathbf{C}\), \(\mathbf{A}=\mathbf{B}\mathbf{C}\), then the \(i\)th row of \(\mathbf{A}\) is just a linear combination of the \(r\) rows of \(\mathbf{C}\) with coefficients from the \(i\)th row of \(\mathbf{B}\),
\begin{align*}
\begin{pmatrix}
A_1^i&\dots&A^i_n
\end{pmatrix}&=B^i_1
\begin{pmatrix}
C^1_1&\dots&C^1_n
\end{pmatrix}\\
&+B^i_2
\begin{pmatrix}
C^2_1&\dots&C^2_n
\end{pmatrix}\\
&+\dots\\
&+B^i_r
\begin{pmatrix}
C^r_1&\dots&C^r_n
\end{pmatrix}.
\end{align*}
Similarly the \(j\)th column of \(\mathbf{A}\) is a linear combination of the \(r\) columns of \(\mathbf{B}\) with coefficients from the \(j\)th column of \(\mathbf{C}\),
\begin{equation*}
\begin{pmatrix}
A^1_j\\
\vdots\\
A^m_j
\end{pmatrix}=C^1_j
\begin{pmatrix}
B^1_1\\
\vdots\\
B^m_1
\end{pmatrix}
+C^2_j
\begin{pmatrix}
B^1_2\\
\vdots\\
B^m_2
\end{pmatrix}
+\dots
+C^r_j
\begin{pmatrix}
B^1_r\\
\vdots\\
B^m_r
\end{pmatrix}.
\end{equation*}
Now suppose that the dimension of the space spanned by the columns of an \(m\times n\) matrix is \(r\). This means that we can find \(r\) column vectors \(b_1,\dots,b_r\) in terms of which each column, \(a_i\), of \(A\) can be written as
\begin{equation*}
a_i=c_{1i}b_1+\dots+c_{ri}b_r.
\end{equation*}
But in so doing we have constructed a factorisation \(\mathbf{A}=\mathbf{B}\mathbf{C}\) of the kind just discussed, and as such we know that any row of \(\mathbf{A}\) can be expressed as a linear combination of the \(r\) rows of \(\mathbf{C}\), so the row space has dimension at most \(r\). Moreover, if the row space had dimension strictly less than \(r\) then, running the same reasoning starting from the rows, the column space would have dimension less than \(r\), contradicting our initial assumption. Thus we have established that the dimensions of the row and column spaces of an \(m\times n\) matrix \(\mathbf{A}\) are identical.
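
A randomly generated instance of the \(\mathbf{A}=\mathbf{B}\mathbf{C}\) factorisation (illustrative only, assuming numpy) makes the conclusion plain:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 5, 4, 2
B = rng.normal(size=(m, r))   # r columns
C = rng.normal(size=(r, n))   # r rows
A = B @ C                     # each column of A is a combination of the columns of B,
                              # each row of A a combination of the rows of C

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2: row rank = column rank
```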

Recall that we noted that the solution set of any system of homogeneous equations in \(n\) variables over a field \(K\) is a subspace of \(K^n\). What is the dimension of this subspace? Writing the system of equations in the form \(\mathbf{A}\mathbf{x}=0\) where \(\mathbf{A}\) is some \(m\times n\) matrix we can then proceed to carry out elementary row operations on \(\mathbf{A}\) to transform the system of equations into echelon form. In so doing we produce a bijection, indeed a vector space isomorphism, between the solution set, let’s call it \(S\), and \(K^{n-r}\) where \(r\) is the rank of \(A\), in other words the number of “steps” of the echelon form. Think of this as arising as follows: For any \(n\)-tuple \((x_1,\dots,x_n)\in S\) we discard the \(x_j\) where \(j\) runs through the column indices corresponding to the beginnings of “stairs”. We now have the necessary machinery to make good on our claim that the abstract theory of vector spaces illuminates the theory of solutions of simultaneous linear equations in \(n\) unknowns just as pictures of intersecting planes do in the case of 3 unknowns.

Theorem (Rouché-Capelli) A system of linear equations in \(n\) variables, \(\mathbf{A}\mathbf{x}=\mathbf{b}\), has a solution if and only if the rank of its coefficient matrix, \(\mathbf{A}\), is equal to the rank of its augmented matrix, \(\mathbf{A}\mathbf{|}\mathbf{b}\). If a solution exists and \(\rank(\mathbf{A})=n\) the solution is unique. If a solution exists and \(\rank(\mathbf{A}){<}n\) there are infinitely many solutions. In either of these cases the set of solutions is of the form
\begin{equation*}
\{\mathbf{s}+\mathbf{p}\mid\mathbf{s}\in S,\mathbf{A}\mathbf{p}=\mathbf{b}\}
\end{equation*}
where \(S=\{\mathbf{s}\in K^n\mid\mathbf{A}\mathbf{s}=0\}\) is an \((n-\rank(\mathbf{A}))\)-dimensional subspace of \(K^n\) and \(\mathbf{p}\) is a particular solution.

Proof We must have \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})\geq\rank(\mathbf{A})\). If \(\rank(\mathbf{A}){<}\rank(\mathbf{A}\mathbf{|}\mathbf{b})\) then the row echelon form of the augmented matrix would contain a row corresponding to an impossible equation implying no solutions for the system. Conversely, note that if \(\mathbf{A}\mathbf{x}=\mathbf{b}\) has no solution then \(\mathbf{b}\) does not belong to the column space of \(\mathbf{A}\) so that \(\rank(\mathbf{A}\mathbf{|}\mathbf{b})>\rank(\mathbf{A})\). Let us therefore assume \(\rank(\mathbf{A})=\rank(\mathbf{A}\mathbf{|}\mathbf{b})\). In the case that \(\rank(\mathbf{A})=n\) then \(S\) is the zero vector space. In this case the solution of the system is unique since if \(\mathbf{p}\) and \(\mathbf{p}’\) were two particular solutions of the original system then we must have \(\mathbf{p}-\mathbf{p}’\in S\), that is, \(\mathbf{p}=\mathbf{p}’\). More generally, as observed above, the dimension of \(S\) is \(n-\rank(\mathbf{A})\) and given any particular solution \(\mathbf{p}\) of \(\mathbf{A}\mathbf{x}=\mathbf{b}\), so too is \(\mathbf{p}+\mathbf{s}\) for any \(\mathbf{s}\in S\). Moreover, if \(\mathbf{p}’\) is any other solution then we must have \(\mathbf{p}’-\mathbf{p}\in S\), in other words, \(\mathbf{p}’=\mathbf{s}+\mathbf{p}\) for some \(\mathbf{s}\in S\).\(\blacksquare\)
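
Here is a small sketch of the theorem in action (an illustration, assuming numpy; the particular systems are made up for the purpose). It tests the rank condition, exhibits a solution set of the form \(\mathbf{p}+S\), and shows an inconsistent system failing the test.

```python
import numpy as np

def solvable(A, b):
    """Rouché–Capelli test: A x = b has a solution iff the two ranks agree."""
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

# Two independent equations in three unknowns: consistent, with n - rank = 1 free parameter.
A = np.array([[1., 2., 1.],
              [2., 4., 0.]])
b = np.array([3., 2.])
print(solvable(A, b))                        # True

p, *_ = np.linalg.lstsq(A, b, rcond=None)    # a particular solution
s = np.cross(A[0], A[1])                     # spans S = ker A here (dimension 3 - 2 = 1)
print(np.allclose(A @ p, b), np.allclose(A @ (p + 7.0 * s), b))   # True True

# Proportional left-hand sides with incompatible right-hand sides: no solutions.
A2 = np.array([[1., -1., 3.],
               [-2., 2., -6.]])
b2 = np.array([-1., 3.])
print(solvable(A2, b2))                      # False: rank(A2) = 1 < rank(A2|b2) = 2
```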

Change of Basis

Suppose \(V\) is an \(n\)-dimensional vector space with basis \(\{e_i\}\). Then we may identify \(v\in V\) with the column vector, \(\mathbf{v}\), of its components with respect to this basis,
\begin{equation*}
\mathbf{v}=
\begin{pmatrix}
v^1\\
\vdots\\
v^n
\end{pmatrix}.
\end{equation*}
If we have an alternative basis for \(V\), \(\{e’_i\}\), then with respect to this basis \(v\) will be represented by a different column vector, \(\mathbf{v}’\) say,
\begin{equation*}
\mathbf{v}’=
\begin{pmatrix}
v’^1\\
\vdots\\
v’^n
\end{pmatrix},
\end{equation*}
where \(v’^i\) are the components of \(v\) with respect to this alternative basis. But \(\{e_i\}\) and \(\{e’_i\}\) must be related via an invertible matrix \(\mathbf{P}\), which we’ll call the change of basis matrix, according to \(e’_i=P_i^je_j\). Then, since \(v=v’^ie’_i=v’^iP_i^je_j=v^je_j\) we have that
\(\mathbf{v}’=\mathbf{P}^{-1}\mathbf{v}\). We say that the components of a vector transform with the inverse of the change of basis matrix, that is they transform contravariantly 1.

Now suppose that in addition we have an \(m\) dimensional space \(W\) with bases \(\{f_i\}\) and \(\{f’_i\}\) related by a change of basis matrix \(\mathbf{Q}\) according to \(f’_i=Q_i^jf_j\). Let us consider how linear transformations and their matrix representations are affected by change of bases. If a linear transformation \(T\in\mathcal{L}(V,W)\) is represented with respect to the bases \(\{e_i\}\) and \(\{f_i\}\) by the matrix \(\mathbf{T}\) with components \(T_i^j\), then we consider its representation, \(\mathbf{T}’\) say, with respect to the alternative bases \(\{e’_i\}\) and \(\{f’_i\}\). Since any \(v\in V\) can be written either as \(v=v^ie_i=v^i{P^{-1}}_i^je’_j\) or \(v=v’^ie’_i\) and any \(w\in W\) either as \(w=w^if_i=w^i{Q^{-1}}_i^jf’_j\) or \(w=w’^if’_i\), we have
\begin{equation*}
w’^j={Q^{-1}}_i^jw^i={Q^{-1}}_i^jT_k^iv^k={Q^{-1}}_i^jT_k^iP_l^kv’^l
\end{equation*}
as well as \(w’^j={T’}_i^j v’^i\). So we must have,
\begin{equation*}
{T’}_i^j={Q^{-1}}_k^jT_l^kP_i^l.
\end{equation*}
That is, \(\mathbf{T}’=\mathbf{Q}^{-1}\mathbf{T}\mathbf{P}\). The matrices \(\mathbf{T}’\) and \(\mathbf{T}\) are said to be equivalent. Correspondingly, two linear transformations, \(T:V\mapto W\) and \(T’:V\mapto W\) are said to be equivalent if there exist automorphisms \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\) such that \(T’=Q^{-1}TP\).
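
The rule \(\mathbf{T}’=\mathbf{Q}^{-1}\mathbf{T}\mathbf{P}\) can be checked on random matrices; a minimal sketch (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
T = rng.normal(size=(m, n))        # matrix of T in the bases {e_i}, {f_i}
P = rng.normal(size=(n, n))        # change of basis in V
Q = rng.normal(size=(m, m))        # change of basis in W

T_new = np.linalg.solve(Q, T @ P)  # T' = Q^{-1} T P

# The same map computed in either pair of bases gives the same vector w = Tv:
v = rng.normal(size=n)
v_new = np.linalg.solve(P, v)      # v' = P^{-1} v
w = T @ v
print(np.allclose(T_new @ v_new, np.linalg.solve(Q, w)))   # True: w' = Q^{-1} w
```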

Lemma If \(T:V\mapto W\) is a linear transformation and \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\) then
\begin{equation}
\dim\ker T=\dim\ker QTP \qquad\text{and}\qquad \dim\img T=\dim\img QTP. \label{equ:rank conservation}
\end{equation}

Proof \(P\) induces an isomorphism \(\ker T\cong\ker QTP\), as suppose \(u\in\ker QTP\), then \(QTPu=0\iff Q(TPu)=0\iff TPu=0\). So the restriction of \(P\) to \(\ker QTP\) maps \(\ker QTP\) to \(\ker T\). Since \(P\) is invertible this is an isomorphism. Similarly, \(Q\) can be seen to induce an isomorphism \(\img T\cong\img QTP\), which follows in any case from the isomorphism of kernels by the rank-nullity theorem. \(\blacksquare\)

This result tells us, in particular, that equivalent linear transformations share the same rank. For an \(m\times n\) matrix \(\mathbf{A}\), the rank, \(\rank(\mathbf{A})\), is defined to be that of the corresponding linear transformation, \(L_\mathbf{A}\), that is \(\rank(\mathbf{A})=\rank(L_\mathbf{A})\). Now we may regard \(L_\mathbf{A}\) as a map \(L_\mathbf{A}:K^n\mapto K^m\) and taking \(\{\mathbf{e}_i\}\) to be the standard basis of \(K^n\), then \(L_\mathbf{A}\mathbf{e}_1,L_\mathbf{A}\mathbf{e}_2,\dots,L_\mathbf{A}\mathbf{e}_n\) span \(\img L_\mathbf{A}\). But \(L_\mathbf{A}\mathbf{e}_i\) is simply the \(i\)th column of \(\mathbf{A}\) so we see that the rank of the matrix \(\mathbf{A}\) is just the dimension of the space spanned by its columns. What is the dimension of the row space?

Let us denote by \(\{k_i\}\), \(1\leq i\leq d\), a basis of \(\ker L_\mathbf{A}\), and extend this to a basis, \(e_1,\dots,e_{n-d},k_1,\dots,k_d\) of \(K^n\). Then, as we already saw in the proof of the rank-nullity theorem, the elements \(f_i=L_\mathbf{A}e_i\in K^m\), \(1\leq i\leq n-d\), are linearly independent and so can be extended to a basis \(f_1,\dots,f_m\) of \(K^m\). So we have constructed new bases for \(K^n\) and \(K^m\) respectively with respect to which the matrix representation of \(L_\mathbf{A}\) has the particularly simple form
\begin{equation}\tilde{\mathbf{A}}=
\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}\label{equ:fully reduced matrix}
\end{equation}
where \(\mathbf{I}_k\) denotes the \(k\times k\) identity matrix and \(\mathbf{0}_{k,l}\) the \(k\times l\) zero matrix. The rank of this matrix is of course \(n-d\), simply the number of \(1\)s. From \eqref{equ:rank conservation}, we know that equivalent linear transformations have isomorphic kernels and images, so equivalent matrices have the same rank. Moreover, the change of basis just described shows that any matrix \(\mathbf{A}\) may be factorised as
\begin{equation}
\mathbf{A}=\mathbf{Q}\begin{pmatrix}
\mathbf{I}_{n-d}&\mathbf{0}_{n-d,d}\\
\mathbf{0}_{m-n+d,n-d}&\mathbf{0}_{m-n+d,d}
\end{pmatrix}
\mathbf{P}^{-1}=\mathbf{Q}\tilde{\mathbf{A}}\mathbf{P}^{-1},\label{matrix factorization}
\end{equation}
and if it is equivalent to another matrix of the same form as \(\tilde{\mathbf{A}}\), say \(\mathbf{B}\), then \(\mathbf{B}=\tilde{\mathbf{A}}\). Given this factorisation, it is clear that \(\rank\mathbf{A}^\mathsf{T}=\rank\mathbf{A}\) from which it follows that the dimensions of the row and column spaces of any matrix are equal.

To summarise, for any pair of vector spaces \(V\) and \(W\), of dimensions \(n\) and \(m\) respectively, the linear transformations, \(\mathcal{L}(V,W)\), are determined, up to change of the respective bases, by their rank which is bounded above by \(\min(n,m)\).

A nice way to restate this conclusion is in the language of group actions and their orbits. Recall that if \(G\) is a group and \(X\) some set then an action of \(G\) on \(X\) is a (group) homomorphism between \(G\) and \(S_X\), the group of all permutations of the elements of \(X\). Now \(\text{GL}(V)\times\text{GL}(W)\), with the obvious group structure, has an action on the space \(\mathcal{L}(V, W)\) defined by \((Q,P)T=QTP^{-1}\) for \(P\in\text{GL}(V)\) and \(Q\in\text{GL}(W)\). Recall also that the action of a group on a set \(X\) partitions \(X\) into orbits, where the orbit of some \(x\in X\) is defined to be the subset, \(\{gx\mid g\in G\}\), of \(X\). In our case we see that orbits of the action of \(\text{GL}(V)\times\text{GL}(W)\) on \(\mathcal{L}(V, W)\) are precisely the sets of linear transformations of a given rank. The headline, as it were, is therefore the following.

The orbits of the action of \(\text{GL}(V)\times\text{GL}(W)\) on \(\mathcal{L}(V, W)\) are the sets of elements of \(\mathcal{L}(V,W)\) of a given rank and thus are in bijection with the set \(\{d\mid 0\leq d\leq \min(\dim V,\dim W)\}\).

Notes:

  1. If we assemble the old and new basis vectors into row vectors \(\mathbf{e}\) and \(\mathbf{e}’\) respectively then \(\mathbf{e}’=\mathbf{e}\mathbf{P}\) but \(\mathbf{v}’=\mathbf{P}^{-1}\mathbf{v}\).

Linear Transformations and Matrices

Considering structure preserving maps between vector spaces leads to the following definition.

Definition A linear transformation is any map \(T:V\mapto W\) between vector spaces \(V\) and \(W\) which preserves vector addition and scalar multiplication, \(T(au+bv)=aTu+bTv\), where \(a,b\in K\) and \(u,v\in V\). We’ll call a linear transformation from a vector space to itself, \(T:V\mapto V\), a linear operator on \(V\).

The kernel of such a linear transformation is \(\ker T=\{v\in V\mid Tv=0\}\) and the image is \(\img T=\{w\in W\mid w=Tv\text{ for some }v\in V\}\). The linearity of \(T\) means they are vector subspaces of \(V\) and \(W\) respectively. The dimension of \(\ker T\) is called the nullity of \(T\) while that of the image of \(T\) is the rank of \(T\), \(\rank(T)\). They are related to the dimension of the vector space \(V\) as follows.

Theorem (Rank-nullity theorem) If \(T:V\mapto W\) is a linear transformation and \(V\) is finite dimensional then,\begin{equation}
\dim \ker T +\dim \img T =\dim V .\label{equ:dimension equation}
\end{equation}

Proof  Take \(\{k_i\}\), \(1\leq i\leq r\), to be a basis of \(\ker T\). We know that we can extend this to a basis of \(V\), \(\{k_1,\dots,k_r,h_1,\dots,h_s\}\). Consider then the set \(\{h’_i\}\), \(1\leq i\leq s\), the elements of which are defined according to \(Th_i=h’_i\). Any element of \(\img T\) is of the form \(T(c^1k_1+\dots+c^rk_r+d^1h_1+\dots+d^sh_s)= d^1h’_1+\dots+d^sh’_s\) so \(\Span(h’_1,\dots,h’_s)=\img T\). Furthermore, suppose we could find \(c^i\), not all zero, such that \(c^1h’_1+\dots+c^sh’_s=0\). Then we would have \(T(c^1h_1+\dots+c^sh_s)=0\), that is, \(c^1h_1+\dots+c^sh_s\in\ker T\), but this would contradict the linear independence of the basis of \(V\). Thus \(\{h’_i\}\) for \(1\leq i\leq s\) is a basis for \(\img T\) and the result follows.\(\blacksquare\)

It is of course the case that \(T\) is one-to-one (injective) if and only if \(\ker T=\{0\}\) and is onto (surjective) if and only if \(\img T =W\). This theorem thus tells us that if \(V\) and \(W\) have the same dimension then \(T\) is one-to-one if and only if it is onto.
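
Numerically, a basis of the kernel can be extracted (for instance from the singular value decomposition, used here purely as a convenient tool, assuming numpy) and the dimensions compared, as in the following sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 5))   # a map R^5 -> R^4 of rank (at most) 2

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))        # dim img T
kernel_basis = Vt[rank:].T         # the remaining right singular vectors span ker T

print(np.allclose(A @ kernel_basis, 0))                  # True: these vectors lie in ker T
print(rank, kernel_basis.shape[1], rank + kernel_basis.shape[1] == A.shape[1])   # 2 3 True
```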

\(T\) is said to be invertible if there exists a linear transformation \(S:W\mapto V\) such that \(TS=\id_W\), the identity operator on \(W\), and \(ST=\id_V\), the identity operator on \(V\). In this case \(S\) is called the (there can only be one) inverse of \(T\) and denoted \(T^{-1}\). \(T\) is invertible if and only if it is both one-to-one and onto, in which case we call it an isomorphism. Notice that one-to-one and onto are equivalent respectively to \(T\) having a left and right inverse. Indeed, if \(T\) has a left inverse, \(S:W\mapto V\) such that \(ST=\id_V\), then for any \(v,v’\in V\), \(Tv=Tv’\Rightarrow STv=STv’\Rightarrow v=v’\). Conversely, if \(T\) is one-to-one then we can define a map \(S:W\mapto V\) by setting \(Sw=v\) whenever \(w=Tv\in\img T\) (and extending it in any convenient way, say as zero, on a complement of \(\img T\)). \(T\) being one-to-one, this map is well-defined (single-valued). If \(T\) has a right inverse such that \(TS=\id_W\) then for any \(w\in W\) we can write \(w=\id_W w=TSw\) so \(T\) is certainly onto. Conversely, if \(T\) is onto then the existence of a right inverse is equivalent to the axiom of choice.

It is not difficult to see that two finite dimensional vector spaces \(V\) and \(W\) are isomorphic if and only if \(\dim(V)=\dim(W)\) and a linear transformation \(T:V\mapto W\) between two such spaces is an isomorphism if and only if \(\rank(T)=n\) where \(n\) is the common dimension. In other words, we have the following characterisation.

Finite dimensional vector spaces are completely classified in terms of their dimension.

Indeed, any \(n\)-dimensional vector space over \(K\) is isomorphic to \(K^n\), the space of all \(n\)-tuples of elements of the field \(K\). Explicitly, given a basis \(\{e_i\}\) of a vector space \(V\), this isomorphism identifies a vector \(v\) with the column vector \(\mathbf{v}\) of its components with respect to that basis,\begin{equation*}
v=v^ie_i\longleftrightarrow
\begin{pmatrix}
v^1\\
\vdots\\
v^n
\end{pmatrix}.
\end{equation*}

Clearly, a linear transformation \(T:V\mapto W\) is uniquely specified by its action on basis elements. If bases \(\{e_i\}\) and \(\{f_i\}\) are chosen for the respectively \(n\) and \(m\) dimensional vector spaces \(V\) and \(W\), then we can write any vector \(v\in V\) as \(v=v^ie_i\). For any such \(v\in V\) there is an element \(w\in W\) such that \(Tv=w\) and of course we can write it as \(w=w^if_i\). But there must also exist numbers \(T_i^j\) such that \(Te_i=T_i^jf_j\) so we have \(Tv=v^iT_i^jf_j=w^if_i=w\) which can be summarised in terms of matrices as
\begin{equation}
\begin{pmatrix}
w^1\\
\vdots\\
w^m
\end{pmatrix}=\begin{pmatrix}
T_1^1&\dots&T_n^1\\
\vdots&\ddots&\vdots\\
T_1^m&\dots&T_n^m
\end{pmatrix}\begin{pmatrix}
v^1\\
\vdots\\
v^n
\end{pmatrix}.
\end{equation}
That is, \(\mathbf{w}=\mathbf{T}\mathbf{v}\) is the matrix version of \(w=Tv\). The matrix \(\mathbf{T}\) is called the matrix representation of the linear transformation \(T\), and addition, scalar multiplication and composition of linear transformations correspond respectively to matrix addition, multiplication of a matrix by a scalar and matrix multiplication.
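
As a worked example of building such a matrix representation (the differentiation operator is not discussed in the text and is used here only as a familiar illustration, assuming numpy), take the space of polynomials of degree at most 3 with basis \(\{1,x,x^2,x^3\}\).

```python
import numpy as np

# Matrix of the differentiation operator D on the polynomials of degree at most 3,
# with respect to the basis {1, x, x^2, x^3}: the k-th column holds the components
# of D applied to the k-th basis vector, since D(x^k) = k x^{k-1}.
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.],
              [0., 0., 0., 0.]])

p = np.array([5., 0., -1., 2.])   # components of 5 - x^2 + 2x^3
print(D @ p)                      # [ 0. -2.  6.  0.], the components of -2x + 6x^2
```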

Conversely, given a choice of bases \(\{e_i\}\) and \(\{f_i\}\) for vector spaces \(V\) and \(W\) of dimensions \(n\) and \(m\) respectively, any \(m\times n\) matrix \(\mathbf{A}\) gives rise to a linear transformation \(L_\mathbf{A}:V\mapto W\) defined by \(L_\mathbf{A}v=L_\mathbf{A}(v^ie_i)=A_i^jv^if_j\) for all \(v\in V\). Of course, having chosen bases for \(V\) and \(W\) we also have isomorphisms \(V\cong K^n\) and \(W\cong K^m\) so the following diagram commutes:
\begin{equation}
\begin{CD}
K^n @>\mathbf{A}>> K^m\\
@VV\cong V @VV\cong V\\
V @>L_{\mathbf{A}}>> W
\end{CD}
\end{equation}

Denoting by, \(\mathcal{L}(V,W)\), the set of linear transformations between vector spaces \(V\) and \(W\), it is clear that \(\mathcal{L}(V,W)\) is a vector space and we may summarise the preceding discussion in the following theorem.

Theorem A choice of bases for vector spaces \(V\) and \(W\), of dimensions \(n\) and \(m\) respectively, defines a vector space isomorphism \(\mathcal{L}(V,W)\cong\text{Mat}_{m,n}(K)\).

A consequence of this is that,
\begin{equation}
\dim\mathcal{L}(V,W)=\dim\text{Mat}_{m,n}(K)=nm=\dim V\dim W.
\end{equation}

A linear operator \(T:V\mapto V\) is called an automorphism if it is an isomorphism. The set of all linear operators on a vector space \(V\) is denoted \(\mathcal{L}(V)\) and is of course a vector space in its own right. The automorphisms of a vector space \(V\), denoted \(\text{GL}(V)\), form a group called the general linear group of \(V\). If \(T\in\text{GL}(V)\) and \(\{e_i\}\) is some basis of \(V\) then clearly \(\{Te_i\}\) is also a basis, identical to the original if and only if \(T=\id_V\) and conversely if \(\{e’_i\}\) is some other basis of \(V\) then the linear operator \(T\) defined by \(Te_i=e’_i\) is an isomorphism.

The invertibility of a linear transformation \(T\in\mathcal{L}(V,W)\) is equivalent to the invertibility, once a basis is chosen, of the matrix representation of the transformation. Indeed the invertibility of any matrix \(\mathbf{A}\) is equivalent to the invertibility of the corresponding linear transformation \(L_\mathbf{A}\) (which in turn means an invertible matrix must be square and of rank \(n=\dim V\)). We denote by \(\text{GL}_n(K)\) the group of automorphisms of \(K^n\), that is, the group of invertible \(n\times n\) matrices over \(K\). It is not difficult to see that the isomorphism \(V\cong K^n\) induces an isomorphism \(\text{GL}(V)\cong\text{GL}_n(K)\).

 

Basic Definitions and Examples

At school we learn that ‘space’ is 3-dimensional. We can specify its points in terms of coordinates \((x,y,z)\) and we think of ‘vectors’ as arrows from one point to another. For example, in the diagram below,
[Figure: points \(P\), \(Q\) and \(S\) with the vectors \(\mathbf{OP}\), \(\mathbf{OQ}\), \(\mathbf{OS}=\mathbf{OP}+\mathbf{OQ}\) and \(\mathbf{PQ}=\mathbf{OQ}-\mathbf{OP}\)]
the points \(P\) and \(Q\) might be specified as \(P=(p_1,p_2,p_3)\) and \(Q=(q_1,q_2,q_3)\) respectively. The vectors \(\mathbf{OP}\) and \(\mathbf{OQ}\) are the arrows from the origin to the respective points and it’s typical to write their components as column vectors,
\begin{equation*}
\mathbf{OP}=\begin{pmatrix}p_1\\p_2\\p_3\end{pmatrix}\quad\text{and}\quad\mathbf{OQ}=\begin{pmatrix}q_1\\q_2\\q_3\end{pmatrix}.
\end{equation*}
We don’t distinguish between \(\mathbf{OP}\) and any other arrow of the same length and direction. We can think of the vectors emanating from the origin as the representatives of an equivalence class of arrows of the same length and orientation positioned anywhere in space. In particular, the arrow from \(Q\) to \(S\) which we get by transporting \(\mathbf{OP}\), keeping its length and orientation the same so that its ‘tail’ meets the ‘head’ of \(\mathbf{OQ}\), belongs to the same equivalence class as \(\mathbf{OP}\). In fact what we obtain in this way is the geometric construction of \(\mathbf{OS}\) as the sum of \(\mathbf{OP}\) and \(\mathbf{OQ}\). Similarly, \(\mathbf{PQ}\) is equivalently taken to be the arrow from \(P\) to \(Q\) or, as in the diagram, the vector from the origin to the point reached by joining the tail of \(-\mathbf{OP}\), the vector of the same length as \(\mathbf{OP}\) but opposite direction, to the head of \(\mathbf{OQ}\), that is, \(\mathbf{PQ}=\mathbf{OQ}-\mathbf{OP}\). Generally, given any vector \(\mathbf{v}\) and any real number \(a\), \(a\mathbf{v}\) is another vector \(\abs{a}\) times as long as \(\mathbf{v}\), pointing in the same direction when \(a\) is positive and in the opposite direction when \(a\) is negative. The algebraic structure we have here is perhaps the most familiar example of the abstract notion of a vector space.

Vectors in space arise of course in physics as the mathematical representation of physical quantities such as force or velocity. But this geometric setting also clarifies the discussion of the solution of simultaneous linear equations in three unknowns.

Recall that to specify a plane in space we need a point \(P=(p_1,p_2,p_3)\) and a normal vector
\begin{equation*}
\mathbf{n}=\begin{pmatrix}n_1\\n_2\\n_3\end{pmatrix}.
\end{equation*}
The plane is then the set of points \(X=(x,y,z)\) such that the scalar product of the vector
\begin{equation*}
\mathbf{PX}=\begin{pmatrix}x-p_1\\y-p_2\\z-p_3\end{pmatrix},
\end{equation*}
between our chosen point \(P\) and \(X\), with the normal vector, \(\mathbf{n}\), is zero, that is, \(\mathbf{PX}\cdot\mathbf{n}=0\). This is equivalent to the equation
\begin{equation*}
n_1x+n_2y+n_3z=c
\end{equation*}
where \(c=\mathbf{OP}\cdot\mathbf{n}\), a single linear equation in 3 unknowns for which there are an infinite number of solutions, namely, all the points of the plane. This solution ‘subspace’ is clearly a 2-dimensional space within the ambient 3-dimensional space. Now consider a pair of such equations. Assuming they aren’t simply constant multiples of one another, there are two possibilities. In the case that the equations correspond to a pair of parallel planes there are no solutions. For example the pair
\begin{align*}
x-y+3z&=-1\\
-2x+2y-6z&=3
\end{align*}
corresponds geometrically to,
[Figure: two parallel planes]
The other possibility is that the equations correspond to a pair of intersecting planes in which case there are an infinite number of solutions corresponding to all points on the line of intersection. For example the pair
\begin{align*}
x-y+3z&=-1\\
2x-y-z&=1
\end{align*}
correspond geometrically to
[Figure: two intersecting planes]
The line of intersection here, found by solving the pair of equations, may be expressed as
\begin{equation*}
\begin{pmatrix}x\\y\\z\end{pmatrix}=\lambda\begin{pmatrix}4\\7\\1\end{pmatrix}+\begin{pmatrix}2\\3\\0\end{pmatrix}.
\end{equation*}
Its direction vector could have been found as the cross product of the respective normal vectors — equivalent to solving the homogeneous system,
\begin{align*}
x-y+3z&=0\\
2x-y-z&=0,
\end{align*}
with the triple \((2,3,0)\) a particular solution of the inhomogeneous system. In dimensions higher than 3, that is, for systems of linear equations involving more than 3 variables, we can no longer think in terms of planes intersecting in space but the abstract vector space setting continues to provide illumination.
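
For the record, the calculation above is easily reproduced (assuming numpy): the cross product of the normals gives the direction of the line, and the quoted particular solution can be checked directly.

```python
import numpy as np

n1 = np.array([1., -1., 3.])    # normal to x - y + 3z = -1
n2 = np.array([2., -1., -1.])   # normal to 2x - y - z = 1
b = np.array([-1., 1.])

direction = np.cross(n1, n2)    # direction of the line of intersection
print(direction)                # [4. 7. 1.]

A = np.vstack([n1, n2])
p = np.array([2., 3., 0.])      # the particular solution quoted above
print(np.allclose(A @ p, b), np.allclose(A @ (p + 5.0 * direction), b))   # True True
```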

Definition A vector space \(V\) over a field 1 \(K\) (from the point of view of physical applications \(\mathbb{R}\) or \(\mathbb{C}\) will be most relevant), the elements of which will be referred to as scalars or numbers, is a set in which two operations, addition and multiplication by an element of \(K\), are defined. The elements of \(V\), called vectors, satisfy:

  • \(u+v=v+u\)
  • \((u+v)+w=u+(v+w)\)
  • There exists a zero vector \(0\) such that \(v+0=v\)
  • For any \(u\), there exists \(-u\), such that \(u+(-u)=0\)

Thus \((V,+)\) is an abelian group, and is further equipped with a scalar multiplication satisfying:

  • \(c(u+v)=cu+cv\)
  • \((c+d)u=cu+du\)
  • \((cd)u=c(du)\)
  • \(1u=u\)

where \(u,v,w\in V\), \(c,d\in K\) and 1 is the unit element of \(K\).

Example The canonical example is the space of \(n\)-tuples, \(x=(x^1,\dots,x^n)\), \(x^i\in K\), denoted \(K^n\). Its vector space structure is given by \(x+y=(x^1+y^1,\dots,x^n+y^n)\) and \(ax=(ax^1,\dots,ax^n)\).

Example The polynomials of degree at most \(n\) over \(\mathbb{R}\), \(P_n\), form a real vector space. In this case, typical vectors would be \(p=a_nx^n+a_{n-1}x^{n-1}+\dots+a_1x+a_0\) and \(q=b_nx^n+b_{n-1}x^{n-1}+\dots+b_1x+b_0\) with vector space structure given by \(p+q=(a_n+b_n)x^n+\dots+(a_0+b_0)\) and \(cp=ca_nx^n+\dots+ca_0\). More generally we have \(F[x]\), the space of all polynomials in \(x\) with coefficients from the field \(F\).

Example Continuous real-valued functions of a single variable, \(C(\RR)\), form a vector space with the natural vector addition and scalar multiplication.

Example The \(m\times n\) matrices over \(K\), \(\text{Mat}_{m,n}(K)\), form a vector space with the usual matrix addition and scalar multiplication. We denote by \(\text{Mat}_n(K)\) the vector space of \(n\times n\) matrices.

Definition A subspace, \(U\), of a vector space \(V\), is a non-empty subset of \(V\) which is closed under vector addition and scalar multiplication.

Example A plane through the origin is a subspace of \(\RR^3\). Note though that any plane which does not contain the origin cannot be a subspace since it does not contain the zero vector.

Example The solution set of any system of homogeneous linear equations in \(n\) variables over the field \(K\) is a subspace of \(K^n\). Incidentally, it will be useful to note here that any system of homogeneous linear equations has at least one solution, namely the zero vector, and also that an underdetermined system (fewer equations than variables) of homogeneous linear equations has infinitely many solutions. A simple induction argument on the number of variables establishes the latter. Suppose \(x_n\) is the \(n\)th variable. First we deal with the special case that in each equation the coefficient of \(x_n\) is zero. In this case, \(x_n\) can take any value, and we may set all other variables to zero. We thus have an infinite number of solutions. If, however, one or more equations have non-zero coefficients of \(x_n\) then choose one of them and use it to obtain an expression for \(x_n\) in terms of the other variables. We then use this expression twice. First, we eliminate \(x_n\) from all other equations to arrive at an underdetermined homogeneous system in \(n-1\) variables. By the induction hypothesis, this has infinitely many solutions. Then, to each such solution, we use the expression for \(x_n\) to obtain a solution of the original system.
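
As a small numerical illustration of the underdetermined case (assuming numpy, and using the singular value decomposition merely as a convenient way of producing a kernel vector), consider two homogeneous equations in four unknowns.

```python
import numpy as np

# An underdetermined homogeneous system: 2 equations in 4 unknowns.
A = np.array([[1., 2., 0., -1.],
              [0., 1., 1.,  2.]])

# Its solution set is ker A, of dimension n - rank(A) = 2 > 0, so there are
# infinitely many solutions; the SVD provides a non-zero one explicitly:
_, s, Vt = np.linalg.svd(A)
x = Vt[-1]                      # a right singular vector orthogonal to the row space
print(np.allclose(A @ x, 0), not np.allclose(x, 0))   # True True
```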

A set of \(n\) vectors \(\{e_i\}\) in a vector space \(V\) is linearly dependent if there exist numbers \(c^i\), not all zero, such that \(c^1e_1+c^2e_2+\dots+c^ne_n=0\). They are linearly independent if they are not linearly dependent. The span of a set of vectors \(S\) in a vector space \(V\), \(\Span(S)\), is the set of all linear combinations of elements in \(S\).

Definition A set of vectors \(S\) is a basis of \(V\) if it spans \(V\), \(\Span(S)=V\), and is also linearly independent.

Throughout the Linear Algebra section of the Library, vector spaces will be assumed to be finite dimensional, that is, spaces \(V\) in which there exists a finite set \(S\) such that \(\Span(S)=V\). In this case it is not difficult to see that \(S\) must have a subset which is a basis of \(V\). In particular, any finite dimensional vector space has a basis.

The following fact will be used repeatedly in what follows.

Theorem Any linearly independent set of vectors, \(e_1,\dots,e_r\), in \(V\) can be extended to a basis of \(V\).

Proof For \(v\in V\), \(v\notin\Span(e_1,\dots,e_r)\) if and only if \(e_1,\dots,e_r,v\) are linearly independent (the ‘if’ follows since \(v\in\Span(e_1,\dots,e_r)\) implies that \(e_1,\dots,e_r,v\) are linearly dependent, the ‘only if’ since if \(v\notin\Span(e_1,\dots,e_r)\) and we had numbers \(c^i,b\) not all zero such that \(c^1e_1+\dots+c^re_r+bv=0\) then with \(b=0\) we contradict the linear independence of the \(e_i\) while with \(b\neq 0\) we contradict \(v\notin\Span(e_1,\dots,e_r)\)). Now, since \(V\) is finite dimensional, there is a spanning set \(S=\{f_1,\dots,f_d\}\) and if each \(f_i\in\Span(e_1,\dots,e_r)\) then \(e_1,\dots,e_r\) is already a basis. On the other hand if \(f_i\notin\Span(e_1,\dots,e_r)\) then we’ve seen that \(e_1,\dots,e_r,f_i\) is linearly independent and so considering each \(f_i\) in turn we may construct a basis for \(V\).\(\blacksquare\)

Theorem If a vector space \(V\) contains a finite basis which consists of \(n\) elements then any basis of \(V\) must consist of exactly \(n\) elements.

Proof We first establish that if we have \(V=\Span(e_1,\dots,e_m)\), with the \(e_i\) linearly independent, then for any linearly independent set of vectors, \(\{f_1,\dots,f_n\}\), \(m\geq n\). One way to see this is by observing that by assumption we can express each \(f_i\) as a linear combination of \(e_i\)s, \(f_i=\sum_{j=1}^mA_i^je_j\) for some \(A_i^j\in K\). Now the linear independence of the \(f_i\)s means that if we have numbers \(c^i\) such that \(\sum_{i=1}^nc^if_i=0\) then \(c^i=0\) for all \(i\). But \(\sum_{i=1}^nc^if_i=\sum_{i=1}^n\sum_{j=1}^mc^iA_i^je_j\), with the coefficient of \(e_j\) being the \(j\)th element of the \(m\times 1\) column vector \(\mathbf{A}\mathbf{c}\) (\(\mathbf{A}\) is here the \(m\times n\) matrix with \(A^i_j\) in the \(i\)th row and \(j\)th column and \(\mathbf{c}\) the \(n\times 1\) column vector with elements \(c^i\)) and if \(m{<}n\), then we know from the discussion in the Example above (an underdetermined homogeneous system has a non-zero solution) that there exists a \(\mathbf{c}\neq 0\) such that \(\mathbf{A}\mathbf{c}=0\). This contradicts the linear independence of the \(f_i\) and so we must indeed have \(m\geq n\). This result may now be applied to any pair of bases to establish the uniqueness of the dimension.\(\blacksquare\)

This allows us to define the dimension, \(n\), of a vector space \(V\) as the number of vectors in any basis of \(V\).

From now on we will, unless stated otherwise, employ the summation convention. That is, if in any term an index appears both as a subscript and a superscript then it is assumed to be summed over from 1 to \(n\) where \(n\) is the dimension of the space. Thus if \(\{e_i\}\) is a basis for \(V\) then any \(v\in V\) can be expressed uniquely as \(v=v^ie_i\). The numbers \(v^i\) are then called the components of \(v\) with respect to the basis \(\{e_i\}\).

Example The vector space \(K^n\) of \(n\)-tuples has a basis, which could reasonably be called ‘standard’, given by the vectors \(e_i=(0,\dots,1,\dots,0)\) with the \(1\) in the \(i\)th place. So in this special basis the components of a vector \(x=(x^1,\dots,x^n)\) are precisely the \(x^i\). It is common to take the elements of \(K^n\) to be \(n\)-dimensional column vectors and to denote vectors using bold face, as in,
\begin{equation}
\mathbf{x}=\begin{pmatrix}
x^1\\
\vdots\\
x^n
\end{pmatrix},
\end{equation}
with the standard basis vectors, \(\{\mathbf{e}_i\}\).

 

Notes:

  1. Recall that a ring is an (additive) abelian group with a multiplication operation which is associative and distributes over addition. A field is a ring such that the multiplication also satisfies all the group properties (after throwing out the additive identity); i.e. it has multiplicative inverses, a multiplicative identity, and is commutative.