
The Tensor Algebra

Recall that an algebra, \(A\), over \(K\) is a vector space over \(K\) together with a multiplication operation \(A\times A\mapto A\) which is bilinear. In this section we will use the tensor product to construct the `universal’ associative algebra having an identity.

Definition A tensor of type \((r,s)\) is an element of the tensor product space \(T^r_s(V)\) defined as
\begin{equation}
T^r_s(V)=\underbrace{V\otimes\dots\otimes V}_r\otimes\underbrace{V^*\otimes\dots\otimes V^*}_s.
\end{equation}
Here \(r\) is called the contravariant rank and \(s\) the covariant rank. In this context a \((0,0)\) tensor is an element of the base field \(K\), called simply a rank \(0\) tensor.

Recall that we have the following isomorphisms,
\begin{equation}
V_1^*\otimes\dots\otimes V_k^*\cong(V_1\otimes\dots\otimes V_k)^*\cong\mathcal{L}(V_1,\dots,V_k;K),
\end{equation}
so that tensors of type \((r,s)\) may be identified with multilinear functions,
\begin{equation}
f:\underbrace{V^*\times\dots\times V^*}_r\times\underbrace{V\times\dots\times V}_s\mapto K.
\end{equation}
A rank \(0\) tensor is just a scalar, the corresponding map being simply multiplication by that scalar.

If we have another multilinear function,
\begin{equation*}
g:\underbrace{V^*\times\dots\times V^*}_p\times\underbrace{V\times\dots\times V}_q\mapto K,
\end{equation*}
which may, of course, be identified with a tensor of type \((p,q)\), then we can define a new multilinear function such that,
\begin{equation*}
(\alpha_1,\dots,\alpha_{r+p},v_1,\dots,v_{s+q})\mapsto f(\alpha_1,\dots,\alpha_r,v_1,\dots,v_s)g(\alpha_{r+1},\dots,\alpha_{r+p},v_{s+1},\dots,v_{s+q}),
\end{equation*}
which could be identified with a tensor of type \((r+p,s+q)\). We have thus multiplied, via their respective identifications with multilinear maps, a tensor of type \((r,s)\) with a tensor of type \((p,q)\), to obtain a tensor of type \((r+p,s+q)\). The result, viewed as a multilinear map, is therefore called the tensor product, \(f\otimes g\), of the multilinear maps \(f\) and \(g\).

So defined, it is clear that this multiplication is bilinear, in the sense that,
\begin{equation}
(af_1+bf_2)\otimes g=af_1\otimes g +bf_2\otimes g,
\end{equation}
and,
\begin{equation}
f\otimes(ag_1+bg_2) =af\otimes g_1 +bf\otimes g_2,
\end{equation}
and is associative, though not necessarily commutative. It provides a multiplication on the space,
\begin{equation}
\mathcal{T}(V;V^*)=\bigoplus_{r,s=0}^\infty T^r_s(V),
\end{equation}
such that it becomes an algebra (here we understand \(T_0^0=K\), \(T_0^1=V\) and \(T_1^0=V^*\)). This is called the tensor algebra, the name also given to the particular case \(\mathcal{T}(V)\), defined as
\begin{equation}
\mathcal{T}(V)=\bigoplus_{r=0}^\infty T^r(V)=K\oplus V\oplus (V\otimes V)\oplus\cdots ,
\end{equation}
equipped with the same multiplication (here \(T^r(V)=T_0^r(V)\)). In fact, in this slightly simpler setting, we'll introduce the multiplication directly, without going via the identification of tensors with multilinear maps. Thus, we define the multiplication of an element of \(T^r\) with an element of \(T^s\) using the isomorphism,
\begin{equation}
(\underbrace{V\otimes\dots\otimes V}_r)\otimes(\underbrace{V\otimes\dots\otimes V}_s)\cong\underbrace{V\otimes\dots\otimes V}_{r+s}.
\end{equation}
Composing the canonical bilinear map \(T^r\times T^s\mapto T^r\otimes T^s\) with this isomorphism provides a bilinear multiplication, \(\otimes:T^r\times T^s\mapto T^{r+s}\) (in the more general setting of \(\mathcal{T}(V;V^*)\) the only complication is that we'd need to use isomorphisms involving permutations). Equipped with this multiplication \(\mathcal{T}(V)\) is called the tensor algebra of \(V\). The tensor algebra, or rather the pair \((\mathcal{T}(V),\iota)\) where \(\iota:V\mapto T^1(V)\) is the obvious inclusion, has a universal mapping property. That is, whenever \(f:V\mapto A\) is a linear map from \(V\) into an associative algebra \(A\) with an identity, there exists a unique associative algebra homomorphism, \(F:\mathcal{T}(V)\mapto A\), with \(F(1)=1\), such that the following diagram commutes.

[commutative diagram]

Here the uniqueness of \(F\) follows since \(\mathcal{T}(V)\) is generated as an algebra by \(1\) and \(V\). Given that \(F(v)=f(v)\) on elements \(v\in V\), \(F\) is determined on the whole of \(\mathcal{T}(V)\) by \(F(v_1\otimes\dots\otimes v_r)=f(v_1)\cdots f(v_r)\).
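Though nothing in the formal development depends on it, the graded multiplication is easy to play with numerically. Here is a minimal numpy sketch (an illustration only: coordinate arrays relative to a chosen basis stand in for tensors, so the product of a rank-\(r\) and a rank-\(s\) tensor is the outer product of an \(r\)-dimensional and an \(s\)-dimensional array; the helper name tensor_product is ours, not standard):

```python
import numpy as np

def tensor_product(S, T):
    """Product T^r x T^s -> T^(r+s): the outer product of coordinate arrays."""
    return np.tensordot(S, T, axes=0)

v = np.array([1.0, 2.0])        # a rank 1 tensor, an element of V = R^2
w = np.array([3.0, -1.0])

vw = tensor_product(v, w)       # rank 2: v (x) w
vwv = tensor_product(vw, v)     # rank 3: (v (x) w) (x) v

# Associativity: (v (x) w) (x) v = v (x) (w (x) v).
assert np.allclose(vwv, tensor_product(v, tensor_product(w, v)))
print(vw.shape, vwv.shape)      # (2, 2) (2, 2, 2)
```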

Isomorphisms

In this section the universal mapping property is used to establish a number of basic isomorphisms involving tensor products.

Theorem Given vector spaces \(V_1\) and \(V_2\) over \(K\), there is a unique isomorphism, \(V_1\otimes V_2\cong V_2\otimes V_1\), such that for any \(v_1\in V_1\) and \(v_2\in V_2\), \(v_1\otimes v_2\mapsto v_2\otimes v_1\).

Proof Consider the bilinear function \(f:V_1\times V_2\mapto V_2\otimes V_1\) defined by \(f(v_1,v_2)=v_2\otimes v_1\) on elements \(v_1\in V_1\) and \(v_2\in V_2\). That \(f\) is indeed bilinear is a consequence of the bilinearity of the tensor product \(v_2\otimes v_1\). From the universal mapping property it follows that there is a linear map \(L:V_1\otimes V_2\mapto V_2\otimes V_1\) such that \(L(v_1\otimes v_2)=v_2\otimes v_1\). But likewise we could have started with a bilinear map \(V_2\times V_1\mapto V_1\otimes V_2\) to end up with a linear map, \(L':V_2\otimes V_1\mapto V_1\otimes V_2\), inverse, at least on pure tensors, to \(L\). That \(L'L=\id_{V_1\otimes V_2}\) on the whole of \(V_1\otimes V_2\) and indeed that \(LL'=\id_{V_2\otimes V_1}\) on the whole of \(V_2\otimes V_1\) follows since both \(L\) and \(L'\) are linear and the tensor product space is spanned by the pure tensors.\(\blacksquare\)

Note that while \(V_1\otimes V_2\cong V_2\otimes V_1\) it is certainly not the case that \(v_1\otimes v_2=v_2\otimes v_1\) for arbitrary \(v_1\in V_1\) and \(v_2\in V_2\). The generalisation of this to a tensor product of \(r\) vector spaces says that for any permutation \(\sigma\) of the numbers \(1,\dots,r\) there is a unique isomorphism,
\begin{equation}
V_1\otimes\dots\otimes V_r\cong V_{\sigma(1)}\otimes\dots\otimes V_{\sigma(r)},
\end{equation}
such that, \(v_1\otimes\dots\otimes v_r\mapsto v_{\sigma(1)}\otimes\dots\otimes v_{\sigma(r)}\), for any \(v_i\in V_i\).

Now let us consider associativity of the tensor product.

Theorem For vector spaces \(V_1\), \(V_2\) and \(V_3\) over \(K\) there is a unique isomorphism,
\begin{equation}
(V_1\otimes V_2)\otimes V_3\cong V_1\otimes(V_2\otimes V_3),
\end{equation}
such that for any \(v_1\in V_1\), \(v_2\in V_2\) and \(v_3\in V_3\), \((v_1\otimes v_2)\otimes v_3\mapsto v_1\otimes(v_2\otimes v_3)\).

Proof The function \(f:V_1\times V_2\times V_3\mapto(V_1\otimes V_2)\otimes V_3\), given by \(f(v_1,v_2,v_3)=(v_1\otimes v_2)\otimes v_3\), is clearly trilinear so by the universal mapping property we have a linear map \(V_1\otimes V_2\otimes V_3\mapto(V_1\otimes V_2)\otimes V_3\) such that \(v_1\otimes v_2\otimes v_3\mapsto(v_1\otimes v_2)\otimes v_3\). Choosing bases for \(V_1\), \(V_2\) and \(V_3\), it's clear that this maps a basis to a basis and so is an isomorphism. Similarly, we find \(V_1\otimes V_2\otimes V_3\cong V_1\otimes(V_2\otimes V_3)\) and the result follows.\(\blacksquare\)

Theorem For vector spaces \(V_1\), \(V_2\) and \(V_3\) over \(K\) there is a unique isomorphism,
\begin{equation}
V_1\otimes(V_2\oplus V_3)\cong(V_1\otimes V_2)\oplus(V_1\otimes V_3),
\end{equation}
such that for any \(v_1\in V_1\), \(v_2\in V_2\) and \(v_3\in V_3\), \(v_1\otimes (v_2,v_3)\mapsto (v_1\otimes v_2,v_1\otimes v_3)\).

Proof Here we need a bilinear function \(V_1\times(V_2\oplus V_3)\mapto(V_1\otimes V_2)\oplus(V_1\otimes V_3)\) so let us define a function \(f\) according to \(f(v_1,(v_2,v_3))=(v_1\otimes v_2,v_1\otimes v_3)\). That this is bilinear is demonstrated as follows,
\begin{align*}
f(av_1+bv_1',(v_2,v_3))&=((av_1+bv_1')\otimes v_2,(av_1+bv_1')\otimes v_3)\\
&=(av_1\otimes v_2+bv_1'\otimes v_2,av_1\otimes v_3+bv_1'\otimes v_3)\\
&=(av_1\otimes v_2,av_1\otimes v_3)+(bv_1'\otimes v_2,bv_1'\otimes v_3)\\
&=a(v_1\otimes v_2,v_1\otimes v_3)+b(v_1'\otimes v_2,v_1'\otimes v_3)\\
&=af(v_1,(v_2,v_3))+bf(v_1',(v_2,v_3)),
\end{align*}
and
\begin{align*}
f(v_1,a(v_2,v_3)+b(v_2',v_3'))&=f(v_1,(av_2+bv_2',av_3+bv_3'))\\
&=(v_1\otimes(av_2+bv_2'),v_1\otimes(av_3+bv_3'))\\
&=(v_1\otimes av_2+v_1\otimes bv_2',v_1\otimes av_3+v_1\otimes bv_3')\\
&=(av_1\otimes v_2+bv_1\otimes v_2',av_1\otimes v_3+bv_1\otimes v_3')\\
&=(av_1\otimes v_2,av_1\otimes v_3)+(bv_1\otimes v_2',bv_1\otimes v_3')\\
&=a(v_1\otimes v_2,v_1\otimes v_3)+b(v_1\otimes v_2',v_1\otimes v_3')\\
&=af(v_1,(v_2,v_3))+bf(v_1,(v_2',v_3')).
\end{align*}
Then by the universal mapping property there is a linear map \(V_1\otimes(V_2\oplus V_3)\mapto(V_1\otimes V_2)\oplus(V_1\otimes V_3)\) such that \(v_1\otimes (v_2,v_3)\mapsto (v_1\otimes v_2,v_1\otimes v_3)\). Choosing bases for \(V_1\), \(V_2\) and \(V_3\) we see that this maps one basis to another so is an isomorphism.\(\blacksquare\)

More generally, we have that for a vector space \(U\), together with a (possibly infinite) family of spaces \(V_i\),
\begin{equation}
U\otimes(\bigoplus_iV_i) \cong\bigoplus_i(U\otimes V_i),
\end{equation}
with the isomorphism being the obvious extension of the one from the previous theorem.

Theorem For vector spaces \(V_1\) and \(V_2\) over \(K\) with respective dual spaces \(V_1^*\) and \(V_2^*\) there is a unique isomorphism,
\begin{equation}
V_1^*\otimes V_2^*\cong(V_1\otimes V_2)^*,
\end{equation}
such that, \(f_1\otimes f_2\mapsto(v_1\otimes v_2\mapsto f_1(v_1)f_2(v_2))\), for any \(v_1\in V_1\), \(v_2\in V_2\), \(f_1\in V_1^*\) and \(f_2\in V_2^*\).

Proof Define a function \(V_1^*\times V_2^*\mapto\mathcal{L}(V_1,V_2;K)\) by \((f_1,f_2)\mapsto((v_1,v_2)\mapsto f_1(v_1)f_2(v_2))\). That this is bilinear is clear, so by the universal mapping property there is a unique linear map \(V_1^*\otimes V_2^*\mapto\mathcal{L}(V_1,V_2;K)\) such that \(f_1\otimes f_2\mapsto((v_1,v_2)\mapsto f_1(v_1)f_2(v_2))\). Now this is a linear mapping between vector spaces of the same dimension, \(\dim V_1\dim V_2\), and moreover its image contains the bilinear functions we've already observed form a basis for \(\mathcal{L}(V_1,V_2;K)\), namely, \((v_1,v_2)\mapsto \alpha_{i_1}^{(1)}(v_1)\alpha_{i_2}^{(2)}(v_2)\), where the \(\alpha_{i_1}^{(1)}\) and \(\alpha_{i_2}^{(2)}\) are the dual bases of \(V_1^*\) and \(V_2^*\) respectively. Thus this linear map is an isomorphism, which when combined with the isomorphism, \(\mathcal{L}(V_1,V_2;K)\cong(V_1\otimes V_2)^*\), gives us the isomorphism we sought.\(\blacksquare\)

We have of course also the obvious generalisation of this isomorphism,
\begin{equation}
V_1^*\otimes\dots\otimes V_r^*\cong(V_1\otimes\dots\otimes V_r)^*.
\end{equation}

Theorem For vector spaces \(V_1\) and \(V_2\) over \(K\), with \(V_1^*\) dual space of \(V_1\), there is a unique isomorphism,
\begin{equation}
V_1^*\otimes V_2\cong\mathcal{L}(V_1,V_2),
\end{equation}
such that, \(f_1\otimes v_2\mapsto(v_1\mapsto f_1(v_1)v_2)\), for any \(v_1\in V_1\), \(v_2\in V_2\) and \(f_1\in V_1^*\).

Proof The function defined by, \((f_1,v_2)\mapsto(v_1\mapsto f_1(v_1)v_2)\), is clearly a bilinear function from \(V_1^*\times V_2\) to \(\mathcal{L}(V_1,V_2)\). Therefore, by the universal mapping property, there is a unique linear map from \(V_1^*\otimes V_2\) to \(\mathcal{L}(V_1,V_2)\) given by, \(f_1\otimes v_2\mapsto(v_1\mapsto f_1(v_1)v_2)\). Now both \(V_1^*\otimes V_2\) and \(\mathcal{L}(V_1,V_2)\) have dimension \(\dim V_1\dim V_2\) and choosing bases \(e_i^{(1)}\) and \(e_i^{(2)}\) for \(V_1\) and \(V_2\) respectively, with \(\alpha_i^{(1)}\) the dual basis of \(V_1^*\), then this map takes the basis elements, \(\alpha_i^{(1)}\otimes e_j^{(2)}\), of \(V_1^*\otimes V_2\), to the linear maps \(v_1\mapsto\alpha_i^{(1)}(v_1)e_j^{(2)}\). Considering these maps applied to the basis elements of \(V_1\) we see that their matrix representations are the matrices with \(1\) in the \(j\)th row and \(i\)th column with zeros everywhere else. These matrices form a basis of \(\mathcal{L}(V_1,V_2)\) so we see that our linear map takes a basis to a basis and is therefore an isomorphism.\(\blacksquare\)

In the case of \(V_1=V_2=V\), it is of interest to establish the element of \(V^*\otimes V\) which corresponds to \(\id_V\). Denoting by \(e^i\) the dual basis of the basis \(e_i\) of \(V\) then this is the element \(\sum_ie^i\otimes e_i\) of \(V^*\otimes V\).

Consider the function \(V^*\times V\mapto K\) given by \((f,v)\mapsto f(v)\). This is clearly bilinear so induces a unique linear map \(V^*\otimes V\mapto K\), given by \(f\otimes v\mapsto f(v)\). This, understood as a linear map \(\mathcal{L}(V)\mapto K\), is just the trace, now given a basis-free (`canonical’) definition. To see that this really does coincide with the trace as previously encountered, consider an arbitrary element of \(V^*\otimes V\). It has the form, \(\sum_{ij}A_i^je^i\otimes e_j\), for some scalars, \(A_i^j\), and corresponds to the linear operator on \(V\) such that, \(e_k\mapsto\sum_{ij}A_i^je^i(e_k)e_j=\sum_jA_k^je_j\), that is, the linear operator represented by the matrix \(\mathbf{A}\) with elements \(A_i^j\). The trace of this linear operator is then \(\sum_{ij}A_i^je^i(e_j)=\sum_iA_i^i\) in accordance with our previous definition.
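This basis-free description of the trace is easily checked in coordinates. A small numpy sketch (an illustration only, representing an element of \(V^*\otimes V\) by its coefficient matrix \(A_i^j\)): the canonical map \(f\otimes v\mapsto f(v)\) becomes the contraction of the two indices, which is precisely the matrix trace.

```python
import numpy as np

A = np.random.rand(3, 3)              # coefficients A_i^j of an element of V* (x) V

# The canonical pairing f (x) v -> f(v) contracts the V* index with the V index.
contraction = np.einsum('ii->', A)    # sum_i A_i^i

assert np.isclose(contraction, np.trace(A))
```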

More generally, we have the notion of contraction. If in some tensor product space, \(V_1\otimes\dots\otimes V_r\), we have, \(V_j=V_i^*\), for some \(i\) and \(j\), then the contraction with respect to \(i\) and \(j\) is a linear mapping,
\begin{equation}
V_1\otimes\dots\otimes V_r\mapto\bigotimes_{k\neq i,j}V_k,\label{eq:abstract contraction}
\end{equation}
formed as the composition of three maps: a permutation of the tensor factors moving the \(i\)th and \(j\)th spaces into the first two positions, the order of the remaining factors unchanged; the map \(V^*\otimes V\mapto K\) discussed above, tensored with the identity on the remaining factors; and finally the trivial isomorphism corresponding to \(K\otimes V\cong V\).
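In index terms a contraction just sets the \(i\)th and \(j\)th indices equal and sums, which is exactly what einsum expresses. A numpy sketch (an illustration only, with a coefficient array standing in for an element of, say, \(V^*\otimes W\otimes V\)):

```python
import numpy as np

T = np.random.rand(3, 4, 3)     # an element of V* (x) W (x) V, dim V = 3, dim W = 4

# Contract the V* factor (axis 0) against the V factor (axis 2), leaving W.
contracted = np.einsum('iji->j', T)

# Equivalently: permute the contracted pair to the front and apply the trace.
assert np.allclose(contracted, np.trace(T, axis1=0, axis2=2))
```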

Dimension and Bases

Consider first the trivial case when one of the spaces, \(V_i\) say, in a tensor product space, \(V_1\otimes\dots\otimes V_r\), is zero. Then any \(r\)-linear function out of \(V_1\times\dots\times V_r\) must be zero and so, as the image of the \(r\)-linear function \(\iota:V_1\times\dots\times V_r\mapto V_1\otimes\dots\otimes V_r\) generates the whole tensor product space, \(\dim(V_1\otimes\dots\otimes V_r)=0\) in this case. More generally, in the case that none of the spaces are zero, observe first that the dimension of \(V_1\otimes\dots\otimes V_r\) is the dimension of the dual space, \((V_1\otimes\dots\otimes V_r)^*\), and as already discussed, \(\mathcal{L}(V_1,\dots,V_r;K)\cong(V_1\otimes\dots\otimes V_r)^*\). Now suppose that a basis for the \(i\)th space, \(V_i\), is \(\{e_1^{(i)},\dots,e_{n_i}^{(i)}\}\), that is, \(\dim V_i=n_i\), for \(i=1,\dots,r\). Then for any \(r\)-linear form, \(f\), we have
\begin{equation}
f\left(\sum_{i_1}c_{(1)}^{i_1}e_{i_1}^{(1)},\dots,\sum_{i_r}c_{(r)}^{i_r}e_{i_r}^{(r)}\right)=\sum_{i_1,\dots,i_r}c_{(1)}^{i_1}\cdots c_{(r)}^{i_r}f(e_{i_1}^{(1)},\dots,e_{i_r}^{(r)}),
\end{equation}
where the \(c_{(j)}^{i_j}\) are arbitrary scalars. That is, \(f\) is uniquely specified by the \(n_1\cdots n_r\) scalars \(f(e_{i_1}^{(1)},\dots,e_{i_r}^{(r)})\). So defining the \(n_1\cdots n_r\), clearly linearly independent, \(r\)-linear forms, \(\phi_{i_1\dots i_r}\), such that, \(\phi_{i_1\dots i_r}(e_{j_1}^{(1)},\dots,e_{j_r}^{(r)})=\delta_{i_1j_1}\dots\delta_{i_rj_r}\), that is, \(\phi_{i_1\dots i_r}=\alpha_{i_1}^{(1)}(\cdot)\cdots\alpha_{i_r}^{(r)}(\cdot)\), where \(\{\alpha_1^{(i)},\dots,\alpha_{n_i}^{(i)}\}\) is the dual basis of \(V_i^*\), we see that any \(r\)-linear form can be expressed as a linear combination of the \(\phi_{i_1\dots i_r}\) and so they form a basis for \(\mathcal{L}(V_1,\dots,V_r;K)\) which therefore has dimension \(n_1\cdots n_r\). That is,
\begin{equation}
\dim(V_1\otimes\dots\otimes V_r)=\dim V_1\cdots\dim V_r.
\end{equation}
It is also then clear that the \(n_1\cdots n_r\) pure tensors \(\{e_{i_1}^{(1)}\otimes\dots\otimes e_{i_r}^{(r)}\}\) form a basis for \(V_1\otimes\dots\otimes V_r\).
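Both the dimension count and the basis claim are easy to verify numerically. A numpy sketch (an illustration only, with \(\dim V_1=2\) and \(\dim V_2=3\), pure tensors realised as outer products and flattened to coordinate vectors):

```python
import numpy as np

n1, n2 = 2, 3
E1, E2 = np.eye(n1), np.eye(n2)

# The n1*n2 pure tensors e_i (x) e_j, flattened to vectors of length n1*n2.
basis = [np.tensordot(E1[i], E2[j], axes=0).ravel()
         for i in range(n1) for j in range(n2)]

# They are linearly independent and span, so dim(V1 (x) V2) = n1*n2.
assert np.linalg.matrix_rank(np.array(basis)) == n1 * n2
```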

A nice application of the tensor product machinery is to the construction of the complexification of a vector space \(V\) over \(\RR\). As a vector space over \(\RR\), \(\CC\) is a two dimensional vector space with basis \(\{1,i\}\). Suppose \(\{e_1,\dots,e_n\}\) is a basis for the \(n\)-dimensional space \(V\). Then we can form the tensor product space \(\CC\otimes V\) over \(\RR\). As a real vector space this has basis \(\{1\otimes e_1,\dots,1\otimes e_n,i\otimes e_1,\dots,i\otimes e_n\}\), but we can define scalar multiplication by complex numbers simply as \(z(z'\otimes v)=(zz')\otimes v\), extended linearly. We should now demonstrate that \(\CC\otimes V\cong V_\CC\), where \(V_\CC\) is as defined in Realification and Complexification. Consider the map \(\phi:V_\CC\mapto\CC\otimes V\) defined by \(\phi(v,v')=1\otimes v+i\otimes v'\). This is clearly linear over \(\RR\). That it's also linear over \(\CC\) follows since \(\phi(i(v,v'))=\phi(-v',v)=-1\otimes v'+i\otimes v=i(1\otimes v+i\otimes v')=i\phi(v,v')\). To verify that this is an isomorphism, we'll construct the inverse map. The bilinear map \(\CC\times V\mapto V_\CC\) defined by \((z,v)\mapsto z(v,0)\) induces, by virtue of the universal mapping property, a linear map we'll call \(\phi':\CC\otimes V\mapto V_\CC\), given by \(\phi'(z\otimes v)=z(v,0)\) on pure tensors. But this map is obviously also \(\CC\)-linear and since \(\phi\circ\phi'(z\otimes v)=\phi(z(v,0))=z\phi(v,0)=z(1\otimes v)=z\otimes v\) on pure tensors, which span \(\CC\otimes V\), we see that it is the inverse of \(\phi\).

Let us also note at this point that we have the obvious isomorphisms,
\begin{equation}
K\otimes V\cong V\cong V\otimes K,
\end{equation}
when \(V\) is a vector space over \(K\).

Definition and Construction of the Tensor Product

The central object of the following notes is the tensor product space. It is a vector space constructed from two or more given vector spaces. Its key attribute is that, though it is a linear space, it has an intrinsic multilinearity: as we will see, linear maps out of a tensor product space correspond precisely to multilinear functions on the product of the original spaces.

Given vector spaces \(V_1,\dots,V_n,W\) over a field \(K\), recall that a function \(f:V_1\times\dots\times V_n\mapto W\) is multilinear if it is linear in each entry in turn, that is, for each \(i=1,\dots, n\) and any \(a,b\in K\),
\begin{equation}
f(v_1,\dots,au_i+bv_i,\dots,v_n)=af(v_1,\dots,u_i,\dots,v_n)+bf(v_1,\dots,v_i,\dots,v_n).
\end{equation}
We have encountered such functions already: the determinant was defined as an alternating multilinear form, also known as a volume form (when the target space is the underlying field, the word form is often used), and inner products, excepting the complex Hermitian case, were seen to be bilinear forms. They are clearly objects of considerable importance in their own right. Note the terminology, however: we have referred to these objects as ‘functions’ or ‘forms’. They are (obviously) not linear maps, and as such they are not amenable to the main body of linear algebra technology we’ve developed; in particular we have no \(\ker\) or \(\img\) spaces. It turns out though that this apparent ‘otherness’ is somewhat illusory, for we can introduce a new vector space, called the tensor product space, such that, essentially, multilinear functions on cartesian products become linear maps defined on the new tensor product spaces, with the multilinearity of the function encapsulated in the structure of the new product space.

Let us begin by considering the bilinear case. Thus, we suppose we have three vector spaces \(V_1,V_2\) and \(W\) with a function \(f:V_1\times V_2\mapto W\) such that
\begin{align}
f(av_1+bv_1′,v_2)&=af(v_1,v_2)+bf(v_1′,v_2)\\
f(v_1,av_2+bv_2′)&=af(v_1,v_2)+bf(v_1,v_2′),
\end{align}
that is, for any \(v_1\in V_1\) and \(v_2\in V_2\), \(f(v_1,-):V_2\mapto W\) and \(f(-,v_2):V_1\mapto W\) are both linear maps.

Definition Given vector spaces \(V_1\) and \(V_2\) over a field \(K\), by a tensor product of \(V_1\) and \(V_2\) will be meant a vector space, \(U\), over \(K\), together with a bilinear function, \(\iota:V_1\times V_2\mapto U\), that is, a pair, \((U,\iota)\), with the following universal mapping property: whenever \(f:V_1\times V_2\mapto W\) is a bilinear function with values in a vector space \(W\) over \(K\), then there exists a unique linear mapping \(L:U\mapto W\) such that, \(L\iota=f\). That is, such that the following diagram commutes.

[commutative diagram]

Theorem If \(V_1\) and \(V_2\) are vector spaces over \(K\) then a tensor product, \((U,\iota)\) exists and is unique in the sense that if \((U_1,\iota_1)\) and \((U_2,\iota_2)\) are two tensor products then there exists a unique isomorphism \(\phi:U_1\mapto U_2\) with \(\iota_2=\phi\circ\iota_1\).

We’ll establish this result in two stages.

Proof (Uniqueness) By definition, the following two diagrams commute.

[commutative diagrams]
So we have \(L_2L_1:U_1\mapto U_1\) such that \(L_2L_1\iota_1=L_2\iota_2=\iota_1\). But we then have the following two commutative diagrams,

[commutative diagrams]

and the uniqueness stipulation of the universal mapping property therefore implies \(L_2L_1=\id_{U_1}\). Clearly a similar argument leads to \(L_1L_2=\id_{U_2}\), thus \(U_1\) and \(U_2\) are indeed isomorphic, with \(L_1\) the desired isomorphism.
With uniqueness established, we can talk of the tensor product of vector spaces, \(V_1\) and \(V_2\), denoted simply \(V_1\otimes V_2\), with the bilinear function, \(\iota:V_1\times V_2\mapto V_1\otimes V_2\), understood to be part of the definition, but often not explicitly mentioned.

To establish existence, we need to recall the notion of the free vector space, \(F(S)\), on a set \(S\) over \(K\). This is the `formal enhancement’ of the set \(S\) with a formal scalar multiplication and a formal vector addition such that \(F(S)\) becomes a vector space consisting of finite sums \(a^1s_1+\dots+a^rs_r\) of the formal products \(a^is_i\) of elements \(a^i\in K\) and \(s_i\in S\). This is made precise in the following definition.

Definition The free vector space, \(F(S)\), on a set \(S\) over a field \(K\) is the set of all set-theoretic maps \(S\mapto K\) which vanish at all but a finite number of points of \(S\).

According to this definition, \(F(S)\) is a vector space over \(K\) with the usual pointwise addition and scalar multiplication, \((f+g)(s)=f(s)+g(s)\) and \((af)(s)=af(s)\). It has a basis consisting of the ‘delta functions’ \(\delta_s\) such that \(\delta_s(t)=1\) if \(s=t\) and 0 otherwise. Equivalence with the ‘formal enhancement of \(S\)’ definition is seen by observing that any \(f\in F(S)\) can be written uniquely as, \(f=f(s_1)\delta_{s_1}+\dots+f(s_r)\delta_{s_r}\) and so the formal finite sum \(a^1s_1+\dots+a^rs_r\) corresponds, upon identifying the elements \(s_i\) with their delta functions \(\delta_{s_i}\), to the map \(f\) such that \(f(s_i)=a^i\) for \(i=1,\dots,r\).
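The definition in terms of finitely supported maps \(S\mapto K\) translates directly into code, which may help fix ideas. A minimal sketch in Python (an illustration only: dictionaries keyed by elements of \(S\), with absent keys read as the value \(0\); the helper names are ours):

```python
def add(f, g):
    """Pointwise sum of two finitely supported maps S -> K."""
    h = dict(f)
    for s, a in g.items():
        h[s] = h.get(s, 0) + a
    return {s: a for s, a in h.items() if a != 0}

def scale(c, f):
    """Pointwise scalar multiple."""
    return {s: c * a for s, a in f.items() if c * a != 0}

def delta(s):
    """The basis element delta_s of F(S)."""
    return {s: 1}

# The formal sum 2x + 3y in F(S), for S containing 'x' and 'y':
f = add(scale(2, delta('x')), scale(3, delta('y')))
print(f)   # {'x': 2, 'y': 3}
```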

Note that if our set is a finite dimensional vector space \(V\) over \(K\) then \(F(V)\) is infinite dimensional as long as the field \(K\) is infinite (assuming \(V\) is not zero dimensional).

We now conclude the proof by constructing the tensor product space \(V_1\otimes V_2\) as a certain quotient of the free vector space, \(F(V_1\times V_2)\).

Proof (Existence) Consider \(F(V_1\times V_2)\), the free vector space over the product space, \(V_1\times V_2\), of vector spaces \(V_1\) and \(V_2\) over \(K\). In this space we identify a subspace \(D\) defined to be the span of all elements of the form
\begin{equation*}
(av_1+bv_1′,v_2)-a(v_1,v_2)-b(v_1′,v_2),
\end{equation*}
and
\begin{equation*}
(v_1,av_2+bv_2′)-a(v_1,v_2)-b(v_1,v_2′),
\end{equation*}
where \(a,b\in K\), \(v_1,v_1’\in V_1\) and \(v_2,v_2’\in V_2\) and we’ve suppressed the \(\delta\) symbol, identifying \((v,w)\) with \(\delta_{(v,w)}\). Define \(V=F(V_1\times V_2)/D\) and \(\iota:V_1\times V_2\mapto V\) by \(\iota(v_1,v_2)=(v_1,v_2)+D\). Then \(\iota\) is bilinear by definition of the subspace \(D\). Also, note that any element of \(V\) can be expressed as some (finite) linear sum of elements of the image in \(V\) of \(\iota\). Now suppose we have a bilinear function \(f:V_1\times V_2\mapto W\) taking values in a vector space \(W\) over \(K\). We need to establish the existence and uniqueness of the linear map \(L\) such that the following diagram commutes.

[commutative diagram]

Define \(L’:F(V_1\times V_2)\mapto W\) by \(L'(v_1,v_2)=f(v_1,v_2)\) extended linearly to the whole of \(F(V_1\times V_2)\) according to \(L'(a(v_1,v_2)+b(v_1′,v_2′))=af(v_1,v_2)+bf(v_1′,v_2′)\). Then since
\begin{equation*}
L'((av_1+bv_1′,v_2)-a(v_1,v_2)-b(v_1′,v_2))=f(av_1+bv_1′,v_2)-af(v_1,v_2)-bf(v_1′,v_2),
\end{equation*}
and
\begin{equation*}
L'((v_1,av_2+bv_2′)-a(v_1,v_2)-b(v_1,v_2′))=f(v_1,av_2+bv_2′)-af(v_1,v_2)-bf(v_1,v_2′),
\end{equation*}
by the bilinearity of \(f\), we see that \(D\subseteq\ker L'\). \(L'\) therefore factors as \(L'=L\pi\) where \(\pi:F(V_1\times V_2)\mapto F(V_1\times V_2)/D\) is the quotient map. Clearly \(L\iota=f\) so we have demonstrated existence of the desired linear map. Uniqueness is immediate, given commutativity of the diagram above and the fact, already noted, that \(F(V_1\times V_2)/D\) is the linear span of the image of \(\iota\).\(\blacksquare\)

Though the definition and associated existence and uniqueness result are rather abstract, the basic idea is simple. A bilinear function on a product of spaces taking values in a third space `transfers’ its bilinearity to the tensor product space on which there is a corresponding linear transformation to the same target space. Indeed, if we denote by, \(\mathcal{L}(V_1,V_2;W)\), the vector space of bilinear functions on \(V_1\times V_2\) taking values in a vector space \(W\) over \(K\) and by, \(\mathcal{L}(V_1\otimes V_2,W)\), the vector space of linear maps from \(V_1\otimes V_2\) to \(W\), then we have the vector space isomorphism,
\begin{equation}
\mathcal{L}(V_1,V_2;W)\cong\mathcal{L}(V_1\otimes V_2,W).
\end{equation}
By the universal mapping property, to any \(f\in\mathcal{L}(V_1,V_2;W)\) there corresponds a unique linear map \(L_f:V_1\otimes V_2\mapto W\), so the mapping of sets here is such that \(f\mapsto L_f\). This is clearly linear. It is surjective since for any \(T:V_1\otimes V_2\mapto W\), \(T\iota\) is bilinear with \(L_{T\iota}=T\), and it is injective since if \(f\in\mathcal{L}(V_1,V_2;W)\) is non-zero then \(L_f\iota=f\neq0\), so that \(L_f\neq0\). In particular, when \(W=K\), we have that the vector space of bilinear forms on \(V_1\times V_2\) is isomorphic to the dual of \(V_1\otimes V_2\), \((V_1\otimes V_2)^*\).
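Concretely, in the case \(W=K\), a bilinear form specified by a matrix \(\mathbf{M}\) corresponds to the linear functional on \(V_1\otimes V_2\) whose coefficient vector is just \(\mathbf{M}\) flattened. A numpy sketch (an illustration only, with tensors flattened to coordinate vectors):

```python
import numpy as np

M = np.random.rand(2, 3)                 # matrix of a bilinear form f on R^2 x R^3
v1, v2 = np.random.rand(2), np.random.rand(3)

f_value = v1 @ M @ v2                    # f(v1, v2) = v1^T M v2

# The corresponding linear functional L_f on V1 (x) V2, evaluated on the
# pure tensor iota(v1, v2) = v1 (x) v2 (both flattened to length-6 vectors):
L_f = M.ravel()
pure = np.tensordot(v1, v2, axes=0).ravel()

assert np.isclose(f_value, L_f @ pure)   # L_f(iota(v1, v2)) = f(v1, v2)
```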

Elements of \(V_1\otimes V_2\) are written as linear sums of terms of the form, \(v_1\otimes v_2\), which are themselves images, via the bilinear function, \(\iota\), of elements, \((v_1,v_2)\in V_1\times V_2\), \(\iota(v_1,v_2)=v_1\otimes v_2\). Elements of \(V_1\otimes V_2\) of the form \(v_1\otimes v_2\) are called pure tensors. Although a small subset of all elements of \(V_1\otimes V_2\), as noted in the proof of existence, the pure tensors span the tensor product space.

Note that \(v_1\otimes v_2=0\) for some \(v_1\in V_1\) and \(v_2\in V_2\) if and only if every bilinear function \(f:V_1\times V_2\mapto W\) is zero on \((v_1,v_2)\). Consequently \(v_1\otimes v_2\neq 0\) if there exists some bilinear function \(f\) such that \(f(v_1,v_2)\neq 0\). Also note that \(v_1\otimes0=0\otimes v_2=0\) for any \(v_1\in V_1\) and \(v_2\in V_2\) since for any bilinear map, \(f\), \(f(v_1,0)=0=f(0,v_2)\).

All of the above can be immediately generalised to more than 2 vector spaces. If we have \(r\) vector spaces, \(V_1,\dots, V_r\) over a field \(K\) then the tensor product \(V_1\otimes\dots\otimes V_r\) is the unique vector space over \(K\) together with the \(r\)-linear function \(\iota:V_1\times\dots\times V_r\mapto V_1\otimes\dots\otimes V_r\) such that to any \(r\)-linear function \(f:V_1\times\dots\times V_r\mapto W\), \(W\) a vector space over \(K\), there is a unique linear mapping \(L:V_1\otimes\dots\otimes V_r\mapto W\) such that \(L\iota=f\). In this more general context we have, of course,
\begin{equation}
\mathcal{L}(V_1,\dots,V_r;W)\cong\mathcal{L}(V_1\otimes\dots\otimes V_r,W),
\end{equation}
and in particular,
\begin{equation}
\mathcal{L}(V_1,\dots,V_r;K)\cong(V_1\otimes\dots\otimes V_r)^*.
\end{equation}

Given vector spaces \(V_1,\dots, V_r\) and \(W_1,\dots, W_r\) over \(K\) with linear maps \(A_i:V_i\mapto W_i\) we can form the \(r\)-linear function from \(V_1\times\dots\times V_r\) to \(W_1\otimes\dots\otimes W_r\) such that \((v_1,\dots,v_r)\mapsto A_1(v_1)\otimes\dots\otimes A_r(v_r)\). That this is indeed an \(r\)-linear map follows immediately from the fact that the \(A_i\) are linear and the \(r\)-linearity of the tensor product \(W_1\otimes\dots\otimes W_r\). The universal mapping property then gives us a linear map from \(V_1\otimes\dots\otimes V_r\) to \(W_1\otimes\dots\otimes W_r\) such that \(v_1\otimes\dots\otimes v_r\mapsto A_1(v_1)\otimes\dots\otimes A_r(v_r)\) which we’ll write as \(A_1\otimes\dots\otimes A_r\) and call the tensor product of the linear maps \(A_i\).
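In matrix terms, the tensor product of linear maps is the Kronecker product, and its defining property on pure tensors can be checked directly. A numpy sketch (an illustration only, with pure tensors flattened so that \(A_1\otimes A_2\) becomes np.kron):

```python
import numpy as np

A1 = np.random.rand(2, 2)
A2 = np.random.rand(3, 3)
v1, v2 = np.random.rand(2), np.random.rand(3)

lhs = np.kron(A1, A2) @ np.kron(v1, v2)   # (A1 (x) A2)(v1 (x) v2)
rhs = np.kron(A1 @ v1, A2 @ v2)           # A1(v1) (x) A2(v2)

assert np.allclose(lhs, rhs)
```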

Simultaneously Diagonalisable Operators

Recall that the commutator of two linear operators, \(S\) and \(T\), is \([S,T]=ST-TS\) and that they are said to commute if \([S,T]=0\).

Theorem Two normal operators, \(S\) and \(T\), commute if and only if there is an orthonormal basis consisting of common eigenvectors of \(S\) and \(T\). In this case \(S\) and \(T\) are said to be simultaneously diagonalisable since their respective matrix representations with respect to such a basis are diagonal.

Proof The if is trivial so we focus on the only if. \(S\) and \(T\) have spectral decompositions of the form,
\begin{equation*}
S=\sum_{i=1}^r\mu_iQ_{\mu_i}\quad\text{and}\quad T=\sum_{i=1}^s\lambda_iP_{\lambda_i},
\end{equation*}
where the \(\mu_i\) and \(\lambda_i\) are the distinct eigenvalues of \(S\) and \(T\) respectively and \(Q_{\mu_i}\) and \(P_{\lambda_i}\) the projectors onto the corresponding eigenspaces. Now, if any operator, \(A\), commutes with a normal operator, \(T\), then the eigenspaces, \(V_{\lambda_i}\), of \(T\) are \(A\)-invariant, since if \(\ket{\lambda}\in V_\lambda\) then \(TA\ket{\lambda}=AT\ket{\lambda}=\lambda A\ket{\lambda}\), so \(A\ket{\lambda}\in V_\lambda\). This means that both \(V_{\lambda_i}\) and its orthogonal complement, \(V_{\lambda_i}^\perp\) (the latter being the sum of the remaining eigenspaces of \(T\)), are \(A\)-invariant, so we have a direct sum decomposition, \(V=V_{\lambda_i}\oplus V_{\lambda_i}^\perp\), with corresponding orthogonal projectors, \(P_{\lambda_i}\) and \(P_{\lambda_i}^\perp\), such that for any \(\ket{v}\in V\) we can write, \(P_{\lambda_i}A\ket{v}=P_{\lambda_i}A(P_{\lambda_i}\ket{v}+P_{\lambda_i}^\perp\ket{v})=AP_{\lambda_i}\ket{v}\). That is, \(A\) commutes with each of the projection operators of the spectral decomposition of \(T\). It follows that if the two normal operators, \(S\) and \(T\), commute then so too must their respective projectors, \(Q_{\mu_i}P_{\lambda_j}=P_{\lambda_j}Q_{\mu_i}\). Now define \(R_{ij}=Q_{\mu_i}P_{\lambda_j}\); clearly \(R_{ij}^2=R_{ij}\), \(R_{ij}R_{kl}=0\) unless \(i=k\) and \(j=l\), and \(R_{ij}^\dagg=R_{ij}\), so the \(R_{ij}\) are orthogonal projectors which moreover satisfy,
\begin{equation}
\sum_{i,j}R_{ij}=\sum_{i=1}^rQ_{\mu_i}\sum_{j=1}^sP_{\lambda_j}=\id_V,
\end{equation}
because, \(\sum_{i=1}^rQ_{\mu_i}=\id_V\) and \(\sum_{j=1}^sP_{\lambda_j}=\id_V\). Thus, since, \(\img R_{ij}=V_{\mu_i}\cap V_{\lambda_j}\), we have
\begin{equation}
V=\bigoplus_{i,j}V_{\mu_i}\cap V_{\lambda_j},
\end{equation}
and choosing an orthonormal basis for each \(V_{\mu_i}\cap V_{\lambda_j}\) we obtain, as desired, an orthonormal basis for \(V\) consisting of common eigenvectors of \(S\) and \(T\). Equivalently, we note that \(S\) and \(T\) have spectral decompositions with respect to the projectors \(R_{ij}\), \(S=\sum_{i=1}^r\mu_iQ_{\mu_i}=\sum_{i,j}\mu_iR_{ij}\) and \(T=\sum_{j=1}^s\lambda_jP_{\lambda_j}=\sum_{ij}\lambda_jR_{ij}\).\(\blacksquare\)
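The theorem is also easy to see in action numerically. A numpy sketch (an illustration only: two commuting Hermitian matrices are manufactured from a shared eigenbasis, and a common orthonormal eigenbasis is then recovered by diagonalising a generic real linear combination, whose eigenspaces are generically exactly the joint eigenspaces):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two commuting Hermitian matrices built from a shared orthonormal eigenbasis Q.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
S = Q @ np.diag([1.0, 1.0, 2.0, 3.0]) @ Q.conj().T
T = Q @ np.diag([5.0, 6.0, 6.0, 7.0]) @ Q.conj().T
assert np.allclose(S @ T, T @ S)              # [S, T] = 0

# A generic combination separates the joint eigenspaces.
_, V = np.linalg.eigh(S + np.pi * T)

# The columns of V simultaneously diagonalise S and T.
for M in (S, T):
    D = V.conj().T @ M @ V
    assert np.allclose(D, np.diag(np.diag(D)))
```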

Outer Products in Dirac Notation

Throughout this section we assume our vector space \(V\) is complex Hermitian with positive definite inner product.

Dirac invented a notation for linear algebra particularly well suited to quantum mechanics. In this notation, vectors \(\psi,\phi\in V\) are denoted by the kets, \(\ket{\psi}\) and \(\ket{\phi}\), and their inner product, \((\psi,\phi)\), by the bra-ket \(\braket{\psi|\phi}\). The bra, \(\bra{\psi}\), and corresponding ket, \(\ket{\psi}\), are viewed as being distinct objects, with \(\bra{\psi}\) being precisely the image in \(V^*\) of \(\ket{\psi}\) under the Riesz antiisomorphism. A linear combination of vectors, \(a\psi+b\phi\), where \(a,b\in\CC\), would be denoted either as \(\ket{a\psi+b\phi}\) or, more commonly, as \(a\ket{\psi}+b\ket{\phi}\). Likewise, the corresponding bra is usually denoted \(a^*\bra{\psi}+b^*\bra{\phi}\), though sometimes it may be convenient to write it equivalently as \(\bra{a\psi+b\phi}\).

For any linear operator, \(T\in\mathcal{L}(V)\), the vector \(T\psi\) is denoted either as \(\ket{T\psi}\) or \(T\ket{\psi}\). The corresponding bra, \(\bra{T\psi}\), is such that \(\braket{T\psi|\phi}=\braket{\psi|T^\dagger\phi}\) for any \(\ket{\phi}\in V\). This is just the inner product of \(\ket{\psi}\) and \(\ket{T^\dagger\phi}=T^\dagger\ket{\phi}\). As such, it is typically denoted \(\braket{\psi|T^\dagger|\phi}\), which is itself then equal to \(\braket{\phi|T|\psi}^*\), and we can think of the bra corresponding to the action of \(T\) (from the left) on a ket \(\ket{\psi}\), \(T\ket{\psi}\), as the action of \(T^\dagg\) from the right on the bra \(\bra{\psi}\), \(\bra{\psi}T^\dagg\). In particular, Hermitian operators, \(T\), are defined to satisfy
\begin{equation}
\braket{\psi|T|\phi}=\braket{\phi|T|\psi}^*.\label{equ:Dirac Hermitian}
\end{equation}

It’s worth noting that even though one rarely sees scalars or operators within the bras or kets of Dirac notation, there is no reason why we shouldn’t and indeed sometimes it may be convenient. Dirac notation’s particular elegance is in its handling of a construction known as the outer product.

For any vectors \(\psi,\phi\in V\) we can define an operator, \(A_{\psi,\phi}\in\mathcal{L}(V)\), such that for any \(\chi\in V\)
\begin{equation}
A_{\psi,\phi}\chi=(\phi,\chi)\psi.
\end{equation}
The operator \(A_{\psi,\phi}\) is called the outer product of \(\psi\) and \(\phi\). In Dirac notation, this operator would simply be the ‘butterfly product’, \(\ket{\psi}\bra{\phi}\), with its action on \(\ket{\chi}\), \(\ket{\psi}\braket{\phi|\chi}\).

In particular, if \(\ket{\psi}\in V\) is a normalised vector then the projection onto the subspace spanned by \(\ket{\psi}\), \(P_\psi\), is just,
\begin{equation}
P_\psi=\ket{\psi}\bra{\psi}.
\end{equation}

When using Dirac notation, it is typical to denote orthonormal basis vectors, \(e_i\), by indexed kets, \(\ket{i}\). Then the expansion of an arbitrary vector in this basis is given by
\begin{equation}
\ket{\psi}=\sum_{i=1}^{N}\ket{i}\braket{i|\psi}=\sum_{i=1}^{N}P_i\ket{\psi},
\end{equation}
where \(P_i\) is the projector on to the one-dimensional subspace spanned by \(\ket{i}\). This must hold for every \(\ket{\psi}\) so we see that the identity operator for the space may be written as
\begin{equation}
I=\sum_{i=1}^N\ket{i}\bra{i}=\sum_{i=1}^NP_i.
\end{equation}
This is known as the resolution of the identity associated with the given basis set.
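Dirac notation maps directly onto numpy idioms, which can make these identities vivid. A sketch (an illustration only, with kets as complex coordinate vectors, so that \(\ket{\psi}\bra{\phi}\) is np.outer(psi, phi.conj())):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3

# An orthonormal basis |1>, ..., |N>: the columns of a random unitary matrix.
B, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))

# Resolution of the identity: sum_i |i><i| = I.
I = sum(np.outer(B[:, i], B[:, i].conj()) for i in range(N))
assert np.allclose(I, np.eye(N))

# The outer product acts as (|psi><phi|)|chi> = <phi|chi> |psi>.
psi, phi, chi = rng.normal(size=(3, N)) + 1j * rng.normal(size=(3, N))
assert np.allclose(np.outer(psi, phi.conj()) @ chi, np.vdot(phi, chi) * psi)
```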

The adjoint of an operator of the form \(\ket{u}\bra{v}\), \((\ket{u}\bra{v})^\dagg\) is simply \(\ket{v}\bra{u}\) since for any vectors \(\ket{\psi}\) and \(\ket{\phi}\),
\begin{equation*}
\bra{\psi}(\ket{u}\bra{v})^\dagg\ket{\phi}=\left(\braket{\phi|u}\braket{v|\psi}\right)^*=\braket{\psi|v}\braket{u|\phi}=\bra{\psi}(\ket{v}\bra{u})\ket{\phi}.
\end{equation*}

We know that if \(T\in\mathcal{L}(V)\) is a normal operator then there exists an orthonormal basis for \(V\) of eigenvectors of \(T\). If \(T\) has \(r\) distinct eigenvalues, \(\lambda_1,\dots,\lambda_r\), with respective geometric multiplicities, \(d_i\), then an eigenvalue \(\lambda_i\) for which \(d_i>1\) is said to be degenerate, with degree of degeneracy \(d_i\). Let us denote this orthonormal basis, \(\ket{i,j}\), with \(i=1,\dots,r\) and \(j=1,\dots,d_i\). That is,
\begin{equation}
T\ket{i,j}=\lambda_i\ket{i,j},\quad i=1,\dots,r,\; j=1,\dots,d_i.
\end{equation}
If we denote by \(P_\lambda\) the projector onto the eigenspace, \(V_\lambda\), so that
\begin{equation}
P_{\lambda_i}=\sum_{j=1}^{d_i}\ket{i,j}\bra{i,j},
\end{equation}
then \(T\) may be written as the spectral decomposition,
\begin{equation}
T=\sum_{i=1}^r\lambda_iP_{\lambda_i}.
\end{equation}

The Riesz Lemma

Recall that we saw that there was no natural isomorphism between a finite dimensional vector space, \(V\), and its dual, \(V^*\). When we have a non-degenerate inner product, the situation is a little different. Consider first the case of a real non-degenerate inner product space, and define a map \(V\mapto V^*\) according to \(v\mapsto f_v\), where \(f_v(w)=(v,w)\) for any \(w\in V\). That this is indeed a linear map follows since \(f_{av}(w)=(av,w)=a(v,w)\), that is, \(av\mapsto af_v\). It is injective since the inner product is non-degenerate, and, since \(\dim V=\dim V^*\), it is an isomorphism. In the complex case we again have a bijection but now, since \(f_{av}(w)=(av,w)=a^*(v,w)\), it is no longer linear but antilinear. Thus, in this case, we say that the map is an antiisomorphism.

Polar and Singular Value Decompositions

Might there be an analog for operators of the polar decomposition of complex numbers, \(z=re^{i\theta}\)? If so, we'd hope for something of the form \(T=PU\) with a unitary operator \(U\) corresponding to \(e^{i\theta}\) and some operator \(P\) corresponding to the ‘absolute value’ of \(T\). Guided by the analogy, we should hope for \(P\) to be ‘positive’ in some sense.

Definition A linear operator \(T\) on a real orthogonal or complex Hermitian inner product space, \(V\), is called positive if it is self-adjoint and if \((Tv,v)\geq0\) for all \(v\in V\).

This is a sensible definition since for a self-adjoint operator \(T\), it is not difficult to see that the condition \((Tv,v)\geq0\) is equivalent to all eigenvalues, \(\lambda\), of \(T\) being such that \(\lambda\geq0\). Now, for any operator \(T\in\mathcal{L}(V)\), consider \(T^\dagger T\). This is clearly self-adjoint, and since \((T^\dagger Tv,v)=(Tv,Tv)\geq0\), by assumption of the positive definiteness of the inner product, it is also positive (note also that \(\ker T=\ker(T^\dagger T)\)). So to any operator \(T\) is associated a self-adjoint positive operator \(T^\dagger T\). However for our immediate goal, of achieving an analog of polar decomposition, it is somehow of the wrong ‘order’. We would need something like its ‘square root’.

Recall that any self-adjoint operator, \(T\), has a spectral decomposition \(T=\sum_i\lambda_iP_i\). If \(T\) is in fact positive, then each \(\lambda_i\geq0\) so that we could define,
\begin{equation}
\sqrt{T}=\sum_i\sqrt{\lambda_i}P_i.
\end{equation}
Clearly, \(\sqrt{T}\) is positive and \((\sqrt{T})^2=T\) (also note that \(\ker\sqrt{T}=\ker T\)). Moreover, it is the unique positive operator whose square is \(T\). Indeed, suppose \(A\) were a positive operator such that \(A^2=T\), with spectral decomposition \(A=\sum_i\mu_iQ_i\), so that \(T=A^2=\sum_i\mu_i^2Q_i\). By the uniqueness of the spectral decomposition of \(T\) we know that, appropriately ordered, we must have \(\lambda_i=\mu_i^2\) and \(Q_i=P_i\), and since each \(\mu_i\geq0\) this forces \(\mu_i=\sqrt{\lambda_i}\), so that \(A=\sqrt{T}\).
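The spectral construction of the square root is immediate to implement. A numpy sketch (an illustration only; the helper name is ours, and the clip guards against eigenvalues that rounding has pushed slightly below zero):

```python
import numpy as np

def sqrtm_positive(T):
    """Square root of a positive matrix via T = sum_i lambda_i P_i."""
    lam, V = np.linalg.eigh(T)                       # spectral decomposition
    return (V * np.sqrt(lam.clip(min=0))) @ V.conj().T

A = np.random.rand(4, 4) + 1j * np.random.rand(4, 4)
T = A.conj().T @ A                       # T = A†A is positive

R = sqrtm_positive(T)
assert np.allclose(R @ R, T)             # (sqrt T)^2 = T
assert np.allclose(R, R.conj().T)        # sqrt T is self-adjoint
```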

Theorem Any operator \(T\) on a real orthogonal or complex Hermitian inner product space with positive definite inner product can be expressed as a product of two operators, \(T=UP\), called its polar decomposition, in which \(P\) is a uniquely determined positive operator and \(U\) is an isometry, which is unique if and only if \(T\) is invertible.

Proof To begin, notice that if such a decomposition exists, then \(T^\dagger T=PU^\dagger UP=P^2\), and by the uniqueness of the square root \(P=\sqrt{T^\dagger T}\) is unique. Also, if \(T\) is invertible, then so is \(P\), so \(U=TP^{-1}\) is unique. Conversely if \(T\) is not invertible then neither is \(P\) and in this case \(\ker P\) is non-trivial and we can write \(V=\ker P\oplus \img P\). \(U\) can then be replaced by \(UU’\) where \(U’\) is any isometry of the form \(U’=f\oplus\id_{\img P}\) where \(f\) is any isometry of \(\ker P\).
Now let us consider existence. Define \(P=\sqrt{T^\dagger T}\) and observe that in the case that \(T\) is invertible we could simply define \(U=TP^{-1}\) which, as is easily verified, is an isometry. In the case that \(T\) is not invertible, we start by considering the subspace \(\img P\) of \(V\) and define on this subspace the map \(U_1:\img P\mapto\img T\) as \(U_1=TP^{-1}|_{\img P}\) (\(P\) is an isomorphism on \(\img P\) since \(\ker P\cap\img P=0\)). So defined, \(U_1\) is clearly linear and, since \(\ker P=\ker T\), it is well defined (in the sense that if \(v_1,v_2\in V\) are such that \(Pv_1=Pv_2\) then \(Tv_1=Tv_2\)), is injective, and \(\dim\img P=\dim\img T\). Moreover, for any \(v_1,v_2\in\img P\), with \(v_1=Pu_1\) and \(v_2=Pu_2\), \((U_1v_1,U_1v_2)=(Tu_1,Tu_2)=(T^\dagger Tu_1,u_2)=(P^2u_1,u_2)=(Pu_1,Pu_2)=(v_1,v_2)\), so if \(\{v_1,\dots,v_k\}\) is an orthonormal basis of \(\img P\) and \(\{v_{k+1},\dots,v_n\}\) an orthonormal basis of \(\ker P=(\img P)^\perp\), then \(\{U_1v_{1},\dots,U_1v_k\}\) is an orthonormal basis for \(\img T\) which we can extend to an orthonormal basis of \(V=\img P\oplus(\img P)^\perp=\img T\oplus(\img T)^\perp\) as \(\{U_1v_{1},\dots,U_1v_k,u_{k+1},\dots,u_n\}\) where \(\{u_{k+1},\dots,u_n\}\) is an orthonormal basis of \((\img T)^\perp\). Defining \(U_2:(\img P)^\perp\mapto(\img T)^\perp\) as \(U_2v_i=u_i\) for \(i=k+1,\dots,n\), then \(U=U_1\oplus U_2\) is the desired isometry of \(V\).\(\blacksquare\)

Remark From the polar decomposition of an operator \(T\) as \(T=UP\) we can obtain the polar decomposition of \(T\) in the form \(T=P’U\) where \(U\) is the same isometry and now \(P’=UPU^{-1}\).

For any linear operator \(T\), the eigenvalues of \(\sqrt{T^\dagger T}\) are called the singular values of \(T\). Clearly the singular values of \(T\) are non-negative real numbers. We will use the polar decomposition to establish the following result, known as the singular value decomposition.

Theorem For any operator \(T\in\mathcal{L}(V)\) with singular values \(s_i\), \(i=1,\dots,n\), there exist orthonormal bases of \(V\), \(\{e_1,\dots,e_n\}\) and \(\{f_1,\dots,f_n\}\) such that \(Te_i=s_if_i\) for \(i=1,\dots,n\).

Proof Choose \(\{e_1,\dots,e_n\}\) as the basis of eigenvectors \(\sqrt{T^\dagger T}\), so that \(\sqrt{T^\dagger T}e_i=s_ie_i\). By the polar decomposition there is an isometry \(U\) such that \(T=U\sqrt{T^\dagger T}\). It follows that \(Te_i=U\sqrt{T^\dagger T}e_i=s_iUe_i\). Thus defining \(f_i=Ue_i\) we have \(Te_i=s_if_i\).\(\blacksquare\)

Example Suppose \(X\) and \(Y\) are \(n\)-dimensional subspaces of a \(2n\)-dimensional vector space \(V\) such that \(V=X\oplus Y\), with \(X\) and \(Y\) not assumed to be orthogonal. As subspaces of \(V\), both \(X\) and \(Y\) have orthonormal bases which we'll denote by \(\{e_1,\dots,e_n\}\) and \(\{f_1,\dots,f_n\}\) respectively. We define a matrix \(\mathbf{A}\) with coordinates \(A_{ij}=(e_i,f_j)\). This matrix then has a polar decomposition, \(\mathbf{A}=\mathbf{U}\mathbf{P}\), and we can use the matrix \(\mathbf{U}\) to define a new orthonormal basis for \(X\) with elements \(e_i'=Ue_i\), \(i=1,\dots,n\). Notice that (with summation over the repeated index \(k\) implied) \((e_i',f_j)=(U_{ki}e_k,f_j)=U_{ki}^*(e_k,f_j)=U^\dagger_{ik}A_{kj}=P_{ij}\). In particular, this means that \((e_i',f_j)=(e_j',f_i)^*\). Now introduce a linear operator, \(T\in\mathcal{L}(V)\), by defining it on basis elements as \(Te_i'=f_i\) and \(Tf_i=e_i'\). Clearly this map is such that \(T(X)=Y\) and \(T(Y)=X\). We demonstrate that it is in fact an isometry of \(V\). We have, \((Te_i',Te_j')=(f_i,f_j)=\delta_{ij}=(e_i',e_j')=(Tf_i,Tf_j)\), but also, \((Te_i',Tf_j)=(f_i,e_j')=(e_j',f_i)^*=(e_i',f_j)\). More generally, for \(X\) and \(Y\) subspaces of equal dimension of a vector space \(V\), there always exists an isometry of \(V\) which interchanges \(X\) and \(Y\). To see this, we first define \(\tilde{X}\) and \(\tilde{Y}\) to be the orthogonal complements of \(X\cap Y\) in \(X\) and \(Y\) respectively, so that \(X=(X\cap Y)\oplus\tilde{X}\) and \(Y=(X\cap Y)\oplus\tilde{Y}\). Now we can write \(V\) as \(V=(X\cap Y)\oplus(\tilde{X}\oplus\tilde{Y})\oplus(X+Y)^\perp\). In this decomposition, note that \(\tilde{X}\oplus\tilde{Y}\) is not an orthogonal direct sum whilst the other two are. The result now follows since we've already seen how to construct the desired isometry of \(\tilde{X}\oplus\tilde{Y}\) and this can be extended to an isometry of \(V\) by acting as the identity on \(X\cap Y\) and \((X+Y)^\perp\).

The singular value decomposition appears most commonly as the statement that any (real/complex) \(m\times n\) matrix \(\mathbf{T}\) can be expressed as a product of matrices \(\mathbf{P}\mathbf{\Sigma}\mathbf{Q}^\dagger\) where \(\mathbf{P}\) and \(\mathbf{Q}\) are respectively \(m\times m\) and \(n\times n\) real orthogonal/complex unitary matrices and \(\mathbf{\Sigma}\) is a diagonal matrix with non-negative real numbers on the diagonal.

To see this, consider two finite dimensional vector spaces, \(U\) and \(V\), both real orthogonal or complex Hermitian inner product spaces of dimensions \(n\) and \(m\) respectively, and a linear map \(T\in\mathcal{L}(U,V)\). Then \(\ker T^\dagger T=\ker T\) since clearly \(\ker T\subseteq\ker T^\dagger T\) and if \(u\in\ker T^\dagger T\) then \(0=(T^\dagger Tu,u)=(Tu,Tu)\) so \(u\in\ker T\). Thus, since \(T^\dagger T\) is positive, if we set \(r=\rank T=\rank T^\dagger T\) then \(U\) has an orthonormal basis \(\{u_1,\dots,u_r,u_{r+1},\dots,u_n\}\) of eigenvectors of \(T^\dagger T\) such that the corresponding eigenvalues can be arranged so that \(\lambda_1\geq\cdots\geq\lambda_r>0=\lambda_{r+1}=\cdots=\lambda_n\). Having fixed notation in this way we have that \(\{u_{r+1},\dots,u_n\}\) is a basis for \(\ker T=\ker T^\dagger T\) and that \(\{u_1,\dots,u_r\}\) is a basis for \((\ker T)^\perp\). In fact, \(\img T^\dagger\subseteq(\ker T)^\perp\), since if \(\tilde{u}\in\img T^\dagger\) then \(\tilde{u}=T^\dagger v\) for some \(v\in V\) and so for any \(u\in\ker T\), \((u,\tilde{u})=(u,T^\dagger v)=(Tu,v)=0\). Conversely, \((\ker T)^\perp\subseteq\img T^\dagger\) since if \(u\notin\img T^\dagger\) then \(\exists\tilde{u}\in(\img T^\dagger)^\perp\) such that \((u,\tilde{u})\neq0\) but \(T^\dagger T\tilde{u}\in\img T^\dagger\) and \((T\tilde{u},T\tilde{u})=(\tilde{u},T^\dagger T\tilde{u})=0\) so that \(T\tilde{u}=0\) and \(\tilde{u}\in\ker T\), which means \(u\notin(\ker T)^\perp\). Thus \(\img T^\dagger=(\ker T)^\perp\) and \(\{u_1,\dots,u_r\}\) is therefore a basis for \(\img T^\dagger\).

Let us now introduce the notation \(s_i=\sqrt{\lambda_i}\) so that \(T^\dagger Tu_i=s_i^2u_i\) and, for \(i=1,\dots,r\), define the elements \(v_i\in V\) by \(v_i=(1/s_i)Tu_i\). Then
\begin{equation*}
(v_i,v_j)=\frac{1}{s_is_j}(Tu_i,Tu_j)=\frac{1}{s_is_j}(u_i,T^\dagger Tu_j)=\frac{s_j^2}{s_is_j}\delta_{ij}=\delta_{ij}
\end{equation*}
so \(\{v_1,\dots,v_r\}\) is an orthonormal basis for \(\img T=(\ker T^\dagger)^\top\) which can be extended to an orthonormal basis for \(V\) with \(\{v_{r+1},\dots,v_m\}\) then a basis for \(\ker T^\dagger\). With the basis vectors \(\{v_1,\dots,v_m\}\) of \(V\) so defined, \(TT^\dagger v_i=s_iTu_i=s_i^2v_i\), so that the \(v_i\) are eigenvectors for \(TT^\dagger\) with the same eigenvalues as the \(u_i\) have as eigenvectors for \(T^\dagger T\).

The result for matrices now follows since given the bases \(\{u_i\}\) and \(\{v_i\}\) of \(U\) and \(V\) respectively and the corresponding standard bases \(\{e_i\}\), there are real orthogonal/complex unitary matrices \(\mathbf{P}\) and \(\mathbf{Q}\) whose elements are defined by \(e_i=P_i^ju_j\) and \(e_i=Q_i^jv_j\). The matrix elements of \(T\) with respect to the standard bases of \(U\) and \(V\) are defined by \(Te_i=T_i^je_j\), but we know that \(Tu_i=s_iv_i\) so we have
\begin{equation*}
Te_i=P_i^jTu_j=P_i^js_jv_j=P_i^js_j{Q^\dagger}_j^ke_k=T_i^ke_k.
\end{equation*}
That is, \(\mathbf{T}=\mathbf{P}\mathbf{\Sigma}\mathbf{Q}^\dagger\).
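This is precisely the factorisation returned by numpy's SVD routine. A sketch (an illustration only; numpy returns \(\mathbf{Q}^\dagger\) directly, and the singular values as a vector to be padded into the \(m\times n\) matrix \(\mathbf{\Sigma}\)):

```python
import numpy as np

T = np.random.rand(3, 5) + 1j * np.random.rand(3, 5)   # m = 3, n = 5

P, s, Qh = np.linalg.svd(T)              # T = P Sigma Q†

Sigma = np.zeros((3, 5))
Sigma[:3, :3] = np.diag(s)               # pad the singular values to m x n

assert np.allclose(T, P @ Sigma @ Qh)

# Equivalently T q_i = s_i p_i, for q_i, p_i the columns of Q and P.
Q = Qh.conj().T
for i in range(len(s)):
    assert np.allclose(T @ Q[:, i], s[i] * P[:, i])
```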

Self-Adjoint, Unitary and Orthogonal Operators

We continue to focus on real orthogonal and complex Hermitian spaces with positive definite inner products, now considering maps between them. Suppose \(V\) and \(W\) are such spaces, of dimensions \(n\) and \(m\) respectively, and \(T\in\mathcal{L}(V,W)\). Then the inner product allows us to uniquely associate a linear map \(T^\dagger\in\mathcal{L}(W,V)\) with \(T\), by defining it to be such that \((v,T^\dagger w)=(Tv,w)\). This is the adjoint of \(T\) (in the case of Hermitian spaces, sometimes the Hermitian adjoint). To see that it is indeed unique, notice that if, for some fixed \(w\in W\), there were distinct \(v',v''\in V\) such that \((Tv,w)=(v,v')\) and \((Tv,w)=(v,v'')\) for all \(v\in V\), then \((v,v'-v'')=0\) for all \(v\in V\) so \(v'=v''\). Its existence follows since if \(\{e_i\}\) is an orthonormal basis of \(V\), then any \(v\in V\) can be expressed as \(v=\sum v^ie_i\) with \(v^i=(e_i,v)\). Then \((Tv,w)=\sum(v,e_i)(Te_i,w)=(v,\sum(Te_i,w)e_i)\), so \(T^\dagger w=\sum(Te_i,w)e_i\). Indeed, if the matrix representation of \(T\) with respect to orthonormal bases is \(\mathbf{T}\), then the matrix representation of \(T^\dagger\) is \(\mathbf{T}^\dagger\).

Of particular interest are operators on inner product spaces which coincide with their adjoint.

Definition A linear operator \(T\in\mathcal{L}(V)\) is self-adjoint or Hermitian if \(T=T^\dagger\).

Remark In terms of an orthonormal basis for \(V\), this means that the matrix representation of \(T\) is such that \(\mathbf{T}=\mathbf{T}^\dagger\). That is, if \(K=\RR\) it is symmetric whilst if \(K=\CC\) it is Hermitian.

Remark We’ll typically use the word Hermitian in the specific context of a complex Hermitian space and self-adjoint when the underlying vector space could be either real orthogonal or complex Hermitian.

Remark Recall Example. For any \(T\in\mathcal{L}(V)\), \(\ker T^\dagger=(\img T)^\perp\). Indeed if \(u\in\ker T^\dagger\) then for any element \(w\in\img T\), \((u,w)=(u,Tv)\), for some \(v\in V\), and \((u,Tv)=(T^\dagger u,v)=0\), so \(u\in(\img T)^\perp\). Conversely, if \(u\in(\img T)^\perp\), then for any \(v\in V\), \((T^\dagger u,v)=(u,Tv)=0\), so \(u\in\ker T^\dagger\). In particular, if \(T\) is self-adjoint, then \(\ker T=(\img T)^\perp\), and we have the orthogonal direct sum decomposition, \(V=\ker T\oplus\img T\).

The definition of an Hermitian operator looks somewhat like an operator/matrix version of the condition on complex numbers by which we restrict to the reals. Though seemingly a trivial observation, there turns out to be a rather remarkable analogy between linear operators and complex numbers, and in this analogy real numbers do indeed correspond to self-adjoint operators. In due course we'll see analogs of modulus 1 and positive numbers, as well as of the polar decomposition of complex numbers. First though we obtain some results which make the real number/self-adjoint operator correspondence particularly compelling.

It is not difficult to see that for any linear operator \(T\) on a positive definite inner product space, \(T=0\) if and only if \((Tv,w)=0\) for all \(v,w\in V\). In fact we can do somewhat better than this.

Theorem A linear operator \(T\) on a complex Hermitian space \(V\) is zero, \(T=0\), if and only if \((Tv,v)=0\) for all \(v\in V\).

Proof Observe that generally, we have that,
\begin{equation}
(T(av),bw)+(T(bw),av)=(T(av+bw),av+bw)-\abs{a}^2(Tv,v)-\abs{b}^2(Tw,w)
\end{equation}
for all \(v,w\in V\) and \(a,b\in\CC\). In particular, if \((Tv,v)=0\) for all \(v\in V\), then with \(a=b=1\),
\begin{equation}
(Tv,w)+(Tw,v)=0,
\end{equation}
and choosing \(a=1\) and \(b=i\) then dividing by \(i\),
\begin{equation}
(Tv,w)-(Tw,v)=0,
\end{equation}
so that \((Tv,w)=0\) for all \(v,w\in V\) and we conclude that \(T=0\).\(\blacksquare\)

Note that we made essential use of the fact that \(V\) is a complex vector space here. The result is not generally valid for real vector spaces: a rotation of \(\RR^2\) through \(\pi/2\), for example, satisfies \((Tv,v)=0\) for all \(v\) yet is certainly not zero.

Theorem A linear operator \(T\in\mathcal{L}(V)\) on a complex Hermitian space \(V\) is Hermitian if and only if \((Tv,v)\) is real for all \(v\in V\).

Proof If \(T\) is Hermitian, then \((Tv,v)=(v,Tv)=(Tv,v)^*\) so \((Tv,v)\) is real. Conversely, if \((Tv,v)\) is real then \((Tv,v)=(Tv,v)^*=(v,T^\dagger v)^*=(T^\dagger v,v)\) so that \(((T-T^\dagger)v,v)=0\). But we’ve already seen that in this case we must have \(T-T^\dagger=0\) so \(T=T^\dagger\).\(\blacksquare\)

The following result provides an even stronger expression of the ‘realness’ of self-adjoint operators.

Theorem \(T\in\mathcal{L}(V)\) is a self-adjoint operator on a real orthogonal or complex Hermitian inner product space with positive definite inner product if and only if

  1. All eigenvalues of \(T\) are real.
  2. Eigenvectors with distinct eigenvalues are orthogonal.
  3. There exists an orthonormal basis of eigenvectors of \(T\). In particular, \(T\) is diagonalisable.

Proof The if is straightforward so we concentrate on the only if.

  1. Assuming \(K=\CC\), if \(Tv=\lambda v\) for some \(\lambda\in\CC\) and a non-zero vector \(v\in V\), then, \(\lambda^*(v,v)=(\lambda v,v)=(Tv,v)=(v,Tv)=\lambda(v,v)\), so \(\lambda\) must be real. Now suppose \(K=\RR\), but let us pass to the complexification, \(V_\CC\), defined in Realification and Complexification. To avoid confusion with the inner product, let us abuse notation and write an element of \(V_\CC\) as \(v+iv’\) rather than \((v,v’)\), where \(v,v’\in V\). Then, given the symmetric inner product on \(V\), \((\cdot,\cdot)\) we can define an Hermitian inner product on \(V_\CC\) according to \((u+iu’,v+iv’)=(u,v)+(u’,v’)-i(u’,v)+i(u,v’)\), where \(u,u’,v,v’\in V\). Clearly, \(T_\CC\), which acts on an element, \(v+iv’\), of \(V_\CC\) as \(T_\CC(v+iv’)=Tv+iTv’\), is self-adjoint with respect to this inner product and since, as we already know, the matrices of \(T\) and \(T_\CC\) with respect to a basis of \(V\) (which is also a basis of \(V_\CC\)) are identical, it follows that the eigenvalues of \(T\) must be real.
  2. Suppose \(Tv_1=\lambda_1 v_1\) and \(Tv_2=\lambda_2 v_2\) with \(\lambda_1\neq\lambda_2\). Then \(\lambda_1^*(v_1,v_2)=(Tv_1,v_2)=(v_1,Tv_2)=(v_1,v_2)\lambda_2\), so that if, for contradiction, \((v_1,v_2)\neq 0\) then \(\lambda_1^*=\lambda_1=\lambda_2\) contradicting the initial assumption.
  3. In the case \(\dim V=1\) the result is trivial, so we proceed by induction on \(n=\dim V\), assuming the result holds for spaces of dimension less than \(n\). We know there is a real eigenvalue \(\lambda\) and eigenvector \(v_1\in V\) such that \(Tv_1=\lambda v_1\) and since by assumption the inner product of \(V\) is positive definite we have the decomposition \(V=\Span(v_1)\oplus\Span(v_1)^\perp\). Now since for any \(w\in\Span(v_1)^\perp\), we have \((Tw,v_1)=(w,Tv_1)=(w,v_1)\lambda=0\), \(\Span(v_1)^\perp\) is \(T\)-invariant. Thus, by the induction hypothesis, we can assume the result for \(\Span(v_1)^\perp\) and take \(\hat{v}_2,\dots,\hat{v}_n\) as its orthonormal basis of eigenvectors. Then defining \(\hat{v}_1=v_1/\norm{v_1}\), \(\hat{v}_1,\dots,\hat{v}_n\) is an orthonormal basis of eigenvectors of \(T\).\(\blacksquare\)

Since the eigenspaces corresponding to the \(r\) distinct eigenvalues of a self-adjoint operator \(T\) decompose \(V\) into an orthogonal direct sum, \(V=\oplus_iV_i\), there correspond orthogonal projectors, \(P_i\), such that, \(\id_V=\sum_iP_i\). Thus for any \(v\in V\) we have \(Tv=T(\sum_iP_iv)=\sum_i\lambda_iP_iv\), that is,
\begin{equation}
T=\sum_{i=1}^r\lambda_iP_i,
\end{equation}
the spectral decomposition of \(T\), of which we’ll see a great deal more later. This decomposition is unique. Suppose we have \(r\) orthogonal projectors, \(Q_i\), complete in the sense that \(\id_V=\sum_iQ_i\), together with \(r\) real numbers, \(\mu_i\), such that \(T=\sum_i\mu_iQ_i\). If \(v\in\img Q_i\), that is, \(v=Q_iv\), we must have \(Tv=\mu_iv\). That is, \(\mu_i\) is an eigenvalue of \(T\) and any \(v\in\img Q_i\) belongs to the eigenspace of \(\mu_i\). Conversely, if \(Tv=\lambda v\) for some \(v\in V\), then since \(v=\sum_iQ_iv\), writing \(v_i=Q_iv\) we have \(\sum_i(\lambda-\mu_i)v_i=0\). Those \(v_i\) which are non-zero are orthogonal and since \(v\neq0\) at least one must be non-zero, so there must be some \(i\) such that \(\lambda=\mu_i\). Let us suppose we have relabelled the \(\mu_i\) such that \(\lambda_i=\mu_i\). Clearly, for any polynomial \(p\), \(p(T)=\sum_ip(\lambda_i)P_i=\sum_ip(\lambda_i)Q_i\). In particular, if we define a polynomial \(p_j(x)=\prod_{i\neq j}(x-\lambda_i)/(\lambda_j-\lambda_i)\) then \(p_j(\lambda_j)=1\) but \(p_j(\lambda_i)=0\) for all \(i\neq j\). These polynomials allow us then to establish \(P_i=Q_i\) for all \(i=1,\dots,r\).

Generally, a projector \(P\in\mathcal{L}(V)\) on a real orthogonal or complex Hermitian inner product space with positive definite inner product is an orthogonal projector if and only if \(P\) is self-adjoint. Indeed, the eigenvalues of a projection operator are either 0 or 1 and the corresponding eigenspaces are \(\ker P\) and \(\img P\) respectively. Thus, if \(P\) is self-adjoint, these eigenspaces are orthogonal. Conversely, if \(P\) is an orthogonal projection, we have \(V=\ker P\oplus\img P\), with \(\ker P\) and \(\img P\) orthogonal. So choosing an orthonormal basis for \(V\) as the union of orthonormal bases for \(\ker P\) and \(\img P\), we have a basis which is precisely an orthonormal basis of eigenvectors of \(P\), so \(P\) is self-adjoint.
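To see the dichotomy concretely, here’s a small numpy sketch (matrices chosen arbitrarily) contrasting an orthogonal projector, which is symmetric, with an oblique projector, which is idempotent but not self-adjoint:

```python
import numpy as np

# Orthogonal projector onto the column span of M: P = M (M^T M)^{-1} M^T.
M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
P_orth = M @ np.linalg.inv(M.T @ M) @ M.T
assert np.allclose(P_orth @ P_orth, P_orth)   # idempotent
assert np.allclose(P_orth, P_orth.T)          # self-adjoint

# An oblique projector: ker P and img P are complementary but not orthogonal,
# and correspondingly P is idempotent without being symmetric.
P_obl = np.array([[1.0, 1.0],
                  [0.0, 0.0]])
assert np.allclose(P_obl @ P_obl, P_obl)
assert not np.allclose(P_obl, P_obl.T)
```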

Earlier, we classified inner product spaces up to isometry. Focusing on real orthogonal and Hermitian spaces with non-degenerate inner products, let us now consider automorphisms, \(f:V\mapto V\), of these spaces which are also isometries, that is, such that, \((v,w)=(f(v),f(w))\), \(\forall v,w\in V\). Given our definition of the adjoint, this means that, \(f^\dagger f=\id_V\). If \(f\) is an isometry of a real orthogonal geometry it is called an orthogonal operator whilst an isometry of an Hermitian geometry is called a unitary operator.

Isometries of course form a group and in the case of a real orthogonal space whose inner product has signature, \((p,n-p,0)\), that group is called the orthogonal group of the inner product, \(O(V,p,n-p)\). Choosing an orthonormal basis for \(V\), \(\{e_i\}\), such that \((e_i,e_j)=\epsilon_i\delta_{ij}\) (no summation) with \(\epsilon_i=1\), \(1\leq i\leq p\) and \(\epsilon_i=-1\), \(p+1\leq i\leq n\), and defining the matrix, \(\mathbf{I}_{p,q}\), to be
\begin{equation*}
\mathbf{I}_{p,q}=
\begin{pmatrix}
\mathbf{I}_p & \mathbf{0}\\
\mathbf{0} & -\mathbf{I}_q
\end{pmatrix},
\end{equation*}
then it’s not difficult to see that we have a group isomorphism, \(O(V,p,n-p)\cong O(p,n-p)\), where \(O(p,n-p)\) is the matrix group,
\begin{equation*}
O(p,n-p)=\{\mathbf{O}\in\text{GL}_n(\RR)\mid \mathbf{O}^\mathsf{T}\mathbf{I}_{p,n-p}\mathbf{O}=\mathbf{I}_{p,n-p}\}.
\end{equation*}
In particular, when the inner product is positive definite, then the group of isometries is denoted simply, \(O(V)\), and we have the isomorphism, \(O(V)\cong O(n)\), where,
\begin{equation*}
O(n)=\{\mathbf{O}\in\text{GL}_n(\RR)\mid \mathbf{O}^\mathsf{T}\mathbf{O}=\mathbf{I}_n\}.
\end{equation*}
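As a concrete illustration (with arbitrary parameter values), a rotation satisfies the defining condition of \(O(2)\), while a hyperbolic boost preserves \(\mathbf{I}_{1,1}\) and so lies in \(O(1,1)\); a minimal numpy check:

```python
import numpy as np

theta, alpha = 0.7, 1.3   # arbitrary angle and rapidity

# A rotation satisfies R^T R = I, so R is in O(2).
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(R.T @ R, np.eye(2))

# A hyperbolic boost satisfies B^T I_{1,1} B = I_{1,1}, so B is in O(1,1).
I11 = np.diag([1.0, -1.0])
B = np.array([[np.cosh(alpha), np.sinh(alpha)],
              [np.sinh(alpha), np.cosh(alpha)]])
assert np.allclose(B.T @ I11 @ B, I11)
```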

Similarly, in the case of Hermitian geometries the group of isometries is called the unitary group of the inner product. If the signature of the inner product is \((p,n-p,0)\) then it is denoted, \(U(V,p,n-p)\), and we have an isomorphism, \(U(V,p,n-p)\cong U(p,n-p)\), where \(U(p,n-p)\) is the matrix group defined by,
\begin{equation*}
U(p,n-p)=\{\mathbf{U}\in\text{GL}_n(\CC)\mid \mathbf{U}^\dagger\mathbf{I}_{p,n-p}\mathbf{U}=\mathbf{I}_{p,n-p}\}.
\end{equation*}
In particular, when the inner product is positive definite, a choice of an orthonormal basis provides an isomorphism \(U(V)\cong U(n)\) where,
\begin{equation*}
U(n)=\{\mathbf{U}\in\text{GL}_n(\CC)\mid \mathbf{U}^\dagger\mathbf{U}=\mathbf{I}_n\}.
\end{equation*}
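Again, a quick numerical check of the defining condition \(\mathbf{U}^\dagger\mathbf{U}=\mathbf{I}_n\) for an illustrative choice of matrix:

```python
import numpy as np

# An illustrative unitary matrix: its columns are orthonormal in C^2.
U = np.array([[1.0,  1.0],
              [1.0j, -1.0j]]) / np.sqrt(2)
assert np.allclose(U.conj().T @ U, np.eye(2))
```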

In the spirit of the analogy, already discussed, between complex numbers and linear operators, unitary operators look like they should correspond to complex numbers of unit modulus. Indeed, as the following result, similar to Theorem, demonstrates, the spectra of such operators justify the analogy.

Theorem \(U\) is a unitary operator on an Hermitian inner product space over \(\CC\) with positive definite inner product if and only if,

  1. All eigenvalues \(\lambda\) of \(U\) are such that \(|\lambda|=1\).
  2. Eigenvectors with distinct eigenvalues are orthogonal.
  3. There exists an orthonormal basis of eigenvectors of \(U\). In particular \(U\) is diagonalisable.

Proof The if is straightforward so we concentrate on the only if.

  1. If \(Uv=\lambda v\) for some \(\lambda\in\CC\) and a non-zero vector \(v\in V\) then \((Uv,Uv)=\lambda^*\lambda(v,v)=(v,v)\) so \(|\lambda|=1\).
  2. Suppose \(Uv_1=\lambda_1 v_1\) and \(Uv_2=\lambda_2 v_2\) with \(\lambda_1\neq\lambda_2\). Then \(\lambda_1^*(v_1,v_2)=(Uv_1,v_2)=(v_1,U^{-1}v_2)=(v_1,v_2)\lambda_2^{-1}=(v_1,v_2)\lambda_2^*\), and since \(\lambda_1\neq\lambda_2\) implies \(\lambda_1^*\neq\lambda_2^*\), it follows that \((v_1,v_2)=0\).
  3. In the case \(\dim V=1\) the result is trivial, so we proceed by induction on the dimension of \(V\), assuming the result holds in dimension \(n-1\), where \(n=\dim V>1\). Since we are working over \(\CC\), there is some eigenvalue \(\lambda\) and eigenvector \(v_1\in V\) such that \(Uv_1=\lambda v_1\), and since by assumption the inner product of \(V\) is positive definite, so that \((v_1,v_1)\neq0\), we have the decomposition \(V=\Span(v_1)\oplus\Span(v_1)^\perp\). Now since for every \(w\in\Span(v_1)^\perp\), we have \((Uw,v_1)=(w,U^{-1}v_1)=(w,v_1)\lambda^*=0\), \(\Span(v_1)^\perp\) is \(U\)-invariant. The restriction of \(U\) to \(\Span(v_1)^\perp\) is again unitary, so by the induction hypothesis we may take \(\hat{v}_2,\dots,\hat{v}_n\) to be an orthonormal basis of eigenvectors there. Then, defining \(\hat{v}_1=v_1/\norm{v_1}\), \(\hat{v}_1,\dots,\hat{v}_n\) is an orthonormal basis of eigenvectors of \(U\).
  4. \(\blacksquare\)
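The theorem is easy to probe numerically. The sketch below draws a random unitary matrix (via the QR decomposition of a complex Gaussian matrix, a standard construction) and checks that its eigenvalues lie on the unit circle and that, the eigenvalues being generically distinct, its normalised eigenvectors form an orthonormal basis:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random unitary matrix: the Q factor of a complex Gaussian matrix.
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(Z)
assert np.allclose(Q.conj().T @ Q, np.eye(4))

# Eigenvalues of a unitary operator all have unit modulus...
eigvals, eigvecs = np.linalg.eig(Q)
assert np.allclose(np.abs(eigvals), 1.0)

# ...and, the eigenvalues being distinct (generically the case), the unit
# eigenvectors returned by eig are orthogonal, hence an orthonormal basis.
assert np.allclose(eigvecs.conj().T @ eigvecs, np.eye(4))
```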

The corresponding result for orthogonal operators is a little different.

Theorem \(O\) is an orthogonal operator on a real orthogonal inner product space over \(\RR\) with positive definite inner product if and only if there exists an orthonormal basis in terms of which the matrix representation of \(O\) has the form,
\begin{equation}\label{orthog operator}
\begin{pmatrix}
\mathbf{R}(\theta_1)&\mathbf{0}& & & & & & & \\
\mathbf{0}&\ddots& & & & & & & \\
 & &\mathbf{R}(\theta_r)& & & & & & \\
 & & &1&\mathbf{0}& & & & \\
 & & &\mathbf{0}&\ddots& & & & \\
 & & & & &1& & & \\
 & & & & & &-1&\mathbf{0}& \\
 & & & & & &\mathbf{0}&\ddots& \\
 & & & & & & & &-1
\end{pmatrix}.
\end{equation}

Proof As ever, the if is straightforward so we focus on the only if. Dimension 1 is trivial. Consider dimension \(2\). A choice of orthonormal basis tells us that any orthogonal operator, \(O\), has a matrix representation \(\mathbf{O}\), such that \(\mathbf{O}^\mathsf{T}\mathbf{O}=\mathbf{I}_2\). Considering the determinant of this, we see that \(\det\mathbf{O}=\pm1\) and so \(\mathbf{O}\) must be of the form,
\begin{equation*}
\begin{pmatrix}
a&b\\
c&d
\end{pmatrix},
\end{equation*}
with \(a^2+c^2=1=b^2+d^2\), \(ab+cd=0\) and \(ad-bc=\pm1\). So in the case of determinant 1 we have \(b=-c\) and \(a=d\), and any such matrix can be written as
\begin{equation}
\begin{pmatrix}
\cos\theta&-\sin\theta\\
\sin\theta&\cos\theta
\end{pmatrix},
\end{equation}
for \(0\leq\theta<2\pi\), that is, a rotation through an angle \(\theta\). Notice that for \(\theta\neq0,\pi\) this has no eigenvalues in \(\RR\). In the determinant \(-1\) case we have \(a=-d\) and \(b=c\), and any such matrix can be written
\begin{equation}
\begin{pmatrix}
\cos\theta&\sin\theta\\
\sin\theta&-\cos\theta
\end{pmatrix},
\end{equation}
for \(0\leq\theta<2\pi\), that is, a reflection in the line with unit vector \(\cos\frac{\theta}{2}\mathbf{e}_1+\sin\frac{\theta}{2}\mathbf{e}_2\). In contrast to the rotation matrix, this matrix has eigenvalues \(\pm1\) and so can be diagonalised. We conclude that the result holds for dimensions 1 and 2 and proceed, as in the unitary case, by induction on the dimension of \(V\), assuming the result holds in all dimensions less than \(n=\dim V\), where \(n>2\).
If \(O\) has a real eigenvalue then by the same argument as in the unitary case the result follows. Otherwise, consider \(O_\CC\), the complexification of \(O\), whose eigenvalues must occur in complex-conjugate pairs. We recall from the discussion of the real Jordan normal form in The Real Jordan Normal Form that each such pair of complex eigenvalues corresponds to a \(2\)-dimensional \(O\)-invariant subspace of \(V\). Choose such a subspace, \(V_0\). Then in an orthonormal basis we know that the matrix representation of the restriction of \(O\) to \(V_0\), being orthogonal with no real eigenvalues, must have the form \(\mathbf{R}(\theta)\) where,
\begin{equation}
\mathbf{R}(\theta)=\begin{pmatrix}
\cos\theta&-\sin\theta\\
\sin\theta&\cos\theta
\end{pmatrix}.
\end{equation}
Similar reasoning to the unitary case makes it clear that \(V_0^\perp\) is also \(O\)-invariant and so the result follows.\(\blacksquare\)
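To see the canonical form concretely, the following sketch (with an arbitrary angle) assembles an orthogonal matrix from one rotation block together with \(+1\) and \(-1\) entries, then confirms orthogonality and that the complex eigenvalues are \(e^{\pm i\theta}\), \(+1\) and \(-1\):

```python
import numpy as np

theta = 0.9   # arbitrary rotation angle

# Assemble the canonical block form: one R(theta) block, then +1 and -1.
O = np.zeros((4, 4))
O[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
O[2, 2], O[3, 3] = 1.0, -1.0

assert np.allclose(O.T @ O, np.eye(4))   # the block form is indeed orthogonal

# Its complex eigenvalues are e^{+/- i theta} together with +1 and -1.
expected = np.sort_complex(np.array([np.exp(1j * theta), np.exp(-1j * theta), 1, -1]))
assert np.allclose(np.sort_complex(np.linalg.eigvals(O)), expected)
```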

So in summary, an operator \(O\) on a real orthogonal space \(V\) with a positive definite inner product is an isometry, that is, an orthogonal operator, if and only if its matrix representation with respect to some orthonormal basis of \(V\) has the form~\eqref{orthog operator}. An operator \(U\) on a (complex) Hermitian space \(V\) with a positive definite inner product is an isometry, that is, a unitary operator, if and only if its matrix representation with respect to some orthonormal basis of \(V\) is diagonal with its diagonal elements all belonging to the unit circle in \(\CC\). We’ve also seen that operators \(T\) on real orthogonal or Hermitian spaces with positive definite inner products are self-adjoint if and only if their matrix representation with respect to some orthonormal basis is diagonal with all diagonal elements real.

It’s also worth noting that in an inner product space, \(V\), the orbit of a self-adjoint linear operator \(A\in\mathcal{L}(V)\) under the usual action of vector space automorphisms \(P\in\text{GL}(V)\), \(A\mapsto P^{-1}AP\), does not consist only of other self-adjoint operators. However, if we consider instead the action of isometries, that is, of elements \(U\in U(V)\) when \(V\) is over \(\CC\), or of elements \(O\in O(V)\) when working over \(\RR\), then the orbits consist exclusively of self-adjoint operators.
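A quick numerical illustration of this (with arbitrary random matrices): conjugating a Hermitian matrix by a generic invertible matrix typically destroys self-adjointness, whereas conjugating by a unitary matrix always preserves it:

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary Hermitian matrix A = B + B^dagger.
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = B + B.conj().T

# Generic similarity: P^{-1} A P is (almost always) not Hermitian.
P = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
C = np.linalg.inv(P) @ A @ P
assert not np.allclose(C, C.conj().T)

# Unitary conjugation: U^{-1} A U = U^dagger A U remains Hermitian.
U, _ = np.linalg.qr(P)   # the unitary factor of P
D = U.conj().T @ A @ U
assert np.allclose(D, D.conj().T)
```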

Proposition If \(\mathbf{A}\in\text{Mat}_n(\CC)\) is Hermitian, that is, \(\mathbf{A}^\dagger=\mathbf{A}\), then there exists a \(\mathbf{P}\in U(n)\) such that \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\) is real and diagonal. Similarly, if \(\mathbf{A}\in\text{Mat}_n(\RR)\) is symmetric, that is, \(\mathbf{A}^\mathsf{T}=\mathbf{A}\), then there exists a \(\mathbf{P}\in O(n)\) such that \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\) is real and diagonal.

Proof We use Theorem and treat both \(K=\CC\) and \(K=\RR\) simultaneously. So assume \(\mathbf{A}\in\text{Mat}_n(K)\); then \(L_\mathbf{A}\in\mathcal{L}(K^n)\) is self-adjoint with respect to the standard inner product on \(K^n\), and from Theorem there is an orthonormal basis of eigenvectors, \(\mathbf{v}_1,\dots,\mathbf{v}_n\), for \(L_\mathbf{A}\). That is, there are some \(\lambda_1,\dots,\lambda_n\in\RR\) such that \(L_\mathbf{A}\mathbf{v}_i=\lambda_i\mathbf{v}_i\), or in terms of matrices,
\begin{equation}
\mathbf{A}(\mathbf{v}_1\dots\mathbf{v}_n)=(\lambda_1\mathbf{v}_1\dots\lambda_n\mathbf{v}_n).
\end{equation}
So defining \(\mathbf{P}=(\mathbf{v}_1\dots\mathbf{v}_n)\), the orthonormality of its columns means that \(\mathbf{P}\in U(n)\) when \(K=\CC\) and \(\mathbf{P}\in O(n)\) when \(K=\RR\), and the matrix equation reads \(\mathbf{A}\mathbf{P}=\mathbf{P}\,\text{diag}(\lambda_1,\dots,\lambda_n)\), that is, \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\) is real and diagonal.\(\blacksquare\)
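This is exactly what a numerical eigensolver for Hermitian matrices hands back; a minimal sketch with an arbitrary Hermitian matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# An arbitrary Hermitian matrix A = B + B^dagger.
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = B + B.conj().T

# eigh returns the eigenvalues as a real array, together with a unitary P
# whose columns are the corresponding orthonormal eigenvectors.
eigvals, P = np.linalg.eigh(A)
assert np.allclose(P.conj().T @ P, np.eye(4))             # P is in U(4)
assert np.allclose(P.conj().T @ A @ P, np.diag(eigvals))  # P^{-1} A P real diagonal
```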


Theorem and Theorem are clearly very similar. Indeed, in the context of an Hermitian inner product space they can be ‘unified’ through the notion of a normal linear operator \(T\), that is, one which commutes with its own adjoint, \(TT^\dagger=T^\dagger T\). Self-adjoint operators and unitary operators are clearly both examples of normal operators. Now, for a normal operator \(T\), we have \((Tv,Tv)=(v,T^\dagger Tv)=(v,TT^\dagger v)=(T^\dagger v,T^\dagger v)\). Also, if \(T\) is normal then so is \(T-\lambda\id_V\), whose adjoint is \(T^\dagger-\lambda^*\id_V\), for any \(\lambda\in\CC\). So for any normal operator \(T\), if \(\lambda\) is an eigenvalue with eigenvector \(v\), \(Tv=\lambda v\), then applying the norm identity to \(T-\lambda\id_V\) gives \(0=\norm{(T-\lambda\id_V)v}=\norm{(T^\dagger-\lambda^*\id_V)v}\), so \(\lambda^*\) is an eigenvalue of \(T^\dagger\) with eigenvector \(v\), \(T^\dagger v=\lambda^* v\).
Then, if \(Tv_1=\lambda_1v_1\) and \(Tv_2=\lambda_2v_2\) with \(\lambda_1\neq\lambda_2\), \(\lambda_1^*(v_1,v_2)=(Tv_1,v_2)=(v_1,T^\dagger v_2)=(v_1,v_2)\lambda_2^*\), so \((v_1,v_2)=0\). Finally, if \(v_1\) is an eigenvector of a normal operator \(T\) with eigenvalue \(\lambda_1\) then \(\Span(v_1)^\perp\) is \(T\)-invariant since for every \(w\in\Span(v_1)^\perp\) we have \((Tw,v_1)=(w,T^\dagger v_1)=(w,v_1)\lambda_1^*=0\). So we have,

Theorem \(T\) is a normal operator on an Hermitian inner product space with positive definite inner product if and only if,

  1. Eigenvectors with distinct eigenvalues are orthogonal.
  2. There exists an orthonormal basis of eigenvectors of \(T\). In particular \(T\) is diagonalisable.

Thus, in the case of Hermitian inner product spaces, we have a generalisation of the spectral decomposition result for self-adjoint operators. Any normal operator \(T\) has a spectral decomposition,
\begin{equation}
T=\sum_i\lambda_iP_i,
\end{equation}
where as before, the orthogonal projectors, \(P_i\), correspond to the eigenspaces, \(V_i\), of the (distinct) eigenvalues \(\lambda_i\) of \(T\) in the orthogonal decomposition \(V=\oplus_iV_i\).
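The following sketch manufactures a normal operator which is neither self-adjoint nor unitary, by conjugating a complex diagonal matrix by a random unitary (all choices arbitrary), then verifies normality and rebuilds the operator from its spectral projectors:

```python
import numpy as np

rng = np.random.default_rng(3)

# T = U diag(c) U^dagger is normal, but neither self-adjoint (c is not real)
# nor unitary (|c| is not 1).
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(Z)                   # a random unitary
c = np.array([2.0 + 1.0j, -1.0j, 3.0])   # arbitrary distinct complex eigenvalues
T = U @ np.diag(c) @ U.conj().T

assert np.allclose(T @ T.conj().T, T.conj().T @ T)   # T is normal

# Spectral decomposition T = sum_i lambda_i P_i from the orthonormal eigenvectors.
projectors = [np.outer(U[:, i], U[:, i].conj()) for i in range(3)]
assert np.allclose(sum(projectors), np.eye(3))
assert np.allclose(sum(ci * P for ci, P in zip(c, projectors)), T)
```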

Gram-Schmidt Orthogonalisation

In the cases of real orthogonal or complex Hermitian inner product spaces with a positive definite inner product there are no null vectors so we can start from any existing basis, \(\{f_i\}\) say, and systematically construct an orthogonal basis, \(\{e_i\}\), as follows. We begin by setting \(e_1=f_1\). We set \(e_2\) to be \(f_2\) with any component parallel to \(f_1\) removed, that is,
\begin{equation*}
e_2=f_2-\pi_{e_1}f_2,
\end{equation*}
where we have introduced the operator
\begin{equation*}
\pi_uv=\frac{(u,v)}{(u,u)}u.
\end{equation*}
Likewise \(e_3\) is just \(f_3\) with its components in the \(e_1\) and \(e_2\) directions removed and so on with the general vector \(e_j\) given by
\begin{equation}
e_j=f_j-\sum_{i=1}^{j-1}\pi_{e_i}f_j.
\end{equation}
Given the orthogonal basis \(\{e_i\}\) we can then normalise each vector to obtain an orthonormal basis. This procedure, for constructing an orthogonal basis from any given basis in an inner product space, is known as Gram-Schmidt orthogonalisation.
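A direct transcription of the procedure into code might look as follows: a minimal numpy sketch, assuming the inner product is the standard dot product, conjugated in the first slot to match the convention used throughout (the example basis is an arbitrary choice):

```python
import numpy as np

def gram_schmidt(f):
    """Return an orthogonal basis built from the rows of f (assumed independent)."""
    inner = lambda u, v: np.vdot(u, v)   # conjugate-linear in the first argument
    e = []
    for fj in f:
        # Subtract from f_j its component along each previously constructed e_i,
        # i.e. e_j = f_j - sum_i pi_{e_i} f_j.
        ej = fj - sum((inner(ei, fj) / inner(ei, ei)) * ei for ei in e)
        e.append(ej)
    return np.array(e)

# Example: orthogonalise an arbitrary basis of R^3, then normalise.
f = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
e = gram_schmidt(f)
gram = e @ e.conj().T
assert np.allclose(gram, np.diag(np.diag(gram)))      # the rows are orthogonal
e_hat = e / np.linalg.norm(e, axis=1, keepdims=True)  # an orthonormal basis
```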

We can also view the construction ‘in reverse’ as follows. Given the assumption of a positive (negative) definite inner product we know that not only is the \(n\times n\) matrix, \(\mathbf{G}\), of the inner product with respect to the given basis, \(\{f_i\}\), invertible, but every leading \(k\times k\) submatrix, with elements \(G_{ij}\), \(1\leq i,j\leq k\), is also invertible. Indeed, if it weren’t, and there were numbers \(x^i\), not all zero, such that \(\sum_{j=1}^kG_{ij}x^j=0\), then the vector \(\sum_{i=1}^kx^if_i\) would be a non-zero null vector, since \((\sum_{i=1}^kx^if_i,\sum_{j=1}^kx^jf_j)=\sum_{i,j=1}^k{x^i}^*x^j(f_i,f_j)=\sum_{i,j=1}^k{x^i}^*x^jG_{ij}=0\). Now define \(e_n=f_n-\sum_{i,j=1}^{n-1}G_{ij}^{-1}(f_j,f_n)f_i\), where \(G_{ij}^{-1}\) denotes the \((i,j)\) element of the inverse of the leading \((n-1)\times(n-1)\) submatrix. It is clearly orthogonal to all \(f_i\), \(1\leq i\leq n-1\). \(e_{n-1}\) is then defined as \(e_{n-1}=f_{n-1}-\sum_{i,j=1}^{n-2}G_{ij}^{-1}(f_j,f_{n-1})f_i\), with the inverse now that of the leading \((n-2)\times(n-2)\) submatrix, and is clearly orthogonal to all \(f_i\), \(1\leq i\leq n-2\), and to \(e_n\). Continuing in this way we arrive at the desired orthogonal basis.

Now, we know that if \(U\) is a subspace of \(V\) on which the restriction of the inner product is non-degenerate then we can write, \(V=U\oplus U^\perp\), which specifies the orthogonal projection onto \(U\), \(P:V\mapto U\). In fact, \(Pv\) is the vector in \(U\) closest to \(v\) in the sense that \(\norm{v-Pv}\leq\norm{v-u}\) for all \(u\in U\) with equality if and only if \(u=Pv\). To see this, observe first that for any \(v\in V\) and \(u\in U\), we have \(v-Pv\in U^\perp\) and \(Pv-u\in U\), so that \((v-Pv,Pv-u)=0\), and therefore,
\begin{align*}
\norm{v-u}^2&=\norm{v-Pv+Pv-u}^2\\
&=(v-Pv+Pv-u,v-Pv+Pv-u)\\
&=(v-Pv,v-Pv)+(Pv-u,Pv-u)+2\Real(v-Pv,Pv-u)\\
&=\norm{v-Pv}^2+\norm{Pv-u}^2,
\end{align*}
from which it follows that \(\norm{v-Pv}\leq\norm{v-u}\) with equality if and only if \(u=Pv\).

In the context of the Gram-Schmidt procedure, if \(U=\Span(e_1,\dots,e_k)\), then notice that the orthogonal projector \(P\) is just \(P=\sum_{i=1}^k\pi_{e_i}\), so geometrically, the inductive step of the Gram-Schmidt procedure expresses the next of the original basis vectors, \(f_{k+1}\), as the sum of the vector in \(\Span(e_1,\dots,e_k)\) closest to \(f_{k+1}\) and an element, \(e_{k+1}\), of \(\Span(e_1,\dots,e_k)^\perp\), as the sketch below illustrates.
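Here’s a small numerical sketch of this picture (arbitrary data): we project a vector of \(\RR^4\) onto a \(2\)-dimensional subspace using \(P=\sum_i\pi_{e_i}\) for an orthonormal pair \(e_1,e_2\), and confirm that the projection beats other points of the subspace for closeness to \(v\):

```python
import numpy as np

rng = np.random.default_rng(4)

# An orthonormal basis of a 2-dimensional subspace U of R^4 (via QR), as columns of Q.
Q, _ = np.linalg.qr(rng.normal(size=(4, 2)))
proj = lambda v: Q @ (Q.T @ v)   # P = sum_i pi_{e_i}; for orthonormal columns, P = QQ^T

v = rng.normal(size=4)
Pv = proj(v)
assert np.allclose(Q.T @ (v - Pv), 0)   # v - Pv lies in the orthogonal complement of U

# Pv is the point of U closest to v: compare against random points of U.
for _ in range(100):
    u = Q @ rng.normal(size=2)
    assert np.linalg.norm(v - Pv) <= np.linalg.norm(v - u) + 1e-12
```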

It’s worth mentioning that any set of orthonormal vectors in a real orthogonal or complex Hermitian inner product space with positive definite inner product may be extended to an orthonormal basis for \(V\), since the set can be extended to a basis of \(V\) and Gram-Schmidt employed to orthogonalise the extension, the procedure leaving the initial vectors, already orthonormal, unchanged. A sketch of this follows.
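A standard way to realise this in practice (a sketch under the same assumptions as above, not the only route) is to append the standard basis vectors to the given orthonormal set, run Gram-Schmidt over the combined list, and discard the candidates that collapse to zero:

```python
import numpy as np

def extend_to_orthonormal_basis(vectors, tol=1e-10):
    """Extend a list of orthonormal vectors in R^n to an orthonormal basis."""
    n = vectors[0].shape[0]
    basis = list(vectors)
    for x in np.eye(n):                              # candidate extension vectors
        r = x - sum(np.vdot(e, x) * e for e in basis)
        if np.linalg.norm(r) > tol:                  # keep genuinely new directions
            basis.append(r / np.linalg.norm(r))
    return np.array(basis)

# Example: extend a single unit vector in R^3 to an orthonormal basis.
v = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
B = extend_to_orthonormal_basis([v])
assert np.allclose(B @ B.T, np.eye(3))   # the rows form an orthonormal basis
```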