Polar and Singular Value Decompositions

Might there be an analog for operators of the polar decomposition of complex numbers \(z=re^{i\theta}\)? If so, we’d hope for something of the form \(T=PU\), with a unitary operator \(U\) corresponding to \(e^{i\theta}\) and some operator \(P\) corresponding to the ‘absolute value’ of \(T\). Guided by the analogy, we should hope for \(P\) to be ‘positive’ in some sense.

Definition A linear operator \(T\) on a real orthogonal or complex Hermitian inner product space, \(V\), is called positive if it is self-adjoint and if \((Tv,v)\geq0\) for all \(v\in V\).

This is a sensible definition since, for a self-adjoint operator \(T\), it is not difficult to see that the condition \((Tv,v)\geq0\) for all \(v\) is equivalent to all eigenvalues, \(\lambda\), of \(T\) satisfying \(\lambda\geq0\). Now, for any operator \(T\in\mathcal{L}(V)\), consider \(T^\dagger T\). This is clearly self-adjoint and, since \((T^\dagger Tv,v)=(Tv,Tv)\geq0\) by the assumed positive definiteness of the inner product, it is also positive (note also that \(\ker T=\ker(T^\dagger T)\)). So to any operator \(T\) is associated a self-adjoint positive operator \(T^\dagger T\). However, for our immediate goal of obtaining an analog of the polar decomposition, it is of the wrong ‘order’: we need something like its ‘square root’.
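As a quick numerical illustration (a minimal NumPy sketch with an arbitrary randomly generated \(T\), not anything canonical), one can check that \(T^\dagger T\) is self-adjoint and that \((T^\dagger Tv,v)\) is real and non-negative:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # arbitrary operator on C^4

TdT = T.conj().T @ T                       # T^dagger T
print(np.allclose(TdT, TdT.conj().T))      # self-adjoint: True

v = rng.normal(size=4) + 1j * rng.normal(size=4)
q = np.vdot(v, TdT @ v)                    # (T^dagger T v, v) = (Tv, Tv)
print(np.isclose(q.imag, 0.0), q.real >= 0.0)   # True True
\end{verbatim}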

Recall that any self-adjoint operator, \(T\), has a spectral decomposition \(T=\sum_i\lambda_iP_i\). If \(T\) is in fact positive, then each \(\lambda_i\geq0\) so that we could define,
\begin{equation}
\sqrt{T}=\sum_i\sqrt{\lambda_i}P_i.
\end{equation}
Clearly, \(\sqrt{T}\) is positive and \((\sqrt{T})^2=T\) (also note that \(\ker\sqrt{T}=\ker T\)). Moreover, it is the unique positive operator whose square is \(T\). Indeed, suppose \(A\) is a positive operator such that \(A^2=T\), with spectral decomposition \(A=\sum_i\mu_iQ_i\), so that \(A^2=T=\sum_i\mu_i^2Q_i\). By the uniqueness of the spectral decomposition of \(T\) we know that, appropriately ordered, we must have \(\lambda_i=\mu_i^2\) and \(Q_i=P_i\); since each \(\mu_i\geq0\) by the positivity of \(A\), this gives \(\mu_i=\sqrt{\lambda_i}\), so that \(A=\sqrt{T}\).
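The square root construction is straightforward to carry out numerically. The following is a minimal NumPy sketch, in which the positive operator is manufactured as \(A^\dagger A\) for an arbitrary random \(A\); the square root is assembled from the spectral decomposition exactly as above and checked to be positive with square \(T\):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
T = A.conj().T @ A                     # a positive (self-adjoint, non-negative) operator

# spectral decomposition of a self-adjoint matrix: T = vecs diag(lam) vecs^dagger
lam, vecs = np.linalg.eigh(T)
sqrtT = vecs @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ vecs.conj().T

print(np.allclose(sqrtT @ sqrtT, T))               # (sqrt T)^2 = T
print(np.allclose(sqrtT, sqrtT.conj().T))          # sqrt T is self-adjoint
print(np.all(np.linalg.eigvalsh(sqrtT) >= -1e-12)) # and its eigenvalues are >= 0
\end{verbatim}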

Theorem Any operator \(T\) on a real orthogonal or complex Hermitian inner product space with positive definite inner product can be expressed as a product of two operators, \(T=UP\), called its polar decomposition, in which \(P\) is a uniquely determined positive operator and \(U\) is an isometry; the isometry \(U\) is unique if and only if \(T\) is invertible.

Proof To begin, notice that if such a decomposition exists, then \(T^\dagger T=PU^\dagger UP=P^2\), so by the uniqueness of the square root \(P=\sqrt{T^\dagger T}\) is uniquely determined. Also, if \(T\) is invertible, then so is \(P\), so \(U=TP^{-1}\) is unique. Conversely, if \(T\) is not invertible then neither is \(P\), in which case \(\ker P\) is non-trivial and we can write \(V=\ker P\oplus\img P\). \(U\) can then be replaced by \(UU'\) where \(U'\) is any isometry of the form \(U'=f\oplus\id_{\img P}\) with \(f\) an isometry of \(\ker P\); choosing \(f\neq\id_{\ker P}\) shows that \(U\) is not unique in this case.
Now let us consider existence. Define \(P=\sqrt{T^\dagger T}\) and observe that in the case that \(T\) is invertible we could simply define \(U=TP^{-1}\) which, as is easily verified, is an isometry. In the case that \(T\) is not invertible, we start by considering the subspace \(\img P\) of \(V\) and define on this subspace the map \(U_1:\img P\mapto\img T\) as \(U_1=TP^{-1}|_{\img P}\) (\(P\) is an isomorphism on \(\img P\) since \(\ker P\cap\img P=0\)). So defined, \(U_1\) is clearly linear and, since \(\ker P=\ker T\), it is well defined (in the sense that if \(v_1,v_2\in V\) are such that \(Pv_1=Pv_2\) then \(Tv_1=Tv_2\)), it is injective, and \(\dim\img P=\dim\img T\). Moreover, for any \(v_1,v_2\in\img P\), with \(v_1=Pu_1\) and \(v_2=Pu_2\), \((U_1v_1,U_1v_2)=(Tu_1,Tu_2)=(T^\dagger Tu_1,u_2)=(P^2u_1,u_2)=(Pu_1,Pu_2)=(v_1,v_2)\). So if \(\{v_1,\dots,v_k\}\) is an orthonormal basis of \(\img P\) and \(\{v_{k+1},\dots,v_n\}\) an orthonormal basis of \(\ker P=(\img P)^\perp\), then \(\{U_1v_{1},\dots,U_1v_k\}\) is an orthonormal basis for \(\img T\) which we can extend to an orthonormal basis of \(V=\img P\oplus(\img P)^\perp=\img T\oplus(\img T)^\perp\) as \(\{U_1v_{1},\dots,U_1v_k,u_{k+1},\dots,u_n\}\), where \(\{u_{k+1},\dots,u_n\}\) is an orthonormal basis of \((\img T)^\perp\). Defining \(U_2:(\img P)^\perp\mapto(\img T)^\perp\) by \(U_2v_i=u_i\) for \(i=k+1,\dots,n\), the operator \(U=U_1\oplus U_2\) is an isometry of \(V\) and, by construction, \(UPv=U_1Pv=Tv\) for all \(v\in V\), so \(T=UP\) as required.\(\blacksquare\)
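For an invertible matrix the construction in the proof is immediate to implement: take \(P=\sqrt{T^\dagger T}\) and \(U=TP^{-1}\). The sketch below assumes the random \(T\) is invertible (true for a generic choice), checks that \(U\) is an isometry with \(T=UP\), and compares against SciPy's scipy.linalg.polar, which computes the same (right) polar decomposition:

\begin{verbatim}
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # generic, hence invertible

# P = sqrt(T^dagger T) via the spectral decomposition, then U = T P^{-1}
lam, vecs = np.linalg.eigh(T.conj().T @ T)
P = vecs @ np.diag(np.sqrt(lam)) @ vecs.conj().T
U = T @ np.linalg.inv(P)

print(np.allclose(U.conj().T @ U, np.eye(4)))   # U is an isometry (unitary)
print(np.allclose(U @ P, T))                    # T = U P

U2, P2 = polar(T)                               # SciPy's right polar decomposition T = U2 P2
print(np.allclose(U, U2), np.allclose(P, P2))   # agrees with the construction above
\end{verbatim}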

Remark From the polar decomposition of an operator \(T\) as \(T=UP\) we can obtain a polar decomposition of \(T\) in the form \(T=P'U\), where \(U\) is the same isometry and now \(P'=UPU^{-1}=UPU^\dagger\), which is again positive.

For any linear operator \(T\), the eigenvalues of \(\sqrt{T^\dagger T}\) are called the singular values of \(T\). Clearly the singular values of \(T\) are non-negative real numbers. We will use the polar decomposition to establish the following result, known as the singular value decomposition.

Theorem For any operator \(T\in\mathcal{L}(V)\) with singular values \(s_i\), \(i=1,\dots,n\), there exist orthonormal bases of \(V\), \(\{e_1,\dots,e_n\}\) and \(\{f_1,\dots,f_n\}\) such that \(Te_i=s_if_i\) for \(i=1,\dots,n\).

Proof Choose \(\{e_1,\dots,e_n\}\) to be an orthonormal basis of eigenvectors of \(\sqrt{T^\dagger T}\), so that \(\sqrt{T^\dagger T}e_i=s_ie_i\). By the polar decomposition there is an isometry \(U\) such that \(T=U\sqrt{T^\dagger T}\). It follows that \(Te_i=U\sqrt{T^\dagger T}e_i=s_iUe_i\). Thus, defining \(f_i=Ue_i\), we have \(Te_i=s_if_i\), and \(\{f_1,\dots,f_n\}\) is orthonormal since \(U\) is an isometry.\(\blacksquare\)
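A minimal numerical check of the proof's construction, again for an arbitrary random \(T\): the \(e_i\) are taken to be an orthonormal eigenbasis of \(T^\dagger T\) (equivalently of \(\sqrt{T^\dagger T}\)), \(U\) is the isometry from the polar decomposition, and we verify \(Te_i=s_if_i\) with the \(f_i=Ue_i\) orthonormal:

\begin{verbatim}
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(3)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

lam, E = np.linalg.eigh(T.conj().T @ T)   # columns are the e_i: sqrt(T^dagger T) e_i = s_i e_i
s = np.sqrt(lam)                          # singular values of T
U, P = polar(T)                           # T = U P with P = sqrt(T^dagger T)
F = U @ E                                 # columns are f_i = U e_i

print(np.allclose(T @ E, F * s))                # T e_i = s_i f_i, column by column
print(np.allclose(F.conj().T @ F, np.eye(4)))   # the f_i are orthonormal
\end{verbatim}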

Example Suppose \(X\) and \(Y\) are \(n\)-dimensional subspaces of a \(2n\)-dimensional inner product space \(V\) such that \(V=X\oplus Y\), with \(X\) and \(Y\) not assumed to be orthogonal. As subspaces of \(V\), both \(X\) and \(Y\) have orthonormal bases, which we’ll denote by \(\{e_1,\dots,e_n\}\) and \(\{f_1,\dots,f_n\}\) respectively. We define a matrix \(\mathbf{A}\) with entries \(A_{ij}=(e_i,f_j)\). This matrix has a polar decomposition, \(\mathbf{A}=\mathbf{U}\mathbf{P}\), and we can use the matrix \(\mathbf{U}\) to define a new orthonormal basis for \(X\) with elements \(e_i'=Ue_i\), \(i=1,\dots,n\). Notice that \((e_i',f_j)=(U_{ki}e_k,f_j)=U_{ki}^*(e_k,f_j)=U^\dagger_{ik}A_{kj}=P_{ij}\). In particular, since \(\mathbf{P}\) is self-adjoint, this means that \((e_i',f_j)=(e_j',f_i)^*\). Now introduce a linear operator, \(T\in\mathcal{L}(V)\), by defining it on basis elements as \(Te_i'=f_i\) and \(Tf_i=e_i'\). Clearly this map is such that \(T(X)=Y\) and \(T(Y)=X\). We demonstrate that it is in fact an isometry of \(V\). We have \((Te_i',Te_j')=(f_i,f_j)=\delta_{ij}=(e_i',e_j')=(Tf_i,Tf_j)\), but also \((Te_i',Tf_j)=(f_i,e_j')=(e_j',f_i)^*=(e_i',f_j)\), so \(T\) preserves inner products on a basis of \(V\) and is therefore an isometry.

More generally, for any two subspaces \(X\) and \(Y\) of equal dimension of an inner product space \(V\), there always exists an isometry of \(V\) which interchanges \(X\) and \(Y\). To see this, first define \(\tilde{X}\) and \(\tilde{Y}\) to be the orthogonal complements of \(X\cap Y\) in \(X\) and \(Y\) respectively, so that \(X=(X\cap Y)\oplus\tilde{X}\) and \(Y=(X\cap Y)\oplus\tilde{Y}\). Now we can write \(V\) as \(V=(X\cap Y)\oplus(\tilde{X}\oplus\tilde{Y})\oplus(X+Y)^\perp\). In this decomposition, note that the direct sum \(\tilde{X}\oplus\tilde{Y}\) need not be orthogonal, whilst the other two direct sums are. The result now follows since we’ve already seen how to construct the desired isometry of \(\tilde{X}\oplus\tilde{Y}\), and this can be extended to an isometry of \(V\) by letting it act as the identity on \(X\cap Y\) and on \((X+Y)^\perp\).
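The first part of the example is concrete enough to implement directly. In the sketch below the complementary subspaces \(X\) and \(Y\) of \(\mathbb{C}^{2n}\) are chosen at random (so, generically, \(V=X\oplus Y\)); we form the overlap matrix \(A_{ij}=(e_i,f_j)\), rotate the basis of \(X\) by the unitary factor of its polar decomposition, define \(T\) on the basis \(\{e_i',f_i\}\) by swapping, and check that the resulting operator is an isometry:

\begin{verbatim}
import numpy as np
from scipy.linalg import polar

rng = np.random.default_rng(4)
n = 2
# orthonormal bases (as columns) of two generic n-dimensional subspaces X, Y of C^{2n};
# for a generic choice the 2n columns below together span V, i.e. V = X (+) Y
E = np.linalg.qr(rng.normal(size=(2 * n, n)) + 1j * rng.normal(size=(2 * n, n)))[0]
F = np.linalg.qr(rng.normal(size=(2 * n, n)) + 1j * rng.normal(size=(2 * n, n)))[0]

A = E.conj().T @ F                          # overlap matrix A_ij = (e_i, f_j)
U, P = polar(A)                             # polar decomposition A = U P
Eprime = E @ U                              # rotated orthonormal basis e_i' of X

print(np.allclose(Eprime.conj().T @ F, P))  # (e_i', f_j) = P_ij, a Hermitian matrix

# T is defined by T e_i' = f_i and T f_i = e_i' on the basis {e_i', f_i} of V
B = np.hstack([Eprime, F])
T = np.hstack([F, Eprime]) @ np.linalg.inv(B)

print(np.allclose(T.conj().T @ T, np.eye(2 * n)))   # T is an isometry of V
\end{verbatim}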

The singular value decomposition appears most commonly as the statement that any (real/complex) \(m\times n\) matrix \(\mathbf{T}\) can be expressed as a product of matrices \(\mathbf{P}\mathbf{\Sigma}\mathbf{Q}^\dagger\) where \(\mathbf{P}\) and \(\mathbf{Q}\) are respectively \(m\times m\) and \(n\times n\) real orthogonal/complex unitary matrices and \(\mathbf{\Sigma}\) is an \(m\times n\) matrix whose only non-zero entries are non-negative real numbers on the main diagonal.
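Numerically this is exactly what np.linalg.svd returns; a minimal sketch for an arbitrary complex \(3\times5\) matrix:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
T = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))   # an arbitrary 3 x 5 matrix

P, s, Qh = np.linalg.svd(T)        # P: 3x3 unitary, s: singular values, Qh = Q^dagger: 5x5 unitary
Sigma = np.zeros((3, 5))
np.fill_diagonal(Sigma, s)         # 3 x 5 'diagonal' matrix of singular values

print(np.allclose(T, P @ Sigma @ Qh))             # T = P Sigma Q^dagger
print(np.allclose(P.conj().T @ P, np.eye(3)))     # P unitary
print(np.allclose(Qh.conj().T @ Qh, np.eye(5)))   # Q unitary
\end{verbatim}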

To see this, consider two finite dimensional vector spaces, \(U\) and \(V\), both real orthogonal or complex Hermitian inner product spaces, of dimensions \(n\) and \(m\) respectively, and a linear map \(T\in\mathcal{L}(U,V)\). Then \(\ker T^\dagger T=\ker T\), since clearly \(\ker T\subseteq\ker T^\dagger T\) and, if \(u\in\ker T^\dagger T\), then \(0=(T^\dagger Tu,u)=(Tu,Tu)\) so \(u\in\ker T\). Thus, since \(T^\dagger T\) is positive, if we set \(r=\rank T=\rank T^\dagger T\) then \(U\) has an orthonormal basis \(\{u_1,\dots,u_r,u_{r+1},\dots,u_n\}\) of eigenvectors of \(T^\dagger T\) whose corresponding eigenvalues can be arranged so that \(\lambda_1\geq\cdots\geq\lambda_r>0=\lambda_{r+1}=\cdots=\lambda_n\). Having fixed notation in this way, we have that \(\{u_{r+1},\dots,u_n\}\) is a basis for \(\ker T=\ker T^\dagger T\) and that \(\{u_1,\dots,u_r\}\) is a basis for \((\ker T)^\perp\). In fact, \(\img T^\dagger\subseteq(\ker T)^\perp\), since if \(\tilde{u}\in\img T^\dagger\) then \(\tilde{u}=T^\dagger v\) for some \(v\in V\), and so for any \(u\in\ker T\), \((u,\tilde{u})=(u,T^\dagger v)=(Tu,v)=0\). Conversely, \((\ker T)^\perp\subseteq\img T^\dagger\): if \(u\notin\img T^\dagger\) then there exists \(\tilde{u}\in(\img T^\dagger)^\perp\) such that \((u,\tilde{u})\neq0\); but \(T^\dagger T\tilde{u}\in\img T^\dagger\), so \((T\tilde{u},T\tilde{u})=(\tilde{u},T^\dagger T\tilde{u})=0\), hence \(T\tilde{u}=0\) and \(\tilde{u}\in\ker T\), which means \(u\notin(\ker T)^\perp\). Thus \(\img T^\dagger=(\ker T)^\perp\) and \(\{u_1,\dots,u_r\}\) is therefore a basis for \(\img T^\dagger\).
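The kernel and image identities used above are easy to check numerically for an arbitrary rank-deficient matrix; the sketch below (using scipy.linalg.null_space to obtain orthonormal bases of kernels) verifies that \(\ker T^\dagger T=\ker T\) and that \(\img T^\dagger\) is orthogonal to \(\ker T\):

\begin{verbatim}
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(7)
m, n, r = 4, 6, 2
# an m x n matrix of rank r, built as a product of generic m x r and r x n factors
T = ((rng.normal(size=(m, r)) + 1j * rng.normal(size=(m, r)))
     @ (rng.normal(size=(r, n)) + 1j * rng.normal(size=(r, n))))

K1 = null_space(T)                    # orthonormal basis of ker T
K2 = null_space(T.conj().T @ T)       # orthonormal basis of ker T^dagger T
print(K1.shape[1] == n - r == K2.shape[1])   # both kernels have dimension n - r
print(np.allclose(T @ K2, 0))                # ker T^dagger T is contained in ker T

# img T^dagger is orthogonal to ker T: every column of T^dagger is orthogonal to K1
print(np.allclose(K1.conj().T @ T.conj().T, 0))
\end{verbatim}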

Let us now introduce the notation \(s_i=\sqrt{\lambda_i}\) so that \(T^\dagger Tu_i=s_i^2u_i\) and, for \(i=1,\dots,r\), define the elements \(v_i\in V\) by \(v_i=(1/s_i)Tu_i\). Then
\begin{equation*}
(v_i,v_j)=\frac{1}{s_is_j}(Tu_i,Tu_j)=\frac{1}{s_is_j}(u_i,T^\dagger Tu_j)=\frac{s_j^2}{s_is_j}(u_i,u_j)=\frac{s_j}{s_i}\delta_{ij}=\delta_{ij}
\end{equation*}
so \(\{v_1,\dots,v_r\}\) is an orthonormal basis for \(\img T=(\ker T^\dagger)^\perp\), which can be extended to an orthonormal basis for \(V\), with \(\{v_{r+1},\dots,v_m\}\) then an orthonormal basis for \(\ker T^\dagger\). With the basis vectors \(\{v_1,\dots,v_m\}\) of \(V\) so defined, we have, for \(i=1,\dots,r\), \(T^\dagger v_i=(1/s_i)T^\dagger Tu_i=s_iu_i\) and hence \(TT^\dagger v_i=s_iTu_i=s_i^2v_i\), while for \(i>r\), \(TT^\dagger v_i=0\); so the \(v_i\) are eigenvectors of \(TT^\dagger\) with the same eigenvalues as the \(u_i\) have as eigenvectors of \(T^\dagger T\).
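A sketch of this construction for a generic full-column-rank rectangular matrix (so \(r=n\), which avoids the bookkeeping for the kernel): diagonalise \(T^\dagger T\), form \(v_i=Tu_i/s_i\), and check that the \(v_i\) are orthonormal eigenvectors of \(TT^\dagger\) with eigenvalues \(s_i^2\):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
T = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))   # generic, so rank r = n

lam, Uvecs = np.linalg.eigh(T.conj().T @ T)   # columns u_i, with T^dagger T u_i = s_i^2 u_i
s = np.sqrt(lam)                              # singular values
V = (T @ Uvecs) / s                           # columns v_i = T u_i / s_i

print(np.allclose(V.conj().T @ V, np.eye(n)))        # the v_i are orthonormal
print(np.allclose(T @ T.conj().T @ V, V * s**2))     # T T^dagger v_i = s_i^2 v_i
\end{verbatim}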

The result for matrices now follows since, given the bases \(\{u_i\}\) and \(\{v_i\}\) of \(U\) and \(V\) respectively and the corresponding standard bases \(\{e_i\}\), there are real orthogonal/complex unitary matrices \(\mathbf{P}\) and \(\mathbf{Q}\) whose elements are defined by \(e_i=P_i^ju_j\) and \(e_i=Q_i^jv_j\) (repeated indices are summed). The matrix elements of \(T\) with respect to the standard bases of \(U\) and \(V\) are defined by \(Te_i=T_i^je_j\), but we know that \(Tu_i=s_iv_i\), so we have
\begin{equation*}
Te_i=P_i^jTu_j=P_i^js_jv_j=P_i^js_j{Q^\dagger}_j^ke_k=T_i^ke_k.
\end{equation*}
That is, \(\mathbf{T}=\mathbf{P}\mathbf{\Sigma}\mathbf{Q}^\dagger\), with \(\mathbf{\Sigma}\) the rectangular diagonal matrix whose non-zero entries are the singular values \(s_j\).
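Continuing the previous sketch with the same random \(T\), the factors can be assembled directly from the eigendata of \(T^\dagger T\); here \(\mathbf{P}\) and \(\mathbf{Q}\) are used in the sense of the matrix statement above, with columns \(v_i\) and \(u_i\) respectively. Only the ‘thin’ form (with the \(v_i\) spanning \(\img T\)) is checked; extending the \(v_i\) to a full orthonormal basis of \(V\) and padding \(\mathbf{\Sigma}\) with zero rows gives the full square factors:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
T = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))   # same generic T as above

lam, Q = np.linalg.eigh(T.conj().T @ T)   # columns of Q are the u_i
s = np.sqrt(lam)                          # singular values
P = (T @ Q) / s                           # columns are v_i = T u_i / s_i, spanning img T

# 'thin' decomposition T = P diag(s) Q^dagger assembled from the construction above
print(np.allclose(T, P @ np.diag(s) @ Q.conj().T))   # True
\end{verbatim}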