Determinants

If we focus on linear operators which are endomorphisms, that is, operators \(T\in\mathcal{L}(V)\) for some vector space \(V\), then the basis-independent properties are those of the orbits of the action of \(\text{GL}(V)\) on \(\mathcal{L}(V)\) given by \(T\mapto P^{-1}TP\). In other words, they are properties of square \(n\times n\) matrices \(\mathbf{A}\) which are shared by all similar (also called conjugate) matrices \(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\), where \(\mathbf{P}\) is an invertible \(n\times n\) matrix. The map \(\mathbf{A}\mapto\mathbf{P}^{-1}\mathbf{A}\mathbf{P}\), with \(\mathbf{P}\) the change of basis matrix, is called a similarity transformation. One such attribute is the determinant.

Definition The determinant, \(\det\mathbf{A}\), of an \(n\times n\) matrix \(\mathbf{A}\) is defined to be
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_{\sigma_{1}}^1A_{\sigma_{2}}^2\dots A_{\sigma_{n}}^n\label{eq:def det}
\end{equation}
where the sum is over all permutations \(\sigma\) of the set \(\{1, 2,\dots, n\}\), \(\sigma_i\) being the image of \(i\) under a given permutation, \(S_n\) the set of all such permutations (the symmetric group on \(n\) letters), and \(\sgn(\sigma)\) the signature of \(\sigma\) (\(+1\) whenever the reordering given by \(\sigma\) can be achieved by an even number of interchanges of pairs of entries, and \(-1\) otherwise).
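For example, for \(n=2\) the sum in \eqref{eq:def det} has just two terms, one for the identity permutation and one for the transposition \(\sigma=(2,1)\) with \(\sgn(\sigma)=-1\), giving the familiar formula
\begin{equation*}
\det\begin{pmatrix}
A_1^1&A_2^1\\
A_1^2&A_2^2
\end{pmatrix}=A_1^1A_2^2-A_2^1A_1^2.
\end{equation*}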

This is sometimes expressed in terms of the Levi-Civita symbol \(\epsilon_{i_1i_2\dots i_n}\), defined as
\begin{equation}
\epsilon_{i_1i_2\dots i_n}=\begin{cases}
\sgn(i_1,i_2,\dots,i_n)&(i_1,i_2,\dots,i_n)\in S_n\\
0&(i_1,i_2,\dots,i_n)\notin S_n.
\end{cases}\label{def:Levi-Cevita}
\end{equation}
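In this notation the determinant \eqref{eq:def det} becomes, with summation over repeated indices implied,
\begin{equation*}
\det\mathbf{A}=\epsilon_{i_1i_2\dots i_n}A_{i_1}^1A_{i_2}^2\dots A_{i_n}^n.
\end{equation*}
For instance, with \(n=3\), \(\epsilon_{123}=\epsilon_{231}=\epsilon_{312}=1\), \(\epsilon_{213}=\epsilon_{132}=\epsilon_{321}=-1\), and any symbol with a repeated index, such as \(\epsilon_{112}\), is zero.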

For any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&2&\dots&n\\
\sigma_1&\sigma_2&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there is an inverse permutation, let’s call it \(\tau\), given by
\begin{equation*}
\tau=\begin{pmatrix}
\sigma_1&\sigma_2&\dots&\sigma_n\\
1&2&\dots&n
\end{pmatrix}
\end{equation*}
for which it is clear that \(\sgn(\sigma)=\sgn(\tau)\). This leads us to the observation that we could just as well write the determinant as
\begin{equation}
\det\mathbf{A} = \sum_{\sigma\in S_n}\sgn(\sigma)A_1^{\sigma_{1}}A_2^{\sigma_{2}}\dots A_n^{\sigma_{n}}
\end{equation}
and that indeed
\begin{equation}
\det\mathbf{A}^\mathsf{T}=\det\mathbf{A}.
\end{equation}
Note also that if the matrix is upper (or lower) triangular then \(\det\mathbf{A}=A_1^1\dots A_n^n\), and that if two columns (or rows) are identical then \(\det\mathbf{A}=0\).
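The triangular case is easily illustrated: every permutation other than the identity picks at least one entry below the diagonal in the sum \eqref{eq:def det}, so, for example,
\begin{equation*}
\det\begin{pmatrix}
2&1&3\\
0&4&5\\
0&0&6
\end{pmatrix}=2\cdot4\cdot6=48.
\end{equation*}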

The effect of elementary row operations on the determinant is as follows. Interchanging rows changes the sign. Multiplying a row by a number multiplies the corresponding determinant by that number. Adding a multiple of one row to another leaves the determinant unchanged. The first two of these are obvious. For the third notice that for any permutation \(\sigma\) given by
\begin{equation*}
\sigma=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_i&\dots&\sigma_j&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
there will also be a permutation \(\sigma'\) given by
\begin{equation*}
\sigma'=\begin{pmatrix}
1&\dots&i&\dots&j&\dots&n\\
\sigma_1&\dots&\sigma_j&\dots&\sigma_i&\dots&\sigma_n
\end{pmatrix}
\end{equation*}
with \(\sgn(\sigma')=-\sgn(\sigma)\). Thus if the elementary row operation transforms \(\mathbf{A}\) to \(\mathbf{A}'\), the matrix \(\mathbf{A}\) but with the \(i\)th row replaced by row \(i\) plus \(c\) times row \(j\), then
\begin{align*}
\det\mathbf{A}' = \sum_{\sigma\in S_n}\sgn(\sigma) \bigl( &A_{\sigma_{1}}^1\dots A_{\sigma_{i}}^i\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \\
+ &cA_{\sigma_{1}}^1\dots A_{\sigma_{i}}^j\dots A_{\sigma_{j}}^j\dots A_{\sigma_{n}}^n \bigr)
\end{align*}
and we see that the term involving the constant \(c\) is cancelled by the corresponding term for the permutation \(\sigma'\). The determinant of the identity matrix being of course 1, we see that if \(\mathbf{E}\) is any elementary matrix, then \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\). An entirely analogous discussion applies of course to elementary column matrices. So if an \(n\times n\) matrix \(\mathbf{A}\) is invertible then, since it can be written as a product of elementary matrices, its determinant is non-zero. Conversely, if the determinant is non-zero then the reduced row echelon form of the matrix has non-zero determinant, so it must have full rank and \(\mathbf{A}\) is therefore invertible. We have therefore proved the following

Theorem An \(n\times n\) matrix \(\mathbf{A}\) is invertible if and only if \(\det\mathbf{A}\neq0\).
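The row operation rules also provide a practical way of computing determinants. For example, adding \(-3\) times the first row to the second row leaves the determinant unchanged, so
\begin{equation*}
\det\begin{pmatrix}
1&2\\
3&4
\end{pmatrix}=\det\begin{pmatrix}
1&2\\
0&-2
\end{pmatrix}=1\cdot(-2)=-2.
\end{equation*}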

Another important property of determinants is contained in the following

Theorem For any \(n\times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\)
\begin{equation}
\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}.
\end{equation}

Proof Consider first the case that \(\mathbf{A}\) is not invertible. Then \(\rank L_\mathbf{A}<n\), which means \(\rank L_\mathbf{A}L_\mathbf{B}<n\), so \(\mathbf{A}\mathbf{B}\) is not invertible and both sides are zero. Similarly, if \(\mathbf{B}\) is not invertible, then the null space of \(L_\mathbf{B}\), and therefore that of \(L_\mathbf{A}L_\mathbf{B}\), contains more than just \(0\), so again both sides are zero. Finally, if both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible then both can be expressed as products of elementary matrices and the result follows by repeated use of \(\det\mathbf{E}\mathbf{A}=\det\mathbf{E}\det\mathbf{A}\).\(\blacksquare\)
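The theorem is easily checked in a small case. With
\begin{equation*}
\mathbf{A}=\begin{pmatrix}
1&2\\
3&4
\end{pmatrix},\qquad
\mathbf{B}=\begin{pmatrix}
5&6\\
7&8
\end{pmatrix},\qquad
\mathbf{A}\mathbf{B}=\begin{pmatrix}
19&22\\
43&50
\end{pmatrix},
\end{equation*}
we have \(\det\mathbf{A}=-2\), \(\det\mathbf{B}=-2\) and \(\det\mathbf{A}\mathbf{B}=19\cdot50-22\cdot43=950-946=4=(-2)(-2)\).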

Another perspective on determinants comes from regarding an \(n\times n\) matrix as an \(n\)-tuple of \(n\)-dimensional column vectors. There is then an obvious isomorphism \(\text{Mat}_n(K)\cong K^n\times\dots\times K^n\) and the determinant may be regarded as a function \(\det:K^n\times\dots\times K^n\mapto K\). As such, it follows from the definition that it is multilinear in the sense that it is linear in each column, \(v_i\) say, of the matrix. That is,

\begin{equation*}
\det(v_1,\dots,cv_i,\dots,v_n)=c\det(v_1,\dots,v_i,\dots,v_n),
\end{equation*}
and
\begin{equation*}
\det(v_1,\dots,u_i+v_i,\dots,v_n)=\det(v_1,\dots,u_i,\dots,v_n)+\det(v_1,\dots,v_i,\dots,v_n).
\end{equation*}
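For example, in \(K^2\), scaling the first column scales the determinant accordingly:
\begin{equation*}
\det\begin{pmatrix}
2&5\\
4&6
\end{pmatrix}=2\det\begin{pmatrix}
1&5\\
2&6
\end{pmatrix}=2(1\cdot6-5\cdot2)=-8.
\end{equation*}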
Definition A multilinear function \(f:K^n\times\dots\times K^n\mapto K\) is a volume form on \(K^n\) if it is alternating. That is, whenever \(i\neq j\) and \(v_i=v_j\), \(f(v_1,\dots,v_n)=0\).

Clearly, the determinant is an example of a volume form, and we may ask if there are any others. Considering the standard basis \(\{\mathbf{e}_i\}\) of \(K^n\), and with summation over repeated indices implied, any element of \(K^n\times\dots\times K^n\) has the form
\begin{equation*}
(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)
\end{equation*}
so if \(f\) is a volume form then
\begin{equation*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)=A_1^{i_1}A_2^{i_2}\dots A_n^{i_n}f(\mathbf{e}_{i_1},\dots,\mathbf{e}_{i_n}).
\end{equation*}
But \(f\) is alternating, so this is zero unless \((i_1,\dots,i_n)\) is some permutation of \((1,\dots,n)\), and moreover it is not difficult to see that \(f(\mathbf{e}_{\sigma_1},\dots,\mathbf{e}_{\sigma_n})=\sgn(\sigma)f(\mathbf{e}_1,\dots,\mathbf{e}_n)\), so
\begin{align*}
f(A_1^i\mathbf{e}_i,A_2^i\mathbf{e}_i,\dots,A_n^i\mathbf{e}_i)&=\sum_{\sigma\in S_n}\sgn(\sigma)A_1^{\sigma_1}A_2^{\sigma_2}\dots A_n^{\sigma_n}f(\mathbf{e}_1,\dots,\mathbf{e}_n)\\
&=\det(v_1,\dots,v_n)f(\mathbf{e}_1,\dots,\mathbf{e}_n).
\end{align*}
So any volume form is determined by \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)\) and indeed the set of volume forms is a \(1\)-dimensional vector space. In other words, the determinant is uniquely specified by being multilinear, alternating, and having \(\det\mathbf{I}_n=1\) (\(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=1\)), where \(\mathbf{I}_n\) is the \(n\times n\) identity matrix.
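For \(n=2\), for instance, any volume form satisfies
\begin{equation*}
f(v_1,v_2)=f(A_1^1\mathbf{e}_1+A_1^2\mathbf{e}_2,\,A_2^1\mathbf{e}_1+A_2^2\mathbf{e}_2)=(A_1^1A_2^2-A_2^1A_1^2)f(\mathbf{e}_1,\mathbf{e}_2)=\det(v_1,v_2)f(\mathbf{e}_1,\mathbf{e}_2),
\end{equation*}
the terms \(f(\mathbf{e}_1,\mathbf{e}_1)\) and \(f(\mathbf{e}_2,\mathbf{e}_2)\) vanishing by the alternating property.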

A rather neat demonstration of the fact that \(\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}\) may now be given. Define \(f:\text{Mat}_n(K)\mapto K\) by \(f(\mathbf{B})=\det\mathbf{A}\mathbf{B}\). Then \(f\) is a volume form, as becomes clear on observing that the \(j\)th column of \(\mathbf{A}\mathbf{B}\), let’s denote this \((AB)_j\), is a linear combination of the columns of \(\mathbf{A}\) with coefficients from the \(j\)th column of \(\mathbf{B}\), that is,
\begin{equation*}
(AB)_j=B_j^1(A)_1+B_j^2(A)_2+\dots+B_j^n(A)_n.
\end{equation*}
This means that
\begin{equation*}
f(\mathbf{B})=\det\mathbf{B}f(\mathbf{e}_1,\dots,\mathbf{e}_n),
\end{equation*}
but by the definition we have \(f(\mathbf{e}_1,\dots,\mathbf{e}_n)=f(\mathbf{I}_n)=\det\mathbf{A}\), so that indeed \(\det\mathbf{A}\mathbf{B}=\det\mathbf{A}\det\mathbf{B}\).

Clearly, \(\det(\mathbf{A}^{-1})=(\det\mathbf{A})^{-1}\), since \(\det(\mathbf{A}^{-1})\det\mathbf{A}=\det\mathbf{I}_n=1\), so \(\det\mathbf{P}^{-1}\mathbf{A}\mathbf{P}=\det\mathbf{A}\) and it therefore makes sense to define the determinant of any linear operator \(T:V\mapto V\) as \(\det\mathbf{T}\) where \(\mathbf{T}\) is the matrix representation of \(T\) with respect to any basis. Thus, we may summarise what we have learnt about the invertibility of linear operators in the following theorem.

Theorem For any linear operator \(T\in\mathcal{L}(V)\) the following are equivalent:

  1. \(T\) is invertible;
  2. \(\det T\neq0\);
  3. \(\ker T=\{0\}\);
  4. \(\img T=V\).

There is another expression for the determinant which may be derived, called Laplace’s formula,\begin{equation}
\det(A) = \sum_{j=1}^n (-1)^{i+j} A_j^i M_{i,j} = \sum_{i=1}^n (-1)^{i+j} A_j^i M_{i,j},\label{eq:Laplace’s formula}
\end{equation}
where the first sum, over \(j\), holds for any fixed \(i\) (expansion along the \(i\)th row) and the second, over \(i\), for any fixed \(j\) (expansion along the \(j\)th column). Here \(M_{i,j}\) is defined to be the determinant of the \((n-1)\times (n-1)\) matrix obtained by removing the \(i\)th row and \(j\)th column of the matrix \(\mathbf{A}\). It is called a minor. With the adjugate matrix, \(\adj\mathbf{A}\), defined as,
\begin{equation}
(\adj A)_j^i=(-1)^{i+j}M_{j,i},
\end{equation}
we then have
\begin{equation*}
((\adj A)A)_j^i=(\adj A)_k^iA_j^k=\sum_{k=1}^n(-1)^{i+k}M_{k,i}A_j^k.
\end{equation*}
In the last expression, if \(i=j\) then we simply have \(\det\mathbf{A}\), while if \(i\neq j\) then the expression is precisely that of the determinant of a matrix derived from \(\mathbf{A}\) by replacing the \(i\)th column by the \(j\)th column and as such is zero. Thus we see that,
\begin{equation}
(\adj\mathbf{A})\mathbf{A}=\mathbf{A}\,(\adj\mathbf{A})=(\det\mathbf{A})\mathbf{I}_n,
\end{equation}
which is called Cramer’s rule, and which means in particular that
\begin{equation}
\mathbf{A}^{-1}=\frac{1}{\det\mathbf{A}}\adj\mathbf{A},\label{eq:Cramer’s rule}
\end{equation}
if \(\det\mathbf{A}\neq 0\).
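
For a \(2\times2\) matrix the minors are single entries, and the inverse formula reduces to the familiar
\begin{equation*}
\adj\begin{pmatrix}
a&b\\
c&d
\end{pmatrix}=\begin{pmatrix}
d&-b\\
-c&a
\end{pmatrix},\qquad
\begin{pmatrix}
a&b\\
c&d
\end{pmatrix}^{-1}=\frac{1}{ad-bc}\begin{pmatrix}
d&-b\\
-c&a
\end{pmatrix},
\end{equation*}
provided \(ad-bc\neq0\).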