Vector differentiation

Recall that the derivative, \(f'(t)\), of a scalar function of one real variable, \(f(t)\), is defined to be
\begin{equation}
f'(t)=\lim_{h\mapto 0}\frac{f(t+h)-f(t)}{h}.\label{def:one-dim deriv}
\end{equation}
We can also consider functions taking values in \(\RR^n\), \(f:\RR\mapto\RR^n\). The definition of the derivative then makes explicit use of the vector space structure of \(\RR^n\), and, though we won't do so in general, it can be useful in this context to denote the image under \(f\) of some \(x\in\RR\) in bold face, \(\mathbf{f}(x)\). The derivative is then defined as
\begin{equation}
\mathbf{f}'(x)=\frac{d\mathbf{f}}{dx}=\lim_{h\mapto 0}\frac{\mathbf{f}(x+h)-\mathbf{f}(x)}{h}.
\end{equation}
The vector \(\mathbf{f}(x)\) is simply the vector corresponding to the element \(f(x)\in\RR^n\) with respect to the standard basis of \(\RR^n\). The following product rules follow from this definition in the same way as the product rule for scalar functions,
\begin{align}
\frac{d}{dx}\left(c(x)\mathbf{f}(x)\right)&=c\frac{d\mathbf{f}}{dx}+\frac{dc}{dx}\mathbf{f},\\
\frac{d}{dx}\left(\mathbf{f}(x)\cdot\mathbf{g}(x)\right)&=\mathbf{f}\cdot\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\cdot\mathbf{g},\\
\frac{d}{dx}\left(\mathbf{f}(x)\times\mathbf{g}(x)\right)&=\mathbf{f}\times\frac{d\mathbf{g}}{dx}+\frac{d\mathbf{f}}{dx}\times\mathbf{g},
\end{align}
where \(c:\RR\mapto\RR\) and \(g:\RR\mapto\RR^n\) with \(\mathbf{g}(x)\) the vector representation of \(g(x)\) with respect to the standard basis.
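As a quick check of the second of these rules, take for instance \(\mathbf{f}(x)=(\cos x,\sin x)\), so that \(\mathbf{f}(x)\cdot\mathbf{f}(x)=1\) and the derivative of the left hand side vanishes identically. The right hand side gives
\begin{equation*}
\mathbf{f}\cdot\frac{d\mathbf{f}}{dx}+\frac{d\mathbf{f}}{dx}\cdot\mathbf{f}=2(\cos x,\sin x)\cdot(-\sin x,\cos x)=0,
\end{equation*}
as it must, illustrating the general fact that a vector function of constant length is everywhere orthogonal to its derivative.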

More generally, we can consider vector-valued functions \(f:\RR^n\mapto\RR^m\) such that points \(x\in\RR^n\) are mapped to points \(f(x)=(f^1(x),\dots,f^m(x))\) of \(\RR^m\), where we have introduced the component functions \(f^i:\RR^n\mapto\RR\) of \(f\). Such a function \(f\) is said to be differentiable at \(a\) if there exists a linear map \(J_f(a):\RR^n\mapto\RR^m\) such that
\begin{equation}
\lim_{h\mapto0}\frac{|f(a+h)-f(a)-J_f(a)h|}{|h|}=0\label{eq:genderiv}
\end{equation}
where \(|\cdot|\) denotes the appropriate length (in \(\RR^m\) in the numerator and in \(\RR^n\) in the denominator). In this case, \(J_f(a)\) is called the derivative (sometimes the total derivative) of \(f\) at \(a\). Introducing \(R(h)\in\RR^m\) as \(R(h)=f(a+h)-f(a)-J_f(a)h\), we can interpret \eqref{eq:genderiv} as saying that
\begin{equation}
f(a+h)=f(a)+J_f(a)h+R(h)
\end{equation}
where the “remainder” \(R(h)\) is such that \(\lim_{h\mapto0}|R(h)|/|h|=0\), and so we can interpret \(J_f(a)h\) as a linear approximation to \(f(a+h)-f(a)\) for small \(h\). Perhaps not surprisingly, it turns out that, if \(f\) is differentiable at \(a\), then with respect to the standard bases of \(\RR^n\) and \(\RR^m\) the matrix of the linear map \(J_f(a)\), \(\mathbf{J}_f(a)\), has elements given by the partial derivatives,
\begin{equation}
{J_f(a)}_i^j=\frac{\partial f^j}{\partial x^i}(a).
\end{equation}
To see this, note that, if it exists, the \(i\)th partial derivative of \(f^j\) at \(a\) is given by
\begin{equation}
\partial_if^j(a)=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon},
\end{equation}
where \(e_i\) is the \(i\)th standard basis element of \(\RR^n\). Now, recalling the definition of the remainder \(R(h)\in\RR^m\), we have that, with respect to the standard basis of \(\RR^m\), the \(j\)th component of \(R(\epsilon e_i)\) is \(R^j(\epsilon e_i)=f^j(a+\epsilon e_i)-f^j(a)-{J_f(a)}_i^j\epsilon\). Therefore we can write
\begin{align*}
\partial_if^j(a)&=\lim_{\epsilon\mapto0}\frac{f^j(a+\epsilon e_i)-f^j(a)}{\epsilon}=\lim_{\epsilon\mapto0}\frac{{J_f(a)}_i^j\epsilon+R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j+\lim_{\epsilon\mapto0}\frac{R^j(\epsilon e_i)}{\epsilon}\\
&={J_f(a)}_i^j.
\end{align*}
The converse also holds: if all the component functions of a function \(f:\RR^n\mapto\RR^m\) are differentiable at a point \(a\in\RR^n\), then \(f\) is differentiable at \(a\). Thus a function \(f:\RR^n\mapto\RR^m\) is differentiable at \(a\) if and only if all its component functions are differentiable at \(a\), and in this case, with respect to the standard bases of \(\RR^n\) and \(\RR^m\), the matrix of the derivative of \(f\), \(\mathbf{J}_f(a)\), is the matrix of partial derivatives of the component functions at \(a\). This matrix is called the Jacobian matrix of \(f\) at \(a\).
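For example, take the function \(f:\RR^2\mapto\RR^3\) given by \(f(x,y)=(xy,\,x+y,\,x^2)\) (a simple illustrative choice). Each component function is differentiable everywhere, and the Jacobian matrix at the point \((x,y)\) is the \(3\times2\) matrix of partial derivatives
\begin{equation*}
\mathbf{J}_f(x,y)=
\begin{pmatrix}
y&x\\
1&1\\
2x&0
\end{pmatrix},
\end{equation*}
with one row for each component function and one column for each coordinate of \(\RR^2\).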

A function \(f:\RR^n\mapto\RR^m\) is said to be smooth if all its component functions are smooth. A smooth function \(f\) between open sets of \(\RR^n\) and \(\RR^m\) is called a diffeomorphism if it is bijective and its inverse function is also smooth. We will consider the invertibility of functions in the section on the inverse function theorem.
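Note that bijectivity and smoothness alone are not enough: for example, the map \(x\mapsto x^3\) is a smooth bijection of \(\RR\) onto itself, but its inverse \(x\mapsto x^{1/3}\) fails to be differentiable at the origin, so it is not a diffeomorphism.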

The derivative of the composition \(g\circ f\) of maps \(f:\RR^n\mapto\RR^m\) and \(g:\RR^m\mapto\RR^p\) at a point \(a\in\RR^n\), that is, the generalisation of the familiar chain rule, is then given by the matrix product of the respective Jacobian matrices,
\begin{equation}
J_{g\circ f}(a)=J_g(f(a))J_f(a).
\end{equation}
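As a simple check of this formula, take for instance \(f:\RR\mapto\RR^2\), \(f(t)=(t,t^2)\), and \(g:\RR^2\mapto\RR\), \(g(x,y)=xy\), so that \((g\circ f)(t)=t^3\). The Jacobian matrices are \(\mathbf{J}_f(t)=\begin{pmatrix}1\\2t\end{pmatrix}\) and \(\mathbf{J}_g(x,y)=\begin{pmatrix}y&x\end{pmatrix}\), and indeed
\begin{equation*}
\mathbf{J}_g(f(t))\mathbf{J}_f(t)=\begin{pmatrix}t^2&t\end{pmatrix}\begin{pmatrix}1\\2t\end{pmatrix}=3t^2=\frac{d}{dt}t^3.
\end{equation*}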

Example Suppose \(f:\RR^n\mapto\RR^m\) is a linear map whose matrix representation with respect to the standard bases of \(\RR^n\) and \(\RR^m\) is \(\mathbf{f}\). Then \(f\) maps \(\mathbf{x}\mapsto\mathbf{f}\mathbf{x}\), so \(\mathbf{J}_f(a)=\mathbf{f}\) at every point \(a\in\RR^n\).
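Concretely, if, say, \(\mathbf{f}=\begin{pmatrix}1&2\\3&-1\end{pmatrix}\), so that \(f(x,y)=(x+2y,\,3x-y)\), then the matrix of partial derivatives of the component functions at any point is again \(\begin{pmatrix}1&2\\3&-1\end{pmatrix}=\mathbf{f}\).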

Example
As we discussed earlier, the relationship between polar and cartesian coordinates can be described through a map \(\RR^2\mapto\RR^2\) given by
\begin{equation}
\begin{pmatrix}r\\\theta\end{pmatrix}\mapsto\begin{pmatrix}r\cos\theta\\r\sin\theta\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times[0,2\pi)\subset\RR^2\). We typically write the components of this map as \(x(r,\theta)=r\cos\theta\) and \(y(r,\theta)=r\sin\theta\). The Jacobian matrix for the polar coordinate map is then
\begin{equation}
\begin{pmatrix}
\partial x/\partial r&\partial x/\partial\theta\\
\partial y/\partial r&\partial y/\partial\theta
\end{pmatrix}=
\begin{pmatrix}
\cos\theta&-r\sin\theta\\
\sin\theta& r\cos\theta
\end{pmatrix}.
\end{equation}
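As an illustration of the chain rule, we might compose the polar coordinate map with, say, the function \(g(x,y)=x^2+y^2\). The composition is \((r,\theta)\mapsto r^2\), with Jacobian matrix \(\begin{pmatrix}2r&0\end{pmatrix}\), and the chain rule reproduces this as the product
\begin{equation*}
\begin{pmatrix}2r\cos\theta&2r\sin\theta\end{pmatrix}
\begin{pmatrix}
\cos\theta&-r\sin\theta\\
\sin\theta&r\cos\theta
\end{pmatrix}
=\begin{pmatrix}2r&0\end{pmatrix},
\end{equation*}
where the first factor is \(\mathbf{J}_g(x,y)=\begin{pmatrix}2x&2y\end{pmatrix}\) evaluated at \((x,y)=(r\cos\theta,r\sin\theta)\).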
Likewise, cylindrical coordinates are related to cartesian coordinates through a map \(\RR^3\mapto\RR^3\) given by
\begin{equation}
\begin{pmatrix}
\rho\\\phi\\z\end{pmatrix}\mapsto\begin{pmatrix}
\rho\cos\phi\\\rho\sin\phi\\z
\end{pmatrix}.
\end{equation}
In this case the domain is taken to be \((0,\infty)\times[0,2\pi)\times\RR\subset\RR^3\) and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\cos\phi&-\rho\sin\phi&0\\
\sin\phi&\rho\cos\phi&0\\
0&0&1
\end{pmatrix}.
\end{equation}
For spherical polar coordinates the \(\RR^3\mapto\RR^3\) map is
\begin{equation}
\begin{pmatrix}
r\\\theta\\\phi
\end{pmatrix}\mapsto\begin{pmatrix}
r\sin\theta\cos\phi\\
r\sin\theta\sin\phi\\
r\cos\theta
\end{pmatrix},
\end{equation}
the domain of which we take to be \((0,\infty)\times(0,\pi)\times[0,2\pi)\subset\RR^3\), and the Jacobian matrix is
\begin{equation}
\begin{pmatrix}
\sin\theta\cos\phi&r\cos\theta\cos\phi&-r\sin\theta\sin\phi\\
\sin\theta\sin\phi&r\cos\theta\sin\phi&r\sin\theta\cos\phi\\
\cos\theta&-r\sin\theta&0\\
\end{pmatrix}.
\end{equation}