Partial derivatives and some applications

Some Multivariable Functions

The most familiar examples of multivariable functions are those taking values in \(\RR\). These are also called scalar fields — they assign a scalar to each point in space. One example, a function \(f:\RR^2\mapto\RR\), is
\begin{equation*}
f(x,y)=x^2+y^2.
\end{equation*}
The ‘level curves’ of this function, the sets of points \((x,y)\) satisfying \(f(x,y)=r^2\), are circles of radius \(r\). An analogous example, this time of a function \(f:\RR^3\mapto\RR\), is
\begin{equation*}
f(x,y,z)=x^2+y^2+z^2,
\end{equation*}
and in this case a ‘level surface’, specified by the points \((x,y,z)\) such that \(f(x,y,z)=r^2\), is a sphere of radius \(r\).

Curvilinear coordinates provide important examples of functions taking values in \(\RR^2\) and \(\RR^3\). Take polar coordinates first. The function mapping a point’s polar coordinates to its Cartesian coordinates is \(f:(0,\infty)\times[0,2\pi)\mapto\RR^2\) given by
\begin{equation*}
f(r,\theta)=(r\cos\theta,r\sin\theta).
\end{equation*}
The function mapping a point’s cylindrical coordinates to its Cartesian coordinates is a function \((0,\infty)\times[0,2\pi)\times\RR\mapto\RR^3\) which we could write as
\begin{equation*}
f(\rho,\varphi,z)=(\rho\cos\varphi,\rho\sin\varphi,z).
\end{equation*}
The function mapping a point’s spherical coordinates to its Cartesian coordinates is a function \((0,\infty)\times(0,\pi)\times[0,2\pi)\mapto\RR^3\) which we could write as
\begin{equation*}
f(r,\theta,\varphi)=(r\cos\varphi\sin\theta,r\sin\varphi\sin\theta,r\cos\theta).
\end{equation*}
Note that in each of these functions the domain has been restricted to ensure the function is one-to-one.
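On its restricted domain each of these maps can be inverted. For the polar map, for instance, a point \((x,y)\) with \(x>0\) has
\begin{equation*}
r=\sqrt{x^2+y^2},\qquad\theta=\arctan(y/x),
\end{equation*}
with the other quadrants requiring the appropriate branch of the inverse tangent.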

Definition of partial derivative

If \(f:\RR^n\mapto\RR\) is a real-valued function on \(\RR^n\) we define the partial derivative of \(f\) with respect to \(x^i\) as,
\begin{equation}
\frac{\partial f}{\partial x^i}=\lim_{\epsilon\to0}\frac{f(x^1,\dots,x^i+\epsilon,\dots,x^n)-f(x^1,\dots,x^i,\dots,x^n)}{\epsilon}.
\end{equation}
Thus, a small change \(\Delta x^i\) in the \(x^i\) coordinate leads to an increment in the value of the function given by,
\begin{equation}
\Delta f\approx\frac{\partial f}{\partial x^i}\Delta x^i.
\end{equation}
More generally, we have
\begin{equation}
\Delta f\approx\sum_{i=1}^n\frac{\partial f}{\partial x^i}\Delta x^i=\partial_if\Delta x^i,
\end{equation}
where we’ve introduced the notation,
\begin{equation}
\partial_if=\frac{\partial f}{\partial x^i}.
\end{equation}
In this notation, second order partials are represented as
\begin{equation}
\partial_{ij}f=\frac{\partial^2 f}{\partial x^i\partial x^j}.
\end{equation}
An extremely important property of partial derivatives is that, provided the partials concerned are continuous, the order in which we take them is irrelevant: \(\partial_{ij}f=\partial_{ji}f\). A function \(f:\RR^n\mapto\RR\) is said to be smooth if all higher order partials exist and are continuous. We denote the set of smooth functions \(f:\RR^n\mapto\RR\) by \(C^\infty(\RR^n)\).
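For example, with \(f(x,y)=x^2y+y^3\) we have
\begin{equation*}
\partial_xf=2xy,\qquad\partial_yf=x^2+3y^2,
\end{equation*}
and the mixed partials agree, as claimed:
\begin{equation*}
\partial_y\partial_xf=2x=\partial_x\partial_yf.
\end{equation*}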

Leibnitz’ rule

Partial differentiation can be useful in evaluating certain integrals, a technique informally known as ‘differentiating under the integral sign’. Suppose \(F(x,t)=\int f(x,t)\,dt\) is an antiderivative of \(f\) with respect to \(t\); then
\begin{equation*}
\frac{\partial F}{\partial t}=f(x,t),
\end{equation*}
so that
\begin{equation*}
\frac{\partial^2 F(x,t)}{\partial x\partial t}=\frac{\partial f(x,t)}{\partial x},
\end{equation*}
which upon integrating yields
\begin{equation*}
\frac{\partial F(x,t)}{\partial x}=\int \frac{\partial f(x,t)}{\partial x}\,dt.
\end{equation*}
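Before generalising, a quick illustration: for \(b>-1\) we know \(\int_0^1x^b\,dx=1/(b+1)\), so differentiating both sides with respect to \(b\) under the integral sign gives
\begin{equation*}
\int_0^1x^b\ln x\,dx=-\frac{1}{(b+1)^2},
\end{equation*}
an integral which would be more tedious to evaluate directly.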
More generally, if
\begin{equation*}
I(x)=\int_{u(x)}^{v(x)}f(x,t)\,dt=F(x,v(x))-F(x,u(x)),
\end{equation*}
then \(\partial I/\partial v=f(x,v(x))\), \(\partial I/\partial u=-f(x,u(x))\) and
\begin{align*}
\frac{\partial I}{\partial x}&= \int^v\frac{\partial f(x,t)}{\partial x}\,dt-\int^u\frac{\partial f(x,t)}{\partial x}\,dt\\
&=\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt
\end{align*}
so that
\begin{equation}
\frac{dI}{dx}=f(x,v(x))\frac{dv}{dx}-f(x,u(x))\frac{du}{dx}+\int_u^v\frac{\partial f(x,t)}{\partial x}\,dt,
\end{equation}
which is called Leibnitz’ rule.

Example
If
\begin{equation}
\phi(\alpha)=\int_\alpha^{\alpha^2}\frac{\sin\alpha x}{x}\,dx
\end{equation}
then by Leibnitz’ rule we have,
\begin{align*}
\phi'(\alpha)&=\frac{\sin\alpha^3}{\alpha^2}\cdot2\alpha-\frac{\sin\alpha^2}{\alpha}+\int_\alpha^{\alpha^2}\cos\alpha x\,dx\\
&=2\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}+\frac{\sin\alpha^3}{\alpha}-\frac{\sin\alpha^2}{\alpha}\\
&=\frac{3\sin\alpha^3-2\sin\alpha^2}{\alpha}.
\end{align*}

Taylor expansion and stationary points

The Taylor expansion for \(f:\RR^n\mapto\RR\) about a point \(a\) is
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\partial_if(a)(x^i-a^i)+\frac{1}{2}\sum_{i,j=1}^n\partial_{ij}f(a)(x^i-a^i)(x^j-a^j)+\dots,
\end{equation}
where for clarity (here and below) we’re not employing the summation convention for repeated indices.
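For example, expanding \(f(x,y)=e^x\cos y\) about the origin: \(f(0,0)=1\), the first partials there are \(\partial_xf=1\) and \(\partial_yf=0\), and the second partials are \(\partial_{xx}f=1\), \(\partial_{xy}f=0\) and \(\partial_{yy}f=-1\), so
\begin{equation*}
e^x\cos y=1+x+\frac{1}{2}(x^2-y^2)+\dots
\end{equation*}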

The stationary points of a function \(f\) may be analysed with the help of the Taylor expansion as follows. At any stationary point, \(a\), the first partial derivatives must be zero. To try to determine the nature of the stationary point we consider the approximation of the function given by the Taylor expansion about the point,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}}^\mathsf{T}\mathbf{M}\Delta\mathbf{x}\label{eq:1st order taylor}
\end{equation}
where \(\Delta\mathbf{x}^\mathsf{T}=(x^1-a^1,\dots,x^n-a^n)\), \(\Delta\mathbf{x}\) is the corresponding column vector, and \(\mathbf{M}\) is the matrix with elements \(M_{ij}=\partial_{ij}f(a)\). Since \(\mathbf{M}\) is a real symmetric matrix it is diagonalisable through a similarity transformation by an orthogonal matrix \(\mathbf{O}\). That is, \(\mathbf{O}^\mathsf{T}\mathbf{M}\mathbf{O}\) is diagonal with diagonal elements the eigenvalues of \(\mathbf{M}\). Thus we have
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}{\Delta\mathbf{x}'}^\mathsf{T}\mathbf{M}'\Delta\mathbf{x}',
\end{equation}
where \(\Delta\mathbf{x}'=\mathbf{O}^\mathsf{T}\Delta\mathbf{x}\) and \(M'_{ij}=\delta_{ij}\lambda_i\) with \(\lambda_i\) the eigenvalues of \(\mathbf{M}\). That is,
\begin{equation}
f(x)-f(a)\approx\frac{1}{2}\sum_i\lambda_i(\Delta x'^i)^2,
\end{equation}
from which we conclude the following:

  1. If \(\lambda_i>0\) for all \(i\) then the stationary point at \(a\) is a minimum.
  2. If \(\lambda_i<0\) for all \(i\) then the stationary point at \(a\) is a maximum.
  3. If at least one \(\lambda_i>0\) and at least one \(\lambda_i<0\) then the stationary point at \(a\) is a saddle point (a stationary point which is not an extremum).
  4. If some \(\lambda_i=0\) and the non-zero \(\lambda_i\) all have the same sign then the test is inconclusive.
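For example, \(f(x,y)=x^2+3xy+y^2\) has a stationary point at the origin, where
\begin{equation*}
\mathbf{M}=\begin{pmatrix}2&3\\3&2\end{pmatrix}
\end{equation*}
has eigenvalues \(\lambda=5\) and \(\lambda=-1\). One is positive and one negative, so the origin is a saddle point.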

For a function of two real variables, \(f:\RR^2\mapto\RR\), the eigenvalues of \(\mathbf{M}\) are obtained from
\begin{equation*}
\det\begin{pmatrix}
\partial_{xx}f-\lambda & \partial_{xy}f\\
\partial_{xy}f & \partial_{yy}f-\lambda\\
\end{pmatrix}=\lambda^2-(\partial_{xx}f+\partial_{yy}f)\lambda+\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2=0.
\end{equation*}
That is,
\begin{equation*}
\lambda=\frac{1}{2}\left((\partial_{xx}f+\partial_{yy}f)\pm\sqrt{(\partial_{xx}f-\partial_{yy}f)^2+4(\partial_{xy}f)^2}\right),
\end{equation*}
so that for both eigenvalues to be positive we need \(\partial_{xx}f>0\), \(\partial_{yy}f>0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For both eigenvalues to be negative we need \(\partial_{xx}f<0\), \(\partial_{yy}f<0\) and \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2>0\). For a saddle point we need the eigenvalues to have opposite signs, that is \(\partial_{xx}f\partial_{yy}f<(\partial_{xy}f)^2\), which holds in particular whenever \(\partial_{xx}f\) and \(\partial_{yy}f\) have opposite signs.
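Example For \(f(x,y)=x^3-3x+y^2\) the stationary points satisfy \(\partial_xf=3x^2-3=0\) and \(\partial_yf=2y=0\), giving \((\pm1,0)\). Since \(\partial_{xx}f=6x\), \(\partial_{yy}f=2\) and \(\partial_{xy}f=0\), we have \(\partial_{xx}f\partial_{yy}f-(\partial_{xy}f)^2=12x\). At \((1,0)\) this is positive and \(\partial_{xx}f=6>0\), so we have a minimum; at \((-1,0)\) it is negative, so we have a saddle point.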

Taylor’s Theorem

Refining our index notation for partials further, for any \(m\)-tuple \(I=(i_1,\dots,i_m)\), with \(|I|=m\), we define
\begin{equation}
\partial_I=\frac{\partial^m}{\partial x^{i_1}\cdots\partial x^{i_m}}
\end{equation}
and
\begin{equation}
(x-a)^I=(x^{i_1}-a^{i_1})\cdots(x^{i_m}-a^{i_m})
\end{equation}
we can state Taylor’s Theorem. This says that for a function \(f:\RR^n\mapto\RR\) which is sufficiently differentiable near a point \(a\) (continuous partials up to order \(k+1\) suffice) we have, for all \(x\) near \(a\),
\begin{equation}
f(x)=P_k(x)+R_k(x),
\end{equation}
where
\begin{equation}
P_k(x)=f(a)+\sum_{m=1}^k\frac{1}{m!}\sum_{I:|I|=m}(x-a)^I\partial_If(a),
\end{equation}
is the \(k\)th-order Taylor polynomial of \(f\) at \(a\) and
\begin{equation}
R_k(x)=\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\int_0^1(1-t)^k\partial_If(a+t(x-a))dt
\end{equation}
is the \(k\)th remainder term. To see why this is true we use induction. For \(k=0\), \(P_0(x)=f(a)\) and, by the chain rule, the remainder is
\begin{align*}
R_0(x)&=\sum_{i=1}^n(x^i-a^i)\int_0^1\partial_if(a+t(x-a))dt\\
&=\int_0^1\frac{d}{dt}f(a+t(x-a))dt\\
&=f(x)-f(a).
\end{align*}
Now assume the result for some \(k\) and use integration by parts on the integral in the remainder term,
\begin{align*}
\int_0^1(1-t)^k\partial_If(a+t(x-a))dt&=\left.\left(-\frac{(1-t)^{k+1}}{k+1}\partial_If(a+t(x-a))\right)\right\rvert_0^1\\
&\quad+\int_0^1\frac{(1-t)^{k+1}}{k+1}\frac{d}{dt}\partial_If(a+t(x-a))dt\\
&=\frac{1}{k+1}\partial_If(a)\\
&\quad+\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt.
\end{align*}
Now observe that
\begin{equation*}
P_k(x)+\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\partial_If(a)=P_{k+1}(x)
\end{equation*}
and that
\begin{align*}
&\frac{1}{k!}\sum_{I:|I|=k+1}(x-a)^I\frac{1}{k+1}\sum_{i=1}^n(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+1}\sum_{i=1}^n(x-a)^I(x^i-a^i)\int_0^1(1-t)^{k+1}\frac{\partial}{\partial x^i}\partial_If(a+t(x-a))dt\\
&=\frac{1}{(k+1)!}\sum_{I:|I|=k+2}(x-a)^I\int_0^1(1-t)^{k+1}\partial_If(a+t(x-a))dt\\
&=R_{k+1}(x).
\end{align*}
Hence \(f(x)=P_k(x)+R_k(x)=P_{k+1}(x)+R_{k+1}(x)\), completing the induction.
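As a check, for \(n=1\) and \(k=1\) the theorem reduces to the familiar single-variable formula with integral remainder,
\begin{equation*}
f(x)=f(a)+f'(a)(x-a)+(x-a)^2\int_0^1(1-t)f''(a+t(x-a))\,dt.
\end{equation*}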

Differentials

Recall that the total differential \(df\) of a function \(f:\RR^n\mapto\RR\) is defined to be
\begin{equation}
df=\partial_if dx^i.
\end{equation}
Though the relation to the infinitesimal increment is clear, there is no approximation intended here. Later we will formally define \(df\) as an object belonging to the dual space of the tangent space at a point, a “differential form”, but for the time being it is safe to think of it either as a small change in \(f\) or as the kind of object we are used to integrating.
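For example, for \(f(x,y)=x^2+y^2\) the total differential is \(df=2x\,dx+2y\,dy\).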

When working with partial derivatives it is always wise to indicate clearly which variables are being held constant. Thus,
\begin{equation}
\left(\frac{\partial\phi}{\partial x}\right)_{y,z},
\end{equation}
means the partial derivative of \(\phi\), regarded as a function of \(x\), \(y\) and \(z\), with respect to \(x\), holding \(y\) and \(z\) constant. The following example demonstrates how differentials naturally ‘spit out’ all partial derivatives simultaneously.

Example Suppose \(w=x^3y-z^2t\), subject to the constraint \(xy=zt\), and we wish to calculate
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}.
\end{equation*}
We could either proceed directly, using the chain rule together with \(z=xy/t\) from the constraint,
\begin{equation*}
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2zt\left(\frac{\partial z}{\partial y}\right)_{x,t}=x^3-2zt\cdot\frac{x}{t}=x^3-2xz,
\end{equation*}
or take differentials,
\begin{equation*}
dw=3x^2y\,dx+x^3\,dy-2zt\,dz-z^2\,dt,
\end{equation*}
\begin{equation*}
y\,dx+x\,dy=t\,dz+z\,dt,
\end{equation*}
then substituting for \(dz\), since \(x\), \(y\) and \(t\) are being treated as the independent variables, to get,
\begin{equation*}
dw=(3x^2y-2yz)\,dx+(x^3-2xz)\,dy+z^2\,dt,
\end{equation*}
from which we obtain all the partials at once,
\begin{equation*}
\left(\frac{\partial w}{\partial x}\right)_{y,t}=3x^2y-2yz\quad
\left(\frac{\partial w}{\partial y}\right)_{x,t}=x^3-2xz\quad
\left(\frac{\partial w}{\partial t}\right)_{x,y}=z^2.
\end{equation*}

A differential of the form \(g_i\,dx^i\) is said to be exact if there exists a function \(f\) such that \(df=g_i\,dx^i\). This turns out to be an important attribute. The condition \(\partial_ig_j=\partial_jg_i\) for all pairs \(i,j\) is always necessary for exactness, and on suitable domains (all of \(\RR^n\), for example) it is also sufficient.

Example Consider the differential \((x+y^2)\,dx+(2xy+3y^2)\,dy\). This certainly satisfies the condition, since \(\partial_y(x+y^2)=2y=\partial_x(2xy+3y^2)\), so let us try to identify an \(f\) such that it equals \(df\). Integrating \(x+y^2\) with respect to \(x\), treating \(y\) as constant, we find our candidate must have the form \(x^2/2+xy^2+c(y)\), where \(c(y)\) is some function of \(y\). Differentiating this with respect to \(y\) gives \(2xy+c'(y)\), which must equal \(2xy+3y^2\), so \(c'(y)=3y^2\) and hence \(c(y)=y^3\) up to an additive constant. Therefore \(f\) must have the form \(f(x,y)=x^2/2+xy^2+y^3+c\) where \(c\) is an arbitrary constant.
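By contrast, the differential \(y\,dx-x\,dy\) is not exact: here \(\partial_y(y)=1\) while \(\partial_x(-x)=-1\), so the condition fails and no function \(f\) satisfies \(df=y\,dx-x\,dy\).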