Tangent vectors

As we’ve already indicated we choose to view \(\RR^n\) as a space of points, \(x=(x^1,\dots,x^n)\), with “arrows” emerging from a point \(x\) in space living in the tangent space \(T_x(\RR^n)\) at \(x\). This is just another copy of \(\RR^n\), now viewed as a vector space. For example, at some point \(a\in\RR^n\) we might have vectors \(\mathbf{v}_a,\mathbf{w}_a\in T_a(\RR^n)\) such that \(\mathbf{v}_a+\mathbf{w}_a=(\mathbf{v}+\mathbf{w})_a\). Given two distinct points \(a,b\in\RR^n\), \(T_a(\RR^n)\) and \(T_b(\RR^n)\) are distinct copies of \(\RR^n\). Without some further mechanism by which we could transport a vector from \(T_a(\RR^n)\) to \(T_b(\RR^n)\) there can be no meaning attached to the sum of a vector in \(T_a(\RR^n)\) and a vector in \(T_b(\RR^n)\).

Working with the space \(\RR^n\) we can safely think of the tangent space at each point as the collection of all arrows at that point. But suppose our space were the surface of a sphere. In that case tangent vectors at a point live in the tangent plane to the sphere, itself sitting within an ambient space. But what if there were no ambient space? We’re anticipating here the generalisation of the tools of calculus to spaces far more general than \(\RR^n\). With this in mind we’ll consider here more sophisticated characterisations of the notion of tangent vector. Specifically, we’ll avoid, as far as possible, explicitly exploiting the fact that our underlying space of points, \(\RR^n\), is itself a vector space. Instead we’ll rely on the fact that at any point we have a valid coordinate system through which we can access a vector space structure.

A tangent vector as an equivalence class of curves

A smooth curve in \(\RR^n\) is a smooth map \(\gamma:(\alpha,\beta)\mapto\RR^n\) which we denote simply by \(\gamma(t)\). With respect to some coordinate system \(x^i\), two curves, \(\gamma(t)\) and \(\tilde{\gamma}(t)\), are said to be tangent at a point \(\gamma(t_0)=a=\tilde{\gamma}(t_0)\) if
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}
\end{equation}
for \(i=1,\dots,n\). Tangency is in fact independent of the coordinate system used. Indeed, suppose instead of \(x^i\) we used some other coordinate system \(y^i\), with \(y^i=y^i(x^1,\dots,x^n)\); then
\begin{align*}
\left.\frac{dy^i(\gamma(t))}{dt}\right|_{t_0}&=\sum_{j=1}^{n}\left.\frac{\partial y^i}{\partial x^j}\frac{dx^j(\gamma(t))}{dt}\right|_{t_0}\\
&=\sum_{j=1}^{n}\left.\frac{\partial y^i}{\partial x^j}\frac{dx^j(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{dy^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}.
\end{align*}
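As a concrete check of this coordinate independence, here is a minimal sketch, assuming Python with SymPy is available (the two curves and the base point are invented for illustration): two curves tangent at a point in Cartesian coordinates are verified to be tangent in polar coordinates as well.

```python
import sympy as sp

t = sp.symbols('t')

# Two curves through a = (1, 0) with common velocity (1, 2) at t = 0,
# differing at second order.
gamma  = (1 + t, 2*t)
gtilde = (1 + t + t**2, 2*t - 3*t**2)

# Tangency in Cartesian coordinates (x, y).
assert all(sp.diff(c1, t).subs(t, 0) == sp.diff(c2, t).subs(t, 0)
           for c1, c2 in zip(gamma, gtilde))

# Tangency in polar coordinates r = sqrt(x^2 + y^2), theta = atan2(y, x).
def polar(curve):
    x, y = curve
    return (sp.sqrt(x**2 + y**2), sp.atan2(y, x))

for p1, p2 in zip(polar(gamma), polar(gtilde)):
    assert sp.simplify(sp.diff(p1, t).subs(t, 0) - sp.diff(p2, t).subs(t, 0)) == 0
```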
One definition of a tangent vector, \(\mathbf{v}\), at a point \(a\) is then as the equivalence class \([\gamma]\) of curves tangent to one another at the point \(a\). We can define addition and scalar multiplication by \(\mathbf{v}_1+\mathbf{v}_2=[\gamma_1+\gamma_2-a]\) and \(c\mathbf{v}=[a+c(\gamma-a)]\), the shifts by \(a\) ensuring that the resulting curves still pass through \(a\) at \(t_0\). These definitions are clearly exploiting the vector space structure of our space \(\RR^n\) but can easily be tweaked not to do so. The tangent vectors so defined form a real vector space, the tangent space \(T_a(\RR^n)\). This is clearly equivalent to our intuitive notion of vectors as arrows at a point but is applicable even when our space of points is more general than \(\RR^n\).

We can now introduce the directional derivative of a (smooth) function \(f:\RR^n\mapto\RR\), at a point \(a=\gamma(t_0)\), in the direction of a tangent vector \(\mathbf{v}\) according to the definition,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0},
\end{equation}
where \(\mathbf{v}\) is the tangent vector corresponding to the equivalence class of curves \([\gamma]\). Note that this does not depend on the representative of the equivalence class chosen since with respect to any coordinate system \(x^i\),
\begin{align*}
\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}\\
&=\left.\sum_{i=1}^n\frac{\partial f}{\partial x^i}\frac{dx^i(\tilde{\gamma}(t))}{dt}\right|_{t_0}\\
&=\left.\frac{df(\tilde{\gamma}(t))}{dt}\right|_{t_0}.
\end{align*}
Note also that this corresponds to the usual definition of a directional derivative in \(\RR^n\) as,
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{d}{dt}f(a+t\mathbf{v})\right|_{t=0},
\end{equation}
by considering the curve in \(\RR^n\) through the point \(a\) defined according to \(t\mapto a+t\mathbf{v}\).
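It’s worth checking this equivalence symbolically. The following is a sketch, assuming Python with SymPy (the test function \(f\), point \(a\) and vector \(\mathbf{v}\) are arbitrary choices), comparing the curve-based definition with the usual gradient formula.

```python
import sympy as sp

t, x, y = sp.symbols('t x y')
f = x**2 * y + sp.sin(y)    # an arbitrary smooth test function
a = (1, 2)                  # base point a = gamma(0)
v = (3, -1)                 # tangent vector at a

# The straight line through a with velocity v is one representative of [gamma].
gamma = {x: a[0] + v[0]*t, y: a[1] + v[1]*t}

# Curve-based definition: d/dt f(gamma(t)) at t = 0.
D_curve = sp.diff(f.subs(gamma), t).subs(t, 0)

# Usual definition: v . grad f, evaluated at a.
D_grad = sum(vi * sp.diff(f, xi).subs({x: a[0], y: a[1]})
             for vi, xi in zip(v, (x, y)))

assert sp.simplify(D_curve - D_grad) == 0
```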

Directional derivatives and derivations

Let us regard the directional derivative, \(D_\mathbf{v}\), at a point of \(\RR^n\) as a map \(D_\mathbf{v}:C^\infty(\RR^n)\mapto\RR\). Directional derivatives are then examples of derivations according to the following definition.

Definition A map \(X:C^\infty(\RR^n)\mapto\RR\) is called a derivation at a point \(a\in\RR^n\) if it is linear over \(\RR\) and satisfies the Leibniz rule,
\begin{equation}
X(fg)=X(f)g(a)+f(a)X(g).
\end{equation}

To see that \(D_\mathbf{v}\) is a derivation at a point \(a=\gamma(t_0)\in\RR^n\), where \(\mathbf{v}=[\gamma]\), note that
\begin{align*}
D_\mathbf{v}(f+g)(a)&=\left.\frac{d}{dt}((f+g)(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}+\left.\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_{\mathbf{v}}f(a)+D_{\mathbf{v}}g(a),
\end{align*}
and for \(c\in\RR\),
\begin{align*}
D_\mathbf{v}(cf)(a)&=\left.\frac{d}{dt}((cf)(\gamma(t)))\right|_{t_0}\\
&=c\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}\\
&=cD_\mathbf{v}(f)(a),
\end{align*}
and
\begin{align*}
D_\mathbf{v}(fg)(a)&=\left.\frac{d}{dt}(f(\gamma(t))g(\gamma(t)))\right|_{t_0}\\
&=\left.\frac{df(\gamma(t))}{dt}g(\gamma(t))\right|_{t_0}+\left.f(\gamma(t))\frac{dg(\gamma(t))}{dt}\right|_{t_0}\\
&=D_\mathbf{v}f(a)g(a)+f(a)D_\mathbf{v}g(a).
\end{align*}
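The same setup gives a quick symbolic check of the Leibniz rule; again a sketch with SymPy and arbitrarily chosen functions.

```python
import sympy as sp

t, x, y = sp.symbols('t x y')
a = {x: 1, y: 2}
gamma = {x: 1 + 3*t, y: 2 - t}    # a curve through a with velocity (3, -1)

def D(h):
    """Directional derivative of h at a, computed along gamma."""
    return sp.diff(h.subs(gamma), t).subs(t, 0)

f, g = x * y**2, sp.exp(x) + y

# Leibniz rule: D(fg) = D(f) g(a) + f(a) D(g).
assert sp.simplify(D(f*g) - (D(f)*g.subs(a) + f.subs(a)*D(g))) == 0
```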

The Leibniz rule is what really captures the essence of differentiation. Let’s consider some of its consequences. Suppose \(f\) is the constant function, \(f(x)=1\). Then for any derivation \(X\), \(X(f)=0\). This follows since \(f=ff\) and by the Leibniz rule, \(X(f)=X(ff)=X(f)f(a)+f(a)X(f)=2X(f)\), so \(Xf=0\). It follows immediately, by linearity of derivations, that \(Xf=0\) for any constant function \(f(x)=c\). Another consequence is that if \(f(a)=g(a)=0\) then \(X(fg)=0\) since \(X(fg)=X(f)g(a)+f(a)X(g)=0\).

It’s straightforward to verify that derivations at a point \(a\in\RR^n\) form a real vector space which we denote by \(\mathcal{D}_a(\RR^n)\). For any coordinate system \(x^i\) the partial derivatives at \(a\), \(f\mapsto\left.\partial f/\partial x^i\right|_a\), are easily seen to be derivations, and we’ll now demonstrate that they provide a basis for \(\mathcal{D}_a(\RR^n)\). Indeed, from Taylor’s theorem, we know that for any smooth function \(f\), in the neighbourhood of a point \(a\in\RR^n\) and in terms of coordinates \(x^i\),
\begin{equation}
f(x)=f(a)+\sum_{i=1}^n\left.(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^n(x^i-a^i)(x^j-a^j)\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt.
\end{equation}
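As a sanity check of this integral form of the remainder, here is a sketch (SymPy assumed) verifying the identity exactly in the one-dimensional case, with \(f=\exp\), \(a=0\) and the sample point \(x=13/10\).

```python
import sympy as sp

t = sp.symbols('t')
x0 = sp.Rational(13, 10)

# f(x) = f(0) + x f'(0) + x^2 * int_0^1 (1 - t) f''(t x) dt, with f = exp
# (so f' = f'' = exp as well).
remainder = x0**2 * sp.integrate((1 - t) * sp.exp(t * x0), (t, 0, 1))
assert sp.simplify(sp.exp(x0) - (1 + x0 + remainder)) == 0
```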
Writing \(h_{ij}(x)=\int_0^1(1-t)\frac{\partial^2f(a+t(x-a))}{\partial x^i\partial x^j}dt\) for the remainder terms, consider applying a derivation \(X\) at \(a\) to \(f\):
\begin{equation}
Xf=X(f(a))+\sum_{i=1}^n\left.X(x^i-a^i)\frac{\partial f}{\partial x^i}\right|_a+\sum_{i,j=1}^nX\left((x^i-a^i)(x^j-a^j)h_{ij}\right),
\end{equation}
but \(X(f(a))=0\) since \(f(a)\) is a constant, \(X(x^i-a^i)=X(x^i)\) since \(a^i\) is a constant, and \(X\left((x^i-a^i)(x^j-a^j)h_{ij}\right)=0\) since both \((x^i-a^i)\) and \((x^j-a^j)h_{ij}\) vanish at \(a\). Thus we have that,
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
In other words \(X=\sum_{i=1}^nX(x^i)\left.\partial/\partial x^i\right|_a\). Notice that if \(y^i\) is any other coordinate system valid in the neighbourhood of \(a\) then
\begin{equation}
Xy^j=\left.\sum_{i=1}^nX(x^i)\frac{\partial y^j}{\partial x^i}\right|_a
\end{equation}
so that
\begin{align*}
Xf&=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a\\
&=\sum_{i,j=1}^n\left.X(x^i)\frac{\partial y^j}{\partial x^i}\frac{\partial f}{\partial y^j}\right|_a\\
&=\sum_{j=1}^nX(y^j)\left.\frac{\partial f}{\partial y^j}\right|_a,
\end{align*}
and the result does not depend on the chosen coordinate system. That the coordinate partials are linearly independent follows since applying \(\sum_ic^i\partial/\partial x^i=0\) to the coordinate functions \(x^i\) in turn yields \(c^i=0\) for all \(i\). The coordinate partials at \(a\) therefore form a basis of \(\mathcal{D}_a(\RR^n)\).
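The expansion \(X=\sum_iX(x^i)\left.\partial/\partial x^i\right|_a\) can also be checked symbolically. In the sketch below (SymPy assumed; the curve and test function are invented for illustration) a derivation is built as the directional derivative along a curved path and its action on \(f\) is compared with \(\sum_iX(x^i)\left.\partial f/\partial x^i\right|_a\).

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

# A curved representative: gamma(0) = a = (1, 0), velocity (1, 2) at t = 0.
gamma = {x: sp.cos(t) + t, y: sp.sin(t) + t}
a = {x: 1, y: 0}

def X(h):
    """The derivation at a given by the directional derivative along gamma."""
    return sp.diff(h.subs(gamma), t).subs(t, 0)

f = x**2 * sp.exp(y)

# The components X(x^i) reproduce X(f) via X = sum_i X(x^i) d/dx^i |_a.
expansion = X(x) * sp.diff(f, x).subs(a) + X(y) * sp.diff(f, y).subs(a)
assert sp.simplify(X(f) - expansion) == 0
```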

So, to any tangent vector \(\mathbf{v}\) is associated the directional derivative \(D_\mathbf{v}\) which is a derivation. Are all derivations directional derivatives? The answer is yes. If we have a derivation \(X\) at a point \(a\) then we know that for any smooth function in a neighbourhood of \(a\), in terms of coordinates \(x^i\),
\begin{equation}
Xf=\sum_{i=1}^n\left.X(x^i)\frac{\partial f}{\partial x^i}\right|_a.
\end{equation}
We also know that the directional derivative of \(f\) in the direction of a tangent vector \(\mathbf{v}=[\gamma]\) at \(a=\gamma(t_0)\) is, again in terms of local coordinates \(x^i\),
\begin{equation}
D_{\mathbf{v}}f(a)=\left.\frac{df(\gamma(t))}{dt}\right|_{t_0}=\sum_{i=1}^n\left.\frac{dx^i(\gamma(t))}{dt}\frac{\partial f}{\partial x^i}\right|_{t_0}.
\end{equation}
So, if we choose a curve \(\gamma\) such that \(\gamma(t_0)=a\) and
\begin{equation}
\left.\frac{dx^i(\gamma(t))}{dt}\right|_{t_0}=X(x^i),
\end{equation}
then \(Xf=D_{\mathbf{v}}f(a)\). Thus, we can just take \(\gamma(t)=(a^1+X(x^1)(t-t_0),\dots,a^n+X(x^n)(t-t_0))\) (even though we’re explicitly relying on the vector space structure of \(\RR^n\) here, there is nothing essentially different required in the more general setting). Finally, we can ask whether distinct tangent vectors correspond to distinct derivations. They do, since if \(D_\mathbf{v}=D_\mathbf{w}\), where \(\mathbf{v}=[\gamma_1]\) and \(\mathbf{w}=[\gamma_2]\), then applying this in turn to each coordinate function, \(x^i\), we obtain,
\begin{equation*}
\left.\frac{dx^i(\gamma_1(t))}{dt}\right|_{t_0}=\left.\frac{dx^i(\gamma_2(t))}{dt}\right|_{t_0},
\end{equation*}
so \(\gamma_1\) and \(\gamma_2\) are in the same equivalence class; that is, \(\mathbf{v}=\mathbf{w}\). We have therefore proved the following important theorem.

Theorem The vector spaces \(T_a(\RR^n)\) and \(\mathcal{D}_a(\RR^n)\) are isomorphic and under this isomorphism tangent vectors \(\mathbf{v}\) map to derivations \(D_\mathbf{v}\).

Under this isomorphism the standard basis vectors \(\mathbf{e}_i\) at \(a\) map to the partial derivatives \(\partial/\partial x^i\) at \(a\), and indeed in the more general setting of differential geometry, which the treatment here anticipates, it is usual to treat those partials as the basis “vectors” of the tangent space \(T_a(\RR^n)\). The basis \(\partial/\partial x^i\) is called the coordinate basis. Suppose our space is the plane \(\RR^2\). Then the Cartesian coordinate basis is \((\partial/\partial x,\partial/\partial y)\), corresponding respectively to the standard basis vectors \((\mathbf{e}_x,\mathbf{e}_y)\). If we choose to work with polar coordinates, the coordinate basis is \((\partial/\partial r,\partial/\partial\theta)\) with
\begin{align*}
\frac{\partial}{\partial r}&=\frac{\partial x}{\partial r}\frac{\partial}{\partial x}+\frac{\partial y}{\partial r}\frac{\partial}{\partial y}\\
&=\cos\theta\frac{\partial}{\partial x}+\sin\theta\frac{\partial}{\partial y}
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial\theta}&=\frac{\partial x}{\partial\theta}\frac{\partial}{\partial x}+\frac{\partial y}{\partial\theta}\frac{\partial}{\partial y}\\
&=-r\sin\theta\frac{\partial}{\partial x}+r\cos\theta\frac{\partial}{\partial y}.
\end{align*}
Note that if in \(T_a(\RR^n)\) we adopt the usual Euclidean metric (a symmetric, positive definite inner product, \((\,,\,)\)), such that the standard basis is orthonormal, \((\mathbf{e}_i,\mathbf{e}_j)=\delta_{ij}\), then the polar coordinate basis vectors \(\partial/\partial r\) and \(\partial/\partial\theta\) of \(T_a(\RR^2)\) are orthogonal but not orthonormal: \(\partial/\partial r\) already has unit length but \(\partial/\partial\theta\) has length \(r\). The corresponding normalised basis vectors would be \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\theta\mathbf{e}_x+\sin\theta\mathbf{e}_y
\end{equation}
and
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=-\sin\theta\mathbf{e}_x+\cos\theta\mathbf{e}_y.
\end{equation}
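These polar formulae are easily verified symbolically; a sketch, SymPy assumed. The columns of the Jacobian \(\partial(x,y)/\partial(r,\theta)\) give \(\partial/\partial r\) and \(\partial/\partial\theta\) in the Cartesian basis, and their Euclidean inner products confirm the normalisations above.

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x, y = r * sp.cos(theta), r * sp.sin(theta)

# Columns of the Jacobian d(x, y)/d(r, theta) express d/dr and d/dtheta
# in the basis (d/dx, d/dy).
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, theta)],
               [sp.diff(y, r), sp.diff(y, theta)]])
d_dr, d_dtheta = J.col(0), J.col(1)

assert sp.simplify(d_dr.dot(d_dr)) == 1                 # |d/dr| = 1
assert sp.simplify(d_dtheta.dot(d_dtheta) - r**2) == 0  # |d/dtheta| = r
assert sp.simplify(d_dr.dot(d_dtheta)) == 0             # orthogonal
```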

In the case of cylindrical coordinates we find
\begin{align}
\frac{\partial}{\partial\rho}&=\cos\varphi\frac{\partial}{\partial x}+\sin\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial\varphi}&=-\rho\sin\varphi\frac{\partial}{\partial x}+\rho\cos\varphi\frac{\partial}{\partial y}\\
\frac{\partial}{\partial z}&=\frac{\partial}{\partial z}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_\rho\), \(\mathbf{e}_\varphi\) and \(\mathbf{e}_z\) defined by
\begin{equation}
\mathbf{e}_\rho=\frac{\partial}{\partial\rho}=\cos\varphi\mathbf{e}_x+\sin\varphi\mathbf{e}_y,
\end{equation}
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{\rho}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y,
\end{equation}
and
\begin{equation}
\mathbf{e}_z=\frac{\partial}{\partial z}=\mathbf{e}_z.
\end{equation}

In the case of spherical coordinates we find
\begin{align}
\frac{\partial}{\partial r}&=\cos\varphi\sin\theta\frac{\partial}{\partial x}+\sin\varphi\sin\theta\frac{\partial}{\partial y}+\cos\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\theta}&=r\cos\varphi\cos\theta\frac{\partial}{\partial x}+r\sin\varphi\cos\theta\frac{\partial}{\partial y}-r\sin\theta\frac{\partial}{\partial z}\\
\frac{\partial}{\partial\varphi}&=-r\sin\varphi\sin\theta\frac{\partial}{\partial x}+r\cos\varphi\sin\theta\frac{\partial}{\partial y}
\end{align}
and the corresponding normalised basis vectors are \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\) defined by
\begin{equation}
\mathbf{e}_r=\frac{\partial}{\partial r}=\cos\varphi\sin\theta\mathbf{e}_x+\sin\varphi\sin\theta\mathbf{e}_y+\cos\theta\mathbf{e}_z,
\end{equation}
\begin{equation}
\mathbf{e}_\theta=\frac{1}{r}\frac{\partial}{\partial\theta}=\cos\varphi\cos\theta\mathbf{e}_x+\sin\varphi\cos\theta\mathbf{e}_y-\sin\theta\mathbf{e}_z,
\end{equation}
and
\begin{equation}
\mathbf{e}_\varphi=\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}=-\sin\varphi\mathbf{e}_x+\cos\varphi\mathbf{e}_y.
\end{equation}
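The same Jacobian computation covers the spherical case, confirming the formulae above and the orthonormality of \(\mathbf{e}_r\), \(\mathbf{e}_\theta\) and \(\mathbf{e}_\varphi\); again a sketch, SymPy assumed.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
x = r * sp.sin(theta) * sp.cos(phi)
y = r * sp.sin(theta) * sp.sin(phi)
z = r * sp.cos(theta)

# Columns of the Jacobian are d/dr, d/dtheta, d/dphi in the Cartesian basis.
J = sp.Matrix([[sp.diff(c, v) for v in (r, theta, phi)] for c in (x, y, z)])
d_dr, d_dtheta, d_dphi = J.col(0), J.col(1), J.col(2)

# Normalise: |d/dr| = 1, |d/dtheta| = r, |d/dphi| = r sin(theta).
e_r, e_theta, e_phi = d_dr, d_dtheta / r, d_dphi / (r * sp.sin(theta))

# Orthonormality: frame^T frame = identity.
frame = sp.Matrix.hstack(e_r, e_theta, e_phi)
assert sp.simplify(frame.T * frame) == sp.eye(3)
```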