The chain rule

Section 2.3 The chain rule

Recordings on Re:View.

A version of the following chain rule has been proved in Analysis 2B.

Theorem 2.16. Chain rule.

Suppose that \(U \subseteq \R^N\) and \(V \subseteq \R^M\) are open sets, that \(G \maps U \to \R^M\) and \(F \maps V \to \R^K\) are continuously differentiable, and that \(G(U) \subseteq V\) so that \(F \circ G \maps U \to \R^K\) is a well-defined function. Then the composition \(F \circ G\) is also continuously differentiable, with Jacobian matrix

\begin{equation} D(F\circ G)(x) = DF(G(x)) DG(x)\text{.}\tag{2.1} \end{equation}

Equivalently, using the summation convention,

\begin{equation} \partial_j (F_i \circ G)(x) = \partial_k F_i(G(x))\, \partial_j G_k(x)\text{.}\tag{2.2} \end{equation}
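As a numerical sanity check of (2.1), not a proof, the sketch below approximates both sides with central finite differences for one hypothetical pair of maps \(G \maps \R^2 \to \R^2\) and \(F \maps \R^2 \to \R^1\text{.}\) The functions `G`, `F`, `jacobian` and `matmul`, the evaluation point and the step size are all illustrative assumptions.

```python
import math

# Hypothetical example maps (assumptions, not taken from the notes):
# G: R^2 -> R^2 and F: R^2 -> R^1, both continuously differentiable.
def G(x):
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def F(y):
    return [math.exp(y[0]) * y[1]]

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of Df(x); entry [i][j] ~ d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    """Matrix product of A and B, each stored as a list of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [0.3, -1.2]
lhs = jacobian(lambda t: F(G(t)), x)             # D(F o G)(x), a 1x2 matrix
rhs = matmul(jacobian(F, G(x)), jacobian(G, x))  # DF(G(x)) DG(x)
err = max(abs(a - b) for row_l, row_r in zip(lhs, rhs) for a, b in zip(row_l, row_r))
print(err)
```

The two sides agree up to finite-difference error, which is what (2.1) predicts; the check says nothing about maps other than this sample.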

Remark 2.17.

As with the single-variable chain rule, it is perhaps easier to remember (2.2) if we introduce additional variable names and use Leibniz notation. Let \(f=F_i\) be one of the component functions of \(F\text{.}\) Denoting points in \(\R^N\) by \(x\) and setting \(y=G(x)\text{,}\) we can think of (2.2) as saying

\begin{equation} \frac{\partial}{\partial x_j} f(y) = \frac{\partial f}{\partial y_k} \frac{\partial y_k}{\partial x_j}\text{.}\tag{2.3} \end{equation}

See Figure 2.1 below for an illustration.

Figure 2.1. A visualisation of a concrete case of (2.3) with \(N=3\) and \(M=2\text{.}\) We draw a graph where the nodes represent the variables (\(f\text{,}\) \(y_k\) and \(x_j\)) and the edges represent the basic partial derivatives (\(\partial f/\partial y_k\) and \(\partial y_k/\partial x_j\)) associated to the functional relationships \(f = f(y)\) and \(y_k=G_k(x)\text{.}\) In order to calculate \(\partial f/\partial x_3\text{,}\) we find each path in this graph from \(f\) to \(x_3\text{,}\) multiply together the partial derivatives associated to the edges in this path, and then sum up over all such paths.
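The path-sum recipe in Figure 2.1 can be sketched in code for one hypothetical concrete choice with \(N=3\) and \(M=2\text{,}\) namely \(y_1 = x_1 + x_2 x_3\text{,}\) \(y_2 = x_3^2 - x_1\) and \(f(y) = y_1^2 y_2\) (these functions, and the names `g`, `f`, `edge_weights` and `path_sum`, are assumptions for illustration). Each edge carries its partial derivative, and \(\partial f/\partial x_3\) is the sum over the two paths \(f \to y_k \to x_3\text{;}\) the result is compared with a direct finite difference of the composition.

```python
# Hypothetical concrete functions (assumptions, not from the notes):
# y1 = x1 + x2*x3,  y2 = x3**2 - x1,  f(y) = y1**2 * y2.
def g(x):
    x1, x2, x3 = x
    return (x1 + x2 * x3, x3 ** 2 - x1)

def f(y):
    y1, y2 = y
    return y1 ** 2 * y2

def edge_weights(x):
    """Partial derivatives attached to the edges of the graph in Figure 2.1."""
    _, x2, x3 = x
    y1, y2 = g(x)
    df_dy = {"y1": 2 * y1 * y2, "y2": y1 ** 2}           # edges f -> y_k
    dy_dx = {                                             # edges y_k -> x_j
        ("y1", "x1"): 1.0, ("y1", "x2"): x3, ("y1", "x3"): x2,
        ("y2", "x1"): -1.0, ("y2", "x2"): 0.0, ("y2", "x3"): 2 * x3,
    }
    return df_dy, dy_dx

def path_sum(x, target):
    """Sum over paths f -> y_k -> target of the product of the edge weights."""
    df_dy, dy_dx = edge_weights(x)
    return sum(df_dy[yk] * dy_dx[(yk, target)] for yk in df_dy)

x = (0.5, 2.0, -1.0)
h = 1e-6
# Direct central difference of x3 -> f(g(x)) for comparison.
direct = (f(g((x[0], x[1], x[2] + h))) - f(g((x[0], x[1], x[2] - h)))) / (2 * h)
print(path_sum(x, "x3"), direct)   # path_sum(x, "x3") is -7.5
```

At this point \(y = (-1.5, 0.5)\text{,}\) so the two paths contribute \((-1.5)(2) = -3\) and \((2.25)(-2) = -4.5\text{.}\)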

When applying the chain rule in the functional form (2.2), the relevant functions \(F,G\) are rarely both handed to us. Instead, we will have to discover them for ourselves and, especially when things get complicated, give them names. Similarly, when using the Leibniz notation form of the chain rule in (2.3), the intermediate variable \(y\) is typically something we will have to figure out for ourselves and often give a name.

We will often apply Theorem 2.16 in the following situation.

Definition 2.18. Radial symmetry.

A function \(f \maps \Omega \to \R\) is called radially symmetric if it is constant on spheres \(\{ \abs x = r\}\text{,}\) or, equivalently, if there exists \(\phi \maps [0,\infty) \to \R\) such that \(f(x)=\phi(\abs x)\text{.}\)

Exercises

1. (PS2) Derivatives after shifting and scaling.

Let \(u \in C^k(B_r(y))\) for some \(r \gt 0\text{,}\) \(y \in \R^N\) and \(k \ge 0\text{.}\) Using the fact that \(x \mapsto y+rx\) is a bijection \(B_1(0) \to B_r(y)\text{,}\) define \(v \in C^k(B_1(0))\) by

\begin{gather*} v(x) = u(y+rx)\text{.} \end{gather*}

(a)

What is the inverse of the mapping \(x \mapsto y + rx\text{?}\)

Solution.

\(x \mapsto r^{-1}(x-y)\text{.}\)

(b)

Suppose \(k \ge 1\text{.}\) Calculate \(\nabla u\) in terms of derivatives of \(v\text{.}\)

Solution.

To simplify the notation we introduce an intermediate variable \(z = r^{-1}(x-y)\text{,}\) so that \(u(x) = v(z)\) by the previous part. Using the chain rule and \(\partial_i z_j = r^{-1}\delta_{ij}\text{,}\) we have

\begin{align*} \partial_i u(x) \amp = \partial_i [v(z)]\\ \amp = \partial_j v(z) \partial_i z_j\\ \amp = \partial_j v(z) r^{-1}\delta_{ij}\\ \amp = r^{-1} \partial_i v(z)\text{,} \end{align*}

or in other words \(\nabla u(x) = r^{-1} \nabla v(r^{-1}(x-y))\text{.}\)
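The identity \(\nabla u(x) = r^{-1} \nabla v(r^{-1}(x-y))\) can be spot-checked numerically. In the sketch below, the function `u` on \(\R^2\text{,}\) the centre `y`, the radius `r`, the sample point and the finite-difference helper `grad` are all hypothetical choices; only the definition \(v(x) = u(y+rx)\) comes from the exercise.

```python
import math

# Hypothetical data (assumptions): a smooth u on R^2 and arbitrary y, r > 0.
y = (0.4, -0.7)
r = 0.25

def u(x):
    return math.sin(x[0]) * math.exp(x[1]) + x[0] * x[1] ** 2

def v(x):
    # v(x) = u(y + r x), as in the exercise
    return u([yi + r * xi for yi, xi in zip(y, x)])

def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out

x = (0.45, -0.62)                                   # a point of B_r(y)
z = [(xi - yi) / r for xi, yi in zip(x, y)]         # z = r^{-1}(x - y)
lhs = grad(u, x)
rhs = [gi / r for gi in grad(v, z)]                 # r^{-1} grad v(z)
err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(err)
```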

(c)

Suppose that \(k \ge 2\text{.}\) Calculate \(D^2 v\) in terms of derivatives of \(u\text{.}\)

Solution.

We repeatedly use the previous part, with \(z\) defined in the same way,

\begin{align*} \partial_{ij} u(x) \amp = \partial_i \partial_j [v(z)]\\ \amp = r^{-1}\partial_i [\partial_j v(z)]\\ \amp = r^{-2} (\partial_i \partial_j v)(z)\\ \amp = r^{-2} \partial_{ij} v(z)\text{,} \end{align*}

or in other words \(D^2 u(x) = r^{-2} D^2 v(r^{-1}(x-y))\text{.}\)

The question asked for \(D^2 v\) in terms of derivatives of \(u\text{,}\) and so we rewrite this as \(D^2 v(x) = r^2 D^2 u(y+rx)\text{.}\)
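A similar numerical spot-check applies to \(D^2 v(x) = r^2 D^2 u(y+rx)\text{.}\) The sketch approximates both Hessians with a central mixed difference; `u`, `y`, `r`, the point `x` and the helpers `shifted` and `hessian` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical data (assumptions): a smooth u on R^2 and arbitrary y, r > 0.
y = (0.4, -0.7)
r = 0.25

def u(x):
    return math.sin(x[0]) * math.exp(x[1]) + x[0] * x[1] ** 2

def v(x):
    # v(x) = u(y + r x)
    return u([yi + r * xi for yi, xi in zip(y, x)])

def shifted(f, x, i, j, si, sj, h):
    """Evaluate f at x + si*h*e_i + sj*h*e_j."""
    p = list(x)
    p[i] += si * h
    p[j] += sj * h
    return f(p)

def hessian(f, x, h=1e-3):
    """Central-difference approximation of the Hessian D^2 f(x)."""
    n = len(x)
    return [[(shifted(f, x, i, j, 1, 1, h) - shifted(f, x, i, j, 1, -1, h)
              - shifted(f, x, i, j, -1, 1, h) + shifted(f, x, i, j, -1, -1, h))
             / (4 * h * h) for j in range(n)] for i in range(n)]

x = (0.3, 0.8)                                      # a point of B_1(0)
Hv = hessian(v, x)
Hu = hessian(u, [yi + r * xi for yi, xi in zip(y, x)])
err = max(abs(Hv[i][j] - r ** 2 * Hu[i][j]) for i in range(2) for j in range(2))
print(err)
```

For \(i = j\) the mixed stencil reduces to the usual second difference with step \(2h\text{,}\) so one helper covers every entry.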

(d)

Let \(\alpha\) be a multiindex with \(\abs \alpha \le k\text{.}\) Give a formula for \(\partial^\alpha u\) in terms of derivatives of \(v\text{.}\) You do not need to provide a detailed justification.

Solution.

We have now spotted the pattern: each time we differentiate \(u\) we simply get the same derivative applied to \(v\) but with an additional factor of \(r^{-1}\text{.}\) So \(\partial^\alpha u(x) = r^{-\abs \alpha} \partial^\alpha v(r^{-1}(x-y))\text{.}\)
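In one variable a multiindex is just an order \(n\text{,}\) and the pattern becomes \(u^{(n)}(x) = r^{-n} v^{(n)}(r^{-1}(x-y))\text{.}\) The sketch below checks this exactly, using rational arithmetic, for a hypothetical polynomial \(u\) and hypothetical values of \(y\text{,}\) \(r\) and \(x\text{;}\) it illustrates the special case \(N=1\) and is not the justification for general \(N\text{.}\)

```python
from fractions import Fraction
from math import comb

# Hypothetical polynomial u(t) = 3 - t + 4t^2 + t^3 - 5t^4 (coefficients, low degree first).
u = [Fraction(c) for c in (3, -1, 4, 1, -5)]
y = Fraction(2, 3)
r = Fraction(1, 5)

def deriv(p):
    """Coefficients of p' for p given by its coefficient list."""
    return [k * p[k] for k in range(1, len(p))]

def evaluate(p, t):
    """Horner evaluation of the polynomial p at t."""
    total = Fraction(0)
    for c in reversed(p):
        total = total * t + c
    return total

def affine_compose(p, y, r):
    """Coefficients of x -> p(y + r x), expanding (y + r x)^k binomially."""
    q = [Fraction(0)] * len(p)
    for k, c in enumerate(p):
        for m in range(k + 1):
            q[m] += c * comb(k, m) * y ** (k - m) * r ** m
    return q

v = affine_compose(u, y, r)          # v(x) = u(y + r x)
x = Fraction(7, 10)
z = (x - y) / r                      # z = r^{-1}(x - y)
checks = []
du, dv = u, v
for n in range(len(u) + 1):
    # n-th derivative: u^(n)(x) should equal r^(-n) v^(n)(z) exactly
    checks.append(evaluate(du, x) == evaluate(dv, z) / r ** n)
    du, dv = deriv(du), deriv(dv)
print(all(checks))                   # prints True
```

Exact fractions avoid the rounding error that finite differences of high order would suffer.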

2. (PS2) Derivatives of symmetric functions.

Let \(\Omega = \R^N \without \{0\}\) and \(p \in \R \without \{0\}\text{.}\)

(a)

Suppose that \(f \maps \Omega \to \R\) is continuously differentiable, and that it is radially symmetric with \(f(x) = \phi(\abs x)\text{.}\) Show that

\begin{equation*} \nabla f(x) = \phi'(\abs x) \frac x{\abs x} \text{.} \end{equation*}

Hint.

Exercise 2.2.2 is useful.

Solution.

We calculate

\begin{align*} \partial_i f(x) \amp= \partial_i [\phi(\abs x)]\\ \amp= \phi'(\abs x) \partial_i \abs x\\ \amp= \phi'(\abs x) \frac{x_i}{\abs x}, \end{align*}

where in the last step we have used Exercise 2.2.2.

Comment.

See also the comment on Exercise 2.2.2.
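A quick numerical check of \(\nabla f(x) = \phi'(\abs x)\, x/\abs x\) for one hypothetical profile \(\phi(s) = s^3 e^{-s}\) in \(N = 3\) is sketched below; the profile, the sample point and the helper names `phi`, `dphi`, `norm` and `grad` are assumptions made for illustration.

```python
import math

# Hypothetical profile (assumption): phi(s) = s^3 e^{-s}, so f(x) = phi(|x|).
def phi(s):
    return s ** 3 * math.exp(-s)

def dphi(s):
    return (3 * s ** 2 - s ** 3) * math.exp(-s)

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def f(x):
    return phi(norm(x))

def grad(g, x, h=1e-6):
    """Central-difference approximation of the gradient of g at x."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((g(xp) - g(xm)) / (2 * h))
    return out

x = [0.7, -1.1, 0.4]                                  # a point of Omega (N = 3)
formula = [dphi(norm(x)) * t / norm(x) for t in x]    # phi'(|x|) x / |x|
err = max(abs(a - b) for a, b in zip(grad(f, x), formula))
print(err)
```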

(b)

Calculate \(\nabla \abs x^p\text{.}\)

Solution.

By the previous part with \(\phi(s) = s^p\text{,}\) so that \(\phi'(s) = p s^{p-1}\text{,}\) we have

\begin{equation*} \nabla \abs x^p = p\abs x^{p-2} x\text{.} \end{equation*}

(c)

Calculate \(D (\abs x^p x)\text{.}\)

Solution.

Using the previous part and the product rule, we have

\begin{align*} (D(\abs x^p x))_{ij} \amp= \partial_j (\abs x^p x_i )\\ \amp= p\abs x^{p-2} x_j x_i + \abs x^p \delta_{ij}\text{.} \end{align*}

Comment.

Some authors write the matrix with entries \(x_i x_j\) as \(x \otimes x\text{,}\) in which case we can write the above formula as

\begin{equation*} D(\abs x^p x) = p\abs x^{p-2} (x \otimes x) + \abs x^p I, \end{equation*}

where here \(I\) is the identity matrix.
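The formula \(D(\abs x^p x) = p\abs x^{p-2} (x \otimes x) + \abs x^p I\) can be compared entry by entry with a finite-difference Jacobian. The sketch uses \(N = 3\text{,}\) a hypothetical sample point and a few hypothetical exponents \(p\text{;}\) entry `[i][j]` of both matrices stands for \(\partial_j (\abs x^p x_i)\text{,}\) matching the indexing in the solution.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def field(x, p):
    """The vector field x -> |x|^p x."""
    return [norm(x) ** p * t for t in x]

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of Df(x); entry [i][j] ~ d f_i / d x_j."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def formula(x, p):
    """p |x|^(p-2) x_i x_j + |x|^p delta_ij, as a list of rows."""
    n = len(x)
    return [[p * norm(x) ** (p - 2) * x[i] * x[j] + (norm(x) ** p if i == j else 0.0)
             for j in range(n)] for i in range(n)]

x = [0.5, -0.9, 1.3]                 # hypothetical point of Omega (N = 3)
worst = 0.0
for p in (-1.5, 1.0, 2.5):           # hypothetical sample exponents
    J = jacobian(lambda t: field(t, p), x)
    M = formula(x, p)
    worst = max(worst, max(abs(J[i][j] - M[i][j]) for i in range(3) for j in range(3)))
print(worst)
```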

(d)

Calculate \(\nabla \log \abs x\text{.}\)

Solution.

We calculate

\begin{align*} \nabla \log \abs x \amp = \frac 1{\abs x} \nabla \abs x\\ \amp = \frac x{\abs x^2}\text{.} \end{align*}

(e)

Calculate \(\Delta \abs x^p\text{.}\)

Hint.

Using the summation convention, you will likely end up with a term involving \(\delta_{ii}=N\text{.}\)

Solution.

Using our formula for \(\nabla \abs x^p\text{,}\) the product rule, and the same formula with \(p-2\) in place of \(p\text{,}\) we calculate

\begin{align*} \Delta \abs x^p \amp = \partial_i (p\abs x^{p-2} x_i)\\ \amp = p(p-2)\abs x^{p-4} x_i x_i + p\abs x^{p-2} \delta_{ii}\\ \amp = p (p-2+N) \abs x^{p-2}\text{.} \end{align*}
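To gain some confidence in \(\Delta \abs x^p = p(p-2+N)\abs x^{p-2}\text{,}\) the sketch below compares it with a finite-difference Laplacian (a sum of central second differences) for a few dimensions \(N\) and exponents \(p\text{.}\) The sample points, exponents, step size and helper names `norm` and `laplacian` are hypothetical choices.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def laplacian(f, x, h=1e-3):
    """Sum over i of the central second difference in direction e_i."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2 * fx + f(xm)) / h ** 2
    return total

base = [0.6, -0.8, 0.5, 0.9]        # hypothetical sample coordinates
worst = 0.0
for N in (2, 3, 4):
    x = base[:N]
    for p in (-1.5, 0.5, 3.0):      # hypothetical sample exponents
        numeric = laplacian(lambda t: norm(t) ** p, x)
        exact = p * (p - 2 + N) * norm(x) ** (p - 2)
        worst = max(worst, abs(numeric - exact))
print(worst)
```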

(f)

Calculate \(\Delta \log \abs x\text{.}\)

Hint.

As with the previous part you will likely encounter a term involving \(\delta_{ii}=N\text{.}\)

Solution.

Using our formula for \(\nabla \log \abs x\text{,}\) the product rule, and part (b) with \(p=-2\text{,}\) we calculate

\begin{align*} \Delta \log \abs x \amp = \partial_i (\abs x^{-2} x_i)\\ \amp = -2\abs x^{-4}x_i x_i +\abs x^{-2}\delta_{ii}\\ \amp = (N-2)\abs x^{-2}\text{.} \end{align*}

(g)

Conclude that the function \(\Phi \in C^\infty(\Omega)\) defined by

\begin{align*} \Phi(x) = \begin{cases} \abs x^{2-N} \amp N \ge 3,\\ -\log \abs x \amp N = 2 \end{cases} \end{align*}

satisfies \(\Delta \Phi = 0\) on \(\Omega\text{.}\)

Solution.

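This follows immediately from the previous two parts. For \(N \ge 3\text{,}\) taking \(p = 2-N\) in part (e) gives \(\Delta \abs x^{2-N} = (2-N)(2-N-2+N)\abs x^{-N} = 0\text{.}\) For \(N = 2\text{,}\) part (f) gives \(\Delta \log \abs x = (N-2)\abs x^{-2} = 0\text{,}\) and so \(\Delta \Phi = -\Delta \log \abs x = 0\text{.}\)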
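As a numerical sanity check, the finite-difference Laplacian of \(\Phi\) should be close to zero away from the origin in each dimension. The sketch evaluates it at one hypothetical point for \(N = 2, 3, 4, 5\text{;}\) the point, step size and helper names are assumptions.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def Phi(x):
    """|x|^(2-N) for N >= 3, and -log|x| for N = 2."""
    N = len(x)
    return -math.log(norm(x)) if N == 2 else norm(x) ** (2 - N)

def laplacian(f, x, h=1e-3):
    """Sum over i of the central second difference in direction e_i."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2 * fx + f(xm)) / h ** 2
    return total

base = [0.6, -0.8, 0.5, 0.9, -0.3]   # hypothetical sample coordinates
residuals = {N: laplacian(Phi, base[:N]) for N in (2, 3, 4, 5)}
print(residuals)
```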

3. The Kelvin transform.

Let \(u \in C^2(\R^N)\) and define \(v \in C^2(\R^N \without \{0\})\) by

\begin{equation*} v(x) = \frac 1{\abs x^{N-2}} u\Big( \frac x{\abs x^2} \Big)\text{.} \end{equation*}

Show that

\begin{equation*} \Delta v(x) = \frac 1{\abs x^{N+2}} \Delta u \Big( \frac x{\abs x^2} \Big)\text{.} \end{equation*}

Hint.

This is a challenging calculation, and somewhat beyond the scope of what we will need in this unit. Before attempting the general case, it might be worthwhile to look at the special case \(N=2\) where the formula for \(v\) is simpler.
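Before (or after) doing the calculation by hand, the claimed identity can be tested numerically. The sketch uses a hypothetical test function \(u(x) = x_1^2 x_2 + e^{x_N}\text{,}\) whose Laplacian is \(2x_2 + e^{x_N}\) in every dimension \(N \ge 2\text{,}\) and compares a finite-difference Laplacian of \(v\) with \(\abs x^{-(N+2)} \Delta u(x/\abs x^2)\) at hypothetical sample points for \(N = 2, 3, 4\text{.}\) This is evidence for the formula, not a substitute for the proof the exercise asks for.

```python
import math

# Hypothetical test function (assumption): u(x) = x_1^2 x_2 + exp(x_N).
def u(x):
    return x[0] ** 2 * x[1] + math.exp(x[-1])

def lap_u(x):
    # Exact Laplacian of the u above: 2 x_2 + exp(x_N).
    return 2 * x[1] + math.exp(x[-1])

def kelvin(x):
    """v(x) = |x|^(2-N) u(x / |x|^2), for x != 0."""
    N = len(x)
    s = sum(t * t for t in x)        # |x|^2
    return s ** ((2 - N) / 2) * u([t / s for t in x])

def laplacian(f, x, h=1e-3):
    """Sum over i of the central second difference in direction e_i."""
    fx = f(x)
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2 * fx + f(xm)) / h ** 2
    return total

worst = 0.0
for x in ([0.9, -0.4], [0.9, -0.4, 0.7], [0.9, -0.4, 0.7, 0.3]):
    N = len(x)
    s = sum(t * t for t in x)
    predicted = s ** (-(N + 2) / 2) * lap_u([t / s for t in x])  # |x|^{-(N+2)} (Delta u)(x/|x|^2)
    worst = max(worst, abs(laplacian(kelvin, x) - predicted))
print(worst)
```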
