Section 2.2 Partial derivatives
Notation 2.8. The domain \(\Omega\).
The symbol \(\Omega\) will always denote a non-empty open subset of some Euclidean space \(\R^N\text{.}\) Often we will further assume that \(\Omega\) is connected, in which case it is traditionally called a domain.
Notation 2.9. First derivatives.
- If \(x \in \Omega \subseteq \R^N\) and \(f \maps \Omega \to \R\text{,}\) then the \(j\)-th partial derivative of \(f\) evaluated at \(x\) is the limit\begin{equation*} \partial_j f(x) = f_{x_j} (x) = \frac{\partial f}{\partial x_j}(x) = \lim_{t \to 0} \frac{f(x + te_j) - f(x)}t \end{equation*}whenever it exists. If the limit exists for all \(x \in \Omega\text{,}\) then \(\partial_j f\) is a function \(\Omega \to \R\text{.}\) Moreover, \(\partial_j\) is the linear operator which sends \(f\) to \(\partial_j f\text{.}\)
- Collecting all of the first partials of \(f\) into a vector we obtain the gradient\begin{equation*} \nabla f = \begin{pmatrix} \partial_1 f \\ \vdots \\ \partial_N f \end{pmatrix} \text{.} \end{equation*}As above we can think of \(\nabla\) as the linear operator \(f \mapsto \nabla f\text{.}\)
- More generally, if \(F \maps \Omega \to \R^M\text{,}\) then \(DF\) denotes the Jacobian matrix with entries\begin{equation*} (DF)_{ij} = \partial_j F_i\text{.} \end{equation*}In other words,\begin{equation*} DF = \begin{pmatrix} (\nabla F_1)^\top \\ \vdots \\ (\nabla F_M)^\top \end{pmatrix} \end{equation*}is a matrix whose rows are (the transposes of) the gradients of the component functions \(F_1,\ldots,F_M\text{.}\) We will only ever use this notation when the partial derivatives of the components of \(F\) are continuous, in which case the Jacobian matrix is the same thing as the Fréchet derivative.
- When \(F \maps \Omega \to \R^N\text{,}\) the Jacobian matrix is square, and so we can take its trace, called the divergence,\begin{equation*} \nabla \cdot F = \trace DF = \partial_i F_i = \frac{\partial F_1}{\partial x_1} + \cdots + \frac{\partial F_N}{\partial x_N}. \end{equation*}
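These definitions are straightforward to sanity-check numerically. The following sketch (not part of the notes; NumPy assumed, helper names ours) approximates first partials by central differences and assembles the gradient, Jacobian and divergence exactly as defined above.

```python
import numpy as np

def partial(f, x, j, h=1e-6):
    """Central-difference approximation to the j-th partial derivative of f at x."""
    e = np.zeros_like(x)
    e[j] = h
    return (f(x + e) - f(x - e)) / (2 * h)

def gradient(f, x, h=1e-6):
    """Approximate gradient: the vector of all first partials."""
    return np.array([partial(f, x, j, h) for j in range(len(x))])

def jacobian(F, x, h=1e-6):
    """Approximate Jacobian (DF)_ij = partial_j F_i; the j-th column is partial_j F."""
    return np.column_stack([partial(F, x, j, h) for j in range(len(x))])

def divergence(F, x, h=1e-6):
    """Divergence as the trace of the Jacobian."""
    return np.trace(jacobian(F, x, h))

# Sample map F(x) = (x1^2, x1 x2) on R^2, evaluated at x = (1, 2):
F = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
x = np.array([1.0, 2.0])
DF = jacobian(F, x)        # exact Jacobian is [[2*x1, 0], [x2, x1]] = [[2, 0], [2, 1]]
div = divergence(F, x)     # exact divergence is 2*x1 + x1 = 3
```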
Notation 2.10. Second derivatives.
We denote second partial derivatives of a function \(f \maps \Omega \to \R\) by
\begin{equation*}
\partial_{ij} f
= \partial_i \partial_j f
= f_{x_j x_i}
= \frac{\partial^2 f}{\partial x_i \partial x_j}.
\end{equation*}
These are collected into the symmetric Hessian matrix \(D^2 f\) (symmetric at least when \(f\) is \(C^2\text{,}\) as will always be the case in this unit when we use this notation), whose \((i,j)\)-th entry is
\begin{equation*}
(D^2 f)_{ij} = \partial_{ij} f = \frac{\partial^2 f}{\partial x_i \partial x_j}.
\end{equation*}
The trace of this matrix is the Laplacian of \(f\text{,}\)
\begin{equation*}
\Delta f
= \nabla \cdot \nabla f
= \frac{\partial^2 f}{\partial x_i \partial x_i}
= \partial_{ii} f = \trace D^2 f
= \frac{\partial^2 f}{\partial x_1^2}
+ \cdots + \frac{\partial^2 f}{\partial x_N^2}
\end{equation*}
and \(\Delta\) is the Laplace operator.
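As an illustration of the Laplacian as a sum of pure second partials, the following numerical sketch (not part of the notes; NumPy assumed) checks that \(\Delta \abs x^2 = 2N\text{,}\) since each pure second partial of \(\abs x^2\) equals \(2\text{.}\)

```python
import numpy as np

def laplacian(f, x, h=1e-4):
    """Approximate Delta f(x) as the sum of second central differences."""
    total = 0.0
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        total += (f(x + e) - 2 * f(x) + f(x - e)) / h ** 2
    return total

# For f(x) = |x|^2 each pure second partial is 2, so Delta f = 2N everywhere.
N = 4
x = np.random.randn(N)
lap = laplacian(lambda y: np.dot(y, y), x)   # should be close to 2*N = 8
```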
Notation 2.11. Higher derivatives.
We write higher-order partial derivatives of a function \(f \maps \Omega \to \R\) using the notation
\begin{equation*}
\partial^\alpha = \partial_1^{\alpha_1} \cdots \partial_N^{\alpha_N}
\end{equation*}
where here \(\alpha\) is a multi-index. \(D^k f\) is the collection of all partials \(\partial^\alpha f\) of order \(|\alpha|
= k\text{.}\) The zeroth-order partial \(1 = \partial^0\) does not take any derivatives at all, i.e. \(\partial^0 f = f\text{.}\)
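For concreteness, a single \(\partial^\alpha\) can be computed symbolically; the sketch below assumes SymPy is available, and the function and multi-index are our own choices. With \(\alpha = (1,2)\) and \(f = x_1^3 x_2^3\) we expect \(\partial^\alpha f = 3x_1^2 \cdot 6x_2 \cdot x_2^0 \cdot x_1^0\cdot\) appropriately combined, i.e. \(18 x_1^2 x_2\text{.}\)

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 * x2**3

# partial^alpha with alpha = (1, 2): differentiate once in x1, twice in x2
alpha = (1, 2)
result = sp.diff(f, x1, alpha[0], x2, alpha[1])   # expect 18*x1**2*x2
```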
Notation 2.12. Space versus time.
In Chapter 6 we will consider functions \(f \maps \Omega \times [0,T] \to \R\text{,}\) written \(f=f(x,t)\text{,}\) where \(x \in \R^N\) is thought of as ‘space’ and \(t \in [0,T]\) is thought of as ‘time’. Following common practice, in this case we will use the symbols \(\partial_j, \partial_{ij}\text{,}\) \(\Delta\) etc. to refer to the dependence of \(f\) on \(x\) only. Derivatives with respect to \(t\) will be denoted \(\partial_t f = f_t = \partial f/\partial t\text{.}\)
Notation 2.13. Named variables.
When we have agreed on the names for the arguments of a function, we can also use these names to denote the corresponding partial derivatives. For instance, if we have agreed that \(f \maps \R^2 \to \R\) is written as \(f = f(u,v)\text{,}\) then the corresponding partials can be denoted \(f_u = \partial_u f\) and \(f_v =
\partial_v f\text{.}\)
Notation 2.14. Order of operations.
Our convention is that differential operators like \(\partial_i,\nabla,D,\Delta\) are applied to a named function \(f\) before that function is evaluated on its arguments. So, for instance,
\begin{equation*}
\partial_i f(2x) = (\partial_i f)(2x)
\end{equation*}
denotes the \(i\)-th partial derivative of \(f\) evaluated at the point \(2x\text{.}\) When we want to talk about the partial derivative of the composite function \(x \mapsto f(2x)\text{,}\) we can either give this function a name, or else write something like
\begin{equation*}
\partial_i [f(2x)]
\end{equation*}
with an extra set of brackets. In Section 2.3 we will see that these two functions are related by \(\partial_i[f(2x)] = 2\partial_i f(2x)\text{.}\)
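A one-dimensional numerical check of this relation (a sketch, NumPy assumed): with \(f = \sin\text{,}\) the derivative of the composite \(x \mapsto f(2x)\) should agree with twice \(f'\) evaluated at \(2x\text{.}\)

```python
import numpy as np

f, df = np.sin, np.cos   # a sample f and its derivative
x, h = 0.7, 1e-6

# derivative of the composite x -> f(2x), by a central difference
d_composite = (f(2 * (x + h)) - f(2 * (x - h))) / (2 * h)

# (partial f)(2x) scaled by 2, as the chain rule predicts
d_chain = 2 * df(2 * x)
```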
Notation 2.15. Treating formulas as functions.
In a strong break with the conventions of MA30252, in this unit we will frequently conflate formulas and functions. For example, rather than writing
“Let \(f \maps \R^2 \to \R\) be the function defined by \(f(x)=x_1^2 x_2^3\text{.}\) Then \(\partial_2 f(x) = 3x_1^2 x_2^2\) for all \(x \in \R^2\text{,}\)”
we will simply write “\(\partial_2 (x_1^2 x_2^3) = 3x_1^2 x_2^2\)”. Similarly, if \(f \maps \R^3 \to \R\) then we will write \(x_2 \cos f + x_3\) as shorthand for the function \(x \mapsto x_2 \cos(f(x)) + x_3\text{.}\)
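The shorthand computation above is easy to reproduce symbolically (a sketch assuming SymPy is available):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
d2 = sp.diff(x1**2 * x2**3, x2)   # partial_2 (x1^2 x2^3), expect 3*x1**2*x2**2
```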
Exercises
1. (PS2) Derivatives of linear and quadratic functions.
Let \(A \in \R^{N \times N}\text{.}\) Calculate the following derivatives, using the summation convention or otherwise.
(a)
\(Dx\)
Solution.
Using the definition of the Jacobian matrix, we have
\begin{gather*}
(Dx)_{ij}
= \partial_j x_i
= \delta_{ji}\text{,}
\end{gather*}
i.e. \(Dx\) is the identity matrix.
(b)
\(D(Ax)\)
Solution.
We calculate
\begin{align*}
(D(Ax))_{ij}
\amp= \partial_j (Ax)_i\\
\amp= \partial_j (A_{ik} x_k)\\
\amp= A_{ik} \partial_j x_k\\
\amp= A_{ik} \delta_{jk}\\
\amp= A_{ij} \text{.}
\end{align*}
In other words, \(D(Ax)=A\text{.}\)
(c)
\(\nabla \cdot (Ax)\)
Solution.
Using the previous part, we calculate
\begin{gather*}
\nabla \cdot (Ax)
=
\trace D(Ax)
= \trace A = A_{ii}\text{.}
\end{gather*}
(d)
\(\nabla (x \cdot Ax)\)
Solution.
We calculate
\begin{align*}
(\nabla (x \cdot Ax))_i
\amp=
\partial_i ( x_j A_{jk} x_k)\\
\amp=
\delta_{ij}A_{jk} x_k
+ x_j A_{jk} \delta_{ki}\\
\amp=
A_{ik} x_k
+ x_j A_{ji}\\
\amp=
A_{ij} x_j
+ x_j A_{ji}\\
\amp=
(A_{ij}+A_{ji}) x_j\text{.}
\end{align*}
In other words, \(\nabla (x \cdot Ax) = (A+A^\top)x\text{.}\)
Comment.
It is easy to check that, for any \(x \in \R^N\text{,}\) \(x \cdot Ax = x \cdot A^\top
x\text{.}\) In particular, the function \(x \mapsto x \cdot Ax\) in this question is exactly the same function as \(x \mapsto \tfrac 12 x \cdot
(A+A^\top)x\text{.}\) This strongly suggests that the expression \(A+A^\top\) should somehow appear in the formula for the gradient, either explicitly or implicitly.
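The formula \(\nabla(x \cdot Ax) = (A + A^\top)x\) can also be confirmed numerically; the sketch below (NumPy assumed, not part of the exercise) compares a central-difference gradient against \((A+A^\top)x\) for a deliberately non-symmetric \(A\text{.}\)

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
N = 3
A = rng.standard_normal((N, N))   # deliberately non-symmetric
x = rng.standard_normal(N)

g = gradient(lambda y: y @ (A @ y), x)
expected = (A + A.T) @ x
```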
(e)
\(D^2 (x \cdot Ax)\)
Solution.
Starting from the formula from the previous part, we have
\begin{align*}
(D^2 (x \cdot Ax))_{ik}
\amp =
\partial_k \partial_i (x \cdot Ax)\\
\amp =
\partial_k [(A_{ij}+A_{ji}) x_j]\\
\amp =
(A_{ij}+A_{ji}) \delta_{jk}\\
\amp =
A_{ik}+A_{ki}\text{.}
\end{align*}
In other words, \(D^2 (x \cdot Ax) = A + A^\top\text{.}\)
Comment.
Since \(x \mapsto x \cdot Ax\) is a smooth function, its Hessian \(D^2(x
\cdot Ax)\) has to be a symmetric matrix. So it can’t possibly be \(2A\text{,}\) which isn’t necessarily symmetric, but could plausibly be \(A+A^\top\text{,}\) which is always symmetric.
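A numerical version of the same observation (a sketch, NumPy assumed): a finite-difference Hessian of \(x \mapsto x \cdot Ax\) reproduces \(A + A^\top\text{,}\) and in particular comes out symmetric even when \(A\) is not.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Mixed central differences for (D^2 f)_ij = partial_i partial_j f."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

rng = np.random.default_rng(1)
N = 3
A = rng.standard_normal((N, N))
x = rng.standard_normal(N)
H = hessian(lambda y: y @ (A @ y), x)   # should reproduce A + A.T
```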
(f)
\(\Delta (x \cdot Ax)\)
Solution.
Starting from the formula from the previous part, we have
\begin{align*}
\Delta (x \cdot Ax)
\amp=
\partial_i \partial_i (x \cdot Ax)\\
\amp =
A_{ii}+A_{ii}\\
\amp =
2 \trace A\text{.}
\end{align*}
(g)
\(\Delta (\abs{Ax}^2)\)
Solution 1. Direct argument
We calculate
\begin{align*}
\Delta (\abs{Ax}^2)
\amp=
\partial_i \partial_i (A_{jk} x_k A_{j\ell} x_\ell)\\
\amp=
\partial_i
(A_{jk} \delta_{ki} A_{j\ell} x_\ell
+ A_{jk} x_k A_{j\ell} \delta_{\ell i})\\
\amp=
\partial_i
(A_{ji} A_{j\ell} x_\ell
+ A_{jk} x_k A_{ji} )\\
\amp=
A_{ji} A_{j\ell} \delta_{\ell i}
+ A_{jk} \delta_{ki} A_{ji} \\
\amp=
A_{ji} A_{ji}
+ A_{ji} A_{ji} \\
\amp=
2 \abs A^2,
\end{align*}
where in the last step we have recognised the matrix norm from Definition 2.5.
Solution 2. Slicker argument
Perhaps unsurprisingly, we can save a bit of work by reusing one of the earlier parts. Note that
\begin{equation*}
\abs{Ax}^2 = \langle Ax,Ax\rangle = \langle x,A^\top Ax\rangle = x \cdot Bx
\end{equation*}
where \(B=A^\top A\text{.}\) By an earlier part we therefore have
\begin{equation*}
D^2 (\abs{Ax}^2) = B + B^\top = 2B\text{,}
\end{equation*}
where in the last step we have used that \(B\) is symmetric. In particular,
\begin{equation*}
\Delta(\abs{Ax}^2) = \trace(2B) = 2A_{ij}A_{ij} = 2
\abs{A}^2\text{.}
\end{equation*}
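Both arguments predict \(\Delta(\abs{Ax}^2) = 2\abs A^2\) with \(\abs A^2 = A_{ij}A_{ij}\text{,}\) and this too is simple to confirm numerically (a sketch, NumPy assumed; not part of the exercise).

```python
import numpy as np

def laplacian(f, x, h=1e-4):
    """Approximate Delta f(x) as a sum of second central differences."""
    total = 0.0
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        total += (f(x + e) - 2 * f(x) + f(x - e)) / h**2
    return total

rng = np.random.default_rng(2)
N = 3
A = rng.standard_normal((N, N))
x = rng.standard_normal(N)

lap = laplacian(lambda y: np.sum((A @ y)**2), x)
expected = 2 * np.sum(A**2)    # 2 |A|^2 with |A|^2 = A_ij A_ij
```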
2. (PS2) Derivative of the modulus.
Consider the function \(x \mapsto \abs x\text{,}\) which is \(C^\infty\) on \(\Omega = \R^N
\without \{0\}\text{.}\) Keeping in mind Remark 2.6, show that
\begin{equation*}
\nabla \abs x = \frac x{\abs x} \text{.}
\end{equation*}
Solution 1.
We have
\begin{align*}
\partial_i \abs x
\amp=
\partial_i \sqrt{x_1^2+\cdots+x_N^2}\\
\amp=
\tfrac 12 (x_1^2+\cdots+x_N^2)^{-1/2} 2x_i\\
\amp=
\frac{x_i}{\abs x}\text{.}
\end{align*}
In other words,
\begin{equation*}
\nabla \abs x = \frac x{\abs x} \text{.}
\end{equation*}
Solution 2.
We can actually use the summation convention here if we square things first and then work backwards. Let \(f \maps x \mapsto \abs x\) and \(g \maps x \mapsto \abs x^2 = (f(x))^2\text{.}\) Then using the chain rule we have
\begin{gather}
\nabla g(x) = 2f(x) \nabla f(x) = 2 \abs x \nabla f(x)\text{.}\tag{✶}
\end{gather}
Calculating the left hand side using the summation convention and the product rule we get
\begin{equation*}
(\nabla g(x))_i = \partial_i (x_j x_j) = 2 x_j \partial_i x_j
= 2 x_j \delta_{ij} = 2 x_i\text{,}
\end{equation*}
i.e. \(\nabla g(x) = 2x\text{.}\) Substituting this into (✶) and dividing by \(2\abs x\text{,}\) which is nonzero on \(\Omega\text{,}\) we conclude that
\begin{equation*}
\nabla f(x) = \frac 1{2 \abs x} \nabla g(x) = \frac 1{2 \abs x} 2x = \frac x{\abs x}\text{.}
\end{equation*}
Comment.
As we talked about in class and in the notes, there are many different ways to write down and think about the chain rule. Some of these involve introducing ‘intermediate variables’. For instance, in this example we could set \(y=\abs x^2\) and then write
\begin{equation*}
\partial_i \abs x = \frac{\partial \sqrt y}{\partial x_i}
= \frac{d\sqrt y}{dy} \frac{\partial y}{\partial x_i}
= \frac 12 y^{-1/2} 2x_i
= \frac{x_i}{\abs x}\text{.}
\end{equation*}
When writing things this way you must use different letters for the different variables \(x \in \R^N\) and \(y=\abs x^2 \in \R\text{.}\) Using the same letter for both of these leads to writing things like
\begin{equation*}
\partial_i \abs x = \frac{\partial \sqrt x}{\partial x_i}
= \frac{d\sqrt x}{dx} \frac{\partial x}{\partial x_i}
= \frac 12 x^{-1/2} 2x_i
= \frac{x_i}{\abs x}
\end{equation*}
which are extremely confusing if taken literally.
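Whichever derivation one prefers, the final formula is easy to check numerically at a sample point (a sketch, NumPy assumed): at \(x = (3,4)\) we have \(\abs x = 5\) and \(x/\abs x = (0.6, 0.8)\text{.}\)

```python
import numpy as np

x = np.array([3.0, 4.0])
h = 1e-6

# central-difference gradient of x -> |x| at the point (3, 4)
g = np.array([
    (np.linalg.norm(x + h * e) - np.linalg.norm(x - h * e)) / (2 * h)
    for e in np.eye(2)
])
# expected value: x / |x| = (0.6, 0.8)
```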