Section 2.7 Maxima and minima
Since our main tool is called the maximum principle, it should come as no surprise that we will need some information about the behaviour of functions near their maxima and minima.
Definition 2.32. Maximum and minimum.
Consider a function \(f\maps A \to \R\text{,}\) \(A\subseteq \R^N\text{.}\) We say that \(f\) has a (global) maximum if there is a point \(p \in A\) such that \(f(x) \le f(p)\) for all \(x \in A\text{.}\) In this case we write
\begin{equation*}
\max_A f= \max_{x\in A}f(x) =f(p)
\end{equation*}
We say \(f\) has a local maximum at \(p\in A\) if there exists \(\delta \gt 0\) such that \(f(x) \le f(p)\) for all \(x \in B_\delta(p) \cap
A\text{.}\) Global and local minima are defined similarly, and \(\min_A f =
-\max_A(-f)\) if either exists.
It is important to keep in mind that a function \(f \maps A \to \R\) need not have any maxima or minima; counterexamples are requested in Exercise 2.8.3. However, from Analysis 1 we know that \(f\) will always have a supremum and infimum, at least provided we allow the possibility that these are \(\pm\infty\text{.}\)
Definition 2.33. Supremum and infimum.
For a nonempty set \(A \subset \R^N\) and a function \(f \maps A \to \R\text{,}\) the supremum \(\sup_A f = \sup_{x \in A} f(x)\) is the unique element \(s\) of \(\R \cup \{+ \infty\}\) such that
- \(f(x) \le s\) for all \(x \in A\)
- if \(t \lt s\text{,}\) then there exists a point \(y \in A\) with \(f(y)\gt t\text{.}\)
The infimum \(\inf_A f \in \R \cup \{-\infty\}\) is defined similarly but with the inequalities reversed, and satisfies \(\inf_A f =
-\sup_A(-f)\text{.}\)
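The gap between suprema and maxima is worth seeing concretely. As an optional illustration (not part of the notes), the following Python sketch evaluates \(f(x) = \arctan x\) on ever-larger grids: the grid maxima creep up towards \(\sup_\R f = \pi/2\text{,}\) but no point of \(\R\) attains this value, so \(f\) has a supremum but no maximum.
\begin{verbatim}
import numpy as np

# arctan is bounded above by pi/2 but never attains it: the maximum
# over each finite grid increases towards pi/2 without reaching it.
for R in [10, 100, 1000, 10000]:
    x = np.linspace(-R, R, 100001)
    print(R, np.arctan(x).max(), np.pi / 2)
\end{verbatim}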
Proposition 2.34. First and second derivative tests.
If \(f \in C^2(\Omega)\) has a local maximum (minimum) at \(y \in \Omega\text{,}\) then \(\nabla f(y)=0\) and \(D^2 f(y)\) is negative (positive) semi-definite.
Proof.
Suppose that \(f\) has a local maximum at \(y \in \Omega\text{.}\) For any nonzero \(\xi \in \R^N\text{,}\) the function
\begin{equation*}
g(t) = f(y+t\xi)
\end{equation*}
is \(C^2\) in a small open interval containing \(t = 0\text{,}\) where it has a local maximum. From single-variable calculus, we therefore know that \(g'(0)=0\) and \(g''(0) \le 0\text{.}\) Using the chain rule, we calculate
\begin{align*}
g'(t)
\amp = \frac d{dt} f(y+t\xi) \\
\amp = \partial_i f(y+t\xi) \frac d{dt} (y+t\xi)_i\\
\amp = \partial_i f(y+t\xi) \xi_i
\end{align*}
and hence
\begin{gather}
0 = g'(0) = \partial_i f(y) \xi_i \text{.}\tag{✶}
\end{gather}
Differentiating a second time, again using the chain rule, we find
\begin{equation*}
g''(t) = \partial_{ij} f(y+t\xi) \xi_i \xi_j
\end{equation*}
and hence
\begin{gather}
0 \ge g''(0) = \partial_{ij} f(y) \xi_i \xi_j\text{.}\tag{✶✶}
\end{gather}
Since (✶) must hold for any \(\xi \in \R^N\text{,}\) the only possibility is that \(\nabla f(y)=0\text{.}\) Similarly, by Corollary 2.29, the only way for (✶✶) to hold for all \(\xi \in \R^N\) is for \(D^2 f(y)\) to be negative semi-definite. The proof for a local minimum is nearly identical.
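To see the proposition in action, here is a small symbolic check using sympy; the function below is an illustrative choice, not one from the notes. It has a global maximum at the origin, and the computation confirms that the gradient vanishes there and that the Hessian eigenvalues are \(\le 0\text{.}\)
\begin{verbatim}
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = -(x1**2 + x1*x2 + x2**2)  # global maximum at the origin

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

print([g.subs({x1: 0, x2: 0}) for g in grad])  # [0, 0]
print(H.eigenvals())  # eigenvalues -1 and -3, both negative
\end{verbatim}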
As in single-variable calculus, the derivative tests above can be thought of in terms of Taylor polynomials. The relevant result is the following.
Proposition 2.35. Multivariate Taylor’s theorem.
Let \(f \in C^2(\Omega)\text{,}\) and suppose that \(x_0 \in \Omega\) and \(\varepsilon \gt 0\) are such that \(B_\varepsilon(x_0) \subseteq \Omega\text{.}\) Then for any \(h \in \R^N\) with \(\abs h \lt \varepsilon\text{,}\) there exists \(\theta \in [0,1]\) such that
\begin{equation}
f(x_0+h)=f(x_0)+\partial_i f(x_0)h_i + \tfrac 1 2 \partial_{ij} f(x_0+\theta h) h_i h_j.\tag{2.5}
\end{equation}
In particular, by the continuity of \(\partial_{ij} f\) we have
\begin{equation}
f(x_0+h) = f(x_0)+ \partial_jf(x_0)h_j +\tfrac 1 2
\partial_{jk} f(x_0) h_j h_k+ \abs h^2 \rho(h)\tag{2.6}
\end{equation}
where the remainder term \(\rho(h) \to 0\) as \(h \to 0\text{.}\)
Proof.
See Analysis 2B.
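Although we defer the proof, (2.6) is easy to test numerically. The following sketch (an illustration with an arbitrary choice of \(f\)) computes \(\rho(h)\) for \(f(x) = e^{x_1}\sin x_2\) at \(x_0 = 0\text{,}\) where \(\nabla f(x_0) = (0,1)\) and the only nonzero second derivatives are \(\partial_{12}f(x_0) = \partial_{21}f(x_0) = 1\text{;}\) the printed values decay linearly in \(\abs h\text{.}\)
\begin{verbatim}
import numpy as np

def f(x):
    return np.exp(x[0]) * np.sin(x[1])

x0 = np.array([0.0, 0.0])
xi = np.array([1.0, 2.0])          # a fixed direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * xi
    taylor = h[1] + h[0] * h[1]    # f(x0) + grad.h + (1/2) h.D2f.h
    rho = (f(x0 + h) - taylor) / np.dot(h, h)
    print(t, rho)                  # rho -> 0 as h -> 0
\end{verbatim}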
Exercises
1. (PS3) Minimising a quadratic polynomial.
Let \(\alpha \in \R\) be a constant, and let \(f \maps \R^2 \to \R\) be the quadratic polynomial
\begin{gather*}
f(x) = x_1^2 + x_2^2 + \alpha x_1 (x_2 + 1) \text{.}
\end{gather*}
In this exercise we will calculate \(\inf_{\R^2} f\text{.}\)
(a)
Find all ‘critical points’ \(x \in \R^2\) where \(\nabla f(x) = 0\text{.}\)
Solution.
We calculate
\begin{align*}
\partial_1 f(x) \amp = 2x_1 + \alpha (x_2 + 1)\\
\partial_2 f(x) \amp = 2x_2 + \alpha x_1\text{.}
\end{align*}
Setting these both equal to zero, we get a linear system of equations for \((x_1,x_2)\text{.}\) When \(\alpha = \pm 2\text{,}\) this system has no solutions. Otherwise the unique solution is
\begin{gather*}
x = \Big( \frac{2\alpha}{\alpha^2-4}, -\frac{\alpha^2}{\alpha^2-4} \Big)\text{.}
\end{gather*}
Comment.
It’s important here for us to realise that there are no solutions when \(\alpha = \pm 2\text{.}\) Our formula for the critical point involves \(\alpha^2-4\) in a denominator and so doesn’t make sense for \(\alpha = \pm 2\text{,}\) but this by itself only shows that the formula fails, not that critical points are absent. To see the latter, note that for \(\alpha = 2\) the two equations read \(x_1 + x_2 = -1\) and \(x_1 + x_2 = 0\text{,}\) which are inconsistent; the case \(\alpha = -2\) is similar.
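As an optional cross-check (not required for the problem set), sympy reproduces both the formula for the critical point and the nonexistence at \(\alpha = \pm 2\text{:}\)
\begin{verbatim}
import sympy as sp

x1, x2, a = sp.symbols('x1 x2 alpha', real=True)
f = x1**2 + x2**2 + a * x1 * (x2 + 1)
eqs = [sp.diff(f, x1), sp.diff(f, x2)]

print(sp.solve(eqs, [x1, x2]))  # generic alpha: one critical point
print(sp.solve([e.subs(a, 2) for e in eqs], [x1, x2]))   # [] (none)
print(sp.solve([e.subs(a, -2) for e in eqs], [x1, x2]))  # [] (none)
\end{verbatim}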
(b)
Calculate the Hessian matrix \(D^2 f\text{.}\) When is \(D^2 f\) positive semi-definite? When is \(D^2 f\) positive definite?
Solution.
The Hessian is
\begin{align*}
D^2 f(x) =
\begin{pmatrix}
2 \amp \alpha \\ \alpha \amp 2
\end{pmatrix}\text{.}
\end{align*}
The eigenvalues of this matrix are \(2 - \alpha\) and \(2+\alpha\text{,}\) and so it is positive semi-definite if and only if both \(2 - \alpha \ge 0\) and \(2 + \alpha \ge 0\text{,}\) i.e. if \(\abs \alpha \le 2\text{.}\) Similarly, \(D^2 f\) is positive definite if and only if \(\abs \alpha \lt 2\text{.}\)
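Continuing the optional sympy check from part (a), the Hessian and its eigenvalues come out as claimed:
\begin{verbatim}
import sympy as sp

x1, x2, a = sp.symbols('x1 x2 alpha', real=True)
f = x1**2 + x2**2 + a * x1 * (x2 + 1)

H = sp.hessian(f, (x1, x2))  # Matrix([[2, alpha], [alpha, 2]])
print(H.eigenvals())         # {2 - alpha: 1, alpha + 2: 1}
\end{verbatim}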
(c)
Show that \(f\) can have a local minimum only if \(\abs \alpha \lt 2\text{.}\)
Hint.
Use the previous parts and (both parts of) Proposition 2.34.
Solution.
By Proposition 2.34, a local minimum of \(f\) must be a point \(y \in \R^2\) where \(\nabla f(y) = 0\) and \(D^2 f(y)\) is positive semi-definite. We have shown that \(\nabla f(y) = 0\) is possible only when \(\alpha \ne \pm 2\text{,}\) and that \(D^2 f(y)\) is positive semi-definite only when \(\abs \alpha \le 2\text{,}\) and so to have a local minimum we must have \(\abs
\alpha \lt 2\text{.}\)
(d)
Let \(\abs \alpha \lt 2\) and let \(y \in \R^2\) be a critical point of \(f\text{.}\) Since \(f\) is a quadratic polynomial, it is equal to its second-order Taylor polynomial at any point, which at \(y\) gives
\begin{align*}
f(y+h) \amp = f(y) + \tfrac 12 \partial_{ij} f(y) h_i h_j
\quad \text{for all }h \in \R^2\text{.}
\end{align*}
Use this identity, the previous parts, and Corollary 2.29 to show that \(y\) is in fact a global minimum of \(f\text{.}\) Conclude that
\begin{gather*}
\inf_{\R^2} f = \min_{\R^2} f = f(y) = -\frac{\alpha^2}{4-\alpha^2} \text{.}
\end{gather*}
Hint.
Use Corollary 2.29 to estimate \(\tfrac 12 \partial_{ij} f(y) h_i
h_j\) from below. Plugging this estimate into the identity gives \(f(y+h) \ge f(y)\) for all \(h \in \R^2\text{.}\)
Solution.
From previous parts we know that \(f\) can only have a minimum if \(\abs \alpha \lt 2\text{,}\) and that in this case the Hessian matrix \(D^2 f\) has smallest eigenvalue \(\min(2+\alpha,2-\alpha) \gt 0\text{.}\) The lower bound in Corollary 2.29 therefore gives
\begin{gather*}
f(y+h) \ge \tfrac 12 \min(2+\alpha,2-\alpha) \abs h^2 + f(y) \ge f(y)
\end{gather*}
for all \(h \in \R^2\text{,}\) which implies
\begin{gather*}
\inf_{\R^2} f = \min_{\R^2} f = f(y) = \cdots = \frac{\alpha^2}{\alpha^2 - 4}\text{.}
\end{gather*}
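For completeness, here is the arithmetic hidden in the \(\cdots\) above. Writing \(d = \alpha^2 - 4\text{,}\) so that \(y = (2\alpha/d, -\alpha^2/d)\text{,}\) direct substitution gives
\begin{align*}
f(y) \amp = y_1^2 + y_2^2 + \alpha y_1 (y_2 + 1)\\
\amp = \frac{4\alpha^2}{d^2} + \frac{\alpha^4}{d^2} - \frac{2\alpha^4}{d^2} + \frac{2\alpha^2 d}{d^2}\\
\amp = \frac{\alpha^4 - 4\alpha^2}{d^2}
= \frac{\alpha^2(\alpha^2-4)}{d^2}
= \frac{\alpha^2}{\alpha^2-4}\text{,}
\end{align*}
which is the same value as \(-\alpha^2/(4-\alpha^2)\) in the statement above.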
Comment.
In general, \(D^2 f(y)\) positive definite and \(\nabla f(y) =0\) at an interior point \(y\) are sufficient conditions for \(f\) to have a local minimum at \(y\text{.}\) (This can be proved using Proposition 2.35.) They are far from sufficient, however, for \(f\) to have a global minimum at \(y\text{.}\) What saves us in this example is the fact that \(f\) is a quadratic polynomial, which makes Corollary 2.29 much more powerful.
(e) Optional.
Show that \(\inf_{\R^2} f = -\infty\) for \(\abs \alpha \ge 2\text{.}\)
Hint.
Look at the restrictions of \(f\) to lines spanned by eigenvectors of \(D^2 f\text{.}\)
Solution.
The eigenvectors of \(D^2 f\) are \((1,1)\) and \((1,-1)\text{,}\) which leads us to look at
\begin{align*}
f(t,t) \amp= (2+\alpha) t^2 + \alpha t,\\
f(t,-t) \amp= (2-\alpha) t^2 + \alpha t \text{.}
\end{align*}
We consider four cases:
- If \(\alpha \lt -2\) then \(f(t,t) \to -\infty\) as \(t \to \pm\infty\text{.}\)
- If \(\alpha \gt 2\) then \(f(t,-t) \to -\infty\) as \(t \to \pm\infty\text{.}\)
- If \(\alpha = -2\) then \(f(t,t) \to -\infty\) as \(t \to +\infty\text{.}\)
- If \(\alpha = 2\) then \(f(t,-t) \to -\infty\) as \(t \to -\infty\text{.}\)
Thus, whenever \(\abs \alpha \ge 2\text{,}\) we have \(\inf_{\R^2} f = -\infty\text{.}\)
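As a quick numerical illustration of the borderline case \(\alpha = 2\) (optional, with arbitrary sample values), note that \(f(t,-t) = 2t\) is unbounded below:
\begin{verbatim}
alpha = 2.0
f = lambda x1, x2: x1**2 + x2**2 + alpha * x1 * (x2 + 1)

# along the eigenvector direction (1, -1), f(t, -t) = 2t -> -infinity
for t in [-1e1, -1e3, -1e5]:
    print(t, f(t, -t))
\end{verbatim}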
2. A \(C^2\) domain.
Let \(\Omega =\{(x,y) \in \R^2 : y \lt f(x) \}\) where \(f \maps \R \to \R\) is a \(C^2\) function. Suppose that the origin \(p = (0,0)\) lies on \(\partial\Omega\text{,}\) so that \(f(0)=0\text{,}\) and that \(f_x(0)=0\text{.}\)
(a)
Show that
\begin{gather}
f(x) \le Cx^2 \text{ for } \abs x \le 1\tag{✶}
\end{gather}
where
\begin{equation*}
C = \frac 12 \max_{\abs x \le 1} \abs{f_{xx}(x)}\text{.}
\end{equation*}
(b)
For \(r \gt 0\text{,}\) consider the ball \(B=B_r((0,r))\text{,}\) which has \(p=(0,0) \in \partial B\text{.}\) By using the basic estimate \(\sqrt{1+a} \le 1+a/2\) for \(a \ge -1\text{,}\) show that points \((x,y) \in B\) satisfy
\begin{gather}
y \gt \frac{x^2}{2r}\text{.}\tag{✶✶}
\end{gather}
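Before proving (✶✶), it may be reassuring to test it numerically; the following sketch (with an arbitrary test radius) samples random points of \(B\) and checks the inequality:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
r = 0.7  # arbitrary test radius

# sample points uniformly from B = B_r((0, r)) by rejection
pts = rng.uniform(-r, r, size=(100000, 2))
pts = pts[np.hypot(pts[:, 0], pts[:, 1]) < r] + np.array([0.0, r])

x, y = pts[:, 0], pts[:, 1]
print(np.all(y > x**2 / (2 * r)))  # True: (**) holds at every sample
\end{verbatim}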
(c)
Combining (✶) and (✶✶), show that \(B \cap \Omega = \emptyset\) provided \(r \gt 0\) is sufficiently small.
(d) Optional.
How would you go about generalising this argument to the case where \(f_x(0) \ne 0\text{?}\) What is the analogous statement in higher dimensions?