Section 2.1 Index notation
The following notational conventions are more or less standard and allow us to work more easily with complex expressions involving functions and their partial derivatives.
Notation 2.1. Indices.
- When referring to a sequence \((x_1,x_2,\ldots)\text{,}\) we will often abuse notation and simply write \(x_n\) rather than \((x_n)_{n \in \N}\) or \((x_n)_{n \ge 1}\text{.}\) Similarly, we will often refer to vectors by their components and matrices by their entries.
- The space of all real \(M \times N\) matrices is denoted \(\R^{M \times N}\text{.}\)
- The Kronecker delta is the symbol\begin{equation*} \delta_{ij} = \begin{cases} 1 \amp i = j, \\ 0 \amp i \ne j \end{cases}\text{.} \end{equation*}In particular, the \(N \times N\) matrix with \((i,j)\)-th entry \(\delta_{ij}\) is the identity matrix.
- The columns of the identity matrix make up the canonical basis \(e_1,\ldots,e_N\) of \(\R^N\text{,}\) i.e. \(e_j \in \R^N\) is the vector with components \((e_j)_i = \delta_{ij}\text{.}\)
Notation 2.2. Summation convention.
We adopt the convention that when a term in an expression contains the same index twice, there is an implicit sum over all allowable values of this repeated (or ‘dummy’) index. If we do not want this implicit sum, then we must write ‘no sum’ or similar.
The above convention is best understood through some examples.
Example 2.3. Dot products.
Consider two vectors \(x, y \in \R^N\text{.}\) Their dot product is
\begin{equation*}
\langle x,y\rangle = x \cdot y = x_1 y_1 + \cdots + x_N y_N = \sum_{i=1}^N x_i y_i\text{.}
\end{equation*}
Using the Summation convention, we will simply write
\begin{equation*}
\langle x,y\rangle = x \cdot y = x_i y_i \text{,}
\end{equation*}
with an implicit sum over the repeated index \(i\text{.}\)
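As an aside for readers who know Python: NumPy's np.einsum function takes the summation convention quite literally, summing over any subscript letter that is repeated. A minimal sketch (the particular vectors are arbitrary test data, not part of the notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# In 'i,i->' the index i is repeated, so einsum sums over it: x_i y_i
assert np.isclose(np.einsum('i,i->', x, y), np.dot(x, y))  # both give 32.0
```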
Example 2.4. Matrix multiplication.
Let \(A \in \R^{M \times N}\) and \(B \in \R^{N \times K}\) be real matrices. Then their matrix product \(C=AB\) has entries
\begin{equation*}
C_{ij} = A_{i1} B_{1j} + \cdots + A_{iN} B_{Nj}
= \sum_{k=1}^N A_{ik} B_{kj}\text{.}
\end{equation*}
Using the Summation convention, we will simply write
\begin{equation*}
C_{ij} = A_{ik} B_{kj}\text{,}
\end{equation*}
with an implicit sum over the repeated index \(k\text{.}\)
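The same formula can be checked numerically with np.einsum; a sketch with arbitrarily chosen shapes:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # M x N with M = 2, N = 3
B = np.arange(12.0).reshape(3, 4)  # N x K with K = 4

# 'ik,kj->ij' repeats k, so einsum sums over it: C_ij = A_ik B_kj
C = np.einsum('ik,kj->ij', A, B)
assert np.allclose(C, A @ B)
```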
Definition 2.5. Vector and matrix norms.
- We denote the length (or norm or modulus) of a vector \(x \in \R^N\) by \(\abs x\text{.}\) Thus, with the summation convention, we have \(\abs x^2 = x_i x_i\text{.}\)
- Similarly, for matrices \(A \in \R^{M \times N}\) we define the norm \(\abs A\) by \(\abs A^2 = A_{ij} A_{ij}\text{.}\)
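Both norms are again sums over repeated indices, so they too have einsum expressions; a sketch (recall that np.linalg.norm returns the Euclidean norm for vectors and the Frobenius norm for matrices):

```python
import numpy as np

x = np.array([3.0, 4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])

# |x|^2 = x_i x_i and |A|^2 = A_ij A_ij
assert np.isclose(np.sqrt(np.einsum('i,i->', x, x)), np.linalg.norm(x))   # 5.0
assert np.isclose(np.sqrt(np.einsum('ij,ij->', A, A)), np.linalg.norm(A))
```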
Remark 2.6. When to avoid the summation convention.
In light of the Summation convention, how ought we to interpret an expression such as
\begin{equation*}
\sqrt{x_i x_i} \text{?}
\end{equation*}
Do we place the sum outside of the square root,
\begin{equation*}
\sqrt{x_i x_i} = \sqrt{x_1^2}+\cdots + \sqrt{x_N^2} = \abs{x_1} + \cdots + \abs{x_N},
\end{equation*}
or do we place it inside the square root
\begin{equation*}
\sqrt{x_i x_i} = \sqrt{x_1^2+\cdots + x_N^2} = \abs{x}\text{?}
\end{equation*}
In these notes we will avoid using the summation convention in situations like this where the meaning is potentially ambiguous.
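One virtue of writing the sums explicitly, as np.einsum forces us to do, is that the two readings become visibly different expressions; a sketch with an arbitrary vector:

```python
import numpy as np

x = np.array([3.0, -4.0])

# Sum inside the root: sqrt(x_1^2 + ... + x_N^2) = |x|
inside = np.sqrt(np.einsum('i,i->', x, x))    # 5.0

# Sum outside the root: |x_1| + ... + |x_N|
outside = np.einsum('i->', np.sqrt(x * x))    # 7.0
```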
Definition 2.7. Multiindices and polynomials.
- A multiindex is a vector \(\alpha =(\alpha_1,\ldots,\alpha_N)\) whose components are nonnegative integers.
- The order of a multiindex is \(|\alpha| :=\alpha_1+\cdots+\alpha_N\text{.}\)
- If \(x \in \R^N\) and \(\alpha\) is a multiindex, then \(x^\alpha := x_1^{\alpha_1} \cdots x_N^{\alpha_N}\) is a monomial with degree \(|\alpha|\text{.}\)
- A polynomial is a function \(p \maps \R^N \to \R\) of the form\begin{equation*} p(x) = \sum_{\abs \alpha \le d} a_\alpha x^\alpha \text{,} \end{equation*}where the sum ranges over multiindices \(\alpha\) with order \(\abs \alpha \le d\text{,}\) and the coefficients \(a_\alpha\) are real constants. Assuming that there is some \(\alpha\) with \(\abs \alpha = d\) and \(a_\alpha \ne 0\) (otherwise we could replace \(d\) by \(d-1\)), we call \(d\) the degree of \(p\text{.}\)
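These definitions translate directly into code. In the following sketch, the helper names monomial and polynomial and the dictionary encoding of the coefficients \(a_\alpha\) are illustrative choices, not standard API:

```python
import numpy as np

def monomial(x, alpha):
    """Evaluate x^alpha = x_1^{alpha_1} * ... * x_N^{alpha_N}."""
    return np.prod(np.asarray(x, dtype=float) ** np.asarray(alpha))

def polynomial(x, coeffs):
    """Evaluate p(x) = sum of a_alpha * x^alpha, where coeffs maps
    multiindices (tuples of nonnegative ints) to coefficients a_alpha."""
    return sum(a * monomial(x, alpha) for alpha, a in coeffs.items())

# p(x) = 1 + 2 x_1 x_2 + x_1^2, a polynomial of degree 2 on R^2
p = {(0, 0): 1.0, (1, 1): 2.0, (2, 0): 1.0}
print(polynomial([3.0, 4.0], p))  # 1 + 2*12 + 9 = 34.0
```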
Exercises
1. (PS2) Summation convention.
Let \(x,y \in \R^N\) and \(A,B,C \in \R^{N \times N}\text{.}\) Give formulas for the following quantities using the Summation convention.
(a)
\(\trace A\)
Solution.
\(\trace A = A_{ii}\text{.}\)
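In einsum notation the repeated index \(i\) becomes the subscript string 'ii->'; a quick check with arbitrary data:

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
assert np.isclose(np.einsum('ii->', A), np.trace(A))  # A_ii
```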
(b)
\(Ax\)
Solution.
\((Ax)_i = A_{ij} x_j\text{.}\)
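A corresponding sketch, again with arbitrary data:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
x = np.array([1.0, 2.0, 3.0])
assert np.allclose(np.einsum('ij,j->i', A, x), A @ x)  # (Ax)_i = A_ij x_j
```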
(c)
\(ABC\)
Solution.
\((ABC)_{ij} = A_{ik} B_{k\ell} C_{\ell j}\text{.}\)
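Here the two distinct dummy indices appear as the letters k and l in the subscript string; a sketch with random test matrices:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducible test data
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
assert np.allclose(np.einsum('ik,kl,lj->ij', A, B, C), A @ B @ C)
```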
(d)
\(y \cdot Ax\)
Solution.
\(y \cdot Ax = y_i A_{ij} x_j\text{.}\)
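A quick numerical check of this formula:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
x, y = rng.standard_normal(3), rng.standard_normal(3)
assert np.isclose(np.einsum('i,ij,j->', y, A, x), y @ (A @ x))
```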
(e)
\(\langle Ax,By\rangle\)
Solution.
\(\langle Ax,By\rangle = A_{ij} x_j B_{ik} y_k\text{.}\)
Comment.
Suppose we are multiplying together two finite sums
\begin{gather*}
\left(\sum_{i=1}^N a_i\right)
\left(\sum_{i=1}^N b_i\right)
=
\left(a_1 + \cdots + a_N\right)
\left(b_1 + \cdots + b_N\right)\text{.}
\end{gather*}
If we expand this out completely, all possible products \(a_i b_j\) will appear, and not just the ‘diagonal’ products \(a_i b_i\) (no sum). One way to write this out is to say
\begin{gather*}
\left(\sum_{i=1}^N a_i\right)
\left(\sum_{j=1}^N b_j\right)
=
\sum_{i=1}^N
\sum_{j=1}^N a_i b_j\text{.}
\end{gather*}
Applying the same logic here, we have
\begin{align*}
\langle Ax,By\rangle
\amp= \sum_{i=1}^N (A x)_i (By)_i\\
\amp= \sum_{i=1}^N
\left(\sum_{j=1}^N A_{ij} x_j\right)
\left(\sum_{k=1}^N B_{ik} y_k\right)\\
\amp = \sum_{i=1}^N
\sum_{j=1}^N
\sum_{k=1}^N A_{ij} x_j B_{ik} y_k\text{,}
\end{align*}
where the last equality only works because we use different indices (\(j\) and \(k\)) in the two inner sums.
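Note that np.einsum enforces exactly this discipline, since the subscript string must use the distinct dummy letters j and k for the two inner sums; a sketch with random test data:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
x, y = rng.standard_normal(3), rng.standard_normal(3)

# A_ij x_j B_ik y_k, summed over i, j, k
assert np.isclose(np.einsum('ij,j,ik,k->', A, x, B, y), (A @ x) @ (B @ y))
```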
(f)
\(\trace(A^\top B)\)
Solution.
\(\trace(A^\top B) = A_{ij} B_{ij}\text{.}\)
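Comparing with Definition 2.5, this also shows \(\trace(A^\top A) = \abs A^2\text{.}\) A final numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
assert np.isclose(np.einsum('ij,ij->', A, B), np.trace(A.T @ B))
```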