
My name is Hanzhang Yin, and I have developed this website as a resource to facilitate the review of key concepts in numerical analysis. It is not yet complete; feel free to email me at hanyin@ku.edu if there are any errors or suggestions for improvement.

Numerical Analysis

Binary

\[ x = \pm (0. a_1\cdots a_n\cdots)_2\times 2^m \] where \(a_1 = 1\)

\(\textbf{Note. }\)For Marc-32, \(n = 24\), \(-126\leq m\leq 127\). For Marc-64, \(n = 53\), \(-1021\leq m\leq 1024\).


Unit Roundoff

The unit roundoff \(\varepsilon\) is the smallest positive number such that \[ fl(1 + \varepsilon) > 1. \] For Marc-32: \(\varepsilon = 2^{-24} \approx 6\times 10^{-8}\). For Marc-64: \(\varepsilon = 2^{-53} \approx 1.1\times 10^{-16}\).
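As a quick check (a minimal Python sketch of my own, not from the cited texts), the loop below halves \(\varepsilon\) until \(fl(1+\varepsilon) = 1\); for IEEE double precision it stops at the machine epsilon \(2^{-52}\), whose half \(2^{-53}\) is the unit roundoff quoted above.

```python
import sys

# Halve eps until fl(1 + eps/2) rounds down to 1 under round-to-nearest.
eps = 1.0
while 1.0 + eps / 2.0 > 1.0:
    eps /= 2.0

print(eps)                     # 2**-52, the machine epsilon
print(eps / 2.0)               # 2**-53 ~ 1.1e-16, the unit roundoff above
print(sys.float_info.epsilon)  # agrees with eps
```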


Roundoff Error

Let \(\delta = \frac{fl(x) - x}{x}\), so that \(fl(x) = x(1+\delta)\); we show that \(|\delta|\leq \varepsilon = 2^{-n}\). Suppose that \(x = (0. a_1a_2\cdots)_2\times 2^m\) with \(a_1 = 1\) (we may take \(x > 0\) without loss of generality), and let \(x_-\) and \(x_+\) be the nearest machine numbers with \(x_- \leq x \leq x_+\): \[ x_- = (0. a_1\cdots a_n)_2\times 2^m, \qquad x_+ = \big((0. a_1\cdots a_n)_2 + 2^{-n}\big)\times 2^m. \] Rounding picks whichever is closer: \(fl(x) = x_-\) if \(a_{n+1} = 0\) (so \(x\) is closer to \(x_-\)) and \(fl(x) = x_+\) if \(a_{n+1} = 1\) (so \(x\) is closer to \(x_+\)). Hence \[ |fl(x) - x|\leq \frac{1}{2}(x_+ - x_-) = \frac{1}{2}\cdot 2^{-n}\cdot 2^m, \] and, since \((0. a_1\cdots a_n)_2 \geq \frac{1}{2}\), \[ \left|\frac{fl(x) - x}{x}\right|\leq \frac{\frac{1}{2}\, 2^{-n}\, 2^m}{(0. a_1\cdots a_n)_2\times 2^m} \leq 2^{-n}. \]


Number System with base \(\beta\)

The unit roundoff is \[ \varepsilon = \frac{1}{2}\beta^{1-t} \] where \(t\) is the number of digits in the mantissa.


Floating point error analysis

Let \(x, y\) be two machine numbers and let \(\circ\) be a binary operation. Then \[ fl(x\circ y) = (x\circ y)(1 + \delta), \] where \(|\delta|\leq\varepsilon\) is the roundoff error. Similarly, for any real number \(x\) within range, \[ fl(x) = x(1 + \delta). \]

\(\textbf{Example.}\)
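A minimal illustration (my own, not from the text) of why the \((1+\delta)\) model matters: each individual operation below is accurate to relative error \(\varepsilon\), yet subtracting two nearly equal machine numbers destroys all significant digits, while an algebraically equivalent form avoids the cancellation.

```python
import math

x = 1e-8
naive = (1.0 - math.cos(x)) / x**2             # cos(x) rounds to 1.0, so the
                                               # subtraction cancels completely
stable = 0.5 * (math.sin(x / 2) / (x / 2))**2  # algebraically identical form

print(naive)   # 0.0 (all significant digits lost)
print(stable)  # 0.5, the correct value to machine precision
```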


Orders of Convergence

Let \(\{x_n\}\) be a sequence of real numbers tending to a limit \( x^* \). We say that the rate of convergence is at least linear if there are a constant \( c \lt 1 \) and an integer \( N \) such that \[ |x_{n+1} - x^*| \le c |x_n - x^*| \quad (n \ge N) \] We say that the rate of convergence is at least superlinear if there exist a sequence \( \epsilon_n \) tending to 0 and an integer \( N \) such that \[ |x_{n+1} - x^*| \le \epsilon_n |x_n - x^*| \quad (n \ge N) \] The convergence is at least quadratic if there are a constant \( C \) (not necessarily less than 1) and an integer \( N \) such that \[ |x_{n+1} - x^*| \le C |x_n - x^*|^2 \quad (n \ge N) \] In general, if there are positive constants \( C \) and \( \alpha \) and an integer \( N \) such that \[ |x_{n+1} - x^*| \le C |x_n - x^*|^\alpha \quad (n \ge N) \] we say that the rate of convergence is of order \(\alpha\) at least.

"Numerical Analysis: Mathematics of Scientific Computing" page. 17.

Newton's Method

Definition. Newton's method, also known as the Newton-Raphson method, is an iterative root-finding algorithm used to find successively better approximations to the roots (or zeros) of a real-valued function. Given a function \( f(x) \) and its derivative \( f'(x) \), Newton's method starts with an initial guess \( x_0 \) and iteratively refines this guess using the formula \[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \] where \( x_n \) is the current approximation and \( f'(x_n) \neq 0 \).

Convergence Criteria. The method typically converges if the initial guess \( x_0 \) is sufficiently close to the actual root, and \( f(x) \) is continuously differentiable in the neighborhood of the root.

Example. Consider finding the root of the function \( f(x) = x^2 - 2 \). The derivative of the function is \( f'(x) = 2x \). Firstly, we start with an initial guess \( x_0 \). Secondly, we apply Newton's iteration: \[ x_{n+1} = x_n - \frac{x_n^2 - 2}{2x_n} = \frac{2x_n - \frac{x_n^2 - 2}{x_n}}{2} = \frac{x_n + \frac{2}{x_n}}{2} \] So the iterative formula becomes: \[ x_{n+1} = \frac{1}{2} \left( x_n + \frac{2}{x_n} \right) \] By repeatedly applying this formula, the sequence \( \{x_n\} \) will converge to the square root of 2.
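A short, self-contained implementation (my own sketch; the function name and tolerance are illustrative) of the iteration just derived:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Root of f(x) = x^2 - 2 starting from x0 = 1.
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0))  # 1.414213562...
```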


Error Analysis for Newton's Method

Let \( r \) be a solution of \( f(x) = 0 \) (i.e. \( f(r) = 0 \)). Suppose that we have already computed \( x_n \) and the error in \( x_n \) is \(e_n = |x_n - r| \). We now derive a formula that relates the error after the next step, \( |x_{n+1} - r| \), to \( |x_n - r| \). According to Taylor's theorem, there is a \( c \) between \( x_n \) and \( r \) such that \[ 0 = f(r) = f(x_n) + f'(x_n)(r - x_n) + \frac{1}{2}f''(c)(r - x_n)^2 \quad (1) \] By the definition of Newton's method, we have \[ 0 = f(x_n) + f'(x_n)(x_{n+1} - x_n) \quad (2) \] Subtracting (2) from (1). \[ \begin{align} 0 &= f'(x_n)(r - x_{n+1}) + \frac{1}{2}f''(c)(r - x_n)^2\\ x_{n+1} - r &= \frac{f''(c)}{2f'(x_n)}(x_n - r)^2 \\ e_{n+1} &= |x_{n+1} - r| = \left| \frac{f''(c)}{2f'(x_n)} \right||x_n - r|^2 \\ \end{align} \] If \( x_n \) is close to \( r \), then \( c \), which must be between \( x_n \) and \( r \), is also close to \( r \) and \[ e_{n+1} = |x_{n+1} - r| \approx \left| \frac{f''(r)}{2f'(r)} \right||x_n - r|^2 = \left| \frac{f''(r)}{2f'(r)}\right| e_n^2 = Ce_n^2 \]

Thus, we can see that Newton's Method is quadratically convergent.

However, a disadvantage of Newton's method is that it is only guaranteed to converge when the initial guess is close enough to the root; a poor starting point can lead to divergence.


Theorems on Newton's Method

Theorem. Let \(f''\) be continuous and let \(r\) be a simple zero of \(f\). Then there is a neighborhood of \(r\) and a constant \(C\) such that if Newton's method is started in that neighborhood, the successive points become steadily closer to \(r\) and satisfy \[ |x_{n+1} - r| \leq C (x_n- r)^2 \quad (n \geq 0) \]

\(\textbf{Theorem on Newton's Method for a Convex Function. }\) If \( f \) belongs to \( C^2(\mathbb{R}) \), is increasing, is convex, and has a zero, then the zero is unique, and the Newton iteration will converge to it from any starting point.

\(\textbf{Convergence of Newton's Method. }\) Assume that \( f(x) \), \( f'(x) \), and \( f''(x) \) are continuous for all \( x \) in some neighborhood of \( \alpha \), and that \( f(\alpha) = 0 \) and \( f'(\alpha) \neq 0 \). If \( x_0 \) is sufficiently close to \( \alpha \), then \( x_n \to \alpha \) as \( n \to \infty \). Moreover, \[ \lim_{n \to \infty} \frac{|x_{n+1} - \alpha|}{|x_n - \alpha|^2} = \frac{1}{2} \left| \frac{f''(\alpha)}{f'(\alpha)} \right|. \]


Horner's Algorithm

This method is also known as Horner's method, nested multiplication, or synthetic division.

\(\textbf{Definition. }\) Horner's algorithm is a method for evaluating a polynomial at a point. Given a polynomial \( p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \), Horner's algorithm evaluates \( p(x) \) at a point \( x = c \) as follows: \[ p(c) = a_0 + c(a_1 + c(a_2 + \cdots + c(a_{n-1} + c a_n) \cdots)). \]

Given a polynomial \( p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \). Then \[ \begin{align} b_{n-1} &= a_n \\ b_{n-2} &= a_{n-1} + z_0 b_{n-1} \\ &\vdots \\ b_0 &= a_1 + z_0 b_1 \\ p(z_0) &= a_0 + z_0 b_0 \end{align} \] Horner's algorithm evaluates \( p(x) \) at a point \( x = z_0 \) as follows: \[ \begin{array}{cccccc} & a_n & a_{n-1} & a_{n-2} & \cdots & a_0 \\ z_0 & & z_0 b_{n-1} & z_0 b_{n-2} & \cdots & z_0 b_0 \\ & b_{n-1} & b_{n-2} & b_{n-3} & \cdots & \boxed{b_{-1}} \\ \end{array} \] The boxed number satisfies \(p(z_0) = b_{-1}\).

\(\textbf{Example. }\) Given the polynomial \( p(x) = 4x^3 - 3x^2 + 2x - 1 \) and \( z_0 = 2 \), evaluate \( p(z_0) \). \[ \begin{array}{ccccc} & 4 & -3 & 2 & -1 \\ 2 & & 8 & 10 & 24 \\ & 4 & 5 & 12 & \boxed{23} \\ \end{array} \] Hence, we have \( p(2) = 23 \).

\(\textbf{Theorem on Horner's Method. }\) Let \( p(x) = a_n x^n + \cdots + a_1 x + a_0 \). Define pairs \((\alpha_j, \beta_j)\) for \( j = n, n-1, \ldots, 0 \) by the algorithm \[ \begin{cases} (\alpha_n, \beta_n) = (a_n, 0) \\ (\alpha_j, \beta_j) = (a_j + x \alpha_{j+1}, \alpha_{j+1} + x \beta_{j+1}) & (n-1 \ge j \ge 0) \end{cases} \] Then \(\alpha_0 = p(x)\) and \(\beta_0 = p'(x)\).
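The \((\alpha_j, \beta_j)\) recursion of the theorem translates directly into code. Below is a small sketch of my own (names are illustrative); for the example above it returns \(p(2) = 23\) along with \(p'(2) = 38\).

```python
def horner(coeffs, x):
    """Evaluate p(x) and p'(x), with coeffs = [a_n, ..., a_1, a_0]."""
    alpha, beta = coeffs[0], 0.0   # (alpha_n, beta_n) = (a_n, 0)
    for a in coeffs[1:]:
        # (alpha_j, beta_j) = (a_j + x*alpha_{j+1}, alpha_{j+1} + x*beta_{j+1})
        alpha, beta = a + x * alpha, alpha + x * beta
    return alpha, beta

# p(x) = 4x^3 - 3x^2 + 2x - 1 at z0 = 2, matching the synthetic-division table.
print(horner([4, -3, 2, -1], 2))  # (23, 38)
```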


Contractive Mapping

Newton's method and Steffensen's method are examples of procedures whereby a sequence of points is computed by a formula of the form \[ x_{n+1} = F(x_n) \quad (n \geq 0) \tag*{(1)} \] The algorithm defined by such an equation is called functional iteration.

"Numerical Analysis: Mathematics of Scientific Computing" page. 100.

\(\textbf{Definition. }\) A mapping \( F \) of a metric space \( (X, d) \) into itself is called a contraction if there exists a constant \( k \) with \( 0 \leq k \lt 1 \) such that \[ d(F(x), F(y)) \leq k d(x, y) \] for all \( x, y \) in \( X \).

"Numerical Analysis: Mathematics of Scientific Computing" page. 101.

Contractive Mapping Theorem

Theorem (Contractive Mapping Theorem). Let \( C \) be a closed subset of the real line. If \( F \) is a contractive mapping of \( C \) into \( C \), then \( F \) has a unique fixed point. Moreover, this fixed point is the limit of every sequence obtained from Equation (1) with a starting point \( x_0 \in C \).

"Numerical Analysis: Mathematics of Scientific Computing" page. 102.


Polynomial Interpolation

Theorem on Polynomial Interpolation Error

\(\textbf{Theorem. }\)Let \( f \) be a function in \( C^{n+1}[a, b] \), and let \( p \) be the polynomial of degree at most \( n \) that interpolates the function \( f \) at \( n + 1 \) distinct points \( x_0, x_1, \ldots, x_n \) in the interval \([a, b]\). To each \( x \) in \([a, b]\) there corresponds a point \( \xi_x \) in \((a, b)\) such that \[ f(x) - p(x) = \frac{1}{(n + 1)!} f^{(n+1)}(\xi_x) \prod_{i=0}^{n} (x - x_i) . \]


Lagrange's Form

The Lagrange form of the interpolation polynomial is a method for constructing a polynomial that passes through a given set of points. Specifically, given a set of \( n+1 \) distinct data points \((x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)\), the Lagrange interpolation polynomial \( P(x) \) is defined as: \[ P(x) = \sum_{i=0}^{n} y_i \ell_i(x) \] where \( \ell_i(x) \) are the Lagrange basis polynomials given by: \[ \ell_i(x) = \prod_{\substack{0 \le j \le n \\ j \ne i}} \frac{x - x_j}{x_i - x_j} \] Each \( \ell_i(x) \) is a polynomial of degree \( n \) that is 1 at \( x = x_i \) and 0 at \( x = x_j \) for \( j \ne i \).

\(\textbf{Example. }\) Given points: \( (1, 1), (2, 4), (3, 9) \). Here \(x_0 = 1, x_1 = 2, x_2 = 3\), and \(y_0 = 1, y_1 = 4, y_2 = 9\).

Compute the Lagrange basis polynomials \( \ell_i(x) \): \[ \ell_0(x) = \frac{(x-2)(x-3)}{(1-2)(1-3)} = \frac{(x-2)(x-3)}{2} \] \[ \ell_1(x) = \frac{(x-1)(x-3)}{(2-1)(2-3)} = - (x-1)(x-3) \] \[ \ell_2(x) = \frac{(x-1)(x-2)}{(3-1)(3-2)} = \frac{(x-1)(x-2)}{2} \]

Construct the interpolation polynomial \( P(x) \): \[ P(x) = 1 \cdot \ell_0(x) + 4 \cdot \ell_1(x) + 9 \cdot \ell_2(x). \] \[ P(x) = 1 \cdot \frac{(x-2)(x-3)}{2} - 4 \cdot (x-1)(x-3) + 9 \cdot \frac{(x-1)(x-2)}{2}. \] Simplifying, we get \( P(x) = x^2 \), as expected, since the given points all lie on the parabola \( y = x^2 \).
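A direct implementation of the two formulas above (a sketch of my own; names are illustrative). Evaluating it at any point reproduces \(x^2\) for this data set.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial P at the point x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)  # Lagrange basis l_i(x)
        total += yi * li
    return total

# The points (1,1), (2,4), (3,9) lie on y = x^2, so P(x) = x^2.
print(lagrange_eval([1, 2, 3], [1, 4, 9], 2.5))  # 6.25
```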


Divided Differences

The divided differences of a set of data points are a sequence of numbers that are used to construct the Newton form of the interpolation polynomial.

\[f[x_i, x_{i+1}, \ldots, x_{i+j}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+j}] - f[x_i, x_{i+1}, \ldots, x_{i+j-1}]}{x_{i+j} - x_i}\] The following table shows the divided differences for a set of data points: \[\begin{array}{ccccc} x_0 & f[x_0] & f[x_0, x_1] & f[x_0, x_1, x_2] & f[x_0, x_1, x_2, x_3] \\ x_1 & f[x_1] & f[x_1, x_2] & f[x_1, x_2, x_3] & \\ x_2 & f[x_2] & f[x_2, x_3] & & \\ x_3 & f[x_3] & & & \end{array} \]

\(\textbf{Example. }\) Given the following table of data points, construct its divided-difference table: \[ \begin{array}{c|cccc} x & 3 & 1 & 5 & 6 \\ \hline f(x) & 1 & -3 & 2 & 4 \\ \end{array} \] Solution: We arrange the given table vertically and compute divided differences by the use of the formula above, arriving at \[ \begin{array}{ccccc} 3 & 1 & 2 & -3/8 & 7/40 \\ 1 & -3 & 5/4 & 3/20 & \\ 5 & 2 & 2 & & \\ 6 & 4 & & & \\ \end{array} \]

Theorem on Permutations in Divided Differences. The divided difference is a symmetric function of its arguments. Thus, if \((z_0, z_1, \ldots, z_n)\) is a permutation of \((x_0, x_1, \ldots, x_n)\), then \[ f[z_0, z_1, \ldots, z_n] = f[x_0, x_1, \ldots, x_n]. \]

Theorem on Derivatives and Divided Differences. If \( f \) is \( n \) times continuously differentiable on \([a, b]\) and if \( x_0, x_1, \ldots, x_n \) are distinct points in \([a, b]\), then there exists a point \( \xi \) in \((a, b)\) such that \[ f[x_0, x_1, \ldots, x_n] = \frac{1}{n!} f^{(n)}(\xi). \]


Newton's Form

The Newton form of the interpolation polynomial is a method for constructing a polynomial that passes through a given set of points. Specifically, given a set of \( n+1 \) distinct data points \((x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)\), the Newton interpolation polynomial \( P(x) \) is defined as: \[ P(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)(x - x_1) \cdots (x - x_{n-1}) \] where \( a_0, a_1, \ldots, a_n \) are the divided differences of the data points.

\(\textbf{Remark. }\)Here, \(a_n = f[x_0, x_1, \ldots, x_n]\).

\(\textbf{Example. }\) Given the following table of data points: \[ \begin{array}{c|cccc} x & 3 & 1 & 5 & 6 \\ \hline f(x) & 1 & -3 & 2 & 4 \\ \end{array} \] We already found the divided differences in the previous example. The Newton form of the interpolation polynomial is \[ p(x) = 1 + 2(x - 3) - \frac{3}{8}(x - 3)(x - 1) + \frac{7}{40}(x - 3)(x - 1)(x - 5). \]
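The divided-difference table and the Newton form combine into a few lines of code. The sketch below (my own; exact rational arithmetic via `fractions` is a choice, not a requirement) reproduces the coefficients \(1, 2, -3/8, 7/40\) from the examples above.

```python
from fractions import Fraction

def divided_differences(xs, ys):
    """Return [f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]], the table's top edge."""
    table = [Fraction(y) for y in ys]
    coeffs = [table[0]]
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):   # overwrite column j in place
            table[i] = (table[i] - table[i - 1]) / (xs[i] - xs[i - j])
        coeffs.append(table[j])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form by nested multiplication."""
    p = coeffs[-1]
    for a, xi in zip(reversed(coeffs[:-1]), reversed(xs[:-1])):
        p = a + (x - xi) * p
    return p

xs, ys = [3, 1, 5, 6], [1, -3, 2, 4]
c = divided_differences(xs, ys)
print(c)                      # 1, 2, -3/8, 7/40 as Fractions
print(newton_eval(xs, c, 3))  # 1, i.e. p(3) = f(3)
```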




Hermite Interpolation

Hermite Interpolation Theorem. Let \( n \geq 0 \), and suppose that \( x_i \), \( i = 0, \ldots, n \), are distinct real numbers. Then, given two sets of real numbers \( y_i \), \( i = 0, \ldots, n \), and \( z_i \), \( i = 0, \ldots, n \), there is a unique polynomial \( p_{2n+1} \) in \( \mathcal{P}_{2n+1} \) such that \[ p_{2n+1}(x_i) = y_i, \quad p'_{2n+1}(x_i) = z_i, \quad i = 0, \ldots, n. \quad (6.13) \]

Remark. In general, given \(m\) distinct points, Hermite interpolation yields a unique polynomial of degree at most \(2m-1\) that matches the function values and first derivatives at those points.

"Introduction to Numerical Analysis", page. 188.

Theorem. Suppose that \( n \geq 0 \) and let \( f \) be a real-valued function, defined, continuous and \( 2n + 2 \) times differentiable on the interval \([a, b]\), such that \( f^{(2n+2)} \) is continuous on \([a, b]\). Further, let \( p_{2n+1} \) denote the Hermite interpolation polynomial of \( f \) defined by (6.16). Then, for each \( x \in [a, b] \) there exists \( \xi = \xi(x) \) in \((a, b)\) such that \[ f(x) - p_{2n+1}(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} [\pi_{n+1}(x)]^2, \quad (6.17) \] where \( \pi_{n+1} \) is as defined in (6.9). Moreover, \[ |f(x) - p_{2n+1}(x)| \leq \frac{M_{2n+2}}{(2n+2)!} [\pi_{n+1}(x)]^2, \quad (6.18) \] where \( M_{2n+2} = \max_{\zeta \in [a,b]} |f^{(2n+2)}(\zeta)| \).

"Introduction to Numerical Analysis", page. 190.

Theorem on Hermite Interpolation Error Estimate. Let \( x_0, x_1, \ldots, x_n \) be distinct nodes in \([a, b]\) and let \( f \in C^{2n+2}[a, b] \). If \( p \) is the polynomial of degree at most \( 2n + 1 \) such that \[ p(x_i) = f(x_i) \quad \text{and} \quad p'(x_i) = f'(x_i) \quad (0 \leq i \leq n) \] then to each \( x \) in \([a, b]\) there corresponds a point \( \xi \) in \((a, b)\) such that \[ f(x) - p(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2 \]

"Numerical Analysis: Mathematics of Scientific Computing" page. 344.


Change of Intervals

Suppose that a numerical integration formula is given: \[ \int_c^d f(t) \, dt \approx \sum_{i=0}^n A_i f(t_i). \] Let us assume that it is exact for all polynomials of degree \( m \) or less. If we want to integrate the function \( f \) over the interval \([a, b]\), we can use the change of interval formula. We first define a linear function \(\lambda\) of \( t \) such that if \( t \) traverses \([c, d]\), \(\lambda(t)\) will traverse \([a, b]\). The function \(\lambda\) is given explicitly by \[ \lambda(t) = \frac{b - a}{d - c} t + \frac{ad - bc}{d - c}. \] Now in the integral \[ \int_a^b f(x) \, dx \] we make the change of variable \( x = \lambda(t) \). Then \( dx = \lambda'(t) \, dt = (b-a)(d-c)^{-1} \, dt \), and so we have \[ \int_a^b f(x) \, dx = \frac{b - a}{d - c} \int_c^d f(\lambda(t)) \, dt \] \[ \approx \frac{b - a}{d - c} \sum_{i=0}^n A_i f(\lambda(t_i)) \] Hence, we have \[ \int_a^b f(x) \, dx \approx \frac{b - a}{d - c} \sum_{i=0}^n A_i f \left( \frac{b - a}{d - c} t_i + \frac{ad - bc}{d - c} \right) \] Observe that, because \(\lambda\) is linear, \( f(\lambda(t)) \) is a polynomial in \( t \) if \( f \) is a polynomial, and the degrees are the same. Hence, the new formula is exact for polynomials of degree \( m \). For Simpson’s rule, this procedure yields Equation (6) as a consequence of Equation (5) using \( \lambda(t) = (b - a)t + a \).

"Numerical Analysis: Mathematics of Scientific Computing," page. 486.


Gaussian Quadrature

The theory can be formulated for quadrature rules of a slightly more general form; namely, \[ \int_a^b f(x) w(x) \, dx \approx \sum_{i=0}^n A_i f(x_i), \] where \( w \) is a fixed positive weight function.

The Gauss quadrature rule: \[ \int_a^b w(x) f(x) \, dx \approx G_n(f) = \sum_{k=0}^{n} W_k f(x_k), \tag*{(10.6)} \] where the quadrature weights are \[ W_k = \int_a^b w(x) [L_k(x)]^2 \, dx, \tag*{(10.7)} \] and the quadrature points \( x_k, \, k = 0, \ldots, n \), are chosen as the zeros of the polynomial of degree \( n + 1 \) from a system of orthogonal polynomials over the interval \( (a, b) \) with respect to the weight function \( w \). Since this quadrature rule was obtained by exact integration of the Hermite interpolation polynomial of degree \( 2n + 1 \) for \( f \), it gives the exact result whenever \( f \) is a polynomial of degree \( 2n + 1 \) or less.

"Introduction to Numerical Analysis", page. 279.


Error Analysis for Gaussian Quadrature

Given the Gaussian quadrature formula \[ \int_a^b f(x) dx = \sum_{i=1}^{n} w_i f(x_i) + R_n(f), \] where \[ R_n(f) = \frac{(b-a)^{2n+1}(n!)^4}{(2n+1)[(2n)!]^3} f^{(2n)}(c). \]

Theorem. Suppose that \( w \) is a weight function, defined, integrable, continuous and positive on \( (a,b) \), and that \( f \) is defined and continuous on \( [a,b] \); suppose further that \( f \) has a continuous derivative of order \( 2n+2 \) on \( [a,b] \), \( n \geq 0 \). Then, there exists a number \( \eta \) in \( (a,b) \) such that \[ \int_a^b w(x)f(x)dx - \sum_{k=0}^n W_k f(x_k) = K_n f^{(2n+2)}(\eta), \tag{10.18} \] and \[ K_n = \frac{1}{(2n+2)!} \int_a^b w(x) [\pi_{n+1}(x)]^2 dx. \] Consequently, the integration formula (10.6), (10.7) will give the exact result for every polynomial of degree \( 2n + 1 \).

"Introduction to Numerical Analysis," page. 282.

Theorem on Gaussian Formula with Error Term Consider a Gaussian formula with error term: \[ \int_a^b f(x)w(x) \, dx = \sum_{i=0}^{n-1} A_i f(x_i) + E \] For an \( f \in C^{2n}[a, b] \), we have \[ E = \frac{f^{(2n)}(\xi)}{(2n)!} \int_a^b q^2(x) w(x) \, dx \] where \( a < \xi < b \) and \( q(x) = \prod_{i=0}^{n-1} (x - x_i) \).

Proof. By Hermite interpolation (Section 6.3, p. 338), a polynomial \( p \) of degree at most \( 2n - 1 \) exists such that \[ p(x_i) = f(x_i), \quad p'(x_i) = f'(x_i) \quad (0 \leq i \leq n-1) \] The error formula for this interpolation is \[ f(x) - p(x) = \frac{f^{(2n)}(\zeta(x)) q^2(x)}{(2n)!} \quad (11) \] as given in Theorem 2 of Section 6.3 (p. 344). Here \(q(x)\) is the polynomial of degree \(n\) which is \(w\)-orthogonal to the polynomials of degree at most \(n-1\). It follows that \[ \int_a^b f(x) w(x) \, dx - \int_a^b p(x) w(x) \, dx = \frac{1}{(2n)!} \int_a^b f^{(2n)}(\zeta(x)) q^2(x) w(x) \, dx \] Now use the fact that \( p \) is of degree at most \( 2n - 1 \) to see that \[ \int_a^b p(x) w(x) \, dx = \sum_{i=0}^{n-1} A_i p(x_i) = \sum_{i=0}^{n-1} A_i f(x_i) \] Furthermore, the Mean-Value Theorem for Integrals can be used to write \[ \int_a^b f^{(2n)}(\zeta(x)) q^2(x) w(x) \, dx = f^{(2n)}(\xi) \int_a^b q^2(x) w(x) \, dx \] This requires continuity of \( f^{(2n)}(\zeta(x)) \), which can be inferred from Equation (11). The desired error formula results from making the substitution described. \(\blacksquare\)

"Numerical Analysis: Mathematics of Scientific Computing" page. 498.

Theorem 8. Let \( w(x) \) be a weight function and \( p_n(x) \) be the monic polynomial of degree \( n \) orthogonal to all polynomials of smaller degrees. Let \( x_j \), \( j = 1, 2, \ldots, n \), be zeros of \( p_n(x) \). Let \( Q(f) \) be the quadrature rule defined in Eqs. (36) and (37). Suppose \( f \in C^{2n}[a, b] \). Then one can find \( \lambda \in (a, b) \) such that \[ \int_a^b f(x) w(x) \, dx = Q(f) + \gamma_n \frac{f^{(2n)}(\lambda)}{(2n)!}, \quad \text{where} \quad \gamma_n = \int_a^b p_n^2(x) w(x) \, dx. \quad (43) \]

Proof. We use the Hermite interpolation to prove the error estimate. There exists a unique polynomial \( h_{2n-1}(x) \) of degree \( 2n - 1 \) such that \[ h_{2n-1}(x_j) = f(x_j) \quad \text{and} \quad h'_{2n-1}(x_j) = f'(x_j). \] In addition, there exists \( \zeta \in (a, b) \) depending on \( x \) such that \[ f(x) = h_{2n-1}(x) + \frac{f^{(2n)}(\zeta)}{(2n)!} (x - x_1)^2 \cdots (x - x_n)^2. \] Note that \( (x - x_1) \cdots (x - x_n) = p_n(x) \) as \( p_n \) is monic and \( x_j \), \( j = 1, \ldots, n \), are its roots. Therefore, \[ \int_a^b f(x) w(x) \, dx = \int_a^b h_{2n-1}(x) w(x) \, dx + \int_a^b \frac{f^{(2n)}(\zeta(x))}{(2n)!} p_n^2(x) w(x) \, dx. \] Since the quadrature is exact for polynomials of degree \( 2n - 1 \), it is exact for \( h_{2n-1}(x) \). Hence \[ \int_a^b h_{2n-1}(x) w(x) \, dx = Q(h_{2n-1}) = \sum_{j=1}^n h_{2n-1}(x_j) w_j = \sum_{j=1}^n f(x_j) w_j = Q(f). \] On the other hand, because \( p_n^2(x) w(x) \geq 0 \) on \([a, b]\), we can apply the mean value theorem and get \[ \int_a^b \frac{f^{(2n)}(\zeta(x))}{(2n)!} p_n^2(x) w(x) \, dx = \frac{f^{(2n)}(\lambda)}{(2n)!} \int_a^b p_n^2(x) w(x) \, dx. \] for some \( \lambda \in (a, b) \). This completes the proof. \(\square\)


Composite Formulae

Definition (Composite trapezium rule). With \(h = (b-a)/m\) and \(x_j = a + jh\), \[ \int_a^b f(x) \, dx \approx h \left[ \frac{1}{2} f(x_0) + f(x_1) + \cdots + f(x_{m-1}) + \frac{1}{2} f(x_m) \right]. \]

Introduction to Numerical Analysis page. 209.

Definition (Composite Simpson rule). With \(h = (b-a)/(2m)\) and \(x_j = a + jh\), \[ \int_a^b f(x) \, dx \approx \frac{h}{3} \left[ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{2m-2}) + 4f(x_{2m-1}) + f(x_{2m}) \right]. \]

Introduction to Numerical Analysis page. 210.

Definition (Composite Midpoint Rule). The composite midpoint rule is the composite Gauss formula with \(w(x) \equiv 1\) and \(n = 0\) defined by \[ \int_a^b f(x) \, dx \approx h \sum_{j=1}^m f\left(a + \left(j - \frac{1}{2}\right) h \right). \] This follows from the fact that when \(n = 0\) there is one quadrature point \(\xi_0 = 0\) in \((-1,1)\), which is at the midpoint of the interval, and the corresponding quadrature weight \(W_0\) is equal to the length of the interval \((-1,1)\), i.e., \(W_0 = 2\). It follows from (10.24) with \(n = 0\) and \[ C_0 = \int_{-1}^1 t^2 \, dt = \frac{2}{3} \] that the error in the composite midpoint rule is \[ E_{0,m} = \frac{(b-a)^3}{24m^2} f''(\eta), \] where \(\eta \in (a, b)\), provided that the function \(f\) has a continuous second derivative on \([a, b]\).

"Introduction to Numerical Analysis", page. 286.


\(\theta\)-Method

Theta methods. We consider methods of the form \[ y_{n+1} = y_n + h [ \theta f(t_n, y_n) + (1 - \theta) f(t_{n+1}, y_{n+1}) ], \quad n = 0, 1, \ldots, \] where \( \theta \in [0, 1] \) is a parameter.

If \( \theta = 1 \), we recover Euler's method. If \( \theta \in [0, 1) \), the theta method is implicit: each time step requires the solution of \( N \) (in general, nonlinear) algebraic equations for the unknown vector \( y_{n+1} \). The choices \( \theta = 0 \) and \( \theta = \frac{1}{2} \) are known as

Backward Euler: \[ y_{n+1} = y_n + h f(t_{n+1}, y_{n+1}), \]

Trapezoidal rule: \[ y_{n+1} = y_n + \frac{1}{2} h [ f(t_n, y_n) + f(t_{n+1}, y_{n+1}) ]. \]

The nonlinear algebraic equations can be solved by iteration. For example, for backward Euler, letting \( y_{n+1}^{[0]} = y_n \), we may use

\(\textit{Direct iteration}\): \[ y_{n+1}^{[j+1]} = y_n + h f(t_{n+1}, y_{n+1}^{[j]}); \]

\(\textit{Newton–Raphson}\): \[ y_{n+1}^{[j+1]} = y_{n+1}^{[j]} - \left[ I - h \frac{\partial f(t_{n+1}, y_{n+1}^{[j]})}{\partial y} \right]^{-1} \left[ y_{n+1}^{[j]} - y_n - h f(t_{n+1}, y_{n+1}^{[j]}) \right]; \]

\(\textit{Modified Newton–Raphson}\), which freezes the Jacobian at \((t_n, y_n)\): \[ y_{n+1}^{[j+1]} = y_{n+1}^{[j]} - \left[ I - h \frac{\partial f(t_n, y_n)}{\partial y} \right]^{-1} \left[ y_{n+1}^{[j]} - y_n - h f(t_{n+1}, y_{n+1}^{[j]}) \right]. \]
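As a concrete illustration of the implicit case, here is a hedged sketch (my own; the test problem \(y' = -5y\) and all names are illustrative) of backward Euler with the direct iteration above as the inner solver; direct iteration converges here because \(h\) times the Lipschitz constant is \(0.05 < 1\).

```python
import math

def backward_euler(f, t0, y0, h, n_steps, n_inner=10):
    """Backward Euler; the implicit equation is solved by direct iteration."""
    t, y = t0, y0
    for _ in range(n_steps):
        z = y                          # y_{n+1}^{[0]} = y_n
        for _ in range(n_inner):
            z = y + h * f(t + h, z)    # y^{[j+1]} = y_n + h f(t_{n+1}, y^{[j]})
        t, y = t + h, z
    return y

print(backward_euler(lambda t, y: -5.0 * y, 0.0, 1.0, 0.01, 100))  # ~0.0076
print(math.exp(-5.0))  # exact y(1) = 0.00674...; the gap is the O(h) error
```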


Error

In numerically solving a differential equation, several types of errors arise. These are conveniently classified as follows:

Local Truncation Error. The local truncation error is the error made in a single step of a numerical method.

Local Roundoff Error. The local roundoff error is the error made in a single step of a numerical method due to roundoff.

Global Truncation Error. The global truncation error is the accumulation of the local truncation errors in the previous steps.

Global Roundoff Error. The global roundoff error is the accumulation of the local roundoff errors in the previous steps.

Total Error. The total error is the sum of the global truncation error and the global roundoff error.

"Numerical Analysis: Mathematics of Scientific Computing" page. 533.


Richardson Extrapolation

\(\textbf{Definition. }\) Richardson extrapolation is a method for improving the accuracy of a numerical approximation. Suppose a method with step size \( h \) produces an approximation \( A(h) \) with error expansion \( A(h) = A + Ch^{p} + \mathcal{O}(h^{p+1}) \). Combining the results for step sizes \( h \) and \( h/2 \), \[ A \approx \frac{2^{p}A(h/2) - A(h)}{2^{p} - 1}, \] cancels the leading error term and yields an approximation of order \( \mathcal{O}(h^{p+1}) \).


Initial value problems for ODEs

Our model is an initial-value problem written in the form \[ \begin{cases} x' = f(t, x) \\ x(t_0) = x_0 \quad \tag{1} \end{cases} \] Here \( x \) is an unknown function of \( t \) that we hope to construct from the information given in Equations (1), where \( x' = \frac{dx(t)}{dt} \). The second of the two equations in (1) specifies one particular value of the function \( x(t) \). The first equation gives the slope.

"Numerical Analysis: Mathematics of Scientific Computing" page.524.

Existence

$\textbf{First Existence Theorem, Initial-Value Problem}$ If \( f \) is continuous in a rectangle \( R \) centered at \( (t_0, x_0) \), say \[ R = \{ (t, x) : |t - t_0| \leq \alpha, \quad |x - x_0| \leq \beta \}\tag{4} \] then the initial-value problem above has a solution \( x(t) \) for \( |t - t_0| \leq \min(\alpha, \beta / M) \), where \( M \) is the maximum of \( |f(t, x)| \) in the rectangle \( R \).

"Numerical Analysis: Mathematics of Scientific Computing" page 525.

$\textbf{Second Existence Theorem, Initial-Value Problem.}$ If \( f \) is continuous in the strip \( a \leq t \leq b, -\infty < x < \infty \) and satisfies there an inequality \[ |f(t, x_1) - f(t, x_2)| \leq L |x_1 - x_2| \tag{5} \] then the initial-value problem (1) has a unique solution in the interval \([a, b]\).

"Numerical Analysis: Mathematics of Scientific Computing" page.526.

Lipschitz Condition

Definition: A function \( f : \mathbb{R}^n \to \mathbb{R} \) satisfies the Lipschitz condition on a domain \( D \subset \mathbb{R}^n \) if there exists a constant \( L \geq 0 \) such that for all \( x, y \in D \): \[ \| f(x) - f(y) \| \leq L \| x - y \| \] where \( \| \cdot \| \) denotes a norm (commonly the Euclidean norm). And \( L \) is called the Lipschitz constant.

Note. Hence, inequality (5) is the Lipschitz condition.

Uniqueness

Uniqueness Theorem, Initial-Value Problem. If \( f \) and \(\partial f / \partial x \) are continuous in the rectangle \( R \) defined by $\textbf{(4)}$, then the initial-value problem (1) has a unique solution in the interval \( |t - t_0| < \min(\alpha, \beta / M) \).

$\textbf{Theorem (Picard's Theorem).}$ Suppose that the real-valued function \((x,y) \mapsto f(x,y)\) is continuous in the rectangular region \(D\) defined by \(x_0 \leq x \leq X_M\), \(y_0 - C \leq y \leq y_0 + C\); that \(|f(x,y_0)| \leq K\) when \(x_0 \leq x \leq X_M\); and that \(f\) satisfies the Lipschitz condition: there exists \(L > 0\) such that \[ |f(x,u) - f(x,v)| \leq L|u - v| \quad \text{for all} \quad (x,u) \in D, \quad (x,v) \in D. \] Assume further that \[ C \geq \frac{K}{L} \left( e^{L(X_M - x_0)} - 1 \right). \tag{12.3} \] Then, there exists a unique function \(y \in C^1[x_0, X_M]\) such that \(y(x_0) = y_0\) and \(y' = f(x,y)\) for \(x \in [x_0, X_M]\); moreover, \[ |y(x) - y_0| \leq C, \quad x_0 \leq x \leq X_M. \]

"Introduction to Numerical Analysis", page. 311.

One-step methods

Definition. Generally, a one-step method may be written in the form \[ y_{n+1} = y_n + h \Phi (x_n, y_n; h), \quad n = 0, 1, \ldots, N-1,\qquad y(x_0) = y_0, \tag*{(12.13)} \] where \(\Phi (\cdot, \cdot; \cdot)\) is a continuous function of its variables.

"Introduction to Numerical Analysis", page. 317.

Definition. In order to find the accuracy of the numerical method (12.13), we define the global error, \( e_n \), by \[ e_n = y(x_n) - y_n. \]

Definition. The truncation error, \( T_n \), is defined by \[ T_n = \frac{y(x_{n+1}) - y(x_n)}{h} - \Phi(x_n, y(x_n); h). \tag*{(12.14)} \]

"Introduction to Numerical Analysis", page. 317.

Theorem. Consider the general one-step method (12.13) where, in addition to being a continuous function of its arguments, \(\Phi\) is assumed to satisfy a Lipschitz condition with respect to its second argument, that is, there exists a positive constant \( L_{\Phi} \) such that, for \( 0 \le h \le h_0 \) and for all \((x, u)\) and \((x, v)\) in the rectangle \[ D = \{(x, y): x_0 \le x \le X_M, |y - y_0| \le C\}, \] we have that \[ |\Phi(x, u; h) - \Phi(x, v; h)| \le L_{\Phi} |u - v|. \quad (12.15) \] Then, assuming that \( |y_n - y_0| \le C \), \( n = 1, 2, \ldots, N \), it follows that \[ |e_n| \le \frac{T}{L_{\Phi}} \left( e^{L_{\Phi} (x_n - x_0)} - 1 \right), \quad n = 0, 1, \ldots, N, \quad (12.16) \] where \( T = \max_{0 \le n \le N-1} |T_n| \).

"Introduction to Numerical Analysis", page. 318.

$\textbf{Euler's method.}$ Given that \( y(x_0) = y_0 \), let us suppose that we have already calculated \( y_n \), up to some \( n \), \( 0 \leq n \leq N - 1 \), \( N \geq 1 \); we define \[ y_{n+1} = y_n + h f(x_n, y_n). \] Thus, taking in succession \( n = 0, 1, \ldots, N - 1 \), one step at a time, the approximate values \( y_n \) at the mesh points \( x_n \) can be easily obtained. This numerical method is known as Euler's method.

"Introduction to Numerical Analysis", page. 317.


Consistency

Definition. The numerical method (12.13) is consistent with the differential equation (12.1) if the truncation error, defined by (12.14), is such that for any \( \epsilon > 0 \) there exists a positive \( h(\epsilon) \) for which \( |T_n| < \epsilon \) for \( 0 < h < h(\epsilon) \) and any pair of points \((x_n, y(x_n)), (x_{n+1}, y(x_{n+1}))\) on any solution curve in \( D \).

"Introduction to Numerical Analysis", page. 321.

Theorem. Suppose that the initial value problem (12.1), (12.2) satisfies the conditions of Picard's Theorem, and also that its approximation generated from (12.13) when \( h \le h_0 \) lies in the region \( D \). Assume further that the function \(\Phi (\cdot, \cdot; \cdot)\) is continuous on \( D \times [0, h_0] \), and satisfies the consistency condition (12.21) and the Lipschitz condition \[ |\Phi(x, u; h) - \Phi(x, v; h)| \le L_{\Phi} |u - v| \quad \text{on} \quad D \times [0, h_0]. \tag*{(12.22)} \] Then, if successive approximation sequences \((y_n)\), generated by using the mesh points \( x_n = x_0 + nh \), \( n = 1, 2, \ldots, N \), are obtained from (12.13) with successively smaller values of \( h \), each \( h \) less than \( h_0 \), we have convergence of the numerical solution to the solution of the initial value problem in the sense that \[ \lim_{n \to \infty} y_n = y(x) \quad \text{as} \quad x_n \to x \in [x_0, X_M] \quad \text{when} \quad h \to 0 \quad \text{and} \quad n \to \infty. \]

"Introduction to Numerical Analysis", page. 322.

Adams-Bashforth Formula

Suppose that the resulting formula (obtained by integrating an interpolation polynomial for \( f \) at the points \( t_n, t_{n-1}, \ldots \)) is of the following type: \[ x_{n+1} = x_n + a\cdot f_n + b\cdot f_{n-1} + c\cdot f_{n-2} + \cdots \] where \( f_i \) denotes \( f(t_i, x_i) \). An equation of this type is called an Adams-Bashforth formula.

Note. The Adams-Bashforth formula is an explicit method since it does not include $f_{n+1}$.

Adams-Bashforth Formula of Order $5$

\[x_{n+1} = x_n + \frac{h}{720} \left[ 1901 f_n - 2774 f_{n-1} + 2616 f_{n-2} - 1274 f_{n-3} + 251 f_{n-4} \right]\]

Adams-Moulton Formula

Definition. Suppose that the resulting formula is of the following type: \[ x_{n+1} = x_n + a\cdot f_{n+1} + b\cdot f_{n} + c\cdot f_{n-1} + \cdots \] where \( f_i \) denotes \( f(t_i, x_i) \). An equation of this type is called an Adams-Moulton formula.

Note. The Adams-Moulton formula is an implicit method since it includes \( f_{n+1} \).

Adams-Moulton Formula of Order $5$

\[ x_{n+1} = x_n + \frac{h}{720} \left[ 251 f_{n+1} + 646 f_n - 264 f_{n-1} + 106 f_{n-2} - 19 f_{n-3} \right] \]

"Numerical Analysis: Mathematics of Scientific Computing", page. 551.

Linear Multistep Methods

Linear $k$-step method

The format of any such method is \[ a_k x_n + a_{k-1} x_{n-1} + \cdots + a_0 x_{n-k} = h \left[ b_k f_n + b_{k-1} f_{n-1} + \cdots + b_0 f_{n-k} \right]. \tag{9}\] Equivalently, shifting the time index, it can be written as \[\sum_{j=0}^{k} a_j x_{n+j} = h \sum_{j=0}^{k} b_j f(t_{n+j}, x_{n+j}), \] where \( a_k \neq 0 \). This is called a \( k \)-step method.

Note. The polynomial \(\rho(z) = a_k z^k + a_{k-1} z^{k-1} + \cdots + a_0\), formed from the coefficients on the left side of equation (9), is called the first characteristic polynomial of the method; \(\sigma(z) = b_k z^k + \cdots + b_0\) is the second characteristic polynomial.

Examples. Euler's method, the trapezoidal rule, and the Adams-Bashforth and Adams-Moulton formulae above are all linear multistep methods.

"Introduction to Numerical Analysis" page. 331.

Order

The accuracy of a numerical solution of a differential equation is largely determined by the order of the algorithm used.

Definition. The order indicates how many terms in a Taylor-series solution are being simulated by the method.

For example, the Adams-Bashforth method: $x_{n+1} = x_n + \frac{h}{720} \left[ 1901 f_n - 2774 f_{n-1} + 2616 f_{n-2} - 1274 f_{n-3} + 251 f_{n-4} \right]$ is said to be of order 5 because it produces approximately the same accuracy as the Taylor-series method with terms \( h, h^2, h^3, h^4, \) and \( h^5 \). The error can then be expected to be \( \mathcal{O}(h^6) \) in each step of a solution using the above method.

In order to have a better understanding of the order of multistep methods, we define the following linear functional \[ Lx = \sum_{i=0}^{k} \left[ a_i x(ih) - h b_i x'(ih) \right]. \tag{10} \] In the following analysis, we assume that \( x \) is represented by its Taylor series at \( t = 0 \). By using the Taylor series for \( x \), one can express \( L \) in this form: \[ Lx = d_0 x(0) + d_1 h x'(0) + d_2 h^2 x''(0) + \cdots \tag{11} \] To compute the coefficients, \( d_i \), in Equation (11), we write the Taylor series for \( x \) and \( x' \): \[ x(ih) = \sum_{j=0}^{\infty} \frac{(ih)^j}{j!} x^{(j)}(0) \] \[ x'(ih) = \sum_{j=0}^{\infty} \frac{(ih)^j}{j!} x^{(j+1)}(0) \] These series are then substituted into Equation (10). Rearranging the result in powers of \( h \), we obtain an equation having the form of (11), with these values of \( d_i \): \[ \begin{aligned} d_0 &= \sum_{i=0}^{k} a_i \\ d_1 &= \sum_{i=0}^{k} (i a_i - b_i) \\ d_2 &= \sum_{i=0}^{k} \left( \frac{1}{2} i^2 a_i - i b_i \right) \\ &\vdots \\ d_j &= \sum_{i=0}^{k} \left( \frac{i^j}{j!} a_i - \frac{i^{j-1}}{(j-1)!} b_i \right) \quad (j \geq 1) \end{aligned} \tag{12} \] Here we use \( 0! = 1 \) and \( i^0 = 1 \).

Theorem on Multistep Method Properties. These three properties of the multistep method (9) are equivalent:

  1. \( d_0 = d_1 = \cdots = d_m = 0 \)
  2. \( Lp = 0 \) for each polynomial \( p \) of degree \( \leq m \)
  3. \( Lx \) is \( \mathcal{O}(h^{m+1}) \) for all \( x \in C^{m+1} \)

"Numerical Analysis: Mathematics of Scientific Computing", page. 552 & 553.

Alternate Definition of Order

Now, we can define the order of the multistep method as follows: The order of the multistep method in Equation (9) is the unique natural number \( m \) such that \[ d_0 = d_1 = \cdots = d_m = 0 \neq d_{m+1}. \]

EXAMPLE 1 What is the order of the method described by the following equation? \[ x_n - x_{n-2} = \frac{1}{3}h(f_n + 4f_{n-1} + f_{n-2}). \] Solution. Firstly, we rewrite the equation in the form of Equation (9): \[ \begin{align} x_n - x_{n-2} &= \frac{1}{3}h(f_n + 4f_{n-1} + f_{n-2}) \\ x_n + 0\cdot x_{n-1} - x_{n-2} &= h\left(\frac{1}{3}f_n + \frac{4}{3}f_{n-1} + \frac{1}{3}f_{n-2}\right) \end{align} \] And we have to match it to \[ a_2 x_n + a_{1} x_{n-1} + a_0 x_{n-2} = h \left[ b_2 f_n + b_{1} f_{n-1} + b_0 f_{n-2} \right]. \] The vector \((a_0, a_1, a_2)\) is \((-1, 0, 1)\) and the vector \((b_0, b_1, b_2)\) is \(\left(\frac{1}{3}, \frac{4}{3}, \frac{1}{3}\right)\). Thus, the \( d_i \) are \[ \begin{align} d_0 &= a_0 + a_1 + a_2 = 0 \\ d_1 &= -b_0 + (a_1 - b_1) + (2a_2 - b_2) = 0 \\ d_2 &= \left( \frac{1}{2}a_1 - b_1 \right) + (2a_2 - 2b_2) = 0 \\ d_3 &= \left( \frac{1}{6}a_1 - \frac{1}{2}b_1 \right) + \left( \frac{4}{3}a_2 - 2b_2 \right) = 0 \\ d_4 &= \left( \frac{1}{24}a_1 - \frac{1}{6}b_1 \right) + \left( \frac{2}{3}a_2 - \frac{4}{3}b_2 \right) = 0 \\ d_5 &= \left( \frac{1}{120}a_1 - \frac{1}{24}b_1 \right) + \left( \frac{4}{15}a_2 - \frac{2}{3}b_2 \right) = -\frac{1}{90} \end{align} \] The order of the method is \(4\).

"Numerical Analysis: Mathematics of Scientific Computing", page 554.

Root Condition

Definition. A linear multistep method is said to satisfy the Root Condition if all roots of its first characteristic polynomial lie in the closed unit disc and those on the unit circle are simple.

Stability

Definition. The roots of $\rho(z)$ of modulus one are called essential roots. The root \( z = 1 \) is called the principal root. The roots of \(\rho(z)\) of modulus \( \lt 1 \) are called nonessential roots.

Weak Stability

Definition. A linear multistep method is strongly stable if all roots of \(\rho(z)\) are \(\leq 1\) in magnitude and only one root has magnitude one. If more than one root has magnitude one, the method is called weakly or conditionally stable.

Note. We still require only simple roots of magnitude one. Also, note these definitions refer to the case \( h = 0 \).

Strong Stability

Definition. A multistep method is said to be strongly stable if \( \rho(1) = 0 \), \( \rho'(1) \neq 0 \), and all other roots of \( \rho \) satisfy the inequality \( |z| < 1 \).

"Numerical Analysis: Mathematics of Scientific Computing", page 564.

Absolute Stability

Definition. A linear multistep method is said to be absolutely stable for a given value of \( \lambda h \) if each root \(z_r = z_r(\lambda h)\) of the associated stability polynomial \(\pi(\cdot; \lambda h)\) satisfies \(|z_r(\lambda h)| < 1\).

Region of Absolute Stability

Definition. The region of absolute stability of a linear multistep method is the set of all points \(\lambda h\) in the complex plane for which the method is absolutely stable.

\(A\)-stable

Definition. A linear multistep method is said to be A-stable if its region of absolute stability contains the negative (left) complex half-plane.

Second Barrier Theorem (Dahlquist). No A-stable linear multistep method can have order of accuracy greater than 2; among the A-stable methods of order 2, the trapezoidal rule has the smallest error constant.

"Introduction to Numerical Analysis", page 348.


Local Truncation Error, Multistep Method

Theorem (Theorem on Local Truncation Error, Multistep Method). If the multistep method (9) is of order \( m \), if \( x \in C^{m+2} \), and if \( \partial f / \partial x \) is continuous, then, assuming exact past values \( x_i = x(t_i) \) for \( i < k \), \[ x(t_n) - x_n = \left( \frac{d_{m+1}}{a_k} \right) h^{m+1} x^{(m+1)}(t_{n-k}) + \mathcal{O}(h^{m+2}) \tag*{(13)} \] (The coefficients \( d_i \) are defined in Equation (12) above.)

"Numerical Analysis: Mathematics of Scientific Computing", page 560.

Proof. It suffices to prove the equation when \( n = k \), because \( x_n \) can be interpreted as the value of a numerical solution that began at the point \( t_{n-k} \). Using the linear functional \( L \) of Equation (10), we have \[ Lx = \sum_{i=0}^k [a_i x(t_i) - h b_i x'(t_i)] = \sum_{i=0}^k [a_i x(t_i) - h b_i f(t_i, x(t_i))] \tag{14} \] On the other hand, the numerical solution satisfies the equation \[ 0 = \sum_{i=0}^k [a_i x_i - h b_i f(t_i, x_i)] \tag{15} \] Because we have assumed that \( x_i = x(t_i) \) for \( i \lt k \), the result of subtracting Equation (15) from Equation (14) is \[ Lx = a_k [x(t_k) - x_k] - h b_k [f(t_k, x(t_k)) - f(t_k, x_k)] \tag{16} \] To the last term of Equation (16) we apply the Mean-Value Theorem, obtaining \[ Lx = a_k [x(t_k) - x_k] - h b_k \frac{\partial f}{\partial x} (t_k, \xi) [x(t_k) - x_k] = [a_k - h b_k F] [x(t_k) - x_k] \tag{17} \] where \( F = \frac{\partial f(t_k, \xi)}{\partial x} \) for some \( \xi \) between \( x(t_k) \) and \( x_k \). Now recall from the Theorem on Multistep Method Properties above that if the method being used is of order \( m \), then \( Lx \) will have the form \[ Lx = d_{m+1} h^{m+1} x^{(m+1)} (t_0) + \mathcal{O}(h^{m+2}) \tag{18} \] Combining Equations (17) and (18), we obtain (13); here we can ignore \( h b_k F \) in the denominator, since it is \( \mathcal{O}(h) \). (See Problem 8.5.8, p. 564.) \(\blacksquare\)

"Numerical Analysis: Mathematics of Scientific Computing", page 563.


Schur Form

Theorem (Schur Decomposition). For any square matrix \( A \), there exists a unitary matrix \( U \) such that \[ U^* A U = T \] where \( T \) is upper triangular.

Remark. That is, every square matrix is unitarily similar to an upper-triangular matrix.


Hermitian Matrix

Definition. A matrix \( A \) is called a Hermitian matrix if it is equal to its conjugate transpose, that is, \( A = A^* \).

Theorem. Every Hermitian \( n \times n \) matrix \( A \) can be diagonalized by a unitary matrix, \[ U^* A U = \Lambda, \] where \( U \) is unitary and \( \Lambda \) is a diagonal matrix.

Proof. The matrix \(A\) has a Schur form \(A = U T U^*\), where \(U\) is unitary and \(T\) is upper triangular. Since \(A = A^*\), we have \(U T U^* = (U T U^*)^* = U T^* U^*\), and hence \(T = T^*\). Since \(T\) is upper triangular and \(T^*\) is lower triangular, \(T\) must be diagonal. \(\blacksquare\)


QR Factorization

Reduced QR Factorization

Definition [Reduced QR factorization]. Let \( A \in \mathbb{F}^{m \times n} \) (\( m \geq n \)). There is a matrix \( \hat{Q} \in \mathbb{F}^{m \times n} \) with orthonormal columns and a square upper triangular matrix \( \hat{R} \in \mathbb{F}^{n \times n} \) such that \[ A = \hat{Q} \hat{R}. \]

There are several ways to compute the QR factorization of a matrix.

Householder Transformation

Example. Let \(A = [a_{1} \; a_{2} \; a_{3}] = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}\). Compute a QR factorization using Householder transformations.

Solution. Let \[ v_1 = a_1 + \begin{bmatrix} \|a_1\|_2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2+\sqrt{2} \\ 1 \\ 1\end{bmatrix}, \] and define the first Householder reflector as \[ H_1 = I - 2\dfrac{v_1v_1^*}{v_1^*v_1} = \begin{bmatrix} 1 - \frac{\sqrt{2}+2}{2} & -\frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & 1 - \frac{1}{2(\sqrt{2}+2)} & -\frac{1}{2(\sqrt{2}+2)} \\ -\frac{1}{2} & -\frac{1}{2(\sqrt{2}+2)} & 1-\frac{1}{2(\sqrt{2}+2)} \\ \end{bmatrix}= \begin{bmatrix} -\sqrt{2}/2 & -1/2 & -1/2 \\ -1/2 & (\sqrt{2}+2)/4 & (\sqrt{2}- 2)/4 \\ -1/2 & (\sqrt{2}-2)/4& (\sqrt{2}+2)/4\\ \end{bmatrix} \] Then we obtain a new matrix \(M_2\): \[ \begin{align} M_2 &= H_1\cdot A = \begin{bmatrix} -\sqrt{2}/2 & -1/2 & -1/2 \\ -1/2 & (\sqrt{2}+2)/4 & (\sqrt{2}- 2)/4 \\ -1/2 & (\sqrt{2}-2)/4& (\sqrt{2}+2)/4\\ \end{bmatrix}\cdot \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 1 & 1 & 0\\ 1 & 1 & 1\\ \end{bmatrix}\\ &= \begin{bmatrix} -2 & -1 & -1/2 \\ 0 & \sqrt{2}/2 & (\sqrt{2}-2)/4 \\ 0 & \sqrt{2}/2 & (2+\sqrt{2})/4 \\ \end{bmatrix} \end{align} \] Now we pick the vector \(a'_2 = [\sqrt{2}/2, \sqrt{2}/2]^T\), for which \(\|a'_2\|_2 = 1\), and define \(v_2 = a'_2 + \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 + 1 \\ \sqrt{2}/2 \end{bmatrix}\). The second (reduced) Householder reflector is \[ \begin{align} H'_2 &= I - 2\dfrac{v_2v_2^*}{v_2^*v_2} = \begin{bmatrix} 1 - \frac{2+\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} & 1 - \frac{2-\sqrt{2}}{2} \\ \end{bmatrix} = \begin{bmatrix} -\sqrt{2}/2 & -\sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \\ \end{bmatrix}, \end{align} \] which we extend to a \(3\times 3\) matrix \(H_2\): \[ H_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & -\sqrt{2}/2 & \sqrt{2}/2 \\ \end{bmatrix} \] Then \[ \begin{align} M_3 = H_2M_2 &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & -\sqrt{2}/2 & \sqrt{2}/2 \\ \end{bmatrix}\cdot \begin{bmatrix} -2 & -1 & -1/2 \\ 0 & \sqrt{2}/2 & (\sqrt{2}-2)/4 \\ 0 & \sqrt{2}/2 & (2+\sqrt{2})/4 \\ \end{bmatrix} = \begin{bmatrix} -2 & -1 & -1/2 \\ 0 & -1 & -1/2 \\ 0 & 0 & \sqrt{2}/2 \end{bmatrix} =: R. \end{align} \] Since every Householder reflector \(H\) is unitary and Hermitian, we have \(H^{-1} = H\). Thus \(A = H_1 H_2 M_3\), and the \(QR\) factorization is \[ \begin{align} \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} &= \begin{bmatrix} -\sqrt{2}/2 & -1/2 & -1/2 \\ -1/2 & (\sqrt{2}+2)/4 & (\sqrt{2}-2)/4 \\ -1/2 & (\sqrt{2}-2)/4 & (\sqrt{2}+2)/4 \\ \end{bmatrix}\cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & -\sqrt{2}/2 & \sqrt{2}/2 \\ \end{bmatrix} \begin{bmatrix} -2 & -1 & -1/2 \\ 0 & -1 & -1/2 \\ 0 & 0 & \sqrt{2}/2 \end{bmatrix} \\ &= \underbrace{\begin{bmatrix} -\sqrt{2}/2 & \sqrt{2}/2 & 0 \\ -1/2 & -1/2 & -\sqrt{2}/2 \\ -1/2 & -1/2 & \sqrt{2}/2 \\ \end{bmatrix}}_{Q}\cdot \underbrace{\begin{bmatrix} -2 & -1 & -1/2 \\ 0 & -1 & -1/2 \\ 0 & 0 & \sqrt{2}/2 \end{bmatrix}}_{R}. \end{align} \]
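The hand computation can be checked against NumPy (a verification sketch of my own; NumPy's Householder-based QR may differ from the one above by sign choices):

```python
import numpy as np

A = np.array([[np.sqrt(2), 0, 0],
              [1, 1, 0],
              [1, 1, 1]], dtype=float)
Q, R = np.linalg.qr(A)
print(np.round(R, 6))                   # compare with R above, up to signs
print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q has orthonormal columns
```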

LU Decomposition

The LU decomposition of a matrix \(A\) is a factorization of \(A\) into a product of a lower triangular matrix \(L\) and an upper triangular matrix \(U\). The LU decomposition with partial pivoting is given by \[ PA = LU \] where \(P\) is a permutation matrix, \(L\) is a lower triangular matrix with ones on the diagonal, and \(U\) is an upper triangular matrix; it is computed using Gaussian elimination with partial pivoting. A natural question is when the LU decomposition exists, and the following proposition gives the answer.

Proposition 5. Let \( A = (a_{ij}) \) be an \( n \times n \) matrix and for \( k = 1, \ldots, n-1 \), let \( A(k) \) denote the \( k \times k \) submatrix of \( A \) consisting of the first \( k \) rows and \( k \) columns of \( A \). If \( A(k) \) is invertible for \( k = 1, 2, \ldots, n-1 \), then \( A \) has a unique LU-decomposition.
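In practice the factorization is obtained from a library. A hedged sketch of my own, assuming SciPy is available; note that `scipy.linalg.lu` returns factors with \(A = PLU\), i.e. \(P^TA = LU\) in the notation above.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
P, L, U = lu(A)                     # A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))  # True: P.T A = L U
print(L)                            # unit lower triangular
print(U)                            # upper triangular
```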


Bauer-Fike Theorem

Consider a diagonalizable matrix \(A \in \mathbb{C}^{n \times n}\) with non-singular eigenvector matrix \(V \in \mathbb{C}^{n \times n}\), such that \(A = V\Lambda V^{-1}\), where \(\Lambda\) is a diagonal matrix. If \(X \in \mathbb{C}^{n \times n}\) is invertible, its condition number in the \(p\)-norm is denoted by \(\kappa_p(X)\) and defined as: \[ \kappa_p(X) = \|X\|_p \|X^{-1}\|_p. \]

\(\textbf{Theorem. }\) Let \( A \) be an \( n \times n \) matrix with a complete set of linearly independent eigenvectors and suppose that \( V^{-1}AV = D \), where \( V \) is non-singular and \( D \) is diagonal. Let \( \delta A \) be a perturbation of \( A \) and let \( \mu \) be an eigenvalue of \( A + \delta A \). Then \( A \) has an eigenvalue \( \lambda \) such that \[ |\mu - \lambda| \leq \kappa_p(V) \| \delta A \|_p, \quad 1 \leq p \leq \infty \] where \( \kappa_p(V) \) is the \( p \)-norm condition number of \( V \).

The Bauer-Fike Theorem

The Bauer–Fike theorem is a powerful result, and its proof involves several key concepts from linear algebra and perturbation theory. Let's outline the main steps of the proof:

Theorem [Bauer-Fike Theorem] Suppose \( A, E \in \mathbb{C}^{m \times m} \) and \( A = X \Lambda X^{-1} \), where \[ \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_m \end{bmatrix}. \] Then for any eigenvalue \( \tilde{\lambda} \) of \( \tilde{A} = A + E \), there exists \( \lambda_i \), an eigenvalue of \( A \), such that \[ |\tilde{\lambda} - \lambda_i| \leq \kappa_2(X) \|E\|_2, \] where \( \kappa_2(X) = \|X\|_2 \|X^{-1}\|_2 \).

Proof. Suppose \( 0 \neq x \in \mathbb{C}^m \) is an eigenvector of \( \tilde{A} \) corresponding to \( \tilde{\lambda} \). Then for \( y = X^{-1} x \), \[ (A + E - \tilde{\lambda} I)x = 0 \implies (X \Lambda X^{-1} - \tilde{\lambda} I + E)x = 0 \implies (\Lambda - \tilde{\lambda} I + X^{-1} EX)y = 0, \] from which \[ (\Lambda - \tilde{\lambda} I)y = -X^{-1} EX y. \] Then \[ \sigma_{\min}(\Lambda - \tilde{\lambda} I)\|y\|_2 \leq \|(\Lambda - \tilde{\lambda} I)y\|_2 = \|X^{-1} EX y\|_2 \leq \|X^{-1} EX\|_2 \|y\|_2. \] Since \( \Lambda - \tilde{\lambda} I \) is diagonal, \[ \sigma_{\min}(\Lambda - \tilde{\lambda} I) = \min_j |\lambda_j - \tilde{\lambda}|. \] Suppose the minimum is attained when \( j = i \). Then \[ |\lambda_i - \tilde{\lambda}| \leq \|X^{-1} EX\|_2 \leq \kappa_2(X) \|E\|_2. \]

"Lecture note No. 22"


Norms

Definition (Frobenius norm). For a matrix \( A \in \mathbb{C}^{m \times n} \), the Frobenius norm is defined as \[ \|A\|_F = \sqrt{\sum_{i = 1}^m \sum_{j = 1}^n |a_{ij}|^2}. \]

Definition (\(p\)-norm). For a matrix \( A \in \mathbb{C}^{m \times n} \), the \(p\)-norm is defined as \[ \|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p, \] where \(x\) is a vector in \(\mathbb{C}^n\).

Theorem. For any \( A \in \mathbb{C}^{m \times n} \) and unitary \( Q \in \mathbb{C}^{m \times m} \), we have \[ \|QA\|_2 = \|A\|_2, \quad \|QA\|_F = \|A\|_F. \]

"Numerical Linear Algebra", Page. 23

Theorem. For any \( A \in \mathbb{C}^{m \times n} \) and unitary \( Q \in \mathbb{C}^{n \times n} \), we have \[ \|AQ\|_2 = \|A\|_2, \quad \|AQ\|_F = \|A\|_F. \]


Positive Definite

The following definitions all involve the term \( x^* M x \). Notice that this is always a real number for any Hermitian square matrix \( M \). An \( n \times n \) Hermitian complex matrix \( M \) is said to be positive-definite if \( x^* M x > 0 \) for all non-zero \( x \) in \( \mathbb{C}^n \). Formally, \[ M \text{ positive-definite} \iff x^* M x > 0 \text{ for all } x \in \mathbb{C}^n \setminus \{0\}. \] An \( n \times n \) Hermitian complex matrix \( M \) is said to be positive semi-definite or non-negative-definite if \( x^* M x \geq 0 \) for all \( x \) in \( \mathbb{C}^n \). Formally, \[ M \text{ positive semi-definite} \iff x^* M x \geq 0 \text{ for all } x \in \mathbb{C}^n. \]

Theorem. A Hermitian matrix \( M \) is positive-definite if and only if it satisfies any of the following equivalent conditions:

  1. All eigenvalues of \( M \) are positive.
  2. \( M = B^* B \) for some invertible matrix \( B \).
  3. All leading principal minors of \( M \) are positive (Sylvester's criterion).
  4. \( M \) has a Cholesky factorization \( M = LL^* \), with \( L \) lower triangular with positive diagonal entries.


Gerschgorin’s Theorem

Theorem [Row version Gerschgorin’s Theorem] Let \( A \in \mathbb{C}^{m \times m} \). Consider \( m \) disks in the complex plane: \[ D_i = \left\{ z \mid |z - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| = |a_{i1}| + \ldots + |a_{i,i-1}| + |a_{i,i+1}| + \ldots + |a_{im}| \right\}. \] Then every eigenvalue of \( A \) lies in the union \( D_1 \cup \cdots \cup D_m \); moreover, if the union of \( k \) of the disks is disjoint from the remaining \( m-k \) disks, then it contains exactly \( k \) eigenvalues of \( A \) (counted with multiplicity).


Theorem [Column version Gerschgorin’s Theorem] Let \( A \in \mathbb{C}^{m \times m} \). Consider \( m \) disks in the complex plane: \[ \tilde{D}_i = \left\{ z \mid |z - a_{ii}| \leq \sum_{j \neq i} |a_{ji}| = |a_{1i}| + \ldots + |a_{i-1,i}| + |a_{i+1,i}| + \ldots + |a_{mi}| \right\}. \] Then every eigenvalue of \( A \) lies in the union \( \tilde{D}_1 \cup \cdots \cup \tilde{D}_m \), since \( A \) and \( A^T \) have the same eigenvalues.

Theorem [Row version Generalized Gerschgorin’s Theorem] Let \( A \in \mathbb{C}^{m \times m} \). Consider \( m \) disks in the complex plane: \[ D_i = \left\{ z \mid |z - a_{ii}| \leq \frac{1}{|d_i|} \sum_{j \neq i} |a_{ij}||d_j| \right\}, \] where \( d_1, \ldots, d_m \) are arbitrary nonzero numbers. Then every eigenvalue of \( A \) lies in the union \( D_1 \cup \cdots \cup D_m \); this follows by applying the row version to \( D^{-1} A D \), where \( D = \mathrm{diag}(d_1, \ldots, d_m) \), which has the same eigenvalues as \( A \).

"Lecture note No. 22"


Rayleigh Quotient

Definition. Let \( A \in \mathbb{C}^{m \times m} \). For a given vector \( 0 \neq x \in \mathbb{C}^m \), the Rayleigh quotient of \( x \) is the scalar defined by \[ r(x) = \frac{x^* A x}{x^* x} \]

Algorithm: Rayleigh quotient iteration

Choose an initial vector \( x^{(0)} \in \mathbb{C}^m \) with \( \|x^{(0)}\|_2 = 1 \).

Compute \( \mu^{(0)} = (x^{(0)})^* A x^{(0)} \).

For \( k = 1, 2, \ldots \): solve \( (A - \mu^{(k-1)} I)\, y = x^{(k-1)} \) for \( y \); normalize \( x^{(k)} = y / \|y\|_2 \); compute \( \mu^{(k)} = (x^{(k)})^{*} A x^{(k)} \).

End
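A compact sketch of the iteration (my own; the matrix and starting vector are illustrative). For Hermitian \(A\) the convergence is cubic once \(\mu^{(k)}\) is close to an eigenvalue.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, n_iter=10):
    x = x0 / np.linalg.norm(x0)
    mu = x.conj() @ A @ x                    # initial Rayleigh quotient
    for _ in range(n_iter):
        try:
            y = np.linalg.solve(A - mu * np.eye(len(A)), x)  # shifted solve
        except np.linalg.LinAlgError:
            break                            # shift hit an eigenvalue exactly
        x = y / np.linalg.norm(y)
        mu = x.conj() @ A @ x                # updated Rayleigh quotient
    return mu, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
mu, x = rayleigh_quotient_iteration(A, np.array([1.0, 1.0, 1.0]))
print(mu)                     # converges to an eigenvalue of A
print(np.linalg.eigvalsh(A))  # all eigenvalues, for comparison
```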


Upper Hessenberg Matrix

Definition. A matrix \( H = [h_{ij}] \in \mathbb{C}^{m \times m} \) is upper Hessenberg if \( h_{ij} = 0 \) for all \( i \geq j + 2 \), or equivalently, \[ H = \begin{bmatrix} h_{11} & h_{12} & \cdots & \cdots & h_{1m} \\ h_{21} & h_{22} & \cdots & \cdots & h_{2m} \\ & h_{32} & \ddots & & \vdots \\ & & \ddots & \ddots & \vdots \\ & & & h_{m,m-1} & h_{mm} \end{bmatrix}. \] \( H \) is called unreduced upper Hessenberg if \( h_{21}, \ldots, h_{m,m-1} \neq 0 \). Otherwise \( H \) is reduced upper Hessenberg.

References