Mean Ergodic Theorem in Hilbert Space

Mean Ergodic Theorem

The mean ergodic theorem is a fundamental result in ergodic theory, and it yields the von Neumann ergodic theorem as a corollary. The theorem is stated as follows (see Ergodic Theory, Karl Petersen):

Theorem

Let U\in \mathcal L(\mathcal H) be a bounded linear operator on the Hilbert space \mathcal H satisfying the contraction property:

\|Uf\| \leqslant \|f\|,~~~~\forall f\in\mathcal H.

Define the invariant subspace of U by

\mathcal M = \{f\in\mathcal H:Uf=f\}.

Let \mathcal P:\mathcal H\to \mathcal M be the orthogonal projection of \mathcal H onto \mathcal M. Then for any f\in\mathcal H,

\displaystyle \lim_{n\rightarrow\infty} \frac1n\sum_{k=0}^{n-1} U^kf = \mathcal P f

in the norm of \mathcal H.

This result states that the time averages of \{U^k f\}_{k=0}^{\infty} converge to \mathcal Pf, a classical theme in ergodic theory.
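Before turning to the proof, here is a minimal numerical sketch of the statement (an illustration of mine, not from the original text). Take \mathcal H = \mathbb R^3 and let U be the orthogonal matrix that rotates the (e_1,e_2)-plane by an irrational angle and fixes e_3, so that \mathcal M = \mathrm{span}\{e_3\} and \mathcal Pf = (0,0,f_3); the angle, the vector f, and the values of n below are arbitrary choices.

```python
import numpy as np

# Sketch of the mean ergodic theorem on H = R^3.
# U rotates the (e1, e2)-plane by an irrational angle and fixes e3,
# so the invariant subspace M is span{e3} and P f = (0, 0, f_3).
theta = np.sqrt(2.0)                     # irrational rotation angle (arbitrary)
U = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

f = np.array([1.0, 2.0, 3.0])            # arbitrary test vector
Pf = np.array([0.0, 0.0, f[2]])          # orthogonal projection onto M

for n in [10, 100, 1000, 10000]:
    avg = np.zeros(3)
    v = f.copy()
    for _ in range(n):                   # accumulate (1/n) * sum_{k=0}^{n-1} U^k f
        avg += v
        v = U @ v
    avg /= n
    print(n, np.linalg.norm(avg - Pf))   # the error decays like O(1/n)
```

The printed errors decay like \mathcal O(1/n), in line with the remark after the proof (here \{g-Ug:g\in\mathcal H\} is finite-dimensional, hence closed).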

Proof Define \mathcal N to be the closure of the subspace \{g-Ug:g\in\mathcal H\}. Since U is a contraction, Uf = f if and only if U^*f = f, from which one checks that \mathcal N is the orthogonal complement of \mathcal M. Now we compute the time average of \{U^k f\}_{k=0}^\infty.

(1) If f = g - Ug for some g\in\mathcal H, then it is direct to see

\displaystyle \frac1n\sum_{k=0}^{n-1} U^k f = \frac1n \sum_{k=0}^{n-1} U^k(g-Ug) = \frac1n(g-U^ng),

hence we have

\displaystyle \bigg\|\frac1n\sum_{k=0}^{n-1} U^kf\bigg\| = \bigg\|\frac1n(g-U^ng)\bigg\| \leqslant \frac2n \|g\|.

(2) If f\in\mathcal N, then there exists a sequence \{g_i\}_{i=1}^\infty in \mathcal H such that f_i = g_i - Ug_i converges to f in \mathcal H. For any \varepsilon>0, we can choose f_i such that \|f-f_i\|<\frac{\varepsilon}2, and thus

\displaystyle \bigg\|\frac1n\sum_{k=0}^{n-1} U^k f - \frac1n \sum_{k=0}^{n-1} U^k f_i\bigg\| \leqslant \|f-f_i\| < \frac{\varepsilon}2.

For this fixed f_i, it follows from (1) that for sufficiently large n,

\displaystyle \bigg\|\frac1n\sum_{k=0}^{n-1} U^k f_i\bigg\| \leqslant \frac2n \|g_i\| < \frac{\varepsilon}2.

Therefore we conclude that for sufficiently large n\in\mathbb N,

\displaystyle \bigg\|\frac1n\sum_{k=0}^{n-1} U^kf\bigg\| \leqslant \bigg\|\frac1n\sum_{k=0}^{n-1} U^k f - \frac1n \sum_{k=0}^{n-1} U^k f_i\bigg\| + \bigg\| \frac1n \sum_{k=0}^{n-1} U^k f_i\bigg\| <\varepsilon.

Hence \displaystyle \lim_{n\rightarrow\infty} \frac1n \sum_{k=0}^{n-1} U^k f = 0 in the norm of \mathcal H for every f\in\mathcal N. Finally, for a general f\in\mathcal H write f = \mathcal Pf + h with h = f - \mathcal Pf \in\mathcal N; since U^k\mathcal Pf = \mathcal Pf for every k, the time average equals \mathcal Pf + \frac1n\sum_{k=0}^{n-1}U^k h \rightarrow \mathcal Pf, which completes the proof.

From the proof we see that, if the subspace \{g - Ug:g\in\mathcal H\} is already closed (so that it coincides with \mathcal N), then for each fixed f the time average converges at an explicit rate of order \mathcal O(1/n).

A natural consequence is von Neumann's mean ergodic theorem, historically the first ergodic theorem.

Theorem

Let (X,\mathcal B,\mu) be a \sigma-finite measure space, T:X\to X a measure-preserving transformation, and f\in L^2(X,\mathcal B,\mu). Then there exists a function \bar f\in L^2(X,\mathcal B,\mu) for which

\displaystyle \lim_{n\rightarrow\infty} \bigg\|\frac1n \sum_{k=0}^{n-1} f\circ T^k - \bar f \bigg\|_{2} = 0.

Here, f \in L^2(X,\mathcal B,\mu) means that \displaystyle \int_{X} |f(x)|^2 \,\mathrm{d}\mu(x) < +\infty. The statement follows from the previous theorem applied to the Koopman operator Uf = f\circ T, which satisfies \|Uf\|_2 = \|f\|_2 because T preserves \mu; the limit \bar f is exactly the orthogonal projection of f onto the subspace of T-invariant functions.
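As a concrete illustration of the corollary (my own example, not from the text), take X = [0,1) with Lebesgue measure and the irrational rotation T(x) = x + \alpha \ (\mathrm{mod}\ 1). The rotation is ergodic, so \bar f is the constant \int_0^1 f\,\mathrm{d}x = 0 for f(x) = \cos(2\pi x); the sketch below approximates the L^2 error on a uniform grid, with \alpha, the grid size, and the values of n chosen arbitrarily.

```python
import numpy as np

# Von Neumann's theorem for the irrational rotation T(x) = x + alpha (mod 1)
# on X = [0, 1) with Lebesgue measure.  The rotation is ergodic, so the
# limit bar{f} is the constant  int_0^1 f dx = 0  for f(x) = cos(2*pi*x).
alpha = np.sqrt(2.0) - 1.0                # irrational rotation number (arbitrary)
f = lambda x: np.cos(2.0 * np.pi * x)

x = np.linspace(0.0, 1.0, 2000, endpoint=False)   # grid approximating X

for n in [10, 100, 1000, 10000]:
    avg = np.zeros_like(x)
    for k in range(n):                    # (1/n) * sum_{k=0}^{n-1} f(T^k x)
        avg += f((x + k * alpha) % 1.0)
    avg /= n
    l2_error = np.sqrt(np.mean(avg ** 2)) # grid approximation of ||.||_2
    print(n, l2_error)
```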

Example: Irreducible Markov Chain

Now we consider a special example: an irreducible Markov chain on the Euclidean space \mathbb R^d. We shall employ the mean ergodic theorem to prove the convergence of its time averages in the L^2 sense.

Theorem

Let \{X_n\}_{n\geqslant 0} be a Markov chain in \mathbb R^d with invariant distribution \pi and transition density p(x,y). Suppose that for every x\in\mathbb R^d, p(x,y) is bounded below by a positive constant whenever y ranges over a compact set. Then for any f\in L^2(\pi),

\displaystyle \lim_{n\rightarrow\infty} \frac1n \sum_{k=0}^{n-1} \mathbb E\big[f(X_k)\,\big|\,X_0 = x\big] = \int_{\mathbb R^d} f(y) \pi(y)\,\mathrm{d} y,

where the convergence holds in L^2(\pi) as a function of the initial value x.

Here, the Hilbert space L^2(\pi) means L^2(\mathbb R^d, \mathcal B(\mathbb R^d), \pi), where \mathcal B(\mathbb R^d) is the Borel \sigma-algebra on \mathbb R^d. The irreducibility assumption corresponds to the positivity of the transition density p(x,y). The theorem asserts that, in this averaged (weak) sense, the time average converges to the spatial average.
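The theorem is stated on \mathbb R^d, but the mechanism is easiest to see on a finite state space. In the purely illustrative sketch below, a strictly positive row-stochastic matrix plays the role of the lower-bounded transition density p(x,y), \pi is its invariant distribution, and (P^kf)(x) = \mathbb E[f(X_k)\,|\,X_0=x]; the state-space size, the random seed, and the test function are arbitrary.

```python
import numpy as np

# Finite-state sketch of the Markov-chain ergodic average.  A strictly
# positive row-stochastic matrix P stands in for the transition density
# p(x, y); pi is its invariant distribution, and (P^k f)(x) = E[f(X_k) | X_0 = x].
rng = np.random.default_rng(0)
P = rng.random((5, 5)) + 0.1
P /= P.sum(axis=1, keepdims=True)          # row-stochastic, all entries positive

w, V = np.linalg.eig(P.T)                  # invariant distribution: left
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])   # eigenvector for eigenvalue 1
pi /= pi.sum()

f = rng.random(5)
space_avg = pi @ f                         # the spatial average  int f dpi

for n in [10, 100, 1000]:
    avg = np.zeros(5)
    v = f.copy()
    for _ in range(n):                     # (1/n) * sum_{k=0}^{n-1} P^k f
        avg += v
        v = P @ v
    avg /= n
    err = np.sqrt(pi @ (avg - space_avg) ** 2)   # error in the L^2(pi) norm
    print(n, err)
```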

Proof Introduce the operator P on L^2(\pi) by

\displaystyle (Pf)(x) = \mathbb E\big[f(X_{n+1})\,\big|\,X_n = x\big] = \int_{\mathbb R^d} p(x,y) f(y) \,\mathrm{d} y.

Now we prove two essential properties of P.

(1) P is a contraction in L^2(\pi).

This is a classical result from Markov semigroup theory; see, for example, The geometry of Markov diffusion generators by Bakry. Still, we present a short proof of this property below. For any x\in\mathbb R^d, it is direct to calculate

|P f(x)|^2 = \mathbb E\big[f(x_1) f(x_2): x_1,x_2\sim p(x,\cdot)\big],

where x_1 and x_2 are independent samples from the transition probability distribution p(x,\cdot). Integrating both sides against \pi(x) gives

\displaystyle \begin{aligned} \int_{\mathbb R^d} |P f(x)|^2 \pi(x) \mathrm{d} x & = \mathbb E\big[f(x_1) f(x_2):x_1,x_2\sim p(x,\cdot),x\sim \pi\big] \\ &= \mathbb E\big[f(x_1) f(x_2): (x_1,x_2) \sim m\big]. \end{aligned}

Here, m is a probability distribution in \mathbb R^d\times \mathbb R^d with density

m(x_1,x_2) = \int_{\mathbb R^d} \pi(x) p(x,x_1) p(x,x_2)\mathrm{d} x,~~~~x_1,x_2\in\mathbb R^d.

Clearly, the marginal distributions of m in x_1 and x_2 are both \pi (using the invariance of \pi). Therefore,

\displaystyle \begin{aligned} \int_{\mathbb R^d} |P f(x)|^2 \pi(x) \mathrm{d} x & \leqslant \mathbb E\bigg[\frac{|f(x_1)|^2+|f(x_2)|^2}2:(x_1,x_2)\sim m\bigg] \\ & = \mathbb E[|f(x)|^2:x\sim\pi], \end{aligned}

and we conclude that P is a contraction on L^2(\pi).
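As a quick numerical sanity check of property (1) (again a finite-state stand-in, not part of the proof), one can verify \|Pf\|_{L^2(\pi)}\leqslant \|f\|_{L^2(\pi)} for a few random f:

```python
import numpy as np

# Check that ||P f||_{L^2(pi)} <= ||f||_{L^2(pi)} for a strictly positive
# row-stochastic matrix P with invariant distribution pi.
rng = np.random.default_rng(1)
P = rng.random((5, 5)) + 0.1
P /= P.sum(axis=1, keepdims=True)

w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

norm = lambda g: np.sqrt(pi @ g ** 2)      # the L^2(pi) norm
for _ in range(5):
    f = rng.standard_normal(5)
    print(norm(P @ f), "<=", norm(f))      # the left value never exceeds the right
```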

(2) Every solution of (Pf)(x) = f(x) with f\in L^2(\pi) is constant.

Clearly, if f \equiv c is a constant function, then

\displaystyle (Pf)(x) = \int_{\mathbb R^d} p(x,y) c\mathrm{d} y = c,

which shows that constant functions are indeed solutions. Conversely, suppose some non-constant f satisfies Pf = f; assume for simplicity that f attains its maximum value M at some point x_0, so f(x_0) = M. Then

\displaystyle \int_{\mathbb R^d} p(x_0,y) f(y) \mathrm{d} y = f(x_0) = M

implies

\displaystyle \int_{\mathbb R^d} p(x_0,y) (M - f(y)) \mathrm{d} y = 0.

Since p(x_0,y) is bounded below by a positive constant when y lives in a compact set, and M - f(y)\geqslant 0, the integral can vanish only if f(y) = M for almost every y. This contradicts the assumption that f is non-constant.
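Property (2) has a familiar finite-state analogue: for a transition matrix with strictly positive entries, the eigenspace \{f: Pf = f\} consists of the constant vectors only (a consequence of the Perron–Frobenius theorem). A small numerical check, with an arbitrary random matrix:

```python
import numpy as np

# For a strictly positive row-stochastic matrix, the only right eigenvectors
# with eigenvalue 1 are the constant vectors, and the eigenvalue 1 is simple.
rng = np.random.default_rng(2)
P = rng.random((5, 5)) + 0.1
P /= P.sum(axis=1, keepdims=True)

w, V = np.linalg.eig(P)                    # right eigenvectors of P
f = np.real(V[:, np.argmin(np.abs(w - 1.0))])
print(np.allclose(f, f[0]))                # True: the eigenvector is constant
print(np.sum(np.abs(w - 1.0) < 1e-8))      # 1: the eigenvalue 1 is simple
```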

Using the two properties above, the mean ergodic theorem applied to U = P on \mathcal H = L^2(\pi) immediately gives the result: the invariant subspace \mathcal M consists of the constant functions, and the orthogonal projection of f onto the constants in L^2(\pi) is precisely \int_{\mathbb R^d} f(y)\pi(y)\,\mathrm{d} y.

Extension to Continuous-Time Stochastic Processes

A natural idea is to extend the mean ergodic theorem to continuous-time stochastic processes. There is an intuitive approach to this task: in the discrete-time case, we used f = g - Ug to conclude

\displaystyle \sum_{k=0}^{n-1} U^k f = g- U^ng.

In the continuous-time case, let \mathcal L be the generator of the process, so that e^{t\mathcal L} is its transition semigroup; we then look for some g such that

\displaystyle \int_{0}^t e^{s\mathcal L} f \mathrm{d} s = g - e^{t\mathcal L} g.

Differentiating with respect to t gives

e^{t\mathcal L} f = -e^{t\mathcal L} \mathcal L g \Longrightarrow \mathcal L g = -f.

Therefore, to study the time average corresponding to the function f, we only need to solve the Poisson equation \mathcal L g = -f (note that \mathcal L is usually an elliptic operator).
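Here is a finite-state sketch of this continuous-time picture (my own illustration, with an arbitrary 4-state generator): for a generator matrix Q with invariant distribution \pi, we solve the Poisson equation Qg = -(f - \pi f) (solvable because f - \pi f integrates to zero against \pi) and verify the continuous-time analogue of the telescoping identity, \displaystyle \int_0^t e^{sQ}(f-\pi f)\,\mathrm{d}s = g - e^{tQ}g.

```python
import numpy as np
from scipy.linalg import expm, lstsq

# Finite-state sketch: Q is the generator of a Markov jump process
# (positive off-diagonal rates, zero row sums), pi its invariant
# distribution (pi Q = 0).  We solve the Poisson equation Q g = -(f - pi.f)
# and check   int_0^t e^{sQ}(f - pi.f) ds = g - e^{tQ} g.
rng = np.random.default_rng(3)
Q = rng.random((4, 4))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))        # rows sum to zero

w, V = np.linalg.eig(Q.T)
pi = np.real(V[:, np.argmin(np.abs(w))])   # left null vector of Q
pi /= pi.sum()

f = rng.random(4)
f_centered = f - pi @ f                    # centered so the equation is solvable
g = lstsq(Q, -f_centered)[0]               # Poisson equation  Q g = -(f - pi.f)

t, m = 2.0, 2000                           # time horizon and quadrature steps
s = np.linspace(0.0, t, m + 1)
vals = np.array([expm(si * Q) @ f_centered for si in s])
integral = (t / m) * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

# The difference is zero up to quadrature error.
print(np.linalg.norm(integral - (g - expm(t * Q) @ g)))
```

Any solution of the Poisson equation works here, since shifting g by a constant vector leaves g - e^{tQ}g unchanged.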

The details of this method can be found in Mattingly's prominent paper Convergence of numerical time-averaging and stationary measures via Poisson equations, which also provides an error analysis of numerical discretization methods.

Summary

We established the mean ergodic theorem in Hilbert space and applied it to prove the convergence of the time averages of a Markov chain.
