Tuesday, May 17, 2016

Dirichlet's theorem

There have been some calls for an informal exposition of the recent formal proof of Dirichlet's theorem:
Theorem 9.4.1 (dirith, Dirichlet's theorem): If $N\in\Bbb N$ and $(A,N)=1$, then there are infinitely many primes $p$ such that $N\mid p-A$.
If you have a copy of Shapiro, "Introduction to the Theory of Numbers", open it to p. 375, Theorem 9.4.1, because the proof follows the book closely. Otherwise, I will do my best to explain the steps, as well as a few things that Shapiro glossed over.

The major players in the proof are the von Mangoldt function $\Lambda(n)$, which is defined by $\Lambda(p^\alpha)=\log p$ if $p^\alpha$ is a prime power and $\Lambda(n)=0$ otherwise, the Möbius function $\mu(n)$, which is $1$ if $n$ is a squarefree number with an even number of factors, $-1$ if it is squarefree with an odd number of factors, and $0$ otherwise, and the Dirichlet characters $\chi(n)$, which are completely multiplicative functions on $\Bbb Z/N\Bbb Z$ that are nonzero on the units. Facts about these functions will be introduced as we go along.

The starting point is the following theorem (vmadivsum, Equation 9.2.13): $$\sum_{n\le x}\frac{\Lambda(n)}n=\log x+O(1).$$
Proof: By definition, $\psi(x)=\sum_{n\le x}\Lambda(n)$ (this is the second Chebyshev function), and from chpo1ub $\psi(x)=O(x)$. Also (logfac2), $$\log\lfloor x\rfloor!=\sum_{n\le x}\Lambda(n)\left\lfloor\frac xn\right\rfloor.$$ We can put these together via \begin{align} \sum_{n\le x}\frac{\Lambda(n)}n &= \frac1x\sum_{n\le x}\Lambda(n)\frac xn\\ &=\frac1x\sum_{n\le x}\Lambda(n)\left\lfloor\frac xn\right\rfloor + \frac1x\sum_{n\le x}\Lambda(n)\left\{\frac xn\right\}\\ &=\frac1x\log\lfloor x\rfloor! + \frac1x O(\sum_{n\le x}\Lambda(n))\\ &=\frac1x(x\log x+O(x)) + \frac1x O(\psi(x))\\ &=\log x+O(1). \end{align} (We also used the asymptotic approximation of the factorial (logfaco1), $\log\lfloor x\rfloor!=x\log x+O(x)$ in this equation.) $$\tag*{\blacksquare}$$

Our goal will be to strengthen this to talk about a specific progression $\bmod N$ (we will hold $N$ fixed throughout the proof) to achieve rpvmasum, Equation 9.4.3: $$\phi(N)\sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\log x+O(1).$$ Once we have established this, the main theorem is simple:

Theorem (rplogsum, Equation 9.4.4): $$\phi(N)\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p=\log x+O(1).$$
Proof: (Note that $p$ is always a variable over primes here.) \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n-\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p &= \sum_{\substack{p^\alpha\le x, \alpha>1\\p^\alpha\equiv A}}\frac{\log p}{p^\alpha} \le \sum_{p^\alpha\le x, \alpha>1}\frac{\log p}{p^\alpha}\\ &\le \sum_{p\in\Bbb P}\sum_{\alpha=2}^\infty\frac{\log p}{p^\alpha} = \sum_{p\in\Bbb P}\frac{\log p}{p(p-1)}\\ &\le \sum_{n=2}^\infty\frac{\log n}{n(n-1)} \le \sum_{n=2}^\infty\frac{\sqrt{n-1}}{n(n-1)}\\ &\le \sum_{n=2}^\infty\left(\frac2{\sqrt{n-1}}-\frac2{\sqrt n}\right)=2 \end{align} Thus $\phi(N)\sum_{n\le x,n\equiv A}\frac{\Lambda(n)}n=\log x+O(1)$ implies $\phi(N)\sum_{p\le x,p\equiv A}\frac{\log p}p=\log x+O(1)$. But this implies that $\sum_{p\le x,p\equiv A}\frac{\log p}p$ diverges, and since a finite sum cannot diverge, this means that $\{p\in\Bbb P\mid p\equiv A\pmod N\}$ is infinite. $$\tag*{\blacksquare}$$

We are left with the task of establishing rpvmasum. At this point we turn to Dirichlet characters as a "better basis" from which to weight the sum. This works because of the relation (sum2dchr) $$\sum_{\chi}\bar\chi(A)\chi(n)=\begin{cases}\phi(N)&n\equiv A\\0&o.w.\end{cases}$$ (where the sum is taken over all Dirichlet characters $\chi$). Using this equation we can rewrite the sum as $$\sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\sum_{\chi}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n.$$
There are two cases. The $\log x$ term comes from the principal character $\chi=\chi_0$ (written $\chi=\bf 1$ in Metamath):

Theorem (rpvmasumlem): $$\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1)$$ Proof: \begin{align} \sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n &=\sum_{\substack{n\le x\\(n,N)=1}}\frac{\Lambda(n)}n\\ &=\sum_{n\le x}\frac{\Lambda(n)}n-\sum_{\substack{n\le x\\(n,N)>1}}\frac{\Lambda(n)}n\\ &=\log x+O(1)-\sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \end{align} If $(p^\alpha,N)>1$, then $p$ is a divisor of $N$, and there are only finitely many of those. Thus $$\sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \le\sum_{p\mid N}\sum_{\alpha=1}^\infty\frac{\log p}{p^\alpha} =\sum_{p\mid N}\frac{\log p}{p-1}$$ so $\sum_{n\le x,(n,N)>1}\frac{\Lambda(n)}n=O(1)$ and $\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1)$. $$\tag*{\blacksquare}$$

The other case (the hard case) is when $\chi\ne\chi_0$. Fix a nonprincipal character $\chi$. Our goal is to show (dchrvmasum, Equation 9.4.8) $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=O(1).$$ From this it follows that \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}\bar\chi(A)O(1)\\ &=\log x+O(1), \end{align} which will prove vmadivsum.

Theorem (vmasum, Equation 9.2.4): $$\log n=\sum_{d\mid n}\Lambda(d)$$
Proof: \begin{align} \sum_{d\mid n}\Lambda(d) &=\sum_{p^\alpha\mid n}\log p =\sum_{p}\sum_{\alpha\le\nu_p(n)}\log p\\ &=\sum_{p}\nu_p(n)\log p =\log\prod_{p} p^{\nu_p(n)}=\log n \end{align} where $\nu_p(n)$ (written in Metamath as $(p\mbox{ pCnt }n)$) is the exponent of $p$ in the prime factorization of $n$. $$\tag*{\blacksquare}$$

Using Möbius inversion (muinv) on this equation gives $$\Lambda(n)=\sum_{d\mid n}\mu(d)\log\frac nd,$$ hence (dchrvmasumlem1) \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac nd\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log m\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log m. \end{align}
We have a similar equation for $\log x$ (dchrvmasum2lem): \begin{align} \log x&=\sum_{n\le x}\frac{\chi(n)}n\log\frac xn\sum_{d\mid n}\mu(d)\\ &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac xn\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log\frac x{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac x{dm}. \end{align}
The first step is an application of musum to collapse the double sum to just the $n=1$ term. Combining these two yields the equation $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\log x=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac xd.$$
Our next task is to get an asymptotic expression for the sums $\sum_{m\le z}\frac{\chi(m)}{m}$ and $\sum_{m\le z}\frac{\chi(m)}{m}\log m$ appearing here. To that end, we introduce the following lemma:

Theorem (dchrisum, Theorem 9.4.1): Let $f:\Bbb R^+\to\Bbb R$ be an eventually monotonically decreasing function (that is, for some $M$, $M\le x\le y$ implies $f(x)\ge f(y)$) which converges to $0$. Then $\sum_n\chi(n)f(n)$ converges to some $L$, and for some $c$, $$x\ge M\implies|\sum_{n\le x}\chi(n)f(n)-L|\le cf(x).$$
Proof: It suffices to show that $\sum_{a\le n<b}\chi(n)f(n)=O(f(x))$ (for positive integers $a,b$ with $x\le a\le b$), because then the sequence is Cauchy and so converges to a limit $L$, and since the entire sequence above $x$ is inside the closed set $\{z\in\Bbb C\mid|\sum_{n\le x}\chi(n)f(n)-z|\le f(x)\}$, so is the limit.

Let $R(k)=\sum_{n=0}^{k-1}\chi(n)$. Then $R(N)$ sums over all the equivalence classes of $\Bbb Z/N\Bbb Z$ exactly once, so $R(N)=\sum_{i\in\Bbb Z/N\Bbb Z}\chi(i)$, and since $\chi$ is nonprincipal, by dchrsum $R(N)=0$. Let $C\ge|R(k)|$ for all $0\le k<N$. (This is a finite set, so there is a finite supremum.) Then setting $k=aN+b$ where $b<N$, we have \begin{align} R(k)&=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(iN+n)+\sum_{n=0}^{b-1}\chi(aN+n)\\ &=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(n)+\sum_{n=0}^{b-1}\chi(n)\\ &=\sum_{i=0}^{a-1}R(N)+R(b)=R(b), \end{align} and $C\ge|R(b)|$; hence $C\ge|R(k)|$ for all $k$.

Using summation by parts, \begin{align} |\sum_{a\le n<b}\chi(n)f(n)| &=|\sum_{a\le n<b}(R(n+1)-R(n))f(n)|\\ &=|R(b)f(b)-R(a)f(a)-\sum_{a\le n<b}R(n+1)(f(n+1)-f(n))|\\ &\le Cf(b)+Cf(a)+\sum_{a\le n<b}C|f(n+1)-f(n)|\\ &=Cf(b)+Cf(a)+\sum_{a\le n<b}C(f(n)-f(n+1))\\ &=Cf(b)+Cf(a)+C(f(a)-f(b))\le 2Cf(x). \end{align} $$\tag*{\blacksquare}$$

With dchrisum in hand we can apply it to the decreasing functions $f(x)=\frac{\log x}x$ and $f(x)=\frac1x$; let $$L_0=\sum_{n=1}^\infty\frac{\chi(n)}n\qquad L_1=\sum_{n=1}^\infty\frac{\chi(n)\log n}n$$ be the limits of the series that result.

Theorem (dchrmusum2, Lemma 9.4.2): $$L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n=O(1).$$
Proof: We have: \begin{align} 1&=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}m \end{align}
dchrisum provides that $|\sum_{m\le z}\frac{\chi(m)}m-L_0|\le\frac Cz$ for some $C$ and all $z$. Then: \begin{align} |L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n| &\le1+|1-L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n|\\ &=1+|\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}(\sum_{m\le x/d}\frac{\chi(m)}m-L_0)|\\ &\le1+\sum_{d\le x}\frac1d|\sum_{m\le x/d}\frac{\chi(m)}m-L_0|\\ &\le1+\sum_{d\le x}\frac Cx\le 1+C. \end{align} $$\tag*{\blacksquare}$$

We will be doing a lot of case analysis on $L_0=0$, so let the notation $\left[a\atop b\right]$ denote $$\left[a\atop b\right]=\begin{cases}a&L_0=0\\b&L_0\ne0\end{cases}.$$
Theorem (dchrvmasumif, Equation 9.4.24): $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=-\left[\log x\atop 0\right]+O(1).$$
Proof: Earlier, we showed that $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right]= \sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\left[x/d\atop m\right].$$ We will show that this latter sum is bounded. Let $g(x)=\sum_{n\le x}\frac{\chi(n)}n\log\left[x\atop n\right]$, so that the sum in question is $\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}g(x/d)$. Let us show that $$g(x)=\left[0\atop L_1\right]+O\left(\frac{\log x}x\right).$$ In the case $L_0=0$, we have $$g(x)=\sum_{n\le x}\frac{\chi(n)}n\log x=\left(L_0+O\left(\frac1x\right)\right)\log x=O\left(\frac{\log x}x\right),$$ and if $L_0\ne0$, we have $$g(x)=\sum_{n\le x}\frac{\chi(n)\log n}n=L_1+O\left(\frac{\log x}x\right).$$ Since $\left[0\atop L_1/L_0\right]L_0=\left[0\atop L_1\right]$ and $L_0\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1)$, we also have $\left[0\atop L_1\right]\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1)$. \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right] &=\sum_{d\le x}\frac{\chi(d)\mu(d)}dg(x/d)\\ &=O(1)+\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right) \end{align}
Now suppose the big-O constants take $x\ge A\implies \left|g(x)-\left[0\atop L_1\right]\right|\le C\frac{\log x}x$, with $A\ge1$, and set $D=\sum_{n<A}\frac{\log A}n$. Then $x<A$ implies \begin{align} |g(x)|&\le\sum_{n\le x}\frac1n\left|\log\left[x\atop n\right]\right|\\ &\le\sum_{n\le x}\frac1n\log A\le\sum_{n<A}\frac1n\log A=D\\ \end{align} Then: \begin{align} \left|\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right)\right| &\le\sum_{d\le x}\frac1d\left|g(x/d)-\left[0\atop L_1\right]\right|\\ &\le\sum_{d\le x/A}\frac Cd\frac{\log(x/d)}{x/d}+\sum_{x/A<d\le x}\frac Dd\\ &\le\frac Cx\sum_{d\le x}(\log x-\log d)+D(\log x-\log(x/A)+1)\\ &\le\frac Cx(x\log x-\log\lfloor x\rfloor!)+D(\log A+1) \end{align} which is bounded using $\log\lfloor x\rfloor!=x\log x+O(x)$. $$\tag*{\blacksquare}$$

We're almost there: if we can prove $L_0\ne0$ unconditionally, dchrvmasumif will give us what we want. Replay the derivation of rpvmasum, with $A=1$ so that the $\bar\chi(A)$ factors disappear: \begin{align} \sum_{\substack{n\le x\\n\equiv 1}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}(-\left[\log x\atop 0\right]+O(1))\\ &=(1-|W|)\log x+O(1), \end{align} where $W$ is the set of all $\chi\ne\chi_0$ such that $L_0(\chi)\ne 0$. Since $\sum_{n\le x,n\equiv 1}\frac{\Lambda(n)}n$ is nonnegative, we have a contradiction if $|W|>1$ (because the $(1-|W|)\log x+O(1)$ will eventually be negative). Thus there is at most one $\chi$ such that $L_0(\chi)=0$. Now fix $\chi$ again and assume $L_0=0$. Because $\bar\chi$ is also a Dirichlet character, and $L_0(\bar\chi)=\overline{L_0(\chi)}=0$, we must have $\bar\chi=\chi$, so $\chi$ is a real Dirichlet character (dchrisum0re).

Now let $F(n)=\sum_{d\mid n}\chi(d)$. Because a divisor sum of a multiplicative function is multiplicative (fsumdvdsmul), $F$ is multiplicative (dchrisum0fmul). (Unlike $\chi$ it is not completely multiplicative.)

Theorem (dchrisum0flb, Equation 9.4.29): $$F(n)\ge\begin{cases}1&\sqrt n\in\Bbb N\\0&o.w.\end{cases}.$$
Proof: We begin by establishing the theorem for prime powers $p^\alpha$, $\alpha\ge0$. In this case $\sqrt{p^\alpha}\in\Bbb N$ iff $\alpha$ is even. Now $$F(p^\alpha)=\sum_{d\mid p^\alpha}\chi(d)=\sum_{k\le\alpha}\chi(p^k)=\sum_{k\le\alpha}\chi(p)^k,$$ and since $\chi$ is a real Dirichlet character it can only take the values $-1,0,1$. If $\chi(p)=1$, then $F(p^\alpha)=\alpha+1\ge1$; if $\chi(p)=0$, then $F(p^\alpha)=1\ge1$; and if $\chi(p)=-1$ then $$F(p^\alpha)=\sum_{k\le\alpha}(-1)^k=\frac{1+(-1)^{\alpha}}2$$ which is $1$ when $\alpha$ is even and $0$ if it is odd. Thus in any case the theorem is proven for prime powers.

Now to cover the general case, by complete induction. If $n=1$ then it is a prime power so the theorem is proven; otherwise let $p$ be a prime divisor of $n$, and let $n=p^\alpha k$ where $p\nmid k$, so $F(n)=F(p^\alpha)F(k)$. We have proven that $F(p^\alpha)$ satisfies the theorem, and $F(k)$ satisfies the theorem by the induction hypothesis, because $k<n$. The product of two nonnegatives is nonnegative, so $F(n)\ge0$. If $n$ is a square then $\nu_p(n)=\alpha$ is even, so $p^\alpha$ is a square, and so $k$ is a quotient of squares, which is thus also a square. Thus $F(p^\alpha),F(k)\ge1$, so $F(n)\ge1$. $$\tag*{\blacksquare}$$

Therefore, $$\sum_{n\le x}\frac{F(n)}{\sqrt n}\ge\sum_{m^2\le x}\frac{F(m^2)}{m}\ge\sum_{m\le\sqrt x}\frac1m\ge\frac12\log x,$$ so $\sum_{n\le x}\frac{F(n)}{\sqrt n}$ is unbounded.

Theorem (dchrisumn0, Lemma 9.4.4): $L_0\ne0$.
Proof: We will show that $L_0=0$ implies $\sum_{n\le x}\frac{F(n)}{\sqrt n}$ is bounded. We have: \begin{align} \sum_{n\le x}\frac{F(n)}{\sqrt n} &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(d)}{\sqrt n}\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le\sqrt x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le\sqrt x}\sum_{\sqrt x<d\le x/m}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le y}\sum_{m\le y^2/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}} \end{align} (changing variables $y\mapsto\sqrt x$ in the last step). Using dchrisum once more, we obtain $L_2$ such that $|\sum_{n\le x}\frac{\chi(d)}{\sqrt d}-L_2|\le\frac C{\sqrt x}$. Then \begin{align} |\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}| &\le\frac{2C}{\sqrt y}\sum_{m\le y}\frac1{\sqrt m}. \end{align} We need here the asymptotic estimate divsqrsum for $\sum_{m\le x}\frac1{\sqrt m}=2\sqrt x+c+O(x^{-1/2})$, so that we get $$|\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}|\le4C+O(\frac1{\sqrt y})=O(1).$$

For the other sum, we can write \begin{align} \sum_{d\le y}\frac{\chi(d)}{\sqrt d}\sum_{m\le y^2/d}\frac1{\sqrt m} &=\sum_{d\le y}\frac{\chi(d)}{\sqrt d}\left(2\frac{y}{\sqrt d}+c+O(\frac{\sqrt d}{y})\right)\\ &=2y\sum_{d\le y}\frac{\chi(d)}{d}+c\sum_{d\le y}\frac{\chi(d)}{\sqrt d}+O(1)\\ &=2y(L_0+O(\frac1y))+c(L_2+O(\frac1{\sqrt y}))+O(1)=2yL_0+O(1)=O(1). \end{align}
Thus $\sum_{n\le x}\frac{F(n)}{\sqrt n}=O(1)$, a contradiction. Thus our assumption $L_0=0$ was false, so $L_0\ne0$. $$\tag*{\blacksquare}$$ Then dchrvmasumif simplifies to dchrvmasum, and we are done. $$\tag*{\blacksquare}$$

Wednesday, September 24, 2014

Mathbox update: Functions of Number Theory

I should probably apologize for having nothing but mathbox updates on this blog of late, but this is the only type of post I release regularly, in conjunction with the actual theorems being uploaded to the Metamath site. (I try to make up for it by making frequent, hopefully enlightening remarks in the list below so it's not just a list.) This update is primarily concerned with preparations for an eventual proof of the Prime Number Theorem, and actual results include the Chebyshev bound on the prime pi function $a\frac x{\log x}\le\pi(x)\le A\frac x{\log x}$, which is a weak form of the PNT (which shows that this is true for any $a<1<A$). We also define several number-theoretic functions, and formalize the notions $f\to a$ and $f\in O(1)$ for real functions (we already have a limit notion $(a_n)\to a$, but it only works on sequences).
• df-rlim: Define the limit predicate $f(x)\to a$ as $x\to\infty$. It is defined so that it will work on sequences defined on an upper integer set or on real functions; we say that $f\to a$ if $f$ is a partial function $\Bbb R\to\Bbb C$ and for all $\epsilon>0$ there exists an $M\in\Bbb R$ such that for all $x\ge M$ in the domain of $f$, $|f(x)-a|<\epsilon$. Note that if the domain of $f$ is upper bounded, then the rest is vacuously true, so $f\to a$ for every $a\in\Bbb C$ in this case. (But if the domain of $f$ is not upper bounded, then $f\to a$ for at most one $a$, as we expect.)
• df-o1: Define the set of eventually bounded functions, that is, $f\in O(1)$ iff there exists an $c,M$ such that $|f(x)|\le M$ for all $x\ge c$. The notation $O(1)$ is meant to be suggestive of Bachmann-Landau notation, and indeed it matches the notation for $f=O(1)$ (although of course we use $\in$ in place of the misleading $=$ in big-O notation). However, we do not define the complete hierarchy of notations, as $O(1)$ and the $\to 0$ predicate of df-rlim is sufficient to unify all of these, in the following manner:
$$f(x)=O(g(x))\iff g(x)=\Omega(f(x))\iff\frac{f(x)}{g(x)}\in O(1)\\ f(x)=o(g(x))\iff g(x)=\omega(f(x))\iff\frac{f(x)}{g(x)}\to0\\ f(x)=\Theta(g(x))\iff\frac{f(x)}{g(x)},\frac{g(x)}{f(x)}\in O(1)\\ f(x)\sim g(x)\iff\frac{f(x)}{g(x)}\to1$$
Thus chebbnd1, chebbnd2 together make the statement $\pi(x)\in\Theta(x/\log x)$.

Since usually the functions $g(x)$ we use in this sense are eventually non-zero, the equivalences above are exact. For functions that are not eventually non-zero, the usual definition of the statement $f(x)=O(g(x))$ also requires that $f(x)=0$ wherever $g(x)=0$ for sufficiently large $x$, while $f/g\in O(1)$ leaves $f$ unconstrained whenever $g(x)=0$ because in these cases $f(x)/g(x)$ is undefined. I don't see that this edge case is very useful, though, so the discrepancy is not enough to be relevant. (If anyone has a good example of a use for the notation $f(x)=O(g(x))$ or $f(x)=o(g(x))$ where $g$ is not eventually nonzero, I'm all ears.)
• o1const, rlimconst: The constant function $f(x)=a$ satisfies $f\in O(1)$ and $f\to a$.
• rlimclim: For functions $f:\Bbb Z^{\ge n}\to\Bbb C$, $f\to_{\Bbb Z}a\iff f\to_r a$, i.e. the old $\Bbb Z$-limit predicate df-clim and the new df-rlim are equivalent. Since df-clim applies to non-functions, but $f\to_{\Bbb Z} a\iff (x\in\Bbb N\mapsto f(x))\to_{\Bbb Z} a$, i.e. we can replace $f$ with a function that takes the same values and has the same behavior under $\to_{\Bbb Z}$, it follows that df-rlim is a generalization of df-clim.
• rlimres, o1res: If $f\to a$, then $f\upharpoonright X\to a$, that is, $f$ restricted to any subset $X$ still has limit $a$. Similarly, $f\in O(1)$ implies $f\upharpoonright X\in O(1)$. The unconstrained nature of this restriction may seem peculiar, since then it seems that even the empty function has limit $a$, but considering the note about upper bounded domain implying that $f$ takes every value as a limit, this should be no surprise.
• rlimresb: If $X=[b,\infty)$ for some $b\in\Bbb R$, then $f\to a\iff f\upharpoonright X\to a$. In other words, the limit of $f$ is not affected by its values on any upper bounded set.
• rlimuni: Since as pointed out above, $f$ often fails to have a unique limit, we may ask under what conditions we are ensured that if a limit exists it is unique. The correct answer is that the limit is unique iff the domain of $f$ has no upper bound, but here we prove the weaker assertion (which is sufficient for most applications) that the limit is unique if the domain of $f$ contains an upper integer set.
• rlimcn1, rlimcn2: One of the most important properties of continuous functions (almost a definitional property) is that it preserves limits, that is, if $f$ is continuous and $g\to a$ then $f\circ g\to f(a)$. Here we derive this for one and two-dimensional continuous functions. (This is a bit early in the development to be worrying about two-dimensional continuity, but we need it for the important continuous functions $+,-,\cdot$.)
• reccn2: Previously a lemma for the much later divcn, here we show that the reciprocal function is continuous (away from zero).
• o1of2: The property that characterizes $O(1)$-preserving functions is a bit different from continuity, and I don't know if it has a name: For all $\delta\in\Bbb R$, there exists an $\epsilon$ such that for all $x$, $|x|\le\delta\to|f(x)|\le\epsilon$. (Note that the roles of $\delta$ and $\epsilon$ is reversed from the usual definition of a limit.) The most nontrivial example of this I have seen so far is the function $f(x)=x^a$ where $a\in\Bbb C$ with $\Re a\ge0$; in this case we have the complicated bound $|x|\le\delta\to|x^a|\le\delta^{\Re a}e^{|\Im a|\pi}$. Note that $x^a$ is not even a continuous function due to the branch cut. I believe that any continuous function is also $O(1)$-preserving, but the proof seems to need the extreme value theorem and so I would not call it an "easy" theorem. $+,-,\cdot$ are all $O(1)$-preserving, of course (o1add, o1sub, o1mul), but division is not.
• rlimo1: If $f\to a$, then $f\in O(1)$. This encompasses the rules
• $f(x)=o(g(x))\implies f(x)=O(g(x))$,
• $f(x)\sim g(x)\implies f(x)=O(g(x))$
as well as a few others. Similarly, the rule $f,g\in O(1)\implies f\cdot g\in O(1)$ encompasses the rules
• $f(x)=O(g(x)),\, g(x)=O(h(x))\implies f(x)=O(h(x)),$
• $f_1(x)=O(g_1(x)),\, f_2(x)=O(g_2(x))\implies f_1(x)f_2(x)=O(g_1(x)g_2(x)),$
• $f(x)=O(g(x))\implies cf(x)=O(g(x)),$
and the sum rule implies $f_1(x),f_2(x)\in O(g(x))\implies f_1(x)+f_2(x)\in O(g(x))$. This phenomenon where many rules are expressed as special cases of one rule when written using a more restricted notation is why I support the approach taken in Metamath where multiple notations for one concept are eliminated as much as possible. For example, in Metamath there is a less-than sign $<$, but no greater-than sign, because any formula of the form $y>x$ can be rewritten as $x<y$ and it means that we don't need separate theorems dealing with transitivity of greater-than as well as those with less-than. (Even the use of $x\le y$ instead of $\lnot(y<x)$ is a controlled concession, although in this case it is used in enough contexts and helps in reading enough that it is considered worthwhile to define.)
• o1rlimmul: If $f\in O(1)$ and $g\to0$, then $f\cdot g\to0$. This expresses the rule $f(x)=O(g(x)),g(x)=o(h(x))\implies f(x)=o(h(x))$. Of course this is not true for limits other than $0$; if $f\in O(1)$ and $g\to a\ne0$, then $f\cdot g$ may not have a limit at all. It is still bounded, though; but this already follows from rlimo1 and o1mul.
• rlimadd, rlimsub, rlimmul, rlimdiv: I'm sure at this point it is no surprise that we have the theorems $f\to a,g\to b$ implies $f+g\to a+b$,$f-g\to a-b$,$f\cdot g\to a\cdot b$, and (if $g(x)\ne 0$ and $b\ne0$) $f/g\to a/b$. The last one is new, however - we had no $\to_{\Bbb Z}$-equivalent for division of limits before this point.
• divrcnv: The most basic nontrivial convergent function: if $f(x)=\frac ax$, then $f\to0$.

• logimcl: The imaginary part of $\log x$ is defined to satisfy $-\pi<\Im[\log x]\le\pi$. This, and the property $e^{\log x}=x$ (eflog), is sufficient to uniquely define the number $\log x$, and is generally needed when making arguments involving the complex logarithm function (the situation is a bit simpler for the real logarithm, which has no branch cut issues).
• lognegb: Any number with argument $\pi$ (recall that $\operatorname{Arg}z:=\Im[\log z]$) is a negative real number and vice-versa.
• relog: The dual equation to absef, which is nice for having no branch cut restrictions, unlike many other $\log$ formulas: $\Re[\log z]=\log|z|$.
• rplogcl: Common special case of logltb: if $x>1$, then $\log x>0$.
• angval: Define a function $\angle(x,y)$ which measures the angle of two vectors in the plane, using the complex numbers as a convenient proxy for $\Bbb R^2$. We define $\angle(x,y)=\operatorname{Arg}(y/x)=\Im[\log(y/x)]$. Note that $x,y$ are nonzero complex numbers, not real numbers. This measures the angle from the line $\overline{0x}$ to $\overline{0y}$, as a number in $(-\pi,\pi]$. A positive number indicates that $y$ is counterclockwise ahead of $x$, and the function is antisymmetric as long as $x,y$ are not $180^\circ$ from each other. For example, $\angle(1,1)=0$, $\angle(1,i)=\angle(i,-1)=90^\circ$, $\angle(i,1)=-90^\circ$. (I'm using degrees because I want to treat this as a tool for geometry, not complex analysis.)
• ang180: Let $a,b,c$ be three distinct points in the plane. Then they form the vertices of a (possibly degenerate) triangle, and the interior angles of the triangle are respectively $\angle(c-b,a-b)$, $\angle(a-c,b-c)$, $\angle(b-a,c-a)$. Under these circumstances, this theorem states that the sum of these angles is $\pm180^\circ$, where the sign choice depends on whether the three points are labeled in clockwise or counterclockwise order. This is actually one of the Metamath 100 theorems, so now this makes the total count 38 theorems. The proof is more intricate than I expected it to be, although it's not too hard to understand at a high level.
• abscxp, abscxp2, abscxpbnd: I was disappointed to discover there is no nice formula for $|a^b|$ in the general case. If $a$ is positive real, then $|a^b|=a^{\Re b}$; and if $b$ is real, then $|a^b|=|a|^b$. If both $a,b$ are complex, then $|a^b|=|a|^{\Re b}e^{-\operatorname{Arg}a\cdot\Im b}$; as a bound we can simplify this to $|a^b|\le |a|^{\Re b}e^{|\Im b|\pi}$ or $|a^b|\le |a|^{\Re b}e^{|b|\pi}$ (which is the form we quoted earlier).
• cxplim: In this section we establish the basic facts about the order of growth of powers, logarithms, and exponentials. If $a<0$ then $x^a\to0$ as $x\to\infty$.
• rlimcxp, o1cxp: If $f(x)\to0$ and $a>0$, then $f(x)^a\to0$, and similarly $f(x)\in O(1)$ and $\Re a\ge0$ imply $f(x)^a\in O(1)$.
• cxp2limlem: If $b>1$ then $x\in o(b^x)$, i.e. $x/b^x\to 0$.
• cxp2lim: If $a\in\Bbb R$ and $b>1$ then $x^a\in o(b^x)$, i.e. $x^a/b^x\to 0$. In words, any increasing exponential function eventually dominates every polynomial.
• cxploglim: If $a>0$, then $\log x\in o(x^a)$ - a logarithm grows slower than any polynomial.
• df-cht: Define the first Chebyshev function, $\theta(x)=\sum_{p\le x}\log p$, where the sum is only over primes.
• df-ppi: Define the prime $\pi$ function, $\pi(x)=\sum_{p\le x}1=|[0,x]\cap\Bbb P|$.
• df-mmu: Define the Möbius function, $\mu(x)=0$ if $x$ is not squarefree and $\mu(x)=(-1)^n$ where $n$ is the number of prime factors of $x$ otherwise.
• df-sgm: Define the divisor function, $\sigma(x)=$ the number of divisors of $x$.
• efchtcl: The Chebyshev function is closed in the log-integers, i.e. $e^{\theta(x)}\in\Bbb N$. (This is one of the most unusual sets I have applied fsumcllem to, showing that the set $\{x\in\Bbb R:e^x\in\Bbb N\}$ is closed under addition and zero as a consequence of the natural numbers being closed under multiplication and one.)
• efchtdvds: The exponential Chebyshev function is actually the same as the primorial, $e^{\theta(x)}=x\#=\prod_{p\le x}p$, so it's not surprising that it shares all the properties of the primorial, like this one, that says that $e^{\theta(x)}|e^{\theta(y)}$ whenever $x\le y$.
• ppifl: The Chebyshev and prime $\pi$ functions are actually functions on the reals, not the integers; we extend them to the reals by means of this theorem, $\pi(x)=\pi(\lfloor x\rfloor)$.
• ppip1le: A local version of ppiltx, showing that $\pi(x+1)\le\pi(x)+1$. The obvious extension of this formula, $\pi(x+n)\le\pi(x)+n$, is only true when $n$ is an integer, because if it was true for all $n\in\Bbb R$ then $\pi$ would have to be continuous, and of course it has jump discontinuities at the primes.
• ppiltx: A simple cardinality argument shows $\pi(x)<x$.
• prmorcht: As remarked earlier, the primorial is the same as the exponential of the Chebyshev function, $e^{\theta(x)}=x\#$. Later, in the proof of bpos, we will make a careful analysis of the primorial function in order to establish a bound; now this proof has been converted to show a bound on $\theta(x)$ instead.
• ppiub: This theorem used to be called prmpiub, and stated as $|\Bbb P\cap\{1\dots n\}|\le\frac n3+2$ and used as part of the proof of bpos. Now that we have defined a function for this, we can just write $\pi(x)\le\frac x3+2$, although the content hasn't changed much beyond being extended to the reals.
• chtleppi: A simple sum inequality shows $$\theta(x)=\sum_{p\le x}\log p\le\sum_{p\le x}\log x=\pi(x)\log x.$$
• chtub: Previously called prmorub and stated as $\prod_{p\le n}p<2^{2n-3}$, now we can write $\theta(x)<(2x-3)\log 2$ and extend the domain to all reals greater than $2$.
• chebbnd1: The largest original contribution in this update aside from ang180, we show that $\pi(x)\in\Omega(x/\log x)$ by borrowing a lot of lemmas that were used for bpos. We have $4^n/n<{2n\choose n}=\prod_pp^{e_p}$, where the exponents $e_p=\#_p{2n\choose n}$ of the product all satisfy $p^{e_p}\le 2n$, by bposlem1. Then no prime larger than $2n$ enters the product, and each term is bounded, so the largest that sum can be is $(2n)^{\pi(2n)}$, and after taking logs you get $\pi(2n)\frac{\log(2n)}{2n}>\frac{\log 4}2-\frac{\log n}{2n}\ge\frac{\log 4}2-\frac1{2e}$, and with a little relaxing of the constants you can extend this to all real numbers, so in the end you get that $\pi(x)\in\Omega(x/\log x)$. The other inequality goes via $\theta(x)$, as we will see.
• chtppilim: Here we show that $\theta(x)\sim\pi(x)\log x$. One inequality, chtleppi, is easy and mentioned above; the other direction fixes $0<a<1$ and uses the trick $$\theta(x)\ge\sum_{x^a<p\le x}\log p\ge\sum_{x^a<p\le x}a\log x\ge a\log x(\pi(x)-x^a),$$ and since $x^a\in o(x/\log x)$ and $x/\log x\in O(\pi(x))$, we can make $x^a\le (1-a)\pi(x)$ and then $\theta(x)\ge a^2\log x\,\pi(x)$; now since $a<1$ was arbitrary, we have the desired statement $\frac{\theta(x)}{\pi(x)\log x}\to1$.
• chto1ub: A simple consequence of chtub: Since $\theta(x)<(2x-3)\log 2<2x\log 2$, it immediately follows that $\theta(x)\in O(x)$. The PNT will strengthen this to $\theta(x)/x\to1$.
• chebbnd2: The other direction of chebbnd1, showing $\pi(x)\in O(x/\log x)$, by leveraging chto1ub and chtppilim: since $\theta(x)\in O(x)$ and $\theta(x)\sim\pi(x)\log x$, $\pi(x)\log x\in O(x)$, or $\pi(x)\in O(x/\log x)$.

Monday, September 15, 2014

Mathbox update: Fundamental Theorem of Algebra

The main theorems in this short update are Van der Waerden's theorem (already discussed in a previous post) and the Fundamental Theorem of Algebra. This last one is one of the metamath 100, and the proof was surprisingly straightforward. The hardest part was the first step, to show that a closed disk in $\Bbb C$ is compact via the Heine-Borel theorem; then it is just a matter of verifying some explicit bounds. By the way, this proof is based on the first "topological proof" on Wikipedia.
• vdw: Van der Waerden's theorem: for any $k$ and finite set $R$ of colors, there is an $N$ such that any $R$-coloring of $\{1\dots N\}$ contains a length-$k$ monochromatic arithmetic progression. See also the previous post I wrote on this theorem.
• vdwnn: Infinitary Van der Waerden's theorem: for any finite coloring of $\Bbb N$, there is a color that contains arbitrarily long arithmetic progressions.
• txcmp, txcmpb: The topological product of two nonempty spaces is compact iff both factors are compact.
• hmeores: The restriction of a homeomorphism of topological spaces is a homeomorphism on the induced subspace topologies.
• cnheibor: Heine-Borel theorem for the complex numbers: A subset of $\Bbb C$ is compact iff it is closed and bounded. Note that we already have the more general heibor (a metric space is compact iff it is complete and totally bounded), but this direct proof avoids the axiom of countable choice.
• dgrco: The degree of a composition of polynomials is the product of the degrees.
• fta: The Fundamental Theorem of Algebra: Every non-constant polynomial has a root.

Thursday, September 11, 2014

Mathbox update: Ostrowski's theorem

Hello all,

This update has two main foci: the rounding off of the integration theorems with the integration by parts and integration by substitution formulas, and an unrelated work to formalize the concept of an absolute value over a ring (in the algebra sense) and prove Ostrowski's theorem. I dare say that the real analysis section is finally "open for business" now that most of the workhorse theorems are now available for use. On to the list:
• suplub2, suprleub, infmxrgelb, ...: All of the supremum theorems of the form $\forall x \in A: x\le B\to\sup A\le B$ or $B<\sup A\to\exists x \in A: B<x$ have now been made bidirectional. I find this interesting, because it means that one can characterize "less than supremum" exactly in terms of a conjunction of individual inequalities. Another way to think about it is to consider the sets $(-\infty,x)$ for $x\in\Bbb R$. These are open sets, so any union of them is also open; moreover, they are all convex and unbounded below, and it's not hard to see that such sets are also closed under unions. Thus any nonempty union $\bigcup_{x\in A}(-\infty,x)$ is an open, convex, unbounded-below set, and the only such sets on the reals are those of the form $(\infty,x)$ or $(-\infty,\infty)$, since $\Bbb R$ is complete. If $A$ is bounded above, then the last case is not possible, and so we can define $\sup A$ as the upper endpoint of the union: $\bigcup_{x\in A}(-\infty,x)=(-\infty,\sup A)$.

Note that a similar characterization with less-or-equal is not possible, because the infinite union of closed intervals is not necessarily closed.
• limsupval, etc.: The definition of the limit supremum already existed, but there were no theorems for working with it beyond the bare minimum. I added the basics, with the most nontrivial theorems being limsupval2 (the limsup is not affected by any finite prefix) and limsupgre (if a sequence of real numbers has a bounded limit supremum, then the quantities $\sup_{n>k}f(n)$ appearing in the definition of the limsup are all bounded as well).
• caurcvg: The original proof showed that if $f$ is a Cauchy sequence, then it converges to the supremum of the set $\{x:\exists j\forall k\ge j:x<f(k)\}$. Since this set is so similar to the limit supremum set, I modified the proof to show that it converges to the limit supremum instead (which then yields the corollary that any convergent sequence converges to the limit supremum of the sequence).
• pcqmul, pczdvds, pcid: generalize some proofs about the prime count function from integers to rational numbers
• pcadd: To show that $p^{-\#_p(r)}$ is an ultrametric absolute value on the rational numbers, you need two main facts about the prime count function $\#_p(r)$ (which is defined so that $p^{\#_p(r)}$ is the highest power of $p$ that divides the integer or rational number $r$): the multiplicative property pcqmul ($\#_p(ab)=\#_p(a)\#_p(b)$), and this theorem, which says that $\#_p(a+b)\ge \min(\#_p(a),\#_p(b))$. When you exponentiate both sides, this gives $p^{-\#_p(a+b)}\le\max(p^{-\#_p(a)},p^{-\#_p(b)})$, which is the ultrametric inequality. The proof is not as easy as you might think.
• df-abv: Given a ring $R$, an absolute value on $R$ is a function $f:R\to\Bbb R$ such that for all $x,y\in R$:
• $f(x)\ge 0$
• $f(x)=0\iff x=0_R$
• $f(x\cdot_R y)=f(x)f(y)$
• $f(x+_R y)\le f(x)+f(y)$
This definition is of course inspired by the standard absolute value function $|x|$ on $\Bbb C$; and all except the third axiom are very reminiscent of the analogous axioms for a metric. (Below I will denote an absolute value using the $|x|$ notation as is customary.) Other interesting simple properties of absolute values include $|0|=0$, $|1|=1$ (unless $1=0$), $\left|-x\right|=|x|$, and $|x/y|=|x|/|y|$.
• abvdom: If $R$ is a ring with an absolute value, then $R$ is actually a domain, which is to say, $xy=0$ implies $x=0\vee y=0$.
• abvtriv: As a converse to abvdom, if $R$ is a domain, then it has an absolute value, namely the trivial absolute value defined by $|x|=0$ if $x=0$ and $|x|=1$ otherwise. (Since we don't have a good notation for domain, we strengthen the hypotheses to say that $R$ is a division ring instead. However, this theorem now provides the alternative notation ${\rm AbsVal}(R)\ne\emptyset$ in order to specify that $R$ is a non-commutative domain.)
• cncfmpt1f, cncfmpt2f: These theorems are the analogues of cnmpt11f and cnmpt12f, respectively, specialized to continuous functions on the complex numbers (rather than continuous functions in arbitrary topological spaces). We still need the topological continuity predicate for the two-arg version, though, because the version on $\Bbb C$ is designed only for functions $f:\Bbb C\to\Bbb C$ in one variable, which misses the common use-case $h(x)=R(f(x),g(x))$ where $R:\Bbb C\times\Bbb C\to\Bbb C$ is continuous in the product topology on $\Bbb C\times\Bbb C$ (particularly when $R$ is one of $+,\cdot,-$).
• mbfsub: The difference of two measurable functions is measurable.
• mbfsup, mbfinf, mbflimsup, mbflim: The supremum, infimum, limit supremum, and limit of a sequence of measurable functions is measurable. The main reason for developing limsup in the first place is because it allows a limit to be written in terms of a supremum and infimum, which are the basic measurability-preserving operations we have to work with.
• mbfi1flim: A real measurable function is the pointwise limit of a sequence of simple functions. (The restriction to real functions is only because in our definition simple functions have to be real, so that they can be compared.)
• mbfmul: The pointwise product of two measurable functions is measurable. The proof is very general; you just take two sequences of simple functions converging to $f$ and $g$ and multiply them pointwise to get a sequence of simple functions that converges to $fg$; then $fg$ is a limit of simple functions and so is measurable. I think it could easily be adapted to show that $h(f(x),g(x))$ is a measurable function whenever $h$ is jointly continuous and $f,g$ are measurable.
• bddmulibl, bddibl: A bounded function is integrable, and a bounded function times an integrable function is also integrable. (This also yields a shorter proof for cniccibl.)
• dvcn: A function differentiable on its entire domain is continuous.
• ftc2ditg: Restate ftc2 for directed integrals, so that we can drop the ordering of the endpoints.
• itgparts: Integration by parts. If $A:[x,y]\to\Bbb C$ and $C:[x,y]\to\Bbb C$ are continuous functions, $A':(x,y)\to\Bbb C$ and $C':(x,y)\to\Bbb C$ are defined and continuous (so $A,C$ are differentiable on $(x,y)$), and $A\cdot C'$ and $A'\cdot C$ are integrable on $[x,y]$, then $$\int_x^yA(t)C'(t)\,dt=(A(y)C(y)-A(x)C(x))-\int_x^yA'(t)C(t)\,dt.$$
• itgsubst: Integration by $u$-substitution. If $I$ is an open interval, $A:[x,y]\to I$ and $C:I\to\Bbb C$ are continuous functions, and $A':(x,y)\to\Bbb R$ is continuous and integrable, then $$\int_{A(x)}^{A(y)}C(u)\,du=\int_x^yC(A(t))A'(t)\,dt.$$ When I was learning these theorems in school, I don't remember there being so many assumptions, but it just goes to show that it's easy to drop details in a regular math class if you're not being careful. These theorems are not as general as they could be, but usually making the hypotheses more general also means making them more complicated, and I don't want to go too far in that direction. (I certainly don't want to define absolute continuity, which has little use outside these theorems and the FTC.)
• cxpaddle: A tricky proof that $(a+b)^c\le a^c+b^c$ when $0<c\le1$.
• abvcxp: We're back to the absolute value after that integration interlude because this list is in database order and we had to wait for the logarithm and complex power functions to be defined. This theorem shows that if $|x|$ is an absolute value on $R$, then $|x|^c$ is as well, when $0<c\le 1$. For $c>1$, it usually is not because the triangle inequality is broken. (I have seen an alternative definition of absolute value with an auxiliary parameter, the "order" $c$ of the absolute value, such that for all $a,b$, $|a+b|\le c\max(|a|,|b|)$; then $|x|$ is a "triangular" absolute value iff $c\le 2$ and $|x|$ is an ultrametric absolute value iff $c=1$. Under this setup, if $|x|$ is an absolute value of order $n$, then $|x|^c$ is an absolute value of order $n^c$.)
• abvmet: I mentioned earlier that the axioms of an absolute value look a lot like the axioms of a metric; well, here we show the correspondence, where an absolute value $|x|$ induces a metric $d(x,y)=|x-y|$. In fact, in the proof we never use the third property abvmul; and if we drop that (and add $\left|-x\right|=|x|$ as an axiom, since this is no longer derivable) then there is no need to even have the multiplication operation, so we end up with a statement about pure groups. I don't even think they need to be abelian groups, since we never use that either.
• qrng and related theorems: Define the ring $\langle \Bbb Q,+,\cdot\rangle$, and show that it is actually a division ring, and the operations $+,\cdot,0,1,-,/$ map to their ring operation equivalents.
• qabvle: From here on, we are interested in absolute values on the ring of rational numbers. For brevity and unambiguity, I will denote the "traditional" absolute value function by $|x|_\infty$, and reserve $|x|$ for any absolute value under our new definition. Induct the triangle inequality to show that $|n|\le n$ for all $n\in\Bbb N_0$.
• ostthlem2: Any two absolute values that agree on the primes are equal.
• qabsabv: The regular absolute value $|x|_\infty$ is an absolute value on $\Bbb Q$.
• padicabv, padicabvf, padicabvcxp: The $p$-adic absolute value $|x|=n^{\#_p(x)}$ is an absolute value whenever $0<n<1$. We define "the" $p$-adic absolute value $|x|_p$ by setting $n=p^{-1}$; other choices of $n$ yield equivalent (off by an exponent) absolute values.
• ostth: Ostrowski's theorem. If $|x|$ is an absolute value, then (1) $|x|=|x|_0$ is trivial, that is, $|x|=1$ for all $x\ne0$; (2) $|x|=|x|_\infty^c$ for some $c\in(0,1]$, i.e. $|x|$ is "regular"; or (3) there is a prime $p$ such that $|x|=|x|_p^c$ for some $c\in(0,\infty)$, i.e. $|x|$ is equivalent to a $p$-adic absolute value. We separate the proof into three parts according to these cases. If there is some $n\in\Bbb N$ with $|n|>1$ then we are in case (2), otherwise if there is a prime $|p|$ such that $|p|<1$ then we are in case (3). If both of these are false then $|p|=1$ for each prime, so case (1) shows that $|x|=|x|_0$ is the trivial absolute value.
I will probably write a post later going into a bit more detail for this proof, which is quite involved. It's a shame it's not on the metamath 100 list - I think it's a pretty nice and notable theorem.

Wednesday, September 3, 2014

Mathbox update: The Fundamental Theorem of Calculus

Today I just put the finishing touches on a proof of the Fundamental Theorem of Calculus, which is the first theorem from the Formalizing 100 Theorems list to be proven since the creation of this blog, and has been the main reason for my recent fixation on real analysis and basic properties of integrals. Hopefully this means that integrals can now actually be useful tools for more interesting proofs rather than objects of study of themselves. There is also a proof of the Mean Value Theorem and Rolle's theorem, among other things. I may describe proof overviews for some of the theorems in this list in later posts; the theorems below that are proven on this blog are marked with an asterisk and linked at the end.

Note that almost all of these theorems were necessary in order to complete the proof; it is an interesting exercise to see how some of these seemingly unrelated theorems in totally different fields are used in the final proof of the FTC. (To track the usage of a theorem, use the "This theorem is referenced by:" list at the end of the proof page.)
• fofinf1o: Complementary to f1finf1o, this shows that a surjection from a finite set to another of the same size is a bijection (f1finf1o shows the same assuming an injection).
• wlogle: Inspired by a similar theorem in HOL Light (another formalization language), we show that if a predicate satisfies $\varphi(x,y)\leftrightarrow \varphi(y,x)$, i.e. it is symmetric under interchange of $x,y$, then "without loss of generality" we can assume that $x\le y$ in a proof that $\forall x,y:\varphi(x,y)$.
• ifle: A nice property of the conditional operator ${\rm if}(\varphi,x,y)$ (read "if $\varphi$ then $x$, else $y$", or in piecewise notation as $\begin{cases}x&\varphi\\y&\lnot\varphi\end{cases}$) is that it can transform an implication into an inequality. To be precise, if $y\le x$, then $\varphi\to\psi$ implies ${\rm if}(\varphi,x,y)\le{\rm if}(\psi,x,y)$ (and the converse is also true if $x<y$).
• cnrecnv, cnrehmeo: The canonical map $\Bbb R^2\to\Bbb C$ is the function $\langle x,y\rangle\mapsto x+iy$. We already know from cnref1o that this map is invertible; here we show that the inverse is $z\mapsto\langle\Re\,z,\Im\,z\rangle$. Since all the functions involved here are known to be continuous, this allows us to deduce that the map is a homeomorphism, and so $\Bbb R^2$ under the product topology is homeomorphic to $\Bbb C$ under the metric topology.
• max0add: The identity $f^++f^-=|f|$, where $f^+=\max(f,0)$ and $f^-=\max(-f,0)$ are the positive and negative part functions. See also max0sub which shows $f^+-f^-=f$.
• txbas2: If $B$ and $C$ are topology bases, then $\{u\times v\,|\,u\in B,v\in C\}$ is also a basis for a topology, where $u\times v$ is the cartesian product of sets. This generalizes txbas which assumes that $B$ and $C$ are topologies.
• txbasval: The topological product $\scr B\times\scr C$ is defined as the topology generated by the basis discussed in the last bullet, but it is only defined on topologies, not topological bases. Here we show that if $B$ is a basis for $\scr B$ and $C$ is a basis for $\scr C$, then $\{u\times v\,|\,u\in B,v\in C\}$ is a basis for $\scr B\times\scr C$. Note the difference from the definition, which defines $\scr B\times\scr C$ as the topology generated by $\{u\times v\,|\,u\in{\scr B},v\in{\scr C}\}$ - we see that it is sufficient to consider rectangles of basis elements and not topology elements.
• iccntr: The interior of the closed interval $[x,y]$ is the open interval $(x,y)$, in the standard topology on the reals.
• volsup2*, volcn*, volivth*: The intermediate value theorem for Lebesgue measure: Given a measurable set $A$, there is a measurable subset of $A$ of any measure $0\le x\le\operatorname{vol}(A)$.
• I changed the token for the set of all (Lebesgue) integrable functions from the made-up $f\in{\rm Itgbl}$ to $f\in L^1$, since the latter actually seems to be in use and so may be more understandable to the lay-mathematician.
• mbfimaopn, mbfimaopn2: The preimage of an open set under a measurable function is measurable. Note in particular that these are open sets in the complex topology, not the real topology; the proof essentially proceeds by showing that open rectangles with rational coordinates form a countable basis for the topology on $\Bbb C$.
• cncombf: The composition $g\circ f$ is measurable if $f$ is measurable and $g$ is continuous. This could be generalized to $g$ Borel-measurable, but I don't feel like defining the Borel
• itg2gt0*, itggt0*: If $\operatorname{vol}(A)>0$ and $f>0$ on $A$, then $\int_A f>0$. This is a nice proof that is discussed in a previous post.
• itg2cn, itgcn: Absolute continuity of the Lebesgue integral. If $f$ is integrable, then for any $\varepsilon>0$ there is a $\delta>0$ such that for any measurable set $A$ with $\operatorname{vol}(A)<\delta$, $\int_A|f|<\varepsilon$. Note that no assumptions on $f$ other than integrability are needed.
• itgeqa: If two functions are equal almost everywhere, then they have the same integrable and one is integrable iff the other is.
• itgss3: If $A\subseteq B$ and $B\setminus A$ is a nullset, then $f$ is integrable on $A$ iff it is integrable on $B$ and the integrals are the same.
• iblsub, itgsub: If $f,g$ are integrable, then $f-g$ is integrable and $\int(f-g)=\int f-\int g$.
• itgfsum: If $f_1,f_2,\dots f_n$ are integrable, then $\sum_{k=1}^nf_k$ is integrable and $\int_A\sum_{k=1}^nf_k=\sum_{k=1}^n\int_Af_k$.
• iblabs, iblabsr: If $f$ is a measurable function, then $f$ is integrable iff $|f|$ is integrable.
• iblmulc2, itgmulc2: If $f$ is integrable and $c\in\Bbb C$, then $c\cdot f$ is integrable and $\int_Ac\cdot f=c\int_Af$.
• itgabs: If $f$ is integrable, then $|\int_Af|\le\int_A|f|$. This is another nice proof that works on arbitrary complex functions $f$.
• dvcj: The derivative $(f(x)^*)'$ is $f'(x)^*$, where ${}^*$ denotes the conjugate function. This does not follow from dvcof (the chain rule) because ${}^*$ is not differentiable as a complex function and is just the identity as a real function.
• dvfre: The derivative of a real function is a real function.
• dvmptid, dvmptc, dvmptadd, $\dots$, dvmptco: These theorems are nothing new in terms of actual content, in the sense that we already have formulas for the derivative of a sum, product, etc.; rather these are designed to be easy to use in proofs so that one can quickly differentiate a function given by a formula, in a sense providing an "algorithm" for differentiation. Supported here are the rules: $$c'=0\quad x'=1\quad (f+g)'=f'+g'\quad (fg)'=f'g+g'f\quad (cf)'=cf'\quad (-f)'=-f'$$ $$(f-g)'=f'-g'\quad (f^*)'=f'^*\quad (\Re\,f)'=\Re\,f'\quad (\Im\,f)'=\Im\,f'$$ Also the chain rule $(f(g(x)))'=f'(g(x))g'(x)$ as well as restriction theorems: if $f:X\to\Bbb C$ is differentiable on $X$ and $Z\subseteq X$, $Y=Z^\circ$ is the interior of $Z$, then $(f\upharpoonright Z)'=f'\upharpoonright Y$.
• rolle: Rolle's theorem. If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $a<b$ and $f(a)=f(b)$, then there is an $x\in(a,b)$ such that $f'(x)=0$.
• mvth: The Mean Value Theorem. If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $a<b$, then there is an $x\in(a,b)$ such that $f'(x)=\frac{f(b)-f(a)}{b-a}$.
• dveq0: If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$ and differentiable with derivative $0$ on $(a,b)$, then $f$ is a constant function.
Finally we get to the main theorem. Recall, though, that the "fundamental theorem" is really several separate but related theorems; we state them here.
• ftc1a: Let $f:D\to\Bbb C$ be an integrable function, and $(a,b)\subseteq D\subseteq\Bbb R$ with $a\le b$. Define $G:[a,b]\to\Bbb C$ by $G(x)=\int_a^xf(t)\,dt$. Then $G$ is a continuous function on $[a,b]$.
• ftc1: Now assume that $f$ is also continuous at some $c\in(a,b)$. Then $G$ is differentiable at $c$, and $G'(c)=f(c)$.
• ftc1cn: Simplifying the assumptions, if $f:(a,b)\to\Bbb C$ is continuous and integrable, then $G'=f$.
• ftc2: Let $f:[a,b]\to\Bbb C$ be a function differentiable on $(a,b)$ such that $f'$ is integrable and continuous. Then $\int_a^bf'=f(b)-f(a)$.
If you are familiar with the theorems, you'll see I kinda copped out on the last statement; the assumption that $f'$ is continuous is unnecessary but makes the proof a lot easier. I found a proof in Rudin that uses upper-semicontinuous functions via the Vitali–Carathéodory theorem, but I have not formalized the notion of semicontinuity or the Sorgenfrey line at all, so that seemed like a pain. Maybe I'll revisit it later if it becomes necessary to strengthen the theorem.

This will be theorem #15 of 100 at Formalizing 100 Theorems; as of now there are thus 35 theorems formalized in Metamath. You can see the others at the metamath 100 page.

* These theorems were discussed with natural-language proofs in integral of a positive function.

Sunday, August 31, 2014

Integral of a positive function is positive

Here is a nice theorem that is not as simple as the statement would have you believe:

Theorem (itggt0): If $f$ is integrable and $A$ is a measurable set such that $0<\DeclareMathOperator{\vol}{vol}\vol(A)\le\infty$ and $f(x)>0$ for all $x\in A$, then $\int_A f(x)\,dx>0$.

The analogous theorem with $\le$ is trivial (itgge0), but this theorem requires some of the countable additivity properties of the volume function. We postpone the proof until we have proven the needed lemmas. First, we establish what I will call, for lack of a better name, the Intermediate Value Theorem for the Lebesgue volume function.

Theorem (volivth): If $A$ is a measurable set and $0\le x\le\vol(A)$, then there is a measurable $B\subseteq A$ such that $\vol(B)=x$. (Note that $\vol(A)$ is allowed to be infinite.)

Proof: If $x=\vol(A)$, then $B=A$ satisfies the requirements, so assume instead that $x<\vol(A)$. Consider the sequence $A\cap[-n,n]$ for $n\in\Bbb N$. This is an increasing sequence of measurable sets, so by volsup we have $\sup_{n\in\Bbb N}\vol(A\cap[-n,n])=\vol(A)$. Thus since $x<\vol(A)$ there is an $n$ such that $x<\vol(A\cap[-n,n])$.

Now define the function $f(y)=\vol(A\cap[-n,y])$. If $y\le y'$, then $[-n,y]\subseteq[-n,y']\implies f(y)\le f(y')$, so $f$ is a weakly increasing function, and since $(A\cap[-n,y'])\setminus(A\cap[-n,y])\subseteq[y,y']$, we also have $f(y')-f(y)\le\vol([y,y'])=y'-y$, so $f$ is a continuous function $\Bbb R\to\Bbb R$. Since $f(-n)=0\le x<\vol(A\cap[-n,n])=f(n)$, the regular intermediate value theorem implies that there is some $z$ such that $\vol(A\cap[-n,z])=x$, and for this $z$, $B=A\cap[-n,z]$ satisfies the requirements of the conclusion. $$\tag*{\blacksquare}$$

We need this theorem to extend the formula $\int_A c\,dx=c\vol(A)$ to the case $\vol(A)=\infty$. (We can't use itg2const directly because the formula is only valid when $\vol(A)$ is a real number, since multiplication only works on complex numbers.) If $\vol(A)=\infty$ and $c>0$, then for any $x\in\Bbb R$ there is a $B\subseteq A$ with $\vol(B)=x/c$, and then $\int_B c\,dx=x\le\int_A c\,dx$. Thus $\int_A c\,dx$ is larger than any real number, so $\int_A c\,dx=\infty$ as well. (This is theorem itg2const2.)

Now we can prove the main theorem.

Proof (of itggt0): Define the sequence $A_n=\{x:f(x)>\frac1n\}$. This is clearly an increasing sequence of measurable sets (since $f$ is measurable), and since $f(x)>0$ for all $x\in A$ so that there is some $k$ with $f(x)>\frac1k$, $A\subseteq\bigcup_{n\in\Bbb N}A_n$. Now assume that $\int_A f(x)\,dx=0$. Then $$0\le\int_{A_n}\frac1n\,dx\le\int_{A_n}f(x)\,dx\le\int_A f(x)\,dx=0,$$ so from itg2const and itg2const2, $\vol(A_n)=0$ (because $\vol(A_n)=\infty$ implies $\int_{A_n}\frac1n\,dx=\infty\ne0$ and $\vol(A_n)\in\Bbb R$ implies $n\int_{A_n}\frac1n\,dx=\vol(A_n)=0$). Since this is true for every $n$, we have $\vol(A)\le\vol(\bigcup_{n\in\Bbb N}A_n)=\lim_{n\to\infty}\vol(A_n)=0$, which contradicts the hypothesis $\vol(A)>0$. Therefore $\int_A f(x)\ne0$, and since $\int_A f(x)\ge0$ we in fact have $\int_A f(x)>0$. $$\tag*{\blacksquare}$$

Wednesday, August 20, 2014

Van der Waerden's theorem

To take a break from all this real analysis, I thought I'd change gears and formalize a gem of number theory, which goes by the name of Van der Waerden's theorem. The proof is not yet complete, so instead I'll just lay out the high-level strategy of the proof. (Well, any regular mathematician would call this a proof, and a pretty detailed one at that, but for me it's only a sketch of the "real" formal proof.)

Theorem (vdw): For any finite set $R$ (the set of "colors") and any $k\in\Bbb N$, there is an $N$ such that for any "coloring" $f:\{1,\dots,N\}\to R$, there is a monochromatic arithmetic progression. This last phrase is expressed in symbols as: $$\exists c\in R;a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\}).$$ By demanding that $a+md$ lie in the preimage of $f$ we can indirectly also demand that the entire AP lies within $\{1,\dots,N\}$.

Proof: We begin by defining some predicates, which look horrible in metamath because they take so many parameters. First, we define the "arithmetic progression function": $$A_k:\Bbb N\times\Bbb N\to{\cal P}\Bbb N\quad A_k(a,d)=\{a+md\,|\,m\in\{0,\dots,k-1\}\}$$ This function takes a length $k$, a start point and step $a,d$, and outputs the set of all points in this arithmetic progression. The parameter $k$ has been separated from the other two so that we can consider the set $\operatorname{ran}A_k$ of all length-$k$ arithmetic progressions.

Next, we define the notion that a given coloring $\langle R,N,f\rangle$ (where $f:\{1,\dots,N\}\to R$) contains a length-$k$ monochromatic AP. In metamath this predicate looks like $k\,(R\,O\,N)\,f$, because $O$ is a function which takes $R,N$ as input and returns a binary relation whose two parameters are $k,f$. In any case, we define this notion to be: $$k\,(R\,O\,N)\,f\iff\exists c\in R,x\in\operatorname{ran}A_k:x\subseteq f^{-1}(\{c\}).$$
Straightforward enough. Now we turn to our last and most complex definition, which states that a given coloring $\langle R,N,f\rangle$ contains a polychromatic $m$-tuple of arithmetic progressions of length $k$. (In metamath this looks like $k\,((R\,P\,N)m)\,f$.) A polychromatic $m$-tuple of arithmetic progressions is a set of arithmetic progressions $a+\{1,\dots,k\}d_1,a+\{1,\dots,k\}d_2,\dots,a+\{1,\dots,k\}d_m$ which all share a common basepoint $a$ (but do not actually include $a$), and we demand that each progression be monochromatic, but no two APs have the same color. We express this as: $$k\,((R\,P\,N)m)\,f\iff\exists a\in\Bbb N, d:\{1,\dots,m\}\to\Bbb N:\\ \quad\forall i\in 1,\dots,m:A_k(a+d_i,d_i)\subseteq f^{-1}(\{f(a+d_i)\})\,\wedge\\ \quad\big|\{f(a+d_i)\,|\,i\in1,\dots,m\}\big|=m$$

The first conjunct says that the entire AP $A_k(a+d_i,d_i)$ is all the same color as $f(a+d_i)$, the first element of the progression, i.e. the progression is monochromatic; the second conjunct says that the set $\{f(a+d_i)\,|\,i\in1,\dots,m\}$ of colors of the first element of each progression has size exactly $m$, i.e. they are all distinct. (We could just as well say that the $f$-image of the union of all the AP's has size $m$, but it's less trivial to show that the size of this larger set is already at most $m$.)

Now, onto the proof proper. The main body of the proof is a double induction; at the top level we induct on $k$, where our induction hypothesis is that for any $R$ there is an $N$ such that all $R$-colorings of $\{1,\dots,N\}$ contain a monochromatic length-$k$ AP. For the base case, we treat $k=1$ and $k=2$ separately (because some of our arguments in the induction step $k\to k+1$ don't work if $k=1$). If $k=1$, then any point is an AP, so take $N=1$ and so $A_1(1,1)=\{1\}\subseteq f^{-1}(\{f(1)\})$. If $k=2$, then we can use the pigeonhole principle. Take $N=|R|+1$, and assume that $f$ contains no length-$2$ monochromatic APs. If $f(x)=f(y)$ for $x<y$, then $A_2(x,y-x)=\{x,y\}\subseteq f^{-1}(\{f(x)\})$, so $f$ has a length-$2$ AP contrary to the hypothesis, so therefore $f$ is injective and $|\{1,\dots,N\}|=N=|R|+1\le|R|$, a contradiction.

In our induction step, we assume (for $k\ge2$) that the hypothesis is true for $k$, and derive it for $k+1$. Fix $R\ne\emptyset$; if $R=\emptyset$ then there are no functions $f:\{1\}\to R$ so the statement is vacuously true for $N=1$. Let us then start another induction on $m\in\Bbb N$, where the induction hypothesis is that there is an $N$ such that every coloring on $\{1,\dots,N\}$ contains either a monochromatic length-$k+1$ AP or a polychromatic $m$-tuple of length-$k$ APs. The base case is not entirely trivial, because a $1$-tuple of length-$k$ APs is not quite the same as a monochromatic length $k$ AP (which is guaranteed to us by the $k$-induction hypothesis) because it "reserves space" for the base point of a $k+1$-long AP - it just doesn't demand that this initial point have any particular color.

To fix this, suppose that $N$ is such that every coloring of $\{1,\dots,N\}$ contains a monochromatic length-$k$ AP, and let $N_2=2N$. Given a coloring $f:\{1,\dots,2N\}$, by defining $g:\{1,\dots,N\}\to R$ by $g(x)=f(x+N)$, we know $g$ contains a monochromatic length-$k$ AP, so there is a $c,a,d$ such that for all $t=0,\dots,k-1$ we have $a+td\in\{1,\dots,N\}\wedge f(a+td)=c$. Then we can define $a'=N+a-d$; this is a positive integer because $a,a+d\in\{1,\dots,N\}$ imply $a-d=2a-(a+d)\ge 2-N>-N$. (This is why we needed $k\ge2$ at the start.) Now we have the necessary setup for a polychromatic $1$-tuple of APs, using $a'$ and $d_1=d$.

Now, for the $m$-induction step, we finally get to the interesting part of the proof. Let $W\in\Bbb N$ be such as specified by the $m$-induction hypothesis. Then the set $R^{\{1,\dots,W\}}$ of all functions $\{1,\dots,W\}\to R$ is finite, so by the $k$-induction hypothesis there is a $V$ such that all colorings $f:\{1,\dots,V\}\to R^{\{1,\dots,W\}}$ have a length-$k$ monochromatic AP. Let $N=2WV$, and now we want to show that every function $h:\{1,\dots,N\}\to R$ contains either a monochromatic $k+1$ AP or a polychromatic $m,k$ AP tuple. Define $$f(x)=\big(y\in\{1,\dots,W\}\mapsto h(y+W(x-1+V))\big);$$ effectively we are cutting the set $\{1,\dots,N\}$ into blocks of length $W$. Again, we "reserve space" of size $WV$ in our set $1,\dots,N$ so that we can move backwards a little. The hypotheses give us that $f$ contains an AP with parameters $g,a,d$ such that $f(x)=g$ for $x=a,a+d,\dots,a+(k-1)d$. Now since $f$ takes values in the function space $R^{\{1,\dots,W\}}$, $g:\{1,\dots,W\}\to R$ is really a function, which satisfies the assumptions for the $m$-induction hypothesis, so either $g$ contains a $k+1$-long AP with parameters $c,a',d'$ (in which case $h$ has a $k+1$-long AP with parameters $c,W(a-1+V)+a',d'$), or there is a polychromatic $m$-tuple of APs $a',d_1,\dots,d_m$.

In this case, consider the prospective $m+1$-tuple defined by $a''=a'+W(a-d-1+V)$ and $d''_i=d_i+dW$, $d''_{m+1}=dW$. We can verify that $a''$ is positive, because $$V-1+a-d=V-1+2a-(a+d)\ge V-1+2-V=1,$$ and for each $i\in\{1,\dots,m\}$ and $t\in\{1,\dots,k\}$, \begin{align}
h(a''+td''_i)&=h(a'+td_i+W(a+(t-1)d-1+V))\\
&=f(a+(t-1)d)(a'+td_i)\\
&=g(a'+td_i)=g(a+d_i),
\end{align}
so each tuple is individually monochromatic. For the last AP, $h(a''+td''_{m+1})=f(a+(t-1)d)(a')=g(a')$ is monochromatic as well. All that remains is to verify that the colors are distinct. Since we already know that $g(a'+d_i)\ne g(a'+d_j)$ for any $i\ne j$, if additionally $g(a')\ne g(a'+d_i)$ for all $i=1,\dots,m$ then we are done because $h$ contains a polychromatic $m+1$-tuple. Otherwise, there is some $i$ such that $g(a')=g(a'+d_i)$. But we already know $g(a'+d_i)=g(a'+td_i)$ for $t\in\{1,\dots,k\}$, so then $g$ contains a $k+1$ monochromatic AP, and so does $h$ (with parameters $g(a'),a'+W(a-1+V),d_i$).

This completes the $m$-induction. To complete the $k$-induction, let $m=|R|$ (which is valid since $R\ne\emptyset$). Then there is an $N$ such that all colorings have a $k+1$ monochromatic AP or a polychromatic $|R|$-tuple of length-$k$ APs. In the first case we are done, and in the second case let the parameters of the tuple be $a,d_1,\dots,d_{|R|}$. Since $|\{f(a+d_i)\,|\,i\in\{1,\dots,|R|\}\}|=|R|$ and $R$ is finite, the function $i\mapsto f(a+d_i)$ must be bijective to $R$, so there is an $i$ such that $f(a)=f(a+d_i)$, and as before this yields a monochromatic AP of length $k+1$ with parameters $a,d_i$. Thus there is a $k+1$ monochromatic AP in any case, which completes the $k$-induction and the proof. $$\tag*{\blacksquare}$$

There is one more nice extension to this theorem, which is the so-called "infinitary" version of Van der Waerden's theorem.

Theorem (vdwnn): For any finite set $R$ and any coloring $f:\Bbb N\to R$, there is a color $c\in R$ such that $f^{-1}(\{c\})$ contains arbitrarily long arithmetic progressions. (In symbols, $\forall k\in\Bbb N:\exists a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\})$.)

Proof: Fix $R$ and $f$ and suppose not. Then for each $c\in\Bbb R$, there is a $k$ such that there are no $c$-colored APs of length $k$, and since each length $n\ge k$ AP contains a length $k$ AP, there are also no $c$-colored APs of length $n$ for any $n\ge k$. Thus the set of $k\in\Bbb N$ such that there are $c$-colored APs of length $k$ is bounded, and since the finite union of bounded sets is bounded, there is a uniform bound $K$ such that there are no monochromatic length $K$ APs in any color. But this is a contradiction to vdw above. $$\tag*{\blacksquare}$$