Wednesday, September 24, 2014

Mathbox update: Functions of Number Theory

I should probably apologize for having nothing but mathbox updates on this blog of late, but this is the only type of post I release regularly, in conjunction with the actual theorems being uploaded to the Metamath site. (I try to make up for it by making frequent, hopefully enlightening remarks in the list below so it's not just a list.) This update is primarily concerned with preparations for an eventual proof of the Prime Number Theorem, and actual results include the Chebyshev bound on the prime pi function $a\frac x{\log x}\le\pi(x)\le A\frac x{\log x}$, which is a weak form of the PNT (which shows that this is true for any $a<1<A$). We also define several number-theoretic functions, and formalize the notions $f\to a$ and $f\in O(1)$ for real functions (we already have a limit notion $(a_n)\to a$, but it only works on sequences).
  • df-rlim: Define the limit predicate $f(x)\to a$ as $x\to\infty$. It is defined so that it will work on sequences defined on an upper integer set or on real functions; we say that $f\to a$ if $f$ is a partial function $\Bbb R\to\Bbb C$ and for all $\epsilon>0$ there exists an $M\in\Bbb R$ such that for all $x\ge M$ in the domain of $f$, $|f(x)-a|<\epsilon$. Note that if the domain of $f$ is upper bounded, then the rest is vacuously true, so $f\to a$ for every $a\in\Bbb C$ in this case. (But if the domain of $f$ is not upper bounded, then $f\to a$ for at most one $a$, as we expect.)
  • df-o1: Define the set of eventually bounded functions, that is, $f\in O(1)$ iff there exists an $c,M$ such that $|f(x)|\le M$ for all $x\ge c$. The notation $O(1)$ is meant to be suggestive of Bachmann-Landau notation, and indeed it matches the notation for $f=O(1)$ (although of course we use $\in$ in place of the misleading $=$ in big-O notation). However, we do not define the complete hierarchy of notations, as $O(1)$ and the $\to 0$ predicate of df-rlim is sufficient to unify all of these, in the following manner:
    $$f(x)=O(g(x))\iff g(x)=\Omega(f(x))\iff\frac{f(x)}{g(x)}\in O(1)\\ f(x)=o(g(x))\iff g(x)=\omega(f(x))\iff\frac{f(x)}{g(x)}\to0\\ f(x)=\Theta(g(x))\iff\frac{f(x)}{g(x)},\frac{g(x)}{f(x)}\in O(1)\\ f(x)\sim g(x)\iff\frac{f(x)}{g(x)}\to1$$
    Thus chebbnd1, chebbnd2 together make the statement $\pi(x)\in\Theta(x/\log x)$.

    Since usually the functions $g(x)$ we use in this sense are eventually non-zero, the equivalences above are exact. For functions that are not eventually non-zero, the usual definition of the statement $f(x)=O(g(x))$ also requires that $f(x)=0$ wherever $g(x)=0$ for sufficiently large $x$, while $f/g\in O(1)$ leaves $f$ unconstrained whenever $g(x)=0$ because in these cases $f(x)/g(x)$ is undefined. I don't see that this edge case is very useful, though, so the discrepancy is not enough to be relevant. (If anyone has a good example of a use for the notation $f(x)=O(g(x))$ or $f(x)=o(g(x))$ where $g$ is not eventually nonzero, I'm all ears.)
  • o1const, rlimconst: The constant function $f(x)=a$ satisfies $f\in O(1)$ and $f\to a$.
  • rlimclim: For functions $f:\Bbb Z^{\ge n}\to\Bbb C$, $f\to_{\Bbb Z}a\iff f\to_r a$, i.e. the old $\Bbb Z$-limit predicate df-clim and the new df-rlim are equivalent. Since df-clim applies to non-functions, but $f\to_{\Bbb Z} a\iff (x\in\Bbb N\mapsto f(x))\to_{\Bbb Z} a$, i.e. we can replace $f$ with a function that takes the same values and has the same behavior under $\to_{\Bbb Z}$, it follows that df-rlim is a generalization of df-clim.
  • rlimres, o1res: If $f\to a$, then $f\upharpoonright X\to a$, that is, $f$ restricted to any subset $X$ still has limit $a$. Similarly, $f\in O(1)$ implies $f\upharpoonright X\in O(1)$. The unconstrained nature of this restriction may seem peculiar, since then it seems that even the empty function has limit $a$, but considering the note about upper bounded domain implying that $f$ takes every value as a limit, this should be no surprise.
  • rlimresb: If $X=[b,\infty)$ for some $b\in\Bbb R$, then $f\to a\iff f\upharpoonright X\to a$. In other words, the limit of $f$ is not affected by its values on any upper bounded set.
  • rlimuni: Since as pointed out above, $f$ often fails to have a unique limit, we may ask under what conditions we are ensured that if a limit exists it is unique. The correct answer is that the limit is unique iff the domain of $f$ has no upper bound, but here we prove the weaker assertion (which is sufficient for most applications) that the limit is unique if the domain of $f$ contains an upper integer set.
  • rlimcn1, rlimcn2: One of the most important properties of continuous functions (almost a definitional property) is that it preserves limits, that is, if $f$ is continuous and $g\to a$ then $f\circ g\to f(a)$. Here we derive this for one and two-dimensional continuous functions. (This is a bit early in the development to be worrying about two-dimensional continuity, but we need it for the important continuous functions $+,-,\cdot$.)
  • reccn2: Previously a lemma for the much later divcn, here we show that the reciprocal function is continuous (away from zero).
  • o1of2: The property that characterizes $O(1)$-preserving functions is a bit different from continuity, and I don't know if it has a name: For all $\delta\in\Bbb R$, there exists an $\epsilon$ such that for all $x$, $|x|\le\delta\to|f(x)|\le\epsilon$. (Note that the roles of $\delta$ and $\epsilon$ is reversed from the usual definition of a limit.) The most nontrivial example of this I have seen so far is the function $f(x)=x^a$ where $a\in\Bbb C$ with $\Re a\ge0$; in this case we have the complicated bound $|x|\le\delta\to|x^a|\le\delta^{\Re a}e^{|\Im a|\pi}$. Note that $x^a$ is not even a continuous function due to the branch cut. I believe that any continuous function is also $O(1)$-preserving, but the proof seems to need the extreme value theorem and so I would not call it an "easy" theorem. $+,-,\cdot$ are all $O(1)$-preserving, of course (o1add, o1sub, o1mul), but division is not.
  • rlimo1: If $f\to a$, then $f\in O(1)$. This encompasses the rules
    • $f(x)=o(g(x))\implies f(x)=O(g(x))$,
    • $f(x)\sim g(x)\implies f(x)=O(g(x))$
    as well as a few others. Similarly, the rule $f,g\in O(1)\implies f\cdot g\in O(1)$ encompasses the rules
    • $f(x)=O(g(x)),\, g(x)=O(h(x))\implies f(x)=O(h(x)),$
    • $f_1(x)=O(g_1(x)),\, f_2(x)=O(g_2(x))\implies f_1(x)f_2(x)=O(g_1(x)g_2(x)),$
    • $f(x)=O(g(x))\implies cf(x)=O(g(x)),$
    and the sum rule implies $f_1(x),f_2(x)\in O(g(x))\implies f_1(x)+f_2(x)\in O(g(x))$. This phenomenon where many rules are expressed as special cases of one rule when written using a more restricted notation is why I support the approach taken in Metamath where multiple notations for one concept are eliminated as much as possible. For example, in Metamath there is a less-than sign $<$, but no greater-than sign, because any formula of the form $y>x$ can be rewritten as $x<y$ and it means that we don't need separate theorems dealing with transitivity of greater-than as well as those with less-than. (Even the use of $x\le y$ instead of $\lnot(y<x)$ is a controlled concession, although in this case it is used in enough contexts and helps in reading enough that it is considered worthwhile to define.)
  • o1rlimmul: If $f\in O(1)$ and $g\to0$, then $f\cdot g\to0$. This expresses the rule $f(x)=O(g(x)),g(x)=o(h(x))\implies f(x)=o(h(x))$. Of course this is not true for limits other than $0$; if $f\in O(1)$ and $g\to a\ne0$, then $f\cdot g$ may not have a limit at all. It is still bounded, though; but this already follows from rlimo1 and o1mul.
  • rlimadd, rlimsub, rlimmul, rlimdiv: I'm sure at this point it is no surprise that we have the theorems $f\to a,g\to b$ implies $f+g\to a+b$,$f-g\to a-b$,$f\cdot g\to a\cdot b$, and (if $g(x)\ne 0$ and $b\ne0$) $f/g\to a/b$. The last one is new, however - we had no $\to_{\Bbb Z}$-equivalent for division of limits before this point.
  • divrcnv: The most basic nontrivial convergent function: if $f(x)=\frac ax$, then $f\to0$.

  • logimcl: The imaginary part of $\log x$ is defined to satisfy $-\pi<\Im[\log x]\le\pi$. This, and the property $e^{\log x}=x$ (eflog), is sufficient to uniquely define the number $\log x$, and is generally needed when making arguments involving the complex logarithm function (the situation is a bit simpler for the real logarithm, which has no branch cut issues).
  • lognegb: Any number with argument $\pi$ (recall that $\operatorname{Arg}z:=\Im[\log z]$) is a negative real number and vice-versa.
  • relog: The dual equation to absef, which is nice for having no branch cut restrictions, unlike many other $\log$ formulas: $\Re[\log z]=\log|z|$.
  • rplogcl: Common special case of logltb: if $x>1$, then $\log x>0$.
  • angval: Define a function $\angle(x,y)$ which measures the angle of two vectors in the plane, using the complex numbers as a convenient proxy for $\Bbb R^2$. We define $\angle(x,y)=\operatorname{Arg}(y/x)=\Im[\log(y/x)]$. Note that $x,y$ are nonzero complex numbers, not real numbers. This measures the angle from the line $\overline{0x}$ to $\overline{0y}$, as a number in $(-\pi,\pi]$. A positive number indicates that $y$ is counterclockwise ahead of $x$, and the function is antisymmetric as long as $x,y$ are not $180^\circ$ from each other. For example, $\angle(1,1)=0$, $\angle(1,i)=\angle(i,-1)=90^\circ$, $\angle(i,1)=-90^\circ$. (I'm using degrees because I want to treat this as a tool for geometry, not complex analysis.)
  • ang180: Let $a,b,c$ be three distinct points in the plane. Then they form the vertices of a (possibly degenerate) triangle, and the interior angles of the triangle are respectively $\angle(c-b,a-b)$, $\angle(a-c,b-c)$, $\angle(b-a,c-a)$. Under these circumstances, this theorem states that the sum of these angles is $\pm180^\circ$, where the sign choice depends on whether the three points are labeled in clockwise or counterclockwise order. This is actually one of the Metamath 100 theorems, so now this makes the total count 38 theorems. The proof is more intricate than I expected it to be, although it's not too hard to understand at a high level.
  • abscxp, abscxp2, abscxpbnd: I was disappointed to discover there is no nice formula for $|a^b|$ in the general case. If $a$ is positive real, then $|a^b|=a^{\Re b}$; and if $b$ is real, then $|a^b|=|a|^b$. If both $a,b$ are complex, then $|a^b|=|a|^{\Re b}e^{-\operatorname{Arg}a\cdot\Im b}$; as a bound we can simplify this to $|a^b|\le |a|^{\Re b}e^{|\Im b|\pi}$ or $|a^b|\le |a|^{\Re b}e^{|b|\pi}$ (which is the form we quoted earlier).
  • cxplim: In this section we establish the basic facts about the order of growth of powers, logarithms, and exponentials. If $a<0$ then $x^a\to0$ as $x\to\infty$.
  • rlimcxp, o1cxp: If $f(x)\to0$ and $a>0$, then $f(x)^a\to0$, and similarly $f(x)\in O(1)$ and $\Re a\ge0$ imply $f(x)^a\in O(1)$.
  • cxp2limlem: If $b>1$ then $x\in o(b^x)$, i.e. $x/b^x\to 0$.
  • cxp2lim: If $a\in\Bbb R$ and $b>1$ then $x^a\in o(b^x)$, i.e. $x^a/b^x\to 0$. In words, any increasing exponential function eventually dominates every polynomial.
  • cxploglim: If $a>0$, then $\log x\in o(x^a)$ - a logarithm grows slower than any polynomial.
  • df-cht: Define the first Chebyshev function, $\theta(x)=\sum_{p\le x}\log p$, where the sum is only over primes.
  • df-ppi: Define the prime $\pi$ function, $\pi(x)=\sum_{p\le x}1=|[0,x]\cap\Bbb P|$.
  • df-mmu: Define the Möbius function, $\mu(x)=0$ if $x$ is not squarefree and $\mu(x)=(-1)^n$ where $n$ is the number of prime factors of $x$ otherwise.
  • df-sgm: Define the divisor function, $\sigma(x)=$ the number of divisors of $x$.
  • efchtcl: The Chebyshev function is closed in the log-integers, i.e. $e^{\theta(x)}\in\Bbb N$. (This is one of the most unusual sets I have applied fsumcllem to, showing that the set $\{x\in\Bbb R:e^x\in\Bbb N\}$ is closed under addition and zero as a consequence of the natural numbers being closed under multiplication and one.)
  • efchtdvds: The exponential Chebyshev function is actually the same as the primorial, $e^{\theta(x)}=x\#=\prod_{p\le x}p$, so it's not surprising that it shares all the properties of the primorial, like this one, that says that $e^{\theta(x)}|e^{\theta(y)}$ whenever $x\le y$.
  • ppifl: The Chebyshev and prime $\pi$ functions are actually functions on the reals, not the integers; we extend them to the reals by means of this theorem, $\pi(x)=\pi(\lfloor x\rfloor)$.
  • ppip1le: A local version of ppiltx, showing that $\pi(x+1)\le\pi(x)+1$. The obvious extension of this formula, $\pi(x+n)\le\pi(x)+n$, is only true when $n$ is an integer, because if it was true for all $n\in\Bbb R$ then $\pi$ would have to be continuous, and of course it has jump discontinuities at the primes.
  • ppiltx: A simple cardinality argument shows $\pi(x)<x$.
  • prmorcht: As remarked earlier, the primorial is the same as the exponential of the Chebyshev function, $e^{\theta(x)}=x\#$. Later, in the proof of bpos, we will make a careful analysis of the primorial function in order to establish a bound; now this proof has been converted to show a bound on $\theta(x)$ instead.
  • ppiub: This theorem used to be called prmpiub, and stated as $|\Bbb P\cap\{1\dots n\}|\le\frac n3+2$ and used as part of the proof of bpos. Now that we have defined a function for this, we can just write $\pi(x)\le\frac x3+2$, although the content hasn't changed much beyond being extended to the reals.
  • chtleppi: A simple sum inequality shows $$\theta(x)=\sum_{p\le x}\log p\le\sum_{p\le x}\log x=\pi(x)\log x.$$
  • chtub: Previously called prmorub and stated as $\prod_{p\le n}p<2^{2n-3}$, now we can write $\theta(x)<(2x-3)\log 2$ and extend the domain to all reals greater than $2$.
  • chebbnd1: The largest original contribution in this update aside from ang180, we show that $\pi(x)\in\Omega(x/\log x)$ by borrowing a lot of lemmas that were used for bpos. We have $4^n/n<{2n\choose n}=\prod_pp^{e_p}$, where the exponents $e_p=\#_p{2n\choose n}$ of the product all satisfy $p^{e_p}\le 2n$, by bposlem1. Then no prime larger than $2n$ enters the product, and each term is bounded, so the largest that sum can be is $(2n)^{\pi(2n)}$, and after taking logs you get $\pi(2n)\frac{\log(2n)}{2n}>\frac{\log 4}2-\frac{\log n}{2n}\ge\frac{\log 4}2-\frac1{2e}$, and with a little relaxing of the constants you can extend this to all real numbers, so in the end you get that $\pi(x)\in\Omega(x/\log x)$. The other inequality goes via $\theta(x)$, as we will see.
  • chtppilim: Here we show that $\theta(x)\sim\pi(x)\log x$. One inequality, chtleppi, is easy and mentioned above; the other direction fixes $0<a<1$ and uses the trick $$\theta(x)\ge\sum_{x^a<p\le x}\log p\ge\sum_{x^a<p\le x}a\log x\ge a\log x(\pi(x)-x^a),$$ and since $x^a\in o(x/\log x)$ and $x/\log x\in O(\pi(x))$, we can make $x^a\le (1-a)\pi(x)$ and then $\theta(x)\ge a^2\log x\,\pi(x)$; now since $a<1$ was arbitrary, we have the desired statement $\frac{\theta(x)}{\pi(x)\log x}\to1$.
  • chto1ub: A simple consequence of chtub: Since $\theta(x)<(2x-3)\log 2<2x\log 2$, it immediately follows that $\theta(x)\in O(x)$. The PNT will strengthen this to $\theta(x)/x\to1$.
  • chebbnd2: The other direction of chebbnd1, showing $\pi(x)\in O(x/\log x)$, by leveraging chto1ub and chtppilim: since $\theta(x)\in O(x)$ and $\theta(x)\sim\pi(x)\log x$, $\pi(x)\log x\in O(x)$, or $\pi(x)\in O(x/\log x)$.

Monday, September 15, 2014

Mathbox update: Fundamental Theorem of Algebra

The main theorems in this short update are Van der Waerden's theorem (already discussed in a previous post) and the Fundamental Theorem of Algebra. This last one is one of the metamath 100, and the proof was surprisingly straightforward. The hardest part was the first step, to show that a closed disk in $\Bbb C$ is compact via the Heine-Borel theorem; then it is just a matter of verifying some explicit bounds. By the way, this proof is based on the first "topological proof" on Wikipedia.
  • vdw: Van der Waerden's theorem: for any $k$ and finite set $R$ of colors, there is an $N$ such that any $R$-coloring of $\{1\dots N\}$ contains a length-$k$ monochromatic arithmetic progression. See also the previous post I wrote on this theorem.
  • vdwnn: Infinitary Van der Waerden's theorem: for any finite coloring of $\Bbb N$, there is a color that contains arbitrarily long arithmetic progressions.
  • txcmp, txcmpb: The topological product of two nonempty spaces is compact iff both factors are compact.
  • hmeores: The restriction of a homeomorphism of topological spaces is a homeomorphism on the induced subspace topologies.
  • cnheibor: Heine-Borel theorem for the complex numbers: A subset of $\Bbb C$ is compact iff it is closed and bounded. Note that we already have the more general heibor (a metric space is compact iff it is complete and totally bounded), but this direct proof avoids the axiom of countable choice.
  • dgrco: The degree of a composition of polynomials is the product of the degrees.
  • fta: The Fundamental Theorem of Algebra: Every non-constant polynomial has a root.

Thursday, September 11, 2014

Mathbox update: Ostrowski's theorem

Hello all,

This update has two main foci: the rounding off of the integration theorems with the integration by parts and integration by substitution formulas, and an unrelated work to formalize the concept of an absolute value over a ring (in the algebra sense) and prove Ostrowski's theorem. I dare say that the real analysis section is finally "open for business" now that most of the workhorse theorems are now available for use. On to the list:
  • suplub2, suprleub, infmxrgelb, ...: All of the supremum theorems of the form $\forall x \in A: x\le B\to\sup A\le B$ or $B<\sup A\to\exists x \in A: B<x$ have now been made bidirectional. I find this interesting, because it means that one can characterize "less than supremum" exactly in terms of a conjunction of individual inequalities. Another way to think about it is to consider the sets $(-\infty,x)$ for $x\in\Bbb R$. These are open sets, so any union of them is also open; moreover, they are all convex and unbounded below, and it's not hard to see that such sets are also closed under unions. Thus any nonempty union $\bigcup_{x\in A}(-\infty,x)$ is an open, convex, unbounded-below set, and the only such sets on the reals are those of the form $(\infty,x)$ or $(-\infty,\infty)$, since $\Bbb R$ is complete. If $A$ is bounded above, then the last case is not possible, and so we can define $\sup A$ as the upper endpoint of the union: $\bigcup_{x\in A}(-\infty,x)=(-\infty,\sup A)$.

    Note that a similar characterization with less-or-equal is not possible, because the infinite union of closed intervals is not necessarily closed.
  • limsupval, etc.: The definition of the limit supremum already existed, but there were no theorems for working with it beyond the bare minimum. I added the basics, with the most nontrivial theorems being limsupval2 (the limsup is not affected by any finite prefix) and limsupgre (if a sequence of real numbers has a bounded limit supremum, then the quantities $\sup_{n>k}f(n)$ appearing in the definition of the limsup are all bounded as well).
  • caurcvg: The original proof showed that if $f$ is a Cauchy sequence, then it converges to the supremum of the set $\{x:\exists j\forall k\ge j:x<f(k)\}$. Since this set is so similar to the limit supremum set, I modified the proof to show that it converges to the limit supremum instead (which then yields the corollary that any convergent sequence converges to the limit supremum of the sequence).
  • pcqmul, pczdvds, pcid: generalize some proofs about the prime count function from integers to rational numbers
  • pcadd: To show that $p^{-\#_p(r)}$ is an ultrametric absolute value on the rational numbers, you need two main facts about the prime count function $\#_p(r)$ (which is defined so that $p^{\#_p(r)}$ is the highest power of $p$ that divides the integer or rational number $r$): the multiplicative property pcqmul ($\#_p(ab)=\#_p(a)\#_p(b)$), and this theorem, which says that $\#_p(a+b)\ge \min(\#_p(a),\#_p(b))$. When you exponentiate both sides, this gives $p^{-\#_p(a+b)}\le\max(p^{-\#_p(a)},p^{-\#_p(b)})$, which is the ultrametric inequality. The proof is not as easy as you might think.
  • df-abv: Given a ring $R$, an absolute value on $R$ is a function $f:R\to\Bbb R$ such that for all $x,y\in R$:
    • $f(x)\ge 0$
    • $f(x)=0\iff x=0_R$
    • $f(x\cdot_R y)=f(x)f(y)$
    • $f(x+_R y)\le f(x)+f(y)$
    This definition is of course inspired by the standard absolute value function $|x|$ on $\Bbb C$; and all except the third axiom are very reminiscent of the analogous axioms for a metric. (Below I will denote an absolute value using the $|x|$ notation as is customary.) Other interesting simple properties of absolute values include $|0|=0$, $|1|=1$ (unless $1=0$), $\left|-x\right|=|x|$, and $|x/y|=|x|/|y|$.
  • abvdom: If $R$ is a ring with an absolute value, then $R$ is actually a domain, which is to say, $xy=0$ implies $x=0\vee y=0$.
  • abvtriv: As a converse to abvdom, if $R$ is a domain, then it has an absolute value, namely the trivial absolute value defined by $|x|=0$ if $x=0$ and $|x|=1$ otherwise. (Since we don't have a good notation for domain, we strengthen the hypotheses to say that $R$ is a division ring instead. However, this theorem now provides the alternative notation ${\rm AbsVal}(R)\ne\emptyset$ in order to specify that $R$ is a non-commutative domain.)
  • cncfmpt1f, cncfmpt2f: These theorems are the analogues of cnmpt11f and cnmpt12f, respectively, specialized to continuous functions on the complex numbers (rather than continuous functions in arbitrary topological spaces). We still need the topological continuity predicate for the two-arg version, though, because the version on $\Bbb C$ is designed only for functions $f:\Bbb C\to\Bbb C$ in one variable, which misses the common use-case $h(x)=R(f(x),g(x))$ where $R:\Bbb C\times\Bbb C\to\Bbb C$ is continuous in the product topology on $\Bbb C\times\Bbb C$ (particularly when $R$ is one of $+,\cdot,-$).
  • mbfsub: The difference of two measurable functions is measurable.
  • mbfsup, mbfinf, mbflimsup, mbflim: The supremum, infimum, limit supremum, and limit of a sequence of measurable functions is measurable. The main reason for developing limsup in the first place is because it allows a limit to be written in terms of a supremum and infimum, which are the basic measurability-preserving operations we have to work with.
  • mbfi1flim: A real measurable function is the pointwise limit of a sequence of simple functions. (The restriction to real functions is only because in our definition simple functions have to be real, so that they can be compared.)
  • mbfmul: The pointwise product of two measurable functions is measurable. The proof is very general; you just take two sequences of simple functions converging to $f$ and $g$ and multiply them pointwise to get a sequence of simple functions that converges to $fg$; then $fg$ is a limit of simple functions and so is measurable. I think it could easily be adapted to show that $h(f(x),g(x))$ is a measurable function whenever $h$ is jointly continuous and $f,g$ are measurable.
  • bddmulibl, bddibl: A bounded function is integrable, and a bounded function times an integrable function is also integrable. (This also yields a shorter proof for cniccibl.)
  • dvcn: A function differentiable on its entire domain is continuous.
  • ftc2ditg: Restate ftc2 for directed integrals, so that we can drop the ordering of the endpoints.
  • itgparts: Integration by parts. If $A:[x,y]\to\Bbb C$ and $C:[x,y]\to\Bbb C$ are continuous functions, $A':(x,y)\to\Bbb C$ and $C':(x,y)\to\Bbb C$ are defined and continuous (so $A,C$ are differentiable on $(x,y)$), and $A\cdot C'$ and $A'\cdot C$ are integrable on $[x,y]$, then $$\int_x^yA(t)C'(t)\,dt=(A(y)C(y)-A(x)C(x))-\int_x^yA'(t)C(t)\,dt.$$
  • itgsubst: Integration by $u$-substitution. If $I$ is an open interval, $A:[x,y]\to I$ and $C:I\to\Bbb C$ are continuous functions, and $A':(x,y)\to\Bbb R$ is continuous and integrable, then $$\int_{A(x)}^{A(y)}C(u)\,du=\int_x^yC(A(t))A'(t)\,dt.$$ When I was learning these theorems in school, I don't remember there being so many assumptions, but it just goes to show that it's easy to drop details in a regular math class if you're not being careful. These theorems are not as general as they could be, but usually making the hypotheses more general also means making them more complicated, and I don't want to go too far in that direction. (I certainly don't want to define absolute continuity, which has little use outside these theorems and the FTC.)
  • cxpaddle: A tricky proof that $(a+b)^c\le a^c+b^c$ when $0<c\le1$.
  • abvcxp: We're back to the absolute value after that integration interlude because this list is in database order and we had to wait for the logarithm and complex power functions to be defined. This theorem shows that if $|x|$ is an absolute value on $R$, then $|x|^c$ is as well, when $0<c\le 1$. For $c>1$, it usually is not because the triangle inequality is broken. (I have seen an alternative definition of absolute value with an auxiliary parameter, the "order" $c$ of the absolute value, such that for all $a,b$, $|a+b|\le c\max(|a|,|b|)$; then $|x|$ is a "triangular" absolute value iff $c\le 2$ and $|x|$ is an ultrametric absolute value iff $c=1$. Under this setup, if $|x|$ is an absolute value of order $n$, then $|x|^c$ is an absolute value of order $n^c$.)
  • abvmet: I mentioned earlier that the axioms of an absolute value look a lot like the axioms of a metric; well, here we show the correspondence, where an absolute value $|x|$ induces a metric $d(x,y)=|x-y|$. In fact, in the proof we never use the third property abvmul; and if we drop that (and add $\left|-x\right|=|x|$ as an axiom, since this is no longer derivable) then there is no need to even have the multiplication operation, so we end up with a statement about pure groups. I don't even think they need to be abelian groups, since we never use that either.
  • qrng and related theorems: Define the ring $\langle \Bbb Q,+,\cdot\rangle$, and show that it is actually a division ring, and the operations $+,\cdot,0,1,-,/$ map to their ring operation equivalents.
  • qabvle: From here on, we are interested in absolute values on the ring of rational numbers. For brevity and unambiguity, I will denote the "traditional" absolute value function by $|x|_\infty$, and reserve $|x|$ for any absolute value under our new definition. Induct the triangle inequality to show that $|n|\le n$ for all $n\in\Bbb N_0$.
  • ostthlem2: Any two absolute values that agree on the primes are equal.
  • qabsabv: The regular absolute value $|x|_\infty$ is an absolute value on $\Bbb Q$.
  • padicabv, padicabvf, padicabvcxp: The $p$-adic absolute value $|x|=n^{\#_p(x)}$ is an absolute value whenever $0<n<1$. We define "the" $p$-adic absolute value $|x|_p$ by setting $n=p^{-1}$; other choices of $n$ yield equivalent (off by an exponent) absolute values.
  • ostth: Ostrowski's theorem. If $|x|$ is an absolute value, then (1) $|x|=|x|_0$ is trivial, that is, $|x|=1$ for all $x\ne0$; (2) $|x|=|x|_\infty^c$ for some $c\in(0,1]$, i.e. $|x|$ is "regular"; or (3) there is a prime $p$ such that $|x|=|x|_p^c$ for some $c\in(0,\infty)$, i.e. $|x|$ is equivalent to a $p$-adic absolute value. We separate the proof into three parts according to these cases. If there is some $n\in\Bbb N$ with $|n|>1$ then we are in case (2), otherwise if there is a prime $|p|$ such that $|p|<1$ then we are in case (3). If both of these are false then $|p|=1$ for each prime, so case (1) shows that $|x|=|x|_0$ is the trivial absolute value.
I will probably write a post later going into a bit more detail for this proof, which is quite involved. It's a shame it's not on the metamath 100 list - I think it's a pretty nice and notable theorem.

Wednesday, September 3, 2014

Mathbox update: The Fundamental Theorem of Calculus

Today I just put the finishing touches on a proof of the Fundamental Theorem of Calculus, which is the first theorem from the Formalizing 100 Theorems list to be proven since the creation of this blog, and has been the main reason for my recent fixation on real analysis and basic properties of integrals. Hopefully this means that integrals can now actually be useful tools for more interesting proofs rather than objects of study of themselves. There is also a proof of the Mean Value Theorem and Rolle's theorem, among other things. I may describe proof overviews for some of the theorems in this list in later posts; the theorems below that are proven on this blog are marked with an asterisk and linked at the end.

Note that almost all of these theorems were necessary in order to complete the proof; it is an interesting exercise to see how some of these seemingly unrelated theorems in totally different fields are used in the final proof of the FTC. (To track the usage of a theorem, use the "This theorem is referenced by:" list at the end of the proof page.)
  • fofinf1o: Complementary to f1finf1o, this shows that a surjection from a finite set to another of the same size is a bijection (f1finf1o shows the same assuming an injection).
  • wlogle: Inspired by a similar theorem in HOL Light (another formalization language), we show that if a predicate satisfies $\varphi(x,y)\leftrightarrow \varphi(y,x)$, i.e. it is symmetric under interchange of $x,y$, then "without loss of generality" we can assume that $x\le y$ in a proof that $\forall x,y:\varphi(x,y)$.
  • ifle: A nice property of the conditional operator ${\rm if}(\varphi,x,y)$ (read "if $\varphi$ then $x$, else $y$", or in piecewise notation as $\begin{cases}x&\varphi\\y&\lnot\varphi\end{cases}$) is that it can transform an implication into an inequality. To be precise, if $y\le x$, then $\varphi\to\psi$ implies ${\rm if}(\varphi,x,y)\le{\rm if}(\psi,x,y)$ (and the converse is also true if $x<y$).
  • cnrecnv, cnrehmeo: The canonical map $\Bbb R^2\to\Bbb C$ is the function $\langle x,y\rangle\mapsto x+iy$. We already know from cnref1o that this map is invertible; here we show that the inverse is $z\mapsto\langle\Re\,z,\Im\,z\rangle$. Since all the functions involved here are known to be continuous, this allows us to deduce that the map is a homeomorphism, and so $\Bbb R^2$ under the product topology is homeomorphic to $\Bbb C$ under the metric topology.
  • max0add: The identity $f^++f^-=|f|$, where $f^+=\max(f,0)$ and $f^-=\max(-f,0)$ are the positive and negative part functions. See also max0sub which shows $f^+-f^-=f$.
  • txbas2: If $B$ and $C$ are topology bases, then $\{u\times v\,|\,u\in B,v\in C\}$ is also a basis for a topology, where $u\times v$ is the cartesian product of sets. This generalizes txbas which assumes that $B$ and $C$ are topologies.
  • txbasval: The topological product $\scr B\times\scr C$ is defined as the topology generated by the basis discussed in the last bullet, but it is only defined on topologies, not topological bases. Here we show that if $B$ is a basis for $\scr B$ and $C$ is a basis for $\scr C$, then $\{u\times v\,|\,u\in B,v\in C\}$ is a basis for $\scr B\times\scr C$. Note the difference from the definition, which defines $\scr B\times\scr C$ as the topology generated by $\{u\times v\,|\,u\in{\scr B},v\in{\scr C}\}$ - we see that it is sufficient to consider rectangles of basis elements and not topology elements.
  • iccntr: The interior of the closed interval $[x,y]$ is the open interval $(x,y)$, in the standard topology on the reals.
  • volsup2*, volcn*, volivth*: The intermediate value theorem for Lebesgue measure: Given a measurable set $A$, there is a measurable subset of $A$ of any measure $0\le x\le\operatorname{vol}(A)$.
  • I changed the token for the set of all (Lebesgue) integrable functions from the made-up $f\in{\rm Itgbl}$ to $f\in L^1$, since the latter actually seems to be in use and so may be more understandable to the lay-mathematician.
  • mbfimaopn, mbfimaopn2: The preimage of an open set under a measurable function is measurable. Note in particular that these are open sets in the complex topology, not the real topology; the proof essentially proceeds by showing that open rectangles with rational coordinates form a countable basis for the topology on $\Bbb C$.
  • cncombf: The composition $g\circ f$ is measurable if $f$ is measurable and $g$ is continuous. This could be generalized to $g$ Borel-measurable, but I don't feel like defining the Borel
  • itg2gt0*, itggt0*: If $\operatorname{vol}(A)>0$ and $f>0$ on $A$, then $\int_A f>0$. This is a nice proof that is discussed in a previous post.
  • itg2cn, itgcn: Absolute continuity of the Lebesgue integral. If $f$ is integrable, then for any $\varepsilon>0$ there is a $\delta>0$ such that for any measurable set $A$ with $\operatorname{vol}(A)<\delta$, $\int_A|f|<\varepsilon$. Note that no assumptions on $f$ other than integrability are needed.
  • itgeqa: If two functions are equal almost everywhere, then they have the same integrable and one is integrable iff the other is.
  • itgss3: If $A\subseteq B$ and $B\setminus A$ is a nullset, then $f$ is integrable on $A$ iff it is integrable on $B$ and the integrals are the same.
  • iblsub, itgsub: If $f,g$ are integrable, then $f-g$ is integrable and $\int(f-g)=\int f-\int g$.
  • itgfsum: If $f_1,f_2,\dots f_n$ are integrable, then $\sum_{k=1}^nf_k$ is integrable and $\int_A\sum_{k=1}^nf_k=\sum_{k=1}^n\int_Af_k$.
  • iblabs, iblabsr: If $f$ is a measurable function, then $f$ is integrable iff $|f|$ is integrable.
  • iblmulc2, itgmulc2: If $f$ is integrable and $c\in\Bbb C$, then $c\cdot f$ is integrable and $\int_Ac\cdot f=c\int_Af$.
  • itgabs: If $f$ is integrable, then $|\int_Af|\le\int_A|f|$. This is another nice proof that works on arbitrary complex functions $f$.
  • dvcj: The derivative $(f(x)^*)'$ is $f'(x)^*$, where ${}^*$ denotes the conjugate function. This does not follow from dvcof (the chain rule) because ${}^*$ is not differentiable as a complex function and is just the identity as a real function.
  • dvfre: The derivative of a real function is a real function.
  • dvmptid, dvmptc, dvmptadd, $\dots$, dvmptco: These theorems are nothing new in terms of actual content, in the sense that we already have formulas for the derivative of a sum, product, etc.; rather these are designed to be easy to use in proofs so that one can quickly differentiate a function given by a formula, in a sense providing an "algorithm" for differentiation. Supported here are the rules: $$c'=0\quad x'=1\quad (f+g)'=f'+g'\quad (fg)'=f'g+g'f\quad (cf)'=cf'\quad (-f)'=-f'$$ $$(f-g)'=f'-g'\quad (f^*)'=f'^*\quad (\Re\,f)'=\Re\,f'\quad (\Im\,f)'=\Im\,f'$$ Also the chain rule $(f(g(x)))'=f'(g(x))g'(x)$ as well as restriction theorems: if $f:X\to\Bbb C$ is differentiable on $X$ and $Z\subseteq X$, $Y=Z^\circ$ is the interior of $Z$, then $(f\upharpoonright Z)'=f'\upharpoonright Y$.
  • rolle: Rolle's theorem. If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $a<b$ and $f(a)=f(b)$, then there is an $x\in(a,b)$ such that $f'(x)=0$.
  • mvth: The Mean Value Theorem. If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $a<b$, then there is an $x\in(a,b)$ such that $f'(x)=\frac{f(b)-f(a)}{b-a}$.
  • dveq0: If $f:[a,b]\to\Bbb R$ is continuous on $[a,b]$ and differentiable with derivative $0$ on $(a,b)$, then $f$ is a constant function.
  Finally we get to the main theorem. Recall, though, that the "fundamental theorem" is really several separate but related theorems; we state them here.
  • ftc1a: Let $f:D\to\Bbb C$ be an integrable function, and $(a,b)\subseteq D\subseteq\Bbb R$ with $a\le b$. Define $G:[a,b]\to\Bbb C$ by $G(x)=\int_a^xf(t)\,dt$. Then $G$ is a continuous function on $[a,b]$.
  • ftc1: Now assume that $f$ is also continuous at some $c\in(a,b)$. Then $G$ is differentiable at $c$, and $G'(c)=f(c)$.
  • ftc1cn: Simplifying the assumptions, if $f:(a,b)\to\Bbb C$ is continuous and integrable, then $G'=f$.
  • ftc2: Let $f:[a,b]\to\Bbb C$ be a function differentiable on $(a,b)$ such that $f'$ is integrable and continuous. Then $\int_a^bf'=f(b)-f(a)$.
If you are familiar with the theorems, you'll see I kinda copped out on the last statement; the assumption that $f'$ is continuous is unnecessary but makes the proof a lot easier. I found a proof in Rudin that uses upper-semicontinuous functions via the Vitali–Carathéodory theorem, but I have not formalized the notion of semicontinuity or the Sorgenfrey line at all, so that seemed like a pain. Maybe I'll revisit it later if it becomes necessary to strengthen the theorem.

This will be theorem #15 of 100 at Formalizing 100 Theorems; as of now there are thus 35 theorems formalized in Metamath. You can see the others at the metamath 100 page.

* These theorems were discussed with natural-language proofs in integral of a positive function.

Sunday, August 31, 2014

Integral of a positive function is positive

Here is a nice theorem that is not as simple as the statement would have you believe:

Theorem (itggt0): If $f$ is integrable and $A$ is a measurable set such that $0<\DeclareMathOperator{\vol}{vol}\vol(A)\le\infty$ and $f(x)>0$ for all $x\in A$, then $\int_A f(x)\,dx>0$.

The analogous theorem with $\le$ is trivial (itgge0), but this theorem requires some of the countable additivity properties of the volume function. We postpone the proof until we have proven the needed lemmas. First, we establish what I will call, for lack of a better name, the Intermediate Value Theorem for the Lebesgue volume function.

Theorem (volivth): If $A$ is a measurable set and $0\le x\le\vol(A)$, then there is a measurable $B\subseteq A$ such that $\vol(B)=x$. (Note that $\vol(A)$ is allowed to be infinite.)

Proof: If $x=\vol(A)$, then $B=A$ satisfies the requirements, so assume instead that $x<\vol(A)$. Consider the sequence $A\cap[-n,n]$ for $n\in\Bbb N$. This is an increasing sequence of measurable sets, so by volsup we have $\sup_{n\in\Bbb N}\vol(A\cap[-n,n])=\vol(A)$. Thus since $x<\vol(A)$ there is an $n$ such that $x<\vol(A\cap[-n,n])$.

Now define the function $f(y)=\vol(A\cap[-n,y])$. If $y\le y'$, then $[-n,y]\subseteq[-n,y']\implies f(y)\le f(y')$, so $f$ is a weakly increasing function, and since $(A\cap[-n,y'])\setminus(A\cap[-n,y])\subseteq[y,y']$, we also have $f(y')-f(y)\le\vol([y,y'])=y'-y$, so $f$ is a continuous function $\Bbb R\to\Bbb R$. Since $f(-n)=0\le x<\vol(A\cap[-n,n])=f(n)$, the regular intermediate value theorem implies that there is some $z$ such that $\vol(A\cap[-n,z])=x$, and for this $z$, $B=A\cap[-n,z]$ satisfies the requirements of the conclusion. $$\tag*{$\blacksquare$}$$

 We need this theorem to extend the formula $\int_A c\,dx=c\vol(A)$ to the case $\vol(A)=\infty$. (We can't use itg2const directly because the formula is only valid when $\vol(A)$ is a real number, since multiplication only works on complex numbers.) If $\vol(A)=\infty$ and $c>0$, then for any $x\in\Bbb R$ there is a $B\subseteq A$ with $\vol(B)=x/c$, and then $\int_B c\,dx=x\le\int_A c\,dx$. Thus $\int_A c\,dx$ is larger than any real number, so $\int_A c\,dx=\infty$ as well. (This is theorem itg2const2.)

Now we can prove the main theorem.

Proof (of itggt0): Define the sequence $A_n=\{x:f(x)>\frac1n\}$. This is clearly an increasing sequence of measurable sets (since $f$ is measurable), and since $f(x)>0$ for all $x\in A$ so that there is some $k$ with $f(x)>\frac1k$, $A\subseteq\bigcup_{n\in\Bbb N}A_n$. Now assume that $\int_A f(x)\,dx=0$. Then $$0\le\int_{A_n}\frac1n\,dx\le\int_{A_n}f(x)\,dx\le\int_A f(x)\,dx=0,$$ so from itg2const and itg2const2, $\vol(A_n)=0$ (because $\vol(A_n)=\infty$ implies $\int_{A_n}\frac1n\,dx=\infty\ne0$ and $\vol(A_n)\in\Bbb R$ implies $n\int_{A_n}\frac1n\,dx=\vol(A_n)=0$). Since this is true for every $n$, we have $\vol(A)\le\vol(\bigcup_{n\in\Bbb N}A_n)=\lim_{n\to\infty}\vol(A_n)=0$, which contradicts the hypothesis $\vol(A)>0$. Therefore $\int_A f(x)\ne0$, and since $\int_A f(x)\ge0$ we in fact have $\int_A f(x)>0$. $$\tag*{$\blacksquare$}$$

Wednesday, August 20, 2014

Van der Waerden's theorem

To take a break from all this real analysis, I thought I'd change gears and formalize a gem of number theory, which goes by the name of Van der Waerden's theorem. The proof is not yet complete, so instead I'll just lay out the high-level strategy of the proof. (Well, any regular mathematician would call this a proof, and a pretty detailed one at that, but for me it's only a sketch of the "real" formal proof.)

Theorem (vdw): For any finite set $R$ (the set of "colors") and any $k\in\Bbb N$, there is an $N$ such that for any "coloring" $f:\{1,\dots,N\}\to R$, there is a monochromatic arithmetic progression. This last phrase is expressed in symbols as: $$\exists c\in R;a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\}).$$ By demanding that $a+md$ lie in the preimage of $f$ we can indirectly also demand that the entire AP lies within $\{1,\dots,N\}$.

Proof: We begin by defining some predicates, which look horrible in metamath because they take so many parameters. First, we define the "arithmetic progression function": $$A_k:\Bbb N\times\Bbb N\to{\cal P}\Bbb N\quad A_k(a,d)=\{a+md\,|\,m\in\{0,\dots,k-1\}\}$$ This function takes a length $k$, a start point and step $a,d$, and outputs the set of all points in this arithmetic progression. The parameter $k$ has been separated from the other two so that we can consider the set $\operatorname{ran}A_k$ of all length-$k$ arithmetic progressions.

Next, we define the notion that a given coloring $\langle R,N,f\rangle$ (where $f:\{1,\dots,N\}\to R$) contains a length-$k$ monochromatic AP. In metamath this predicate looks like $k\,(R\,O\,N)\,f$, because $O$ is a function which takes $R,N$ as input and returns a binary relation whose two parameters are $k,f$. In any case, we define this notion to be: $$k\,(R\,O\,N)\,f\iff\exists c\in R,x\in\operatorname{ran}A_k:x\subseteq f^{-1}(\{c\}).$$
Straightforward enough. Now we turn to our last and most complex definition, which states that a given coloring $\langle R,N,f\rangle$ contains a polychromatic $m$-tuple of arithmetic progressions of length $k$. (In metamath this looks like $k\,((R\,P\,N)`m)\,f$.) A polychromatic $m$-tuple of arithmetic progressions is a set of arithmetic progressions $a+\{1,\dots,k\}d_1,a+\{1,\dots,k\}d_2,\dots,a+\{1,\dots,k\}d_m$ which all share a common basepoint $a$ (but do not actually include $a$), and we demand that each progression be monochromatic, but no two APs have the same color. We express this as: $$k\,((R\,P\,N)`m)\,f\iff\exists a\in\Bbb N, d:\{1,\dots,m\}\to\Bbb N:\\
\quad\forall i\in 1,\dots,m:A_k(a+d_i,d_i)\subseteq f^{-1}(\{f(a+d_i)\})\,\wedge\\

The first conjunct says that the entire AP $A_k(a+d_i,d_i)$ is all the same color as $f(a+d_i)$, the first element of the progression, i.e. the progression is monochromatic; the second conjunct says that the set $\{f(a+d_i)\,|\,i\in1,\dots,m\}$ of colors of the first element of each progression has size exactly $m$, i.e. they are all distinct. (We could just as well say that the $f$-image of the union of all the AP's has size $m$, but it's less trivial to show that the size of this larger set is already at most $m$.)

Now, onto the proof proper. The main body of the proof is a double induction; at the top level we induct on $k$, where our induction hypothesis is that for any $R$ there is an $N$ such that all $R$-colorings of $\{1,\dots,N\}$ contain a monochromatic length-$k$ AP. For the base case, we treat $k=1$ and $k=2$ separately (because some of our arguments in the induction step $k\to k+1$ don't work if $k=1$). If $k=1$, then any point is an AP, so take $N=1$ and so $A_1(1,1)=\{1\}\subseteq f^{-1}(\{f(1)\})$. If $k=2$, then we can use the pigeonhole principle. Take $N=|R|+1$, and assume that $f$ contains no length-$2$ monochromatic APs. If $f(x)=f(y)$ for $x<y$, then $A_2(x,y-x)=\{x,y\}\subseteq f^{-1}(\{f(x)\})$, so $f$ has a length-$2$ AP contrary to the hypothesis, so therefore $f$ is injective and $|\{1,\dots,N\}|=N=|R|+1\le|R|$, a contradiction.

In our induction step, we assume (for $k\ge2$) that the hypothesis is true for $k$, and derive it for $k+1$. Fix $R\ne\emptyset$; if $R=\emptyset$ then there are no functions $f:\{1\}\to R$ so the statement is vacuously true for $N=1$. Let us then start another induction on $m\in\Bbb N$, where the induction hypothesis is that there is an $N$ such that every coloring on $\{1,\dots,N\}$ contains either a monochromatic length-$k+1$ AP or a polychromatic $m$-tuple of length-$k$ APs. The base case is not entirely trivial, because a $1$-tuple of length-$k$ APs is not quite the same as a monochromatic length $k$ AP (which is guaranteed to us by the $k$-induction hypothesis) because it "reserves space" for the base point of a $k+1$-long AP - it just doesn't demand that this initial point have any particular color.

To fix this, suppose that $N$ is such that every coloring of $\{1,\dots,N\}$ contains a monochromatic length-$k$ AP, and let $N_2=2N$. Given a coloring $f:\{1,\dots,2N\}$, by defining $g:\{1,\dots,N\}\to R$ by $g(x)=f(x+N)$, we know $g$ contains a monochromatic length-$k$ AP, so there is a $c,a,d$ such that for all $t=0,\dots,k-1$ we have $a+td\in\{1,\dots,N\}\wedge f(a+td)=c$. Then we can define $a'=N+a-d$; this is a positive integer because $a,a+d\in\{1,\dots,N\}$ imply $a-d=2a-(a+d)\ge 2-N>-N$. (This is why we needed $k\ge2$ at the start.) Now we have the necessary setup for a polychromatic $1$-tuple of APs, using $a'$ and $d_1=d$.

Now, for the $m$-induction step, we finally get to the interesting part of the proof. Let $W\in\Bbb N$ be such as specified by the $m$-induction hypothesis. Then the set $R^{\{1,\dots,W\}}$ of all functions $\{1,\dots,W\}\to R$ is finite, so by the $k$-induction hypothesis there is a $V$ such that all colorings $f:\{1,\dots,V\}\to R^{\{1,\dots,W\}}$ have a length-$k$ monochromatic AP. Let $N=2WV$, and now we want to show that every function $h:\{1,\dots,N\}\to R$ contains either a monochromatic $k+1$ AP or a polychromatic $m,k$ AP tuple. Define $$f(x)=\big(y\in\{1,\dots,W\}\mapsto h(y+W(x-1+V))\big);$$ effectively we are cutting the set $\{1,\dots,N\}$ into blocks of length $W$. Again, we "reserve space" of size $WV$ in our set $1,\dots,N$ so that we can move backwards a little. The hypotheses give us that $f$ contains an AP with parameters $g,a,d$ such that $f(x)=g$ for $x=a,a+d,\dots,a+(k-1)d$. Now since $f$ takes values in the function space $R^{\{1,\dots,W\}}$, $g:\{1,\dots,W\}\to R$ is really a function, which satisfies the assumptions for the $m$-induction hypothesis, so either $g$ contains a $k+1$-long AP with parameters $c,a',d'$ (in which case $h$ has a $k+1$-long AP with parameters $c,W(a-1+V)+a',d'$), or there is a polychromatic $m$-tuple of APs $a',d_1,\dots,d_m$.

In this case, consider the prospective $m+1$-tuple defined by $a''=a'+W(a-d-1+V)$ and $d''_i=d_i+dW$, $d''_{m+1}=dW$. We can verify that $a''$ is positive, because $$V-1+a-d=V-1+2a-(a+d)\ge V-1+2-V=1,$$ and for each $i\in\{1,\dots,m\}$ and $t\in\{1,\dots,k\}$, \begin{align}
so each tuple is individually monochromatic. For the last AP, $h(a''+td''_{m+1})=f(a+(t-1)d)(a')=g(a')$ is monochromatic as well. All that remains is to verify that the colors are distinct. Since we already know that $g(a'+d_i)\ne g(a'+d_j)$ for any $i\ne j$, if additionally $g(a')\ne g(a'+d_i)$ for all $i=1,\dots,m$ then we are done because $h$ contains a polychromatic $m+1$-tuple. Otherwise, there is some $i$ such that $g(a')=g(a'+d_i)$. But we already know $g(a'+d_i)=g(a'+td_i)$ for $t\in\{1,\dots,k\}$, so then $g$ contains a $k+1$ monochromatic AP, and so does $h$ (with parameters $g(a'),a'+W(a-1+V),d_i$).

This completes the $m$-induction. To complete the $k$-induction, let $m=|R|$ (which is valid since $R\ne\emptyset$). Then there is an $N$ such that all colorings have a $k+1$ monochromatic AP or a polychromatic $|R|$-tuple of length-$k$ APs. In the first case we are done, and in the second case let the parameters of the tuple be $a,d_1,\dots,d_{|R|}$. Since $|\{f(a+d_i)\,|\,i\in\{1,\dots,|R|\}\}|=|R|$ and $R$ is finite, the function $i\mapsto f(a+d_i)$ must be bijective to $R$, so there is an $i$ such that $f(a)=f(a+d_i)$, and as before this yields a monochromatic AP of length $k+1$ with parameters $a,d_i$. Thus there is a $k+1$ monochromatic AP in any case, which completes the $k$-induction and the proof. $$\tag*{$\blacksquare$}$$

There is one more nice extension to this theorem, which is the so-called "infinitary" version of Van der Waerden's theorem.

Theorem (vdwnn): For any finite set $R$ and any coloring $f:\Bbb N\to R$, there is a color $c\in R$ such that $f^{-1}(\{c\})$ contains arbitrarily long arithmetic progressions. (In symbols, $\forall k\in\Bbb N:\exists a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\})$.)

Proof: Fix $R$ and $f$ and suppose not. Then for each $c\in\Bbb R$, there is a $k$ such that there are no $c$-colored APs of length $k$, and since each length $n\ge k$ AP contains a length $k$ AP, there are also no $c$-colored APs of length $n$ for any $n\ge k$. Thus the set of $k\in\Bbb N$ such that there are $c$-colored APs of length $k$ is bounded, and since the finite union of bounded sets is bounded, there is a uniform bound $K$ such that there are no monochromatic length $K$ APs in any color. But this is a contradiction to vdw above. $$\tag*{$\blacksquare$}$$

Monday, August 18, 2014

Mathbox update: More integrals

This is the first post on this blog in the mathbox-update category, so allow me to explain what this is. I periodically release my latest work in Metamath to the database, and in this post I go over the highlights and additions to be found in the release, like a feature list. You can see the latest additions to the database at; note in particular that the us2 server is a few days ahead of the us server that I usually link to - the us2 server is for development, so if you are reading this notice hot off the presses you may want to try replacing "" with "" in the links here. On to the update...

New theorems in this update:
  • ivthicc: The intermediate value theorem, in the form "$[f(m),f(n)]\subseteq\operatorname{ran}f$ whenever $m,n\in[a,b]$ and $f:[a,b]\to\Bbb R$ is continuous"; this is a bit more convenient to use than ivth, which explicitly states that $f(a)<u<f(b)$ and $a<b$, so you need to go through some case analysis with increasing/decreasing combinations to use this directly.
  • divcn: In a curious oversight, we have continuity of addition, subtraction, and multiplication, but not division. This derives that $f:\Bbb C\times\Bbb C^*\to\Bbb C$ defined by $f(x,y)=\frac xy$ is jointly continuous (in the product topology). (Note that $0$ is excluded from the domain.)
  • bndth: The Boundedness Theorem, topological version. A continuous function $f:X\to\Bbb R$, where $X$ is compact, is bounded above.
  • evth: The Extreme Value Theorem, topological version. A continuous function $f:X\to\Bbb R$, where $X$ is compact and nonempty, attains its maximum value somewhere on the domain. (Curiously, most books neglect to mention that $X$ has to be nonempty, but it's clearly necessary; the topology $J=\{\emptyset\}$ is compact and has empty base set, and the empty function from $X=\emptyset$ to $\Bbb R$ is continuous, yet it clearly can't attain its maximum because there are no points in the base set.)
  • evth2: A continuous function $f:X\to\Bbb R$, $X$ compact, attains its minimum.
  • evthicc: Specialize evth and evth2 to the case when $X=[a,b]$ is a closed interval of $\Bbb R$.
  • volsup*: The Lebesgue volume function preserves the limit of an increasing sequence of measurable sets.
  • df-ditg: Define the directed integral. This is basically the difference between $\int_{[a,b]}f(x)\,dx$ and $\int_a^bf(x)\,dx$ - the second is defined regardless of the order of $a,b$, and negates under a swap of upper and lower limit. It is literally defined as $\int_a^bf(x)\,dx=\int_{(a,b)}f(x)\,dx$ if $a\le b$ and $\int_a^bf(x)\,dx=-\int_{(b,a)}f(x)\,dx$ otherwise. The usage of open intervals is in anticipation of such integrals as $f(x)=\int_0^x\log|t|\,dt$, which would not make sense under a closed interval because $\log 0$ is undefined. As it is, this would be a total function on $\Bbb R$. (The integral at $x=0$ is $0$ regardless of the function being integrated.)
  • mbfmulc2re: The product of a measurable function $f:A\to\Bbb C$ and a real constant is a measurable function (the old mbfmulc2 only allowed $f:A\to\Bbb R$).
  • mbfposb: A function $f:A\to\Bbb R$ is measurable if and only if the positive and negative parts $f^+=\max(0,f)$, $f^-=(-f)^+$ are measurable.
  • mbfadd: The sum of measurable functions is measurable. (This is a nice argument by the decomposition $\{f+g>y\}=\bigcup_{r\in\Bbb Q}\{f>r\}\cap\{g>y-r\}$.)
  • itg10a, itg1ge0a, itg1lea: These are "approximate" versions of the analogous theorems itg10, itg1ge0, itg1le which relax their conditions to only hold on a set of full measure. (For example, the integral of a function that is zero for almost all $x$ is zero, where "almost all" means that the measure of the complement is zero.)
  • itg1climres*: The simple function integral preserves limiting operations on the base set.
  • mbfi1fseq: A ridiculously complicated proof that there is an increasing sequence of simple functions that converge pointwise to any positive measurable function. The proof uses the sequence $$g_n(x)=\begin{cases} \max(2^{-n}\lfloor 2^nf(x)\rfloor,n) & -n\le x\le n \\ 0 & o.w. \end{cases}$$ and verifying that the preimage of each point is a measurable set (by describing it explicitly in terms of preimages of $f$) is a study in case analysis.
  • itg2const, itgconst: Integral of a constant function.
  • itg2seq: A nice result, although now mostly precluded by mbfi1fseq and itg2i1fseq which give more detailed information on the sequence. For any nonnegative function $f$, there is a sequence of simple functions less than $f$ whose integrals converge to $\int f$.
  • itg2split, itgsplit: Unlike the analogous theorem for sums, we don't use itg2add to prove this, and instead prove it "directly" - the upshot of this is that we can work with extended real functions (for which addition is not defined), and also we don't need the countable choice baggage carried along with itg2add. If $A,B$ are almost disjoint measurable sets and $f$ is integrable on $A$ and $B$, then $\int_{A\cup B}f(x)\,dx=\int_A f(x)\,dx+\int_B f(x)\,dx$.
  • ditgsplit: The best argument in favor of the definition df-ditg is that the path addition rule $\int_a^cf=\int_a^bf+\int_b^cf$ holds regardless of the relative ordering of $a,b,c$, unlike the analogous theorem for integrals of closed intervals, $\int_{[a,c]}f=\int_{[a,b]}f+\int_{[b,c]}f$, which only holds for $a\le b\le c$.
  • mbfmono*, itg2mono*: The Monotone Convergence Theorem for integrals. This and the other starred* entries in the list are explained in much more detail in my previous blog entry.
  • itg2i1fseq: Completing the work of mbfi1fseq, here we show that given an increasing sequence of simple functions that converge pointwise to the positive measurable function $f$, the integrals of the simple functions also converge to the integral of $f$. (Combining these two gives an alternative proof of itg2seq.)
  • itg2add, itgadd: The main theorem of this update, which is not too complicated but carries some high-powered prerequisites above: If $f,g$ are integrable on $A$, then $$\int_A(f+g)\,dx=\int_Af\,dx+\int_Ag\,dx.$$

Sunday, August 17, 2014

Extensible structures, and "Every vector space has a basis"

I'd like to take a little time to talk about the proof of lbsex, which says that every vector space has a basis, or every vector space is free (a free module is a module with a basis; general modules are not necessarily free). This one is interesting because it uses some tricks that as far as I am aware are not in any standard proofs, which I had to use because finite linear combinations are not yet easy to state over arbitrary modules. Note that these articles are best read by also consulting the frequent links to the associated formal proofs, so that you can appreciate the similarities and differences between a hand proof and a formal proof.

First, some background, and also a little window into how Metamath does structures. A left module, written $\rm{LMod}$ in metamath, is an example of an extensible structure, which is characterized by having various "accessor" functions defined on it to get multiple components. In this case, a left module is an extension of a group, which is another extensible structure with two defined operations: the base set $\operatorname{Base}(W)$ and the group operation $+_g(W)$. These functions are simply defined as $\operatorname{Base}(W)=W(1)$ and $+_g(W)=W(2)$, so that a group object is really a function from the natural numbers or a subset thereof to the operations. This is a little different from the standard literature definition "A group is an ordered pair $\langle V,+\rangle$ such that..." which does not allow for extensions of the set operations.

Given all the above, given a set $W$, define (throughout this article) $V=\operatorname{Base}(W)$ and $+=+_g(W)$ (note that these operations always return something even if $W$ is not a function, so it is always valid to make such a definition). Then $W$ is a group iff:
  • $\forall a,b,c\in V:(a+b\in V\wedge(a+b)+c=a+(b+c))$
  • $\exists e\in V\,\forall a\in V:(e+a=a\wedge\exists m\in V:m+a=e)$
It is not too hard to see that the first axiom is just closure and associativity and the second is existence of an identity element and inverses. Now, let us define a ring, which shows off the "extensible" part of our extensible structures. Define the function $\cdot_r(W)=W(3)$. Letting $\cdot=\cdot_r(W)$, $W$ is a ring iff $W$ is an abelian group (which adds the axiom $\forall a,b:a+b=b+a$ to the group axioms) and:
  •  $\forall x,y,z\in V:$
    • $x\cdot y\in V$
    • $(x\cdot y)\cdot z=x\cdot(y\cdot z)$
    • $x\cdot(y+z)=(x\cdot y)+(x\cdot z)$
    • $(x+y)\cdot z=(x\cdot z)+(y\cdot z)$
  • $\exists u\in V\,\forall x\in V:u\cdot x=x=x\cdot u$
Here we have closure, associativity of multiplication, left- and right-distributivity, and existence of a multiplicative unit. (This is a unital ring.) Note how now we are demanding that $W$ contain at least three meaningful operations, stored at $W(1)$, $W(2)$, $W(3)$, so that a ring still satisfies the axioms of a group, even though it has "extra elements" that were not mentioned in the group definition.

Finally, let us define a left module. Define $\operatorname{Scalar}(W)=W(5)$ and $\operatorname{vsca}(W)=W(6)$. (We skipped $4$ because that is used for the conjugation of a star field, and we don't want to overlap slot numbers.) A left module is an abelian group $W$ such that $F=\operatorname{Scalar}(W)$ is a ring and (abbreviating the operations of $F$ by a subscript $F$, and abbreviating $\cdot_s=\operatorname{vsca}(W)$ and $K=\operatorname{Base}(F)$):
  • $\forall q,r\in K,x,w\in V:$
    • $r\cdot_s w\in V$
    • $r\cdot_s(w+x)=(r\cdot_sw)+(r\cdot_sx)$
    • $(q+_Fr)\cdot_sw=(q\cdot_sw)+(r\cdot_sw)$
    • $(q\cdot_Fr)\cdot_sw=q\cdot_s(r\cdot_sw)$
    • $1_F\cdot_sw=w$
These are most of the basic axioms of a vector space, except that there are no commutativity-type axioms for the multiplication operations $\cdot_F$ and $\cdot_s$, and there is no division. The only difference between a left module and a (left) vector space is the ability to divide elements: a left vector space is a left module such that $\operatorname{Scalar}(W)$ is a division ring; a division ring is a ring such that $\langle V\setminus\{0\},\cdot\rangle$ is a group (which is a fancy way of saying that for any $x\ne0$, there is a $y$ such that $x\cdot y=1$).

We need a few more definitions before we can get to the proof: a subspace of the left module $W$ (using the same abbreviations as above) is a subset $S$ of $V$ such that $a\cdot_sx+y\in S$ for any $a\in K$, $x,y\in S$; the span of a subset $S$ of $V$ is the smallest subspace containing $S$; and a basis is a set $B\subseteq V$ such that $\DeclareMathOperator{\span}{span}\span(B)=V$ and $x\notin\span(B\setminus\{x\})$ for all $x\in B$. A set satisfying only the first condition is called a spanning set, while a set satisfying only the second condition is called linearly independent.

Theorem (lbsex): Any vector space has a basis.

Proof: This is a special case of the following more general theorem, which we prove instead.

Theorem (lbsext): For any linearly independent subset $C\subseteq V$, there is a basis $B\supseteq C$ for the vector space $W$.

Proof: This theorem is a well-known equivalent to the Axiom of Choice, so it's no surprise that we will be using Zorn's lemma (zornn0) as part of the proof. The statement that we will be using says that if $S$ is a nonempty collection of sets such that for all nonempty $y\subseteq S$ with $\forall s,t\in y:(s\subseteq t\vee t\subseteq s)$, we also have $\bigcup y\in S$, then there is an $s\in S$ with no proper superset in $S$. This is usually read as "if any chain in $S$ has an upper bound in $S$, then $S$ has a maximal element", but this requires more definitions than necessary for understanding.

The set $S$ in question here is the set of all linearly independent subsets of $V$ containing $C$. This is nonempty because $C\in S$, and so next we have to establish that chains have an upper bound; so suppose $A\subseteq S$ is a nonempty collection of sets such that any two elements are in a subset relation. Actually, we will trade this for the weaker assumption that $A$ is closed under binary union, that is, for all $s,t\in A:s\cup t\in A$; this follows from the chain assumption because if $s\subseteq t$ then $s\cup t=t\in A$ and if $t\subseteq s$ then $s\cup t=s\in A$.

Our goal is to show that $\bigcup A\in S$, which is to say, $\bigcup A$ is a linearly independent subset of $V$ containing $C$. Now clearly $C\subseteq\bigcup A$ because $A$ is nonempty and every element of $A$ has $C$ as a subset; so now we need to show that for any $x\in\bigcup A$, $x\notin\span(\bigcup A\setminus\{x\})$. Rather than exploiting the "finite linear combinations" characterization of span, which we don't have, we leverage the "smallest subspace" characterization, by showing that a particular set is a subspace containing $\bigcup A\setminus\{x\}$.

Lemma (lbsextlem2): Let $T=\bigcup_{u\in A}\span(u\setminus\{x\})$. Then $T$ is a subspace, and $\bigcup A\setminus\{x\}\subseteq T$.

Proof: Let $r\in K,v,w\in T$. Then there is an $m,n\in A$ such that $v\in\span(m\setminus\{x\})$ and an $w\in\span(n\setminus\{x\})$. Since $A$ is closed under binary union, $m\cup n\in A$, and also $v,w\in\span(m\cup n\setminus\{x\})$ because span respects the subset relation. Since a span is a subspace, we then have $rv+w\in\span(m\cup n\setminus\{x\})$, so $rv+w\in T$. Thus $T$ is a subspace. To show $\bigcup A\setminus\{x\}\subseteq T$, let $y\in u\in A$ with $y\ne x$. Then $y\in u\setminus\{x\}\subseteq\span(u\setminus\{x\})$, so $y\in T$. $$\tag*{$\blacksquare$}$$

It follows that $\span(\bigcup A\setminus\{x\})\subseteq T$, so it suffices to show that for any $x\in\bigcup A$, $x\notin T$.

Lemma (lbsextlem3): If $A\subseteq S$ is nonempty and $\forall s,t\in A: s\cup t\in A$, then $\bigcup A\in S$.

Proof: From earlier remarks, all that remains is to show that for all $x\in\bigcup A$, $x\notin T$, so suppose that $x\in y\in A$ and $u\in A$, $x\in\span(u\setminus\{x\})$. Then since $y\cup u\in A$, $y\cup u$ is linearly independent, and so $x\in y\subseteq y\cup u\to x\notin\span(y\cup u\setminus\{x\})$, but since $x\in\span(u\setminus\{x\})\subseteq\span(y\cup u\setminus\{x\})$, this is a contradiction. $$\tag*{$\blacksquare$}$$

This finishes the prerequisites to zornn0, so now suppose that $B\in S$ such that for all $y\in S,y\not\subset B$. We wish to show that this $B$ is a basis containing $C$. It is immediate that $B$ is linearly independent and contains $C$, since $B\in S$; it remains to prove (lbsextlem4) that for any $w\in V$, $w\in\span(B)$. We will need one more lemma, where the division ring property comes into play; we state it now.

Theorem (lspsolv): For any vector space $W$, $A\subseteq V$, and $y\in V$, if $x\in\span(A\cup\{y\})$ but $x\notin\span(A)$, then $y\in\span(A\cup\{x\})$.

Proof: This is clear from the "finite linear combinations" characterization of span; if $x=\sum_{v\in A\cup\{y\}}a_vv$ where only finitely many $a_v$ are nonzero, then the coefficient $a_y$ must be nonzero or else this would show that $x\in\span(A)$, so we can "solve for $y$" in the equation to write $y=a_y^{-1}(x-\sum_{v\in A}a_vv)$ and thus show that $y$ is in the span of $A\cup\{x\}$.

In our characterization, we have to work with subspaces, so the subspace we consider for this proof is $$Q=\{z\in V:\exists r\in K\, z+ry\in\span(A)\}.$$ To show this is a subspace, suppose $a\in K,u,v\in Q$. Then there are $s,t\in K$ such that $u+sy,v+ty\in\span(A)$, so since $\span(A)$ is a subspace, $$(au+v)+(as+t)y=a(u+sy)+(v+ty)\in\span(A),$$ so since $as+t\in K$, $au+v\in Q$ and $Q$ is a subspace.

Furthermore, $A\cup\{y\}\subseteq Q$ because $z\in A\to z+0y\in\span(A)$, and $y+(-1)y=0\in\span(A)$. Therefore $x\in\span(A\cup\{y\})\subseteq Q$, and there is an $r\in K$ such that $x+ry\in\span(A)$. (This is all proved in lspsolvlem.)

Now we are almost there; if $r=0$ then $x\in\span(A)$ against the hypothesis, so by the division property $r^{-1}$ exists, and we can write $y=r^{-1}((x+ry)-x)$, and since $\span(A\cup\{x\})$ is a subspace containing $x$ and $x+ry$, $y$ is also in $\span(A\cup\{x\})$. $$\tag*{$\blacksquare$}$$

Returning to the proof of lbsext, our strategy is to show that if $w\in V\setminus\span(B)$, then $B\cup\{w\}\supset B$ is also linearly independent, contradicting the maximality of $B$. For this to be true, we need that $\forall x\in B\cup\{w\}:x\notin\span(B\cup\{w\}\setminus\{x\})$. For $x=w$, this is easy, because $B\cup\{w\}\setminus\{w\}=B$ and $w\notin\span(B)$ by assumption. If $x\in B$, suppose that $x\in\span(B\cup\{w\}\setminus\{x\})$ for a contradiction. Since $x\ne w$, $B\cup\{w\}\setminus\{x\}=B\setminus\{x\}\cup\{w\}$, so we can apply lspsolv above: $x\in\span(B\setminus\{x\}\cup\{w\})$ but $x\notin\span(B\setminus\{x\})$ (because $B$ is linearly independent), so $w\in\span(B\setminus\{x\}\cup\{x\})=\span(B)$, contrary to hypothesis. Thus $B$ is a basis containing $C$, which completes the proof. $$\tag*{$\blacksquare$}$$

Saturday, August 16, 2014

The Monotone Convergence Theorem for integrals

Today's post is about the Monotone Convergence Theorem and the theorems leading up to it.

Theorem (itg2mono): If $(f_n)_{n\in\Bbb N}$ is a sequence of measurable functions $\Bbb R\to[0,\infty)$ such that $$0\le f_1\le f_2\le\dots\le f_n\le\dots$$ and the pointwise limit $g$ of the sequence $(f_n)$ exists, then $g$ is measurable and $\int g=\lim_{n\to\infty}\int f_n$ (note that for increasing sequences, $\lim$ and $\sup$ are equivalent, so the statement may differ from the Metamath version for this reason).

Proof: First, we prove $g$ is measurable. (mbfmono) It is sufficient to prove that the inverse image $\{g>t\}=\{x\in\Bbb R|g(x)>t\}$ is measurable, which follows from $\{g>t\}=\bigcup_{n\in\Bbb N}\{f_n>t\}$, because $t<g(x)=\sup_{n\in\Bbb N}f_n(x)\iff\exists n\in\Bbb N:t<f_n(x)$ and the countable union of the measurable sets $\{f_n>t\}$ is measurable.

Before tackling the main claim $\int g=\lim_{n\to\infty}\int f_n$, we will need a few more lemmas, so let's get those down first.

Theorem (volsup): If $(A_n)_{n\in\Bbb N}$ is a sequence of measurable sets such that $A_1\subseteq A_2\subseteq\dots\subseteq A_n\subseteq\dots\subseteq\Bbb R$ and $\bigcup_{n\in\Bbb N}A_n=A$, then $\DeclareMathOperator{\vol}{vol}\vol(A)=\lim_{n\to\infty}\vol(A_n)$, where the limit may be $\infty$.

Proof: First, we address the case when there is an $A_k$ such that $\vol(A_k)=\infty$. In this case, $A_k\subseteq A$ implies $\vol(A)=\infty$, and $$\lim_{n\to\infty}\vol(A_n)=\sup_{n\to\infty}\vol(A_n)\ge\vol(A_k)=\infty$$ so $\lim_{n\to\infty}\vol(A_n)=\infty$ as well.

Otherwise, all the $\vol(A_n)$ are real, so we are free to do arithmetic on the volumes. Consider the  sequence $B_n=A_n\setminus\bigcup_{k=1}^{n-1}A_k=A_n\setminus A_{n-1}$ (where the second representation is only valid for $n\ge2$). This is a set of pairwise-disjoint measurable sets, so from voliun we have $\vol(\bigcup_{n\in\Bbb N}B_n)=\sum_{n=1}^\infty\vol(B_n)$. Now each $B_n\subseteq A_n$, so $\bigcup_{n\in\Bbb N}B_n\subseteq A$, and each $x\in A$ is in some $A_k$, and for the minimal $k$, it is also in $B_k$. Thus $A=\bigcup_{n\in\Bbb N}B_n$. As for the sum, we prove by induction that $\sum_{k=1}^n\vol(B_n)=\vol(A_n)$ and therefore the sequence of partial sums of $\vol(B_n)$ is the sequence $\vol(A_n)$ and $\sum_{n=1}^\infty\vol(B_n)=\lim_{n\to\infty}\vol(A_n)$.

For $n=1$, we have $B_1=A_1$ so $\sum_{k=1}^1\vol(B_n)=\vol(A_1)$, and assuming that this is true for $n$, we have $$\sum_{k=1}^{n+1}\vol(B_n)=\vol(A_n)+\vol(B_{n+1})=\vol(A_n)+\vol(A_{n+1}\setminus A_n),$$ and since these are disjoint sets and $A_n\subseteq A_{n+1}$ we have $\vol(A_n)+\vol(A_{n+1}\setminus A_n)=\vol(A_{n+1})$. $$\tag*{$\blacksquare$}$$

Next we use this theorem to prove that the integral on simple functions respects limit operations on the base set.

Theorem (itg1climres): If $(A_n)_{n\in\Bbb N}$ is a sequence of measurable sets such that $A_1\subseteq A_2\subseteq\dots\subseteq A_n\subseteq\dots\subseteq\Bbb R$ and $\bigcup_{n\in\Bbb N}A_n=\Bbb R$, and $\phi$ is a simple function, then $\lim_{n\to\infty}\int_{A_n}\phi=\int\phi$.

Proof: We expand out the definition of the integral on simple functions: $\int\phi=\sum_{x\in\Bbb R^*}x\vol(\phi^{-1}\{x\})$, where the sum is nominally over all nonzero reals but is only nonzero for finitely many $x$. We can distribute the limit over the sum: $$\lim_{n\to\infty}\int_{A_n}\phi=\lim_{n\to\infty}\sum_{x\in\Bbb R^*}x\vol(\phi^{-1}\{x\}\cap A_n)=\sum_{x\in\Bbb R^*}x\lim_{n\to\infty}\vol(\phi^{-1}\{x\}\cap A_n),$$ so we are reduced to proving $\lim_{n\to\infty}\vol(\phi^{-1}\{x\}\cap A_n)=\vol(\phi^{-1}\{x\})$, which is an application of volsup above, taking the sets $(\phi^{-1}\{x\}\cap A_n)_{n\in\Bbb N}$ as our increasing sequence. $$\tag*{$\blacksquare$}$$

Now we are finally ready to return to our proof of the main theorem itg2mono. The easy direction of our equality is $\lim_{n\to\infty}\int f_n\le\int g$, which is true because each $f_n\le g$ so $\int f_n\le\int g$ by itg2le. Now the definition of $\int g$ is as a supremum over all simple functions $\phi\le g$. Thus to show $\int g\le\lim_{n\to\infty}\int f_n$ we can show that $S:=\lim_{n\to\infty}\int f_n$ is an upper bound on $\int\phi$ for any $\phi\le g$. Now we are going to prove that $t\int\phi\le S$ for any $0<t<1$, but to get from there to $\int\phi\le S$ takes a little trickery.

We assume that $\int\phi>S$ to start with. Since $S\in[0,\infty]$ and $\int\phi\in\Bbb R$ to start with (remember that $S$ is a supremum, possibly infinite, of integrals of nonnegative functions, while $\phi$ is a simple function whose integral is real but possibly negative), the inequality $\int\phi>S$ immediately implies $S\in\Bbb R$ and $\int\phi>0$. Now we can use alrple to use that for any $x>0$, $\int\phi\le S+x$ and hence prove that $\int\phi\ge S$ (in contradiction to our assumption, thus showing $\int\phi\ge S$ unconditionally). Define $t=\max(\frac12,\frac S{S+x})$, which is valid since $S\ge 0$ and $x>0$. Then $t\ge\frac12>0$ and $t\le\frac S{S+x}<1$, so by our lemma below, $t\int\phi\le S$. Since $\int\phi>0$ and $t>0$ we can infer $S>0$, so we can cancel the $S$ and multiply by $S+x$ in $\frac S{S+x}\int\phi\le t\int\phi\le S$ to get $\int\phi\le S+x$ as we wanted to show.

Now all that remains to prove is our lemma:

Lemma (itg2monolem): For any $0<t<1$, and any simple function $\phi\le g$, we have $\int\phi\le \lim_{n\to\infty}\int f_n$.

Proof: Define the sequence $A_n=\{x\in\Bbb R|t\phi(x)\le f_n(x)\}$. Then $\int_{A_n}t\phi\le\int f_n$ because the restricted function $t\phi\upharpoonright A_n$ is always dominated by $f_n$: if $x\notin A_n$ then $(t\phi\upharpoonright A_n)(x)=0\le f(n)$ and if $x\in A_n$ then $(t\phi\upharpoonright A_n)(x)=t\phi(x)\le f(n)$. We can take this inequality in the limit, and thus $$\int t\phi=\lim_{n\to\infty}\int_{A_n}t\phi\le\lim_{n\to\infty}\int f_n,$$ where the first equality is none other than our earlier lemma itg1climres, which can be applied because $A_n=\{0\le f_n-t\phi\}$ is a measurable set (since $f_n-t\phi$ is a sum of a simple function and a measurable function, which is measurable), the $A_n$ sets form an inclusion chain since $f_{n+1}\le f_n$, and the union of all of them is $\Bbb R$ because either $\phi(x)>0$ in which case $t\phi(x)<\phi(x)\le g(x)$ and there is some $f_n(x)>t\phi(x)$, or $\phi(x)\le0$ in which case $t\phi(x)\le0\le f_n(x)$ for every $n$. This completes the proof. $$\tag*{$\blacksquare$}$$


For a long time now, I've considered starting a weblog, and over the years I have had many people suggest that I start one, but the question that is always unanswered is: "What do I write?" I've decided to make this a blog about the subject that has consumed my thoughts for the past year, and that's Metamath.

For those who don't know, Metamath is a proof language for writing mathematical proofs which can be checked by an associated verification program, and I have been using Metamath to write proofs and submit them to the database, called, since around the beginning of 2013. Back then most of what I did was revise old theorems to make them more efficient or avoid usage of the axiom of choice or clean them up in some way, and then in February 2014 I was made aware of Freek Wiedijk's Formalizing 100 Theorems page, which tracks the formalization of 100 of the most famous theorems in mathematics, and I decided to make it my goal to prove all of these.

When we first started tracking our progress, there were 19 theorems already proven, and over the past 8 months 15 more have been proven. Of those, 12 were my work. This includes such theorems as
  • The Basel problem: $\sum_{n=1}^\infty\frac1{n^2}=\frac{\pi^2}6$ (basel),
  • Bertrand's postulate: $n\in\Bbb N\to\exists p\in\Bbb P:n<p\le 2n$ (bpos),
  • Lagrange's theorem for groups: $G$ finite group, $H\subseteq G$ subgroup implies $|H|$ divides $|G|$, (lagsubg),
and many others. I hope in later articles to talk a little more about the way these proofs are being done, and also highlight my latest work, and talk more generally about metamath and mmj2, the program I use (and develop) as a proof assistant to make the proofs. My current project is a proof of the Fundamental Theorem of Calculus, which has required a lot of work on defining a (Lebesgue) integral and derivative. But for now, welcome to my blog.