Sunday, August 31, 2014

Integral of a positive function is positive

Here is a nice theorem that is not as simple as the statement would have you believe:

Theorem (itggt0): If $f$ is integrable and $A$ is a measurable set such that $0<\DeclareMathOperator{\vol}{vol}\vol(A)\le\infty$ and $f(x)>0$ for all $x\in A$, then $\int_A f(x)\,dx>0$.

The analogous theorem with $\le$ is trivial (itgge0), but this theorem requires some of the countable additivity properties of the volume function. We postpone the proof until we have proven the needed lemmas. First, we establish what I will call, for lack of a better name, the Intermediate Value Theorem for the Lebesgue volume function.

Theorem (volivth): If $A$ is a measurable set and $0\le x\le\vol(A)$, then there is a measurable $B\subseteq A$ such that $\vol(B)=x$. (Note that $\vol(A)$ is allowed to be infinite.)

Proof: If $x=\vol(A)$, then $B=A$ satisfies the requirements, so assume instead that $x<\vol(A)$. Consider the sequence $A\cap[-n,n]$ for $n\in\Bbb N$. This is an increasing sequence of measurable sets, so by volsup we have $\sup_{n\in\Bbb N}\vol(A\cap[-n,n])=\vol(A)$. Thus since $x<\vol(A)$ there is an $n$ such that $x<\vol(A\cap[-n,n])$.

Now define the function $f(y)=\vol(A\cap[-n,y])$. If $y\le y'$, then $[-n,y]\subseteq[-n,y']\implies f(y)\le f(y')$, so $f$ is a weakly increasing function, and since $(A\cap[-n,y'])\setminus(A\cap[-n,y])\subseteq[y,y']$, we also have $f(y')-f(y)\le\vol([y,y'])=y'-y$, so $f$ is a continuous function $\Bbb R\to\Bbb R$. Since $f(-n)=0\le x<\vol(A\cap[-n,n])=f(n)$, the regular intermediate value theorem implies that there is some $z$ such that $\vol(A\cap[-n,z])=x$, and for this $z$, $B=A\cap[-n,z]$ satisfies the requirements of the conclusion. $$\tag*{$\blacksquare$}$$

 We need this theorem to extend the formula $\int_A c\,dx=c\vol(A)$ to the case $\vol(A)=\infty$. (We can't use itg2const directly because the formula is only valid when $\vol(A)$ is a real number, since multiplication only works on complex numbers.) If $\vol(A)=\infty$ and $c>0$, then for any $x\in\Bbb R$ there is a $B\subseteq A$ with $\vol(B)=x/c$, and then $\int_B c\,dx=x\le\int_A c\,dx$. Thus $\int_A c\,dx$ is larger than any real number, so $\int_A c\,dx=\infty$ as well. (This is theorem itg2const2.)

Now we can prove the main theorem.

Proof (of itggt0): Define the sequence $A_n=\{x:f(x)>\frac1n\}$. This is clearly an increasing sequence of measurable sets (since $f$ is measurable), and since $f(x)>0$ for all $x\in A$ so that there is some $k$ with $f(x)>\frac1k$, $A\subseteq\bigcup_{n\in\Bbb N}A_n$. Now assume that $\int_A f(x)\,dx=0$. Then $$0\le\int_{A_n}\frac1n\,dx\le\int_{A_n}f(x)\,dx\le\int_A f(x)\,dx=0,$$ so from itg2const and itg2const2, $\vol(A_n)=0$ (because $\vol(A_n)=\infty$ implies $\int_{A_n}\frac1n\,dx=\infty\ne0$ and $\vol(A_n)\in\Bbb R$ implies $n\int_{A_n}\frac1n\,dx=\vol(A_n)=0$). Since this is true for every $n$, we have $\vol(A)\le\vol(\bigcup_{n\in\Bbb N}A_n)=\lim_{n\to\infty}\vol(A_n)=0$, which contradicts the hypothesis $\vol(A)>0$. Therefore $\int_A f(x)\ne0$, and since $\int_A f(x)\ge0$ we in fact have $\int_A f(x)>0$. $$\tag*{$\blacksquare$}$$

Wednesday, August 20, 2014

Van der Waerden's theorem

To take a break from all this real analysis, I thought I'd change gears and formalize a gem of number theory, which goes by the name of Van der Waerden's theorem. The proof is not yet complete, so instead I'll just lay out the high-level strategy of the proof. (Well, any regular mathematician would call this a proof, and a pretty detailed one at that, but for me it's only a sketch of the "real" formal proof.)

Theorem (vdw): For any finite set $R$ (the set of "colors") and any $k\in\Bbb N$, there is an $N$ such that for any "coloring" $f:\{1,\dots,N\}\to R$, there is a monochromatic arithmetic progression. This last phrase is expressed in symbols as: $$\exists c\in R;a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\}).$$ By demanding that $a+md$ lie in the preimage of $f$ we can indirectly also demand that the entire AP lies within $\{1,\dots,N\}$.

Proof: We begin by defining some predicates, which look horrible in metamath because they take so many parameters. First, we define the "arithmetic progression function": $$A_k:\Bbb N\times\Bbb N\to{\cal P}\Bbb N\quad A_k(a,d)=\{a+md\,|\,m\in\{0,\dots,k-1\}\}$$ This function takes a length $k$, a start point and step $a,d$, and outputs the set of all points in this arithmetic progression. The parameter $k$ has been separated from the other two so that we can consider the set $\operatorname{ran}A_k$ of all length-$k$ arithmetic progressions.

Next, we define the notion that a given coloring $\langle R,N,f\rangle$ (where $f:\{1,\dots,N\}\to R$) contains a length-$k$ monochromatic AP. In metamath this predicate looks like $k\,(R\,O\,N)\,f$, because $O$ is a function which takes $R,N$ as input and returns a binary relation whose two parameters are $k,f$. In any case, we define this notion to be: $$k\,(R\,O\,N)\,f\iff\exists c\in R,x\in\operatorname{ran}A_k:x\subseteq f^{-1}(\{c\}).$$
Straightforward enough. Now we turn to our last and most complex definition, which states that a given coloring $\langle R,N,f\rangle$ contains a polychromatic $m$-tuple of arithmetic progressions of length $k$. (In metamath this looks like $k\,((R\,P\,N)`m)\,f$.) A polychromatic $m$-tuple of arithmetic progressions is a set of arithmetic progressions $a+\{1,\dots,k\}d_1,a+\{1,\dots,k\}d_2,\dots,a+\{1,\dots,k\}d_m$ which all share a common basepoint $a$ (but do not actually include $a$), and we demand that each progression be monochromatic, but no two APs have the same color. We express this as: $$k\,((R\,P\,N)`m)\,f\iff\exists a\in\Bbb N, d:\{1,\dots,m\}\to\Bbb N:\\
\quad\forall i\in 1,\dots,m:A_k(a+d_i,d_i)\subseteq f^{-1}(\{f(a+d_i)\})\,\wedge\\

The first conjunct says that the entire AP $A_k(a+d_i,d_i)$ is all the same color as $f(a+d_i)$, the first element of the progression, i.e. the progression is monochromatic; the second conjunct says that the set $\{f(a+d_i)\,|\,i\in1,\dots,m\}$ of colors of the first element of each progression has size exactly $m$, i.e. they are all distinct. (We could just as well say that the $f$-image of the union of all the AP's has size $m$, but it's less trivial to show that the size of this larger set is already at most $m$.)

Now, onto the proof proper. The main body of the proof is a double induction; at the top level we induct on $k$, where our induction hypothesis is that for any $R$ there is an $N$ such that all $R$-colorings of $\{1,\dots,N\}$ contain a monochromatic length-$k$ AP. For the base case, we treat $k=1$ and $k=2$ separately (because some of our arguments in the induction step $k\to k+1$ don't work if $k=1$). If $k=1$, then any point is an AP, so take $N=1$ and so $A_1(1,1)=\{1\}\subseteq f^{-1}(\{f(1)\})$. If $k=2$, then we can use the pigeonhole principle. Take $N=|R|+1$, and assume that $f$ contains no length-$2$ monochromatic APs. If $f(x)=f(y)$ for $x<y$, then $A_2(x,y-x)=\{x,y\}\subseteq f^{-1}(\{f(x)\})$, so $f$ has a length-$2$ AP contrary to the hypothesis, so therefore $f$ is injective and $|\{1,\dots,N\}|=N=|R|+1\le|R|$, a contradiction.

In our induction step, we assume (for $k\ge2$) that the hypothesis is true for $k$, and derive it for $k+1$. Fix $R\ne\emptyset$; if $R=\emptyset$ then there are no functions $f:\{1\}\to R$ so the statement is vacuously true for $N=1$. Let us then start another induction on $m\in\Bbb N$, where the induction hypothesis is that there is an $N$ such that every coloring on $\{1,\dots,N\}$ contains either a monochromatic length-$k+1$ AP or a polychromatic $m$-tuple of length-$k$ APs. The base case is not entirely trivial, because a $1$-tuple of length-$k$ APs is not quite the same as a monochromatic length $k$ AP (which is guaranteed to us by the $k$-induction hypothesis) because it "reserves space" for the base point of a $k+1$-long AP - it just doesn't demand that this initial point have any particular color.

To fix this, suppose that $N$ is such that every coloring of $\{1,\dots,N\}$ contains a monochromatic length-$k$ AP, and let $N_2=2N$. Given a coloring $f:\{1,\dots,2N\}$, by defining $g:\{1,\dots,N\}\to R$ by $g(x)=f(x+N)$, we know $g$ contains a monochromatic length-$k$ AP, so there is a $c,a,d$ such that for all $t=0,\dots,k-1$ we have $a+td\in\{1,\dots,N\}\wedge f(a+td)=c$. Then we can define $a'=N+a-d$; this is a positive integer because $a,a+d\in\{1,\dots,N\}$ imply $a-d=2a-(a+d)\ge 2-N>-N$. (This is why we needed $k\ge2$ at the start.) Now we have the necessary setup for a polychromatic $1$-tuple of APs, using $a'$ and $d_1=d$.

Now, for the $m$-induction step, we finally get to the interesting part of the proof. Let $W\in\Bbb N$ be such as specified by the $m$-induction hypothesis. Then the set $R^{\{1,\dots,W\}}$ of all functions $\{1,\dots,W\}\to R$ is finite, so by the $k$-induction hypothesis there is a $V$ such that all colorings $f:\{1,\dots,V\}\to R^{\{1,\dots,W\}}$ have a length-$k$ monochromatic AP. Let $N=2WV$, and now we want to show that every function $h:\{1,\dots,N\}\to R$ contains either a monochromatic $k+1$ AP or a polychromatic $m,k$ AP tuple. Define $$f(x)=\big(y\in\{1,\dots,W\}\mapsto h(y+W(x-1+V))\big);$$ effectively we are cutting the set $\{1,\dots,N\}$ into blocks of length $W$. Again, we "reserve space" of size $WV$ in our set $1,\dots,N$ so that we can move backwards a little. The hypotheses give us that $f$ contains an AP with parameters $g,a,d$ such that $f(x)=g$ for $x=a,a+d,\dots,a+(k-1)d$. Now since $f$ takes values in the function space $R^{\{1,\dots,W\}}$, $g:\{1,\dots,W\}\to R$ is really a function, which satisfies the assumptions for the $m$-induction hypothesis, so either $g$ contains a $k+1$-long AP with parameters $c,a',d'$ (in which case $h$ has a $k+1$-long AP with parameters $c,W(a-1+V)+a',d'$), or there is a polychromatic $m$-tuple of APs $a',d_1,\dots,d_m$.

In this case, consider the prospective $m+1$-tuple defined by $a''=a'+W(a-d-1+V)$ and $d''_i=d_i+dW$, $d''_{m+1}=dW$. We can verify that $a''$ is positive, because $$V-1+a-d=V-1+2a-(a+d)\ge V-1+2-V=1,$$ and for each $i\in\{1,\dots,m\}$ and $t\in\{1,\dots,k\}$, \begin{align}
so each tuple is individually monochromatic. For the last AP, $h(a''+td''_{m+1})=f(a+(t-1)d)(a')=g(a')$ is monochromatic as well. All that remains is to verify that the colors are distinct. Since we already know that $g(a'+d_i)\ne g(a'+d_j)$ for any $i\ne j$, if additionally $g(a')\ne g(a'+d_i)$ for all $i=1,\dots,m$ then we are done because $h$ contains a polychromatic $m+1$-tuple. Otherwise, there is some $i$ such that $g(a')=g(a'+d_i)$. But we already know $g(a'+d_i)=g(a'+td_i)$ for $t\in\{1,\dots,k\}$, so then $g$ contains a $k+1$ monochromatic AP, and so does $h$ (with parameters $g(a'),a'+W(a-1+V),d_i$).

This completes the $m$-induction. To complete the $k$-induction, let $m=|R|$ (which is valid since $R\ne\emptyset$). Then there is an $N$ such that all colorings have a $k+1$ monochromatic AP or a polychromatic $|R|$-tuple of length-$k$ APs. In the first case we are done, and in the second case let the parameters of the tuple be $a,d_1,\dots,d_{|R|}$. Since $|\{f(a+d_i)\,|\,i\in\{1,\dots,|R|\}\}|=|R|$ and $R$ is finite, the function $i\mapsto f(a+d_i)$ must be bijective to $R$, so there is an $i$ such that $f(a)=f(a+d_i)$, and as before this yields a monochromatic AP of length $k+1$ with parameters $a,d_i$. Thus there is a $k+1$ monochromatic AP in any case, which completes the $k$-induction and the proof. $$\tag*{$\blacksquare$}$$

There is one more nice extension to this theorem, which is the so-called "infinitary" version of Van der Waerden's theorem.

Theorem (vdwnn): For any finite set $R$ and any coloring $f:\Bbb N\to R$, there is a color $c\in R$ such that $f^{-1}(\{c\})$ contains arbitrarily long arithmetic progressions. (In symbols, $\forall k\in\Bbb N:\exists a,d\in\Bbb N:\forall m\in\{0,\dots,k-1\}:a+md\in f^{-1}(\{c\})$.)

Proof: Fix $R$ and $f$ and suppose not. Then for each $c\in\Bbb R$, there is a $k$ such that there are no $c$-colored APs of length $k$, and since each length $n\ge k$ AP contains a length $k$ AP, there are also no $c$-colored APs of length $n$ for any $n\ge k$. Thus the set of $k\in\Bbb N$ such that there are $c$-colored APs of length $k$ is bounded, and since the finite union of bounded sets is bounded, there is a uniform bound $K$ such that there are no monochromatic length $K$ APs in any color. But this is a contradiction to vdw above. $$\tag*{$\blacksquare$}$$

Monday, August 18, 2014

Mathbox update: More integrals

This is the first post on this blog in the mathbox-update category, so allow me to explain what this is. I periodically release my latest work in Metamath to the database, and in this post I go over the highlights and additions to be found in the release, like a feature list. You can see the latest additions to the database at; note in particular that the us2 server is a few days ahead of the us server that I usually link to - the us2 server is for development, so if you are reading this notice hot off the presses you may want to try replacing "" with "" in the links here. On to the update...

New theorems in this update:
  • ivthicc: The intermediate value theorem, in the form "$[f(m),f(n)]\subseteq\operatorname{ran}f$ whenever $m,n\in[a,b]$ and $f:[a,b]\to\Bbb R$ is continuous"; this is a bit more convenient to use than ivth, which explicitly states that $f(a)<u<f(b)$ and $a<b$, so you need to go through some case analysis with increasing/decreasing combinations to use this directly.
  • divcn: In a curious oversight, we have continuity of addition, subtraction, and multiplication, but not division. This derives that $f:\Bbb C\times\Bbb C^*\to\Bbb C$ defined by $f(x,y)=\frac xy$ is jointly continuous (in the product topology). (Note that $0$ is excluded from the domain.)
  • bndth: The Boundedness Theorem, topological version. A continuous function $f:X\to\Bbb R$, where $X$ is compact, is bounded above.
  • evth: The Extreme Value Theorem, topological version. A continuous function $f:X\to\Bbb R$, where $X$ is compact and nonempty, attains its maximum value somewhere on the domain. (Curiously, most books neglect to mention that $X$ has to be nonempty, but it's clearly necessary; the topology $J=\{\emptyset\}$ is compact and has empty base set, and the empty function from $X=\emptyset$ to $\Bbb R$ is continuous, yet it clearly can't attain its maximum because there are no points in the base set.)
  • evth2: A continuous function $f:X\to\Bbb R$, $X$ compact, attains its minimum.
  • evthicc: Specialize evth and evth2 to the case when $X=[a,b]$ is a closed interval of $\Bbb R$.
  • volsup*: The Lebesgue volume function preserves the limit of an increasing sequence of measurable sets.
  • df-ditg: Define the directed integral. This is basically the difference between $\int_{[a,b]}f(x)\,dx$ and $\int_a^bf(x)\,dx$ - the second is defined regardless of the order of $a,b$, and negates under a swap of upper and lower limit. It is literally defined as $\int_a^bf(x)\,dx=\int_{(a,b)}f(x)\,dx$ if $a\le b$ and $\int_a^bf(x)\,dx=-\int_{(b,a)}f(x)\,dx$ otherwise. The usage of open intervals is in anticipation of such integrals as $f(x)=\int_0^x\log|t|\,dt$, which would not make sense under a closed interval because $\log 0$ is undefined. As it is, this would be a total function on $\Bbb R$. (The integral at $x=0$ is $0$ regardless of the function being integrated.)
  • mbfmulc2re: The product of a measurable function $f:A\to\Bbb C$ and a real constant is a measurable function (the old mbfmulc2 only allowed $f:A\to\Bbb R$).
  • mbfposb: A function $f:A\to\Bbb R$ is measurable if and only if the positive and negative parts $f^+=\max(0,f)$, $f^-=(-f)^+$ are measurable.
  • mbfadd: The sum of measurable functions is measurable. (This is a nice argument by the decomposition $\{f+g>y\}=\bigcup_{r\in\Bbb Q}\{f>r\}\cap\{g>y-r\}$.)
  • itg10a, itg1ge0a, itg1lea: These are "approximate" versions of the analogous theorems itg10, itg1ge0, itg1le which relax their conditions to only hold on a set of full measure. (For example, the integral of a function that is zero for almost all $x$ is zero, where "almost all" means that the measure of the complement is zero.)
  • itg1climres*: The simple function integral preserves limiting operations on the base set.
  • mbfi1fseq: A ridiculously complicated proof that there is an increasing sequence of simple functions that converge pointwise to any positive measurable function. The proof uses the sequence $$g_n(x)=\begin{cases} \max(2^{-n}\lfloor 2^nf(x)\rfloor,n) & -n\le x\le n \\ 0 & o.w. \end{cases}$$ and verifying that the preimage of each point is a measurable set (by describing it explicitly in terms of preimages of $f$) is a study in case analysis.
  • itg2const, itgconst: Integral of a constant function.
  • itg2seq: A nice result, although now mostly precluded by mbfi1fseq and itg2i1fseq which give more detailed information on the sequence. For any nonnegative function $f$, there is a sequence of simple functions less than $f$ whose integrals converge to $\int f$.
  • itg2split, itgsplit: Unlike the analogous theorem for sums, we don't use itg2add to prove this, and instead prove it "directly" - the upshot of this is that we can work with extended real functions (for which addition is not defined), and also we don't need the countable choice baggage carried along with itg2add. If $A,B$ are almost disjoint measurable sets and $f$ is integrable on $A$ and $B$, then $\int_{A\cup B}f(x)\,dx=\int_A f(x)\,dx+\int_B f(x)\,dx$.
  • ditgsplit: The best argument in favor of the definition df-ditg is that the path addition rule $\int_a^cf=\int_a^bf+\int_b^cf$ holds regardless of the relative ordering of $a,b,c$, unlike the analogous theorem for integrals of closed intervals, $\int_{[a,c]}f=\int_{[a,b]}f+\int_{[b,c]}f$, which only holds for $a\le b\le c$.
  • mbfmono*, itg2mono*: The Monotone Convergence Theorem for integrals. This and the other starred* entries in the list are explained in much more detail in my previous blog entry.
  • itg2i1fseq: Completing the work of mbfi1fseq, here we show that given an increasing sequence of simple functions that converge pointwise to the positive measurable function $f$, the integrals of the simple functions also converge to the integral of $f$. (Combining these two gives an alternative proof of itg2seq.)
  • itg2add, itgadd: The main theorem of this update, which is not too complicated but carries some high-powered prerequisites above: If $f,g$ are integrable on $A$, then $$\int_A(f+g)\,dx=\int_Af\,dx+\int_Ag\,dx.$$

Sunday, August 17, 2014

Extensible structures, and "Every vector space has a basis"

I'd like to take a little time to talk about the proof of lbsex, which says that every vector space has a basis, or every vector space is free (a free module is a module with a basis; general modules are not necessarily free). This one is interesting because it uses some tricks that as far as I am aware are not in any standard proofs, which I had to use because finite linear combinations are not yet easy to state over arbitrary modules. Note that these articles are best read by also consulting the frequent links to the associated formal proofs, so that you can appreciate the similarities and differences between a hand proof and a formal proof.

First, some background, and also a little window into how Metamath does structures. A left module, written $\rm{LMod}$ in metamath, is an example of an extensible structure, which is characterized by having various "accessor" functions defined on it to get multiple components. In this case, a left module is an extension of a group, which is another extensible structure with two defined operations: the base set $\operatorname{Base}(W)$ and the group operation $+_g(W)$. These functions are simply defined as $\operatorname{Base}(W)=W(1)$ and $+_g(W)=W(2)$, so that a group object is really a function from the natural numbers or a subset thereof to the operations. This is a little different from the standard literature definition "A group is an ordered pair $\langle V,+\rangle$ such that..." which does not allow for extensions of the set operations.

Given all the above, given a set $W$, define (throughout this article) $V=\operatorname{Base}(W)$ and $+=+_g(W)$ (note that these operations always return something even if $W$ is not a function, so it is always valid to make such a definition). Then $W$ is a group iff:
  • $\forall a,b,c\in V:(a+b\in V\wedge(a+b)+c=a+(b+c))$
  • $\exists e\in V\,\forall a\in V:(e+a=a\wedge\exists m\in V:m+a=e)$
It is not too hard to see that the first axiom is just closure and associativity and the second is existence of an identity element and inverses. Now, let us define a ring, which shows off the "extensible" part of our extensible structures. Define the function $\cdot_r(W)=W(3)$. Letting $\cdot=\cdot_r(W)$, $W$ is a ring iff $W$ is an abelian group (which adds the axiom $\forall a,b:a+b=b+a$ to the group axioms) and:
  •  $\forall x,y,z\in V:$
    • $x\cdot y\in V$
    • $(x\cdot y)\cdot z=x\cdot(y\cdot z)$
    • $x\cdot(y+z)=(x\cdot y)+(x\cdot z)$
    • $(x+y)\cdot z=(x\cdot z)+(y\cdot z)$
  • $\exists u\in V\,\forall x\in V:u\cdot x=x=x\cdot u$
Here we have closure, associativity of multiplication, left- and right-distributivity, and existence of a multiplicative unit. (This is a unital ring.) Note how now we are demanding that $W$ contain at least three meaningful operations, stored at $W(1)$, $W(2)$, $W(3)$, so that a ring still satisfies the axioms of a group, even though it has "extra elements" that were not mentioned in the group definition.

Finally, let us define a left module. Define $\operatorname{Scalar}(W)=W(5)$ and $\operatorname{vsca}(W)=W(6)$. (We skipped $4$ because that is used for the conjugation of a star field, and we don't want to overlap slot numbers.) A left module is an abelian group $W$ such that $F=\operatorname{Scalar}(W)$ is a ring and (abbreviating the operations of $F$ by a subscript $F$, and abbreviating $\cdot_s=\operatorname{vsca}(W)$ and $K=\operatorname{Base}(F)$):
  • $\forall q,r\in K,x,w\in V:$
    • $r\cdot_s w\in V$
    • $r\cdot_s(w+x)=(r\cdot_sw)+(r\cdot_sx)$
    • $(q+_Fr)\cdot_sw=(q\cdot_sw)+(r\cdot_sw)$
    • $(q\cdot_Fr)\cdot_sw=q\cdot_s(r\cdot_sw)$
    • $1_F\cdot_sw=w$
These are most of the basic axioms of a vector space, except that there are no commutativity-type axioms for the multiplication operations $\cdot_F$ and $\cdot_s$, and there is no division. The only difference between a left module and a (left) vector space is the ability to divide elements: a left vector space is a left module such that $\operatorname{Scalar}(W)$ is a division ring; a division ring is a ring such that $\langle V\setminus\{0\},\cdot\rangle$ is a group (which is a fancy way of saying that for any $x\ne0$, there is a $y$ such that $x\cdot y=1$).

We need a few more definitions before we can get to the proof: a subspace of the left module $W$ (using the same abbreviations as above) is a subset $S$ of $V$ such that $a\cdot_sx+y\in S$ for any $a\in K$, $x,y\in S$; the span of a subset $S$ of $V$ is the smallest subspace containing $S$; and a basis is a set $B\subseteq V$ such that $\DeclareMathOperator{\span}{span}\span(B)=V$ and $x\notin\span(B\setminus\{x\})$ for all $x\in B$. A set satisfying only the first condition is called a spanning set, while a set satisfying only the second condition is called linearly independent.

Theorem (lbsex): Any vector space has a basis.

Proof: This is a special case of the following more general theorem, which we prove instead.

Theorem (lbsext): For any linearly independent subset $C\subseteq V$, there is a basis $B\supseteq C$ for the vector space $W$.

Proof: This theorem is a well-known equivalent to the Axiom of Choice, so it's no surprise that we will be using Zorn's lemma (zornn0) as part of the proof. The statement that we will be using says that if $S$ is a nonempty collection of sets such that for all nonempty $y\subseteq S$ with $\forall s,t\in y:(s\subseteq t\vee t\subseteq s)$, we also have $\bigcup y\in S$, then there is an $s\in S$ with no proper superset in $S$. This is usually read as "if any chain in $S$ has an upper bound in $S$, then $S$ has a maximal element", but this requires more definitions than necessary for understanding.

The set $S$ in question here is the set of all linearly independent subsets of $V$ containing $C$. This is nonempty because $C\in S$, and so next we have to establish that chains have an upper bound; so suppose $A\subseteq S$ is a nonempty collection of sets such that any two elements are in a subset relation. Actually, we will trade this for the weaker assumption that $A$ is closed under binary union, that is, for all $s,t\in A:s\cup t\in A$; this follows from the chain assumption because if $s\subseteq t$ then $s\cup t=t\in A$ and if $t\subseteq s$ then $s\cup t=s\in A$.

Our goal is to show that $\bigcup A\in S$, which is to say, $\bigcup A$ is a linearly independent subset of $V$ containing $C$. Now clearly $C\subseteq\bigcup A$ because $A$ is nonempty and every element of $A$ has $C$ as a subset; so now we need to show that for any $x\in\bigcup A$, $x\notin\span(\bigcup A\setminus\{x\})$. Rather than exploiting the "finite linear combinations" characterization of span, which we don't have, we leverage the "smallest subspace" characterization, by showing that a particular set is a subspace containing $\bigcup A\setminus\{x\}$.

Lemma (lbsextlem2): Let $T=\bigcup_{u\in A}\span(u\setminus\{x\})$. Then $T$ is a subspace, and $\bigcup A\setminus\{x\}\subseteq T$.

Proof: Let $r\in K,v,w\in T$. Then there is an $m,n\in A$ such that $v\in\span(m\setminus\{x\})$ and an $w\in\span(n\setminus\{x\})$. Since $A$ is closed under binary union, $m\cup n\in A$, and also $v,w\in\span(m\cup n\setminus\{x\})$ because span respects the subset relation. Since a span is a subspace, we then have $rv+w\in\span(m\cup n\setminus\{x\})$, so $rv+w\in T$. Thus $T$ is a subspace. To show $\bigcup A\setminus\{x\}\subseteq T$, let $y\in u\in A$ with $y\ne x$. Then $y\in u\setminus\{x\}\subseteq\span(u\setminus\{x\})$, so $y\in T$. $$\tag*{$\blacksquare$}$$

It follows that $\span(\bigcup A\setminus\{x\})\subseteq T$, so it suffices to show that for any $x\in\bigcup A$, $x\notin T$.

Lemma (lbsextlem3): If $A\subseteq S$ is nonempty and $\forall s,t\in A: s\cup t\in A$, then $\bigcup A\in S$.

Proof: From earlier remarks, all that remains is to show that for all $x\in\bigcup A$, $x\notin T$, so suppose that $x\in y\in A$ and $u\in A$, $x\in\span(u\setminus\{x\})$. Then since $y\cup u\in A$, $y\cup u$ is linearly independent, and so $x\in y\subseteq y\cup u\to x\notin\span(y\cup u\setminus\{x\})$, but since $x\in\span(u\setminus\{x\})\subseteq\span(y\cup u\setminus\{x\})$, this is a contradiction. $$\tag*{$\blacksquare$}$$

This finishes the prerequisites to zornn0, so now suppose that $B\in S$ such that for all $y\in S,y\not\subset B$. We wish to show that this $B$ is a basis containing $C$. It is immediate that $B$ is linearly independent and contains $C$, since $B\in S$; it remains to prove (lbsextlem4) that for any $w\in V$, $w\in\span(B)$. We will need one more lemma, where the division ring property comes into play; we state it now.

Theorem (lspsolv): For any vector space $W$, $A\subseteq V$, and $y\in V$, if $x\in\span(A\cup\{y\})$ but $x\notin\span(A)$, then $y\in\span(A\cup\{x\})$.

Proof: This is clear from the "finite linear combinations" characterization of span; if $x=\sum_{v\in A\cup\{y\}}a_vv$ where only finitely many $a_v$ are nonzero, then the coefficient $a_y$ must be nonzero or else this would show that $x\in\span(A)$, so we can "solve for $y$" in the equation to write $y=a_y^{-1}(x-\sum_{v\in A}a_vv)$ and thus show that $y$ is in the span of $A\cup\{x\}$.

In our characterization, we have to work with subspaces, so the subspace we consider for this proof is $$Q=\{z\in V:\exists r\in K\, z+ry\in\span(A)\}.$$ To show this is a subspace, suppose $a\in K,u,v\in Q$. Then there are $s,t\in K$ such that $u+sy,v+ty\in\span(A)$, so since $\span(A)$ is a subspace, $$(au+v)+(as+t)y=a(u+sy)+(v+ty)\in\span(A),$$ so since $as+t\in K$, $au+v\in Q$ and $Q$ is a subspace.

Furthermore, $A\cup\{y\}\subseteq Q$ because $z\in A\to z+0y\in\span(A)$, and $y+(-1)y=0\in\span(A)$. Therefore $x\in\span(A\cup\{y\})\subseteq Q$, and there is an $r\in K$ such that $x+ry\in\span(A)$. (This is all proved in lspsolvlem.)

Now we are almost there; if $r=0$ then $x\in\span(A)$ against the hypothesis, so by the division property $r^{-1}$ exists, and we can write $y=r^{-1}((x+ry)-x)$, and since $\span(A\cup\{x\})$ is a subspace containing $x$ and $x+ry$, $y$ is also in $\span(A\cup\{x\})$. $$\tag*{$\blacksquare$}$$

Returning to the proof of lbsext, our strategy is to show that if $w\in V\setminus\span(B)$, then $B\cup\{w\}\supset B$ is also linearly independent, contradicting the maximality of $B$. For this to be true, we need that $\forall x\in B\cup\{w\}:x\notin\span(B\cup\{w\}\setminus\{x\})$. For $x=w$, this is easy, because $B\cup\{w\}\setminus\{w\}=B$ and $w\notin\span(B)$ by assumption. If $x\in B$, suppose that $x\in\span(B\cup\{w\}\setminus\{x\})$ for a contradiction. Since $x\ne w$, $B\cup\{w\}\setminus\{x\}=B\setminus\{x\}\cup\{w\}$, so we can apply lspsolv above: $x\in\span(B\setminus\{x\}\cup\{w\})$ but $x\notin\span(B\setminus\{x\})$ (because $B$ is linearly independent), so $w\in\span(B\setminus\{x\}\cup\{x\})=\span(B)$, contrary to hypothesis. Thus $B$ is a basis containing $C$, which completes the proof. $$\tag*{$\blacksquare$}$$

Saturday, August 16, 2014

The Monotone Convergence Theorem for integrals

Today's post is about the Monotone Convergence Theorem and the theorems leading up to it.

Theorem (itg2mono): If $(f_n)_{n\in\Bbb N}$ is a sequence of measurable functions $\Bbb R\to[0,\infty)$ such that $$0\le f_1\le f_2\le\dots\le f_n\le\dots$$ and the pointwise limit $g$ of the sequence $(f_n)$ exists, then $g$ is measurable and $\int g=\lim_{n\to\infty}\int f_n$ (note that for increasing sequences, $\lim$ and $\sup$ are equivalent, so the statement may differ from the Metamath version for this reason).

Proof: First, we prove $g$ is measurable. (mbfmono) It is sufficient to prove that the inverse image $\{g>t\}=\{x\in\Bbb R|g(x)>t\}$ is measurable, which follows from $\{g>t\}=\bigcup_{n\in\Bbb N}\{f_n>t\}$, because $t<g(x)=\sup_{n\in\Bbb N}f_n(x)\iff\exists n\in\Bbb N:t<f_n(x)$ and the countable union of the measurable sets $\{f_n>t\}$ is measurable.

Before tackling the main claim $\int g=\lim_{n\to\infty}\int f_n$, we will need a few more lemmas, so let's get those down first.

Theorem (volsup): If $(A_n)_{n\in\Bbb N}$ is a sequence of measurable sets such that $A_1\subseteq A_2\subseteq\dots\subseteq A_n\subseteq\dots\subseteq\Bbb R$ and $\bigcup_{n\in\Bbb N}A_n=A$, then $\DeclareMathOperator{\vol}{vol}\vol(A)=\lim_{n\to\infty}\vol(A_n)$, where the limit may be $\infty$.

Proof: First, we address the case when there is an $A_k$ such that $\vol(A_k)=\infty$. In this case, $A_k\subseteq A$ implies $\vol(A)=\infty$, and $$\lim_{n\to\infty}\vol(A_n)=\sup_{n\to\infty}\vol(A_n)\ge\vol(A_k)=\infty$$ so $\lim_{n\to\infty}\vol(A_n)=\infty$ as well.

Otherwise, all the $\vol(A_n)$ are real, so we are free to do arithmetic on the volumes. Consider the  sequence $B_n=A_n\setminus\bigcup_{k=1}^{n-1}A_k=A_n\setminus A_{n-1}$ (where the second representation is only valid for $n\ge2$). This is a set of pairwise-disjoint measurable sets, so from voliun we have $\vol(\bigcup_{n\in\Bbb N}B_n)=\sum_{n=1}^\infty\vol(B_n)$. Now each $B_n\subseteq A_n$, so $\bigcup_{n\in\Bbb N}B_n\subseteq A$, and each $x\in A$ is in some $A_k$, and for the minimal $k$, it is also in $B_k$. Thus $A=\bigcup_{n\in\Bbb N}B_n$. As for the sum, we prove by induction that $\sum_{k=1}^n\vol(B_n)=\vol(A_n)$ and therefore the sequence of partial sums of $\vol(B_n)$ is the sequence $\vol(A_n)$ and $\sum_{n=1}^\infty\vol(B_n)=\lim_{n\to\infty}\vol(A_n)$.

For $n=1$, we have $B_1=A_1$ so $\sum_{k=1}^1\vol(B_n)=\vol(A_1)$, and assuming that this is true for $n$, we have $$\sum_{k=1}^{n+1}\vol(B_n)=\vol(A_n)+\vol(B_{n+1})=\vol(A_n)+\vol(A_{n+1}\setminus A_n),$$ and since these are disjoint sets and $A_n\subseteq A_{n+1}$ we have $\vol(A_n)+\vol(A_{n+1}\setminus A_n)=\vol(A_{n+1})$. $$\tag*{$\blacksquare$}$$

Next we use this theorem to prove that the integral on simple functions respects limit operations on the base set.

Theorem (itg1climres): If $(A_n)_{n\in\Bbb N}$ is a sequence of measurable sets such that $A_1\subseteq A_2\subseteq\dots\subseteq A_n\subseteq\dots\subseteq\Bbb R$ and $\bigcup_{n\in\Bbb N}A_n=\Bbb R$, and $\phi$ is a simple function, then $\lim_{n\to\infty}\int_{A_n}\phi=\int\phi$.

Proof: We expand out the definition of the integral on simple functions: $\int\phi=\sum_{x\in\Bbb R^*}x\vol(\phi^{-1}\{x\})$, where the sum is nominally over all nonzero reals but is only nonzero for finitely many $x$. We can distribute the limit over the sum: $$\lim_{n\to\infty}\int_{A_n}\phi=\lim_{n\to\infty}\sum_{x\in\Bbb R^*}x\vol(\phi^{-1}\{x\}\cap A_n)=\sum_{x\in\Bbb R^*}x\lim_{n\to\infty}\vol(\phi^{-1}\{x\}\cap A_n),$$ so we are reduced to proving $\lim_{n\to\infty}\vol(\phi^{-1}\{x\}\cap A_n)=\vol(\phi^{-1}\{x\})$, which is an application of volsup above, taking the sets $(\phi^{-1}\{x\}\cap A_n)_{n\in\Bbb N}$ as our increasing sequence. $$\tag*{$\blacksquare$}$$

Now we are finally ready to return to our proof of the main theorem itg2mono. The easy direction of our equality is $\lim_{n\to\infty}\int f_n\le\int g$, which is true because each $f_n\le g$ so $\int f_n\le\int g$ by itg2le. Now the definition of $\int g$ is as a supremum over all simple functions $\phi\le g$. Thus to show $\int g\le\lim_{n\to\infty}\int f_n$ we can show that $S:=\lim_{n\to\infty}\int f_n$ is an upper bound on $\int\phi$ for any $\phi\le g$. Now we are going to prove that $t\int\phi\le S$ for any $0<t<1$, but to get from there to $\int\phi\le S$ takes a little trickery.

We assume that $\int\phi>S$ to start with. Since $S\in[0,\infty]$ and $\int\phi\in\Bbb R$ to start with (remember that $S$ is a supremum, possibly infinite, of integrals of nonnegative functions, while $\phi$ is a simple function whose integral is real but possibly negative), the inequality $\int\phi>S$ immediately implies $S\in\Bbb R$ and $\int\phi>0$. Now we can use alrple to use that for any $x>0$, $\int\phi\le S+x$ and hence prove that $\int\phi\ge S$ (in contradiction to our assumption, thus showing $\int\phi\ge S$ unconditionally). Define $t=\max(\frac12,\frac S{S+x})$, which is valid since $S\ge 0$ and $x>0$. Then $t\ge\frac12>0$ and $t\le\frac S{S+x}<1$, so by our lemma below, $t\int\phi\le S$. Since $\int\phi>0$ and $t>0$ we can infer $S>0$, so we can cancel the $S$ and multiply by $S+x$ in $\frac S{S+x}\int\phi\le t\int\phi\le S$ to get $\int\phi\le S+x$ as we wanted to show.

Now all that remains to prove is our lemma:

Lemma (itg2monolem): For any $0<t<1$, and any simple function $\phi\le g$, we have $\int\phi\le \lim_{n\to\infty}\int f_n$.

Proof: Define the sequence $A_n=\{x\in\Bbb R|t\phi(x)\le f_n(x)\}$. Then $\int_{A_n}t\phi\le\int f_n$ because the restricted function $t\phi\upharpoonright A_n$ is always dominated by $f_n$: if $x\notin A_n$ then $(t\phi\upharpoonright A_n)(x)=0\le f(n)$ and if $x\in A_n$ then $(t\phi\upharpoonright A_n)(x)=t\phi(x)\le f(n)$. We can take this inequality in the limit, and thus $$\int t\phi=\lim_{n\to\infty}\int_{A_n}t\phi\le\lim_{n\to\infty}\int f_n,$$ where the first equality is none other than our earlier lemma itg1climres, which can be applied because $A_n=\{0\le f_n-t\phi\}$ is a measurable set (since $f_n-t\phi$ is a sum of a simple function and a measurable function, which is measurable), the $A_n$ sets form an inclusion chain since $f_{n+1}\le f_n$, and the union of all of them is $\Bbb R$ because either $\phi(x)>0$ in which case $t\phi(x)<\phi(x)\le g(x)$ and there is some $f_n(x)>t\phi(x)$, or $\phi(x)\le0$ in which case $t\phi(x)\le0\le f_n(x)$ for every $n$. This completes the proof. $$\tag*{$\blacksquare$}$$


For a long time now, I've considered starting a weblog, and over the years I have had many people suggest that I start one, but the question that is always unanswered is: "What do I write?" I've decided to make this a blog about the subject that has consumed my thoughts for the past year, and that's Metamath.

For those who don't know, Metamath is a proof language for writing mathematical proofs which can be checked by an associated verification program, and I have been using Metamath to write proofs and submit them to the database, called, since around the beginning of 2013. Back then most of what I did was revise old theorems to make them more efficient or avoid usage of the axiom of choice or clean them up in some way, and then in February 2014 I was made aware of Freek Wiedijk's Formalizing 100 Theorems page, which tracks the formalization of 100 of the most famous theorems in mathematics, and I decided to make it my goal to prove all of these.

When we first started tracking our progress, there were 19 theorems already proven, and over the past 8 months 15 more have been proven. Of those, 12 were my work. This includes such theorems as
  • The Basel problem: $\sum_{n=1}^\infty\frac1{n^2}=\frac{\pi^2}6$ (basel),
  • Bertrand's postulate: $n\in\Bbb N\to\exists p\in\Bbb P:n<p\le 2n$ (bpos),
  • Lagrange's theorem for groups: $G$ finite group, $H\subseteq G$ subgroup implies $|H|$ divides $|G|$, (lagsubg),
and many others. I hope in later articles to talk a little more about the way these proofs are being done, and also highlight my latest work, and talk more generally about metamath and mmj2, the program I use (and develop) as a proof assistant to make the proofs. My current project is a proof of the Fundamental Theorem of Calculus, which has required a lot of work on defining a (Lebesgue) integral and derivative. But for now, welcome to my blog.