Theorem 9.4.1 (dirith, Dirichlet's theorem): If N∈N and (A,N)=1, then there are infinitely many primes p such that N∣p−A.If you have a copy of Shapiro, "Introduction to the Theory of Numbers", open it to p. 375, Theorem 9.4.1, because the proof follows the book closely. Otherwise, I will do my best to explain the steps, as well as a few things that Shapiro glossed over.
The major players in the proof are the von Mangoldt function Λ(n), which is defined by Λ(pα)=logp if pα is a prime power and Λ(n)=0 otherwise, the Möbius function μ(n), which is 1 if n is a squarefree number with an even number of factors, −1 if it is squarefree with an odd number of factors, and 0 otherwise, and the Dirichlet characters χ(n), which are completely multiplicative functions on Z/NZ that are nonzero on the units. Facts about these functions will be introduced as we go along.
The starting point is the following theorem (vmadivsum, Equation 9.2.13): ∑n≤xΛ(n)n=logx+O(1).
Proof: By definition, ψ(x)=∑n≤xΛ(n) (this is the second Chebyshev function), and from chpo1ub ψ(x)=O(x). Also (logfac2), log⌊x⌋!=∑n≤xΛ(n)⌊xn⌋. We can put these together via ∑n≤xΛ(n)n=1x∑n≤xΛ(n)xn=1x∑n≤xΛ(n)⌊xn⌋+1x∑n≤xΛ(n){xn}=1xlog⌊x⌋!+1xO(∑n≤xΛ(n))=1x(xlogx+O(x))+1xO(ψ(x))=logx+O(1). (We also used the asymptotic approximation of the factorial (logfaco1), log⌊x⌋!=xlogx+O(x) in this equation.)
Our goal will be to strengthen this to talk about a specific progression mod (we will hold N fixed throughout the proof) to achieve rpvmasum, Equation 9.4.3: \phi(N)\sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\log x+O(1). Once we have established this, the main theorem is simple:
Theorem (rplogsum, Equation 9.4.4): \phi(N)\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p=\log x+O(1).
Proof: (Note that p is always a variable over primes here.) \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n-\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p &= \sum_{\substack{p^\alpha\le x, \alpha>1\\p^\alpha\equiv A}}\frac{\log p}{p^\alpha} \le \sum_{p^\alpha\le x, \alpha>1}\frac{\log p}{p^\alpha}\\ &\le \sum_{p\in\Bbb P}\sum_{\alpha=2}^\infty\frac{\log p}{p^\alpha} = \sum_{p\in\Bbb P}\frac{\log p}{p(p-1)}\\ &\le \sum_{n=2}^\infty\frac{\log n}{n(n-1)} \le \sum_{n=2}^\infty\frac{\sqrt{n-1}}{n(n-1)}\\ &\le \sum_{n=2}^\infty\left(\frac2{\sqrt{n-1}}-\frac2{\sqrt n}\right)=2 \end{align} Thus \phi(N)\sum_{n\le x,n\equiv A}\frac{\Lambda(n)}n=\log x+O(1) implies \phi(N)\sum_{p\le x,p\equiv A}\frac{\log p}p=\log x+O(1). But this implies that \sum_{p\le x,p\equiv A}\frac{\log p}p diverges, and since a finite sum cannot diverge, this means that \{p\in\Bbb P\mid p\equiv A\pmod N\} is infinite. \tag*{$\blacksquare$}
We are left with the task of establishing rpvmasum. At this point we turn to Dirichlet characters as a "better basis" from which to weight the sum. This works because of the relation (sum2dchr) \sum_{\chi}\bar\chi(A)\chi(n)=\begin{cases}\phi(N)&n\equiv A\\0&o.w.\end{cases} (where the sum is taken over all Dirichlet characters \chi). Using this equation we can rewrite the sum as \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\sum_{\chi}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n.
There are two cases. The \log x term comes from the principal character \chi=\chi_0 (written \chi=\bf 1 in Metamath):
Theorem (rpvmasumlem): \sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1) Proof: \begin{align} \sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n &=\sum_{\substack{n\le x\\(n,N)=1}}\frac{\Lambda(n)}n\\ &=\sum_{n\le x}\frac{\Lambda(n)}n-\sum_{\substack{n\le x\\(n,N)>1}}\frac{\Lambda(n)}n\\ &=\log x+O(1)-\sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \end{align} If (p^\alpha,N)>1, then p is a divisor of N, and there are only finitely many of those. Thus \sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \le\sum_{p\mid N}\sum_{\alpha=1}^\infty\frac{\log p}{p^\alpha} =\sum_{p\mid N}\frac{\log p}{p-1} so \sum_{n\le x,(n,N)>1}\frac{\Lambda(n)}n=O(1) and \sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1). \tag*{$\blacksquare$}
The other case (the hard case) is when \chi\ne\chi_0. Fix a nonprincipal character \chi. Our goal is to show (dchrvmasum, Equation 9.4.8) \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=O(1). From this it follows that \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}\bar\chi(A)O(1)\\ &=\log x+O(1), \end{align} which will prove vmadivsum.
Theorem (vmasum, Equation 9.2.4): \log n=\sum_{d\mid n}\Lambda(d)
Proof: \begin{align} \sum_{d\mid n}\Lambda(d) &=\sum_{p^\alpha\mid n}\log p =\sum_{p}\sum_{\alpha\le\nu_p(n)}\log p\\ &=\sum_{p}\nu_p(n)\log p =\log\prod_{p} p^{\nu_p(n)}=\log n \end{align} where \nu_p(n) (written in Metamath as (p\mbox{ pCnt }n)) is the exponent of p in the prime factorization of n. \tag*{$\blacksquare$}
Using Möbius inversion (muinv) on this equation gives \Lambda(n)=\sum_{d\mid n}\mu(d)\log\frac nd, hence (dchrvmasumlem1) \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac nd\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log m\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log m. \end{align}
We have a similar equation for \log x (dchrvmasum2lem): \begin{align} \log x&=\sum_{n\le x}\frac{\chi(n)}n\log\frac xn\sum_{d\mid n}\mu(d)\\ &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac xn\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log\frac x{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac x{dm}. \end{align}
The first step is an application of musum to collapse the double sum to just the n=1 term. Combining these two yields the equation \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\log x=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac xd.
Our next task is to get an asymptotic expression for the sums \sum_{m\le z}\frac{\chi(m)}{m} and \sum_{m\le z}\frac{\chi(m)}{m}\log m appearing here. To that end, we introduce the following lemma:
Theorem (dchrisum, Theorem 9.4.1): Let f:\Bbb R^+\to\Bbb R be an eventually monotonically decreasing function (that is, for some M, M\le x\le y implies f(x)\ge f(y)) which converges to 0. Then \sum_n\chi(n)f(n) converges to some L, and for some c, x\ge M\implies|\sum_{n\le x}\chi(n)f(n)-L|\le cf(x).
Proof: It suffices to show that \sum_{a\le n<b}\chi(n)f(n)=O(f(x)) (for positive integers a,b with x\le a\le b), because then the sequence is Cauchy and so converges to a limit L, and since the entire sequence above x is inside the closed set \{z\in\Bbb C\mid|\sum_{n\le x}\chi(n)f(n)-z|\le f(x)\}, so is the limit.
Let R(k)=\sum_{n=0}^{k-1}\chi(n). Then R(N) sums over all the equivalence classes of \Bbb Z/N\Bbb Z exactly once, so R(N)=\sum_{i\in\Bbb Z/N\Bbb Z}\chi(i), and since \chi is nonprincipal, by dchrsum R(N)=0. Let C\ge|R(k)| for all 0\le k<N. (This is a finite set, so there is a finite supremum.) Then setting k=aN+b where b<N, we have \begin{align} R(k)&=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(iN+n)+\sum_{n=0}^{b-1}\chi(aN+n)\\ &=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(n)+\sum_{n=0}^{b-1}\chi(n)\\ &=\sum_{i=0}^{a-1}R(N)+R(b)=R(b), \end{align} and C\ge|R(b)|; hence C\ge|R(k)| for all k.
Using summation by parts, \begin{align} |\sum_{a\le n<b}\chi(n)f(n)| &=|\sum_{a\le n<b}(R(n+1)-R(n))f(n)|\\ &=|R(b)f(b)-R(a)f(a)-\sum_{a\le n<b}R(n+1)(f(n+1)-f(n))|\\ &\le Cf(b)+Cf(a)+\sum_{a\le n<b}C|f(n+1)-f(n)|\\ &=Cf(b)+Cf(a)+\sum_{a\le n<b}C(f(n)-f(n+1))\\ &=Cf(b)+Cf(a)+C(f(a)-f(b))\le 2Cf(x). \end{align} \tag*{$\blacksquare$}
With dchrisum in hand we can apply it to the decreasing functions f(x)=\frac{\log x}x and f(x)=\frac1x; let L_0=\sum_{n=1}^\infty\frac{\chi(n)}n\qquad L_1=\sum_{n=1}^\infty\frac{\chi(n)\log n}n be the limits of the series that result.
Theorem (dchrmusum2, Lemma 9.4.2): L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n=O(1).
Proof: We have: \begin{align} 1&=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}m \end{align}
dchrisum provides that |\sum_{m\le z}\frac{\chi(m)}m-L_0|\le\frac Cz for some C and all z. Then: \begin{align} |L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n| &\le1+|1-L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n|\\ &=1+|\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}(\sum_{m\le x/d}\frac{\chi(m)}m-L_0)|\\ &\le1+\sum_{d\le x}\frac1d|\sum_{m\le x/d}\frac{\chi(m)}m-L_0|\\ &\le1+\sum_{d\le x}\frac Cx\le 1+C. \end{align} \tag*{$\blacksquare$}
We will be doing a lot of case analysis on L_0=0, so let the notation \left[a\atop b\right] denote \left[a\atop b\right]=\begin{cases}a&L_0=0\\b&L_0\ne0\end{cases}.
Theorem (dchrvmasumif, Equation 9.4.24): \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=-\left[\log x\atop 0\right]+O(1).
Proof: Earlier, we showed that \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right]= \sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\left[x/d\atop m\right]. We will show that this latter sum is bounded. Let g(x)=\sum_{n\le x}\frac{\chi(n)}n\log\left[x\atop n\right], so that the sum in question is \sum_{d\le x}\frac{\chi(d)\mu(d)}{d}g(x/d). Let us show that g(x)=\left[0\atop L_1\right]+O\left(\frac{\log x}x\right). In the case L_0=0, we have g(x)=\sum_{n\le x}\frac{\chi(n)}n\log x=\left(L_0+O\left(\frac1x\right)\right)\log x=O\left(\frac{\log x}x\right), and if L_0\ne0, we have g(x)=\sum_{n\le x}\frac{\chi(n)\log n}n=L_1+O\left(\frac{\log x}x\right). Since \left[0\atop L_1/L_0\right]L_0=\left[0\atop L_1\right] and L_0\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1), we also have \left[0\atop L_1\right]\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1). \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right] &=\sum_{d\le x}\frac{\chi(d)\mu(d)}dg(x/d)\\ &=O(1)+\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right) \end{align}
Now suppose the big-O constants take x\ge A\implies \left|g(x)-\left[0\atop L_1\right]\right|\le C\frac{\log x}x, with A\ge1, and set D=\sum_{n<A}\frac{\log A}n. Then x<A implies \begin{align} |g(x)|&\le\sum_{n\le x}\frac1n\left|\log\left[x\atop n\right]\right|\\ &\le\sum_{n\le x}\frac1n\log A\le\sum_{n<A}\frac1n\log A=D\\ \end{align} Then: \begin{align} \left|\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right)\right| &\le\sum_{d\le x}\frac1d\left|g(x/d)-\left[0\atop L_1\right]\right|\\ &\le\sum_{d\le x/A}\frac Cd\frac{\log(x/d)}{x/d}+\sum_{x/A<d\le x}\frac Dd\\ &\le\frac Cx\sum_{d\le x}(\log x-\log d)+D(\log x-\log(x/A)+1)\\ &\le\frac Cx(x\log x-\log\lfloor x\rfloor!)+D(\log A+1) \end{align} which is bounded using \log\lfloor x\rfloor!=x\log x+O(x). \tag*{$\blacksquare$}
We're almost there: if we can prove L_0\ne0 unconditionally, dchrvmasumif will give us what we want. Replay the derivation of rpvmasum, with A=1 so that the \bar\chi(A) factors disappear: \begin{align} \sum_{\substack{n\le x\\n\equiv 1}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}(-\left[\log x\atop 0\right]+O(1))\\ &=(1-|W|)\log x+O(1), \end{align} where W is the set of all \chi\ne\chi_0 such that L_0(\chi)\ne 0. Since \sum_{n\le x,n\equiv 1}\frac{\Lambda(n)}n is nonnegative, we have a contradiction if |W|>1 (because the (1-|W|)\log x+O(1) will eventually be negative). Thus there is at most one \chi such that L_0(\chi)=0. Now fix \chi again and assume L_0=0. Because \bar\chi is also a Dirichlet character, and L_0(\bar\chi)=\overline{L_0(\chi)}=0, we must have \bar\chi=\chi, so \chi is a real Dirichlet character (dchrisum0re).
Now let F(n)=\sum_{d\mid n}\chi(d). Because a divisor sum of a multiplicative function is multiplicative (fsumdvdsmul), F is multiplicative (dchrisum0fmul). (Unlike \chi it is not completely multiplicative.)
Theorem (dchrisum0flb, Equation 9.4.29): F(n)\ge\begin{cases}1&\sqrt n\in\Bbb N\\0&o.w.\end{cases}.
Proof: We begin by establishing the theorem for prime powers p^\alpha, \alpha\ge0. In this case \sqrt{p^\alpha}\in\Bbb N iff \alpha is even. Now F(p^\alpha)=\sum_{d\mid p^\alpha}\chi(d)=\sum_{k\le\alpha}\chi(p^k)=\sum_{k\le\alpha}\chi(p)^k, and since \chi is a real Dirichlet character it can only take the values -1,0,1. If \chi(p)=1, then F(p^\alpha)=\alpha+1\ge1; if \chi(p)=0, then F(p^\alpha)=1\ge1; and if \chi(p)=-1 then F(p^\alpha)=\sum_{k\le\alpha}(-1)^k=\frac{1+(-1)^{\alpha}}2 which is 1 when \alpha is even and 0 if it is odd. Thus in any case the theorem is proven for prime powers.
Now to cover the general case, by complete induction. If n=1 then it is a prime power so the theorem is proven; otherwise let p be a prime divisor of n, and let n=p^\alpha k where p\nmid k, so F(n)=F(p^\alpha)F(k). We have proven that F(p^\alpha) satisfies the theorem, and F(k) satisfies the theorem by the induction hypothesis, because k<n. The product of two nonnegatives is nonnegative, so F(n)\ge0. If n is a square then \nu_p(n)=\alpha is even, so p^\alpha is a square, and so k is a quotient of squares, which is thus also a square. Thus F(p^\alpha),F(k)\ge1, so F(n)\ge1. \tag*{$\blacksquare$}
Therefore, \sum_{n\le x}\frac{F(n)}{\sqrt n}\ge\sum_{m^2\le x}\frac{F(m^2)}{m}\ge\sum_{m\le\sqrt x}\frac1m\ge\frac12\log x, so \sum_{n\le x}\frac{F(n)}{\sqrt n} is unbounded.
Theorem (dchrisumn0, Lemma 9.4.4): L_0\ne0.
Proof: We will show that L_0=0 implies \sum_{n\le x}\frac{F(n)}{\sqrt n} is bounded. We have: \begin{align} \sum_{n\le x}\frac{F(n)}{\sqrt n} &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(d)}{\sqrt n}\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le\sqrt x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le\sqrt x}\sum_{\sqrt x<d\le x/m}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le y}\sum_{m\le y^2/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}} \end{align} (changing variables y\mapsto\sqrt x in the last step). Using dchrisum once more, we obtain L_2 such that |\sum_{n\le x}\frac{\chi(d)}{\sqrt d}-L_2|\le\frac C{\sqrt x}. Then \begin{align} |\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}| &\le\frac{2C}{\sqrt y}\sum_{m\le y}\frac1{\sqrt m}. \end{align} We need here the asymptotic estimate divsqrsum for \sum_{m\le x}\frac1{\sqrt m}=2\sqrt x+c+O(x^{-1/2}), so that we get |\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}|\le4C+O(\frac1{\sqrt y})=O(1).
For the other sum, we can write \begin{align} \sum_{d\le y}\frac{\chi(d)}{\sqrt d}\sum_{m\le y^2/d}\frac1{\sqrt m} &=\sum_{d\le y}\frac{\chi(d)}{\sqrt d}\left(2\frac{y}{\sqrt d}+c+O(\frac{\sqrt d}{y})\right)\\ &=2y\sum_{d\le y}\frac{\chi(d)}{d}+c\sum_{d\le y}\frac{\chi(d)}{\sqrt d}+O(1)\\ &=2y(L_0+O(\frac1y))+c(L_2+O(\frac1{\sqrt y}))+O(1)=2yL_0+O(1)=O(1). \end{align}
Thus \sum_{n\le x}\frac{F(n)}{\sqrt n}=O(1), a contradiction. Thus our assumption L_0=0 was false, so L_0\ne0. \tag*{$\blacksquare$} Then dchrvmasumif simplifies to dchrvmasum, and we are done. \tag*{$\blacksquare$}