Theorem 9.4.1 (dirith, Dirichlet's theorem): If $N\in\Bbb N$ and $(A,N)=1$, then there are infinitely many primes $p$ such that $N\mid p-A$.If you have a copy of Shapiro, "Introduction to the Theory of Numbers", open it to p. 375, Theorem 9.4.1, because the proof follows the book closely. Otherwise, I will do my best to explain the steps, as well as a few things that Shapiro glossed over.
The major players in the proof are the von Mangoldt function $\Lambda(n)$, which is defined by $\Lambda(p^\alpha)=\log p$ if $p^\alpha$ is a prime power and $\Lambda(n)=0$ otherwise, the Möbius function $\mu(n)$, which is $1$ if $n$ is a squarefree number with an even number of factors, $-1$ if it is squarefree with an odd number of factors, and $0$ otherwise, and the Dirichlet characters $\chi(n)$, which are completely multiplicative functions on $\Bbb Z/N\Bbb Z$ that are nonzero on the units. Facts about these functions will be introduced as we go along.
The starting point is the following theorem (vmadivsum, Equation 9.2.13): $$\sum_{n\le x}\frac{\Lambda(n)}n=\log x+O(1).$$
Proof: By definition, $\psi(x)=\sum_{n\le x}\Lambda(n)$ (this is the second Chebyshev function), and from chpo1ub $\psi(x)=O(x)$. Also (logfac2), $$\log\lfloor x\rfloor!=\sum_{n\le x}\Lambda(n)\left\lfloor\frac xn\right\rfloor.$$ We can put these together via \begin{align} \sum_{n\le x}\frac{\Lambda(n)}n &= \frac1x\sum_{n\le x}\Lambda(n)\frac xn\\ &=\frac1x\sum_{n\le x}\Lambda(n)\left\lfloor\frac xn\right\rfloor + \frac1x\sum_{n\le x}\Lambda(n)\left\{\frac xn\right\}\\ &=\frac1x\log\lfloor x\rfloor! + \frac1x O(\sum_{n\le x}\Lambda(n))\\ &=\frac1x(x\log x+O(x)) + \frac1x O(\psi(x))\\ &=\log x+O(1). \end{align} (We also used the asymptotic approximation of the factorial (logfaco1), $\log\lfloor x\rfloor!=x\log x+O(x)$ in this equation.) $$\tag*{$\blacksquare$}$$
Our goal will be to strengthen this to talk about a specific progression $\bmod N$ (we will hold $N$ fixed throughout the proof) to achieve rpvmasum, Equation 9.4.3: $$\phi(N)\sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\log x+O(1).$$ Once we have established this, the main theorem is simple:
Theorem (rplogsum, Equation 9.4.4): $$\phi(N)\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p=\log x+O(1).$$
Proof: (Note that $p$ is always a variable over primes here.) \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n-\sum_{\substack{p\le x\\p\equiv A}}\frac{\log p}p &= \sum_{\substack{p^\alpha\le x, \alpha>1\\p^\alpha\equiv A}}\frac{\log p}{p^\alpha} \le \sum_{p^\alpha\le x, \alpha>1}\frac{\log p}{p^\alpha}\\ &\le \sum_{p\in\Bbb P}\sum_{\alpha=2}^\infty\frac{\log p}{p^\alpha} = \sum_{p\in\Bbb P}\frac{\log p}{p(p-1)}\\ &\le \sum_{n=2}^\infty\frac{\log n}{n(n-1)} \le \sum_{n=2}^\infty\frac{\sqrt{n-1}}{n(n-1)}\\ &\le \sum_{n=2}^\infty\left(\frac2{\sqrt{n-1}}-\frac2{\sqrt n}\right)=2 \end{align} Thus $\phi(N)\sum_{n\le x,n\equiv A}\frac{\Lambda(n)}n=\log x+O(1)$ implies $\phi(N)\sum_{p\le x,p\equiv A}\frac{\log p}p=\log x+O(1)$. But this implies that $\sum_{p\le x,p\equiv A}\frac{\log p}p$ diverges, and since a finite sum cannot diverge, this means that $\{p\in\Bbb P\mid p\equiv A\pmod N\}$ is infinite. $$\tag*{$\blacksquare$}$$
We are left with the task of establishing rpvmasum. At this point we turn to Dirichlet characters as a "better basis" from which to weight the sum. This works because of the relation (sum2dchr) $$\sum_{\chi}\bar\chi(A)\chi(n)=\begin{cases}\phi(N)&n\equiv A\\0&o.w.\end{cases}$$ (where the sum is taken over all Dirichlet characters $\chi$). Using this equation we can rewrite the sum as $$\sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n=\sum_{\chi}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n.$$
There are two cases. The $\log x$ term comes from the principal character $\chi=\chi_0$ (written $\chi=\bf 1$ in Metamath):
Theorem (rpvmasumlem): $$\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1)$$ Proof: \begin{align} \sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n &=\sum_{\substack{n\le x\\(n,N)=1}}\frac{\Lambda(n)}n\\ &=\sum_{n\le x}\frac{\Lambda(n)}n-\sum_{\substack{n\le x\\(n,N)>1}}\frac{\Lambda(n)}n\\ &=\log x+O(1)-\sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \end{align} If $(p^\alpha,N)>1$, then $p$ is a divisor of $N$, and there are only finitely many of those. Thus $$\sum_{\substack{p^\alpha\le x\\(p^\alpha,N)>1}}\frac{\log p}{p^\alpha} \le\sum_{p\mid N}\sum_{\alpha=1}^\infty\frac{\log p}{p^\alpha} =\sum_{p\mid N}\frac{\log p}{p-1}$$ so $\sum_{n\le x,(n,N)>1}\frac{\Lambda(n)}n=O(1)$ and $\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n=\log x+O(1)$. $$\tag*{$\blacksquare$}$$
The other case (the hard case) is when $\chi\ne\chi_0$. Fix a nonprincipal character $\chi$. Our goal is to show (dchrvmasum, Equation 9.4.8) $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=O(1).$$ From this it follows that \begin{align} \sum_{\substack{n\le x\\n\equiv A}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\bar\chi(A)\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}\bar\chi(A)O(1)\\ &=\log x+O(1), \end{align} which will prove vmadivsum.
Theorem (vmasum, Equation 9.2.4): $$\log n=\sum_{d\mid n}\Lambda(d)$$
Proof: \begin{align} \sum_{d\mid n}\Lambda(d) &=\sum_{p^\alpha\mid n}\log p =\sum_{p}\sum_{\alpha\le\nu_p(n)}\log p\\ &=\sum_{p}\nu_p(n)\log p =\log\prod_{p} p^{\nu_p(n)}=\log n \end{align} where $\nu_p(n)$ (written in Metamath as $(p\mbox{ pCnt }n)$) is the exponent of $p$ in the prime factorization of $n$. $$\tag*{$\blacksquare$}$$
Using Möbius inversion (muinv) on this equation gives $$\Lambda(n)=\sum_{d\mid n}\mu(d)\log\frac nd,$$ hence (dchrvmasumlem1) \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac nd\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log m\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log m. \end{align}
We have a similar equation for $\log x$ (dchrvmasum2lem): \begin{align} \log x&=\sum_{n\le x}\frac{\chi(n)}n\log\frac xn\sum_{d\mid n}\mu(d)\\ &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\log\frac xn\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\log\frac x{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac x{dm}. \end{align}
The first step is an application of musum to collapse the double sum to just the $n=1$ term. Combining these two yields the equation $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\log x=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\frac xd.$$
Our next task is to get an asymptotic expression for the sums $\sum_{m\le z}\frac{\chi(m)}{m}$ and $\sum_{m\le z}\frac{\chi(m)}{m}\log m$ appearing here. To that end, we introduce the following lemma:
Theorem (dchrisum, Theorem 9.4.1): Let $f:\Bbb R^+\to\Bbb R$ be an eventually monotonically decreasing function (that is, for some $M$, $M\le x\le y$ implies $f(x)\ge f(y)$) which converges to $0$. Then $\sum_n\chi(n)f(n)$ converges to some $L$, and for some $c$, $$x\ge M\implies|\sum_{n\le x}\chi(n)f(n)-L|\le cf(x).$$
Proof: It suffices to show that $\sum_{a\le n<b}\chi(n)f(n)=O(f(x))$ (for positive integers $a,b$ with $x\le a\le b$), because then the sequence is Cauchy and so converges to a limit $L$, and since the entire sequence above $x$ is inside the closed set $\{z\in\Bbb C\mid|\sum_{n\le x}\chi(n)f(n)-z|\le f(x)\}$, so is the limit.
Let $R(k)=\sum_{n=0}^{k-1}\chi(n)$. Then $R(N)$ sums over all the equivalence classes of $\Bbb Z/N\Bbb Z$ exactly once, so $R(N)=\sum_{i\in\Bbb Z/N\Bbb Z}\chi(i)$, and since $\chi$ is nonprincipal, by dchrsum $R(N)=0$. Let $C\ge|R(k)|$ for all $0\le k<N$. (This is a finite set, so there is a finite supremum.) Then setting $k=aN+b$ where $b<N$, we have \begin{align} R(k)&=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(iN+n)+\sum_{n=0}^{b-1}\chi(aN+n)\\ &=\sum_{i=0}^{a-1}\sum_{n=0}^{N-1}\chi(n)+\sum_{n=0}^{b-1}\chi(n)\\ &=\sum_{i=0}^{a-1}R(N)+R(b)=R(b), \end{align} and $C\ge|R(b)|$; hence $C\ge|R(k)|$ for all $k$.
Using summation by parts, \begin{align} |\sum_{a\le n<b}\chi(n)f(n)| &=|\sum_{a\le n<b}(R(n+1)-R(n))f(n)|\\ &=|R(b)f(b)-R(a)f(a)-\sum_{a\le n<b}R(n+1)(f(n+1)-f(n))|\\ &\le Cf(b)+Cf(a)+\sum_{a\le n<b}C|f(n+1)-f(n)|\\ &=Cf(b)+Cf(a)+\sum_{a\le n<b}C(f(n)-f(n+1))\\ &=Cf(b)+Cf(a)+C(f(a)-f(b))\le 2Cf(x). \end{align} $$\tag*{$\blacksquare$}$$
With dchrisum in hand we can apply it to the decreasing functions $f(x)=\frac{\log x}x$ and $f(x)=\frac1x$; let $$L_0=\sum_{n=1}^\infty\frac{\chi(n)}n\qquad L_1=\sum_{n=1}^\infty\frac{\chi(n)\log n}n$$ be the limits of the series that result.
Theorem (dchrmusum2, Lemma 9.4.2): $$L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n=O(1).$$
Proof: We have: \begin{align} 1&=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(n)\mu(d)}n\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(dm)\mu(d)}{dm}\\ &=\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}m \end{align}
dchrisum provides that $|\sum_{m\le z}\frac{\chi(m)}m-L_0|\le\frac Cz$ for some $C$ and all $z$. Then: \begin{align} |L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n| &\le1+|1-L_0\sum_{n\le x}\frac{\chi(n)\mu(n)}n|\\ &=1+|\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}(\sum_{m\le x/d}\frac{\chi(m)}m-L_0)|\\ &\le1+\sum_{d\le x}\frac1d|\sum_{m\le x/d}\frac{\chi(m)}m-L_0|\\ &\le1+\sum_{d\le x}\frac Cx\le 1+C. \end{align} $$\tag*{$\blacksquare$}$$
We will be doing a lot of case analysis on $L_0=0$, so let the notation $\left[a\atop b\right]$ denote $$\left[a\atop b\right]=\begin{cases}a&L_0=0\\b&L_0\ne0\end{cases}.$$
Theorem (dchrvmasumif, Equation 9.4.24): $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n=-\left[\log x\atop 0\right]+O(1).$$
Proof: Earlier, we showed that $$\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right]= \sum_{d\le x}\frac{\chi(d)\mu(d)}{d}\sum_{m\le x/d}\frac{\chi(m)}{m}\log\left[x/d\atop m\right].$$ We will show that this latter sum is bounded. Let $g(x)=\sum_{n\le x}\frac{\chi(n)}n\log\left[x\atop n\right]$, so that the sum in question is $\sum_{d\le x}\frac{\chi(d)\mu(d)}{d}g(x/d)$. Let us show that $$g(x)=\left[0\atop L_1\right]+O\left(\frac{\log x}x\right).$$ In the case $L_0=0$, we have $$g(x)=\sum_{n\le x}\frac{\chi(n)}n\log x=\left(L_0+O\left(\frac1x\right)\right)\log x=O\left(\frac{\log x}x\right),$$ and if $L_0\ne0$, we have $$g(x)=\sum_{n\le x}\frac{\chi(n)\log n}n=L_1+O\left(\frac{\log x}x\right).$$ Since $\left[0\atop L_1/L_0\right]L_0=\left[0\atop L_1\right]$ and $L_0\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1)$, we also have $\left[0\atop L_1\right]\sum_{d\le x}\frac{\chi(d)\mu(d)}d=O(1)$. \begin{align} \sum_{n\le x}\frac{\chi(n)\Lambda(n)}n+\left[\log x\atop 0\right] &=\sum_{d\le x}\frac{\chi(d)\mu(d)}dg(x/d)\\ &=O(1)+\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right) \end{align}
Now suppose the big-O constants take $x\ge A\implies \left|g(x)-\left[0\atop L_1\right]\right|\le C\frac{\log x}x$, with $A\ge1$, and set $D=\sum_{n<A}\frac{\log A}n$. Then $x<A$ implies \begin{align} |g(x)|&\le\sum_{n\le x}\frac1n\left|\log\left[x\atop n\right]\right|\\ &\le\sum_{n\le x}\frac1n\log A\le\sum_{n<A}\frac1n\log A=D\\ \end{align} Then: \begin{align} \left|\sum_{d\le x}\frac{\chi(d)\mu(d)}d\left(g(x/d)-\left[0\atop L_1\right]\right)\right| &\le\sum_{d\le x}\frac1d\left|g(x/d)-\left[0\atop L_1\right]\right|\\ &\le\sum_{d\le x/A}\frac Cd\frac{\log(x/d)}{x/d}+\sum_{x/A<d\le x}\frac Dd\\ &\le\frac Cx\sum_{d\le x}(\log x-\log d)+D(\log x-\log(x/A)+1)\\ &\le\frac Cx(x\log x-\log\lfloor x\rfloor!)+D(\log A+1) \end{align} which is bounded using $\log\lfloor x\rfloor!=x\log x+O(x)$. $$\tag*{$\blacksquare$}$$
We're almost there: if we can prove $L_0\ne0$ unconditionally, dchrvmasumif will give us what we want. Replay the derivation of rpvmasum, with $A=1$ so that the $\bar\chi(A)$ factors disappear: \begin{align} \sum_{\substack{n\le x\\n\equiv 1}}\frac{\Lambda(n)}n &=\sum_{n\le x}\frac{\chi_0(n)\Lambda(n)}n+\sum_{\chi\ne\chi_0}\sum_{n\le x}\frac{\chi(n)\Lambda(n)}n\\ &=(\log x+O(1))+\sum_{\chi\ne\chi_0}(-\left[\log x\atop 0\right]+O(1))\\ &=(1-|W|)\log x+O(1), \end{align} where $W$ is the set of all $\chi\ne\chi_0$ such that $L_0(\chi)\ne 0$. Since $\sum_{n\le x,n\equiv 1}\frac{\Lambda(n)}n$ is nonnegative, we have a contradiction if $|W|>1$ (because the $(1-|W|)\log x+O(1)$ will eventually be negative). Thus there is at most one $\chi$ such that $L_0(\chi)=0$. Now fix $\chi$ again and assume $L_0=0$. Because $\bar\chi$ is also a Dirichlet character, and $L_0(\bar\chi)=\overline{L_0(\chi)}=0$, we must have $\bar\chi=\chi$, so $\chi$ is a real Dirichlet character (dchrisum0re).
Now let $F(n)=\sum_{d\mid n}\chi(d)$. Because a divisor sum of a multiplicative function is multiplicative (fsumdvdsmul), $F$ is multiplicative (dchrisum0fmul). (Unlike $\chi$ it is not completely multiplicative.)
Theorem (dchrisum0flb, Equation 9.4.29): $$F(n)\ge\begin{cases}1&\sqrt n\in\Bbb N\\0&o.w.\end{cases}.$$
Proof: We begin by establishing the theorem for prime powers $p^\alpha$, $\alpha\ge0$. In this case $\sqrt{p^\alpha}\in\Bbb N$ iff $\alpha$ is even. Now $$F(p^\alpha)=\sum_{d\mid p^\alpha}\chi(d)=\sum_{k\le\alpha}\chi(p^k)=\sum_{k\le\alpha}\chi(p)^k,$$ and since $\chi$ is a real Dirichlet character it can only take the values $-1,0,1$. If $\chi(p)=1$, then $F(p^\alpha)=\alpha+1\ge1$; if $\chi(p)=0$, then $F(p^\alpha)=1\ge1$; and if $\chi(p)=-1$ then $$F(p^\alpha)=\sum_{k\le\alpha}(-1)^k=\frac{1+(-1)^{\alpha}}2$$ which is $1$ when $\alpha$ is even and $0$ if it is odd. Thus in any case the theorem is proven for prime powers.
Now to cover the general case, by complete induction. If $n=1$ then it is a prime power so the theorem is proven; otherwise let $p$ be a prime divisor of $n$, and let $n=p^\alpha k$ where $p\nmid k$, so $F(n)=F(p^\alpha)F(k)$. We have proven that $F(p^\alpha)$ satisfies the theorem, and $F(k)$ satisfies the theorem by the induction hypothesis, because $k<n$. The product of two nonnegatives is nonnegative, so $F(n)\ge0$. If $n$ is a square then $\nu_p(n)=\alpha$ is even, so $p^\alpha$ is a square, and so $k$ is a quotient of squares, which is thus also a square. Thus $F(p^\alpha),F(k)\ge1$, so $F(n)\ge1$. $$\tag*{$\blacksquare$}$$
Therefore, $$\sum_{n\le x}\frac{F(n)}{\sqrt n}\ge\sum_{m^2\le x}\frac{F(m^2)}{m}\ge\sum_{m\le\sqrt x}\frac1m\ge\frac12\log x,$$ so $\sum_{n\le x}\frac{F(n)}{\sqrt n}$ is unbounded.
Theorem (dchrisumn0, Lemma 9.4.4): $L_0\ne0$.
Proof: We will show that $L_0=0$ implies $\sum_{n\le x}\frac{F(n)}{\sqrt n}$ is bounded. We have: \begin{align} \sum_{n\le x}\frac{F(n)}{\sqrt n} &=\sum_{n\le x}\sum_{d\mid n}\frac{\chi(d)}{\sqrt n}\\ &=\sum_{d\le x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le\sqrt x}\sum_{m\le x/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le\sqrt x}\sum_{\sqrt x<d\le x/m}\frac{\chi(d)}{\sqrt{dm}}\\ &=\sum_{d\le y}\sum_{m\le y^2/d}\frac{\chi(d)}{\sqrt{dm}} +\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}} \end{align} (changing variables $y\mapsto\sqrt x$ in the last step). Using dchrisum once more, we obtain $L_2$ such that $|\sum_{n\le x}\frac{\chi(d)}{\sqrt d}-L_2|\le\frac C{\sqrt x}$. Then \begin{align} |\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}| &\le\frac{2C}{\sqrt y}\sum_{m\le y}\frac1{\sqrt m}. \end{align} We need here the asymptotic estimate divsqrsum for $\sum_{m\le x}\frac1{\sqrt m}=2\sqrt x+c+O(x^{-1/2})$, so that we get $$|\sum_{m\le y}\sum_{y<d\le y^2/m}\frac{\chi(d)}{\sqrt{dm}}|\le4C+O(\frac1{\sqrt y})=O(1).$$
For the other sum, we can write \begin{align} \sum_{d\le y}\frac{\chi(d)}{\sqrt d}\sum_{m\le y^2/d}\frac1{\sqrt m} &=\sum_{d\le y}\frac{\chi(d)}{\sqrt d}\left(2\frac{y}{\sqrt d}+c+O(\frac{\sqrt d}{y})\right)\\ &=2y\sum_{d\le y}\frac{\chi(d)}{d}+c\sum_{d\le y}\frac{\chi(d)}{\sqrt d}+O(1)\\ &=2y(L_0+O(\frac1y))+c(L_2+O(\frac1{\sqrt y}))+O(1)=2yL_0+O(1)=O(1). \end{align}
Thus $\sum_{n\le x}\frac{F(n)}{\sqrt n}=O(1)$, a contradiction. Thus our assumption $L_0=0$ was false, so $L_0\ne0$. $$\tag*{$\blacksquare$}$$ Then dchrvmasumif simplifies to dchrvmasum, and we are done. $$\tag*{$\blacksquare$}$$