Bounds for monochromatic solutions to $\{x+y,xy\}$

Ben Green Mathematical Institute, Andrew Wiles Building, Radcliffe Observatory Quarter, Woodstock Rd, Oxford OX2 6QW, UK ben.green@maths.ox.ac.uk and Mehtaab Sawhney Department of Mathematics, Columbia University, New York, NY 10027 m.sawhney@columbia.edu

Abstract.

Let $r$ be a sufficiently large positive integer, and let $N\geqslant\exp\exp(r^{50})$ . Then any $r$ -colouring of $[N]$ contains a monochromatic copy of $\{x+y,xy\}$ with $x>y>2$ .

1. Introduction

The key result in this work is an effective bound for $r$ -colourings of the natural numbers $\mathbf{N}$ containing a monochromatic copy of $\{x+y,xy\}$ .

Theorem 1.1.

There is a constant $r_{0}$ such that the following holds. Let $r\geqslant r_{0}$ be an integer and let $N\geqslant\exp\exp(r^{50})$ . Then any $r$ -colouring of $[N]:=\{1,\ldots,N\}$ contains a monochromatic copy of $\{x+y,xy\}$ with $x>y>2$ .

Remarks.

The constant $r_{0}$ is effectively computable. Furthermore, minor tweaks to the numerics in our arguments would allow one to replace $50$ with a slightly smaller constant. However, to obtain a ‘small’ constant (less than 10, say) would appear to require new ideas. Finally, we make no effort to compute an actual value of $r_{0}$. Due to arguments regarding the possible existence of a Siegel zero in Appendix B (among other reasons), to do so would be rather painful.

In the other direction, for all $r$ there is an $r$-colouring of $[N]$ with no monochromatic $\{x+y,xy\}$ with $x>y>2$ when $N=\frac{1}{2}(3^{r}+7)$, and therefore Theorem 1.1 is at most one logarithm from the optimal result. To obtain such a colouring, use colour $i$ for $[a_{i},a_{i+1})$ where $a_{i}:=\frac{1}{2}(3^{i}+9)$, $i=0,\dots,r-1$, and any colour for $\{1,2,3,4\}$. The point here is that $a_{i+1}=3(a_{i}-3)$ and so if $x+y\in[a_{i},a_{i+1})$ with $x,y\geqslant 3$ then $xy\geqslant 3(x+y-3)\geqslant a_{i+1}$. We never have $x+y$ or $xy\in\{1,2,3,4\}$.
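As a sanity check on this construction (and not as part of the argument), the following Python sketch builds the colouring for small $r$ and verifies by brute force that no pair $\{x+y,xy\}$ with $x>y>2$ is monochromatic. The function names are ours, and the use of colour $0$ on $\{1,2,3,4\}$ is an arbitrary choice.

```python
def lower_bound_colouring(r):
    """Return (N, colour) for the r-colouring of [N], N = (3^r + 7)/2."""
    N = (3**r + 7) // 2
    # a_i = (3^i + 9)/2, so a_0 = 5 and a_r = N + 1.
    a = [(3**i + 9) // 2 for i in range(r + 1)]
    # Any colour is allowed on {1, 2, 3, 4}; we arbitrarily use colour 0.
    colour = {n: 0 for n in range(1, 5)}
    for i in range(r):
        for n in range(a[i], a[i + 1]):
            colour[n] = i
    return N, colour


def has_monochromatic_pair(N, colour):
    """Brute-force search for x > y > 2 with xy <= N and x+y, xy the same colour."""
    y = 3
    while y * (y + 1) <= N:
        x = y + 1
        while x * y <= N:
            # x + y <= xy <= N here, so both lookups are valid.
            if colour[x + y] == colour[x * y]:
                return True
            x += 1
        y += 1
    return False
```

For instance, `has_monochromatic_pair(*lower_bound_colouring(5))` searches all admissible pairs in $[125]$.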

We remark that obtaining effective bounds for the pattern $\{x+y,xy\}$ has been raised by both the first author [GreOp, Problem 22] and by Richter [Ric25, Question 7.2].

1.1. Previous results

Theorem 1.1 guarantees the existence of infinitely many pairs $\{x+y,xy\}$ given a fixed $r$-colouring of $\mathbf{N}$. To see this, suppose that we have found $d$ such monochromatic pairs $\{x_{i}+y_{i},x_{i}y_{i}\}$, $i=1,\dots,d$. We modify our colouring of $\mathbf{N}$ to an $(r+2d)$-colouring in which $x_{1},\dots,x_{d},y_{1},\dots,y_{d}$ are given distinct colours, different to the original $r$, and then use Theorem 1.1 to find a further pair $\{x_{d+1}+y_{d+1},x_{d+1}y_{d+1}\}$. (Alternatively one may observe that our proof of Theorem 1.1 may be trivially modified to give many monochromatic pairs as $N\rightarrow\infty$ for a fixed value of $r$.)

This existential statement was first proven in a celebrated paper of Moreira [Mor17]; furthermore Moreira in fact guarantees a monochromatic pattern of the form $\{x,x+y,xy\}$ . This result represents substantial progress towards Hindman’s conjecture that any $r$ -colouring of $\mathbf{N}$ contains a monochromatic copy of $\{x,y,x+y,xy\}$ . Recently there has been further important progress towards Hindman’s conjecture in various settings. Bowen [Bow25] has proven that any $2$ -colouring of $\mathbf{N}$ contains infinitely many copies of $\{x,y,x+y,xy\}$ . Bowen and Sabok [BS24] have proven that any $r$ -colouring of $\mathbf{Q}^{\neq 0}$ contains a copy of $\{x,y,x+y,xy\}$ and Alweiss [Alw23] extended this to patterns of the form $\{\sum_{i\in S}x_{i},\prod_{i\in S}x_{i}\}$ where $S\subseteq[k]$ ranges over all nontrivial subsets. Additionally Alweiss [Alw24] has given an alternate proof of the result of Moreira. However even when restricting to $\{x+y,xy\}$ the proofs of Moreira and Alweiss give at least tower–type bounds due to highly recursive Ramsey type arguments. We remark that while the main argument of Moreira is purely qualitative, he indicates in [Mor17, Section 5] a variant argument using van der Waerden’s theorem (or Szemerédi’s theorem) which does give explicit finite bounds when used with appropriate bounds for Szemerédi’s theorem due to Gowers [Gow01].

Recently, Richter [Ric25] provided a quite different, more analytic, proof of Moreira’s result about $\{x+y,xy\}$ . The argument of Richter is quite infinitary in flavour and gives no bounds. However, as will be discussed shortly, our methods in this paper are very strongly influenced by those of Richter.

One may additionally compare Theorem 1.1 with bounds for certain Schur-type equations. For instance, for the configuration $\{x,y,x+y\}$, bounds of the form $\exp(r^{O(1)})$ are known due to work of Cwalina and Schoen [CS17]. Note that this (by restricting to powers of $2$) gives an essentially double-exponential bound for $\{x,y,xy\}$. Furthermore for more general linear systems $A$, bounds of the form $\exp\exp(r^{O_{A}(1)})$ are proven in generality by Sanders [San20], and good control on the implicit constant $O_{A}(1)$ for many systems may be found in work of Chapman and Prendiville [CP20].

1.2. Proof outline

Our work draws heavily on recent beautiful work of Richter [Ric25]; many of the ideas presented in this section are drawn from this work.

Logarithmic averages play a central role, so we define these before turning to an outline of the proof. If $\mathcal{N}$ is a finite set of positive integers and if $f:\mathcal{N}\rightarrow\mathbf{C}$ is a function, we write

\mathbb{E}_{n\in\mathcal{N}}^{\log}f(n):=\frac{\sum_{n\in\mathcal{N}}f(n)/n}{\sum_{n\in\mathcal{N}}1/n}.

We write $\mathbb{E}_{n_{1}\in\mathcal{N}_{1},n_{2}\in\mathcal{N}_{2}}^{\log}$ as a shorthand for $\mathbb{E}_{n_{1}\in\mathcal{N}_{1}}^{\log}\mathbb{E}^{\log}_{n_{2}\in\mathcal{N}_{2}}$ (and similarly for higher iterates). We will often use this notation when $\mathcal{N}=[N]=\{1,\dots,N\}$ .
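To make the notation concrete, here is a minimal Python sketch of the logarithmic average and its two-variable iterate. The function names are ours; the index sets are assumed to be reusable finite collections of positive integers, such as `range` objects or lists.

```python
def log_average(f, ns):
    """Logarithmic average of f over the finite set ns of positive integers."""
    ns = list(ns)
    total_weight = sum(1.0 / n for n in ns)
    return sum(f(n) / n for n in ns) / total_weight


def iterated_log_average(f, ns1, ns2):
    """E^log over n1 in ns1 of E^log over n2 in ns2 of f(n1, n2)."""
    # ns2 is traversed once per n1, so it must be reusable (not a generator).
    return log_average(
        lambda n1: log_average(lambda n2: f(n1, n2), ns2),
        ns1,
    )
```

For example, `log_average(lambda n: n % 2 == 0, range(1, N + 1))` gives the logarithmic density of the even numbers in $[N]$, which equals $H_{\lfloor N/2\rfloor}/(2H_{N})$ with $H_{m}$ the $m$th harmonic number.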

Suppose now that $[N]=A_{1}\cup\cdots\cup A_{r}$ is an $r$-colouring of $[N]=\{1,\dots,N\}$ in which we seek to find a monochromatic pair $x+y,xy$. The colour class in which this pair will be found is identified right at the very start of the proof. We take $B_{0}$ to be a fixed set of $r^{O(1)}$ ‘highly divisible’ numbers; the precise set we take is $B_{0}:=\{V^{4^{i}}:i=1,2,\dots,r^{C_{1}}\}$, where $V=(r^{C_{2}})!$ for appropriate constants $C_{1},C_{2}$. By the pigeonhole principle there is some $A=A_{\ell}$ which contains many multiples of elements of $B_{0}$ in the sense that $\mathbb{E}_{n\in[N]}^{\log}1_{A}(bn)\gg 1/r$ for at least $\gg r^{C_{1}-1}$ elements $b\in B_{0}$. We will find the desired configuration $\{x+y,xy\}$ in this colour class, which we fix for the rest of the argument.

The next key idea, which follows [Ric25] very closely, is to locate a ‘rich’ set of pairs $\{x,xy\}$ in $A$. This is done using a variant of arguments of Ahlswede, Khachatrian and Sárközy [AKS99] and Davenport and Erdős [DE36]. This argument involves the choice of various auxiliary sets of primes (for details see Section 7.1) and a key component is Elliott’s inequality from multiplicative number theory (given in Lemma A.5 in the form we shall need). The output of this argument is many instances of the inequality

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}1_{A}(bn)1_{A}(b^{\prime}np_{1}\cdots p_{k})\gg r^{-O(1)}

(1.1)

for some fixed $b\in B_{0}$ and many $b^{\prime}\in B_{0}$ with $b<b^{\prime}$ and associated $k$ where $2\leqslant k\ll r^{O(1)}$, and where the sets $\mathscr{P}_{i}$ of primes can be chosen at many different scales. (The precise statement we are sketching here may be found at (7.6).) This provides the aforementioned rich source of configurations $\{x,xy\}$, here with $x:=bn$ and $y:=\frac{b^{\prime}}{b}p_{1}\cdots p_{k}$.

The main business of the proof is a kind of deformation of the patterns $x,xy$ to the desired $x+y,xy$. To describe how this works, fix an instance of (1.1) (that is, fix $b^{\prime}$ and the sets $\mathscr{P}_{i}$ of primes). Set $f(n):=1_{A}(bn)$. We will then consider two ‘projections’ $\Pi^{\operatorname{sml}}f$ and $\Pi^{\operatorname{lrg}}f$, both of which average over progressions. They are defined by

\Pi^{\operatorname{sml}}f(n):=\mathbb{E}_{h,h^{\prime}\in[H]}f(n+q(h-h^{\prime}))\qquad\mbox{and}\qquad\Pi^{\operatorname{lrg}}f(n):=\mathbb{E}_{h,h^{\prime}\in[\tilde{H}]}f(n+\tilde{q}(h-h^{\prime}))

where here $q\mid\tilde{q}$ and $H>\tilde{H}$. (The actual choice of parameters depends on the scale of the sets of primes $\mathscr{P}_{i}$; the details are given at (7.7).) One should think of $q,\tilde{q}$ as being bounded in terms of $r$, whereas the lengths $H,\tilde{H}$ grow with $N$.

The small projection $\Pi^{\operatorname{sml}}$ is chosen so that we may run the following argument, starting from (1.1). First, via a kind of maximal function argument, we replace (1.1) by

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}\Pi^{\operatorname{sml}}f(n)1_{A}(b^{\prime}np_{1}\cdots p_{k})\gg r^{-O(1)}.

(1.2)

Details of this argument may be found in Lemma 6.5.

Then, we use the almost-periodicity property $\Pi^{\operatorname{sml}}f(n)\approx\Pi^{\operatorname{sml}}f(n+\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k})$ to replace (1.2) by

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}\Pi^{\operatorname{sml}}f\big(n+\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k}\big)1_{A}(b^{\prime}np_{1}\cdots p_{k})\gg r^{-O(1)}.

(1.3)

(Note here that $b^{\prime}/b^{2}$ is an integer by the highly divisible nature of the set $B_{0}$: if $b=V^{4^{i}}$ and $b^{\prime}=V^{4^{j}}$ with $i<j$, then $4^{j}\geqslant 2\cdot 4^{i}$.) In order for this almost-periodicity property to hold, the small projection $\Pi^{\operatorname{sml}}$ must be chosen appropriately: $q$ must divide $\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k}$ and $H$ must be sufficiently long.
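The integrality of $b^{\prime}/b^{2}$ can be illustrated with toy parameters. The sketch below is only an illustration: the helper names are ours, and the values of $V$ and of the size of $B_{0}$ used in the example are far smaller than the $V=(r^{C_{2}})!$ and $r^{C_{1}}$ of the actual argument.

```python
from math import factorial


def highly_divisible_set(V, size):
    """The set {V^(4^i) : i = 1, ..., size}, listed in increasing order (V >= 2)."""
    return [V ** (4 ** i) for i in range(1, size + 1)]


def all_ratios_integral(B0):
    """Check that b'/b^2 is an integer for every pair b < b' in B0."""
    return all(bp % (b * b) == 0 for b in B0 for bp in B0 if b < bp)


# Toy example: V = 3! and four elements 6^4, 6^16, 6^64, 6^256.
toy_B0 = highly_divisible_set(factorial(3), 4)
```

With $b=V^{4^{i}}$ and $b^{\prime}=V^{4^{j}}$, $i<j$, the quotient is $V^{4^{j}-2\cdot 4^{i}}$, so `all_ratios_integral(toy_B0)` holds for any integer $V\geqslant 2$.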

Leaving (1.3) aside for the moment, the technical heart of the proof is then an argument to the effect that (for an appropriate choice of the large projection $\Pi^{\operatorname{lrg}}$) we have

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}\Pi^{\operatorname{lrg}}f\big(n+\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k}\big)1_{A}(b^{\prime}np_{1}\cdots p_{k})\approx\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}f\big(n+\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k}\big)1_{A}(b^{\prime}np_{1}\cdots p_{k}).

(1.4)

Supposing that this has been established, imagine that we additionally have

\Pi^{\operatorname{sml}}f\approx\Pi^{\operatorname{lrg}}f

(1.5)

(in an $\ell^{2}$ sense). Combining (1.3), (1.4) and (1.5) then gives, assuming the various uses of $\approx$ work in our favour, that

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}f\big(n+\frac{b^{\prime}}{b^{2}}p_{1}\cdots p_{k}\big)1_{A}(b^{\prime}np_{1}\cdots p_{k})\gg r^{-O(1)}.

Recalling that $f(n)=1_{A}(bn)$, it then follows that for some choice of $n$ and $p_{1},\dots,p_{k}$ we have $bn+\frac{b^{\prime}}{b}p_{1}\cdots p_{k},b^{\prime}np_{1}\cdots p_{k}\in A$. This is the desired configuration $\{x+y,xy\}$, with $x=bn$ and $y=\frac{b^{\prime}}{b}p_{1}\cdots p_{k}$.

Whilst (1.5) will not be true in general (the projections $\Pi^{\operatorname{sml}},\Pi^{\operatorname{lrg}}$ are quite different in scale), an ‘energy-chaining’ or arithmetic regularity type of argument can be used to show that (1.5) does hold for at least one scale of primes $\mathscr{P}_{1},\dots,\mathscr{P}_{k}$. This part of the argument can be thought of as a quantitative version of the existence of projections in Hilbert space, specifically of the decomposition into locally aperiodic and locally quasiperiodic functions which is important in Richter’s work. This connection between existence of projections in Hilbert space and regularity lemmas is by now well established; see e.g. [Tao07, Section 2].

The remaining part of the argument is then to justify (1.4). This is done via a general study of averages

\mathbb{E}_{n\in[N],p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}^{\log}f_{1}(n+\lambda p_{1}\cdots p_{k})f_{2}(np_{1}\cdots p_{k}),

(1.6)

where $\lambda=b^{\prime}/b^{2}$ in our setting. Here, we consider arbitrary $1$-bounded functions $f_{1},f_{2}$, and the key question of interest is the ‘inverse question’ of what can be said if (1.6) is at least $\delta$ in magnitude for some $\delta>0$. Our main result on this topic, Proposition 5.1, is an inverse theorem for this question. It concludes that under such a hypothesis (and with suitable assumptions on the sets $\mathscr{P}_{i}$ of primes) the function $f_{1}$ is biased along progressions to some modulus $\tilde{q}=\lambda\lfloor\delta^{-C}\rfloor!$ and length $H$ comparable (in logarithmic scale) to the largest of the primes $\mathscr{P}_{i}$. The statement (1.4) follows very quickly from this inverse theorem (see Lemma 6.3 for the argument).

This inverse theorem, Proposition 5.1, is the most novel part of our paper. Whilst it is in a sense a quantitative, finitary version of [Ric25, Theorem 3.5], it is not a direct translation of that result, which would appear to be far too weak for our purposes. The key difference when unwinding the argument in [Ric25] in finitary language is that the latter finds bias along progressions with size depending on $\mathscr{P}_{i}$ while ours depends only on $\delta$. The proof of Proposition 5.1 is lengthy, and involves a Fourier analytic argument combined with Cauchy–Schwarz manœuvres inspired by certain “concatenation” results in the additive combinatorics literature, for instance [PP24, Pel20]. Ultimately it is these concatenation ideas which eliminate the dependence on $\mathscr{P}_{i}$. Key further ingredients are:

• Quantitative diophantine approximation results (Lemma 2.2);

• ‘Log-free’ exponential sum estimates for certain arithmetic sets, specifically sets $\mathscr{P}^{\prime}=\{p_{2}\cdots p_{k}:p_{2}\in I_{2},\dots,p_{k}\in I_{k}\}$ of ‘almost primes’, as well as the sets of squares of the elements of such sets (Section 3);

• Construction of a majorant for the primes with a certain Fourier decomposition (Section 4), in order to avoid the constant $r_{0}$ in our main result being ineffective due to possible Siegel zeros.

1.3. Acknowledgments

BG is supported by Simons Investigator Award 376201. This research was conducted during the period MS served as a Clay Research Fellow.

1.4. Notation

At various points, for brevity it will be expedient to use the following notation. If $f:\mathbf{Z}\rightarrow\mathbf{C}$ is a function and if $h,h^{\prime}\in\mathbf{Z}$ , we write $\Delta_{(h,h^{\prime})}f(x):=f(x+h)\overline{f(x+h^{\prime})}$ . If $\lambda$ is some further integer parameter, by $\Delta_{\lambda(h,h^{\prime})}f$ we mean $\Delta_{(\lambda h,\lambda h^{\prime})}f$ .

By a dyadic interval we mean any subset of $\mathbf{N}$ of the form $\{n:Y\leqslant n<2Y\}$ . We will occasionally abuse notation by writing $[H]$ when we really mean $[\lfloor H\rfloor]$ , for some $H\in\mathbf{R}_{\geqslant 1}$ .

When we say that a parameter (for instance $\delta$ ) is ‘sufficiently small’ we mean that $\delta\leqslant\delta_{0}$ for some absolute $\delta_{0}$ which we do not explicitly specify, and analogously if we say that $N$ is ‘sufficiently large’ we mean that $N\geqslant N_{0}$ for some absolute constant $N_{0}$ . It is important to remark that $\delta_{0},N_{0}$ are absolute and do not depend on the number of colours $r$ (otherwise our results would have little content). Throughout the paper the letter $N$ will always denote a sufficiently large integer parameter.

We write $(x,y)$ for the greatest common divisor of $x,y$ and $[x,y]$ for the lowest common multiple.

2. Diophantine sets and averages

The purpose of this section is to bound certain averages that will appear in the arguments of the next section, where our key technical result is established. The averages in question will be of the form

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{s\in S,t,t^{\prime}\in[T]}f(n+ts)\overline{f(n+t^{\prime}s)}=\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{s\in S,t,t^{\prime}\in[T]}\Delta_{s(t,t^{\prime})}f(n),

where $S\subset\mathbf{N}$ is contained in some dyadic interval, or the analogous average with $\mathbb{E}_{n\in[N]}$ in place of the logarithmic average. The main result of the section is Lemma 2.4 below.

In our applications the set $S$ will have a useful arithmetic property, namely that it satisfies a ‘log-free Weyl-type estimate’. The precise definition we will use is the following.

Definition 2.1.

Let $L,L^{\prime},D$ be parameters. Let $S$ be a set of integers. Suppose that whenever $\delta\in(0,\frac{1}{2})$ and $|\mathbb{E}_{s\in S}e(\theta s)|\geqslant\delta$ , then there is some natural number $q$ , $q\leqslant(L^{\prime}/\delta)^{L}$ , such that $\|q\theta\|_{\mathbf{R}/\mathbf{Z}}\leqslant(L^{\prime}/\delta)^{L}/D$ . Then we say that $S$ is $(L,L^{\prime},D)$ -diophantine.

Remarks.

Note that the definition is invariant under translation of $S$. In applications the parameter $D$ will be comparable to the diameter of $S$, but it is convenient not to simply set $D:=\operatorname{diam}(S)$, since this would lead to unnecessary estimations of the diameter of $S$ in some situations. Being diophantine with $D\asymp\operatorname{diam}(S)$ (for some $L,L^{\prime}$) is a common property of sets of integers. For instance, (the log-free variant of) Weyl’s inequality asserts that the set of $j$th powers in $[D]$ is $(L,L^{\prime},D)$-diophantine with appropriate parameters $L,L^{\prime}\ll_{j}1$; the set of $j$th powers of primes in $[D]$ is also $(L,L^{\prime},D)$-diophantine for some $L,L^{\prime}\ll_{j}1$. In fact, we will use the latter fact in our argument; for the proof see Lemma B.2.

Before turning to the statement and proof of the main results, we isolate the following lemma, which is of a standard type in the analysis of exponential sums. A proof of this particular variant may be found in [Gre25, Lemma C.1] (we have changed some dummy variables to avoid conflicts with the present paper).

Lemma 2.2.

Suppose that $\alpha\in\mathbf{R}$ and that $T\geqslant 1$ is an integer. Suppose that $\delta_{1},\delta_{2}$ are positive real numbers satisfying $\delta_{2}\geqslant 32\delta_{1}$ , and suppose that there are at least $\delta_{2}T$ elements $t\in[T]$ for which $\|\alpha t\|_{\mathbf{R}/\mathbf{Z}}\leqslant\delta_{1}$ . Suppose that $T\geqslant 16/\delta_{2}$ . Then there is some positive integer $q\leqslant 16/\delta_{2}$ such that $\|\alpha q\|_{\mathbf{R}/\mathbf{Z}}\leqslant\delta_{1}\delta_{2}^{-1}T^{-1}$ .
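The statement can be tested on small rational examples. The following Python sketch (exact arithmetic via `fractions.Fraction`; the function names are ours) checks the hypotheses of Lemma 2.2 for given $\alpha,T,\delta_{1},\delta_{2}$ and, if they hold, searches for a $q$ with the asserted properties. It is a numerical illustration only and plays no role in the proofs.

```python
from fractions import Fraction
from math import floor


def dist_to_int(x):
    """Distance from the rational x to the nearest integer, i.e. ||x||_{R/Z}."""
    frac = x - floor(x)
    return min(frac, 1 - frac)


def lemma_2_2_witness(alpha, T, delta1, delta2):
    """Return the least q as in Lemma 2.2, or None if a hypothesis fails."""
    good = sum(1 for t in range(1, T + 1) if dist_to_int(alpha * t) <= delta1)
    hypotheses = (
        delta2 >= 32 * delta1
        and T >= 16 / delta2
        and good >= delta2 * T
    )
    if not hypotheses:
        return None
    for q in range(1, floor(16 / delta2) + 1):
        if dist_to_int(alpha * q) <= delta1 / (delta2 * T):
            return q
    raise AssertionError("no admissible q found; this would contradict the lemma")
```

For example, with $\alpha=\frac{1}{3}$, $T=100$, $\delta_{1}=\frac{1}{100}$ and $\delta_{2}=\frac{33}{100}$, the $33$ multiples of $3$ in $[100]$ satisfy $\|\alpha t\|_{\mathbf{R}/\mathbf{Z}}\leqslant\delta_{1}$, and the search returns $q=3$.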

We next give the definition of certain norms describing bias of functions along arithmetic progressions.

Definition 2.3.

Let $f:\mathbf{Z}\rightarrow\mathbf{C}$ be a function. Let $q\in\mathbf{N}$ and $H\in\mathbf{N}$ be parameters. Set

\|f\|_{U^{1}_{\log}[N;q,H]}^{2}:=\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}f(n+hq)\big|^{2}=\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h,h^{\prime}\in[H]}\Delta_{q(h,h^{\prime})}f(n)

(2.1)

and

\|f\|_{U^{1}[N;q,H]}^{2}:=\mathbb{E}_{n\in[N]}\big|\mathbb{E}_{h\in[H]}f(n+hq)\big|^{2}=\mathbb{E}_{n\in[N]}\mathbb{E}_{h,h^{\prime}\in[H]}\Delta_{q(h,h^{\prime})}f(n).

(2.2)
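The two expressions for each norm in Definition 2.3 agree because $|\mathbb{E}_{h}F(h)|^{2}=\mathbb{E}_{h,h^{\prime}}F(h)\overline{F(h^{\prime})}$. The Python sketch below (names ours, floating-point arithmetic, functions given as Python callables on the integers) computes the squared norm both ways, with the logarithmic or the uniform average over $n$.

```python
import cmath


def delta_op(f, h, hp):
    """The difference operator: x -> f(x + h) * conj(f(x + hp))."""
    return lambda x: f(x + h) * complex(f(x + hp)).conjugate()


def average(values, ns, log):
    """Logarithmic (log=True) or uniform (log=False) average over ns."""
    weights = [1.0 / n if log else 1.0 for n in ns]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def u1_squared(f, N, q, H, log=True):
    """Squared U^1 norm via |E_{h in [H]} f(n + hq)|^2."""
    ns = range(1, N + 1)
    inner = [abs(sum(f(n + h * q) for h in range(1, H + 1)) / H) ** 2 for n in ns]
    return average(inner, ns, log)


def u1_squared_via_delta(f, N, q, H, log=True):
    """Squared U^1 norm via E_{h,h'} Delta_{q(h,h')} f(n)."""
    ns = range(1, N + 1)
    inner = [
        sum(delta_op(f, q * h, q * hp)(n)
            for h in range(1, H + 1) for hp in range(1, H + 1)) / H**2
        for n in ns
    ]
    return average(inner, ns, log)
```

As an illustration, for $f(x)=e(x/3)$ the norm with $q=3$ equals $1$ for every $H$, while with $q=1$ and $H=3$ it vanishes, reflecting that $f$ is biased along progressions of common difference $3$ but not of common difference $1$.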

The logarithmic norm (2.1) will play the more prominent role in our analysis, with the uniform norm (2.2) being relegated to a more modest technical role in Lemma 2.6. We record that, roughly speaking, we have $\|f\|_{U^{1}_{\log}[N;q,H]}\lessapprox\|f\|_{U^{1}_{\log}[N;\tilde{q},\tilde{H}]}$ if $q\mid\tilde{q}$ and $\tilde{H}\tilde{q}<Hq$ (for a precise statement, see Lemma A.6). In particular for fixed $q$ the information that $\|f\|_{U^{1}_{\log}[N;q,H]}$ is large becomes weaker as $H$ becomes smaller. We are now ready for the first main result of the section, which could potentially have other applications.

Lemma 2.4.

Let $\delta$ be a sufficiently small positive parameter and $L,L^{\prime},D\geqslant 1$ . Let $S\subset\mathbf{Z}$ be $(L,L^{\prime},D)$ -diophantine with $S\subset[-4D,4D]$ , and let $T\in\mathbf{N}$ be a parameter. Suppose that $D,T\geqslant(L^{\prime}/\delta)^{8L}$ and that $\frac{\log TD}{\log N}\leqslant(\delta/L^{\prime})^{50L}$ . Let $H$ be any positive integer with $H\leqslant(\delta/L^{\prime})^{50L}TD$ . Let $f:\mathbf{N}\rightarrow\mathbf{C}$ be $1$ -bounded and suppose that we have

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{t,t^{\prime}\in[T]}\mathbb{E}_{s\in S}f(n+ts)\overline{f(n+t^{\prime}s)}\geqslant\delta.

(2.3)

Then there exists $q\in\mathbf{N}$ , $q\leqslant(L^{\prime}/\delta)^{8L}$ , such that $\|f\|_{U^{1}_{\log}[N;q,H]}\geqslant(\delta/L^{\prime})^{25L}$ .

Remark.

Note here that $q$ may depend on $f$ , but we are free to specify $H$ subject to the stated upper bound condition.

Proof.

Throughout the proof we assume that $\delta_{0}$ is sufficiently small without further comment. The proof is Fourier-analytic; closely related arguments have appeared as base cases for various ‘concatenation’ results (see e.g. [PP24, Lemma 5.3] or [Pel20, Lemma 5.4]). By (A.2) applied with $h=(t-t^{\prime})s$ we have

\mathbb{E}_{n\in[N]}^{\log}f(n)\mathbb{E}_{t,t^{\prime}\in[T]}\mathbb{E}_{s\in S}\overline{f(n+(t-t^{\prime})s)}\geqslant\delta/2,

which for brevity we write

\mathbb{E}_{n\in[N]}^{\log}f(n)\mathbb{E}_{u\in[T]-[T]}\mathbb{E}_{s\in S}\overline{f(n+us)}\geqslant\delta/2,

with the understanding that $[T]-[T]$ is considered with multiplicity. By Cauchy–Schwarz this gives that

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{u,u^{\prime}\in[T]-[T]}\mathbb{E}_{s,s^{\prime}\in S}f(n+us)\overline{f(n+u^{\prime}s^{\prime})}\geqslant\delta^{2}/4.

By a further application of (A.2), followed by the triangle inequality, we have that

\mathbb{E}_{n\in[N]}^{\log}\Big|\mathbb{E}_{h\in[TD]-[TD]}\mathbb{E}_{u,u^{\prime}\in[T]-[T]}\mathbb{E}_{s,s^{\prime}\in S}f(n+h+us)\overline{f(n+h+u^{\prime}s^{\prime})}\Big|\geqslant\delta^{2}/8.

Denote

\mathcal{N}_{0}:=\big\{n\in[N]:\big|\mathbb{E}_{h\in[TD]-[TD]}\mathbb{E}_{u,u^{\prime}\in[T]-[T]}\mathbb{E}_{s,s^{\prime}\in S}f(n+h+us)\overline{f(n+h+u^{\prime}s^{\prime})}\big|\geqslant\delta^{2}/16\big\}.

By a simple averaging argument we have

\mathbb{E}^{\log}_{n\in[N]}1_{\mathcal{N}_{0}}(n)\geqslant\delta^{2}/16.

(2.4)

For the time being, let $n\in\mathcal{N}_{0}$ be fixed. Defining $g_{n}:\mathbf{Z}\rightarrow\mathbf{C}$ by $g_{n}(m)=f(n+m)$ for $|m|\leqslant 16TD$ and $0$ otherwise, we have from the definition of $\mathcal{N}_{0}$ that

\big|\mathbb{E}_{h\in[TD]-[TD]}\mathbb{E}_{u,u^{\prime}\in[T]-[T]}\mathbb{E}_{s,s^{\prime}\in S}g_{n}(h+us)\overline{g_{n}(h+u^{\prime}s^{\prime})}\big|\geqslant\delta^{2}/16.

Note here that $|h+us|,|h+u^{\prime}s^{\prime}|\leqslant 16TD$ , using here that $S\subset[-4D,4D]$ . Taking the Fourier expansion $g_{n}(m)=\int_{\mathbf{R}/\mathbf{Z}}\widehat{g_{n}}(\theta)e(\theta m)d\theta$ and applying the triangle inequality, this gives

\int_{(\mathbf{R}/\mathbf{Z})^{2}}\big|\widehat{g}_{n}(\theta)\widehat{g}_{n}(\theta^{\prime})\big|K(\theta,\theta^{\prime})d\theta d\theta^{\prime}\geqslant\delta^{2}/16,

(2.5)

where

K(\theta,\theta^{\prime}):=\big|\mathbb{E}_{h\in[TD]-[TD]}e\big((\theta-\theta^{\prime})h\big)\psi(\theta)\psi(\theta^{\prime})\big|

with

\psi(\theta):=\mathbb{E}_{u\in[T]-[T]}\mathbb{E}_{s\in S}e(\theta us).

(2.6)

Now by bounding the $\psi(\cdot)$ terms trivially by $1$ and using that

|\mathbb{E}_{h\in[TD]-[TD]}e((\theta-\theta^{\prime})h)|=|\mathbb{E}_{h\in[TD]}e((\theta-\theta^{\prime})h)|^{2}\ll(TD)^{-2}\lVert\theta-\theta^{\prime}\rVert_{\mathbf{R}/\mathbf{Z}}^{-2},

we have $K(\theta,\theta^{\prime})\ll\min(1,(TD)^{-2}\lVert\theta-\theta^{\prime}\rVert_{\mathbf{R}/\mathbf{Z}}^{-2})$ . From this, Cauchy–Schwarz and Parseval it follows that

\int_{\mathbf{R}/\mathbf{Z}}\big|\widehat{g_{n}}(\theta)\widehat{g_{n}}(\theta+\alpha)\big|K(\theta,\theta+\alpha)d\theta\ll\Big(\int_{\mathbf{R}/\mathbf{Z}}|\widehat{g}_{n}(\theta)|^{2}\Big)(TD)^{-2}\|\alpha\|_{\mathbf{R}/\mathbf{Z}}^{-2}\ll(TD)^{-1}\|\alpha\|_{\mathbf{R}/\mathbf{Z}}^{-2}.
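The kernel estimate used above rests on two standard facts: the average over $[M]-[M]$ (with multiplicity) is the squared modulus of the average over $[M]$, and $|\mathbb{E}_{h\in[M]}e(\beta h)|\leqslant 1/(2M\|\beta\|_{\mathbf{R}/\mathbf{Z}})$. The following Python sketch, with names of our own choosing and floating-point arithmetic, allows both to be checked numerically for given $\beta$ and $M$ (here $M$ plays the role of $TD$).

```python
import cmath


def e(x):
    """The additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)


def dist_to_int(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def avg_interval(beta, M):
    """E_{h in [M]} e(beta h)."""
    return sum(e(beta * h) for h in range(1, M + 1)) / M


def avg_difference_set(beta, M):
    """E_{h in [M]-[M]} e(beta h), with [M]-[M] counted with multiplicity."""
    return sum(
        e(beta * (h - hp)) for h in range(1, M + 1) for hp in range(1, M + 1)
    ) / M**2


def geometric_bound(beta, M):
    """The bound 1 / (4 M^2 ||beta||^2) for |E_{h in [M]} e(beta h)|^2."""
    return 1.0 / (4 * M**2 * dist_to_int(beta) ** 2)
```

For any non-integer $\beta$, `abs(avg_difference_set(beta, M))` agrees with `abs(avg_interval(beta, M)) ** 2` up to rounding, and both are at most `geometric_bound(beta, M)`.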

Integrating over $\alpha\in\mathbf{R}/\mathbf{Z}$, we see that the contribution to (2.5) from $\|\alpha\|_{\mathbf{R}/\mathbf{Z}}\geqslant C\delta^{-2}/TD$ is negligible for $C$ sufficiently large, that is to say

\int_{\|\theta-\theta^{\prime}\|_{\mathbf{R}/\mathbf{Z}}\leqslant C\delta^{-2}/TD}\big|\widehat{g}_{n}(\theta)\widehat{g}_{n}(\theta^{\prime})\big|K(\theta,\theta^{\prime})d\theta d\theta^{\prime}\geqslant\delta^{2}/32.

Therefore, bounding the geometric series part of $K$ trivially by $1$ ,

\int_{\|\theta-\theta^{\prime}\|_{\mathbf{R}/\mathbf{Z}}\leqslant C\delta^{-2}/TD}\big|\widehat{g}_{n}(\theta)\widehat{g}_{n}(\theta^{\prime})\psi(\theta)\psi(\theta^{\prime})\big|d\theta d\theta^{\prime}\geqslant\delta^{2}/32.

In particular, for some $\alpha\in\mathbf{R}/\mathbf{Z}$ we have

\int_{\mathbf{R}/\mathbf{Z}}\big|\widehat{g}_{n}(\theta)\widehat{g}_{n}(\theta+\alpha)\psi(\theta)\psi(\theta+\alpha)\big|d\theta\gg\delta^{4}TD.

Using the AM–GM inequality $x^{2}+y^{2}\geqslant 2xy$ with $x=|\widehat{g_{n}}(\theta)\psi(\theta)|$ and $y=|\widehat{g_{n}}(\theta+\alpha)\psi(\theta+\alpha)|$ , it follows that

\int_{\mathbf{R}/\mathbf{Z}}|\widehat{g}_{n}(\theta)|^{2}|\psi(\theta)|^{2}d\theta\gg\delta^{4}TD.

(2.7)

By Parseval’s inequality we have $\int_{\mathbf{R}/\mathbf{Z}}|\widehat{g}_{n}(\theta)|^{2}\ll TD$ , and so for sufficiently small $c_{1}$ we have

\int_{|\psi(\theta)|\geqslant c_{1}\delta^{2}}|\widehat{g}_{n}(\theta)|^{2}|\psi(\theta)|^{2}d\theta\gg\delta^{4}TD.

(2.8)

To proceed further we need to analyse the $\theta$ for which $|\psi(\theta)|\geqslant c_{1}\delta^{2}$. Suppose in the following discussion that $\theta$ has this property. Recalling that the definition of $\psi$ is (2.6), it follows that $\mathbb{E}_{u\in[T]-[T]}\big|\mathbb{E}_{s\in S}e(\theta us)\big|\geqslant c_{1}\delta^{2}$. Writing $\mathcal{U}:=\{u\in[T]:\big|\mathbb{E}_{s\in S}e(\theta us)\big|\geqslant c_{1}\delta^{2}/2\}$, we see that $\mu_{[T]-[T]}(\mathcal{U}\cup-\mathcal{U})\gg\delta^{2}$, where $\mu_{[T]-[T]}$ denotes the natural weighted probability measure on $[T]-[T]$. Since $\mu_{[T]-[T]}(x)\leqslant 1/T$ pointwise, it follows that $|\mathcal{U}|\gg\delta^{2}T$.

We now apply the diophantine assumption on $S$ . We conclude that for each $u\in\mathcal{U}$ there is some nonzero $q_{u}\ll(L^{\prime}/\delta)^{2L}$ such that $\|q_{u}u\theta\|_{\mathbf{R}/\mathbf{Z}}\ll(L^{\prime}/\delta)^{2L}/D$ . By further refining the set of $u$ (to a set of size $\gg(\delta/L^{\prime})^{4L}T$ ) we may assume that $q_{u}$ does not depend on $u$ . Denote this common value by $q_{0}$ .

Now we apply Lemma 2.2, taking $\alpha=q_{0}\theta$, $\delta_{2}\gg(\delta/L^{\prime})^{4L}$ and $\delta_{1}=(L^{\prime}/\delta)^{2L}/D$. One can check that the conditions of Lemma 2.2 are consequences of the hypothesised lower bounds on $D$ and $T$, provided $\delta$ is small enough. The conclusion of the lemma is then that there is some $q\ll(L^{\prime}/\delta)^{4L}$ such that $\|\alpha q\|_{\mathbf{R}/\mathbf{Z}}\ll(L^{\prime}/\delta)^{6L}/TD$. Taking $q^{\prime}:=qq_{0}$, we see that $q^{\prime}\ll(L^{\prime}/\delta)^{6L}$ and $\|\theta q^{\prime}\|_{\mathbf{R}/\mathbf{Z}}\ll(L^{\prime}/\delta)^{8L}/TD$.

It follows from this analysis and (2.8) that $\int_{\theta\in\Theta}|\widehat{g}_{n}(\theta)|^{2}\gg\delta^{4}TD$, where $\Theta$ is the set of all $\theta$ for which $\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant(L^{\prime}/\delta)^{8L}/TD$ for some $q\in\mathbf{N}$ with $q\ll(L^{\prime}/\delta)^{6L}$. Since the measure of $\Theta$ is $\ll(L^{\prime}/\delta)^{14L}/TD$, there is some $\theta_{n}\in\Theta$ such that

|\widehat{g}_{n}(\theta_{n})|\gg(\delta/L^{\prime})^{18L}TD.

(2.9)

By refining $\mathcal{N}_{0}$ we may, using (2.4), find $\mathcal{N}_{1}\subset\mathcal{N}_{0}$ such that

\mathbb{E}_{n\in[N]}^{\log}1_{\mathcal{N}_{1}}(n)\gg(\delta/L^{\prime})^{8L}

(2.10)

and such that, for all $n\in\mathcal{N}_{1}$, the corresponding $\theta_{n}$ all have the same value of $q$; that is, $\|q\theta_{n}\|_{\mathbf{R}/\mathbf{Z}}\ll(L^{\prime}/\delta)^{8L}/TD$ for all $n\in\mathcal{N}_{1}$. Writing out the definition of the Fourier transform, we have from (2.9) that

\Big|\mathbb{E}_{|m|\leqslant 16TD}g_{n}(m)e(\theta_{n}m)\Big|\gg(\delta/L^{\prime})^{18L}.

Recall that $H\in\mathbf{N}$ is a given parameter, satisfying $H\leqslant(\delta/L^{\prime})^{50L}TD$. By the properties of $\theta_{n}$, we have

\Big|\mathbb{E}_{h\in[H]}\mathbb{E}_{|m|\leqslant 16TD}g_{n}(m)e(\theta_{n}(m-qh))\Big|\gg(\delta/L^{\prime})^{18L}.

Substituting $m^{\prime}:=m-qh$ gives

\Big|\mathbb{E}_{h\in[H]}\mathbb{E}_{-16TD-qh\leqslant m^{\prime}\leqslant 16TD-qh}g_{n}(m^{\prime}+qh)e(\theta_{n}m^{\prime})\Big|\gg(\delta/L^{\prime})^{18L},

which implies that

\Big|\mathbb{E}_{h\in[H]}\mathbb{E}_{|m^{\prime}|\leqslant 16TD}g_{n}(m^{\prime}+qh)e(\theta_{n}m^{\prime})\Big|\gg(\delta/L^{\prime})^{18L}

by the bound on $H$ and (A.1). Dropping the dashes on $m^{\prime}$ and swapping the order of the averages gives

\Big|\mathbb{E}_{|m|\leqslant 16TD}e(\theta_{n}m)\mathbb{E}_{h\in[H]}g_{n}(m+qh)\Big|\gg(\delta/L^{\prime})^{18L}.

By Cauchy–Schwarz, it follows that

\mathbb{E}_{|m|\leqslant 16TD}\mathbb{E}_{h,h^{\prime}\in[H]}g_{n}(m+qh)\overline{g_{n}(m+qh^{\prime})}\gg(\delta/L^{\prime})^{36L}.

Recall that we have this for all $n\in\mathcal{N}_{1}$. However, the quantity on the left is non-negative for all $n$. Taking the logarithmic average over $n$ (and recalling (2.10)) we obtain

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{|m|\leqslant 16TD}\mathbb{E}_{h,h^{\prime}\in[H]}g_{n}(m+qh)\overline{g_{n}(m+qh^{\prime})}\gg(\delta/L^{\prime})^{44L}.

Recalling that $g_{n}(m)=f(n+m)$ , and taking the $n$ average to the inside, this is

\mathbb{E}_{|m|\leqslant 16TD}\mathbb{E}_{h,h^{\prime}\in[H]}\mathbb{E}_{n\in[N]}^{\log}f(n+m+qh)\overline{f(n+m+qh^{\prime})}\gg(\delta/L^{\prime})^{44L}.

Applying (A.2) to the inner average for each $m$ (and using the assumed bound on $\frac{\log TD}{\log N}$) we may drop the $m$-average, obtaining

\mathbb{E}_{h,h^{\prime}\in[H]}\mathbb{E}_{n\in[N]}^{\log}f(n+qh)\overline{f(n+qh^{\prime})}\gg(\delta/L^{\prime})^{44L}.

This is equivalent to the stated result. ∎

Lemma 2.5.

There is an absolute constant $\delta_{0}$ such that the following holds. Fix $\delta\in(0,\delta_{0}]$ and $L,L^{\prime},D\geqslant 1$ . Let $S\subset[-4D,4D]$ be a set which is $(L,L^{\prime},D)$ -diophantine, and let $T\in\mathbf{N}$ be a parameter. Let $X$ be a further sufficiently large parameter. Suppose that $D,T\geqslant(L^{\prime}/\delta)^{8L}$ and that $TD\leqslant(\delta/L^{\prime})^{50L}X$ . Let $H$ be a positive integer with $H\leqslant(\delta/L^{\prime})^{50L}TD$ . Let $f:\mathbf{N}\rightarrow\mathbf{C}$ be $1$ -bounded and suppose that we have

\mathbb{E}_{n\in[X]}\mathbb{E}_{t,t^{\prime}\in[T]}\mathbb{E}_{s\in S}f(n+ts)\overline{f(n+t^{\prime}s)}\geqslant\delta.

(2.11)

Then there exists $q\in\mathbf{N}$ , $q\leqslant(L^{\prime}/\delta)^{8L}$ , such that $\|f\|_{U^{1}[X;q,H]}\geqslant(\delta/L^{\prime})^{25L}$ .

Proof.

The same proof works essentially verbatim, except that the three applications of (A.2) are replaced by appeals to (A.1), using each time the assumption $\frac{TD}{X}\leqslant(\delta/L^{\prime})^{50L}$ rather than the bound on $\frac{\log TD}{\log N}$ used in the logarithmic case. ∎

Rather than Lemma 2.5 itself, we will need the following iterated variant. Here we use the notation for difference operators $\Delta_{(h,h^{\prime})}$ described in Section 1.4.

Lemma 2.6.

There are absolute constants $\delta_{0}<1$ and $C=C_{2.6}$ such that the following holds. Fix $\delta\in(0,\delta_{0}]$ and $L,L^{\prime},D_{1},D_{2}\geqslant 1$. For $i=1,2$ suppose that $S_{i}\subset[-4D_{i},4D_{i}]$ is a set which is $(L,L^{\prime},D_{i})$-diophantine, and let $T_{i}$ be a parameter. Let $X$ be a sufficiently large parameter and suppose that $D_{i},T_{i}\geqslant(L^{\prime}/\delta)^{CL^{2}}$ and that $T_{i}D_{i}\leqslant(\delta/L^{\prime})^{CL^{2}}X$. Let $H_{1},H_{2}$ be positive integers with $H_{i}\leqslant(\delta/L^{\prime})^{CL^{2}}T_{i}D_{i}$. Let $\psi:\mathbf{N}\rightarrow\mathbf{C}$ be $1$-bounded and suppose that

\mathbb{E}_{n\in[X,2X),t_{1},t_{1}^{\prime}\in[T_{1}],t_{2},t^{\prime}_{2}\in[T_{2}],s_{1}\in S_{1},s_{2}\in S_{2}}\Delta_{s_{1}(t_{1},t^{\prime}_{1})}\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi(n)\geqslant\delta.

(2.12)

Then there exist $q_{1},q_{2}\in\mathbf{N}$ , $q_{i}\leqslant(L^{\prime}/\delta)^{CL^{2}}$ , such that

\mathbb{E}_{n\in[2X],h_{1},h^{\prime}_{1}\in[H_{1}],h_{2},h^{\prime}_{2}\in[H_{2}]}\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\Delta_{q_{2}(h_{2},h^{\prime}_{2})}\psi(n)\geqslant(\delta/L^{\prime})^{CL^{2}}.

Remark.

We have only stated a version with two difference operators (which will involve one iteration of Lemma˜2.5), since this is what we will need later. A similar argument gives a version with $k$ difference operators.

Proof.

By an averaging argument, there are at least $\delta|S_{2}|T_{2}^{2}/2$ triples $(s_{2},t_{2},t^{\prime}_{2})$ such that

\mathbb{E}_{n\in[X,2X),t_{1},t^{\prime}_{1}\in[T_{1}],s_{1}\in S_{1}}\Delta_{s_{1}(t_{1},t^{\prime}_{1})}\big(\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi\big)(n)\geqslant\delta/2.

Since, for any $n,s_{1}$ , the average over $t_{1},t^{\prime}_{1}$ is non-negative, we have

\mathbb{E}_{n\in[2X],t_{1},t^{\prime}_{1}\in[T_{1}],s_{1}\in S_{1}}\Delta_{s_{1}(t_{1},t^{\prime}_{1})}\big(\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi\big)(n)\geqslant\delta/4.

For each such triple, this is exactly the hypothesis ˜2.11 of Lemma˜2.5 with $f=\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi$ (and $\delta$ replaced by $\delta/4$ and $X$ by $2X$ ). The conclusion of Lemma˜2.5 is then that there exists $q=q(s_{2},t_{2},t^{\prime}_{2})\leqslant(L^{\prime}/\delta)^{O(L)}$ such that $\|\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi\|_{U^{1}[2X;q,H_{1}]}\geqslant(\delta/L^{\prime})^{O(L)}$ . Squaring and writing out, this gives

\mathbb{E}_{n\in[2X]}\mathbb{E}_{h_{1},h^{\prime}_{1}\in[H_{1}]}\Delta_{q(h_{1},h^{\prime}_{1})}\big(\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi\big)(n)\geqslant(\delta/L^{\prime})^{O(L)}.

(2.13)

By pigeonhole, we may pass to a set of $(\delta/L^{\prime})^{O(L)}|S_{2}|T_{2}^{2}$ triples $(s_{2},t_{2},t^{\prime}_{2})$ such that $q_{1}=q(s_{2},t_{2},t^{\prime}_{2})$ is independent of $s_{2},t_{2},t^{\prime}_{2}$ . Since the expression on the left in (2.13) is always non-negative, we may average over all $(s_{2},t_{2},t^{\prime}_{2})\in S_{2}\times[T_{2}]\times[T_{2}]$ , obtaining

\mathbb{E}_{h_{1},h^{\prime}_{1}\in[H_{1}]}\mathbb{E}_{n\in[2X],t_{2},t^{\prime}_{2}\in[T_{2}],s_{2}\in S_{2}}\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\big(\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi\big)(n)\geqslant(\delta/L^{\prime})^{O(L)}.

For at least $(\delta/L^{\prime})^{O(L)}H_{1}^{2}$ pairs $(h_{1},h^{\prime}_{1})$ , we have

\mathbb{E}_{n\in[2X],t_{2},t^{\prime}_{2}\in[T_{2}],s_{2}\in S_{2}}\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\big(\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi\big)(n)\geqslant(\delta/L^{\prime})^{O(L)}.

For each such pair, this is again the hypothesis (2.11) of Lemma 2.5, now with $f=\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi$ , $\delta$ replaced by $(\delta/L^{\prime})^{O(L)}$ , and again with $X$ replaced by $2X$ . Another application of Lemma 2.5 gives that there exists $q=q(h_{1},h^{\prime}_{1})\leqslant(L^{\prime}/\delta)^{O(L^{2})}$ such that $\|\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi\|_{U^{1}[2X;q,H_{2}]}\geqslant(\delta/L^{\prime})^{O(L^{2})}$ , provided that $C$ is sufficiently large that the relevant conditions on $D_{2},T_{2}$ and $T_{2}D_{2}/X$ are satisfied. Squaring and writing out, this gives

\mathbb{E}_{n\in[2X]}\mathbb{E}_{h_{2},h^{\prime}_{2}\in[H_{2}]}\Delta_{q(h_{2},h^{\prime}_{2})}\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi(n)\geqslant(\delta/L^{\prime})^{O(L^{2})}.

(2.14)

Passing to a further subset of $(\delta/L^{\prime})^{O(L^{2})}H_{1}^{2}$ pairs $(h_{1},h^{\prime}_{1})$ , we may assume that $q_{2}=q(h_{1},h^{\prime}_{1})$ does not depend on $(h_{1},h^{\prime}_{1})$ . Since the expression on the left in (2.14) is non-negative for all $q$ , we obtain the desired result by averaging over $h_{1},h^{\prime}_{1}$ . ∎
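
We record, as a sanity check rather than as part of the argument, why the left-hand sides of (2.13) and (2.14) are non-negative, which is what permits the two averaging steps in the proof. This assumes the convention $\Delta_{(h,h^{\prime})}f(n)=f(n+h)\overline{f(n+h^{\prime})}$ for the difference operators of Section 1.4 (the convention suggested by (2.11); it is not restated in this section). Under that assumption, for any $q,H\in\mathbf{N}$ , any $n$ and any function $f$ we have

\mathbb{E}_{h,h^{\prime}\in[H]}\Delta_{q(h,h^{\prime})}f(n)=\mathbb{E}_{h,h^{\prime}\in[H]}f(n+qh)\overline{f(n+qh^{\prime})}=\Big|\mathbb{E}_{h\in[H]}f(n+qh)\Big|^{2}\geqslant 0.

Taking $f=\Delta_{s_{2}(t_{2},t^{\prime}_{2})}\psi$ gives the non-negativity used for (2.13), and taking $f=\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\psi$ gives that used for (2.14).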

3. Diophantine properties of almost primes

The main result of this section, Lemma˜3.2, is a vital technical ingredient in our later arguments. Roughly, it states that sets such as $\{p_{1}\cdots p_{k}:p_{i}\in\mathscr{P}_{i}\}$ and $\{p^{2}_{1}\cdots p^{2}_{k}:p_{i}\in\mathscr{P}_{i}\}$ are diophantine (see Definition˜2.1) with suitable parameters, where $\mathscr{P}_{i}$ are dyadically localised sets of primes. We first note a general lemma for ‘bilinear’ exponential sums.

Lemma 3.1.

Let $j\in\mathbf{N}$ . Let $\delta\in(0,\frac{1}{2})$ , and let $S_{1}\subset[N_{1}]$ and $S_{2}\subset[N_{2}]$ be sets with $|S_{i}|=\sigma_{i}N_{i}$ for $i=1,2$ . Suppose that, for some $\theta\in\mathbf{R}/\mathbf{Z}$ , we have $|\mathbb{E}_{s_{1}\in S_{1},s_{2}\in S_{2}}e(\theta s^{j}_{1}s^{j}_{2})|\geqslant\delta$ . Then either $N_{i}\leqslant(\sigma_{1}\sigma_{2}\delta)^{-O_{j}(1)}$ for some $i\in\{1,2\}$ , or else there is some $q\in\mathbf{N}$ , $q\leqslant(\delta\sigma_{1}\sigma_{2})^{-O_{j}(1)}$ , such that $\|q\theta\|_{\mathbf{R}/\mathbf{Z}}\leqslant(\delta\sigma_{1}\sigma_{2})^{-O_{j}(1)}(N_{1}N_{2})^{-j}$ .

Remark.

We will only need the cases $j=1,2$ , in which case of course the exponents may be taken to be absolute constants.

Proof.

Write the condition as

\big|\mathbb{E}_{n_{1}\in[N_{1}],n_{2}\in[N_{2}]}1_{S_{1}}(n_{1})1_{S_{2}}(n_{2})e(\theta n^{j}_{1}n^{j}_{2})\big|\geqslant\delta\sigma_{1}\sigma_{2}.

By two applications of the Cauchy–Schwarz inequality, we obtain

\mathbb{E}_{n_{1},n^{\prime}_{1}\in[N_{1}],n_{2},n^{\prime}_{2}\in[N_{2}]}e\big(\theta(n^{j}_{1}-n^{\prime j}_{1})(n^{j}_{2}-n^{\prime j}_{2})\big)\geqslant(\delta\sigma_{1}\sigma_{2})^{4}.

To handle this, we use the ‘log-free’ multidimensional Weyl inequality [GT14, Proposition 2.2]; we remark that the published version of that paper omits the necessary constraint that $\min(N_{i})$ be sufficiently large. ∎
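
For the reader's convenience, the two applications of Cauchy–Schwarz in the preceding proof may be sketched as follows. Here $F$ denotes the average inside the absolute value in the first display of the proof; this is our notation for this sketch only. Applying Cauchy–Schwarz in $n_{1}$ and using $|1_{S_{1}}|\leqslant 1$ gives

|F|^{2}\leqslant\mathbb{E}_{n_{1}\in[N_{1}]}\Big|\mathbb{E}_{n_{2}\in[N_{2}]}1_{S_{2}}(n_{2})e(\theta n_{1}^{j}n_{2}^{j})\Big|^{2}=\mathbb{E}_{n_{2},n^{\prime}_{2}\in[N_{2}]}1_{S_{2}}(n_{2})1_{S_{2}}(n^{\prime}_{2})\mathbb{E}_{n_{1}\in[N_{1}]}e\big(\theta n_{1}^{j}(n_{2}^{j}-n_{2}^{\prime j})\big).

A second application, this time in the pair $(n_{2},n^{\prime}_{2})$ and using $|1_{S_{2}}(n_{2})1_{S_{2}}(n^{\prime}_{2})|\leqslant 1$ , gives

|F|^{4}\leqslant\mathbb{E}_{n_{2},n^{\prime}_{2}\in[N_{2}]}\Big|\mathbb{E}_{n_{1}\in[N_{1}]}e\big(\theta n_{1}^{j}(n_{2}^{j}-n_{2}^{\prime j})\big)\Big|^{2}=\mathbb{E}_{n_{1},n^{\prime}_{1}\in[N_{1}],n_{2},n^{\prime}_{2}\in[N_{2}]}e\big(\theta(n^{j}_{1}-n^{\prime j}_{1})(n^{j}_{2}-n^{\prime j}_{2})\big).

Since $|F|\geqslant\delta\sigma_{1}\sigma_{2}$ , the right-hand side is at least $(\delta\sigma_{1}\sigma_{2})^{4}$ , as claimed.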

We now proceed to the main technical lemma of the section. Although we will only need this lemma for $j=1,2$ , it is no harder to prove it for general $j$ .

Lemma 3.2.

Let $j\in\mathbf{N}$ . Then there is a constant $L_{j}\geqslant 1$ such that the following holds. Let $k\geqslant 2$ be a natural number and let $\delta\in(0,\frac{1}{2})$ . Let $M_{1},\ldots,M_{k}$ be a sequence of integers such that the intervals $[M_{i},(1+\frac{1}{4k})M_{i})$ are disjoint. Set $Q:=k^{k}\prod_{i=1}^{k}\log M_{i}$ , and suppose the condition $\min_{i}M_{i}>Q^{L_{j}}$ is satisfied. For each $i$ , suppose we are given a parameter $\eta_{i}$ satisfying $\frac{1}{8k}\leqslant\eta_{i}\leqslant\frac{1}{4k}$ and define $\mathscr{P}_{i}$ to be the set of primes satisfying $M_{i}\leqslant p<M_{i}(1+\eta_{i})$ , and set $S:=\{p_{1}^{j}\cdots p_{k}^{j}:p_{i}\in\mathscr{P}_{i}\}$ . Then $S$ is $(L_{j},k,(M_{1}\cdots M_{k})^{j})$ -diophantine.

Proof.

We first note that the case $k=1$ is also true and is essentially a standard result about exponential sums over powers of primes. We in fact need (a slight generalisation of) this result in our proof. Since it is hard to find an appropriate reference with the log-free bound that we require, we give this in Lemma˜B.2.

Suppose now that $k\geqslant 2$ . Without loss of generality, assume $M_{1}>M_{2}>\cdots>M_{k}\geqslant 3$ . Let $\theta\in\mathbf{R}/\mathbf{Z}$ and suppose that

\big|\mathbb{E}_{p_{1}\in\mathscr{P}_{1},\dots,p_{k}\in\mathscr{P}_{k}}e(\theta p_{1}^{j}\cdots p_{k}^{j})\big|\geqslant\delta.

(3.1)

We must show that there is some $q\in\mathbf{N}$ such that

q\leqslant(k/\delta)^{L_{j}}\quad\mbox{and}\quad\|q\theta\|_{\mathbf{R}/\mathbf{Z}}\leqslant(k/\delta)^{L_{j}}(M_{1}\cdots M_{k})^{-j}.

(3.2)

We try applying Lemma˜3.1 with $N_{1}:=2\prod_{i\leqslant k:i\operatorname{even}}M_{i}$ and $N_{2}:=2\prod_{i\leqslant k:i\operatorname{odd}}M_{i}$ . Define sets $S_{1}\subset[N_{1}]$ , $S_{2}\subset[N_{2}]$ by $S_{1}:=\prod_{i\leqslant k:i\operatorname{even}}\mathscr{P}_{i}$ and $S_{2}:=\prod_{i\leqslant k:i\operatorname{odd}}\mathscr{P}_{i}$ (the stated containments are easily verified). Set $\sigma_{i}:=|S_{i}|/N_{i}$ . Since $M_{i}>Q>k^{k}$ , it follows from the prime number theorem with classical error term (see e.g. [IK-book, Section 5.6]) that we have $|\mathscr{P}_{i}|\geqslant cM_{i}/k\log M_{i}$ for some absolute $c>0$ . Therefore

\sigma_{1}\sigma_{2}=\frac{1}{4}\prod_{i=1}^{k}\frac{|\mathscr{P}_{i}|}{M_{i}}\geqslant\Big(\frac{c}{2k}\Big)^{k}\prod_{i=1}^{k}\frac{1}{\log M_{i}}\geqslant\Big(\frac{c}{2}\Big)^{k}Q^{-1}\gg Q^{-2},

using in this last step that $Q>k^{k}$ . Applying Lemma˜3.1, and noting that $N_{1}\leqslant N_{2}$ , it follows that either

N_{1}\leqslant(Q/\delta)^{O_{j}(1)}

(3.3)

or else there is some $q\in\mathbf{N}$ with

q\leqslant(Q/\delta)^{O_{j}(1)}\quad\mbox{and}\quad\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant(Q/\delta)^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}.

(3.4)

We leave aside ˜3.3 for now, and assume that ˜3.4 holds. If $\delta\leqslant 1/Q$ then ˜3.2 follows immediately (with $L_{j}$ equal to twice the $O_{j}(1)$ exponent). Therefore we may suppose henceforth that $\delta\geqslant 1/Q$ . In particular, ˜3.4 gives (after doubling the implied constant in the exponents) that

q\leqslant Q^{O_{j}(1)}\quad\mbox{and}\quad\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant Q^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}.

Thus $\theta=\frac{a}{q}+\theta^{\prime}$ for some $a\in\mathbf{Z}$ , with

|\theta^{\prime}|\leqslant Q^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}.

(3.5)

We now return to the original sum ˜3.1. By pigeonhole, there is a choice of $t=p_{2}^{j}\cdots p_{k}^{j}$ such that $\big|\mathbb{E}_{p_{1}\in\mathscr{P}_{1}}e(\theta tp_{1}^{j})\big|\geqslant\delta$ . By Lemma˜B.2 (that is, essentially the case $k=1$ of the present lemma) it follows that there is some $q_{0}\leqslant(k/\delta)^{O_{j}(1)}$ such that $\|\theta tq_{0}\|_{\mathbf{R}/\mathbf{Z}}\leqslant(k/\delta)^{O_{j}(1)}M_{1}^{-j}$ . Since $\theta^{\prime}=\theta-a/q$ , this means that $\theta^{\prime}tq_{0}$ is within $(k/\delta)^{O_{j}(1)}M_{1}^{-j}$ of $-atq_{0}/q$ , an integer multiple of $1/q$ . However, we may also note using ˜3.5 and the bound $p_{2}\cdots p_{k}\leqslant 3M_{2}\cdots M_{k}$ that

|\theta^{\prime}tq_{0}|\leqslant Q^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}\cdot(3M_{2}\cdots M_{k})^{j}\cdot(k/\delta)^{O_{j}(1)}\leqslant Q^{O_{j}(1)}M_{1}^{-j}<\frac{1}{2q}.

Here, in the penultimate step we used that $\delta\geqslant 1/Q$ (and so $k/\delta\leqslant kQ\leqslant Q^{2}$ ), and in the last step we invoked the assumption $M_{1}>Q^{L_{j}}$ and the upper bound $q\leqslant Q^{O_{j}(1)}$ (and assumed $L_{j}$ is large enough). Since $(k/\delta)^{O_{j}(1)}M_{1}^{-j}\leqslant Q^{O_{j}(1)}M_{1}^{-j}<\frac{1}{2q}$ (for the aforementioned reasons) the only possible integer multiple of $1/q$ that $\theta^{\prime}tq_{0}$ can be near is $0$ , and therefore $|\theta^{\prime}tq_{0}|\leqslant(k/\delta)^{O_{j}(1)}M_{1}^{-j}$ and $q\mid atq_{0}$ . Dividing through by $tq_{0}$ (and using $t\geqslant(M_{2}\cdots M_{k})^{j}$ ), we obtain $|\theta^{\prime}|\leqslant(k/\delta)^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}$ . Note also that $(q,t)=1$ since all prime factors of $t$ are at least $M_{k}>Q^{L_{j}}\geqslant q$ , and therefore $q\mid aq_{0}$ . Finally it follows that $\|\theta q_{0}\|_{\mathbf{R}/\mathbf{Z}}=\|\theta^{\prime}q_{0}\|_{\mathbf{R}/\mathbf{Z}}\leqslant|\theta^{\prime}|q_{0}\leqslant(k/\delta)^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}$ , which is the desired conclusion (3.2).

It remains to analyse the ‘small parameter’ case ˜3.3, that is to say $N_{1}\leqslant(Q/\delta)^{O_{j}(1)}$ . The assumption $\min_{i}M_{i}>Q^{L_{j}}$ certainly implies that $N_{1}>Q^{L_{j}}$ . Therefore (assuming $L_{j}$ large enough) we have $N_{1}\leqslant\delta^{-O_{j}(1)}$ . It follows that

M_{2}\cdots M_{k}=\prod_{\begin{subarray}{c}i\leqslant k\\ i\operatorname{even}\end{subarray}}M_{i}\cdot\prod_{\begin{subarray}{c}i\leqslant k-1\\ i\operatorname{even}\end{subarray}}M_{i+1}\leqslant\prod_{\begin{subarray}{c}i\leqslant k\\ i\operatorname{even}\end{subarray}}M_{i}\cdot\prod_{\begin{subarray}{c}i\leqslant k-1\\ i\operatorname{even}\end{subarray}}M_{i}\leqslant N_{1}^{2}\leqslant\delta^{-O_{j}(1)}.

(3.6)

As before, ˜3.1 implies that there is some $t=p_{2}^{j}\cdots p_{k}^{j}$ such that $|\mathbb{E}_{p_{1}\in\mathscr{P}_{1}}e(\theta tp_{1}^{j})|\geqslant\delta$ . By Lemma˜B.2 it follows that there is some $q_{0}\leqslant(k/\delta)^{O_{j}(1)}$ such that $\|\theta tq_{0}\|_{\mathbf{R}/\mathbf{Z}}\leqslant(k/\delta)^{O_{j}(1)}M_{1}^{-j}$ . Taking $q:=tq_{0}$ , we then have (using ˜3.6)

q\leqslant(M_{2}\cdots M_{k})^{j}(k/\delta)^{O_{j}(1)}\leqslant(k/\delta)^{O_{j}(1)},

and (using ˜3.6 again) $\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant(k/\delta)^{O_{j}(1)}M_{1}^{-j}\leqslant(k/\delta)^{O_{j}(1)}(M_{1}\cdots M_{k})^{-j}$ , which is once again the desired conclusion ˜3.2. ∎

4. Fourier decomposition of a majorant for the primes

In this section we give another technical ingredient for our later arguments. Here is the main result.

Lemma 4.1.

Let $X$ be a large parameter. Then there is a function $\tilde{\Lambda}:[X,2X)\rightarrow\mathbf{R}_{\geqslant 0}$ with

\tilde{\Lambda}(p)\gg\log X

(4.1)

for all primes $p\in[X,2X)$ and

\mathbb{E}_{x\in[X,2X)}\tilde{\Lambda}(x)\ll 1

(4.2)

such that the following is true. Let $c\in(0,1)$ be a constant. For any parameter $Q\in\mathbf{N}$ , $Q\leqslant\log X$ , there is a $(Q!)$ -periodic function $\Lambda_{\operatorname{per}}$ satisfying

\mathbb{E}_{x\in[X,2X)}|\Lambda_{\operatorname{per}}(x)|\ll(\log Q)^{O(1)}\quad\mbox{and}\quad\|\Lambda_{\operatorname{per}}\|_{\infty}\ll Q^{2}

(4.3)

together with a decomposition $\tilde{\Lambda}-\Lambda_{\operatorname{per}}=\sum_{i}g_{i}+h$ (where the sum over $i$ is finite) with the following properties. First, the function $h$ is small in $\ell^{1}$ in the sense that

\mathbb{E}_{x\in[X,2X)}|h(x)|\ll Q^{-1}.

(4.4)

Second, the functions $g_{i}$ are reasonably bounded in sup norm in the sense that

\sum_{i}\|g_{i}\|_{\infty}\ll(\log X)^{O_{c}(1)}

(4.5)

for all $i$ . Finally, denoting $\|\widehat{f}\|_{\infty}:=\sup_{\theta\in\mathbf{R}/\mathbf{Z}}\big|\sum_{x\in[X,2X]}f(x)e(\theta x)\big|$ we have the estimate

\sum_{i}\|\widehat{g}_{i}\|_{\infty}^{c}\|g_{i}\|_{\infty}^{1-c}\ll X^{c}Q^{-c/4}.

(4.6)

Here, all implied constants may depend on $c$ but are effectively computable.

Proof.

We take $\tilde{\Lambda}$ to be a Selberg-type majorant for the primes. Rather than describe the construction explicitly here, we can just refer to [green-tao-selberg, Proposition 3.1], which provides the relevant properties. Taking $F(n)=n$ in that proposition (thus the singular series $\mathfrak{S}_{F}$ as defined in [green-tao-selberg] is $\asymp 1$ ) and $R:=X^{1/10}$ , we can take $\tilde{\Lambda}=\beta$ , where $\beta$ is the function constructed in [green-tao-selberg, Proposition 3.1]. The desired majorant property ˜4.1 is a consequence of [green-tao-selberg, Equation (3.1)]. The bound ˜4.2 is an absolutely standard fact about the Selberg sieve. It could be deduced within the framework of [green-tao-selberg] by summing [green-tao-selberg, Equation (3.3)] over $n\in[X,2X)$ , and discarding the negligible contribution from all frequencies except $a/q=0$ . On the Fourier side we have (see [green-tao-selberg, Equation (7.7)])

\tilde{\Lambda}(n)=\Big(\sum_{q\leqslant R}\frac{\mu(q)}{\phi(q)}\Big)^{-1}\Big(\sum_{q\leqslant R}\frac{\mu(q)}{\phi(q)}\sum_{(a,q)=1}e\big(\frac{an}{q}\big)\Big)^{2}.

It is shown in [green-tao-selberg, Proposition 7.1], following Ramaré and Ruzsa [ramare-ruzsa], that

\tilde{\Lambda}(n)=\sum_{q\leqslant R^{2}}c_{q}\sum_{(a,q)=1}e(an/q)

with $c_{q}$ supported on squarefrees with $q\leqslant R^{2}$ and $|c_{q}|\ll\tau(q)^{2}/q$ . Set $i_{0}:=\lfloor\log_{2}Q\rfloor$ , $i_{1}:=\lfloor A\log_{2}\log X\rfloor$ for some $A=A(c)$ to be specified below, and finally set

\Lambda_{\operatorname{per}}(n):=\sum_{q\leqslant 2^{i_{0}}}c_{q}\sum_{(a,q)=1}e(an/q),\qquad f_{i}(n):=\sum_{2^{i}<q\leqslant 2^{i+1}}c_{q}\sum_{(a,q)=1}e(an/q)

for $i_{0}\leqslant i<i_{1}$ and

f_{i_{1}}(n):=\sum_{2^{i_{1}}\leqslant q\leqslant R^{2}}c_{q}\sum_{(a,q)=1}e(an/q).

It is then clear that $\Lambda_{\operatorname{per}}$ is $(Q!)$ -periodic and that $\tilde{\Lambda}-\Lambda_{\operatorname{per}}=\sum_{i}f_{i}$ . We now define $g_{i},g^{\prime}_{i}$ by ‘thresholding’ the $f_{i}$ , specifically by setting

g_{i}(n):=f_{i}(n)1_{|f_{i}(n)|\leqslant 2^{ic/2}},\qquad g_{i}^{\prime}(n):=f_{i}(n)1_{|f_{i}(n)|>2^{ic/2}}

for $i_{0}\leqslant i\leqslant i_{1}$ . Set $h:=\sum_{i}g^{\prime}_{i}$ . The $\ell^{\infty}$ bound ˜4.5 is then immediate.

Next we establish ˜4.4. For this, we will use the moment estimates

\mathbb{E}_{x\in[X,2X)}|f_{i}(x)|^{m}\ll_{m}i^{C_{m}},\quad i\in[i_{0},i_{1}),\quad\mbox{and}\quad\mathbb{E}_{x\in[X,2X)}|f_{i_{1}}(x)|^{m}\ll_{m}(\log X)^{C_{m}}

(4.7)

for $m\in\mathbf{N}$ and for some constants $C_{m}$ , which we will establish below. Indeed, taking $m=\lceil 4/c\rceil$ in ˜4.7 yields

\mathbb{E}_{x\in[X,2X)}|g^{\prime}_{i}(x)|\leqslant 2^{ci(1-m)/2}\mathbb{E}_{x\in[X,2X)}|f_{i}(x)|^{m}\ll 2^{ci(1-m)/2}i^{C_{m}}\ll 2^{-i},

(4.8)

uniformly for $i\in[i_{0},i_{1})$ , and similarly

\mathbb{E}_{x\in[X,2X)}|g^{\prime}_{i_{1}}(x)|\leqslant(\log X)^{cA(1-m)/2}\mathbb{E}_{x\in[X,2X)}|f_{i_{1}}(x)|^{m}\ll(\log X)^{cA(1-m)/2+C_{m}}\ll(\log X)^{-A}

(4.9)

provided $A$ is chosen large enough (depending only on $c$ ). The desired estimate ˜4.4 is now immediate from the triangle inequality, the dominant contribution being from ˜4.8 with values $i\approx i_{0}$ . (Here we use the assumption that $Q\leqslant\log X$ to guarantee that the contribution from ˜4.9 is insignificant.)
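
To make the last two steps explicit, here is a short sketch of the computation, using only the definitions of $g^{\prime}_{i}$ , $i_{0}$ , $i_{1}$ and $m=\lceil 4/c\rceil$ given in this proof. On the support of $g^{\prime}_{i}$ we have $|f_{i}|>2^{ic/2}$ , and so pointwise

|g^{\prime}_{i}(x)|=|f_{i}(x)|1_{|f_{i}(x)|>2^{ic/2}}\leqslant|f_{i}(x)|^{m}2^{-ic(m-1)/2},\qquad\mbox{where}\quad\frac{c(m-1)}{2}\geqslant\frac{4-c}{2}>\frac{3}{2}.

This is the source of the decay $2^{-i}$ in (4.8), the factor $i^{C_{m}}$ being absorbed by the spare $2^{-i/2}$ . Summing, and recalling that $2^{i_{0}}>Q/2$ and $Q\leqslant\log X$ , the triangle inequality gives

\mathbb{E}_{x\in[X,2X)}|h(x)|\leqslant\sum_{i_{0}\leqslant i<i_{1}}\mathbb{E}_{x\in[X,2X)}|g^{\prime}_{i}(x)|+\mathbb{E}_{x\in[X,2X)}|g^{\prime}_{i_{1}}(x)|\ll\sum_{i\geqslant i_{0}}2^{-i}+(\log X)^{-A}\ll 2^{-i_{0}}+(\log X)^{-1}\ll Q^{-1},

where we assumed (as we may) that $A\geqslant 1$ .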

Now we establish ˜4.6. It is enough to show that

\|\widehat{g_{i}}\|_{\infty}\ll X2^{-3i/4}

(4.10)

for $i\in[i_{0},i_{1}]$ , since the desired estimate then follows using the $\ell^{\infty}$ bounds on the $g_{i}$ implicit in the definitions of these functions. To show ˜4.10, it suffices to show the non-thresholded estimates

\|\widehat{f_{i}}\|_{\infty}\ll X2^{-3i/4}

(4.11)

for $i\in[i_{0},i_{1}]$ , from which ˜4.10 follows using ˜4.8 and 4.9. From the definition of $f_{i}$ , summing the geometric series and the bound $|c_{q}|\ll\tau(q)^{2}/q$ , we have

\big|\sum_{x\in[X,2X]}f_{i}(x)e(\theta x)\big|\ll\sum_{2^{i}<q\leqslant R^{2}}\frac{\tau(q)^{2}}{q}\sum_{(a,q)=1}\min\big(X,\|\theta-a/q\|_{\mathbf{R}/\mathbf{Z}}^{-1}\big).

Since the fractions $a/q$ are $R^{-4}$ -separated, the contribution from all except at most one $a/q$ will be (crudely) $\ll R^{2}\cdot R^{4}\ll X2^{-i}$ . For the fraction $a/q$ closest to $\theta$ , we have the trivial bound $\ll X\tau(q)^{2}/q$ , which is $<Xq^{-3/4}\ll X2^{-3i/4}$ by the divisor bound, and ˜4.11 (and therefore ˜4.6) follows.
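
For completeness, here is a sketch of the short computation deducing (4.6) from (4.10), which was left implicit above. It uses only the bound $\|g_{i}\|_{\infty}\leqslant 2^{ic/2}$ coming from the definition of $g_{i}$ , together with $2^{i_{0}}>Q/2$ :

\sum_{i=i_{0}}^{i_{1}}\|\widehat{g}_{i}\|_{\infty}^{c}\|g_{i}\|_{\infty}^{1-c}\ll\sum_{i\geqslant i_{0}}\big(X2^{-3i/4}\big)^{c}\big(2^{ic/2}\big)^{1-c}=X^{c}\sum_{i\geqslant i_{0}}2^{-ic/4-ic^{2}/2}\ll_{c}X^{c}2^{-i_{0}c/4}\ll X^{c}Q^{-c/4}.

The exponent arithmetic is $-\frac{3c}{4}+\frac{c(1-c)}{2}=-\frac{c}{4}-\frac{c^{2}}{2}$ , and the geometric series is dominated by its first term with an implied constant depending only on $c$ .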

We now return to establish the moment estimate ˜4.7. An ingredient in the proof will be the (standard) estimate

\sum_{P^{+}(d)\leqslant Q}\frac{\tau(d)^{C}}{d}\ll_{C}(\log Q)^{2^{C}}.

(4.12)

To prove this, observe that the LHS is $\prod_{p\leqslant Q}(1+\frac{2^{C}}{p}+\frac{3^{C}}{p^{2}}+\dots)\ll_{C}\prod_{p\leqslant Q}(1+\frac{1}{p})^{2^{C}}$ .

Turning to ˜4.7 itself, it suffices to prove the general estimate

\mathbb{E}_{x\in[X,2X)}|f(x)|^{m}\ll(\log Q)^{O_{m,B}(1)},

(4.13)

for $m\in\mathbf{N}$ , where

f(n)=\sum_{P^{+}(q)\leqslant Q}c_{q}\sum_{(a,q)=1}e\big(\frac{an}{q}\big),

the $c_{q}$ are supported on squarefrees and $|c_{q}|\leqslant\tau(q)^{B}/q$ . To prove such an estimate, we first write $f$ in physical space using Kluyver’s identity $\sum_{(a,q)=1}e(an/q)=\sum_{d\mid(n,q)}d\mu(q/d)$ for Ramanujan sums. This gives

f(n)=\sum_{\begin{subarray}{c}P^{+}(d)\leqslant Q\\ d\mid n\end{subarray}}d\sum_{\begin{subarray}{c}d\mid q\\ P^{+}(q)\leqslant Q\end{subarray}}\mu\big(\frac{q}{d}\big)c_{q}=\sum_{\begin{subarray}{c}P^{+}(d)\leqslant Q\\ d\mid n\end{subarray}}\lambda_{d},\quad\mbox{where}\quad\lambda_{d}:=d\sum_{\begin{subarray}{c}d\mid q\\ P^{+}(q)\leqslant Q\end{subarray}}\mu\big(\frac{q}{d}\big)c_{q}.

(4.14)
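
Kluyver's identity, on which (4.14) rests, is easy to test numerically. The following Python sketch is an illustration only and plays no role in the proof; the function names `mobius`, `ramanujan_sum_fourier` and `ramanujan_sum_divisors` are ours. It compares the exponential-sum side with the divisor-sum side for small $q$ and $n$ .

```python
import cmath
from math import gcd


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def ramanujan_sum_fourier(q, n):
    """Left side: sum of e(an/q) over 1 <= a <= q with (a, q) = 1."""
    return sum(cmath.exp(2j * cmath.pi * a * n / q)
               for a in range(1, q + 1) if gcd(a, q) == 1)


def ramanujan_sum_divisors(q, n):
    """Right side: sum of d * mu(q/d) over d dividing gcd(n, q)."""
    g = gcd(n, q)
    return sum(d * mobius(q // d) for d in range(1, g + 1) if g % d == 0)


if __name__ == "__main__":
    # Largest discrepancy between the two sides over a small range;
    # it should be at the level of floating-point rounding error.
    worst = max(abs(ramanujan_sum_fourier(q, n) - ramanujan_sum_divisors(q, n))
                for q in range(1, 40) for n in range(1, 60))
    print("largest discrepancy:", worst)
```

The divisor-sum side is an exact integer, while the exponential-sum side is computed in floating point, so the comparison is only up to rounding error.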

Now we have

|\lambda_{d}|\leqslant d\sum_{\begin{subarray}{c}d\mid q\\ P^{+}(q)\leqslant Q\end{subarray}}|c_{q}|\leqslant\sum_{P^{+}(k)\leqslant Q}\frac{\tau(kd)^{B}}{k}\leqslant\tau(d)^{B}\sum_{P^{+}(k)\leqslant Q}\frac{\tau(k)^{B}}{k}\ll\tau(d)^{B}(\log Q)^{2^{B}}

(4.15)

by ˜4.12. Now observe that

\begin{aligned}\mathbb{E}_{n\in[X,2X)}\Big(\sum_{\begin{subarray}{c}P^{+}(d)\leqslant Q\\ d\mid n\end{subarray}}\tau(d)^{B}\Big)^{m}&=\mathbb{E}_{n\in[X,2X)}\sum_{P^{+}(d_{1}),\dots,P^{+}(d_{m})\leqslant Q}\big(\tau(d_{1})\cdots\tau(d_{m})\big)^{B}1_{[d_{1},\dots,d_{m}]\mid n}\\ &\leqslant\mathbb{E}_{n\in[X,2X)}\sum_{P^{+}(d_{1}),\dots,P^{+}(d_{m})\leqslant Q}\tau([d_{1},\dots,d_{m}])^{mB}1_{[d_{1},\dots,d_{m}]\mid n}\\ &\leqslant\mathbb{E}_{n\in[X,2X)}\sum_{P^{+}(d)\leqslant Q}\tau(d)^{mB+m}1_{d\mid n}\\ &\ll\sum_{P^{+}(d)\leqslant Q}\frac{\tau(d)^{mB+m}}{d}\ll(\log Q)^{O_{B,m}(1)}.\end{aligned}

(4.16)

In the middle step here the key point was that the number of representations of $d$ as $[d_{1},\dots,d_{m}]$ is at most $\tau(d)^{m}$ , and in the penultimate step that $\mathbb{E}_{n\in[X,2X)}1_{d\mid n}\ll 1/d$ for all $d$ . Combining ˜4.14, 4.15, and 4.16 gives ˜4.13, and so ˜4.7 follows.
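
The counting fact used in the middle step (that $d$ has at most $\tau(d)^{m}$ representations as a least common multiple $[d_{1},\dots,d_{m}]$ ) follows because each $d_{i}$ must divide $d$ . A brute-force Python sketch of this count is given below as a sanity check only; the helper names `divisors` and `lcm_representations` are ours, and `math.lcm` requires Python 3.9 or later.

```python
from itertools import product
from math import lcm


def divisors(d):
    """All positive divisors of d, in increasing order."""
    return [e for e in range(1, d + 1) if d % e == 0]


def lcm_representations(d, m):
    """Number of m-tuples (d_1, ..., d_m) of positive integers with lcm d.

    Every d_i in such a tuple divides d, so it suffices to search over
    m-tuples of divisors of d; there are tau(d)^m of those in total.
    """
    return sum(1 for tup in product(divisors(d), repeat=m) if lcm(*tup) == d)


if __name__ == "__main__":
    for d in (6, 12, 30):
        tau = len(divisors(d))
        for m in (1, 2, 3):
            print(d, m, lcm_representations(d, m), "<=", tau ** m)
```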

The final task is to establish ˜4.3. The first statement is immediate from ˜4.13 and Cauchy–Schwarz. For the second statement (which is rather crude) one can proceed directly from the definition of $\Lambda_{\operatorname{per}}(n)$ using $|c_{q}|\ll 1$ . ∎

We remark that from the first bound in ˜4.3 and the $Q!$ -periodicity of $\Lambda_{\operatorname{per}}$ (or by direct proof) we have

\mathbb{E}_{x\in I}\Lambda_{\operatorname{per}}(x)\ll(\log Q)^{O(1)}

(4.17)

for any interval $I$ of length $Q!$ .

Remarks.

It is possible to establish an analogue of Lemma 4.1 with $\tilde{\Lambda}$ equal to the von Mangoldt function itself, taking $\Lambda_{\operatorname{per}}$ and the $g_{i}$ to be suitable Cramér approximants to the von Mangoldt function and $h=0$ . The details necessary to accomplish this may be found in [Gre05], though the context there was different. There are some advantages to this, for instance $\Lambda_{\operatorname{per}}$ is non-negative and subject to good $\ell^{1}$ - and $\ell^{\infty}$ -bounds. The drawback of proceeding this way is that the bounds are ineffective due to an application of the Siegel–Walfisz theorem. This can be corrected via the introduction of appropriate ‘Siegel-modified Cramér approximants’ as in [TT25] but this is quite technical. By passing to a suitable majorant as in Lemma 4.1 we can avoid all Siegel zero issues entirely.

5. An inverse theorem

In this section we explore the consequences of an assumption

\big|\mathbb{E}_{n\in[N],p\in\mathscr{P},p^{\prime}\in\mathscr{P}^{\prime}}^{\log}f_{1}(n+\lambda pp^{\prime})f_{2}(\lambda npp^{\prime})\big|\geqslant\delta

(5.1)

where $f_{1},f_{2}:\mathbf{N}\rightarrow\mathbf{C}$ are $1$ -bounded, $\mathscr{P}$ consists of primes, $\mathscr{P}^{\prime}$ of almost primes and $\lambda\in\mathbf{N}$ is some parameter. The reason for being interested in such an assumption was sketched in Section˜1.2 and will be further apparent in Section˜7.

The aim is to show that ˜5.1 implies that $\|f_{1}\|_{U^{1}_{\log}[N;q,H]}$ is large for suitable parameters $q,H$ . (Recall from Definition˜2.3 the definition of these norms.) This result is directly inspired by [Ric25, Theorem 3.5], a connection we shall elaborate upon later. Here is the technical statement of our main result.

Proposition 5.1.

There is an absolute constant $C\in\mathbf{N}$ such that the following holds. Let $\delta\in\mathbf{R}$ be a sufficiently small parameter and let $k\in\mathbf{N}$ . Suppose that $\max(k,1/\delta)\leqslant\log\log N$ and $k\leqslant\delta^{-10}$ . Let $P_{1},P_{2},P^{\prime}_{1},P^{\prime}_{2}$ be parameters with $\exp\exp((\log\log N)^{1/10})\leqslant P^{\prime}_{1}<P^{\prime}_{2}<P_{1}<P_{2}<\exp((\log N)^{1/4})$ and $P_{1}\geqslant(P^{\prime}_{2})^{10}$ . Suppose that $\lambda\in\mathbf{N}$ satisfies $\lambda\leqslant\exp((\log N)^{1/4})$ , and that all prime factors of $\lambda$ are less than $P^{\prime}_{1}$ . Let $\mathscr{P}$ denote the set of primes in $[P_{1},P_{2})$ and suppose that $\mathscr{P}^{\prime}\subset[P^{\prime}_{1},P^{\prime}_{2})$ is a set of ‘almost primes’ of the following form: $\mathscr{P}^{\prime}=\{p_{1}\cdots p_{k}:p_{\ell}\in I_{\ell}\}$ , where $I_{1},\dots,I_{k}\subset[P^{\prime}_{1},P^{\prime}_{2})$ are disjoint intervals, all with $\log\log(\max(I_{\ell}))-\log\log(\min(I_{\ell}))\geqslant k\delta^{-4.1}$ , and the $p_{\ell}$ range over all primes in $I_{\ell}$ for $\ell\in[k]$ . Set $V:=\lfloor\delta^{-C}\rfloor!$ . Suppose we have ˜5.1. Then we have $\|f_{1}\|_{U^{1}_{\log}[N;\lambda V,H]}\gg\delta^{O(1)}$ for any $H\in\mathbf{N}$ with $H\leqslant P_{1}^{1/8}$ .

Remarks.

For the rest of the section we write $\varepsilon_{0}:=\frac{1}{10}$ , thus the lower bound on $\log\log(\max(I_{\ell}))-\log\log(\min(I_{\ell}))$ is $k\delta^{-4-\varepsilon_{0}}$ . Any sufficiently small absolute constant $\varepsilon_{0}$ would do here. More generally, several of the assumptions on parameters are made so as to be comfortable for the required application and we do not claim these conditions are tight. For instance, the lower bound $k\delta^{-4-\varepsilon_{0}}$ could be $k\delta^{-4}(\log(1/\delta))^{C}$ for an appropriate $C$ .

5.1. Setting up the proof of the inverse theorem

The proof of Proposition˜5.1 is somewhat lengthy. We prepare the ground by defining some key parameters and observing simple preliminary bounds. In the proof $C_{1}<C_{2}$ are absolute constants, with $C_{1}$ assumed to be sufficiently large and $C_{2}$ assumed sufficiently large in terms of $C_{1}$ . We will write $Q:=\lfloor\delta^{-C_{2}}\rfloor$ .

Next we point out some consequences of the (somewhat elaborate) conditions on parameters in the statement of Proposition˜5.1. First, the $P^{\prime}_{i},P_{i}$ are enormously larger than powers of $Q!$ (and a fortiori powers of $\delta^{-O(1)}$ ). Indeed $P^{\prime}_{1}\geqslant\exp\exp((\log\log N)^{1/10})$ whilst $Q!\leqslant\exp(\delta^{-O(C_{2})})\leqslant\exp((\log\log N)^{O(C_{2})})$ , using here the assumption that $1/\delta\leqslant\log\log N$ .

Second, we have

P^{\prime}_{1}>(k\log P^{\prime}_{2})^{kL}

(5.2)

for any fixed constant $L$ (assuming $N$ sufficiently large in terms of $L$ ). This is easily confirmed using the assumptions $P^{\prime}_{1}>\exp\exp((\log\log N)^{1/10})>\exp((\log\log N)^{3})$ , $P^{\prime}_{2}\leqslant N$ and $k\leqslant\log\log N$ , and will be used (twice) to verify the key condition in Lemma˜3.2.
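
A sketch of this verification, with natural logarithms and $N$ large in terms of $L$ : since $k\leqslant\log\log N$ and $\log P^{\prime}_{2}\leqslant\log N$ , we have

(k\log P^{\prime}_{2})^{kL}\leqslant\exp\big(L\log\log N\cdot(\log\log\log N+\log\log N)\big)\leqslant\exp\big(2L(\log\log N)^{2}\big)<\exp\big((\log\log N)^{3}\big)<P^{\prime}_{1},

where the third inequality holds once $\log\log N>2L$ , and the last is the assumed lower bound on $P^{\prime}_{1}$ .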

Third and finally, we note that all the $P_{i}$ parameters are significantly smaller than $N$ , and one has for example $\frac{\log P_{2}}{\log N}\ll\delta^{10}$ , which will be used several times in the analysis to assert that error terms coming from ˜A.2 are negligible.

Next we record the fact that, under the stated conditions, the elements of $\mathscr{P}^{\prime}$ are almost pairwise coprime. If $\mathcal{N}$ is any finite set of positive integers, we define $\gamma(\mathcal{N}):=\mathbb{E}_{n,n^{\prime}\in\mathcal{N}}^{\log}(n,n^{\prime})-1$ , where $(n,n^{\prime})$ is the gcd of $n,n^{\prime}$ . This is a measure of the pairwise coprimality of elements of $\mathcal{N}$ ; note that $\gamma(\mathcal{N})\geqslant 0$ always, and that if $\gamma(\mathcal{N})$ is small then we expect the elements of $\mathcal{N}$ to be mostly coprime. Recall that $\varepsilon_{0}:=\frac{1}{10}$ (though this is irrelevant to the following lemma).

Lemma 5.2.

Under the conditions of Proposition˜5.1, we have $\gamma(\mathscr{P}^{\prime})\leqslant\delta^{4+\varepsilon_{0}/2}$ .

Proof.

If $\mathscr{P}_{*}$ is a set of primes and $p,p^{\prime}\in\mathscr{P}_{*}$ then $(p,p^{\prime})=1$ unless $p=p^{\prime}$ , and so if we denote by $\mathscr{P}_{\ell}$ the set of primes in $I_{\ell}$ we have

\gamma(\mathscr{P}_{\ell})=\Big(\sum_{p\in\mathscr{P}_{\ell}}\frac{1}{p}\Big)^{-2}\sum_{p\in\mathscr{P}_{\ell}}\frac{p-1}{p^{2}}<\Big(\sum_{p\in\mathscr{P}_{\ell}}\frac{1}{p}\Big)^{-1}.

(5.3)

Now since $\log\log(\max(I_{\ell}))-\log\log(\min(I_{\ell}))\geqslant k\delta^{-4-\varepsilon_{0}}$ , it follows from Mertens’ theorem (see e.g. [Kou19, Theorem 5.4]) and ˜5.3 that we have $\max_{\ell}\gamma(\mathscr{P}_{\ell})\leqslant 2\delta^{4+\varepsilon_{0}}/k$ . It follows that

\begin{aligned}\gamma(\mathscr{P}^{\prime})&=\mathbb{E}^{\log}_{p_{1},p^{\prime}_{1}\in\mathscr{P}_{1},\dots,p_{k},p^{\prime}_{k}\in\mathscr{P}_{k}}(p_{1}\cdots p_{k},p^{\prime}_{1}\cdots p^{\prime}_{k})-1=\prod_{\ell=1}^{k}\mathbb{E}^{\log}_{p_{\ell},p^{\prime}_{\ell}\in\mathscr{P}_{\ell}}(p_{\ell},p^{\prime}_{\ell})-1\\ &=\prod_{\ell=1}^{k}(1+\gamma(\mathscr{P}_{\ell}))-1\leqslant\Big(1+\frac{2\delta^{4+\varepsilon_{0}}}{k}\Big)^{k}-1\leqslant e^{2\delta^{4+\varepsilon_{0}}}-1\leqslant\delta^{4+\varepsilon_{0}/2}.\end{aligned}\qed
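
The identity (5.3) and the product formula used in the last display can be checked in exact arithmetic on small examples. The Python sketch below is an illustration only. It assumes that $\mathbb{E}^{\log}$ denotes the average with weights proportional to $1/n$ (the convention consistent with (5.3)), and the function names `gamma_log` and `gamma_primes_closed_form` are ours.

```python
from fractions import Fraction
from math import gcd


def gamma_log(values):
    """gamma(N) = E^{log}_{n, n' in N} gcd(n, n') - 1, with weights 1/n.

    Assumption: the logarithmic average of F over N is
    (sum_n F(n)/n) / (sum_n 1/n), applied here in both variables.
    """
    total_weight = sum(Fraction(1, n) for n in values)
    weighted = sum(Fraction(gcd(n, m), n * m) for n in values for m in values)
    return weighted / total_weight ** 2 - 1


def gamma_primes_closed_form(primes):
    """Middle expression of (5.3): (sum 1/p)^(-2) * sum (p - 1)/p^2."""
    total = sum(Fraction(1, p) for p in primes)
    return sum(Fraction(p - 1, p * p) for p in primes) / total ** 2


if __name__ == "__main__":
    p1 = [101, 103, 107, 109, 113]
    p2 = [211, 223, 227, 229]
    products = [p * q for p in p1 for q in p2]
    print(gamma_log(p1) == gamma_primes_closed_form(p1))
    print(1 + gamma_log(products) == (1 + gamma_log(p1)) * (1 + gamma_log(p2)))
```

Because `Fraction` arithmetic is exact, both comparisons test equality rather than approximate agreement; the second one is the case $k=2$ of the product formula for two disjoint sets of primes.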

As we said, the proof of Proposition˜5.1 is lengthy. Moreover, the logic is somewhat complicated, since it is difficult to state self-contained intermediate lemmas. For reference we summarise the proof structure now.

• We proceed directly from the assumption (5.1) via a series of steps to show that either (5.9) or (5.10) below holds.

• We then aim to show that (5.9) leads to a contradiction. This is first done subject to an unproven claim (5.15).

• Claim (5.15) is proven by contradiction. This task is quickly reduced to showing that statements (5.16) and (5.17) imply (5.18), which is then a somewhat lengthy undertaking.

• At this point we have confirmed that (5.9) cannot hold. Therefore (by the first bullet point) (5.10) holds.

• We then proceed directly from (5.10) to the desired conclusion via a quite lengthy (but linear) sequence of manipulations.

5.2. Proof of the inverse theorem

We turn now to the proof of Proposition˜5.1.

Proof.

Throughout the proof we will freely use the fact that $N$ is sufficiently large and that $\delta$ is sufficiently small. The starting assumption is ˜5.1. We start by removing the function $f_{2}$ using essentially the same manipulation as in [Ric25, Theorem 5.2]. First observe that, for each $p,p^{\prime}$ , an application of Lemma˜A.4 yields

\mathbb{E}_{n\in[N]}^{\log}f_{1}(n+\lambda pp^{\prime})f_{2}(\lambda npp^{\prime})=\mathbb{E}_{n\in[N]}^{\log}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda pp^{\prime}\big)f_{2}(\lambda np)+O\big(\frac{\log P_{2}}{\log N}\big).

Averaging over $p^{\prime}$ (and using the upper bound $\frac{\log P_{2}}{\log N}\ll\delta^{3}$ ) gives

\mathbb{E}^{\log}_{n\in[N],p^{\prime}\in\mathscr{P}^{\prime}}f_{1}(n+\lambda pp^{\prime})f_{2}(\lambda npp^{\prime})=\mathbb{E}^{\log}_{n\in[N],p^{\prime}\in\mathscr{P}^{\prime}}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda pp^{\prime}\big)f_{2}(\lambda np)+O(\delta^{3}).

Write $g(p)$ for the expression on the left, and $\tilde{g}(p)$ for the first expression on the right; thus $\tilde{g}(p)=g(p)+\varepsilon(p)$ with $|\varepsilon(p)|\ll\delta^{3}$ . Now the assumption is that $|\mathbb{E}^{\log}_{p\in\mathscr{P}}g(p)|\geqslant\delta$ . By Cauchy–Schwarz (since $g$ is $1$ -bounded) we have $\mathbb{E}^{\log}_{p\in\mathscr{P}}|g(p)|^{2}\geqslant\delta^{2}$ . Therefore $\mathbb{E}^{\log}_{p\in\mathscr{P}}|\tilde{g}(p)|^{2}\geqslant\delta^{2}-2\mathbb{E}^{\log}_{p\in\mathscr{P}}|\varepsilon(p)||g(p)|-\mathbb{E}^{\log}_{p\in\mathscr{P}}|\varepsilon(p)|^{2}\geqslant\delta^{2}/2$ , using the $1$ -boundedness of $g$ to estimate the second term. That is,

\mathbb{E}^{\log}_{p\in\mathscr{P}}\Big|\mathbb{E}_{n\in[N],p^{\prime}\in\mathscr{P}^{\prime}}^{\log}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda pp^{\prime}\big)f_{2}(\lambda np)\Big|^{2}\geqslant\delta^{2}/2.

Using Cauchy–Schwarz on the inner sum (and the $1$ -boundedness of $f_{2}$ ) gives

\mathbb{E}_{p\in\mathscr{P}}^{\log}\mathbb{E}_{n\in[N]}^{\log}\Big|\mathbb{E}_{p^{\prime}\in\mathscr{P}^{\prime}}^{\log}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda pp^{\prime}\big)\Big|^{2}\geqslant\delta^{2}/2.
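In more detail, this Cauchy–Schwarz step may be sketched as follows, where $A_{p}(n)$ is our shorthand (not notation used elsewhere) for the inner average over $p^{\prime}$ . The point is that $f_{2}(\lambda np)$ does not depend on $p^{\prime}$ and so can be pulled out of that average.

```latex
\begin{aligned}
A_{p}(n)&:=\mathbb{E}^{\log}_{p^{\prime}\in\mathscr{P}^{\prime}}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\tfrac{n}{p^{\prime}}+\lambda pp^{\prime}\big),\\
\Big|\mathbb{E}^{\log}_{n\in[N]}A_{p}(n)f_{2}(\lambda np)\Big|^{2}
&\leqslant\Big(\mathbb{E}^{\log}_{n\in[N]}|A_{p}(n)|^{2}\Big)\Big(\mathbb{E}^{\log}_{n\in[N]}|f_{2}(\lambda np)|^{2}\Big)
\leqslant\mathbb{E}^{\log}_{n\in[N]}|A_{p}(n)|^{2}.
\end{aligned}
```

Averaging this over $p\in\mathscr{P}$ with logarithmic weights gives the stated inequality.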

We now pass to a non-logarithmic average in the $p$ variable, on a suitable dyadic interval. To do this, first partition $[P_{1},P_{2}]$ into intervals $I$ with $\frac{3}{2}\leqslant\max(I)/\min(I)\leqslant 2$ . By averaging, there is some such $I$ for which

\mathbb{E}_{p\in\mathscr{P}\cap I}^{\log}\mathbb{E}_{n\in[N]}^{\log}\Big|\mathbb{E}_{p^{\prime}\in\mathscr{P}^{\prime}}^{\log}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda pp^{\prime}\big)\Big|^{2}\geqslant\delta^{2}/2.

Let $X$ be such that $I\subset[X,2X)$ . We introduce the majorant $\tilde{\Lambda}$ from Lemma˜4.1. Since the logarithmic weight $\frac{1}{p}$ varies by a factor at most $2$ on $I$ , it follows that

\mathbb{E}_{x\in[X,2X]}\tilde{\Lambda}(x)\mathbb{E}_{n\in[N]}^{\log}\Big|\mathbb{E}_{p^{\prime}\in\mathscr{P}^{\prime}}^{\log}p^{\prime}\mathbf{1}_{p^{\prime}\mid n}f_{1}\big(\frac{n}{p^{\prime}}+\lambda xp^{\prime}\big)\Big|^{2}\gg\delta^{2}.

Expanding out the square gives

\mathbb{E}_{x\in[X,2X)}\tilde{\Lambda}(x)\mathbb{E}_{n\in[N],p_{1}^{\prime}\in\mathscr{P}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}}^{\log}p_{1}^{\prime}p^{\prime}_{2}\mathbf{1}_{[p^{\prime}_{1},p^{\prime}_{2}]\mid n}f_{1}\big(\frac{n}{p^{\prime}_{1}}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(\frac{n}{p^{\prime}_{2}}+\lambda xp^{\prime}_{2}\big)}\gg\delta^{2}.

(5.4)

The next technical reduction is to replace the cutoff $\mathbf{1}_{[p^{\prime}_{1},p^{\prime}_{2}]\mid n}$ with $\mathbf{1}_{p^{\prime}_{1}p^{\prime}_{2}\mid n}$ , which we do using the fact that the elements of $\mathscr{P}^{\prime}$ are mostly coprime due to Lemma˜5.2. Let us justify this carefully. Since $f_{1},f_{2}$ are $1$ -bounded and $\mathbb{E}_{x\in[X,2X]}\widetilde{\Lambda}(x)\ll 1$ , the error in making this switch in the LHS of ˜5.4 is bounded up to a constant factor by

\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}p_{1}^{\prime}p_{2}^{\prime}\big|\mathbf{1}_{[p_{1}^{\prime},p_{2}^{\prime}]\mid n}-\mathbf{1}_{p_{1}^{\prime}p_{2}^{\prime}\mid n}\big|.

(5.5)

We have the pointwise bound $\big|\mathbf{1}_{[p_{1}^{\prime},p_{2}^{\prime}]\mid n}-\mathbf{1}_{p_{1}^{\prime}p_{2}^{\prime}\mid n}\big|\leqslant 2\mathbf{1}_{(p_{1}^{\prime},p_{2}^{\prime})\neq 1}\mathbf{1}_{[p^{\prime}_{1},p^{\prime}_{2}]\mid n}$ and therefore

\mathbb{E}^{\log}_{n\in[N]}\big|\mathbf{1}_{[p_{1}^{\prime},p_{2}^{\prime}]\mid n}-\mathbf{1}_{p_{1}^{\prime}p_{2}^{\prime}\mid n}\big|\leqslant 2\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})\neq 1}\mathbb{E}_{n\in[N]}^{\log}\mathbf{1}_{[p^{\prime}_{1},p^{\prime}_{2}]\mid n}\leqslant\frac{4\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})\neq 1}}{[p^{\prime}_{1},p^{\prime}_{2}]},

using in the last step that $p^{\prime}_{1},p^{\prime}_{2}$ are much smaller than $N$ . It follows that ˜5.5 is bounded above by $4\mathbb{E}_{p_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}}^{\log}(p^{\prime}_{1},p^{\prime}_{2})\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})\neq 1}$ . Using the pointwise bound $(p^{\prime}_{1},p^{\prime}_{2})\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})\neq 1}\leqslant 2((p^{\prime}_{1},p^{\prime}_{2})-1)$ , this in turn is bounded by $8\mathbb{E}_{p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}((p^{\prime}_{1},p^{\prime}_{2})-1)=8\gamma(\mathscr{P}^{\prime})$ . By Lemma˜5.2, we see that ˜5.5 is bounded by $O(\delta^{4})$ . Therefore, as claimed, we may replace ˜5.4 by

\Big|\mathbb{E}_{x\in[X,2X)}\tilde{\Lambda}(x)\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}p_{1}^{\prime}p^{\prime}_{2}\mathbf{1}_{p^{\prime}_{1}p^{\prime}_{2}\mid n}f_{1}\big(\frac{n}{p^{\prime}_{1}}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(\frac{n}{p^{\prime}_{2}}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.6)
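The arithmetic behind the estimate for ˜5.5 is elementary. Writing $d:=(p^{\prime}_{1},p^{\prime}_{2})$ (our shorthand), the two facts used are that $p^{\prime}_{1}p^{\prime}_{2}=d[p^{\prime}_{1},p^{\prime}_{2}]$ , and that $d\geqslant 2$ implies $d\leqslant 2(d-1)$ . A sketch of the resulting chain:

```latex
\begin{aligned}
\mathbb{E}^{\log}_{n\in[N],p^{\prime}_{1},p^{\prime}_{2}\in\mathscr{P}^{\prime}}p^{\prime}_{1}p^{\prime}_{2}\big|\mathbf{1}_{[p^{\prime}_{1},p^{\prime}_{2}]\mid n}-\mathbf{1}_{p^{\prime}_{1}p^{\prime}_{2}\mid n}\big|
&\leqslant\mathbb{E}^{\log}_{p^{\prime}_{1},p^{\prime}_{2}\in\mathscr{P}^{\prime}}\frac{4p^{\prime}_{1}p^{\prime}_{2}\mathbf{1}_{d\neq 1}}{[p^{\prime}_{1},p^{\prime}_{2}]}
=4\,\mathbb{E}^{\log}_{p^{\prime}_{1},p^{\prime}_{2}\in\mathscr{P}^{\prime}}d\,\mathbf{1}_{d\neq 1}\\
&\leqslant 8\,\mathbb{E}^{\log}_{p^{\prime}_{1},p^{\prime}_{2}\in\mathscr{P}^{\prime}}(d-1)=8\gamma(\mathscr{P}^{\prime}).
\end{aligned}
```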

The reason for having replaced ˜5.4 with ˜5.6 is that we may now invoke Lemma˜A.4 (with $q=p^{\prime}_{1}p^{\prime}_{2}$ ) to conclude that

\Big|\mathbb{E}_{x\in[X,2X)}\tilde{\Lambda}(x)\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.7)
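To see how ˜5.7 arises, here is a sketch under the assumption that Lemma˜A.4 is being used in the same form as at the start of the proof, namely that $q\mathbf{1}_{q\mid n}G(n)$ and $G(qn)$ have the same logarithmic average over $n\in[N]$ up to an admissible error. The function $G$ below is our own notation, defined on multiples of $q=p^{\prime}_{1}p^{\prime}_{2}$ .

```latex
\begin{aligned}
G(m)&:=f_{1}\big(\tfrac{m}{p^{\prime}_{1}}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(\tfrac{m}{p^{\prime}_{2}}+\lambda xp^{\prime}_{2}\big)},\qquad q:=p^{\prime}_{1}p^{\prime}_{2},\\
G(qn)&=f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}.
\end{aligned}
```

The summand in ˜5.6 is $q\mathbf{1}_{q\mid n}G(n)$ , and the summand in ˜5.7 is $G(qn)$ .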

We now apply Lemma˜4.1 with parameter $Q=\lfloor\delta^{-C_{2}}\rfloor$ and constant $c:=1/4C_{1}$ . Observe that the required inequality

Q\leqslant\log X

(5.8)

is true and follows from the choice of parameters, using here that $X\geqslant P_{1}$ .

Let $\Lambda_{\operatorname{per}}$ be the $Q!$ -periodic function as in that lemma. Our aim is to replace $\tilde{\Lambda}$ in ˜5.7 by $\Lambda_{\operatorname{per}}$ .

From ˜5.7 and the triangle inequality, one of the following two statements holds:

\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2},

(5.9)

\Big|\mathbb{E}_{x\in[X,2X)}\Lambda_{\operatorname{per}}(x)\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.10)

We analyse these two possibilities in turn. In the analysis we will use several times that

\mathbb{E}_{x\in[X,2X)}|(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)|\ll(\log Q)^{O(1)},

(5.11)

which follows from ˜4.3 and the triangle inequality (since $\tilde{\Lambda}$ is non-negative).

Analysis of ˜5.9. We begin by dyadically localising (the two copies of) the set $\mathscr{P}^{\prime}$ . Recall that $\mathscr{P}^{\prime}=\{p_{1}\cdots p_{k}:p_{i}\in I_{i}\}$ . Since $\max(I_{i})/\min(I_{i})\geqslant 10$ , we can decompose each $I_{i}$ as a disjoint union of intervals $I_{i,j}$ , each of the form $[Y,(1+\eta_{i,j})Y]$ for some $\eta_{i,j}$ satisfying $\frac{1}{8k}\leqslant\eta_{i,j}\leqslant\frac{1}{4k}$ . We then have a corresponding decomposition $\mathscr{P}^{\prime}=\bigcup_{j_{1},\dots,j_{k}}\mathscr{P}^{\prime}_{j_{1},\dots,j_{k}}$ , where $\mathscr{P}^{\prime}_{j_{1},\dots,j_{k}}:=\{p_{1}\cdots p_{k}:p_{i}\in I_{i,j_{i}}\}$ . Note that, since $(1+\frac{1}{4k})^{k}<2$ , each $\mathscr{P}^{\prime}_{j_{1},\dots,j_{k}}$ is contained in a dyadic interval. By averaging, there are $\vec{j}=(j_{1},\dots,j_{k})$ and $\vec{j}^{\prime}=(j^{\prime}_{1},\dots,j^{\prime}_{k})$ such that

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}^{\prime}_{\vec{j}},p^{\prime}_{2}\in\mathscr{P}_{\vec{j}^{\prime}}^{\prime}}^{\log}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in[N]}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.12)
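The dyadic containment used above rests on a one-line estimate. In the sketch below, $Y_{i}$ denotes the left endpoint of $I_{i,j_{i}}$ (our notation), and we use $1+u\leqslant e^{u}$ .

```latex
\begin{aligned}
\Big(1+\frac{1}{4k}\Big)^{k}&\leqslant\exp\Big(k\cdot\frac{1}{4k}\Big)=e^{1/4}<2,\\
Y_{1}\cdots Y_{k}\leqslant p_{1}\cdots p_{k}&\leqslant\prod_{i=1}^{k}(1+\eta_{i,j_{i}})Y_{i}\leqslant\Big(1+\frac{1}{4k}\Big)^{k}Y_{1}\cdots Y_{k}<2Y_{1}\cdots Y_{k}.
\end{aligned}
```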

For notational brevity, write $\mathscr{P}^{\prime}_{1}:=\mathscr{P}^{\prime}_{\vec{j}}$ and $\mathscr{P}^{\prime}_{2}:=\mathscr{P}^{\prime}_{\vec{j}^{\prime}}$ . As $\mathscr{P}_{1}^{\prime},\mathscr{P}^{\prime}_{2}$ are each contained in dyadic intervals, we can remove the logarithmic averaging to obtain

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in[N]}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.13)

The next several manipulations leading to ˜5.14 are straightforward and are aimed at replacing the logarithmic average over $n$ by an ordinary average on an appropriate subinterval. We first discard the contribution from small values of $n$ . Set $N^{\prime}:=e^{(\log N)^{3/4}}$ (say). Writing ˜5.13 as

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\sum_{n\in[N]}\frac{1}{n}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}H_{N},

(where $H_{N}:=\sum_{n\leqslant N}\frac{1}{n}$ is the harmonic sum), using ˜5.11 we see that the contribution to the LHS from $n\leqslant N^{\prime}$ is bounded by $H_{N^{\prime}}(\log Q)^{O(1)}<\delta^{10}H_{N}$ , using here that $1/\delta\leqslant\log\log N$ .

Since $\frac{H_{N}}{H_{N}-H_{N^{\prime}}}\approx 1$ , it follows that we may replace ˜5.13 by

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in[N^{\prime},N]}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.
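The numerics behind discarding $n\leqslant N^{\prime}$ can be checked as follows. This is a sketch using only the standard bounds $\log M\leqslant\sum_{n\leqslant M}\frac{1}{n}\leqslant 1+\log M$ , the choice $N^{\prime}=e^{(\log N)^{3/4}}$ , and the relations $Q\leqslant\delta^{-C_{2}}$ and $1/\delta\leqslant\log\log N$ .

```latex
\begin{aligned}
\frac{H_{N^{\prime}}}{H_{N}}&\leqslant\frac{1+(\log N)^{3/4}}{\log N}\leqslant 2(\log N)^{-1/4},\\
\log Q&\leqslant C_{2}\log(1/\delta)\leqslant C_{2}\log\log\log N,\qquad\delta^{-10}\leqslant(\log\log N)^{10},\\
\frac{H_{N^{\prime}}(\log Q)^{O(1)}}{\delta^{10}H_{N}}&\leqslant 2(\log N)^{-1/4}(\log\log N)^{10}(C_{2}\log\log\log N)^{O(1)}<1
\quad\text{for } N \text{ sufficiently large.}
\end{aligned}
```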

We now break $[N^{\prime},N]$ into intervals $I$ whose lengths satisfy $e^{(\log N)^{1/2}}\leqslant|I|\leqslant 2e^{(\log N)^{1/2}}$ . By pigeonhole there exists such an interval for which

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime}}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in I}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

The weight $1/n$ varies by a factor of at most $1+O(|I|\cdot N^{\prime-1})$ on $I$ and so, using ˜5.11, we can justify replacing $\mathbb{E}^{\log}_{n\in I}$ with a uniform average $\mathbb{E}_{n\in I}$ , thus obtaining

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}(\tilde{\Lambda}-\Lambda_{\operatorname{per}})(x)\mathbb{E}_{n\in I}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.14)

Our plan now is to use the decomposition $\tilde{\Lambda}-\Lambda_{\operatorname{per}}=\sum_{i}g_{i}+h$ from Lemma˜4.1 in order to obtain a contradiction from ˜5.14. To do this, we claim that for a general function $\psi:[X,2X)\rightarrow\mathbf{C}$ we have

\begin{aligned}&\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}\psi(x)\mathbb{E}_{n\in I}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\\ &\qquad\ll\min\Big(\mathbb{E}_{x\in[X,2X)}|\psi(x)|,\frac{k}{X^{c}}\|\widehat{\psi}\|_{\infty}^{c}\|\psi\|_{\infty}^{1-c}+(\log X)^{-C_{2}}\|\psi\|_{\infty}\Big).\end{aligned}

(5.15)

Here, $\widehat{\psi}(\theta)=\sum_{x\in[X,2X)}\psi(x)e(-\theta x)$ . Assuming the claim for now, we see that the LHS of ˜5.14 is bounded above by

\ll\frac{k}{X^{c}}\sum_{i}\|\widehat{g_{i}}\|_{\infty}^{c}\|g_{i}\|_{\infty}^{1-c}+(\log X)^{-C_{2}}\sum_{i}\|g_{i}\|_{\infty}+\mathbb{E}_{x\in[X,2X)}|h(x)|\ll kQ^{-c/4}

by ˜4.4, 4.6, and 4.5, assuming here that $C_{2}$ is sufficiently large and noting ˜5.8. This contradicts ˜5.14, recalling here that $Q=\lfloor\delta^{-C_{2}}\rfloor$ , that $C_{2}$ is sufficiently large in terms of $C_{1}$ , and additionally recalling here our assumption (in Proposition˜5.1) that $k\leqslant\delta^{-10}$ . That is (assuming the claim ˜5.15) we cannot have ˜5.9, and therefore ˜5.10 holds.
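To make the contradiction concrete, here is a sketch of the numerics, reading the exponent $c/4$ in the last display literally and using $c=1/4C_{1}$ , $k\leqslant\delta^{-10}$ and $Q=\lfloor\delta^{-C_{2}}\rfloor\geqslant\frac{1}{2}\delta^{-C_{2}}$ . The threshold $192C_{1}$ is only illustrative and is not a claim about how $C_{2}$ is actually chosen.

```latex
\begin{aligned}
kQ^{-c/4}&\leqslant\delta^{-10}\Big(\tfrac{1}{2}\delta^{-C_{2}}\Big)^{-1/(16C_{1})}
=2^{1/(16C_{1})}\delta^{C_{2}/(16C_{1})-10}\leqslant 2\delta^{C_{2}/(16C_{1})-10},\\
\frac{C_{2}}{16C_{1}}-10>2&\iff C_{2}>192C_{1},
\quad\text{in which case } kQ^{-c/4}=o(\delta^{2}) \text{ as } \delta\to 0.
\end{aligned}
```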

Proof of claim ˜5.15. The first bound is trivial, but the second is a somewhat involved task. By homogeneity, we may assume that $\|\psi\|_{\infty}=1$ . Thus if the second bound in ˜5.15 does not hold, we have

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}\psi(x)\mathbb{E}_{n\in I}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\geqslant\tau/\tau_{0},

(5.16)

where we are free to choose an absolute constant $\tau_{0}>0$ and $\tau:=\frac{k}{X^{c}}\|\widehat{\psi}\|_{\infty}^{c}+(\log X)^{-C_{2}}$ , thus in particular

\tau\in[(\log X)^{-C_{2}},\tau_{0}].

(5.17)

It therefore suffices to show that the assumption ˜5.16 and the inclusion ˜5.17 imply that

\|\widehat{\psi}\|_{\infty}=\sup_{\theta\in\mathbf{R}/\mathbf{Z}}|\widehat{\psi}(\theta)|\geqslant(\tau/k)^{1/c}X,

(5.18)

since this immediately contradicts the definition of $\tau$ . The remainder of the proof of claim ˜5.15 is devoted to this task.
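For clarity, the contradiction with the definition of $\tau$ and the inclusion ˜5.17 can be read off as follows. This is a short sketch using only $\|\psi\|_{\infty}=1$ and the strict positivity of $(\log X)^{-C_{2}}$ .

```latex
\begin{aligned}
\tau=\frac{k}{X^{c}}\|\widehat{\psi}\|_{\infty}^{c}+(\log X)^{-C_{2}}>\frac{k}{X^{c}}\|\widehat{\psi}\|_{\infty}^{c}
&\implies\|\widehat{\psi}\|_{\infty}<(\tau/k)^{1/c}X,\\
\tau/\tau_{0}\leqslant\mathbb{E}_{p^{\prime}_{1},p^{\prime}_{2}}\mathbb{E}_{x\in[X,2X)}|\psi(x)|\leqslant 1
&\implies\tau\leqslant\tau_{0},\qquad\tau\geqslant(\log X)^{-C_{2}}.
\end{aligned}
```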

Suppose that $\mathscr{P}^{\prime}_{1}\subset[Y_{1},2Y_{1}]$ and $\mathscr{P}^{\prime}_{2}\subset[Y_{2},2Y_{2}]$ , where $P^{\prime}_{1}\leqslant Y_{1},Y_{2}\leqslant P^{\prime}_{2}$ . Set $T_{1}:=\lfloor\tau^{C_{1}}X/Y_{1}\rfloor$ and $T_{2}:=\lfloor\tau^{C_{1}}X/Y_{2}\rfloor$ . Since $X\geqslant P_{1}>(P^{\prime}_{2})^{10}\geqslant Y_{i}^{10}$ (by one of the assumptions of Proposition˜5.1) and $\tau^{C_{1}}\geqslant(\log X)^{-C_{1}C_{2}}$ we have $T_{1},T_{2}\geqslant X^{1/2}\geqslant 1$ . Let $t_{1},t_{2}$ be integers with $|t_{i}|\leqslant T_{i}$ , and substitute $n:=n^{\prime}-\lambda p^{\prime}_{1}t_{2}-\lambda p^{\prime}_{2}t_{1}$ , $x:=x^{\prime}+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2}$ in ˜5.16. This gives

\begin{aligned}&\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime}}\Big|\mathbb{E}_{x^{\prime}\in[X,2X)-p^{\prime}_{1}t_{1}-p^{\prime}_{2}t_{2}}\mathbb{E}_{n^{\prime}\in I+\lambda(p^{\prime}_{1}t_{2}+p^{\prime}_{2}t_{1})}f_{1}\big(n^{\prime}p^{\prime}_{2}+\lambda x^{\prime}p^{\prime}_{1}+\lambda(p_{1}^{\prime 2}-p_{2}^{\prime 2})t_{1}\big)\\ &\qquad\times\overline{f_{1}\big(n^{\prime}p^{\prime}_{1}+\lambda x^{\prime}p^{\prime}_{2}+\lambda(p_{2}^{\prime 2}-p_{1}^{\prime 2})t_{2}\big)}\psi(x^{\prime}+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})\Big|\geqslant\tau.\end{aligned}

(5.19)
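The substitution can be verified by direct expansion. In the first argument the cross terms $\mp\lambda p^{\prime}_{1}p^{\prime}_{2}t_{2}$ cancel, and in the second the cross terms $\mp\lambda p^{\prime}_{1}p^{\prime}_{2}t_{1}$ cancel:

```latex
\begin{aligned}
np^{\prime}_{2}+\lambda xp^{\prime}_{1}
&=(n^{\prime}-\lambda p^{\prime}_{1}t_{2}-\lambda p^{\prime}_{2}t_{1})p^{\prime}_{2}+\lambda(x^{\prime}+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})p^{\prime}_{1}\\
&=n^{\prime}p^{\prime}_{2}+\lambda x^{\prime}p^{\prime}_{1}+\lambda(p_{1}^{\prime 2}-p_{2}^{\prime 2})t_{1},\\
np^{\prime}_{1}+\lambda xp^{\prime}_{2}
&=(n^{\prime}-\lambda p^{\prime}_{1}t_{2}-\lambda p^{\prime}_{2}t_{1})p^{\prime}_{1}+\lambda(x^{\prime}+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})p^{\prime}_{2}\\
&=n^{\prime}p^{\prime}_{1}+\lambda x^{\prime}p^{\prime}_{2}+\lambda(p_{2}^{\prime 2}-p_{1}^{\prime 2})t_{2}.
\end{aligned}
```

Thus $t_{1}$ appears only in the first copy of $f_{1}$ and $t_{2}$ only in the second, which is what makes the later Cauchy–Schwarz steps effective.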

Now observe that $|p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2}|\ll\tau^{C_{1}}X$ , and also we have the crude bound

|\lambda(p^{\prime}_{1}t_{2}+p^{\prime}_{2}t_{1})|\leqslant|\lambda|P^{\prime}_{2}X\leqslant e^{3(\log N)^{1/4}}\ll\tau^{10}|I|,

using here that all of $|\lambda|,P^{\prime}_{2},X$ are $\leqslant e^{(\log N)^{1/4}}$ , that $|I|\geqslant e^{(\log N)^{1/2}}$ and that $\tau\geqslant(\log X)^{-C_{2}}>(\log N)^{-C_{2}}$ . It follows using Lemma˜A.1 that for each fixed $p^{\prime}_{1},p^{\prime}_{2}$ we may replace the $x^{\prime}$ -average in ˜5.19 by $\mathbb{E}_{x^{\prime}\in[X,2X)}$ , and the $n^{\prime}$ -average by $\mathbb{E}_{n^{\prime}\in I}$ , at the cost of changing the inner sum in ˜5.19 by $O(\tau^{2})$ . Doing this, averaging over $t_{1},t_{2}$ and dropping the dashes on $x^{\prime},n^{\prime}$ for clarity, we obtain

\begin{aligned}&\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime}}\Big|\mathbb{E}_{t_{1}\in[T_{1}],t_{2}\in[T_{2}],x\in[X,2X),n\in I}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}+\lambda(p_{1}^{\prime 2}-p_{2}^{\prime 2})t_{1}\big)\\ &\qquad\qquad\times\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}+\lambda(p_{2}^{\prime 2}-p_{1}^{\prime 2})t_{2}\big)}\psi(x+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})\Big|\geqslant\tau/2.\end{aligned}

Therefore there exists $n$ such that

\begin{aligned}&\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime}}\Big|\mathbb{E}_{t_{1}\in[T_{1}],t_{2}\in[T_{2}],x\in[X,2X)}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}+\lambda(p_{1}^{\prime 2}-p_{2}^{\prime 2})t_{1}\big)\\ &\qquad\qquad\times\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}+\lambda(p_{2}^{\prime 2}-p_{1}^{\prime 2})t_{2}\big)}\psi(x+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})\Big|\geqslant\tau/2.\end{aligned}

This implies that

\displaystyle\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime},t_{1}\in[T_{1}],t_{2}\in[T_{2}],x\in[X,2X)}F_{1}(p_{1}^{\prime},p_{2}^{\prime},x,t_{1})F_{2}(p_{1}^{\prime},p_{2}^{\prime},x,t_{2})\psi(x+p^{\prime}_{1}t_{1}+p^{\prime}_{2}t_{2})\geqslant\tau/2

with $F_{i}$ being $1$ -bounded functions; here we have absorbed the absolute value as a unit complex number into $F_{1}(p_{1}^{\prime},p_{2}^{\prime},x,t_{1})$ . We now apply Cauchy–Schwarz twice, and replace the dummy variable $x$ by $n$ , to obtain that

\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}_{2}^{\prime},t_{1},t_{1}^{\prime}\in[T_{1}],t_{2},t^{\prime}_{2}\in[T_{2}],n\in[X,2X)}\Delta_{p^{\prime}_{1}(t_{1},t_{1}^{\prime})}\Delta_{p_{2}^{\prime}(t_{2},t_{2}^{\prime})}\psi(n)\geqslant(\tau/2)^{4}.

(The notation used here is described in Section˜1.4.)

The expression on the LHS is the same as the one in ˜2.12, with $S_{i}=\mathscr{P}^{\prime}_{i}$ . In order to apply Lemma˜2.6, we need the sets $\mathscr{P}^{\prime}_{1},\mathscr{P}^{\prime}_{2}$ to have suitable diophantine properties. Such a statement is precisely the content of Lemma˜3.2. To see this, recall that by definition we have $\mathscr{P}^{\prime}_{1}=\{p_{1}\cdots p_{k}:p_{i}\in I^{\prime}_{i}\}$ , where each interval $I^{\prime}_{i}$ has the form $[M_{i},(1+\eta_{i})M_{i})$ for some $\eta_{i}\in(1/8k,1/4k)$ and some $M_{i}$ , which we may assume to be the smallest prime $p_{i}$ in $I^{\prime}_{i}$ . Note that we always have $P^{\prime}_{1}\leqslant M_{i}\leqslant P^{\prime}_{2}$ , and $Y_{1}\leqslant M_{1}\cdots M_{k}$ since $M_{1}\cdots M_{k}\in\mathscr{P}^{\prime}_{1}$ and $\mathscr{P}^{\prime}_{1}\subset[Y_{1},2Y_{1}]$ . We now apply Lemma˜3.2 with $j=1$ . The required condition $\min_{i}M_{i}>Q^{L_{1}}$ in that lemma follows using ˜5.2. Thus Lemma˜3.2 gives that $\mathscr{P}^{\prime}_{1}$ is $(L_{1},k,Y_{1})$ -diophantine, for some absolute constant $L_{1}$ . Similarly, $\mathscr{P}^{\prime}_{2}$ is $(L_{1},k,Y_{2})$ -diophantine.

We may now apply Lemma˜2.6 with $S_{i}=\mathscr{P}^{\prime}_{i}$ for $i=1,2$ , $\delta=(\tau/2)^{4}$ , $L=L_{1}$ , $L^{\prime}=k$ and $D_{i}=Y_{i}$ for $i=1,2$ . To apply that lemma we need to verify, for $i=1,2$ , the three conditions $D_{i},T_{i}\geqslant(L^{\prime}/\delta)^{C_{2.6}L^{2}}$ and $T_{i}D_{i}\leqslant(L^{\prime}/\delta)^{C_{2.6}L^{2}}X$ . The first condition holds comfortably using $Y_{i}\geqslant P^{\prime}_{1}$ and the choice of parameters. The second condition holds even more comfortably using $T_{i}\geqslant X^{1/2}\geqslant P_{1}^{1/2}$ and the choice of parameters. Finally, the third condition holds using that $T_{i}D_{i}\asymp\tau^{C_{1}}X$ , provided $C_{1}$ is large enough; larger than $4L_{1}C_{2.6}$ is sufficient.

The conclusion of Lemma˜2.6 gives that for any $H_{1},H_{2}\in\mathbf{N}$ with $H_{i}\leqslant(\tau/k)^{2C_{1}}X$ , there are $q_{1},q_{2}\in\mathbf{N}$ , $q_{i}\leqslant(k/\tau)^{O(1)}$ such that

\mathbb{E}_{n\in[2X],h_{1},h^{\prime}_{1}\in[H_{1}],h_{2},h^{\prime}_{2}\in[H_{2}]}\Delta_{q_{1}(h_{1},h^{\prime}_{1})}\Delta_{q_{2}(h_{2},h^{\prime}_{2})}\psi(n)\gg(\tau/k)^{O(1)}.

(5.20)

Set $H_{1}=H_{2}=H:=\lfloor(\tau/k)^{2C_{1}}X\rfloor$ . The expression on the left in ˜5.20 is closely related to the Gowers $U^{2}$ -norm of $\psi$ (or more accurately a Gowers–Peluse norm; see [Pel20] where they are called “Gowers box norms”). Rather than appeal to any general theory of such norms, we proceed with a direct analysis using the Fourier transform. By the Fourier expansion $\psi(n)=\int_{\mathbf{R}/\mathbf{Z}}\widehat{\psi}(\theta)e(n\theta)d\theta$ , ˜5.20 is

\begin{aligned}&\int\widehat{\psi}(\theta_{1})\overline{\widehat{\psi}(\theta_{2})}\,\overline{\widehat{\psi}(\theta_{3})}\widehat{\psi}(\theta_{4})\widehat{\mu_{[2X]}}(-\theta_{1}+\theta_{2}+\theta_{3}-\theta_{4})\widehat{\mu_{[H]}}(q_{1}(-\theta_{1}+\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(-\theta_{2}+\theta_{4}))\\ &\qquad\qquad\times\widehat{\mu_{[H]}}(q_{2}(-\theta_{1}+\theta_{2}))\widehat{\mu_{[H]}}(q_{2}(-\theta_{3}+\theta_{4}))d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\gg(\tau/k)^{O(1)}.\end{aligned}

Here, $\mu_{[M]}$ denotes the normalised probability measure on $[M]$ . By AM-GM and the pointwise bound $|\widehat{\mu_{[2X]}}|\leqslant 1$ we have that

\begin{aligned}&\int\sum_{j=1}^{4}|\widehat{\psi}(\theta_{j})|^{4}\Big|\widehat{\mu_{[H]}}(q_{1}(-\theta_{1}+\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(-\theta_{2}+\theta_{4}))\\ &\qquad\qquad\times\widehat{\mu_{[H]}}(q_{2}(-\theta_{1}+\theta_{2}))\widehat{\mu_{[H]}}(q_{2}(-\theta_{3}+\theta_{4}))\Big|d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\gg(\tau/k)^{O(1)}.\end{aligned}
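The AM-GM step used here is the four-term inequality applied to the numbers $|\widehat{\psi}(\theta_{j})|^{4}$ , combined with $|\widehat{\mu_{[2X]}}|\leqslant 1$ ; the factor $\frac{1}{4}$ is absorbed into the implied constant:

```latex
\big|\widehat{\psi}(\theta_{1})\widehat{\psi}(\theta_{2})\widehat{\psi}(\theta_{3})\widehat{\psi}(\theta_{4})\big|
=\prod_{j=1}^{4}\big(|\widehat{\psi}(\theta_{j})|^{4}\big)^{1/4}
\leqslant\frac{1}{4}\sum_{j=1}^{4}|\widehat{\psi}(\theta_{j})|^{4}.
```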

Substitute $\theta^{\prime}_{i}=-\theta_{i}+t$ , for $t\in\mathbf{R}/\mathbf{Z}$ , and integrate over $t$ . This gives (dropping the dashes)

\begin{aligned}&\Big(4\int|\widehat{\psi}(t)|^{4}~dt\Big)\int\Big|\widehat{\mu_{[H]}}(q_{1}(\theta_{1}-\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(\theta_{2}-\theta_{4}))\\ &\qquad\qquad\times\widehat{\mu_{[H]}}(q_{2}(\theta_{1}-\theta_{2}))\widehat{\mu_{[H]}}(q_{2}(\theta_{3}-\theta_{4}))\Big|d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\gg(\tau/k)^{O(1)}.\end{aligned}

(5.21)

We claim that

\int\Big|\widehat{\mu_{[H]}}(q_{1}(\theta_{1}-\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(\theta_{2}-\theta_{4}))\widehat{\mu_{[H]}}(q_{2}(\theta_{1}-\theta_{2}))\widehat{\mu_{[H]}}(q_{2}(\theta_{3}-\theta_{4}))\Big|d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\ll H^{-3}.

(5.22)

By AM-GM and symmetry it suffices to prove that

\int\Big|\widehat{\mu_{[H]}}(q_{1}(\theta_{1}-\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(\theta_{2}-\theta_{4}))\widehat{\mu_{[H]}}(q_{2}(\theta_{1}-\theta_{2}))\Big|^{4/3}d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\ll H^{-3}.
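The reduction from four factors to three can be sketched as follows, for non-negative reals $a_{1},\dots,a_{4}$ (our notation for the absolute values of the four factors of $\widehat{\mu_{[H]}}$ ). Each $a_{i}$ occurs in exactly three of the four triple products, so the product of all four is the geometric mean of the cubes of the triple products' cube roots:

```latex
a_{1}a_{2}a_{3}a_{4}
=\prod_{j=1}^{4}\Big(\prod_{i\neq j}a_{i}\Big)^{1/3}
\leqslant\frac{1}{4}\sum_{j=1}^{4}\Big(\prod_{i\neq j}a_{i}\Big)^{4/3}.
```

The displayed integral is the term omitting the fourth factor; the other three terms are handled in the same manner by symmetry.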

The triple $(q_{1}(\theta_{1}-\theta_{3}),q_{1}(\theta_{2}-\theta_{4}),q_{2}(\theta_{1}-\theta_{2}))$ ranges uniformly over $(\mathbf{R}/\mathbf{Z})^{3}$ as $(\theta_{1},\theta_{2},\theta_{3},\theta_{4})$ ranges over $(\mathbf{R}/\mathbf{Z})^{4}$ and so it is enough to show that $\int|\widehat{\mu_{[H]}}(\theta)|^{4/3}d\theta\ll H^{-1}$ . This, however, follows immediately using the bound $|\widehat{\mu_{[H]}}(\theta)|\ll\min\big(1,H^{-1}\|\theta\|_{\mathbf{R}/\mathbf{Z}}^{-1}\big)$ . The claim ˜5.22 is therefore proven. From this and ˜5.21 we immediately have $\int|\widehat{\psi}(t)|^{4}dt\geqslant(\tau/k)^{O(1)}H^{3}\gg(\tau/k)^{7C_{1}}X^{3}$ (if $C_{1}$ is sufficiently large). Since $\int|\widehat{\psi}(t)|^{2}dt\ll X$ by Parseval, it follows that $\|\widehat{\psi}\|_{\infty}\gg(\tau/k)^{7C_{1}/2}X$ and so $\|\widehat{\psi}\|_{\infty}\geqslant(\tau/k)^{4C_{1}}X=(\tau/k)^{1/c}X$ . In this last step we used the fact ˜5.17 that $\tau\leqslant\tau_{0}$ ; what is written is then true if $\tau_{0}$ is chosen sufficiently small. This completes the proof that the claims ˜5.16 and 5.17 imply ˜5.18, and hence finishes the proof of the claim ˜5.15.
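Two computations in this paragraph may be worth recording explicitly. The following is a sketch, assuming $H\geqslant 2$ and $\|\psi\|_{\infty}=1$ ; the constant $8$ is not optimised. The first line bounds the $L^{4/3}$ integral of the kernel, and the second extracts the $L^{\infty}$ bound from the $L^{4}$ and $L^{2}$ bounds.

```latex
\begin{aligned}
\int_{\mathbf{R}/\mathbf{Z}}\min\big(1,H^{-1}\|\theta\|_{\mathbf{R}/\mathbf{Z}}^{-1}\big)^{4/3}d\theta
&\leqslant 2\Big(\int_{0}^{1/H}d\theta+H^{-4/3}\int_{1/H}^{\infty}\theta^{-4/3}d\theta\Big)
=2\Big(\frac{1}{H}+\frac{3}{H}\Big)=\frac{8}{H},\\
(\tau/k)^{7C_{1}}X^{3}\ll\int|\widehat{\psi}(t)|^{4}dt
&\leqslant\|\widehat{\psi}\|_{\infty}^{2}\int|\widehat{\psi}(t)|^{2}dt\ll\|\widehat{\psi}\|_{\infty}^{2}X
\implies\|\widehat{\psi}\|_{\infty}\gg(\tau/k)^{7C_{1}/2}X.
\end{aligned}
```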

As explained just before the proof of claim ˜5.15, it now follows that ˜5.10 holds. The remainder of the proof of Proposition˜5.1 consists of the analysis of this case.

Analysis of ˜5.10. We first recall the statement, which is (after a mild reordering of the averaging operators)

\Big|\mathbb{E}^{\log}_{p_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}}\mathbb{E}_{h\in[X,2X)}\Lambda_{\operatorname{per}}(h)\mathbb{E}_{n\in[N]}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda hp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

(5.23)

The advantage of having the function $\Lambda_{\operatorname{per}}$ in place of $\tilde{\Lambda}$ is that the former is invariant under shifts by $Q!$ . This is by construction (Lemma˜4.1); recall here that $Q=\lfloor\delta^{-C_{2}}\rfloor$ . For fixed $p^{\prime}_{1},p^{\prime}_{2}$ , in the inner average over $h$ and $n$ in ˜5.23 we substitute $n:=n^{\prime}-Q!\lambda p^{\prime}_{2}t$ and $h:=h^{\prime}+Q!p^{\prime}_{1}t$ for some $t\in\mathbf{Z}$ and then average over all $t\in[P_{1}^{1/2}]$ . By the periodicity of $\Lambda_{\operatorname{per}}$ we obtain

\begin{aligned}&\Big|\mathbb{E}_{t\in[P_{1}^{1/2}]}\mathbb{E}^{\log}_{p_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}}\mathbb{E}_{h^{\prime}\in[X,2X)-Q!p^{\prime}_{1}t}\Lambda_{\operatorname{per}}(h^{\prime})\mathbb{E}_{n^{\prime}\in[N]+Q!\lambda p^{\prime}_{2}t}^{\log}f_{1}\big(n^{\prime}p^{\prime}_{2}+\lambda h^{\prime}p^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\\ &\qquad\times\overline{f_{1}\big(n^{\prime}p^{\prime}_{1}+\lambda h^{\prime}p^{\prime}_{2}\big)}\Big|\gg\delta^{2}.\end{aligned}

(5.24)
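As a check on this change of variables, expanding with $n=n^{\prime}-Q!\lambda p^{\prime}_{2}t$ and $h=h^{\prime}+Q!p^{\prime}_{1}t$ gives the following; the shift is designed so that $t$ survives in the first copy of $f_{1}$ only, while the $Q!$ -periodicity of $\Lambda_{\operatorname{per}}$ absorbs the shift in $h$ .

```latex
\begin{aligned}
np^{\prime}_{2}+\lambda hp^{\prime}_{1}
&=(n^{\prime}-Q!\lambda p^{\prime}_{2}t)p^{\prime}_{2}+\lambda(h^{\prime}+Q!p^{\prime}_{1}t)p^{\prime}_{1}
=n^{\prime}p^{\prime}_{2}+\lambda h^{\prime}p^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2}),\\
np^{\prime}_{1}+\lambda hp^{\prime}_{2}
&=(n^{\prime}-Q!\lambda p^{\prime}_{2}t)p^{\prime}_{1}+\lambda(h^{\prime}+Q!p^{\prime}_{1}t)p^{\prime}_{2}
=n^{\prime}p^{\prime}_{1}+\lambda h^{\prime}p^{\prime}_{2},\\
\Lambda_{\operatorname{per}}(h)&=\Lambda_{\operatorname{per}}(h^{\prime}+Q!p^{\prime}_{1}t)=\Lambda_{\operatorname{per}}(h^{\prime}).
\end{aligned}
```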

Fix $t,p^{\prime}_{1},p^{\prime}_{2}$ . By ˜A.2, crude bounds for the parameters, and ˜4.3, the error in replacing the average over $n^{\prime}$ by $\mathbb{E}^{\log}_{n^{\prime}\in[N]}$ is

\ll\frac{\log(Q!|\lambda|P^{\prime}_{2}P_{1}^{1/2})}{\log N}\mathbb{E}_{h^{\prime}}|\Lambda_{\operatorname{per}}(h^{\prime})|\ll(\log N)^{-1/2}\mathbb{E}_{h^{\prime}}|\Lambda_{\operatorname{per}}(h^{\prime})|\ll(\log N)^{-1/2}\log(\frac{1}{\delta})^{O(1)}\lll\delta^{10},

so we may make this replacement without affecting ˜5.24. Moreover, by applying ˜A.1 and the bound $\|\Lambda_{\operatorname{per}}\|_{\infty}\leqslant Q^{2}$ , the error in then replacing the average over $h^{\prime}$ by $\mathbb{E}_{h^{\prime}\in[X,2X)}$ is $\ll Q^{2}\cdot\frac{Q!P^{\prime}_{1}P_{1}^{1/2}}{X}\ll P_{1}^{-1/4}\ll\delta^{10}$ , so we may again make the replacement without affecting ˜5.24. (In the chain of inequalities here we used that $P^{\prime}_{1}\leqslant P^{\prime}_{2}\leqslant P_{1}^{1/10}$ , that $X\geqslant P_{1}$ and that $P_{1}$ is much larger than fixed powers of $Q!$ and $\delta^{-1}$ , cf. remarks in Section˜5.1.) Having made these two replacements we drop the dashes on $n^{\prime},h^{\prime}$ for clarity, thereby arriving at

\Big|\mathbb{E}_{t\in[P_{1}^{1/2}]}\mathbb{E}^{\log}_{p^{\prime}_{1},p_{2}^{\prime}\in\mathscr{P}^{\prime}}\mathbb{E}_{h\in[X,2X)}\Lambda_{\operatorname{per}}(h)\mathbb{E}_{n\in[N]}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda hp^{\prime}_{2}\big)}\Big|\gg\delta^{2}.

By the triangle inequality, we obtain

\mathbb{E}_{h\in[X,2X)}|\Lambda_{\operatorname{per}}(h)|\mathbb{E}_{n\in[N],p_{1}^{\prime},p_{2}^{\prime}\in\mathscr{P}^{\prime}}^{\log}\Big|\mathbb{E}_{t\in[P_{1}^{1/2}]}f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\Big|\gg\delta^{2}.

Applying Cauchy–Schwarz, we obtain

\begin{aligned}&\mathbb{E}_{h\in[X,2X)}|\Lambda_{\operatorname{per}}(h)|\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\\ &\qquad\times\overline{f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4}.\end{aligned}

Using the pointwise bound $\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})\neq 1}\leqslant(p^{\prime}_{1},p^{\prime}_{2})-1$ and the fact (Lemma˜5.2) that $\gamma(\mathscr{P}^{\prime})\leqslant\delta^{4+\varepsilon_{0}/2}$ , as well as the bound $\mathbb{E}_{h\in[X,2X)}|\Lambda_{\operatorname{per}}(h)|\ll\log^{O(1)}(1/\delta)$ (see ˜4.3), we see that the contribution from pairs with $(p^{\prime}_{1},p^{\prime}_{2})\neq 1$ can be ignored. Thus

\begin{aligned}&\mathbb{E}_{h\in[X,2X)}|\Lambda_{\operatorname{per}}(h)|\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})=1}f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\\ &\qquad\times\overline{f_{1}\big(np^{\prime}_{2}+\lambda hp^{\prime}_{1}+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4}.\end{aligned}

Since $\Lambda_{\operatorname{per}}$ is invariant under shifts by $Q!$ , we may introduce an additional average obtaining

\begin{aligned}&\mathbb{E}_{h\in[X,2X),h^{\prime}\in[X^{\prime}],t,t^{\prime}\in[P_{1}^{1/2}]}|\Lambda_{\operatorname{per}}(h)|\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})=1}\\ &\qquad\times f_{1}\big(np^{\prime}_{2}+\lambda(h+Q!h^{\prime})p^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(np^{\prime}_{2}+\lambda(h+Q!h^{\prime})p^{\prime}_{1}+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4},\end{aligned}

where here $X^{\prime}:=\lfloor\delta^{5}X/Q!\rfloor$ ; note that $X^{\prime}$ is much larger than 1 by the choice of parameters. Apart from the invariance of $\Lambda_{\operatorname{per}}$ under translation by $Q!$ , the key point here is that, for each fixed $h^{\prime}$ , the shifted average differs from the original one by at most $X^{-1}\sum_{h\in[X,X+X^{\prime}Q!]}|\Lambda_{\operatorname{per}}(h)|\ll\frac{X^{\prime}Q!}{X}\log(1/\delta)^{O(1)}$ by ˜4.17 (and a similar term corresponding to the edge effects near $2X$ ).

In the display above, consider the average over $n,h^{\prime}$ (for fixed $h,t,t^{\prime},p^{\prime}_{1},p^{\prime}_{2}$ ). The point now is that, from the point of view of logarithmic averages, $np^{\prime}_{2}+\lambda Q!h^{\prime}p^{\prime}_{1}$ may be regarded as essentially just varying over $[N]$ . More precisely, applying Lemma˜A.3 with $q=p^{\prime}_{2}$ , $b=\lambda Q!p^{\prime}_{1}$ , $H:=X^{\prime}=\lfloor\delta^{5}X/Q!\rfloor$ and $f(x):=f_{1}(x+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2}))\overline{f_{1}(x+\lambda hp^{\prime}_{1}+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2}))}$ , we may replace the above with

\begin{aligned}&\mathbb{E}_{h\in[X,2X)}|\Lambda_{\operatorname{per}}(h)|\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})=1}f_{1}\big(n+\lambda hp^{\prime}_{1}+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\\ &\qquad\times\overline{f_{1}\big(n+\lambda hp^{\prime}_{1}+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4}.\end{aligned}

(5.25)

Let us comment on the application of Lemma˜A.3. First, we used that $q=p^{\prime}_{2}$ and $b=\lambda Q!p^{\prime}_{1}$ are coprime. That $(p^{\prime}_{2},\lambda)=1$ follows from the assumption that all prime factors of $\lambda$ are less than $P^{\prime}_{1}$ , and that $(p^{\prime}_{2},Q!)=1$ follows using that $P^{\prime}_{1}$ is much larger than $\delta^{-C_{2}}$ . The error terms $O\big(\frac{\log q+\log bH}{\log N}\big)$ and $O\big(\frac{q}{H}\big)$ resulting from the application of Lemma˜A.3 are all $\ll\delta^{10}$ by simple verifications using the choice of parameters, the key point being that $H>P_{1}^{1/2}$ is much larger than $q$ , but $bH<P_{2}^{2}$ is much smaller than $N$ .

Applying ˜A.2, we may remove the $\lambda hp^{\prime}_{1}$ shifts in ˜5.25, allowing us to decouple the average over $h$ and thus obtain via another application of ˜4.3 that

\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}\mathbf{1}_{(p^{\prime}_{1},p^{\prime}_{2})=1}f_{1}\big(n+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(n+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4+\varepsilon_{0}/4}.

(5.26)

We may remove the condition $(p^{\prime}_{1},p^{\prime}_{2})=1$ (losing a further factor of 2 in the implicit constant) exactly as before, obtaining

\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N],p_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}}^{\log}f_{1}\big(n+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(n+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\gg\delta^{4+\varepsilon_{0}/4}>\delta^{5}.

To analyse this, we will eventually use the diophantine nature of suitable sets $\{(p^{\prime})^{2}:p^{\prime}\in\mathscr{P}^{\prime}\}$ , applying Lemma˜3.2 in the case $j=2$ . To prepare the ground, we must again foliate into appropriate ‘subdyadic products’ as we did in the analysis of ˜5.9 leading to ˜5.12. With notation exactly the same as in that analysis, we may locate $\mathscr{P}^{\prime}_{1}:=\mathscr{P}^{\prime}_{\vec{j}}$ and $\mathscr{P}^{\prime}_{2}:=\mathscr{P}^{\prime}_{\vec{j}^{\prime}}$ such that

\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}^{\log}_{n\in[N]}\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}^{\prime}_{1},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}f_{1}\big(n+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(n+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\geqslant\delta^{5}.

Note here that we were able to replace the logarithmic average over the $p^{\prime}_{i}$ variables by a uniform average since these are now dyadically localised, and each $t,t^{\prime}$ -average is nonnegative. Suppose that $\mathscr{P}^{\prime}_{1}\subset[Y_{1},2Y_{1}]$ and $\mathscr{P}^{\prime}_{2}\subset[Y_{2},2Y_{2}]$ , where $P^{\prime}_{1}\leqslant Y_{1},Y_{2}\leqslant P^{\prime}_{2}$ . Without loss of generality, $Y_{1}\geqslant Y_{2}$ . Pigeonholing in $p^{\prime}_{2}$ , we see that there is some $p^{\prime}_{2}$ such that

\mathbb{E}_{t,t^{\prime}\in[P_{1}^{1/2}]}\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}^{\prime}_{1}}f_{1}\big(n+\lambda tQ!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1}\big(n+\lambda t^{\prime}Q!(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\geqslant\delta^{5}.

By Lemma˜A.2 with modulus $q=\lambda Q!$ , this gives

\mathbb{E}_{a\in\{0,1,\dots,\lambda Q!-1\}}\mathbb{E}^{\log}_{n\in[N]}\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}^{\prime}_{1},t,t^{\prime}\in[P_{1}^{1/2}]}f_{1,a}\big(n+t(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)\overline{f_{1,a}\big(n+t^{\prime}(p^{\prime 2}_{1}-p^{\prime 2}_{2})\big)}\geqslant\delta^{6}

where $f_{1,a}(n):=f_{1}(\lambda Q!n+a)$ . For each fixed $a$ , the inner average is of the form ˜2.3, with $S:=\{p^{\prime 2}_{1}-p^{\prime 2}_{2}:p^{\prime}_{1}\in\mathscr{P}^{\prime}_{1}\}$ and $\delta$ replaced by $\delta^{6}$ . We showed in Lemma˜3.2 (with $j=2$ ) that $S+p^{\prime 2}_{2}=\{p^{\prime 2}_{1}:p^{\prime}_{1}\in\mathscr{P}^{\prime}_{1}\}$ is $(L_{2},k,Y_{1}^{2})$ -Diophantine (the condition $\min_{i}M_{i}>Q^{L_{2}}$ in that lemma follows using ˜5.2), and so by translation invariance of the notion of diophantine, the same is true of $S$ . Observe that $S\subset[-4Y_{1}^{2},4Y_{1}^{2}]$ . Thus we may aim to apply Lemma˜2.4 with $S=\{p_{1}^{\prime 2}-p_{2}^{\prime 2}:p^{\prime}_{1}\in\mathscr{P}^{\prime}_{1}\}$ , $T:=\lfloor P_{1}^{1/2}\rfloor$ , $(L,L^{\prime},D)=(L_{2},k,Y_{1}^{2})$ , and $\delta$ replaced by $\delta^{6}$ . There are three conditions to be checked, namely that $D,T\geqslant(L^{\prime}/\delta^{6})^{8L}=(k/\delta^{6})^{8L_{2}}$ , and that $\frac{\log TD}{\log N}\leqslant(\delta^{6}/k)^{50L_{2}}$ .

The first condition, involving $D=Y_{1}^{2}$ , is immediate from $Y_{1}\geqslant P^{\prime}_{1}$ and the parameter hierarchy. The second condition, involving $T=\lfloor P_{1}^{1/2}\rfloor$ , is also immediate. For the third condition note that $TY_{1}^{2}\geqslant T\geqslant P_{1}^{1/2}$ and $(\delta^{6}/k)^{50L_{2}}$ is much smaller than $P_{1}^{1/4}$ .

Thus the appeal to Lemma 2.4 is indeed valid, and we are free to take $H=P_{1}^{1/4}$ in this application. Recalling that $k\leqslant\delta^{-10}$ , the conclusion of Lemma 2.4 is that for each $a$ there is $q_{a}\leqslant(k/\delta)^{O(1)}\leqslant\delta^{-O(1)}$ such that $\|f_{1,a}\|_{U^{1}_{\log}[N;q_{a},P_{1}^{1/4}]}\gg\delta^{O(1)}$ . By pigeonhole there is a set of $\geqslant\delta^{O(1)}\lambda Q!$ values of $a$ such that $q_{a}$ does not depend on $a$ . Denote this common value by $q$ (which is of course not the same quantity as in the application of Lemma A.2 above). It follows that

\mathbb{E}_{a\in\{0,1,\dots,\lambda Q!-1\}}\|f_{1,a}\|^{2}_{U^{1}_{\log}[N;q,H]}\gg\delta^{O(1)},

that is to say

\mathbb{E}_{a\in\{0,1,\dots,\lambda Q!-1\}}\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h,h^{\prime}\in[P_{1}^{1/4}]}f_{1,a}(n+qh)\overline{f_{1,a}(n+qh^{\prime})}\geqslant\delta^{O(1)}.

A further application of Lemma˜A.2 then yields

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h,h^{\prime}\in[P_{1}^{1/4}]}f_{1}(n+\lambda hqQ!)\overline{f_{1}(n+\lambda h^{\prime}qQ!)}\gg\delta^{O(1)},

which is the statement

\|f_{1}\|^{2}_{U^{1}_{\log}[N;\lambda qQ!,P_{1}^{1/4}]}\gg\delta^{O(1)}.

Finally, let $H\leqslant P_{1}^{1/8}$ be as in the statement of Proposition˜5.1. Set $C:=2C_{2}$ and $V:=\lfloor\delta^{-C}\rfloor!$ . Note that $qQ!\mid V$ . Therefore by Lemma˜A.6 we have

\|f_{1}\|_{U^{1}_{\log}[N;\lambda V,H]}\geqslant\|f_{1}\|_{U^{1}_{\log}[N;\lambda qQ!,P_{1}^{1/4}]}-O\Big(\frac{\log|P_{1}^{1/4}\lambda qQ!|}{\log N}\Big)-O\Big(\frac{HV}{P_{1}^{1/4}qQ!}\Big)\gg\delta^{O(1)},

where the error terms can be estimated crudely bearing in mind the comments in Section˜5.1 (essentially, $P_{1}$ is much smaller than $N$ but much larger than all other variables). This concludes the proof of Proposition˜5.1. ∎

6. Averaging projections and orthogonality

In the introduction we discussed certain ‘projection’ operators $\Pi^{\operatorname{sml}},\Pi^{\operatorname{lrg}}$ . In this section we introduce the general class of such operators and establish some of their basic properties.

Definition 6.1.

Let $f:\mathbf{Z}\rightarrow\mathbf{C}$ be a function. Suppose that $q,H\in\mathbf{N}$ . Then we define

\Pi_{q,H}f(n):=\mathbb{E}_{h,h^{\prime}\in[H]}f(n+q(h-h^{\prime})).
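As a reading aid (not part of the original argument), the definition can be unpacked by grouping the pairs $(h,h^{\prime})$ according to their difference $d:=h-h^{\prime}$ ; the symbol $d$ is introduced here purely for illustration. Each $d$ with $|d|<H$ arises from exactly $H-|d|$ pairs in $[H]^{2}$ , so $\Pi_{q,H}f(n)$ is an average of $f$ along the progression $n+q\mathbf{Z}$ against a triangular weight of total mass $1$ :

```latex
\[
\Pi_{q,H}f(n)=\sum_{|d|<H}\frac{H-|d|}{H^{2}}\,f(n+qd),
\qquad
\sum_{|d|<H}\frac{H-|d|}{H^{2}}=\frac{1}{H^{2}}\Big(H+2\sum_{d=1}^{H-1}(H-d)\Big)=1.
\]
```

For instance, when $H=2$ the weights on $d=-1,0,1$ are $\frac{1}{4},\frac{1}{2},\frac{1}{4}$ .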

Whilst we informally think of these maps as projections, this is not quite accurate as $\Pi_{q,H}\Pi_{q,H}f\neq\Pi_{q,H}f$ . The first observation we require is that $\Pi_{q,H}f$ has an almost periodicity property.

Lemma 6.2.

Let $f:\mathbf{Z}\rightarrow\mathbf{C}$ be a 1-bounded function. Let $q,H\in\mathbf{N}$ . Then, for any $h$ we have

\Pi_{q,H}f(n+qh)=\Pi_{q,H}f(n)+O(\frac{|h|}{H}).

Proof.

The LHS may be expanded as $\mathbb{E}_{h_{1},h^{\prime}_{1}\in[H]}f(n+q(h+h_{1}-h^{\prime}_{1}))$ . The result then follows from ˜A.1. ∎

A crucial feature of the maps $\Pi_{q,H}$ is that they essentially preserve the $U^{1}_{\log}$ -norms (see Definition˜2.3). Indeed we have the following lemma.

Lemma 6.3.

Let $q\in\mathbf{N}$ and $H^{\prime}\leqslant H$ . Then for $f:\mathbf{Z}\to\mathbf{C}$ which is $1$ -bounded, we have that $\|\Pi_{q,H^{\prime}}f-f\|_{U^{1}_{\log}[N;q,H]}\ll H^{\prime}/H$ .

Proof.

First recall that by Definition 2.3 we have

\|g\|_{U^{1}_{\log}[N;q,H]}^{2}=\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}g(n+hq)\big|^{2}.

(6.1)

Note that by ˜A.1 we have

\mathbb{E}_{h\in[H]}\Pi_{q,H^{\prime}}f(n+hq)=\mathbb{E}_{h\in[H],h^{\prime}_{1},h^{\prime}_{2}\in[H^{\prime}]}f(n+q(h+h^{\prime}_{1}-h^{\prime}_{2}))=\mathbb{E}_{h\in[H]}f(n+hq)+O\Big(\frac{H^{\prime}}{H}\Big).

The desired result follows immediately upon taking $g=f-\Pi_{q,H^{\prime}}f$ in ˜6.1. ∎
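The final step can be spelled out as follows. This is only a sketch, and it assumes (as the preceding display suggests) that the $O(H^{\prime}/H)$ error there is uniform in $n$ . With $g=f-\Pi_{q,H^{\prime}}f$ :

```latex
\[
\mathbb{E}_{h\in[H]}g(n+hq)=O\Big(\frac{H^{\prime}}{H}\Big)\ \text{for every } n
\quad\Longrightarrow\quad
\|g\|_{U^{1}_{\log}[N;q,H]}^{2}
=\mathbb{E}^{\log}_{n\in[N]}\big|\mathbb{E}_{h\in[H]}g(n+hq)\big|^{2}
\ll\Big(\frac{H^{\prime}}{H}\Big)^{2}.
\]
```

Taking square roots gives the stated bound $\ll H^{\prime}/H$ .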

We next require an approximate Pythagoras relation for projections $\Pi_{q,H},\Pi_{q^{\prime},H^{\prime}}$ .

Lemma 6.4.

Let $q,q^{\prime},H,H^{\prime}$ be parameters with $q\mid q^{\prime}$ and $H^{\prime}\leqslant H$ . Let $f:\mathbf{Z}\rightarrow\mathbf{C}$ be a $1$ -bounded function. We have that

\mathbb{E}_{n\in[N]}^{\log}\big|\Pi_{q^{\prime},H^{\prime}}f(n)-\Pi_{q,H}f(n)\big|^{2}\leqslant\mathbb{E}_{n\in[N]}^{\log}\big|\Pi_{q^{\prime},H^{\prime}}f(n)\big|^{2}-\mathbb{E}_{n\in[N]}^{\log}\big|\Pi_{q,H}f(n)\big|^{2}+O\Big(\frac{\log q^{\prime}H}{\log N}+\frac{q^{\prime}H^{\prime}}{qH}\Big).

Proof.

For brevity we write $\langle g_{1},g_{2}\rangle:=\mathbb{E}_{n\in[N]}^{\log}g_{1}(n)\overline{g_{2}(n)}$ and $\|g\|^{2}:=\langle g,g\rangle=\mathbb{E}_{n\in[N]}^{\log}|g(n)|^{2}$ .

We first expand the LHS as

\|\Pi_{q^{\prime},H^{\prime}}f\|^{2}+\|\Pi_{q,H}f\|^{2}-\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle-\overline{\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle}.

(6.2)

Expanding the definitions, we have

\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle=\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h_{1},h_{2}\in[H],h^{\prime}_{1},h^{\prime}_{2}\in[H^{\prime}]}f(n+q^{\prime}(h^{\prime}_{1}-h^{\prime}_{2}))\overline{f(n+q(h_{1}-h_{2}))}.

Substitute $n=n^{\prime}+qh_{2}-q^{\prime}h^{\prime}_{1}$ ; then, dropping the dash on $n^{\prime}$ , we see from ˜A.2 that this is

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h_{1},h_{2}\in[H],h^{\prime}_{1},h^{\prime}_{2}\in[H^{\prime}]}f(n+qh_{2}-q^{\prime}h^{\prime}_{2})\overline{f(n+qh_{1}-q^{\prime}h^{\prime}_{1})}+O\Big(\frac{\log q^{\prime}H}{\log N}\Big),

which equals

\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H],h^{\prime}\in[H^{\prime}]}f(n+qh-q^{\prime}h^{\prime})\big|^{2}+O\Big(\frac{\log q^{\prime}H}{\log N}\Big).

Now by ˜A.1 (using here that $q\mid q^{\prime}$ ) we have

\mathbb{E}_{h\in[H],h^{\prime}\in[H^{\prime}]}f(n+qh-q^{\prime}h^{\prime})=\mathbb{E}_{h\in[H]}f(n+qh)+O\Big(\frac{q^{\prime}H^{\prime}}{qH}\Big).

Therefore, putting these observations together we obtain

\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle=\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}f(n+qh)\big|^{2}+O\Big(\frac{q^{\prime}H^{\prime}}{qH}+\frac{\log q^{\prime}H}{\log N}\Big).

Taking complex conjugates and adding, we obtain

\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle+\overline{\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle}=2\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}f(n+qh)\big|^{2}+O\Big(\frac{q^{\prime}H^{\prime}}{qH}+\frac{\log q^{\prime}H}{\log N}\Big).

Now by a further application of ˜A.2,

\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}f(n+qh)\big|^{2}=\mathbb{E}_{h^{\prime}\in[H]}\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}f(n+q(h-h^{\prime}))\big|^{2}+O\Big(\frac{\log qH}{\log N}\Big),

and by Cauchy–Schwarz this is at least

\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h,h^{\prime}\in[H]}f(n+q(h-h^{\prime}))\big|^{2}+O(\frac{\log qH}{\log N})=\|\Pi_{q,H}f\|^{2}+O\Big(\frac{\log qH}{\log N}\Big).

It follows that

\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle+\overline{\langle\Pi_{q^{\prime},H^{\prime}}f,\Pi_{q,H}f\rangle}\geqslant 2\|\Pi_{q,H}f\|^{2}+O\Big(\frac{q^{\prime}H^{\prime}}{qH}+\frac{\log q^{\prime}H}{\log N}\Big).

Substituting into (6.2) gives the lemma. ∎
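For the reader's convenience, here is a compact summary of how the pieces of the proof fit together; it is a sketch that records only the steps already carried out above, with the abbreviation $\mathcal{E}$ for the total error introduced here for illustration. Note that $\log qH\leqslant\log q^{\prime}H$ since $q\mid q^{\prime}$ .

```latex
\begin{align*}
\|\Pi_{q',H'}f-\Pi_{q,H}f\|^{2}
&=\|\Pi_{q',H'}f\|^{2}+\|\Pi_{q,H}f\|^{2}
 -\big(\langle\Pi_{q',H'}f,\Pi_{q,H}f\rangle+\overline{\langle\Pi_{q',H'}f,\Pi_{q,H}f\rangle}\big)\\
&\leqslant\|\Pi_{q',H'}f\|^{2}+\|\Pi_{q,H}f\|^{2}-2\|\Pi_{q,H}f\|^{2}+\mathcal{E}\\
&=\|\Pi_{q',H'}f\|^{2}-\|\Pi_{q,H}f\|^{2}+\mathcal{E},
\qquad
\mathcal{E}=O\Big(\frac{\log q'H}{\log N}+\frac{q'H'}{qH}\Big).
\end{align*}
```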

We now give the ‘maximal function’ argument which was hinted at in the introduction where we explained how to move from ˜1.1 to ˜1.2.

Lemma 6.5.

Let $f,g:\mathbf{N}\rightarrow\mathbf{C}$ be non-negative $1$ -bounded functions. Let $\delta\in(0,\frac{1}{2})$ and let $H,q$ be positive integer parameters with $\frac{\log Hq}{\log N}<c\delta^{2}$ . Suppose that $\mathbb{E}_{n\in[N]}^{\log}f(n)g(n)\geqslant\delta$ . Then $\mathbb{E}_{n\in[N]}^{\log}(\Pi_{q,H}f)(n)g(n)\geqslant\delta^{2}/8$ .

Proof.

Write $\Pi=\Pi_{q,H}$ for brevity. Set $\varepsilon:=\delta/4$ and denote $h(n):=1_{\Pi f(n)>\varepsilon}$ . Then since $0\leqslant fh\leqslant 1$ and $(\Pi f)h\geqslant\varepsilon h$ pointwise we have

\mathbb{E}_{n\in[N]}^{\log}(\Pi f)(n)g(n)\geqslant\mathbb{E}_{n\in[N]}^{\log}f(n)(\Pi f)(n)g(n)h(n)\geqslant\varepsilon\mathbb{E}_{n\in[N]}^{\log}f(n)h(n)g(n).

Therefore we are done if we can show that $\mathbb{E}_{n\in[N]}^{\log}f(n)(1-h(n))\leqslant\delta/2$ , that is to say

\mathbb{E}_{n\in[N]}^{\log}f(n)1_{\Pi f(n)\leqslant\varepsilon}\leqslant\delta/2.

(6.3)

Write $F(n):=f(n)1_{\Pi f(n)\leqslant\varepsilon}$ . Since $F\leqslant f$ pointwise, we have $\Pi F\leqslant\Pi f$ pointwise, and so if $F(n)\neq 0$ then we have $\Pi F(n)\leqslant\Pi f(n)\leqslant\varepsilon$ . It follows using (A.2) and Cauchy–Schwarz that

	$\displaystyle\big|\mathbb{E}_{n\in[N]}^{\log}F(n)\big|^{2}$	$\displaystyle=\big|\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h\in[H]}F(n+hq)\big|^{2}+O\Big(\frac{\log Hq}{\log N}\Big)$
		$\displaystyle\leqslant\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H]}F(n+hq)\big|^{2}+O\Big(\frac{\log Hq}{\log N}\Big)$
		$\displaystyle=\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h,h^{\prime}\in[H]}F(n+hq)F(n+h^{\prime}q)+O\Big(\frac{\log Hq}{\log N}\Big)$
		$\displaystyle\leqslant\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h,h^{\prime}\in[H]}F(n)F(n+(h-h^{\prime})q)+O\Big(\frac{\log Hq}{\log N}\Big)$
		$\displaystyle=\mathbb{E}^{\log}_{n\in[N]}F(n)(\Pi F)(n)+O\Big(\frac{\log Hq}{\log N}\Big)$
		$\displaystyle\leqslant\varepsilon\mathbb{E}^{\log}_{n\in[N]}F(n)+\varepsilon^{2}.$

It follows that $\mathbb{E}_{n\in[N]}^{\log}F(n)\leqslant 2\varepsilon$ , so the claim ˜6.3 follows due to the choice of $\varepsilon$ . ∎
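The passage from the quadratic inequality to the bound $2\varepsilon$ is elementary; a short verification is sketched here, writing $u:=\mathbb{E}_{n\in[N]}^{\log}F(n)\geqslant 0$ (the letter $u$ is introduced only for this remark).

```latex
\[
u^{2}\leqslant\varepsilon u+\varepsilon^{2}
\quad\Longrightarrow\quad
u\leqslant\frac{1+\sqrt{5}}{2}\,\varepsilon<2\varepsilon=\frac{\delta}{2},
\]
\[
\text{since } u>2\varepsilon \text{ would give } u^{2}>2\varepsilon u=\varepsilon u+\varepsilon u>\varepsilon u+2\varepsilon^{2}>\varepsilon u+\varepsilon^{2}.
\]
```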

We note a corollary under the same conditions which is good for taking averages, namely that for any $\eta$

\mathbb{E}_{n\in[N]}^{\log}\Pi f(n)g(n)\geqslant\frac{\eta}{8}\mathbb{E}^{\log}_{n\in[N]}f(n)g(n)-\frac{\eta^{2}}{8}.

(6.4)

Indeed, if we write $\delta:=\mathbb{E}_{n\in[N]}^{\log}f(n)g(n)$ then ˜6.4 is trivial for $\delta\leqslant\eta$ , while for $\delta\geqslant\eta$ it follows from Lemma˜6.5.
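The two cases can be written out as follows. This is a sketch which assumes that the hypotheses of Lemma 6.5 (in particular the smallness condition on $\frac{\log Hq}{\log N}$ ) are available for the value of $\delta$ in question.

```latex
\begin{align*}
\delta\leqslant\eta:&\qquad
\frac{\eta}{8}\,\delta-\frac{\eta^{2}}{8}\leqslant 0\leqslant\mathbb{E}^{\log}_{n\in[N]}\Pi f(n)g(n),\\
\delta\geqslant\eta:&\qquad
\mathbb{E}^{\log}_{n\in[N]}\Pi f(n)g(n)\geqslant\frac{\delta^{2}}{8}\geqslant\frac{\eta\delta}{8}\geqslant\frac{\eta}{8}\,\delta-\frac{\eta^{2}}{8}.
\end{align*}
```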

7. Proof of the main theorem

We are now ready to prove our main result, Theorem˜1.1. The reader may find it helpful to revisit the overview given in the introduction.

7.1. Setting up parameters.

We begin by defining parameters and scales to be used in the proof.

Let $r$ be the number of colours; we will fix this for the remainder of the proof and we may assume it is sufficiently large. Let $C_{0}$ be a suitable large positive integer (independent of $r$ ), recall that $\varepsilon_{0}:=\frac{1}{10}$ , and set

K:=C_{0}r^{8},\quad t:=K^{2},\quad V=(\lceil r^{4+\varepsilon_{0}}\rceil^{C})!\quad\mbox{and}\quad N:=\exp\exp(r^{50}),

(7.1)

where here $C$ is the constant in Proposition˜5.1. Define

B_{0}:=\{V^{4^{i}}:i=1,2,\dots,K^{2}\}.

(7.2)

We now define a doubly-indexed sequence of positive integer scales $(H_{i,j})_{i\in[t],j\in[2K]}$ by

H_{i,j}:=\lfloor\exp\exp(r^{25}(4Ki+j))\rfloor.

(7.3)

Note that we have the crude bounds

\exp\exp((\log\log N)^{1/10})<\max B_{0}<H_{1,1}<\cdots<H_{1,2K}<H_{2,1}<\cdots<H_{t,2K}<e^{(\log N)^{1/10}},

(7.4)

provided $r$ is large enough. We will also use the auxiliary scales $H_{i,0}$ defined by the same formula ˜7.3. For $i\in[t]$ and $j\in[K]$ , define $\mathscr{P}_{i,j}$ to be the set of primes satisfying $H_{i,2j-1}\leqslant p\leqslant H_{i,2j}$ . We note that with this choice of parameters we have, by Mertens’ theorem, $\sum_{p\in\mathscr{P}_{i,j}}\frac{1}{p}\gg r^{25}$ .
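The size of the prime sums can be read off directly from the definition of the scales. The following is a sketch which ignores the floor functions in the definition of $H_{i,j}$ and uses the standard form of Mertens' theorem, $\sum_{p\leqslant x}\frac{1}{p}=\log\log x+M+O(1/\log x)$ , where $M$ denotes the Mertens constant.

```latex
\begin{align*}
\log\log H_{i,2j}-\log\log H_{i,2j-1}
&=r^{25}(4Ki+2j)-r^{25}(4Ki+2j-1)=r^{25},\\
\sum_{p\in\mathscr{P}_{i,j}}\frac{1}{p}
&=\log\log H_{i,2j}-\log\log H_{i,2j-1}+O\Big(\frac{1}{\log H_{i,2j-1}}\Big)
=r^{25}+o(1).
\end{align*}
```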

7.2. Positivity for $x,xy$

The first step of the proof is to isolate the colour class in which we will eventually find our configuration $\{x+y,xy\}$ , and to show that it is rich in configurations $\{x,xy\}$ . This is a mild variant of [Ric25, Theorem 3.6], which itself is related to results of Ahlswede, Khachatrian and Sárközy [AKS99] and Davenport and Erdős [DE36].

Consider an $r$ -colouring $A_{1}\cup\cdots\cup A_{r}=[N]$ . For each $b\in B_{0}$ we have

\mathbb{E}_{n\in[N]}^{\log}\sum_{j=1}^{r}\mathbf{1}_{A_{j}}(bn)=\mathbb{E}_{n\in[N]}^{\log}1_{[N]}(bn)=\frac{H_{N/b}}{H_{N}}\geqslant\tfrac{1}{2},

where here $H_{M}:=\sum_{m\leqslant M}\frac{1}{m}$ denotes the harmonic sum (not to be confused with the scales $H_{i,j}$ ). The last bound here follows (comfortably) using (7.4). By summing over all $b\in B_{0}$ and an appeal to the pigeonhole principle, there is some colour class $A=A_{j}$ such that

\sum_{b\in B_{0}}\mathbb{E}_{n\in[N]}^{\log}1_{A}(bn)\geqslant K^{2}/2r,

which implies that $\mathbb{E}_{n\in[N]}^{\log}1_{A}(bn)\geqslant 1/4r$ for at least $K^{2}/4r\geqslant K$ elements $b\in B_{0}$ . Fix a set $B\subset B_{0}$ of $K$ such elements. We fix the colour class $A$ for the remainder of the proof.
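The counting behind this step is a routine averaging argument, sketched here for completeness. Write $m$ for the number of $b\in B_{0}$ with $\mathbb{E}_{n\in[N]}^{\log}1_{A}(bn)\geqslant 1/4r$ (the letter $m$ is introduced only for this remark), and recall that $|B_{0}|=K^{2}$ and that each summand is at most $1$ .

```latex
\[
\frac{K^{2}}{2r}\leqslant\sum_{b\in B_{0}}\mathbb{E}_{n\in[N]}^{\log}1_{A}(bn)
\leqslant m\cdot 1+(K^{2}-m)\cdot\frac{1}{4r}
\leqslant m+\frac{K^{2}}{4r}
\quad\Longrightarrow\quad
m\geqslant\frac{K^{2}}{4r}\geqslant K,
\]
```

the last inequality holding because $K=C_{0}r^{8}\geqslant 4r$ .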

By repeated applications of Lemma˜A.5, we have

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,j}\in\mathscr{P}_{i,j}}1_{A}(bp_{i,1}\cdots p_{i,j}n)\geqslant 1/8r

for any $i\in[t]$ , any $j\leqslant K$ and for any $b\in B$ . Note here that the error term arising from this repeated application of Lemma˜A.5 is dominated by $\ll K\max_{i,j}\big(\sum_{p\in\mathscr{P}_{i,j}}\frac{1}{p}\big)^{-1/2}\ll Kr^{-25/2}\ll r^{-3}$ .

Let the elements of $B$ be $b_{1}<\cdots<b_{K}$ . Then, applying the above with $b=b_{j}$ and summing over $1\leqslant j\leqslant K$ , we obtain

\sum_{j=1}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}1_{A}(b_{j}p_{i,1}\cdots p_{i,j}n)\geqslant\frac{K}{8r}.

(Note here that, for the term with index $j$ , we can include the extra averages over $\mathscr{P}_{i,j+1},\dots,\mathscr{P}_{i,K}$ with no change to the expression.) By Cauchy–Schwarz it follows that

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}\sum_{1\leqslant j,j^{\prime}\leqslant K}1_{A}(b_{j}p_{i,1}\cdots p_{i,j}n)1_{A}(b_{j^{\prime}}p_{i,1}\cdots p_{i,j^{\prime}}n)\geqslant 2^{-6}\big(\frac{K}{r}\big)^{2}.

Since $K=C_{0}r^{8}$ , if $r$ is large enough we may exclude the $O(K)$ pairs of indices with $|j-j^{\prime}|\leqslant 1$ at the loss of at most a factor $2$ . By symmetry we are also free to only include the pairs with $j>j^{\prime}$ (at the loss of another factor of 2), and we thereby obtain

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}\sum_{\begin{subarray}{c}1\leqslant j^{\prime}<j\leqslant K\\ j\geqslant j^{\prime}+2\end{subarray}}1_{A}(b_{j^{\prime}}p_{i,1}\cdots p_{i,j^{\prime}}n)1_{A}(b_{j}p_{i,1}\cdots p_{i,j}n)\geqslant 2^{-8}\big(\frac{K}{r}\big)^{2}.

(7.5)

By another repeated application of Lemma˜A.5 we have

	$\displaystyle\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}1_{A}(b_{j^{\prime}}p_{i,1}\cdots p_{i,j^{\prime}}n)1_{A}(b_{j}p_{i,1}\cdots p_{i,j}n)$
	$\displaystyle\qquad\qquad=\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}1_{A}(b_{j^{\prime}}n)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)+O(r^{-3})$

for each pair $j,j^{\prime}$ with $j>j^{\prime}$ . From this and ˜7.5, it follows (again assuming $r$ large enough) that

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}\sum_{\begin{subarray}{c}1\leqslant j^{\prime}<j\leqslant K\\ j\geqslant j^{\prime}+2\end{subarray}}1_{A}(b_{j^{\prime}}n)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\geqslant 2^{-9}\big(\frac{K}{r}\big)^{2}.

Recall that this is true for all $i\in[t]$ . By pigeonhole, for each $i$ there is some $j^{\prime}(i)$ such that

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}\sum_{j=j^{\prime}(i)+2}^{K}1_{A}(b_{j^{\prime}(i)}n)1_{A}(b_{j}p_{i,j^{\prime}(i)+1}\cdots p_{i,j}n)\geqslant 2^{-9}\frac{K}{r^{2}}.

Pass to a subset $I\subset[t]$ of size at least $t/K$ such that $j^{\prime}(i)$ does not depend on $i\in I$ , and denote by $j^{\prime}$ the common value of these $j^{\prime}(i)$ . Writing $b:=b_{j^{\prime}}$ and $f(n):=1_{A}(bn)$ , we then have

\mathbb{E}^{\log}_{n\in[N],p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}\sum_{j=j^{\prime}+2}^{K}f(n)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\geqslant 2^{-9}\frac{K}{r^{2}}

(7.6)

for all $i\in I$ . Fix this choice of $j^{\prime}$ (and hence of $b=b_{j^{\prime}}$ and the function $f$ ) for the rest of the proof. Define also $I_{*}:=I\setminus\{\max I\}$ to be the elements of $I$ except the largest one; thus $|I_{*}|\geqslant|I|/2$ .

7.3. Proof of the main theorem

We think of pairs $(i,j)$ (with $i\in I_{*}$ and $j\geqslant j^{\prime}+2$ ) as ‘scales’ in the proof. Associated to any scale will be a pair of ‘projection’ operators in the sense of Definition 6.1. Define $Q_{j}:=\frac{b_{j}}{b^{2}}V$ . Note that $Q_{j}$ is an integer (in fact it equals $V^{4^{j}-2\cdot 4^{j^{\prime}}+1}$ ).

For each pair $(i,j)$ there will be two important projection operators $\Pi$ , namely

\Pi^{\operatorname{sml}}_{i,j}:=\Pi_{Q_{j-1},H_{i_{+},0}}\quad\mbox{and}\quad\Pi^{\operatorname{lrg}}_{i,j}:=\Pi_{Q_{j},H_{i,0}}.

(7.7)

Here, $i_{+}$ denotes the next largest element in $I$ after $i$ , which exists since $i\in I_{*}=I\setminus\{\max I\}$ . We informally refer to these as the ‘small’ and ‘large’ projections associated to $(i,j)$ .

We first apply the small projection operator to ˜7.6 using Lemma˜6.5, or more accurately ˜6.4. Taking $\eta=2^{-10}r^{-2}$ there, we have

\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\sum_{j=j^{\prime}+2}^{K}\Pi_{i,j}^{\operatorname{sml}}f(n)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\geqslant\frac{\eta}{8}(2^{-9}\frac{K}{r^{2}})-\frac{\eta^{2}}{8}K\gg\frac{K}{r^{4}}.

(7.8)
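The arithmetic behind the final $\gg$ in (7.8) is as follows; this is simply the substitution $\eta=2^{-10}r^{-2}$ carried out explicitly, and the resulting constant $2^{-23}$ is not claimed to be of any significance.

```latex
\[
\frac{\eta}{8}\Big(2^{-9}\frac{K}{r^{2}}\Big)-\frac{\eta^{2}}{8}K
=\frac{\eta K}{8}\Big(\frac{2^{-9}}{r^{2}}-\frac{2^{-10}}{r^{2}}\Big)
=\frac{2^{-10}r^{-2}K}{8}\cdot\frac{2^{-10}}{r^{2}}
=2^{-23}\frac{K}{r^{4}}.
\]
```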

Here, and below, $\mathbb{E}^{\log}_{p_{i,*}\in\mathscr{P}_{i,*}}$ is shorthand for $\mathbb{E}^{\log}_{p_{i,1}\in\mathscr{P}_{i,1},\dots,p_{i,K}\in\mathscr{P}_{i,K}}$ . Now observe that by Lemma˜6.2 we have

	$\displaystyle\Pi_{i,j}^{\operatorname{sml}}f(n)$	$\displaystyle=\Pi_{i,j}^{\operatorname{sml}}f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)+O\Big(\frac{b_{j}}{b^{2}}\frac{p_{i,j^{\prime}+1}\cdots p_{i,j}}{H_{i_{+},0}}\Big)$
		$\displaystyle=\Pi_{i,j}^{\operatorname{sml}}f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)+O(r^{-10}).$		(7.9)

The key points to observe here in applying Lemma˜6.2 are that $Q_{j-1}=\frac{b_{j-1}}{b^{2}}V\mid\frac{b_{j}}{b^{2}}$ by the definitions of the $b_{j}$ s, and also

\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\leqslant V^{4^{K^{2}}}\prod_{j=1}^{K}H_{i,2j}<V^{4^{K^{2}}}H_{i,2K}^{2}<r^{-10}H_{i_{+},0}.

The inequalities here are all very comfortably true (when $r$ is large); we have $r^{10}<V^{4^{K^{2}}}<H_{1,1}<H_{i,2K}$ , that $H_{i,2j}^{2}<H_{i,2(j+1)}$ for all $j$ , and that $H_{i,2K}^{4}<H_{i_{+},0}$ , all of which follow using ˜7.4. From ˜7.8 and 7.9 we have

\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\Pi_{i,j}^{\operatorname{sml}}f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\gg\frac{K}{r^{4}}.

This, recall, is for all $i\in I_{*}$ . Summing over all these $i$ gives

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\Pi_{i,j}^{\operatorname{sml}}f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\gg\frac{K|I|}{r^{4}}.

(7.10)

Suppose we had a similar result with $\Pi_{i,j}^{\operatorname{sml}}f$ replaced by $f$ , that is

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\gg\frac{K|I|}{r^{4}}.

(7.11)

In particular, for some choice of $i,j,p_{i,j^{\prime}+1},\dots,p_{i,j}$ and $n\geqslant 3$ we would then have

f\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)>0.

Taking $x:=bn$ and $y:=\frac{b_{j}}{b}p_{i,j^{\prime}+1}\cdots p_{i,j}$ (and recalling that $f(n)=1_{A}(bn)$ ) we then have $x+y,xy\in A$ , and the proof is complete.
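The identification of the two factors with $1_{A}(x+y)$ and $1_{A}(xy)$ can be checked directly. In the sketch here, $m:=p_{i,j^{\prime}+1}\cdots p_{i,j}$ is a shorthand introduced only for this verification.

```latex
\begin{align*}
x+y&=bn+\frac{b_{j}}{b}\,m=b\Big(n+\frac{b_{j}}{b^{2}}\,m\Big),
&&\text{so}\quad f\Big(n+\frac{b_{j}}{b^{2}}\,m\Big)=1_{A}(x+y),\\
xy&=bn\cdot\frac{b_{j}}{b}\,m=b_{j}\,m\,n,
&&\text{so}\quad 1_{A}(b_{j}\,m\,n)=1_{A}(xy).
\end{align*}
```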

It remains to prove that we do indeed have ˜7.11. As described in the introduction, we deduce it from ˜7.10 in two steps. First, we replace the ‘small’ projections $\Pi_{i,j}^{\operatorname{sml}}f$ in ˜7.10 by the ‘large’ projections $\Pi_{i,j}^{\operatorname{lrg}}f$ . The error in making this replacement is

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\big(\Pi_{i,j}^{\operatorname{sml}}f-\Pi_{i,j}^{\operatorname{lrg}}f\big)\big(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j}\big)1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n).

(7.12)

By ˜A.2 and the crude bounds $b_{j}\leqslant V^{4^{K^{2}}}$ , $p_{i,*}\leqslant H_{t,2K}$ this is

\sum_{i\in I_{*}}\sum_{j\geqslant j^{\prime}+2}\mathbb{E}^{\log}_{p_{i,*}\in\mathscr{P}_{i,*}}\mathbb{E}^{\log}_{n\in[N]}\big(\Pi_{i,j}^{\operatorname{sml}}f-\Pi_{i,j}^{\operatorname{lrg}}f\big)(n)\psi_{i,j,p_{i,*}}(n)+O\Big(|I|K\frac{\log(V^{4^{K^{2}}}H^{2K}_{t,2K})}{\log N}\Big),

(7.13)

where

\psi_{i,j,p_{i,*}}(n):=1_{A}\big(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}(n-\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j})\big).

For the rest of the proof (as in Lemma˜6.4) we use the notation $\langle g_{1},g_{2}\rangle:=\mathbb{E}_{n\in[N]}^{\log}g_{1}(n)\overline{g_{2}(n)}$ and $\|g\|^{2}:=\langle g,g\rangle=\mathbb{E}_{n\in[N]}^{\log}|g(n)|^{2}$ . Using ˜7.1 and 7.4, the error term in ˜7.13 is seen to be $O(|I|Kr^{-10})$ . Thus ˜7.13 is

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{p_{i,*}\in\mathscr{P}_{i,*}}\langle\Pi_{i,j}^{\operatorname{sml}}f-\Pi_{i,j}^{\operatorname{lrg}}f,\psi_{i,j,p_{i,*}}\rangle+O(|I|Kr^{-10}).

By Cauchy–Schwarz and the $1$ -boundedness of the functions $\psi$ , this is bounded above by

	$\displaystyle\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}$	$\displaystyle\|\Pi_{i,j}^{\operatorname{lrg}}f-\Pi_{i,j}^{\operatorname{sml}}f\|+O(|I|Kr^{-10})$
		$\displaystyle\leqslant(|I|K)^{1/2}\Big(\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\|\Pi_{i,j}^{\operatorname{lrg}}f-\Pi_{i,j}^{\operatorname{sml}}f\|^{2}\Big)^{1/2}+O(|I|Kr^{-10}).$		(7.14)

For each $i,j$ we apply Lemma˜6.4 with $q=Q_{j-1}$ , $q^{\prime}=Q_{j}$ , $H=H_{i_{+},0}$ and $H^{\prime}=H_{i,0}$ , obtaining

	$\displaystyle\|\Pi_{i,j}^{\operatorname{lrg}}f-\Pi_{i,j}^{\operatorname{sml}}f\|^{2}$	$\displaystyle\leqslant\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}-\|\Pi_{i,j}^{\operatorname{sml}}f\|^{2}+O\Big(\frac{\log Q_{j}H_{i_{+},0}}{\log N}\Big)+O\Big(\frac{Q_{j}H_{i,0}}{Q_{j-1}H_{i_{+},0}}\Big)$
		$\displaystyle\leqslant\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}-\|\Pi_{i,j}^{\operatorname{sml}}f\|^{2}+r^{-10}.$		(7.15)

To explain the last line here, we can bound the first error term by $<(\log N)^{-1/2}<r^{-20}$ using (7.4). The second error term can be bounded using $Q_{j}<\max B_{0}$ and the fact that $H_{i_{+},0}\geqslant H_{i+1,0}>r^{20}(\max B_{0})H_{i,0}$ , which can be verified using the definitions (7.2) and (7.3).

Summing ˜7.15 over $i,j$ gives

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\|\Pi_{i,j}^{\operatorname{lrg}}f-\Pi_{i,j}^{\operatorname{sml}}f\|^{2}\leqslant\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\big(\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}-\|\Pi_{i,j}^{\operatorname{sml}}f\|^{2}\big)+O(|I|Kr^{-10}).

Recalling the definitions ˜7.7 of the two projection operators, we see that the bracketed sum has considerable cancellation; the only uncancelled positive terms are the $\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}$ terms from scales $(i,j)$ which are not of the form $(\overline{i}_{+},\overline{j}-1)$ for some other scale $(\overline{i},\overline{j})$ , that is to say with $i=\min(I)$ or $j=K$ ; thus the bracketed sum is bounded by $|I|+K$ . It follows that ˜7.14 is bounded by

\leqslant(|I|K)^{1/2}\big(|I|+K+O(|I|Kr^{-10})\big)^{1/2}+O(|I|Kr^{-10})\ll C_{0}^{-1/2}r^{-4}|I|K,

using here that $K=C_{0}r^{8}$ and $|I|\geqslant t/K=K$ .
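The main term in this estimate can be worked out explicitly; the following sketch records only the leading contribution, the terms involving $r^{-10}$ being of lower order when $r$ is large. Since $|I|\geqslant K$ we have $|I|+K\leqslant 2|I|$ , and so

```latex
\[
(|I|K)^{1/2}\big(|I|+K\big)^{1/2}
\leqslant(|I|K)^{1/2}(2|I|)^{1/2}
=\sqrt{2}\,|I|K^{1/2}
=\sqrt{2}\,K^{-1/2}\,|I|K
=\sqrt{2}\,C_{0}^{-1/2}r^{-4}\,|I|K.
\]
```

The last equality uses $K^{1/2}=C_{0}^{1/2}r^{4}$ .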

If the constant $C_{0}$ is chosen large enough, this means that ˜7.12 is small compared with the RHS of ˜7.10.

To summarise so far, we have replaced the ‘small’ projections $\Pi^{\operatorname{sml}}_{i,j}$ in ˜7.10 by the ‘larger’ ones $\Pi^{\operatorname{lrg}}_{i,j}$ at the loss of only the quality of the implied constant, that is to say we have shown

\sum_{i\in I_{*}}\sum_{j=j^{\prime}+2}^{K}\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\Pi_{i,j}^{\operatorname{lrg}}f(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j})1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\gg\frac{K|I|}{r^{4}}.

To complete the proof of ˜7.11 (and hence of Theorem˜1.1) we now replace the copies of $\Pi_{i,j}^{\operatorname{lrg}}f$ by $f$ itself. For this we can work one value of $(i,j)$ at a time; thus it is enough to show that, for each $(i,j)$ ,

\mathbb{E}^{\log}_{n\in[N],p_{i,*}\in\mathscr{P}_{i,*}}\big(f-\Pi_{i,j}^{\operatorname{lrg}}f\big)(n+\frac{b_{j}}{b^{2}}p_{i,j^{\prime}+1}\cdots p_{i,j})1_{A}(b_{j}p_{i,j^{\prime}+1}\cdots p_{i,j}n)\leqslant r^{-4-\varepsilon_{0}}.

(7.16)

(Here $\varepsilon_{0}=\frac{1}{10}$ again.) To prove this we use Proposition 5.1. Indeed, we note that the LHS of (7.16) is of the form

\mathbb{E}_{n\in[N],p\in\mathscr{P},p^{\prime}\in\mathscr{P}^{\prime}}^{\log}f_{1}(n+\lambda pp^{\prime})f_{2}(\lambda npp^{\prime})

(which is exactly the expression in ˜5.1) where $f_{1}=f-\Pi_{Q_{j},H_{i,0}}f$ , $f_{2}(n)=1_{A}(b^{2}n)$ , $\lambda=b_{j}/b^{2}$ , $\mathscr{P}=\mathscr{P}_{i,j}$ , $\mathscr{P}^{\prime}=\mathscr{P}_{i,j^{\prime}+1}\cdots\mathscr{P}_{i,j-1}$ and $k=j-j^{\prime}-1\in\mathbf{N}$ .

Note here that every element of $\mathscr{P}^{\prime}$ has just one representation in this product since all primes in $\mathscr{P}_{i,j^{\prime}+1}$ are much smaller than those in $\mathscr{P}_{i,j^{\prime}+2}$ , and so on, and so $\mathbb{E}^{\log}_{p_{i,j^{\prime}+1}\in\mathscr{P}_{i,j^{\prime}+1},\dots,p_{i,j-1}\in\mathscr{P}_{i,j-1}}$ is the same thing as $\mathbb{E}^{\log}_{p^{\prime}\in\mathscr{P}^{\prime}}$ .

The setup for the application of Proposition˜5.1 requires some discussion. We address the various requirements in the statement of that proposition in turn.

• The parameter $k$ will be $j-j^{\prime}-1$ . Note $1\leqslant k\leqslant K$ , so the condition $k\leqslant\log\log N$ is satisfied due to the choices (7.1).

• We will take $\delta:=\lceil r^{4+\varepsilon_{0}}\rceil^{-1}$ (the aim being to show that the LHS of (7.16) is at most $\delta$ ). The conditions $1/\delta\leqslant\log\log N$ and $k\leqslant\delta^{-10}$ are then immediately checked.

• We take $\mathscr{P}=\mathscr{P}_{i,j}$ and $\mathscr{P}^{\prime}=\mathscr{P}_{i,j^{\prime}+1}\cdots\mathscr{P}_{i,j-1}$ . For notational consistency with Proposition 5.1, write $\mathscr{P}^{\prime}_{\ell}:=\mathscr{P}_{i,j^{\prime}+\ell}$ for $\ell\in[k]$ . Thus, by definition, $\mathscr{P}^{\prime}_{\ell}$ is the set of primes in the interval $I_{\ell}=[H_{i,2(j^{\prime}+\ell)-1},H_{i,2(j^{\prime}+\ell)}]$ , which is exactly the situation in Proposition 5.1. By (7.3) and the choice of parameters we have $\log\log(\max(I_{\ell}))-\log\log(\min(I_{\ell}))\geqslant r^{25}>k\delta^{-4-\varepsilon_{0}}$ . (This is essentially the ‘pinch point’ for the analysis; for the main result to have the stated exponent of 50 we need $(4+\varepsilon_{0})^{2}<17$ here.)

• We take $P_{1}=H_{i,2j-1}$ , $P_{2}=H_{i,2j}$ . The condition $P_{2}<\exp((\log N)^{1/4})$ is implied by (7.4), if $C_{2}$ is large enough.

• We take $P^{\prime}_{1}=H_{i,2j^{\prime}+1}$ , $P^{\prime}_{2}=H^{2}_{i,2j-2}$ . Note here that $\min(\mathscr{P}^{\prime})\geqslant P^{\prime}_{1}$ and $\max(\mathscr{P}^{\prime})\leqslant H_{i,2j^{\prime}+2}\cdots H_{i,2j-2}\leqslant P^{\prime}_{2}$ , as required, using here that $H_{i,j}^{2}<H_{i,j+1}$ . The condition $P^{\prime}_{2}\geqslant\exp\exp((\log\log N)^{1/10})$ follows immediately from (7.4).

• The condition $\lambda\leqslant e^{(\log N)^{1/4}}$ follows from (7.4) and the fact that $\lambda\leqslant\max B_{0}$ .

• That all prime factors of $\lambda$ are less than $P^{\prime}_{1}$ is immediate from the lower bound $H_{1,1}>\max B_{0}$ .
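The exponent count at the ‘pinch point’ in the list above can be made explicit. The following is a rough sketch, using only $k\leqslant K=C_{0}r^{8}$ , $\varepsilon_{0}=\frac{1}{10}$ and the crude bound $\delta^{-1}=\lceil r^{4+\varepsilon_{0}}\rceil\leqslant 2r^{4+\varepsilon_{0}}$ ; the numerical constants are not optimised.

```latex
\[
k\,\delta^{-4-\varepsilon_{0}}
\leqslant C_{0}r^{8}\cdot\big(2r^{4+\varepsilon_{0}}\big)^{4+\varepsilon_{0}}
=2^{4.1}\,C_{0}\,r^{8+(4.1)^{2}}
=2^{4.1}\,C_{0}\,r^{24.81}
<r^{25}
\quad\text{for } r \text{ sufficiently large}.
\]
```

This is where the inequality $(4+\varepsilon_{0})^{2}<17$ enters: it leaves room for $8+(4+\varepsilon_{0})^{2}<25$ .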

Suppose that ˜7.16 does not hold. By the above discussion we are in a position to apply Proposition˜5.1. Note that $V$ in the conclusion there is, with our choice of parameters, exactly the same as $V$ in ˜7.1. Since $\lfloor P_{1}^{1/8}\rfloor\geqslant\lfloor H_{i,1}^{1/8}\rfloor>H_{i,0}^{2}$ , we may take the parameter $H$ in Proposition˜5.1 to be $H_{i,0}^{2}$ . The conclusion of Proposition˜5.1 is then that

\|f-\Pi_{Q_{j},H_{i,0}}f\|_{U^{1}_{\log}[N;Q_{j},H_{i,0}^{2}]}\gg K^{-O(1)}.

(Here we observed from the various definitions that $Q_{j}=\lambda V$ .) However, this is contrary to Lemma˜6.3, which asserts that the LHS is $\ll H_{i,0}^{-1}$ , which is enormously smaller. This contradiction shows that we indeed have ˜7.16, and all of the required statements are proven.

8. Further remarks

We end the main body of the paper with a series of remarks regarding the bounds obtained for the pattern $\{x+y,xy\}$ and related patterns.

First of all, we comment that there are two different ways in which the double exponential bound in the main theorem seems hard to improve using anything like the methods of this paper. The first is that it seems difficult to avoid the need to define a highly divisible set such as the set $B_{0}$ in ˜7.2, and any such definition seems to immediately lead to elements of double exponential size in $r$ . Second, the hierarchy of scales ˜7.4 needed to be chosen with $\log\log(H_{i,j+1})-\log\log(H_{i,j})\gg 1$ in order that the primes in this range satisfy $\sum_{p\in\mathscr{P}}\frac{1}{p}\gg 1$ , which is crucial in the application of Proposition˜5.1. It is possible to show using arguments somewhat related to those in [Tao24] that one cannot do appreciably better by choosing an alternative set to the primes. In particular, when applying Lemma˜A.5 with an alternate set of integers $\mathscr{P}$ , the error term is dominated by $\gamma(\mathscr{P})^{1/2}$ and one can prove that for any set $\mathscr{P}\subseteq[2,X]$ one has $\gamma(\mathscr{P})\gg(\log\log X)^{-1}$ .

Next we make some comments on the potential for extending the underlying analytic method to handle the pattern $\{x,x+y,xy\}$ (for which partition regularity was established by Moreira [Mor17], but with essentially no bounds). Presumably any such approach would require one to (at least) prove an inverse theorem giving some structure under the assumption that

\big|\mathbb{E}_{n\in[N],p\in\mathscr{P}}^{\log}f_{1}(n)f_{2}(n+p)f_{3}(np)\big|\geqslant\delta,

(8.1)

where $\mathscr{P}$ is a suitable set of almost primes (compare here with ˜5.1). The following two rather different examples suggest this may be far from straightforward.

•

Suppose first that $f_{2}=1$ . Let $(\xi_{p})_{p\in\mathscr{P}}$ be an arbitrary sequence of unit complex numbers, and define $f_{1}(n):=\xi_{p}$ if $p$ is the least prime in $\mathscr{P}$ which divides $n$ , and $f_{1}(n)=0$ otherwise. Set $f_{3}(n):=\overline{f_{1}(n)}$ . Assuming that $\sum_{p\in\mathscr{P}}\frac{1}{p}\ggg 1$ , the (logarithmic) proportion of $n$ for which $f_{1}(n)=0$ is negligible. Now observe that $f_{1}(n)f_{3}(pn)=1$ if the least prime factor of $n$ in $\mathscr{P}$ is less than $p$ . On average over $p,n$ , one expects this to happen half the time. If, on the other hand, the least prime factor of $n$ is $p^{\prime}>p$ then we have $f_{1}(n)f_{3}(pn)=\xi_{p^{\prime}}\overline{\xi_{p}}$ , and typically we expect cancellation of this when summed over $p,p^{\prime}$ . Examples of this type therefore give ˜8.1 with $\delta\approx 1/2$ , but with $f_{1},f_{3}$ only having rather weak structure.
•

Now suppose that $f_{1}(n)=e(\alpha n^{2})$ , $f_{2}(n)=e(-\alpha n^{2})$ and $f_{3}(n)=e(2\alpha n)$ for some $\alpha\in\mathbf{R}$ . One may then observe that $f_{1}(n)f_{2}(n+p)f_{3}(np)=e(-\alpha p^{2})$ . If $P$ is the scale of $\mathscr{P}$ then this is $\approx 1$ for $|\alpha|\lessapprox P^{-2}$ .

Even with an inverse theorem for ˜8.1 in hand, it is far from clear how the other arguments of the paper might be modified.
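The cancellation in the second example above is a one-line expansion, $\alpha n^{2}-\alpha(n+p)^{2}+2\alpha np=-\alpha p^{2}$ , and it can be confirmed mechanically. The following is a minimal sketch, not part of the argument: the helper names `total_phase` and `e`, the sample scale $P=97$ and the choice $\alpha=1/(100P^{2})$ are all illustrative assumptions. Exact rational arithmetic is used for the phase so that no rounding enters the identity.

```python
from fractions import Fraction
import cmath

def total_phase(alpha, n, p):
    """Phase t with f1(n) f2(n+p) f3(np) = e(t), where
    f1(n) = e(alpha n^2), f2(n) = e(-alpha n^2), f3(n) = e(2 alpha n)."""
    return alpha * n**2 - alpha * (n + p)**2 + 2 * alpha * (n * p)

def e(t):
    """e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * cmath.pi * float(t))

# Illustrative (assumed) parameters: primes near the scale P = 97,
# and alpha well below P^{-2}.
P = 97
alpha = Fraction(1, 100 * P**2)
sample_primes = [89, 97]

# The n-dependence cancels exactly, leaving -alpha p^2.
identity_holds = all(
    total_phase(alpha, n, p) == -alpha * p**2
    for n in range(1, 200)
    for p in sample_primes
)

# For |alpha| << P^{-2} the product e(-alpha p^2) stays close to 1.
worst_gap = max(abs(e(-alpha * p**2) - 1) for p in sample_primes)
print(identity_holds, worst_gap)
```

Since $|e(t)-1|\leqslant 2\pi|t|$ , the quantity `worst_gap` is at most $2\pi/100$ for this choice of $\alpha$ .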

Appendix A Properties of averages

In this appendix we assemble simple properties of (mostly) logarithmic averages. Throughout the appendix we assume $N\geqslant 2$ to avoid trivialities. For $m\in\mathbf{R}_{\geqslant 1}$ , $H_{m}$ denotes the harmonic sum $\sum_{n\leqslant m}\frac{1}{n}$ ; we do not require $m$ to be an integer. The first lemma concerns the behaviour of averages (both uniform and logarithmic) under shifts.

Lemma A.1.

Let $f:\mathbf{N}\rightarrow\mathbf{C}$ be a 1-bounded function and let $h\in\mathbf{Z}$ . Then

\big|\mathbb{E}_{n\in[N]}f(n)-\mathbb{E}_{n\in[N]}f(n+h)\big|\ll\frac{|h|}{N}

(A.1)

and, if $h\neq 0$ ,

\big|\mathbb{E}^{\log}_{n\in[N]}f(n)-\mathbb{E}^{\log}_{n\in[N]}f(n+h)\big|\ll\frac{1+\log|h|}{\log N}.

(A.2)

Proof.

˜A.1 is straightforward. For ˜A.2, we may suppose $|h|\leqslant N/2$ else the result is trivial. Without loss of generality we may suppose $h$ is positive, since the case $h$ negative follows from the positive case. We have

\sum_{n\in[N]}\frac{f(n+h)}{n}-\sum_{n\in[N]}\frac{f(n)}{n}=\sum_{m=h+1}^{N}\big(\frac{f(m)}{m-h}-\frac{f(m)}{m}\big)-\sum_{n=1}^{h}\frac{f(n)}{n}+\sum_{n=N-h+1}^{N}\frac{f(n+h)}{n}.

(A.3)

The second sum on the right is $\ll 1+\log h$ , whilst the third is $\leqslant\log N-\log(N-h+1)+O(1)\ll 1$ since $h\leqslant N/2$ . Finally, the first sum on the right is bounded above by $h\sum_{m=h+1}^{N}\frac{1}{m(m-h)}$ . Since $\sum_{m=h+1}^{2h}\frac{1}{m(m-h)}\ll\frac{1}{h}\sum_{m=h+1}^{2h}\frac{1}{m-h}\ll\frac{1+\log h}{h}$ , and $\sum_{m=2h+1}^{N}\frac{1}{m(m-h)}\ll\sum_{m>2h}m^{-2}\ll h^{-1}$ , the first sum on the right in ˜A.3 is bounded by $\ll 1+\log h$ . Putting all this together, the result follows. ∎
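The factor $1+\log|h|$ in ˜A.2 cannot be dropped, as the indicator function of $[h]$ already shows: its logarithmic average is $H_{h}/H_{N}$ , while its shift by $h$ vanishes on $[N]$ . The sketch below illustrates this numerically; the function names and the value $N=10^{4}$ are assumptions made purely for illustration, and the check uses the normalisation $\mathbb{E}^{\log}_{n\in[N]}f(n)=\frac{1}{H_{N}}\sum_{n\in[N]}\frac{f(n)}{n}$ .

```python
import math

def harmonic(N):
    """H_N = sum_{n <= N} 1/n."""
    return sum(1.0 / n for n in range(1, N + 1))

def log_average(f, N):
    """Logarithmic average (1/H_N) * sum_{n in [N]} f(n)/n."""
    return sum(f(n) / n for n in range(1, N + 1)) / harmonic(N)

def shift_defect(f, N, h):
    """|E^log f(n) - E^log f(n+h)| over n in [N]."""
    return abs(log_average(f, N) - log_average(lambda n: f(n + h), N))

N = 10_000
for h in (1, 10, 100, 1000):
    indicator = lambda n, h=h: 1.0 if n <= h else 0.0  # 1-bounded test function
    defect = shift_defect(indicator, N, h)              # equals H_h / H_N
    bound = (1 + math.log(h)) / math.log(N)
    print(h, defect, bound)
```

For this particular $f$ the defect is $H_{h}/H_{N}$ , which is at most $(1+\log h)/\log N$ because $H_{h}\leqslant 1+\log h$ and $H_{N}\geqslant\log N$ ; for general $1$ -bounded $f$ the lemma only asserts the bound up to an unspecified constant.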

Next we give a result about splitting into residue classes.

Lemma A.2.

Let $f:\mathbf{Z}\rightarrow\mathbf{C}$ be $1$ -bounded. Let $q\in\mathbf{N}$ . Then

\mathbb{E}_{a\in\{0,1,\dots,q-1\}}\mathbb{E}_{n\in[N]}^{\log}f(qn+a)=\mathbb{E}_{n\in[N]}^{\log}f(n)+O\Big(\frac{1+\log q}{\log N}\Big).

Proof.

We may suppose $2\leqslant q\leqslant N$ since the result is trivial otherwise. The LHS may be expanded as

\frac{1}{H_{N}}\sum_{a\in\{0,1,\dots,q-1\}}\sum_{n\in[N]}\frac{f(qn+a)}{qn}.

The change if we replace $qn$ in the denominator by $qn+a$ is bounded above by

\ll\frac{1}{\log N}\sup_{a}\sum_{n\in[N]}\Big|\frac{1}{n}-\frac{1}{n+a/q}\Big|\ll\frac{1}{\log N},

which is acceptable. If we make this change, the resulting expression is

\frac{1}{H_{N}}\sum_{q\leqslant n^{\prime}\leqslant qN+(q-1)}\frac{f(n^{\prime})}{n^{\prime}}=\mathbb{E}_{n\in[N]}^{\log}f(n)+O\Big(\frac{H_{q}}{H_{N}}\Big)+O\Big(\frac{H_{qN+(q-1)}-H_{N}}{H_{N}}\Big).

The two error terms are $\ll\frac{1+\log q}{\log N}$ , and this concludes the proof. ∎

We also need the following related result.

Lemma A.3.

Let $q,b$ be coprime positive integers and let $H$ be a further positive integer parameter. Let $f:\mathbf{N}\rightarrow\mathbf{C}$ be a 1-bounded function. Then

\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h\in[H]}f(qn+bh)=\mathbb{E}^{\log}_{n\in[N]}f(n)+O\Big(\frac{1+\log q+\log bH}{\log N}\Big)+O\big(\frac{q}{H}\big).

Proof.

Clearly we may assume that $H\geqslant q$ , as the result is trivial otherwise. If we replace $H$ by $\tilde{H}:=q\lfloor H/q\rfloor$ , the LHS changes by at most $O(q/H)$ . It therefore suffices to consider the case $q\mid H$ . In this case we establish the result without the $O(q/H)$ error term. We have $f(qn+bh)=f(q(n+\sigma_{h})+(bh)_{q})$ , where $(bh)_{q}$ denotes the unique element of $\{0,1,\dots,q-1\}$ congruent to $bh(\operatorname{mod}\,q)$ , and $\sigma_{h}:=\frac{1}{q}(bh-(bh)_{q})$ . By ˜A.2, we have

\mathbb{E}^{\log}_{n\in[N]}f(q(n+\sigma_{h})+(bh)_{q})=\mathbb{E}^{\log}_{n\in[N]}f(qn+(bh)_{q})+O\Big(\frac{1+\log bH}{\log N}\Big).

However, since $b$ is coprime to $q$ , $(bh)_{q}$ ranges over $\{0,1,\dots,q-1\}$ as $h$ ranges over any interval of length $q$ , and $[H]$ is a union of $H/q$ such intervals. Hence

\mathbb{E}_{h\in[H]}\mathbb{E}^{\log}_{n\in[N]}f(qn+(bh)_{q})=\mathbb{E}_{a\in\{0,1,\dots,q-1\}}\mathbb{E}^{\log}_{n\in[N]}f(qn+a).

The result now follows from Lemma˜A.2. ∎
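Two elementary facts drive the proof of Lemma A.3: the decomposition $qn+bh=q(n+\sigma_{h})+(bh)_{q}$ , and the fact that $h\mapsto(bh)_{q}$ hits every residue exactly once on any block of $q$ consecutive integers when $b$ is coprime to $q$ . Both can be checked directly; the sketch below does so for small parameters, with function names and sample values that are illustrative assumptions only.

```python
from math import gcd

def split(q, b, h):
    """Return (sigma_h, (bh)_q) with b*h = q*sigma_h + (bh)_q and 0 <= (bh)_q < q."""
    r = (b * h) % q
    sigma = (b * h - r) // q
    return sigma, r

def block_hits_every_residue(q, b, start):
    """True if h -> (bh)_q is a bijection from {start, ..., start+q-1} onto {0, ..., q-1}."""
    return sorted((b * h) % q for h in range(start, start + q)) == list(range(q))

q, b = 12, 5  # coprime sample values (assumed for illustration)
assert gcd(q, b) == 1

# Decomposition identity for a range of n and h.
decomposition_ok = all(
    q * n + b * h == q * (n + split(q, b, h)[0]) + split(q, b, h)[1]
    for n in range(1, 20)
    for h in range(1, 50)
)

# Every block of q consecutive h covers all residues exactly once.
coverage_ok = all(block_hits_every_residue(q, b, s) for s in range(1, 40))
print(decomposition_ok, coverage_ok)
```

The coprimality hypothesis is genuinely needed: with $q=12$ and $b=4$ the residues $(bh)_{q}$ only take the values $0,4,8$ .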

The next result states that logarithmic averages are essentially preserved under dilations. This is standard and appears, for instance, as [Ric25, Lemma 2.1].

Lemma A.4.

Let $f:\mathbf{N}\to\mathbf{C}$ be $1$ -bounded and let $q\in\mathbf{N}$ . Then

\Big|\mathbb{E}_{n\in[N]}^{\log}\big(f(n)-q\mathbf{1}_{q|n}f(n/q)\big)\Big|\ll\frac{\log q}{\log N}.

Proof.

When $q=1$ the result is trivial, so suppose $q\geqslant 2$ . By definition,

\mathbb{E}_{n\in[N]}^{\log}\big(f(n)-q\mathbf{1}_{q|n}f(n/q)\big)=\frac{1}{H_{N}}\sum_{n\in[N]}\frac{f(n)-q\mathbf{1}_{q|n}f(n/q)}{n}=\frac{1}{H_{N}}\Big(\sum_{n\in[N]}\frac{f(n)}{n}-\sum_{n^{\prime}\in[N/q]}\frac{f(n^{\prime})}{n^{\prime}}\Big).

This is bounded by $\leqslant\frac{1}{H_{N}}\big(H_{N}-H_{N/q}\big)=\frac{1}{H_{N}}(\log q+O(1))$ , and the result follows. ∎
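For $f=1$ the left-hand side of Lemma A.4 is exactly $(H_{N}-H_{\lfloor N/q\rfloor})/H_{N}$ , so the lemma is sharp up to constants in this case. The following sketch evaluates the dilation defect directly from the definition; the function name, the value $N=20000$ and the sample moduli are assumptions chosen only to keep the computation small.

```python
import math

def dilation_defect(f, N, q):
    """E^log_{n in [N]} ( f(n) - q * 1_{q | n} * f(n/q) ), computed from the definition."""
    H_N = sum(1.0 / n for n in range(1, N + 1))
    total = 0.0
    for n in range(1, N + 1):
        term = f(n)
        if n % q == 0:
            term -= q * f(n // q)
        total += term / n
    return total / H_N

N = 20_000
for q in (2, 7, 100):
    defect = dilation_defect(lambda n: 1.0, N, q)  # equals (H_N - H_{N//q}) / H_N
    print(q, defect, math.log(q) / math.log(N))
```

In each of these cases the computed defect is positive and lies slightly below $\log q/\log N$ , consistent with $H_{N}-H_{N/q}=\log q+O(1)$ .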

We next require the logarithmic version of Elliott’s inequality. The proof is exactly that given in [Ric25, Corollary 2.3] modulo tracking error terms.

Lemma A.5.

Let $\mathscr{P}$ be a finite set of primes, all bounded by $P$ . Let $f:\mathbf{N}\to\mathbf{C}$ be $1$ -bounded. We have that

\Big|\mathbb{E}_{n\in[N]}^{\log}f(n)-\mathbb{E}_{n\in[N],p\in\mathscr{P}}^{\log}f(pn)\Big|\ll\frac{\log P}{\log N}+\Big(\sum_{p\in\mathscr{P}}\frac{1}{p}\Big)^{-1/2}.

Proof.

By Lemma˜A.4 applied with $\tilde{f}(n):=f(pn)$ , for each $p\in\mathscr{P}$ we have

\mathbb{E}_{n\in[N]}^{\log}f(pn)=p\mathbb{E}^{\log}_{n\in[N]}\mathbf{1}_{p|n}f(n)+O\big(\frac{\log P}{\log N}\big).

Therefore by Cauchy–Schwarz we have

\begin{aligned}\Big|\mathbb{E}_{n\in[N]}^{\log}f(n)-\mathbb{E}_{n\in[N],p\in\mathscr{P}}^{\log}f(pn)\Big|&=\Big|\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{p\in\mathscr{P}}^{\log}f(n)(p\mathbf{1}_{p|n}-1)\Big|+O\Big(\frac{\log P}{\log N}\Big)\\ &\leqslant\Big(\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{p\in\mathscr{P}}^{\log}(p\mathbf{1}_{p|n}-1)\big|^{2}\Big)^{1/2}+O\Big(\frac{\log P}{\log N}\Big).\end{aligned}

By [Ric25, Proposition 2.2], we have that

\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{p\in\mathscr{P}}^{\log}(p\mathbf{1}_{p|n}-1)\big|^{2}\leqslant 9\Big(\sum_{p\in\mathscr{P}}\frac{1}{p}\Big)^{-1},

and the result follows. ∎
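The second-moment bound quoted from [Ric25, Proposition 2.2] is of Turán–Kubilius type, and it can be sanity-checked numerically for small parameters. The sketch below assumes that $\mathbb{E}^{\log}_{p\in\mathscr{P}}$ denotes the average with weights proportional to $1/p$ (this normalisation is an assumption here, since the definition is not restated in this appendix); under that convention $\mathbb{E}^{\log}_{p\in\mathscr{P}}(p\mathbf{1}_{p|n}-1)=(\omega_{\mathscr{P}}(n)-S)/S$ , where $S=\sum_{p\in\mathscr{P}}\frac{1}{p}$ and $\omega_{\mathscr{P}}(n)$ counts the primes of $\mathscr{P}$ dividing $n$ . The function names and the sample values $N=20000$ , $P=50$ are illustrative.

```python
def primes_up_to(limit):
    """Primes <= limit by trial division (adequate for small limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def second_moment(N, primes):
    """Return (E^log_n |E^log_p (p*1_{p|n} - 1)|^2, S) with S = sum of 1/p.

    Assumes E^log_p weights each prime p proportionally to 1/p.
    """
    S = sum(1.0 / p for p in primes)
    H_N = sum(1.0 / n for n in range(1, N + 1))
    total = 0.0
    for n in range(1, N + 1):
        omega = sum(1 for p in primes if n % p == 0)
        # Under 1/p weights the inner average equals (omega - S) / S.
        total += ((omega - S) / S) ** 2 / n
    return total / H_N, S

lhs, S = second_moment(20_000, primes_up_to(50))
print(lhs, 9 / S)
```

This is only a consistency check at one small scale, not evidence about the constant $9$ ; the point is that the left-hand side decays like $S^{-1}$ because the variance of $\omega_{\mathscr{P}}$ is of order $S$ .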

We end with a lemma regarding the behaviour of $U^{1}_{\log}[N;q,H]$ under replacing $q$ by a multiple or shrinking the interval length $H$ .

Lemma A.6.

Suppose that $q\mid\tilde{q}$ and that $\tilde{H}\tilde{q}<Hq<N/2$ . Then

\|f\|_{U^{1}_{\log}[N;q,H]}\leqslant\|f\|_{U^{1}_{\log}[N;\tilde{q},\tilde{H}]}+O\Big(\frac{\log|Hq|}{\log N}\Big)+O\Big(\frac{\tilde{H}\tilde{q}}{Hq}\Big).

Proof.

First observe that

\mathbb{E}_{h\in[H]}f(n+hq)=\mathbb{E}_{h\in[H],\tilde{h}\in[\tilde{H}]}f(n+hq+\tilde{h}\tilde{q})+O\Big(\frac{\tilde{H}\tilde{q}}{Hq}\Big)

by the assumptions and ˜A.1. Substituting into the definition of $\|f\|_{U^{1}_{\log}[N;q,H]}$ , we have

\|f\|_{U^{1}_{\log}[N;q,H]}^{2}=\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{h\in[H],\tilde{h}\in[\tilde{H}]}f(n+hq+\tilde{h}\tilde{q})\big|^{2}+O\Big(\frac{\tilde{H}\tilde{q}}{Hq}\Big).

By Cauchy–Schwarz,

\|f\|_{U^{1}_{\log}[N;q,H]}^{2}\leqslant\mathbb{E}_{n\in[N]}^{\log}\mathbb{E}_{h\in[H]}\big|\mathbb{E}_{\tilde{h}\in[\tilde{H}]}f(n+hq+\tilde{h}\tilde{q})\big|^{2}+O\Big(\frac{\tilde{H}\tilde{q}}{Hq}\Big).

However by ˜A.2, for each $h$ we have

\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{\tilde{h}\in[\tilde{H}]}f(n+hq+\tilde{h}\tilde{q})\big|^{2}=\mathbb{E}_{n\in[N]}^{\log}\big|\mathbb{E}_{\tilde{h}\in[\tilde{H}]}f(n+\tilde{h}\tilde{q})\big|^{2}+O\Big(\frac{\log|hq|}{\log N}\Big).

Averaging over $h\in[H]$ gives the result. ∎

Appendix B An exponential sum estimate over the primes

In this appendix we prove a log-free exponential sum estimate for the von Mangoldt function with polynomial phase.

Lemma B.1.

Let $m\in\mathbf{N}$ and $\varepsilon\in(0,\frac{1}{2})$ . Suppose that

\big|\sum_{n\leqslant X}\Lambda(n)e(n^{m}\theta)\big|\geqslant\varepsilon X.

(B.1)

Then there is some $q\in\mathbf{N}$ such that

q\leqslant\varepsilon^{-O_{m}(1)}\quad\mbox{and}\quad\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant\varepsilon^{-O_{m}(1)}X^{-m}.

(B.2)

Proof.

We proceed via the weaker result with ˜B.2 replaced by

q\leqslant\big(\frac{\log X}{\varepsilon}\big)^{O_{m}(1)}\quad\mbox{and}\quad\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant\big(\frac{\log X}{\varepsilon}\big)^{O_{m}(1)}X^{-m}.

(B.3)

This is a standard application of the method of Type I/II sums. However, some sources in the literature such as [Har81] lose factors of $X^{o(1)}$ instead of a power of $\log X$ via an invocation of the divisor bound in the proof of Weyl’s inequality. This loss can be avoided with a little care, but it is hard to find a convenient source in the literature. One may find an essentially equivalent argument (with the polynomial phase $e(n^{m}\theta)$ replaced by a general nilsequence) in [GT-mobius]. The key point is that [GT-mobius, Proposition 3.1] holds verbatim if the Möbius function $\mu$ is replaced by $\Lambda$ . This is established in a standard fashion as in the proof of [GT-mobius, Proposition 3.1] (which is outsourced to [GT08, Section 4], which itself is derivative of standard expositions such as [IK-book, Chapter 13]) by using Vaughan’s identity for $\Lambda$ rather than the variant for $\mu$ . One may now run the arguments of [GT-mobius, Section 3]; in this context most of the language of nilmanifolds is redundant since $e(n^{m}\theta)$ is a nilsequence on the abelian torus $\mathbf{R}/\mathbf{Z}$ . In particular the ‘complexity’ parameter $Q$ is simply $O(1)$ . The conclusion of [GT-mobius, Section 3] is then that, starting from ˜B.1, and setting $\delta:=\varepsilon/\log X$ , there is some $q\ll_{m}\delta^{-O_{m}(1)}$ such that we have $\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\ll_{m}\delta^{-O_{m}(1)}X^{-m}$ ; this is exactly ˜B.3 (noting here that the $\ll$ can be upgraded to $\leqslant$ at the expense of worsening exponents since $\delta\lll 1$ ).

If $\varepsilon\leqslant(\log X)^{-1}$ then $\log X\leqslant\varepsilon^{-1}$ , and so ˜B.3 immediately implies ˜B.2 (after adjusting the exponents $O_{m}(1)$ ). To complete the proof of Lemma˜B.1, it therefore suffices to handle the case $\varepsilon\geqslant(\log X)^{-1}$ . In this case, from ˜B.3 we certainly have $q\leqslant(\log X)^{O_{m}(1)}$ and $\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant(\log X)^{O_{m}(1)}X^{-m}$ , and one can then obtain an asymptotic for the exponential sum $\sum_{n\leqslant X}\Lambda(n)e(n^{m}\theta)$ using the Siegel–Walfisz theorem on the distribution of $\Lambda$ in progressions $(\operatorname{mod}\,q)$ . These arguments are carried out in detail in work of Hua [Hua38]. Summarising briefly, the main term of this asymptotic at $\theta=\frac{a}{q}+\eta$ will be $\frac{X}{\phi(q)}S(a,q)\nu(\eta X^{m})$ where $\nu(y)=\int^{1}_{0}e(yx^{m})dx$ satisfies an appropriate van der Corput estimate and $S(a,q):=\sum_{b\in(\mathbf{Z}/q\mathbf{Z})^{*}}e(ab^{m}/q)$ satisfies $|S(a,q)|\ll_{m}q^{1/2+o_{m}(1)}$ . The assumption ˜B.1 therefore forces both $\eta X^{m}\ll\varepsilon^{-1}$ and $q\ll_{m}\varepsilon^{-2-o_{m}(1)}$ . ∎
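The square-root cancellation in $S(a,q)$ is easy to observe numerically when $q$ is prime. In that case the classical Weil bound for monomial sums gives $|\sum_{b\,(\operatorname{mod}\,q)}e(ab^{m}/q)|\leqslant(m-1)\sqrt{q}$ for $(a,q)=1$ , hence $|S(a,q)|\leqslant(m-1)\sqrt{q}+1$ . The sketch below checks only this prime-modulus special case, not the general bound $q^{1/2+o_{m}(1)}$ used in the proof; the function name and the sample moduli are illustrative assumptions.

```python
import cmath
import math

def complete_sum(a, q, m):
    """S(a, q) = sum over b in (Z/qZ)^* of e(a * b^m / q)."""
    total = 0j
    for b in range(1, q):
        if math.gcd(b, q) == 1:
            residue = (a * pow(b, m, q)) % q
            total += cmath.exp(2j * cmath.pi * residue / q)
    return total

# Prime moduli only: compare |S(a, q)| with the bound (m - 1) * sqrt(q) + 1.
for q in (7, 13, 101):
    for m in (2, 3):
        worst = max(abs(complete_sum(a, q, m)) for a in range(1, q))
        print(q, m, worst, (m - 1) * math.sqrt(q) + 1)
```

For $m=3$ and $q=101$ the map $b\mapsto b^{3}$ is a bijection on $(\mathbf{Z}/q\mathbf{Z})^{*}$ , so the sum equals $-1$ for every $a$ coprime to $q$ .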

Finally we give the case $k=1$ of (a slight generalisation of) Lemma˜3.2, which was used in the proof of the case $k\geqslant 2$ of that result. This can be quickly deduced from Lemma˜B.1 as a consequence of partial summation.

Lemma B.2.

Let $m\in\mathbf{N}$ and $\delta,\eta\in(0,\frac{1}{2})$ . Suppose that

\big|\sum_{X\leqslant p<(1+\eta)X}e(p^{m}\theta)\big|\geqslant\frac{\delta\eta X}{\log X}.

(B.4)

Then there is some $q\in\mathbf{N}$ such that $q\leqslant(\eta\delta)^{-O_{m}(1)}$ and $\|\theta q\|_{\mathbf{R}/\mathbf{Z}}\leqslant(\eta\delta)^{-O_{m}(1)}X^{-m}$ .

Proof.

The result is trivial if $\delta\eta\leqslant X^{-1/10}$ (say), so suppose this is not the case. We may replace the assumption ˜B.4 by

\Big|\sum_{X\leqslant n<(1+\eta)X}\frac{\Lambda(n)}{\log n}e(n^{m}\theta)\Big|\geqslant\frac{\delta\eta X}{2\log X}.

(The loss of a further factor of 2 here comes from the essentially negligible contribution of the prime power support of $\Lambda$ .) Now for $n\geqslant X$ we have $\frac{1}{\log n}=\frac{1}{\log X}-\int^{n}_{X}\frac{dt}{t(\log t)^{2}}$ . Substituting in and applying the triangle inequality gives

\frac{1}{\log X}\Big|\sum_{X\leqslant n<(1+\eta)X}\Lambda(n)e(n^{m}\theta)\Big|+\int^{2X}_{X}\frac{dt}{t(\log t)^{2}}\Big|\sum_{t\leqslant n<(1+\eta)X}\Lambda(n)e(n^{m}\theta)\Big|\geqslant\frac{\delta\eta X}{2\log X}.

By further applications of the triangle inequality and $\int^{2X}_{X}\frac{dt}{t(\log t)^{2}}\leqslant\frac{1}{\log X}$ it follows that

\sup_{Y\in[X,2X]}\Big|\sum_{n\leqslant Y}\Lambda(n)e(n^{m}\theta)\Big|\geqslant\delta\eta X/16.

The desired conclusion ˜3.2 now follows from Lemma˜B.1. ∎

B.1. Effectivity

The proof outline above for Lemma˜B.1 gives ineffective bounds due to the invocation of the Siegel–Walfisz theorem in Hua’s work. However, one can replace this with a version of the prime number theorem in progressions incorporating an additional correction term for a potential Siegel zero such as [IK-book, Equation (5.71)]. Specifically, for $q\leqslant e^{\sqrt{\log X}}$ and $(b,q)=1$ we have

\sum_{n\leqslant X:n\equiv b(\mbox{\scriptsize mod}\,q)}\Lambda(n)=\frac{X}{\phi(q)}-\frac{\overline{\chi(b)}}{\phi(q)}\frac{X^{\beta}}{\beta}+O(Xe^{-c\sqrt{\log X}}),

where here $\chi$ is some quadratic Dirichlet character for which $L(s,\chi)$ has a Siegel zero $\beta$ . The Siegel zero term introduces a secondary main term in Hua’s asymptotic formula, now of the form $-\frac{X^{\beta}}{\phi(q)}\tilde{S}(a,q)\tilde{\nu}(\eta X^{m})$ , where $\tilde{S}(a,q)=\sum_{b(\mbox{\scriptsize mod}\,q),(b,q)=1}\overline{\chi(b)}e(ab^{m}/q)$ and $\tilde{\nu}(y)=\int^{1}_{0}x^{\beta-1}e(x^{m}y)dx$ . These terms satisfy similar estimates to $S,\nu$ in the Hua analysis, allowing us to draw an analogous conclusion.

\begin{aligned}\mathbb{E}_{p_{1}^{\prime}\in\mathscr{P}_{1}^{\prime},p^{\prime}_{2}\in\mathscr{P}^{\prime}_{2}}\Big|\mathbb{E}_{x\in[X,2X)}&\psi(x)\mathbb{E}_{n\in I}f_{1}\big(np^{\prime}_{2}+\lambda xp^{\prime}_{1}\big)\overline{f_{1}\big(np^{\prime}_{1}+\lambda xp^{\prime}_{2}\big)}\Big|\\ &\ll\min\Big(\mathbb{E}_{x\in[X,2X)}|\psi(x)|,\frac{k}{X^{c}}\|\widehat{\psi}\|_{\infty}^{c}\|\psi\|_{\infty}^{1-c}+(\log X)^{-C_{2}}\|\psi\|_{\infty}\Big).\end{aligned}

(5.15)

\begin{aligned}\Big(4\int|\widehat{\psi}(t)|^{4}~dt\Big)&\int\Big|\widehat{\mu_{[H]}}(q_{1}(\theta_{1}-\theta_{3}))\widehat{\mu_{[H]}}(q_{1}(\theta_{2}-\theta_{4}))\\ &\qquad\qquad\times\widehat{\mu_{[H]}}(q_{2}(\theta_{1}-\theta_{2}))\widehat{\mu_{[H]}}(q_{2}(\theta_{3}-\theta_{4}))\Big|d\theta_{1}d\theta_{2}d\theta_{3}d\theta_{4}\gg(\tau/k)^{O(1)}.\end{aligned}

(5.21)

\begin{aligned}\|\Pi_{i,j}^{\operatorname{lrg}}f-\Pi_{i,j}^{\operatorname{sml}}f\|^{2}&\leqslant\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}-\|\Pi_{i,j}^{\operatorname{sml}}f\|^{2}+O\Big(\frac{\log Q_{j}H_{i_{+},0}}{\log N}\Big)+O\Big(\frac{Q_{j}H_{i,0}}{Q_{j-1}H_{i_{+},0}}\Big)\\ &\leqslant\|\Pi_{i,j}^{\operatorname{lrg}}f\|^{2}-\|\Pi_{i,j}^{\operatorname{sml}}f\|^{2}+r^{-10}.\end{aligned}

(7.15)
