{\huge Getting to the Roots of Factoring}

We revisit a paper from 1994 \vspace{.25in}

Richard Lipton is, among so many other things, a newlywed. He and Kathryn Farley were married on June 4th in Atlanta. The wedding was attended by family and friends including many faculty from Georgia Tech, some from around the country, and even one of Dick's former students coming from Greece. Their engagement was noted here last St. Patrick's Day, and Kathryn was previously mentioned in a relevantly-titled post on cryptography.

Today we congratulate him and Kathryn, and as part of our tribute, revisit a paper of his on factoring from 1994.

They have just come back from their honeymoon in Paris. Paris is many wonderful things: a flashstone of history, a center of culture, a city for lovers. It is also the setting for most of Dan Brown's novel The Da Vinci Code and numerous other conspiracy-minded thrillers. Their honeymoon was postponed by an event that could be a plot device in these novels: the Seine was flooded enough to close the Louvre and Musée D'Orsay and other landmarks until stored treasures could be brought to safe higher ground.

It is fun to read or imagine stories of cabals seeking to collapse world systems and achieve domination. Sometimes these stories turn on scientific technical advances, even purely mathematical points as in Brown's new novel, Inferno. It needs a pinch to realize that we as theorists often verge on some of these points. Computational complexity theory as we know it is asymptotic and topical, so it is a stretch to think that papers such as the present one impact the daily work of those guarding the security of international commerce or investigating possible threats. But from its bird's-eye view there is always the potential to catch a new glint of light reflected from the combinatorial depths that could not be perceived until the sun and stars align right. In this quest we take a spade to dig up old ideas anew.

\includegraphics[width=3.5in]{PeiPyramidPexel.jpg}
{\small Pei's Pyramid of the Louvre Court = Phi delves out your prime factor...}

The Paper and the Code

The paper is written in standard mathematical style: first a theorem statement with hypotheses, next a series of lemmas, and the final algorithm and its analysis coming at the very end. We will reverse the presentation by beginning with the algorithm and treating the final result as a mystery to be decoded.

Here is the code of the algorithm. It all fits on one sheet and is self-contained; no abstruse math text or Rosetta Stone is needed to decipher. The legend says that the input $latex {x}&fg=000000$ is a product of two prime numbers, $latex {\phi}&fg=000000$ is a polynomial in just one variable, and $latex {\gcd}&fg=000000$ refers to the greatest-common-divisor algorithm expounded by Euclid around 300 B.C. Then come the runes, which could not be simpler:

Repeat until exit:
$latex {a := }&fg=000000$ a random number in $latex {1,\dots,x-1}&fg=000000$;
$latex {b := \phi(a) \bmod{x}}&fg=000000$;
if $latex {(\gcd(b,x) > 1)}&fg=000000$ then exit.

Exiting enables carrying out the two prime factors of $latex {x}&fg=000000$---but a final message warns of a curse of vast unknowable consequences.

How many iterations must one expect to make through this maze before exit? How and when can the choice of the polynomial $latex {\phi}&fg=000000$ speed up the exploration? That is the mystery.

Our goal is to expose the innards of how the paper works, so that its edifice resembles another famous modern Paris landmark:

\includegraphics[width=3.5in]{PompidouCentre.jpg}

This is the Georges Pompidou Centre, whose anagram ``go count degree prime ops'' well covers the elements of the paper. Part of the work for this post---in particular the possibility of improving $latex {|p| = n^{\epsilon}}&fg=000000$ to $latex {|p| = n(\frac{1}{2} - \epsilon)}&fg=000000$---is by my newest student at Buffalo, Chaowen Guan.

A First Analysis

Let $latex {x = pq}&fg=000000$ with $latex {p}&fg=000000$ and $latex {q}&fg=000000$ prime. To get the expected running time, it suffices to have good lower and upper bounds

$latex \displaystyle \pi_p \leq \Pr_a[\phi(a) \equiv 0 \!\!\!\!\pmod{p}] \leq \rho_p &fg=000000$

and analogous bounds $latex {\pi_q,\rho_q}&fg=000000$ for $latex {q}&fg=000000$. Then the probability $latex {P}&fg=000000$ of success on any trial $latex {a}&fg=000000$ is at least

$latex \displaystyle \pi_p - \rho_q. &fg=000000$

This lower bounds the probability of $latex {b \equiv 0 \!\!\pmod{p} \wedge b \not\equiv 0 \!\!\pmod{q}}&fg=000000$, whereupon $latex {\gcd(b,x)}&fg=000000$ gives us the factor $latex {p}&fg=000000$. We could add a term $latex {\pi_q - \rho_p}&fg=000000$ for the other way to have success, which is $latex {b \not\equiv 0 \!\!\pmod{p} \wedge b \equiv 0 \!\!\pmod{q}}&fg=000000$. However, our strategy will be to make $latex {\rho_q}&fg=000000$ and hence $latex {\pi_q}&fg=000000$ close to $latex {0}&fg=000000$ by considering cases where $latex {p \ll q}&fg=000000$ but $latex {p}&fg=000000$ is still large enough to matter. Then we can ignore this second possibility and focus on $latex {\pi_p - \rho_q}&fg=000000$. At the end we will consider relaxing just so $latex {\rho_q}&fg=000000$ is bounded away from $latex {1}&fg=000000$.

Note that we cannot consider the events $latex {b \equiv 0 \!\!\pmod{p}}&fg=000000$ and $latex {b \equiv 0 \!\!\pmod{q}}&fg=000000$ to be independent, even though $latex {p}&fg=000000$ and $latex {q}&fg=000000$ are prime, because $latex {b = \phi(a)}&fg=000000$ and $latex {\phi}&fg=000000$ may introduce bias. We could incidentally insert an initial text for $latex {\gcd(a,x) > 1}&fg=000000$ without affecting the time or improving the success probability by much. Then conditioned on its failure, the events $latex {a \equiv 0 \!\!\pmod{p}}&fg=000000$ and $latex {a \equiv 0 \!\!\pmod{q}}&fg=000000$ become independent via the Chinese Remainder Theorem. This fact is irrelevant to the algorithm but helps motivate the analysis in part.

This first analysis thus focuses the question to become:

How does computing $latex {b = \phi(a)}&fg=000000$ change the sampling? We mention in passing that Peter Shor's algorithm basically shows that composing a certain function $latex {\phi(a)}&fg=000000$ into the quantum Fourier transform greatly improves the success of the sampling. This requires, however, a special kind of machine that, according to some of its principal conceivers, harnesses the power of multiple universes. There are books and even a movie about such machines, but none have been built yet and this is not a Dan Brown novel so we'll stay classically rooted.

Using Polynomials Roots

Two great facts about polynomials $latex {\phi(z)}&fg=000000$ of degree $latex {d}&fg=000000$ are:

There are guaranteed to be at most $latex {d}&fg=000000$ roots.
If $latex {u \equiv v \!\!\pmod{p}}&fg=000000$, then $latex {\phi(u) \equiv \phi(v) \!\!\pmod{p}}&fg=000000$.

The second requires the coefficients of $latex {\phi}&fg=000000$ to be integers. Neither requires all the roots to be integers, but we will begin by assuming this is the case. Take $latex {R}&fg=000000$ to be the set of integer roots of $latex {\phi}&fg=000000$. Then define

$latex \displaystyle S_p = \{u + cp \!\!\!\!\pmod{x} : u \in R \wedge c \in [q]\}, &fg=000000$

where as usual $latex {[q] = \{0,\dots,q-1\}}&fg=000000$. The key point is that

$latex \displaystyle P \geq \frac{1}{x}(|S_p \setminus R|) - \rho_q. &fg=000000$

To prove this, suppose the random $latex {a}&fg=000000$ belongs to $latex {S_p}&fg=000000$ but not to $latex {R}&fg=000000$. Then $latex {a = u + cp}&fg=000000$ for some $latex {u \in R}&fg=000000$, so $latex { \phi(a) \equiv \phi(u + cp) \equiv \phi(u) = 0 \!\!\pmod{p}}&fg=000000$, but $latex {\phi(a) \neq 0}&fg=000000$ since $latex {a \notin R}&fg=000000$. There is still the possibility that $latex {\phi(a)}&fg=000000$ is a nonzero multiple of $latex {x}&fg=000000$, which would give $latex {b = 0}&fg=000000$ and deny success, but this entails $latex {\phi(a) \equiv 0 \!\!\pmod{q}}&fg=000000$ and so this is accounted by subtracting off $latex {\rho_q}&fg=000000$.

Our lower bound $latex {\pi_p}&fg=000000$ will be based on $latex {\frac{1}{x}(|S_p \setminus R|)}&fg=000000$. There is one more important element of the analysis. We do not have to bound the running time for all $latex {x}&fg=000000$, that is, all pairs $latex {(p,q)}&fg=000000$ of primes. The security of factoring being hard is needed for almost all $latex {x}&fg=000000$. Hence to challenge this, it suffices to show that $latex {\pi_p}&fg=000000$ is large in average case over $latex {p}&fg=000000$. Thus we are estimating the distributional complexity of a randomized algorithm. These are two separate components of the analysis. We will show:

For many primes $latex {p}&fg=000000$ belonging to a large set $latex {\mathcal{P}}&fg=000000$ of primes of length substantially below $latex {n/2}&fg=000000$, where $latex {n}&fg=000000$ is the length of $latex {x}&fg=000000$, $latex {\pi_p}&fg=000000$ is ``large.''

We will quantify ``large'' at the end, and it will follow that since $latex {q}&fg=000000$ is substantially greater, $latex {\rho_q}&fg=000000$ is ``tiny'' in the needed sense. Now we are ready to estimate the key cardinality $latex {|S_p \setminus R|}&fg=000000$.

The Lever

In the best case, $latex {|S_p|}&fg=000000$ can be larger than $latex {|R|}&fg=000000$ by a factor of $latex {q}&fg=000000$. This happens if for every root $latex {u}&fg=000000$, the values $latex {u + cp \!\!\pmod{x}}&fg=000000$ do not hit any other members of $latex {R}&fg=000000$. When this happens, $latex {|S_p|}&fg=000000$ itself can be as large as $latex {x}&fg=000000$. Then

$latex \displaystyle \pi_p = \frac{1}{x} (q - 1)|R| \approx 1. &fg=000000$

By a similar token, for any $latex {\delta > 0}&fg=000000$, if $latex {|S_p \setminus R| > \delta q|R|}&fg=000000$, then $latex {\pi_p > \delta}&fg=000000$---or in general,

$latex \displaystyle \pi_p > \delta\frac{|R|}{p}. &fg=000000$

The factor of $latex {(q-1)}&fg=000000$ is the lever by which to gain a higher likelihood of quick success. When will it be at our disposal? It depends on whether $latex {p}&fg=000000$ is ``good'' in the sense that $latex {|S_p \setminus R| \geq \delta q |R|}&fg=000000$ and also on $latex {R}&fg=000000$ itself being large enough.

For each root $latex {u \in R}&fg=000000$ and $latex {p}&fg=000000$, define the ``strand'' $latex {S_{u,p} = \{v_c : c \in [q]\}}&fg=000000$ where $latex {v_c = u + cp \!\!\pmod{x}}&fg=000000$. There are always $latex {q}&fg=000000$ distinct values in any strand. If $latex {|R| \ll q}&fg=000000$ then every strand has most $latex {v_c}&fg=000000$ as non-roots. There is still the possibility that $latex {\phi(v_c) \equiv 0 \!\!\pmod{x}}&fg=000000$---that is, such that $latex {\phi(v_c) \equiv 0 \!\!\pmod{q}}&fg=000000$---which would prevent a successful exit. This is where $latex {p \ll q}&fg=000000$ really comes in, attending to the upper bound $latex {\rho_q}&fg=000000$.

\includegraphics[width=3.5in]{StSulpiceAndCrypt.png}

Hence what can make a prime $latex {p}&fg=000000$ ``bad'' is having a low number of strands. When $latex {v = u + cp}&fg=000000$ and $latex {v \in R}&fg=000000$ the strands $latex {S_{u,p}}&fg=000000$ and $latex {S_{v,p}}&fg=000000$ coincide---and this happens for any other $latex {p'}&fg=000000$ such that $latex {p'}&fg=000000$ divides $latex {v - u}&fg=000000$. Here is where we hit the last important requirement on $latex {R}&fg=000000$. Suppose $latex {v = u + Cp}&fg=000000$ where $latex {C}&fg=000000$ is the product of every prime $latex {p' < x}&fg=000000$ other than $latex {p}&fg=000000$. Then $latex {S_{u,p'}}&fg=000000$ and $latex {S_{v,p'}}&fg=000000$ coincide for every prime $latex {p' < x}&fg=000000$. It doesn't matter that $latex {v}&fg=000000$ is astronomically bigger than $latex {x}&fg=000000$ or $latex {x' = p' q}&fg=000000$; the strands still coincide within $latex {[x]}&fg=000000$ and within $latex {[x']}&fg=000000$.

Hence what we need to do is bound the roots by some value $latex {L}&fg=000000$ that is greater than any $latex {x'}&fg=000000$ we are likely to encounter. The $latex {x'}&fg=000000$ is not too great: if we limit to $latex {p'}&fg=000000$ of some same given length as that of $latex {p}&fg=000000$, then $latex {p' < 2p}&fg=000000$ so $latex {x' < 2x}&fg=000000$. We need not impose the requirement $latex {R \subseteq [L]}&fg=000000$ but must replace $latex {|R|}&fg=000000$ above by $latex {r = |R'|}&fg=000000$ where $latex {R' = R \cap [L]}&fg=000000$. We can't get in trouble from $latex {w \in R \setminus R'}&fg=000000$ such that $latex {p}&fg=000000$ divides $latex {w - u}&fg=000000$ and $latex {p}&fg=000000$ divides $latex {w - v}&fg=000000$ since then $latex {p}&fg=000000$ divides $latex {v - u}&fg=000000$ already. This allows the key observation:

For any distinct pair $latex {u,v \in R'}&fg=000000$, there are at most $latex {\log(L)}&fg=000000$ primes $latex {p}&fg=000000$ such that $latex {p}&fg=000000$ divides $latex {v - u}&fg=000000$.

Thus given $latex {R'}&fg=000000$ we have $latex {\binom{r}{2}\log(L)}&fg=000000$ ``slots'' for primes $latex {p}&fg=000000$. Every bad prime must occupy a certain number of these slots. Counting these involves the last main ingredient in Dick's paper. We again try to view it a different way.

Counting Bad Primes

Given $latex {\delta}&fg=000000$, and replacing the original $latex {R}&fg=000000$ with $latex {R' = R \cap [L]}&fg=000000$, ultimately we want to call a prime $latex {p}&fg=000000$ bad if $latex {|S_p \setminus R'| < \delta q r}&fg=000000$, where $latex {r = |R'|}&fg=000000$. We will approach this by calling $latex {p}&fg=000000$ ``bad'' if there are $latex {\leq \delta r}&fg=000000$ strands.

For intuition, let's suppose $latex {r = 6}&fg=000000$, $latex {R' = \{t,u,v,w,y,z\}}&fg=000000$. If we take $latex {\delta = \frac{1}{2}}&fg=000000$ as the paper does, then we can make $latex {p}&fg=000000$ bad by inserting it into three slots: say $latex {(t,u)}&fg=000000$, $latex {(v,w)}&fg=000000$, and $latex {(y,z)}&fg=000000$. We could instead insert a copy of $latex {p}&fg=000000$ into $latex {(t,u)}&fg=000000$, $latex {(u,v)}&fg=000000$, and $latex {(v,w)}&fg=000000$, which lumps $latex {\{t,u,v,w\}}&fg=000000$ into one strand and leaves $latex {y,z}&fg=000000$ free to make two others. In the latter case, however, we also know by transitivity that $latex {p}&fg=000000$ divides $latex {v = t}&fg=000000$, $latex {w - u}&fg=000000$, and $latex {w - t}&fg=000000$ as well. Thus we have effectively used up $latex {6}&fg=000000$ not $latex {3}&fg=000000$ slots on $latex {p}&fg=000000$. Now suppose $latex {\delta = \frac{1}{3}}&fg=000000$ instead, so ``bad'' means getting down to $latex {s = 2}&fg=000000$ strands. Now we are forced to create at least one $latex {3}&fg=000000$-clique and this means using more than $latex {r(1 - \delta) = 4}&fg=000000$ slots. Combinatorially the problem we are facing is:

Cover $latex {r}&fg=000000$ nodes by $latex {s}&fg=000000$ cliques while minimizing the total number $latex {m}&fg=000000$ of edges.

This problem has an easy answer: make all cliques as small as possible. Supposing $latex {\ell = r/s = 1/\delta}&fg=000000$ is an integer, this means making $latex {s}&fg=000000$-many $latex {\ell}&fg=000000$-cliques, which (ignoring the difference between $latex {\binom{\ell}{2}}&fg=000000$ and $latex {\ell^2 / 2}&fg=000000$) totals $latex {m \sim \ell^2 s/2 = \ell r/2 = r/(2\delta))}&fg=000000$ edges. When $latex {\delta}&fg=000000$ is constant this is $latex {\Theta(r)}&fg=000000$, but shows possible ways to improve when $latex {\delta}&fg=000000$ is not constant. We conclude:

Lemma 1 The number of bad primes $latex {p}&fg=000000$ is at most $latex {\frac{r^2 \log(L)/2}{r/(2\delta)} = \delta r \log(L)}&fg=000000$.

We will constrain $latex {r}&fg=000000$ by bounding the degree of $latex {\phi}&fg=000000$. By $latex {p \ll q}&fg=000000$ this will also bound $latex {r}&fg=000000$ relative to $latex {q}&fg=000000$ so that the number of possible strands is small with respect to $latex {q}&fg=000000$, which will lead to the desired bound on $latex {\rho_q}&fg=000000$. Now we are able to conclude the analysis well enough to state a result.

The Result

Define $latex {\mathsf{RAVGTIME}_{\gamma}[t(n)]}&fg=000000$ to mean problems solvable on a $latex {\gamma}&fg=000000$ fraction of inputs by randomized algorithms with expected time $latex {t(n)}&fg=000000$. Superscripting $latex {\phi}&fg=000000$ means having an oracle to compute $latex {\phi(a)}&fg=000000$ from $latex {a}&fg=000000$ for free. If $latex {\phi}&fg=000000$ is such that the time to compute $latex {\phi(a)}&fg=000000$ is $latex {t(n)}&fg=000000$ and this is done once per trial, then a given $latex {\mathsf{RAVGTIME}^{\phi}_{\gamma}[t(n)]}&fg=000000$ algorithm can be re-classified into $latex {\mathsf{RAVGTIME}_{\gamma}[t(n)^2]}&fg=000000$ without the oracle notation.

Theorem 2 Suppose $latex {[\phi_n]}&fg=000000$ is a sequence of polynomials in $latex {\mathbb{Z}[u]}&fg=000000$ of degrees $latex {d_n}&fg=000000$ having $latex {r_n}&fg=000000$ integer roots in the interval $latex {[0,L_n]}&fg=000000$, for all $latex {n}&fg=000000$. Then for any fixed $latex {\epsilon > 0}&fg=000000$, the problem of factoring $latex {n}&fg=000000$-bit integers $latex {x = pq}&fg=000000$ with $latex {|p| = \nu < \frac{n}{2} - \epsilon}&fg=000000$ belongs to
$latex \displaystyle \mathsf{RAVGTIME}^{\phi_n}_{\Omega(1)}[O(n^2 \log L_n)] &fg=000000$
provided $latex {d_n = \Theta(\frac{2^{\nu}}{\log L_n})}&fg=000000$ and $latex {r_n = \Omega(d_n)}&fg=000000$.

Proof: We first note that the probability of a random $latex {a \in [x]}&fg=000000$ making $latex {\phi(a) \equiv 0 \!\!\pmod{q}}&fg=000000$ is negligible. By the Chinese Remainder Theorem, as remarked above, $latex {a}&fg=000000$ gives independent draws $latex {a_p \in [p]}&fg=000000$ and $latex {a_q \in [q]}&fg=000000$ and whether $latex {\phi(a) \equiv 0 \!\!\pmod{q}}&fg=000000$ depends only on $latex {a_q}&fg=000000$. This induces a polynomial $latex {\phi'(a_q)}&fg=000000$ over the field $latex {\mathbb{Z}_q}&fg=000000$ of degree (at most) $latex {d_n}&fg=000000$. So the probability $latex {\rho_q}&fg=000000$ of getting a root mod $latex {q}&fg=000000$ is at most

$latex \displaystyle \frac{d_n}{q} < O(2^{\nu - (n - \nu)}) = O(2^{2\nu -n}) = O(\frac{1}{2^{2\epsilon n}}), &fg=000000$

which is exponentially vanishing. Thus we may ignore $latex {\rho_q}&fg=000000$ in the rest of the analysis. The chance of a randomly sampled $latex {a}&fg=000000$ in a strand of length $latex {q}&fg=000000$ coinciding with another member $latex {v}&fg=000000$ of $latex {R'}&fg=000000$ is likewise bounded by $latex {\frac{d_n}{q}}&fg=000000$ and hence ignorable.

The reason why the probability of $latex {a_p}&fg=000000$ giving a root in the field $latex {\mathbb{Z}_p}&fg=000000$ is not vanishing is that $latex {r_n}&fg=000000$ is close to $latex {p}&fg=000000$. By $latex {r_n \leq d_n}&fg=000000$, we satisfy the constraint

$latex \displaystyle r_n = O(\frac{2^{\nu}}{\log L_n}) = O(\frac{p}{\log(p)\log_p(L_n)}). &fg=000000$

The condition $latex {r_n = \Omega(d_n)}&fg=000000$ ensures that this is the actual asymptotic order of $latex {r_n}&fg=000000$. Since we are limiting attention to primes of the same length as $latex {p}&fg=000000$, the ``$latex {\log(L)}&fg=000000$'' above can be to base $latex {p}&fg=000000$. Hence $latex {r_n}&fg=000000$ has the right order to give that for some $latex {\delta > 0}&fg=000000$ and constant fraction of primes of length $latex {\nu}&fg=000000$, the success probability $latex {\pi_a}&fg=000000$ of one trial over $latex {a}&fg=000000$ satisfies

$latex \displaystyle \pi_a \geq \frac{\delta r_n q}{x} = \frac{\delta r_n}{p} = \Omega(\frac{1}{\log(p)\log_p(L_n)}) = \Omega(\frac{1}{\log(L_n)}). &fg=000000$

Hence the expected number of trials is $latex {O(\log L_n)}&fg=000000$. The extra $latex {n^2}&fg=000000$ in the theorem statement is the time for each iteration, i.e., for arithmetic modulo $latex {x}&fg=000000$ and the Euclidean algorithm. $latex \Box&fg=000000$

It follows that if $latex {\phi_n}&fg=000000$ is also computable modulo $latex {x}&fg=000000$ in $latex {n^{O(1)}}&fg=000000$ time, and presuming $latex {L_n < x^{n^{O(1)}}}&fg=000000$ so that $latex {\log(L_n) = n^{O(1)}\log(x) = n^{O(1)}}&fg=000000$, then factoring products $latex {x = pq}&fg=000000$ of primes whose lengths differ by just a hair is in randomized average-case polynomial time. Of course this depends on the availability of a suitable polynomial $latex {\phi}&fg=000000$. But $latex {\phi}&fg=000000$ could be any polynomial---it needs no relation to factoring other than having plenty of distinct roots relative to its degree as itemized above. Hence there might be a lot of scope for such ``dangerous'' polynomials $latex {\phi}&fg=000000$ to exist.

\includegraphics[width=3.5in]{PalaisRoyalParisBuren.jpg}
Under the Palais Royale---a supercomputer?

Dick's paper does give an example where a $latex {\phi}&fg=000000$ with specified properties cannot exist, but there is still a lot of ``play'' in the bounds above. This emboldens us also to ask exactly how big the ``hair'' needs to be. We do not actually need to send $latex {\rho_q}&fg=000000$ toward zero. If a constant fraction of the values $latex {a}&fg=000000$ get bounced by the $latex {\phi(a) \equiv 0 \!\!\pmod{q}}&fg=000000$ event, then the expected time just goes up by the same constant factor.

Open Problems

We have tried to present Dick's paper in an ``open'' manner that encourages variations of its underlying enigma. We have also optically improved the result by using $latex {\nu = n(\frac{1}{2} - \epsilon)}&fg=000000$ rather than $latex {n^{\epsilon}}&fg=000000$ as in the paper. However, this may be implicit anyway since the paper's proofs might not require ``$latex {\epsilon}&fg=000000$'' to be constant, so that by taking $latex {\epsilon = 1 + \frac{\log C}{\log n}}&fg=000000$ one can make $latex {n^{\epsilon} = Cn}&fg=000000$ for any desired factor $latex {C < \frac{1}{2}}&fg=000000$. Is all of this correct?

If so, then possibly one can come even tighter to $latex {\frac{n}{2}}&fg=000000$ for the length of $latex {p}&fg=000000$. Then the question shifts to the possibilities of finding suitable polynomials $latex {\phi}&fg=000000$. The paper ``Few Product Gates But Many Zeroes,'' by Bernd Borchert, Pierre McKenzie, and Klaus Reinhardt, goes into such issues. This paper investigates ``gems''---that is, integer polynomials of degree $latex {N}&fg=000000$ having $latex {N}&fg=000000$ distinct integer roots---finding some for $latex {N}&fg=000000$ as high as 55 but notably leaving $latex {N = 32}&fg=000000$ open. Moreover, the role of a limitation $latex {L}&fg=000000$ on the magnitude of a constant fraction of a gem's roots remains at issue.

Finally, we address the general case $latex {\phi(z) = \sum_{i=0}^d c_i z^i}&fg=000000$ with rational coefficients $latex {c_i}&fg=000000$ (and $latex {c_d = 1}&fg=000000$). If $latex {\phi(a) = \frac{b}{c}}&fg=000000$ in lowest terms then $latex {\gcd(\phi(a),x)}&fg=000000$ means $latex {\gcd(b,x)}&fg=000000$ (divided by $latex {c}&fg=000000$) so the algorithm is the same. Suppose $latex {\frac{r}{s}}&fg=000000$ is a rational root in lowest terms and $latex {p}&fg=000000$ does not divide $latex {s}&fg=000000$, nor the denominator of any $latex {c_i}&fg=000000$. Then we can take $latex {w}&fg=000000$ such that $latex {ws = bp + 1}&fg=000000$ for some $latex {b}&fg=000000$ and define $latex {u = w(p + r)}&fg=000000$. This gives

$latex \displaystyle u - \frac{r}{s} = \frac{us - r}{s} = \frac{w(p+r)s - r}{s} = \frac{(bp+1)(p+r) - r}{s} = \frac{bp^2 + brp + p}{s} = \frac{p\beta}{s}, &fg=000000$

where $latex {\beta}&fg=000000$ is the integer $latex {bp + br + a}&fg=000000$. Then $latex {\phi(u) = \phi(\frac{r}{s} + \frac{\beta p}{s})}&fg=000000$. Because $latex {\frac{r}{s}}&fg=000000$ is a root, it follows that $latex {\phi(u)}&fg=000000$ is a sum of terms in which each numerator is a multiple of $latex {p}&fg=000000$ and each denominator is not. So $latex {\phi(u) = \frac{p\alpha}{\sigma}}&fg=000000$ in lowest terms where possibly $latex {\alpha = 0}&fg=000000$. Thus $latex {\gcd(\phi(u),x)}&fg=000000$ either yields $latex {p}&fg=000000$ or falls into one of two cases we already know how to count: $latex {u}&fg=000000$ is another root of $latex {\phi}&fg=000000$ or we have found a root mod $latex {q}&fg=000000$. Since $latex {u + cp}&fg=000000$ behaves the same as $latex {u}&fg=000000$ for all $latex {c \in [q]}&fg=000000$, we can define integer ``strands'' $latex {S_{u,p} = S_{r,s,p}}&fg=000000$ as before. There remains the possibility that strands induced by two roots $latex {y = r/s}&fg=000000$ and $latex {z = r'/s'}&fg=000000$ coincide. Take the inverse $latex {w'}&fg=000000$ for $latex {s'}&fg=000000$ and resulting integer $latex {u' = w'(p + r')}&fg=000000$, then the strands coincide if $latex {u \equiv u' \!\!\pmod{p}}&fg=000000$. This happens iff $latex {wr \equiv w' r' \!\!\pmod{p}}&fg=000000$. Multiplying both sides by $latex {ss'}&fg=000000$ gives

$latex \displaystyle wrss' = (bp+1)rs' \equiv rs' \equiv w' r' s s' = r' s(b' p + 1) \equiv r' s, &fg=000000$

so it follows that $latex {p}&fg=000000$ divides the numerator of $latex {z - y = \frac{rs' - sr'}{ss'}}&fg=000000$ in lowest terms. Thus we again have ``slots'' for each distinct pair $latex {y,z \in R}&fg=000000$ of rational roots and each possible prime divisor of the numerator of their difference. Essentially the same counting argument shows that a ``bad'' $latex {p}&fg=000000$ must fill $latex {\Omega(|R|)}&fg=000000$ of such slots. The other ways $latex {p}&fg=000000$ can be bad include dividing the denominator $latex {s}&fg=000000$ of a root or the denominator of a coefficient $latex {c_i}&fg=000000$---although neither way is mentioned in the paper it seems the choices for $latex {d_n}&fg=000000$ and $latex {r_n}&fg=000000$ in the above theorem leave just enough headroom. Then we just need $latex {L}&fg=000000$ to be a bound on all numerators and denominators involved in the bad cases, arguing as before. Last, it seems as above that only a subset $latex {R'}&fg=000000$ of the roots with $latex {|R'|/|R|}&fg=000000$ constant (or at least non-negligible) is needed to obey this bound. Assuming this sketch of Dick's full argument is airtight and works for our improved result, we leave its possible further ramifications over the integer case as a further open problem.