Pollard's rho algorithm is an

algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...

for integer factorization. It was invented by John Pollard in 1975. It uses only a small amount of space, and its expected running time is proportional to the

square root In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or  ⋅ ) is . For example, 4 and −4 are square roots of 16, because . ...

of the smallest

prime factor A prime number (or a prime) is a natural number greater than 1 that is not a product of two smaller natural numbers. A natural number greater than 1 that is not prime is called a composite number. For example, 5 is prime because the only ways ...

of the

composite number A composite number is a positive integer that can be formed by multiplying two smaller positive integers. Equivalently, it is a positive integer that has at least one divisor other than 1 and itself. Every positive integer is composite, prime, ...

being factorized.

Core ideas

The algorithm is used to factorize a number

n = pq

, where

p

is a non-trivial factor. A

polynomial In mathematics, a polynomial is an expression consisting of indeterminates (also called variables) and coefficients, that involves only the operations of addition, subtraction, multiplication, and positive-integer powers of variables. An ex ...

modulo

n

, called

g(x)

(e.g.,

g(x) = (x^2 + 1) \bmod n

), is used to generate a

pseudorandom sequence A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generate ...

. It is important to note that

g(x)

must be a polynomial. A starting value, say 2, is chosen, and the sequence continues as

x_1 = g(2)

x_2 = g(g(2))

x_3 = g(g(g(2)))

, etc. The sequence is related to another sequence

\

. Since

p

is not known beforehand, this sequence cannot be explicitly computed in the algorithm. Yet, in it lies the core idea of the algorithm. Because the number of possible values for these sequences is finite, both the

\

sequence, which is mod

n

, and

\

sequence will eventually repeat, even though these values are unknown. If the sequences were to behave like random numbers, the

birthday paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...

implies that the number of

x_k

before a repetition occurs would be expected to be

O(\sqrt N)

, where

N

is the number of possible values. So the sequence

\

will likely repeat much earlier than the sequence

\

. When one has found a

k_1,k_2

such that

x_\neq x_

but

x_\equiv x_\bmod p

, the number

, x_-x_,

is a multiple of

p

, so

p

has been found. Once a sequence has a repeated value, the sequence will cycle, because each value depends only on the one before it. This structure of eventual cycling gives rise to the name "rho algorithm", owing to similarity to the shape of the Greek letter ''ρ'' when the values

x_1 \bmod p

x_2 \bmod p

, etc. are represented as nodes in a

directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pai ...

This is detected by Floyd's cycle-finding algorithm: two nodes

i

and

j

(i.e.,

x_i

and

x_j

) are kept. In each step, one moves to the next node in the sequence and the other moves forward by two nodes. After that, it is checked whether

\gcd(x_i - x_j, n) \ne 1

. If it is not 1, then this implies that there is a repetition in the

\

sequence (i.e.

x_i \bmod p = x_j \bmod p)

. This works because if the

x_i

is the same as

x_j

, the difference between

x_i

and

x_j

is necessarily a multiple of

p

. Although this always happens eventually, the resulting

greatest common divisor In mathematics, the greatest common divisor (GCD) of two or more integers, which are not all zero, is the largest positive integer that divides each of the integers. For two integers ''x'', ''y'', the greatest common divisor of ''x'' and ''y'' i ...

(GCD) is a divisor of

n

other than 1. This may be

n

itself, since the two sequences might repeat at the same time. In this (uncommon) case the algorithm fails, and can be repeated with a different parameter.

Algorithm

The algorithm takes as its inputs , the

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...

to be factored; and , a polynomial in computed modulo . In the original algorithm,

g(x) = (x^2 - 1) \bmod n

, but nowadays it is more common to use

g(x) = (x^2 + 1) \bmod n

. The output is either a non-trivial factor of , or failure. It performs the following steps: x ← 2 y ← 2 d ← 1 while d = 1: x ← g(x) y ← g(g(y)) d ← gcd(, x - y, , n) if d = n: return failure else: return d Here and corresponds to and in the previous section. Note that this algorithm may fail to find a nontrivial factor even when is composite. In that case, the method can be tried again, using a starting value other than 2 or a different .

Example factorization

Let

n = 8051

and

g(x) = (x^2 + 1) \bmod 8051

97 is a non-trivial factor of 8051. Starting values other than may give the cofactor (83) instead of 97. One extra iteration is shown above to make it clear that moves twice as fast as . Note that even after a repetition, the GCD can return to 1.

Variants

In 1980, Richard Brent published a faster variant of the rho algorithm. He used the same core ideas as Pollard but a different method of cycle detection, replacing Floyd's cycle-finding algorithm with the related Brent's cycle finding method. A further improvement was made by Pollard and Brent. They observed that if

\gcd(a,n) > 1

, then also

\gcd(ab,n) > 1

for any positive integer . In particular, instead of computing

\gcd (, x-y, ,n)

at every step, it suffices to define as the product of 100 consecutive

, x-y,

terms modulo , and then compute a single

\gcd(z,n)

. A major speed up results as 100 steps are replaced with 99 multiplications modulo and a single . Occasionally it may cause the algorithm to fail by introducing a repeated factor, for instance when is a

square In Euclidean geometry, a square is a regular quadrilateral, which means that it has four equal sides and four equal angles (90-degree angles, π/2 radian angles, or right angles). It can also be defined as a rectangle with two equal-length a ...

. But it then suffices to go back to the previous term, where

\gcd(z,n)=1

, and use the regular ''ρ'' algorithm from there.

Application

The algorithm is very fast for numbers with small factors, but slower in cases where all factors are large. The ''ρ'' algorithm's most remarkable success was the 1980 factorization of the Fermat number = 1238926361552897 × 93461639715357977769163558199606896584051237541638188580280321. The ''ρ'' algorithm was a good choice for because the prime factor = 1238926361552897 is much smaller than the other factor. The factorization took 2 hours on a

UNIVAC UNIVAC (Universal Automatic Computer) was a line of electronic digital stored-program computers starting with the products of the Eckert–Mauchly Computer Corporation. Later the name was applied to a division of the Remington Rand company and ...

1100/42.

Example: factoring = 10403 = 101 · 103

The following table shows numbers produced by the algorithm, starting with

x=2

and using the polynomial

g(x) = (x^2 + 1) \bmod 10403

. The third and fourth columns of the table contain additional information not known by the algorithm. They are included to show how the algorithm works. The first repetition modulo 101 is 97 which occurs in step 17. The repetition is not detected until step 23, when

x \equiv y \pmod

. This causes

\gcd (x - y, n) = \gcd (2799 - 9970, n)

to be

p = 101

, and a factor is found.

Complexity

If the pseudorandom number

x = g(x)

occurring in the Pollard ''ρ'' algorithm were an actual random number, it would follow that success would be achieved half the time, by the

Birthday paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 5 ...

O(\sqrt p)\le O(n^)

iterations. It is believed that the same analysis applies as well to the actual rho algorithm, but this is a heuristic claim, and rigorous analysis of the algorithm remains open..

References

External links

Comprehensive article on Pollard's Rho algorithm aimed at an introductory-level audience
*

{{DEFAULTSORT:Pollard's Rho Algorithm Integer factorization algorithms Articles with example Python (programming language) code