Currently there may be errors shown on top of a page, because of a missing Wiki update (PHP version and extension DPL3). |
Topics | Help • Register • News • History • How to • Sequences statistics • Template prototypes |
Elliptic curve method
The elliptic curve method (sometimes called Lenstra elliptic curve factorization, commonly abbreviated as ECM) is a factorization method which computes a large multiple of a point on a random elliptic curve modulo the number to be factored. It is currently the best algorithm known, among those whose complexity depends mainly on the size of the factor found.
This complexity can be represented by:
where p is the smallest factor.
Contents
[hide]Maths
Simple Explanation
This is a simple explanation of the Elliptic Curve Method. It is not a mathematical proof of the method, nor is it an explanation of why it works; it is simply an explanation of what the method is.
The Elliptic Curve Method is a way of determining factors of a composite number. This method cannot be used when it is not known in advance that the number is composite, so it cannot be used as a primality test.
First we need to understand some simple mathematical terms. A prime number is a number greater than 1 that is only divisible by itself and 1. A composite number is a number that has divisors that are neither itself, nor 1. A highly composite number is a number that has lots and lots of divisors. We can make highly composite numbers simply by multiplying lots of different prime numbers together. So that 2 x 3 x 5 x 7 x 11 x 13 etc will be a highly composite number. But that is only the first six prime numbers multiplied together. If we multiply the first 2000 prime numbers together we get a huge number that has lots and lots of divisors. The significance of this to the Elliptic Curve Method is that a huge amount of other numbers will have factors in common with our highly composite number.
Modular arithmetic can be thought of as a system of arithmetic in which we only concern ourselves with the remainder after dividing by a specific number called the modulus. So that if we take 17 as our modulus, when we multiply 47 x 23, instead of saying that the answer is 1081, we say it leaves a remainder of 10 after division by the modulus. We can then abbreviate this to 47 x 23 = 10 (mod 17). The significance of this to the Elliptic Curve Method is that all of our maths is done with the number we are trying to factor as the modulus.
If we plug different values for x into an algebraic expression such as
If we change the expression to
So now we have an algebraic expression that generates an elliptic curve. We have a highly composite number that has loads and loads of divisors, and we have some arithmetic being done modulo the number being factored. How do these three go together ?
We start by calculating a point on the curve. (Naturally the computer programmes that do this do not actually draw graphs, they simply calculate the x and y values for a particular point on the curve). Now, by doing some very beautiful mathematics explained below, we can calculate the position of other points on the same curve, the multiples of the original point. This process is called "running a curve" as in "I have run 400 curves" and in terms of the computer program doing this it is one iteration of the algorithm. So we compute the product of the first point times the highly composite number, and then calculate the greatest common divisor of the coordinate x of the found point and the number to be factored. If the greatest common divisor is greater than 1 and less than the number to be factored, congratulations! we have found a proper factor. Otherwise we have to run more curves, using a different starting point.
As described above it probably sounds a bit of a hit-or-miss affair, but there are controls built into the programme. By carefully determining the starting values for the programme it is possible to limit our search to factors of no more than some size measured in bits (a bit is the smallest unit of computer memory). After running the requisite number of curves (detailed on a table on this page) at one particular bit level, we can then start to look for factors at a higher bit level. Even at the higher bit level it is still possible to find factors of the smaller size. You can think of this as being a tall man and his short wife with an umbrella. If the wife is holding the umbrella, when we look under the umbrella we will only find her. But if her husband is holding the umbrella when we look underneath it we may find either of them, or even both. By running sufficient curves at each bit level we can eliminate the possibility of their being any factors hiding under the umbrella.
More detailed explanation
Introduction
The elliptic curve stated above (Weierstrass form) is not the best in terms of computations. The Montgomery form is used instead:
(mod ) (1)
which is an elliptic curve when
Notice that the curve (really a surface in the 3-dimensional space) above always contains the point
In this paragraph let's suppose that we work modulo a prime
It is interesting to note that the elliptic curve method does not use the
Step 1
Now if we work modulo the number to be factored,
How do we find a multiple of the group order that is not known in advance? Using highly composite numbers. Given a bound B1, we multiply the original point P by all prime numbers less than B1 (each prime number is raised to a power such that the result is also about B1). If the group order is a composite number whose factors are all less than B1 we are done. Otherwise we have to select another curve and another initial point P or continue with step 2.
Step 2
The step 2 is useful when the group order has a prime factor
Let Q be the point computed in the step 1 (the previous paragraph), so
An improvement to step 2 can be done if there is enough memory to store intermediate values. As a simple example let's consider the modulus 30. The numbers coprime to it are: 1, -1, 7, -7, 11, -11, 13 and -13. So we can precompute a table of four values: Q, 7Q, 11Q and 13Q (of course the first one does not require any calculation), and the values 30Q and
The modulus 30 can be replaced by 210, or 2310 if there is enough memory to hold all values of
Another optimization can be done when the modular multiplications are expensive: when both
For high values of B2 another method, known as the birthday paradox continuation, is faster.
Formulas for addition and duplication
Since the algorithm does not use the
If P = -Q then Px = Qx and Pz = Qz (they only differ in the
Let P = (Px : : Pz) and Q = (Qx : : Qz) where Px / Pz is not equal to Qx / Qz (otherwise they are the same point or negatives). Also let P + Q = (X+ : : Z+) and P - Q = (X- : : Z-).
where
Since the previous formula does not work when Px / Pz = Qx / Qz we need another one to compute 2P.
Let P = (Px : : Pz) and Q = 2P = (Qx : : Qz).
We get:
where:
All the arithmetic above must be done modulo
Only modular additions, subtractions and multiplications are needed. No modular inversions are required (they are very expensive in execution time).
In order to compute
Multiplication
In order to perform multiplication of a point P by a natural number
Montgomery's PRAC is an almost optimal algorithm, but this is too complicated to explain it here.
Another algorithm is the same used in exponentiation. We represent the value
For example for the number 21 which is 10101 when written in binary, you will have the string DDADDA. The letter D means duplication and A means addition.
So if we have to compute 21P using the recipe above we will get the sequence: 2P, 4P, 5P, 10P, 20P, 21P.
A problem with this approach is that the formula for addition shown above for Q + R needs the coordinates of Q - R. This can be overcomed by using another point. When we compute kP we will be also computing (k+1)P.
Then if we start with the points (kP, (k+1)P), when the bit equals zero we compute (2kP, (2k+1)P) and when the bit equals one we compute ((2k+1)P, 2(k+1)P). In both cases we need one duplication and one addition requiring only 10 modular multiplications.
The PRAC algorithm is 30% faster in average than the algorithm shown above to multiply a point by a natural number.
Selecting curve and starting point
Using the Montgomery form of elliptic curve, the group order is always multiple of four. Suyama found a method where the group order is multiple of 12, thus increasing the probability that this number has only small prime factors.
First we have to select a number sigma (
Implementations
- GMP-ECM
- Prime95
- A Web application, written by Dario Alpern, to factorize using ECM and SIQS.
Choosing the best parameters for ECM
It's still an open question how to choose the best parameters for ECM. Although, a few conventions have been made, including the B1 parameter for factors up to 70 digits (as you can see, that is a good limit because ECM on 70-digit factors is almost impossible with current hardware). Many programs today use the default B2=100*B1:
Digits | B1(B2=B1*100) | Curves |
---|---|---|
15 | 2,000 | 25 |
20 | 11,000 | 90 |
25 | 50,000 | 300 |
30 | 250,000 | 700 |
35 | 1,000,000 | 1,800 |
40 | 3,000,000 | 5,100 |
45 | 11,000,000 | 10,600 |
50 | 43,000,000 | 19,300 |
55 | 110,000,000 | 49,000 |
60 | 260,000,000 | 124,000 |
65 | 850,000,000 | 210,000 |
70 | 2,900,000,000 | 340,000 |
GMP-ECM also has default B2 values to use, which are commonly taken by most factorers. Some say that the B2 value doesn't affect the chances of finding a factor much, but as a rule of thumb it's common to spend as many time in stage 2 as in stage 1, and with the progress made on algorithms, this value should be much higher than B1*100. As we can see, using a higher B2 does affect the chances of finding a factor, according to GMP-ECM's probability function. A red square means N/A:
Digits | Value of B1 | GMP-ECM B2 default | Curves |
---|---|---|---|
15 | 2,000 | 147,396 | |
20 | 11,000 | 1,873,422 | 86 |
25 | 50,000 | 12,746,592 | 214 |
30 | 250,000 | 128,992,510 | 430 |
35 | 1,000,000 | 1,045,563,762 | 910 |
40 | 3,000,000 | 5,706,890,290 | 2,351 |
45 | 11,000,000 | 35,133,391,030 | 4,482 |
50 | 43,000,000 | 240,490,660,426 | 7,557 |
55 | 110,000,000 | 776,278,396,540 | 17,884 |
60 | 260,000,000 | 3,178,559,884,516 | 42,057 |
65 | 850,000,000 | 15,892,628,251,516 | 69,471 |
70 | 2,900,000,000 | 105,101,237,217,912 | 102,212 |
75 | 7,600,000,000 | 425,332,376,469,022 | 188,056 |
80 | 25,000,000,000 | 2,551,982,328,195,322 | 265,557 |
Ongoing factorization effort with ECM
There is an ongoing effort by GIMPS to factorize composite Mersenne and Fermat numbers using ECM. As of May 2018, the lowest composite Mersenne number without any known factors is 21277-1. [1]
The largest factor found using ECM so far has 83 decimal digits and was discovered on 2013-09-07 by R. Propper.[2]