Trial factoring

From Prime-Wiki
Jump to: navigation, search

Trial factoring or trial division is a factorization method that checks every prime less than the square root of the number in question to see if any of them divide that number, trying to finding a factor. If none are found, the number in question is prime; otherwise, it is a composite number.

The simplest approach is to already have available a supply of small prime numbers to use as trial divisors. If P(i) is the i'th prime number so P(1) = 2, P(2) = 3, P(3) = 5, etc, then the last prime factor possibility for some number N would be P(m) where P(m + 1) squared exceeds N.

Suppose you wish to test N = 45; the last prime factor you'd try would be 5 because although 5*5 = 25 is less than N, (or, [math]5 \lt \sqrt{N}[/math]) there is no need to try 7 since 2*7 is excluded because 2 will have been tried, 3*7 is excluded because 3 will have been tried, and 5*7 is excluded because 5 will have been tried. The first prime multiple of 7 that would not be tried is 7*7, and there is no need for that test because N is less than 7*7.

A useful table need not be large: P(3512) = 32749, the last prime that fits into a sixteen-bit signed integer, and P(6542) = 65521, the largest prime fitting into an unsigned sixteen bit integer.

Such a table can be prepared automatically, usually by an adaption of the Sieve of Eratosthenes, itself requiring a small table of known prime numbers to start its process, such as 2 and 3.

Alas, as the number being tested becomes large, so also does the table. The number of prime numbers less than N is near [math]\frac {N}{Ln(N) - 1}[/math] So, to check N for primality the largest prime factor needed is just less than [math]\sqrt{N}[/math] and so the number of such prime factor candidates would be close to [math]\frac {\sqrt{N}}{Ln(\sqrt{N}) - 1}[/math] which for [math]N = 10^{20}[/math] is 450 million.

Advancing from [math]N = 10^{50}[/math] to [math]N = 10^{60}[/math] may appear minor, but the table size has long been impossible.

Much depends on the details of the representation of large numbers in your computational device. For instance, it might use unsigned sixteen bit "digits" so that it operates in base 65536. Any candidate number N that fits into such a number can be tested in sixteen bit arithmetic, easily done, just as in decimal and base ten, we can deal with single digit numbers. Larger numbers are of interest, so one must escalate to multiple-digit arithmetic. If your CPU supports unsigned integer arithmetic of 32 bits, the straightforward implementations of long multiplication and division as learnt in primary school are easy enough, requiring work proportional to d*d where d is the number of digits (being sixteen-bit integers) in the number.

Should you instead desire to work with 32-bit unsigned integers, then these methods require that your cpu support inbuilt operations on 64-bit unsigned integers (because a basic step requires the multiplication of two single digits which produces a two-digit result) and so on for other possible size choices. But there are risks in using larger digit sizes. Verifying that your CPU functions correctly by testing the add, subtract, multiply and divide operations on all possible pairs of eight-bit numbers will not take long since there are only 256 number values, but testing sixteen bit operations is tedious and exhaustive 32-bit testing is hopeless.

And a special opportunity presents itself: suppose that N is a d-digit large number, where each digit is a sixteen-bit unsigned integer. Test divisions by all the primes up to the 16-bit limit is easy, requiring work proportional to d*1 only for each, not d*d, because the trial divisors are single digit numbers, and there are only 6,542 of them. Similarly for the case of a 32-bit digit size, though now there are 203,280,220 primes to test, and, each one requires 32 bits of storage rather than 16 bits... But such will deliver certain knowledge of factorization for N up to [math]2^{64}[/math]...

Rather than maintaining so large a table (though memory is ever cheaper) it might be better to develop it as needed via a suitable modification of the sieve process, especially as the effort extends to factors beyond 32-bit integers. Put another way, an array of 6,542 sixteen-bit integers is small, but two hundred million 32-bit numbers occupy more space than is comfortable. They could be kept in a disc file (either explicitly, or implicitly via the workings of 'virtual memory') but access time is slow. Various storage compaction schemes could be used, but none can beat the small amount of code needed to operate the sieve process, and if the trial factors are requested in sequence (as they will be) the sieve process can produce them in batches faster than they could be fetched from a disc file.

Suppose you choose to work with sixteen-bit integers and set aside an array P of the 6542 prime numbers: once this array is prepared the first 6,512 candidate factors are found directly. To find factor candidates beyond the sixteen-bit limit you prepare them in batches using a sieve. Using P as the supplier of small primes to sieve with means that you can generate all primes up to 32 bits before you need to go beyond P(6542), the highest sixteen-bit prime. Only when N is beyond the 64-bit limit will you be wanting candidate factors beyond the 32-bit limit, which is as far as a sieving process involving primes up to the 16-bit limit can reach. If instead you were to escalate to working in 32-bit unsigned integers, then P(203280220) = 4294967291, the last prime before [math]2^{32} - 1[/math], would suffice for a sieving process to generate primes as candidate factors up to [math]2^{64}[/math] that would suffice to test N up to [math]2^{128}[/math], were you to have enough time to wait...

By this point, you will have noticed that the primes used in the sieving process are large. With sieving primes such as 65537, only if the sieve batch width is greater will it knock out more than one non-prime number in that width. This "hit" average inexorably descends so that if a sieving prime is X times the size of the sieve's batch width, the chance it will contribute something useful (by knocking out a non-prime) is 1/X. Thus, rather than pushing the sieve process to completion to remove the ever-fewer remaining non-primes in the batch, the process could be truncated at the price of it presenting a few candidate factors that in fact are the products of some smaller primes for which it has already been established that they are not factors of N.

When referred to as a GIMPS workunit, it means the application of this method on Mersenne prime candidates checking potential divisors up to [math]2^{58}[/math] to [math]2^{80}[/math], depending on the exponent of the candidate.

If a factor is found, the Mersenne number/factor pair is stored in a database and the former candidate deleted from the list. If no factor is found, the candidate will be determined prime or composite by performing a Lucas-Lehmer test on at least two different computers which have to report matching results to assume the tests were properly done.

External links