Welcome to the the Great Internet Mersenne Prime Search! (The non-Pentium version ;)
This site contains Ernst Mayer's C source code for performing Lucas-Lehmer tests (and coming soon: sieve-based factoring code which looks for small factors of prime-exponent Mersenne numbers.) In short, everything you need to search for Mersenne primes on your non-x86 computer!
worktodo.ini file found...checking next exponent in range...
p = 10585777
restart file p10585777 found...reading...
Assertion failed: a != 0x0 && a_ptmp != 0x0 && arrtmp != 0x0, file ../Mlucas.c, line 745
This has been fixed and new source/binaries are available. (Note that the AMD64 and Mac/PPC binaries were posted on the 23rd, as soon as Tom Cage had time to do the builds for me.) Your savefiles are fine; you'll just need the bugfixed code to properly restart an interrupted run from them.
For general questions about the math behind the Lucas-Lehmer test, general primality testing or related topics, check out the Mersenne Forum.
18 August 2004: Mlucas 2.8 beta released! Binaries for selected platforms are also available.
Major highlights of v2.8:
Planned for the next major release:
To do Lucas-Lehmer tests, you'll need to build either the Mlucas C source code (418 KB, last updated 21 October 2004) or download one of the precompiled binaries available for selected platforms (see below.)
Most exponents handed out by the PrimeNet server have already been trial-factored to the recommended depth, so in most cases, no additional factoring effort is necessary. If you have exponents that require additional trial factoring, you'll need to do the trial factoring using Prime95 on a PC, as Mlucas currently has no trial factoring capability.
If you are using one of the platforms below, you can save yourself the effort of compiling the source code:
| For MAC PowerPC G5/OS X: | Description/Build Notes | File size: | Last modified |
| MlucasXLC_G5_11.03.04.tgz | Thanks to Mark Rodenkirch, we have an improved binary for Apple PowerPC G5, built
using the IBM XL C compiler.
This gives 15-20% better performance at most runlengths than the old gcc build. The build used simply 'xlc -O2 *.c -o Mlucas' on a G5.
Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices. Note the anomalous timing at 1280K marked with ** - this appears to be due to something going awry with the XLC compiler's optimizations for the radix-5 DFT macros; I've notified the XLC folks about it, hopefully it can be fixed in the near future. In practice you'd remove this line from your .cfg file. These timings were on a single CPU of a 2.5GHz dual-CPU G5: 576 2 # sec/iter = 0.036; radices = 18 32 32 16
640 2 # sec/iter = 0.042; radices = 10 32 32 32
704 2 # sec/iter = 0.045; radices = 22 32 32 16
768 3 # sec/iter = 0.047; radices = 24 32 32 16
832 2 # sec/iter = 0.055; radices = 26 32 32 16
896 3 # sec/iter = 0.059; radices = 28 32 32 16
960 2 # sec/iter = 0.061; radices = 30 32 32 16
1024 6 # sec/iter = 0.063; radices = 32 32 32 16
1152 5 # sec/iter = 0.073; radices = 36 32 32 16
1280** 2 # sec/iter = 0.109; radices = 10 8 16 16 32
1408 3 # sec/iter = 0.101; radices = 22 32 32 32
1536 3 # sec/iter = 0.106; radices = 24 32 32 32
1664 3 # sec/iter = 0.122; radices = 26 32 32 32
1792 3 # sec/iter = 0.131; radices = 28 32 32 32
1920 3 # sec/iter = 0.136; radices = 30 32 32 32
2048 4 # sec/iter = 0.143; radices = 32 32 32 32
| 560 KB | 11/03/04 |
| For MAC PowerPC G4/OS X: | Description/Build Notes | File size: | Last modified |
| MlucasXLC_G4_11.03.04.tgz | Thanks to Tom Cage, we have an improved binary for Apple PowerPC G4, built
using the IBM XL C compiler.
This gives 15-20% better performance at most runlengths than the old gcc build. (Compare the .cfg file numbers below to those for MAC PowerPC/Linux - these timings were on the same G4 hardware runnings two different OSes.) The build used simply 'xlc -O2 *.c -o Mlucas' on a G4.
Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices. Note the anomalous timing at 1280K marked with ** - this appears to be due to something going awry with the XLC compiler's optimizations for the radix-5 DFT macros; I've notified the XLC folks about it, hopefully it can be fixed in the near future. In practice you'd remove this line from your .cfg file. These timings were on a 867MHz G4: 576 2 # sec/iter = 0.181; radices = 18 32 32 16
640 2 # sec/iter = 0.206; radices = 10 32 32 32
704 2 # sec/iter = 0.230; radices = 22 32 32 16
768 3 # sec/iter = 0.241; radices = 24 32 32 16
832 2 # sec/iter = 0.277; radices = 26 32 32 16
896 3 # sec/iter = 0.297; radices = 28 32 32 16
960 2 # sec/iter = 0.308; radices = 30 32 32 16
1024 6 # sec/iter = 0.322; radices = 32 32 32 16
1152 5 # sec/iter = 0.368; radices = 36 32 32 16
1280** 1 # sec/iter = 0.520; radices = 10 16 16 16 16
1408 3 # sec/iter = 0.490; radices = 22 32 32 32
1536 3 # sec/iter = 0.516; radices = 24 32 32 32
1664 3 # sec/iter = 0.590; radices = 26 32 32 32
1792 3 # sec/iter = 0.630; radices = 28 32 32 32
1920 3 # sec/iter = 0.658; radices = 30 32 32 32
2048 4 # sec/iter = 0.689; radices = 32 32 32 32
| 560 KB | 11/03/04 |
| For MAC PowerPC/Linux: | Description/Build Notes | File size: | Last modified |
| Mlucas_PowerPCBinary.tgz | Tom Cage is my build czar for the PowerPC. From his build notes:
Motorola PPc-7455 (Titanium) G4-867, 256K L2, 1M L3, 768M PC133, Bus 133MHz, Darwin 7.5 g4-867:~/Desktop/Mlucas k5gj$ uname -a Darwin g4-867.local 7.5.0 Darwin Kernel Version 7.5.0: Thu Aug 5 19:26:16 PDT 2004; root:xnu/xnu-517.7.21.obj~3/RELEASE_PPC Power Macintosh powerpc g4-867:~/Desktop/Mlucas k5gj$ /usr/local/bin/./gcc34 -v Reading specs from /usr/local/lib/gcc/powerpc-apple-darwin7.5.0/3.4.2/specs Configured with: /usr/gcc-3.4-20040820/./configure --program-suffix=34 --enable-languages=c --disable-nls Thread model: posix gcc version 3.4.2 20040820 (prerelease) g4-867:~/Desktop/Mlucas k5gj$ /usr/local/bin/./gcc34 -O3 -mtune=7450 *.c -o Mlucas mers_mod_square.c:3103:2: warning: no newline at end of file g4-867:~/Desktop/Mlucas k5gj$Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 0 # sec/iter = 0.214; radices = 9 8 16 16 16
640 0 # sec/iter = 0.243; radices = 5 16 16 16 16
704 0 # sec/iter = 0.283; radices = 11 8 16 16 16
768 1 # sec/iter = 0.293; radices = 12 8 16 16 16
832 0 # sec/iter = 0.336; radices = 13 8 16 16 16
896 0 # sec/iter = 0.350; radices = 7 16 16 16 16
960 0 # sec/iter = 0.388; radices = 15 8 16 16 16
1024 1 # sec/iter = 0.400; radices = 8 16 16 16 16
1152 1 # sec/iter = 0.464; radices = 9 16 16 16 16
1280 1 # sec/iter = 0.523; radices = 10 16 16 16 16
1408 1 # sec/iter = 0.606; radices = 11 16 16 16 16
1536 1 # sec/iter = 0.634; radices = 12 16 16 16 16
1664 1 # sec/iter = 0.725; radices = 13 16 16 16 16
1792 1 # sec/iter = 0.766; radices = 14 16 16 16 16
1920 1 # sec/iter = 0.833; radices = 15 16 16 16 16
2048 3 # sec/iter = 0.875; radices = 16 16 16 16 16
| 602 KB | 09/23/04 |
| For AMD64/Linux: | Description/Build Notes | File size: | Last modified |
| Mlucas_AMD64_11.03.04.tgz | Note that George Woltman's Prime95 code is currently 20-30% faster on the Athlon64 than Mlucas. Tom Cage is my build czar for the AMD64. From his build notes: FreeBSD-5.3-BETA1 Gigabyte GA-K8VT800M motherboard (AMD64-3400+) K8-2200, 64K L1, 1M L2, FSB 800MHz, 1024M DDR400 PC3200, Bus 200MHz gigabyte# uname -a FreeBSD gigabyte.dl.cox.net 5.3-RELEASE FreeBSD 5.3-RELEASE #0: Sun Oct 24 19:23:39 CDT 2004 5.3-BETA1 FreeBSD 5.3-BETA1 #0: Wed Aug 25 20:31:20 CDT 2004 root@gigabyte.dl.cox.net:/usr/obj/usr/src/sys/MYKERNEL amd64 gigabyte# gcc34 -v Reading specs from /usr/local/lib/gcc/x86_64-portbld-freebsd5.3/3.4.3/specs Configured with: /usr/gcc-3.4-20040820/./configure --program-suffix=34 --enable-languages=c --disable-nls Configured with: ./..//gcc-3.4-20041015/configure --disable-nls --with-system-zlib --with-libiconv-prefix=/usr/local --program-suffix=34 --with-gxx-include-dir=/usr/local/lib/gcc/x86_64-portbld-freebsd5.3/3.4.3/include/c++/ --disable-shared --disable-libgcj --prefix=/usr/local x86_64-portbld-freebsd5.3 Thread model: posix gcc version 3.4.3 20041015 (prerelease) [FreeBSD] gigabyte# gcc34 -O3 -march=k8 -m64 *.c -o Mlucas -lm radix24_ditN_cy_dif1.c:1376:3: warning: no newline at end of file gigabyte#Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices:
576 1 # sec/iter = 0.044; radices = 9 32 32 32
640 2 # sec/iter = 0.049; radices = 10 32 32 32
704 1 # sec/iter = 0.060; radices = 11 32 32 32
768 2 # sec/iter = 0.057; radices = 12 32 32 32
832 1 # sec/iter = 0.068; radices = 13 32 32 32
896 2 # sec/iter = 0.070; radices = 14 32 32 32
960 1 # sec/iter = 0.074; radices = 15 32 32 32
1024 5 # sec/iter = 0.077; radices = 32 16 32 32
1152 5 # sec/iter = 0.093; radices = 36 32 32 16
1280 2 # sec/iter = 0.109; radices = 10 8 16 16 32
1408 2 # sec/iter = 0.128; radices = 11 8 16 16 32
1536 3 # sec/iter = 0.124; radices = 24 32 32 32
1664 3 # sec/iter = 0.143; radices = 26 32 32 32
1792 2 # sec/iter = 0.154; radices = 14 8 16 16 32
1920 3 # sec/iter = 0.160; radices = 30 32 32 32
2048 4 # sec/iter = 0.167; radices = 32 32 32 32
| 594 KB | 11/03/04 |
| For AMD64/Solaris: | Description/Build Notes | File size: | Last modified |
| Mlucas_AMD64_Solaris | ***NOTE***: 31 July 2006: If you downloaded the previous version of this binary (which turns out to have had a fatal bug related to the floating-point optimizations - the code ran but gave garbage results), you'll want to download the latest version and restart your runs from scratch, i.e. blow away the pxxxxxx and qxxxxxx savefiles prior to invoking the new binary. (I'm hoping Rob, as chief builder and guinea pig for this relatively new binary, was the only one affected.)
This was built by Rob Giltrap using the Sun C compiler (version: Sun C 5.8 2005/10/13) and the simple compile sequence (we may improve on this, but this was the first time anyone's built the code on AMD64/Solaris) cc -o Mlucas -fast -fsimple=1 *.c Rob built and did the self-tests on a 1.8 GHz AMD64 Opteron. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 1 # sec/iter = 0.054; radices = 9 32 32 32
640 2 # sec/iter = 0.060; radices = 10 32 32 32
704 1 # sec/iter = 0.070; radices = 11 32 32 32
768 2 # sec/iter = 0.071; radices = 12 32 32 32
832 1 # sec/iter = 0.083; radices = 13 32 32 32
896 2 # sec/iter = 0.086; radices = 14 32 32 32
960 1 # sec/iter = 0.090; radices = 15 32 32 32
1024 5 # sec/iter = 0.093; radices = 32 16 32 32
1152 4 # sec/iter = 0.111; radices = 18 32 32 32
1280 3 # sec/iter = 0.125; radices = 20 32 32 32
1408 3 # sec/iter = 0.145; radices = 22 32 32 32
1536 3 # sec/iter = 0.148; radices = 24 32 32 32
1664 3 # sec/iter = 0.171; radices = 26 32 32 32
1792 3 # sec/iter = 0.180; radices = 28 32 32 32
1920 2 # sec/iter = 0.192; radices = 15 8 16 16 32
2048 3 # sec/iter = 0.200; radices = 16 16 16 16 16
| 1.4 MB | 07/31/06 |
| For Alpha under TruUnix/Linux: | Description/Build Notes | File size: | Last modified |
| Mlucas_alfa.gz | Statically linked exe, runs under both TruUnix (version 5.0 or higher) and Linux. Built using Compaq C V6.5-011 on Compaq Tru64 UNIX V5.1B (Rev. 2650) and the compile sequence cc -non_shared -o Mlucas -O4 -assume accuracy_sensitive -unroll 1 -arch ev6 -tune ev6 -Olimit 100000 *.c -lm I built and did self-tests on a 1 GHz Alpha ev67 running TruUnix 5.1B. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 0 # sec/iter = 0.061; radices = 9 8 16 16 16 640 0 # sec/iter = 0.065; radices = 5 16 16 16 16 704 0 # sec/iter = 0.081; radices = 11 8 16 16 16 768 0 # sec/iter = 0.079; radices = 6 16 16 16 16 832 0 # sec/iter = 0.097; radices = 13 8 16 16 16 896 0 # sec/iter = 0.101; radices = 7 16 16 16 16 960 0 # sec/iter = 0.105; radices = 15 8 16 16 16 1024 1 # sec/iter = 0.110; radices = 8 16 16 16 16 1152 1 # sec/iter = 0.131; radices = 9 16 16 16 16 1280 1 # sec/iter = 0.153; radices = 10 16 16 16 16 1408 1 # sec/iter = 0.179; radices = 11 16 16 16 16 1536 1 # sec/iter = 0.180; radices = 12 16 16 16 16 1664 1 # sec/iter = 0.213; radices = 13 16 16 16 16 1792 1 # sec/iter = 0.230; radices = 14 16 16 16 16 1920 1 # sec/iter = 0.236; radices = 15 16 16 16 16 2048 2 # sec/iter = 0.251; radices = 8 16 16 32 16 | 730 KB | 09/21/04 |
| For Alpha/VMS: | Description/Build Notes | File size: | Last modified |
| NO PRECOMPILED BINARIES CURRENTLY AVAILABLE | |||
| For Itanium/Linux: | Description/Build Notes | File size: | Last modified |
| Mlucas_ia64_linux.gz | Statically linked exe, built using the Intel v8.0 C compiler and the compile sequence icc -o Mlucas -O3 -static *.c -lm I built and did self-tests on a 1 GHz Itanium 2 running Red Hat Ent. Linux ES 3.0. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 1 # sec/iter = 0.056; radices = 9 32 32 32
640 2 # sec/iter = 0.061; radices = 10 32 32 32
704 1 # sec/iter = 0.069; radices = 11 32 32 32
768 2 # sec/iter = 0.074; radices = 12 32 32 32
832 1 # sec/iter = 0.082; radices = 13 32 32 32
896 2 # sec/iter = 0.086; radices = 14 32 32 32
960 1 # sec/iter = 0.092; radices = 15 32 32 32
1024 4 # sec/iter = 0.098; radices = 16 32 32 32
1152 1 # sec/iter = 0.125; radices = 9 16 16 16 16
1280 1 # sec/iter = 0.137; radices = 10 16 16 16 16
1408 1 # sec/iter = 0.153; radices = 11 16 16 16 16
1536 1 # sec/iter = 0.165; radices = 12 16 16 16 16
1664 1 # sec/iter = 0.181; radices = 13 16 16 16 16
1792 1 # sec/iter = 0.192; radices = 14 16 16 16 16
1920 1 # sec/iter = 0.205; radices = 15 16 16 16 16
2048 3 # sec/iter = 0.219; radices = 16 16 16 16 16
| 1.25 MB | 09/21/04 |
| For Itanium/HPUX: | Description/Build Notes | File size: | Last modified |
| Mlucas_ia64_hpux.gz | Built using a experimental beta version of the latest HP-C compiler: cc: HP aC++/ANSI C B3910B A.05.56 [June 09 2004] [aCC6_beta] and the compile sequence cc -o Mlucas -fast *.c -lm This gives a roughly ~5-10% faster executable than the latest release version (5.36) of the compiler which I had access to. I built and did self-tests on a 1 GHz Itanium 2 running HP-UX 11i 11.23. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 2 # sec/iter = 0.062; radices = 18 32 32 16
640 1 # sec/iter = 0.068; radices = 10 8 16 16 16
704 0 # sec/iter = 0.076; radices = 11 8 16 16 16
768 1 # sec/iter = 0.082; radices = 12 8 16 16 16
832 0 # sec/iter = 0.091; radices = 13 8 16 16 16
896 1 # sec/iter = 0.095; radices = 14 8 16 16 16
960 0 # sec/iter = 0.103; radices = 15 8 16 16 16
1024 3 # sec/iter = 0.108; radices = 16 8 16 16 16
1152 3 # sec/iter = 0.122; radices = 18 8 16 16 16
1280 1 # sec/iter = 0.148; radices = 10 16 16 16 16
1408 1 # sec/iter = 0.163; radices = 11 16 16 16 16
1536 1 # sec/iter = 0.177; radices = 12 16 16 16 16
1664 1 # sec/iter = 0.196; radices = 13 16 16 16 16
1792 1 # sec/iter = 0.205; radices = 14 16 16 16 16
1920 1 # sec/iter = 0.222; radices = 15 16 16 16 16
2048 3 # sec/iter = 0.235; radices = 16 16 16 16 16
| 759 KB | 09/21/04 |
| For SPARC/Solaris (and SunOS): | Description/Build Notes | File size: | Last modified |
| Mlucas_sparc_v9.gz | NOTE: This binary will not run on pre UltraSPARC III processors. Built using the Sun native C compiler: cc -V: Sun C 5.7 2005/01/07 and the compile sequence cc -o Mlucas -fast *.c Note that I found no benefit from using runtime profiling via the -xprofile flag, nor from any of the other compile options I played with, which included -xO5, -fsimple, -xtarget, -xprefetch=auto,explicit, -xsafe=mem, -xarch=v8plusa, -xcrossfile. I built and did self-tests on a 900 MHz Sparc 3. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 2 # sec/iter = 0.118; radices = 18 32 32 16
640 2 # sec/iter = 0.135; radices = 10 32 32 32 16
704 2 # sec/iter = 0.160; radices = 22 32 32 16 16
768 3 # sec/iter = 0.152; radices = 24 32 32 16 16
832 2 # sec/iter = 0.182; radices = 26 32 32 16 16
896 0 # sec/iter = 0.199; radices = 7 16 16 16 16
960 2 # sec/iter = 0.198; radices = 30 32 32 16 16
1024 6 # sec/iter = 0.206; radices = 32 32 32 16 16
1152 5 # sec/iter = 0.249; radices = 36 32 32 16 16
1280 1 # sec/iter = 0.293; radices = 10 16 16 16 16
1408 1 # sec/iter = 0.365; radices = 11 16 16 16 16
1536 3 # sec/iter = 0.366; radices = 24 32 32 32 16
1664 1 # sec/iter = 0.431; radices = 13 16 16 16 16
1792 1 # sec/iter = 0.475; radices = 14 16 16 16 16
1920 3 # sec/iter = 0.485; radices = 30 32 32 32 16
2048 3 # sec/iter = 0.500; radices = 16 16 16 16 16
| 1.1 MB | 12/26/05 |
| For MAC/PowerPC under MacOS X: | Description/Build Notes | File size: | Last modified |
| NO PRECOMPILED BINARIES CURRENTLY AVAILABLE | |||
| For SGI/MIPS (all CPU types) under IRIX: | Description/Build Notes | File size: | Last modified |
| NO PRECOMPILED BINARIES CURRENTLY AVAILABLE | |||
| For PA-RISC/HPUX: | Description/Build Notes | File size: | Last modified |
| Mlucas_parisc_hpux_10.21.04.gz | Statically linked exe, built using the Intel v8.0 C compiler and the compile sequence cc -noshared -o Mlucas +O3 +Odataprefetch +DA2.0 *.c -lm I built and did self-tests on an 800 MHz PA-RISC 8800 running HP-UX 11i 11.11. Here is the snippet from the configuration file generated during the self-tests which summarizes the best per-iteration timing at each available FFT length and the corresponding set of FFT radices: 576 2 # sec/iter = 0.054; radices = 18 32 32 16
640 3 # sec/iter = 0.061; radices = 20 32 32 16
704 2 # sec/iter = 0.069; radices = 22 32 32 16
768 3 # sec/iter = 0.070; radices = 24 32 32 16
832 2 # sec/iter = 0.082; radices = 26 32 32 16
896 3 # sec/iter = 0.088; radices = 28 32 32 16
960 2 # sec/iter = 0.091; radices = 30 32 32 16
1024 6 # sec/iter = 0.092; radices = 32 32 32 16
1152 5 # sec/iter = 0.109; radices = 36 32 32 16
1280 3 # sec/iter = 0.133; radices = 20 32 32 32
1408 3 # sec/iter = 0.149; radices = 22 32 32 32
1536 3 # sec/iter = 0.153; radices = 24 32 32 32
1664 3 # sec/iter = 0.179; radices = 26 32 32 32
1792 3 # sec/iter = 0.189; radices = 28 32 32 32
1920 3 # sec/iter = 0.198; radices = 30 32 32 32
2048 4 # sec/iter = 0.203; radices = 32 32 32 32
| 761 KB | 10/21/04 |
For a complete list of Mlucas command line options, type 'Mlucas -h'.
After building the source code or downloading a binary executable appropriate for your system, the first thing that should be done is a set of self-tests to make sure the binary works properly on your system. During these self-tests, the code also collects various timing data which allow it to configure itself for optimal performance on your hardware. It does this by saving data about the optimal FFT radix combination at each FFT length tried in the self-test to a configuration file, named mlucas.cfg. Once this file has been generated, it will be read whenever the program is invoked to get the optimal-FFT data (specifically, the optimal set of radices into which to subdivide each FFT length) for the exponent currently being tested.
To perform the needed self-tests for a typical-user setup (which implies that you'll be either doing double-checking or first-time LL testing), simply type
Mlucas -s m
This tells the program to perform a series of self-tests for FFT lengths in the 'medium' range, which currently (as of August 2004) means FFT lengths from 576K-2048K, which covers Mersenne numbers with exponents from 11.15M-39.7M.
If you want to do the self-tests of the various available radix sets for one particular FFT length, enter
Mlucas -s {FFT length in K}
For instance, to test all the FFT radices available at 704K, enter
Mlucas -s 704
The above single-FFT-length self-test feature is particularly handy if the binary you are using throws errors for one or more particular FFT lengths, which interrupt the complete self-test before it has a chance to complete the configuration file. In that case, one must skip the offending FFT length and go on to the next-higher one, and in this fashion build a .cfg file one FFT length at a time. (Note that each such test overwrites any existing mlucas.cfg file, so make sure to save a copy of this file before doing additional self-tests. Or after each FFT length is processed, type tail -1 mlucas.cfg and accumulate the results in an editor window.)
If you are running multiple copies of Mlucas on a multi-processor system, a copy of the mlucas.cfg file should be placed into each working directory (i.e.wherever you have a worktodo.ini file). Note that the program can run without this file, but with a proper configuration file it will run optimally at each runlength.
What is contained in the configuration file? Well, let's let one speak for itself. The following mlucas.cfg file was generated on a 1.0 GHz Alpha ev67 running TruUnix 5.1. I've italicized and colorized the comments to set them off from the actual optimal-FFT-radix data:
#
# mlucas.cfg file
# Insert comments as desired in lines beginning with a # symbol.
#
# First non-comment line contains number of iterations to do before
# turning per-iteration roundoff error checking off. Default = 200000.
#
200000
#
# Remaining non-comment lines contain data about the optimal
# FFT parameters at each runlength on the host platform.
# This file is updated automatically as needed by the program.
#
# Each line below contains a pair of integers. The first is the FFT length
# (in units of 1K), i.e. the number of 8-byte floats used to store the
# LL test residues for the exponent being tested. The program does
# a complex-data FFT, i.e. treats this vector of reals as a complex
# vector having half the length. The second integer on each line is
# the index of the optimal set of FFT radices with respect to the
# supported FFT radix sets for the particular FFT length. The available
# FFT radices can be found in the large table at the bottom (specifically
# the switch(n){} structure in the function get_fft_radices()) of the
# mers_mod_square.c source file. The optimal set of complex radices
# and the per-iteration timing it yields on the host platform are written
# to the comment field right of each FFT-length/optimal-radix-set pair.
#
576 0 # sec/iter = 0.061; radices = 9 8 16 16 16
640 0 # sec/iter = 0.065; radices = 5 16 16 16 16
704 0 # sec/iter = 0.081; radices = 11 8 16 16 16
768 0 # sec/iter = 0.079; radices = 6 16 16 16 16
832 0 # sec/iter = 0.097; radices = 13 8 16 16 16
896 0 # sec/iter = 0.101; radices = 7 16 16 16 16
960 0 # sec/iter = 0.105; radices = 15 8 16 16 16
1024 1 # sec/iter = 0.110; radices = 8 16 16 16 16
1152 1 # sec/iter = 0.131; radices = 9 16 16 16 16
1280 1 # sec/iter = 0.153; radices = 10 16 16 16 16
1408 1 # sec/iter = 0.179; radices = 11 16 16 16 16
1536 1 # sec/iter = 0.180; radices = 12 16 16 16 16
1664 1 # sec/iter = 0.213; radices = 13 16 16 16 16
1792 1 # sec/iter = 0.230; radices = 14 16 16 16 16
1920 1 # sec/iter = 0.236; radices = 15 16 16 16 16
2048 2 # sec/iter = 0.251; radices = 8 16 16 32 16
You are free to modify or append data to the right of the # signs in the .cfg file and to add or delete comment lines beginning with a # as desired. For instance, one useful thing is to add information about the specific build and platform at the top of the file.
One important thing to look for in a .cfg file generated on your local system is non-monotone timing entries in the sec/iter (seconds per iteration at the particular FFT length) data. for instance, consider the following snippet from an example mlucas.cfg file (to which I've added some boldface highlighting):
1536 1 # sec/iter = 0.225; radices = 12 16 16 16 16 1664 1 # sec/iter = 0.244; radices = 13 16 16 16 16 1792 1 # sec/iter = 0.253; radices = 14 16 16 16 16 1920 1 # sec/iter = 0.299; radices = 15 16 16 16 16 2048 4 # sec/iter = 0.284; radices = 32 32 32 32
We see that the per-iteration time for runlength 2048K is actually less than that for the next-smaller vector length that precedes it. If you encounter such occurrences in the mlucas.cfg file generated by the self-test run on your system, delete or comment out the entry in question. This will cause the program to skip to the next-larger-available runlength for exponents that would have been tested using the smaller-but-slower FFT length.
Aside: This type of thing frequently occurs for FFT lengths with non-power-of-2 leading radices (which are algorithmically less efficient than power-of-2 radices) just slightly less than a power-of-2 FFT length (e.g. 2048K = 221.) It can also happen if for some reason the compiler does a relatively poorer job of optimization on a particular FFT radix, or if some FFT radix combinations happen to give better or worse memory-access and cache behavior on the system in question.
Assuming your self-tests ran successfully, reserve a range of exponents from the GIMPS PrimeNet server. Here's the procedure (for less-experienced users, I suggest toggling between the PrimeNet links and my explanatory comments):
Each PrimeNet work assigment output line is in the form
{assignment type}={Mersenne exponent},{known to have no factors with base-2 logarithm less than}
A sample of typical assignments returned by the server follows:
| Assigment | Explanation |
| Factor=5130641,52 | Trial-factor M5130641. This exponent has already been sieved to depth 2^52, with no factors found. |
| Test=5130707,52 | Trial-factor M5130707 from 2^52 to 2^61 (see table below), and Lucas-Lehmer test if no factor found. |
| Test=3858629,60 | M3858629 has already been trial-factored to the recommended depth (see table below), so LL test ONLY. |
| DoubleCheck=4272029,62 | M4272029 has already been trial-factored to greater than the recommended depth (see table below),
so perform a second LL test of M4272029 in order to validate the results of the initial test. |
There may be some additional numbers beyond the above (e.g. the 0/1 that indicates whether p-1 factoring on the number has been attempted) in current assignments - these are ignored by Mlucas.
Copy the Test=... or DoubleCheck=... lines returned by the server into the worktodo.ini file, which must be in the same directory as the Mlucas executable and the mlucas.cfg file. If this file does not yet exist, create it. If this file already has some existing entries, append the new ones below them.
Note that Mlucas makes no distinction between first-time LL tests and double-checks - this distinction is only important to the
Primenet server. Mlucas currently has no trial-factoring capability, so if you somehow wind up factoring assignments
(e.g. if the server is currently out of exponents that need double-checking),
you'll have to release any assignments that require
factoring, request some new ones, then proceed to Lucas-Lehmer testing.
On a Unix or Linux system, cd to the directory containing the Mlucas executable (or a link to it), the worktodo.ini and the mlucas.cfg file and type
nice Mlucas &
The program will run silently in background, leaving you free to do other things or to log out. Every 2000 iterations, the program writes a timing to the "p{exponent}.stat" file (which is automatically created for each exponent), and writes the current residue and all other data it needs to pick up at this point (in case of a crash or powerdown) to a pair of restart files, named "p{exponent}" and "q{exponent}." (The second is a backup, in the rare event the first is corrupt.) When the exponent finishes, the program writes the least significant 64 bits of the final residue (in hexadecimal form, just like Prime95) to the .stat and results.txt (master output) file. Any round-off or FFT convolution error warnings are written as they are detected both to the status and to the output file, thus preserving a record of them when the Lucas-Lehmer test of the current exponent is completed.
ADDING NEW EXPONENTS TO THE WORKTODO.INI FILE: you may add or modify ALL BUT THE FIRST EXPONENT (i.e. the current one) in the worktodo.ini file while the program is running. When the current exponent finishes, the program opens the file, deletes the first entry and, if there is another exponent on what was line 2 (and now is line 1), starts work on that one.
CONCATENATING SEVERAL PARTIALLY COMPLETED ASSIGNMENTS INTO THE SAME WORKTODO.INI FILE: Say you've been running some jobs on a friend's UltraSuperPower 3000 workstation, and she tells you that as part of her recent divorce settlement, she gets the Jaguar and her husband gets the USP3K. (Sounds like a bad trade if you ask me, but perhaps her ex had a real shark of a lawyer.) What to do with the soon-to-be-orphaned savefiles? Well, if your own humble workstation uses the same bytewise -endian format as the USP3K (see the compatibility matrix below), you can simply move the savefiles into your working Mlucas directory there, copy the corresponding exponent into any line beyond the first in the worktodo.ini file, and the program will restart that job at the last-saved iteration when it gets around to it.
SAVEFILE COMPATIBILITY ACROSS VARIOUS PLATFORMS:
Every 2000 iterations, the program does a simple binary dump of the array of 4-byte integer residue digits and an 8-byte
floating checksum to the two redundant p-and-q savefiles. This means that savefiles are only portable across platforms
that use the same bytewise -endian convention for both integer and floating-point data.
The table below summarizes savefile intercompatibility for some popular RISC platforms. If your intended hardware
pair for a savefile transfer is not listed, you can simply try restarting a job whose savefiles were transferred from
hardware A using the Mlucas code on hardware B - if the data formats are not compatible, the code will immediately quit
with a checksum error notice, leaving the original savefiles intact. (I.e. you'll just have to find a machine compatible
with A to finish them on.) If enough users ask for it, I may get around to implementing an endian-independent (truly
portable) savefile format in a future release.
| Savefile Hardware Intercompatibility Matrix | |||||
| Machine type | Alpha | SGI/MIPS | Sparc | IBM RS6000 | Apple/PowerPC |
| Alpha | YES | NO | NO | UNKNOWN | UNKNOWN |
| SGI/MIPS | NO | YES | YES | UNKNOWN | UNKNOWN |
| Sparc | NO | YES | YES | UNKNOWN | UNKNOWN |
| IBM RS6000 | UNKNOWN | UNKNOWN | UNKNOWN | YES | UNKNOWN |
| Apple/PowerPC | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN | YES |
To report results (either after finishing a range, or as they come in), Logon to the primenet web pages so that you get CPU credit. Connect to the PrimeNet manual tests page,
http://mersenne.org/manual_result/,
then paste the results you wish to report (i.e. the final line of the p*.stat file, or
one or more lines of the results.txt file) into the large window immediately below.
Manual test results do receive CPU time credit on the PrimeNet server's producer pages.
It uses the well-known Lucas-Lehmer test for Mersenne numbers, which involves selecting an initial seed number (typically 4, but other values, such as 10, also work), then repeatedly squaring and subtracting 2, with the result of each squaring being reduced modulo M(p), the number under test. This square/subtract-2 step is done exactly p-2 times. If the result (modulo M(p)) is zero, M(p) is prime. For an excellent and much more in-depth discussion of the Lucas-Lehmer test and many other prime-related topics, please visit Chris Caldwell's web page on Mersenne numbers.
Since the numbers being tested (and hence the intermediate residues in the LL test) are so large that they typically must be distributed across several hundred thousand computer words, by far the most time-consuming part of the LL test is the modular squaring.
For a large integer occupying N computer words, a simple digit-by-digit ("grammar school") multiply operation (which needs on the order of N2 machine operations) is much too slow to be practical. Rather, the code uses a multiply algorithm based on the fast Fourier transform (FFT), which is described in many numerical analysis texts, such as the well-known Numerical Recipes (NR) books. (NR even has a set of so-called multiprecision integer routines, but I suggest staying away from them - they're awful.)
The code also uses the now-well-known Discrete Weighted Transform technique of Crandall and Fagin (for a detailed reference, see the source code header or this research paper) to implicitly do the modding. This permits one to "fill" the digits of the input vector to the FFT-based squaring, and thus to reduce the vector length by a factor of 2 or more relative to any pre-1994 codes.
The upshot is, to write the world's fastest Mersenne testing program, one must write (or make use of) the world's fastest FFT algorithm.
Mlucas uses a custom FFT implementation written by me (EWM). I first started on this algorithmic journey in the late summer of 1996, and being a complete novice at transform-based arithmetic at the time, the first FFT routines I used were those from NR. Since then, the code has greatly evolved, and the FFT I currently use looks absolutely nothing like the original one, although it is doing basically the same thing (except for the non-power-of-2 vector length routines - NR has nothing along those lines.)
I have not had time or desire to package the FFT core of Mlucas into a form suitable for inclusion in the FFTW benchmarks, but
my own comparisons indicate that the Mlucas FFT is typically about twice as fast as FFTW for the vector lengths of interest
to Mersenne prime searchers (real vectors of length 128K and larger, where K=210=1024).