Prime Sieve Benchmarks
by Dana Jacobsen, 29 January 2018
last update: 21 January 2019
Segmented Sieve program | 10^9 | 10^10 | 10^11 | 10^12 | 10^13 |
---|---|---|---|---|---|
yafu r366/wip AVX2 | 0.218 | 1.286 | 15.052 | 3m 18.4s | 39m 2s |
primesieve 7.4 | 0.109 | 1.306 | 16.117 | 3m 27.2s | 42m 19s |
yafu 1.35 r366 | 0.280 | 1.819 | 19.681 | 4m 17.2s | 50m 56s |
Perl/ntheory 0.70 | 0.142 | 2.048 | 23.602 | 4m 58.5s | 70m 54s |
primegen 0.97 | 0.244 | 2.544 | 28.525 | 6m 32.9s | 121m 50s |
Oliveira e Silva v2 | 0.432 | 4.726 | 52.535 | 10m 2.2s | 117m 7s |
Roy/Luo Algo3 | 0.400 | 4.783 | 62.278 | 11m 16.0s | 148m 52s |
tverniquet 2012 | 0.469 | 5.294 | 63.857 | 14m 37.4s | |
Flammenkamp | 0.559 | 5.848 | 68.992 | 16m 9.0s | |
Walisch bit segment | 0.781 | 8.364 | 92.858 | 18m 35.5s | |
tverniquet 2015 | 0.082 | 1.332 | 27.546 | 30m 50.1s | |
Walisch byte segment | 0.641 | 10.817 | 178.087 | 51m 37.5s | |
Monolithic Sieve program | 10^9 | 10^10 | 10^11 | 10^12 | 10^13 |
Perl/ntheory 0.70 | 0.882 | 12.304 | 2m 25.8s | 30m 14s | |
Mathisen 1998 | 1.847 | 21.714 | 4m 8.0s | 48m 0s | |
Adler 2019 | 0.336 | 4.456 | 1m 9.9s | ||
Trivial Odd SOE | 2.748 | 31.816 | 6m 6.0s | ||
Sieve Iterator program | 10^9 | 10^10 | 10^11 | 10^12 | 10^13 |
Math::Prime::Util::GMP | 1.168 | 13.541 | 3m 16.5s | 61m 2s | |
Pari/GP 2.10.0 | 1.610 | 16.916 | 3m 9.9s | >16 hours | |
BHelmes 2009 | 1.858 | 20.116 | 3m 55.1s | 55m 5s |
Machine: Intel i7-6700K, 4.2GHz
32GB (4x8) DDR4 2400, CAS 15, 1.2V (F4-2400C15Q-32GRR)
Fedora 23
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)
Single thread only. Some of the software can be run with multiple threads. This is not a test of that functionality.
primesieve
primesieve.org
Segmented Sieve of Eratosthenes
Library, nice API and easy to use in projects, efficient multi-threading.
cmake .; make
time ./primesieve 1e11 -c1 -t1 --no-status
yafu
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading.
svn update; make x86_64
time ./yafu "primes(0,10^11)" -threads 1
yafu (with AVX2)
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading. AVX2 support for more speed.
svn update; cd branches/wip; make -DUSE_AVX2=1
time ./yafu "primes(0,10^11)" -threads 1
Perl/ntheory
ntheory on github (also on MetaCPAN)
Segmented Sieve of Eratosthenes
Built into Perl module, possible but tricky to pull out to use elsewhere.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'say ntheory::_segment_pi(1e11)'
primegen
primegen from D.J. Bernstein
Segmented Sieve of Atkin
Reference implementation of Atkin sieve. More complex than Wikipedia pseudocode.
edit conf-words, set to 32000
use gcc -O3 -fomit-frame-pointer -march=native
time ./primespeed 100000000000
v2 sieve of Tomás Oliveira e Silva
fast_sieve.c from TOeS
modification for counting
Segmented Sieve of Eratosthenes using buckets
Demonstration code. Decently fast at small sizes, excels at large sizes. Memory intensive.
set _sieve_bits_log2_ to 19u
gcc -Wall -O3 -fomit-frame-pointer -march=native toes_v2_count.c -o pr-toes
time ./pr-toes -m 1e11
Mario Roy's implementation of Luo Algorithm 3
source
Segmented Sieve of Eratosthenes (from Luo 1989)
An interesting variant.
gcc -O3 -fomit-frame-pointer -march=native algo3-new.c -o pr-roy-algo3 -lm
time ./pr-roy-algo3 2 100000000000
tverniquet 2012 (Tristan Verniquet)
source and include -- original link: Archive link
Segmented Sieve of Eratosthenes
As seen in last post on whirlpool.net thread, uses a large mod 2310 wheel. Multi-thread capable.
gcc -O3 -fomit-frame-pointer -march=native -DNCPU=0 tver-prime.c -o pr-tver -lm
time ./pr-tver 100000000000
tverniquet 2015 (hprime by Tristan Verniquet)
hprime on github
Segmented Sieve of Eratosthenes
Includes some nice demos of wheel optimizations. Multi-threaded. Does not compile on MacOS.
Very fast for small sizes but really slows down as size increases.
make
time bin/hprime 0 100000000000 0 0
Achim Flammenkamp 1998-2003
source
Segmented Sieve of Eratosthenes
Historically interesting.
gcc -O3 -fomit-frame-pointer -march=native -DTEST -DLONG="unsigned long" -Dtrue_64bit_words=1 -DSIEVE_SIZE=65536 af_prime_sieve.c -o pr-flammenkamp -lm
time ./pr-flammenkamp 100000000000
Kim Walisch demo bit segment
source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_bit_sieve.cpp -o pr-walisch-bit
time ./pr-walisch-bit 100000000000
Kim Walisch demo byte segment
web page and description
/
source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_sieve.cpp -o pr-walisch-byte
time ./pr-walisch-byte 100000000000
Pari/GP 2.10.0
Pari/GP home page
Segmented Sieve of Eratosthenes
Loops through u_forprime_next
which does sieving, but not so efficiently.
git pull; git checkout a1cc066d04938338a8d892227bbc947396e522e5
edit src/basemath/prime.c
, to skip tables replace uprimepi()
body with:
ulong p=2, n=1; forprime_t S;
(void)u_forprime_init(&S, p+1, a);
for (; p; n++) p = u_forprime_next(&S);
return n-1;
make gp
echo "primepi(10^11)" | time ./gp -f -q
Perl/ntheory (Monolithic)
ntheory on github (also on MetaCPAN)
Monolithic Sieve of Eratosthenes
Mod-30 wheel, presieving 7,11,13.
Built into Perl module, could be pulled out.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'prime_precalc(1e11); say ntheory::_segment_pi(1e11)'
Bernhard Helmes and Jasper Neumann 2009
source
Monolithic Sieve of Eratosthenes
Small sieve using a heap as shown in O'neill (2009).
While not segmented, there is no giant bit array, and memory use is very low.
g++ -O3 -fomit-frame-pointer -march=native bhelmes.cpp -o pr-bhelmes
time ./pr-bhelmes 100000000000
Terje Mathisen 1998
source
Monolithic Sieve of Eratosthenes
Very straightforward mod-30 wheel sieve.
gcc -O3 -fomit-frame-pointer -march=native mathisen.c -o pr-mathisen -lm
time ./pr-mathisen 100000000000
Trivial 4-line SoE
source
Monolithic Sieve of Eratosthenes
Demostration simple odds-only SoE.
gcc -O3 -fomit-frame-pointer -march=native simple3.c -o pr-simple -lm
time ./pr-simple 100000000000
Mark Adler's two-pass SoE
source
Monolithic odds-only Sieve of Eratosthenes
Posted on Quora January 2019.
Creates masks to speed up marking for early primes, then marks later portions in chunks to get some of the locality advantages of a segmented sieve.
Like the trivial monolithic SoC, cannot complete 10^12 in 32GB of RAM.
gcc -O3 -fomit-frame-pointer -march=native adler2019.c -o adler2019
time ./adler2019
Perl/ntheory (Iterator)
ntheory on github (also on MetaCPAN)
Iterator built on segmented SoE
Built into Math::Prime::Util::GMP Perl module, could be pulled out.
... todo ... , replaces is_semiprime() with a for loop of prime_iterator_next with count++ ...
time mpu 'Math::Prime::Util::GMP::is_semiprime(1e12)'