Prime Sieve Benchmarks

by Dana Jacobsen, 29 January 2018

last update: 21 January 2019


Benchmark results

Segmented Sieve program 10^9 10^10 10^11 10^12 10^13
yafu r366/wip AVX2 0.218 1.286 15.052 3m 18.4s 39m   2s
primesieve 7.4 0.109 1.306 16.117 3m 27.2s 42m 19s
yafu 1.35 r366 0.280 1.819 19.681 4m 17.2s 50m 56s
Perl/ntheory 0.70 0.142 2.048 23.602 4m 58.5s 70m 54s
primegen 0.97 0.244 2.544 28.525 6m 32.9s 121m 50s
Oliveira e Silva v2 0.432 4.726 52.535 10m   2.2s 117m   7s
Roy/Luo Algo3 0.400 4.783 62.278 11m 16.0s 148m 52s
tverniquet 2012 0.469 5.294 63.857 14m 37.4s
Flammenkamp 0.559 5.848 68.992 16m   9.0s
Walisch bit segment 0.781 8.364 92.858 18m 35.5s
tverniquet 2015 0.082 1.332 27.546 30m 50.1s
Walisch byte segment 0.641 10.817 178.087 51m 37.5s
Monolithic Sieve program 10^9 10^10 10^11 10^12 10^13
Perl/ntheory 0.70 0.882 12.304 2m 25.8s 30m 14s
Mathisen 1998 1.847 21.714 4m   8.0s 48m   0s
Adler 2019 0.336 4.456 1m   9.9s
Trivial Odd SOE 2.748 31.816 6m   6.0s
Sieve Iterator program 10^9 10^10 10^11 10^12 10^13
Math::Prime::Util::GMP 1.168 13.541 3m 16.5s 61m   2s
Pari/GP 2.10.0 1.610 16.916 3m   9.9s >16 hours
BHelmes 2009 1.858 20.116 3m 55.1s 55m   5s

Testing Details

Machine: Intel i7-6700K, 4.2GHz
32GB (4x8) DDR4 2400, CAS 15, 1.2V (F4-2400C15Q-32GRR)
Fedora 23
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)

Single thread only. Some of the software can be run with multiple threads. This is not a test of that functionality.


Individual program details

primesieve
primesieve.org
Segmented Sieve of Eratosthenes
Library, nice API and easy to use in projects, efficient multi-threading.
cmake .; make
time ./primesieve 1e11 -c1 -t1 --no-status

yafu
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading.
svn update; make x86_64
time ./yafu "primes(0,10^11)" -threads 1

yafu (with AVX2)
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading. AVX2 support for more speed.
svn update; cd branches/wip; make -DUSE_AVX2=1
time ./yafu "primes(0,10^11)" -threads 1

Perl/ntheory
ntheory on github (also on MetaCPAN)
Segmented Sieve of Eratosthenes
Built into Perl module, possible but tricky to pull out to use elsewhere.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'say ntheory::_segment_pi(1e11)'

primegen
primegen from D.J. Bernstein
Segmented Sieve of Atkin
Reference implementation of Atkin sieve. More complex than Wikipedia pseudocode.
edit conf-words, set to 32000
use gcc -O3 -fomit-frame-pointer -march=native
time ./primespeed 100000000000

v2 sieve of Tomás Oliveira e Silva
fast_sieve.c from TOeS
modification for counting
Segmented Sieve of Eratosthenes using buckets
Demonstration code. Decently fast at small sizes, excels at large sizes. Memory intensive.
set _sieve_bits_log2_ to 19u
gcc -Wall -O3 -fomit-frame-pointer -march=native toes_v2_count.c -o pr-toes
time ./pr-toes -m 1e11

Mario Roy's implementation of Luo Algorithm 3
source
Segmented Sieve of Eratosthenes (from Luo 1989)
An interesting variant.
gcc -O3 -fomit-frame-pointer -march=native algo3-new.c -o pr-roy-algo3 -lm
time ./pr-roy-algo3 2 100000000000

tverniquet 2012 (Tristan Verniquet)
source and include -- original link: Archive link
Segmented Sieve of Eratosthenes
As seen in last post on whirlpool.net thread, uses a large mod 2310 wheel. Multi-thread capable.
gcc -O3 -fomit-frame-pointer -march=native -DNCPU=0 tver-prime.c -o pr-tver -lm
time ./pr-tver 100000000000

tverniquet 2015 (hprime by Tristan Verniquet)
hprime on github
Segmented Sieve of Eratosthenes
Includes some nice demos of wheel optimizations. Multi-threaded. Does not compile on MacOS.
Very fast for small sizes but really slows down as size increases.
make
time bin/hprime 0 100000000000 0 0

Achim Flammenkamp 1998-2003
source
Segmented Sieve of Eratosthenes
Historically interesting.
gcc -O3 -fomit-frame-pointer -march=native -DTEST -DLONG="unsigned long" -Dtrue_64bit_words=1 -DSIEVE_SIZE=65536 af_prime_sieve.c -o pr-flammenkamp -lm
time ./pr-flammenkamp 100000000000

Kim Walisch demo bit segment
source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_bit_sieve.cpp -o pr-walisch-bit
time ./pr-walisch-bit 100000000000

Kim Walisch demo byte segment
web page and description / source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_sieve.cpp -o pr-walisch-byte
time ./pr-walisch-byte 100000000000

Pari/GP 2.10.0
Pari/GP home page
Segmented Sieve of Eratosthenes
Loops through u_forprime_next which does sieving, but not so efficiently.
git pull; git checkout a1cc066d04938338a8d892227bbc947396e522e5
edit src/basemath/prime.c, to skip tables replace uprimepi() body with:

    ulong p=2, n=1;  forprime_t S;
    (void)u_forprime_init(&S, p+1, a);
    for (; p; n++) p = u_forprime_next(&S);
    return n-1;

make gp
echo "primepi(10^11)" | time ./gp -f -q

Perl/ntheory (Monolithic)
ntheory on github (also on MetaCPAN)
Monolithic Sieve of Eratosthenes
Mod-30 wheel, presieving 7,11,13.
Built into Perl module, could be pulled out.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'prime_precalc(1e11); say ntheory::_segment_pi(1e11)'

Bernhard Helmes and Jasper Neumann 2009
source
Monolithic Sieve of Eratosthenes
Small sieve using a heap as shown in O'neill (2009).
While not segmented, there is no giant bit array, and memory use is very low.
g++ -O3 -fomit-frame-pointer -march=native bhelmes.cpp -o pr-bhelmes
time ./pr-bhelmes 100000000000

Terje Mathisen 1998
source
Monolithic Sieve of Eratosthenes
Very straightforward mod-30 wheel sieve.
gcc -O3 -fomit-frame-pointer -march=native mathisen.c -o pr-mathisen -lm
time ./pr-mathisen 100000000000

Trivial 4-line SoE
source
Monolithic Sieve of Eratosthenes
Demostration simple odds-only SoE.
gcc -O3 -fomit-frame-pointer -march=native simple3.c -o pr-simple -lm
time ./pr-simple 100000000000

Adler 2019

Mark Adler's two-pass SoE
source
Monolithic odds-only Sieve of Eratosthenes
Posted on Quora January 2019.
Creates masks to speed up marking for early primes, then marks later portions in chunks to get some of the locality advantages of a segmented sieve.
Like the trivial monolithic SoC, cannot complete 10^12 in 32GB of RAM.
gcc -O3 -fomit-frame-pointer -march=native adler2019.c -o adler2019
time ./adler2019

Perl/ntheory (Iterator)
ntheory on github (also on MetaCPAN)
Iterator built on segmented SoE
Built into Math::Prime::Util::GMP Perl module, could be pulled out.
... todo ... , replaces is_semiprime() with a for loop of prime_iterator_next with count++ ...
time mpu 'Math::Prime::Util::GMP::is_semiprime(1e12)'