Prime Sieves

Prime Sieve Benchmarks

by Dana Jacobsen, 29 January 2018

last update: 21 January 2019

Benchmark results

Segmented Sieve program	10^9	10^10	10^11	10^12	10^13
yafu r366/wip AVX2	0.218	1.286	15.052	3m 18.4s	39m 2s
primesieve 7.4	0.109	1.306	16.117	3m 27.2s	42m 19s
yafu 1.35 r366	0.280	1.819	19.681	4m 17.2s	50m 56s
Perl/ntheory 0.70	0.142	2.048	23.602	4m 58.5s	70m 54s
primegen 0.97	0.244	2.544	28.525	6m 32.9s	121m 50s
Oliveira e Silva v2	0.432	4.726	52.535	10m 2.2s	117m 7s
Roy/Luo Algo3	0.400	4.783	62.278	11m 16.0s	148m 52s
tverniquet 2012	0.469	5.294	63.857	14m 37.4s
Flammenkamp	0.559	5.848	68.992	16m 9.0s
Walisch bit segment	0.781	8.364	92.858	18m 35.5s
tverniquet 2015	0.082	1.332	27.546	30m 50.1s
Walisch byte segment	0.641	10.817	178.087	51m 37.5s
Monolithic Sieve program	10^9	10^10	10^11	10^12	10^13
Perl/ntheory 0.70	0.882	12.304	2m 25.8s	30m 14s
Mathisen 1998	1.847	21.714	4m 8.0s	48m 0s
Adler 2019	0.336	4.456	1m 9.9s
Trivial Odd SOE	2.748	31.816	6m 6.0s
Sieve Iterator program	10^9	10^10	10^11	10^12	10^13
Math::Prime::Util::GMP	1.168	13.541	3m 16.5s	61m 2s
Pari/GP 2.10.0	1.610	16.916	3m 9.9s	>16 hours
BHelmes 2009	1.858	20.116	3m 55.1s	55m 5s

Testing Details

Machine: Intel i7-6700K, 4.2GHz
32GB (4x8) DDR4 2400, CAS 15, 1.2V (F4-2400C15Q-32GRR)
Fedora 23
gcc version 5.3.1 20160406 (Red Hat 5.3.1-6) (GCC)

Single thread only. Some of the software can be run with multiple threads. This is not a test of that functionality.

Individual program details

primesieve
primesieve.org
Segmented Sieve of Eratosthenes
Library, nice API and easy to use in projects, efficient multi-threading.
cmake .; make
time ./primesieve 1e11 -c1 -t1 --no-status

yafu
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading.
svn update; make x86_64
time ./yafu "primes(0,10^11)" -threads 1

yafu (with AVX2)
YAFU at SourceForge
Segmented Sieve of Eratosthenes
Built into yafu factoring utility, not so easy to pull out and use elsewhere. Efficient multi-threading. AVX2 support for more speed.
svn update; cd branches/wip; make -DUSE_AVX2=1
time ./yafu "primes(0,10^11)" -threads 1

Perl/ntheory
ntheory on github (also on MetaCPAN)
Segmented Sieve of Eratosthenes
Built into Perl module, possible but tricky to pull out to use elsewhere.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'say ntheory::_segment_pi(1e11)'

primegen
primegen from D.J. Bernstein
Segmented Sieve of Atkin
Reference implementation of Atkin sieve. More complex than Wikipedia pseudocode.
edit conf-words, set to 32000
use gcc -O3 -fomit-frame-pointer -march=native
time ./primespeed 100000000000

v2 sieve of Tomás Oliveira e Silva
fast_sieve.c from TOeS
modification for counting
Segmented Sieve of Eratosthenes using buckets
Demonstration code. Decently fast at small sizes, excels at large sizes. Memory intensive.
set _sieve_bits_log2_ to 19u
gcc -Wall -O3 -fomit-frame-pointer -march=native toes_v2_count.c -o pr-toes
time ./pr-toes -m 1e11

Mario Roy's implementation of Luo Algorithm 3
source
Segmented Sieve of Eratosthenes (from Luo 1989)
An interesting variant.
gcc -O3 -fomit-frame-pointer -march=native algo3-new.c -o pr-roy-algo3 -lm
time ./pr-roy-algo3 2 100000000000

tverniquet 2012 (Tristan Verniquet)
source and include -- original link: Archive link
Segmented Sieve of Eratosthenes
As seen in last post on whirlpool.net thread, uses a large mod 2310 wheel. Multi-thread capable.
gcc -O3 -fomit-frame-pointer -march=native -DNCPU=0 tver-prime.c -o pr-tver -lm
time ./pr-tver 100000000000

tverniquet 2015 (hprime by Tristan Verniquet)
hprime on github
Segmented Sieve of Eratosthenes
Includes some nice demos of wheel optimizations. Multi-threaded. Does not compile on MacOS.
Very fast for small sizes but really slows down as size increases.
make
time bin/hprime 0 100000000000 0 0

Achim Flammenkamp 1998-2003
source
Segmented Sieve of Eratosthenes
Historically interesting.
gcc -O3 -fomit-frame-pointer -march=native -DTEST -DLONG="unsigned long" -Dtrue_64bit_words=1 -DSIEVE_SIZE=65536 af_prime_sieve.c -o pr-flammenkamp -lm
time ./pr-flammenkamp 100000000000

Kim Walisch demo bit segment
source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_bit_sieve.cpp -o pr-walisch-bit
time ./pr-walisch-bit 100000000000

Kim Walisch demo byte segment
web page and description / source
Segmented Sieve of Eratosthenes
Really straightforward demo code, hence not many optimizations.
g++ -O3 -fomit-frame-pointer -march=native segmented_sieve.cpp -o pr-walisch-byte
time ./pr-walisch-byte 100000000000

Pari/GP 2.10.0
Pari/GP home page
Segmented Sieve of Eratosthenes
Loops through u_forprime_next which does sieving, but not so efficiently.
git pull; git checkout a1cc066d04938338a8d892227bbc947396e522e5
edit src/basemath/prime.c, to skip tables replace uprimepi() body with:

    ulong p=2, n=1;  forprime_t S;
    (void)u_forprime_init(&S, p+1, a);
    for (; p; n++) p = u_forprime_next(&S);
    return n-1;

make gp
echo "primepi(10^11)" | time ./gp -f -q

Perl/ntheory (Monolithic)
ntheory on github (also on MetaCPAN)
Monolithic Sieve of Eratosthenes
Mod-30 wheel, presieving 7,11,13.
Built into Perl module, could be pulled out.
git pull; perl Makefile.PL OPTIMIZE="-O3 -march=native -DBENCH_SEGCOUNT"; make
time perl -Iblib/lib -Iblib/arch -Mntheory=:all -E 'prime_precalc(1e11); say ntheory::_segment_pi(1e11)'

Bernhard Helmes and Jasper Neumann 2009
source
Monolithic Sieve of Eratosthenes
Small sieve using a heap as shown in O'neill (2009).
While not segmented, there is no giant bit array, and memory use is very low.
g++ -O3 -fomit-frame-pointer -march=native bhelmes.cpp -o pr-bhelmes
time ./pr-bhelmes 100000000000

Terje Mathisen 1998
source
Monolithic Sieve of Eratosthenes
Very straightforward mod-30 wheel sieve.
gcc -O3 -fomit-frame-pointer -march=native mathisen.c -o pr-mathisen -lm
time ./pr-mathisen 100000000000

Trivial 4-line SoE
source
Monolithic Sieve of Eratosthenes
Demostration simple odds-only SoE.
gcc -O3 -fomit-frame-pointer -march=native simple3.c -o pr-simple -lm
time ./pr-simple 100000000000

Adler 2019

Mark Adler's two-pass SoE
source
Monolithic odds-only Sieve of Eratosthenes
Posted on Quora January 2019.
Creates masks to speed up marking for early primes, then marks later portions in chunks to get some of the locality advantages of a segmented sieve.
Like the trivial monolithic SoC, cannot complete 10^12 in 32GB of RAM.
gcc -O3 -fomit-frame-pointer -march=native adler2019.c -o adler2019
time ./adler2019

Perl/ntheory (Iterator)
ntheory on github (also on MetaCPAN)
Iterator built on segmented SoE
Built into Math::Prime::Util::GMP Perl module, could be pulled out.
... todo ... , replaces is_semiprime() with a for loop of prime_iterator_next with count++ ...
time mpu 'Math::Prime::Util::GMP::is_semiprime(1e12)'