You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Benchmarks for string similarity and alignment algorithms across Rust and Python implementations, including CPU and GPU variants.
Overview
Edit Distance calculation is a common component of Search Engines, Data Cleaning, and Natural Language Processing, as well as in Bioinformatics.
It's a computationally expensive operation, generally implemented using dynamic programming, with a quadratic time complexity upper bound.
For biological sequences, the Needleman-Wunsch and Smith-Waterman algorithms are more appropriate, as they allow overriding the default substitution costs.
Each of those has two flavors - with linear and affine gap penalties, also known as the "Gotoh" variation.
Performance is measured in MCUPS (Million Cell Updates Per Second).
Levenshtein Distance
Library
~100 bytes lines
~1,000 bytes lines
Rust
bio::levenshtein on 1x SPR
428 MCUPS
823 MCUPS
rapidfuzz::levenshtein<Bytes> on 1x SPR
4,633 MCUPS
14,316 MCUPS
rapidfuzz::levenshtein<Chars> on 1x SPR
3,877 MCUPS
13,179 MCUPS
stringzillas::LevenshteinDistances on 1x SPR
3,315 MCUPS
13,084 MCUPS
stringzillas::LevenshteinDistancesUtf8 on 1x SPR
3,283 MCUPS
11,690 MCUPS
stringzillas::LevenshteinDistances on 16x SPR
29,430 MCUPS
105,400 MCUPS
stringzillas::LevenshteinDistancesUtf8 on 16x SPR
38,954 MCUPS
103,500 MCUPS
stringzillas::LevenshteinDistances on RTX6000
32,030 MCUPS
901,990 MCUPS
stringzillas::LevenshteinDistances on H100
31,913 MCUPS
925,890 MCUPS
stringzillas::LevenshteinDistances on B200
32,960 MCUPS
998,620 MCUPS
stringzillas::LevenshteinDistances on 384x GNR
114,190 MCUPS
3,084,270 MCUPS
stringzillas::LevenshteinDistancesUtf8 on 384x GNR
103,590 MCUPS
2,938,320 MCUPS
Python
nltk.edit_distance
2 MCUPS
2 MCUPS
jellyfish.levenshtein_distance
81 MCUPS
228 MCUPS
rapidfuzz.Levenshtein.distance
108 MCUPS
9,272 MCUPS
editdistance.eval
89 MCUPS
660 MCUPS
edlib.align
82 MCUPS
7,262 MCUPS
polyleven.levenshtein
89 MCUPS
3,887 MCUPS
stringzillas.LevenshteinDistances on 1x SPR
53 MCUPS
3,407 MCUPS
stringzillas.LevenshteinDistancesUTF8 on 1x SPR
57 MCUPS
3,693 MCUPS
cudf.edit_distance batch on H100
24,754 MCUPS
6,976 MCUPS
stringzillas.LevenshteinDistances batch on 1x SPR
2,343 MCUPS
12,141 MCUPS
stringzillas.LevenshteinDistances batch on 16x SPR
3,762 MCUPS
119,261 MCUPS
stringzillas.LevenshteinDistances batch on H100
18,081 MCUPS
320,109 MCUPS
Needleman-Wunsch (Global Alignment)
Library
~100 bytes lines
~1,000 bytes lines
Rust
bio::pairwise::global on 1x SPR
51 MCUPS
57 MCUPS
stringzillas::NeedlemanWunschScores on 1x SPR
278 MCUPS
612 MCUPS
stringzillas::NeedlemanWunschScores on 16x SPR
4,057 MCUPS
8,492 MCUPS
stringzillas::NeedlemanWunschScores on 384x GNR
64,290 MCUPS
331,340 MCUPS
stringzillas::NeedlemanWunschScores on H100
131 MCUPS
12,113 MCUPS
Python
biopython.PairwiseAligner.score on 1x SPR
95 MCUPS
557 MCUPS
stringzillas.NeedlemanWunschScores on 1x SPR
30 MCUPS
481 MCUPS
stringzillas.NeedlemanWunschScores batch on 1x SPR
246 MCUPS
570 MCUPS
stringzillas.NeedlemanWunschScores batch on 16x SPR
3,103 MCUPS
9,208 MCUPS
stringzillas.NeedlemanWunschScores batch on H100
127 MCUPS
12,246 MCUPS
Smith-Waterman (Local Alignment)
Library
~100 bytes lines
~1,000 bytes lines
Rust
bio::pairwise::local on 1x SPR
49 MCUPS
50 MCUPS
stringzillas::SmithWatermanScores on 1x SPR
263 MCUPS
552 MCUPS
stringzillas::SmithWatermanScores on 16x SPR
3,883 MCUPS
8,011 MCUPS
stringzillas::SmithWatermanScores on 384x GNR
58,880 MCUPS
285,480 MCUPS
stringzillas::SmithWatermanScores on H100
143 MCUPS
12,921 MCUPS
Python
biopython.PairwiseAligner.score on 1x SPR
95 MCUPS
557 MCUPS
stringzillas.SmithWatermanScores on 1x SPR
28 MCUPS
440 MCUPS
stringzillas.SmithWatermanScores batch on 1x SPR
255 MCUPS
582 MCUPS
stringzillas.SmithWatermanScores batch on 16x SPR
3,535 MCUPS
8,235 MCUPS
stringzillas.SmithWatermanScores batch on H100
130 MCUPS
12,702 MCUPS
Needleman-Wunsch-Gotoh (Affine Gap Penalties)
Library
~100 bytes lines
~1,000 bytes lines
Rust
stringzillas::NeedlemanWunschScores on 1x SPR
83 MCUPS
354 MCUPS
stringzillas::NeedlemanWunschScores on 16x SPR
1,267 MCUPS
4,694 MCUPS
stringzillas::NeedlemanWunschScores on 384x GNR
42,050 MCUPS
155,920 MCUPS
stringzillas::NeedlemanWunschScores on H100
128 MCUPS
13,799 MCUPS
Smith-Waterman-Gotoh (Local with Affine Gaps)
Library
~100 bytes lines
~1,000 bytes lines
Rust
stringzillas::SmithWatermanScores on 1x SPR
79 MCUPS
284 MCUPS
stringzillas::SmithWatermanScores on 16x SPR
1,026 MCUPS
3,776 MCUPS
stringzillas::SmithWatermanScores on 384x GNR
38,430 MCUPS
129,140 MCUPS
stringzillas::SmithWatermanScores on H100
127 MCUPS
13,205 MCUPS
See README.md for dataset information and replication instructions.