Skip to content

perf: improve parallelism of SortBotMerge#831

Open
jodavies wants to merge 2 commits into
form-dev:masterfrom
jodavies:sortbot-blocks
Open

perf: improve parallelism of SortBotMerge#831
jodavies wants to merge 2 commits into
form-dev:masterfrom
jodavies:sortbot-blocks

Conversation

@jodavies
Copy link
Copy Markdown
Collaborator

@jodavies jodavies commented May 18, 2026

This change comes from discussion and benchmarking efforts in collaboration with @AnaIPereira. See the commit message for a full description.

On my 12-core 7900X, the results are as follows. @AnaIPereira is finishing up some testing on higher core-count sytems, to make sure we have a good value for MINWRITENUMBEROFTERMS and that the adjusted, lower value of MINIMUMNUMBEROFTERMS does not affect performance.

Speedup w.r.t. master:

Benchmark form -w2 -w4 -w6 -w12 -w24 (HT)
chromatic 0.99 ± 0.01 1.03 ± 0.02 1.08 ± 0.02 1.14 ± 0.01 1.12 ± 0.01 1.08 ± 0.00
color 0.99 ± 0.01 1.01 ± 0.01 1.02 ± 0.01 1.03 ± 0.02 1.03 ± 0.02 1.03 ± 0.01
fmft 1.01 ± 0.01 1.02 ± 0.00 1.08 ± 0.01 1.13 ± 0.01 1.14 ± 0.01 1.09 ± 0.01
forcer-exp 1.00 ± 0.00 1.00 ± 0.00 1.04 ± 0.01 1.09 ± 0.03 1.13 ± 0.00 1.14 ± 0.00
forcer 1.00 ± 0.01 1.01 ± 0.00 1.07 ± 0.01 1.14 ± 0.00 1.20 ± 0.01 1.21 ± 0.00
hyperform 1.00 ± 0.01 1.03 ± 0.01 1.06 ± 0.01 1.09 ± 0.01 1.08 ± 0.01 1.05 ± 0.01
mass-fact 1.00 ± 0.01 1.02 ± 0.02 1.01 ± 0.03 1.02 ± 0.02 1.06 ± 0.07 1.03 ± 0.05
mbox1l 1.00 ± 0.00 1.01 ± 0.00 1.07 ± 0.01 1.15 ± 0.02 1.30 ± 0.03 1.38 ± 0.04
minceex 1.00 ± 0.00 1.04 ± 0.00 1.06 ± 0.01 1.08 ± 0.01 1.08 ± 0.01 1.06 ± 0.01
mincer 1.01 ± 0.02 1.01 ± 0.01 1.03 ± 0.01 1.03 ± 0.01 1.02 ± 0.01 1.02 ± 0.01
mzv-dm 1.00 ± 0.01 0.99 ± 0.02 1.00 ± 0.01 1.00 ± 0.02 1.01 ± 0.04 1.00 ± 0.02
sort-disk 1.02 ± 0.01 1.01 ± 0.02 1.01 ± 0.01 1.01 ± 0.01 1.02 ± 0.02 1.00 ± 0.01
sort-large 1.02 ± 0.01 0.98 ± 0.01 0.99 ± 0.01 1.00 ± 0.02 1.01 ± 0.01 1.01 ± 0.01
sort-small 1.02 ± 0.02 0.98 ± 0.02 1.00 ± 0.02 1.02 ± 0.03 1.07 ± 0.04 1.07 ± 0.08
trace 1.00 ± 0.02 0.99 ± 0.01 0.99 ± 0.01 1.00 ± 0.01 1.00 ± 0.01 0.99 ± 0.03

Currently, the sortbot stage of tform sorting does not achieve good
parallelism. Primarily, this is because the "sortblock" size has
grown with default SmallSize and LargeSize adjustments, such that in
many cases the sortbot levels do not run in parallel at all, because
each thread's output fits in a single block.

This commit adjusts the logic for filling and unlocking the blocks
such that sortbot threads can start work as soon as possible:
 - Only put complete terms into the blocks, no splitting over the blocks.
 - Track the number of terms in each block, and use this when reading
   the data to determine when a block is complete, rather than waiting
   for a term to overlap the "stop" pointer.
 - When filling blocks (in PutToMaster), if a certain (small) minimum
   number of terms has been written, probe if a reading thread is waiting
   on the current block by attempting to lock the previous block. If a
   thread is waiting (so the lock fails), unlock the current block
   immediately and write the term into the next.

With these changes, the "zeroth" block is no longer necessary since the
terms never span two blocks.

Also reduce the MINIMUMNUMBEROFTERMS parameter from 10 to 2, so that
the small+large buffer does not scale (so much) with large MaxTermSize
and many threads.
@coveralls
Copy link
Copy Markdown

Coverage Status

coverage: 58.532% (+0.05%) from 58.481% — jodavies:sortbot-blocks into form-dev:master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants