
src: add Latin1 fast path in StringBytes::Encode utf8 #63385

Open
mertcanaltin wants to merge 1 commit into nodejs:main from mertcanaltin:mert/buffer-tostring-utf8-latin1


@mertcanaltin (Member) commented May 17, 2026

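In `StringBytes::Encode` with `utf8` encoding, input whose characters all fit in Latin-1 (code points up to U+00FF) was being decoded through the UTF-16 path. I added a Latin-1 fast path that converts such input via simdutf and returns a one-byte V8 string. This path is used by every `Buffer.toString('utf8')` call, including `fs.readFile`/`fs.promises.readFile` when they delegate to it. In the benchmarks below, the statistically significant gains are on Latin-1 content of 1024 bytes and larger; ASCII, mixed UTF-8, and 64-byte Latin-1 inputs show no significant change.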
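A minimal scalar sketch of the idea, assuming the fast path's job is to decide whether UTF-8 input contains only code points up to U+00FF and, if so, to emit one byte per character. `Utf8ToLatin1` is a hypothetical name, not the function added in this PR. The PR does the conversion with simdutf (which offers vectorized UTF-8 to Latin-1 routines such as `simdutf::convert_utf8_to_latin1`) and then returns a one-byte V8 string (V8 exposes `v8::String::NewFromOneByte` for that; the PR may construct it differently), whereas this loop handles one byte at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not the Node.js implementation): decode UTF-8 into
// Latin-1 bytes when every code point is <= U+00FF. Returns std::nullopt if
// the input contains a larger code point or is not well-formed UTF-8, which
// signals that the caller should fall back to the general UTF-16 path.
std::optional<std::string> Utf8ToLatin1(std::string_view utf8) {
  std::string latin1;
  latin1.reserve(utf8.size());  // Latin-1 output is never longer than input.
  for (std::size_t i = 0; i < utf8.size(); ++i) {
    const auto byte = static_cast<std::uint8_t>(utf8[i]);
    if (byte < 0x80) {
      // ASCII maps to itself.
      latin1.push_back(static_cast<char>(byte));
      continue;
    }
    // U+0080..U+00FF are encoded as two bytes with lead byte 0xC2 or 0xC3.
    // 0xC0/0xC1 would be overlong (invalid); any other lead byte either
    // encodes a code point above U+00FF or is invalid.
    if ((byte != 0xC2 && byte != 0xC3) || i + 1 >= utf8.size()) {
      return std::nullopt;
    }
    const auto next = static_cast<std::uint8_t>(utf8[i + 1]);
    if ((next & 0xC0) != 0x80) {  // must be a continuation byte 10xxxxxx
      return std::nullopt;
    }
    // Combine the 2 payload bits of the lead byte with the 6 of the
    // continuation byte.
    latin1.push_back(static_cast<char>(((byte & 0x1F) << 6) | (next & 0x3F)));
    ++i;  // skip the continuation byte we just consumed
  }
  assert(latin1.size() <= utf8.size());
  return latin1;
}
```

A `std::nullopt` result corresponds to taking the existing UTF-16 path. When the check succeeds, the result is stored as one byte per character rather than one two-byte UTF-16 code unit per character.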

@nodejs/performance @mcollina @anonrig @lemire @addaleax

Benchmark results:

```
$ node-benchmark-compare ./result.csv
                                                                                 confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=1024                         0.96 %       ±1.54% ±2.06% ±2.68%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=16384                        0.68 %       ±1.03% ±1.37% ±1.78%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=262144                      -0.87 %       ±1.82% ±2.42% ±3.16%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=4194304                      0.06 %       ±0.65% ±0.87% ±1.14%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=64                          -1.13 %       ±1.66% ±2.21% ±2.89%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=1024               ***     12.38 %       ±1.04% ±1.39% ±1.81%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=16384              ***     39.65 %       ±1.96% ±2.63% ±3.47%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=262144             ***     40.57 %       ±1.10% ±1.48% ±1.96%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=4194304            ***      2.78 %       ±0.60% ±0.80% ±1.05%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=64                          0.22 %       ±1.23% ±1.65% ±2.16%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=1024                    1.33 %       ±1.41% ±1.87% ±2.44%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=16384                  -0.30 %       ±1.00% ±1.33% ±1.73%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=262144                 -0.51 %       ±1.27% ±1.70% ±2.23%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=4194304                -0.26 %       ±0.67% ±0.89% ±1.16%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=64                      1.07 %       ±1.66% ±2.21% ±2.88%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 15 comparisons, you can thus expect the following amount of false-positive results:
  0.75 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
```
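The false-positive estimates printed above are consistent with the usual multiple-comparisons expectation: if none of the $m$ comparisons reflected a real change, each would still be flagged at significance level $\alpha$ with probability $\alpha$, so the expected number of false positives is $m\alpha$:

$$
\mathbb{E}[\text{false positives}] = m\,\alpha, \qquad m = 15:\quad 15 \times 0.05 = 0.75,\;\; 15 \times 0.01 = 0.15,\;\; 15 \times 0.001 = 0.015
$$

The last value, 0.015, is printed as 0.01. Against that expectation, the four `latin1` rows flagged `***` (sizes 1024 through 4194304) are unlikely to all be chance findings.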

Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/performance

@nodejs-github-bot added the buffer, c++, and needs-ci labels on May 17, 2026

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.07%. Comparing base (265679b) to head (f2a6a98).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/string_bytes.cc | 94.44% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #63385      +/-   ##
==========================================
+ Coverage   90.05%   90.07%   +0.01%     
==========================================
  Files         714      714              
  Lines      225628   225647      +19     
  Branches    42673    42691      +18     
==========================================
+ Hits       203198   203248      +50     
+ Misses      14225    14186      -39     
- Partials     8205     8213       +8     
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/string_bytes.cc | 75.63% <94.44%> (+0.99%) ⬆️ |

... and 36 files with indirect coverage changes
