
Improve crossover dual simplex#948

Merged
rapids-bot[bot] merged 16 commits into NVIDIA:release/26.04 from rg20:improve_crossover_dual_simplex
Apr 2, 2026

Conversation

@rg20
Contributor

@rg20 rg20 commented Mar 10, 2026

Description

This PR includes two performance improvements.

  1. Take advantage of hyper-sparsity while computing reduced costs in dual pushes in the crossover code. This optimization significantly improves the root relaxation solve on some problems, for example cod105 and dano3_5.
  2. Optimize the right-looking LU factorization by replacing the linear degree-bucket search with an O(1) search. This speeds up some of the hardest problems (root relaxations of MIPs), for example:
    • chromaticindex1024 (30%)
    • cod105 (50%)
    • k1mushroom (28%)
    • splice1k1 (63%)
    • supportcase26 (51%)

There are some regressions as well, but the affected problems already solve in milliseconds.
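The hyper-sparse reduced-cost update in item 1 can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the CSR member names (row_start, col_index, values) and the function name are assumptions, though the PR's walkthrough confirms a CSR row view (Arow) is used to expand only the nonzero delta_y entries.

```cpp
#include <cstddef>
#include <vector>

// Minimal CSR row view. The member names here are assumptions for
// illustration, not the actual csr_matrix_t layout.
struct csr_view {
  std::vector<int> row_start;   // size m + 1
  std::vector<int> col_index;   // column of each stored entry
  std::vector<double> values;
};

// Sparse path for delta_z = -A^T * delta_y: touch only the rows with
// nonzero delta_y entries instead of taking a dense dot product per column.
// delta_expanded must be a zeroed length-n workspace supplied by the caller.
void sparse_delta_z(const csr_view& Arow,
                    const std::vector<int>& dy_idx,
                    const std::vector<double>& dy_val,
                    std::vector<double>& delta_expanded)
{
  for (std::size_t k = 0; k < dy_idx.size(); ++k) {
    const int i     = dy_idx[k];
    const double yi = dy_val[k];
    for (int p = Arow.row_start[i]; p < Arow.row_start[i + 1]; ++p) {
      delta_expanded[Arow.col_index[p]] -= yi * Arow.values[p];
    }
  }
}
```

When delta_y has few nonzeros, this costs work proportional to the touched rows' fill rather than to the full number of nonbasic columns.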

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

rg20 added 2 commits March 10, 2026 11:42
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
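The swap-with-last removal described in the commit message can be sketched as follows — a minimal illustration assuming each degree bucket is a plain vector of column indices and col_pos[j] stores j's slot in its bucket; the real code threads these position arrays through the factorization's linked structures.

```cpp
#include <vector>

// O(1) removal of column j from its degree bucket: overwrite j's slot with
// the bucket's last entry, fix that entry's recorded position, and shrink.
// Replaces a linear scan of the bucket to find j.
void remove_from_bucket(std::vector<int>& bucket, std::vector<int>& col_pos, int j)
{
  const int pos  = col_pos[j];
  const int last = bucket.back();
  bucket[pos]   = last;    // move the last entry into j's slot
  col_pos[last] = pos;     // fix up the moved entry's position
  bucket.pop_back();
  col_pos[j] = -1;         // j is no longer in any bucket
}
```

The trade-off is one extra int per column (the position array), kept in sync on every bucket insertion and removal.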
@rg20 rg20 requested a review from a team as a code owner March 10, 2026 18:47
@rg20 rg20 requested review from aliceb-nv and chris-maes March 10, 2026 18:47
@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Mar 10, 2026
@rg20 rg20 added this to the 26.04 milestone Mar 10, 2026
@rg20
Contributor Author

rg20 commented Mar 10, 2026

/ok to test 5f377a2

@coderabbitai

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Added a CSR row view (Arow) and sparse/dense delta_y paths to dual_push and its call sites in crossover; introduced per-bucket position tracking (col_pos, row_pos, last_in_row) and propagated signatures through right-looking LU routines; moved time-limit and concurrent-halt checks to the top of the phase2 main loop.

Changes

Cohort / File(s) Summary
Sparse dual push / crossover
cpp/src/dual_simplex/crossover.cpp
Added csr_matrix_t Arow (lp.A.to_compressed_row(Arow)), extended dual_push signature to accept Arow, implemented sparse vs dense delta_y computation paths (workspace buffers delta_zN, delta_expanded, delta_y_dense), sparse expansion via Arow to compute delta_zN, and updated y updates to apply only nonzero sparse contributions; call-sites updated.
Right-looking LU: bucket position tracking
cpp/src/dual_simplex/right_looking_lu.cpp
Added initialize_bucket_positions, introduced col_pos and row_pos arrays and last_in_row propagation, extended load_elements to accept last_in_row, and threaded col_pos/row_pos/last_in_row through update_Cdegree_and_col_count, update_Rdegree_and_row_count, schur_complement, remove_pivot_col, and top-level right_looking_lu / right_looking_lu_row_permutation_only; multiple function signatures updated.
Phase2 loop control
cpp/src/dual_simplex/phase2.cpp
Moved runtime (toc(start_time) > settings.time_limit) and concurrent-halt checks to the start of the dual_phase2_with_advanced_basis main iteration loop, removing the previous end-of-loop checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title "Improve crossover dual simplex" is concise and directly relates to the main changes: performance improvements in crossover and dual simplex optimization through sparse computation and LU factorization improvements.
Description check ✅ Passed The description clearly explains the two performance improvements (hyper-sparsity in reduced cost computation and O(1) degree bucket search in LU factorization) with specific problem examples and relative speedups, directly relating to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)

418-436: Consider hoisting delta_expanded allocation outside the while loop.

Currently delta_expanded is allocated as a new std::vector<f_t>(n, 0.) on every iteration of the while (superbasic_list.size() > 0) loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations.

Moving the allocation before the loop and using std::fill to reset would avoid this overhead.

♻️ Suggested refactor
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...
     
     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     
     // Iterate directly over sparse delta_y instead of checking zeros
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 418 - 436, Hoist the
allocation of delta_expanded (and optionally delta_zN) out of the while
(superbasic_list.size() > 0) loop: create std::vector<f_t> delta_expanded(n)
once before the loop and inside each iteration reset its contents with
std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly
reset delta_zN or ensure it is resized once and overwritten). Update the loop
body that accumulates into delta_expanded and the subsequent assignment to
delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers;
if n can change, ensure you call delta_expanded.resize(n) before the fill.
cpp/src/dual_simplex/right_looking_lu.cpp (1)

649-661: Extract the bucket-position bootstrap into one helper.

The col_pos / row_pos initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to initialize_degree_data() would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062
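A possible shape for that shared bootstrap, under the simplifying assumption that the buckets are plain vectors (the real structures are linked lists threaded through the factorization, so this is only illustrative):

```cpp
#include <cstddef>
#include <vector>

// One pass over every degree bucket records each entry's slot, so both
// factorization entry points can call this instead of duplicating the
// invariant-sensitive initialization. Names are illustrative.
void initialize_positions(const std::vector<std::vector<int>>& buckets,
                          std::vector<int>& pos)
{
  for (const auto& bucket : buckets) {
    for (std::size_t k = 0; k < bucket.size(); ++k) {
      pos[bucket[k]] = static_cast<int>(k);
    }
  }
}
```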

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 405-409: work_estimate is using Cdegree[pivot_j] and
Rdegree[pivot_i] after those entries have been set to -1 by
update_Cdegree_and_col_count() / update_Rdegree_and_row_count(), causing
work_estimate to decrease incorrectly; fix by using the actual degree before it
is clobbered — either capture a local variable like int deg_j = Cdegree[pivot_j]
and/or deg_i = Rdegree[pivot_i] before calling the update functions, or compute
the degree from the corresponding count arrays (e.g. col_count[pivot_j] /
row_count[pivot_i]) or by traversing first_in_col/first_in_row if counts are not
available — then use deg_j/deg_i when updating work_estimate instead of reading
Cdegree/Rdegree after they were set to -1.

---

Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 418-436: Hoist the allocation of delta_expanded (and optionally
delta_zN) out of the while (superbasic_list.size() > 0) loop: create
std::vector<f_t> delta_expanded(n) once before the loop and inside each
iteration reset its contents with std::fill(delta_expanded.begin(),
delta_expanded.end(), 0.0) (and similarly reset delta_zN or ensure it is resized
once and overwritten). Update the loop body that accumulates into delta_expanded
and the subsequent assignment to delta_zN[k] = -delta_expanded[nonbasic_list[k]]
to rely on the reused buffers; if n can change, ensure you call
delta_expanded.resize(n) before the fill.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 2b21118 and 5f377a2.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

Comment on lines 405 to 409
// Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
  element_t<i_t, f_t>* e = &elements[p1];
  const i_t i = e->i;
  i_t row_last = kNone;
  for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
    row_last = p3;
  }
  work_estimate += 2 * Rdegree[i];
  row_last_workspace[i] = row_last;
  row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
}
work_estimate += 4 * Cdegree[pivot_j];

@coderabbitai coderabbitai bot Mar 10, 2026


⚠️ Potential issue | 🟡 Minor

Fix work_estimate accounting after pivot elimination.

Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.

Also applies to: 499-509

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 405 - 409,
work_estimate is using Cdegree[pivot_j] and Rdegree[pivot_i] after those entries
have been set to -1 by update_Cdegree_and_col_count() /
update_Rdegree_and_row_count(), causing work_estimate to decrease incorrectly;
fix by using the actual degree before it is clobbered — either capture a local
variable like int deg_j = Cdegree[pivot_j] and/or deg_i = Rdegree[pivot_i]
before calling the update functions, or compute the degree from the
corresponding count arrays (e.g. col_count[pivot_j] / row_count[pivot_i]) or by
traversing first_in_col/first_in_row if counts are not available — then use
deg_j/deg_i when updating work_estimate instead of reading Cdegree/Rdegree after
they were set to -1.
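The suggested fix can be illustrated with a toy version: cache the degree in a local before the update routine writes the -1 sentinel. Cdegree and the 4x weight mirror the snippet above; everything else here is an illustrative stand-in, not the PR's code.

```cpp
#include <vector>

// Read the pivot column's degree before the bookkeeping routine clobbers it,
// then charge the work estimate from the cached value.
int pivot_column_work(std::vector<int>& Cdegree, int pivot_j)
{
  const int deg_j = Cdegree[pivot_j];  // cache before it is clobbered
  Cdegree[pivot_j] = -1;               // stands in for update_Cdegree_and_col_count()
  return 4 * deg_j;                    // 4 * Cdegree[pivot_j] here would yield -4
}
```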

Contributor Author


Looks like even the original code has some discrepancies. I am not going to modify it in this PR. @aliceb-nv for viz!



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

rg20 added 4 commits March 16, 2026 07:40
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test cc54886


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
cpp/src/dual_simplex/right_looking_lu.cpp (1)

371-372: ⚠️ Potential issue | 🟡 Minor

Preserve pivot degrees until Schur work accounting is finished.

Line 371 and Line 408 set pivot degrees to -1, but schur_complement() still uses those pivot degrees for work_estimate terms, which can push the estimate downward incorrectly. Please cache pivot degrees (or defer sentinel assignment) before those bookkeeping terms are computed.

Also applies to: 408-409

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 371 - 372, The code
currently sets Cdegree[pivot_j] = -1 (in the block around pivot handling) before
schur_complement() computes work_estimate terms, which causes those terms to use
the sentinel -1 and undercount work; preserve the original pivot degree value
until after all Schur work accounting is finished by caching the degree (e.g.,
save old_degree = Cdegree[pivot_j]) or by deferring the sentinel assignment
until after schur_complement() and any work_estimate calculations complete;
update both spots where Cdegree[pivot_j] is set to -1 (the one near pivot_j and
the one around the other referenced location) to use the cached value approach
and only assign -1 after work_estimate is computed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 371-372: The code currently sets Cdegree[pivot_j] = -1 (in the
block around pivot handling) before schur_complement() computes work_estimate
terms, which causes those terms to use the sentinel -1 and undercount work;
preserve the original pivot degree value until after all Schur work accounting
is finished by caching the degree (e.g., save old_degree = Cdegree[pivot_j]) or
by deferring the sentinel assignment until after schur_complement() and any
work_estimate calculations complete; update both spots where Cdegree[pivot_j] is
set to -1 (the one near pivot_j and the one around the other referenced
location) to use the cached value approach and only assign -1 after
work_estimate is computed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb075e7a-fa49-46ae-b96a-caf985a7b449

📥 Commits

Reviewing files that changed from the base of the PR and between 5f377a2 and cc54886.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test 3e2352b


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/dual_simplex/crossover.cpp (1)

421-453: Consider extracting sparse/dense delta_z computation into a shared utility.

As noted in prior review discussion, the sparse vs. dense computation of delta_z = -A^T * delta_y could be factored into a shared function for use in both dual simplex and crossover. This would reduce code duplication and ensure consistent threshold tuning.

This can be deferred to a follow-up PR as suggested by the reviewer.
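The dispatch such a shared helper would centralize might look like this; the function name is hypothetical and the 30% figure comes from the review discussion of the existing threshold.

```cpp
#include <cstddef>

// Take the sparse expansion path when delta_y's fill is below the threshold
// (nnz / m < 0.3), using an integer comparison to avoid floating point.
// Illustrative only; the real threshold and its placement may differ.
inline bool prefer_sparse_delta_z(std::size_t delta_y_nnz, std::size_t m)
{
  return delta_y_nnz * 10 < m * 3;
}
```

Centralizing this choice means dual simplex and crossover cannot drift apart when the threshold is retuned.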

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 421 - 453, Extract the
sparse vs dense computation of delta_z (currently computing delta_zN using
delta_y_sparse / delta_y_dense, delta_expanded, nonbasic_list, lp.A and Arow)
into a shared utility function (e.g., compute_delta_z_from_delta_y) that accepts
delta_y in sparse or dense form, A (or Arow), nonbasic_list and outputs
delta_zN; move the sparse path (loop over delta_y_sparse.i/x and Arow to build
delta_expanded) and the dense path (delta_y_sparse.to_dense → dot product over
lp.A columns) into the single function, preserve the existing 30% sparsity
threshold outside the helper, and update callers in dual simplex and crossover
to call the new utility to avoid duplication and centralize threshold/tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 462-466: The loop over sparse entries in crossover.cpp uses
delta_y_sparse.i.size() as the upper bound without casting, causing a
signed/unsigned comparison warning and inconsistency with the earlier loop;
change the loop condition to use static_cast<i_t>(delta_y_sparse.i.size()) so
the type of nnz_idx (i_t) matches the cast size, keeping the body unchanged
(variables: delta_y_sparse, nnz_idx, i_t, y, step_length).

---

Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 421-453: Extract the sparse vs dense computation of delta_z
(currently computing delta_zN using delta_y_sparse / delta_y_dense,
delta_expanded, nonbasic_list, lp.A and Arow) into a shared utility function
(e.g., compute_delta_z_from_delta_y) that accepts delta_y in sparse or dense
form, A (or Arow), nonbasic_list and outputs delta_zN; move the sparse path
(loop over delta_y_sparse.i/x and Arow to build delta_expanded) and the dense
path (delta_y_sparse.to_dense → dot product over lp.A columns) into the single
function, preserve the existing 30% sparsity threshold outside the helper, and
update callers in dual simplex and crossover to call the new utility to avoid
duplication and centralize threshold/tuning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cecd1046-6c96-4a21-a4a0-df1a91062ace

📥 Commits

Reviewing files that changed from the base of the PR and between cc54886 and 3e2352b.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/crossover.cpp

@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/26.04 March 19, 2026 16:32
rg20 added a commit to rg20/cuopt that referenced this pull request Mar 19, 2026
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20
Contributor Author

rg20 commented Mar 20, 2026

/ok to test 1ecb41f

@rg20
Contributor Author

rg20 commented Mar 23, 2026

/ok to test c4ae488

@chris-maes chris-maes added the P1 label Mar 25, 2026
@rg20
Contributor Author

rg20 commented Mar 31, 2026

/ok to test 2ed5853

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test 42e7b8f


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around line 2787-2792: The early stop checks using toc(start_time) and
settings.concurrent_halt are returning before pricing and can mask a truly
optimal basis; move these checks so pricing runs first, then if leaving_index ==
-1 call phase2::prepare_optimality(...) and only after that honor the time
limit/CONCURRENT_LIMIT returns (i.e., keep the pricing/leave-index logic—the
branch that detects leaving_index == -1 and the call to
phase2::prepare_optimality—before any return of dual::status_t::TIME_LIMIT or
CONCURRENT_LIMIT based on toc(start_time) or *settings.concurrent_halt).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 741fa4be-2213-4414-ba97-1f75831be66a

📥 Commits

Reviewing files that changed from the base of the PR and between 3e2352b and 1f0c158.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/phase2.cpp

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test ec5e015

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test 5331696

Contributor

@chris-maes chris-maes left a comment


LGTM! Thanks for the perf improvements.

@rg20
Contributor Author

rg20 commented Apr 2, 2026

/merge

@rapids-bot rapids-bot bot merged commit 3cdc7a2 into NVIDIA:release/26.04 Apr 2, 2026
305 of 312 checks passed

Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), P1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants