
Improve crossover dual simplex#948

Merged
rapids-bot[bot] merged 16 commits into NVIDIA:release/26.04 from rg20:improve_crossover_dual_simplex
Apr 2, 2026

Conversation

@rg20
Contributor

@rg20 rg20 commented Mar 10, 2026

Description

This PR includes two performance improvements.

  1. Take advantage of hyper-sparsity while computing reduced costs in dual pushes in the crossover code. This optimization significantly improves the root relaxation solve on some problems, for example cod105 and dano3_5.
  2. Optimize the right-looking LU factorization by replacing the linear degree-bucket search with an O(1) search. This speeds up some of the hardest problems (root relaxations of MIPs), for example:
    • chromaticindex1024 (30%)
    • cod105 (50%)
    • k1mushroom (28%)
    • splice1k1 (63%)
    • supportcase26 (51%)

There are some regressions as well, but the affected problems already solve in milliseconds.
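The hyper-sparse reduced-cost update in item 1 can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: the CSR member names (row_start, col_index, values) and the function name are assumptions, though the PR's walkthrough confirms a CSR row view (Arow) is used to expand only the nonzero delta_y entries.

```cpp
#include <cstddef>
#include <vector>

// Minimal CSR row view. The member names here are assumptions for
// illustration, not the actual csr_matrix_t layout.
struct csr_view {
  std::vector<int> row_start;   // size m + 1
  std::vector<int> col_index;   // column of each stored entry
  std::vector<double> values;
};

// Sparse path for delta_z = -A^T * delta_y: touch only the rows with
// nonzero delta_y entries instead of taking a dense dot product per column.
// delta_expanded must be a zeroed length-n workspace supplied by the caller.
void sparse_delta_z(const csr_view& Arow,
                    const std::vector<int>& dy_idx,
                    const std::vector<double>& dy_val,
                    std::vector<double>& delta_expanded)
{
  for (std::size_t k = 0; k < dy_idx.size(); ++k) {
    const int i     = dy_idx[k];
    const double yi = dy_val[k];
    for (int p = Arow.row_start[i]; p < Arow.row_start[i + 1]; ++p) {
      delta_expanded[Arow.col_index[p]] -= yi * Arow.values[p];
    }
  }
}
```

When delta_y has few nonzeros, this costs work proportional to the touched rows' fill rather than to the full number of nonbasic columns.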

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

rg20 added 2 commits March 10, 2026 11:42
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
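The swap-with-last removal described in the commit message can be sketched as follows — a minimal illustration assuming each degree bucket is a plain vector of column indices and col_pos[j] stores j's slot in its bucket; the real code threads these position arrays through the factorization's linked structures.

```cpp
#include <vector>

// O(1) removal of column j from its degree bucket: overwrite j's slot with
// the bucket's last entry, fix that entry's recorded position, and shrink.
// Replaces a linear scan of the bucket to find j.
void remove_from_bucket(std::vector<int>& bucket, std::vector<int>& col_pos, int j)
{
  const int pos  = col_pos[j];
  const int last = bucket.back();
  bucket[pos]   = last;    // move the last entry into j's slot
  col_pos[last] = pos;     // fix up the moved entry's position
  bucket.pop_back();
  col_pos[j] = -1;         // j is no longer in any bucket
}
```

The trade-off is one extra int per column (the position array), kept in sync on every bucket insertion and removal.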
@rg20 rg20 requested a review from a team as a code owner March 10, 2026 18:47
@rg20 rg20 requested review from aliceb-nv and chris-maes March 10, 2026 18:47
@copy-pr-bot

copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Mar 10, 2026
@rg20 rg20 added this to the 26.04 milestone Mar 10, 2026
@rg20
Contributor Author

rg20 commented Mar 10, 2026

/ok to test 5f377a2

@coderabbitai

coderabbitai bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Added a CSR row view (Arow) and sparse/dense delta_y paths to dual_push and its call sites in crossover; introduced per-bucket position tracking (col_pos, row_pos, last_in_row) and propagated signatures through right-looking LU routines; moved time-limit and concurrent-halt checks to the top of the phase2 main loop.

Changes

Cohort / File(s) Summary
Sparse dual push / crossover
cpp/src/dual_simplex/crossover.cpp
Added csr_matrix_t Arow (lp.A.to_compressed_row(Arow)), extended dual_push signature to accept Arow, implemented sparse vs dense delta_y computation paths (workspace buffers delta_zN, delta_expanded, delta_y_dense), sparse expansion via Arow to compute delta_zN, and updated y updates to apply only nonzero sparse contributions; call-sites updated.
Right-looking LU: bucket position tracking
cpp/src/dual_simplex/right_looking_lu.cpp
Added initialize_bucket_positions, introduced col_pos and row_pos arrays and last_in_row propagation, extended load_elements to accept last_in_row, and threaded col_pos/row_pos/last_in_row through update_Cdegree_and_col_count, update_Rdegree_and_row_count, schur_complement, remove_pivot_col, and top-level right_looking_lu / right_looking_lu_row_permutation_only; multiple function signatures updated.
Phase2 loop control
cpp/src/dual_simplex/phase2.cpp
Moved runtime (toc(start_time) > settings.time_limit) and concurrent-halt checks to the start of the dual_phase2_with_advanced_basis main iteration loop, removing the previous end-of-loop checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title "Improve crossover dual simplex" is concise and directly relates to the main changes: performance improvements in crossover and dual simplex optimization through sparse computation and LU factorization improvements.
Description check ✅ Passed The description clearly explains the two performance improvements (hyper-sparsity in reduced cost computation and O(1) degree bucket search in LU factorization) with specific problem examples and relative speedups, directly relating to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)

418-436: Consider hoisting delta_expanded allocation outside the while loop.

Currently delta_expanded is allocated as a new std::vector<f_t>(n, 0.) on every iteration of the while (superbasic_list.size() > 0) loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations.

Moving the allocation before the loop and using std::fill to reset would avoid this overhead.

♻️ Suggested refactor
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...
     
     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     
     // Iterate directly over sparse delta_y instead of checking zeros
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 418 - 436, Hoist the
allocation of delta_expanded (and optionally delta_zN) out of the while
(superbasic_list.size() > 0) loop: create std::vector<f_t> delta_expanded(n)
once before the loop and inside each iteration reset its contents with
std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly
reset delta_zN or ensure it is resized once and overwritten). Update the loop
body that accumulates into delta_expanded and the subsequent assignment to
delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers;
if n can change, ensure you call delta_expanded.resize(n) before the fill.
cpp/src/dual_simplex/right_looking_lu.cpp (1)

649-661: Extract the bucket-position bootstrap into one helper.

The col_pos / row_pos initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to initialize_degree_data() would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062
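A possible shape for that shared bootstrap, under the simplifying assumption that the buckets are plain vectors (the real structures are linked lists threaded through the factorization, so this is only illustrative):

```cpp
#include <cstddef>
#include <vector>

// One pass over every degree bucket records each entry's slot, so both
// factorization entry points can call this instead of duplicating the
// invariant-sensitive initialization. Names are illustrative.
void initialize_positions(const std::vector<std::vector<int>>& buckets,
                          std::vector<int>& pos)
{
  for (const auto& bucket : buckets) {
    for (std::size_t k = 0; k < bucket.size(); ++k) {
      pos[bucket[k]] = static_cast<int>(k);
    }
  }
}
```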

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 405-409: work_estimate is using Cdegree[pivot_j] and
Rdegree[pivot_i] after those entries have been set to -1 by
update_Cdegree_and_col_count() / update_Rdegree_and_row_count(), causing
work_estimate to decrease incorrectly; fix by using the actual degree before it
is clobbered — either capture a local variable like int deg_j = Cdegree[pivot_j]
and/or deg_i = Rdegree[pivot_i] before calling the update functions, or compute
the degree from the corresponding count arrays (e.g. col_count[pivot_j] /
row_count[pivot_i]) or by traversing first_in_col/first_in_row if counts are not
available — then use deg_j/deg_i when updating work_estimate instead of reading
Cdegree/Rdegree after they were set to -1.

---

Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 418-436: Hoist the allocation of delta_expanded (and optionally
delta_zN) out of the while (superbasic_list.size() > 0) loop: create
std::vector<f_t> delta_expanded(n) once before the loop and inside each
iteration reset its contents with std::fill(delta_expanded.begin(),
delta_expanded.end(), 0.0) (and similarly reset delta_zN or ensure it is resized
once and overwritten). Update the loop body that accumulates into delta_expanded
and the subsequent assignment to delta_zN[k] = -delta_expanded[nonbasic_list[k]]
to rely on the reused buffers; if n can change, ensure you call
delta_expanded.resize(n) before the fill.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 2b21118 and 5f377a2.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

Comment on lines 405 to 409
// Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
  element_t<i_t, f_t>* e = &elements[p1];
  const i_t i = e->i;
  i_t row_last = kNone;
  for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
    row_last = p3;
  }
  work_estimate += 2 * Rdegree[i];
  row_last_workspace[i] = row_last;
  row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
}
work_estimate += 4 * Cdegree[pivot_j];

@coderabbitai coderabbitai bot Mar 10, 2026


⚠️ Potential issue | 🟡 Minor

Fix work_estimate accounting after pivot elimination.

Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.

Also applies to: 499-509

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 405 - 409,
work_estimate is using Cdegree[pivot_j] and Rdegree[pivot_i] after those entries
have been set to -1 by update_Cdegree_and_col_count() /
update_Rdegree_and_row_count(), causing work_estimate to decrease incorrectly;
fix by using the actual degree before it is clobbered — either capture a local
variable like int deg_j = Cdegree[pivot_j] and/or deg_i = Rdegree[pivot_i]
before calling the update functions, or compute the degree from the
corresponding count arrays (e.g. col_count[pivot_j] / row_count[pivot_i]) or by
traversing first_in_col/first_in_row if counts are not available — then use
deg_j/deg_i when updating work_estimate instead of reading Cdegree/Rdegree after
they were set to -1.
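The suggested fix can be illustrated with a toy version: cache the degree in a local before the update routine writes the -1 sentinel. Cdegree and the 4x weight mirror the snippet above; everything else here is an illustrative stand-in, not the PR's code.

```cpp
#include <vector>

// Read the pivot column's degree before the bookkeeping routine clobbers it,
// then charge the work estimate from the cached value.
int pivot_column_work(std::vector<int>& Cdegree, int pivot_j)
{
  const int deg_j = Cdegree[pivot_j];  // cache before it is clobbered
  Cdegree[pivot_j] = -1;               // stands in for update_Cdegree_and_col_count()
  return 4 * deg_j;                    // 4 * Cdegree[pivot_j] here would yield -4
}
```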

Contributor Author


Looks like even the original code has some discrepancies. I am not going to modify it in this PR. @aliceb-nv for viz!



Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

rg20 added 4 commits March 16, 2026 07:40
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test cc54886


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
cpp/src/dual_simplex/right_looking_lu.cpp (1)

371-372: ⚠️ Potential issue | 🟡 Minor

Preserve pivot degrees until Schur work accounting is finished.

Line 371 and Line 408 set pivot degrees to -1, but schur_complement() still uses those pivot degrees for work_estimate terms, which can push the estimate downward incorrectly. Please cache pivot degrees (or defer sentinel assignment) before those bookkeeping terms are computed.

Also applies to: 408-409

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 371 - 372, The code
currently sets Cdegree[pivot_j] = -1 (in the block around pivot handling) before
schur_complement() computes work_estimate terms, which causes those terms to use
the sentinel -1 and undercount work; preserve the original pivot degree value
until after all Schur work accounting is finished by caching the degree (e.g.,
save old_degree = Cdegree[pivot_j]) or by deferring the sentinel assignment
until after schur_complement() and any work_estimate calculations complete;
update both spots where Cdegree[pivot_j] is set to -1 (the one near pivot_j and
the one around the other referenced location) to use the cached value approach
and only assign -1 after work_estimate is computed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 371-372: The code currently sets Cdegree[pivot_j] = -1 (in the
block around pivot handling) before schur_complement() computes work_estimate
terms, which causes those terms to use the sentinel -1 and undercount work;
preserve the original pivot degree value until after all Schur work accounting
is finished by caching the degree (e.g., save old_degree = Cdegree[pivot_j]) or
by deferring the sentinel assignment until after schur_complement() and any
work_estimate calculations complete; update both spots where Cdegree[pivot_j] is
set to -1 (the one near pivot_j and the one around the other referenced
location) to use the cached value approach and only assign -1 after
work_estimate is computed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eb075e7a-fa49-46ae-b96a-caf985a7b449

📥 Commits

Reviewing files that changed from the base of the PR and between 5f377a2 and cc54886.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

@rg20
Contributor Author

rg20 commented Mar 16, 2026

/ok to test 3e2352b


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/dual_simplex/crossover.cpp (1)

421-453: Consider extracting sparse/dense delta_z computation into a shared utility.

As noted in prior review discussion, the sparse vs. dense computation of delta_z = -A^T * delta_y could be factored into a shared function for use in both dual simplex and crossover. This would reduce code duplication and ensure consistent threshold tuning.

This can be deferred to a follow-up PR as suggested by the reviewer.
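The dispatch such a shared helper would centralize might look like this; the function name is hypothetical and the 30% figure comes from the review discussion of the existing threshold.

```cpp
#include <cstddef>

// Take the sparse expansion path when delta_y's fill is below the threshold
// (nnz / m < 0.3), using an integer comparison to avoid floating point.
// Illustrative only; the real threshold and its placement may differ.
inline bool prefer_sparse_delta_z(std::size_t delta_y_nnz, std::size_t m)
{
  return delta_y_nnz * 10 < m * 3;
}
```

Centralizing this choice means dual simplex and crossover cannot drift apart when the threshold is retuned.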

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 421 - 453, Extract the
sparse vs dense computation of delta_z (currently computing delta_zN using
delta_y_sparse / delta_y_dense, delta_expanded, nonbasic_list, lp.A and Arow)
into a shared utility function (e.g., compute_delta_z_from_delta_y) that accepts
delta_y in sparse or dense form, A (or Arow), nonbasic_list and outputs
delta_zN; move the sparse path (loop over delta_y_sparse.i/x and Arow to build
delta_expanded) and the dense path (delta_y_sparse.to_dense → dot product over
lp.A columns) into the single function, preserve the existing 30% sparsity
threshold outside the helper, and update callers in dual simplex and crossover
to call the new utility to avoid duplication and centralize threshold/tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 462-466: The loop over sparse entries in crossover.cpp uses
delta_y_sparse.i.size() as the upper bound without casting, causing a
signed/unsigned comparison warning and inconsistency with the earlier loop;
change the loop condition to use static_cast<i_t>(delta_y_sparse.i.size()) so
the type of nnz_idx (i_t) matches the cast size, keeping the body unchanged
(variables: delta_y_sparse, nnz_idx, i_t, y, step_length).

---

Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 421-453: Extract the sparse vs dense computation of delta_z
(currently computing delta_zN using delta_y_sparse / delta_y_dense,
delta_expanded, nonbasic_list, lp.A and Arow) into a shared utility function
(e.g., compute_delta_z_from_delta_y) that accepts delta_y in sparse or dense
form, A (or Arow), nonbasic_list and outputs delta_zN; move the sparse path
(loop over delta_y_sparse.i/x and Arow to build delta_expanded) and the dense
path (delta_y_sparse.to_dense → dot product over lp.A columns) into the single
function, preserve the existing 30% sparsity threshold outside the helper, and
update callers in dual simplex and crossover to call the new utility to avoid
duplication and centralize threshold/tuning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cecd1046-6c96-4a21-a4a0-df1a91062ace

📥 Commits

Reviewing files that changed from the base of the PR and between cc54886 and 3e2352b.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/crossover.cpp

@rgsl888prabhu rgsl888prabhu changed the base branch from main to release/26.04 March 19, 2026 16:32
rg20 added a commit to rg20/cuopt that referenced this pull request Mar 19, 2026
Allocate buffers once before the superbasic loop and reset with std::fill
each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review).

Made-with: Cursor
@rg20
Contributor Author

rg20 commented Mar 20, 2026

/ok to test 1ecb41f

@rg20
Contributor Author

rg20 commented Mar 23, 2026

/ok to test c4ae488

@chris-maes chris-maes added the P1 label Mar 25, 2026
@rg20
Contributor Author

rg20 commented Mar 31, 2026

/ok to test 2ed5853

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test 42e7b8f


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around line 2787-2792: The early stop checks using toc(start_time) and
settings.concurrent_halt are returning before pricing and can mask a truly
optimal basis; move these checks so pricing runs first, then if leaving_index ==
-1 call phase2::prepare_optimality(...) and only after that honor the time
limit/CONCURRENT_LIMIT returns (i.e., keep the pricing/leave-index logic—the
branch that detects leaving_index == -1 and the call to
phase2::prepare_optimality—before any return of dual::status_t::TIME_LIMIT or
CONCURRENT_LIMIT based on toc(start_time) or *settings.concurrent_halt).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 741fa4be-2213-4414-ba97-1f75831be66a

📥 Commits

Reviewing files that changed from the base of the PR and between 3e2352b and 1f0c158.

📒 Files selected for processing (1)
  • cpp/src/dual_simplex/phase2.cpp

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test ec5e015

@rg20
Contributor Author

rg20 commented Apr 1, 2026

/ok to test 5331696

Contributor

@chris-maes chris-maes left a comment


LGTM! Thanks for the perf improvements.

@rg20
Contributor Author

rg20 commented Apr 2, 2026

/merge

@rapids-bot rapids-bot bot merged commit 3cdc7a2 into NVIDIA:release/26.04 Apr 2, 2026
305 of 312 checks passed

Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), P1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants