Improve crossover dual simplex #948
Conversation
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
/ok to test 5f377a2
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the following commands to manage reviews:
📝 Walkthrough
Added a CSR row view (Arow) and sparse/dense delta_y paths to dual_push and its call sites in crossover; introduced per-bucket position tracking (col_pos, row_pos, last_in_row) and propagated the new signatures through the right-looking LU routines; moved the time-limit and concurrent-halt checks to the top of the phase2 main loop.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)
418-436: Consider hoisting the `delta_expanded` allocation outside the while loop.

Currently `delta_expanded` is allocated as a new `std::vector<f_t>(n, 0.)` on every iteration of the `while (superbasic_list.size() > 0)` loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations. Moving the allocation before the loop and using `std::fill` to reset would avoid this overhead.

♻️ Suggested refactor
```diff
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...
     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     // Iterate directly over sparse delta_y instead of checking zeros
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cpp` around lines 418-436: hoist the allocation of delta_expanded (and optionally delta_zN) out of the while (superbasic_list.size() > 0) loop. Create std::vector<f_t> delta_expanded(n) once before the loop and inside each iteration reset its contents with std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly reset delta_zN or ensure it is resized once and overwritten). Update the loop body that accumulates into delta_expanded and the subsequent assignment to delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers; if n can change, ensure you call delta_expanded.resize(n) before the fill.

cpp/src/dual_simplex/right_looking_lu.cpp (1)
649-661: Extract the bucket-position bootstrap into one helper.

The `col_pos`/`row_pos` initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to `initialize_degree_data()` would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 405-409: work_estimate is using Cdegree[pivot_j] and
Rdegree[pivot_i] after those entries have been set to -1 by
update_Cdegree_and_col_count() / update_Rdegree_and_row_count(), causing
work_estimate to decrease incorrectly; fix by using the actual degree before it
is clobbered — either capture a local variable like int deg_j = Cdegree[pivot_j]
and/or deg_i = Rdegree[pivot_i] before calling the update functions, or compute
the degree from the corresponding count arrays (e.g. col_count[pivot_j] /
row_count[pivot_i]) or by traversing first_in_col/first_in_row if counts are not
available — then use deg_j/deg_i when updating work_estimate instead of reading
Cdegree/Rdegree after they were set to -1.
---
Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 418-436: Hoist the allocation of delta_expanded (and optionally
delta_zN) out of the while (superbasic_list.size() > 0) loop: create
std::vector<f_t> delta_expanded(n) once before the loop and inside each
iteration reset its contents with std::fill(delta_expanded.begin(),
delta_expanded.end(), 0.0) (and similarly reset delta_zN or ensure it is resized
once and overwritten). Update the loop body that accumulates into delta_expanded
and the subsequent assignment to delta_zN[k] = -delta_expanded[nonbasic_list[k]]
to rely on the reused buffers; if n can change, ensure you call
delta_expanded.resize(n) before the fill.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e
📒 Files selected for processing (2)
cpp/src/dual_simplex/crossover.cpp
cpp/src/dual_simplex/right_looking_lu.cpp
```cpp
// Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
  element_t<i_t, f_t>* e = &elements[p1];
  const i_t i = e->i;
  i_t row_last = kNone;
  for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
    row_last = p3;
  }
  work_estimate += 2 * Rdegree[i];
  row_last_workspace[i] = row_last;
  row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
}
work_estimate += 4 * Cdegree[pivot_j];
```
Fix work_estimate accounting after pivot elimination.
Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.
Also applies to: 499-509
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 405 - 409,
work_estimate is using Cdegree[pivot_j] and Rdegree[pivot_i] after those entries
have been set to -1 by update_Cdegree_and_col_count() /
update_Rdegree_and_row_count(), causing work_estimate to decrease incorrectly;
fix by using the actual degree before it is clobbered — either capture a local
variable like int deg_j = Cdegree[pivot_j] and/or deg_i = Rdegree[pivot_i]
before calling the update functions, or compute the degree from the
corresponding count arrays (e.g. col_count[pivot_j] / row_count[pivot_i]) or by
traversing first_in_col/first_in_row if counts are not available — then use
deg_j/deg_i when updating work_estimate instead of reading Cdegree/Rdegree after
they were set to -1.
Looks like even the original code has some discrepancies. I am not going to modify it in this PR. @aliceb-nv for viz!
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Allocate buffers once before the superbasic loop and reset with std::fill each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review). Made-with: Cursor
/ok to test cc54886
♻️ Duplicate comments (1)
cpp/src/dual_simplex/right_looking_lu.cpp (1)
371-372: ⚠️ Potential issue | 🟡 Minor

Preserve pivot degrees until Schur work accounting is finished.

Line 371 and Line 408 set pivot degrees to -1, but `schur_complement()` still uses those pivot degrees for `work_estimate` terms, which can push the estimate downward incorrectly. Please cache the pivot degrees (or defer the sentinel assignment) before those bookkeeping terms are computed.

Also applies to: 408-409
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 371 - 372, The code currently sets Cdegree[pivot_j] = -1 (in the block around pivot handling) before schur_complement() computes work_estimate terms, which causes those terms to use the sentinel -1 and undercount work; preserve the original pivot degree value until after all Schur work accounting is finished by caching the degree (e.g., save old_degree = Cdegree[pivot_j]) or by deferring the sentinel assignment until after schur_complement() and any work_estimate calculations complete; update both spots where Cdegree[pivot_j] is set to -1 (the one near pivot_j and the one around the other referenced location) to use the cached value approach and only assign -1 after work_estimate is computed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@cpp/src/dual_simplex/right_looking_lu.cpp`:
- Around line 371-372: The code currently sets Cdegree[pivot_j] = -1 (in the
block around pivot handling) before schur_complement() computes work_estimate
terms, which causes those terms to use the sentinel -1 and undercount work;
preserve the original pivot degree value until after all Schur work accounting
is finished by caching the degree (e.g., save old_degree = Cdegree[pivot_j]) or
by deferring the sentinel assignment until after schur_complement() and any
work_estimate calculations complete; update both spots where Cdegree[pivot_j] is
set to -1 (the one near pivot_j and the one around the other referenced
location) to use the cached value approach and only assign -1 after
work_estimate is computed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eb075e7a-fa49-46ae-b96a-caf985a7b449
📒 Files selected for processing (2)
cpp/src/dual_simplex/crossover.cpp
cpp/src/dual_simplex/right_looking_lu.cpp
/ok to test 3e2352b
Actionable comments posted: 1
🧹 Nitpick comments (1)
cpp/src/dual_simplex/crossover.cpp (1)
421-453: Consider extracting the sparse/dense delta_z computation into a shared utility.

As noted in prior review discussion, the sparse vs. dense computation of `delta_z = -A^T * delta_y` could be factored into a shared function for use in both dual simplex and crossover. This would reduce code duplication and ensure consistent threshold tuning. This can be deferred to a follow-up PR as suggested by the reviewer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/dual_simplex/crossover.cpp` around lines 421 - 453, Extract the sparse vs dense computation of delta_z (currently computing delta_zN using delta_y_sparse / delta_y_dense, delta_expanded, nonbasic_list, lp.A and Arow) into a shared utility function (e.g., compute_delta_z_from_delta_y) that accepts delta_y in sparse or dense form, A (or Arow), nonbasic_list and outputs delta_zN; move the sparse path (loop over delta_y_sparse.i/x and Arow to build delta_expanded) and the dense path (delta_y_sparse.to_dense → dot product over lp.A columns) into the single function, preserve the existing 30% sparsity threshold outside the helper, and update callers in dual simplex and crossover to call the new utility to avoid duplication and centralize threshold/tuning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 462-466: The loop over sparse entries in crossover.cpp uses
delta_y_sparse.i.size() as the upper bound without casting, causing a
signed/unsigned comparison warning and inconsistency with the earlier loop;
change the loop condition to use static_cast<i_t>(delta_y_sparse.i.size()) so
the type of nnz_idx (i_t) matches the cast size, keeping the body unchanged
(variables: delta_y_sparse, nnz_idx, i_t, y, step_length).
---
Nitpick comments:
In `@cpp/src/dual_simplex/crossover.cpp`:
- Around line 421-453: Extract the sparse vs dense computation of delta_z
(currently computing delta_zN using delta_y_sparse / delta_y_dense,
delta_expanded, nonbasic_list, lp.A and Arow) into a shared utility function
(e.g., compute_delta_z_from_delta_y) that accepts delta_y in sparse or dense
form, A (or Arow), nonbasic_list and outputs delta_zN; move the sparse path
(loop over delta_y_sparse.i/x and Arow to build delta_expanded) and the dense
path (delta_y_sparse.to_dense → dot product over lp.A columns) into the single
function, preserve the existing 30% sparsity threshold outside the helper, and
update callers in dual simplex and crossover to call the new utility to avoid
duplication and centralize threshold/tuning.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cecd1046-6c96-4a21-a4a0-df1a91062ace
📒 Files selected for processing (1)
cpp/src/dual_simplex/crossover.cpp
Allocate buffers once before the superbasic loop and reset with std::fill each iteration to avoid repeated O(n) allocations (PR NVIDIA#948 review). Made-with: Cursor
…ossover_dual_simplex
/ok to test 1ecb41f
…ossover_dual_simplex
/ok to test c4ae488
…ossover_dual_simplex
/ok to test 2ed5853
/ok to test 42e7b8f
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/dual_simplex/phase2.cpp`:
- Around line 2787-2792: The early stop checks using toc(start_time) and
settings.concurrent_halt are returning before pricing and can mask a truly
optimal basis; move these checks so pricing runs first, then if leaving_index ==
-1 call phase2::prepare_optimality(...) and only after that honor the time
limit/CONCURRENT_LIMIT returns (i.e., keep the pricing/leave-index logic—the
branch that detects leaving_index == -1 and the call to
phase2::prepare_optimality—before any return of dual::status_t::TIME_LIMIT or
CONCURRENT_LIMIT based on toc(start_time) or *settings.concurrent_halt).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 741fa4be-2213-4414-ba97-1f75831be66a
📒 Files selected for processing (1)
cpp/src/dual_simplex/phase2.cpp
/ok to test ec5e015
/ok to test 5331696
chris-maes
left a comment
LGTM! Thanks for the perf improvements.
/merge
Description
This PR includes two performance improvements.
There are some regressions as well, but the affected instances still solve in milliseconds.
Issue
Checklist