Skip to content

papilo probing workaround, probing cache sync, extra error logging#992

Open
aliceb-nv wants to merge 2 commits intoNVIDIA:release/26.04from
aliceb-nv:papilo-probing-fix
Open

papilo probing workaround, probing cache sync, extra error logging#992
aliceb-nv wants to merge 2 commits intoNVIDIA:release/26.04from
aliceb-nv:papilo-probing-fix

Conversation

@aliceb-nv
Copy link
Contributor

In NDEBUG mode, Papilo skips some sanitization steps in the problem representation, and sometimes outputs a matrix with invalid column indices (-1) which haven't been properly cleaned up. This happens in the Probing presolver.

NDEBUG is now conditionally undefined when the ProbingView.hpp header is included to ensure the appropriate sanitization steps are being executed.

Furthermore, a stream sync call was missing in the probing cache code between the working buffer allocation and their use by OMP threads, and has been fixed.
fprintf(stderr,...) calls have also been added to catch scenarios where exception cleanup cause logging calls to be made after the logger object cleanup.

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@aliceb-nv aliceb-nv added this to the 26.04 milestone Mar 24, 2026
@aliceb-nv aliceb-nv requested a review from a team as a code owner March 24, 2026 13:27
@aliceb-nv aliceb-nv requested review from Kh4ster and chris-maes March 24, 2026 13:27
@aliceb-nv aliceb-nv added bug Something isn't working non-breaking Introduces a non-breaking change labels Mar 24, 2026
@copy-pr-bot
Copy link

copy-pr-bot bot commented Mar 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aliceb-nv
Copy link
Contributor Author

/ok to test 3c20e64

@aliceb-nv aliceb-nv changed the title papilo probing workaroudn, probing cache sync, extra error logging papilo probing workaround, probing cache sync, extra error logging Mar 24, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

This pull request enhances error handling and synchronization across the CUDA optimization library. Changes include: adding direct stderr logging for exception paths in the CLI and solver; inserting CUDA stream synchronization before parallel regions; forcing debug-only code compilation in release builds; and adding bounds validation for presolved constraint matrices.

Changes

Cohort / File(s) Summary
Exception Handling & Logging
cpp/cuopt_cli.cpp, cpp/src/mip_heuristics/solve.cu
Added direct stderr output and exception handlers to capture and report unexpected errors during solve operations prior to existing logging, with expanded catch blocks in solve_mip.
CUDA Stream Synchronization
cpp/src/mip_heuristics/presolve/probing_cache.cu
Inserted explicit sync_stream() call before OpenMP parallel region to ensure CUDA-visible state from earlier allocations is ready for thread-specific stream access.
Validation & Debug Compilation Override
cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
Added conditional preprocessor manipulation to compile ProbingView debug-only bounds-restoration logic in release builds; added runtime assertion validating all presolved column indices are within valid bounds.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description explains the Papilo sanitization issue, NDEBUG workaround, stream sync fix, and error logging additions—all related to the changeset.
Title check ✅ Passed The title clearly summarizes the three main changes: papilo probing workaround, probing cache synchronization, and additional error logging. It aligns with the actual modifications across the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 445-449: The current validation of column indices uses
cuopt_assert (around papilo_problem.getNCols(), cols, start, nnz), which is
disabled in release builds; replace it with a runtime check that always
runs—either use cuopt_expects with the same std::all_of predicate or implement
an explicit if-check over cols + start .. cols + start + nnz that throws or logs
and aborts when any col is out of [0, ncols) (keep the same error message), so
the malformed Papilo output is detected in all build configurations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dfb06ccf-1877-4e55-b8cc-c402be299172

📥 Commits

Reviewing files that changed from the base of the PR and between cbec0e9 and 3c20e64.

📒 Files selected for processing (4)
  • cpp/cuopt_cli.cpp
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/presolve/third_party_presolve.cpp
  • cpp/src/mip_heuristics/solve.cu

Comment on lines +445 to +449
i_t ncols = papilo_problem.getNCols();
cuopt_assert(
std::all_of(
cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }),
"Papilo produced invalid column indices in presolved matrix");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider using a runtime check instead of cuopt_assert for release-build protection.

The Papilo bug this guards against specifically manifests in release builds (when NDEBUG is defined), which is why the NDEBUG workaround above is needed. However, cuopt_assert is typically disabled in release builds, meaning this validation may not trigger when it's most needed.

Consider using cuopt_expects or an explicit runtime check that works in all build configurations:

🛡️ Suggested fix for runtime validation
  i_t ncols = papilo_problem.getNCols();
- cuopt_assert(
-   std::all_of(
-     cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }),
-   "Papilo produced invalid column indices in presolved matrix");
+ cuopt_expects(
+   std::all_of(
+     cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }),
+   error_type_t::RuntimeError,
+   "Papilo produced invalid column indices in presolved matrix");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 445 -
449, The current validation of column indices uses cuopt_assert (around
papilo_problem.getNCols(), cols, start, nnz), which is disabled in release
builds; replace it with a runtime check that always runs—either use
cuopt_expects with the same std::all_of predicate or implement an explicit
if-check over cols + start .. cols + start + nnz that throws or logs and aborts
when any col is out of [0, ncols) (keep the same error message), so the
malformed Papilo output is detected in all build configurations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checked a bit later down the line via cuopt_expects(), which throws exceptions.

// Force-include ProbingView.hpp with NDEBUG undefined so the restoration is compiled in.
#ifdef NDEBUG
#undef NDEBUG
#include <papilo/core/ProbingView.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you file an issue with Papilo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rg20 I can file this. What is the Papilo issue synopsis?

raft::common::nvtx::range fun_scope("run_mip");
auto constexpr const running_mip = true;

// TODO ask Akif and Alice how was this passed down?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove/change this comment

@aliceb-nv
Copy link
Contributor Author

/ok to test 218178e

1 similar comment
@aliceb-nv
Copy link
Contributor Author

/ok to test 218178e

@chris-maes chris-maes added the P0 label Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change P0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants