papilo probing workaround, probing cache sync, extra error logging#992
papilo probing workaround, probing cache sync, extra error logging#992aliceb-nv wants to merge 2 commits intoNVIDIA:release/26.04from
Conversation
|
/ok to test 3c20e64 |
📝 WalkthroughWalkthroughThis pull request enhances error handling and synchronization across the CUDA optimization library. Changes include: adding direct stderr logging for exception paths in the CLI and solver; inserting CUDA stream synchronization before parallel regions; forcing debug-only code compilation in release builds; and adding bounds validation for presolved constraint matrices. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~35 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp`:
- Around line 445-449: The current validation of column indices uses
cuopt_assert (around papilo_problem.getNCols(), cols, start, nnz), which is
disabled in release builds; replace it with a runtime check that always
runs—either use cuopt_expects with the same std::all_of predicate or implement
an explicit if-check over cols + start .. cols + start + nnz that throws or logs
and aborts when any col is out of [0, ncols) (keep the same error message), so
the malformed Papilo output is detected in all build configurations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dfb06ccf-1877-4e55-b8cc-c402be299172
📒 Files selected for processing (4)
cpp/cuopt_cli.cppcpp/src/mip_heuristics/presolve/probing_cache.cucpp/src/mip_heuristics/presolve/third_party_presolve.cppcpp/src/mip_heuristics/solve.cu
| i_t ncols = papilo_problem.getNCols(); | ||
| cuopt_assert( | ||
| std::all_of( | ||
| cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }), | ||
| "Papilo produced invalid column indices in presolved matrix"); |
There was a problem hiding this comment.
Consider using a runtime check instead of cuopt_assert for release-build protection.
The Papilo bug this guards against specifically manifests in release builds (when NDEBUG is defined), which is why the NDEBUG workaround above is needed. However, cuopt_assert is typically disabled in release builds, meaning this validation may not trigger when it's most needed.
Consider using cuopt_expects or an explicit runtime check that works in all build configurations:
🛡️ Suggested fix for runtime validation
i_t ncols = papilo_problem.getNCols();
- cuopt_assert(
- std::all_of(
- cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }),
- "Papilo produced invalid column indices in presolved matrix");
+ cuopt_expects(
+ std::all_of(
+ cols + start, cols + start + nnz, [ncols](i_t col) { return col >= 0 && col < ncols; }),
+ error_type_t::RuntimeError,
+ "Papilo produced invalid column indices in presolved matrix");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/mip_heuristics/presolve/third_party_presolve.cpp` around lines 445 -
449, The current validation of column indices uses cuopt_assert (around
papilo_problem.getNCols(), cols, start, nnz), which is disabled in release
builds; replace it with a runtime check that always runs—either use
cuopt_expects with the same std::all_of predicate or implement an explicit
if-check over cols + start .. cols + start + nnz that throws or logs and aborts
when any col is out of [0, ncols) (keep the same error message), so the
malformed Papilo output is detected in all build configurations.
There was a problem hiding this comment.
This is checked a bit later down the line via cuopt_expects(), which throws exceptions.
| // Force-include ProbingView.hpp with NDEBUG undefined so the restoration is compiled in. | ||
| #ifdef NDEBUG | ||
| #undef NDEBUG | ||
| #include <papilo/core/ProbingView.hpp> |
There was a problem hiding this comment.
Did you file an issue with Papilo?
There was a problem hiding this comment.
@rg20 I can file this. What is the Papilo issue synopsis?
| raft::common::nvtx::range fun_scope("run_mip"); | ||
| auto constexpr const running_mip = true; | ||
|
|
||
| // TODO ask Akif and Alice how was this passed down? |
There was a problem hiding this comment.
Can you remove/change this comment
|
/ok to test 218178e |
1 similar comment
|
/ok to test 218178e |
In NDEBUG mode, Papilo skips some sanitization steps in the problem representation, and sometimes outputs a matrix with invalid column indices (-1) which haven't been properly cleaned up. This happens in the Probing presolver.
NDEBUG is now conditionally undefined when the ProbingView.hpp header is included to ensure the appropriate sanitization steps are being executed.
Furthermore, a stream sync call was missing in the probing cache code between the working buffer allocation and their use by OMP threads, and has been fixed.
fprintf(stderr,...) calls have also been added to catch scenarios where exception cleanup cause logging calls to be made after the logger object cleanup.
Description
Issue
Checklist