Reduce QP overheads #1140
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough
Adds native free-variable support for QP barrier solves, adjusts related barrier math and control flow, records free-variable indices in presolve metadata, conditions dense-column detection on Q presence/diagonality, times cuDSS initialization, removes one pre-barrier diagnostic, and introduces a ratio-based GMRES stop/update rule.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
cpp/src/barrier/iterative_refinement.hpp (1)
181-182: Make the early-stop ratio configurable. Line 181 hard-codes a solver-wide accuracy/performance tradeoff into a shared iterative-refinement path. A fixed `5.0` may be fine for the profiling case that motivated this PR, but it is likely too aggressive for some matrices and too loose for others. Please plumb this through existing solver settings or a caller-supplied parameter instead of baking it into the header.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/barrier/iterative_refinement.hpp` around lines 181 - 182, Replace the hard-coded local constant f_t stop_ratio = 5.0 in iterative_refinement.hpp with a configurable parameter sourced from the solver settings or a caller-provided argument: add a new float-typed setting (or function parameter) named stop_ratio (or iterative_refinement_stop_ratio) to the solver config or to the iterative refinement function/method, use that variable in place of the literal when computing bnorm and the early-stop decision, keep the default value at 5.0 if not supplied, and update any callers or constructor that invoke the iterative refinement path to thread the new setting through.
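A minimal host-side sketch of what the review is asking for: thread the ratio through a settings object with `5.0` as the default instead of a literal in the header. The type and field names (`ir_settings_t`, `stop_ratio`, `should_stop_early`) are illustrative assumptions, not the actual cuOpt API, and the stop condition shown is only one plausible use of the ratio.

```cpp
#include <cassert>

// Hypothetical settings struct; a real integration would extend the existing
// solver settings type instead of introducing a new one.
struct ir_settings_t {
  double stop_ratio = 5.0;  // default preserves the current hard-coded behavior
};

// Illustrative early-stop test: stop once the residual norm is smaller than
// the right-hand-side norm by the configured factor.
bool should_stop_early(double residual_norm, double bnorm, const ir_settings_t& s)
{
  return residual_norm * s.stop_ratio < bnorm;
}
```

Callers that construct the iterative-refinement path would then pass the settings object through rather than relying on the header-local constant.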
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/barrier/iterative_refinement.hpp`:
- Around line 365-380: The branch treating improvement_ratio == stop_ratio as
non-improvement is wrong; update the comparison logic in the iterative
refinement block (symbols: improvement_ratio, stop_ratio, best_residual, x_sav,
x, show_info, CUOPT_LOG_INFO) so that hitting the threshold counts as
improvement (e.g., use >= instead of >) or use an epsilon-based comparison for
floating-point stability (|improvement_ratio - stop_ratio| <= eps) to decide
improvement, and ensure the residual update and raft::copy for
best_residual/x_sav->x remain inside the "improved" branch while keeping the
logging/break behavior only for true stagnation/worsening cases.
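The comparison fix above can be sketched in isolation. This stand-alone predicate (names borrowed from the review comment, not the actual cuOpt code) treats hitting the threshold exactly as improvement, with a small epsilon band for floating-point stability:

```cpp
#include <cassert>
#include <limits>

// Returns true when the iterate should be treated as "improved": the ratio
// meets or exceeds the threshold, with an epsilon margin so that a value that
// lands exactly on stop_ratio (up to rounding noise) still counts.
bool is_improvement(double improvement_ratio, double stop_ratio)
{
  const double eps = 8.0 * std::numeric_limits<double>::epsilon() * stop_ratio;
  return improvement_ratio >= stop_ratio - eps;
}
```

In the real refinement loop, the residual update and the `raft::copy` of `x` into `x_sav` would sit inside the branch where this predicate is true, and the logging/break path would only fire when it is false.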
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a7b6ab20-ce6f-49c8-9326-46dd180e9846
📒 Files selected for processing (1)
cpp/src/barrier/iterative_refinement.hpp
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 2310-2340: The free-variable branch currently sets diag to 0
whenever q_jj > 0 which allows tiny positive Q(j,j) to produce near-zero diag;
change the lambda used in the cub::DeviceTransform::Transform (the one capturing
free_var_reg in the data.Q.n > 0 && data.Q_diagonal case) to floor the returned
value at free_var_reg (e.g., return max(q_jj, free_var_reg) or return (q_jj >
free_var_reg) ? q_jj : free_var_reg) so diag for free variables is never below
free_var_reg; also ensure the other free-variable-only lambda (the data.Q.n == 0
path) similarly returns at least free_var_reg for is_free entries.
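The flooring suggested above reduces to a one-line scalar rule, sketched here as a plain host function standing in for the device lambda (the name `free_var_diag` is illustrative):

```cpp
#include <algorithm>
#include <cassert>

// For a free variable, never let the KKT diagonal entry fall below the
// regularization free_var_reg, even when Q(j,j) is a tiny positive value.
double free_var_diag(double q_jj, double free_var_reg)
{
  return std::max(q_jj, free_var_reg);
}
```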
- Around line 3596-3606: The initial mu computation (the one that divides by n +
num_upper_bounds before the iteration loop) doesn't account for native free
variables set via presolve_info.free_variable_indices and data.n_free_vars;
update that initial mu calculation to use the same adjusted divisor as
compute_mu()/compute_target_mu() by subtracting data.n_free_vars (i.e., use n +
num_upper_bounds - data.n_free_vars) when presolve_info.free_variable_indices is
non-empty, and ensure data.n_free_vars and d_is_free_ are initialized before
this initial mu is computed so the first iteration uses the same barrier measure
as later compute_mu()/compute_target_mu() calls.
- Around line 3081-3083: The calculation of mu_aff and subsequent mu division
must guard against mu_denom == 0 (mu_denom = x.size() + n_upper_bounds -
n_free_vars) to avoid division-by-zero when all variables are free; add an
explicit fast-path: compute mu_denom as now, then if mu_denom <= eps (use a
small f_t epsilon like 0 or machine epsilon), set mu_aff = static_cast<f_t>(0)
and mu = static_cast<f_t>(0), set any mu ratio (e.g., mu_aff_over_mu) to a safe
default (1 or 0 depending on downstream expectations) and skip the
mu-aff/mu-based updates, otherwise perform the existing divisions. Apply the
identical guard and handling for the other occurrence around the mu computation
block (the second spot noted in the comment).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 94fd7d1f-a999-4dab-a9dc-d5d097ed0af6
📒 Files selected for processing (3)
cpp/src/barrier/barrier.cu
cpp/src/dual_simplex/presolve.cpp
cpp/src/dual_simplex/presolve.hpp
/ok to test ab709dc
#define CUOPT_RANDOM_SEED "random_seed"
#define CUOPT_PDLP_PRECISION "pdlp_precision"
#define CUOPT_MIP_SEMICONTINUOUS_BIG_M "mip_semi_continuous_big_m"
#define CUOPT_BARRIER_ITERATIVE_REFINEMENT "barrier_iterative_refinement"
Nit: could you put these next to CUOPT_BARRIER_DUAL_INITIAL_POINT above, so we keep all the barrier parameters together?
A_dense.from_sparse(lp.A, j, k++);
  }
}
original_A_values = AD.x;
Is this line no longer necessary? It isn't included below. Just checking.
rmm::cuda_stream_view stream) const
{
  static_assert(std::is_signed_v<i_t>);
  i_t const mm = m;
Nit: why can't you just use m and n?
Because you want these to be const?
};

template <typename i_t, typename f_t>
void device_csc_matrix_t<i_t, f_t>::to_compressed_row(device_csr_matrix_t<i_t, f_t>& Arow,
raft::copy(rows.data(), i.data(), nz, stream);
raft::copy(vals.data(), x.data(), nz, stream);

thrust::tabulate(exec,
This is using a binary search to find out which column the nonzero stored at index p goes in?
Oh, you are expanding back to triplet?
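That reading of the code can be sketched on the host: binary-search the compressed column pointers to recover an explicit column index for each nonzero, i.e. expand CSC back to COO/triplet form. Function and variable names here are illustrative stand-ins for the `thrust::tabulate` call, not the cuOpt code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Column c owns the nonzeros in [col_start[c], col_start[c + 1]), so the
// column of nonzero p is the last pointer entry <= p, found with upper_bound.
std::vector<int> expand_col_indices(const std::vector<int>& col_start, int nz)
{
  std::vector<int> cols(nz);
  for (int p = 0; p < nz; ++p) {
    auto it = std::upper_bound(col_start.begin(), col_start.end(), p);
    cols[p] = static_cast<int>(it - col_start.begin()) - 1;
  }
  return cols;
}
```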
device_A_x_values.resize(device_AD.x.size(), handle_ptr->get_stream());
raft::copy(
  device_A_x_values.data(), device_AD.x.data(), device_AD.x.size(), handle_ptr->get_stream());
device_AD.to_compressed_row(device_A, handle_ptr->get_stream());
vector_norm2<i_t, f_t>(data.dual_residual));
#endif
// Make sure (w, x, v, z) > 0
// Save free variable x values before forcing positive (they can be negative)
I will update this in my PR
data.w.ensure_positive(epsilon_adjust);
data.x.ensure_positive(epsilon_adjust);

// For native free variables (QP): restore x values and set z = 0
data.d_complementarity_xz_rhs_.size(),
[new_mu] HD(f_t dx_aff, f_t dz_aff) { return -(dx_aff * dz_aff) + new_mu; },
stream_view_.value());
auto fill_linear_cc_rhs = [&](raft::device_span<f_t> out,
I will update this in my PR
iteration_data_t<i_t, f_t> data(lp, num_upper_bounds, Q, settings, start_time);

// Set up native free variable tracking for QPs
if (!presolve_info.free_variable_indices.empty()) {
I will update this in my PR
auto row_iter = thrust::device_pointer_cast(rows.data());
auto col_iter = thrust::device_pointer_cast(cols.data());
thrust::sort_by_key(exec,
No issue with the code, but I'd love to understand how it works.
You want to sort the nonzeros by row. I'm not sure I understand why the second iterator has row_iter + nz and col_iter + nz.
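One plausible reading of the snippet: in `thrust::sort_by_key` the first two arguments are the begin and end of the key range, so `row_iter + nz` and `col_iter + nz` simply mark where the zipped (row, col) key sequence ends; `vals` is permuted alongside as the value array. A host-side stand-in for the same operation, with illustrative names and `std::stable_sort` in place of the device sort:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Sort COO triplets by (row, col), carrying vals along, mirroring
// sort_by_key over a zipped (row, col) key range with vals as values.
void sort_triplets_by_row(std::vector<int>& rows, std::vector<int>& cols,
                          std::vector<double>& vals)
{
  std::vector<std::size_t> order(rows.size());
  for (std::size_t p = 0; p < order.size(); ++p) { order[p] = p; }
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return std::tie(rows[a], cols[a]) < std::tie(rows[b], cols[b]);
  });
  std::vector<int> r(rows.size()), c(cols.size());
  std::vector<double> v(vals.size());
  for (std::size_t p = 0; p < order.size(); ++p) {
    r[p] = rows[order[p]];
    c[p] = cols[order[p]];
    v[p] = vals[order[p]];
  }
  rows = r;
  cols = c;
  vals = v;
}
```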
rmm::device_uvector<i_t> rows(nz, stream);
rmm::device_uvector<i_t> cols(nz, stream);
rmm::device_uvector<f_t> vals(nz, stream);
Is vals needed? Why not directly use Arow.x?
cudaMemcpyAsync(Arow.row_start.data() + mm, &nz, sizeof(i_t), cudaMemcpyHostToDevice, stream));

rmm::device_uvector<i_t> rows(nz, stream);
rmm::device_uvector<i_t> cols(nz, stream);
Is cols needed? Why not directly use Arow.j? You could always make cols a reference to Arow.j if you wanted to keep the name cols. But it seems like you need an extra copy at the end.
thrust::make_zip_iterator(thrust::make_tuple(row_iter + nz, col_iter + nz)),
thrust::device_pointer_cast(vals.data()));

raft::copy(Arow.j.data(), cols.data(), nz, stream);
I think you can get rid of these two copies and extra allocations by using Arow.j and Arow.x directly.
}

if (settings.barrier_presolve && free_variables > 0) {
  // Try to remove free variables
Please add the comment back.
{CUOPT_DUAL_INFEASIBLE_TOLERANCE, &pdlp_settings.tolerances.dual_infeasible_tolerance, f_t(0.0), f_t(1e-1), std::max(f_t(1e-10), std::numeric_limits<f_t>::epsilon())},
{CUOPT_MIP_CUT_CHANGE_THRESHOLD, &mip_settings.cut_change_threshold, f_t(-1.0), std::numeric_limits<f_t>::infinity(), f_t(-1.0)},
{CUOPT_MIP_CUT_MIN_ORTHOGONALITY, &mip_settings.cut_min_orthogonality, f_t(0.0), f_t(1.0), f_t(0.5)},
{CUOPT_BARRIER_STEP_SCALE, &pdlp_settings.barrier_step_scale, f_t(0.5), f_t(1.0), f_t(0.9)},
We can't ever go to 1.0. The max needs to be something like 0.9999
case dual_simplex::lp_status_t::OPTIMAL: return pdlp_termination_status_t::Optimal;
case dual_simplex::lp_status_t::INFEASIBLE:
  return pdlp_termination_status_t::PrimalInfeasible;
case dual_simplex::lp_status_t::UNBOUNDED: return pdlp_termination_status_t::DualInfeasible;
If the dual is infeasible, I don't think we have a way to tell if the primal is unbounded or infeasible. So this should probably map to the UNBOUNDED_OR_INFEASIBLE status.
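The suggested remapping can be sketched with local stand-in enums (the combined status name is an assumption based on the comment; the real cuOpt enums live elsewhere):

```cpp
#include <cassert>

// Stand-in enums mirroring the snippet above.
enum class lp_status_t { OPTIMAL, INFEASIBLE, UNBOUNDED };
enum class pdlp_status_t { Optimal, PrimalInfeasible, UnboundedOrInfeasible };

// When dual simplex reports UNBOUNDED (i.e. the dual is infeasible), the
// primal may be unbounded or infeasible, so map to the combined status.
pdlp_status_t map_status(lp_status_t s)
{
  switch (s) {
    case lp_status_t::OPTIMAL: return pdlp_status_t::Optimal;
    case lp_status_t::INFEASIBLE: return pdlp_status_t::PrimalInfeasible;
    case lp_status_t::UNBOUNDED: return pdlp_status_t::UnboundedOrInfeasible;
  }
  return pdlp_status_t::UnboundedOrInfeasible;  // unreachable
}
```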
barrier_settings.log.log = false;
}

barrier_settings.log.printf("Barrier settings created at %.2f seconds, toc time: %.2f seconds\n",
Remove this before merging
}

template <typename i_t, typename f_t>
static dual_simplex::user_problem_t<i_t, f_t> cuopt_optimization_problem_to_simplex_problem(
Nit: this should probably be called cuopt_optimization_problem_to_user_problem
cudss_mt_lib_file = CUDSS_MT_LIB_FILE_NAME;
}

if (cudss_mt_lib_file != nullptr) {
Do you want to provide an option for the user to disable the CUDSS_THREADING_LIB?
chris-maes left a comment:
Awesome.
I will make changes to the free variable PR. I think you will either need to rebase off that PR or pull them in.
Most of the comments are very minor or just questions.
The only one that definitely requires changing is the maximum barrier_step_scale.
Description
This PR improves QP performance, particularly on easy problems that run in under a few hundred milliseconds.
Algorithmic improvements
Tunable parameters
Overhead reductions
Converting optimization_problem_t to problem_t takes about 70 ms. For QP, problem_t is not used, as we again convert it to user_problem_t. Added a direct conversion from optimization_problem_t to user_problem_t.
Performance improvements
Issue
Checklist