Skip to content

Fix route priority sort indexing#1222

Open
bdice wants to merge 1 commit into
NVIDIA:mainfrom
bdice:fix-routing-priority-remove-routes
Open

Fix route priority sort indexing#1222
bdice wants to merge 1 commit into
NVIDIA:mainfrom
bdice:fix-routing-priority-remove-routes

Conversation

@bdice
Copy link
Copy Markdown
Contributor

@bdice bdice commented May 14, 2026

Summary

  • Fix an out-of-bounds host read in adapted_sol_t::priority_remove_diff_routes by sorting route ids directly on routes[id].length.
  • The old comparator indexed a compact route_priority vector by route id, which could corrupt the host heap and intermittently abort ROUTING_TEST.

Closes #1221. Closes #866. Closes #783.

Validation

Running many times under ASan no longer triggers the SIGABRT, though it was very hard to reproduce the previous failure before this fix.

Signed-off-by: Bradley Dice <bdice@bradleydice.com>
@bdice bdice requested a review from a team as a code owner May 14, 2026 18:32
@bdice bdice requested review from Bubullzz and Kh4ster May 14, 2026 18:32
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR fixes an intermittent host heap corruption crash in the routing adapter by removing unsafe vector indexing from the sort comparator. The comparator now compares route lengths directly instead of using route IDs as indices into a temporary priority vector, eliminating out-of-bounds access when route IDs exceed the vector size.

Changes

Route Priority Sort Bounds Fix

Layer / File(s) Summary
Sort comparator route length access
cpp/src/routing/adapters/adapted_sol.cuh
The sort comparator in priority_remove_diff_routes now directly accesses routes[i].length and routes[j].length instead of indexing into a pre-allocated route_priority vector, eliminating out-of-bounds reads when route IDs ≥ vector size.

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and accurately summarizes the primary change: fixing route priority sort indexing to resolve an out-of-bounds read issue.
Linked Issues check ✅ Passed The PR fully addresses issue #1221 by changing the comparator to compare routes[i].length and routes[j].length directly instead of indexing a compact route_priority vector, eliminating out-of-bounds access.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the comparator in adapted_sol.cuh as required; no unrelated modifications are present.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The description directly addresses the changeset by explaining the bug fix for out-of-bounds host read in adapted_sol_t::priority_remove_diff_routes and the specific implementation change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@bdice bdice self-assigned this May 14, 2026
@bdice bdice added bug Something isn't working non-breaking Introduces a non-breaking change labels May 14, 2026
Copy link
Copy Markdown
Contributor

@hlinsen hlinsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, thank you @bdice

@mlubin
Copy link
Copy Markdown
Contributor

mlubin commented May 14, 2026

Uh oh, I see a crash in routing test:
https://github.com/NVIDIA/cuopt/actions/runs/25878108924/job/76057979097?pr=1222#step:12:17602

[ RUN      ] level0_retail/retail_float_test_t.CVRPTW_Retail/18
ROUTING_TEST: /tmp/conda-bld-output/bld/rattler-build_libmps-parser/work/cpp/src/routing/local_search/local_search.cu:315: void cuopt::routing::detail::local_search_t<i_t, f_t, REQUEST>::run_best_local_search(cuopt::routing::detail::solution_t<i_t, f_t, REQUEST>&, bool, bool, bool) [with i_t = int; f_t = float; cuopt::routing::request_t REQUEST = cuopt::routing::request_t::VRP]: Assertion `(cost_after - cost_before) - move_candidates.cycles.total_cycle_cost < 1.&& "Cost mismatch after a move"' failed.

@rgsl888prabhu
Copy link
Copy Markdown
Collaborator

Think this is same failure as noted in the bug #1221

@bdice
Copy link
Copy Markdown
Contributor Author

bdice commented May 14, 2026

The double free or corruption (out) error is what this PR fixes.

There seems to be a separate bug in the same test:

Assertion `(cost_after - cost_before) - move_candidates.cycles.total_cycle_cost < 1.&& "Cost mismatch after a move"' failed.

Attempting to reproduce that now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

4 participants