Draft
Implements GatewayPlugin + UpdateProvider for the OTA demo. Polls a FastAPI catalog at boot and supports update / install / uninstall operations derived from SOVD ISO 17978-3 metadata.
Process model: SIGTERM old executable, swap files on disk, fork+exec new executable. No lifecycle commands. Operation kind is classified from updated_components / added_components / removed_components.
Components:
- OtaUpdatePlugin: list/get/register/delete/prepare/execute/supports_automated
- CatalogClient: cpp-httplib GET /catalog and artifact download, with parse_url
- OperationDispatcher: SOVD metadata -> Update/Install/Uninstall/Unknown
- ProcessRunner: pgrep via /proc, kill_by_executable with SIGTERM->SIGKILL fallback, fork+exec spawn
21 gtests pass (7 dispatcher, 6 parse_url, 8 plugin smoke).
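The classification step can be illustrated with a minimal Python sketch. This is not the plugin's C++ OperationDispatcher: the function name, the enum, and especially the precedence when more than one list is populated (updated, then added, then removed) are assumptions made for illustration.

```python
from enum import Enum
from typing import Any, Dict


class OperationKind(Enum):
    UPDATE = "update"
    INSTALL = "install"
    UNINSTALL = "uninstall"
    UNKNOWN = "unknown"


def classify_operation(metadata: Dict[str, Any]) -> OperationKind:
    """Derive the operation kind from SOVD-style component lists.

    Assumption: if several lists are non-empty, updated wins over added,
    and added wins over removed. The real dispatcher may decide differently.
    """
    if metadata.get("updated_components"):
        return OperationKind.UPDATE
    if metadata.get("added_components"):
        return OperationKind.INSTALL
    if metadata.get("removed_components"):
        return OperationKind.UNINSTALL
    # No list present, or all lists empty: nothing to act on.
    return OperationKind.UNKNOWN
```

An entry with only removed_components set would classify as an uninstall, and an entry with none of the three lists falls through to UNKNOWN.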
… configure, add -Wshadow -Wconversion
Adds optional --replaces-executable flag to pack_artifact.py and threads it into the catalog entry as x_medkit_replaces_executable when kind=update. This lets the gateway plugin kill the OLD executable (broken_lidar_node) before spawning the NEW one (fixed_lidar_node) when the two live in separate ROS 2 packages.
…process
When a SOVD update package swaps a node across ROS 2 packages (e.g. broken_lidar -> fixed_lidar), the OLD process binary basename differs from the new one. Read x_medkit_replaces_executable from the entry metadata before issuing the kill, falling back to x_medkit_executable when the field is absent (in-package upgrades).
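A rough Python sketch of that fallback, assuming the entry metadata is available as a plain dict. The function name resolve_kill_target is hypothetical, and treating an empty string the same as an absent field is an assumption.

```python
from typing import Any, Dict, Optional


def resolve_kill_target(entry: Dict[str, Any]) -> Optional[str]:
    """Pick the executable name to kill before spawning the new one.

    Prefers x_medkit_replaces_executable (cross-package swap) and falls
    back to x_medkit_executable (in-package upgrade). Returns None when
    neither field is usable. Empty strings count as absent (assumption).
    """
    return (
        entry.get("x_medkit_replaces_executable")
        or entry.get("x_medkit_executable")
        or None
    )
```

For the broken_lidar -> fixed_lidar case this yields broken_lidar_node as the kill target while fixed_lidar_node remains the executable to spawn.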
…ime libs in image
- ProcessRunner::pgrep now reads /proc/<pid>/cmdline argv[0] basename instead of /proc/<pid>/comm (which the kernel truncates to 15 chars, so 'broken_lidar_node' would never match).
- plugin_exports.cpp exports get_update_provider so the gateway's plugin_loader can resolve the UpdateProvider interface across the dlopen boundary without relying on dynamic_cast.
- Dockerfile.gateway: drop --symlink-install (broke multi-stage COPY) and add runtime libs (libcpp-httplib, libsystemd, nlohmann-json3, lifecycle, test_msgs).
- ota_update_server Dockerfile: bake artifacts/ into image (WSL2 + Docker Desktop bind mounts unreliable).
- Compose: gateway port configurable via OTA_GATEWAY_PORT (default 8080).
Verified via end-to-end smoke against the live stack:
- Plugin loads and reports as UpdateProvider
- Boot poll registers all 3 catalog entries
- Update flow kills broken_lidar_node and spawns fixed_lidar_node
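The cmdline-based match can be sketched in Python as follows. This is an illustration of the idea rather than the plugin's C++ ProcessRunner; the function name and the proc_root parameter (added so the lookup can be pointed at a fake directory tree) are assumptions.

```python
import os
from pathlib import Path
from typing import List


def pgrep_by_argv0(name: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose argv[0] basename equals `name`.

    Reads <proc_root>/<pid>/cmdline, where arguments are NUL-separated,
    instead of <pid>/comm, which the kernel truncates to 15 characters.
    """
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited, or cmdline is not readable
        argv0 = raw.split(b"\0", 1)[0].decode("utf-8", "replace")
        # Kernel threads expose an empty cmdline; ignore them.
        if argv0 and os.path.basename(argv0) == name:
            pids.append(int(entry.name))
    return sorted(pids)
```

Because the comparison is an exact basename match, the 15-character truncated form of a long name does not match, and neither does any other partial name.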
pack_artifact.py was emitting 'name' (not in SOVD ISO 17978-3 - spec uses 'update_name') and 'version' (not a SOVD field at all). Spec-compliant clients (ros2_medkit_web_ui, the Foxglove updates panel) expect update_name; vendor-specific data lives under x_medkit_*. Confirmed against the live demo gateway: the web UI happily renders the updated shape, all 3 catalog entries visible end-to-end.
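A hedged sketch of what a catalog entry builder with the corrected field names might look like. This is not the actual pack_artifact.py code: the function name, its parameters, the "id" key, and the single-component lists are assumptions; only update_name, the x_medkit_* fields, and the three component-list names come from the commit messages.

```python
from typing import Any, Dict, Optional

# Maps the operation kind to the SOVD list that carries the component.
_KIND_FIELD = {
    "update": "updated_components",
    "install": "added_components",
    "uninstall": "removed_components",
}


def build_catalog_entry(
    update_id: str,
    update_name: str,
    version: str,
    kind: str,
    component: str,
    executable: str,
    replaces_executable: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one catalog entry using spec field names plus x_medkit_* extras."""
    entry: Dict[str, Any] = {
        "id": update_id,              # key name is an assumption
        "update_name": update_name,   # spec field, replaces the old 'name'
        "x_medkit_version": version,  # vendor field, replaces bare 'version'
        _KIND_FIELD[kind]: [component],
        "x_medkit_executable": executable,
    }
    # Only an update can replace a differently named old executable.
    if kind == "update" and replaces_executable:
        entry["x_medkit_replaces_executable"] = replaces_executable
    return entry
```

Keeping vendor data under the x_medkit_ prefix leaves the entry readable by spec-compliant clients, which can ignore fields they do not know.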
…nst gateway
Verifies the canonical SOVD client flow that the Foxglove updates panel
mirrors: connect form, /api/v1/updates returns {items: [<id>]}, per-id
/status calls, all 3 catalog entries render in the dashboard.
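The client flow described here can be approximated with a small Python sketch. The helper names are hypothetical, and the exact status path (/api/v1/updates/<id>/status) is an assumption assembled from the commit text; the fetch parameter exists so the flow can be exercised without a running gateway.

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen


def fetch_json(url: str) -> Any:
    """GET a URL and decode the JSON body (plain stdlib, no retries)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def list_update_statuses(
    base_url: str, fetch: Callable[[str], Any] = fetch_json
) -> Dict[str, Any]:
    """List update ids, then query each id's status.

    Expects the listing to use the {"items": [<id>, ...]} envelope.
    """
    base = base_url.rstrip("/")
    listing = fetch(f"{base}/api/v1/updates")
    return {
        update_id: fetch(f"{base}/api/v1/updates/{update_id}/status")
        for update_id in listing["items"]
    }
```

Against a live stack this would be called as list_update_statuses("http://localhost:8080"), assuming the default gateway port.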
Adopt the same script convention as sensor_diagnostics, multi_ecu_aggregation,
and turtlebot3_integration:
  ./run-demo.sh           build artifacts + bring up gateway + nodes + update
                          server (daemon mode by default, --attached for fg)
  ./stop-demo.sh          tear down (-v removes volumes, --images removes
                          built images)
  ./check-demo.sh         show registered updates + per-id status + live
                          plugin-managed processes inside the gateway
                          container
  ./trigger-update.sh     broken_lidar -> fixed_lidar (the headline)
  ./trigger-install.sh    install obstacle_classifier_v2 from scratch
  ./trigger-uninstall.sh  remove broken_lidar_legacy
OTA_GATEWAY_PORT (or OTA_GATEWAY_URL for full override) lets the user
sidestep collisions with another gateway on host port 8080.
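The override order can be sketched in Python. The demo scripts themselves are shell, so this only illustrates the precedence; the function name and the localhost host in the fallback URL are assumptions.

```python
import os
from typing import Mapping, Optional


def gateway_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the gateway URL from the environment.

    OTA_GATEWAY_URL, when set, overrides everything. Otherwise the URL is
    built from OTA_GATEWAY_PORT (default 8080) on localhost (assumed host).
    """
    env = os.environ if env is None else env
    url = env.get("OTA_GATEWAY_URL")
    if url:
        return url.rstrip("/")
    port = env.get("OTA_GATEWAY_PORT") or "8080"
    return f"http://localhost:{port}"
```

Setting OTA_GATEWAY_PORT=8081 would therefore point the scripts at port 8081 without touching anything else.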
README quickstart updated to point at run-demo.sh.
…tern
tests/smoke_test_ota.sh asserts:
- gateway /health 200
- gateway log says 'Update backend provided by plugin' (no 'no provider' warn)
- GET /updates returns SOVD {items: [<id>]} envelope with all 3 catalog ids
- GET /updates/{id} detail uses spec field names: update_name (not 'name'),
x_medkit_version (not bare 'version'), updated_components for kind,
x_medkit_replaces_executable threaded through pack_artifact
- update flow: PUT prepare + execute kills broken_lidar_node and
spawns fixed_lidar_node inside the gateway container
- install flow: spawns obstacle_classifier_node
ci.yml gets a build-and-test-ota job following the same shape as the other
per-demo jobs: checkout -> install Python + ROS Jazzy on the runner ->
build_artifacts.sh -> docker compose up -d --build -> run smoke ->
log dumps on failure -> teardown.
Summary
- demos/ota_nav2_sensor_fix/ end-to-end OTA demo
- ota_update_plugin C++ gateway plugin (UpdateProvider + GatewayPlugin)
- … updated_components/added_components/removed_components)
- pack_artifact.py CLI for building tarballs and catalog entries
- docker-compose.yml (gateway + update server); nav2 / Foxglove are bring-your-own (documented in README)
Out of scope (deliberate, dev-grade positioning)
Test plan / verification
Unit & integration tests (all clean):
- pytest -v for pack_artifact.py (16 tests)
- pytest -v for ota_update_server (5 tests)
- colcon test for ota_update_plugin (24 GTest cases)
- … -Wall -Wextra -Wpedantic -Wshadow -Wconversion
- build_artifacts.sh produces a 3-entry catalog + tarballs end-to-end
End-to-end smoke (docker compose up --build, plugin against live gateway):
- … UpdateProvider (gateway logs: "Update backend provided by plugin")
- … /catalog and registers all 3 catalog entries
- /updates/fixed_lidar_2_1_0/prepare && /execute kills broken_lidar_node and spawns fixed_lidar_node (verified via PID checks)
- /updates/obstacle_classifier_v2_1_0_0/prepare && /execute swaps files and spawns obstacle_classifier_node
- /updates/broken_lidar_legacy_remove/execute accepted by gateway but the SOVD UpdateManager state machine stops at phase prepared for our no-op prepare() semantic. Plugin's uninstall code path is unit-tested; full integration needs either an explicit prepare step from the panel UI or a small adjustment to the plugin's prepare for uninstall. Tracking as a follow-up; does not block the demo's main Update narrative.
- obstacle_classifier_node runtime: spawned process crashes on visualization_msgs fastcdr ABI mismatch in the runtime image. The install mechanism is verified; the runtime crash is a ros-jazzy-visualization-msgs packaging issue separate from the plugin. Workaround: rebuild the runtime stage with matching fastcdr versions.
Notes
- … selfpatch/ros2_medkit main for the gateway sources (clone happens at image build time)
- __has_include shim in ota_update_plugin.hpp covers both gateway header layouts (providers/ vs updates/)
- pgrep matches against /proc/<pid>/cmdline argv[0] basename (not comm, which the kernel truncates to 15 chars)
- … ros2_medkit_foxglove_extension Updates panel PR (fix(docker): add missing ros2_medkit components and submodules #6 of that repo)