Implement Choria Transport (Phases 1 and 2) #206

Open
nmburgan wants to merge 7 commits into main from choria

Conversation


nmburgan commented Mar 27, 2026

Adds a Choria transport to OpenBolt, enabling task execution, command running, and script execution on nodes via Choria's NATS pub/sub messaging as an alternative to SSH and WinRM. This implements phases 1 and 2 of the transport plan (docs/choria-transport-plan.md).

Rather than opening direct connections to each node, OpenBolt sends MCollective RPC requests through a NATS broker. Nodes running the Choria server execute the requests via Ruby MCollective agents and return results over the same messaging bus. This scales well to large fleets and works through NAT/firewalls since nodes only need outbound connectivity to the broker.

Documentation (docs/)

  • choria-transport.md: User guide covering configuration, usage, and examples
  • choria-transport-dev.md: Developer guide for architecture, data flow, and code patterns
  • choria-transport-plan.md: Project plan with phased roadmap and progress tracking
  • choria-transport-testing.md: Test environment setup with OpenVox/Choria infrastructure configuration and manual verification steps

Phase 1: bolt_tasks agent

Phase 1 delivers task execution via the bolt_tasks agent, which downloads
task files from an OpenVox/Puppet Server and executes them on target nodes.

  • run_task via bolt_tasks agent with async execution and polling
  • run_command, run_script return clear per-target errors when the shell
    agent is not available (rather than crashing)
  • upload, download return "not yet supported" errors
  • Connectivity checking via rpcutil.ping
  • Agent detection with per-target caching
  • Client configuration with NATS, TLS, and collective overrides
  • Config class with validation for all transport options
  • Transport and config registration in OpenBolt's executor and config systems
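
As a rough illustration, the Phase 1 options could appear in an inventory file along these lines (a sketch based on the option names in this PR; docs/choria-transport.md is the authoritative list):

```yaml
# Hypothetical inventory fragment -- option names follow the PR text.
config:
  transport: choria
  choria:
    config-file: /etc/choria/client.cfg
    collective: mcollective
    task-agent: bolt_tasks
targets:
  - node1.example.com
  - node2.example.com
```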

Phase 2: shell agent

Phase 2 adds command and script execution via the shell agent, plus an
alternative task execution path that uploads task files directly instead of
downloading from an OpenVox/Puppet Server.

  • run_command with async execution, timeout, and process kill on timeout
  • run_script with remote tmpdir creation, script upload via base64, and
    cleanup
  • run_task via shell agent with support for all input methods (environment,
    stdin, both)
  • Deterministic agent selection via task-agent config and --choria-task-agent
    CLI flag (no automatic fallback between agents)
  • Batched shell polling via shell.list + shell.statuses for scalability
  • Platform-aware command builders for POSIX and Windows (PowerShell)
  • Interpreter support via the interpreters config option
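
The base64 upload step used by run_script can be sketched as follows. This is hypothetical Ruby, not the PR's actual command_builders.rb: the function name, argument list, and exact command sequence are illustrative.

```ruby
require 'base64'
require 'shellwords'

# Sketch of a POSIX script-runner command. The script body is base64
# encoded so it survives transport as a plain string, decoded into a
# remote tmpdir, marked executable, and then executed with its args.
def posix_script_command(tmpdir, script_name, script_body, args = [])
  remote_path = File.join(tmpdir, script_name)
  encoded     = Base64.strict_encode64(script_body) # no embedded newlines
  [
    "mkdir -p #{Shellwords.escape(tmpdir)}",
    "echo #{encoded} | base64 -d > #{Shellwords.escape(remote_path)}",
    "chmod 0700 #{Shellwords.escape(remote_path)}",
    Shellwords.join([remote_path] + args)
  ].join(' && ')
end
```

A Windows builder would follow the same shape but emit PowerShell (e.g., `[Convert]::FromBase64String` and `Set-Content`) instead of `base64 -d`.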

Shared infrastructure

  • Client management (client.rb): MCollective RPC client setup with auto-detected or explicit Choria config, NATS/TLS overrides, collective routing, and thread-safe one-time initialization
  • Agent discovery (agent_discovery.rb): Per-target agent detection and version checking with caching, OS family detection for platform-specific command building
  • Command builders (command_builders.rb): POSIX and Windows command construction for tasks, scripts, file uploads, directory management, and environment variable injection
  • Helpers (helpers.rb): Shared polling with configurable retries and timeout, result building, input validation (path traversal, env key injection, null bytes)
  • Config (config/transport/choria.rb): Transport configuration with validation for all options including SSL overrides, timeout settings, agent selection, tmpdir, and interpreters
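
A minimal sketch of the kind of input validation helpers.rb performs. The names and exact rules below are illustrative assumptions, not the PR's actual code:

```ruby
# Hypothetical validation helpers in the spirit of helpers.rb.
module ChoriaInputValidation
  # Env keys must look like plain identifiers so they cannot inject
  # into a shell assignment.
  ENV_KEY = /\A[A-Za-z_][A-Za-z0-9_]*\z/

  module_function

  # Reject null bytes anywhere in a value destined for a remote shell.
  def safe_value?(value)
    !value.include?("\0")
  end

  def safe_env_key?(key)
    !!(key =~ ENV_KEY)
  end

  # Reject names that could traverse out of the remote tmpdir.
  def safe_filename?(name)
    !name.include?('/') && !name.include?("\0") &&
      name != '.' && name != '..'
  end
end
```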

Transport config options

See docs for a list of all config options. I tried to expose all of the relevant knobs, but if you can think of others that should get added, let me know.

Key design decisions

  • Use the Base class rather than Simple: The Simple class assumes the model used by the SSH and WinRM transports, where one thread per target handles the connection, execution, and cleanup for that single target. That doesn't work with Choria's architecture and would be far too inefficient. This transport uses the Base class and implements its own batching. It means more code, but it is necessary to take advantage of Choria's scalability.
  • Deterministic agent routing: No fallback from bolt_tasks to shell. If the configured agent isn't available, the target gets a clear error. Mixed fleets (some nodes with shell, some without) produce per-target success/failure results, not crashes. I considered trying to do an automatic fallback from bolt_tasks to shell if the task isn't available on the server, but this added a fair bit of extra complexity, and it's probably better to give the user more control over exactly how a task is run anyway.
  • Partitioned functionality and graceful failure handling: Aligning with Choria’s philosophy, functionality is narrowly scoped by the agents installed on target nodes. If a node doesn’t have the agent needed for an action, the action fails for that target in a graceful way.
  • Batched polling and fetching of results: shell.list for O(1) status checks per poll round, shell.statuses for batched output retrieval. Avoids per-target polling overhead. The shell.statuses action is a new action for the shell agent, which is why version 1.2.0 is required. Otherwise, fetching results from nodes at scale would have been very slow and cumbersome.
  • Collective-based batching, not concurrency-based: Unlike SSH/WinRM, targets are grouped by Choria collective (typically one group), and each batch uses MCollective's native multi-node RPC fanout. OpenBolt's --concurrency flag doesn't apply; all targets in a collective are addressed in a single RPC call.
  • Shell DDL bundled with OpenBolt: The shell agent DDL is shipped in lib/mcollective/agent/shell.ddl and preloaded during client setup, so users don't need to install it separately on the controller. The bolt_tasks DDL comes from the choria-mcorpc-support gem which is already an OpenBolt dependency.
  • Code readability and maintainability: I tried to strike the right balance between keeping the code easy to follow and encapsulating logic where it makes sense, without adding too many layers of indirection for any one action. Keep nesting to a sane level, don't mutate objects passed by reference into remote function calls, and don't require keeping too much state in your head to understand what the code is doing.
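
The batched polling loop can be sketched generically, with the per-round RPC call (shell.list) abstracted as a callable so the control flow can be followed without a live broker. This is hypothetical code, not the PR's helpers.rb; the function and its parameters are illustrative:

```ruby
require 'timeout'

# One list call per poll round checks all pending handles at once,
# instead of polling each target individually.
def poll_until_done(pending, timeout:, interval: 1, &list_finished)
  deadline = Time.now + timeout
  done = []
  until pending.empty?
    if Time.now > deadline
      raise Timeout::Error, "#{pending.size} targets still running"
    end
    finished = list_finished.call(pending) # e.g. one shell.list RPC
    done.concat(finished)
    pending -= finished
    sleep interval unless pending.empty?
  end
  done
end
```

In the real transport, the finished handles from a round would then be fetched in one batched shell.statuses call rather than one request per target.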

Future phases (not in this PR)

  • Phase 3: Getting this integrated into the foreman_openbolt/smart_proxy_openbolt Foreman plugins.
  • Phase 4: Implementing upload/download with a new file-transfer Choria agent.
  • Phase 5: Full plan support, including apply blocks

Implements phases 1 and 2 of the Choria transport, enabling OpenBolt to
run tasks, commands, and scripts on nodes via Choria's NATS pub/sub
messaging as an alternative to SSH and WinRM.

Phase 1 (bolt_tasks agent): Downloads task files to targets from an
OpenVox/Puppet Server and executes them using the bolt_tasks Choria agent.

Phase 2 (shell agent): Executes commands, scripts, and tasks through the
Choria shell agent. This allows running tasks not available on an
OpenVox/Puppet server.

Everything is implemented as asynchronously as possible, aligning with
Choria's model, and is built to run at scale across many thousands of
nodes at once.

See docs in a later commit for details on the phases of this project as
well as user-facing and developer documentation.

Attempts to minimize stubbing (although we still need a fair bit) and
use the choria-mcorpc-support gem as much as possible.
- choria-transport.md: User guide covering configuration, usage, and examples
- choria-transport-dev.md: Developer guide for architecture, data flow, and patterns
- choria-transport-plan.md: Project plan with phased roadmap and progress tracking
- choria-transport-testing.md: Test environment setup for manual verification

Add CLI flags for all Choria transport options so they can be passed on
the command line. CLI flags use a choria- prefix for clarity (e.g.,
--choria-config-file, --choria-ssl-ca), while internal option keys
remain unprefixed so inventory files stay clean (e.g.,
choria: { config-file: /path }).

Rename choria-agent to task-agent since it only applies to task
execution. The CLI flag becomes --choria-task-agent.

New CLI flags:
  --choria-task-agent, --choria-config-file, --choria-ssl-ca,
  --choria-ssl-cert, --choria-ssl-key, --choria-collective,
  --choria-puppet-environment, --choria-rpc-timeout,
  --choria-task-timeout, --choria-command-timeout,
  --nats-servers, --nats-connection-timeout

The nats-* flags are not prefixed since they are already clearly
Choria-specific. Shared options (cleanup, tmpdir, host, interpreters)
are unchanged.
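
The prefix mapping itself is simple; a hypothetical sketch of how a CLI switch maps back to its internal option key (illustrative only, not the actual BoltOptionParser code):

```ruby
# Switches carrying the choria- prefix, per the list above.
CHORIA_PREFIXED = %w[
  choria-task-agent choria-config-file choria-ssl-ca choria-ssl-cert
  choria-ssl-key choria-collective choria-puppet-environment
  choria-rpc-timeout choria-task-timeout choria-command-timeout
].freeze

# Strip the prefix to recover the inventory/config key; nats-* and
# shared options pass through unchanged.
def internal_key(cli_switch)
  cli_switch.sub(/\Achoria-/, '')
end
```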

BoltOptionParser::OPTIONS[:choria] needs CLI switch names (e.g.,
choria-config-file), not internal keys (config-file), so that
remove_excluded_opts correctly includes them in --help output.
Also fix task-agent -> choria-task-agent in the task run flags list.

The 11 new Choria flags added to ACTION_OPTS increase the parameter
count for bolt apply, bolt command, and bolt file.
