
feat: LLM-powered model building tool plugboard go #224

Draft
toby-coleman wants to merge 13 commits into main from feat/plugboard-go-app
Conversation

@toby-coleman
Contributor

Summary

Currently a work in progress. plugboard go opens an interactive chat tool for building Plugboard models.

Changes

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature: an interactive, AI-powered model building experience for Plugboard. The new plugboard go command provides a conversational interface where users can describe their desired models, and the integrated Copilot agent will assist in designing, implementing, and running them. This enhances the user experience by making model creation more intuitive and guided, leveraging large language models to streamline the development workflow.

Highlights

  • New LLM-powered Interactive Model Builder: Introduced a new plugboard go CLI command that launches an interactive chat-based Textual User Interface (TUI) for building Plugboard models, powered by GitHub Copilot.
  • Copilot Agent Integration: Implemented a PlugboardAgent to manage the Copilot client and session, enabling real-time interaction, streaming responses, and tool execution within the TUI.
  • New Agent Definitions and Tools: Added new agent definitions (examples.agent.md, researcher.agent.md) and Copilot tools (run_plugboard_model, get_mermaid_diagram_url) to facilitate model creation, execution, and visualization directly from the interactive environment.
  • Dynamic Model Selection: Users can now dynamically select and switch between different LLM models (e.g., GPT-5 mini, Claude-Sonnet-4) within the interactive TUI.
  • Dependency Management and Theming: Updated project dependencies to include github-copilot-sdk and textual as optional go extras, and introduced a shared theme for consistent UI styling.
Changelog
  • .github/agents/docs.agent.md
    • Updated with new model configurations for GPT-5 mini and GPT-4.1.
  • .github/agents/examples.agent.md
    • Added a new agent definition for developing example Plugboard models, including guidelines for tutorials and demo Jupyter notebooks.
  • .github/agents/lint.agent.md
    • Updated with new model configurations for GPT-5 mini and GPT-4.1.
  • .github/agents/researcher.agent.md
    • Added a new agent definition for researching specific topics to inform model development.
  • examples/AGENTS.md
    • Refactored content to be more general for 'Plugboard Models' and removed specific Jupyter Notebook guidelines, which were moved to the new examples agent definition.
  • plugboard/cli/__init__.py
    • Integrated the new 'go' sub-command into the main Plugboard CLI application.
  • plugboard/cli/go/AGENTS.md
    • Added a new file, referencing the main examples/AGENTS.md for agent instructions.
  • plugboard/cli/go/__init__.py
    • Added the entry point for the plugboard go CLI command, including dependency checks and TUI application launch logic.
  • plugboard/cli/go/agent.py
    • Added the PlugboardAgent class to manage Copilot client interactions, session events, and model resolution for the TUI.
  • plugboard/cli/go/app.py
    • Added the PlugboardGoApp Textual TUI application, implementing the interactive chat interface, model selection, and display of file trees and Mermaid diagrams.
  • plugboard/cli/go/tools.py
    • Added Copilot tool definitions for running Plugboard models from YAML and generating Mermaid diagram URLs.
  • plugboard/utils/theme.py
    • Added a new module defining shared color constants for Plugboard user interfaces.
  • pyproject.toml
    • Added github-copilot-sdk and textual as optional dependencies under the new go extra.
    • Configured plugboard/cli/go/AGENTS.md to be included in the wheel build artifacts.
    • Updated the provides-extras list to include the new go extra.
  • tests/unit/test_cli_go.py
    • Added comprehensive unit tests for the plugboard go CLI entry point, system prompt loading, PlugboardAgent functionality, and PlugboardGoApp widgets and message handling.
  • uv.lock
    • Updated dependency lock file to include github-copilot-sdk, linkify-it-py, textual, and uc-micro-py.
Activity
  • The pull request is currently a work-in-progress, with the author toby-coleman actively developing the plugboard go interactive chat tool.
  • Initial implementation of the LLM-powered model building tool has been pushed.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Mar 8, 2026

Benchmark comparison for 64adfa9a (base) vs fd7b9be8 (PR)


------------------------------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------------------------------
Name (time in ms)                                                                         Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_process_run (pr/.benchmarks/Linux-CPython-3.12-64bit/0001_pr)         443.0566 (1.0)      451.6036 (1.0)      447.3059 (1.0)      3.3064 (1.0)      446.3391 (1.0)      4.7014 (1.0)           2;0  2.2356 (1.0)           5           1
test_benchmark_process_run (main/.benchmarks/Linux-CPython-3.12-64bit/0001_base)     446.8037 (1.01)     455.1559 (1.01)     449.6616 (1.01)     3.4286 (1.04)     449.0458 (1.01)     4.7388 (1.01)          1;0  2.2239 (0.99)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

@gemini-code-assist bot left a comment
Code Review

This pull request introduces plugboard go, an interactive LLM-powered model building tool, featuring a new CLI command, a Textual-based TUI, and integration with the Copilot SDK. While innovative, it contains critical security vulnerabilities related to insecure LLM tool usage and potential Remote Code Execution (RCE). The agent automatically approves all tool calls, and the tools allow for arbitrary file access and code execution via malicious Plugboard models. Addressing these requires implementing explicit user confirmation for tool execution and strictly sanitizing tool inputs. Furthermore, a critical bug in how the agent's system prompt is loaded will likely cause the feature to fail when installed as a package. Several unit tests for the new TUI are outdated and failing, and error handling could be improved by using more specific exceptions instead of broad except Exception blocks. Addressing these points will significantly improve the robustness, maintainability, and security of this new feature.

    system_message={
        "content": system_prompt,
    },
    on_permission_request=PermissionHandler.approve_all,
Severity: critical (security)

The use of PermissionHandler.approve_all automatically grants the LLM permission to execute any registered tool without user intervention. When combined with tools that can execute code or access the file system (like run_plugboard_model), this creates a significant security risk. An attacker could use prompt injection to trick the LLM into executing malicious code on the user's machine. It is highly recommended to implement a permission handler that requires explicit user confirmation for sensitive tool executions.
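A minimal sketch of such a confirmation handler, assuming a callback that receives a request describing the tool and returns a boolean; the request shape and callback signature are assumptions, not the actual github-copilot-sdk API, and a TUI would use a modal dialog rather than input():

```python
# Hypothetical confirmation-based permission handler. The request dict shape
# and the callback contract are assumptions about the SDK, for illustration.
SENSITIVE_TOOLS = {"run_plugboard_model"}

def confirm_permission(request: dict) -> bool:
    """Auto-approve harmless tools; require explicit consent for the rest."""
    tool_name = request.get("tool_name", "")
    if tool_name not in SENSITIVE_TOOLS:
        return True  # e.g. get_mermaid_diagram_url only builds a URL
    answer = input(f"Allow the agent to run tool {tool_name!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

The decision logic is the point: nothing that can execute code or touch the file system runs without the user saying yes.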

Comment on lines +56 to +57
async def run_plugboard_model(params: RunModelParams) -> str:
    yaml_path = Path(params.yaml_path).resolve()
Severity: critical (security)

The run_plugboard_model tool takes a yaml_path directly from the LLM and resolves it without verifying that it resides within a safe or expected directory. This allows for arbitrary file read (of YAML files) and, more critically, Remote Code Execution (RCE) because the tool subsequently builds and runs a Plugboard model from the specified file. Since ProcessBuilder.build uses pydoc.locate and the tool adds the file's directory to sys.path, an attacker can provide a malicious Python file alongside a YAML file to execute arbitrary code. You should restrict the yaml_path to a designated safe directory and ensure the resolved path is contained within it.
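One way to sketch that containment check; the choice of base directory (e.g. the current workspace) and the YAML-suffix restriction are assumptions about how the tool should be hardened, and Path.is_relative_to requires Python 3.9+:

```python
from pathlib import Path

def resolve_model_path(base_dir: Path, yaml_path: str) -> Path:
    """Resolve yaml_path and reject anything outside base_dir.

    Illustrative only: the 'safe' base directory is an assumption.
    """
    base = base_dir.resolve()
    # Resolve relative to the base so symlinks and ".." cannot escape it.
    candidate = (base / yaml_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"{yaml_path!r} escapes the workspace {base}")
    if candidate.suffix not in {".yaml", ".yml"}:
        raise ValueError("only YAML model files may be run")
    return candidate
```

Restricting the suffix alone is not sufficient (the YAML can still reference Python classes), so the directory containment check matters most: combined with not adding untrusted directories to sys.path, it narrows what a prompt-injected path can reach.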

@@ -0,0 +1 @@
../../../examples/AGENTS.md
\ No newline at end of file
Severity: high

The content of this file is a relative path, but the code in plugboard/cli/go/agent.py reads this file's content directly to use as a system prompt. This will result in the prompt being the literal string ../../../examples/AGENTS.md, which is not the intended behavior and will cause the go command to function incorrectly when the package is installed. To fix this, you should replace the path with the actual content of examples/AGENTS.md.
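An alternative to inlining the file's content is to read the instructions from the installed package with importlib.resources; a hedged sketch, where the package and resource names follow the PR's file layout and are assumptions:

```python
from importlib import resources

def load_agent_instructions(package: str, resource: str = "AGENTS.md") -> str:
    """Read a text resource bundled inside an installed package.

    For this PR the call would presumably be
    load_agent_instructions("plugboard.cli.go"), which works from an
    installed wheel as well as a source checkout, unlike a relative path
    that only resolves inside the repository.
    """
    return resources.files(package).joinpath(resource).read_text(encoding="utf-8")
```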

Comment on lines +127 to +128
except Exception:
    return requested_model
Severity: medium

Catching a broad Exception can hide unexpected errors. It would be better to catch a more specific exception if the copilot SDK provides one for this case. If not, consider logging the exception at a DEBUG or WARNING level to aid in debugging potential issues with model resolution.

Comment on lines +176 to +177
except Exception:
    return ["gpt-4o", "gpt-5", "claude-sonnet-4", "claude-sonnet-4-thinking", "o3"]
Severity: medium

Similar to the _resolve_model method, catching a broad Exception here is not ideal as it can hide the root cause of failures. It would be better to catch a more specific exception. Also, consider logging the exception to provide more context when the fallback list of models is returned. This will make it easier to diagnose connection issues with the Copilot service.
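The pattern the review suggests, sketched with placeholder exception types; the exception classes actually raised by github-copilot-sdk are an assumption, so substitute whatever the SDK documents:

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_MODELS = [
    "gpt-4o", "gpt-5", "claude-sonnet-4", "claude-sonnet-4-thinking", "o3",
]

def list_available_models(client) -> list[str]:
    """Ask the Copilot client for models, logging (not hiding) failures."""
    try:
        return list(client.list_models())
    # Narrow in practice to the SDK's own error types; these are stand-ins.
    except (ConnectionError, TimeoutError) as exc:
        logger.warning("Falling back to default model list: %s", exc)
        return FALLBACK_MODELS
```

Logging at WARNING (or DEBUG with the traceback) preserves the graceful fallback while leaving a trail when the Copilot service is unreachable.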

Comment on lines +466 to +473
except Exception as e:
    self.post_message(
        AgentStatus(
            f"Failed to connect to Copilot: {e}"
            "\n\nMake sure the GitHub Copilot CLI "
            "is installed and you are authenticated.",
        ),
    )
Severity: medium

Catching a broad Exception here can hide the root cause of connection failures. It would be more robust to catch more specific exceptions if the copilot SDK provides them. If not, logging the full traceback at a DEBUG level would be helpful for debugging. This pattern of catching broad exceptions is repeated in _send_to_agent and _change_model and should be addressed there as well.

Comment on lines +47 to +55
    def test_go_default_model_option(self) -> None:
        """The --model flag should default to gpt-4o."""
        with patch("plugboard.cli.go.app.PlugboardGoApp") as mock_app_cls:
            mock_app = MagicMock()
            mock_app_cls.return_value = mock_app
            result = runner.invoke(app, ["go"])
            assert result.exit_code == 0
            mock_app_cls.assert_called_once_with(model_name="gpt-4o")
            mock_app.run.assert_called_once()
Severity: medium

This test asserts that the default model is gpt-4o. However, the default model for the plugboard go command is defined as gpt-5-mini in plugboard/cli/go/__init__.py. The test should be updated to assert the correct default value.

Suggested change
-    def test_go_default_model_option(self) -> None:
-        """The --model flag should default to gpt-4o."""
-        with patch("plugboard.cli.go.app.PlugboardGoApp") as mock_app_cls:
-            mock_app = MagicMock()
-            mock_app_cls.return_value = mock_app
-            result = runner.invoke(app, ["go"])
-            assert result.exit_code == 0
-            mock_app_cls.assert_called_once_with(model_name="gpt-4o")
-            mock_app.run.assert_called_once()
+    def test_go_default_model_option(self) -> None:
+        """The --model flag should default to gpt-5-mini."""
+        with patch("plugboard.cli.go.app.PlugboardGoApp") as mock_app_cls:
+            mock_app = MagicMock()
+            mock_app_cls.return_value = mock_app
+            result = runner.invoke(app, ["go"])
+            assert result.exit_code == 0
+            mock_app_cls.assert_called_once_with(model_name="gpt-5-mini")
+            mock_app.run.assert_called_once()

Comment on lines +279 to +285
    def test_model_selector_default(self) -> None:
        """ModelSelector default model should be gpt-4o."""
        from plugboard.cli.go.app import ModelSelector

        selector = ModelSelector()
        assert selector.model_name == "gpt-4o"

Severity: medium

The widget ModelSelector does not exist in plugboard.cli.go.app. It seems to have been replaced or renamed, possibly to ModelSelectionOverlay. This test is outdated and should be updated or removed.

Comment on lines +371 to +376
assert app.query_one("#model-selector") is not None
assert app.query_one("#mermaid-link") is not None
assert app.query_one("#file-tree") is not None
assert app.query_one("#model-overlay") is not None
assert app.query_one("#shortcut-hint") is not None
assert app.query_one("#title-banner") is not None
Severity: medium

Some of the widget IDs being queried in this test do not exist in plugboard.cli.go.app.py. Specifically, #model-selector, #shortcut-hint, and #title-banner seem to have been renamed or removed. Please update the test to use the correct widget IDs, for example #header-banner instead of #title-banner, and remove queries for non-existent widgets.

messages = list(chat_scroll.query(ChatMessage))

assert messages[-1].role == "user"
assert "First user line\nSecond user line" in messages[-1]._content
Severity: medium

The _append_to_last_message method in app.py joins messages with \n\n. This test asserts that the content contains First user line\nSecond user line, which will fail because of the single newline. The assertion should check for content with two newlines between the messages.
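The joining behavior at issue, as a minimal stand-in (the real _append_to_last_message implementation is not shown in this review, so the helper below is illustrative):

```python
def append_to_last_message(existing: str, new_part: str) -> str:
    """Join chat chunks with a blank line, as app.py reportedly does."""
    return f"{existing}\n\n{new_part}" if existing else new_part
```

Because the separator is a blank line ("\n\n"), a substring check for a single "\n" between the two parts can never match, which is why the assertion fails.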

Suggested change
-assert "First user line\nSecond user line" in messages[-1]._content
+assert "First user line\n\nSecond user line" in messages[-1]._content
