feat: LLM-powered model building tool (`plugboard go`) #224

toby-coleman wants to merge 13 commits into `main`.

Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces a significant new feature, an interactive, AI-powered model building experience for Plugboard.
Code Review
This pull request introduces `plugboard go`, an interactive LLM-powered model building tool, featuring a new CLI command, a Textual-based TUI, and integration with the Copilot SDK. While innovative, it contains critical security vulnerabilities: the agent automatically approves all tool calls, and the tools allow arbitrary file access and code execution via malicious Plugboard models, opening the door to Remote Code Execution (RCE). Fixing this requires explicit user confirmation for tool execution and strict sanitization of tool inputs. In addition, a critical bug in how the agent's system prompt is loaded will likely cause the feature to fail when installed as a package, several unit tests for the new TUI are outdated and failing, and error handling could be improved by replacing broad `except Exception` blocks with more specific exceptions. Addressing these points will significantly improve the robustness, maintainability, and security of this new feature.
```python
system_message={
    "content": system_prompt,
},
on_permission_request=PermissionHandler.approve_all,
```
The use of `PermissionHandler.approve_all` automatically grants the LLM permission to execute any registered tool without user intervention. When combined with tools that can execute code or access the file system (like `run_plugboard_model`), this creates a significant security risk: an attacker could use prompt injection to trick the LLM into executing malicious code on the user's machine. It is highly recommended to implement a permission handler that requires explicit user confirmation for sensitive tool executions.
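One way to address this, sketched below: auto-approve a small allow-list of read-only tools and prompt the user for everything else. The handler name, the `SAFE_TOOLS` contents, and the `(tool_name, arguments)` signature are illustrative assumptions, not the actual Copilot SDK callback API, which may differ.

```python
# Hypothetical permission handler: auto-approve known read-only tools,
# ask the user before running anything with side effects.
SAFE_TOOLS = {"read_plugboard_docs"}  # illustrative allow-list

def confirm_tool_call(tool_name: str, arguments: dict) -> bool:
    """Return True only for safe tools or explicit user consent."""
    if tool_name in SAFE_TOOLS:
        return True
    answer = input(f"Allow tool '{tool_name}' with args {arguments}? [y/N] ")
    return answer.strip().lower() == "y"
```

A handler like this would then be passed as `on_permission_request=confirm_tool_call` instead of `PermissionHandler.approve_all`, keeping the human in the loop for code-executing tools.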
```python
async def run_plugboard_model(params: RunModelParams) -> str:
    yaml_path = Path(params.yaml_path).resolve()
```
The `run_plugboard_model` tool takes a `yaml_path` directly from the LLM and resolves it without verifying that it resides within a safe or expected directory. This allows for arbitrary file read (of YAML files) and, more critically, Remote Code Execution (RCE), because the tool subsequently builds and runs a Plugboard model from the specified file. Since `ProcessBuilder.build` uses `pydoc.locate` and the tool adds the file's directory to `sys.path`, an attacker can provide a malicious Python file alongside a YAML file to execute arbitrary code. You should restrict `yaml_path` to a designated safe directory and ensure the resolved path is contained within it.
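A minimal containment check using only the standard library; `resolve_within` and the base-directory convention are hypothetical names for illustration, not part of the current code:

```python
from pathlib import Path

def resolve_within(base_dir: Path, requested: str) -> Path:
    """Resolve `requested` relative to `base_dir`, rejecting escapes.

    Resolving both paths first also defeats `..` segments and symlinks.
    """
    base = base_dir.resolve()
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"Path {requested!r} escapes {base}")
    return candidate
```

The tool would then call something like `resolve_within(workspace_dir, params.yaml_path)` before building the model, so a prompt-injected `../../..` path raises instead of being executed.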
```diff
@@ -0,0 +1 @@
+../../../examples/AGENTS.md
\ No newline at end of file
```
The content of this file is a relative path, but the code in `plugboard/cli/go/agent.py` reads the file's content directly to use as a system prompt. The prompt will therefore be the literal string `../../../examples/AGENTS.md`, which is not the intended behavior and will cause the `go` command to malfunction when the package is installed. To fix this, replace the path with the actual content of `examples/AGENTS.md`.
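Rather than relying on a path relative to the source tree, the prompt file can be bundled as package data and read with `importlib.resources`, which works even when the package is installed as a wheel or zip. The helper below is a sketch; the package and filename arguments it would be called with (e.g. `"plugboard.cli.go"`, `"AGENTS.md"`) are assumptions about how the data could be laid out:

```python
from importlib import resources

def load_packaged_text(package: str, filename: str) -> str:
    """Read a text resource bundled inside `package`.

    Works for installed packages (wheels, zips) as well as source checkouts.
    """
    return resources.files(package).joinpath(filename).read_text()
```

With the prompt shipped as package data, the agent would call something like `load_packaged_text("plugboard.cli.go", "AGENTS.md")` at startup instead of reading a relative path from disk.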
```python
except Exception:
    return requested_model
```
```python
except Exception:
    return ["gpt-4o", "gpt-5", "claude-sonnet-4", "claude-sonnet-4-thinking", "o3"]
```
Similar to the `_resolve_model` method, catching a broad `Exception` here can hide the root cause of failures; prefer a more specific exception. Also consider logging the exception to provide context when the fallback list of models is returned. This will make it easier to diagnose connection issues with the Copilot service.
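A sketch of the suggested pattern: narrow the except clause and log before falling back. Here `fetch` stands in for the Copilot SDK call, and the `(ConnectionError, TimeoutError)` tuple is a placeholder for whatever exception types the SDK actually raises.

```python
import logging

logger = logging.getLogger(__name__)

FALLBACK_MODELS = ["gpt-4o", "gpt-5", "claude-sonnet-4", "claude-sonnet-4-thinking", "o3"]

def list_models(fetch) -> list[str]:
    """Fetch the model list, logging a warning and falling back on failure."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError) as exc:  # substitute the SDK's types
        logger.warning("Could not list Copilot models, using fallback: %s", exc)
        return FALLBACK_MODELS
```

The warning preserves the failure reason in the logs, so a user seeing the fallback list can tell whether the Copilot service was unreachable or timed out.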
```python
except Exception as e:
    self.post_message(
        AgentStatus(
            f"Failed to connect to Copilot: {e}"
            "\n\nMake sure the GitHub Copilot CLI "
            "is installed and you are authenticated.",
        ),
    )
```
Catching a broad `Exception` here can hide the root cause of connection failures. It would be more robust to catch the specific exceptions the Copilot SDK provides, if any; otherwise, logging the full traceback at DEBUG level would help with debugging. This pattern of catching broad exceptions is repeated in `_send_to_agent` and `_change_model` and should be addressed there as well.
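For the connection path, the same idea plus a DEBUG-level traceback via `exc_info=True`. The `connect` callable and the `OSError` catch are illustrative stand-ins; the real code would catch the SDK's own exception types if it exposes them.

```python
import logging

logger = logging.getLogger(__name__)

def connect_with_diagnostics(connect) -> bool:
    """Attempt a connection; show a friendly error but keep the traceback."""
    try:
        connect()
        return True
    except OSError as exc:  # substitute the Copilot SDK's exception types
        logger.debug("Copilot connection failed", exc_info=True)  # full traceback
        logger.error("Failed to connect to Copilot: %s", exc)
        return False
```

The user-facing message stays short, while `logger.debug(..., exc_info=True)` records the complete stack trace for anyone debugging with verbose logging enabled.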
```python
def test_go_default_model_option(self) -> None:
    """The --model flag should default to gpt-4o."""
    with patch("plugboard.cli.go.app.PlugboardGoApp") as mock_app_cls:
        mock_app = MagicMock()
        mock_app_cls.return_value = mock_app
        result = runner.invoke(app, ["go"])
        assert result.exit_code == 0
        mock_app_cls.assert_called_once_with(model_name="gpt-4o")
        mock_app.run.assert_called_once()
```
This test asserts that the default model is `gpt-4o`. However, the default model for the `plugboard go` command is defined as `gpt-5-mini` in `plugboard/cli/go/__init__.py`. The test should be updated to assert the correct default value.
```diff
 def test_go_default_model_option(self) -> None:
-    """The --model flag should default to gpt-4o."""
+    """The --model flag should default to gpt-5-mini."""
     with patch("plugboard.cli.go.app.PlugboardGoApp") as mock_app_cls:
         mock_app = MagicMock()
         mock_app_cls.return_value = mock_app
         result = runner.invoke(app, ["go"])
         assert result.exit_code == 0
-        mock_app_cls.assert_called_once_with(model_name="gpt-4o")
+        mock_app_cls.assert_called_once_with(model_name="gpt-5-mini")
         mock_app.run.assert_called_once()
```
```python
def test_model_selector_default(self) -> None:
    """ModelSelector default model should be gpt-4o."""
    from plugboard.cli.go.app import ModelSelector

    selector = ModelSelector()
    assert selector.model_name == "gpt-4o"
```

```python
assert app.query_one("#model-selector") is not None
assert app.query_one("#mermaid-link") is not None
assert app.query_one("#file-tree") is not None
assert app.query_one("#model-overlay") is not None
assert app.query_one("#shortcut-hint") is not None
assert app.query_one("#title-banner") is not None
```
Some of the widget IDs being queried in this test do not exist in `plugboard/cli/go/app.py`. Specifically, `#model-selector`, `#shortcut-hint`, and `#title-banner` appear to have been renamed or removed. Please update the test to use the correct widget IDs (for example, `#header-banner` instead of `#title-banner`) and remove queries for non-existent widgets.
```python
messages = list(chat_scroll.query(ChatMessage))

assert messages[-1].role == "user"
assert "First user line\nSecond user line" in messages[-1]._content
```
The `_append_to_last_message` method in `app.py` joins messages with `\n\n`. This test asserts that the content contains `First user line\nSecond user line`, which will fail because of the single newline. The assertion should check for content with two newlines between the messages.
```diff
-assert "First user line\nSecond user line" in messages[-1]._content
+assert "First user line\n\nSecond user line" in messages[-1]._content
```
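The two-newline expectation follows directly from the joiner; a quick standalone check, independent of the TUI:

```python
# Joining with "\n\n" puts a blank line between appended messages,
# so the stored content contains a double newline, not a single one.
parts = ["First user line", "Second user line"]
joined = "\n\n".join(parts)
assert "First user line\n\nSecond user line" in joined
assert "First user line\nSecond user line" not in joined
```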
Summary
Currently work-in-progress.
`plugboard go` opens an interactive chat tool for building Plugboard models.

Changes