Skip to content

Conversation

@shivammittal274
Copy link
Contributor

No description provided.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Nov 18, 2025

Greptile Summary

  • Adds Google Gemini AI model integration to the BrowserOS agent system by implementing GeminiAgent class, formatter, and registering it in the factory
  • Updates SessionManager with new CANCELLING state and safety timeout mechanism to prevent race conditions during agent cancellation operations
  • Sets Gemini as the default agent type and configures file-based storage for macOS compatibility to avoid keychain prompts

Important Files Changed

Filename Overview
packages/agent/src/agent/GeminiAgent.ts New Gemini agent implementation with MCP server integration, dual config support, and streaming response handling
packages/agent/src/session/SessionManager.ts Added CANCELLING state and safety timeout for robust cancellation; changed default agent to gemini-sdk
packages/agent/src/agent/GeminiAgent.formatter.ts New formatter class that transforms Gemini events into standardized FormattedEvent format

Confidence score: 4/5

  • This PR is largely safe to merge with careful attention to the session state management changes
  • Score reflects solid architecture following established patterns but complexity in SessionManager cancellation logic requires review
  • Pay close attention to SessionManager.ts for potential race conditions in the new CANCELLING state implementation

Sequence Diagram

sequenceDiagram
    participant User
    participant WebSocketServer
    participant SessionManager
    participant GeminiAgent
    participant GeminiClient
    participant MCPServer
    participant ControllerBridge

    User->>WebSocketServer: "Connect WebSocket"
    WebSocketServer->>SessionManager: "createSession(agentConfig)"
    SessionManager->>GeminiAgent: "create agent"
    WebSocketServer-->>User: "connection event"

    User->>WebSocketServer: "send message"
    WebSocketServer->>SessionManager: "markProcessing()"
    WebSocketServer->>GeminiAgent: "execute(message)"
    
    GeminiAgent->>GeminiAgent: "init() - fetch config"
    GeminiAgent->>GeminiClient: "initialize with MCP server"
    GeminiAgent->>GeminiClient: "sendMessageStream()"
    
    loop "Multi-turn conversation"
        GeminiClient-->>GeminiAgent: "stream events"
        GeminiAgent->>GeminiEventFormatter: "format(event)"
        GeminiEventFormatter-->>WebSocketServer: "FormattedEvent"
        WebSocketServer-->>User: "formatted event"
        
        alt "Tool call required"
            GeminiAgent->>MCPServer: "executeToolCall()"
            MCPServer->>ControllerBridge: "browser automation"
            ControllerBridge-->>MCPServer: "tool result"
            MCPServer-->>GeminiAgent: "tool response"
            GeminiAgent->>GeminiClient: "continue with tool results"
        end
    end
    
    GeminiAgent-->>WebSocketServer: "completion event"
    WebSocketServer->>SessionManager: "markIdle()"
    WebSocketServer-->>User: "completion"
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile
React with 👍 or 👎 to share your feedback on this new summary format

shivammittal274 and others added 4 commits November 19, 2025 22:30
- Add VercelAIContentGenerator with multi-provider support (OpenAI, Anthropic, Google)
- Implement tool/message/response conversion strategies with full type safety
- Fix tool parameter extraction (parametersJsonSchema support)
- Fix OpenAI schema validation (recursive normalization for nested objects)
- Fix text streaming (textDelta property name correction)
- Fix usage metadata (promptTokens/completionTokens property names)
- Add comprehensive test suite (streaming, non-streaming, error handling)
- Complete E2E verification with browser tools (successful)
- Enable strict mode for OpenAI token usage tracking
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Copy link
Contributor

@felarof99 felarof99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss in the review section itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This event formatter would only be required temporarily until we continue using our old UI, right?

For new UI, we don't need this event formatter, is that right? @shivammittal274

this.useVercelAI = true;
this.vercelProvider = 'openai';
this.config.apiKey = openaiKey;
this.config.modelName = 'gpt-4o';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set the default model as gpt-4.1. @shivammittal274

this.useVercelAI = true;
this.vercelProvider = 'google';
this.config.apiKey = googleKey;
this.config.modelName = 'gemini-2.0-flash-exp';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bad default model. Let's actually use constants at the top of file or create environment variables for these defaults as well.

This by default should be gemini-2.5-flash

excludeTools: ['run_shell_command', 'write_file', 'replace'],
compressionThreshold: 1000000, // Disable aggressive compression (1M tokens threshold)
mcpServers: {
'browseros-mcp': new MCPServerConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivammittal274 Let's use named args here! If there is any upstream change, will be passing args in wrong order and would be a pain to debug.

logger.debug('✅ VercelAIContentGenerator instance created');

// Type assertion needed as contentGenerator is private in Config
(this.geminiConfig as any).contentGenerator = contentGenerator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai why are we typecasting this.geminiConfig as any here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type assertion (this.geminiConfig as any) is being used here because the contentGenerator property is private in the GeminiConfig class from the @google/gemini-cli-core package.

Looking at the code, this is happening in the Vercel AI integration path where you're trying to inject a custom VercelAIContentGenerator to override the default content generation behavior. Since the contentGenerator property isn't part of the public API of GeminiConfig, TypeScript won't allow you to access it directly.

This is a workaround to bypass TypeScript's type checking and directly set the private property. However, this approach has some risks:

  1. Brittle: If the internal structure of GeminiConfig changes, this could break
  2. Type safety: You lose compile-time type checking for this operation
  3. Maintenance: It's not clear from the public API that this is supported

A few potential alternatives to consider:

  1. Check if there's a public setter method in GeminiConfig for the content generator
  2. Use a factory pattern where you create the GeminiConfig with the custom generator from the start
  3. Extend GeminiConfig to create your own subclass that properly exposes this functionality
  4. Submit a PR to @google/gemini-cli-core to add a public method for setting custom content generators

The type assertion works for now, but it would be better to find a more type-safe approach if possible.

this.promptId = `prompt-${Date.now()}`;

logger.info('✅ GeminiClient initialized', {
mode: this.useVercelAI ? 'Vercel AI' : 'Native Gemini',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivammittal274 in a follow-up CL, let's add integration test with useVercelAI knob set to true and false, making sure both modes are working as of now. And in future if something breaks, we can then use that test to debug both.


this.updateEventTime();

const formatted = formatter.format(event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivammittal274 I think we should delete GeminiAgentFormatter and delete all these pieces of code. We are not launching Gemini Agent with old UI, we will only launch it with the new UI. So, let's not have unnecessary code which will be hard to clean up later.

request: GenerateContentParameters,
_userPromptId: string,
): Promise<GenerateContentResponse> {
console.log('[VercelAI] generateContent called');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivammittal274 let's not use any console.log. Let's consistently use logger we have, so that we can easily disable logs in production.

Logger we have -- /Users/felarof01/Workspaces/build/browseros-server/packages/common/src/logger.ts

*/
geminiToVercel(contents: readonly Content[]): CoreMessage[] {
const messages: CoreMessage[] = [];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use logger

} from '../types.js';
import type { ToolConversionStrategy } from './tool.js';

export class ResponseConversionStrategy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**

@shivammittal274 would it be possible to add a unit tests verifiying vercelToGemini(geminiToVercel(x)) == x for each of the strategies? That'll be the fullproof way of testing we are translating correctly I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you have such a test, you can even tell claude to keep fixing errors until the above test passes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants