@shivammittal274
Contributor

No description provided.

@greptile-apps
Contributor

greptile-apps bot commented Nov 28, 2025

Greptile Overview

Greptile Summary

This PR integrates BAML (Boundary ML) structured output extraction using the Vercel AI SDK. The implementation adds the ability to extract structured data from LLM responses by converting JSON schemas to BAML types, rendering prompts with BAML's template system, and parsing responses using BAML's SAP (Schema-Aligned Parsing).

Key Changes:

  • New /extract endpoint for standalone structured extraction
  • responseSchema parameter added to chat endpoint for inline extraction
  • BAML schema converter translates JSON Schema to BAML class definitions
  • Integration with existing Vercel AI SDK multi-provider architecture
  • Extraction runs after agent execution completes, using the last 4 model responses as context
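
The last bullet can be illustrated with a minimal sketch. The PR's real `buildExtractionContext` is not shown in this conversation, so the `HistoryEntry` shape, the `"model"` role label, and the blank-line separator below are all assumptions made for illustration.

```typescript
// Hypothetical history shape; the agent's actual message type is not shown in the PR summary.
interface HistoryEntry {
  role: "user" | "model";
  text: string;
}

// Join the text of the last `maxResponses` model entries, oldest first.
function buildExtractionContext(history: HistoryEntry[], maxResponses: number): string {
  // Guard: slice(-0) would return every entry instead of none.
  if (maxResponses <= 0) return "";
  return history
    .filter((entry) => entry.role === "model")
    .slice(-maxResponses)
    .map((entry) => entry.text)
    .join("\n\n");
}
```

Limiting the context to a fixed number of recent responses keeps the extraction prompt small, at the cost of missing data that only appeared earlier in the run.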

Issues Found:

  • Critical: Nested objects with properties are incorrectly converted to map<string, string> in schemaConverter.ts:48, which loses the nested schema structure entirely
  • The SSE format for structured output uses a custom d: prefix; client compatibility with the Vercel AI SDK data stream format should be verified
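
One way to check the second issue on the client side is sketched below. It assumes each streamed line has the form `<prefix>:<JSON>`, which this summary does not confirm, and `parseStreamLine` and `findUnknownPrefixes` are hypothetical helper names, not part of the PR or of any SDK.

```typescript
interface StreamPart {
  prefix: string;
  payload: unknown;
}

// Split one `<prefix>:<json>` line into its prefix and parsed JSON payload.
// Returns null when the line has no prefix or the payload is not valid JSON.
function parseStreamLine(line: string): StreamPart | null {
  const separator = line.indexOf(":");
  if (separator <= 0) return null;
  try {
    return {
      prefix: line.slice(0, separator),
      payload: JSON.parse(line.slice(separator + 1)),
    };
  } catch {
    return null;
  }
}

// Collect the prefixes in a captured stream that the client does not handle.
// `known` is supplied by the caller; no particular prefix set is assumed here.
function findUnknownPrefixes(lines: string[], known: Set<string>): string[] {
  const unknown = new Set<string>();
  for (const line of lines) {
    const part = parseStreamLine(line);
    if (part && !known.has(part.prefix)) unknown.add(part.prefix);
  }
  return Array.from(unknown);
}
```

Running a check like this against a captured response would show whether the structured output event uses a prefix the client already understands.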

Confidence Score: 3/5

  • This PR has a critical logic bug affecting nested object handling but is otherwise well-structured
  • Score reflects one critical issue where nested objects lose their schema structure (converted to map<string, string>), which will cause incorrect extraction results for any schemas with nested object properties. The rest of the implementation is solid with good error handling, proper separation of concerns, and clean integration with existing systems. The SSE format concern is minor and may be intentional.
  • packages/agent/src/baml/schemaConverter.ts requires immediate attention to fix nested object handling

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| packages/agent/src/baml/schemaConverter.ts | 2/5 | Adds JSON Schema to BAML converter; nested objects incorrectly converted to map<string, string>, losing structure |
| packages/agent/src/baml/extractor.ts | 4/5 | Implements BAML extraction using Vercel AI SDK; well-structured with good error handling and lazy initialization |
| packages/agent/src/agent/GeminiAgent.ts | 3/5 | Adds structured output extraction to agent execute flow; SSE format may need verification for client compatibility |
| packages/agent/src/http/HttpServer.ts | 4/5 | Adds /extract endpoint and responseSchema support to chat; clean integration with existing error handling |

Sequence Diagram

sequenceDiagram
    participant Client
    participant HttpServer
    participant GeminiAgent
    participant BAMLExtractor
    participant VercelAI as Vercel AI SDK
    participant LLM

    Note over Client,LLM: Chat with Structured Output Flow

    Client->>HttpServer: POST /chat<br/>{message, responseSchema}
    HttpServer->>GeminiAgent: execute(message, honoStream, signal, responseSchema)
    
    loop Agent Execution (until no tool calls)
        GeminiAgent->>VercelAI: generateContentStream()
        VercelAI->>LLM: Stream request
        LLM-->>VercelAI: Stream response
        VercelAI-->>GeminiAgent: Response chunks
        GeminiAgent->>HttpServer: Write SSE chunks
        HttpServer-->>Client: Stream agent response
        
        opt Tool calls detected
            GeminiAgent->>GeminiAgent: executeToolCall()
        end
    end
    
    Note over GeminiAgent,BAMLExtractor: Structured Output Extraction

    alt responseSchema provided
        GeminiAgent->>GeminiAgent: buildExtractionContext(history, 4)
        GeminiAgent->>BAMLExtractor: extract(query, context, schema, config)
        BAMLExtractor->>BAMLExtractor: jsonSchemaToBAML(schema)
        BAMLExtractor->>BAMLExtractor: b.request.Extract()<br/>(render BAML prompt)
        BAMLExtractor->>VercelAI: generateTextFromPrompt(prompt)
        VercelAI->>LLM: Generate structured data
        LLM-->>VercelAI: Response text
        VercelAI-->>BAMLExtractor: LLM response
        BAMLExtractor->>BAMLExtractor: b.parse.Extract()<br/>(SAP parsing)
        BAMLExtractor-->>GeminiAgent: Extracted data
        GeminiAgent->>HttpServer: Write SSE (structured-output)
        HttpServer-->>Client: Structured output event
    end

    Note over Client,LLM: Standalone Extract Endpoint

    Client->>HttpServer: POST /extract<br/>{query, content, schema}
    HttpServer->>BAMLExtractor: extract(query, content, schema, config)
    BAMLExtractor->>BAMLExtractor: jsonSchemaToBAML(schema)
    BAMLExtractor->>BAMLExtractor: b.request.Extract()
    BAMLExtractor->>VercelAI: generateTextFromPrompt(prompt)
    VercelAI->>LLM: Generate structured data
    LLM-->>VercelAI: Response text
    VercelAI-->>BAMLExtractor: LLM response
    BAMLExtractor->>BAMLExtractor: b.parse.Extract()
    BAMLExtractor-->>HttpServer: Extracted data
    HttpServer-->>Client: JSON response
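
The four extraction steps in the diagram (convert schema, render prompt, call the model, parse the response) can be sketched as a single function. This is only an outline under stated assumptions: the `ExtractDeps` interface and its member names are hypothetical stand-ins for `jsonSchemaToBAML`, `b.request.Extract()`, `generateTextFromPrompt`, and `b.parse.Extract()`, whose real signatures are not shown here.

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical dependency bundle; each member stands in for one step of the diagram.
interface ExtractDeps {
  toBaml: (schema: JsonSchema) => string;
  renderPrompt: (query: string, content: string, bamlTypes: string) => string;
  generateText: (prompt: string) => Promise<string>;
  parse: (raw: string, bamlTypes: string) => unknown;
}

// Run the steps in the order the sequence diagram shows them.
async function extract(
  query: string,
  content: string,
  schema: JsonSchema,
  deps: ExtractDeps,
): Promise<unknown> {
  const bamlTypes = deps.toBaml(schema); // JSON Schema -> BAML type definitions
  const prompt = deps.renderPrompt(query, content, bamlTypes); // render the prompt
  const raw = await deps.generateText(prompt); // LLM call through the provider layer
  return deps.parse(raw, bamlTypes); // schema-aligned parsing of the raw text
}
```

Passing the steps in as dependencies keeps the sketch runnable without a model and makes the ordering easy to test with stubs.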

@greptile-apps greptile-apps bot left a comment


14 files reviewed, 1 comment


Comment on lines 47 to 48
case 'object':
return `map<string, string>${nullSuffix}`;

logic: nested object properties are lost - converts to map<string, string> instead of recursively generating BAML classes

Suggested change
- case 'object':
-   return `map<string, string>${nullSuffix}`;
+ case 'object':
+   if (schema.properties && Object.keys(schema.properties).length > 0) {
+     throw new Error('Nested objects with properties not yet supported. Consider flattening schema or handling at top level.');
+   }
+   return `map<string, string>${nullSuffix}`;
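
The comment mentions recursively generating BAML classes as the alternative to throwing. A minimal sketch of that approach follows. It is not the code that was eventually merged, which is not shown in this conversation. The `JsonSchema` subset, the `Extracted` root name, the parent-plus-field naming scheme for nested classes, and the fallback to `string` for unknown or missing types are all assumptions.

```typescript
// Assumed subset of JSON Schema; real schemas carry more keywords.
interface JsonSchema {
  type?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
}

function toPascalCase(name: string): string {
  return name
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
}

// Convert an object schema into BAML class definitions, emitting one class per nested object.
function jsonSchemaToBaml(schema: JsonSchema, rootName = "Extracted"): string {
  const classes: string[] = [];

  function typeOf(s: JsonSchema, hint: string): string {
    switch (s.type) {
      case "string": return "string";
      case "integer": return "int";
      case "number": return "float";
      case "boolean": return "bool";
      case "array":
        // Assumption: arrays without `items` are treated as string arrays.
        return `${typeOf(s.items ?? { type: "string" }, hint)}[]`;
      case "object":
        // Objects with declared properties become their own class instead of a map.
        if (s.properties && Object.keys(s.properties).length > 0) {
          return emitClass(s, hint);
        }
        return "map<string, string>";
      default:
        return "string"; // Assumption: unknown types fall back to string.
    }
  }

  function emitClass(s: JsonSchema, name: string): string {
    const props = s.properties ?? {};
    const required = new Set(s.required ?? []);
    const fields = Object.keys(props).map((key) => {
      const fieldType = typeOf(props[key], name + toPascalCase(key));
      return `  ${key} ${fieldType}${required.has(key) ? "" : "?"}`;
    });
    // Nested classes are pushed first, so each class appears after the classes it uses.
    classes.push(`class ${name} {\n${fields.join("\n")}\n}`);
    return name;
  }

  emitClass(schema, rootName);
  return classes.join("\n\n");
}
```

The sketch does not guard against two different paths producing the same class name, so a real converter would need a uniqueness check.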

Contributor Author


Support added for nested objects
