Skip to content

Support Pydantic models on Extract#330

Merged
pirate merged 8 commits intomainfrom
pydantic-extract
Apr 2, 2026
Merged

Support Pydantic models on Extract#330
pirate merged 8 commits intomainfrom
pydantic-extract

Conversation

@miguelg719
Copy link
Copy Markdown
Collaborator

@miguelg719 miguelg719 commented Apr 1, 2026

Why

Users currently have to manually convert Pydantic models to JSON Schema (model_json_schema()) before extract calls and manually validate responses back (model_validate()). The v0 SDK handled this automatically.

What

Session.extract() and AsyncSession.extract() now accept a Pydantic BaseModel class as the schema parameter. The SDK auto-converts it to JSON Schema for the API call and validates the response back into a model instance, with a camelCase→snake_case fallback.

Tests

  • local script
Screenshot 2026-04-01 at 4 02 00 PM Screenshot 2026-04-01 at 4 05 10 PM

@miguelg719 miguelg719 changed the title Pydantic extract Support Pydantic models on Extract Apr 1, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Confidence score: 2/5

  • There is a high-confidence compatibility risk in src/stagehand/session.py: helpers imported from _pydantic_extract rely on Pydantic v2-only APIs (model_rebuild, model_json_schema, model_validate).
  • If this project/environment is still on Pydantic v1 as noted, extraction/schema-validation paths can fail at runtime, creating a likely regression for users.
  • Pay close attention to src/stagehand/session.py - version-mismatched Pydantic helper calls may break model schema generation and response validation.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/stagehand/session.py">

<violation number="1" location="src/stagehand/session.py:20">
P1: The helpers imported from `_pydantic_extract` (`pydantic_model_to_json_schema`, `validate_extract_response`) call Pydantic v2-only methods (`model_rebuild`, `model_json_schema`, `model_validate`), but the project supports `pydantic>=1.9.0`. Users on Pydantic v1 will get `AttributeError` at runtime. The existing `_compat.py` module already provides version-aware shims (`PYDANTIC_V1`, `model_validate`, etc.) that should be used or mirrored in `_pydantic_extract.py`.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant User
    participant Session as Session (SDK)
    participant PUtils as _pydantic_extract (Utils)
    participant API as Browserbase API

    User->>Session: extract(schema=BaseModel)
    
    Session->>PUtils: NEW: is_pydantic_model(schema)
    PUtils-->>Session: True

    Session->>PUtils: NEW: pydantic_model_to_json_schema()
    Note over PUtils: model_rebuild() to resolve forward refs
    PUtils-->>Session: json_schema_dict

    Session->>API: POST /sessions/{id}/extract (schema=json_schema)
    API-->>Session: SessionExtractResponse (raw result data)

    opt If schema was Pydantic Model
        Session->>PUtils: NEW: validate_extract_response(result, schema)
        
        alt Happy Path: Direct Validation
            PUtils->>PUtils: schema.model_validate(result)
        else Validation Fails (e.g. casing mismatch)
            PUtils->>PUtils: NEW: _convert_dict_keys_to_snake_case()
            Note right of PUtils: Recursive camelCase to snake_case
            PUtils->>PUtils: schema.model_validate(normalized_result)
        end
        
        PUtils-->>Session: Validated Model Instance (or raw data if failure)
    end

    Session-->>User: return validated result
Loading

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown
Member

@pirate pirate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this PR we should probably move all our custom code to a specific subfolder like src/_custom/{sea,pydantic,...}.py

Copy link
Copy Markdown
Member

@pirate pirate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple changes:

  • ⚠️ bump min pydantic version to >=2.0.0
  • ➕ add pydantic auto-loading of responses from agentExecute .output (can be separate PR)
  • ➕ make sure the pydantic model has model_config = ConfigDict(extra='allow') set before calling .model_validate() so that any extra fields in the JSON don't throw validation errors

Copy link
Copy Markdown
Member

@pirate pirate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to restructure this slightly so it follows our existing sessions_helpers.py monkey-patching approach instead of modifying codegen'd code directly.

* move custom code to dedicated files and cleanup v1 vs v2 checks

* hard-require pydantic v2 or greater

* lint
@pirate
Copy link
Copy Markdown
Member

pirate commented Apr 1, 2026

Ok pushed my changes, you can see just my changes to your PR here @miguelg719: #331

@miguelg719 miguelg719 requested a review from pirate April 2, 2026 00:08
@pirate pirate merged commit 380f5e0 into main Apr 2, 2026
8 checks passed
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/stagehand/resources/sessions_helpers.py">

<violation number="1" location="src/stagehand/resources/sessions_helpers.py:97">
P2: `_camel_to_snake` mishandles the boundary between an uppercase acronym and a following word (e.g., `"getHTTPResponse"` → `"get_httpresponse"` instead of `"get_http_response"`). Since this is the fallback used when direct validation fails, a wrong conversion silently drops the user back to raw data instead of a Pydantic instance.

The fix adds a lookahead check: also insert `_` before an uppercase char that follows another uppercase char when the *next* char is lowercase.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

def _camel_to_snake(name: str) -> str:
chars: list[str] = []
for i, ch in enumerate(name):
if ch.isupper() and i != 0 and not name[i - 1].isupper():
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: _camel_to_snake mishandles the boundary between an uppercase acronym and a following word (e.g., "getHTTPResponse""get_httpresponse" instead of "get_http_response"). Since this is the fallback used when direct validation fails, a wrong conversion silently drops the user back to raw data instead of a Pydantic instance.

The fix adds a lookahead check: also insert _ before an uppercase char that follows another uppercase char when the next char is lowercase.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/stagehand/resources/sessions_helpers.py, line 97:

<comment>`_camel_to_snake` mishandles the boundary between an uppercase acronym and a following word (e.g., `"getHTTPResponse"` → `"get_httpresponse"` instead of `"get_http_response"`). Since this is the fallback used when direct validation fails, a wrong conversion silently drops the user back to raw data instead of a Pydantic instance.

The fix adds a lookahead check: also insert `_` before an uppercase char that follows another uppercase char when the *next* char is lowercase.</comment>

<file context>
@@ -24,8 +29,189 @@
+def _camel_to_snake(name: str) -> str:
+    chars: list[str] = []
+    for i, ch in enumerate(name):
+        if ch.isupper() and i != 0 and not name[i - 1].isupper():
+            chars.append("_")
+        chars.append(ch.lower())
</file context>
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants