feat: add schema complexity analyzer command #165
Conversation
…xport Adds `openapi spec analyze` command that examines schema references to identify cycles, compute complexity metrics, assess codegen difficulty tiers, and generate refactoring suggestions. Includes interactive TUI with 4 tabs (Summary, Schemas, Cycles, Graph), navigable graph views (DAG overview, SCC gallery, ego graph), bordered schema detail cards, and multiple output formats (tui, json, text, dot). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address 10-point UX review: auto-fallback to text for non-TTY, wire up schema sorting (s key), deterministic tie-breaking, complexity score breakdown, signal descriptions in JSON/text, Suggestions tab, mermaid export format, and visual polish (emoji removal, footer contrast, dead style cleanup). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📊 Test Coverage Report — Coverage change: ✅ No change. Generated by GitHub Actions.
Codecov Report — ✅ All modified and coverable lines are covered by tests.
TristanSpeakEasy left a comment
Nice feature — clean pipeline architecture and comprehensive output formats. A few issues flagged inline.
r := &Report{}

// Document metadata
if doc.Info.Title != "" {
Bug: nil dereference if doc.Info is nil
doc.Info.Title will panic if Info is nil. While info is required by the OpenAPI spec, a malformed or partially-parsed document could still have a nil pointer here.
if doc.Info != nil {
if doc.Info.Title != "" {
r.DocumentTitle = doc.Info.Title
}
if doc.Info.Version != "" {
r.DocumentVersion = doc.Info.Version
}
}
sorry forgot Info isn't a pointer
Though you should use the getters to help avoid nil derefs in places like this.
for signalID, count := range signalCounts {
	report.TopSignals = append(report.TopSignals, &CodegenSignal{
		ID:          signalID,
		Description: signalID, // will be overwritten below
Bug: TopSignals descriptions are never populated
The comment says // will be overwritten below but there is no code after this loop that overwrites the description — the function returns at line 263. Each TopSignal ends up with the signal ID as its description (e.g. "oneOf-no-discriminator" instead of the human-readable text).
Also, the Severity field is never set on the aggregated signals, so they all default to CodegenGreen regardless of actual severity.
And count from the range is explicitly discarded at line 260 — was this meant to be used for sorting or display?
m.rebuildSchemaItems()
m.cursor = 0
m.scrollOffset = 0
}
Bug: expanded state persists across sort/filter changes
The expanded map is keyed by integer index (cursor position). When sorting reorders the schema list, the expanded indices now refer to different schemas. For example, if schema User at index 0 is expanded, and sorting moves Admin to index 0, Admin will appear expanded instead.
Tab switching correctly clears expanded (line 237), but sort and filter don't. Same issue applies to the f handler below at line 396.
`m.expanded = make(map[int]bool)` should be added here and in the `f` case.
func mermaidSafeID(id string) string {
	// Replace characters that aren't safe in mermaid IDs
	r := strings.NewReplacer("-", "_", ".", "_", " ", "_")
Bug: Mermaid ID sanitization is incomplete
Only -, ., and space are replaced. Schema names containing other non-alphanumeric characters (e.g. :, /, +, @) will produce invalid Mermaid node IDs and break rendering.
Consider a whitelist approach:
func mermaidSafeID(id string) string {
var result []byte
for _, r := range id {
if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
(r >= '0' && r <= '9') || r == '_' {
result = append(result, byte(r))
} else {
result = append(result, '_')
}
}
return string(result)
}

switch m.activeTab {
case TabSchemas:
	// Estimate card height based on content
	h := 12 // base: title, tier, types, props, fan, complexity, border
Issue: itemHeight() underestimates expanded schema cards
I counted the lines produced by renderSchemaCard() and the base of 12 only covers the always-present content + border. Several conditional sections are missing from the estimate:
- ComplexityBreakdown() line (+1, present for most non-trivial schemas)
- Union variant details (+1)
- Composition keywords (+1)
- Cycle membership (+2: text line + blank)
- "Referenced by" / incoming edges (+1)
- Signal section adds N+2 actual lines (header + N + blank) but estimate uses N+1
- Edge section same: N+2 actual vs N+1 estimate
For a complex schema in an SCC with signals and edges, the estimate can be ~4-8 lines short. This causes ensureCursorVisible() to under-scroll, clipping the bottom of expanded cards.
One option: have renderSchemaCard() return the line count alongside the string, so the estimate stays in sync with reality.
if len(path) > m.width-20 {
	path = path[:m.width-23] + "..."
}
🔴 Panic on narrow terminal: negative slice index in SCC cycle path truncation
When the terminal width is less than 23 columns, the path truncation logic in renderGraphSCCGallery computes a negative slice index, causing a runtime panic.
Root Cause
At cmd/openapi/internal/analyze/tui/graph_view.go:123-124, the code truncates a cycle path string:
if len(path) > m.width-20 {
path = path[:m.width-23] + "..."
}

m.width is set from tea.WindowSizeMsg and reflects the actual terminal width. If m.width is, say, 20, then m.width-20 = 0, so any non-empty path enters the if block. Then path[:m.width-23] becomes path[:-3], which panics with a slice bounds out of range error.
Even at m.width = 22, m.width-23 = -1, which also panics. Any terminal narrower than 23 columns that navigates to the SCC gallery view with cycles will crash the TUI.
Impact: The TUI crashes with a panic when the terminal is narrow (< 23 columns) and cycles are present.
Suggested change:

if m.width > 23 && len(path) > m.width-20 {
	path = path[:m.width-23] + "..."
}
gw := green * width / total
yw := yellow * width / total
rw := width - gw - yw
🟡 Compatibility bar renders red blocks when there are zero red-tier schemas
The renderBar function assigns all integer-division rounding remainder to the red segment, causing red blocks to appear in the compatibility bar even when there are zero red-tier schemas.
Root Cause
At cmd/openapi/internal/analyze/tui/views.go:484-486:
gw := green * width / total
yw := yellow * width / total
rw := width - gw - yw

gw and yw are computed via integer division (rounding down), so gw + yw can be less than width. The remainder rw = width - gw - yw is then rendered as red blocks regardless of whether there are actually any red schemas.
For example, with green=1, yellow=2, red=0, width=10: total=3, gw=3, yw=6, rw=1. This renders 1 red block even though red=0.
Impact: The visual compatibility bar in the TUI summary tab misleadingly shows red segments when there are no red-tier schemas.
Suggested change:

gw := green * width / total
yw := yellow * width / total
rw := red * width / total
// Distribute rounding remainder to the largest non-zero segment
remainder := width - gw - yw - rw
if remainder > 0 {
	if green >= yellow && green >= red {
		gw += remainder
	} else if yellow >= red {
		yw += remainder
	} else {
		rw += remainder
	}
}
Pull request overview
Adds an OpenAPI schema complexity analyzer to the cmd/openapi CLI, including a new spec analyze command with multiple report formats and an interactive Bubble Tea TUI to explore schema complexity, SCCs/cycles, and codegen difficulty signals.
Changes:
- Introduces a full analysis pipeline (graph extraction, SCC/cycle analysis, complexity metrics, codegen difficulty assessment, and refactoring suggestions).
- Adds output renderers for `tui`, `text`, `json`, `dot`, and `mermaid` formats plus supporting tests and testdata.
- Wires the new `analyze` Cobra command into the CLI and adds required Go module dependencies.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| cmd/openapi/commands/openapi/root.go | Registers the new analyze command with the CLI. |
| cmd/openapi/commands/openapi/analyze.go | Implements openapi spec analyze command execution, format selection, and output routing. |
| cmd/openapi/internal/analyze/report.go | Defines the top-level Report and orchestrates the analysis pipeline. |
| cmd/openapi/internal/analyze/graph.go | Builds the schema reference graph and captures node/edge metadata used by later stages. |
| cmd/openapi/internal/analyze/cycles.go | Adds SCC detection, cycle enumeration, and DAG condensation for cycle analysis. |
| cmd/openapi/internal/analyze/metrics.go | Computes per-schema complexity metrics and Top-N ranking helpers. |
| cmd/openapi/internal/analyze/codegen.go | Assigns codegen difficulty tiers/signals and aggregates codegen stats. |
| cmd/openapi/internal/analyze/suggestions.go | Generates refactoring suggestions (cycle breaks, discriminators, SCC splits, property count reductions). |
| cmd/openapi/internal/analyze/output.go | Implements JSON/text/DOT/Mermaid report writers. |
| cmd/openapi/internal/analyze/mermaid.go | Implements Mermaid and ASCII graph renderers (DAG/SCC/ego). |
| cmd/openapi/internal/analyze/mermaid_test.go | Adds tests for Mermaid and ASCII renderers. |
| cmd/openapi/internal/analyze/analyze_test.go | Adds pipeline/format coverage tests for the new analyzer. |
| cmd/openapi/internal/analyze/testdata/cyclic.openapi.yaml | Adds a cyclic/complex sample spec used by unit tests and manual runs. |
| cmd/openapi/internal/analyze/tui/keys.go | Defines TUI tab identifiers and labels. |
| cmd/openapi/internal/analyze/tui/model.go | Implements Bubble Tea model state and keyboard interactions. |
| cmd/openapi/internal/analyze/tui/views.go | Renders tab content (summary, schema list, cycles, suggestions) and helpers. |
| cmd/openapi/internal/analyze/tui/schema_card.go | Renders expanded schema detail cards in the TUI. |
| cmd/openapi/internal/analyze/tui/graph_view.go | Renders graph tab modes and node picker list. |
| cmd/openapi/internal/analyze/tui/styles.go | Defines Lip Gloss styles/colors for the analyzer TUI. |
| cmd/openapi/go.mod | Adds golang.org/x/term dependency and bumps indirect x/sys. |
| cmd/openapi/go.sum | Updates module checksums for new/updated dependencies. |
| AGENTS.md | Adds a process note for agent usage. |
func classifyCycle(c *Cycle) {
	allRequired := true
	for _, e := range c.Edges {
		if !e.IsRequired {
			allRequired = false
			c.BreakPoints = append(c.BreakPoints, e)
		} else if e.IsNullable || e.IsArray {
			c.BreakPoints = append(c.BreakPoints, e)
		}
	}
	c.HasRequiredOnlyPath = allRequired
}
classifyCycle keeps allRequired=true even when an edge is required-but-nullable or required-but-array (which you also treat as break points). This makes HasRequiredOnlyPath inconsistent with the struct comment and UI labeling (it will mark cycles as required-only/red even though a break point exists). Set allRequired=false when any break point is found, or redefine HasRequiredOnlyPath to match the intended semantics.
return score
}

// ComplexityBreakdown returns a map of component names to their contribution to the complexity score.
The ComplexityBreakdown docstring says it "returns a map" but the function returns a slice ([]ScoreComponent). Update the comment to match the actual return type to avoid confusing callers.
Suggested change:

// ComplexityBreakdown returns a slice of score components, each naming its contribution to the complexity score.
// Find the edge whose removal would split this SCC
// (the edge between the two nodes with the fewest other connections within the SCC)
sccSet := make(map[string]bool, scc.Size)
for _, id := range scc.NodeIDs {
	sccSet[id] = true
}
sccSet is built but never read afterwards, leaving dead code (populating a map counts as a use, so this compiles, but the intent is unfulfilled). Either remove this block or implement the intended logic that uses sccSet to find candidate edges/partitions within the SCC.
Suggested change:

// Future improvement: analyze internal edges within this SCC to suggest
// a specific edge whose removal would most cleanly split the group.
// Sort by impact (highest first)
sort.Slice(suggestions, func(i, j int) bool {
	return suggestions[i].Impact > suggestions[j].Impact
})
GenerateSuggestions sorts only by Impact using sort.Slice (not stable). When multiple suggestions have the same impact (common, e.g. property-count suggestions all have impact=1), ordering becomes nondeterministic due to upstream map iteration order, which conflicts with the stated goal of deterministic output. Add a stable sort and a deterministic tie-breaker (e.g., Type then Title then joined AffectedSchemas).
if e.IsRequired {
	desc += " optional (currently required)"
} else {
	desc += " nullable"
describeEdgeCut claims an edge is being made "nullable" whenever IsRequired is false. A non-required reference isn't necessarily nullable, so this can produce incorrect/misleading guidance. Consider distinguishing optional vs nullable (and array) when building the description (e.g., "already optional" vs "make nullable").
Suggested change:

desc += " already optional (not required)"
// Auto-fallback: if format is TUI but stdout is not a terminal, use text
if format == "tui" && outputFile == "" && !term.IsTerminal(int(os.Stdout.Fd())) {
The auto-fallback for --format tui only checks whether stdout is a terminal. When the spec is read from stdin (e.g. cat spec.yaml | openapi spec analyze), stdin is not a TTY, so Bubble Tea won't receive keyboard input (stdin will be consumed/EOF) even though stdout is a TTY. To match the documented behavior, also detect non-TTY stdin / IsStdin(inputFile) and fall back to text (or explicitly open /dev/tty and pass it via tea.WithInput).
Suggested change:

// Auto-fallback: if format is TUI but stdout or stdin is not a terminal, use text
if format == "tui" && outputFile == "" &&
	(!term.IsTerminal(int(os.Stdout.Fd())) || !term.IsTerminal(int(os.Stdin.Fd()))) {
// enumerateCycles uses bounded DFS within each SCC to find distinct cycles.
// Limited to maxCyclesPerSCC to avoid combinatorial explosion.
func enumerateCycles(g *Graph, sccs []*SCC) []*Cycle {
	const maxCyclesPerSCC = 50

	var allCycles []*Cycle
	for _, scc := range sccs {
		sccSet := make(map[string]bool, scc.Size)
		for _, id := range scc.NodeIDs {
			sccSet[id] = true
		}

		cycles := findCyclesInSCC(g, scc.NodeIDs[0], sccSet, maxCyclesPerSCC)
		allCycles = append(allCycles, cycles...)
	}
Cycle enumeration only records cycles that return to startNode (the first node in the SCC). In an SCC, there can be cycles that don't include that specific node, so enumerateCycles will miss valid cycles and under-report cyclicity. Consider iterating over each node as a start (with de-duplication) or using a standard algorithm for enumerating simple cycles (e.g., Johnson’s), with your existing max-cycles bound.
edgeCycleCounts := make(map[edgeKey]int)
edgeMap := make(map[edgeKey]*Edge)

for _, cycle := range cycles.Cycles {
	for _, e := range cycle.Edges {
		key := edgeKey{e.From, e.To}
		edgeCycleCounts[key]++
edgeCycleCounts is populated but its counts are never read (incrementing map entries counts as a use, so this compiles, but the data is wasted). Remove it or use it (e.g., reuse it instead of recomputing counts inside the greedy loop).
Suggested change:

edgeMap := make(map[edgeKey]*Edge)

for _, cycle := range cycles.Cycles {
	for _, e := range cycle.Edges {
		key := edgeKey{e.From, e.To}
// Build top signals
for signalID, count := range signalCounts {
	report.TopSignals = append(report.TopSignals, &CodegenSignal{
		ID:          signalID,
		Description: signalID, // will be overwritten below
		AffectedSchemas: func() []string {
			var schemas []string
			for id, d := range report.PerSchema {
				for _, s := range d.Signals {
					if s.ID == signalID {
						schemas = append(schemas, id)
						break
					}
				}
			}
			return schemas
		}(),
	})
	_ = count
Building TopSignals iterates over signalCounts (a map) so ordering is nondeterministic, and each CodegenSignal gets Description: signalID with a comment that it will be overwritten (but nothing overwrites it). Additionally, Severity is left at the zero value. Either fully populate and sort TopSignals (including count/severity/description) or remove this unfinished aggregation to avoid misleading data in CodegenReport.
Suggested change:

// Build top signals: aggregate, then sort deterministically.
type aggregatedSignal struct {
	id              string
	count           int
	severity        CodegenTier
	description     string
	affectedSchemas []string
}
var aggregated []*aggregatedSignal
for signalID, count := range signalCounts {
	agg := &aggregatedSignal{
		id:    signalID,
		count: count,
	}
	// Derive description, severity, and affected schemas from per-schema signals.
	for schemaID, d := range report.PerSchema {
		for _, s := range d.Signals {
			if s.ID != signalID {
				continue
			}
			if agg.description == "" {
				agg.description = s.Description
			}
			if s.Severity > agg.severity {
				agg.severity = s.Severity
			}
			agg.affectedSchemas = append(agg.affectedSchemas, schemaID)
			break
		}
	}
	// PerSchema is a map, so sort for deterministic output.
	slices.Sort(agg.affectedSchemas)
	aggregated = append(aggregated, agg)
}
// Sort by descending count, then descending severity, then ID ascending for stability.
slices.SortFunc(aggregated, func(a, b *aggregatedSignal) int {
	if a.count != b.count {
		return b.count - a.count
	}
	if a.severity != b.severity {
		return int(b.severity - a.severity)
	}
	if a.id < b.id {
		return -1
	}
	if a.id > b.id {
		return 1
	}
	return 0
})
for _, agg := range aggregated {
	report.TopSignals = append(report.TopSignals, &CodegenSignal{
		ID:              agg.id,
		Description:     agg.description,
		Severity:        agg.severity,
		AffectedSchemas: agg.affectedSchemas,
	})
}
if len(s) <= maxLen {
	return s
}
return s[:maxLen-3] + "..."
truncate slices by byte length (len + s[:...]), which can break UTF-8 schema names and cause misalignment with terminal cell widths. Consider truncating by runes or display width (e.g., using lipgloss.Width / runewidth / x/ansi helpers consistent with other TUIs in this repo).
Suggested change:

// If the string already fits within the maximum display width, return it as is.
if lipgloss.Width(s) <= maxLen {
	return s
}
// If there's not enough space even for the ellipsis, return a trimmed ellipsis.
if maxLen <= 3 {
	if maxLen <= 0 {
		return ""
	}
	return strings.Repeat(".", maxLen)
}
targetWidth := maxLen - 3 // leave room for "..."
currentWidth := 0
end := 0
for i, r := range s {
	runeStr := string(r)
	w := lipgloss.Width(runeStr)
	if currentWidth+w > targetWidth {
		break
	}
	currentWidth += w
	// i is the byte index of this rune; add the rune's byte length to get the end.
	end = i + len(runeStr)
}
// If no rune fits, just return an ellipsis.
if end == 0 {
	return "..."
}
return s[:end] + "..."
Summary
- `openapi spec analyze` command with interactive TUI (bubbletea) for exploring schema complexity, cycles, and codegen difficulty
- Output formats: `tui` (default, auto-falls back to `text` for non-TTY), `json`, `text`, `dot`, `mermaid`

Key features
- Sortable schema list (`s` key), filterable by codegen tier (`f` key), expandable detail cards with score breakdown

Test plan
- Unit tests pass (`go test ./internal/analyze/...`)
- `go build ./...` and `go vet ./...` clean
- `openapi spec analyze testdata/cyclic.openapi.yaml` launches TUI
- `openapi spec analyze testdata/cyclic.openapi.yaml --format json` outputs valid JSON
- `cat spec.yaml | openapi spec analyze --output report.json --format json` writes to file

🤖 Generated with Claude Code