feat: add compare_file_contents tool with semantic diffs #2010

Open
ra-n-dom wants to merge 1 commit into github:main from ra-n-dom:feature/compare-file-contents
@ra-n-dom

Closes #1973

Summary

Adds a compare_file_contents tool that compares a file between two git refs, producing semantic diffs for structured formats (JSON, YAML) instead of noisy line-based unified diffs. Falls back to unified diff for unsupported formats.

  • Semantic diff shows path-notation changes (e.g. users[1].name: "Bob" → "Bobby") — fewer tokens, unambiguous output
  • Unified diff fallback for non-structured formats
  • Gated behind semantic_diff feature flag

Why

Line-based diffs for JSON/YAML are token-inefficient and noisy for AI models. A reformatted JSON file with a single value change can produce a unified diff that touches nearly every line. Semantic diffs surface only the actual changes, improving model comprehension and reducing context usage.

What changed

  • pkg/github/compare.go: Tool definition + CompareFileContents handler using raw.Client.GetRawContent() to fetch file content at specific refs
  • pkg/github/semantic_diff.go: Semantic diff engine — deep comparison of JSON/YAML structures with path-notation output (modified, added, removed, type_changed), plus unified diff fallback
  • pkg/github/compare_test.go: Table-driven tests (6 subtests) using MockHTTPClientWithHandlers
  • pkg/github/semantic_diff_test.go: Unit tests for the semantic diff engine
  • pkg/github/tools.go: Registered CompareFileContents in AllTools()
  • pkg/github/__toolsnaps__/compare_file_contents.snap: Toolsnap snapshot

Example

# Semantic diff (JSON)
Modified:
  users[1].name: "Bob" → "Bobby"

Added:
  settings.theme: "dark"

Removed:
  settings.legacy: true

Compare this with a unified diff, which would show every reformatted line.
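For readers who want to see the idea behind the output above, here is a minimal sketch of a path-notation diff over decoded JSON. It is not the code in pkg/github/semantic_diff.go: the Change type and the diffValues and semanticDiff names are hypothetical, it handles JSON only (YAML would need a third-party parser), and it compares arrays by index, so an element inserted at the front would be reported as a run of modifications.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// Change is a hypothetical record type for this sketch; the PR's types may differ.
type Change struct {
	Kind, Path string // Kind: modified, added, removed, or type_changed
	Old, New   any
}

// diffValues recursively compares two decoded JSON values and appends
// one Change per difference, labelled with a path such as users[1].name.
func diffValues(path string, a, b any, out *[]Change) {
	if reflect.TypeOf(a) != reflect.TypeOf(b) {
		*out = append(*out, Change{"type_changed", path, a, b})
		return
	}
	switch av := a.(type) {
	case map[string]any:
		bv := b.(map[string]any)
		keys := make([]string, 0, len(av)+len(bv))
		for k := range av {
			keys = append(keys, k)
		}
		for k := range bv {
			if _, dup := av[k]; !dup {
				keys = append(keys, k)
			}
		}
		sort.Strings(keys) // deterministic output order
		for _, k := range keys {
			child := k
			if path != "" {
				child = path + "." + k
			}
			oldV, inA := av[k]
			newV, inB := bv[k]
			switch {
			case !inA:
				*out = append(*out, Change{"added", child, nil, newV})
			case !inB:
				*out = append(*out, Change{"removed", child, oldV, nil})
			default:
				diffValues(child, oldV, newV, out)
			}
		}
	case []any:
		bv := b.([]any)
		for i := 0; i < len(av) || i < len(bv); i++ {
			child := fmt.Sprintf("%s[%d]", path, i)
			switch {
			case i >= len(av):
				*out = append(*out, Change{"added", child, nil, bv[i]})
			case i >= len(bv):
				*out = append(*out, Change{"removed", child, av[i], nil})
			default:
				diffValues(child, av[i], bv[i], out)
			}
		}
	default: // string, float64, bool, or nil: safe to compare directly
		if a != b {
			*out = append(*out, Change{"modified", path, a, b})
		}
	}
}

// semanticDiff parses both documents and returns their differences.
func semanticDiff(base, head []byte) ([]Change, error) {
	var a, b any
	if err := json.Unmarshal(base, &a); err != nil {
		return nil, fmt.Errorf("parse base: %w", err)
	}
	if err := json.Unmarshal(head, &b); err != nil {
		return nil, fmt.Errorf("parse head: %w", err)
	}
	var changes []Change
	diffValues("", a, b, &changes)
	return changes, nil
}

func main() {
	base := []byte(`{"users":[{"name":"Alice"},{"name":"Bob"}],"settings":{"legacy":true}}`)
	head := []byte(`{"settings": {"theme": "dark"}, "users": [{"name": "Alice"}, {"name": "Bobby"}]}`)
	changes, err := semanticDiff(base, head)
	if err != nil {
		panic(err)
	}
	for _, c := range changes {
		fmt.Printf("%-8s %s: %v -> %v\n", c.Kind, c.Path, c.Old, c.New)
	}
}
```

Because both inputs are decoded before comparison, key order and whitespace never reach the diff, which is why a pure reformat yields no changes in this sketch.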

MCP impact

  • New tool added

Tool: compare_file_contents(owner, repo, path, base, head)
Feature flag: semantic_diff

Prompts tested (tool changes only)

  • "Compare go.mod between two commits" → unified diff fallback (non-structured format)
  • "Compare a JSON config between branches" → semantic diff with path notation
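The PR text does not say how the tool decides between the semantic path and the unified diff fallback. One plausible approach, shown here purely as an assumption, is to dispatch on the file extension; detectFormat and its return values are hypothetical names, not taken from the PR.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// detectFormat is a hypothetical helper: it maps a repository path to the
// diff strategy this sketch would use. The PR may detect formats differently.
func detectFormat(filePath string) string {
	// path.Ext works on forward-slash paths, which is what repo paths use.
	switch strings.ToLower(path.Ext(filePath)) {
	case ".json":
		return "json"
	case ".yaml", ".yml":
		return "yaml"
	default:
		return "unified" // unsupported format: fall back to a unified diff
	}
}

func main() {
	for _, p := range []string{"go.mod", "config/app.json", "deploy/values.yml"} {
		fmt.Println(p, "=>", detectFormat(p))
	}
}
```

Under this assumption go.mod falls through to the unified diff, matching the first prompt above, while a .json config takes the semantic path.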

Security / limits

  • No security or limits impact
  • Uses existing raw.Client for content fetching (same auth as other tools)
  • Read-only operation

Tool renaming

  • I am not renaming tools as part of this PR

Lint & tests

  • go test -v ./pkg/github/ -run TestCompare — 6/6 passing
  • go test -v ./pkg/github/ -run TestSemantic — all passing
  • go vet ./pkg/github/ — clean

ra-n-dom requested a review from a team as a code owner on February 13, 2026 at 04:10
Copilot AI review requested due to automatic review settings on February 13, 2026 at 04:10
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new compare_file_contents MCP tool that intelligently compares files between git refs using semantic diffs for structured formats (JSON, YAML) instead of line-based unified diffs. The feature is gated behind the semantic_diff feature flag for controlled rollout.

Changes:

  • Adds a new MCP tool for comparing file contents across git refs with semantic diff support for JSON/YAML
  • Implements a semantic diff engine that produces path-notation changes (e.g., users[1].name: "Bob" → "Bobby") for better AI model comprehension
  • Falls back to unified diff for non-structured file formats
  • Includes comprehensive unit tests for both the tool and semantic diff engine

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

  • pkg/github/tools.go: Registers the new CompareFileContents tool in the repository toolset
  • pkg/github/compare.go: Implements the tool definition and handler with raw client integration for fetching file content at specific refs
  • pkg/github/semantic_diff.go: Core semantic diff engine with JSON/YAML parsing, deep comparison logic, and unified diff fallback
  • pkg/github/compare_test.go: Table-driven tests for the tool covering JSON, YAML, and unified diff fallback scenarios
  • pkg/github/semantic_diff_test.go: Comprehensive unit tests for semantic diff engine edge cases (type changes, nested objects, arrays, etc.)
  • pkg/github/__toolsnaps__/compare_file_contents.snap: Tool schema snapshot for API surface documentation

Comment on lines 198 to 214
// containsRef checks if a URL path contains a specific ref segment.
func containsRef(path, ref string) bool {
	return len(path) > 0 && contains(path, "/"+ref+"/")
}

func contains(s, substr string) bool {
	return len(s) >= len(substr) && searchString(s, substr)
}

func searchString(s, substr string) bool {
	for i := 0; i <= len(s)-len(substr); i++ {
		if s[i:i+len(substr)] == substr {
			return true
		}
	}
	return false
}

Copilot AI Feb 13, 2026

The functions contains, searchString, and containsRef reinvent standard library functionality. The Go standard library provides strings.Contains which is widely used elsewhere in the codebase (seen in repositories_test.go, helper_test.go, etc.). Using standard library functions makes the code more maintainable and easier to understand.

Replace the custom contains and searchString functions with strings.Contains, and simplify containsRef to use it directly.

Author

Good catch — replaced contains/searchString with strings.Contains. containsRef is kept as a thin wrapper since it adds the /ref/ segment formatting.
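The updated code is not shown in this thread, so the following is only a sketch of what the described change might look like, assuming containsRef keeps its original signature. The explicit length check from the earlier version is dropped here because strings.Contains already returns false for an empty path when the substring is non-empty.

```go
package main

import (
	"fmt"
	"strings"
)

// containsRef checks if a URL path contains a specific ref segment.
// Sketch only: a thin wrapper that adds the "/ref/" framing around
// strings.Contains, as the reply above describes.
func containsRef(path, ref string) bool {
	return strings.Contains(path, "/"+ref+"/")
}

func main() {
	fmt.Println(containsRef("/owner/repo/main/config.json", "main"))
	fmt.Println(containsRef("/owner/repo/maintenance/config.json", "main"))
}
```

The surrounding slashes are what keep a ref such as main from matching a longer segment such as maintenance.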

Add a new compare_file_contents tool that compares a file between two
git refs, producing semantic diffs for structured formats (JSON, YAML)
and falling back to unified diffs for unsupported formats. The tool is
feature-flagged behind the "semantic_diff" flag.
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Development

Successfully merging this pull request may close these issues.

Feature: compare_file_contents tool with semantic diffs