@satvikkk

This PR introduces two major enhancements to the security analysis capabilities: an optimized, diff-based analysis workflow and the foundational architecture for Codemaps, a semantic code graph.

  1. Optimized Diff-First Security Workflow

To significantly improve performance and reduce token usage, the security analysis workflow has been redesigned. The previous "Recon" pass, which analyzed the full content of all changed files, has been replaced with a more targeted, diff-centric approach:

  • Step 1: Analyze PR Diff: The workflow now begins by retrieving and analyzing only the content of the diff, focusing on lines that have been added or modified.
  • Step 2: Identify Taint Sources: It scans this limited diff content for potential "taint sources" (untrusted user input).
  • Step 3: Build a Targeted Plan: A file is scheduled for a full, deep-dive investigation only if a potential taint source is found within its diff. Benign changes (e.g., comments, documentation, lockfiles) are ignored, and their corresponding files are never read.
  • Step 4: Focused Investigation: The expensive deep-dive analysis is now reserved for the small subset of pre-qualified files that contain security-relevant changes.

This diff-first approach limits the expensive deep-dive analysis to files whose diffs contain a potential taint source, which reduces both token usage and analysis time.
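The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the workflow's actual implementation (which is driven by the prompt in analyze.toml): the function names `addedLinesByFile` and `filesToInvestigate` and the `TAINT_PATTERNS` list are hypothetical, and the input is assumed to be `git diff` output in unified format.

```typescript
// Hypothetical sketch: names and patterns are illustrative, not taken from the PR.

// Illustrative taint-source patterns; a real list would be language-aware.
const TAINT_PATTERNS: RegExp[] = [
  /\breq\.(body|query|params)\b/, // Express-style request input
  /\bprocess\.argv\b/,            // command-line arguments
  /\binput\s*\(/,                 // Python input()
];

/** Collects the added lines of a git-style unified diff, grouped by file path. */
function addedLinesByFile(diff: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  let current: string | null = null;
  let inHunk = false;
  for (const line of diff.split('\n')) {
    if (line.startsWith('diff --git ')) {
      // A new file section starts; header lines follow until the first hunk.
      inHunk = false;
      current = null;
    } else if (!inHunk && line.startsWith('+++ ')) {
      // "+++ b/path" names the new version of the file; /dev/null means deletion.
      const target = line.slice(4).trim();
      current = target === '/dev/null' ? null : target.replace(/^b\//, '');
      if (current !== null && !result.has(current)) result.set(current, []);
    } else if (line.startsWith('@@')) {
      inHunk = true;
    } else if (inHunk && current !== null && line.startsWith('+')) {
      result.get(current)!.push(line.slice(1));
    }
  }
  return result;
}

/** Returns only the files whose added lines contain a potential taint source. */
function filesToInvestigate(diff: string): string[] {
  const flagged: string[] = [];
  addedLinesByFile(diff).forEach((lines, file) => {
    if (lines.some((l) => TAINT_PATTERNS.some((p) => p.test(l)))) {
      flagged.push(file);
    }
  });
  return flagged;
}
```

Files that never appear in the returned list (for example, a README with only documentation changes) would not be opened at all, which is where the savings come from.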

  2. Foundational Architecture for Codemaps

This PR also introduces the foundational architecture for Codemaps, a new feature that builds a semantic graph of the user's codebase.

  • Codemaps Engine: A GraphBuilder and GraphService now construct and query the code graph, with tree-sitter based parsers for Python, JavaScript, TypeScript, and Go.
  • Enhanced Security Analysis: The analyze command now leverages Codemaps. As the first application of this technology, a new "get_enclosing_entity" MCP tool has been added. When the agent needs it, the tool uses the graph to look up the enclosing context (e.g., function or class) for any line of code.
  • Dependencies: Added tree-sitter and its language bindings.

This architectural change lays the groundwork for more advanced and accurate security analysis in the future.
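For readers unfamiliar with the idea, an enclosing-entity lookup over graph nodes can be sketched like this. The `EntityNode` shape and `findEnclosingEntity` function are hypothetical; the PR's GraphNode interface and GraphService method may differ in names and fields.

```typescript
// Hypothetical node shape; the PR's GraphNode interface may differ.
interface EntityNode {
  id: string;
  name: string;
  type: 'function' | 'class' | 'method';
  filePath: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

/** Finds the innermost entity in `filePath` whose line range contains `line`. */
function findEnclosingEntity(
  nodes: EntityNode[],
  filePath: string,
  line: number,
): EntityNode | undefined {
  let best: EntityNode | undefined;
  for (const node of nodes) {
    if (node.filePath !== filePath) continue;
    if (line < node.startLine || line > node.endLine) continue;
    // Prefer the tightest range so that a method wins over its class.
    if (!best || node.endLine - node.startLine < best.endLine - best.startLine) {
      best = node;
    }
  }
  return best;
}
```

A linear scan is enough for a sketch; indexing nodes by file, as the changelog says GraphService does, avoids scanning unrelated files.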

@google-cla

google-cla bot commented Feb 10, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist

Summary of Changes

Hello @satvikkk, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the security analysis capabilities by introducing a highly optimized, diff-based workflow and laying the groundwork for a semantic code graph called Codemaps. The new workflow focuses analysis efforts precisely where changes occur, improving efficiency and resource utilization. Concurrently, the foundational Codemaps architecture provides a structured understanding of the codebase, enabling more intelligent and context-aware security analysis through features like smart context retrieval.

Highlights

  • Diff-Based Security Workflow Optimization: The security analysis workflow has been re-engineered to be diff-centric, replacing the full-file 'Recon' pass with a targeted approach that analyzes only added or modified lines. This significantly reduces analysis scope and token usage.
  • Foundational Codemaps Architecture: Introduced the core architecture for Codemaps, a semantic code graph. This includes a GraphBuilder and GraphService utilizing Tree-sitter for parsing Python, JavaScript, TypeScript, and Go, enabling a deeper understanding of the codebase structure.
  • Smart Context Retrieval Tool: A new get_enclosing_entity tool has been added, leveraging Codemaps to efficiently identify the surrounding function or class context for any given line of code, which is crucial for focused security investigations.
  • Dependency Updates: Added tree-sitter and its language bindings for Python, JavaScript, TypeScript, and Go to support the new Codemaps functionality.

Changelog
  • .gitignore
    • Added '.DS_Store' to the ignore list.
  • commands/security/analyze.toml
    • Updated the description to reflect the new diff-based analysis approach.
    • Refactored the core operational loop from a 'Two-Pass Recon & Investigate' to a 'Diff-Based Two-Pass Workflow'.
    • Modified the Reconnaissance Pass to scan only the pull request diff for taint sources.
    • Introduced 'Smart Context Retrieval' procedure for efficient mapping of changed lines to their enclosing entities.
    • Streamlined the Investigation Pass to focus only on files and variables identified during diff reconnaissance.
    • Updated the example workflow in SECURITY_ANALYSIS_TODO.md to align with the diff-based strategy.
  • mcp-server/package-lock.json
    • Added tree-sitter and its language-specific bindings (tree-sitter-go, tree-sitter-javascript, tree-sitter-python, tree-sitter-typescript) as new dependencies.
    • Included node-addon-api and node-gyp-build as dependencies for tree-sitter packages.
  • mcp-server/package.json
    • Added tree-sitter and its language-specific bindings (tree-sitter-python, tree-sitter-javascript, tree-sitter-go, tree-sitter-typescript) to the project dependencies.
  • mcp-server/src/codemaps/graph_builder.test.ts
    • Added new test file for GraphBuilder to verify graph construction for Python, JavaScript, TypeScript, and Go files.
    • Included tests for handling unsupported file extensions.
  • mcp-server/src/codemaps/graph_builder.ts
    • Added new file defining the GraphBuilder class responsible for constructing a code graph.
    • Implemented logic to parse various programming languages (Python, JavaScript, Go, TypeScript) using tree-sitter.
    • Provided methods to traverse syntax trees and add nodes and edges to the GraphService.
  • mcp-server/src/codemaps/graph_service.test.ts
    • Added new test file for GraphService to validate node and edge addition, enclosing entity lookup, and symbol querying.
  • mcp-server/src/codemaps/graph_service.ts
    • Added new file defining the GraphService class to manage the code graph.
    • Implemented functionalities for adding nodes and edges, indexing nodes by name and file, and finding enclosing entities.
    • Included methods for querying symbols, ensuring module nodes, and handling pending calls.
    • Added capabilities to save and load the graph to/from a JSON file.
  • mcp-server/src/codemaps/index.ts
    • Added new index file to export modules from the codemaps directory.
  • mcp-server/src/codemaps/models.ts
    • Added new file defining GraphNode and GraphEdge interfaces for representing elements in the code graph.
  • mcp-server/src/codemaps/parsers/base_parser.ts
    • Added new file defining the LanguageParser interface for language-specific parsing logic.
  • mcp-server/src/codemaps/parsers/go_parser.ts
    • Added new file implementing GoParser for parsing Go syntax nodes and populating the code graph with functions, types, methods, calls, and imports.
  • mcp-server/src/codemaps/parsers/javascript_parser.ts
    • Added new file implementing JavaScriptParser for parsing JavaScript syntax nodes and populating the code graph with functions, classes, methods, variables, calls, and imports.
  • mcp-server/src/codemaps/parsers/python_parser.ts
    • Added new file implementing PythonParser for parsing Python syntax nodes and populating the code graph with functions, classes, calls, and imports.
  • mcp-server/src/codemaps/parsers/typescript_parser.ts
    • Added new file implementing TypeScriptParser for parsing TypeScript syntax nodes and populating the code graph with functions, classes, interfaces, enums, type aliases, methods, variables, calls, and imports.
  • mcp-server/src/filesystem.test.ts
    • Added git update-ref refs/remotes/origin/HEAD HEAD to the setup for consistent test environments.
    • Updated the test for getAuditScope to verify diff content for uncommitted changes, ensuring it correctly captures modifications.
  • mcp-server/src/index.ts
    • Imported GraphBuilder and GraphService to integrate Codemaps functionality.
    • Added scan_dir utility function to recursively find supported code files within a directory.
    • Introduced a new get_enclosing_entity tool that uses the GraphService to find the nearest enclosing function or class for a given line number.
    • Implemented graph building and loading logic within get_enclosing_entity to ensure the code graph is available for queries.
Activity
  • The pull request introduces significant new features and architectural changes, as detailed in the highlights and overview.
  • New files have been added to support the Codemaps functionality, including graph builders, services, models, and language-specific parsers.
  • Existing configuration and test files have been updated to reflect the new diff-based security analysis workflow and the integration of Codemaps.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-architected enhancement to the security analysis workflow by shifting to a diff-based approach and laying the foundation for 'Codemaps', a semantic code graph. The changes are extensive, including new tree-sitter based parsers for multiple languages, a graph service, and a new MCP tool get_enclosing_entity. The overall implementation is solid, with good test coverage for the new components.

My review includes a few high-severity findings: a bug in the file extension list that would prevent some files from being indexed, and a potential race condition in the graph-building logic that could cause issues under concurrent requests. I've also noted a couple of minor formatting and grammar issues that would improve code and prompt clarity.

version: '0.1.0',
});

const SUPPORTED_EXTS = ['.py', '.js', '.ts', 'go'];

high

There's a small bug in the list of supported extensions. For Go, it should be '.go' (with a dot), not 'go'. Also, '.tsx' files are supported by the TypeScript parser but are missing from this list. This will prevent Go and TSX files from being included in the codemap graph.

Suggested change
- const SUPPORTED_EXTS = ['.py', '.js', '.ts', 'go'];
+ const SUPPORTED_EXTS = ['.py', '.js', '.ts', '.tsx', '.go'];
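A small sketch of why the missing dot matters. It assumes the filter compares each entry against Node's `path.extname`, which returns the extension including its leading dot; `isSupported`, `BUGGY_EXTS`, and `FIXED_EXTS` are hypothetical names, and the PR's scan_dir may perform the check differently.

```typescript
import * as path from 'path';

// The list as submitted and the list as suggested in the review.
const BUGGY_EXTS = ['.py', '.js', '.ts', 'go'];
const FIXED_EXTS = ['.py', '.js', '.ts', '.tsx', '.go'];

/** Hypothetical filter: assumes the check is an exact match on path.extname. */
function isSupported(file: string, exts: string[]): boolean {
  // path.extname('main.go') yields '.go', so the entry 'go' can never match.
  return exts.includes(path.extname(file));
}
```

Under that assumption, `isSupported('main.go', BUGGY_EXTS)` and `isSupported('App.tsx', BUGGY_EXTS)` are both false, while both return true with `FIXED_EXTS`.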

Comment on lines +87 to +101
if (!graphBuilt) {
  const loaded = await graphService.loadGraph(GEMINI_SECURITY_DIR);
  if (!loaded) {
    const files = await scan_dir(process.cwd());
    for (const file of files) {
      try {
        await graphBuilder.buildGraph(file);
      } catch (e: any) {
        // Ignore errors for unsupported file types
      }
    }
    await graphService.saveGraph(GEMINI_SECURITY_DIR);
  }
  graphBuilt = true;
}

high

There is a potential race condition here. If two requests to get_enclosing_entity arrive concurrently when the graph has not been built yet (graphBuilt is false), both could enter this block and attempt to build or load the graph simultaneously. This could lead to corrupted state or wasted resources.

Consider adding a locking mechanism or a 'building' flag to ensure the graph is built only once. For example:

let graphBuilt = false;
let graphBuildingPromise: Promise<void> | null = null;

// in the handler...
if (!graphBuilt && !graphBuildingPromise) {
  graphBuildingPromise = (async () => {
    try {
      // ... build/load logic ...
      graphBuilt = true;
    } finally {
      graphBuildingPromise = null;
    }
  })();
}

if (graphBuildingPromise) {
  await graphBuildingPromise;
}

// ... rest of the logic ...
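A variation on the same idea is to cache a single promise and drop the separate boolean flag, so every concurrent caller awaits the same in-flight build. The sketch below is one possible shape, not the PR's code: `buildOrLoadGraph` is a hypothetical stand-in for the load-or-build logic, and `ensureGraph` and `graphReady` are illustrative names.

```typescript
// Hypothetical stand-in for the load-or-build logic in the handler.
let buildCount = 0;
async function buildOrLoadGraph(): Promise<void> {
  buildCount += 1;
  // Simulates asynchronous work such as parsing files or reading JSON.
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
}

// Shared promise: null until the first request triggers a build.
let graphReady: Promise<void> | null = null;

/** Starts the build at most once; concurrent callers share the same promise. */
function ensureGraph(): Promise<void> {
  if (graphReady === null) {
    graphReady = buildOrLoadGraph().catch((err) => {
      // Clear the cached promise so a later request can retry after a failure.
      graphReady = null;
      throw err;
    });
  }
  return graphReady;
}
```

The handler would then `await ensureGraph()` before querying the graph. Because the promise is assigned synchronously, before any `await` yields control, a second request arriving mid-build sees the cached promise instead of starting a second build.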

* *Methodology:* You will iterate through each added or modified line in the retrieved diff:
1. First, call the `find_line_numbers` tool to obtain the precise line number for the change.
2. **Check Cache:** Before making further calls, verify if this line number is already covered by a function or class you have previously retrieved for this file. If it is covered, you **MUST SKIP** to the next line to avoid redundancy.
3. **Contextual Analysis and Optimization:** If the line is not covered, and if the changed lines within the diff *do not* visibly contain the entire relevant enclosing entity (e.g., the full function or class definition), you **STRICTLY MUST** call `get_enclosing_entity` to retrieve the function details (eg: name, content and line range). This ensures a complete understanding of the surrounding code. However, if the changed lines are small and *do* visibly contain the entire relevant enclosing entity within the diff itself, you may optimize by *skipping* the `get_enclosing_entity` call and use the diff's context directly. Record the determined context (whether from diff or tool) in your local cache for future checks.

medium

Minor grammar fix for clarity in this prompt. It's generally better to use e.g., (with commas) for 'for example'.

        3.  **Contextual Analysis and Optimization:** If the line is not covered, and if the changed lines within the diff *do not* visibly contain the entire relevant enclosing entity (e.g., the full function or class definition), you **STRICTLY MUST** call `get_enclosing_entity` to retrieve the function details (e.g., name, content, and line range). This ensures a complete understanding of the surrounding code. However, if the changed lines are small and *do* visibly contain the entire relevant enclosing entity within the diff itself, you may optimize by *skipping* the `get_enclosing_entity` call and use the diff's context directly. Record the determined context (whether from diff or tool) in your local cache for future checks.
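The prompt asks the agent to keep this cache in its own working notes. For illustration only, the same check-cache-then-look-up logic could be written in code as below; `EntityCache`, `CachedEntity`, and `resolveContexts` are hypothetical names, and `lookup` stands in for a call to `get_enclosing_entity`.

```typescript
// Hypothetical shape of a cached entity; fields mirror "name, content, and line range".
interface CachedEntity {
  name: string;
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

/** Per-file cache of entities that have already been retrieved. */
class EntityCache {
  private byFile = new Map<string, CachedEntity[]>();

  /** Returns a cached entity covering `line`, if any. */
  covering(file: string, line: number): CachedEntity | undefined {
    const list = this.byFile.get(file) ?? [];
    return list.find((e) => line >= e.startLine && line <= e.endLine);
  }

  record(file: string, entity: CachedEntity): void {
    const list = this.byFile.get(file) ?? [];
    list.push(entity);
    this.byFile.set(file, list);
  }
}

/** Walks the changed lines and returns how many lookups were actually needed. */
function resolveContexts(
  file: string,
  changedLines: number[],
  lookup: (file: string, line: number) => CachedEntity | undefined,
  cache: EntityCache = new EntityCache(),
): number {
  let lookups = 0;
  for (const line of changedLines) {
    if (cache.covering(file, line)) continue; // already covered: skip the call
    lookups += 1;
    const entity = lookup(file, line);
    if (entity) cache.record(file, entity);
  }
  return lookups;
}
```

With several changed lines inside one function, only the first triggers a lookup; the rest are served from the cache.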

Comment on lines +153 to +155
const parentNode = this.graphService.querySymbol(parentName, filePath); if (parentNode) {
this.graphService.addEdge({ source: nodeId, target: parentNode.id, type: 'inherits' });
}

medium

This block of code seems to have some formatting issues, likely from a copy-paste. The indentation is incorrect and the if statement is on the same line as the variable declaration. Refactoring this for clarity and consistency with the rest of the codebase is recommended.

      const parentNode = this.graphService.querySymbol(parentName, filePath);
      if (parentNode) {
        this.graphService.addEdge({ source: nodeId, target: parentNode.id, type: 'inherits' });
      }
