-
Notifications
You must be signed in to change notification settings - Fork 42
Diff-Based Optimized Workflow with Foundational Codemaps (Enclosing Entity) #137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello @satvikkk, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the security analysis capabilities by introducing a highly optimized, diff-based workflow and laying the groundwork for a semantic code graph called Codemaps. The new workflow focuses analysis efforts precisely where changes occur, improving efficiency and resource utilization. Concurrently, the foundational Codemaps architecture provides a structured understanding of the codebase, enabling more intelligent and context-aware security analysis through features like smart context retrieval. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and well-architected enhancement to the security analysis workflow by shifting to a diff-based approach and laying the foundation for 'Codemaps', a semantic code graph. The changes are extensive, including new tree-sitter based parsers for multiple languages, a graph service, and a new MCP tool get_enclosing_entity. The overall implementation is solid, with good test coverage for the new components.
My review includes a few high-severity findings: a bug in the file extension list that would prevent some files from being indexed, and a potential race condition in the graph-building logic that could cause issues under concurrent requests. I've also noted a couple of minor formatting and grammar issues that would improve code and prompt clarity.
| version: '0.1.0', | ||
| }); | ||
|
|
||
| const SUPPORTED_EXTS = ['.py', '.js', '.ts', 'go']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a small bug in the list of supported extensions. For Go, it should be '.go' (with a dot), not 'go'. Also, '.tsx' files are supported by the TypeScript parser but are missing from this list. This will prevent Go and TSX files from being included in the codemap graph.
| const SUPPORTED_EXTS = ['.py', '.js', '.ts', 'go']; | |
| const SUPPORTED_EXTS = ['.py', '.js', '.ts', '.tsx', '.go']; |
| if (!graphBuilt) { | ||
| const loaded = await graphService.loadGraph(GEMINI_SECURITY_DIR); | ||
| if (!loaded) { | ||
| const files = await scan_dir(process.cwd()); | ||
| for (const file of files) { | ||
| try { | ||
| await graphBuilder.buildGraph(file); | ||
| } catch (e: any) { | ||
| // Ignore errors for unsupported file types | ||
| } | ||
| } | ||
| await graphService.saveGraph(GEMINI_SECURITY_DIR); | ||
| } | ||
| graphBuilt = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a potential race condition here. If two requests to get_enclosing_entity arrive concurrently when the graph has not been built yet (graphBuilt is false), both could enter this block and attempt to build or load the graph simultaneously. This could lead to corrupted state or wasted resources.
Consider adding a locking mechanism or a 'building' flag to ensure the graph is built only once. For example:
let graphBuilt = false;
let graphBuildingPromise: Promise<void> | null = null;
// in the handler...
if (!graphBuilt && !graphBuildingPromise) {
graphBuildingPromise = (async () => {
try {
// ... build/load logic ...
graphBuilt = true;
} finally {
graphBuildingPromise = null;
}
})();
}
if (graphBuildingPromise) {
await graphBuildingPromise;
}
// ... rest of the logic ...| * *Methodology:* You will iterate through each added or modified line in the retrieved diff: | ||
| 1. First, call the `find_line_numbers` tool to obtain the precise line number for the change. | ||
| 2. **Check Cache:** Before making further calls, verify if this line number is already covered by a function or class you have previously retrieved for this file. If it is covered, you **MUST SKIP** to the next line to avoid redundancy. | ||
| 3. **Contextual Analysis and Optimization:** If the line is not covered, and if the changed lines within the diff *do not* visibly contain the entire relevant enclosing entity (e.g., the full function or class definition), you **STRICTLY MUST** call `get_enclosing_entity` to retrieve the function details (eg: name, content and line range). This ensures a complete understanding of the surrounding code. However, if the changed lines are small and *do* visibly contain the entire relevant enclosing entity within the diff itself, you may optimize by *skipping* the `get_enclosing_entity` call and use the diff's context directly. Record the determined context (whether from diff or tool) in your local cache for future checks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor grammar fix for clarity in this prompt. It's generally better to use e.g., (with commas) for 'for example'.
3. **Contextual Analysis and Optimization:** If the line is not covered, and if the changed lines within the diff *do not* visibly contain the entire relevant enclosing entity (e.g., the full function or class definition), you **STRICTLY MUST** call `get_enclosing_entity` to retrieve the function details (e.g., name, content, and line range). This ensures a complete understanding of the surrounding code. However, if the changed lines are small and *do* visibly contain the entire relevant enclosing entity within the diff itself, you may optimize by *skipping* the `get_enclosing_entity` call and use the diff's context directly. Record the determined context (whether from diff or tool) in your local cache for future checks.
| const parentNode = this.graphService.querySymbol(parentName, filePath); if (parentNode) { | ||
| this.graphService.addEdge({ source: nodeId, target: parentNode.id, type: 'inherits' }); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block of code seems to have some formatting issues, likely from a copy-paste. The indentation is incorrect and the if statement is on the same line as the variable declaration. Refactoring this for clarity and consistency with the rest of the codebase is recommended.
const parentNode = this.graphService.querySymbol(parentName, filePath);
if (parentNode) {
this.graphService.addEdge({ source: nodeId, target: parentNode.id, type: 'inherits' });
}
This PR introduces two major enhancements to the security analysis capabilities: a highly optimized, Diff-based analysis workflow and the foundational architecture for Codemaps, a semantic code graph.
To significantly improve performance and reduce token usage, the security analysis workflow has been redesigned. The previous "Recon" pass, which analyzed the full content of all changed files, has been replaced with a more targeted, diff-centric approach:
This diff-first approach ensures that analysis is precisely targeted, dramatically reducing overhead and focusing on what matters most.
This PR also introduces the foundational architecture for Codemaps, a new feature that builds a semantic graph of the user's codebase.
This architectural change lays the groundwork for more advanced and accurate security analysis in the future.