Add support for scanning rubygems #638
Pull request overview
This PR adds support for scanning RubyGems packages to GuardDog. The implementation enables detection of malicious gems through both source code analysis (using Semgrep rules) and metadata checks. The top Ruby gem names are sourced from packages.ecosyste.ms for typosquatting detection, and the implementation handles the nested tar format of .gem files.
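Because of the nested layout mentioned above, extraction takes two passes. First, open the outer .gem as a plain, uncompressed tar. Then open its data.tar.gz member as a gzip-compressed tar. The following is a minimal standard-library sketch, not the scanner's actual code. The function name `extract_gem_data` is hypothetical, and the sketch ignores the metadata.gz and checksums.yaml.gz members that a .gem usually also carries.

```python
import io
import tarfile


def extract_gem_data(gem_path: str, dest_dir: str) -> None:
    """Hypothetical helper: unpack the source files stored inside a .gem.

    A .gem is an uncompressed tar; the gem's files live in the inner
    data.tar.gz member, which is itself a gzip-compressed tar.
    """
    # Pass 1: the outer archive is plain tar, so open it with mode "r:".
    with tarfile.open(gem_path, mode="r:") as outer:
        member = outer.getmember("data.tar.gz")  # raises KeyError if absent
        inner_bytes = outer.extractfile(member).read()

    # Pass 2: read the inner archive from memory as gzip-compressed tar.
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
        # Use the "data" extraction filter when this Python provides it, to
        # reject path traversal ("../") and similar tricks in untrusted gems.
        if hasattr(tarfile, "data_filter"):
            inner.extractall(dest_dir, filter="data")
        else:
            inner.extractall(dest_dir)
```

Reading the inner archive into memory avoids writing an intermediate data.tar.gz to disk. The extraction filter matters here because the whole point of the scanner is to open potentially malicious packages.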
Changes:
- Added RubyGems ecosystem support with package and project scanners
- Implemented 6 Semgrep rules for detecting malicious Ruby code patterns
- Added 5 metadata detectors for RubyGems (typosquatting, empty info, release zero, bundled binary, repository integrity mismatch)
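One common way to implement a typosquatting check like the one listed above is to flag any gem name that is exactly one edit away from a popular gem name. The sketch below assumes the popular names are already loaded into a list, for example from top_rubygems_packages.json, whose exact format is not shown here. The helper names are hypothetical, and GuardDog's real detector may apply additional heuristics.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similar_popular_gems(name: str, popular: list[str]) -> list[str]:
    """Hypothetical check: popular gems one edit away from `name`.

    Exact matches are excluded, since a popular gem is not a typosquat
    of itself.
    """
    name = name.lower()
    return [
        p for p in popular
        if p.lower() != name and edit_distance(name, p.lower()) == 1
    ]
```

For example, `similar_popular_gems("rals", ["rails", "rake", "rspec"])` returns `["rails"]`. Comparing against a fixed list of a few hundred names keeps the quadratic edit-distance cost negligible.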
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `guarddog/ecosystems.py` | Added RUBYGEMS enum and friendly name |
| `guarddog/scanners/rubygems_package_scanner.py` | Implements downloading and extracting .gem files with nested tar handling |
| `guarddog/scanners/rubygems_project_scanner.py` | Parses Gemfile.lock to extract gem dependencies |
| `guarddog/scanners/__init__.py` | Registers RubyGems scanners |
| `guarddog/analyzer/sourcecode/*.yml` | Six new Semgrep rules for Ruby malware detection |
| `guarddog/analyzer/sourcecode/__init__.py` | Maps the ruby language to the RUBYGEMS ecosystem |
| `guarddog/analyzer/metadata/rubygems/*.py` | Five metadata detectors for RubyGems |
| `guarddog/analyzer/metadata/__init__.py` | Registers RubyGems metadata rules |
| `guarddog/analyzer/metadata/resources/top_rubygems_packages.json` | List of 976 popular gems for typosquatting detection |
| `tests/core/test_rubygems_*.py` | Tests for RubyGems scanners |
| `tests/analyzer/sourcecode/*.rb` | Test files for Semgrep rules |
| `README.md` | Documentation updates for RubyGems support |
Comments suppressed due to low confidence (1)
guarddog/analyzer/metadata/rubygems/repository_integrity_mismatch.py:1
- The error message only mentions 'data.tar.gz', but the code also checks for 'data.tar'. The message should mention both formats: 'data.tar.gz or data.tar not found in gem'.
import hashlib
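The following hypothetical sketch shows the suggested message in context. It accepts either data archive name before hashing the gem's files. SHA-256 and the `hash_gem_files` name are assumptions. The thread does not show how the real repository_integrity_mismatch detector compares these digests against the source repository.

```python
import hashlib
import io
import tarfile


def hash_gem_files(gem_path: str) -> dict[str, str]:
    """Hypothetical helper: {path: sha256 hex digest} for each file in a gem's data archive."""
    with tarfile.open(gem_path, mode="r:") as outer:
        names = outer.getnames()
        # Accept the usual gzip-compressed archive or an uncompressed one.
        if "data.tar.gz" in names:
            inner_name, inner_mode = "data.tar.gz", "r:gz"
        elif "data.tar" in names:
            inner_name, inner_mode = "data.tar", "r:"
        else:
            # Name both formats, as the review comment suggests.
            raise ValueError("data.tar.gz or data.tar not found in gem")
        inner_bytes = outer.extractfile(inner_name).read()

    digests = {}
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode=inner_mode) as inner:
        for member in inner.getmembers():
            if member.isfile():  # skip directories, symlinks, devices
                data = inner.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests
```

Hashing directly from the archive avoids extracting untrusted files to disk just to compare their contents.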
guarddog/analyzer/metadata/rubygems/repository_integrity_mismatch.py
@sobregosodd updated the one Copilot suggestion that made sense.
guarddog/analyzer/metadata/rubygems/repository_integrity_mismatch.py
@andrew the code looks really good! I will take a stab at fixing my comments and will tag you if I hit a roadblock.
Thanks again for the contribution, @andrew!
Thanks for merging!
I used https://packages.ecosyste.ms to get the list of top Ruby gem names for typosquatting detection.
The rubygems.org API doesn't expose email addresses, so some of the email-based detectors can't be used.
Gem extraction handles the nested tar format (a .gem contains data.tar.gz).
Added tests for the Ruby Semgrep rules.