@andrew
Contributor

@andrew andrew commented Dec 17, 2025

I used https://packages.ecosyste.ms to get the list of the top Ruby gem names for typosquatting detection.

The rubygems.org API doesn't expose email addresses, so some of the email-based detectors can't be used.

Gem extraction handles the nested tar format (a .gem file is a tar archive that contains data.tar.gz).

Added tests for the Ruby Semgrep rules.
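
For readers unfamiliar with the nested layout mentioned above, here is a minimal sketch of how a .gem file can be unpacked in Python. It is not the code from this PR: the function name `extract_gem_sources` and the helper `_safe_members` are hypothetical, and the fallback to an uncompressed `data.tar` is an assumption taken from the review comment about the error message.

```python
import io
import tarfile
from pathlib import Path


def _safe_members(archive: tarfile.TarFile, root: Path) -> list:
    """Keep only regular entries that stay inside root (no links, no '..' escapes)."""
    safe = []
    for member in archive.getmembers():
        if member.issym() or member.islnk():
            continue  # skip links, which could point outside the destination
        target = (root / member.name).resolve()
        if target == root or target.is_relative_to(root):
            safe.append(member)
    return safe


def extract_gem_sources(gem_path: str, dest_dir: str) -> None:
    """Unpack the inner data archive of a .gem file into dest_dir (hypothetical helper)."""
    root = Path(dest_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)

    # The outer .gem container is a plain, uncompressed tar archive.
    with tarfile.open(gem_path, mode="r:") as outer:
        names = outer.getnames()
        inner_name = next(
            (n for n in ("data.tar.gz", "data.tar") if n in names), None
        )
        if inner_name is None:
            raise ValueError("data.tar.gz or data.tar not found in gem")
        inner_bytes = outer.extractfile(inner_name).read()

    # "r:*" lets tarfile detect whether the inner archive is gzip-compressed.
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:*") as inner:
        inner.extractall(path=root, members=_safe_members(inner, root))
```

Reading the inner archive fully into memory keeps the sketch short; a scanner handling very large gems might prefer to stream it to a temporary file instead.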

@andrew force-pushed the main branch 3 times, most recently from 2e5e5a7 to d243a12 on December 17, 2025 15:33
@sobregosodd requested a review from Copilot on January 14, 2026 09:06
Copilot AI left a comment

Pull request overview

This PR adds support for scanning RubyGems packages to GuardDog. The implementation enables detection of malicious gems through both source code analysis (using Semgrep rules) and metadata checks. The top Ruby gem names are sourced from packages.ecosyste.ms for typosquatting detection, and the implementation handles the nested tar format of .gem files.

Changes:

  • Added RubyGems ecosystem support with package and project scanners
  • Implemented 6 Semgrep rules for detecting malicious Ruby code patterns
  • Added 5 metadata detectors for RubyGems (typosquatting, empty info, release zero, bundled binary, repository integrity mismatch)
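
As an illustration of the typosquatting check listed above, the following is a rough sketch that flags a gem name lying within a small edit distance of a popular name. It assumes a plain Levenshtein comparison; the function names `edit_distance` and `find_typosquat_targets` are hypothetical, and the detector in this PR may use different or additional heuristics.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def find_typosquat_targets(name: str, popular_names: set, max_distance: int = 1) -> list:
    """Return popular names that `name` may be imitating (hypothetical helper).

    A name that is itself in the popular list is never flagged.
    """
    candidate = name.lower()
    if candidate in popular_names:
        return []
    return sorted(
        popular
        for popular in popular_names
        if edit_distance(candidate, popular) <= max_distance
    )


# Illustrative stand-in for the contents of top_rubygems_packages.json.
popular = {"rails", "rake", "nokogiri"}
print(find_typosquat_targets("railz", popular))
print(find_typosquat_targets("rails", popular))
```

In practice the popular-name set would be loaded from the bundled top-gems list rather than hard-coded.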

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `guarddog/ecosystems.py` | Added RUBYGEMS enum and friendly name |
| `guarddog/scanners/rubygems_package_scanner.py` | Implements downloading and extracting .gem files with nested tar handling |
| `guarddog/scanners/rubygems_project_scanner.py` | Parses Gemfile.lock to extract gem dependencies |
| `guarddog/scanners/__init__.py` | Registers RubyGems scanners |
| `guarddog/analyzer/sourcecode/*.yml` | Six new Semgrep rules for Ruby malware detection |
| `guarddog/analyzer/sourcecode/__init__.py` | Maps ruby language to RUBYGEMS ecosystem |
| `guarddog/analyzer/metadata/rubygems/*.py` | Five metadata detectors for RubyGems |
| `guarddog/analyzer/metadata/__init__.py` | Registers RubyGems metadata rules |
| `guarddog/analyzer/metadata/resources/top_rubygems_packages.json` | List of 976 popular gems for typosquatting detection |
| `tests/core/test_rubygems_*.py` | Tests for RubyGems scanners |
| `tests/analyzer/sourcecode/*.rb` | Test files for Semgrep rules |
| `README.md` | Documentation updates for RubyGems support |
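
The Gemfile.lock parsing mentioned in the file summary can be sketched roughly as follows. This is an illustration under assumptions, not the project scanner from this PR: the function name `parse_gemfile_lock` is hypothetical, and the sketch relies on Bundler's convention that resolved gems appear in the `GEM` section indented by exactly four spaces as `name (version)`, with their own dependencies indented further.

```python
import re

# A resolved gem line: exactly four spaces, a name, then the version in parentheses.
SPEC_LINE = re.compile(r"^ {4}(\S+) \(([^)]+)\)$")


def parse_gemfile_lock(text: str) -> dict:
    """Map gem name to locked version for entries in the GEM section (hypothetical helper)."""
    gems = {}
    in_gem_section = False
    for line in text.splitlines():
        if line and not line.startswith(" "):
            # Unindented lines are section headers such as GEM, PLATFORMS, DEPENDENCIES.
            in_gem_section = line.strip() == "GEM"
            continue
        if in_gem_section:
            match = SPEC_LINE.match(line)
            if match:
                gems[match.group(1)] = match.group(2)
    return gems


sample = """\
GEM
  remote: https://rubygems.org/
  specs:
    rack (2.2.4)
    sinatra (3.0.2)
      rack (~> 2.2, >= 2.2.4)

PLATFORMS
  ruby

DEPENDENCIES
  sinatra
"""
print(parse_gemfile_lock(sample))
```

Lines indented by six spaces (the per-gem dependency constraints) are deliberately ignored, since they repeat gems that already have their own four-space entry.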
Comments suppressed due to low confidence (1)

guarddog/analyzer/metadata/rubygems/repository_integrity_mismatch.py:1

  • The error message only mentions 'data.tar.gz' but the code also checks for 'data.tar'. The message should mention both formats: 'data.tar.gz or data.tar not found in gem'.
import hashlib

@andrew
Contributor Author

andrew commented Jan 14, 2026

@sobregosodd I've addressed the one Copilot suggestion that made sense.

@sobregosodd
Contributor

@andrew the code looks really good! I'll take a stab at fixing my comments and tag you if I hit a roadblock.

@sobregosodd
Contributor

Thanks again for the contribution @andrew !

@sobregosodd sobregosodd merged commit 5b525b4 into DataDog:main Jan 15, 2026
10 checks passed
@andrew
Contributor Author

andrew commented Jan 15, 2026

Thanks for merging!
