-
Notifications
You must be signed in to change notification settings - Fork 1
openssf malicious packages integration #175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR integrates OpenSSF's malicious packages database into Trivy to detect known malicious packages. This adds a new "Malicious packages detection" rule at the highest severity level to identify packages containing malware, typosquatting attacks, and dependency confusion attacks.
- Implements OpenSSF scanner with OSV format parsing for malicious package detection
- Adds new
malicious_packagesrule with critical severity level - Integrates scanning into the main tool execution flow
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/tool/tool.go | Integrates OpenSSF scanner into main execution flow and adds new rule pattern |
| internal/tool/openssf_scanner.go | Core implementation of OpenSSF malicious packages scanner with OSV parsing |
| internal/docgen/rule.go | Adds rule definition for malicious packages detection |
| docs/patterns.json | Adds pattern configuration for malicious packages rule |
| docs/multiple-tests/pattern-malicious/* | Test files demonstrating malicious package detection |
| docs/description/* | Documentation files for the new pattern |
| Dockerfile | Copies OpenSSF cache data into container |
| .circleci/config.yml | Downloads OpenSSF database during CI build |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
…s to accelerate scanning.
… package-lock.json also
… reporting in openssf scanner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still more fixes and improvements but I think I've left enough comments for a first pass.
We'll move to those fixes after that. And obviously there are still tests to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete and add a line to .gitignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did not delete the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete and add a line to .gitignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did not delete the file.
internal/tool/openssf_scanner.go
Outdated
| if s.shouldAnalyzeFile(toolExecution.Files, target) { | ||
| return []codacy.Result{issue} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already logic to filter out results from files that should not be analyzed in tool.go. We should rely on that for consistency.
Or we want to filter when OSSF scanning, do it way earlier.
internal/tool/openssf_scanner.go
Outdated
| func (s *OpenSSFScanner) isPatternEnabled(patterns *[]codacy.Pattern, patternID string) bool { | ||
| if patterns == nil { | ||
| return false | ||
| } | ||
| for _, pattern := range *patterns { | ||
| if pattern.ID == patternID { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } | ||
|
|
||
| func (s *OpenSSFScanner) shouldAnalyzeFile(knownFiles *[]string, fileName string) bool { | ||
| if knownFiles == nil { | ||
| return true | ||
| } | ||
| for _, file := range *knownFiles { | ||
| if file == fileName { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given my suggestion above you can delete these.
internal/tool/openssf_scanner.go
Outdated
| } | ||
|
|
||
| func (s *OpenSSFScanner) findPackageLineNumber(sourceDir, fileName, pkgName string) int { | ||
| return fallbackSearchForLineNumber(sourceDir, fileName, pkgName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This simply uses the fallback. Please use the same logic in tool.go get the line number.
internal/tool/openssf_scanner.go
Outdated
| for _, affectedVersion := range affectedVersions { | ||
| if version == affectedVersion { | ||
| return true | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for _, affectedVersion := range affectedVersions { | |
| if version == affectedVersion { | |
| return true | |
| } | |
| } | |
| if slices.Contains(affectedVersions, version) { | |
| return true | |
| } |
internal/tool/openssf_scanner.go
Outdated
| if versionRange.Type == "SEMVER" || versionRange.Type == "ECOSYSTEM" { | ||
| if s.checkSingleVersionRange(version, versionRange.Events) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if versionRange.Type == "SEMVER" || versionRange.Type == "ECOSYSTEM" { | |
| if s.checkSingleVersionRange(version, versionRange.Events) { | |
| if (versionRange.Type == "SEMVER" || versionRange.Type == "ECOSYSTEM") && s.checkSingleVersionRange(version, versionRange.Events) { |
afsmeira
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still comments to address. And unit tests to add.
|
|
||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: delete empty lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did not delete the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did not delete the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You did not delete the file.
| BASE = os.environ.get('OPENSSF_OSV_DIR', 'openssf-cache/osv') | ||
| OUT = os.environ.get('OPENSSF_INDEX_OUT', 'openssf-index.json.gz') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the environment variables are not used right now, I would remove them and just use the hardcoded values.
| # Generate the OpenSSF index during build | ||
| COPY scripts/ scripts/ | ||
| COPY openssf-cache/osv/ openssf-cache/osv/ | ||
| RUN apk add --no-cache python3 && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was thinking that we should probably run this in the CI and just copy the resulting file to the image.
This avoids adding python3 to the image and I'm pretty sure our CI already has python available.
| with gzip.open(OUT, 'wt', encoding='utf-8') as gz: | ||
| json.dump(index, gz) | ||
|
|
||
| print(f"Wrote index: {OUT} (ecosystems={len(index)})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if print adds a new line, but if it doesn't please add one manually.
| if toolExecution.Files == nil { | ||
| return mapIssuesWithoutLineNumber(issues), nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this generated by AI? Do you actually know if this description is true?
| if err := s.ensureIndexLoaded(); err != nil { | ||
| log.Printf("Warning: Failed to load OpenSSF malicious packages database: %v", err) | ||
| return results | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this pattern is selected but we can't load the file, the should fail instead of returning empty results.
OpenSSF publishes a regularly updated list of 3rd party packages that contain malware. This PR adds a new Trivy rule "Malicious packages detection" at the highest severity level.
The OpenSSF DB is 227mb which is about 1/3rd the size of Trivy's vuln DB (734mb) -- I would hope this would not add a vast extra burden on processing times. Probably we would want/need to run malicious package detection both on commit and in the nightly SCA process, since packages can be retroactively designated as malicious.