Fix purl namespace corruption when path contains ".git" mid-path#1268
Fix purl namespace corruption when path contains ".git" mid-path#1268spoorcc wants to merge 5 commits into
Conversation
The generic VCS branch in _vcs_namespace_and_name used str.replace to strip a ".git" suffix from the path, which silently removed every occurrence anywhere in the path. A URL like https://gitlab.com/group/foo.github/project.git produced the namespace "group/foohub" instead of "group/foo.github". Use removesuffix so only the trailing .git is stripped. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
The fallthrough guard only handled falsy operands, so comparing a Version to any truthy non-Version (a plain tuple with matching arity, a string, an int) raised AttributeError when trying to read other.tag. Replace the guard with an isinstance check returning False — Version equality is domain-specific (tag has precedence over branch/revision) and should not silently fall back to tuple equality. Add an explicit __hash__ to keep the class hashable under pyright's static rules. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
target.startswith(remote_base) matched any URL whose host+path happened to share a leading string with the remote, regardless of path component boundaries. A remote base https://github.com/myorg therefore matched an unrelated URL https://github.com/myorg-private/repo, and dfetch add would attach the wrong remote (and compute a bogus repo-path). Require either an exact match or a "/" path boundary. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
|
Warning Review limit reached
More reviews will be available in 30 minutes and 45 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughFour independent bug fixes for dfetch 0.14.1: Changesdfetch 0.14.1 Bug Fixes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dfetch/manifest/version.py`:
- Around line 22-25: The `__hash__` method does not match the semantics of the
`__eq__` method in the Version class. The `__eq__` method prioritizes tag
comparison—if either version has a tag, only tags are compared, ignoring branch
and revision. However, `__hash__` currently hashes all three fields
unconditionally. This violates Python's hash/equality contract where equal
objects must have equal hashes. Update the `__hash__` method to hash only the
fields that determine equality: if the tag exists, hash only the tag; otherwise,
hash the tuple of branch and revision. This ensures two versions that are equal
under `__eq__` will always produce the same hash value.
In `@tests/test_project_version.py`:
- Around line 220-223: The test_version_remains_hashable function only verifies
that a single Version instance can be in a set, but doesn't validate the
critical __eq__/__hash__ contract for equal-but-distinct instances. Strengthen
the test by creating two separate Version objects with identical tag values
(both "1.0"), asserting they compare as equal, verifying they have the same hash
value, and confirming that a set containing one Version instance will correctly
identify the other equal Version instance as a member (using the 'in' operator).
This will catch violations of Python's rule that equal objects must have equal
hashes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 541af782-9a2f-47f8-ad81-cb804f1f2704
📒 Files selected for processing (9)
CHANGELOG.rstdfetch/manifest/manifest.pydfetch/manifest/version.pydfetch/util/purl.pydfetch/vcs/git.pytests/test_add.pytests/test_git_vcs.pytests/test_project_version.pytests/test_purl.py
| if self.tag or other.tag: | ||
| return bool(self.tag == other.tag) | ||
|
|
||
| return bool(self.branch == other.branch and self.revision == other.revision) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: confirm equality/hash logic sites.
rg -n "def __eq__|if self.tag or other.tag|def __hash__|tuple.__hash__" dfetch/manifest/version.pyRepository: dfetch-org/dfetch
Length of output: 216
🏁 Script executed:
cat -n dfetch/manifest/version.pyRepository: dfetch-org/dfetch
Length of output: 1672
Fix __hash__ to match __eq__ semantics.
The __eq__ method (lines 22-25) treats tag as dominant: if either object has a tag, only tags are compared. However, __hash__ (line 29) hashes all three tuple fields. This breaks the hash/equality contract—two equal versions can hash differently, causing dict/set lookup failures.
For example, Version(tag="1.0", branch="", revision="") equals Version(tag="1.0", branch="main", revision="abc123") but produces different hashes.
Update __hash__ to hash only the fields that determine equality:
Suggested patch
def __hash__(self) -> int:
- """Hash based on the underlying tuple value."""
- return tuple.__hash__(self)
+ """Hash consistent with Version equality semantics."""
+ if self.tag:
+ return hash(("tag", self.tag))
+ return hash(("branch_revision", self.branch, self.revision))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dfetch/manifest/version.py` around lines 22 - 25, The `__hash__` method does
not match the semantics of the `__eq__` method in the Version class. The
`__eq__` method prioritizes tag comparison—if either version has a tag, only
tags are compared, ignoring branch and revision. However, `__hash__` currently
hashes all three fields unconditionally. This violates Python's hash/equality
contract where equal objects must have equal hashes. Update the `__hash__`
method to hash only the fields that determine equality: if the tag exists, hash
only the tag; otherwise, hash the tuple of branch and revision. This ensures two
versions that are equal under `__eq__` will always produce the same hash value.
| def test_version_remains_hashable(): | ||
| """Defining __eq__ must not break hashing/set membership.""" | ||
| version = Version(tag="1.0") | ||
| assert version in {version} |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Strengthen the hashability test to cover equal-but-distinct instances.
The current assertion only proves an object is hashable with itself. It won’t catch __eq__/__hash__ contract mismatches for two different Version objects that compare equal.
Suggested test extension
def test_version_remains_hashable():
"""Defining __eq__ must not break hashing/set membership."""
version = Version(tag="1.0")
assert version in {version}
+
+ left = Version(tag="1.0", branch="main", revision="abc")
+ right = Version(tag="1.0", branch="develop", revision="xyz")
+ assert left == right
+ assert hash(left) == hash(right)
+ assert right in {left}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_project_version.py` around lines 220 - 223, The
test_version_remains_hashable function only verifies that a single Version
instance can be in a set, but doesn't validate the critical __eq__/__hash__
contract for equal-but-distinct instances. Strengthen the test by creating two
separate Version objects with identical tag values (both "1.0"), asserting they
compare as equal, verifying they have the same hash value, and confirming that a
set containing one Version instance will correctly identify the other equal
Version instance as a member (using the 'in' operator). This will catch
violations of Python's rule that equal objects must have equal hashes.
sha.startswith(rev) matched the first reference unconditionally when rev was an empty string, returning a bogus branch/tag tuple for any caller that passed in an uninitialised SHA. Return early when rev is empty. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
git remote get-url emits the URL with a trailing newline. The decoded result was returned verbatim, leaving the newline embedded in the URL string used by submodule URL resolution and downstream string operations. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
91b2a19 to
29590cf
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
The generic VCS branch in _vcs_namespace_and_name used str.replace to
strip a ".git" suffix from the path, which silently removed every
occurrence anywhere in the path. A URL like
https://gitlab.com/group/foo.github/project.git produced the namespace
"group/foohub" instead of "group/foo.github". Use removesuffix so only
the trailing .git is stripped.
https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests