Fix purl namespace corruption when path contains ".git" mid-path #1268

Open
spoorcc wants to merge 5 commits into main from claude/dfetch-bug-fixes-A5KXW

Conversation

@spoorcc

@spoorcc spoorcc commented Jun 15, 2026

Contributor

The generic VCS branch in _vcs_namespace_and_name used str.replace to
strip a ".git" suffix from the path, which silently removed every
occurrence anywhere in the path. A URL like
https://gitlab.com/group/foo.github/project.git produced the namespace
"group/foohub" instead of "group/foo.github". Use removesuffix so only
the trailing .git is stripped.
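The difference between the two string methods can be illustrated with a minimal sketch. The helper name `split_namespace_and_name` is hypothetical and is not the project's `_vcs_namespace_and_name`; it only assumes that the namespace is everything before the last path segment. `str.removesuffix` requires Python 3.9 or later.

```python
from urllib.parse import urlsplit


def split_namespace_and_name(url: str) -> tuple[str, str]:
    """Hypothetical helper: derive (namespace, name) from a repository URL."""
    path = urlsplit(url).path.strip("/")
    # Strip only a trailing ".git"; str.replace would remove every occurrence.
    path = path.removesuffix(".git")
    namespace, _, name = path.rpartition("/")
    return namespace, name


url = "https://gitlab.com/group/foo.github/project.git"

# The old behaviour: replace() also eats the ".git" inside "foo.github".
buggy_path = urlsplit(url).path.strip("/").replace(".git", "")

print(buggy_path)                     # group/foohub/project
print(split_namespace_and_name(url))  # ('group/foo.github', 'project')
```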

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed remote URL matching to enforce path boundaries, preventing false matches on shared prefixes
    • Corrected Version equality checks when comparing with non-Version objects
    • Removed trailing newlines from git remote URL output
    • Improved .git suffix stripping in repository URL parsing
    • Enhanced branch and tag identification from commit SHAs
  • Tests

    • Expanded test coverage for URL boundary validation, version comparisons, and Git operations

claude added 3 commits June 15, 2026 05:47
The generic VCS branch in _vcs_namespace_and_name used str.replace to
strip a ".git" suffix from the path, which silently removed every
occurrence anywhere in the path. A URL like
https://gitlab.com/group/foo.github/project.git produced the namespace
"group/foohub" instead of "group/foo.github". Use removesuffix so only
the trailing .git is stripped.

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA

The fallthrough guard only handled falsy operands, so comparing a Version
to any truthy non-Version (a plain tuple with matching arity, a string,
an int) raised AttributeError when trying to read other.tag. Replace the
guard with an isinstance check returning False — Version equality is
domain-specific (tag has precedence over branch/revision) and should
not silently fall back to tuple equality. Add an explicit __hash__ to
keep the class hashable under pyright's static rules.
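A minimal sketch of the guard described in this commit message, assuming Version is a tuple subclass with tag, branch and revision fields. The class below is an illustrative stand-in, not the code in dfetch/manifest/version.py, and its tuple-based hash follows the commit description only.

```python
from collections import namedtuple

# Assumed field layout; the real class may differ.
_VersionBase = namedtuple(
    "_VersionBase", ["tag", "branch", "revision"], defaults=("", "", "")
)


class Version(_VersionBase):
    """Illustrative stand-in for the project's Version class."""

    __slots__ = ()

    def __eq__(self, other: object) -> bool:
        # Any non-Version operand is simply unequal; other.tag is never read.
        if not isinstance(other, Version):
            return False
        if self.tag or other.tag:
            return self.tag == other.tag
        return self.branch == other.branch and self.revision == other.revision

    def __hash__(self) -> int:
        # A class body that defines __eq__ without __hash__ becomes unhashable,
        # so the hash is restored explicitly.
        return tuple.__hash__(self)


version = Version(tag="1.0")
print(version == ("1.0", "", ""))     # False
print(version == "1.0")               # False
print(version == Version(tag="1.0"))  # True
```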

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA

target.startswith(remote_base) matched any URL whose host+path happened
to share a leading string with the remote, regardless of path component
boundaries. A remote base https://github.com/myorg therefore matched an
unrelated URL https://github.com/myorg-private/repo, and dfetch add
would attach the wrong remote (and compute a bogus repo-path). Require
either an exact match or a "/" path boundary.
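The boundary rule can be sketched as follows. The function name `url_matches_remote` is hypothetical and is not the project's `find_remote_for_url`; it only shows the "exact match or `/` boundary" condition, and it assumes a trailing slash on the remote base should be ignored.

```python
def url_matches_remote(target: str, remote_base: str) -> bool:
    """Hypothetical check: is target the remote itself or a path below it?"""
    base = remote_base.rstrip("/")
    # A bare prefix test would accept "myorg-private" for base "myorg";
    # requiring "/" after the base enforces a path-component boundary.
    return target == base or target.startswith(base + "/")


remote = "https://github.com/myorg"
print(url_matches_remote("https://github.com/myorg-private/repo", remote))  # False
print(url_matches_remote("https://github.com/myorg/repo", remote))          # True
print(url_matches_remote("https://github.com/myorg", remote))               # True
```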

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48e0d5d9-a0bb-4269-8055-79df8d5814b7

📥 Commits

Reviewing files that changed from the base of the PR and between 91b2a19 and 29590cf.

📒 Files selected for processing (3)
  • CHANGELOG.rst
  • dfetch/vcs/git.py
  • tests/test_git_vcs.py

Walkthrough

Four independent bug fixes for dfetch 0.14.1: Version.__eq__ gains an isinstance type guard and a new __hash__ method is added; find_remote_for_url requires a /-delimited path boundary instead of a plain prefix; purl.py uses removesuffix to strip only a trailing .git; and git.py fixes SHA scan completeness and strips whitespace from get_remote_url output.

Changes

dfetch 0.14.1 Bug Fixes

| Layer | File(s) | Summary |
|---|---|---|
| Version equality type guard and hash restoration | dfetch/manifest/version.py, tests/test_project_version.py | Version.__eq__ now uses isinstance(other, Version) instead of a generic falsy check; a new __hash__ delegates to tuple.__hash__. Tests verify False is returned for non-Version operands and that instances remain usable in sets. |
| Remote URL path-boundary matching fix | dfetch/manifest/manifest.py, tests/test_add.py | find_remote_for_url now requires the remote base URL to be either exactly equal to or a /-delimited prefix of the candidate URL. Tests assert no match for a shared-org-prefix URL and correct matches for the exact remote and its subpaths. |
| PURL trailing .git suffix normalization | dfetch/util/purl.py, tests/test_purl.py | Generic VCS path normalization switches from replace to removesuffix(".git"). A new parametrized test with a .git-embedded path component verifies the namespace/name are derived correctly. |
| Git SHA scan completeness and remote URL whitespace strip | dfetch/vcs/git.py, tests/test_git_vcs.py, CHANGELOG.rst | _find_branch_tip_or_tag_from_sha adds an early-return for empty rev, replaces break with continue to scan all refs, and can now return both a tag and a branch from one call. get_remote_url adds .strip() to remove trailing newlines. Tests cover all three behaviors. Changelog entry documents all 0.14.1 fixes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the primary focus of the PR: fixing purl namespace corruption caused by mid-path '.git' substrings. While the PR includes additional bug fixes, the title correctly identifies the main issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dfetch/manifest/version.py`:
- Around line 22-25: The `__hash__` method does not match the semantics of the
`__eq__` method in the Version class. The `__eq__` method prioritizes tag
comparison—if either version has a tag, only tags are compared, ignoring branch
and revision. However, `__hash__` currently hashes all three fields
unconditionally. This violates Python's hash/equality contract where equal
objects must have equal hashes. Update the `__hash__` method to hash only the
fields that determine equality: if the tag exists, hash only the tag; otherwise,
hash the tuple of branch and revision. This ensures two versions that are equal
under `__eq__` will always produce the same hash value.

In `@tests/test_project_version.py`:
- Around line 220-223: The test_version_remains_hashable function only verifies
that a single Version instance can be in a set, but doesn't validate the
critical __eq__/__hash__ contract for equal-but-distinct instances. Strengthen
the test by creating two separate Version objects with identical tag values
(both "1.0"), asserting they compare as equal, verifying they have the same hash
value, and confirming that a set containing one Version instance will correctly
identify the other equal Version instance as a member (using the 'in' operator).
This will catch violations of Python's rule that equal objects must have equal
hashes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 541af782-9a2f-47f8-ad81-cb804f1f2704

📥 Commits

Reviewing files that changed from the base of the PR and between e35c251 and 91b2a19.

📒 Files selected for processing (9)
  • CHANGELOG.rst
  • dfetch/manifest/manifest.py
  • dfetch/manifest/version.py
  • dfetch/util/purl.py
  • dfetch/vcs/git.py
  • tests/test_add.py
  • tests/test_git_vcs.py
  • tests/test_project_version.py
  • tests/test_purl.py

Comment on lines 22 to 25
        if self.tag or other.tag:
            return bool(self.tag == other.tag)

        return bool(self.branch == other.branch and self.revision == other.revision)

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: confirm equality/hash logic sites.
rg -n "def __eq__|if self.tag or other.tag|def __hash__|tuple.__hash__" dfetch/manifest/version.py

Repository: dfetch-org/dfetch

Length of output: 216


🏁 Script executed:

cat -n dfetch/manifest/version.py

Repository: dfetch-org/dfetch

Length of output: 1672


Fix __hash__ to match __eq__ semantics.

The __eq__ method (lines 22-25) treats tag as dominant: if either object has a tag, only tags are compared. However, __hash__ (line 29) hashes all three tuple fields. This breaks the hash/equality contract—two equal versions can hash differently, causing dict/set lookup failures.

For example, Version(tag="1.0", branch="", revision="") equals Version(tag="1.0", branch="main", revision="abc123") but produces different hashes.

Update __hash__ to hash only the fields that determine equality:

Suggested patch
     def __hash__(self) -> int:
-        """Hash based on the underlying tuple value."""
-        return tuple.__hash__(self)
+        """Hash consistent with Version equality semantics."""
+        if self.tag:
+            return hash(("tag", self.tag))
+        return hash(("branch_revision", self.branch, self.revision))
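The contract the reviewer refers to can be checked with a small self-contained sketch. The class below is an illustrative stand-in that assumes a tuple-based Version with tag, branch and revision fields; it combines the tag-dominant `__eq__` quoted above with the suggested `__hash__`, and is not the project's actual file.

```python
from collections import namedtuple

# Assumed field layout; the real class may differ.
_Fields = namedtuple("_Fields", ["tag", "branch", "revision"], defaults=("", "", ""))


class Version(_Fields):
    """Illustrative stand-in with a tag-dominant __eq__ and a matching __hash__."""

    __slots__ = ()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Version):
            return False
        if self.tag or other.tag:
            return self.tag == other.tag
        return self.branch == other.branch and self.revision == other.revision

    def __hash__(self) -> int:
        # Hash only the fields that __eq__ compares, so equal objects share a hash.
        if self.tag:
            return hash(("tag", self.tag))
        return hash(("branch_revision", self.branch, self.revision))


a = Version(tag="1.0")
b = Version(tag="1.0", branch="main", revision="abc123")

print(a == b)              # True
print(hash(a) == hash(b))  # True
print(b in {a})            # True

# For comparison: the tuple-based hash covers all three fields, so these two
# equal objects would, in practice, land in different hash buckets.
print(tuple.__hash__(a) == tuple.__hash__(b))
```

Prefixing the hashed tuples with a label ("tag" or "branch_revision") keeps a tagged version from sharing a hash with an untagged one by construction, although a collision there would be harmless since `__eq__` already treats them as unequal.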

Comment on lines +220 to +223
def test_version_remains_hashable():
    """Defining __eq__ must not break hashing/set membership."""
    version = Version(tag="1.0")
    assert version in {version}

Contributor

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Strengthen the hashability test to cover equal-but-distinct instances.

The current assertion only proves an object is hashable with itself. It won’t catch __eq__/__hash__ contract mismatches for two different Version objects that compare equal.

Suggested test extension
 def test_version_remains_hashable():
     """Defining __eq__ must not break hashing/set membership."""
     version = Version(tag="1.0")
     assert version in {version}
+
+    left = Version(tag="1.0", branch="main", revision="abc")
+    right = Version(tag="1.0", branch="develop", revision="xyz")
+    assert left == right
+    assert hash(left) == hash(right)
+    assert right in {left}

claude added 2 commits June 15, 2026 19:55
sha.startswith(rev) matched the first reference unconditionally when
rev was an empty string, returning a bogus branch/tag tuple for any
caller that passed in an uninitialised SHA. Return early when rev is
empty.
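The root cause is that `str.startswith("")` is true for every string. A minimal sketch of the early return follows; the function name `find_branch_or_tag`, the shape of the `refs` mapping and the SHA values are all assumptions made for illustration and do not reproduce `_find_branch_tip_or_tag_from_sha`.

```python
def find_branch_or_tag(refs: dict[str, str], rev: str) -> tuple[str, str]:
    """Hypothetical sketch: return (branch, tag) whose SHA starts with rev."""
    if not rev:
        # An empty prefix matches every SHA, so report "nothing found" instead.
        return "", ""

    branch, tag = "", ""
    for ref, sha in refs.items():
        if not sha.startswith(rev):
            continue
        # Keep scanning so that both a branch and a tag can be reported.
        if ref.startswith("refs/tags/"):
            tag = tag or ref.removeprefix("refs/tags/")
        elif ref.startswith("refs/heads/"):
            branch = branch or ref.removeprefix("refs/heads/")
    return branch, tag


# Made-up refs and SHAs, for illustration only.
refs = {
    "refs/heads/main": "a1b2c3d4",
    "refs/tags/v1.0": "a1b2c3d4",
    "refs/heads/feature": "e5f6a7b8",
}

print("a1b2c3d4".startswith(""))         # True
print(find_branch_or_tag(refs, ""))      # ('', '')
print(find_branch_or_tag(refs, "a1b2"))  # ('main', 'v1.0')
```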

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA

git remote get-url emits the URL with a trailing newline. The decoded
result was returned verbatim, leaving the newline embedded in the URL
string used by submodule URL resolution and downstream string
operations.
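The effect of the missing strip can be shown without invoking git at all. In this sketch the helper name `decode_remote_url` and the sample URL are assumptions; the bytes literal stands in for the command's captured standard output.

```python
def decode_remote_url(stdout: bytes) -> str:
    """Hypothetical helper: turn raw command output into a clean URL string."""
    return stdout.decode().strip()


# Stand-in for the captured output of `git remote get-url`.
raw = b"https://github.com/dfetch-org/dfetch.git\n"

print(repr(raw.decode()))            # 'https://github.com/dfetch-org/dfetch.git\n'
print(repr(decode_remote_url(raw)))  # 'https://github.com/dfetch-org/dfetch.git'

# With the newline still attached, suffix-based logic silently stops matching.
print(raw.decode().endswith(".git"))            # False
print(decode_remote_url(raw).endswith(".git"))  # True
```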

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
spoorcc force-pushed the claude/dfetch-bug-fixes-A5KXW branch from 91b2a19 to 29590cf on June 15, 2026 19:57
@spoorcc

spoorcc commented Jun 15, 2026

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.
