Fix purl namespace corruption when path contains ".git" mid-path by spoorcc · Pull Request #1268 · dfetch-org/dfetch

spoorcc · 2026-06-15T05:53:13Z

The generic VCS branch in _vcs_namespace_and_name used str.replace to
strip a ".git" suffix from the path, which silently removed every
occurrence anywhere in the path. A URL like
https://gitlab.com/group/foo.github/project.git produced the namespace
"group/foohub" instead of "group/foo.github". Use removesuffix so only
the trailing .git is stripped.

https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
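
The difference between the two string methods can be reproduced in isolation. The snippet below is a minimal sketch, not the code in dfetch/util/purl.py: the helper names `strip_git_replace` and `strip_git_suffix` are hypothetical, and it works only on the path part of the example URL from the description. `str.removesuffix` requires Python 3.9 or later.

```python
def strip_git_replace(path: str) -> str:
    """Old behaviour: removes every ".git" occurrence, not only the suffix."""
    return path.replace(".git", "")


def strip_git_suffix(path: str) -> str:
    """New behaviour: removes ".git" only when it ends the path."""
    return path.removesuffix(".git")


path = "group/foo.github/project.git"
print(strip_git_replace(path))  # group/foohub/project
print(strip_git_suffix(path))   # group/foo.github/project
```

The namespace "group/foohub" quoted above is the leading part of the first result.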

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed remote URL matching to enforce path boundaries, preventing false matches on shared prefixes
- Corrected Version equality checks when comparing with non-Version objects
- Removed trailing newlines from git remote URL output
- Improved .git suffix stripping in repository URL parsing
- Enhanced branch and tag identification from commit SHAs
Tests
- Expanded test coverage for URL boundary validation, version comparisons, and Git operations

The fallthrough guard only handled falsy operands, so comparing a Version to any truthy non-Version (a plain tuple with matching arity, a string, an int) raised AttributeError when trying to read other.tag. Replace the guard with an isinstance check returning False — Version equality is domain-specific (tag has precedence over branch/revision) and should not silently fall back to tuple equality. Add an explicit __hash__ to keep the class hashable under pyright's static rules. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
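
A type guard of this kind might look like the sketch below. `SketchVersion` is a hypothetical stand-in, assuming a NamedTuple with `tag`, `branch` and `revision` fields that default to empty strings; it is not the class in dfetch/manifest/version.py.

```python
from typing import NamedTuple


class SketchVersion(NamedTuple):
    """Hypothetical stand-in for Version; fields and defaults are assumptions."""

    tag: str = ""
    branch: str = ""
    revision: str = ""

    def __eq__(self, other: object) -> bool:
        # Check the type first, so other.tag is never read from a foreign object.
        if not isinstance(other, SketchVersion):
            return False
        # A tag on either side takes precedence over branch/revision.
        if self.tag or other.tag:
            return self.tag == other.tag
        return self.branch == other.branch and self.revision == other.revision

    def __hash__(self) -> int:
        # Explicit hash, delegating to the tuple hash as the commit is described
        # to do; its consistency with __eq__ is questioned later in the review.
        return tuple.__hash__(self)


print(SketchVersion(tag="1.0") == ("1.0", "", ""))                          # False
print(SketchVersion(tag="1.0") == "1.0")                                    # False
print(SketchVersion(tag="1.0") == SketchVersion(tag="1.0", branch="main"))  # True
```

Without the `isinstance` check, the second comparison would try to read `.tag` from a `str` and raise `AttributeError`.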

target.startswith(remote_base) matched any URL whose host+path happened to share a leading string with the remote, regardless of path component boundaries. A remote base https://github.com/myorg therefore matched an unrelated URL https://github.com/myorg-private/repo, and dfetch add would attach the wrong remote (and compute a bogus repo-path). Require either an exact match or a "/" path boundary. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
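
The boundary rule can be sketched as follows. `matches_remote` is a hypothetical helper, not `find_remote_for_url` itself, and stripping a trailing "/" from the remote base is an assumption added to keep the sketch robust.

```python
def matches_remote(target: str, remote_base: str) -> bool:
    """Return True if target is the remote base itself or a path below it."""
    base = remote_base.rstrip("/")  # assumption: tolerate a trailing slash
    return target == base or target.startswith(base + "/")


remote = "https://github.com/myorg"
print(matches_remote("https://github.com/myorg-private/repo", remote))  # False
print(matches_remote("https://github.com/myorg/repo", remote))          # True
print(matches_remote("https://github.com/myorg", remote))               # True
```

A plain `target.startswith(remote)` would return True for all three URLs, including the unrelated `myorg-private` one.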

coderabbitai · 2026-06-15T05:53:24Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 48e0d5d9-a0bb-4269-8055-79df8d5814b7

📥 Commits

Reviewing files that changed from the base of the PR and between 91b2a19 and 29590cf.

📒 Files selected for processing (3)

CHANGELOG.rst
dfetch/vcs/git.py
tests/test_git_vcs.py

Walkthrough

Four independent bug fixes for dfetch 0.14.1: Version.__eq__ gains an isinstance type guard and a new __hash__ method is added; find_remote_for_url requires a /-delimited path boundary instead of a plain prefix; purl.py uses removesuffix to strip only a trailing .git; and git.py fixes SHA scan completeness and strips whitespace from get_remote_url output.

Changes

dfetch 0.14.1 Bug Fixes

Layer / File(s)	Summary
Version equality type guard and hash restoration `dfetch/manifest/version.py`, `tests/test_project_version.py`	`Version.__eq__` now uses `isinstance(other, Version)` instead of a generic falsy check; a new `__hash__` delegates to `tuple.__hash__`. Tests verify `False` is returned for non-`Version` operands and that instances remain usable in sets.
Remote URL path-boundary matching fix `dfetch/manifest/manifest.py`, `tests/test_add.py`	`find_remote_for_url` now requires the remote base URL to be either exactly equal to or a `/`-delimited prefix of the candidate URL. Tests assert no match for a shared-org-prefix URL and correct matches for the exact remote and its subpaths.
PURL trailing `.git` suffix normalization `dfetch/util/purl.py`, `tests/test_purl.py`	Generic VCS path normalization switches from `replace` to `removesuffix(".git")`. A new parametrized test with a `.git`-embedded path component verifies the namespace/name are derived correctly.
Git SHA scan completeness and remote URL whitespace strip `dfetch/vcs/git.py`, `tests/test_git_vcs.py`, `CHANGELOG.rst`	`_find_branch_tip_or_tag_from_sha` adds an early-return for empty `rev`, replaces `break` with `continue` to scan all refs, and can now return both a tag and a branch from one call. `get_remote_url` adds `.strip()` to remove trailing newlines. Tests cover all three behaviors. Changelog entry documents all 0.14.1 fixes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the primary focus of the PR: fixing purl namespace corruption caused by mid-path '.git' substrings. While the PR includes additional bug fixes, the title correctly identifies the main issue.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dfetch/manifest/version.py`:
- Around line 22-25: The `__hash__` method does not match the semantics of the
`__eq__` method in the Version class. The `__eq__` method prioritizes tag
comparison—if either version has a tag, only tags are compared, ignoring branch
and revision. However, `__hash__` currently hashes all three fields
unconditionally. This violates Python's hash/equality contract where equal
objects must have equal hashes. Update the `__hash__` method to hash only the
fields that determine equality: if the tag exists, hash only the tag; otherwise,
hash the tuple of branch and revision. This ensures two versions that are equal
under `__eq__` will always produce the same hash value.

In `@tests/test_project_version.py`:
- Around line 220-223: The test_version_remains_hashable function only verifies
that a single Version instance can be in a set, but doesn't validate the
critical __eq__/__hash__ contract for equal-but-distinct instances. Strengthen
the test by creating two separate Version objects with identical tag values
(both "1.0"), asserting they compare as equal, verifying they have the same hash
value, and confirming that a set containing one Version instance will correctly
identify the other equal Version instance as a member (using the 'in' operator).
This will catch violations of Python's rule that equal objects must have equal
hashes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 541af782-9a2f-47f8-ad81-cb804f1f2704

📥 Commits

Reviewing files that changed from the base of the PR and between e35c251 and 91b2a19.

📒 Files selected for processing (9)

CHANGELOG.rst
dfetch/manifest/manifest.py
dfetch/manifest/version.py
dfetch/util/purl.py
dfetch/vcs/git.py
tests/test_add.py
tests/test_git_vcs.py
tests/test_project_version.py
tests/test_purl.py

coderabbitai · 2026-06-15T06:00:38Z

        if self.tag or other.tag:
            return bool(self.tag == other.tag)

        return bool(self.branch == other.branch and self.revision == other.revision)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: confirm equality/hash logic sites.
rg -n "def __eq__|if self.tag or other.tag|def __hash__|tuple.__hash__" dfetch/manifest/version.py

Repository: dfetch-org/dfetch

Length of output: 216

🏁 Script executed:

cat -n dfetch/manifest/version.py

Repository: dfetch-org/dfetch

Length of output: 1672

Fix __hash__ to match __eq__ semantics.

The __eq__ method (lines 22-25) treats tag as dominant: if either object has a tag, only tags are compared. However, __hash__ (line 29) hashes all three tuple fields. This breaks the hash/equality contract—two equal versions can hash differently, causing dict/set lookup failures.

For example, Version(tag="1.0", branch="", revision="") equals Version(tag="1.0", branch="main", revision="abc123") but produces different hashes.

Update __hash__ to hash only the fields that determine equality:

Suggested patch

     def __hash__(self) -> int:
-        """Hash based on the underlying tuple value."""
-        return tuple.__hash__(self)
+        """Hash consistent with Version equality semantics."""
+        if self.tag:
+            return hash(("tag", self.tag))
+        return hash(("branch_revision", self.branch, self.revision))

coderabbitai · 2026-06-15T06:00:38Z

+def test_version_remains_hashable():
+    """Defining __eq__ must not break hashing/set membership."""
+    version = Version(tag="1.0")
+    assert version in {version}


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Strengthen the hashability test to cover equal-but-distinct instances.

The current assertion only proves an object is hashable with itself. It won’t catch __eq__/__hash__ contract mismatches for two different Version objects that compare equal.

Suggested test extension

 def test_version_remains_hashable():
     """Defining __eq__ must not break hashing/set membership."""
     version = Version(tag="1.0")
     assert version in {version}
+
+    left = Version(tag="1.0", branch="main", revision="abc")
+    right = Version(tag="1.0", branch="develop", revision="xyz")
+    assert left == right
+    assert hash(left) == hash(right)
+    assert right in {left}

sha.startswith(rev) matched the first reference unconditionally when rev was an empty string, returning a bogus branch/tag tuple for any caller that passed in an uninitialised SHA. Return early when rev is empty. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
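
The empty-string pitfall is easy to show on its own, because every string starts with "". The sketch below uses a hypothetical `find_ref_for_sha` over an assumed list of (sha, ref name) pairs with made-up SHAs; it is much simpler than `_find_branch_tip_or_tag_from_sha` and only illustrates the early return.

```python
from typing import List, Optional, Tuple


def find_ref_for_sha(refs: List[Tuple[str, str]], rev: str) -> Optional[str]:
    """Return the first ref name whose SHA starts with rev, or None."""
    if not rev:
        # Every string starts with "", so an empty rev would match the first ref.
        return None
    for sha, name in refs:
        if sha.startswith(rev):
            return name
    return None


refs = [("1a2b3c4d", "refs/heads/main"), ("9f8e7d6c", "refs/tags/v1.0")]
print("1a2b3c4d".startswith(""))       # True
print(find_ref_for_sha(refs, ""))      # None
print(find_ref_for_sha(refs, "9f8e"))  # refs/tags/v1.0
```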

git remote get-url emits the URL with a trailing newline. The decoded result was returned verbatim, leaving the newline embedded in the URL string used by submodule URL resolution and downstream string operations. https://claude.ai/code/session_01KKvrvnVvsBChohuxbRRmzA
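
The effect of the missing `.strip()` can be illustrated without running git. In the sketch below, `raw_output` is an assumed sample of the bytes such a command would print; the URL is only an example value.

```python
# Assumed sample of the command's raw stdout, including the trailing newline.
raw_output = b"https://github.com/dfetch-org/dfetch.git\n"

unstripped = raw_output.decode()
stripped = raw_output.decode().strip()

print(repr(unstripped))             # 'https://github.com/dfetch-org/dfetch.git\n'
print(repr(stripped))               # 'https://github.com/dfetch-org/dfetch.git'
print(unstripped.endswith(".git"))  # False
print(stripped.endswith(".git"))    # True
```

The embedded newline makes suffix checks and string comparisons on the URL fail, which is the class of downstream problem the commit message refers to.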

spoorcc · 2026-06-15T20:14:55Z

@coderabbitai review

coderabbitai · 2026-06-15T20:15:01Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

claude added 3 commits June 15, 2026 05:47

spoorcc temporarily deployed to testpypi June 15, 2026 05:54 — with GitHub Actions Inactive

coderabbitai Bot reviewed Jun 15, 2026

View reviewed changes

claude added 2 commits June 15, 2026 19:55

spoorcc force-pushed the claude/dfetch-bug-fixes-A5KXW branch from 91b2a19 to 29590cf Compare June 15, 2026 19:57

spoorcc temporarily deployed to testpypi June 15, 2026 20:00 — with GitHub Actions Inactive

spoorcc wants to merge 5 commits into main from claude/dfetch-bug-fixes-A5KXW
