fix windows updater zip root path normalization#8019
fix windows updater zip root path normalization#8019zouyonghe wants to merge 5 commits intoAstrBotDevs:masterfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request improves path normalization during the unzipping process in both PluginUpdator and RepoZipUpdator by utilizing os.path.normpath and a new _normalize_archive_root_dir method. These changes, supported by new test cases, specifically address path handling issues on Windows, including support for extended length paths. Review feedback suggests narrowing the exception handling from BaseException to Exception during the cleanup phase to avoid catching system-exiting signals like KeyboardInterrupt.
| shutil.rmtree(os.path.join(target_dir, update_dir), onerror=on_error) | ||
| shutil.rmtree(update_root_path, onerror=on_error) | ||
| os.remove(zip_path) | ||
| except BaseException: |
There was a problem hiding this comment.
…-updater-path-normalization # Conflicts: # astrbot/core/star/updator.py
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Consider moving
_normalize_archive_root_dirinto a shared base class or utility used by both repo and plugin updaters so the normalization logic is defined in one place and you don’t rely on each class defining its own helper. - In both
unzip_fileimplementations you repeatedly callos.path.normpathinside the loop; you could simplify and reduce redundancy by computing normalizedPathobjects once (e.g., viapathlib.Path) and reusing them for item paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider moving `_normalize_archive_root_dir` into a shared base class or utility used by both repo and plugin updaters so the normalization logic is defined in one place and you don’t rely on each class defining its own helper.
- In both `unzip_file` implementations you repeatedly call `os.path.normpath` inside the loop; you could simplify and reduce redundancy by computing normalized `Path` objects once (e.g., via `pathlib.Path`) and reusing them for item paths.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
RepoZipUpdator._normalize_archive_root_dir, referencingRepoZipUpdator._normalize_archive_pathdirectly makes it harder to override in subclasses; consider using a@classmethodwithcls._normalize_archive_pathor extracting these helpers to a shared module-level function for better extensibility.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `RepoZipUpdator._normalize_archive_root_dir`, referencing `RepoZipUpdator._normalize_archive_path` directly makes it harder to override in subclasses; consider using a `@classmethod` with `cls._normalize_archive_path` or extracting these helpers to a shared module-level function for better extensibility.
## Individual Comments
### Comment 1
<location path="tests/test_updator_socks.py" line_range="429-438" />
<code_context>
+ archive_root = "AstrBotDevs-AstrBot-39386ee/"
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for archive roots without a trailing slash or with nested folder names to fully exercise normalization
These tests currently assume `namelist()[0]` is a single-level folder with a trailing slash (e.g., `"AstrBotDevs-AstrBot-39386ee/"`). Since `_normalize_archive_root_dir` is meant to harden path handling, please add coverage for: (1) `"AstrBotDevs-AstrBot-39386ee"` (no trailing slash), (2) nested roots like `"owner-repo-branch/subdir/"`, and (3) a case where the root normalizes to `"."` so the empty-string fallback is exercised. This will better lock in the normalization semantics and prevent regressions in zip root handling.
Suggested implementation:
```python
@pytest.mark.parametrize(
"archive_root",
[
# Trailing slash single-level folder
"AstrBotDevs-AstrBot-39386ee/",
# Same folder without trailing slash
"AstrBotDevs-AstrBot-39386ee",
# Nested archive root
"owner-repo-branch/subdir/",
# Archive root that normalizes to "." so the empty-string fallback is exercised
".",
],
)
def test_repo_unzip_file_normalizes_windows_extended_length_paths(
monkeypatch: pytest.MonkeyPatch,
archive_root: str,
) -> None:
```
```python
target_dir = r"\\?\\C:\\Users\\admin\\AppData\\Local\\AstrBot\\backend\\app"
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/zip_updator.py" line_range="234" />
<code_context>
return author, repo, branch
raise ValueError("无效的 GitHub URL")
+ @staticmethod
+ def _normalize_archive_path(*parts: str) -> str:
+ return os.path.normpath(os.path.join(*parts))
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the new path-normalization logic by removing the extra helper and intermediate variables while preserving the updated handling of '.' and archive roots.
The new behavior around `"."` and path normalization is useful, but you can get it with less indirection by:
* Dropping `_normalize_archive_path`
* Keeping a single root-normalization helper
* Using direct `os.path.join` calls instead of multiple intermediate path variables
For example:
```python
@staticmethod
def _normalize_archive_root_dir(entry: str) -> str:
"""Normalize the first zip entry; treat '.' as empty root."""
update_dir = os.path.normpath(entry)
return "" if update_dir == "." else update_dir
```
Then `unzip_file` can stay close to the original structure while preserving your new behavior:
```python
def unzip_file(self, zip_path: str, target_dir: str) -> None:
"""解压缩文件, 并将压缩包内**第一个**文件夹内的文件移动到 target_dir"""
ensure_dir(target_dir)
update_dir = ""
with zipfile.ZipFile(zip_path, "r") as z:
update_dir = self._normalize_archive_root_dir(z.namelist()[0])
z.extractall(target_dir)
logger.debug(f"解压文件完成: {zip_path}")
update_root = os.path.join(target_dir, update_dir)
files = os.listdir(update_root)
for f in files:
update_item_path = os.path.join(update_root, f)
target_item_path = os.path.join(target_dir, f)
if os.path.isdir(update_item_path):
if os.path.exists(target_item_path):
shutil.rmtree(target_item_path, onerror=on_error)
elif os.path.exists(target_item_path):
os.remove(target_item_path)
shutil.move(update_item_path, target_dir)
try:
logger.debug(f"删除临时更新文件: {zip_path} 和 {update_root}")
shutil.rmtree(update_root, onerror=on_error)
os.remove(zip_path)
except BaseException:
logger.warning(
f"删除更新文件失败,可以手动删除 {zip_path} 和 {update_root}",
)
```
This keeps the improved normalization and `"."` handling, but reduces the number of helper methods and intermediate variables, making `unzip_file` easier to scan and reason about.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| archive_root = "AstrBotDevs-AstrBot-39386ee/" | ||
| expected_root = ntpath.normpath(ntpath.join(target_dir, archive_root)) | ||
| expected_file = ntpath.normpath( | ||
| ntpath.join(target_dir, archive_root, ".dockerignore") | ||
| ) | ||
| captured: dict[str, object] = {} | ||
|
|
||
| def fake_listdir(path: str) -> list[str]: | ||
| captured.setdefault("listdir", path) | ||
| return [".dockerignore"] |
There was a problem hiding this comment.
suggestion (testing): Add coverage for archive roots without a trailing slash or with nested folder names to fully exercise normalization
These tests currently assume namelist()[0] is a single-level folder with a trailing slash (e.g., "AstrBotDevs-AstrBot-39386ee/"). Since _normalize_archive_root_dir is meant to harden path handling, please add coverage for: (1) "AstrBotDevs-AstrBot-39386ee" (no trailing slash), (2) nested roots like "owner-repo-branch/subdir/", and (3) a case where the root normalizes to "." so the empty-string fallback is exercised. This will better lock in the normalization semantics and prevent regressions in zip root handling.
Suggested implementation:
@pytest.mark.parametrize(
"archive_root",
[
# Trailing slash single-level folder
"AstrBotDevs-AstrBot-39386ee/",
# Same folder without trailing slash
"AstrBotDevs-AstrBot-39386ee",
# Nested archive root
"owner-repo-branch/subdir/",
# Archive root that normalizes to "." so the empty-string fallback is exercised
".",
],
)
def test_repo_unzip_file_normalizes_windows_extended_length_paths(
monkeypatch: pytest.MonkeyPatch,
archive_root: str,
) -> None: target_dir = r"\\?\\C:\\Users\\admin\\AppData\\Local\\AstrBot\\backend\\app"|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The
_normalize_archive_root_dirlogic is only defined onRepoZipUpdatorbut is now also used in the plugin updater; consider moving this helper into a shared utility or explicit base class to make the dependency between the two updaters clearer and avoid relying on implicit inheritance/import structure. - In the unzip flows you now call
os.path.normpathboth inside_normalize_archive_root_dirand again when buildingupdate_root_path; if the second normalization is not strictly needed, removing it would simplify the path handling and reduce confusion about where normalization is expected to happen.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_normalize_archive_root_dir` logic is only defined on `RepoZipUpdator` but is now also used in the plugin updater; consider moving this helper into a shared utility or explicit base class to make the dependency between the two updaters clearer and avoid relying on implicit inheritance/import structure.
- In the unzip flows you now call `os.path.normpath` both inside `_normalize_archive_root_dir` and again when building `update_root_path`; if the second normalization is not strictly needed, removing it would simplify the path handling and reduce confusion about where normalization is expected to happen.
## Individual Comments
### Comment 1
<location path="astrbot/core/zip_updator.py" line_range="244-248" />
<code_context>
logger.info(f"Extracting archive: {zip_path}")
with zipfile.ZipFile(zip_path, "r") as z:
- update_dir = z.namelist()[0]
+ update_dir = self._normalize_archive_root_dir(z.namelist()[0])
z.extractall(target_dir)
- files = os.listdir(os.path.join(target_dir, update_dir))
+ update_root_path = os.path.normpath(os.path.join(target_dir, update_dir))
+ files = os.listdir(update_root_path)
for f in files:
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle the case where `_normalize_archive_root_dir` returns an empty string to avoid deleting/moving from `target_dir` itself.
If `_normalize_archive_root_dir` returns `""` (e.g., first entry normalizes to `"."`), `update_root_path` becomes `target_dir`. Then `os.listdir(update_root_path)` iterates `target_dir` directly, and for each entry you `shutil.rmtree(target_item_path)` and then `shutil.move(update_item_path, target_dir)` where `update_item_path == target_item_path` and has just been deleted. This both breaks the move and removes the extracted content. Please guard this logic when `update_dir` is empty (or `update_root_path == target_dir`), e.g., by skipping the move/delete loop in that case.
</issue_to_address>
### Comment 2
<location path="astrbot/core/star/updator.py" line_range="78-81" />
<code_context>
logger.info(f"Extracting archive: {zip_path}")
with zipfile.ZipFile(zip_path, "r") as z:
- update_dir = z.namelist()[0]
+ update_dir = self._normalize_archive_root_dir(z.namelist()[0])
z.extractall(target_dir)
- files = os.listdir(os.path.join(target_dir, update_dir))
+ update_root_path = os.path.normpath(os.path.join(target_dir, update_dir))
+ files = os.listdir(update_root_path)
for f in files:
</code_context>
<issue_to_address>
**issue (bug_risk):** Apply a similar guard in the star updater when the normalized archive root is empty.
Because `_normalize_archive_root_dir` can return an empty string, `update_root_path` may equal `target_dir`. In that case the loop will operate on `target_dir` itself, deleting its contents and then trying to move files from paths that no longer exist, potentially removing files that should remain. Adding a guard (e.g. `if not update_dir: return` or skipping the move/delete loop when `update_root_path == target_dir`) avoids this while still supporting archives whose top-level entry is a real subdirectory.
</issue_to_address>
### Comment 3
<location path="tests/test_updator_socks.py" line_range="454" />
<code_context>
+ captured.setdefault("listdir", path)
+ return [".dockerignore"]
+
+ monkeypatch.setattr(
+ zip_updator_module.os, "makedirs", lambda path, exist_ok=True: None
+ )
</code_context>
<issue_to_address>
**suggestion (testing):** Assert that the zip file itself is removed, or avoid capturing it if you don’t use it
In `test_repo_unzip_file_normalizes_windows_extended_length_paths`, `os.remove` is monkeypatched to capture the "removed" path, but this value is never asserted. Please either assert that `captured["removed"] == "temp.zip"` to confirm the archive is cleaned up, or remove the monkeypatch and capture if it’s not needed.
Suggested implementation:
```python
expected_root = ntpath.normpath(ntpath.join(target_dir, archive_root))
expected_file = ntpath.normpath(
ntpath.join(target_dir, archive_root, ".dockerignore")
)
# used to capture side effects like listdir path and removed zip archive
captured: dict[str, object] = {"removed": None}
```
In `test_repo_unzip_file_normalizes_windows_extended_length_paths`:
1. Ensure `os.remove` is monkeypatched to populate `captured["removed"]`, e.g.:
```python
monkeypatch.setattr(
zip_updator_module.os,
"remove",
lambda path: captured.__setitem__("removed", path),
)
```
2. After the function under test has been invoked and any other assertions on `expected_root` / `expected_file` are made, add:
```python
assert captured["removed"] == "temp.zip"
```
If you prefer not to assert on removal, instead delete the `os.remove` monkeypatch and any use of `captured["removed"]` so the unnecessary capture is removed entirely.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| captured.setdefault("listdir", path) | ||
| return [".dockerignore"] | ||
|
|
||
| monkeypatch.setattr( |
There was a problem hiding this comment.
suggestion (testing): Assert that the zip file itself is removed, or avoid capturing it if you don’t use it
In test_repo_unzip_file_normalizes_windows_extended_length_paths, os.remove is monkeypatched to capture the "removed" path, but this value is never asserted. Please either assert that captured["removed"] == "temp.zip" to confirm the archive is cleaned up, or remove the monkeypatch and capture if it’s not needed.
Suggested implementation:
expected_root = ntpath.normpath(ntpath.join(target_dir, archive_root))
expected_file = ntpath.normpath(
ntpath.join(target_dir, archive_root, ".dockerignore")
)
# used to capture side effects like listdir path and removed zip archive
captured: dict[str, object] = {"removed": None}In test_repo_unzip_file_normalizes_windows_extended_length_paths:
-
Ensure
os.removeis monkeypatched to populatecaptured["removed"], e.g.:monkeypatch.setattr( zip_updator_module.os, "remove", lambda path: captured.__setitem__("removed", path), )
-
After the function under test has been invoked and any other assertions on
expected_root/expected_fileare made, add:assert captured["removed"] == "temp.zip"
If you prefer not to assert on removal, instead delete the os.remove monkeypatch and any use of captured["removed"] so the unnecessary capture is removed entirely.
Summary
This fixes Windows updater failures caused by using the zip archive root entry directly as a filesystem path fragment during extraction.
On Windows, GitHub zipballs expose root entries like
AstrBotDevs-AstrBot-39386ee/. When the updater joins that POSIX-style fragment onto an extended-length path such as\\?\C:\Users\..., downstream filesystem calls can receive mixed separators and fail withWinError 123. Users then cannot update either AstrBot Core or plugins from the dashboard.The root cause was that both
astrbot/core/zip_updator.pyandastrbot/core/star/updator.pyreusedz.namelist()[0]directly foros.listdir(),shutil.move(), and cleanup paths. That made the updater depend on archive entry formatting instead of normalizing to the local platform path semantics first.This change adds a small shared normalization step in
RepoZipUpdatorand reuses the normalized root path for all downstream filesystem operations in both update flows. The fix stays narrow and does not change download behavior or the overall extraction flow.Testing
I added regression coverage for both updater paths and verified that the tests fail before the fix and pass after it.
Commands used:
uv run pytest tests/test_updator_socks.py -quv run ruff check astrbot/core/zip_updator.py astrbot/core/star/updator.py tests/test_updator_socks.pyuv run ruff format --check astrbot/core/zip_updator.py astrbot/core/star/updator.py tests/test_updator_socks.pyNotes
The new tests are unit-level and mocked, so the remaining residual risk is that they do not exercise a real Windows filesystem end to end. They do verify the specific behavior that was broken here: normalizing extended-length Windows paths before
listdir,move, and cleanup reuse.Summary by Sourcery
Normalize archive root directories when extracting GitHub zipballs to fix Windows updater path handling for core and plugin updates.
Bug Fixes:
Tests: