Skip to content

Interact with lookaside cache without pkg-tool#604

Open
nforro wants to merge 2 commits into
packit:mainfrom
nforro:lookaside
Open

Interact with lookaside cache without pkg-tool#604
nforro wants to merge 2 commits into
packit:mainfrom
nforro:lookaside

Conversation

@nforro

@nforro nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the lookaside tools by replacing external subprocess calls to rhpkg/centpkg with direct usage of the pyrpkg library, and removes the PrepSourcesTool. The review feedback highlights two critical path traversal vulnerabilities: one in DownloadSourcesTool where unvalidated filenames from the sources file could lead to arbitrary file writes, and another in UploadSourcesTool where unvalidated input filenames could allow arbitrary file reads and uploads. Additionally, it is recommended to download sources in parallel using asyncio.gather to improve efficiency.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py
@nforro

nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

/gemini review

@nforro

nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

/agentic_review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the lookaside toolchain by replacing external CLI subprocess calls to rhpkg and centpkg with direct Python API usage via pyrpkg, while also removing the PrepSourcesTool. Feedback on these changes suggests several robustness improvements: explicitly specifying UTF-8 encoding when reading and writing the .gitignore file, gracefully handling missing sources files during download, deduplicating download entries to avoid concurrent write race conditions, and skipping duplicate files during upload to prevent redundant operations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ymir/tools/privileged/lookaside.py Outdated
Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py Outdated
Comment thread ymir/tools/privileged/lookaside.py
@nforro

nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

/gemini review

@nforro

nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

/agentic_review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the lookaside tool to use the pyrpkg library directly instead of executing rhpkg and centpkg commands via subprocesses, and removes the PrepSourcesTool. The review feedback highlights critical issues with this transition: first, direct instantiation of CGILookasideCache bypasses the loading of client and CA certificates from system configuration files, which will cause authentication failures in production; second, the path traversal checks should explicitly forbid reading or writing files within the .git directory to prevent potential security vulnerabilities such as arbitrary code execution.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py
Comment thread ymir/tools/privileged/lookaside.py Outdated
@nforro

nforro commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the lookaside tool implementation to use the pyrpkg library directly instead of executing rhpkg or centpkg CLI commands via subprocesses. As part of this change, the PrepSourcesTool has been removed, and the DownloadSourcesTool and UploadSourcesTool have been updated to utilize pyrpkg APIs for managing sources and updating .gitignore files. Additionally, path traversal checks have been introduced to prevent unauthorized file access, and unit tests have been thoroughly updated to cover the new logic. I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ymir/tools/privileged/lookaside.py Outdated
@nforro

nforro commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

/agentic_review

@qodo-for-packit

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 7 rules

Grey Divider


Remediation recommended

1. Unhandled executor exceptions 🐞 Bug ☼ Reliability
Description
DownloadSourcesTool/UploadSourcesTool only translate specific pyrpkg errors into ToolError; other
exceptions from cache.download/hash_file/upload (e.g., OSError, unexpected pyrpkg exceptions) will
bubble up as raw exceptions and can crash the tool with an unhelpful traceback.
Code

ymir/tools/privileged/lookaside.py[R140-148]

+            try:
+                await loop.run_in_executor(
+                    None,
+                    partial(
+                        cache.download, qualified_name, entry.file, entry.hash, str(outfile), entry.hashtype
+                    ),
+                )
+            except pyrpkg.errors.DownloadError as e:
+                raise ToolError(f"Failed to download {entry.file}: {e}") from e
Relevance

⭐⭐⭐ High

Team prioritizes surfacing failures as ToolError with context (lookaside hardening accepted in PR
#596).

PR-#596

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The download path only handles DownloadError around cache.download, and the upload path does not
guard hash_file at all and only handles a subset of upload exceptions; other failures will escape as
non-ToolError exceptions.

ymir/tools/privileged/lookaside.py[124-149]
ymir/tools/privileged/lookaside.py[229-238]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The new lookaside implementation catches only `pyrpkg.errors.DownloadError` (download) and `AlreadyUploadedError`/`UploadError` (upload). Any other exception raised inside `run_in_executor(...)` (e.g., `OSError` from filesystem problems, `TypeError`/`ValueError` from unexpected pyrpkg inputs, etc.) will propagate as a raw exception rather than a `ToolError`, degrading reliability and debuggability.

### Issue Context
Both `DownloadSourcesTool` and `UploadSourcesTool` run blocking pyrpkg calls in the default executor. Those calls can fail for reasons beyond the explicitly handled pyrpkg exception types.

### Fix Focus Areas
- ymir/tools/privileged/lookaside.py[132-149]
- ymir/tools/privileged/lookaside.py[216-240]

### Suggested fix
- In both tools, wrap the `run_in_executor(...)` calls with broader exception handling that:
 - lets existing `ToolError` pass through,
 - catches `Exception` (or at least `OSError`) and re-raises as `ToolError` with actionable context (package, filename, destination path).
- (Optional but helpful) ensure parent directories exist before downloading when `entry.file` contains subdirectories: `outfile.parent.mkdir(parents=True, exist_ok=True)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Unlimited concurrent downloads 🐞 Bug ➹ Performance
Description
DownloadSourcesTool schedules one executor-backed task per source entry via asyncio.gather with no
concurrency limit, which can overwhelm the threadpool queue and external lookaside service for
packages with many sources.
Code

ymir/tools/privileged/lookaside.py[R150-158]

+        unique_entries = []
+        seen_files: set[str] = set()
+        for entry in sources.entries:
+            if entry.file not in seen_files:
+                seen_files.add(entry.file)
+                unique_entries.append(entry)
+
+        if unique_entries:
+            await asyncio.gather(*(download_entry(entry) for entry in unique_entries))
Relevance

⭐⭐ Medium

No clear precedent on adding asyncio concurrency limits; only related resilience work (retry/jitter)
in PR #581.

PR-#581

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The tool builds a list of all unique source entries and passes them directly to asyncio.gather; each
entry performs a blocking download via run_in_executor, so concurrency is not controlled at the
application level.

ymir/tools/privileged/lookaside.py[129-160]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`asyncio.gather(*(download_entry(...) for entry in unique_entries))` submits all downloads concurrently, and each download uses `run_in_executor(None, ...)`. While the default executor is bounded in worker threads, task submission is unbounded and can still create significant resource contention (many queued executor jobs, many concurrent HTTP requests at the pyrpkg layer, etc.).

### Issue Context
Some dist-git packages can have many source artifacts; submitting them all at once can slow the system and increase failure rates.

### Fix Focus Areas
- ymir/tools/privileged/lookaside.py[129-160]

### Suggested fix
- Use an `asyncio.Semaphore` (e.g., 4–10) to bound in-flight downloads:
 - wrap the body of `download_entry` with `async with sem:`.
- Alternatively, use a small worker pool pattern (queue + N workers) to process downloads in batches.
- Consider making the concurrency configurable via env var for tuning.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Sources overwrite not explicit 🐞 Bug ≡ Correctness
Description
UploadSourcesTool clears the parsed sources file before adding entries for new_sources, but the
input field and tool description describe only uploading specified/new sources; callers that pass
only a subset will silently drop other required sources from the sources file.
Code

ymir/tools/privileged/lookaside.py[R207-213]

+        try:
+            sources = pyrpkg.sources.SourcesFile(str(sources_path), "bsd")
+        except (pyrpkg.errors.MalformedLineError, ValueError, OSError) as e:
+            raise ToolError(f"Failed to parse sources file: {e}") from e
+        sources.entries.clear()
+
+        loop = asyncio.get_running_loop()
Relevance

⭐⭐ Medium

No historical evidence found on whether upload_sources should overwrite vs append sources entries.

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The API describes 'new sources to upload', but the implementation unconditionally clears existing
sources entries and rewrites the sources file based only on the provided list.

ymir/tools/privileged/lookaside.py[163-175]
ymir/tools/privileged/lookaside.py[207-243]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`UploadSourcesTool` performs `sources.entries.clear()` and then writes only the uploaded filenames back. This behavior is not clearly communicated by the API (`new_sources` naming + description) and can break packages with multiple sources if the caller provides only a subset.

### Issue Context
If the intended behavior is "replace sources file contents", the API should make that explicit to prevent misuse; if the intended behavior is additive, the clear() is incorrect.

### Fix Focus Areas
- ymir/tools/privileged/lookaside.py[163-175]
- ymir/tools/privileged/lookaside.py[207-243]

### Suggested fix
Choose one:
1) **Make overwrite semantics explicit** (lowest-risk):
  - Rename `new_sources` to `sources`/`sources_to_set` (or adjust description) and update tool description to state it overwrites `sources`.
2) **Preserve existing sources entries** (behavioral change):
  - Remove `sources.entries.clear()` and instead merge/replace per-filename entries, keeping entries for sources not mentioned in the call.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@nforro nforro force-pushed the lookaside branch 2 times, most recently from 34fcd41 to 2d95670 Compare June 22, 2026 07:16
nforro added 2 commits June 22, 2026 16:53
Signed-off-by: Nikola Forró <nforro@redhat.com>
Assisted-by: Claude Opus 4.6 via Claude Code
Signed-off-by: Nikola Forró <nforro@redhat.com>
Assisted-by: Claude Opus 4.6 via Claude Code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants