docs-refresh: add registry-based external docs deploy check#146
📝 Walkthrough
This PR implements a complete system for loading external documentation at browser runtime. It adds registry-based remote Markdown sourcing from GitHub repositories with build-time fingerprinting for change detection, GitHub Actions workflows for automated deployment, and Mermaid diagram rendering. It also integrates remote content into the search index and page rendering pipeline.
Changes
External documentation and diagram rendering system
Sequence Diagram(s)
The key flow illustrated above shows the remote Markdown fetch-and-render process: the component resolves the source (either as a GitHub URL or a registry ID via
The fingerprint-based deployment workflow demonstrates how the scheduled job generates a current fingerprint, downloads the deployed fingerprint, compares them, and conditionally triggers a Vercel deployment when they differ, with support for dry-run mode and manual override.
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
This PR introduces substantial new infrastructure: a registry-based external documentation system with multiple build-time generators, a runtime remote-content component with manifest resolution and fallback logic, GitHub Actions workflows with conditional deployment, and Mermaid diagram support. The changes span diverse files and require understanding the integration between registration, build-time generation, runtime fetching, and CI/CD automation.
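The fingerprint-and-compare flow described above can be sketched in a few lines of Node.js. This is a minimal illustration under stated assumptions. It is not the code in lib/generate-docs-fingerprint.js or in the workflows. `combinedFingerprint`, `decideDeploy`, and the `{ id, content }` source shape are all hypothetical. The sketch only mirrors the `sha256:` plus 64-hex output format the generator is said to emit.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical: hash each source's Markdown, then hash the sorted
// "id:contentHash" lines so reordering the registry does not change the result.
function combinedFingerprint(sources) {
  const lines = sources
    .map(({ id, content }) => `${id}:${createHash('sha256').update(content).digest('hex')}`)
    .sort();
  const digest = createHash('sha256').update(lines.join('\n')).digest('hex');
  return `sha256:${digest}`;
}

// Hypothetical decision step: deploy when the fingerprints differ or a manual
// override is set, but never actually deploy in dry-run mode.
function decideDeploy({ current, deployed, dryRun = false, force = false }) {
  const changed = current !== deployed;
  return { changed, shouldDeploy: (changed || force) && !dryRun };
}

const a = combinedFingerprint([{ id: 'epds', content: '# ePDS' }, { id: 'sdk', content: '# SDK' }]);
const b = combinedFingerprint([{ id: 'sdk', content: '# SDK' }, { id: 'epds', content: '# ePDS' }]);
console.log(a === b); // true
console.log(decideDeploy({ current: a, deployed: '', dryRun: true })); // { changed: true, shouldDeploy: false }
```

Sorting the per-source lines keeps the combined value stable when registry entries are reordered. As a result, only a real content change flips `changed`.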
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 9
🧹 Nitpick comments (1)
pages/architecture/epds.md (1)
13-13: 🏗️ Heavy lift
Avoid hard-coding the fallback GitHub URL here.
The page already points at epds via the registry, so keeping a repo/path URL inline reintroduces drift in the exact failure path this wrapper is supposed to simplify. Please resolve the fallback link from the same registry/manifest contract instead of duplicating it in page content.
Based on learnings from the PR objective and review stack context: docs-sources.yml is intended to be the central registry for external documentation sources.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pages/architecture/epds.md` at line 13, The page pages/architecture/epds.md currently hard-codes a fallback GitHub URL; replace that inline URL with a lookup from the central docs registry (docs-sources.yml) so the same manifest/registry entry used for the epds source drives the fallback. Update the page to reference the registry key for "epds" (or use the same helper that resolves docs sources) and emit the resolved URL as the fallback link instead of the literal GitHub path; ensure you use the registry/manifest resolution logic where the project already reads docs-sources.yml so the page cannot drift from the canonical source.
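One way to read this suggestion is to derive the fallback link from the same registry entry that drives the remote fetch. The page then never carries its own copy of the URL. The sketch below makes two assumptions. First, the entry shape (`repo`, `ref`, `path`) is hypothetical and stands in for whatever docs-sources.yml actually contains. Second, `resolveFallbackUrl` is a hypothetical helper, not the project's resolver.

```javascript
// Hypothetical registry contents, e.g. as loaded from docs-sources.yml.
const registry = {
  epds: { repo: 'hypercerts-org/ePDS', ref: 'main', path: 'docs/tutorial.md' },
};

// Percent-encode each path segment but keep the "/" separators.
const encodeSegments = (value) => value.split('/').map(encodeURIComponent).join('/');

// Build the human-facing GitHub blob URL for a registry id.
function resolveFallbackUrl(entries, id) {
  const entry = entries[id];
  if (!entry) {
    throw new Error(`Unknown external doc id: ${id}`);
  }
  return `https://github.com/${entry.repo}/blob/${encodeSegments(entry.ref)}/${encodeSegments(entry.path)}`;
}

console.log(resolveFallbackUrl(registry, 'epds'));
// https://github.com/hypercerts-org/ePDS/blob/main/docs/tutorial.md
```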
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/docs-refresh.yml:
- Around line 32-39: The workflow currently uses floating tags
actions/checkout@v4 and actions/setup-node@v4; replace those with their
corresponding immutable commit SHAs (pin the exact commit for actions/checkout
and actions/setup-node) in this file and also mirror the same change in
.github/workflows/docs-refresh-pr-dry-run.yml so both docs workflows use exact
SHAs instead of version tags; update the uses lines for those two steps (the
entries referencing actions/checkout and actions/setup-node) to the exact commit
identifiers.
- Around line 32-33: Add the checkout steps' "with: persist-credentials: false"
option to the actions/checkout@v4 invocation(s) so the "Checkout docs repo" step
(and the equivalent checkout step in the docs-refresh-pr-dry-run workflow) does
not persist the GITHUB_TOKEN to job workspace; locate the actions/checkout@v4
usage in .github/workflows/docs-refresh.yml and
.github/workflows/docs-refresh-pr-dry-run.yml and add with: persist-credentials:
false under that step.
In `@components/CopyRawButton.js`:
- Around line 42-56: The code currently prefers the external registry URL over
the local generated cache for rawUrl; change the selection to prefer the
generated raw cache first by updating the rawUrl assignment to use
frontmatter?.rawUrl || generatedRawUrl || externalRawUrl (instead of placing
externalRawUrl before generatedRawUrl). Also update the useEffect that calls
getExternalDocRawUrl: add generatedRawUrl to its dependency list and
short-circuit (return) if generatedRawUrl exists so we do not fetch the external
registry when a same-origin generated raw page is available; keep using
getExternalDocRawUrl(frontmatter.externalDoc, controller.signal) and
setExternalRawUrl only when generatedRawUrl is absent. Ensure references:
rawUrl, generatedRawUrl, externalRawUrl, useEffect, getExternalDocRawUrl,
setExternalRawUrl, frontmatter.
In `@components/MermaidDiagram.js`:
- Around line 7-15: The cached dynamic import promise mermaidModulePromise can
stay rejected forever; update getMermaid so the import('mermaid') chain handles
rejection by clearing the cache and then rethrowing the error: when creating
mermaidModulePromise inside getMermaid, append a .catch handler that sets
mermaidModulePromise = null and then throws the error (or returns a rejected
promise) so subsequent calls to getMermaid will retry the import.
In `@components/RemoteMarkdown.js`:
- Around line 33-70: The resolveGitHubMarkdownSource function accepts http URLs
causing mixed-content fetch failures; update it to validate the URL protocol is
https by checking url.protocol === 'https:' (for both github.com and
raw.githubusercontent.com branches) and throw a clear error if not HTTPS, so
only secure https://... sourceUrl/rawUrl values are allowed; ensure the error
messages reference the expected HTTPS requirement and keep the rest of the
existing hostname/path validation logic intact.
In `@components/TableOfContents.js`:
- Around line 20-34: The slugifier in collectHeadings can produce duplicate ids
for headings with identical text; fix it by tracking generated IDs (e.g., a Set
or Map) when mapping elements and, after computing the base id in
collectHeadings, check the tracker and append a suffix like `-1`, `-2`, etc.
until the id is unique (also consider existing DOM ids via
document.getElementById if desired), then assign that unique id to el.id and
return it in the items array so keys and anchors are stable.
In `@lib/compare-docs-fingerprint.js`:
- Around line 14-38: The script currently accepts any string from
readCombinedFingerprint() and prints it to stdout, risking workflow-output
injection; update the logic to validate both current and deployed fingerprints
against the expected generator format (exactly "sha256:" followed by 64 hex
chars) before emitting outputs in main(): call a validator (or inline regex) for
the values returned by readCombinedFingerprint(), reject or replace any value
that fails validation (e.g., treat as missing/empty and throw or log an error
via stderr) and ensure you strip/deny any newline/control characters so only a
single safe fingerprint string (or explicit empty/missing marker) is ever
printed by console.log for current_fingerprint and deployed_fingerprint;
reference functions/vars to change: main(), readCombinedFingerprint(), and the
variables current and deployed.
In `@lib/external-docs.js`:
- Around line 58-60: buildRawGitHubUrl currently emits a raw.githubusercontent
URL but no consumer attaches DOCS_SOURCE_TOKEN so private repo fetches fail;
update the fetch plumbing so server/build fetches authenticate: modify the
contract returned by buildRawGitHubUrl (and/or the remoteSource object produced
by generate-docs-fingerprint.js) to include either an auth-aware raw fetch
endpoint (e.g., a GitHub Contents API URL with media type raw) or an
accompanying rawHeaders/rawAuth field that callers can use; then change build
consumers generate-raw-pages.js and generate-search-index.js (and server-side
path used by components/RemoteMarkdown.js fallback) to send Authorization:
`token ${DOCS_SOURCE_TOKEN}` (or the appropriate Bearer header) when
DOCS_SOURCE_TOKEN is present so private repo markdown can be fetched.
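For the private-repository case, the comment suggests a GitHub Contents API request with a raw media type plus an Authorization header. The sketch below shows that request shape.

- **Assumed names:** the helper `buildGitHubRawRequest` is hypothetical. Its `{ owner, repo, ref, filePath }` input mirrors the fields `resolveGitHubMarkdownSource` returns.
- **From GitHub's REST documentation:** the endpoint `GET /repos/{owner}/{repo}/contents/{path}`, the `application/vnd.github.raw+json` media type, and the `X-GitHub-Api-Version` header. Verify them against the current docs before relying on them.

```javascript
// Hypothetical helper: describe an authenticated request for one file's raw
// contents via the GitHub REST "repository contents" endpoint.
function buildGitHubRawRequest({ owner, repo, ref, filePath }, token = process.env.DOCS_SOURCE_TOKEN) {
  const encodedPath = filePath.split('/').map(encodeURIComponent).join('/');
  const url =
    `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}` +
    `/contents/${encodedPath}?ref=${encodeURIComponent(ref)}`;
  const headers = {
    Accept: 'application/vnd.github.raw+json', // ask for the file body, not JSON metadata
    'X-GitHub-Api-Version': '2022-11-28',
  };
  // Only attach credentials when a token is configured.
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }
  return { url, headers };
}

// Usage (build/server side only; the token must never reach the browser):
// const { url, headers } = buildGitHubRawRequest(source);
// const markdown = await (await fetch(url, { headers })).text();
```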
In `@lib/generate-raw-pages.js`:
- Around line 49-54: The fetch of remoteSource.rawUrl currently has no timeout
and can hang the build; wrap the fetch call in an AbortSignal.timeout (or create
an AbortController with a setTimeout) and pass the signal to fetch so it aborts
after a configurable short timeout, then centralize the logic that performs the
fetch+error handling into a helper (e.g., fetchWithTimeout) used by both
pipelines so both fail fast; update the error thrown around response (the
existing response.ok check and thrown Error using remoteSource.label and
pagePath) to include abort/timeout errors from the signal and to surface clear
context.
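A shared helper along the lines the comment describes might look like this. The sketch assumes Node 18+, for the global `fetch` and `AbortController`. `fetchWithTimeout`, its option names, and the 15-second default are all hypothetical. It uses an `AbortController` with `setTimeout` rather than `AbortSignal.timeout`, so the timer can be cleared as soon as the request settles.

```javascript
// Hypothetical shared helper for generate-raw-pages.js and generate-search-index.js:
// fetch a text body, failing fast on HTTP errors and on timeouts.
async function fetchWithTimeout(url, { timeoutMs = 15000, label = url, fetchImpl = globalThis.fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(url, { signal: controller.signal });
    if (!response.ok) {
      throw new Error(`Failed to fetch ${label}: HTTP ${response.status}`);
    }
    return await response.text();
  } catch (error) {
    // Translate an abort caused by our own timer into a clear build error.
    if (controller.signal.aborted) {
      throw new Error(`Timed out after ${timeoutMs} ms fetching ${label} (${url})`);
    }
    throw error;
  } finally {
    clearTimeout(timer);
  }
}

// Usage (hypothetical):
// const markdown = await fetchWithTimeout(remoteSource.rawUrl, { label: remoteSource.label });
```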
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eb032003-c789-4318-bec5-2a28b1531367
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (23)
.github/workflows/docs-refresh-pr-dry-run.yml
.github/workflows/docs-refresh.yml
.gitignore
components/CopyRawButton.js
components/Layout.js
components/MermaidDiagram.js
components/RemoteMarkdown.js
components/TableOfContents.js
docs-sources.yml
docs/remote-markdown.md
lib/compare-docs-fingerprint.js
lib/external-docs.js
lib/generate-docs-fingerprint.js
lib/generate-external-docs-manifest.js
lib/generate-raw-pages.js
lib/generate-search-index.js
markdoc/nodes/fence.markdoc.js
markdoc/tags/index.js
markdoc/tags/remote-doc.markdoc.js
package.json
pages/_app.js
pages/architecture/epds.md
styles/globals.css
- name: Checkout docs repo
  uses: actions/checkout@v4
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
files=(
".github/workflows/docs-refresh.yml"
".github/workflows/docs-refresh-pr-dry-run.yml"
)
for f in "${files[@]}"; do
echo "==== $f ===="
if [ ! -f "$f" ]; then
echo "MISSING: $f"
continue
fi
# Show around the referenced lines
lines=$(wc -l < "$f")
start=1
end=$lines
# If files are short, just print whole; else print a window around 1-120
if [ "$lines" -gt 120 ]; then
end=120
fi
# Provide numbered view for context
sed -n "1,${end}p" "$f" | nl -ba | sed -n "1,${end}p"
echo
done
echo "==== Search persist-credentials in workflows ===="
rg -n "persist-credentials" .github/workflows || true
echo "==== Search checkout usage in those workflows ===="
for f in "${files[@]}"; do
echo "--- checkout blocks in $f ---"
rg -n "actions/checkout" "$f" || true
done
echo "==== Search for npm ci lifecycle vicinity ===="
for f in "${files[@]}"; do
echo "--- npm ci blocks in $f ---"
rg -n "npm ci|npm install" "$f" || true
done
Repository: hypercerts-org/documentation
Length of output: 179
🏁 Script executed:
#!/bin/bash
set -euo pipefail
files=(
".github/workflows/docs-refresh.yml"
".github/workflows/docs-refresh-pr-dry-run.yml"
)
for f in "${files[@]}"; do
echo "==== $f ===="
if [ ! -f "$f" ]; then
echo "MISSING: $f"
continue
fi
lines=$(wc -l < "$f")
end=$lines
if [ "$lines" -gt 160 ]; then end=160; fi
# numbered view using cat -n + limit
sed -n "1,${end}p" "$f" | cat -n
echo
done
echo "==== Search persist-credentials in workflows ===="
rg -n "persist-credentials" .github/workflows || true
echo "==== Search checkout blocks in those workflows ===="
for f in "${files[@]}"; do
echo "--- checkout blocks in $f ---"
rg -n "uses:\s*actions/checkout" "$f" || true
rg -n "with:" "$f" || true
done
echo "==== Search for npm ci / npm install in those workflows ===="
for f in "${files[@]}"; do
echo "--- npm ci blocks in $f ---"
rg -n "npm (ci|install)" "$f" || true
done
Repository: hypercerts-org/documentation
Length of output: 9735
🌐 Web query:
actions/checkout@v4 persist-credentials default value stores github.token in git config
💡 Result:
The persist-credentials input for actions/checkout defaults to true [1][2][3]. When set to true (the default), the action configures the authentication token (typically github.token) or SSH key to allow your workflow scripts to run authenticated Git commands [3][4]. While the documentation historically stated this token was persisted directly in the local .git/config file [3][4], recent updates (as of the main branch) have improved this behavior by storing credentials in a separate file under $RUNNER_TEMP and using Git's includeIf functionality to reference them, rather than placing them directly in .git/config [5][6]. Regardless of the exact storage mechanism, the action automatically removes these credentials during the post-job cleanup phase to ensure security [3][4]. You can opt out of this behavior by setting persist-credentials: false in your workflow configuration [3][4].
Citations:
- 1: https://github.com/actions/checkout/blob/v4/action.yml
- 2: https://github.com/actions/checkout/blob/v4.1.1/action.yml
- 3: https://github.com/actions/checkout/blob/v4/README.md
- 4: https://github.com/actions/checkout
- 5: https://github.com/actions/checkout/tree/refs/heads/main
- 6: actions/checkout@069c695
Disable persisted checkout credentials in both docs refresh workflows
Both workflows use actions/checkout@v4 without with: persist-credentials: false; the default persists/configures github.token for authenticated git commands during the job, so later steps (including npm ci / npm lifecycle scripts) can access it. Add with: persist-credentials: false to the actions/checkout@v4 steps in .github/workflows/docs-refresh.yml and .github/workflows/docs-refresh-pr-dry-run.yml.
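This pattern can be spot-checked without extra tooling. The following naive, line-based sketch is not a YAML parser and not how zizmor implements its artipacked audit. The hypothetical `findPersistedCheckouts` reports each `actions/checkout` step whose own indented lines lack `persist-credentials: false`. It assumes block-style YAML with `with:` written after `uses:`.

```javascript
// Naive, line-based scan (not a YAML parser): report actions/checkout steps
// that do not set `persist-credentials: false`.
function findPersistedCheckouts(workflowText) {
  const lines = workflowText.split('\n');
  const findings = [];
  lines.forEach((line, index) => {
    const match = line.match(/^(\s*)(-\s+)?uses:\s*['"]?actions\/checkout@/);
    if (!match) return;
    // Column where this step's keys start (after the "- " list marker, if any).
    const keyColumn = match[1].length + (match[2] ? match[2].length : 0);
    let disabled = false;
    for (let j = index + 1; j < lines.length; j += 1) {
      const next = lines[j];
      if (next.trim() === '') continue;
      const indent = next.length - next.trimStart().length;
      if (indent < keyColumn) break; // left this step: next step or next job key
      if (/^persist-credentials:\s*false\b/.test(next.trim())) disabled = true;
    }
    if (!disabled) findings.push({ line: index + 1, text: line.trim() });
  });
  return findings;
}
```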
🧰 Tools
🪛 zizmor (1.25.2)
[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 33-33: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
Source: Linters/SAST tools
- name: Checkout docs repo
  uses: actions/checkout@v4

- name: Setup Node
  uses: actions/setup-node@v4
  with:
    node-version: 22
    cache: npm
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List the workflow files to ensure they exist
ls -la .github/workflows | sed -n '1,200p'
# Show the relevant sections with line numbers
echo "---- docs-refresh.yml (lines 1-120) ----"
nl -ba .github/workflows/docs-refresh.yml | sed -n '1,140p'
echo "---- docs-refresh-pr-dry-run.yml (lines 1-140) ----"
nl -ba .github/workflows/docs-refresh-pr-dry-run.yml | sed -n '1,160p'
# Confirm all usages of actions/checkout and actions/setup-node in workflows
echo "---- occurrences of actions/checkout ----"
rg -n --hidden --glob '.github/workflows/**/*.yml' "uses:\s*actions/checkout@" .github/workflows || true
echo "---- occurrences of actions/setup-node ----"
rg -n --hidden --glob '.github/workflows/**/*.yml' "uses:\s*actions/setup-node@" .github/workflows || true
Repository: hypercerts-org/documentation
Length of output: 428
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- docs-refresh.yml (cat -n lines 25-60) ----"
cat -n .github/workflows/docs-refresh.yml | sed -n '25,70p'
echo "---- docs-refresh-pr-dry-run.yml (cat -n lines 25-70) ----"
cat -n .github/workflows/docs-refresh-pr-dry-run.yml | sed -n '25,80p'
echo "---- occurrences of actions/checkout@ ----"
rg -n --glob '.github/workflows/**/*.yml' "uses:\s*actions/checkout@" .github/workflows || true
echo "---- occurrences of actions/setup-node@ ----"
rg -n --glob '.github/workflows/**/*.yml' "uses:\s*actions/setup-node@" .github/workflows || true
Repository: hypercerts-org/documentation
Length of output: 4282
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- docs-refresh.yml triggers + job context (head 1-120) ----"
cat -n .github/workflows/docs-refresh.yml | sed -n '1,140p'
echo "---- docs-refresh.yml uses: lines (entire file) ----"
rg -n --glob '.github/workflows/docs-refresh.yml' "^\s*uses:\s*[^ ]+" .github/workflows/docs-refresh.yml || true
echo "---- docs-refresh-pr-dry-run.yml triggers + job context (head 1-140) ----"
cat -n .github/workflows/docs-refresh-pr-dry-run.yml | sed -n '1,160p'
echo "---- docs-refresh-pr-dry-run.yml uses: lines (entire file) ----"
rg -n --glob '.github/workflows/docs-refresh-pr-dry-run.yml' "^\s*uses:\s*[^ ]+" .github/workflows/docs-refresh-pr-dry-run.yml || true
echo "---- all occurrences of actions/checkout@v4 across workflows ----"
rg -n --glob '.github/workflows/**/*.yml' "uses:\s*actions/checkout@v4" .github/workflows || true
echo "---- all occurrences of actions/setup-node@v4 across workflows ----"
rg -n --glob '.github/workflows/**/*.yml' "uses:\s*actions/setup-node@v4" .github/workflows || true
Repository: hypercerts-org/documentation
Length of output: 9903
Pin actions/checkout and actions/setup-node to immutable SHAs in both docs workflows.
Both .github/workflows/docs-refresh.yml (scheduled, triggers the Vercel deploy hook) and .github/workflows/docs-refresh-pr-dry-run.yml use the floating tags actions/checkout@v4 and actions/setup-node@v4, weakening the supply-chain boundary; pin these actions to exact commit SHAs.
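The rule behind this finding is simple to state: every remote `uses:` reference should end in a full 40-character commit SHA. Below is a rough line-based sketch of that check. `findUnpinnedUses` is hypothetical and is not zizmor's implementation. It skips local `./` actions and `docker://` references.

```javascript
// Return `uses:` references that are not pinned to a full 40-character commit SHA.
function findUnpinnedUses(workflowText) {
  const results = [];
  workflowText.split('\n').forEach((line, index) => {
    const match = line.match(/^\s*(?:-\s+)?uses:\s*['"]?([^'"\s#]+)/);
    if (!match) return;
    const ref = match[1];
    // Local actions and Docker image references are out of scope for this check.
    if (ref.startsWith('./') || ref.startsWith('docker://')) return;
    const at = ref.lastIndexOf('@');
    const version = at === -1 ? '' : ref.slice(at + 1);
    if (!/^[0-9a-f]{40}$/.test(version)) results.push({ line: index + 1, ref });
  });
  return results;
}
```

A pinned reference is commonly written with the tag kept as a trailing comment, e.g. `uses: actions/checkout@<full-sha> # v4`. The intended version stays readable while the SHA does the pinning.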
🧰 Tools
🪛 zizmor (1.25.2)
[warning] 32-33: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 33-33: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 36-36: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
Source: Linters/SAST tools
const rawUrl = frontmatter?.rawUrl || externalRawUrl || generatedRawUrl;

useEffect(() => {
  if (!frontmatter?.externalDoc || frontmatter?.rawUrl) {
    setExternalRawUrl(null);
    return undefined;
  }

  const controller = new AbortController();
  getExternalDocRawUrl(frontmatter.externalDoc, controller.signal)
    .then(setExternalRawUrl)
    .catch(() => setExternalRawUrl(null));

  return () => controller.abort();
}, [frontmatter?.externalDoc, frontmatter?.rawUrl]);
Prefer the generated /raw/... cache before the remote registry URL.
This now routes external-doc pages through the live rawUrl first, so Copy raw / View raw can fail on GitHub availability, rate limits, or browser cross-origin fetch behavior even when the build already produced a same-origin raw cache for the page. The local generated raw page should stay the primary source, with the registry URL used only as a fallback.
Based on learnings from the PR objective and review stack context: external docs raw pages are generated during builds and are intended to back this runtime flow.
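The requested precedence can be isolated into two small pure functions, which also makes it easy to unit-test outside React. This is a sketch: `pickRawUrl` and `shouldFetchExternalRawUrl` are hypothetical names. In the component, the second check would gate the `getExternalDocRawUrl` call inside `useEffect`, with `generatedRawUrl` added to the dependency list.

```javascript
// Explicit frontmatter rawUrl wins, then the same-origin generated /raw/... page,
// and the live registry URL is only a last resort.
function pickRawUrl({ frontmatterRawUrl, generatedRawUrl, externalRawUrl }) {
  return frontmatterRawUrl || generatedRawUrl || externalRawUrl || null;
}

// Only hit the external registry when no local source is available.
function shouldFetchExternalRawUrl({ externalDoc, frontmatterRawUrl, generatedRawUrl }) {
  return Boolean(externalDoc) && !frontmatterRawUrl && !generatedRawUrl;
}
```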
function getMermaid() {
  if (!mermaidModulePromise) {
    mermaidModulePromise = import('mermaid').then((module) => {
      const mermaid = module.default || module;
      return mermaid;
    });
  }

  return mermaidModulePromise;
Clear the cached Mermaid import after a load failure.
If the dynamic import('mermaid') rejects once, mermaidModulePromise stays permanently rejected and every later diagram render in the same session fails immediately. Reset the cache in the rejection path so a later retry/navigation can recover.
Suggested fix
function getMermaid() {
if (!mermaidModulePromise) {
- mermaidModulePromise = import('mermaid').then((module) => {
- const mermaid = module.default || module;
- return mermaid;
- });
+ mermaidModulePromise = import('mermaid')
+ .then((module) => {
+ const mermaid = module.default || module;
+ return mermaid;
+ })
+ .catch((error) => {
+ mermaidModulePromise = undefined;
+ throw error;
+ });
}
return mermaidModulePromise;
}
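The same pattern generalizes to any cached dynamic import: keep the in-flight promise, but drop it on rejection so the next caller retries. Below is a minimal sketch with a hypothetical `createRetryableLoader` wrapper. In MermaidDiagram.js the loader would be `() => import('mermaid').then((m) => m.default || m)`.

```javascript
// Cache an async loader's promise, but forget it if the load fails so a
// later call can try again (e.g. after a transient chunk-load error).
function createRetryableLoader(loader) {
  let pending = null;
  return function load() {
    if (!pending) {
      pending = Promise.resolve()
        .then(loader)
        .catch((error) => {
          pending = null; // drop the rejected promise so the next call retries
          throw error;
        });
    }
    return pending;
  };
}

// Hypothetical wiring for MermaidDiagram.js:
// const getMermaid = createRetryableLoader(() =>
//   import('mermaid').then((module) => module.default || module));
```

Successful loads stay cached, so callers after the first success share one module instance.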
function resolveGitHubMarkdownSource(source) {
  const url = new URL(source);

  if (url.hostname === 'github.com') {
    const [, owner, repo, marker, ref, ...fileParts] = url.pathname.split('/');
    if (owner?.toLowerCase() !== SOURCE_ORG || marker !== 'blob' || !repo || !ref || fileParts.length === 0) {
      throw new Error('Remote docs must use a hypercerts-org GitHub blob URL, for example https://github.com/hypercerts-org/ePDS/blob/main/docs/tutorial.md.');
    }

    const filePath = fileParts.join('/');
    return {
      sourceUrl: url.toString(),
      rawUrl: `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${filePath}`,
      owner,
      repo,
      ref,
      filePath,
    };
  }

  if (url.hostname === 'raw.githubusercontent.com') {
    const [, owner, repo, ref, ...fileParts] = url.pathname.split('/');
    if (owner?.toLowerCase() !== SOURCE_ORG || !repo || !ref || fileParts.length === 0) {
      throw new Error('Remote docs must use a raw.githubusercontent.com URL under hypercerts-org.');
    }

    const filePath = fileParts.join('/');
    return {
      sourceUrl: `https://github.com/${owner}/${repo}/blob/${ref}/${filePath}`,
      rawUrl: url.toString(),
      owner,
      repo,
      ref,
      filePath,
    };
  }

  throw new Error('Remote docs can only be loaded from github.com or raw.githubusercontent.com.');
Reject non-HTTPS direct source URLs.
This path currently accepts http://github.com/... and http://raw.githubusercontent.com/... as valid sources, but the docs site is served over HTTPS, so those requests become mixed-content fetches and fail after passing validation. Fail fast here and require https: for direct URLs.
Suggested fix
function resolveGitHubMarkdownSource(source) {
const url = new URL(source);
+
+ if (url.protocol !== 'https:') {
+ throw new Error('Remote docs must use an HTTPS GitHub URL.');
+ }
if (url.hostname === 'github.com') {
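A standalone version of the protocol-plus-host gate shows how the WHATWG `URL` parser normalizes input before the check runs. `parseHttpsGitHubUrl` is a hypothetical name. The allowed hosts are the two named in the existing error message.

```javascript
const ALLOWED_HOSTS = new Set(['github.com', 'raw.githubusercontent.com']);

// Parse a source URL and accept only HTTPS URLs on the two GitHub hosts.
function parseHttpsGitHubUrl(source) {
  const url = new URL(source); // throws a TypeError on malformed input
  if (url.protocol !== 'https:') {
    throw new Error(`Remote docs must use an HTTPS URL; got ${url.protocol}//${url.hostname}`);
  }
  if (!ALLOWED_HOSTS.has(url.hostname)) {
    throw new Error('Remote docs can only be loaded from github.com or raw.githubusercontent.com.');
  }
  return url;
}

console.log(parseHttpsGitHubUrl('HTTPS://GitHub.com/hypercerts-org/ePDS/blob/main/docs/tutorial.md').href);
// https://github.com/hypercerts-org/ePDS/blob/main/docs/tutorial.md
```

Because `URL` lowercases the scheme and host, mixed-case input still passes. A plain `http://` URL is rejected before any fetch is attempted.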
const collectHeadings = () => {
  const elements = article.querySelectorAll("h2, h3, h4");
  const items = Array.from(elements).map((el) => {
    if (!el.id) {
      el.id = el.textContent
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/(^-|-$)/g, "");
    }
    return {
      id: el.id,
      text: el.textContent,
      level: Number(el.tagName.slice(1)),
    };
  });
Deduplicate synthesized heading IDs.
When two runtime headings share the same text, this slugifier emits the same id for both. That breaks anchor navigation, scroll tracking, and the React key={id} mapping because every TOC entry after the first points at the wrong node.
♻️ Suggested fix
const collectHeadings = () => {
const elements = article.querySelectorAll("h2, h3, h4");
+ const seenIds = new Map();
const items = Array.from(elements).map((el) => {
if (!el.id) {
- el.id = el.textContent
+ const baseId = el.textContent
.toLowerCase()
.replace(/[^a-z0-9]+/g, "-")
.replace(/(^-|-$)/g, "");
+ const count = seenIds.get(baseId) ?? 0;
+ seenIds.set(baseId, count + 1);
+ el.id = count === 0 ? baseId : `${baseId}-${count + 1}`;
}
return {
id: el.id,
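Here is a variant of the suggested fix with two extra properties. First, it guards against a generated suffix colliding with another heading, for example a second "Setup" heading next to a heading literally titled "Setup 2". Second, it can be pre-seeded with ids already present in the page. `slugify` reproduces the regex chain in `collectHeadings`. `assignUniqueIds`, the `'section'` fallback, and the `-2`, `-3` suffix scheme are assumptions.

```javascript
// Same regex chain as collectHeadings uses today.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/(^-|-$)/g, '');
}

// Give every heading text a unique id, avoiding ids already in use.
function assignUniqueIds(texts, existingIds = []) {
  const used = new Set(existingIds);
  return texts.map((text) => {
    const base = slugify(text) || 'section'; // hypothetical fallback for non-ASCII-only headings
    let id = base;
    for (let n = 2; used.has(id); n += 1) {
      id = `${base}-${n}`;
    }
    used.add(id);
    return id;
  });
}

console.log(assignUniqueIds(['Setup', 'Setup', 'Setup 2']));
// [ 'setup', 'setup-2', 'setup-2-2' ]
```

In `collectHeadings`, headings that already carry an `id` would keep it and seed `existingIds`. Only the remaining headings would go through `assignUniqueIds`.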
  return typeof parsed.combinedFingerprint === 'string' ? parsed.combinedFingerprint : '';
}

function main() {
  const [currentPath, deployedPath] = process.argv.slice(2);
  if (!currentPath || !deployedPath) {
    throw new Error('Usage: node lib/compare-docs-fingerprint.js <current-docs-fingerprint.json> <deployed-docs-fingerprint.json>');
  }

  const current = readCombinedFingerprint(currentPath, 'current');
  if (!current) {
    throw new Error(`Current fingerprint file ${currentPath} does not contain combinedFingerprint.`);
  }

  const deployed = readCombinedFingerprint(deployedPath, 'deployed');
  const changed = current !== deployed;

  console.error(changed
    ? `External docs changed: deployed=${deployed || '<missing>'} current=${current}`
    : `External docs unchanged: ${current}`);

  console.log(`changed=${changed ? 'true' : 'false'}`);
  console.log(`current_fingerprint=${current}`);
  console.log(`deployed_fingerprint=${deployed}`);
  console.log(`reason=${changed ? 'combined fingerprint differs' : 'combined fingerprint matches deployed site'}`);
Reject malformed combinedFingerprint values before emitting workflow outputs.
deployed comes from a downloaded public JSON file, but this script accepts any string and writes it verbatim to stdout, which both workflows append to $GITHUB_OUTPUT. A newline-bearing value can inject extra outputs or poison later shell interpolation. The generator (lib/generate-docs-fingerprint.js, line 31) only emits sha256: followed by 64 hex characters, so anything else should be treated as invalid rather than passed through.
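To make the injection risk concrete: outputs are appended to `$GITHUB_OUTPUT` as `key=value` lines, so a value containing a newline smuggles in an extra line. The sketch below validates the fingerprint format and refuses any value with a line break before building that text. `isFingerprint` and `formatWorkflowOutputs` are hypothetical names. In JavaScript, `$` without the `m` flag only matches at the very end of the string, so a value with a trailing newline fails the pattern.

```javascript
const FINGERPRINT_PATTERN = /^sha256:[a-f0-9]{64}$/;

// An empty string may stand for "no deployed fingerprint yet" when allowEmpty is set.
function isFingerprint(value, { allowEmpty = false } = {}) {
  return (allowEmpty && value === '') || (typeof value === 'string' && FINGERPRINT_PATTERN.test(value));
}

// Build `key=value` lines for $GITHUB_OUTPUT, refusing anything that could add extra lines.
function formatWorkflowOutputs(outputs) {
  return Object.entries(outputs)
    .map(([key, value]) => {
      const text = String(value);
      if (/[\r\n]/.test(text)) {
        throw new Error(`Refusing to emit output "${key}": value contains a line break`);
      }
      return `${key}=${text}`;
    })
    .join('\n');
}

const current = `sha256:${'a'.repeat(64)}`;
const deployed = `sha256:${'b'.repeat(64)}\nchanged=false`; // a tampered download
console.log(isFingerprint(current)); // true
console.log(isFingerprint(deployed, { allowEmpty: true })); // false
```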
Proposed fix
function readCombinedFingerprint(path, label) {
let parsed;
try {
parsed = JSON.parse(readFileSync(path, 'utf8'));
} catch (error) {
throw new Error(`Unable to read ${label} fingerprint at ${path}: ${error.message}`);
}
- return typeof parsed.combinedFingerprint === 'string' ? parsed.combinedFingerprint : '';
+ if (parsed.combinedFingerprint == null) {
+ return '';
+ }
+
+ if (
+ typeof parsed.combinedFingerprint !== 'string' ||
+ !/^sha256:[a-f0-9]{64}$/.test(parsed.combinedFingerprint)
+ ) {
+ throw new Error(`${label} fingerprint file ${path} has an invalid combinedFingerprint.`);
+ }
+
+ return parsed.combinedFingerprint;
}
📝 Committable suggestion
| function readCombinedFingerprint(path, label) { | |
| let parsed; | |
| try { | |
| parsed = JSON.parse(readFileSync(path, 'utf8')); | |
| } catch (error) { | |
| throw new Error(`Unable to read ${label} fingerprint at ${path}: ${error.message}`); | |
| } | |
| if (parsed.combinedFingerprint == null) { | |
| return ''; | |
| } | |
| if ( | |
| typeof parsed.combinedFingerprint !== 'string' || | |
| !/^sha256:[a-f0-9]{64}$/.test(parsed.combinedFingerprint) | |
| ) { | |
| throw new Error(`${label} fingerprint file ${path} has an invalid combinedFingerprint.`); | |
| } | |
| return parsed.combinedFingerprint; | |
| } | |
| function main() { | |
| const [currentPath, deployedPath] = process.argv.slice(2); | |
| if (!currentPath || !deployedPath) { | |
| throw new Error('Usage: node lib/compare-docs-fingerprint.js <current-docs-fingerprint.json> <deployed-docs-fingerprint.json>'); | |
| } | |
| const current = readCombinedFingerprint(currentPath, 'current'); | |
| if (!current) { | |
| throw new Error(`Current fingerprint file ${currentPath} does not contain combinedFingerprint.`); | |
| } | |
| const deployed = readCombinedFingerprint(deployedPath, 'deployed'); | |
| const changed = current !== deployed; | |
| console.error(changed | |
| ? `External docs changed: deployed=${deployed || '<missing>'} current=${current}` | |
| : `External docs unchanged: ${current}`); | |
| console.log(`changed=${changed ? 'true' : 'false'}`); | |
| console.log(`current_fingerprint=${current}`); | |
| console.log(`deployed_fingerprint=${deployed}`); | |
| console.log(`reason=${changed ? 'combined fingerprint differs' : 'combined fingerprint matches deployed site'}`); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@lib/compare-docs-fingerprint.js` around lines 14 - 38, The script currently accepts any string from readCombinedFingerprint() and prints it to stdout, risking workflow-output injection; update the logic to validate both current and deployed fingerprints against the expected generator format (exactly "sha256:" followed by 64 hex chars) before emitting outputs in main(): call a validator (or inline regex) for the values returned by readCombinedFingerprint(), reject or replace any value that fails validation (e.g., treat as missing/empty and throw or log an error via stderr) and ensure you strip/deny any newline/control characters so only a single safe fingerprint string (or explicit empty/missing marker) is ever printed by console.log for current_fingerprint and deployed_fingerprint; reference functions/vars to change: main(), readCombinedFingerprint(), and the variables current and deployed.
function buildRawGitHubUrl(source, filePath) {
  return `https://raw.githubusercontent.com/${source.owner}/${source.repoName}/${encodeURIComponent(source.branch)}/${encodeGitHubPath(filePath)}`;
}
🧩 Analysis chain
🌐 Web query:
Does raw.githubusercontent.com support authenticated access to private repository files with a token, or should GitHub's Contents API/raw media type be used to fetch raw file contents from a private repo at a specific ref?
💡 Result:
raw.githubusercontent.com can be used for private repo files with authentication, but you generally cannot rely on a “token as a query parameter” approach (e.g., ?token=...); instead you authenticate the HTTP request using an Authorization header (or basic-auth-style URL credentials). For example, multiple sources note that a working approach is to send an Authorization header when fetching from raw.githubusercontent.com. For private repos, one option shown is Authorization: token <PERSONAL_ACCESS_TOKEN> when requesting the raw file URL.[1][2]
If you want the most “standard” and documented way to retrieve file contents at a specific ref for private repos, use GitHub’s REST “Contents” API with the raw media type. The GitHub Docs for the Contents endpoint state that you can GET repository contents and request the raw file contents using the raw media type (application/vnd.github.raw+json).[3][4]
Practical guidance:
1) If you can make authenticated HTTP requests with headers, either approach can work, but use Authorization headers with raw.githubusercontent.com (not query params) for private files.[1][2]
2) If you need a stable, API-style way to fetch contents at an explicit ref/path, use GET /repos/{owner}/{repo}/contents/{path} with an Authorization header and Accept: application/vnd.github.raw+json (raw media type).[3][4]
Example (API raw): GET https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={branch-or-tag} with headers Authorization: Bearer <TOKEN> and Accept: application/vnd.github.raw+json.[3][4]
Example (raw host with header): GET https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path} with header Authorization: token <TOKEN>.[1][2]
Answer to your question: raw.githubusercontent.com does support authenticated access to private repo files, but for robustness and clarity, especially when selecting a specific ref, GitHub’s Contents API using the raw media type is the safer, officially documented choice.[1][2][3][4]
Citations:
- 1: https://stackoverflow.com/questions/18126559/how-can-i-download-a-single-raw-file-from-a-private-github-repo-using-the-comman
- 2: https://stackoverflow.com/questions/53083479/wget-a-raw-file-from-github-from-a-private-repo
- 3: https://docs.github.com/rest/repos/contents
- 4: https://docs.github.com/en/rest/repos/contents
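Following the second option in the result above, a sketch of a request builder for the Contents API with the raw media type. buildContentsApiRequest is a hypothetical name; the source shape ({ owner, repoName, branch }) is borrowed from buildRawGitHubUrl in this PR, and the per-segment path encoding stands in for encodeGitHubPath, whose implementation is not shown here.

```javascript
// Hypothetical helper: builds a GET request for GitHub's "get repository
// content" endpoint that returns the raw file body at a given ref.
function buildContentsApiRequest(source, filePath, token) {
  // Encode each path segment but keep "/" as the separator.
  const encodedPath = filePath.split('/').map(encodeURIComponent).join('/');
  const url = new URL(
    `https://api.github.com/repos/${source.owner}/${source.repoName}/contents/${encodedPath}`,
  );
  // ref selects the branch, tag, or commit; URLSearchParams handles escaping.
  url.searchParams.set('ref', source.branch);

  const headers = { Accept: 'application/vnd.github.raw+json' };
  if (token) {
    // Only attach credentials when a token is configured (build/server side only).
    headers.Authorization = `Bearer ${token}`;
  }
  return { url: url.toString(), headers };
}

const request = buildContentsApiRequest(
  { owner: 'octo-org', repoName: 'docs', branch: 'main' },
  'guides/getting started.md',
  process.env.DOCS_SOURCE_TOKEN,
);
console.log(request.url);
```

Putting the ref in a query parameter rather than a path segment sidesteps any ambiguity about branch names that contain slashes.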
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate the function in lib/external-docs.js =="
nl -ba lib/external-docs.js | sed -n '40,120p'
echo
echo "== Find all references to buildRawGitHubUrl / rawUrl =="
rg -n --hidden --no-ignore-vcs "buildRawGitHubUrl|rawUrl|resolveFrontmatterRawSource|DOCS_SOURCE_TOKEN" .
echo
echo "== Inspect build-time consumer(s) that fetch remoteSource.rawUrl (if any) =="
# Try to find common fetch sites by looking for raw.githubusercontent.com usage and for remoteSource.rawUrl usage.
rg -n --hidden --no-ignore-vcs "remoteSource\.rawUrl|raw\.githubusercontent\.com" .
echo
echo "== Inspect the README / docs for DOCS_SOURCE_TOKEN expectations (if present) =="
rg -n --hidden --no-ignore-vcs "DOCS_SOURCE_TOKEN" README* docs* . || true
Repository: hypercerts-org/documentation
Length of output: 183
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Show lib/external-docs.js around Lines ~50-80 =="
awk 'NR>=45 && NR<=90 {printf "%d\t%s\n", NR, $0}' lib/external-docs.js | sed -n '1,200p'
echo
echo "== Find all references to buildRawGitHubUrl / rawUrl / DOCS_SOURCE_TOKEN =="
rg -n --hidden --no-ignore-vcs "buildRawGitHubUrl|remoteSource\.rawUrl|raw\.githubusercontent\.com|resolveFrontmatterRawSource|DOCS_SOURCE_TOKEN" .
echo
echo "== Check where remoteSource/rawUrl is actually fetched =="
rg -n --hidden --no-ignore-vcs "fetch\(|axios\.|undici|node-fetch|got\(|http(s)?://" . | head -n 200
echo
echo "== Inspect docs/README mentions of DOCS_SOURCE_TOKEN (if present) =="
rg -n --hidden --no-ignore-vcs "DOCS_SOURCE_TOKEN" README* docs* . || true
Repository: hypercerts-org/documentation
Length of output: 50452
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== DOCS_SOURCE_TOKEN usages =="
rg -n "DOCS_SOURCE_TOKEN" .
echo
echo "== lib/generate-raw-pages.js (fetch area) =="
awk 'NR>=1 && NR<=120 {print NR"\t"$0}' lib/generate-raw-pages.js | sed -n '1,160p'
echo
echo "== lib/generate-search-index.js (fetch area) =="
awk 'NR>=70 && NR<=160 {print NR"\t"$0}' lib/generate-search-index.js | sed -n '1,200p'
echo
echo "== lib/generate-docs-fingerprint.js (token usage) =="
awk 'NR>=1 && NR<=180 {print NR"\t"$0}' lib/generate-docs-fingerprint.js | sed -n '1,220p'
echo
echo "== lib/external-docs.js (resolveFrontmatterRawSource + rawUrl contract) =="
awk 'NR>=110 && NR<=260 {print NR"\t"$0}' lib/external-docs.js | sed -n '1,260p'
echo
echo "== components/RemoteMarkdown.js (rawUrl usage) =="
awk 'NR>=1 && NR<=130 {print NR"\t"$0}' components/RemoteMarkdown.js | sed -n '1,220p'
Repository: hypercerts-org/documentation
Length of output: 21364
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== components/RemoteMarkdown.js: find fetch usage and headers =="
rg -n "fetch\\(|Authorization|headers" components/RemoteMarkdown.js
echo
echo "== Show components/RemoteMarkdown.js around the fetch call =="
# Print region where rawUrl is fetched (best-effort: search for ".rawUrl" usage)
rg -n ".rawUrl" components/RemoteMarkdown.js | head -n 50
# Show a likely region (lines 120-220)
awk 'NR>=120 && NR<=240 {printf "%d\t%s\n", NR, $0}' components/RemoteMarkdown.js | sed -n '1,200p'
echo
echo "== docs/remote-markdown.md around DOCS_SOURCE_TOKEN and rawUrl/runtime behavior =="
awk 'NR>=1 && NR<=120 {print NR"\t"$0}' docs/remote-markdown.md | sed -n '1,140p'
echo
echo "== .github/workflows/docs-refresh.yml: show relevant steps =="
awk 'NR>=1 && NR<=140 {print NR"\t"$0}' .github/workflows/docs-refresh.yml | sed -n '1,180p'
Repository: hypercerts-org/documentation
Length of output: 13586
Plumb DOCS_SOURCE_TOKEN into remote Markdown fetching (private repos currently can’t work).
buildRawGitHubUrl returns a raw.githubusercontent.com/... URL, but both build-time consumers (lib/generate-raw-pages.js and lib/generate-search-index.js) and the browser runtime (components/RemoteMarkdown.js) fetch remoteSource.rawUrl via fetch(...) without an Authorization header—so the promised private-repo support in docs/remote-markdown.md won’t function. DOCS_SOURCE_TOKEN is only used for GitHub API metadata in lib/generate-docs-fingerprint.js, not for fetching the Markdown content itself. Fix by changing the fetch contract/plumbing so build/server fetching can authenticate (e.g., Authorization header for raw, or GitHub Contents API raw media) and rely on /raw for the browser fallback.
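A minimal sketch of the build-time side of this fix, assuming DOCS_SOURCE_TOKEN is available to npm run generate and that the raw-host Authorization header approach from the analysis above is used. buildRawFetchHeaders and fetchRemoteMarkdown are hypothetical names. A helper like this must stay out of browser code such as components/RemoteMarkdown.js, since anything bundled for the client would expose the token.

```javascript
// Hypothetical: credentials only when a token is configured, so public
// repositories keep working unauthenticated.
function buildRawFetchHeaders(env = process.env) {
  const token = env.DOCS_SOURCE_TOKEN;
  return token ? { Authorization: `token ${token}` } : {};
}

// Hypothetical build/server-side fetch for remoteSource.rawUrl. fetchImpl is
// injectable so the helper can be exercised without network access.
async function fetchRemoteMarkdown(rawUrl, { env = process.env, fetchImpl = globalThis.fetch } = {}) {
  const response = await fetchImpl(rawUrl, { headers: buildRawFetchHeaders(env) });
  if (!response.ok) {
    // Report the URL and status, never the token.
    throw new Error(`${rawUrl} returned ${response.status} ${response.statusText || ''}`.trim());
  }
  return response.text();
}
```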
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@lib/external-docs.js` around lines 58 - 60, buildRawGitHubUrl currently emits a raw.githubusercontent URL but no consumer attaches DOCS_SOURCE_TOKEN so private repo fetches fail; update the fetch plumbing so server/build fetches authenticate: modify the contract returned by buildRawGitHubUrl (and/or the remoteSource object produced by generate-docs-fingerprint.js) to include either an auth-aware raw fetch endpoint (e.g., a GitHub Contents API URL with media type raw) or an accompanying rawHeaders/rawAuth field that callers can use; then change build consumers generate-raw-pages.js and generate-search-index.js (and server-side path used by components/RemoteMarkdown.js fallback) to send Authorization: `token ${DOCS_SOURCE_TOKEN}` (or the appropriate Bearer header) when DOCS_SOURCE_TOKEN is present so private repo markdown can be fetched.
const response = await fetch(remoteSource.rawUrl, { cache: 'no-store' });
if (!response.ok) {
  throw new Error(`Failed to fetch ${remoteSource.label} for ${pagePath}: ${remoteSource.rawUrl} returned ${response.status} ${response.statusText || ''}`.trim());
}

return response.text();
Add a timeout to the required remote-doc fetches.
Both of these fetches run inside npm run generate, so a stalled GitHub/raw connection can hang npm run dev and npm run build indefinitely. Please add AbortSignal.timeout(...) (or an equivalent controller) and centralize the fetch/error handling so both pipelines fail fast instead of blocking the whole build.
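A sketch of the shared helper this comment asks for, assuming Node 18+ (global fetch and AbortSignal.timeout). fetchTextWithTimeout, its options, and the 15-second default are hypothetical choices, not values from the PR.

```javascript
// Hypothetical default; pick whatever "fail fast" means for this build.
const DEFAULT_TIMEOUT_MS = 15_000;

// Hypothetical shared helper that lib/generate-raw-pages.js and
// lib/generate-search-index.js could both call instead of bare fetch().
async function fetchTextWithTimeout(url, {
  label = url,
  timeoutMs = DEFAULT_TIMEOUT_MS,
  headers = {},
  fetchImpl = globalThis.fetch,
} = {}) {
  // Aborts with a DOMException named "TimeoutError" once timeoutMs elapses.
  const signal = AbortSignal.timeout(timeoutMs);
  try {
    const response = await fetchImpl(url, { cache: 'no-store', headers, signal });
    if (!response.ok) {
      throw new Error(`HTTP ${response.status} ${response.statusText || ''}`.trim());
    }
    // The same signal also bounds reading the response body.
    return await response.text();
  } catch (error) {
    const reason = error && error.name === 'TimeoutError'
      ? `timed out after ${timeoutMs}ms`
      : String(error && error.message);
    // One error shape for every failure, with the page/source context attached.
    throw new Error(`Failed to fetch ${label} (${url}): ${reason}`, { cause: error });
  }
}
```

In generate-raw-pages.js the existing call would become something like fetchTextWithTimeout(remoteSource.rawUrl, { label: `${remoteSource.label} for ${pagePath}` }), which keeps the current error context while guaranteeing the step ends within the timeout.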
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/generate-raw-pages.js` around lines 49 - 54, The fetch of
remoteSource.rawUrl currently has no timeout and can hang the build; wrap the
fetch call in an AbortSignal.timeout (or create an AbortController with a
setTimeout) and pass the signal to fetch so it aborts after a configurable short
timeout, then centralize the logic that performs the fetch+error handling into a
helper (e.g., fetchWithTimeout) used by both pipelines so both fail fast; update
the error thrown around response (the existing response.ok check and thrown
Error using remoteSource.label and pagePath) to include abort/timeout errors
from the signal and to surface clear context.
Summary
Testing
Notes
Summary by CodeRabbit
New Features
Documentation