feat(seo): noindex non-production hosts; add staging deploy workflow#42
JohnRDOrazio merged 5 commits into main
Two SEO/infra improvements that pair naturally:

1. middleware.ts: Sets X-Robots-Tag: noindex, nofollow on every response served from a host other than catholicdigitalcommons.org (or its www. alias). Catches staging.catholicdigitalcommons.org, any preview / one-off subdomain, raw IP probes, etc. Allowlist over blocklist so new hostnames default to safe.

   Pairs with the existing rel=canonical via metadataBase in layout.tsx (which already points all canonical URLs to catholicdigitalcommons.org), so search engines get a coherent "this is not the canonical site" signal at both the header and HTML levels.

   Matcher excludes _next/static and _next/image only; robots.txt and sitemap responses still carry the noindex header on staging.

2. .github/workflows/deploy-staging.yml: On-demand (workflow_dispatch) workflow that builds the Next.js standalone bundle and pushes it to staging via the same VPS / SSH path as production, but to a separate VPS_STAGING_APP_DIR.

   Deploys the Next.js bundle ONLY. Staging shares the same WordPress backend (cms.catholicdigitalcommons.org) as production, so pushing the headless theme or cdcf-redis-translations plugin from this workflow would overwrite production code. Theme/plugin changes still go through the existing Build and Deploy (production) workflow.

   All ${{ secrets.* }} / ${{ vars.* }} interpolations are routed through env: blocks per workflow-injection hardening guidance.

Required new GitHub secret: VPS_STAGING_APP_DIR (path on the VPS, e.g. /home/.../staging.catholicdigitalcommons.org/).

Once this lands and is run once, search engines will start to drop the staging URLs from their indexes. Submit URL Removals via Google Search Console for any already-indexed staging pages to speed that up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
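The allowlist decision described above can be sketched as a pure function. This is a minimal illustration, not the PR's actual middleware.ts: the two hostnames come from the description, while `normalizeHost` and `robotsTagFor` are hypothetical names introduced here.

```typescript
// Hosts allowed to be indexed. The two hostnames come from the PR description;
// the constant and function names below are illustrative assumptions.
const PRODUCTION_HOSTS: ReadonlySet<string> = new Set([
  'catholicdigitalcommons.org',
  'www.catholicdigitalcommons.org',
]);

// Normalise a Host header value: lowercase it and drop any ":port" suffix.
function normalizeHost(rawHost: string): string {
  return rawHost.trim().toLowerCase().replace(/:\d+$/, '');
}

// Returns the X-Robots-Tag value to set, or null when the host is production.
// Anything not on the allowlist (staging, previews, raw IPs, a missing header)
// gets the noindex value, so new hostnames default to safe.
function robotsTagFor(rawHost: string | null): string | null {
  if (rawHost === null) return 'noindex, nofollow';
  return PRODUCTION_HOSTS.has(normalizeHost(rawHost)) ? null : 'noindex, nofollow';
}
```

In a Next.js middleware, a non-null result would presumably be applied to the outgoing response with `response.headers.set('X-Robots-Tag', value)`. Keeping the decision in a pure function makes the allowlist easy to unit-test without a running server.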
📝 Walkthrough
Adds a manual GitHub Actions workflow to build a Next.js standalone bundle and deploy it to a VPS over SSH with retry logic. Adds a Next.js middleware that sets X-Robots-Tag: noindex, nofollow on responses served from non-production hosts.
Sequence Diagram(s)
sequenceDiagram
participant Dev as Developer
participant GH as GitHub Actions (Runner)
participant Repo as Repository (code & secrets)
participant VPS as Staging VPS
rect rgba(66,135,245,0.5)
Dev->>GH: Manual dispatch "Build and Deploy to Staging"
GH->>Repo: Read .nvmrc, env vars, secrets
GH->>GH: npm ci, restore/cache .next/cache, npm run build (WP_GRAPHQL_URL)
GH->>GH: Prepare .next/standalone, copy static/public, tarball /tmp/nextjs-bundle.tar.gz
GH->>GH: Write SSH private key, add VPS host key
end
rect rgba(76,175,80,0.5)
GH->>VPS: scp /tmp/nextjs-bundle.tar.gz -> /tmp/staging-bundle.tar.gz (3 attempts)
GH->>VPS: ssh to create staging dir, extract tarball, rm tarball, touch tmp/restart.txt (3 attempts)
VPS-->>GH: Success / Failure
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Up to standards ✅
| Metric | Results |
|---|---|
| Complexity | 9 |
| Duplication | 0 |
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/deploy-staging.yml (1)
67-71: ⚡ Quick win. Add connection timeouts to scp/ssh retry loops to avoid hung attempts.
Retries exist, but each attempt can still stall for a long time without explicit timeout options.
Suggested update
- if scp -i ~/.ssh/deploy_key \
+ if scp -o BatchMode=yes -o ConnectTimeout=15 -o ConnectionAttempts=1 -i ~/.ssh/deploy_key \
    /tmp/nextjs-bundle.tar.gz \
    "${VPS_USERNAME}@${VPS_HOST}:/tmp/staging-bundle.tar.gz"; then
  ...
- if ssh -i ~/.ssh/deploy_key \
+ if ssh -o BatchMode=yes -o ConnectTimeout=15 -o ConnectionAttempts=1 -i ~/.ssh/deploy_key \
    "${VPS_USERNAME}@${VPS_HOST}" \
Also applies to: 89-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-staging.yml around lines 67 - 71, The SCP/SSH retry loop (the for attempt in 1 2 3; do ... scp ...; then and the later ssh calls) can hang because no connection timeouts are set; update the scp command and any ssh invocations to include SSH timeout options (e.g., scp -o ConnectTimeout=10 -o ServerAliveInterval=15 -o ServerAliveCountMax=2 ...) and similarly for ssh (ssh -o ConnectTimeout=10 -o ServerAliveInterval=15 -o ServerAliveCountMax=2 ...) so each attempt fails fast and the retry loop proceeds.
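The fail-fast retry idea in this comment can be sketched outside the workflow's shell. The snippet below is a TypeScript illustration, not the workflow's code: `sshTimeoutFlags` and `retry` are hypothetical helpers, and only the three OpenSSH option names and their values come from the comment.

```typescript
// Build the "-o Key=Value" flags named in the review comment. The defaults
// mirror the values suggested there; the function itself is illustrative.
function sshTimeoutFlags(
  connectTimeout = 10,
  aliveInterval = 15,
  aliveCountMax = 2,
): string[] {
  return [
    '-o', `ConnectTimeout=${connectTimeout}`,
    '-o', `ServerAliveInterval=${aliveInterval}`,
    '-o', `ServerAliveCountMax=${aliveCountMax}`,
  ];
}

// Run `attempt` up to `maxAttempts` times and stop at the first success.
// `attempt` returns true on success (for example, exit code 0 from scp).
function retry(maxAttempts: number, attempt: (n: number) => boolean): boolean {
  for (let n = 1; n <= maxAttempts; n++) {
    if (attempt(n)) return true;
  }
  return false;
}
```

The timeouts bound how long a single attempt can stall, and the loop bounds how many attempts are made; a retry loop without the timeouts can still hang on its first attempt.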
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-staging.yml:
- Around line 52-60: The workflow currently trusts whatever key ssh-keyscan
returns in the "Setup SSH key" step; instead add a new secret (e.g.
VPS_HOST_KEY) and pin/verify the host key: write the pinned key (from secret
VPS_HOST_KEY) into ~/.ssh/known_hosts (or run ssh-keyscan and compare its output
to VPS_HOST_KEY and fail if they differ) rather than blindly appending
ssh-keyscan output; update references in that step (env names VPS_HOST, new
VPS_HOST_KEY, files ~/.ssh/deploy_key and ~/.ssh/known_hosts) so the job aborts
on a mismatch to prevent MITM.
- Around line 41-45: The workflow currently sets WP_GRAPHQL_URL only during the
"Build Next.js" step but does not provision runtime environment variables on the
VPS, causing API routes (e.g., app/api/submit-project/route.ts and
app/api/refer-community-project/route.ts) to fail; update the deploy workflow so
that during the "Extract and restart" stage (the steps that extract the bundle
and restart the app) you either write required keys (WP_GRAPHQL_URL,
WP_REST_URL, WP_APP_USERNAME, WP_APP_PASSWORD, WP_PREVIEW_SECRET) into the VPS
.env.local or ensure those variables are exported in the remote shell before
starting the app, and make the change in the workflow around the "Build Next.js"
and extract/restart steps so runtime envs are present on the server.
In `@middleware.ts`:
- Around line 7-16: The middleware currently uses the hardcoded PRODUCTION_HOSTS
set and parses request.headers.get('host'), which can desync from
NEXT_PUBLIC_SITE_URL; update the middleware function to derive the allowed
host(s) from process.env.NEXT_PUBLIC_SITE_URL (e.g., new
URL(process.env.NEXT_PUBLIC_SITE_URL).hostname) and use request.nextUrl.hostname
(lowercased) for the incoming host check; handle missing/invalid
NEXT_PUBLIC_SITE_URL by falling back to the existing set or skipping the noindex
behavior so you don't accidentally block production, and replace references to
PRODUCTION_HOSTS.has(host) with the derived hostname comparison.
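For the first inline comment above (host-key pinning), the compare-and-fail variant can be sketched as follows. This is a rough illustration only: it assumes both inputs use the known_hosts line format (`host keytype base64key`), and `keyEntries` and `scanMatchesPin` are hypothetical names, not part of the workflow.

```typescript
// Extract "keytype base64key" pairs from known_hosts-style text,
// skipping blank lines and "#" comment lines.
function keyEntries(knownHostsText: string): Set<string> {
  const keys = new Set<string>();
  for (const line of knownHostsText.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3 || fields[0].startsWith('#')) continue;
    keys.add(`${fields[1]} ${fields[2]}`);
  }
  return keys;
}

// True only if something was scanned and every scanned key is pinned.
// Any unexpected key fails the check, which is the strict reading of
// "abort on a mismatch".
function scanMatchesPin(scanOutput: string, pinned: string): boolean {
  const pinnedKeys = keyEntries(pinned);
  const scanned = Array.from(keyEntries(scanOutput));
  return scanned.length > 0 && scanned.every((key) => pinnedKeys.has(key));
}
```

Under this strict rule the pinned secret would need to list every key type the server offers; accepting any single match would be more lenient but weaker.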
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d813bff5-16a9-4ef9-825a-fe10fc9052d8
📒 Files selected for processing (2)
- .github/workflows/deploy-staging.yml
- middleware.ts
      - name: Build Next.js
        run: npm run build
        env:
          WP_GRAPHQL_URL: ${{ vars.WP_GRAPHQL_URL }}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify runtime dependency on WP_GRAPHQL_URL and where this workflow sets it.
rg -n 'process\.env\.WP_GRAPHQL_URL' -C2
rg -n 'WP_GRAPHQL_URL' .github/workflows/deploy-staging.yml -C2
Repository: CatholicOS/cdcf-website
Length of output: 3067
🏁 Script executed:
# Examine the full deploy-staging.yml workflow, especially lines 82-95 and deployment steps
wc -l .github/workflows/deploy-staging.yml
Repository: CatholicOS/cdcf-website
Length of output: 108
🏁 Script executed:
# Read the complete workflow to understand how staging is deployed and where env vars are set
cat -n .github/workflows/deploy-staging.yml
Repository: CatholicOS/cdcf-website
Length of output: 4438
🏁 Script executed:
# Search for where WP_GRAPHQL_URL and other env vars are provisioned to staging/VPS
rg -n 'WP_GRAPHQL_URL|\.env|staging' .github/workflows/ -C1
Repository: CatholicOS/cdcf-website
Length of output: 1776
Staging deployment doesn't provision runtime environment variables; WP_GRAPHQL_URL and other required vars must already exist on the VPS, or the app will fail at runtime.
The workflow sets WP_GRAPHQL_URL only during the build step (line 44), but multiple API routes (app/api/submit-project/route.ts, app/api/refer-community-project/route.ts, and others) read it at runtime. If the staging VPS lacks .env.local with these variables, requests to these endpoints return 500 "Server configuration error" despite a successful deployment.
Lines 82–95 extract the bundle and restart the app, but never provision the required environment variables (WP_GRAPHQL_URL, WP_REST_URL, WP_APP_USERNAME, WP_APP_PASSWORD, WP_PREVIEW_SECRET). These must either exist on the VPS beforehand or be provisioned during deployment.
Suggested safeguard
+ - name: Validate required staging config
+ env:
+ WP_GRAPHQL_URL: ${{ vars.WP_GRAPHQL_URL }}
+ VPS_STAGING_APP_DIR: ${{ secrets.VPS_STAGING_APP_DIR }}
+ run: |
+ test -n "$WP_GRAPHQL_URL" || { echo "::error::Missing vars.WP_GRAPHQL_URL"; exit 1; }
+ test -n "$VPS_STAGING_APP_DIR" || { echo "::error::Missing secrets.VPS_STAGING_APP_DIR"; exit 1; }
Alternatively, provision environment variables to the VPS .env.local file during the Extract and restart step.
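The same "fail loudly when configuration is missing" idea could also be applied on the server side. The sketch below is an assumption-laden illustration rather than code from the repository: the variable names are the ones listed in this review, and `missingRuntimeVars` is a hypothetical helper.

```typescript
// Runtime variables named in the review comment as required by the API routes.
const REQUIRED_RUNTIME_VARS: readonly string[] = [
  'WP_GRAPHQL_URL',
  'WP_REST_URL',
  'WP_APP_USERNAME',
  'WP_APP_PASSWORD',
  'WP_PREVIEW_SECRET',
];

// Return the names of required variables that are unset or blank.
// `env` is passed in (rather than reading process.env directly) so the
// check can be exercised without touching the real environment.
function missingRuntimeVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_RUNTIME_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}
```

Called with `process.env` at startup, a non-empty result could be logged or turned into an early failure, so a missing `.env.local` on the VPS surfaces at deploy time instead of as a 500 on the first API request.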
Three findings verified and fixed; one (runtime env provisioning) intentionally deferred, see notes below.

middleware.ts:
- Derive allowed-host set from NEXT_PUBLIC_SITE_URL so it stays in lockstep with metadataBase in layout.tsx instead of duplicating the hostname literal.
- Hardcoded FALLBACK_PRODUCTION_HOSTS guarantees production stays indexable even if NEXT_PUBLIC_SITE_URL is unset or malformed at build time (defensive default).
- Auto-add www. variant of each derived host.
- Switch from request.headers.get('host') to request.nextUrl.hostname (Next.js idiom; already lowercased and port-stripped).

deploy-staging.yml:
- Optional pinned host key via VPS_HOST_KEY secret. If set, used directly as known_hosts. If unset, falls back to ssh-keyscan with a warning. Backward-compatible: deploys keep working pre-secret, hardened post-secret.
- Add SSH/SCP timeouts (ConnectTimeout=10, ServerAliveInterval=15, ServerAliveCountMax=2) to scp and ssh invocations so each retry attempt fails fast (up to ~70s) rather than hanging on a stalled connection.

Not fixed in this PR:
- Runtime env provisioning (.env.local on the VPS for WP_GRAPHQL_URL, WP_REST_URL, WP_APP_USERNAME, etc.). The production deploy.yml has the same gap and presumably relies on Plesk-side env or a persistent .env that survives extraction. Adding env provisioning here would diverge from prod and risk breaking existing setups; will file a separate hardening PR that updates both workflows together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two corrections after review:

- Drop the VPS_HOST_KEY pinning logic. The MITM defense it adds is narrow (requires an attacker on the GitHub-runner ↔ OVH path AND timing the first ssh-keyscan), production deploy.yml doesn't have it, and adding it asymmetrically just creates operational drift. Match prod: plain ssh-keyscan.
- VPS_STAGING_APP_DIR is now read from `vars` rather than `secrets`. An app directory path isn't a credential and doesn't need to be encrypted at rest in GitHub. (VPS_APP_DIR in the production deploy.yml could move to vars too, but that's a separate-PR cleanup since it touches a working production workflow.)

The SSH/SCP timeout flags from the previous commit stay; that one is a clear win regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@middleware.ts`:
- Around line 30-33: The current loop over the hosts set only adds a www.
variant for non-www hosts, missing the inverse when NEXT_PUBLIC_SITE_URL is a
www host; update the loop that iterates over [...hosts] (variable hosts) so that
for each host you add both variants: if host startsWith('www.') add the bare
host (strip the leading 'www.'), otherwise add `www.${host}`, ensuring you still
iterate over a copy (`[...hosts]`) while mutating the set to avoid modifying the
iterator.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ccd1f910-29c9-424f-a9dc-130ee6ed442b
📒 Files selected for processing (2)
- .github/workflows/deploy-staging.yml
- middleware.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/deploy-staging.yml
NEXT_PUBLIC_* env vars are inlined into the client bundle at build time, not read from process.env at runtime. The Plesk-panel value helps server code but client components (e.g. components/blog/ShareButtons.tsx) ship whatever was set when `next build` ran.

Without this, the production bundle falls back to the hardcoded 'https://catholicdigitalcommons.org' literal in code, which happens to match prod, so it has been silently OK. Setting it from NEXT_PUBLIC_SITE_URL_PROD makes the configuration explicit and lets the staging workflow set its own value via NEXT_PUBLIC_SITE_URL_STAGING (see PR #42).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same fix as PR #43 applied to the staging workflow: NEXT_PUBLIC_* env vars are inlined at build time, so without setting it the staging client bundle (e.g. components/blog/ShareButtons.tsx) would ship the hardcoded production URL fallback instead of the staging URL. Reads from the new NEXT_PUBLIC_SITE_URL_STAGING repo variable so share buttons, sitemap canonicals, and other client-side absolute URLs on staging point to staging.catholicdigitalcommons.org. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
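To make the fallback behaviour concrete, here is a minimal sketch of how a client-side helper might build absolute URLs. The hardcoded production literal is the one quoted in the commit message; `absoluteUrl` and `DEFAULT_SITE_URL` are hypothetical names, and the real ShareButtons.tsx may be structured differently.

```typescript
// Fallback literal quoted in the commit message above.
const DEFAULT_SITE_URL = 'https://catholicdigitalcommons.org';

// Resolve a site-relative path against the configured site URL, falling back
// to the production literal when the value is unset or empty.
function absoluteUrl(path: string, siteUrl: string | undefined): string {
  return new URL(path, siteUrl || DEFAULT_SITE_URL).toString();
}
```

In a client component the second argument would typically be `process.env.NEXT_PUBLIC_SITE_URL`, which Next.js replaces with a literal during `next build`. If the variable is not set in the build step, the helper receives `undefined` and every staging share link points at production, which is the behaviour this commit avoids.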
Previously the loop over [...hosts] only added a www. variant when the source host was bare. If NEXT_PUBLIC_SITE_URL is ever configured with a www. URL (e.g. https://www.catholicdigitalcommons.org), the bare domain wouldn't end up in PRODUCTION_HOSTS and visitors hitting the bare form would get noindex'd in production, exactly the opposite of intent.

Now: for each host in the set, add the OTHER variant. www.X gets companion X; X gets companion www.X. Either configuration works.

Smoke-tested against 5 NEXT_PUBLIC_SITE_URL inputs (bare, www, unset, malformed, subdomain). All produce both bare and www. variants in the resulting set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
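The derivation described in this commit might look roughly like the sketch below. It is reconstructed from the commit messages, not copied from middleware.ts: `buildProductionHosts` is a hypothetical name, the site URL is passed as a parameter instead of being read from the environment, and the fallback list is assumed to contain only the bare production host.

```typescript
// Defensive default so production stays indexable if the env value is unusable.
const FALLBACK_PRODUCTION_HOSTS: readonly string[] = ['catholicdigitalcommons.org'];

function buildProductionHosts(siteUrl: string | undefined): Set<string> {
  const hosts = new Set<string>(FALLBACK_PRODUCTION_HOSTS);
  if (siteUrl) {
    try {
      hosts.add(new URL(siteUrl).hostname.toLowerCase());
    } catch {
      // Malformed URL: keep only the fallback hosts.
    }
  }
  // Iterate over a snapshot so the set can be mutated inside the loop.
  // For each host add the OTHER variant: www.X gets X, and X gets www.X.
  for (const host of Array.from(hosts)) {
    hosts.add(host.startsWith('www.') ? host.slice(4) : `www.${host}`);
  }
  return hosts;
}
```

For example, `buildProductionHosts('https://www.catholicdigitalcommons.org')` yields a set containing both the bare and the `www.` host. One consequence worth checking against the real middleware: in this sketch, a build whose site URL points at a staging host would add that host to the set, so it would no longer receive the noindex header.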
The VPS_APP_DIR repo secret was moved to a repo variable. Without
this update the production deploy workflow expands ${{ secrets.VPS_APP_DIR }}
to an empty string and the SSH step would mkdir/tar/touch into the
wrong path, breaking deploys.
Mirrors the convention now used in deploy-staging.yml (PR #42) — an
app directory path isn't a credential, so it belongs in vars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two paired infra/SEO additions:
- middleware.ts: Emits X-Robots-Tag: noindex, nofollow on every response served from a host other than catholicdigitalcommons.org (or www.). Catches staging.catholicdigitalcommons.org, any preview / one-off subdomain, raw IP probes, etc.
- .github/workflows/deploy-staging.yml: On-demand (workflow_dispatch) workflow to build and ship the Next.js bundle to staging, leaving the shared WordPress backend untouched.
Why
Staging pages have started appearing in search results, competing with production for ranking. Two layers of mitigation:
- … raw IP, sub-subdomains) defaults to noindex until explicitly added to PRODUCTION_HOSTS. Safer than enumerating known-bad hosts.
- robots.txt and sitemap-*.xml on staging also get the noindex header; otherwise crawlers might trust the staging sitemap and index its URLs.
The staging deploy workflow fills a gap: there's a production deploy but no formal staging path, so testing changes either runs against production or doesn't run anywhere. workflow_dispatch gates it behind a manual trigger so it can't accidentally run on push.
Required setup
Add a new GitHub secret to cdcf-website:
- VPS_STAGING_APP_DIR: /home/.../staging.catholicdigitalcommons.org/
(All other secrets, such as VPS_HOST, VPS_USERNAME, VPS_SSH_KEY, VPS_APP_DIR, etc., are already configured for the production deploy and reused by this workflow.)
What this does NOT do
- … WordPress (cms.catholicdigitalcommons.org), so pushing the headless theme or cdcf-redis-translations plugin from staging would overwrite production code. Theme/plugin changes continue to go through the existing Build and Deploy (production) workflow.
- … staging is still reachable for testing.
- … <meta name="robots"> to layout HTML. Header is the stronger signal and applies to non-HTML responses too. Can layer in metadata-side later if desired.
Test plan
- … VPS_STAGING_APP_DIR secret
- curl -I https://staging.catholicdigitalcommons.org/ and confirm X-Robots-Tag: noindex, nofollow is present
- curl -I https://catholicdigitalcommons.org/ and confirm X-Robots-Tag is absent (production unaffected)
- … staging pages
Related
- … the /it/<localized-slug> SEO improvement we discussed
🤖 Generated with Claude Code