Media proxy #483
Introduce src/media-proxy.ts as the shared signing/verification core
for the upcoming /proxy route and entity-serializer rewrites.

Two new env vars are parsed at module load:

- MEDIA_PROXY: off | proxy | cache (default off)
- REMOTE_MEDIA_THUMBNAILS: on | off (default on)

The signing key is derived from SECRET_KEY via
HMAC-SHA256(SECRET_KEY, "media-proxy/v1"), so no additional secret is
required. Each proxied URL is turned into /proxy/<sig>/<b64url>, where
b64url is the URL itself and sig is an HMAC over that base64url
string, truncated to 16 bytes and constant-time-compared on
verification.

proxyUrlForMode() short-circuits in three ways before signing: it
returns null for null/undefined input; returns the URL unchanged in
off mode or when the URL's origin matches baseUrl or STORAGE_URL_BASE;
and returns null for non-http(s) schemes, so callers fall back to a
default rather than minting a signed wrapper for data:/javascript:/etc.

The base64url decoder additionally rejects the impossible mod-4-of-1
length and re-encodes the result to refuse trailing-bit aliasing, so
near-canonical variants of a signed path can't slip through.

For fedify-dev#481.

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
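The scheme above can be sketched in a few lines. This is a hypothetical illustration, not the actual src/media-proxy.ts: the key derivation, 16-byte truncation, and constant-time comparison follow the description, while the mode short-circuits and the canonical-decode check are omitted.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const SECRET_KEY = "example-secret"; // stand-in for the real env var

// Derive the signing key from SECRET_KEY with a fixed context string,
// so no second secret needs to be configured.
const signingKey = createHmac("sha256", SECRET_KEY)
  .update("media-proxy/v1")
  .digest();

function b64url(s: string): string {
  return Buffer.from(s, "utf8").toString("base64url");
}

// HMAC over the base64url-encoded URL, truncated to 16 bytes.
function sign(encodedUrl: string): string {
  return createHmac("sha256", signingKey)
    .update(encodedUrl)
    .digest()
    .subarray(0, 16)
    .toString("hex");
}

function proxyPath(url: string): string {
  const encoded = b64url(url);
  return `/proxy/${sign(encoded)}/${encoded}`;
}

// Constant-time comparison on verification.
function verify(sig: string, encodedUrl: string): boolean {
  const expected = Buffer.from(sign(encodedUrl), "hex");
  const given = Buffer.from(sig, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Truncating to 16 bytes keeps the path short while leaving 128 bits of MAC, and `timingSafeEqual` avoids the early exit of a byte-by-byte string comparison.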
Mount a new Hono app at /proxy that consumes the URLs minted by the
media-proxy module added in the previous commit. Behaviour follows the
three MEDIA_PROXY modes:
off — the app has no routes, so every /proxy/... path 404s.
proxy — verify the signature, run ssrfcheck on the URL and on every
Location header through up to 3 manual redirects, fetch
within a 30s abort window, validate Content-Type against an
image/video/audio allowlist that explicitly excludes SVG,
cap the body at 32 MiB during streaming, and return the
bytes with Cache-Control: public, max-age=2592000, immutable and
X-Content-Type-Options: nosniff.
cache — same as proxy, but the body and a small JSON metadata
sidecar are persisted to the flydrive disk under
proxy/<sha256(url)>. Subsequent requests for the same URL
skip the upstream fetch entirely.
A few details worth calling out:
- The abort timer spans both the connect/headers phase and the body
read so a stalled upstream cannot tie a request up past the
timeout.
- Non-OK upstream responses and disallowed Content-Type both cancel
the response body before returning to avoid sockets lingering
until GC.
- Uint8Array bodies are turned into exactly-sized ArrayBuffers before
being handed to Hono so we don't leak bytes from a shared Buffer
pool into the response.
- ssrfcheck validates only the host string, not the resolved IP, so
  DNS rebinding remains possible; this is the same limitation as
  src/federation/post.ts:525-526 and is tracked as a follow-up that
  needs an SSRF-aware fetch connector.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
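The cap-and-allowlist behaviour can be sketched as follows. This is an illustrative stand-in: the helper names readBoundedBody and isAllowedContentType are assumptions, and the real route in src/proxy.ts also handles the abort timer, redirects, and signature checks.

```typescript
const MAX_BYTES = 32 * 1024 * 1024;

// Content-Type allowlist: image/video/audio, with SVG explicitly
// excluded because it can carry inline script.
function isAllowedContentType(ct: string | null): boolean {
  if (ct === null) return false;
  const mime = ct.split(";")[0].trim().toLowerCase();
  if (mime === "image/svg+xml") return false;
  return /^(image|video|audio)\//.test(mime);
}

// Read a body stream, cancelling the upstream read once the cap is
// exceeded so we stop pulling bytes immediately.
async function readBoundedBody(
  body: ReadableStream<Uint8Array>,
  maxBytes = MAX_BYTES,
): Promise<Uint8Array | null> {
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // stop pulling from upstream
      return null;
    }
    chunks.push(value);
  }
  return Buffer.concat(chunks);
}
```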
Wrap every remote URL that leaves Hollo through the Mastodon-compatible
API in proxyUrl(value, baseUrl). The previous commits added the module
and the /proxy route; this is the wiring that makes API clients actually
see proxied URLs once MEDIA_PROXY is set to proxy or cache.
Touched:
- account.avatar / .avatar_static / .header / .header_static
- media_attachments[].url and .preview_url
- emojis[].url / .static_url (account + post)
- emoji_reactions[].url / .static_url
- emoji_reaction (on notifications) plus its emoji_url / emojiURL
siblings
- card.image on preview cards
serializeMedium, serializeEmojis, serializeEmoji, serializeReactions,
serializeReaction, and serializePreviewCard all gained a baseUrl
parameter; status.ts threads c.req.url through every call site.
A couple of safety details worth calling out:
- serializeEmoji returns null when proxyUrl rejects the href (non-
http(s) scheme). serializeEmojis filters those, and
serializeReactions treats a rejected customEmoji as a unicode-only
reaction so the count is preserved without leaking the raw URL.
The notifications endpoint similarly skips emoji_url / emojiURL
when the proxy refuses.
- When a preview card's image URL is dropped, width and height are
forced to 0 so clients don't see dimensions on a null image.
Outbound federation (src/federation/actor.ts and friends) is
untouched: it reads avatarUrl / coverUrl / media.url straight from the
DB, which still holds the original remote URLs.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
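The rejection contract can be sketched like this. The shapes below are hypothetical: the real serializers take more fields plus a baseUrl parameter, and proxyUrl here is reduced to just its scheme check.

```typescript
// Stand-in for the real proxyUrl: models only the "return null for
// non-http(s) schemes" behaviour; the real one signs the URL.
function proxyUrl(value: string | null | undefined): string | null {
  if (value == null) return null;
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    return null;
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return `/proxy/sig/${Buffer.from(value, "utf8").toString("base64url")}`;
}

interface CustomEmoji {
  shortcode: string;
  url: string;
}

type ApiEmoji = { shortcode: string; url: string; static_url: string };

// Returns null when the proxy refuses the href, so callers can drop it.
function serializeEmoji(emoji: CustomEmoji): ApiEmoji | null {
  const url = proxyUrl(emoji.url);
  if (url === null) return null;
  return { shortcode: emoji.shortcode, url, static_url: url };
}

// Rejected emojis are filtered out rather than leaking the raw URL.
function serializeEmojis(emojis: CustomEmoji[]): ApiEmoji[] {
  return emojis.map(serializeEmoji).filter((e): e is ApiEmoji => e !== null);
}
```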
Thread a baseUrl prop through every JSX component that renders a
remote-origin avatar, header, media thumbnail, preview-card image, or
custom emoji, and wrap each of those URLs in proxyUrl() at render time.
Once the API path was proxying URLs, the admin dashboard and public
profile pages were the only remaining places where an operator's
browser would still talk directly to the source server.
Components updated:
- Profile, AccountList / AccountItem, Post, PostContent, Medium,
PreviewCardView, AuthorizationPage
- renderCustomEmojis() in custom-emoji.ts now takes baseUrl too; if
proxyUrl rejects an emoji URL (non-http(s) scheme), it falls back
to the literal :shortcode: instead of emitting an <img> with the
raw href
Page handlers updated to pass c.req.url through:
- src/pages/home/index.tsx
- src/pages/profile/index.tsx and src/pages/profile/profilePost.tsx
- src/pages/tags/index.tsx
- src/pages/accounts.tsx
- src/pages/emojis.tsx (existing emoji table + import-discovery
table)
- src/pages/oauth/authorization.tsx via src/oauth.tsx
A couple of safety details:
- The Medium component used to wrap the thumbnail in an <a href>
that pointed at medium.url. When proxyUrl rejects that URL the
component now renders a plain <div> wrapper rather than falling back
to the raw href, so a non-http(s) media URL can no longer slip
into the rendered HTML.
- Layout's imageUrl prop (OpenGraph / Twitter card) is intentionally
left as the raw avatarUrl. That metadata is consumed by external
link-preview crawlers, and routing those fetches through our own
/proxy endpoint would push their crawler traffic onto our origin.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
When REMOTE_MEDIA_THUMBNAILS=off, the federation ingest path in
persistPost no longer downloads incoming attachments to generate WebP
thumbnails. It still creates the media row, but with thumbnailUrl set
to the remote URL itself and the attachment-declared dimensions,
mirroring the existing fallback shape used when the sharp pipeline
fails.

This addresses the long-standing complaint that the sharp-derived
thumbnails take up a lot of operator storage for posts that may never
actually be displayed. Combined with MEDIA_PROXY=proxy or cache,
clients still see same-origin URLs at render time; with MEDIA_PROXY=off
they see the original remote URL exactly as the existing catch-branch
already served it.

REMOTE_MEDIA_THUMBNAILS defaults to on, so existing deployments keep
their current behaviour.

For fedify-dev#481.

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Generalize src/cleanup/processors.ts into a single processCleanupItem()
entry point that the worker now calls. It inspects item.data.kind: a
new "proxy_cache" payload routes to processProxyCacheDeletion() (which
removes the body and its JSON sidecar), and everything without that
discriminator falls through to the existing processThumbnailDeletion()
so old queued items keep working unchanged.
The .bin key the proxy-cache job carries is validated against the exact
shape writeCached() in src/proxy.ts mints — `proxy/<sha256>.bin`, 64
hex chars — so the per-item key cannot be coaxed into a path-traversal
delete. listProxyCacheBinKeys() applies the same regex and pages
through every disk.listAll result so S3 buckets with more than one
listing page are fully enumerated.
The admin page (src/pages/thumbnail_cleanup.tsx, deliberately keeping
the existing route URL) gains:
- a third statistics row, "Media proxy cache entries", populated
from listProxyCacheBinKeys()
- a "Clear media proxy cache" section with a button that POSTs to
/thumbnail_cleanup/proxy_cache/clear, which enqueues one
cleanupJobs row plus one cleanupJobItems row per cached .bin, in
batches of 1000
The cleanup_thumbnails enum value is reused for these jobs (the kind
discriminator inside data picks the processor); the plan for this PR
explicitly chose that over a Drizzle migration for a new enum value.
A few details worth calling out:
- The body delete is required; only its sidecar delete is wrapped in
a try/log so a partial previous cleanup doesn't fail the item.
- When the storage listing call itself fails, the statistics row is
omitted (not shown as 0) and the section disables the clear button
with "Cache size unavailable" so the UI can't claim "the proxy
cache is empty" when it actually couldn't be inspected.
- The proxy section uses its own `proxy-cache-result` query param so
its post-action messages don't leak into the thumbnail section.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
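The key validation amounts to a single anchored regex; a sketch follows (the exact pattern in src/cleanup/processors.ts may differ in detail, but the shape is `proxy/<64 hex chars>.bin`):

```typescript
// Accept only "proxy/<sha256 hex>.bin": anchored at both ends, no
// slashes or dots allowed inside the hash, so a stored key can never
// traverse paths or target anything outside the cache namespace.
const PROXY_CACHE_KEY = /^proxy\/[0-9a-f]{64}\.bin$/;

function isValidProxyCacheKey(key: string): boolean {
  return PROXY_CACHE_KEY.test(key);
}
```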
Cover the two new environment variables introduced by this PR in every
place CLAUDE.md requires:
- AGENTS.md (CLAUDE.md is a symlink to it) — appended both vars to
the Optional variables table with their defaults
- docs/src/content/docs/install/env.mdx and its ja / ko / zh-cn
translations — added full sections describing the three MEDIA_PROXY
modes and what protections kick in under proxy and cache (SSRF
re-check on every redirect target, http(s)-only schemes, 32 MiB
body cap, image/svg+xml explicitly blocked), and the storage
layout (body at proxy/<sha256>.bin, content-type sidecar at
proxy/<sha256>.json)
- compose.yaml and compose-fs.yaml — added commented-out example
lines so operators see where to plug the variables in for both the
S3 and FS deployment templates
- CHANGES.md — release notes for 0.9.0 under issue fedify-dev#481, plus the
reference link
Defaults are off / on, so existing deployments are unaffected until an
operator opts in.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Two leaks the per-commit reviews didn't catch:
- proxyUrl() decided "this URL is already on us" by comparing
origins. That's wrong when STORAGE_URL_BASE is path-scoped under
a shared origin (path-style S3 / MinIO, a shared CDN, or
STORAGE_URL_BASE set to the bare Hollo origin), because every
other URL on that origin slips past the proxy. And it was also
too permissive on baseUrl's origin: a federated actor whose
avatarUrl points at the Hollo server's own URL could ride a
logged-in operator's session.
proxyUrl() now uses a STORAGE_URL_BASE prefix match (normalized to
end in /), drops the baseUrl-origin passthrough entirely, and
keeps a single anti-recursion guard so an already-proxied URL on
our own origin isn't double-wrapped. Default avatars / header
images and uploaded local-account assets are unaffected: they're
either fetched via the `?? defaultAvatarUrl` fallback (raw, never
through proxyUrl) or sit under STORAGE_URL_BASE (prefix match
catches them).
- /api/v1/custom_emojis was still emitting `emoji.url` and
`emoji.static_url` straight from the DB. Every other API
response path was already routed through proxyUrl(); this one was
overlooked. The endpoint now uses proxyUrl(emoji.url, c.req.url)
and drops entries whose URL the proxy refuses, matching how
serializeEmoji() behaves elsewhere.
Tests updated: the old "does not proxy a URL on the same origin as
baseUrl" assertion now covers STORAGE_URL_BASE prefix instead, plus
two new tests assert that same-origin-but-not-storage URLs do get
proxied and that already-proxied URLs are not double-wrapped.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
The previous fix prefix-matched the raw URL string, which lets a URL
like `http://h/assets/../admin/x.png` (or its `%2e%2e` equivalents)
slip past the check while resolving to `/admin/x.png` in the browser
and at fetch time. Compare against `parsed.href` instead, so dot-
segment normalization happens before the check.
While here, derive the trusted prefix from the active storage driver
rather than from STORAGE_URL_BASE alone:
- FS mode always serves uploads at `/assets/` on the storage origin,
regardless of what STORAGE_URL_BASE's path looks like. Trust only
that namespace, so a bare-origin STORAGE_URL_BASE (the kind in
.env.test) doesn't end up trusting the whole Hollo origin.
- S3 / generic mode keeps using STORAGE_URL_BASE itself, normalized
to end in `/`, because the driver concatenates that string with
the object key.
Two new tests cover (a) `/admin/...` on the Hollo origin gets proxied
even though it shares an origin with STORAGE_URL_BASE, and (b)
`/assets/../admin/...` traversal is normalized before the prefix
check and therefore gets proxied too.
For fedify-dev#481.
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
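A minimal sketch of the normalized-prefix comparison described above (assumed shape; the real check also derives the trusted prefix from the active storage driver):

```typescript
// Compare against parsed.href, not the raw string: the WHATWG URL
// parser resolves dot segments, including percent-encoded ones like
// %2e%2e, before we ever look at the prefix.
function isTrustedStorageUrl(value: string, storagePrefix: string): boolean {
  const prefix = storagePrefix.endsWith("/")
    ? storagePrefix
    : `${storagePrefix}/`;
  let parsed: URL;
  try {
    parsed = new URL(value);
  } catch {
    return false; // unparseable URLs are never trusted
  }
  return parsed.href.startsWith(prefix);
}
```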
…REMOTE_MEDIA_THUMBNAILS

MEDIA_PROXY previously only accepted the three named modes
(off/proxy/cache). Allow `true`/`on`/`1` as aliases for `proxy` and
`false`/`off`/`0` as aliases for `off`, matching the boolean
conventions the rest of Hollo's env vars use. Disk caching is still
opt-in only: it must be requested explicitly with `cache` rather than
via the truthy alias, to avoid silently turning on persistent storage
by flipping a generic boolean.

REMOTE_MEDIA_THUMBNAILS was already boolean internally, but its error
message advertised only `on`/`off`. The message now lists every
accepted form (`true`/`false`, `on`/`off`, `1`/`0`) and reuses the
shared TRUTHY/FALSY sets with parseMediaProxyMode. Both parsers are
now exported so they can be unit-tested directly.

Tests cover defaults, named values, truthy/falsy synonyms,
case-insensitivity, and the throw path on unknown values. Docs in all
four languages (en/ja/ko/zh-cn), AGENTS.md, and CHANGES.md were
updated to describe the accepted aliases.

For fedify-dev#481.

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
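The alias handling might look roughly like this. The TRUTHY/FALSY set contents and the function shape are inferred from the commit message, not copied from the source:

```typescript
const TRUTHY = new Set(["true", "on", "1"]);
const FALSY = new Set(["false", "off", "0"]);

type MediaProxyMode = "off" | "proxy" | "cache";

function parseMediaProxyMode(raw: string | undefined): MediaProxyMode {
  if (raw === undefined) return "off"; // default when unset
  const value = raw.trim().toLowerCase();
  // Disk caching is explicit-only: no truthy alias maps to "cache".
  if (value === "cache") return "cache";
  if (value === "proxy" || TRUTHY.has(value)) return "proxy";
  if (FALSY.has(value)) return "off";
  throw new Error(
    `Invalid MEDIA_PROXY value: ${raw} (expected off/proxy/cache, ` +
      `true/false, on/off, or 1/0)`,
  );
}
```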
@codex review
Code Review
This pull request introduces a media proxy feature to Hollo, allowing remote media such as avatars, attachments, and emojis to be served through the instance's own origin to improve privacy and bypass CORS issues. It adds two new environment variables, MEDIA_PROXY (with off, proxy, and cache modes) and REMOTE_MEDIA_THUMBNAILS, and implements a new /proxy endpoint with SSRF protection and signature verification. Feedback from the reviewer focuses on security and performance, specifically suggesting improvements to the SSRF check to prevent DNS rebinding, and recommending streaming approaches for listing cache keys and handling media bodies to reduce memory pressure.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a593d71c3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
When REMOTE_MEDIA_THUMBNAILS=off the ingest path skips the body
download, which also gave up the Content-Type that the previous GET
response used to backfill mediaType. Peers whose attachment objects
don't carry mediaType therefore had their attachments dropped
silently.

Issue a HEAD request to the URL in that fallback path and use the
upstream Content-Type if it comes back. SSRF is already vetted just
above this branch, so the HEAD reuses that decision; on any error or
non-OK response the attachment still skips, matching the previous
behaviour for the genuinely unknowable case.

fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
Audio and video clients seek by issuing Range requests; the previous
proxy stripped that header on the way to upstream, so every byte-range
request from a browser ended up downloading the full body. That broke
scrubbing and inflated bandwidth on proxied media.

The route now forwards the client's Range header through every
redirect hop, accepts 206 Partial Content as a success status with
Content-Range and Accept-Ranges passed through, and surfaces 416 Range
Not Satisfiable with Content-Range (the upstream body is discarded so
we never serve an attacker-shaped error page under our origin). In
cache mode, range requests skip both the cache read and the cache
write so that a single cache entry always represents the full
resource.

Three new tests cover the forwarding, the 416 passthrough, and the
cache bypass.

fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
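The status and cache rules above can be sketched as small predicates. Names and shapes here are assumptions; the real handler also forwards the Range header upstream on each redirect hop:

```typescript
// 200 and 206 both count as success once Range forwarding exists;
// 416 is surfaced to the client separately, not treated as success.
function isAcceptableStatus(status: number): boolean {
  return status === 200 || status === 206;
}

// Range requests bypass the disk cache in cache mode, so a cache
// entry always holds the complete resource.
function shouldBypassCache(
  clientSentRange: boolean,
  mode: "proxy" | "cache",
): boolean {
  return mode === "cache" && clientSentRange;
}

// Headers passed back to the client on a partial response.
function partialResponseHeaders(upstream: Headers): Headers {
  const out = new Headers({ "X-Content-Type-Options": "nosniff" });
  for (const name of ["Content-Type", "Content-Range", "Accept-Ranges"]) {
    const value = upstream.get(name);
    if (value !== null) out.set(name, value);
  }
  return out;
}
```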
listProxyCacheBinKeys() collected every cache key into an array before
returning, so the admin dashboard and the cleanup-enqueue endpoint
both had to hold the full key list in memory at once. An instance with
a few million cache entries would have spent hundreds of megabytes on
this list, plus a duplicate copy as job-item objects during enqueue.

Replace it with iterateProxyCacheBinKeys(), an async generator that
yields one key at a time while paging through the storage listing. The
dashboard's count uses a thin countProxyCacheBinKeys() wrapper. The
/proxy_cache/clear handler now creates the cleanupJobs row with
totalItems=0, streams keys into 1000-row INSERT batches, and updates
totalItems once enumeration completes. If enumeration or any insert
throws, the placeholder job is deleted so the dashboard doesn't see a
stuck pending job.

fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
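An async generator that pages through a listing looks roughly like this. The flydrive listing API is stubbed as a token-paged function; the function names follow the commit message, but the page shape is an assumption:

```typescript
interface ListPage {
  keys: string[];
  nextToken?: string;
}
type ListFn = (token?: string) => Promise<ListPage>;

// Yields one key at a time; only one listing page is held in memory.
async function* iterateProxyCacheBinKeys(list: ListFn): AsyncGenerator<string> {
  let token: string | undefined;
  do {
    const page = await list(token);
    for (const key of page.keys) {
      if (key.endsWith(".bin")) yield key; // skip .json sidecars
    }
    token = page.nextToken;
  } while (token !== undefined);
}

// Thin counting wrapper for the dashboard's statistics row.
async function countProxyCacheBinKeys(list: ListFn): Promise<number> {
  let count = 0;
  for await (const _ of iterateProxyCacheBinKeys(list)) count++;
  return count;
}
```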
@codex review

/gemini review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27fa3a9202
Code Review
This pull request implements a media proxy system and an optional toggle for remote media thumbnail generation. The media proxy allows Hollo to re-serve remote content—including avatars, headers, attachments, and emojis—through its own origin, supporting both streaming and on-disk caching modes with built-in SSRF protection and URL signing. Feedback on the implementation highlights potential performance bottlenecks in the admin dashboard when counting or enumerating large numbers of cache entries, which could lead to request timeouts. Additionally, it was suggested to use streaming for proxy responses to reduce memory pressure during high concurrency.
The previous /proxy_cache/clear handler created a cleanupJobs row with
totalItems=0 and then streamed cache keys into batched item inserts.
That left a window where the cleanup worker could pick up the pending
job, see no items yet, and finalize it as completed. Items inserted
after that moment ended up parented to a completed job that the worker
no longer polled, so a large cache could be silently left
half-cleared. The same handler was also doing the full storage walk
inline, which could blow past reverse-proxy timeouts on caches with
hundreds of thousands of entries.

Both problems go away when the worker owns the whole lifecycle.
/proxy_cache/clear now creates one cleanup job inside a transaction,
along with a single "enumerate_proxy_cache" item. The worker picks up
that item, streams the cache listing into 1000-row batches of
proxy_cache items under the same jobId, and bumps the parent job's
totalItems by exactly the number of entries it queued. Because the
enumeration item is still in flight while it inserts those items, the
worker's "no unfinished items? finalize." check can't race the
enqueue.

fedify-dev#483 (comment)
fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
Every load of /thumbnail_cleanup was iterating the storage backend
end to end to compute "Media proxy cache entries: N". On an instance
with a few hundred thousand .bin entries, that's enough listing
calls to exceed a typical reverse-proxy timeout.
countProxyCacheBinKeys now stops once it has seen
PROXY_CACHE_COUNT_CAP (10,000) entries and returns
{ count, truncated: true } so the dashboard knows the value is a
lower bound. Beyond that point the operator only needs to know
"there's a lot in there"; the worker-driven cleanup queues every
entry regardless of what the dashboard chose to show.
The stats row renders "10,000+" when truncated, and the clear button
likewise says "Clear 10,000+ entries" instead of pretending to know
the exact figure.
fedify-dev#483 (comment)
Assisted-by: Claude Code:claude-opus-4-7
@codex review

/gemini review
Code Review
This pull request implements a media proxy for Hollo, enabling remote media to be served via the instance origin with support for streaming and disk caching. It also introduces a toggle for local thumbnail generation to reduce storage overhead. Review feedback highlighted opportunities to improve the proxy's memory efficiency by streaming directly to responses when caching is disabled, the need for better fallback mechanisms when fetching attachment metadata via HEAD requests, and suggested documenting DNS-level SSRF limitations within the proxy implementation.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a0b31669f8
A future maintainer reading fetchWithSSRFAwareRedirects could easily
read the per-hop isSSRFSafeURL call as a complete SSRF defence. It
isn't: ssrfcheck only inspects the URL string, so a public hostname
that resolves to a private address at fetch time still gets through.

Add a comment block above the function explaining that, and note that
the proper fix lives in a shared SSRF-aware fetch connector rather
than this route.

fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
The previous HEAD probe for missing Content-Type was a single try:
if the upstream rejected HEAD (CDNs often reply 405, even when GET
would succeed) we dropped the attachment. Fediverse peers that
omit mediaType plus a CDN that dislikes HEAD silently lost their
attachments.
Probe in three steps now, each cheaper than persisting a body:
1. HEAD, as before.
2. GET with Range: bytes=0-0. 200 (server ignored Range) and 206
(server honoured it) both surface a usable Content-Type; the
response body is cancelled either way.
3. mime.getType() against the URL's pathname. Last-resort
extension inference for the case where the server tells us
nothing useful.
The attachment is only skipped if all three return nothing.
fedify-dev#483 (comment)
fedify-dev#483 (comment)
Assisted-by: Claude Code:claude-opus-4-7
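The three-step probe can be sketched as follows. This is a hypothetical rendering: the real code uses the mime package's getType(), which is replaced here by a tiny extension map, and `fetchFn` stands in for the already-SSRF-vetted fetch:

```typescript
// Last-resort extension inference; stand-in for mime.getType().
const EXT_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".webp": "image/webp",
  ".mp4": "video/mp4",
};

async function probeContentType(
  url: string,
  fetchFn: typeof fetch,
): Promise<string | null> {
  // 1. HEAD: cheapest when the server supports it.
  try {
    const head = await fetchFn(url, { method: "HEAD" });
    const ct = head.headers.get("content-type");
    if (head.ok && ct) return ct;
  } catch {
    /* fall through */
  }
  // 2. GET with a one-byte Range; 200 (Range ignored) and 206 (Range
  //    honoured) both carry a usable header. Never read the body.
  try {
    const res = await fetchFn(url, { headers: { Range: "bytes=0-0" } });
    const ct = res.headers.get("content-type");
    await res.body?.cancel();
    if ((res.status === 200 || res.status === 206) && ct) return ct;
  } catch {
    /* fall through */
  }
  // 3. Infer from the URL's file extension, if any.
  const ext = new URL(url).pathname.match(/\.[a-z0-9]+$/i)?.[0]?.toLowerCase();
  return (ext && EXT_TYPES[ext]) || null;
}
```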
@codex review

/gemini review
Code Review
This pull request introduces a media proxy feature to Hollo, allowing remote media like avatars and attachments to be served through the local origin to bypass CORS issues and enhance privacy. The proxy supports streaming and on-disk caching modes, featuring SSRF protection and a security-focused content-type allowlist that excludes SVG files. It also adds a setting to disable local thumbnail generation for remote media to save disk space. The reviewer suggested an optimization in the proxy logic to validate the Content-Length header before downloading the response body, which would prevent unnecessary data transfer for files exceeding the size limit.
Codex Review: Didn't find any major issues. Nice work!
The streaming cap in readBoundedBody catches oversized bodies, but
only after we've started reading. When the upstream is already telling
us the body is 50 MiB before we read a byte, there's no reason to
spend that bandwidth: we'll bail anyway once the cap trips.

Check Content-Length right after the Content-Type allowlist and
short-circuit to 404 when it exceeds MAX_BYTES. A missing or malformed
header still falls through to the streaming cap, which keeps enforcing
the limit byte by byte.

A new test confirms a 64 MiB advertised body is dropped before the
body read happens.

fedify-dev#483 (comment)

Assisted-by: Claude Code:claude-opus-4-7
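The short-circuit is a cheap header check before any body read; a sketch follows (the helper name is an assumption; MAX_BYTES matches the 32 MiB cap described elsewhere in this PR):

```typescript
const MAX_BYTES = 32 * 1024 * 1024;

// Returns true when the advertised length already exceeds the cap, so
// the body read can be skipped entirely. A missing or malformed
// Content-Length returns false and defers to the streaming cap.
function exceedsAdvertisedCap(headers: Headers, maxBytes = MAX_BYTES): boolean {
  const raw = headers.get("content-length");
  if (raw === null) return false;
  const length = Number(raw);
  if (!Number.isFinite(length) || length < 0) return false;
  return length > maxBytes;
}
```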
@codex review

/gemini review
Code Review
This pull request implements a media proxy for Hollo, allowing remote assets like avatars and attachments to be served via the local origin. It introduces the MEDIA_PROXY variable with off, proxy, and cache modes, alongside REMOTE_MEDIA_THUMBNAILS to toggle local thumbnail processing. The feature includes a signed /proxy endpoint with SSRF protection and optional disk caching. Updates span API serialization, JSX components, and the admin dashboard, which now features a proxy cache cleanup tool. I have no feedback to provide.
Codex Review: Didn't find any major issues. Chef's kiss.
#481 asks for a Misskey-style media proxy so Hollo doesn't hand visitors the original URLs of remote avatars, headers, attachments, custom emojis, and link-preview images. Without one, restrictive CORS on remote object stores breaks client-side image work, and the visitor's browser ends up making a request to every server its timeline touches.
Configuration
`MEDIA_PROXY` decides how Hollo serves remote media. It has three states: `off` (the default and the historical behaviour), `proxy` (rewrite every remote media URL to a signed `/proxy/<sig>/<b64url>` path served by Hollo, streaming bytes through without persisting them), and `cache` (the same rewriting, plus storing the body at `proxy/<sha256>.bin` with a content-type sidecar at `proxy/<sha256>.json` in the configured storage backend). It also accepts the Boolean synonyms `true`/`on`/`1` for `proxy` and `false`/`off`/`0` for `off`, but disk caching must be requested explicitly with `cache`.

`REMOTE_MEDIA_THUMBNAILS` (default `on`) controls whether Hollo downloads incoming attachments to generate a local WebP thumbnail through Sharp. Operators can turn it off when they're already proxying remote media or when storage is tight.

Signing and the proxy route

The signing key is derived from `SECRET_KEY` via `HMAC-SHA256(SECRET_KEY, "media-proxy/v1")`, so no additional secret is needed. Each proxy URL embeds an HMAC-SHA256 over the base64url-encoded original URL, truncated to 16 bytes; verification uses `crypto.timingSafeEqual` and rejects non-canonical base64url.

The route in src/proxy.ts re-runs `isSSRFSafeURL` on every redirect target through up to three manual hops, caps the body at 32 MiB with streaming reads, requires an `image/`, `video/`, or `audio/` Content-Type with `image/svg+xml` explicitly blocked (SVG can carry inline scripts that would execute under the Hollo origin), and returns the response with `Cache-Control: public, max-age=2592000, immutable` plus `X-Content-Type-Options: nosniff`.

Where URL rewriting happens

Hollo rewrites these URLs in the entity serializers, not when posts are persisted. src/entities/ threads a `baseUrl` parameter through `serializeMedium`, `serializeEmojis`, `serializeReaction`, and `serializePreviewCard`, so the Mastodon API runs every remote URL through `proxyUrl()` before emitting it: `account.avatar`, `media_attachments[].url` and `.preview_url`, account and post `emojis[]`, reaction `url`/`static_url`, preview-card `image`, notification `emoji_reaction`, and `/api/v1/custom_emojis`. The server-rendered web UI does the same in src/components/Profile.tsx, src/components/Post.tsx, src/components/AccountList.tsx, and the inline-emoji helper in src/custom-emoji.ts.

Outbound federation is deliberately untouched: src/federation/actor.ts and friends still publish the original remote URLs as `icon`, `image`, `attachment`, and emoji `Tag` references, because peers shouldn't have their own `image` rewritten to go through this instance.

Trust boundary

The proxy treats only the configured storage prefix as trusted local media, not every URL on the Hollo origin. The check in src/media-proxy.ts compares each URL's normalized `.href` against a canonical storage prefix derived from `DRIVE_DISK`: the FS driver always serves at `/assets/` on the storage origin regardless of `STORAGE_URL_BASE`'s path, while the S3 driver trusts `STORAGE_URL_BASE` itself. Comparing the parsed `.href` rather than the raw input string blocks dot-segment and percent-encoded path traversal like `http://h/assets/../admin/avatar.png`. URLs that share the Hollo origin but live outside the storage namespace, including attacker-crafted `avatarUrl` values that point at `/admin/...`, get routed through the proxy so they can't ride a logged-in operator's session.

Admin cache cleanup

The admin page at /thumbnail_cleanup picks up a media-proxy section: the statistics table gets a row counting the cache's `.bin` entries, and a "Clear N entries" button enqueues a cleanup job that deletes both the `.bin` body and its `.json` sidecar per entry. The existing cleanup worker handles the deletion; the new payload kind is distinguished by `data.kind = "proxy_cache"` inside the existing `cleanup_thumbnails` enum value, so no Drizzle migration is needed. Cleanup item keys are validated against a strict `proxy/<sha256>.bin` regex before `disk.delete()` sees them, and the storage listing pages through every `disk.listAll` page so multi-page S3 buckets get fully enumerated.

Out of scope

Post-content HTML, profile bios, and custom field values can still contain `<img src>`, `<audio src>`, `<video src>`, `<source>`, and `poster` attributes pointing at remote servers; rewriting those would need separate handling for `srcset`, relative URLs, and content fetched via the Open Graph scraper, on top of changes to the sanitiser pipeline in src/xss.ts.

`ssrfcheck` only validates the URL syntax, not DNS resolution, so a hostname that resolves to a private address can still be reached at `fetch` time. That's the same limitation every server-side fetch in the codebase has today (see src/federation/post.ts) and should be fixed in the shared fetch layer, not only in this route.

Tests

New tests live in src/media-proxy.test.ts (27 cases covering env-var parsing, signing, URL rewriting, and storage-prefix canonicalisation) and src/proxy.test.ts (10 cases for the route across off/proxy/cache modes, SSRF rejection on redirect, the SVG block, and the cache-hit reuse path). The full suite is at 432 passing.
Closes #481.