46 changes: 46 additions & 0 deletions README.md
@@ -460,6 +460,44 @@ proxy serve [flags]
proxy [flags] # same as 'proxy serve'
```

### mirror

Pre-populate the cache from PURLs, SBOM files, or entire registries. Useful for ensuring offline availability or warming the cache before deployments.

```bash
# Mirror specific package versions
proxy mirror pkg:npm/lodash@4.17.21 pkg:cargo/serde@1.0.0

# Mirror all versions of a package
proxy mirror pkg:npm/lodash

# Mirror from a CycloneDX or SPDX SBOM
proxy mirror --sbom sbom.cdx.json

# Preview what would be mirrored
proxy mirror --dry-run pkg:npm/lodash

# Control parallelism
proxy mirror --concurrency 8 pkg:npm/lodash@4.17.21
```

The mirror command accepts the same storage and database flags as `serve`. Already-cached artifacts are skipped.

A mirror API is also available when the server is running:

```bash
# Start a mirror job
curl -X POST http://localhost:8080/api/mirror \
  -H "Content-Type: application/json" \
  -d '{"purls": ["pkg:npm/lodash@4.17.21"]}'

# Check job status
curl http://localhost:8080/api/mirror/mirror-1

# Cancel a running job
curl -X DELETE http://localhost:8080/api/mirror/mirror-1
```

### stats

Show cache statistics without running the server.
@@ -534,6 +572,14 @@ Recently cached:
| `GET /debian/*` | Debian/APT repository protocol |
| `GET /rpm/*` | RPM/Yum repository protocol |

### Mirror API

| Endpoint | Description |
|----------|-------------|
| `POST /api/mirror` | Start a mirror job (JSON body with `purls`) |
| `GET /api/mirror/{id}` | Get job status and progress |
| `DELETE /api/mirror/{id}` | Cancel a running job |

### Enrichment API

The proxy provides REST endpoints for package metadata enrichment, vulnerability scanning, and outdated detection.
155 changes: 155 additions & 0 deletions cmd/proxy/main.go
@@ -16,6 +16,7 @@
//
// serve Start the proxy server (default if no command given)
// stats Show cache statistics
// mirror Pre-populate cache from PURLs, SBOMs, or registries
//
// Serve Flags:
//
@@ -100,7 +101,11 @@ import (

"github.com/git-pkgs/proxy/internal/config"
"github.com/git-pkgs/proxy/internal/database"
"github.com/git-pkgs/proxy/internal/handler"
"github.com/git-pkgs/proxy/internal/mirror"
"github.com/git-pkgs/proxy/internal/server"
"github.com/git-pkgs/proxy/internal/storage"
"github.com/git-pkgs/registries/fetch"
)

const defaultTopN = 10
@@ -124,6 +129,10 @@ func main() {
os.Args = append(os.Args[:1], os.Args[2:]...)
runStats()
return
case "mirror":
os.Args = append(os.Args[:1], os.Args[2:]...)
runMirror()
return
case "-version", "--version":
fmt.Printf("proxy %s (%s)\n", Version, Commit)
os.Exit(0)
@@ -145,6 +154,7 @@ Usage: proxy [command] [flags]
Commands:
serve Start the proxy server (default)
stats Show cache statistics
mirror Pre-populate cache from PURLs, SBOMs, or registries

Run 'proxy <command> -help' for more information on a command.

@@ -340,6 +350,151 @@ func runStats() {
}
}

func runMirror() {
	fs := flag.NewFlagSet("mirror", flag.ExitOnError)
	configPath := fs.String("config", "", "Path to configuration file")
	storageURL := fs.String("storage-url", "", "Storage URL (file:// or s3://)")
	databaseDriver := fs.String("database-driver", "", "Database driver: sqlite or postgres")
	databasePath := fs.String("database-path", "", "Path to SQLite database file")
	databaseURL := fs.String("database-url", "", "PostgreSQL connection URL")
	sbomPath := fs.String("sbom", "", "Path to CycloneDX or SPDX SBOM file")
	concurrency := fs.Int("concurrency", 4, "Number of parallel downloads") //nolint:mnd // default concurrency
	dryRun := fs.Bool("dry-run", false, "Show what would be mirrored without downloading")

	fs.Usage = func() {
		fmt.Fprintf(os.Stderr, "git-pkgs proxy - Pre-populate cache\n\n")
		fmt.Fprintf(os.Stderr, "Usage: proxy mirror [flags] [purl...]\n\n")
		fmt.Fprintf(os.Stderr, "Examples:\n")
		fmt.Fprintf(os.Stderr, " proxy mirror pkg:npm/lodash@4.17.21\n")
		fmt.Fprintf(os.Stderr, " proxy mirror --sbom sbom.cdx.json\n")
		fmt.Fprintf(os.Stderr, " proxy mirror pkg:npm/lodash # all versions\n\n")
		fmt.Fprintf(os.Stderr, "Flags:\n")
		fs.PrintDefaults()
	}

	_ = fs.Parse(os.Args[1:])
	purls := fs.Args()

	// Determine source
	var source mirror.Source
	switch {
	case *sbomPath != "":
		source = &mirror.SBOMSource{Path: *sbomPath}
	case len(purls) > 0:
		source = &mirror.PURLSource{PURLs: purls}
	default:
		fmt.Fprintf(os.Stderr, "error: provide PURLs or --sbom\n")
		fs.Usage()
		os.Exit(1)
	}

	// Load config
	cfg, err := loadConfig(*configPath)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error loading config: %v\n", err)
		os.Exit(1)
	}
	cfg.LoadFromEnv()

	if *storageURL != "" {
		cfg.Storage.URL = *storageURL
	}
	if *databaseDriver != "" {
		cfg.Database.Driver = *databaseDriver
	}
	if *databasePath != "" {
		cfg.Database.Path = *databasePath
	}
	if *databaseURL != "" {
		cfg.Database.URL = *databaseURL
	}

	if err := cfg.Validate(); err != nil {
		fmt.Fprintf(os.Stderr, "invalid configuration: %v\n", err)
		os.Exit(1)
	}

	logger := setupLogger("info", "text")

	// Open database
	var db *database.DB
	switch cfg.Database.Driver {
	case "postgres":
		db, err = database.OpenPostgresOrCreate(cfg.Database.URL)
	default:
		db, err = database.OpenOrCreate(cfg.Database.Path)
	}
	if err != nil {
		fmt.Fprintf(os.Stderr, "error opening database: %v\n", err)
		os.Exit(1)
	}
	defer func() { _ = db.Close() }()

	if err := db.MigrateSchema(); err != nil {
		_ = db.Close()
		fmt.Fprintf(os.Stderr, "error migrating schema: %v\n", err)
		os.Exit(1) //nolint:gocritic // db closed above
	}

	// Open storage
	sURL := cfg.Storage.URL
	if sURL == "" {
		sURL = "file://" + cfg.Storage.Path //nolint:staticcheck // backwards compat
	}
	store, err := storage.OpenBucket(context.Background(), sURL)
	if err != nil {
		_ = db.Close()
		fmt.Fprintf(os.Stderr, "error opening storage: %v\n", err)
		os.Exit(1) //nolint:gocritic // db closed above
	}

	// Build proxy (reuses same pipeline as serve)
	fetcher := fetch.NewFetcher()
	resolver := fetch.NewResolver()
	proxy := handler.NewProxy(db, store, fetcher, resolver, logger)
	proxy.CacheMetadata = true // mirror always caches metadata
	proxy.MetadataTTL = cfg.ParseMetadataTTL()

	m := mirror.New(proxy, db, store, logger, *concurrency)

	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
		<-sigCh
		cancel()
	}()

	if *dryRun {
		items, err := m.RunDryRun(ctx, source)
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("Would mirror %d package versions:\n", len(items))
		for _, item := range items {
			fmt.Printf(" %s\n", item)
		}
		return
	}

	progress, err := m.Run(ctx, source)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("Mirror complete: %d downloaded, %d skipped (cached), %d failed, %s total\n",
		progress.Completed, progress.Skipped, progress.Failed, formatSize(progress.Bytes))

	if len(progress.Errors) > 0 {
		fmt.Fprintf(os.Stderr, "\nErrors:\n")
		for _, e := range progress.Errors {
			fmt.Fprintf(os.Stderr, " %s/%s@%s: %s\n", e.Ecosystem, e.Name, e.Version, e.Error)
		}
	}
}

func printStats(db *database.DB, popular, recent int, asJSON bool) error {
defer func() { _ = db.Close() }()

23 changes: 22 additions & 1 deletion docs/architecture.md
@@ -161,6 +161,20 @@ vulnerabilities (
updated_at DATETIME
)
-- indexes: (vuln_id, ecosystem, package_name) unique, (ecosystem, package_name)

metadata_cache (
id INTEGER PRIMARY KEY,
ecosystem TEXT NOT NULL,
name TEXT NOT NULL,
storage_path TEXT NOT NULL,
etag TEXT,
content_type TEXT,
size INTEGER, -- BIGINT on Postgres
fetched_at DATETIME,
created_at DATETIME,
updated_at DATETIME
)
-- indexes: (ecosystem, name) unique
```

On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`.
@@ -277,6 +291,12 @@ Version age filtering for supply chain attack mitigation. Configurable at global

Package metadata enrichment. Fetches license, description, homepage, repository URL, and vulnerability data from upstream registries. Powers the `/api/` endpoints and the web UI's package detail pages.

### `internal/mirror`

Selective package mirroring for pre-populating the proxy cache. Supports multiple input sources: individual PURLs (versioned or unversioned), CycloneDX/SPDX SBOM files, and full registry enumeration. Uses a bounded worker pool backed by `errgroup` to download artifacts in parallel, reusing `handler.Proxy.GetOrFetchArtifact()` for the actual fetch-and-cache work.

The package also provides a `MetadataCache` for storing raw upstream metadata blobs so the proxy can serve metadata responses offline. The `JobStore` manages async mirror jobs exposed via the `/api/mirror` endpoints.

### `internal/config`

Configuration loading.
@@ -326,10 +346,11 @@ Eviction can be implemented as:
- Ensures clients fetch artifacts through proxy
- Alternative: Let clients fetch directly, miss cache opportunity

**Why not cache metadata?**
**Why not cache metadata (by default)?**
- Simplicity - no invalidation logic needed
- Fresh data - new versions visible immediately
- Metadata is small, upstream fetch is fast
- Set `cache_metadata: true` or use the mirror command to enable metadata caching for offline use via the `metadata_cache` table

**Why stream artifacts?**
- Memory efficient - don't load large files into RAM
59 changes: 59 additions & 0 deletions docs/configuration.md
@@ -213,6 +213,65 @@ Currently supported for npm, PyPI, pub.dev, Composer, Cargo, NuGet, Conda, RubyG

Note: Hex cooldown requires disabling registry signature verification since the proxy re-encodes the protobuf payload without the original signature. Set `HEX_NO_VERIFY_REPO_ORIGIN=1` or configure your repo with `no_verify: true`.

## Metadata Caching

By default the proxy fetches metadata fresh from upstream on every request. Enable `cache_metadata` to store metadata responses in the database and storage backend for offline fallback. When upstream is unreachable, the proxy serves the last cached copy. ETag-based revalidation avoids re-downloading unchanged metadata.

```yaml
cache_metadata: true
```

Or via environment variable: `PROXY_CACHE_METADATA=true`.

The `proxy mirror` command always enables metadata caching regardless of this setting.

### Metadata TTL

When metadata caching is enabled, `metadata_ttl` controls how long a cached response is considered fresh before revalidating with upstream. During the TTL window, cached metadata is served directly without contacting upstream, reducing latency and upstream load.

```yaml
metadata_ttl: "5m" # default
```

Or via environment variable: `PROXY_METADATA_TTL=10m`.

Set to `"0"` to always revalidate with upstream (ETag-based conditional requests still avoid re-downloading unchanged content).

When upstream is unreachable and the cached entry is past its TTL, the proxy serves the stale cached copy with a `Warning: 110 - "Response is Stale"` header so clients can tell the data may be outdated.

## Mirror API

The `/api/mirror` endpoints are disabled by default. Enable them to allow starting mirror jobs via HTTP:

```yaml
mirror_api: true
```

Or via environment variable: `PROXY_MIRROR_API=true`.

When disabled, the endpoints are not registered, so requests to them return 404.

## Mirror Command

The `proxy mirror` command pre-populates the cache from various sources. It accepts the same storage and database flags as `serve`.

| Flag | Default | Description |
|------|---------|-------------|
| `--sbom` | | Path to CycloneDX or SPDX SBOM file |
| `--concurrency` | `4` | Number of parallel downloads |
| `--dry-run` | `false` | Show what would be mirrored without downloading |
| `--config` | | Path to configuration file |
| `--storage-url` | | Storage URL |
| `--database-driver` | | Database driver |
| `--database-path` | | SQLite database file |
| `--database-url` | | PostgreSQL connection URL |

Positional arguments are treated as PURLs:

```bash
proxy mirror pkg:npm/lodash@4.17.21 pkg:cargo/serde@1.0.0
```
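
A PURL with a version mirrors that single version, while one without mirrors all versions. The distinction can be sketched with simple string handling. This is an illustration only: `splitVersion` is a hypothetical helper, it does no validation or percent-decoding, and the actual command may well rely on a PURL parsing library instead.

```go
package main

import (
	"fmt"
	"strings"
)

// splitVersion separates the version from a PURL when one is present.
// It relies on the PURL layout pkg:type/namespace/name@version?qualifiers#subpath.
func splitVersion(purl string) (base, version string, ok bool) {
	// Qualifiers (?...) and subpath (#...) come after the version, so drop them.
	if i := strings.IndexAny(purl, "?#"); i >= 0 {
		purl = purl[:i]
	}
	slash := strings.LastIndex(purl, "/")
	at := strings.LastIndex(purl, "@")
	// A version marker must follow the last path segment separator. An "@" in a
	// namespace (such as an npm scope) is percent-encoded as %40 and is ignored.
	if at > slash && at+1 < len(purl) {
		return purl[:at], purl[at+1:], true
	}
	return purl, "", false
}

func main() {
	for _, p := range []string{
		"pkg:npm/lodash@4.17.21",
		"pkg:npm/lodash",
		"pkg:npm/%40angular/core",
	} {
		if base, version, ok := splitVersion(p); ok {
			fmt.Printf("%s: mirror version %s of %s\n", p, version, base)
		} else {
			fmt.Printf("%s: mirror all versions\n", p)
		}
	}
}
```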

## Docker

### SQLite with Local Storage