feat(ingest): visual progress bars in docker logs for every source #34

Merged: gok03 merged 1 commit into main from feat/ingest-progress-logs on Jun 10, 2026
gok03 (Member) commented on Jun 10, 2026

Why

Between v0.1.0 and v0.1.1, `docker logs -f refuse` gave the user nothing to read between "cron started" and the first "osv-delta ok in 60103 ms". With five or more data sources, there was no signal of which was running, which were done, or how far along they were. The new `/readyz` endpoint in v0.1.1 told you "these sources are still pending" but not "npm OSV is 60% through right now".

What

Per-source progress lines with a 20-char ASCII bar, emitted every ~5s during streaming and at start/done for fast sources. Fixed-width tag column so the bars line up.

```
refuse: ingest[osv:npm        ] ▶ starting (ecosystem 1/28)
refuse: ingest[osv:npm        ] [██████░░░░░░░░░░░░░░]  30% • 9000 records • 12s
refuse: ingest[osv:npm        ] [████████████████░░░░]  80% • 24000 records • 35s
refuse: ingest[osv:npm        ] ✓ done — 30142 records in 41s
refuse: ingest[kev            ] ▶ starting
refuse: ingest[kev            ] [████████████████████] 100% • 1542/1542 entries
refuse: ingest[kev            ] ✓ done — 1542 entries in 1.8s
refuse: ingest[epss           ] ▶ starting
refuse: ingest[epss           ] [████████████████████]  87% • 245678 rows scored 2026-06-09
refuse: ingest[epss           ] ✓ done — 245678 rows in 14s
refuse: ingest[ghsa           ] ▶ starting (no prior cursor)
refuse: ingest[ghsa           ] [████████████████████] 100% • 100/100 records this page
refuse: ingest[ghsa           ] ✓ done — 100 records in 2s (cursor saved)
refuse: ingest[wolfi          ] ▶ starting
refuse: ingest[wolfi          ] [████████████████████] 100% • 412 packages, 1247 records
refuse: ingest[wolfi          ] ✓ done — 1247 records across 412 packages in 4s
refuse: ingest[deps-dev       ] ▶ starting (400 packages in this batch)
refuse: ingest[deps-dev       ] [██████████░░░░░░░░░░]  50% • 200/400 packages • 12s
refuse: ingest[deps-dev       ] ✓ done — 9824 version rows across 400 packages in 25s
```
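
For readers who want to see how such a line could be produced, here is a minimal sketch. It is not the code from this PR: the names `renderBar` and `progressLine`, the 15-character tag width (inferred from the sample output), and the choice to round down are all assumptions made for illustration.

```typescript
// Hypothetical helpers; names and constants are illustrative, not from the PR.
const BAR_WIDTH = 20; // the description mentions a 20-char bar
const TAG_WIDTH = 15; // assumption: inferred from the sample output's tag column

// Render "[████░░░░]" for `done` out of `total`, clamped to the 0..100% range.
function renderBar(done: number, total: number, width: number = BAR_WIDTH): string {
  const safeDone = total > 0 ? Math.min(Math.max(done, 0), total) : 0;
  // Integer arithmetic first, so 9000/30000 lands on exactly 6 cells.
  const filled = total > 0 ? Math.floor((safeDone * width) / total) : 0;
  return "[" + "█".repeat(filled) + "░".repeat(width - filled) + "]";
}

// Build one log line with a fixed-width tag and a right-aligned percentage.
function progressLine(tag: string, done: number, total: number, detail: string): string {
  const safeDone = total > 0 ? Math.min(Math.max(done, 0), total) : 0;
  const pct = total > 0 ? Math.floor((safeDone * 100) / total) : 0;
  const pctText = String(pct).padStart(3, " ");
  return `refuse: ingest[${tag.padEnd(TAG_WIDTH)}] ${renderBar(done, total)} ${pctText}% • ${detail}`;
}

console.log(progressLine("osv:npm", 9000, 30000, "9000 records • 12s"));
console.log(progressLine("kev", 1542, 1542, "1542/1542 entries"));
```

Padding the tag with `padEnd` and the percentage with `padStart` is what keeps the bars in one column regardless of the source name.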

Streaming vs known-total bars

Streaming sources (OSV per-ecosystem, EPSS) don't know their final count until the stream closes, so the bar fills against a calibrated per-ecosystem estimate:

| Ecosystem | Estimate |
| --- | --- |
| npm | 30K |
| PyPI | 20K |
| Maven | 15K |
| Go | 8K |
| crates.io / NuGet | 4K |
| RubyGems / Packagist / Hex / Pub | 500–3K |
| All distros (default) | 5K |
| EPSS | 280K rows |

If the actual count overshoots the estimate, the bar holds at 100%; the trailing `done` line states the actual count. Known-total sources (KEV, GHSA page-of-100, Wolfi, deps.dev batch) get real percentages.
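
The estimate-and-clamp behaviour described above could look roughly like the sketch below. The identifiers (`ECOSYSTEM_ESTIMATES`, `percentOfEstimate`, `streamingPercent`) are hypothetical, and the ecosystems listed only with a range (500–3K) are left out rather than guessed, so in this sketch they fall back to the default.

```typescript
// Hypothetical estimate table; values copied from the table in this description.
const ECOSYSTEM_ESTIMATES = new Map<string, number>([
  ["npm", 30_000],
  ["PyPI", 20_000],
  ["Maven", 15_000],
  ["Go", 8_000],
  ["crates.io", 4_000],
  ["NuGet", 4_000],
]);
const DEFAULT_ESTIMATE = 5_000; // "All distros (default)"
const EPSS_ESTIMATE = 280_000;  // EPSS rows

// Percentage of an estimated total, rounded down and held at 100 on overshoot.
function percentOfEstimate(seen: number, estimate: number): number {
  if (estimate <= 0 || seen <= 0) return 0;
  return Math.min(100, Math.floor((seen * 100) / estimate));
}

// Percentage for an OSV ecosystem stream, using the default when unlisted.
function streamingPercent(ecosystem: string, seen: number): number {
  const estimate = ECOSYSTEM_ESTIMATES.get(ecosystem) ?? DEFAULT_ESTIMATE;
  return percentOfEstimate(seen, estimate);
}

console.log(streamingPercent("npm", 24_000));          // 80
console.log(streamingPercent("npm", 30_142));          // 100 (overshoot is clamped)
console.log(percentOfEstimate(245_678, EPSS_ESTIMATE)); // 87
```

Because the percentage is only an estimate for these sources, the final `done` line remains the place to read the true count.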

Why ASCII not \r animation

`docker logs` doesn't preserve `\r` cleanly; each frame would appear as a new visible character. New-line-per-tick is the docker-logs-native idiom.
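
As an illustration of the new-line-per-tick approach, the sketch below throttles progress output to one full line per interval instead of redrawing in place. `makeTicker` and its parameters are hypothetical names, the 5-second default mirrors the "~5s" mentioned earlier, and the injectable clock exists only to make the behaviour easy to check.

```typescript
// Hypothetical throttle: emits a whole new line at most once per interval.
function makeTicker(
  emit: (line: string) => void = (line) => console.log(line),
  intervalMs: number = 5_000,
  now: () => number = Date.now,
): (line: string) => void {
  let last = now();
  return (line: string): void => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      emit(line); // a fresh line each tick; no "\r" rewriting
    }
  };
}

// Usage with a fake clock: only the calls at 5s and 10s produce output.
let fakeNow = 0;
const seenLines: string[] = [];
const tick = makeTicker((line) => seenLines.push(line), 5_000, () => fakeNow);
for (const t of [1_000, 5_000, 6_000, 10_000]) {
  fakeNow = t;
  tick(`progress at ${t}ms`);
}
console.log(seenLines);
```

A stream handler can call the returned function on every record; the throttle decides whether a line is actually written.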

Test plan

  • `pnpm typecheck` clean
  • All 53 server tests + 102 versions tests stay green
  • On the next v0.1.2 image, `docker logs -f refuse` shows the bars during bootstrap

`docker logs -f refuse` previously gave the user nothing to read between
"cron started" and the first "osv-delta ok in 60103 ms". Five sources
plus deps.dev and they had no idea which were running, which were done,
or how far through they were.

Add structured progress lines per source with a 20-char ASCII bar:

  refuse: ingest[osv:npm        ] ▶ starting (ecosystem 1/28)
  refuse: ingest[osv:npm        ] [██████░░░░░░░░░░░░░░]  30% • 9000 records • 12s
  refuse: ingest[osv:npm        ] [████████████████░░░░]  80% • 24000 records • 35s
  refuse: ingest[osv:npm        ] ✓ done — 30142 records in 41s
  refuse: ingest[kev            ] ▶ starting
  refuse: ingest[kev            ] [████████████████████] 100% • 1542/1542 entries
  refuse: ingest[kev            ] ✓ done — 1542 entries in 1.8s
  refuse: ingest[epss           ] ▶ starting
  refuse: ingest[epss           ] [████████████████████]  87% • 245678 rows scored 2026-06-09
  refuse: ingest[epss           ] ✓ done — 245678 rows in 14s

Streaming sources (OSV per-ecosystem, EPSS) don't know their total until
they finish, so the bar fills against a calibrated per-ecosystem estimate
(npm: 30K, PyPI: 20K, Maven: 15K, …, distros: 5K default; EPSS: 280K).
The bar holds near the cap if we overshoot — the trailing `done` line
states the actual count.

Known-total sources (KEV, GHSA page-of-100, Wolfi) show real percentages.

Tag column is fixed-width so the bars line up; ticks every 5 s so the
output streams instead of arriving in one wall at the end.

gok03 merged commit beddfdb into main on Jun 10, 2026 (7 checks passed)
gok03 deleted the feat/ingest-progress-logs branch on June 10, 2026 07:41