feat(ingest): visual progress bars in docker logs for every source #34
Merged
`docker logs -f refuse` previously gave the user nothing to read between "cron started" and the first "osv-delta ok in 60103 ms". With five sources plus deps.dev ingesting, they had no idea which were running, which were done, or how far through they were.

Add structured progress lines per source with a 20-character bar drawn in Unicode block characters:

    refuse: ingest[osv:npm  ] ▶ starting (ecosystem 1/28)
    refuse: ingest[osv:npm  ] [██████░░░░░░░░░░░░░░] 30% • 9000 records • 12s
    refuse: ingest[osv:npm  ] [████████████████░░░░] 80% • 24000 records • 35s
    refuse: ingest[osv:npm  ] ✓ done — 30142 records in 41s
    refuse: ingest[kev      ] ▶ starting
    refuse: ingest[kev      ] [████████████████████] 100% • 1542/1542 entries
    refuse: ingest[kev      ] ✓ done — 1542 entries in 1.8s
    refuse: ingest[epss     ] ▶ starting
    refuse: ingest[epss     ] [█████████████████░░░] 87% • 245678 rows scored 2026-06-09
    refuse: ingest[epss     ] ✓ done — 245678 rows in 14s

Streaming sources (OSV per-ecosystem, EPSS) don't know their total until they finish, so the bar fills against a calibrated per-ecosystem estimate (npm: 30K, PyPI: 20K, Maven: 15K, …, distros: 5K default; EPSS: 280K). The bar holds near the cap if we overshoot; the trailing `done` line states the actual count. Known-total sources (KEV, GHSA page-of-100, Wolfi) show real percentages.

The tag column is fixed-width so the bars line up, and ticks fire every 5 s so the output streams instead of arriving in one wall at the end.
Why
In v0.1.0 and v0.1.1, `docker logs -f refuse` gave the user nothing to read between "cron started" and the first "osv-delta ok in 60103 ms". With five or more data sources ingesting, there was no signal of which was running, which were done, or how far along they were. The `/readyz` endpoint added in v0.1.1 tells you "these sources are still pending" but not "npm OSV is 60% through right now".
What
Per-source progress lines with a 20-character bar drawn in Unicode block characters (`█` filled, `░` empty), emitted every ~5 s during streaming and at start/done for fast sources. The tag column is fixed-width so the bars line up.
```
refuse: ingest[osv:npm  ] ▶ starting (ecosystem 1/28)
refuse: ingest[osv:npm  ] [██████░░░░░░░░░░░░░░] 30% • 9000 records • 12s
refuse: ingest[osv:npm  ] [████████████████░░░░] 80% • 24000 records • 35s
refuse: ingest[osv:npm  ] ✓ done — 30142 records in 41s
refuse: ingest[kev      ] ▶ starting
refuse: ingest[kev      ] [████████████████████] 100% • 1542/1542 entries
refuse: ingest[kev      ] ✓ done — 1542 entries in 1.8s
refuse: ingest[epss     ] ▶ starting
refuse: ingest[epss     ] [█████████████████░░░] 87% • 245678 rows scored 2026-06-09
refuse: ingest[epss     ] ✓ done — 245678 rows in 14s
refuse: ingest[ghsa     ] ▶ starting (no prior cursor)
refuse: ingest[ghsa     ] [████████████████████] 100% • 100/100 records this page
refuse: ingest[ghsa     ] ✓ done — 100 records in 2s (cursor saved)
refuse: ingest[wolfi    ] ▶ starting
refuse: ingest[wolfi    ] [████████████████████] 100% • 412 packages, 1247 records
refuse: ingest[wolfi    ] ✓ done — 1247 records across 412 packages in 4s
refuse: ingest[deps-dev ] ▶ starting (400 packages in this batch)
refuse: ingest[deps-dev ] [██████████░░░░░░░░░░] 50% • 200/400 packages • 12s
refuse: ingest[deps-dev ] ✓ done — 9824 version rows across 400 packages in 25s
```
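A minimal Python sketch of how one of these lines could be assembled. The PR doesn't show its implementation, so the helper names (`tag`, `bar`, `progress_line`), the 9-character tag width (just wide enough for `deps-dev` plus a space), and round-down cell filling are all assumptions. With those choices it reproduces the `osv:npm` 30% line from the sample.

```python
# Hypothetical helpers: names, TAG_WIDTH and round-down filling are assumptions.
FILLED = "\u2588"  # █ full block
EMPTY = "\u2591"   # ░ light shade
BAR_WIDTH = 20     # 20 cells, so each cell is worth 5%
TAG_WIDTH = 9      # assumed: fits "deps-dev" plus one space


def tag(source: str) -> str:
    """Left-align the source name in a fixed-width column so bars line up."""
    return f"refuse: ingest[{source:<{TAG_WIDTH}}]"


def bar(percent: int) -> str:
    """Render the 20-cell bar; a partially reached cell stays empty."""
    percent = max(0, min(percent, 100))
    filled = percent * BAR_WIDTH // 100
    return "[" + FILLED * filled + EMPTY * (BAR_WIDTH - filled) + f"] {percent}%"


def progress_line(source: str, percent: int, detail: str) -> str:
    """Join tag, bar and a free-form detail string with a bullet separator."""
    return f"{tag(source)} {bar(percent)} \u2022 {detail}"


print(progress_line("osv:npm", 30, "9000 records \u2022 12s"))
```

Because `:<` padding never truncates, a tag longer than the assumed width would push its bar out of line, so the real width has to cover the longest tag in use.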
Streaming vs known-total bars
Streaming sources (OSV per-ecosystem, EPSS) don't know their final count until the stream closes, so the bar fills against a calibrated estimate:

- OSV npm: 30K
- OSV PyPI: 20K
- OSV Maven: 15K
- …
- OSV distros: 5K (default)
- EPSS: 280K

If the stream overshoots the estimate, the bar holds at 100%, and the trailing `done` line states the actual count. Known-total sources (KEV / GHSA page-of-100 / Wolfi / deps.dev batch) get real percentages.
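The two kinds of percentage could be computed roughly as below. This is a sketch under assumptions: the estimate table holds only the figures quoted in this PR (the elided ecosystems are left out), the `osv:<ecosystem>` keys and function names are hypothetical, and the 5K default is applied to any source without a calibrated entry. With these numbers it reproduces the 30%, 80%, and 87% values in the sample output.

```python
# Hypothetical sketch: the figures come from the PR text; keys and names are invented.
ESTIMATES = {
    "osv:npm": 30_000,
    "osv:PyPI": 20_000,
    "osv:Maven": 15_000,
    # ... other calibrated ecosystems elided in the PR ...
    "epss": 280_000,
}
DEFAULT_ESTIMATE = 5_000  # assumed fallback; the PR quotes 5K for distros


def streaming_percent(source: str, seen: int) -> int:
    """Progress against a calibrated estimate; holds at 100% on overshoot."""
    estimate = ESTIMATES.get(source, DEFAULT_ESTIMATE)
    return min(100, seen * 100 // estimate)


def known_total_percent(done: int, total: int) -> int:
    """Real percentage for sources that know their total up front."""
    if total <= 0:
        return 100  # nothing to fetch counts as complete
    return min(100, done * 100 // total)


print(streaming_percent("epss", 245_678), known_total_percent(200, 400))
```

Clamping keeps an underestimated stream from printing 116% bars; the `done` line is where the true count surfaces.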
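Why text bars, not `\r` animation
`docker logs` doesn't handle `\r` redraws well: Docker's logging drivers capture container output line by line, so frames redrawn with `\r` pile up into a single log entry that only appears once a newline arrives, and anything reading the log later sees the frames run together. One line per tick is the idiom that works with `docker logs`.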
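A sketch of the ~5 s tick, assuming a hypothetical `TickThrottle` helper with an injectable clock so it can be tested without sleeping. Each tick prints one complete newline-terminated line. In Python specifically, stdout is block-buffered when it is not a TTY (as inside a container), so each line also needs a flush; otherwise the ticks would still arrive in one wall.

```python
import time
from typing import Callable


class TickThrottle:
    """Report when the next progress line is due (hypothetical helper)."""

    def __init__(self, interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        # The "▶ starting" line is printed at start, so the first bar is
        # due one full interval later.
        self.last = clock()

    def due(self) -> bool:
        """True at most once per interval; resets the timer when it fires."""
        now = self.clock()
        if now - self.last >= self.interval_s:
            self.last = now
            return True
        return False


def emit(line: str) -> None:
    # One complete "\n"-terminated line per tick, never a "\r" redraw.
    # flush=True because stdout is block-buffered when it is not a TTY.
    print(line, flush=True)
```

In a streaming loop, the source would call `due()` after each record and, when it returns `True`, `emit` a progress line built from the current count.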
Test plan