Rules-first · ML-assisted · LLM-optional · Offline-first
CleanBook is a command-line tool for cleaning, deduplicating, and classifying browser bookmark exports. It is designed for people who want a practical offline workflow: take an exported HTML bookmark file, run one command, and get a cleaner categorized result back.
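CleanBook's own parser is not shown here. The following minimal sketch uses only the standard library and assumes the common Netscape bookmark export format, where each entry is a `<DT><A HREF=...>Title</A>` line. It shows roughly what "cleaning and deduplicating" an export involves. `BookmarkLinkParser`, `normalize_url`, and `dedupe` are hypothetical names. The normalization rules (lowercase scheme and host, drop the fragment and the trailing slash) are an assumed choice, not CleanBook's documented behavior.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class BookmarkLinkParser(HTMLParser):
    """Hypothetical: collect (title, href) pairs from <A HREF=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> arrives as "a"/"href".
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open <A> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def normalize_url(url: str) -> str:
    """Assumed dedup key: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(links: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first bookmark seen for each normalized URL, preserving order."""
    seen: set[str] = set()
    unique = []
    for title, href in links:
        key = normalize_url(href)
        if key not in seen:
            seen.add(key)
            unique.append((title, href))
    return unique


sample = (
    '<DT><A HREF="https://Example.com/docs/#top">Docs</A>\n'
    '<DT><A HREF="https://example.com/docs">Docs (dup)</A>'
)
parser = BookmarkLinkParser()
parser.feed(sample)
parser.close()
print(dedupe(parser.links))  # [('Docs', 'https://Example.com/docs/#top')]
```

Keeping the first occurrence preserves the original order of the export. A real tool might instead merge the titles or folders of the duplicates.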
- Offline by default: bookmark processing stays on your machine
- Rules first: stable category matches are driven by config, not opaque prompts
- ML where it helps: optional ML and LLM layers improve recall instead of owning the whole pipeline
- Export-friendly: generate cleaned bookmark HTML, JSON data, and report-style outputs
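The "rules first, ML where it helps" layering can be sketched as an ordered chain of classifiers, where the first layer that returns an answer wins. This is an illustration under stated assumptions, not CleanBook's classifier. `rule_layer`, `fake_ml_layer`, `classify`, and the domain table are hypothetical. The "ML" layer is a keyword stand-in, not a trained model.

```python
from __future__ import annotations

from typing import Callable, Optional
from urllib.parse import urlsplit

# A layer takes (title, url) and returns a category, or None to abstain.
Layer = Callable[[str, str], Optional[str]]


def rule_layer(title: str, url: str) -> Optional[str]:
    """Deterministic rules: a hypothetical domain-to-category table."""
    host = urlsplit(url).netloc.lower()
    rules = {"github.com": "Development", "arxiv.org": "Research"}
    for domain, category in rules.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return category
    return None


def fake_ml_layer(title: str, url: str) -> Optional[str]:
    """Stand-in for an optional ML model; only reached when the rules abstain."""
    return "Reading" if "blog" in title.lower() else None


def classify(title: str, url: str, layers: list[Layer], default: str = "Uncategorized") -> str:
    """Try each layer in order; the first non-None answer wins."""
    for layer in layers:
        category = layer(title, url)
        if category is not None:
            return category
    return default


rules_only = [rule_layer]               # comparable to the rules-only --no-ml mode
with_ml = [rule_layer, fake_ml_layer]   # rules first, ML adds recall
print(classify("My blog post", "https://example.com/p/1", rules_only))  # Uncategorized
print(classify("My blog post", "https://example.com/p/1", with_ml))     # Reading
print(classify("Repo", "https://github.com/LessUp/bookmarks-cleaner", with_ml))  # Development
```

Because rules run first, adding or removing the later layers can only fill gaps. It never overrides a stable rule match.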
pipx install cleanbook
cleanbook -i bookmarks.html -o output/
Stable rules-only mode:
cleanbook -i bookmarks.html -o output/ --no-ml
From source:
git clone https://github.com/LessUp/bookmarks-cleaner.git
cd bookmarks-cleaner
pip install -e ".[dev]"
cleanbook -i examples/demo_bookmarks.html -o output/
- cleanbook: the maintained CLI entry point
- cleanbook-wizard: the interactive wizard entry point
- config.json plus the taxonomy YAML files: the default place to configure classification
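The real layout of `config.json` is not documented here. The sketch below assumes a hypothetical `category_rules` schema with `domains` and `keywords` per category, to show how config-driven rules can stay readable and deterministic. `load_rules`, `match_category`, and the scoring weights (a domain hit counts 2, each keyword hit counts 1) are illustrative assumptions.

```python
from __future__ import annotations

import json
from urllib.parse import urlsplit

# Hypothetical schema; CleanBook's actual config.json and taxonomy YAML layout may differ.
CONFIG_JSON = """
{
  "category_rules": {
    "Development": {"domains": ["github.com", "stackoverflow.com"], "keywords": ["python", "api"]},
    "News": {"domains": ["news.ycombinator.com"], "keywords": ["news"]}
  }
}
"""


def load_rules(text: str) -> dict:
    return json.loads(text)["category_rules"]


def match_category(title: str, url: str, rules: dict) -> str | None:
    """Score each category; return the best one, or None if nothing matched."""
    host = urlsplit(url).netloc.lower()
    lowered = title.lower()
    best, best_score = None, 0
    for category, spec in rules.items():
        score = 0
        if any(host == d or host.endswith("." + d) for d in spec.get("domains", [])):
            score += 2
        # Naive substring matching: "api" would also match "capital".
        score += sum(1 for kw in spec.get("keywords", []) if kw in lowered)
        if score > best_score:  # strict ">" keeps the earlier category on ties
            best, best_score = category, score
    return best


rules = load_rules(CONFIG_JSON)
print(match_category("Python API docs", "https://docs.python.org/3/", rules))  # Development
print(match_category("Recipes", "https://example.com", rules))  # None
```

Returning `None` instead of a default category lets a caller hand the bookmark to an optional ML or LLM layer.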
main.py / cleanbook
-> BookmarkProcessor
-> classifier orchestration
-> plugin pipeline
-> services (feature store, taxonomy, performance, etc.)
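The diagram names a plugin pipeline and a services layer but not their interfaces. This hedged sketch assumes plugins are ordered transforms over a list of bookmarks that share a services mapping. `Bookmark`, `Plugin`, `StripWhitespace`, `TagFromTaxonomy`, and `run_pipeline` are hypothetical, not CleanBook's actual classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Bookmark:
    title: str
    url: str
    category: str | None = None
    tags: list[str] = field(default_factory=list)


class Plugin(Protocol):
    name: str

    def run(self, bookmarks: list[Bookmark], services: dict) -> list[Bookmark]: ...


class StripWhitespace:
    name = "strip-whitespace"

    def run(self, bookmarks, services):
        # Return fresh Bookmark objects with trimmed title and URL.
        return [Bookmark(b.title.strip(), b.url.strip(), b.category, list(b.tags)) for b in bookmarks]


class TagFromTaxonomy:
    name = "taxonomy-tags"

    def run(self, bookmarks, services):
        # Hypothetical taxonomy service: maps a category to its parent category.
        taxonomy = services["taxonomy"]
        for b in bookmarks:
            if b.category in taxonomy:
                b.tags.append(taxonomy[b.category])
        return bookmarks


def run_pipeline(bookmarks: list[Bookmark], plugins: list[Plugin], services: dict) -> list[Bookmark]:
    """Apply each plugin in order; each one sees the previous plugin's output."""
    for plugin in plugins:
        bookmarks = plugin.run(bookmarks, services)
    return bookmarks


services = {"taxonomy": {"Development": "Technology"}}
items = [Bookmark("  Repo ", " https://github.com/x ", "Development"), Bookmark("Misc", "https://example.com")]
out = run_pipeline(items, [StripWhitespace(), TagFromTaxonomy()], services)
print([(b.title, b.tags) for b in out])  # [('Repo', ['Technology']), ('Misc', [])]
```

Passing services in explicitly keeps the plugins easy to test in isolation, because a test can supply a small in-memory taxonomy.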
This repository uses OpenSpec as the only active change workflow:
- /opsx:explore
- /opsx:propose
- /opsx:apply
- /opsx:archive
Maintained verification baseline:
python3 -m pytest -q tests/test_runtime_paths.py
python3 -m pytest -q