Automated monitoring of HuggingFace for new AI model releases, filtered and summarised per user profile, delivered as rolling markdown digests updated every 3 hours.
Live digest: busebircan.github.io/model-tracker
- Most Recent panel: updated every 3 hours via GitHub Actions
- Most Popular panel: refreshed daily at 07:00 UTC
- Fetches models released in the last 24 hours from HuggingFace Hub (rolling window)
- Classifies each model against 6 use cases using license checks and keyword matching
- Enriches matched models with benchmark scores (HF Open LLM Leaderboard, MTEB)
- Summarises matched models (task, license, size, downloads, relevance explanation)
- Generates per-profile rolling digest files (`digests/latest-<profile>.md`/`.json`)
- Tracks popular models (top downloads on HuggingFace) separately, refreshed daily
Digests are committed back to the repo automatically after each run.
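The classification step pairs a license gate with keyword matching against each profile. A minimal sketch of the keyword half, assuming plain-dict inputs (`matches_profile` and its field names are illustrative, not the actual `classifier.py` API):

```python
def matches_profile(model: dict, profile: dict) -> bool:
    """Return True if the model's task or tags hit any profile keyword.

    `model` mimics HuggingFace Hub metadata (pipeline_tag, tags);
    `profile` mimics a profiles.yaml entry. Both are assumptions.
    """
    task = (model.get("pipeline_tag") or "").lower()
    tags = {t.lower() for t in model.get("tags", [])}
    task_hit = any(kw in task for kw in profile.get("task_keywords", []))
    tag_hit = bool(tags & {kw.lower() for kw in profile.get("tag_keywords", [])})
    return task_hit or tag_hit


model = {"pipeline_tag": "text-generation", "tags": ["code", "instruct"]}
profile = {"task_keywords": ["text-generation"], "tag_keywords": ["code"]}
print(matches_profile(model, profile))  # True
```

A model only needs one hit (task or tag) to match, which keeps recall high; the license check then narrows the result for commercial-only profiles.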
| Profile | License Filter | Focus |
|---|---|---|
| Agent & Tool Use | Commercial only | Tool-use, code gen, vision, fast inference, embeddings |
| Vision & Edge Deployment | Commercial only | Vision (thermal/IR), offline-capable, RAG, time-series, document understanding |
| Optimisation & Reasoning | All licenses | OR/optimization, simulation, code gen (OR-Tools/PuLP), reasoning |
| Retrieval & Embeddings | All licenses | Embeddings, rerankers, long-context, chunking |
| Research & Summarisation | All licenses | Summarization, research paper analysis, ArXiv monitoring |
| Safety & Security | All licenses | Guardrails, content filtering, PII detection, prompt injection defence, bias evaluation, alignment classifiers, output validation |
Commercial licenses include: apache-2.0, mit, bsd-3-clause, cc-by-4.0, openrail, llama3*, gemma, and similar permissive licenses.
model-tracker/
├── config/
│   └── profiles.yaml           # Profile definitions, license lists, fetcher settings
├── src/
│   ├── fetcher.py              # HuggingFace Hub API fetching (recent + popular)
│   ├── classifier.py           # License + keyword classification
│   ├── summariser.py           # Human-readable model summaries
│   ├── digest.py               # Markdown/JSON digest assembly
│   ├── benchmarks.py           # Benchmark score enrichment (HF Leaderboard, MTEB)
│   ├── popular.py              # Popular models fetcher (top downloads, daily)
│   ├── dashboard_data.py       # Generates docs/data/latest.json for the dashboard
│   └── runner.py               # Main entry point (orchestrator)
├── templates/
│   └── digest.md.j2            # Jinja2 markdown template
├── digests/                    # Rolling digest files (latest-<profile>.md/.json)
├── state/
│   └── last_run.json           # State file (last run timestamp)
├── docs/
│   └── data/
│       ├── latest.json         # Dashboard data (recent models, all profiles)
│       └── popular_cache.json  # Dashboard data (popular models, all profiles)
├── .github/
│   └── workflows/
│       ├── run_tracker.yml     # Runs every 3 hours – fetches recent models
│       └── refresh_popular.yml # Runs daily at 07:00 UTC – refreshes popular models
├── requirements.txt
└── README.md
- `pip install -r requirements.txt`
- `python src/runner.py --dry-run`
- `python src/runner.py`
- `python src/runner.py --days 3`
- `python src/popular.py`
- `python src/runner.py --help`
- `--dry-run`: fetch and classify, print to stdout, don't write files
- `--config PATH`: path to profiles YAML (default: `config/profiles.yaml`)
- `--output-dir DIR`: directory for digest files (default: `digests/`)
- `--state-file PATH`: path to state JSON (default: `state/last_run.json`)
- `--days N`: override lookback window in days (default: rolling 24h)
- `--verbose` / `-v`: enable DEBUG logging
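These flags map naturally onto `argparse`. A sketch of how the runner's interface could be declared (the actual parser in `runner.py` may differ):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags listed above. Defaults mirror the README."""
    p = argparse.ArgumentParser(description="Model tracker runner")
    p.add_argument("--dry-run", action="store_true",
                   help="fetch and classify, print to stdout, don't write files")
    p.add_argument("--config", default="config/profiles.yaml",
                   help="path to profiles YAML")
    p.add_argument("--output-dir", default="digests/",
                   help="directory for digest files")
    p.add_argument("--state-file", default="state/last_run.json",
                   help="path to state JSON")
    p.add_argument("--days", type=int, default=None,
                   help="override lookback window in days (default: rolling 24h)")
    p.add_argument("--verbose", "-v", action="store_true",
                   help="enable DEBUG logging")
    return p


args = build_parser().parse_args(["--dry-run", "--days", "3"])
print(args.dry_run, args.days)  # True 3
```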
Edit config/profiles.yaml to:
- Add/remove profiles
- Change license allowlists
- Adjust task and tag keywords per profile
- Set `fetcher.max_models_per_run` (default: 200)
The runner uses a rolling 24-hour lookback window by default. Pass --days N to override.
The config maintains two lists:
- `commercial_licenses`: licenses explicitly allowed for commercial use
- `non_commercial_licenses`: licenses that block commercial use (hard filter for commercial-only profiles)

Unknown licenses are shown with a warning flag and still included (a conservative approach: let users judge).
Two workflows run automatically:
| Workflow | Schedule | What it does |
|---|---|---|
| `run_tracker.yml` | Every 3 hours | Fetches recent models, writes digests and dashboard data |
| `refresh_popular.yml` | Daily at 07:00 UTC | Fetches top models by downloads, updates popular panel |
- Fork / clone this repo
- Go to Settings → Secrets → Actions
- Add `HF_TOKEN`: your HuggingFace read token (optional but avoids rate limits)
- The workflows will commit digests and dashboard data back to the repo automatically
Go to Actions → Run Model Tracker → Run workflow and optionally set:
- `dry_run: true`: preview without committing
- `days: 3`: custom lookback window
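In GitHub Actions, manual inputs like these are declared under `workflow_dispatch`. A fragment along these lines would expose them alongside the 3-hour schedule (field names and descriptions here are illustrative, not copied from `run_tracker.yml`):

```yaml
on:
  schedule:
    - cron: "0 */3 * * *"   # every 3 hours
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Preview without committing"
        type: boolean
        default: false
      days:
        description: "Custom lookback window in days"
        required: false
```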
Each profile gets a rolling digest at digests/latest-<profile-slug>.md:
# Model Tracker Digest – Agent & Tool Use
**Date:** 2025-01-15
**Profile:** Agent & Tool Use
**New models found:** 12
---
### [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
**Author:** Qwen
**Task:** text generation / language modelling
**License:** `apache-2.0` – commercial use allowed
**Size:** ~7B (from model name)
**Published:** 2025-01-14
**Popularity:** 45.2K downloads Β· 312 likes
**Tags:** `code`, `instruct`, `gguf`
**Why relevant:** Matched for Agent & Tool Use via commercial license, task match: text-generation, tag match: code. Capabilities: strong code generation capability.

To add a new profile, edit `config/profiles.yaml` and add an entry under `profiles:`:
my_new_profile:
display_name: "My New Profile"
commercial_only: false
description: "Description of what this profile monitors"
task_keywords:
- text-generation
tag_keywords:
  - my-keyword

`config/profiles.yaml` also includes an `arxiv:` section. The ArXiv fetcher can be enabled to pull recent papers matching your category list and include them in the Research & Summarisation profile digest.
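A new entry can be sanity-checked before committing by asserting the fields the example above uses. A small sketch with plain dicts (the required-field set is an assumption drawn from the example, not a documented schema):

```python
# Fields every profile entry is expected to carry, per the example above.
REQUIRED_FIELDS = {"display_name", "commercial_only", "task_keywords", "tag_keywords"}

def missing_fields(profile: dict) -> set[str]:
    """Return the required fields absent from a profile entry."""
    return REQUIRED_FIELDS - set(profile)


new_profile = {
    "display_name": "My New Profile",
    "commercial_only": False,
    "description": "Description of what this profile monitors",
    "task_keywords": ["text-generation"],
    "tag_keywords": ["my-keyword"],
}
print(missing_fields(new_profile))  # set()
```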
MIT – see LICENSE