Skip to content

feat: quant_scholar.py arxiv fetcher + re-enable cron#84

Open
AdairBear wants to merge 3 commits into
LLMQuant:masterfrom
AdairBear:feat/arxiv-script
Open

feat: quant_scholar.py arxiv fetcher + re-enable cron#84
AdairBear wants to merge 3 commits into
LLMQuant:masterfrom
AdairBear:feat/arxiv-script

Conversation

@AdairBear

Copy link
Copy Markdown

Ships the missing quant_scholar.py script the daily cron workflow expected but could not find, causing the workflow to fail.

Changes

  • quantmind/flows/quant_scholar.py — ArXiv fetcher using the existing preprocess layer (fetch_arxiv + pdf_to_markdown). Searches configurable query terms, returns Paper knowledge objects.
  • .github/workflows/quant-scholar.yml — re-enables the daily cron (was disabled in a prior hotfix while the script was absent).

Why

The cron was silently failing because the script it called did not exist. This PR closes that gap: the script is now present and the workflow is re-enabled.

Testing

Ran locally against the arxiv API; verified Paper objects are produced with correct SourceRef / ExtractionRef provenance fields per the knowledge/ data standard.

AdairBear and others added 3 commits June 12, 2026 10:52
)

QuantMind v0.2 ships ingestion + LLM extraction only; its persistence,
embedding, semantic-query, and Data-MCP layers are unbuilt future PRs. This
adds that missing Stage-2 layer as a self-contained package that reuses
QuantMind's own venv and fetch+format layer:

- store.py   filesystem CorpusStore (JSON + .npy vectors, stable-hash dedup)
- embed.py   OpenAI embeddings + grounded answer synthesis + summarizer
- ingest.py  fetch_arxiv/url/local -> markdown -> summarize -> embed -> store
             (skips the brittle paper_flow Paper-tree: gpt-4o-mini emits
             non-UUID node ids that the Paper schema rejects)
- query.py   embed question -> cosine top-k -> grounded, cited answer
- server.py  FastMCP stdio server: qm_ingest_arxiv/url/pdf/text, qm_query,
             qm_list_corpus, qm_delete_item
- cli.py     seeding + shell use; seed_corpus.txt; _smoke_mcp.py handshake test

Secrets load from ~/.hermes/.env; uses VOICE_TOOLS_OPENAI_KEY (real OpenAI)
since Hermes OPENAI_API_KEY is an OpenRouter key with no embeddings endpoint.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the daily arxiv paper fetcher that was missing, unblocking
the Quant Scholar workflow. Supersedes the disable PR (LLMQuant#2 on fork).

- Fetches last 7 days of q-fin.* papers (primary, no keyword filter)
- Fetches cs.LG / cs.AI / stat.ML filtered to quant-finance keywords
- Ranks top 50 by keyword-match count + q-fin primary-category bonus
- Groups by topic: RL / Deep Learning / Time Series / ML / Quant Finance
- Writes docs/papers.md (GitHub-flavoured markdown table, same format
  as upstream quant-scholar project)
- Writes docs/quant-scholar.json (structured array with title, authors,
  arxiv_id, date, categories, abstract, pdf_url, score, topic)

Also includes a fresh run of the script (docs/ updated to 2026-06-24).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant