release 0.1.0: NeurIPS 2025 #69

EYH0602 · 2025-09-30T20:38:15Z

No description provided.

New LLM generation workflow.

* add an empty .env
* refactor OpenAI util class
* use new OpenAI client in main
* assume .env unchanged
* fix: response processing
* use new Gemini client in main
* enable reasoning effort from cli
* document why two gemini wrapper
* add Claude API
* add claude models to supported list
* handle UnionType for Literal ReasoningEffort
* add vLLM support and use it as default option
* fix: use vLLM chat interface instead of gen
* env add vllm api key
* add VLLM_HOST and VLLM_PORT
* add vllm server mode
* add vLLM in dependencies
* doc: instruct to run vllm from uv
* make deprecated ollama a standalone script
* doc: revise ollama
* use 3.12
* add Ollama models
* fix: ollama model name
* fix: ollama model name
* fix: Gemini use its own EFFORT_TOKEN_MAP
* remove unused imports
* fix: google-genai version
* fix: ci with uv run

* fix: allow generation to fail
* remove unnecessary imports
* fix: OpenAI response add reasoning summary
* fix: load_gen_results_json type
* fix: analysis_saved script
* fix: evaluation benchmark name
* fix: OpenAI response API add summary
* use pydantic-v2
* extract incorrect task-answer pairs
* fix: groundtruth error (#63)
* fix: missing type class and typevar in benchmark
* fix: order of tasks in tfb
* fix: allow load_gen_results to load error
* remove error_cls unused imports
* extract type variables from source code
* add GHC type check by proving type equiv
* fix: cp -> process
* fix: API change for AST
* feat: type prover support new type definition
* test: ghc and type_util
* feat: use prover_evaluate for base split
* test: add real tfbench test cases, which the deprecated evaluation failed
* alt error to syntax parsing error
* feat: typeclass constraints reorder
* fix: AST.get_all_nodes_of_type ignores the root itself
* reorder_constraints using compiler frontend static analysis
* feat: add type definitions for pure tasks
* test: check type equivalence prover after rewriting mono types
* fix: handle type classes alone when adding new definitions
* feat: define new types automatically for pure tasks
* ghc prover remove standalone type class
* doc: detailed docstring for prover_evaluate
* script: analysis_saved run both split

* add transformers generation as default
* remove None option for router
* remove vllm option for ease of dependency
* Update src/tfbench/lm/_hf.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/tfbench/lm/_hf.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove unnecessary imports

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add transformers generation as default
* Update src/tfbench/lm/_hf.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update src/tfbench/lm/_hf.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* remove unnecessary imports
* doc: improve instructions
* fix: unused parameter and import
* enable github actions on main commits
* doc: add badges and images

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

EYH0602 and others added 25 commits August 13, 2025 22:04

restructure project

a843a1c

fix: ollama new type api

bd58b6c

fix lint

bb928bf

add ruff

cf96b8c

add ruff to ci

2564136

feat: load TF-Bench from HuggingFace by default (#61)

bb04023

* update tfb to huggingface with base and pure splits
* feat: load tfbench from huggingface
* remove mandatory path

avoid loading vLLM for now

d885470

remove vLLM option in main

52860c6

feat: update response processing inside tfbench package (#62)

276bab7

* answer cannot be None from LM
* move evaluation logic inside the tfbench package
* fix: orjson writes binary, error is not an option
* fix: use pure as parameter in main._eval

feat: script to analyze saved generation results

e016bbe

use orjsonl in main for consistency

029e7d8

fix: experiment use prover_evaluate

97cac5e

feat: error analysis with reasoning steps (#65)

42c2925

* error analysis use prover
* error analysis script
* feat: record model name when doing error analysis
* add plot script for error analysis
* adjust row and column spacing
* update color map

revise error_analysis default path

9937ca2

test: list constructor

9fae8e8

remove tmp file

53c19d6

fix: main missing pure parameter to

34897e5

error analysis only output category

e2003dd

default error analysis model to gpt-5-mini

d7a9c35

adjust fontsize for 5 pies in a row

272e5b5

doc: require GHC >= 9.2.1 for ImpredicativeTypes

6733da2

EYH0602 merged commit 9c09ce2 into main Sep 30, 2025
4 checks passed

EYH0602 deleted the release-0.1.0 branch November 26, 2025 21:21
