callstackincubator/evals

A benchmark suite for evaluating how coding models solve real React Native tasks.

Available Evals

Groups map to top-level folders under evals/.

Group	Path	Status
animation	`evals/animation`	Active
async-state	`evals/async-state`	Active
navigation	`evals/navigation`	Active
react-native-apis	`evals/react-native-apis`	WIP
expo-sdk	`evals/expo-sdk`	WIP
nitro-modules	`evals/nitro-modules`	WIP
lists	`evals/lists`	WIP
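
As a small illustration of the group-to-folder mapping in the table above, the sketch below encodes the listed groups and derives each path from its name. The names `groupStatus`, `groupPath`, and `activeGroups` are hypothetical helpers made up for this example; they are not part of the repository's runner.

```typescript
// Statuses copied from the table above; the type and helper names are illustrative only.
type Status = "Active" | "WIP";

const groupStatus: Record<string, Status> = {
  "animation": "Active",
  "async-state": "Active",
  "navigation": "Active",
  "react-native-apis": "WIP",
  "expo-sdk": "WIP",
  "nitro-modules": "WIP",
  "lists": "WIP",
};

// Each group maps to a top-level folder under evals/.
function groupPath(group: string): string {
  return `evals/${group}`;
}

// Names of the groups currently marked Active.
function activeGroups(): string[] {
  return Object.keys(groupStatus).filter((g) => groupStatus[g] === "Active");
}

console.log(activeGroups().map(groupPath));
// [ "evals/animation", "evals/async-state", "evals/navigation" ]
```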

Want a group that is not listed here? Open an issue to request it. Contributions are also welcome.

Getting Started

bun install
bun runner/run.ts --model openai/gpt-4.1-mini --output generated/my-generated
bun runner/judge.ts --model openai/gpt-5.3-codex --input generated/my-generated

For the full command reference and workflows, see the docs folder and CONTRIBUTING.md.

Whitepaper

Methodology and scoring details are documented in the benchmark methodology whitepaper.

The benchmark evaluates model-generated React Native implementations using requirement-based assessment. Each eval specifies a fixed task context and a set of explicit, judgeable requirements. Model outputs are judged against these requirements using file-level evidence, and per-eval scores are computed from requirement outcomes with optional weighting. Aggregate run metrics summarize performance across evals under a consistent evaluation protocol.
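
To make the scoring description concrete, here is a minimal sketch of one way per-eval scores could be derived from requirement outcomes with optional weights. It assumes a weighted pass fraction per eval and an unweighted mean across evals; the exact formula is defined in the whitepaper and may differ. The type `RequirementOutcome`, the functions `scoreEval` and `aggregate`, and the requirement ids are all hypothetical.

```typescript
// Hypothetical shape of a judged requirement; not the repository's actual type.
type RequirementOutcome = {
  id: string;
  passed: boolean;
  weight?: number; // optional weighting; this sketch assumes a default of 1
};

// Per-eval score: weighted fraction of passed requirements, in [0, 1].
function scoreEval(outcomes: RequirementOutcome[]): number {
  let total = 0;
  let earned = 0;
  for (const outcome of outcomes) {
    const weight = outcome.weight ?? 1;
    total += weight;
    if (outcome.passed) earned += weight;
  }
  return total === 0 ? 0 : earned / total;
}

// Aggregate run metric: unweighted mean of per-eval scores (an assumption).
function aggregate(scores: number[]): number {
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

const example: RequirementOutcome[] = [
  { id: "uses-shared-value", passed: true, weight: 2 },
  { id: "cleans-up-listener", passed: false },
  { id: "handles-interrupt", passed: true },
];

console.log(scoreEval(example)); // 0.75 (3 of 4 weight units passed)
```

Under these assumptions, a failed requirement with a higher weight lowers the score more than a failed default-weight one, which is the practical effect of optional weighting.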

Requests And Contributions

To request evaluation coverage for a new feature or library, open an issue. We are open to covering the most popular ecosystem libraries and will continue expanding coverage.

Contributions are welcome. Start with CONTRIBUTING.md and AGENTS.md.

License

MIT (LICENSE)
