Add /props introspection endpoint and model meta block by easel · Pull Request #81 · antirez/ds4

easel · 2026-05-11T20:24:35Z

Motivation

I'm benchmarking a number of providers and model settings, and have found it helpful to query the server directly to find out what its settings are. I had Claude and Codex gin up a llama-server-style /props endpoint with a bunch of useful information. Happy to adjust as needed; the intent was for this to be a low-impact change of obvious value.

-- Claude's opinion follows --

Summary

Adds a single read-only HTTP endpoint, GET /props, that returns the server's full runtime configuration as JSON, and extends the existing /v1/models[/…] cards with a parallel meta block. Everything is sourced from a new server_runtime_config struct populated once in main() from the live engine/session/cfg, so the reported numbers are the ones actually in effect.
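
To illustrate the single-source idea described above, here is a minimal sketch rather than the code in this PR. Only the struct name server_runtime_config comes from the description; the three fields shown, both helper function names, and the output layout are assumptions chosen for illustration, and the sketch does no JSON string escaping.

```c
#include <stdio.h>

/* Hypothetical subset of fields; the struct in the PR carries more. */
typedef struct {
    const char *backend;
    int ctx_size;
    int default_max_tokens;
} server_runtime_config;

/* Hypothetical emitter for the "runtime" fragment of /props.
 * Reads only from the shared config, never from argv or globals.
 * Note: no JSON escaping is done on the backend string here. */
static int props_runtime_json(char *buf, size_t cap,
                              const server_runtime_config *rc) {
    return snprintf(buf, cap,
                    "{\"backend\":\"%s\",\"ctx_size\":%d,"
                    "\"default_max_tokens\":%d}",
                    rc->backend, rc->ctx_size, rc->default_max_tokens);
}

/* Hypothetical emitter for the "meta" fragment of a model card.
 * It reads the same struct, so the two surfaces report the same backend. */
static int model_meta_json(char *buf, size_t cap,
                           const server_runtime_config *rc) {
    return snprintf(buf, cap, "{\"backend\":\"%s\"}", rc->backend);
}
```

Because both emitters take the same const pointer, a value filled in once at startup is reported identically by /props and by the model cards.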

Prior art

llama.cpp's llama-server exposes GET /props for runtime/model/sampler introspection. This change borrows the endpoint name and the "single JSON document describing the live server" convention, but tailors the body to what ds4 actually has (DSML tool replay, KV disk cache, MTP, think-mode aliasing, routed-expert quant bits, context-memory estimator).
OpenRouter's /v1/models schema is what append_model_json_values was already emitting before this branch — context_length, max_completion_tokens, supported_parameters. The new meta object slots in alongside those without breaking the existing shape.
/props is intentionally not loopback-gated — same posture as /v1/models, matching the llama-server convention. Binding to a trusted interface is the operator's responsibility.

Shape of the response

GET /props returns one JSON object with these sections:

server — { name }
model — id, name, path (nullable), routed_expert_quant_bits (nullable), mtp, mtp_draft_tokens
runtime — backend, ctx_size, default_max_tokens, effective_max_completion_tokens (the min(ctx, default_tokens) clamp), threads, quality, warm_weights
reasoning — supported_efforts, alias map (low/medium/xhigh → high), default, effective_default (resolved against the current ctx), think_max_min_context
sampling — defaults, thinking_override, tool_protocol_sampling.structural_temperature, all driven from new DS4_DEFAULT_{TEMPERATURE,TOP_P,TOP_K,MIN_P} and DS4_TOOL_STRUCTURAL_TEMPERATURE macros that also feed request_init, the think-mode override path, and the tool-call structural-sampling branch in generate_job — so the introspection figures cannot drift from the runtime
context_memory — total_bytes/raw_bytes/compressed_bytes/scratch_bytes plus prefill_cap/raw_cap/comp_cap, from ds4_context_memory_estimate
kv_disk_cache — enabled, dir, budget_bytes, reject_different_quant, full policy block, entries
tool_replay — exact_dsml_replay_enabled, max_ids, current_ids, current_bytes (tool memory, taken under tool_mu)
api — endpoint list (kept in sync with the client_main() router, called out in a comment) and supported_request_parameters (reused from the /v1/models helper so the two surfaces cannot disagree)
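
Two of the derived fields listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the PR's code: the function names are hypothetical, and treating any effort string outside the alias map as passed through unchanged is an assumption. The min() clamp and the low/medium/xhigh → high mapping are taken from the section list.

```c
#include <string.h>

/* runtime.effective_max_completion_tokens as described:
 * min(ctx_size, default_max_tokens). Function name is hypothetical. */
static int effective_max_completion_tokens(int ctx_size,
                                           int default_max_tokens) {
    return default_max_tokens < ctx_size ? default_max_tokens : ctx_size;
}

/* reasoning alias map as described: low, medium, and xhigh resolve to
 * "high". Passing other values through unchanged is an assumption. */
static const char *resolve_effort_alias(const char *effort) {
    if (strcmp(effort, "low") == 0 ||
        strcmp(effort, "medium") == 0 ||
        strcmp(effort, "xhigh") == 0) {
        return "high";
    }
    return effort;
}
```

The sketch deliberately leaves out effective_default, since the description only says it is resolved against the current context without spelling out the rule.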

/v1/models and /v1/models/deepseek-v4-flash now include a meta sub-object with backend, routed_expert_quant_bits, mtp/mtp_draft_tokens, and the reasoning fields. The pre-existing fields and their byte layout are unchanged (%g formatting on the new sampling defaults keeps the current integer-valued payload identical).
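
The %g point can be checked in isolation. The helper below is a hypothetical stand-in for wherever the PR formats its sampling defaults; only the use of the standard %g conversion is taken from the description, and the sample values used with it are illustrative rather than ds4's actual defaults.

```c
#include <stdio.h>

/* Hypothetical helper: format a numeric default with %g.
 * %g drops trailing zeros and the decimal point, so an integer-valued
 * double serializes exactly like the integer literal it replaced. */
static int format_default(char *buf, size_t cap, double value) {
    return snprintf(buf, cap, "%g", value);
}
```

With this conversion, 1.0 is written as 1 and 40.0 as 40, while a fractional value such as 0.95 is written as 0.95, which is why existing integer-valued payloads stay byte-identical.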

Notable implementation choices

MTP introspection goes through ds4_engine_has_mtp() / ds4_engine_mtp_draft_tokens() rather than sniffing cfg.engine.mtp_path, so the flag reflects engine state.
The kv.len read in append_props_json is intentionally lockless, matching the rest of the kv.* reads in that function (worker-thread-only mutation); called out in a comment.
Three new unit tests cover the model meta block and the /props payload in both enabled-disk-KV and disabled-disk-KV configurations.

Test plan

make test (covers the three new test_model_metadata_contains_meta_fields / test_props_json_* cases)
curl http://<host>:<port>/props | jq . against a running server and sanity-check runtime, sampling, context_memory, kv_disk_cache
curl .../v1/models/deepseek-v4-flash | jq .meta to confirm the new sub-object

- Drop loopback gate on GET /props; the endpoint is now reachable like /v1/models, matching llama-server convention. Operators are responsible for binding to a trusted interface.
- Remove ctx_size / default_tokens fallback ladders in append_props_json by relying on the runtime config being unconditionally populated in main().
- Switch MTP introspection to ds4_engine_has_mtp() and ds4_engine_mtp_draft_tokens() so the flag reflects engine state rather than coupling to the cfg.engine.mtp_path argv shape.
- Pull sampling defaults (temperature, top_p, top_k, min_p, tool-call structural temperature) into named macros and use them in request_init, the thinking override path, and the /props payload. %g formatting keeps the current JSON byte-identical while letting future non-integer defaults serialize cleanly.
- Note that the kv.len read in append_props_json is intentionally lockless and consistent with other kv.* reads in the function.
- Comment the api.endpoints array as a sync point with client_main() routing.
- Add a short header comment on append_props_json describing its payload sections.

The previous polish commit landed a 121-char single-line block comment, which is the longest single-line block comment in the file and past the ~90-char convention used elsewhere in ds4_server.c. Wrap it with the leading-asterisk style used by other multi-line block comments in this file.

easel added 3 commits May 11, 2026 15:21

Add local server metadata introspection

e92d7ed

easel marked this pull request as ready for review May 11, 2026 20:27

antirez added the http-api label May 11, 2026

easel wants to merge 3 commits into antirez:main from easel:add-local-server-props
