Add /props introspection endpoint and model meta block #81

Open
easel wants to merge 3 commits into antirez:main from easel:add-local-server-props

Conversation

@easel easel commented May 11, 2026

Motivation

I'm benchmarking a bunch of different providers and model settings and have found it helpful to be able to query the server directly to find out what its settings are. I had Claude and Codex gin up a llama-server-style /props endpoint with a bunch of useful information. Happy to adjust as needed; the intent was for this to be a low-impact change of obvious value.

-- Claude's opinion follows --

Summary

Adds a single read-only HTTP endpoint, GET /props, that returns the server's full runtime configuration as JSON, and extends the existing /v1/models[/…] cards with a parallel meta block. Everything is sourced from a new server_runtime_config struct populated once in main() from the live engine/session/cfg, so the reported numbers are the ones actually in effect.
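
For orientation, here is a stripped-down sketch of what such a struct could look like. The field names are inferred from the sections listed below and are not the branch's actual declaration:

    /* Hypothetical sketch only: field names inferred from this PR's
     * description, not taken from the diff. Filled in once in main()
     * after the engine/session/cfg are live, so every /props or
     * /v1/models read reports the values actually in effect. */
    typedef struct {
        const char *backend;       /* e.g. "cpu" */
        long ctx_size;             /* context window in effect */
        long default_max_tokens;   /* configured completion default */
        int threads;
        int mtp;                   /* via ds4_engine_has_mtp() */
        int mtp_draft_tokens;      /* via ds4_engine_mtp_draft_tokens() */
    } server_runtime_config;

    static server_runtime_config runtime_config; /* written once in main() */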

Prior art

  • llama.cpp's llama-server exposes GET /props for runtime/model/sampler introspection. This change borrows the endpoint name and the "single JSON document describing the live server" convention, but tailors the body to what ds4 actually has (DSML tool replay, KV disk cache, MTP, think-mode aliasing, routed-expert quant bits, context-memory estimator).
  • OpenRouter's /v1/models schema is what append_model_json_values was already emitting before this branch — context_length, max_completion_tokens, supported_parameters. The new meta object slots in alongside those without breaking the existing shape.
  • /props is intentionally not loopback-gated — same posture as /v1/models, matching the llama-server convention. Binding to a trusted interface is the operator's responsibility.

Shape of the response

GET /props returns one JSON object with these sections (an abbreviated example follows the list):

  • server: { name }
  • model: id, name, path (nullable), routed_expert_quant_bits (nullable), mtp, mtp_draft_tokens
  • runtime: backend, ctx_size, default_max_tokens, effective_max_completion_tokens (the min(ctx, default_tokens) clamp), threads, quality, warm_weights
  • reasoning: supported_efforts, alias map (low/medium/xhigh → high), default, effective_default (resolved against the current ctx), think_max_min_context
  • sampling: defaults, thinking_override, tool_protocol_sampling.structural_temperature, all driven from new DS4_DEFAULT_{TEMPERATURE,TOP_P,TOP_K,MIN_P} and DS4_TOOL_STRUCTURAL_TEMPERATURE macros that also feed request_init, the think-mode override path, and the tool-call structural-sampling branch in generate_job — so the introspection figures cannot drift from the runtime
  • context_memory: total_bytes/raw_bytes/compressed_bytes/scratch_bytes plus prefill_cap/raw_cap/comp_cap, from ds4_context_memory_estimate
  • kv_disk_cache: enabled, dir, budget_bytes, reject_different_quant, full policy block, entries
  • tool_replay: exact_dsml_replay_enabled, max_ids, current_ids, current_bytes (tool memory, taken under tool_mu)
  • api: endpoint list (kept in sync with the client_main() router, called out in a comment) and supported_request_parameters (reused from the /v1/models helper so the two surfaces cannot disagree)
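
To make the shape concrete, here is a minimal, abbreviated sketch of a possible /props payload. The section names follow the list above, but the concrete values and exact nesting are illustrative guesses, not output copied from the implementation:

    {
      "server": { "name": "ds4-server" },
      "model": { "id": "deepseek-v4-flash", "path": null,
                 "routed_expert_quant_bits": null,
                 "mtp": false, "mtp_draft_tokens": 0 },
      "runtime": { "backend": "cpu", "ctx_size": 8192,
                   "default_max_tokens": 4096,
                   "effective_max_completion_tokens": 4096, "threads": 8 },
      "reasoning": { "supported_efforts": ["low", "medium", "high"],
                     "default": "medium", "effective_default": "medium" },
      "sampling": { "defaults": { "temperature": 1, "top_p": 0.95,
                                  "top_k": 40, "min_p": 0.05 } },
      "context_memory": { "total_bytes": 1073741824 },
      "kv_disk_cache": { "enabled": false },
      "tool_replay": { "exact_dsml_replay_enabled": true,
                       "max_ids": 64, "current_ids": 0, "current_bytes": 0 },
      "api": { "endpoints": ["/props", "/v1/models"],
               "supported_request_parameters": ["temperature", "top_p"] }
    }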

/v1/models and /v1/models/deepseek-v4-flash now include a meta sub-object with backend, routed_expert_quant_bits, mtp/mtp_draft_tokens, and the reasoning fields. The pre-existing fields and their byte layout are unchanged (%g formatting on the new sampling defaults keeps the current integer-valued payload identical).
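
For reference on that %g claim, a standalone check (independent of the ds4 sources) shows why integer-valued defaults stay byte-identical:

    #include <stdio.h>

    int main(void) {
        /* %g omits the decimal point for integer-valued doubles, so a
         * default of 1.0 serializes as "1" (matching the old integer
         * payload), while a future 0.95 would serialize as "0.95". */
        printf("%g\n", 1.0);    /* prints: 1 */
        printf("%g\n", 0.95);   /* prints: 0.95 */
        return 0;
    }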

Notable implementation choices

  • MTP introspection goes through ds4_engine_has_mtp() / ds4_engine_mtp_draft_tokens() rather than sniffing cfg.engine.mtp_path, so the flag reflects engine state (a sketch follows this list).
  • The kv.len read in append_props_json is intentionally lockless, matching the rest of the kv.* reads in that function (worker-thread-only mutation); called out in a comment.
  • Three new unit tests cover the model meta block and the /props payload in both enabled-disk-KV and disabled-disk-KV configurations.
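
A minimal sketch of the MTP point: the two accessors are named in this PR, but their signatures and the stdio-based emission below are assumptions for illustration, not the branch's actual appender:

    #include <stdio.h>

    /* Stand-in declarations so the fragment is self-contained; the real
     * ones live in the ds4 sources and may differ in signature. */
    typedef struct ds4_engine ds4_engine;
    extern int ds4_engine_has_mtp(const ds4_engine *eng);
    extern int ds4_engine_mtp_draft_tokens(const ds4_engine *eng);

    /* Emit the two MTP fields from live engine state rather than
     * inferring them from cfg.engine.mtp_path. */
    static void append_props_mtp(FILE *out, const ds4_engine *eng) {
        int has_mtp = ds4_engine_has_mtp(eng);   /* engine state, not argv */
        fprintf(out, "\"mtp\":%s,\"mtp_draft_tokens\":%d",
                has_mtp ? "true" : "false",
                has_mtp ? ds4_engine_mtp_draft_tokens(eng) : 0);
    }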

Test plan

  • make test (covers the three new test_model_metadata_contains_meta_fields / test_props_json_* cases)
  • curl http://<host>:<port>/props | jq . against a running server and sanity-check runtime, sampling, context_memory, kv_disk_cache
  • curl .../v1/models/deepseek-v4-flash | jq .meta to confirm the new sub-object

easel added 3 commits May 11, 2026 15:21
- Drop loopback gate on GET /props; the endpoint is now reachable like
  /v1/models, matching llama-server convention. Operators are responsible
  for binding to a trusted interface.
- Remove ctx_size / default_tokens fallback ladders in append_props_json
  by relying on the runtime config being unconditionally populated in
  main().
- Switch MTP introspection to ds4_engine_has_mtp() and
  ds4_engine_mtp_draft_tokens() so the flag reflects engine state rather
  than coupling to the cfg.engine.mtp_path argv shape.
- Pull sampling defaults (temperature, top_p, top_k, min_p, tool-call
  structural temperature) into named macros and use them in request_init,
  the thinking override path, and the /props payload. %g formatting keeps
  the current JSON byte-identical while letting future non-integer
  defaults serialize cleanly.
- Note that the kv.len read in append_props_json is intentionally
  lockless and consistent with other kv.* reads in the function.
- Comment the api.endpoints array as a sync point with client_main()
  routing.
- Add a short header comment on append_props_json describing its
  payload sections.
The previous polish commit landed a 121-char single-line block comment,
which is the longest single-line block comment in the file and past the
~90-char convention used elsewhere in ds4_server.c. Wrap it with the
leading-asterisk style used by other multi-line block comments in this
file.
@easel easel marked this pull request as ready for review May 11, 2026 20:27