Conversation

@ngxson (Owner) commented Nov 15, 2025

To test this:

  • Run the server without specifying a model: llama-server
  • Open the webui at localhost:8080 --> go to Settings --> Developers --> enable "show model selector"
  • Select the model you want, then send a message (a sketch of the resulting request follows below)
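
Roughly, selecting a model boils down to the webui setting the model field on its request. A minimal sketch of what that amounts to at the HTTP level, assuming the OpenAI-compatible /v1/chat/completions endpoint that llama-server exposes (the model name below is a placeholder, not a real entry):

```ts
// Sketch: what the webui's request looks like once a model is selected.
// Endpoint and `stream` flag follow llama-server's OpenAI-compatible API;
// the model name stands in for whichever entry was picked in the selector.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-model", // placeholder for the selected model
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  }),
});
console.log(res.status); // expect 200 once the server accepts the selected model
```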

@coderabbitai (bot) commented Nov 15, 2025

Important: review skipped because this PR is a draft.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ngxson changed the base branch from master to xsn/split_http_server_context on November 15, 2025 at 22:48
@ServeurpersoCom commented Nov 16, 2025

Testing this now. If it works, I’ll drop my local llama-swap, keep it rebased, and report any issues.
Also, the client should stop sending default sampling values. My old llama-swap setup filtered them out to avoid problems when IndexedDB wasn't initialized yet (e.g. for friends or colleagues using the instance). The backend must be the source of truth for sampling parameters, for every model and every user.
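
A minimal sketch of that filtering, assuming a flat map of sampling parameters on the client (the DEFAULTS table and parameter names below are illustrative, not the webui's actual settings schema): only values the user explicitly changed get sent, so the backend's per-model defaults stay authoritative.

```ts
// Illustrative client-side defaults; the authoritative values live on the backend.
const DEFAULTS: Record<string, number> = { temperature: 0.8, top_p: 0.95, top_k: 40 };

// Drop any parameter that still matches the client-side default, so the
// request only carries values the user deliberately overrode.
function stripDefaults(params: Record<string, number>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(params)) {
    if (!(key in DEFAULTS) || value !== DEFAULTS[key]) out[key] = value;
  }
  return out;
}

// stripDefaults({ temperature: 0.8, top_k: 50 }) -> { top_k: 50 }
```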
By the way, I had suggested this idea to the llama-swap developer, and he implemented it after I made a small POC: injecting a “loading weights” message into reasoning_content, so the user isn't left waiting on a silent SSE stream while the weights load. It gives instant feedback during development and helps track performance issues. With your new API we'll be able to handle this cleanly on the frontend, which is even better.
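
For reference, the trick is just one synthetic chunk in the OpenAI streaming format, emitted before the real stream begins. A hedged sketch (the chunk shape mirrors the streaming delta format with a reasoning_content field; the proxy wiring around it is assumed, not shown):

```ts
// Build one synthetic SSE chunk carrying a status note in `reasoning_content`.
// A proxy can write this to the client while the model weights are still
// loading, then pipe the upstream stream through unchanged once it starts.
function loadingChunk(note: string): string {
  const chunk = {
    object: "chat.completion.chunk",
    choices: [{ index: 0, delta: { reasoning_content: note }, finish_reason: null }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// e.g. res.write(loadingChunk("loading weights...")); // instant feedback
```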
