Conversation

@ngxson (Owner) commented Nov 15, 2025

To test this:

  • Run the server without specifying a model: llama-server
  • Open the webui at localhost:8080 --> go to Settings --> Developers --> enable "show model selector"
  • Select the model you want, then send a message (a sketch of the resulting request follows below)
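
Roughly, selecting a model boils down to the webui setting the model field on its request. A minimal sketch of what that amounts to at the HTTP level, assuming the OpenAI-compatible /v1/chat/completions endpoint that llama-server exposes (the model name below is a placeholder, not a real entry):

```ts
// Sketch: what the webui's request looks like once a model is selected.
// Endpoint and `stream` flag follow llama-server's OpenAI-compatible API;
// the model name stands in for whichever entry was picked in the selector.
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "my-model", // placeholder for the selected model
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  }),
});
console.log(res.status); // expect 200 once the server accepts the selected model
```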

@coderabbitai (bot) commented Nov 15, 2025

Important: review skipped because this PR is a draft.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ngxson changed the base branch from master to xsn/split_http_server_context on November 15, 2025 at 22:48
@ServeurpersoCom commented Nov 16, 2025

Testing this now. If it works, I’ll drop my local llama-swap, keep it rebased, and report any issues.
Also, the client should stop sending default sampling values. My old llama-swap setup filtered them out to avoid problems when IndexedDB wasn't initialized yet (e.g. for friends or colleagues using the instance). The backend must be the source of truth for sampling parameters, for every model and every user.
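
A minimal sketch of that filtering, assuming a flat map of sampling parameters on the client (the DEFAULTS table and parameter names below are illustrative, not the webui's actual settings schema): only values the user explicitly changed get sent, so the backend's per-model defaults stay authoritative.

```ts
// Illustrative client-side defaults; the authoritative values live on the backend.
const DEFAULTS: Record<string, number> = { temperature: 0.8, top_p: 0.95, top_k: 40 };

// Drop any parameter that still matches the client-side default, so the
// request only carries values the user deliberately overrode.
function stripDefaults(params: Record<string, number>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [key, value] of Object.entries(params)) {
    if (!(key in DEFAULTS) || value !== DEFAULTS[key]) out[key] = value;
  }
  return out;
}

// stripDefaults({ temperature: 0.8, top_k: 50 }) -> { top_k: 50 }
```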
By the way, I had suggested this idea to the llama-swap developer, and he implemented it after I made a small POC: injecting a “loading weights” message into reasoning_content, so the user isn't left waiting on a silent SSE stream while the weights load. It gives instant feedback during development and helps track performance issues. With your new API we'll be able to handle this cleanly on the frontend, which is even better.
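
For reference, the trick is just one synthetic chunk in the OpenAI streaming format, emitted before the real stream begins. A hedged sketch (the chunk shape mirrors the streaming delta format with a reasoning_content field; the proxy wiring around it is assumed, not shown):

```ts
// Build one synthetic SSE chunk carrying a status note in `reasoning_content`.
// A proxy can write this to the client while the model weights are still
// loading, then pipe the upstream stream through unchanged once it starts.
function loadingChunk(note: string): string {
  const chunk = {
    object: "chat.completion.chunk",
    choices: [{ index: 0, delta: { reasoning_content: note }, finish_reason: null }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// e.g. res.write(loadingChunk("loading weights...")); // instant feedback
```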
