[Feature] MCP protocol implementation #1331

@alankila

Feature Summary

MCP server for easy integration

Detailed Description

With the recent addition of MCP server support to llama.cpp, the hunt for software to integrate with llama.cpp inference will start in earnest. In particular, stable-diffusion.cpp is a great enhancement for local inference, and probably quite often among the best options for AMD processors such as Ryzen AI Max. Models such as flux.2 klein 9b run at an acceptable speed per iteration and produce good-quality results in a mere 4 steps, so local image generation can fit into unified RAM even while running large 100B-parameter LLMs such as Qwen3.5-122B-A10B on home hardware.

A bare-bones implementation of a JSON-RPC 2.0 endpoint at /mcp could be added to sd-server. It would only need to cover the absolute minimum required by MCP: the initial client handshake, in which the server announces that it has tools, plus the tools/list and tools/call requests. The functionality of this software could then be exposed through this protocol. That would make integration trivial, as e.g. llama.cpp would be able to integrate directly with sd.cpp, enhancing local inference with image generation.
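The minimal surface described above can be sketched as one dispatch function. This is only a sketch under assumptions, not sd-server code: `handleMcpRequest`, `ToolDef`, the `runTool` callback, the server name and the protocol version string are illustrative, and the result shapes follow my reading of the MCP specification. The HTTP layer that would pass POST bodies from /mcp into it is left out.

```typescript
type JsonRpcId = number | string;
type JsonRpcRequest = { jsonrpc: "2.0"; id?: JsonRpcId; method: string; params?: Record<string, unknown> };
type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: JsonRpcId; result: unknown }
  | { jsonrpc: "2.0"; id: JsonRpcId; error: { code: number; message: string } };

// Hypothetical types for this sketch.
type ToolDef = { name: string; description: string; inputSchema: object };
type ToolContent = { type: "image"; data: string; mimeType: string } | { type: "text"; text: string };
type RunTool = (name: string, args: Record<string, unknown>) => Promise<ToolContent[]>;

function rpcError(id: JsonRpcId, code: number, message: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

async function handleMcpRequest(
  req: JsonRpcRequest,
  tools: ToolDef[],
  runTool: RunTool,
): Promise<JsonRpcResponse | null> {
  // Notifications such as notifications/initialized carry no id and get no response.
  if (req.id === undefined) return null;
  const id = req.id;

  switch (req.method) {
    case "initialize":
      // Handshake: announce that this server offers tools.
      return {
        jsonrpc: "2.0",
        id,
        result: {
          protocolVersion: "2025-06-18", // assumed version string
          capabilities: { tools: {} },
          serverInfo: { name: "sd-server", version: "0.0.0" }, // placeholder values
        },
      };
    case "tools/list":
      return { jsonrpc: "2.0", id, result: { tools } };
    case "tools/call": {
      const name = req.params?.name;
      if (typeof name !== "string" || !tools.some((t) => t.name === name)) {
        return rpcError(id, -32602, `Unknown tool: ${String(name)}`);
      }
      try {
        const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
        const content = await runTool(name, args);
        return { jsonrpc: "2.0", id, result: { content, isError: false } };
      } catch (e) {
        // Tool failures are reported inside the result rather than as protocol errors.
        return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text: String(e) }], isError: true } };
      }
    }
    default:
      return rpcError(id, -32601, `Method not found: ${req.method}`);
  }
}
```

Keeping the dispatcher free of any HTTP code means the same logic could sit behind whatever request handling sd-server already has.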

I'm proposing that a basic text2img tool be provided as an MCP service, with e.g. the parameters prompt: string, width: number and height: number, and possibly some others as makes sense. The arguments required for starting the server, e.g. diffusion model, autoencoder, llm, etc., would stay on the server's command line.
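To make the proposed parameters concrete, here is a sketch of how the tool could be described in a tools/list response and how incoming arguments could be checked. The tool name `text2img`, the descriptions, the 512 to 2048 size limits, the 512 default and the helper `parseText2ImgArgs` are all assumptions for illustration, not an agreed interface.

```typescript
// Hypothetical tool descriptor; inputSchema is plain JSON Schema as used by MCP tool listings.
const text2imgTool = {
  name: "text2img",
  description: "Generate an image from a text prompt",
  inputSchema: {
    type: "object",
    properties: {
      prompt: { type: "string", description: "Natural language prompt" },
      width: { type: "integer", description: "Width in px", minimum: 512, maximum: 2048, default: 512 },
      height: { type: "integer", description: "Height in px", minimum: 512, maximum: 2048, default: 512 },
    },
    required: ["prompt"],
  },
} as const;

type Text2ImgArgs = { prompt: string; width: number; height: number };

// Validates the "arguments" object of a tools/call request and applies the defaults.
function parseText2ImgArgs(raw: Record<string, unknown>): Text2ImgArgs {
  const prompt = raw.prompt;
  if (typeof prompt !== "string" || prompt.length === 0) {
    throw new Error("prompt must be a non-empty string");
  }
  const size = (key: "width" | "height"): number => {
    const spec = text2imgTool.inputSchema.properties[key];
    const value = raw[key] ?? spec.default;
    if (typeof value !== "number" || !Number.isInteger(value) || value < spec.minimum || value > spec.maximum) {
      throw new Error(`${key} must be an integer between ${spec.minimum} and ${spec.maximum}`);
    }
    return value;
  };
  return { prompt, width: size("width"), height: size("height") };
}
```

Model paths and similar settings are deliberately absent from the schema, since they would remain command-line arguments of the server.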

Alternatives you considered

I currently have this kind of tool for running sd-cli from the command line, using mcp-framework on node-ts. The code below is the tool's entire definition: it runs the script named by the IMAGE_SCRIPT environment variable to make the image, and the server then returns the image to llama.cpp over MCP.

On my own, I could upgrade this to use sd-server and request image generation over HTTP, converting the response to the format MCP requires. This would already save a few seconds per image, because the model would not need to be loaded for each generation, and it would also naturally serialize image generation, preventing OOM conditions from multiple concurrent invocations.

import { MCPInput, MCPTool } from "mcp-framework";
import { spawn } from "node:child_process";
import { readFileSync } from "node:fs";
import { cwd } from "node:process";
import { z } from "zod";

const IMAGE_SCRIPT = process.env.IMAGE_SCRIPT || "";

/**
 * Generates a unique filename for the output image using timestamp and random number
 */
function generateOutputFilename(): string {
  const timestamp = Date.now();
  const random = Math.floor(Math.random() * 10000);
  /* cwd() is needed because the image script may live in a different directory, as set by the env variable above. */
  return `${cwd()}/image_${timestamp}_${random}.png`;
}

/**
 * Executes the image script, which runs sd-cli <default arguments> "$@" with the prompt, size and output file.
 * Resolves once the process exits with code 0, i.e. when the output file is ready.
 */
function executeImageGen(prompt: string, width: number, height: number, outputFile: string): Promise<void> {
  return new Promise((resolve, reject) => {
    /* Named "child" so that it does not shadow the global "process" used above for process.env. */
    const child = spawn(IMAGE_SCRIPT, ["-p", prompt, "-W", width.toFixed(), "-H", height.toFixed(), "-o", outputFile], {
      shell: false,
    });

    /* 120 second timeout */
    const timeout = setTimeout(() => {
      child.kill();
      reject(new Error(`Timeout: ${IMAGE_SCRIPT} took too long to complete`));
    }, 120000);

    /* Forward progress output to stderr; stdout would carry the MCP messages if the stdio transport is in use. */
    child.stdout?.on("data", (data: Buffer) => {
      console.error(data.toString());
    });
    /* Collect stderr so that it can be included in the error message on failure. */
    let stderr = "";
    child.stderr?.on("data", (data: Buffer) => {
      stderr += data.toString();
    });

    child.on("error", (error) => {
      clearTimeout(timeout);
      reject(new Error(`Failed to execute ${IMAGE_SCRIPT}: ${error.message}`));
    });

    child.on("close", (code) => {
      clearTimeout(timeout);
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`${IMAGE_SCRIPT} exited with code ${code}: ${stderr}`));
      }
    });
  });
}

const schema = z.object({
  prompt: z.string().describe("Natural language prompt"),
  width: z.number().describe("Size in px").min(512).max(2048).default(512),
  height: z.number().describe("Size in px").min(512).max(2048).default(512),
});

class GenerateImageTool extends MCPTool {
  name = "generate_image";
  description = "Render an image";
  schema = schema;

  async execute({ prompt, width, height }: MCPInput<this>) {
    const outputFile = generateOutputFilename();
    await executeImageGen(prompt, width, height, outputFile);

    const imageBuffer = readFileSync(outputFile);
    const base64Data = imageBuffer.toString('base64');

    return [{
      type: "image",
      mimeType: "image/png",
      data: base64Data,
    }];
  }
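}

/* mcp-framework loads tools through each tool file's default export. */
export default GenerateImageTool;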
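The sd-server upgrade mentioned earlier could look roughly like the sketch below. It assumes that sd-server exposes an OpenAI-style POST /v1/images/generations endpoint which accepts prompt, size and response_format and returns base64 PNG data in data[].b64_json. I have not verified those details against the server, so treat the path, the request body and the response shape as assumptions. `generateViaServer` and `toMcpContent` are hypothetical names.

```typescript
// Assumed response shape: an OpenAI-style images response with base64 payloads.
type ImagesResponse = { data: { b64_json: string }[] };
type McpImageContent = { type: "image"; mimeType: string; data: string };

// Converts the assumed server response into MCP image content items.
function toMcpContent(response: ImagesResponse): McpImageContent[] {
  return response.data.map((item) => ({
    type: "image",
    mimeType: "image/png", // assumes the server returns PNG data
    data: item.b64_json,
  }));
}

// Requests one image from a running sd-server, so the model stays loaded between calls.
async function generateViaServer(
  baseUrl: string,
  prompt: string,
  width: number,
  height: number,
): Promise<McpImageContent[]> {
  const res = await fetch(`${baseUrl}/v1/images/generations`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, size: `${width}x${height}`, response_format: "b64_json" }),
    signal: AbortSignal.timeout(120_000), // same 120 second limit as the script version
  });
  if (!res.ok) {
    throw new Error(`sd-server returned HTTP ${res.status}`);
  }
  return toMcpContent((await res.json()) as ImagesResponse);
}
```

With this in place, the tool's execute method would shrink to a single `return generateViaServer(...)` call, and the temporary output file would no longer be needed.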

Additional context

No response

Labels: enhancement (New feature or request)
