local-batch-api

Drop-in replacement for the Anthropic and OpenAI batch APIs — self-hosted, running against local Ollama or LM Studio. No API keys, no cloud costs, no rate limits.

Requirements

PHP 8.4+
Laravel 13+
Running Ollama or LM Studio instance
A queue worker (php artisan queue:work)

Installation

Step 1 — Install the package

composer require digitaldreams/local-batch-api

Step 2 — Run migrations

php artisan migrate

This creates two tables: batches and batch_files.

Step 3 — Configure your inference backend

Add to your .env:

# 'ollama' (default) or 'lmstudio'
INFERENCE_PROVIDER=ollama

# Base URL of your local inference server
INFERENCE_URL=http://localhost:11434   # Ollama default
# INFERENCE_URL=http://localhost:1234  # LM Studio default

# Default model (can be overridden per request)
INFERENCE_MODEL=llama3.2

# Seconds before a single request times out
INFERENCE_TIMEOUT=120

# Parallel requests per batch chunk — keep at 1 for CPU, raise to 3-5 for GPU
INFERENCE_CONCURRENCY=1

Step 4 — Start a queue worker

Batch jobs run asynchronously. The worker must be running:

php artisan queue:work

Two Ways to Use This Package

This package supports two independent usage patterns:

	Event-based	REST API
Who calls it	Your own Laravel code	Any HTTP client (SDK, curl, external app)
Auth	Laravel's existing auth	Sanctum token (or your middleware)
Routes needed	No	Yes
Best for	Internal pipelines, jobs, commands	Replacing Anthropic/OpenAI SDK endpoints

Approach 1 — Event-based (Internal Usage)

Use this when your own Laravel application needs to submit and process batches. No HTTP routes required.

Submitting an Anthropic-format batch

Fire a SubmitAnthropicBatchEvent event. The package listener picks it up and dispatches the processing job automatically.

use BatchApi\Events\SubmitAnthropicBatchEvent;
use BatchApi\Data\Input\AnthropicBatchItemDto;

$items = [
    new AnthropicBatchItemDto(
        customId: 'req-1',
        maxTokens: 512,
        messages: [
            ['role' => 'user', 'content' => 'Summarise this article in one paragraph.'],
        ],
    ),
    new AnthropicBatchItemDto(
        customId: 'req-2',
        maxTokens: 256,
        messages: [
            ['role' => 'user', 'content' => 'What is the capital of France?'],
        ],
        system: 'You are a geography expert.',
    ),
];

event(new SubmitAnthropicBatchEvent($items));

Submitting an OpenAI-format batch

The OpenAI flow requires a file ID. Upload first using the BatchService, then fire the event.

use BatchApi\BatchService;
use BatchApi\Events\SubmitOpenAiBatchEvent;
use BatchApi\Data\Input\OpenAiBatchItemDto;

$service = app(BatchService::class);

// Build items from raw JSONL or manually
$items = [
    new OpenAiBatchItemDto(
        customId: 'req-1',
        messages: [['role' => 'user', 'content' => 'Hello']],
        maxTokens: 512,
    ),
];

// Create a file record (mirrors OpenAI's file upload step)
$file = $service->uploadFile(
    collect($items)->map(fn ($item) => json_encode([
        'custom_id' => $item->customId,
        'method' => 'POST',
        'url' => '/v1/chat/completions',
        'body' => ['messages' => $item->messages, 'max_tokens' => $item->maxTokens],
    ]))->implode("\n")
);

event(new SubmitOpenAiBatchEvent($file->id, $items));

Listening for results

Listen to BatchCompletedEvent to act on results when processing finishes:

// app/Listeners/HandleBatchCompletedListener.php

use BatchApi\Events\BatchCompletedEvent;
use BatchApi\Data\BatchResultDto;

class HandleBatchCompletedListener
{
    public function handle(BatchCompletedEvent $event): void
    {
        $batch = $event->batch;

        foreach ($event->results as $result) {
            /** @var BatchResultDto $result */
            if ($result->succeeded) {
                // $result->customId   — matches your request's custom_id
                // $result->content    — the model's response text
                // $result->model      — model used
                // $result->inputTokens / $result->outputTokens
            } else {
                // $result->error — failure message
            }
        }
    }
}

Register it in AppServiceProvider::boot():

// app/Providers/AppServiceProvider.php

use BatchApi\Events\BatchCompletedEvent;
use App\Listeners\HandleBatchCompletedListener;
use Illuminate\Support\Facades\Event;

public function boot(): void
{
    Event::listen(BatchCompletedEvent::class, HandleBatchCompletedListener::class);
}

All available events

Event	Properties	Fired when
`BatchCreatedEvent`	`$batch`, `$items`, `$provider`	Batch record saved, job dispatched
`BatchProcessingEvent`	`$batch`	Queue worker picks up the job
`BatchItemStartedEvent`	`$batch`, `$dto`	Single request about to fire
`BatchItemCompletedEvent`	`$batch`, `$result`	Single request finished
`BatchCompletedEvent`	`$batch`, `$results`	All requests done
`BatchFailedEvent`	`$batch`, `$exception`	Job threw an unrecoverable error
`BatchCancelledEvent`	`$batch`	Batch cancelled

Approach 2 — REST API (External HTTP Clients)

Use this when you want to point an existing Anthropic or OpenAI SDK at your local server instead of the cloud. The API surface is identical to the real APIs.

Step 1 — Register routes with authentication

Do not set BATCH_API_EXPOSE_ROUTES=true. Instead, register routes manually inside a protected middleware group so you control authentication.

Install Sanctum if you haven't already:

composer require laravel/sanctum
php artisan install:api

In your routes/api.php (or a service provider), wrap BatchApi::routes() with Sanctum middleware:

use BatchApi\Facades\BatchApi;

Route::middleware('auth:sanctum')->group(function () {
    BatchApi::routes();
});

This registers all 11 endpoints, each requiring a valid Sanctum token.

Note: BatchApi::routes() also applies the api middleware internally. Wrapping it with auth:sanctum stacks both, so your routes have api + auth:sanctum.

Step 2 — Issue a token

// In a controller or seeder
$token = $user->createToken('batch-api-client')->plainTextToken;
// Pass this token to the HTTP client

Step 3 — Call the API

All requests need the token in the Authorization header:

Authorization: Bearer <token>

Anthropic Batch API — Step by Step

1. Submit a batch

POST /api/anthropic/v1/messages/batches
Content-Type: application/json
Authorization: Bearer <token>

{
  "requests": [
    {
      "custom_id": "req-1",
      "params": {
        "model": "llama3.2",
        "max_tokens": 512,
        "messages": [
          { "role": "user", "content": "Say hello in one sentence." }
        ]
      }
    },
    {
      "custom_id": "req-2",
      "params": {
        "model": "llama3.2",
        "max_tokens": 512,
        "system": "You are a pirate. Always respond like a pirate.",
        "messages": [
          { "role": "user", "content": "What is the capital of France?" }
        ]
      }
    }
  ]
}

Response 202 Accepted:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "message_batch",
  "processing_status": "in_progress",
  "request_counts": { "processing": 2, "succeeded": 0, "errored": 0, "canceled": 0, "expired": 0 },
  "created_at": "2026-05-25T10:00:00+00:00",
  "expires_at": "2026-05-26T10:00:00+00:00",
  "ended_at": null,
  "cancel_initiated_at": null,
  "results_url": null
}

2. Poll until done

GET /api/anthropic/v1/messages/batches/{id}
Authorization: Bearer <token>

Keep polling until processing_status is "ended".

3. Fetch results (NDJSON)

GET /api/anthropic/v1/messages/batches/{id}/results
Accept: application/x-ndjson
Authorization: Bearer <token>

Returns 204 No Content if still processing. When ready, streams one JSON object per line:

{"custom_id":"req-1","result":{"type":"succeeded","message":{"id":"msg_abc","type":"message","role":"assistant","model":"llama3.2","content":[{"type":"text","text":"Hello! Great to meet you."}],"stop_reason":"end_turn","usage":{"input_tokens":12,"output_tokens":10}}}}
{"custom_id":"req-2","result":{"type":"errored","error":{"type":"server_error","message":"Ollama timeout"}}}

Other Anthropic endpoints

GET  /api/anthropic/v1/messages/batches              # list (supports ?limit=&before_id=&after_id=)
POST /api/anthropic/v1/messages/batches/{id}/cancel  # cancel

OpenAI Batch API — Step by Step

1. Upload a JSONL file

Create a .jsonl file (one request per line):

{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"max_tokens":512}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2","messages":[{"role":"user","content":"What is 2+2?"}],"max_tokens":256}}

Upload it:

POST /api/openai/v1/files
Content-Type: multipart/form-data
Authorization: Bearer <token>

file=@requests.jsonl
purpose=batch

Response 201 Created:

{
  "id": "file-abc123",
  "object": "file",
  "purpose": "batch",
  "created_at": 1716631200
}

2. Submit the batch

POST /api/openai/v1/batches
Content-Type: application/json
Authorization: Bearer <token>

{
  "input_file_id": "file-abc123",
  "endpoint": "/v1/chat/completions",
  "completion_window": "24h"
}

Response 201 Created:

{
  "id": "550e8400-e29b-41d4-a716-446655440001",
  "object": "batch",
  "status": "validating",
  "input_file_id": "file-abc123",
  "output_file_id": null,
  "request_counts": { "total": 2, "completed": 0, "failed": 0 }
}

3. Poll until completed

GET /api/openai/v1/batches/{id}
Authorization: Bearer <token>

Poll until status is "completed". Note the output_file_id in the response.

4. Download results

GET /api/openai/v1/files/{output_file_id}/content
Authorization: Bearer <token>

Returns JSONL, one result per line:

{"id":"batch_req_abc","custom_id":"req-1","response":{"status_code":200,"body":{"id":"chatcmpl-123","object":"chat.completion","model":"llama3.2","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I help?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":8,"total_tokens":18}}},"error":null}

Other OpenAI endpoints

GET  /api/openai/v1/batches                  # list (supports ?limit=&after=)
POST /api/openai/v1/batches/{id}/cancel      # cancel

Pointing an existing SDK at this server

Python (Anthropic SDK):

import anthropic

client = anthropic.Anthropic(
    api_key="any-value",           # required by SDK but not validated here
    base_url="http://localhost:8000/api/anthropic",
    default_headers={"Authorization": "Bearer <token>"},
)

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    api_key="any-value",
    base_url="http://localhost:8000/api/openai",
    default_headers={"Authorization": "Bearer <token>"},
)

Batch Status Lifecycle

pending → processing → completed
                     → failed
       → cancelling  → cancelled

Internal	Anthropic `processing_status`	OpenAI `status`
`pending`	`in_progress`	`validating`
`processing`	`in_progress`	`in_progress`
`completed`	`ended`	`completed`
`failed`	`ended`	`failed`
`cancelling`	`canceling`	`cancelling`
`cancelled`	`ended`	`cancelled`

Batches expire after 24 hours.

Switching to LM Studio

Open LM Studio → start the local server (default port 1234)
Load a model
Update .env:

INFERENCE_PROVIDER=lmstudio
INFERENCE_URL=http://localhost:1234
INFERENCE_MODEL=your-model-name

No other changes needed.

Concurrency Tuning

INFERENCE_CONCURRENCY controls parallel requests per batch chunk.

Hardware	Value
CPU-only	`1`
GPU with spare VRAM	`3`–`5`

Postman Collection

Import Local-Batch-API.postman_collection.json. Set the baseUrl variable to your server URL. The collection auto-saves batch IDs and file IDs between requests so you can run folders top-to-bottom without manually copying values.

Troubleshooting

Batches stay pending forever — Queue worker not running. Run php artisan queue:work.

Ollama timeout in results — Model is slow or INFERENCE_TIMEOUT too low. Raise to 300.

Routes return 404 — Routes not registered. Either set BATCH_API_EXPOSE_ROUTES=true (no auth) or call BatchApi::routes() manually in a middleware group.

401 Unauthorized on API routes — Sanctum token missing or invalid. Pass Authorization: Bearer <token> header.

cannot chdir git error in submodule — Run git submodule update --init in the parent repo.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
config		config
database/factories		database/factories
src		src
tests		tests
.gitignore		.gitignore
Local-Batch-API.postman_collection.json		Local-Batch-API.postman_collection.json
README.md		README.md
composer.json		composer.json
composer.lock		composer.lock
phpunit.xml		phpunit.xml

Folders and files

Latest commit

History

Repository files navigation

local-batch-api

Requirements

Installation

Step 1 — Install the package

Step 2 — Run migrations

Step 3 — Configure your inference backend

Step 4 — Start a queue worker

Two Ways to Use This Package

Approach 1 — Event-based (Internal Usage)

Submitting an Anthropic-format batch

Submitting an OpenAI-format batch

Listening for results

All available events

Approach 2 — REST API (External HTTP Clients)

Step 1 — Register routes with authentication

Step 2 — Issue a token

Step 3 — Call the API

Anthropic Batch API — Step by Step

1. Submit a batch

2. Poll until done

3. Fetch results (NDJSON)

Other Anthropic endpoints

OpenAI Batch API — Step by Step

1. Upload a JSONL file

2. Submit the batch

3. Poll until completed

4. Download results

Other OpenAI endpoints

Pointing an existing SDK at this server

Batch Status Lifecycle

Switching to LM Studio

Concurrency Tuning

Postman Collection

Troubleshooting

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages