Drop-in replacement for the Anthropic and OpenAI batch APIs — self-hosted, running against local Ollama or LM Studio. No API keys, no cloud costs, no rate limits.
composer require digitaldreams/local-batch-api
php artisan migrate

This creates two tables: batches and batch_files.
Add to your .env:
# 'ollama' (default) or 'lmstudio'
INFERENCE_PROVIDER=ollama
# Base URL of your local inference server
INFERENCE_URL=http://localhost:11434 # Ollama default
# INFERENCE_URL=http://localhost:1234 # LM Studio default
# Default model (can be overridden per request)
INFERENCE_MODEL=llama3.2
# Seconds before a single request times out
INFERENCE_TIMEOUT=120
# Parallel requests per batch chunk — keep at 1 for CPU, raise to 3-5 for GPU
INFERENCE_CONCURRENCY=1

Batch jobs run asynchronously. The worker must be running:
php artisan queue:work

This package supports two independent usage patterns:
| | Event-based | REST API |
|---|---|---|
| Who calls it | Your own Laravel code | Any HTTP client (SDK, curl, external app) |
| Auth | Laravel's existing auth | Sanctum token (or your middleware) |
| Routes needed | No | Yes |
| Best for | Internal pipelines, jobs, commands | Replacing Anthropic/OpenAI SDK endpoints |
Use this when your own Laravel application needs to submit and process batches. No HTTP routes required.
Fire a SubmitAnthropicBatchEvent event. The package listener picks it up and dispatches the processing job automatically.
use BatchApi\Events\SubmitAnthropicBatchEvent;
use BatchApi\Data\Input\AnthropicBatchItemDto;
$items = [
new AnthropicBatchItemDto(
customId: 'req-1',
maxTokens: 512,
messages: [
['role' => 'user', 'content' => 'Summarise this article in one paragraph.'],
],
),
new AnthropicBatchItemDto(
customId: 'req-2',
maxTokens: 256,
messages: [
['role' => 'user', 'content' => 'What is the capital of France?'],
],
system: 'You are a geography expert.',
),
];
event(new SubmitAnthropicBatchEvent($items));

The OpenAI flow requires a file ID. Upload first using the BatchService, then fire the event.
use BatchApi\BatchService;
use BatchApi\Events\SubmitOpenAiBatchEvent;
use BatchApi\Data\Input\OpenAiBatchItemDto;
$service = app(BatchService::class);
// Build items from raw JSONL or manually
$items = [
new OpenAiBatchItemDto(
customId: 'req-1',
messages: [['role' => 'user', 'content' => 'Hello']],
maxTokens: 512,
),
];
// Create a file record (mirrors OpenAI's file upload step)
$file = $service->uploadFile(
collect($items)->map(fn ($item) => json_encode([
'custom_id' => $item->customId,
'method' => 'POST',
'url' => '/v1/chat/completions',
'body' => ['messages' => $item->messages, 'max_tokens' => $item->maxTokens],
]))->implode("\n")
);
event(new SubmitOpenAiBatchEvent($file->id, $items));

Listen to BatchCompletedEvent to act on results when processing finishes:
// app/Listeners/HandleBatchCompletedListener.php
use BatchApi\Events\BatchCompletedEvent;
use BatchApi\Data\BatchResultDto;
class HandleBatchCompletedListener
{
public function handle(BatchCompletedEvent $event): void
{
$batch = $event->batch;
foreach ($event->results as $result) {
/** @var BatchResultDto $result */
if ($result->succeeded) {
// $result->customId — matches your request's custom_id
// $result->content — the model's response text
// $result->model — model used
// $result->inputTokens / $result->outputTokens
} else {
// $result->error — failure message
}
}
}
}

Register it in AppServiceProvider::boot():
// app/Providers/AppServiceProvider.php
use BatchApi\Events\BatchCompletedEvent;
use App\Listeners\HandleBatchCompletedListener;
use Illuminate\Support\Facades\Event;
public function boot(): void
{
Event::listen(BatchCompletedEvent::class, HandleBatchCompletedListener::class);
}

| Event | Properties | Fired when |
|---|---|---|
| BatchCreatedEvent | $batch, $items, $provider | Batch record saved, job dispatched |
| BatchProcessingEvent | $batch | Queue worker picks up the job |
| BatchItemStartedEvent | $batch, $dto | Single request about to fire |
| BatchItemCompletedEvent | $batch, $result | Single request finished |
| BatchCompletedEvent | $batch, $results | All requests done |
| BatchFailedEvent | $batch, $exception | Job threw an unrecoverable error |
| BatchCancelledEvent | $batch | Batch cancelled |
Use this when you want to point an existing Anthropic or OpenAI SDK at your local server instead of the cloud. The API surface is identical to the real APIs.
Do not set BATCH_API_EXPOSE_ROUTES=true. Instead, register routes manually inside a protected middleware group so you control authentication.
Install Sanctum if you haven't already:
composer require laravel/sanctum
php artisan install:api

In your routes/api.php (or a service provider), wrap BatchApi::routes() with Sanctum middleware:
use BatchApi\Facades\BatchApi;
Route::middleware('auth:sanctum')->group(function () {
BatchApi::routes();
});

This registers all 11 endpoints, each requiring a valid Sanctum token.
Note: BatchApi::routes() also applies the api middleware internally. Wrapping it with auth:sanctum stacks both, so your routes have api + auth:sanctum.
// In a controller or seeder
$token = $user->createToken('batch-api-client')->plainTextToken;
// Pass this token to the HTTP client

All requests need the token in the Authorization header:
Authorization: Bearer <token>
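
As a rough illustration of attaching the token from a non-PHP client, the sketch below builds (but does not send) an authenticated request for the Anthropic-style batch creation endpoint documented in this README. The helper name build_batch_request, the http://localhost:8000 address, and the token value are assumptions made for the example, not part of the package.

```python
import json
import urllib.request


def build_batch_request(base_url: str, token: str, requests_payload: list) -> urllib.request.Request:
    """Build a POST request for the Anthropic-style batch endpoint (hypothetical helper)."""
    body = json.dumps({"requests": requests_payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/anthropic/v1/messages/batches",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # Sanctum token goes here
        },
    )


req = build_batch_request(
    "http://localhost:8000",  # assumed address of the Laravel app
    "example-token",          # placeholder; use the plainTextToken from createToken()
    [
        {
            "custom_id": "req-1",
            "params": {
                "model": "llama3.2",
                "max_tokens": 512,
                "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            },
        }
    ],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Building the request separately from sending it makes it easy to inspect the headers before pointing the client at a live server.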
POST /api/anthropic/v1/messages/batches
Content-Type: application/json
Authorization: Bearer <token>

{
"requests": [
{
"custom_id": "req-1",
"params": {
"model": "llama3.2",
"max_tokens": 512,
"messages": [
{ "role": "user", "content": "Say hello in one sentence." }
]
}
},
{
"custom_id": "req-2",
"params": {
"model": "llama3.2",
"max_tokens": 512,
"system": "You are a pirate. Always respond like a pirate.",
"messages": [
{ "role": "user", "content": "What is the capital of France?" }
]
}
}
]
}

Response 202 Accepted:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"type": "message_batch",
"processing_status": "in_progress",
"request_counts": { "processing": 2, "succeeded": 0, "errored": 0, "canceled": 0, "expired": 0 },
"created_at": "2026-05-25T10:00:00+00:00",
"expires_at": "2026-05-26T10:00:00+00:00",
"ended_at": null,
"cancel_initiated_at": null,
"results_url": null
}

GET /api/anthropic/v1/messages/batches/{id}
Authorization: Bearer <token>

Keep polling until processing_status is "ended".
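
A polling loop along these lines is one way to wait for that state. This is a minimal sketch: wait_until_ended and its parameters are hypothetical names, and the HTTP call is injected as a callable so the example can run against canned responses instead of a live server.

```python
import time
from typing import Callable


def wait_until_ended(
    fetch_batch: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_batch() until processing_status is "ended" or attempts run out."""
    for _ in range(max_attempts):
        batch = fetch_batch()
        if batch.get("processing_status") == "ended":
            return batch
        sleep(interval)
    raise TimeoutError("batch did not reach 'ended' within the allotted attempts")


# Offline demonstration: canned responses stand in for GET .../batches/{id}.
_responses = iter([
    {"processing_status": "in_progress"},
    {"processing_status": "in_progress"},
    {"processing_status": "ended"},
])
final = wait_until_ended(lambda: next(_responses), interval=0.0, sleep=lambda _s: None)
print(final["processing_status"])  # ended
```

In real use, fetch_batch would perform the authenticated GET and return the decoded JSON body. Capping the number of attempts keeps a stalled queue worker from blocking the caller indefinitely.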
GET /api/anthropic/v1/messages/batches/{id}/results
Accept: application/x-ndjson
Authorization: Bearer <token>

Returns 204 No Content if still processing. When ready, streams one JSON object per line:
{"custom_id":"req-1","result":{"type":"succeeded","message":{"id":"msg_abc","type":"message","role":"assistant","model":"llama3.2","content":[{"type":"text","text":"Hello! Great to meet you."}],"stop_reason":"end_turn","usage":{"input_tokens":12,"output_tokens":10}}}}
{"custom_id":"req-2","result":{"type":"errored","error":{"type":"server_error","message":"Ollama timeout"}}}

GET /api/anthropic/v1/messages/batches # list (supports ?limit=&before_id=&after_id=)
POST /api/anthropic/v1/messages/batches/{id}/cancel # cancel

Create a .jsonl file (one request per line):
{"custom_id":"req-1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2","messages":[{"role":"user","content":"Hello"}],"max_tokens":512}}
{"custom_id":"req-2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2","messages":[{"role":"user","content":"What is 2+2?"}],"max_tokens":256}}

Upload it:
POST /api/openai/v1/files
Content-Type: multipart/form-data
Authorization: Bearer <token>
file=@requests.jsonl
purpose=batch

Response 201 Created:
{
"id": "file-abc123",
"object": "file",
"purpose": "batch",
"created_at": 1716631200
}

POST /api/openai/v1/batches
Content-Type: application/json
Authorization: Bearer <token>

{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}

Response 201 Created:
{
"id": "550e8400-e29b-41d4-a716-446655440001",
"object": "batch",
"status": "validating",
"input_file_id": "file-abc123",
"output_file_id": null,
"request_counts": { "total": 2, "completed": 0, "failed": 0 }
}

GET /api/openai/v1/batches/{id}
Authorization: Bearer <token>

Poll until status is "completed" (a batch can also end as "failed" or "cancelled"). Note the output_file_id in the response.
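
One way to act on the polled response is sketched below. It assumes the response body has the shape shown in this README; the helper names and the example file ID file-out456 are hypothetical.

```python
from typing import Optional

# Terminal OpenAI-style statuses, taken from the status table in this README.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def is_terminal(batch: dict) -> bool:
    """True once polling can stop, whether or not the batch succeeded."""
    return batch.get("status") in TERMINAL_STATUSES


def output_content_path(batch: dict) -> Optional[str]:
    """Return the results download path for a completed batch, else None."""
    if batch.get("status") != "completed":
        return None
    file_id = batch.get("output_file_id")
    if not file_id:
        return None
    return f"/api/openai/v1/files/{file_id}/content"


done = {"status": "completed", "output_file_id": "file-out456"}  # hypothetical ID
print(output_content_path(done))  # /api/openai/v1/files/file-out456/content
```

Separating the "stop polling" check from the "results available" check avoids looping forever on a batch that failed or was cancelled.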
GET /api/openai/v1/files/{output_file_id}/content
Authorization: Bearer <token>

Returns JSONL, one result per line:
{"id":"batch_req_abc","custom_id":"req-1","response":{"status_code":200,"body":{"id":"chatcmpl-123","object":"chat.completion","model":"llama3.2","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I help?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":8,"total_tokens":18}}},"error":null}

GET /api/openai/v1/batches # list (supports ?limit=&after=)
POST /api/openai/v1/batches/{id}/cancel # cancel

Python (Anthropic SDK):
import anthropic
client = anthropic.Anthropic(
api_key="any-value", # required by SDK but not validated here
base_url="http://localhost:8000/api/anthropic",
default_headers={"Authorization": "Bearer <token>"},
)

Python (OpenAI SDK):
from openai import OpenAI
client = OpenAI(
api_key="any-value",
base_url="http://localhost:8000/api/openai/v1",  # the OpenAI SDK appends /batches and /files directly to base_url
default_headers={"Authorization": "Bearer <token>"},
)

pending → processing → completed
                     → failed
                     → cancelling → cancelled
| Internal | Anthropic processing_status | OpenAI status |
|---|---|---|
| pending | in_progress | validating |
| processing | in_progress | in_progress |
| completed | ended | completed |
| failed | ended | failed |
| cancelling | canceling | cancelling |
| cancelled | ended | cancelled |
Batches expire after 24 hours.
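
For client code that needs to reason about these states, the mapping table above can be transcribed into a small lookup. This is only a sketch of the documented mapping; the names STATUS_MAP and external_status are hypothetical, and the package's own PHP implementation may be organised differently.

```python
# internal status -> (Anthropic processing_status, OpenAI status), per the table above
STATUS_MAP = {
    "pending": ("in_progress", "validating"),
    "processing": ("in_progress", "in_progress"),
    "completed": ("ended", "completed"),
    "failed": ("ended", "failed"),
    "cancelling": ("canceling", "cancelling"),
    "cancelled": ("ended", "cancelled"),
}


def external_status(internal: str, api: str) -> str:
    """Translate an internal status into the value exposed by the given API flavour."""
    if api not in ("anthropic", "openai"):
        raise ValueError(f"unknown api flavour: {api!r}")
    anthropic_status, openai_status = STATUS_MAP[internal]
    return anthropic_status if api == "anthropic" else openai_status


print(external_status("failed", "anthropic"))  # ended
print(external_status("failed", "openai"))     # failed
```

Note that the Anthropic-style value "ended" covers completed, failed, and cancelled batches alike, so the outcome of each request has to be read from the per-request results rather than from processing_status.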
- Open LM Studio → start the local server (default port 1234)
- Load a model
- Update .env:
INFERENCE_PROVIDER=lmstudio
INFERENCE_URL=http://localhost:1234
INFERENCE_MODEL=your-model-name

No other changes needed.
INFERENCE_CONCURRENCY controls parallel requests per batch chunk.
| Hardware | Value |
|---|---|
| CPU-only | 1 |
| GPU with spare VRAM | 3–5 |
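
To make "parallel requests per batch chunk" concrete, here is a conceptual sketch of one plausible reading: items are split into chunks of the configured size, and each chunk is processed in parallel before the next begins. This is an assumption for illustration only. It is written in Python rather than PHP, and it does not reproduce the package's actual worker code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: List[T], size: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(items: List[T], handler: Callable[[T], R], concurrency: int) -> List[R]:
    """Run handler over items, at most `concurrency` at a time, one chunk after another."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for chunk in chunked(items, concurrency):
            # pool.map preserves input order within the chunk
            results.extend(pool.map(handler, chunk))
    return results


print(chunked(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Under this reading, a value of 1 processes requests strictly one at a time, which matches the CPU-only recommendation, while higher values only help if the inference server can actually serve several requests at once.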
Import Local-Batch-API.postman_collection.json. Set the baseUrl variable to your server URL. The collection auto-saves batch IDs and file IDs between requests so you can run folders top-to-bottom without manually copying values.
Batches stay pending forever — Queue worker not running. Run php artisan queue:work.
Ollama timeout in results — Model is slow or INFERENCE_TIMEOUT too low. Raise to 300.
Routes return 404 — Routes not registered. Either set BATCH_API_EXPOSE_ROUTES=true (no auth) or call BatchApi::routes() manually in a middleware group.
401 Unauthorized on API routes — Sanctum token missing or invalid. Pass Authorization: Bearer <token> header.
cannot chdir git error in submodule — Run git submodule update --init in the parent repo.
MIT