
feat: Add Rate Limiting Middleware to Prevent LLM API Overuse#56

Open
SandeepChauhan00 wants to merge 3 commits into INCF:main from SandeepChauhan00:feature/rate-limiting

Conversation


@SandeepChauhan00 SandeepChauhan00 commented Feb 9, 2026

Summary

Adds rate limiting middleware to the /api/chat endpoint using slowapi to prevent uncontrolled LLM API usage, protect against abuse, and improve production readiness.

Closes #55

Problem

The /api/chat endpoint in backend/main.py had no rate limiting. Any user could send unlimited concurrent requests, leading to:

  • Uncontrolled Google Gemini/Vertex AI API costs
  • Unhandled rate limit errors from LLM providers
  • Vulnerability to bot abuse or accidental request loops

Changes Made

backend/main.py

  • Added slowapi rate limiter with per-IP tracking
  • Added @limiter.limit(RATE_LIMIT) decorator to /api/chat endpoint
  • Added custom 429 Too Many Requests exception handler with user-friendly message
  • Added Request parameter to chat_endpoint (required by slowapi)
  • Added structured logging for incoming requests, responses, and errors
  • Rate limit is configurable via RATE_LIMIT environment variable (default: 10/minute)

pyproject.toml

  • Added "slowapi>=0.1.9" to project dependencies

.env.template

  • Added RATE_LIMIT=10/minute configuration variable

How It Works

  • Each client IP is tracked independently
  • When limit is exceeded, returns 429 with a user-friendly JSON response
  • All requests are logged with client IP, session ID, and processing time
  • Rate limit is fully configurable without code changes via .env

Testing

  • Verified slowapi imports correctly
  • Verified rate limiting code present in main.py
  • No breaking changes to existing endpoints or functionality

Configuration

Variable     Default      Description
RATE_LIMIT   10/minute    Maximum requests per IP per time window

The value accepts any slowapi-style limit string, e.g. 5/second, 100/hour, 1000/day
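A limit string like 10/minute splits into a count and a time unit. A hypothetical parser sketch for illustration only (slowapi actually delegates this to the limits package, whose grammar is more featureful):

```python
import re

# Window lengths in seconds for each supported unit.
UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(value: str) -> tuple[int, int]:
    """Parse 'N/unit' into (max_requests, window_seconds)."""
    match = re.fullmatch(r"(\d+)/(second|minute|hour|day)", value.strip())
    if match is None:
        raise ValueError(f"invalid rate limit: {value!r}")
    count, unit = match.groups()
    return int(count), UNITS[unit]
```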

Acceptance Criteria

  • Rate limiting middleware added to /api/chat endpoint in backend/main.py
  • Limit is configurable via .env file
  • Returns clear 429 response with user-friendly error message
  • Basic request count logging added for monitoring
  • Existing tests still pass after integration

@QuantumByte-01
Collaborator

Clean implementation — slowapi middleware with configurable RATE_LIMIT env var, proper 429 handler with retry_after, and useful request/response logging. Dependency added correctly to pyproject.toml. Good understanding of the codebase.

@QuantumByte-01
Collaborator

This PR has merge conflicts with the current main branch, likely due to recent merges touching main.py. Please rebase against main and resolve the conflicts — the implementation itself is good and will be merged once clean.
