Production-Ready USDA Nutritional Data API

This project provides a robust, high-performance, and production-grade serverless API for accessing nutritional data from the USDA's FoodData Central. Built on Cloudflare Workers and TypeScript, it features a resilient architecture with a Cloudflare D1-powered caching layer, structured logging, and a full suite of tests.

🚀 New Features: Advanced Natural Language Processing

Now featuring a zero-cost, highly efficient natural language processing system:

🎯 Intelligent Query Parsing

Parse complex food queries with quantity, units, and preparation methods
Smart food recognition with local fuzzy matching
Context-aware processing for preparation methods

📐 Advanced Unit Handling

Standard measurements (g, kg, lb, oz)
Informal measurements (pinch, dash, handful)
Fraction support (1/2, quarter, half)
Range handling (2-3 tablespoons)

🧠 Smart Features (Zero-Cost Implementation)

Local fuzzy string matching for food recognition
Food substitution suggestions
Preparation method impact analysis
Nutritional context awareness

💡 Example Queries

"100g of chedar cheese" 
→ Suggests "cheddar cheese" with alternatives

"2-3 tablespoons olive oil" 
→ Handles range and provides context

"grilled chicken breast" 
→ Includes preparation method impact

Production-Ready USDA Nutritional Data API

This project provides a robust, high-performance, and production-grade serverless API for accessing nutritional data from the USDA's FoodData Central. Built on Cloudflare Workers and TypeScript, it features a resilient architecture with a Cloudflare D1-powered caching layer, structured logging, and a full suite of tests.

🚀 New Features: Zero-Cost Smart Natural Language Processing

Now featuring a sophisticated yet cost-efficient natural language processing system:

🎯 Intelligent Query Parsing

Parse complex food queries with smart entity recognition
Handle informal measurements and fractions
Support for preparation methods and modifiers
Efficient local fuzzy matching

📊 Smart Nutritional Context

Preparation method impact analysis
Food category recognition
Intelligent substitution suggestions
Serving recommendations

🔍 Enhanced Error Handling

Smart typo detection
Helpful suggestions for invalid queries
Context-aware error messages
Alternative recommendations

All features implemented with zero external dependencies and no ongoing costs!

⚡ Phase 2: Performance Multipliers (NEW!)

Dramatic performance improvements with minimal setup:

🚀 USDA Batch API Service

Up to 20 foods in a single API call instead of 20 separate calls
Automatic request batching with intelligent queuing
90% reduction in API calls for multi-item queries
Zero configuration required - works automatically

🔥 Hot Cache for Top 100 Foods

<5ms response time for most common queries
~80% cache hit rate with just 100 entries
One-time seeding of popular foods
Automatic query frequency tracking

📊 Performance Impact

Before Phase 2: 150ms avg, 2-3 API calls per request
After Phase 2: <10ms for 80% of queries, 88% fewer API calls
Cost Savings: Massive reduction in API usage and compute time

See PHASE_2_QUICKSTART.md for deployment instructions.

Example Queries & Responses

Basic Query

POST /api/natural-language-search
Content-Type: application/json

{
  "query": "100g of chiken brest"
}

Response:

{
  "parsed": {
    "quantity": 100,
    "unit": "g",
    "foodName": "chicken breast",
    "quantityInGrams": 100
  },
  "suggestions": [
    {
      "word": "chicken breast",
      "similarity": 85,
      "category": "meat",
      "alternatives": ["turkey breast", "tofu"]
    }
  ],
  "nutritionalContext": {
    "category": "meat",
    "preparation": {
      "suggested": ["grilled", "baked", "pan-fried"],
      "impact": {
        "grilled": {
          "calories": -5,
          "notes": ["Reduced fat content", "Minimal nutrient loss"]
        }
      }
    }
  }
}

Complex Query with Preparation

POST /api/natural-language-search
Content-Type: application/json

{
  "query": "2-3 tablespoons of extra virgin olive oil for cooking"
}

Response:

{
  "parsed": {
    "quantity": 2.5,
    "unit": "tablespoons",
    "foodName": "extra virgin olive oil",
    "quantityInGrams": 37.5,
    "preparation": "cooking"
  },
  "nutritionalContext": {
    "category": "oils",
    "preparation": {
      "method": "cooking",
      "notes": [
        "Better alternatives for high-heat cooking: regular olive oil, avocado oil",
        "Extra virgin olive oil best used unheated for dressings and finishing"
      ]
    }
  }
}

Architecture Overview

The API is designed for high availability and low latency by leveraging a serverless architecture on Cloudflare Workers and a persistent caching layer with Cloudflare D1.

Core Components

Cloudflare Worker (TypeScript): The core application logic runs on Cloudflare's global network, ensuring requests are handled close to the user.
itty-router: A lightweight, high-performance router for handling API endpoints within the worker.
Cloudflare D1: Serves as a persistent, external cache to store responses from the USDA API. This dramatically reduces latency for repeated requests and lessens the load on the upstream API.
Structured Logging: All log output is in a machine-readable JSON format, which is essential for effective monitoring and debugging in a production environment.

Request Lifecycle & Caching Strategy

The caching logic is central to the API's performance and resilience. It implements a stale-while-revalidate strategy.

Incoming Request: A user requests data for a specific food_id.
Cache Check (Read): The worker first queries the D1 database using the food_id as the cache key.
Cache Hit: If a fresh (not expired) record is found, the worker immediately returns the cached data. This is indicated by an X-Cache-Status: HIT header.
Cache Stale: If the data is found but has passed its ttl (Time-to-Live), it is considered "stale." The worker returns the stale data immediately (X-Cache-Status: STALE) and simultaneously triggers a background fetch to the USDA API to refresh the cache. This ensures the user gets a fast response while the cache is updated asynchronously.
Cache Miss: If no record is found, the worker calls the external USDA FoodData Central API.
Fetch & Parse: The worker fetches the raw data, validates it against a Zod schema, and transforms it into a clean, standardized JSON format.
Cache Write: The newly fetched data is written to the D1 database with a ttl and a stale_while_revalidate period.
Response: The worker returns the freshly fetched data to the user with an X-Cache-Status: MISS header.

API Documentation

Health Check

A comprehensive endpoint to verify that the worker and all its dependencies (USDA API, D1) are running and responsive.

Endpoint: GET /health

Success Response (200 OK):

{
  "status": "ok",
  "checks": {
    "usdaApi": { "status": "ok", "message": "USDA API is reachable." },
    "d1": { "status": "ok", "message": "D1 is reachable." },
"apiKeyDb": { "status": "ok", "message": "API key D1 database is reachable (Cloudflare D1)." }
  }
}

Error Response (503 Service Unavailable):

{
  "status": "error",
  "checks": {
    "usdaApi": { "status": "error", "message": "USDA API is unreachable." },
    "d1": { "status": "ok", "message": "D1 is reachable." },
    "apiKeyDb": { "status": "ok", "message": "API key D1 database is reachable." }
  }
}

Get Food Data

Retrieves detailed nutritional information for a specific food item by its FDC ID.

Endpoint: GET /food/:id
URL Parameters:
- id (required): The FoodData Central ID of the food item.

Success Response (200 OK):

The response is a structured JSON object containing the most essential nutrients.

Example (GET /food/746782):

{
  "fdcId": 746782,
  "description": "Cheese, cheddar, sharp",
  "calories": {
    "value": 404,
    "unit": "KCAL"
  },
  "protein": {
    "value": 24.9,
    "unit": "G"
  },
  "fat": {
    "value": 33.14,
    "unit": "G"
  },
  "carbohydrates": {
    "value": 1.28,
    "unit": "G"
  }
}

Search for Foods

Searches for foods based on a query string. This endpoint is useful for finding foods by name or brand.

Endpoint: GET /v1/search
Authentication: Required (API Key)
Query Parameters:
- query (required): The search term (e.g., "cheddar cheese").
- dataType (optional): The type of food data (e.g., "Branded", "Foundation").
- pageSize (optional): The number of results to return (default: 10).

API Response Structure

Understanding the `/v1/search` Response Structure

The /v1/search endpoint returns detailed nutritional information. The primaryFood object contains two main sets of data regarding serving size and nutrients:

Reference Data (Based on USDA Standard):
- referenceServing: This object always describes the standard 100g serving size used by the USDA FoodData Central database.
  - size: Always 100.
  - unit: Always "g".
- referenceNutrients: This object contains the detailed nutritional values (protein, fat, calories, vitamins, etc.) corresponding exactly to the 100g referenceServing. This provides a consistent baseline for comparison across different foods.
Calculated Data (Based on Your Query):
- calculatedAmount: This object provides details about the specific amount calculated based on your input query (quantity, unit, totalGramWeight).
  - If your query included a quantity and unit (e.g., "3 apples", "200g rice"), this section details how the total gram weight was determined (e.g., which portion size was matched, the weight per unit, and the final totalGramWeight).
  - If your query did not include a quantity and unit (e.g., "apple"), this section defaults to reflecting the 100g reference amount (totalGramWeight: 100).
- calculatedNutrients: This object contains the nutritional values scaled to match the totalGramWeight shown in calculatedAmount.
  - For a query like "3 apples", these nutrients will reflect the total for ~600g (or whatever the calculated weight is).
  - For a query like "apple", these nutrients will be identical to referenceNutrients (reflecting the 100g default).

Why Both? This structure gives you flexibility:

Use referenceNutrients if you always need data per 100g for comparisons.
Use calculatedNutrients if you need the nutritional information for the specific amount requested in the user's query.

Example 1: Query apple

{
  "query": "apple",
  "parsed": { "quantity": null, "unit": null, "food": "apple" },
  "primaryFood": {
    // ... other fields
    "referenceServing": { "size": 100, "unit": "g" },
    "referenceNutrients": { "calories": { "value": 61, /* ... */ } },
    "calculatedAmount": { "totalGramWeight": 100, /* ... */ },
    "calculatedNutrients": { "calories": { "value": 61, /* ... */ } } // Same as reference
    // ...
  }
}

Example 2: Query 3 apples

{
  "query": "3 apples",
  "parsed": { "quantity": 3, "unit": "apple", "food": "apple" },
  "primaryFood": {
    // ... other fields
    "referenceServing": { "size": 100, "unit": "g" },
    "referenceNutrients": { "calories": { "value": 61, /* ... */ } }, // Per 100g
    "calculatedAmount": { "totalGramWeight": 600, /* based on 3 * 200g/apple */ },
    "calculatedNutrients": { "calories": { "value": 366, /* Scaled: 61 * 6 */ } } // Scaled to 600g
    // ...
  }
}

Natural Language Search

Performs a search using a natural language query to identify a food and its quantity.

Endpoint: POST /v1/natural-language-search
Authentication: Required (API Key – Free or Pro)
Body:
- text (string, required): A natural language query (e.g., "100g of cheddar cheese").
- maxResults, confidence, filterForSuggestions (optional): Advanced controls for USDA lookups.

Success Response (200 OK):

{
  "query": "100g of cheddar cheese",
  "foods": [
      {
          "description": "Cheese, cheddar, sharp",
          "category": "Branded",
          "nutrients": {
              "Protein": {
                  "value": 22.87,
                  "unit": "G"
              },
              "Fat": {
                  "value": 33.82,
                  "unit": "G"
              },
              "Carbohydrates": {
                  "value": 2.77,
                  "unit": "G"
              },
              "Energy": {
                  "value": 411,
                  "unit": "KCAL"
              }
          }
      }
  ]
}

Premium AI Natural Language Search (Pro Tier)

Unlock the Workers AI-powered parser for more nuanced, multi-item meal descriptions.

Endpoint: POST /v2/ai-natural-language-search
Authentication: Requires a Pro tier API key
Body:
- text (string, required): Meal description (max 500 characters)
- Optional knobs: maxResults, confidence, filterForSuggestions
What you get:
- AI-interpreted items with unit normalization and gram estimates
- USDA-backed search results with confidence scores
- Response meta showing cache status and model identifier (@cf/meta/llama-2-7b-chat-int8)
Generate a Pro key: GET /_admin/generate-key?tier=pro

Getting Started & Deployment

Follow these steps to set up and deploy the worker.

-### Prerequisites

A Cloudflare account.
Node.js and npm installed.
The Wrangler CLI installed and authenticated.
A Cloudflare account.
Node.js and npm installed.
The Wrangler CLI installed and authenticated.

Step 1: Set Up the D1 Database

Create the D1 Database:
- In the Cloudflare dashboard, create a new D1 database.
- Bind it to your worker in wrangler.toml with the binding name DB.
Run the Schema:

Use Wrangler to execute the schema.sql file to create the necessary tables for caching and API key management in Cloudflare D1.

# Example: apply schema.sql to your production D1 database binding
wrangler d1 execute --binding API_KEYS_DB --file=schema.sql

Step 2: Configure Cloudflare Secrets

Secrets are used to store sensitive data like API keys and credentials. They are encrypted and cannot be viewed after being set.

# 1. USDA API Key (get one from https://api.nal.usda.gov/)
wrangler secret put USDA_API_KEY

# 2. Cloudflare D1 for API key management
Create the D1 database and bind it in `wrangler.toml` as `API_KEYS_DB`. The project stores API key metadata and validation data in Cloudflare D1. Optionally create a KV namespace called `API_KEY_CACHE_KV` for short-lived API key lookup caching.

# 3. Admin token for protected endpoints
wrangler secret put ADMIN_TOKEN

For local development, create a .dev.vars file in the project root and add your secrets there.

Step 3: Local Development & Deployment

Install Dependencies:
```
npm install
```
Run Locally:
```
npm run dev
```
Deploy to Cloudflare:
```
npm run deploy
```

Testing

The project includes a comprehensive test suite using vitest.

Unit & Integration Tests: Located in the tests/ directory, they cover individual functions and the complete request/response flow by mocking external services.

To run the full test suite:

npm test

Code Quality

Input Validation & Security

Comprehensive Input Validation

All API endpoints validate incoming data using zod schemas. This ensures:

Type safety (e.g., string, number, object)
Required fields are present
Length and format constraints
Consistent error responses

Example validation (TypeScript):

import { z } from 'zod';
const NaturalLanguageSearchSchema = z.object({
  query: z.string().min(1).max(100),
});

All validation errors return a structured JSON error response with details.

Input Sanitization & NoSQL Injection Protection

All user-supplied inputs used in database queries are sanitized using a strict allowlist of safe characters. This prevents injection attacks.

Example sanitization:

import { sanitize } from './utils/sanitizer';
const safeKeyId = sanitize(keyId);

Sanitization is applied before any database query, including API key lookups, quota/rate checks, and admin actions.

Security Best Practices

All secrets and credentials are stored using Cloudflare Secrets (never in code or env files).
Structured logging redacts sensitive headers and tokens.
All error responses use a consistent ErrorResponse model.
Rate limiting and quota enforcement are applied to all endpoints.

ESLint: Enforces code quality and best practices.
Prettier: Ensures consistent code formatting.

To check for linting errors:

npm run lint

To automatically format all code:

npm run format

Logging Privacy & Retention

This project emits structured JSON logs intended for machine parsing by observability systems. When deploying to production, follow these guidelines to protect user privacy and to control costs:

Redact sensitive headers and tokens before emitting logs (the worker uses sanitizeHeaders to redact Authorization, cookie headers, and similar values).
Avoid logging full request bodies unless strictly necessary; if you must log request bodies, mask PII (emails, phone numbers, SSNs) and truncate long content.
Implement log retention policies in your logging backend (for example: keep detailed logs for 30 days, aggregated metrics for 365 days).
Consider sampling high-volume, low-value logs (such as repeated 400-level client errors) to reduce cost and noise.
Ensure logs are transmitted over TLS and stored encrypted at rest in your logging backend.

These guidelines reduce the risk of accidental PII exposure and help maintain cost-effective observability.

API Usage Examples

Here are some examples of how to use the API in different programming languages.

JavaScript (Node.js)

const fetch = require('node-fetch');

const apiKey = 'YOUR_API_KEY';
const foodId = '746782'; // Example: Cheddar Cheese

fetch(`https://your-worker.your-domain.workers.dev/food/${foodId}`, {
  headers: {
    'x-api-key': apiKey,
  },
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));

Python

import requests

api_key = 'YOUR_API_KEY'
food_id = '746782' # Example: Cheddar Cheese
url = f'https://your-worker.your-domain.workers.dev/food/{food_id}'

headers = {
    'x-api-key': api_key
}

response = requests.get(url, headers=headers)

if response.status_code == 200:
    print(response.json())
else:
    print(f"Error: {response.status_code}, {response.text}")

Getting Started Guide

Obtain an API Key: Contact our sales team at sales@example.com to get your API key.
Making Requests: All requests must include your API key in the x-api-key header.
Response Format: All successful responses will be in JSON format. Errors will also be returned as JSON with an appropriate status code.

Pricing Model

We offer the following tiers for our API:

Tier	Price	Requests/Month
Free	$0/month	1,000
Pro	$50/month	100,000
Enterprise	Custom	Custom

Premium Features: The POST /v2/ai-natural-language-search endpoint and future AI add-ons are available to Pro keys (and above) only. Requests from Free keys return 403 Forbidden.

For more details, please visit our pricing page at example.com/pricing.

Rate Limiting

This API enforces both global and endpoint-specific rate limits based on your API key tier (e.g., free, pro).

How It Works

Global Tier Limit: Each API key tier has a default global limit (e.g., 100 requests/min for free tier).
Endpoint-Specific Limit: Some endpoints (e.g., /food/search) may have stricter limits (e.g., 20 requests/min for free tier).
The middleware checks for an endpoint-specific limit first; if none is set, it falls back to the global tier limit.

Example Rate Limit Config

rateLimits: {
  free: {
    global: { maxRequests: 100, windowMs: 60000 },
    endpoints: {
      '/food/search': { maxRequests: 20, windowMs: 60000 },
      '/admin/stats': { maxRequests: 5, windowMs: 60000 }
    }
  },
  pro: {
    global: { maxRequests: 1000, windowMs: 60000 },
    endpoints: {
      '/food/search': { maxRequests: 200, windowMs: 60000 }
    }
  }
}

Rate Limit Headers

Every response includes headers to help you track your usage:

X-RateLimit-Limit: Maximum requests allowed in the window
X-RateLimit-Remaining: Requests remaining in the current window
X-RateLimit-Reset: Time (in seconds) until the window resets

Error Response (429 Too Many Requests)

If you exceed your rate limit, you will receive:

{
  "statusCode": 429,
  "error": "Rate limit exceeded. Please try again in 30 seconds.",
  "details": [
    { "field": "Retry-After", "value": "30" }
  ]
}

📚 Documentation

User Guides

Query Tips & Best Practices - How to write effective queries, use modifiers, handle synonyms, and debug unmatched items
Debug Logging Reference - Understanding the modifier detection and scoring logs

Technical Documentation

API Reference (OpenAPI) - Complete API specification with all endpoints and schemas
Deployment Guide - How to deploy to Cloudflare Workers
Production Deployment - Production best practices and security
Phase 9 Implementation - Details on modifier logic and synonym handling

Advanced Features

Validation & Rate Limiting - Request validation and rate limit configuration
Advanced Examples - Complex query examples and use cases
Simplified API Guide - Quick start for common use cases

Name		Name	Last commit message	Last commit date
Latest commit History 347 Commits
.github		.github
.vscode		.vscode
.wrangler/state/v3/d1/miniflare-D1DatabaseObject		.wrangler/state/v3/d1/miniflare-D1DatabaseObject
migrations		migrations
scripts		scripts
src		src
tests		tests
.eslintrc.js		.eslintrc.js
.gitignore		.gitignore
.prettierrc		.prettierrc
CACHE_STAMPEDE_IMPLEMENTATION_SUMMARY.md		CACHE_STAMPEDE_IMPLEMENTATION_SUMMARY.md
CACHE_STAMPEDE_PREVENTION.md		CACHE_STAMPEDE_PREVENTION.md
CHANGELOG.md		CHANGELOG.md
DEPLOYMENT.md		DEPLOYMENT.md
EXPERT_FEEDBACK_IMPLEMENTATION.md		EXPERT_FEEDBACK_IMPLEMENTATION.md
GEMINI.md		GEMINI.md
JULES.md		JULES.md
MONITORING_DASHBOARD.md		MONITORING_DASHBOARD.md
MONITORING_EXAMPLES.md		MONITORING_EXAMPLES.md
MONITORING_IMPLEMENTATION_SUMMARY.md		MONITORING_IMPLEMENTATION_SUMMARY.md
PHASE_1_2_IMPLEMENTATION.md		PHASE_1_2_IMPLEMENTATION.md
PHASE_2_DEPLOYMENT_CHECKLIST.md		PHASE_2_DEPLOYMENT_CHECKLIST.md
PHASE_2_DEPLOYMENT_COMMANDS.md		PHASE_2_DEPLOYMENT_COMMANDS.md
PHASE_2_IMPLEMENTATION.md		PHASE_2_IMPLEMENTATION.md
PHASE_2_QUICKSTART.md		PHASE_2_QUICKSTART.md
PHASE_2_SUMMARY.md		PHASE_2_SUMMARY.md
README.md		README.md
expertDeveloper_feedback.md		expertDeveloper_feedback.md
hot_cache_seed.sql		hot_cache_seed.sql
openapi.json		openapi.json
package-lock.json		package-lock.json
package.json		package.json
schema.sql		schema.sql
test_multi_source.js		test_multi_source.js
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts
wrangler.toml		wrangler.toml

myProjectsRavi/API

Folders and files

Latest commit

History

Repository files navigation

Production-Ready USDA Nutritional Data API

🚀 New Features: Advanced Natural Language Processing

🎯 Intelligent Query Parsing

📐 Advanced Unit Handling

🧠 Smart Features (Zero-Cost Implementation)

💡 Example Queries

Production-Ready USDA Nutritional Data API

🚀 New Features: Zero-Cost Smart Natural Language Processing

🎯 Intelligent Query Parsing

📊 Smart Nutritional Context

🔍 Enhanced Error Handling

⚡ Phase 2: Performance Multipliers (NEW!)

🚀 USDA Batch API Service

🔥 Hot Cache for Top 100 Foods

📊 Performance Impact

Example Queries & Responses

Basic Query

Complex Query with Preparation

Table of Contents

Architecture Overview

Core Components

Request Lifecycle & Caching Strategy

API Documentation

Health Check

Get Food Data

Search for Foods

API Response Structure

Understanding the /v1/search Response Structure

Natural Language Search

Premium AI Natural Language Search (Pro Tier)

Getting Started & Deployment

Step 1: Set Up the D1 Database

Step 2: Configure Cloudflare Secrets

Step 3: Local Development & Deployment

Testing

Code Quality

Input Validation & Security

Comprehensive Input Validation

Input Sanitization & NoSQL Injection Protection

Security Best Practices

Logging Privacy & Retention

API Usage Examples

JavaScript (Node.js)

Python

Getting Started Guide

Pricing Model

Rate Limiting

How It Works

Example Rate Limit Config

Rate Limit Headers

Error Response (429 Too Many Requests)

📚 Documentation

User Guides

Technical Documentation

Advanced Features

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Understanding the `/v1/search` Response Structure

Packages