HyperCache is a high-performance, Redis-compatible distributed cache with advanced memory management, integrated probabilistic data structures (Cuckoo filters), and comprehensive monitoring stack. Built in Go for cloud-native environments.
Production-ready distributed cache with full observability stack:
- β Multi-node cluster deployment with full replication
- β Full Redis client compatibility (RESP protocol)
- β Lamport timestamps for causal ordering of distributed writes
- β Read-repair for gossip propagation window
- β Early Cuckoo filter sync across nodes
- β Enterprise persistence (AOF + Snapshots)
- β Structured JSON logging with correlation ID tracing
- β Real-time monitoring with Grafana + Elasticsearch
- β HTTP API + RESP protocol support
- β Advanced memory management with pressure detection
- β Cuckoo filter integration for negative lookup acceleration
- Grafana Dashboards: Real-time metrics visualization
- Elasticsearch: Centralized log aggregation and search
- Filebeat: Log shipping and processing
- Health Checks: Built-in monitoring endpoints
Pull from Docker Hub and start the full stack:
# Download the compose file
curl -O https://raw.githubusercontent.com/rishabhverma17/HyperCache/main/docker-compose.cluster.yml
# Start everything (3 HyperCache nodes + Elasticsearch + Grafana + Filebeat)
docker compose -f docker-compose.cluster.yml up -dThat's it. All configs are baked into the Docker image β no cloning, no local files needed.
# Verify the cluster
curl http://localhost:9080/health
# Store a key
curl -X PUT http://localhost:9080/api/cache/hello \
-H "Content-Type: application/json" -d '{"value": "world"}'
# Read it from a different node (replication)
curl http://localhost:9082/api/cache/hello
# Open Grafana dashboards
open http://localhost:3000 # admin / admin123- Go 1.23.2+
redis-cli(optional, for RESP testing)
# Build and start a fresh 3-node cluster
make cluster
# Check cluster health
curl -s http://localhost:9080/health | python3 -m json.tool
# Stop the cluster
make cluster-stop
# Full reset (stop + wipe data/logs/binaries + restart)
make cluster-stop && make clean && make clustermake run# Pull the latest image from Docker Hub
docker pull rishabhverma17/hypercache:latest
# Start full stack (3-node cluster + Elasticsearch + Grafana + Filebeat)
docker compose -f docker-compose.cluster.yml up -d
# Or build locally and start
make docker-build && make docker-up
# Stop
docker compose -f docker-compose.cluster.yml downkubectl apply -f k8s/hypercache-cluster.yaml| Service | URL | Notes |
|---|---|---|
| Node 1 HTTP API | http://localhost:9080 | Health, cache, filter, metrics |
| Node 2 HTTP API | http://localhost:9081 | |
| Node 3 HTTP API | http://localhost:9082 | |
| Node 1 RESP | redis-cli -p 8080 |
Redis-compatible |
| Node 2 RESP | redis-cli -p 8081 |
|
| Node 3 RESP | redis-cli -p 8082 |
|
| Prometheus Metrics | http://localhost:9080/metrics | Per-node metrics |
| Grafana | http://localhost:3000 | admin / admin123 |
| Elasticsearch | http://localhost:9200 |
make test-unitmake lint
make fmtmake benchImport HyperCache.postman_collection.json into Postman for a full test suite covering:
health, metrics, CRUD, cross-node replication, delete replication, value types, Cuckoo filter, and cleanup.
# Store a key
curl -X PUT http://localhost:9080/api/cache/mykey \
-H "Content-Type: application/json" \
-d '{"value": "hello world"}'
# Retrieve it
curl http://localhost:9080/api/cache/mykey
# Delete it
curl -X DELETE http://localhost:9080/api/cache/mykey
# Check Cuckoo filter stats
curl http://localhost:9080/api/filter/stats
# Prometheus metrics
curl http://localhost:9080/metricsredis-cli -p 8080 SET foo bar
redis-cli -p 8080 GET foo
redis-cli -p 8081 GET foo # verify replication
redis-cli -p 8080 DEL foo
redis-cli -p 8080 INFO
redis-cli -p 8080 DBSIZEmake build Build the binary
make run Run single node (RESP)
make cluster Start 3-node local cluster
make cluster-stop Stop all HyperCache processes
make clean Remove binaries, logs, data
make test-unit Run unit tests with coverage
make test-integration Run integration tests
make bench Run benchmarks
make lint Run golangci-lint
make fmt Format code
make docker-build Build Docker image
make docker-up Start Docker stack
make docker-down Stop Docker stack
make deps Download and tidy dependencies
- Full RESP protocol implementation
- Works with any Redis client library
- Drop-in replacement for many Redis use cases
- Standard commands: GET, SET, DEL, EXISTS, PING, INFO, FLUSHALL, DBSIZE
- Full Replication: Every node stores every key β maximum availability, any node serves any request
- Lamport Timestamps: Logical clocks for causal ordering of distributed operations. Stale writes from out-of-order gossip are automatically rejected
- Read-Repair: On local cache miss, peer nodes are queried before returning 404. Bridges the gossip propagation window (~50-500ms) so clients never see stale misses
- Early Cuckoo Filter Sync: Filter is updated immediately on gossip receive, before data is written. Eliminates false "definitely not here" rejections during replication lag
- Idempotent Replication: DELETE on a missing key is a no-op, not an error. Designed for eventual consistency
- Correlation ID Tracing: Every request gets a unique ID that flows across all nodes for end-to-end debugging
- Hybrid Persistence: AOF (Append-Only File) + Snapshot dual strategy
- Configurable per Store: Each data store can have independent persistence policies
- Sub-microsecond Writes: AOF logging with low-latency write path
- Fast Recovery: Complete data restoration from AOF replay + snapshot loading
- Snapshot Support: Point-in-time recovery with configurable intervals
- Durability Guarantees: Configurable sync policies (always, everysec, no)
- Docker Hub Integration: Pre-built multi-arch images (amd64, arm64)
- Docker Compose Support: One-command cluster deployment with monitoring
- Kubernetes Ready: StatefulSet manifests with service discovery
- CI/CD Pipeline: GitHub Actions for lint, test, build, and publish
- Per-Store Eviction Policies: Independent LRU, LFU, or session-based eviction per store
- Smart Memory Pool: Pressure monitoring (warning/critical/panic) with automatic cleanup
- Real-time Usage Tracking: Memory statistics and structured alerts
- Configurable Limits: Store-specific memory boundaries
- Per-Store Cuckoo Filters: Negative lookup acceleration β instant "definitely not here" for keys that don't exist
- Configurable False Positive Rate: Tune precision vs memory (default 0.01)
- O(1) Membership Testing: Sub-microsecond filter checks before any store lookup
- Supports Delete: Unlike Bloom filters, Cuckoo filters allow key removal
- Multi-node Clustering: Serf gossip protocol for node discovery and health monitoring
- Consistent Hash Ring: 256 virtual nodes with xxhash64 for uniform key distribution
- Automatic Failover: Node failure detection and traffic redistribution via gossip
- Inter-node Communication: HTTP-based read-repair and peer discovery via gossip metadata
- Structured JSON Logging: Every log line has timestamp, level, component, action, correlation ID
- Grafana Dashboards: Health overview, performance metrics, system components
- Elasticsearch + Filebeat: Centralized log aggregation with container-scoped filtering
- Configurable Log Levels: debug/info/warn/error/fatal β tunable per node at runtime
- Prometheus Metrics:
/metricsendpoint with cache stats, cluster health, hit rates
HyperCache/
βββ cmd/hypercache/ # Server entry point
βββ scripts/ # Deployment and management scripts
β βββ start-system.sh # Complete system launcher
β βββ build-and-run.sh # Build and cluster management
β βββ clean-*.sh # Cleanup utilities
βββ configs/ # Node configuration files
β βββ node1-config.yaml # Node 1 configuration
β βββ node2-config.yaml # Node 2 configuration
β βββ node3-config.yaml # Node 3 configuration
βββ internal/
β βββ cache/ # Cache interfaces and policies
β βββ storage/ # Storage with persistence
β βββ filter/ # Cuckoo filter implementation
β βββ cluster/ # Distributed coordination
β βββ network/resp/ # RESP protocol server
β βββ logging/ # Structured logging
βββ grafana/ # Grafana dashboards and config
βββ examples/ # Client demos and examples
βββ docs/ # Technical documentation
βββ logs/ # Application logs (Filebeat source)
βββ data/ # Persistence data (node storage)
βββ docker-compose.logging.yml # Monitoring stack
βββ filebeat.yml # Log shipping configuration
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Redis Client ββββββ RESP Protocol ββββββ HyperCache β
β (Any Library) β β Server β β Cluster β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββββββββββββ
β β β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β Memory Pool β β Data Storage β β Cuckoo Filter β β Hash Ring β β Gossip Node β
β (Pressure β β + Persistence β β (Probabilistic β β (Consistent β β Discovery β
β Monitoring) β β (AOF+Snapshot) β β Operations) β β Hashing) β β & Failover β
βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ ββββββββββββββββββββ βββββββββββββββββββ
β β β β β
βββββββββββββββββββββββββββΌββββββββββββββββββββββββββΌββββββββββββββββββββββββββΌββββββββββββββββββββββββββ
β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MONITORING STACK β
βββββββββββββββββββ¬βββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ€
β Filebeat β Elasticsearch β Grafana β Health API β Metrics β Alerting β
β (Log Shipper) β (Log Storage) β (Dashboards) β (Diagnostics) β (Performance) β (Monitoring) β
βββββββββββββββββββ΄βββββββββββββββββββ΄ββββββββββββββββββ΄βββββββββββββββββββ΄ββββββββββββββββββ΄ββββββββββββββββββ
Grafana Dashboards (http://localhost:3000)
- System Overview: Cluster health, node status, memory usage
- Performance Metrics: Request rates, response times, cache hit ratios
- Error Monitoring: Failed requests, timeout alerts, node failures
- Capacity Planning: Memory trends, storage usage, growth patterns
Elasticsearch Logs (http://localhost:9200)
- Centralized Logging: All cluster nodes, operations, and errors
- Search & Analysis: Query logs by node, operation type, or time range
- Error Tracking: Exception traces, failed operations, debug information
- Audit Trail: Configuration changes, cluster events, admin operations
# Cluster health
curl http://localhost:9080/health
curl http://localhost:9081/health
curl http://localhost:9082/health
# Node statistics
curl http://localhost:9080/stats
# Memory usage
curl http://localhost:9080/api/cache/stats# View cluster logs in real-time
docker logs -f hypercache-filebeat
# Query Elasticsearch directly
curl "http://localhost:9200/logs-*/_search?q=level:ERROR"
# Monitor resource usage
docker stats hypercache-elasticsearch hypercache-grafana
# Backup persistence data
tar -czf hypercache-backup-$(date +%Y%m%d).tar.gz data/HyperCache uses structured JSON logging with correlation IDs for full request tracing across all cluster nodes.
Available log levels (from most to least verbose):
| Level | What it includes |
|---|---|
debug |
Everything: cuckoo filter decisions, event bus routing, gossip internals, health checks, snapshot ticks |
info |
Business operations: request lifecycle (start β operation β result), replication flow, cluster membership changes, persistence events |
warn |
Potential issues: memory pressure warnings, failed joins, missing event bus |
error |
Failures: replication errors, deserialization failures, storage errors |
fatal |
Unrecoverable: startup failures |
Changing the log level:
Edit the node config YAML (e.g., configs/docker/node1-config.yaml):
logging:
level: "info" # Change to "debug" for troubleshooting, "warn" for quieter logs
max_file_size: "100MB"
max_files: 5
output: ["console", "file"]
structured: true
log_dir: "/app/logs"For Docker deployments, update all three node configs and rebuild:
# Edit configs/docker/node1-config.yaml, node2-config.yaml, node3-config.yaml
# Then rebuild and redeploy:
docker compose -f docker-compose.cluster.yml up -d --buildRequest tracing with correlation IDs:
Every request gets a correlation_id that flows through the entire lifecycle β from HTTP entry through cache operations to cross-node replication. Use it to trace any request across all nodes:
# Trace a specific request across all nodes
docker logs hypercache-node1 2>&1 | grep "abc-123-correlation-id"
docker logs hypercache-node2 2>&1 | grep "abc-123-correlation-id"
docker logs hypercache-node3 2>&1 | grep "abc-123-correlation-id"
# Find all errors in the last hour
docker logs --since 1h hypercache-node1 2>&1 | grep '"level":"ERROR"'
# Find all replication events
docker logs hypercache-node1 2>&1 | grep '"action":"replication"'You can also pass your own correlation ID via the X-Correlation-ID HTTP header for end-to-end tracing from your application:
curl -X PUT http://localhost:9080/api/cache/mykey \
-H "Content-Type: application/json" \
-H "X-Correlation-ID: my-trace-id-123" \
-d '{"value": "hello"}'See docs/README.md for the full documentation index:
- Architecture β Consistent hashing, Cuckoo filter internals, RESP protocol, Raft consensus
- Guides β Development setup, Docker, observability, multi-VM deployment
- Reference β Benchmarks, persistence paths, known issues
### Clean Up
```bash
# Stop all services
./scripts/build-and-run.sh stop
docker-compose -f docker-compose.logging.yml down
# Clean persistence data
./scripts/clean-persistence.sh --all
# Clean Elasticsearch data
./scripts/clean-elasticsearch.sh
# Start complete system with monitoring
./scripts/start-system.sh --all
# Start only cluster
./scripts/start-system.sh --cluster
# Start only monitoring
./scripts/start-system.sh --monitor
# Clean data and restart
./scripts/start-system.sh --clean --all# configs/node1-config.yaml
node:
id: "node-1"
data_dir: "./data/node-1"
network:
resp_port: 8080
http_port: 9080
gossip_port: 7946
cache:
max_memory: 1GB
default_ttl: 1h
cleanup_interval: 5m
eviction_policy: "session"
persistence:
enabled: true
aof_enabled: true
snapshot_enabled: true
snapshot_interval: 300s# Independent configuration for each data store
stores:
user_sessions:
eviction_policy: "session" # Session-based eviction
cuckoo_filter: true # Enable probabilistic operations
persistence: "aof+snapshot" # Full persistence
replication_factor: 3
page_cache:
eviction_policy: "lru" # LRU eviction
cuckoo_filter: false # Disable for pure cache
persistence: "aof_only" # Write-ahead logging only
replication_factor: 2
temporary_data:
eviction_policy: "lfu" # Least frequently used
cuckoo_filter: true # Enable for membership tests
persistence: "disabled" # In-memory only
replication_factor: 1# Grafana (localhost:3000)
Username: admin
Password: admin123
# Pre-configured datasources:
- Elasticsearch (HyperCache Logs)
- Health check endpoints- What: Binary protocol for Redis compatibility
- Why: Enables seamless integration with existing Redis clients and tools
- Features: Full command set support, pipelining, pub/sub ready
- Performance: Zero-copy parsing, minimal overhead
- What: Decentralized node discovery and health monitoring
- Why: Eliminates single points of failure in cluster coordination
- Features: Automatic node detection, failure detection, metadata propagation
- Scalability: O(log n) message complexity, handles thousands of nodes
- What: Distributed consensus algorithm for cluster coordination
- Why: Ensures data consistency and handles leader election
- Features: Strong consistency guarantees, partition tolerance, log replication
- Reliability: Proven algorithm used by etcd, Consul, and other systems
- What: Distributed data placement using consistent hashing
- Why: Minimizes data movement during cluster changes
- Features: Virtual nodes for load balancing, configurable replication
- Efficiency: O(log n) lookup time, minimal rehashing on topology changes
- AOF (Append-Only File): Sequential write logging for durability
- WAL (Write-Ahead Logging): Transaction-safe write ordering
- Hybrid Approach: Combines speed of WAL with simplicity of AOF
- Recovery: Fast startup with complete data restoration
- What: Space-efficient probabilistic data structure
- Why: Better than Bloom filters - supports deletions and has better locality
- Features: Configurable false positive rates, O(1) operations
- Use Cases: Membership testing, cache admission policies, duplicate detection
- docs/: Technical deep-dives and architecture docs
HyperCache implements a sophisticated dual-persistence system combining the best of both AOF and WAL approaches:
# Ultra-fast sequential writes
Write Latency: 2.7Β΅s average
Throughput: 370K+ operations/sec
File Format: Human-readable command log
Recovery: Sequential replay of operations# Transaction-safe write ordering
Consistency: ACID compliance
Durability: Configurable fsync policies
Crash Recovery: Automatic rollback/forward
Performance: Batched writes, zero-copy I/O# Measured Performance (Production Test)
β
Data Set: 10 entries
β
Recovery Time: 160Β΅s
β
Success Rate: 100% (5/5 tests)
β
Memory Overhead: <1MB# Snapshot-based recovery
β
Snapshot Creation: 3.7ms for 7 entries
β
File Size: 555B snapshot + 573B AOF
β
Recovery Strategy: Snapshot + AOF replay
β
Data Integrity: Checksum verificationstores:
critical_data:
persistence:
mode: "aof+snapshot" # Full durability
fsync: "always" # Immediate disk sync
snapshot_interval: "60s" # Frequent snapshots
session_cache:
persistence:
mode: "aof_only" # Write-ahead logging
fsync: "periodic" # Batched sync (1s)
compression: true # Compress log files
temporary_cache:
persistence:
mode: "disabled" # In-memory only
# No disk I/O overhead for temporary data# High Durability (Financial/Critical Data)
fsync: "always" # Every write synced
batch_size: 1 # Individual operations
compression: false # No CPU overhead
# Balanced (General Purpose)
fsync: "periodic" # 1-second sync intervals
batch_size: 100 # Batch writes
compression: true # Space efficiency
# High Performance (Analytics/Temporary)
fsync: "never" # OS manages sync
batch_size: 1000 # Large batches
compression: false # CPU for throughput- Zero Data Loss: With
fsync: alwaysconfiguration - Automatic Recovery: Self-healing on restart
- Integrity Checks: Checksums on all persisted data
- Partial Recovery: Recovers valid data even from corrupted files
- Consensus-Based: RAFT ensures consistency across partitions
- Split-Brain Protection: Majority quorum prevents conflicts
- Automatic Reconciliation: Rejoining nodes sync automatically
- Data Validation: Cross-node checksum verification
# Manual snapshot creation
curl -X POST http://localhost:9080/api/admin/snapshot
# Force AOF rewrite (compact logs)
curl -X POST http://localhost:9080/api/admin/aof-rewrite
# Check persistence status
curl http://localhost:9080/api/admin/persistence-stats
# Backup current state
./scripts/backup-persistence.sh
# Restore from backup
./scripts/restore-persistence.sh backup-20250822.tar.gz- High-performance caching layers for microservices
- Session storage with automatic failover
- Redis replacement with lower memory costs and better observability
- Distributed caching with real-time monitoring
- Local development with production-like monitoring
- Load testing with comprehensive metrics
- Log analysis and debugging with Elasticsearch
- Performance monitoring with Grafana dashboards
# Store user session
curl -X PUT http://localhost:9080/api/cache/user:123:session \
-d '{"value":"{\"user_id\":123,\"role\":\"admin\"}", "ttl_hours":2}'
# Retrieve session
curl http://localhost:9080/api/cache/user:123:sessionimport "github.com/redis/go-redis/v9"
// Connect to any cluster node
client := redis.NewClient(&redis.Options{
Addr: "localhost:8080", // Node 1 RESP port
})
// Use exactly like Redis!
client.Set(ctx, "user:123:profile", userData, 30*time.Minute)
client.Incr(ctx, "page:views")
client.LPush(ctx, "notifications", "New message")# Rate limiting counters
curl -X PUT http://localhost:9080/api/cache/rate:user:456 \
-d '{"value":"10", "ttl_hours":1}'
# Feature flags
curl -X PUT http://localhost:9080/api/cache/feature:new_ui \
-d '{"value":"enabled", "ttl_hours":24}'- Go 1.23.2+
- Docker & Docker Compose (for monitoring stack)
- Git (for cloning)
git clone <your-repository-url>
cd Cache
# Quick start - everything in one command
./scripts/start-system.sh
# Access your system:
# - Grafana: http://localhost:3000 (admin/admin123)
# - API: http://localhost:9080/api/cache/
# - Redis: localhost:8080 (redis-cli -p 8080)- Check Cluster Health: Visit http://localhost:9080/health
- Store Some Data:
redis-cli -p 8080 SET mykey "Hello World" - View in Grafana: Open http://localhost:3000, check dashboards
- Query Logs: Visit http://localhost:9200 for Elasticsearch
# Build and test
go build -o bin/hypercache cmd/hypercache/main.go
go test ./internal/... -v
# Start development cluster
./scripts/build-and-run.sh cluster
# View logs
tail -f logs/*.log
# Stop everything
./scripts/build-and-run.sh stop
docker-compose -f docker-compose.logging.yml down- HTTP API Documentation: Complete HTTP API reference with examples
- Technical Deep-Dives: Architecture, implementation details
- Configuration Guide: Production deployment
- RESP Protocol Reference: Redis compatibility examples
- Performance Benchmarks: Throughput and latency tests
- Monitoring Setup: Dashboard configuration
This project demonstrates enterprise-grade Go development with:
- Clean Architecture: Domain-driven design with clear interfaces
- Observability First: Comprehensive logging, metrics, and monitoring
- Production Ready: Persistence, clustering, and operational tooling
- Protocol Compatibility: Full Redis RESP implementation
- Performance Focused: Benchmarked and optimized for high throughput
MIT License - feel free to use in your projects!
From Concept to Production-Grade System:
- Vision: Redis-compatible distributed cache with advanced monitoring
- Built: Full production system with ELK stack integration
- Achieved: Multi-node clusters, real-time observability, enterprise persistence
- Result: Complete caching platform ready for cloud deployment
Features that set HyperCache apart:
- π Zero-downtime deployments with cluster coordination
- π Real-time monitoring with Grafana + Elasticsearch
- πΎ Enterprise persistence with AOF + snapshot recovery
- π Full observability with centralized logging and metrics
- β‘ Redis compatibility drop-in replacement capability
Made with β€οΈ in Go | Redis Compatible | Enterprise Observability
