A comprehensive solution for recovering missing historical financial data when primary data sources fail
| Resource | Link | Description |
|---|---|---|
| Dashboard | bitcoin24-delta.vercel.app | Real-time Bitcoin tracker with historical charts |
| API Stats | API Health Check | Database status and data collection metrics |
| Data Collection | API Endpoint | Live data collection example |
| Source Code | View Repository | This knowledge base repository |
```bash
# Clone the repository
git clone https://github.com/nadinev6/bitcoin24.git
cd bitcoin24

# Explore the knowledge base
cat README.md

# Check out code examples
ls examples/
```

- Live Demo & Resources
- Project Overview
- Recovery Results
- Technical Implementation
- Backfill Tools & Implementation
- Performance Metrics
- Lessons Learned & Best Practices
- Generalizable Solution Framework
- Implementation Guidelines
- Monitoring & Alerting Strategy
- Security & Compliance
- Success Criteria & Validation
- Future Enhancements
- Contributing
Problem: A financial data collection system experienced a 14-day service interruption due to a database provider's policy interpretation, resulting in critical gaps in historical Bitcoin price data.
Solution: We implemented a multi-layered backfill strategy that successfully recovered all missing data using multiple data sources and approaches.
Live Bitcoin Tracker Dashboard showing recovered historical data
| Metric | Result |
|---|---|
| Target Period | 14 days (Nov 21 - Dec 4, 2024) |
| Data Points Recovered | 336 hourly records |
| Success Rate | 100% |
| Recovery Time | < 5 minutes |
| Data Accuracy | Matched external sources |
```
+------------------+      +------------------+      +------------------+
|    Approach 1    |      |    Approach 2    |      |    Approach 3    |
|  Scheduler API   |  ->  | Direct CoinGecko |  ->  |  Yahoo Finance   |
|     (Failed)     |      |     (Failed)     |      |    (SUCCESS)     |
+------------------+      +------------------+      +------------------+
```
| Source | Status | Authentication | Rate Limits |
|---|---|---|---|
| Tiger Data | Production | Database Connection | N/A |
| CoinGecko | Backup | API Key Required | 10,000/month |
| Yahoo Finance | Primary | None | Rate limit friendly |
| Mock Data | Development | None | Unlimited |
```mermaid
graph TB
    A[Next.js Frontend] --> B[API Routes]
    B --> C[Data Collector]
    C --> D[Multiple APIs]
    C --> E[Tiger Database]
    D --> F[CoinGecko]
    D --> G[Yahoo Finance]
    D --> H[Mock Data]
    E --> I[TimescaleDB]
```
```bash
# Direct database backfill - fastest method
node yahoo-backfill.js
```

- No API keys required
- Direct database insertion
- Handles rate limiting automatically
- Batch processing for large datasets
```bash
# Using existing collection logic with rate limiting
node backfill-bitcoin-data.js --hours 168 --batch-size 24
```

- Uses proven collection methods
- Built-in retry logic
- Configurable batch sizes
- Progress tracking
```bash
# For environments with strict rate limits
bash backfill-slow.sh 672 120
```

- 2-minute delays between requests
- Automatic retry on failures
- Comprehensive logging
- Background process friendly
The most successful approach involves making direct API calls to Yahoo Finance without authentication:
- No API keys required - reduces complexity and costs
- Reliable historical data access - good coverage for cryptocurrency data
- Rate limit friendly - generally permissive rate limiting
- Simple HTTP headers - just need proper User-Agent identification
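As a concrete sketch of this approach: the snippet below builds an unauthenticated request against Yahoo Finance's public chart endpoint (`/v8/finance/chart`). The endpoint and `period1`/`period2`/`interval` parameters reflect its commonly used, unofficial API, so treat them as assumptions to verify; the User-Agent string is illustrative.

```javascript
// Build an unauthenticated Yahoo Finance chart request for a time range.
// No API key is needed; a descriptive User-Agent is the only courtesy header.
function buildYahooChartRequest(symbol, startDate, endDate, interval = '1h') {
  const period1 = Math.floor(startDate.getTime() / 1000); // Unix seconds
  const period2 = Math.floor(endDate.getTime() / 1000);
  const url =
    `https://query1.finance.yahoo.com/v8/finance/chart/${symbol}` +
    `?period1=${period1}&period2=${period2}&interval=${interval}`;
  return {
    url,
    headers: { 'User-Agent': 'bitcoin24-backfill/1.0 (data recovery)' },
  };
}

// Example: the 14-day recovery window from this incident.
const req = buildYahooChartRequest(
  'BTC-USD',
  new Date('2024-11-21T00:00:00Z'),
  new Date('2024-12-05T00:00:00Z')
);
// fetch(req.url, { headers: req.headers }) returns JSON whose timestamps live
// under chart.result[0].timestamp and prices under
// chart.result[0].indicators.quote[0].close (subject to API changes).
```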
Key considerations for database insertion:
- Conflict resolution - use `ON CONFLICT` clauses to handle duplicates
- Transaction batching - group insertions for better performance
- Error handling - implement retry logic for network failures
- Data validation - verify price ranges and timestamp accuracy
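A minimal sketch of the first two considerations, combining batching with conflict resolution: build one multi-row `INSERT ... ON CONFLICT` statement so a whole batch is upserted in a single round trip. The table and column names (`bitcoin_prices`, `time`, `price`) are illustrative, not the project's actual schema.

```javascript
// Build a single parameterized multi-row upsert; duplicate timestamps are
// updated in place instead of raising a unique-constraint error.
function buildBatchUpsert(rows) {
  const placeholders = rows
    .map((_, i) => `($${i * 2 + 1}, $${i * 2 + 2})`)
    .join(', ');
  const values = rows.flatMap((r) => [r.time, r.price]);
  const sql =
    `INSERT INTO bitcoin_prices (time, price) VALUES ${placeholders} ` +
    `ON CONFLICT (time) DO UPDATE SET price = EXCLUDED.price`;
  return { sql, values };
}

const batch = [
  { time: '2024-11-21T00:00:00Z', price: 97500.12 },
  { time: '2024-11-21T01:00:00Z', price: 97810.44 },
];
const { sql, values } = buildBatchUpsert(batch);
// With a Postgres client such as pg: await pool.query(sql, values)
```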
Note: This knowledge base focuses on methodology and best practices. See the original bitcoin-tracker repository for complete implementation details.
| Approach | Setup Complexity | Authentication | Rate Limits | Data Coverage | Recovery Speed | Success Rate |
|---|---|---|---|---|---|---|
| Scheduler API | Low | None | Strict | Good | Slow | 0% |
| CoinGecko Direct | High | Required | Strict | Excellent | Medium | 0% |
| Yahoo Finance | Low | None | Lenient | Excellent | Fast | 100% |
- Never rely on a single data source for critical financial data
- Always have at least 3 backup providers configured
- Test backup sources regularly with small data samples
- Implement automatic failover mechanisms
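The failover mechanism above can be sketched as a loop over providers in priority order, falling through to the next source on any error. The provider objects and their `fetch` functions are hypothetical stand-ins for real integrations.

```javascript
// Try each configured provider in priority order; collect failures so the
// final error explains why every source was exhausted.
async function fetchWithFailover(providers, symbol) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, data: await provider.fetch(symbol) };
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Example wiring mirroring this incident: two failing sources, one healthy backup.
const providers = [
  { name: 'scheduler', fetch: async () => { throw new Error('quota exceeded'); } },
  { name: 'coingecko', fetch: async () => { throw new Error('401 unauthorized'); } },
  { name: 'yahoo', fetch: async () => [{ time: 0, price: 97500 }] },
];
```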
- Free APIs often provide historical data without authentication
- Yahoo Finance, Alpha Vantage, and IEX Cloud are reliable alternatives
- CoinGecko requires API keys but offers comprehensive data
- Mock data generators are useful for development and testing
- Design for API limitations from the start of your project
- Implement progressive backoff strategies (1s, 2s, 4s, 8s delays)
- Monitor API usage quotas and implement early warning systems
- Cache frequently accessed data to reduce API calls
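A minimal sketch of the progressive backoff strategy described above (1s, 2s, 4s, 8s), wrapping any async fetch function; names like `withBackoff` are illustrative, not from the project.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delays double each attempt: 1000, 2000, 4000, 8000 ms by default.
function backoffDelays(attempts, baseMs = 1000) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Retry fn up to `attempts` times, waiting progressively longer between tries;
// the final failure is rethrown to the caller.
async function withBackoff(fn, attempts = 4, baseMs = 1000) {
  const delays = backoffDelays(attempts, baseMs);
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err;
      await sleep(delays[i]);
    }
  }
}
```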
- Include data recovery procedures in all disaster recovery plans
- Maintain secure, documented database connection details
- Test recovery procedures quarterly to ensure they work
- Implement automated health checks for data collection
- Document incidents and resolutions thoroughly while fresh in memory
- Create runbooks for common recovery scenarios
- Share knowledge across teams to prevent knowledge silos
- Maintain detailed logs of all data recovery operations
This approach can be adapted for any financial data recovery scenario:
```javascript
// Support for multiple cryptocurrencies
const supportedCoins = [
  'BTC-USD', 'ETH-USD', 'ADA-USD', 'DOT-USD',
  'LINK-USD', 'XRP-USD', 'LTC-USD'
];

// Batch processing for portfolio recovery
for (const symbol of supportedCoins) {
  await recoverHistoricalData(symbol, startDate, endDate);
}
```

```javascript
// Multiple interval support
const intervals = {
  '1m': 60,     // 1 minute
  '5m': 300,    // 5 minutes
  '1h': 3600,   // 1 hour
  '1d': 86400   // 1 day
};
```

```javascript
// Multi-provider abstraction
const dataProviders = {
  yahoo: { name: 'Yahoo Finance', auth: false, rateLimit: 'friendly' },
  alphavantage: { name: 'Alpha Vantage', auth: true, rateLimit: 'strict' },
  iexcloud: { name: 'IEX Cloud', auth: true, rateLimit: 'moderate' }
};
```

- Identify exact missing data periods and gaps
- Evaluate available data sources and their limitations
- Choose primary and backup recovery strategies
- Prepare database connections and authentication
- Configure database connection strings and SSL
- Set up API access credentials (if required)
- Prepare monitoring, logging, and progress tracking
- Test connections with small data samples
- Start with smallest viable test (1-2 hours of data)
- Monitor progress, errors, and API rate limits
- Validate data integrity against external sources
- Scale up to full dataset after successful test
- Compare recovered data with multiple external sources
- Verify database consistency and constraints
- Update monitoring systems and alerts
- Document the recovery process for future reference
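The first validation step above can be sketched as a tolerance check between the recovered series and an external reference series. The 1% default tolerance and the `{ time, price }` record shape are assumptions for illustration.

```javascript
// Compare recovered prices against an external reference series and flag
// points whose relative deviation exceeds a tolerance (1% by default).
// Both series are assumed keyed by Unix-second timestamps.
function validateAgainstReference(recovered, reference, tolerance = 0.01) {
  const refByTime = new Map(reference.map((r) => [r.time, r.price]));
  const mismatches = [];
  for (const point of recovered) {
    const refPrice = refByTime.get(point.time);
    if (refPrice === undefined) continue; // no external datum to compare
    const deviation = Math.abs(point.price - refPrice) / refPrice;
    if (deviation > tolerance) {
      mismatches.push({ ...point, refPrice, deviation });
    }
  }
  return mismatches; // empty array means the series passed validation
}
```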
- Data Collection Success Rate: Target >99%
- API Response Times: Monitor for degradation
- Database Insertion Rates: Track throughput
- Storage Utilization: Alert at 80% capacity
- Missing Data Detection: Real-time gap identification
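The missing-data detection metric above reduces to finding holes in an expected time grid. A sketch, assuming hourly collection and Unix-second timestamps:

```javascript
// Return every expected hourly timestamp in [startSec, endSec] that is
// absent from the stored series; a non-empty result signals a data gap.
function findHourlyGaps(timestamps, startSec, endSec) {
  const present = new Set(timestamps);
  const gaps = [];
  for (let t = startSec; t <= endSec; t += 3600) {
    if (!present.has(t)) gaps.push(t);
  }
  return gaps;
}
```

Running this on each ingest (or on a schedule) turns gap detection into a cheap set lookup rather than a manual audit.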
- Missing data for active collection periods
- API rate limit exhaustion or 429 errors
- Database connection failures or timeouts
- Unusual data value deviations (>20% changes)
- Collection system downtime >15 minutes
- Never expose API keys in public repositories
- Use environment variables for all sensitive configuration
- Implement proper access controls for database connections
- Maintain audit logs for all data access operations
- Encrypt sensitive data in transit and at rest
- Respect API terms of service and rate limits
- Implement exponential backoff to avoid overwhelming services
- Monitor and log all API usage for compliance reporting
- Use User-Agent headers to identify your application
- Data Completeness: 100% of missing periods recovered
- Data Accuracy: Recovered data matches external validation sources
- Zero Downtime: Recovery completed without service interruption
- Process Documentation: Complete runbook created for future use
- Monitoring Enhancement: Improved alerting to prevent future issues
- All timestamps match expected time ranges
- Price values are within reasonable market ranges
- No duplicate entries in recovered periods
- Database constraints and indexes function correctly
- API integrations continue to work post-recovery
- Automated Failover: Dynamic switching to backup data sources
- Real-time Validation: Continuous data quality monitoring with ML
- Predictive Monitoring: Early warning systems for data collection issues
- Cross-provider Verification: Multi-source data validation algorithms
- One-click Recovery: Automated incident response procedures
- Multi-tenant Support: Handle multiple cryptocurrency portfolios
- Advanced Analytics: Statistical analysis and trend detection
- API Gateway: Centralized data access with caching layers
- Container Orchestration: Kubernetes deployment for high availability
This knowledge base was created from a real-world production incident where a financial data collection system experienced a 14-day service interruption. The solution demonstrates practical, proven approaches to data recovery in live production environments.
- Multiple data sources are essential for production financial applications
- Free APIs often provide better historical access than paid alternatives
- Rate limiting awareness must be built into system architecture from day one
- Automated recovery procedures can turn disasters into routine maintenance
- Live Application: bitcoin24-delta.vercel.app
- Knowledge Base: GitHub Repository
- Documentation: Complete API and deployment guides
- Backfill Tools: Production-ready recovery scripts
We welcome contributions to improve this knowledge base!
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
This knowledge base serves as a comprehensive guide for developers building financial data applications and recovering from data collection incidents. The techniques demonstrated here are applicable across various financial data scenarios and can be adapted for different cryptocurrencies, time ranges, and data providers.
Last Updated: December 5, 2024
Recovery Incident: November 21 - December 4, 2024
Status: Successfully Resolved
Uptime Impact: Zero downtime during recovery