Improve backup import/export performance and fix production deploy #155
Merged
jeff-schnitter merged 13 commits into staging from 154-improve-performance-of-backup-import-and-export-commands on Nov 5, 2025
Conversation
Fix for no base_url in cortex config file
Implemented parallel API calls using ThreadPoolExecutor for backup export and import operations, significantly improving performance.

Changes:
- Added ThreadPoolExecutor with max_workers=10 for concurrent API calls
- Updated _export_plugins(), _export_scorecards(), and _export_workflows() to fetch items in parallel
- Updated _import_catalog(), _import_plugins(), _import_scorecards(), and _import_workflows() to import files in parallel
- Enhanced error handling to report failures without stopping the entire operation
- Maintained file ordering where applicable

Performance improvements:
- Export operations now run with concurrent API calls
- Import operations process multiple files simultaneously
- All existing tests pass (218 passed, 1 skipped)

Fixes #154

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
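For illustration, a minimal sketch of the parallel-fetch pattern this commit describes, assuming a client object exposing plugins.get(tag) and a pre-fetched list of plugin tags; the function name and client attributes here are hypothetical, not the actual backup.py helpers:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def export_plugins_parallel(client, plugin_tags, max_workers=10):
    """Fetch each plugin concurrently; report failures without aborting the run."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one fetch per plugin and remember which tag each future belongs to.
        future_to_tag = {
            executor.submit(client.plugins.get, tag): tag for tag in plugin_tags
        }
        for future in as_completed(future_to_tag):
            tag = future_to_tag[future]
            try:
                results[tag] = future.result()
            except Exception as exc:  # collect the error instead of stopping the whole export
                failures[tag] = exc
    return results, failures
```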
Added CORTEX_BASE_URL from GitHub Actions variables to the publish workflow, following the same pattern as test-pr.yml.

Changes:
- Added CORTEX_BASE_URL to top-level env section
- Added env sections to pypi-deploy-event job to pass CORTEX_API_KEY and CORTEX_BASE_URL to container
- Added env sections to docker-deploy-event job to pass CORTEX_API_KEY and CORTEX_BASE_URL to container
- Added env sections to homebrew-custom-event job to pass CORTEX_API_KEY and CORTEX_BASE_URL to container

This fixes the 401 Unauthorized and base_url errors when posting deploy events to Cortex during the publish workflow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
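A hedged sketch of the kind of workflow fragment this change describes; the container and step details below are placeholders, and only the env wiring reflects the commit text:

```yaml
# Illustrative fragment of .github/workflows/publish.yml, not the full file
env:
  CORTEX_BASE_URL: ${{ vars.CORTEX_BASE_URL }}

jobs:
  pypi-deploy-event:
    runs-on: ubuntu-latest
    container: ghcr.io/example/deploy-image:latest   # placeholder image
    env:
      # Pass both values into the container so the deploy event can authenticate
      # and reach the correct Cortex base URL.
      CORTEX_API_KEY: ${{ secrets.CORTEX_API_KEY }}
      CORTEX_BASE_URL: ${{ vars.CORTEX_BASE_URL }}
    steps:
      - run: echo "posting deploy event to $CORTEX_BASE_URL"
```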
…PI calls

Further performance optimizations for backup export:

1. Increased ThreadPoolExecutor workers from 10 to 30
- Network I/O bound operations can handle more parallelism
- Should provide 2-3x improvement for plugins and scorecards

2. Eliminated N individual API calls for workflows export
- Changed workflows.list() to use include_actions="true"
- Single API call now returns all workflow data with actions
- Convert JSON to YAML format directly without individual get() calls
- This eliminates N network round-trips for N workflows

Expected performance improvements:
- Workflows: near-instant (1 API call vs N calls)
- Plugins/scorecards: 2-3x faster with 30 workers vs 10

Previous timing: 2m19s (with 10 workers)
Original timing: 3m29s

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
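A rough sketch of the single-call workflows export, assuming workflows.list() accepts include_actions and returns parsed JSON containing a workflows list keyed by tag; those names follow the commit text and are not a verified API surface:

```python
import yaml

def export_workflows_single_call(client, output_dir):
    """Export all workflows from one list() call instead of N get() calls."""
    # One request returns every workflow, including its actions.
    response = client.workflows.list(include_actions="true")
    for workflow in response.get("workflows", []):
        tag = workflow["tag"]
        # Write YAML directly from the JSON payload; no per-workflow get() round-trip.
        with open(f"{output_dir}/{tag}.yaml", "w") as f:
            yaml.safe_dump(workflow, f, sort_keys=False)
```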
Modified all parallel export and import functions to collect results first, then sort alphabetically before writing/printing. This makes debugging failed exports much easier while maintaining parallel execution performance.

Changes:
- Export functions (plugins, scorecards) now collect all results, sort by tag, then write files in alphabetical order
- Import functions (catalog, plugins, scorecards, workflows) now collect all results, sort by filename, then print in alphabetical order
- Maintains parallel execution speed - only the output order is affected

Example output now shows consistent alphabetical ordering:
--> about-learn-cortex
--> bogus-plugin
--> developer-relations-plugin
--> google-plugin
--> map-test
--> my-cortex-plugin
...

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
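One way the collect-then-sort step can look, continuing the hypothetical export sketch above (the sort key, filenames, and "-->" output format are illustrative):

```python
import json

def write_sorted(results, output_dir):
    """Write files in alphabetical tag order so runs are easy to compare and debug."""
    # Parallel fetching finishes in arbitrary order; sort only at output time.
    for tag in sorted(results):
        path = f"{output_dir}/{tag}.json"
        with open(path, "w") as f:
            json.dump(results[tag], f, indent=2)
        print(f"--> {tag}")
```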
This is the critical fix for slow parallel export/import performance.

Problem:
- CortexClient was calling requests.request() directly without a session
- Each API call created a new TCP connection (DNS lookup, TCP handshake, TLS)
- Even with 30 parallel threads, each request was slow (~3+ seconds)
- 44 plugins took 2m24s (no parallelism benefit)

Solution:
- Created requests.Session() in __init__ with connection pooling
- Configured HTTPAdapter with pool_maxsize=50 for concurrent requests
- Added automatic retries for transient failures (500, 502, 503, 504)
- All requests now reuse existing TCP connections

Expected impact:
- First request: normal latency (connection setup)
- Subsequent requests: dramatically faster (connection reuse)
- With 30 workers: should see ~30x speedup for I/O bound operations
- 44 plugins: should drop from 2m24s to ~5-10 seconds

Technical details:
- pool_connections=10: number of connection pools to cache
- pool_maxsize=50: max connections per pool (supports 30+ parallel workers)
- Retry with backoff for transient server errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
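A minimal, self-contained sketch of this session setup using the standard requests and urllib3 APIs; the class name, header scheme, and request wrapper are placeholders rather than the real CortexClient:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class PooledClient:
    def __init__(self, base_url, api_key):
        self.base_url = base_url
        self.session = requests.Session()
        # Retry transient server errors with exponential backoff.
        retries = Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=[500, 502, 503, 504],
            allowed_methods=["GET", "POST", "PUT", "DELETE"],
        )
        adapter = HTTPAdapter(pool_connections=10, pool_maxsize=50, max_retries=retries)
        # Reuse TCP/TLS connections instead of opening a new one per request.
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)
        self.session.headers.update({"Authorization": f"Bearer {api_key}"})

    def request(self, method, path, **kwargs):
        # Every call goes through the pooled session rather than requests.request().
        return self.session.request(method, f"{self.base_url}{path}", **kwargs)
```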
The Rich library's print() was being used to write JSON to files, causing massive slowdowns (1-78 seconds per file!). Using json.dump() directly should reduce write time to milliseconds.
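As a hedged before/after sketch (the exact original call shape in backup.py is assumed), the difference looks like this:

```python
import json
from rich import print as rich_print

data = {"tag": "my-plugin", "blob": "x" * 1_000_000}

# Before: rich formats and highlights the whole payload before writing it to the file.
with open("plugin-slow.json", "w") as f:
    rich_print(json.dumps(data), file=f)

# After: serialize straight to the file handle.
with open("plugin-fast.json", "w") as f:
    json.dump(data, f, indent=2)
```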
Scorecard create operations can fail with 500 errors if there's an active evaluation running. This is a race condition that occurs when:
1. test_import.py creates/updates a scorecard
2. An evaluation is triggered automatically or by another test
3. test_scorecards.py tries to update the same scorecard

Added exponential backoff retry logic (1s, 2s) with max 3 attempts to handle these transient 500 errors gracefully.
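A sketch of the retry helper this fix describes, assuming the scorecard call raises an exception carrying an HTTP response with a status_code; the 1s/2s waits and three-attempt cap mirror the commit message:

```python
import time

def retry_on_transient_500(operation, max_attempts=3):
    """Retry a scorecard call on transient 500s with exponential backoff (1s, 2s)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            status = getattr(getattr(exc, "response", None), "status_code", None)
            if status != 500 or attempt == max_attempts:
                raise  # not a transient server error, or out of attempts
            time.sleep(2 ** (attempt - 1))  # 1s after the first failure, 2s after the second
```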
Summary
This PR dramatically improves the performance of backup import/export commands and fixes the production deploy workflow to correctly use CORTEX_BASE_URL.
Performance Improvements
Backup Export Performance:
Key Changes
1. Parallel Processing with ThreadPoolExecutor
- ThreadPoolExecutor with 30 workers

2. HTTP Connection Pooling
- requests.Session() with connection pooling in CortexClient
- HTTPAdapter with pool_maxsize=50 to support concurrent requests

3. Optimized API Calls
- Single list API call with include_actions=true instead of N individual get calls for workflows

4. Fixed File Writing Performance
- Replaced print() with direct json.dump() for JSON file writing

5. Alphabetical Output Ordering
- Results from parallel workers are collected, sorted, then written/printed in alphabetical order

6. Production Deploy Fix
- Updated .github/workflows/publish.yml to include CORTEX_BASE_URL in all container jobs

7. Test Reliability
- Added retry logic in tests/test_scorecards.py for transient 500 errors caused by active scorecard evaluations

Files Changed
- cortexapps_cli/commands/backup.py - Parallel processing, optimized exports, alphabetical ordering
- cortexapps_cli/cortex_client.py - HTTP connection pooling and session management
- .github/workflows/publish.yml - Fixed CORTEX_BASE_URL configuration
- tests/test_scorecards.py - Retry logic for active evaluation race conditions

Test Results
All tests passing (218 passed, 1 skipped) with 78% coverage.
Related Issues
Fixes #154