
Add parallel multi-tenant operations for faster bulk processing#154

Open
jfrancoa wants to merge 5 commits into main from jose/parallel-tenant

Conversation

@jfrancoa
Collaborator

Summary

  • Batch tenant API calls: delete tenants and update tenants now issue a single batch API call instead of N individual calls, eliminating the main bottleneck for collections with many tenants
  • Parallel tenant processing: create data, update data, and delete data use a ThreadPoolExecutor to process multiple tenants concurrently, controlled by a new --parallel_workers option (default: min(32, cpu_count + 4))
  • Safe by design: When parallel_workers > 1, per-tenant concurrent_requests is scaled down proportionally so total in-flight connections stay bounded, preventing cluster overload
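The parallel path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ingest_tenant` and `process_tenants` are hypothetical names, and the per-tenant job is a stand-in for the real create/update/delete work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ingest_tenant(tenant: str, limit: int) -> int:
    """Hypothetical per-tenant job; returns the number of objects handled."""
    return limit


def process_tenants(tenants: list[str], limit: int, parallel_workers: int) -> dict[str, int]:
    """Run the per-tenant job for every tenant, in threads when allowed."""
    # Sequential fallback: one worker or a single tenant needs no pool.
    if parallel_workers <= 1 or len(tenants) <= 1:
        return {tenant: ingest_tenant(tenant, limit) for tenant in tenants}

    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        # Map each future back to its tenant so results can be attributed.
        futures = {pool.submit(ingest_tenant, t, limit): t for t in tenants}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads suit this workload because the per-tenant work is dominated by network calls rather than CPU time.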

Test plan

  • All 228 unit tests pass (make test)
  • Black formatting passes (make lint)
  • Updated test_tenant_manager.py assertions to expect batch API calls (list args) instead of N individual calls — reflects the explicit behavior change
  • Added 11 new tests in test_data_manager.py covering:
    • Parallel tenant processing (all tenants processed)
    • Sequential fallback with parallel_workers=1
    • Error collection from parallel workers
    • Specific tenants list with parallelism
    • Non-MT collection unaffected
    • concurrent_requests reduction for parallel tenants (prevents overload)
    • concurrent_requests unchanged for single tenant

Closes #153

🤖 Generated with Claude Code

Multi-tenant operations were processed sequentially, which became a
major bottleneck for collections with hundreds or thousands of tenants.

Changes:
- tenant_manager: replace per-tenant remove/update loops with single
  batch API calls (1 call instead of N)
- data_manager: use ThreadPoolExecutor to process multiple tenants
  concurrently for create/update/delete data operations
- data_manager: when parallel_workers > 1, concurrent_requests per
  tenant is scaled down to keep total connections bounded
- add --parallel_workers CLI option to create/update/delete data
  commands (default: MAX_WORKERS = min(32, cpu_count + 4))
- update defaults.py with parallel_workers field for data operations
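As a rough sketch of the default named above, the formula can be written as below. How the PR obtains `cpu_count` is an assumption here (`os.cpu_count()` is used for illustration), and `default_max_workers` is a hypothetical helper name.

```python
import os


def default_max_workers() -> int:
    """Return min(32, cpu_count + 4), treating an unknown CPU count as 1."""
    # os.cpu_count() may return None on some platforms (assumed fallback: 1).
    cpu_count = os.cpu_count() or 1
    return min(32, cpu_count + 4)


MAX_WORKERS = default_max_workers()
```

The cap of 32 keeps the pool small on machines with many cores, while the `+ 4` leaves headroom for I/O-bound work on small machines.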

Test changes:
- Updated test_tenant_manager.py assertions to expect batch API calls
  (list argument) instead of per-tenant calls; this reflects the new
  batch behavior which is the explicit goal of this feature
- Added 11 new tests in test_data_manager.py covering parallel tenant
  processing, sequential fallback, error collection, and concurrent
  request scaling

Closes #153

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@orca-security-eu orca-security-eu bot left a comment


Orca Security Scan Summary

| Status | Check | High | Medium | Low | Info |
| --- | --- | --- | --- | --- | --- |
| Passed | Infrastructure as Code | 0 | 0 | 0 | 0 |
| Passed | SAST | 0 | 0 | 0 | 0 |
| Passed | Secrets | 0 | 0 | 0 | 0 |
| Passed | Vulnerabilities | 0 | 0 | 0 | 0 |

Contributor

Copilot AI left a comment


Pull request overview

This PR adds parallel multi-tenant operations to significantly improve performance for bulk operations on collections with many tenants. The changes convert sequential tenant processing loops into parallel execution using ThreadPoolExecutor, and replace N individual tenant API calls with single batch operations.

Changes:

  • Converted delete_tenants and update_tenants to use batch API calls (single call with a list of tenants) instead of N individual calls
  • Added parallel tenant processing for create_data, update_data, and delete_data using ThreadPoolExecutor
  • Added --parallel_workers CLI option (default: min(32, cpu_count + 4)) to control parallelism, with automatic per-tenant concurrent_requests scaling to prevent cluster overload
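The difference between N individual calls and one batch call can be illustrated with a stand-in client. `FakeTenantsApi` and both helper functions are hypothetical and only record calls; they are not the Weaviate client API.

```python
class FakeTenantsApi:
    """Hypothetical stand-in that records each remove() call it receives."""

    def __init__(self) -> None:
        self.calls: list[list[str]] = []

    def remove(self, tenants: list[str]) -> None:
        self.calls.append(list(tenants))


def delete_per_tenant(api: FakeTenantsApi, names: list[str]) -> None:
    # Old shape: one round trip per tenant (N calls).
    for name in names:
        api.remove([name])


def delete_batched(api: FakeTenantsApi, names: list[str]) -> None:
    # New shape: a single call carrying the whole list (1 call).
    api.remove(list(names))
```

With 1,000 tenants the first helper makes 1,000 calls and the second makes one, which is why the updated tests assert on a list argument.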

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| weaviate_cli/managers/tenant_manager.py | Converted delete_tenants and update_tenants to use batch API calls with lists instead of loops |
| weaviate_cli/managers/data_manager.py | Added parallel tenant processing with ThreadPoolExecutor for create_data, update_data, and delete_data; includes concurrent_requests scaling and error aggregation |
| weaviate_cli/defaults.py | Added parallel_workers default (MAX_WORKERS) to CreateDataDefaults, UpdateDataDefaults, and DeleteDataDefaults |
| weaviate_cli/commands/create.py | Added --parallel_workers CLI option for create data command |
| weaviate_cli/commands/update.py | Added --parallel_workers CLI option for update data command |
| weaviate_cli/commands/delete.py | Added --parallel_workers CLI option for delete data command |
| test/unittests/test_managers/test_tenant_manager.py | Updated test assertions to expect batch API calls (lists) instead of individual calls |
| test/unittests/test_managers/test_data_manager.py | Added 11 new tests covering parallel processing, error collection, concurrent_requests scaling, and edge cases |


jfrancoa and others added 2 commits February 26, 2026 17:44
…ror handling

- Suppress per-tenant progress messages in parallel mode (update/delete) to
  avoid interleaved output; messages only print when parallel_workers <= 1
- Make sequential error handling consistent with parallel: collect all tenant
  errors instead of failing fast on the first one, then raise a combined
  exception (matches parallel path behaviour)
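The collect-then-raise behaviour described above might look something like the sketch below. `run_for_all_tenants` is a hypothetical helper, and the use of `RuntimeError` and the message format are assumptions rather than the PR's actual choices.

```python
from typing import Callable


def run_for_all_tenants(tenants: list[str], job: Callable[[str], None]) -> list[str]:
    """Run job for every tenant; raise one combined error at the end."""
    succeeded: list[str] = []
    errors: list[str] = []
    for tenant in tenants:
        try:
            job(tenant)
            succeeded.append(tenant)
        except Exception as exc:
            # Keep going: later tenants still get processed after a failure.
            errors.append(f"{tenant}: {exc}")
    if errors:
        raise RuntimeError(f"{len(errors)} tenant(s) failed: " + "; ".join(errors))
    return succeeded
```

Failing fast would hide how many tenants are affected; collecting errors reports every failure in a single run.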

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds four parallel-processing tests for create_data that mirror the
TestUpdateDataParallel and TestDeleteDataParallel classes, covering:
all-tenants-in-parallel, sequential fallback, error collection, and
non-MT collection handling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.



jfrancoa and others added 2 commits February 26, 2026 18:17
__delete_data always returns a non-negative count or raises; it never
returns -1. Drop the dead sentinel branches from both the parallel and
sequential paths in delete_data and rely on exceptions for error
propagation instead, simplifying the control flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Introduce actual_workers = min(parallel_workers, len(tenants),
concurrent_requests) and use it as both the ThreadPoolExecutor
max_workers and the divisor for per-tenant concurrent_requests:

- Over-throttle fix: 2 tenants + 32 workers previously divided by 32,
  leaving each thread with 1 request instead of half the budget. Now
  divides by min(32, 2, …) = 2, fully utilising the budget.
- Over-budget fix: 32 workers + 4 concurrent_requests previously floored
  to 1/thread × 32 threads = 32 total. Capping actual_workers at
  concurrent_requests keeps total in-flight ≤ concurrent_requests.
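The arithmetic behind both fixes can be worked through with a small sketch. `plan_workers` is a hypothetical helper; the integer division and the floor of 1 request per tenant are assumptions inferred from the description, not confirmed details of the implementation.

```python
def plan_workers(parallel_workers: int, num_tenants: int, concurrent_requests: int) -> tuple[int, int]:
    """Return (actual_workers, per_tenant_concurrent_requests)."""
    # Cap the pool by tenants available and by the overall request budget.
    actual_workers = max(1, min(parallel_workers, num_tenants, concurrent_requests))
    # Split the budget across workers; never drop below 1 (assumed floor).
    per_tenant = max(1, concurrent_requests // actual_workers)
    return actual_workers, per_tenant


# Over-throttle case: 2 tenants, 32 workers, budget of 8 (example value).
print(plan_workers(32, 2, 8))    # (2, 4): each tenant gets half the budget
# Over-budget case: 100 tenants, 32 workers, budget of 4.
print(plan_workers(32, 100, 4))  # (4, 1): 4 threads x 1 request = 4 in flight
```

In both cases the product of the two returned values stays at or below `concurrent_requests`, which is the bound the commit aims for.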

Update the concurrent_requests scaling test to reflect the corrected
expected value (test was asserting the old over-throttled behaviour).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Parallel multi-tenant operations for faster bulk processing

2 participants