
feat: Index Migrator (Pre-release) #583

Draft
nkanu17 wants to merge 45 commits into main from feat/index-migrator-pre-release

Conversation

nkanu17 (Collaborator) commented Apr 14, 2026

feat: Index Migrator (Pre-release)

Zero-downtime, crash-safe index migration for RedisVL. Plan, apply, and rollback schema changes, including vector quantization, field renames, prefix changes, and algorithm swaps, through a single CLI or programmatic API.

Summary

This PR adds a complete index migration system to RedisVL, enabling users to evolve their index schemas without data loss. The migrator handles the full lifecycle: plan → review → apply → validate → rollback.

Key Capabilities

| Category | Operations |
| --- | --- |
| Index-only | Change algorithm (FLAT ↔ HNSW ↔ SVS-VAMANA), distance metric, HNSW params (M, EF_CONSTRUCTION), make fields sortable |
| Schema + Data | Add/remove fields, rename fields, rename index, change key prefix, change field options (separator, stemming) |
| Vector quantization | float32 → float16, bfloat16, int8, uint8, with automatic re-encoding and crash-safe backup |
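Concretely, the quantization step amounts to a dtype cast over the raw vector bytes stored in Redis. A minimal sketch of the idea (`requantize` is a hypothetical helper for illustration, not part of this PR's API):

```python
import numpy as np

def requantize(blob: bytes, src: str = "float32", dst: str = "float16") -> bytes:
    # Vectors live in Redis as raw little-endian byte buffers, so
    # re-encoding is a buffer decode + dtype cast + re-serialize.
    return np.frombuffer(blob, dtype=src).astype(dst).tobytes()

original = np.array([0.1, 0.2, 0.3], dtype=np.float32).tobytes()
half = requantize(original)
assert len(half) == len(original) // 2  # float16 halves vector storage
```

The same cast pattern covers the other target dtypes; the migrator's value-add is doing this in crash-safe, pipelined batches rather than the cast itself.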

Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    CLI: rvl migrate                          │
│  plan │ wizard │ apply │ rollback │ estimate │ validate     │
├───────┴────────┴───────┴──────────┴──────────┴──────────────┤
│                  MigrationPlanner                           │
│  Schema diffing, change classification, plan generation     │
├─────────────────────────────────────────────────────────────┤
│            MigrationExecutor (sync + async)                 │
│  enumerate → field-rename → dump → drop → key-rename →     │
│  quantize → create → index → validate                      │
├─────────────────────────────────────────────────────────────┤
│  VectorBackup    │  Quantize Pipeline  │  Reliability Layer │
│  Crash-safe dump │  Batched R/W/Convert│  Validation checks │
└──────────────────┴─────────────────────┴────────────────────┘
```

Usage

CLI: Interactive Wizard

```bash
# List available indexes
rvl migrate list --url redis://localhost:6379

# Interactive wizard: walks through changes step by step
rvl migrate wizard --url redis://localhost:6379

# Generate a plan from a target schema YAML
rvl migrate plan --index my_index --schema target_schema.yaml --url redis://localhost:6379

# Apply with crash-safe backup and multi-worker quantization
rvl migrate apply --plan migration_plan.yaml \
  --backup-dir /tmp/migration_backup \
  --workers 4 --batch-size 1000 \
  --url redis://localhost:6379

# Estimate disk space (dry-run, no mutations)
rvl migrate estimate --plan migration_plan.yaml --url redis://localhost:6379

# Rollback if needed
rvl migrate rollback --backup-dir /tmp/migration_backup \
  --index my_index --url redis://localhost:6379

# Validate post-migration
rvl migrate validate --plan migration_plan.yaml --url redis://localhost:6379
```

CLI: Batch Migration (Multiple Indexes)

```bash
rvl migrate batch-plan --config batch_config.yaml --url redis://localhost:6379
rvl migrate batch-apply --plan batch_plan.yaml --url redis://localhost:6379
rvl migrate batch-status --plan batch_plan.yaml
```

Programmatic API

```python
from redisvl.migration import MigrationPlanner, MigrationExecutor

# Plan
planner = MigrationPlanner()
plan = planner.create_plan(
    index_name="my_index",
    target_schema=target_schema,
    redis_url="redis://localhost:6379",
)

# Apply
executor = MigrationExecutor()
report = executor.apply(
    plan,
    redis_url="redis://localhost:6379",
    backup_dir="/tmp/migration_backup",
    num_workers=4,
    batch_size=1000,
)

print(f"Result: {report.result}")  # "succeeded"
print(f"Duration: {report.timings.total_duration}s")
```

Async API

```python
from redisvl.migration import AsyncMigrationExecutor

executor = AsyncMigrationExecutor()
report = await executor.apply(
    plan,
    redis_url="redis://localhost:6379",
    backup_dir="/tmp/migration_backup",
    num_workers=4,
)
```

Crash Safety & Resume

Migrations are crash-safe by default when --backup-dir is provided:

  1. Before drop: Original vectors are dumped to a binary backup file on disk
  2. On crash: Re-running the same command detects the backup and resumes from the last completed batch
  3. Rollback: rvl migrate rollback restores original vectors from the backup at any time

The backup file tracks phase (dump → ready → active → completed) and batch progress, so resume skips already-completed work.
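The header-update discipline that makes this resume behavior safe can be sketched as follows (the file layout and field names here are illustrative, not the PR's actual backup format):

```python
import json
import os
import tempfile

def update_header(path: str, phase: str, completed_batches: int) -> None:
    # Write to a temp file, then rename over the header: a crash mid-write
    # leaves the previous header intact, never a half-written one.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"phase": phase, "completed_batches": completed_batches}, f)
    os.replace(tmp, path)  # atomic replace

def batches_to_resume(path: str, total_batches: int) -> range:
    # On restart, skip everything the header already marked complete.
    with open(path) as f:
        header = json.load(f)
    return range(header["completed_batches"], total_batches)
```

On a re-run, `batches_to_resume(header_path, total)` yields only the batches that still need work, which is the "resume skips already-completed work" behavior described above; a crash mid-batch simply re-processes that one batch, which is safe because HSET is idempotent.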

Performance

  • Pipelined reads/writes: Batch HGET/HSET operations (configurable --batch-size)
  • Multi-worker quantization: --workers N parallelizes vector re-encoding via ThreadPoolExecutor (sync) or asyncio.gather (async)
  • Redis Cluster support: Batched DUMP/RESTORE/DEL for cross-slot key renames (100 keys/pipeline)
  • Disk space estimation: rvl migrate estimate calculates RDB + AOF impact before any mutations
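The pipelined-read idea behind the first bullet, sketched against the redis-py pipeline API (the function name and the `embedding` field are illustrative; `r` is assumed to be a `redis.Redis`-compatible client):

```python
def pipeline_read_vectors(r, keys, field="embedding", batch_size=1000):
    # One round trip per batch of HGETs instead of one per key.
    # transaction=False skips MULTI/EXEC overhead for read-only batches.
    for i in range(0, len(keys), batch_size):
        batch = keys[i : i + batch_size]
        pipe = r.pipeline(transaction=False)
        for key in batch:
            pipe.hget(key, field)
        yield from zip(batch, pipe.execute())
```

With `batch_size=1000`, reading 100,000 keys costs roughly 100 round trips instead of 100,000, which is where the bulk of the speedup comes from.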

What's Blocked

| Change | Why | Workaround |
| --- | --- | --- |
| Change vector dimensions | Requires re-embedding | Re-embed with the new model, reload data |
| Change storage type (hash ↔ JSON) | Different data format | Export, transform, reload |
| Add a new vector field | Requires vectors for all docs | Add vectors first, then migrate |

New Files

redisvl/migration/ (core module)

| File | Description |
| --- | --- |
| models.py | Data models: MigrationPlan, MigrationReport, MigrationTimings, etc. |
| planner.py | Schema diffing, change classification, plan generation |
| executor.py | Sync migration executor; full apply lifecycle |
| async_executor.py | Async migration executor |
| async_planner.py | Async planner |
| validation.py / async_validation.py | Pre/post-migration validation |
| backup.py | VectorBackup: crash-safe binary backup format |
| quantize.py | Pipelined vector quantization + multi-worker orchestration |
| reliability.py | Dtype conversion safety checks, width analysis |
| wizard.py | Interactive migration wizard |
| batch_planner.py / batch_executor.py | Multi-index batch migration |
| utils.py | Shared utilities (disk estimation, key enumeration, etc.) |

redisvl/cli/migrate.py

Full CLI with 12 subcommands: list, wizard, plan, apply, estimate, rollback, validate, helper, batch-plan, batch-apply, batch-resume, batch-status

Tests

| Suite | Count | Description |
| --- | --- | --- |
| test_migration_planner.py | 67 | Schema diffing, change classification |
| test_migration_wizard.py | 76 | Interactive wizard, adversarial inputs |
| test_vector_backup.py | 32 | Backup create/load/resume/rollback/cleanup |
| test_pipeline_quantize.py | 12 | Pipelined read/write/convert |
| test_executor_backup_quantize.py | 7 | Executor backup integration |
| test_multi_worker_quantize.py | 22 | Multi-worker, resume, deprecation |
| test_async_migration_executor.py | 13 | Async executor |
| test_async_migration_planner.py | 11 | Async planner |
| test_batch_migration.py | 44 | Batch planner/executor |
| Integration tests | 6 files | Full end-to-end with live Redis |

Total: 178+ unit tests passing, all pre-commit checks clean.

Review Notes

  • This is a pre-release: API surface is stable but may evolve based on feedback
  • 6 rounds of automated code review (nkode-review) have been applied, addressing correctness, security, performance, and backward compatibility findings
  • The branch includes removal of the MCP module (previously merged separately). Those deletions are unrelated to the migrator
  • Documentation is in docs/user_guide/how_to_guides/migrate-indexes.md

nkanu17 added 30 commits April 2, 2026 11:38
…, validator, and utilities

Adds the core data structures and planning engine for the Index Migrator:
- models.py: Pydantic models for MigrationPlan, DiffClassification, ValidationResult, MigrationReport
- planner.py: MigrationPlanner with schema introspection, diffing, and change classification
- validation.py: MigrationValidator for post-migration checks
- utils.py: shared helpers for YAML I/O, disk estimation, index listing, timestamps
- connection.py: HNSW parameter extraction for schema introspection
- 15 unit tests for planner logic
- Fix import ordering in utils.py (isort compliance)
- Simplify validation prefix key rewriting to mirror executor logic
- Normalize single-element list prefixes in normalize_target_schema_to_patch
Adds the migration executor and CLI subcommands for plan/apply/validate:
- executor.py: MigrationExecutor with sync apply, key enumeration, index drop/create, quantization, field/key rename
- reliability.py: BatchUndoBuffer, QuantizationCheckpoint, BGSAVE helpers
- cli/migrate.py: CLI with plan, apply, validate, list, helper, estimate subcommands
- cli/main.py: register migrate command
- cli/utils.py: add_redis_connection_options helper
- Integration tests for comprehensive migration, v1, routes, and field modifier ordering
- Fix CLI step labels to match executor order
- Fix GEO coordinates to lat,lon order in integration tests
- Move JSON path to top-level field property in tests
- Use sys.exit() instead of exit() in CLI
- Use transaction=False for quantize pipeline
Adds guided migration builder for interactive plan creation:
- wizard.py: MigrationWizard with index selection, field operations, vector tuning, quantization, and preview
- cli/migrate.py: adds 'wizard' subcommand (rvl migrate wizard --index <name>)
- Unit tests for wizard logic (41 tests)
- Improve field removal to clean up renames by both old_name and new_name
- Resolve update names through rename map in working schema preview
- Add multi-prefix guard to reject indexes with multiple prefixes
- Fix dependent prompts (UNF, no_index) when field is already sortable
- Pass existing field attrs to common attrs prompts for update mode
…CLI flag

Adds non-blocking async migration support:
- async_executor.py: AsyncMigrationExecutor with async apply, BGSAVE, quantization
- async_planner.py: AsyncMigrationPlanner with async create_plan
- async_validation.py: AsyncMigrationValidator with async validate
- async_utils.py: async Redis helpers
- cli/migrate.py: adds --async flag to 'apply' subcommand
- Unit tests for async executor and planner
- Fix SVS client leak in async_planner check_svs_requirements
- Remove dead async_utils.py (functions duplicated in async_executor)
…ands

- batch_planner.py: multi-index plan generation with pattern/list support
- batch_executor.py: checkpointed batch execution with resume capability
- CLI: batch-plan, batch-apply, batch-resume, batch-status subcommands
- 32 unit tests for batch migration logic
… CLI

- Refactor _check_index_applicability to return Tuple[BatchIndexEntry, bool]
  where bool indicates quantization, avoiding redundant create_plan_from_patch
- Replace exit(1) with sys.exit(1) in batch-apply and batch-resume CLI commands
- Sanitize report filenames (colons to underscores) for Windows compat
- docs/concepts/index-migrations.md: migration concepts and architecture
- docs/user_guide/how_to_guides/migrate-indexes.md: step-by-step migration guide
- docs/api/cli.rst: CLI reference for rvl migrate commands
- tests/benchmarks/: migration benchmark scripts and visualization
- Updated field-attributes, search-and-indexing, and user guide indexes
- Remove 13_sql_query_exercises.ipynb (unrelated to migration feature)
- Replace ' -- ' emdashes with colons in crash-safe resume docs
- Fix async executor readiness check to handle missing percent_indexed
- Fix benchmark wait_for_index_ready masking percent_indexed=0
- Fix wizard showing dependent prompts when sortable explicitly False
- Fix CLI docs: --patch→--schema-patch, --output/-o→--plan-out
- Fix migration docs: field renames now listed as supported
- Fix batch resume not forwarding batch_plan_path to apply()
- Fix batch-resume CLI missing quantization safety gate
- Fix --query-check → --query-check-file in cli.rst
- Fix --target-schema mutually-exclusive ref to --schema-patch in cli.rst
- Fix async validation functional check to match sync (>0 not ==)
- Fix async quantize pipeline to use transaction=False
- Fix test checkpoint status 'succeeded' → 'success'
…e order

- Update wildcard_search details to say 'expected >0, source had N' instead of misleading 'expected N'
- Change doc examples from 'succeeded' to 'success' matching BatchIndexState values
- Add --accept-data-loss note for batch-resume in migration guide
- Fix geo coordinate order (lon,lat) in test sample data
…ion validation

Previously validation compared only num_docs between source and target,
causing false negatives when migrations resolved indexing failures (e.g.
vector datatype changes). Now compares total keys so that documents
shifting from failures to indexed don't trigger a mismatch.

Also adds a planner warning when the source index has hash_indexing_failures > 0.
…ard chaining, docs

H-priority fixes:
- Add MAXIDLE 300000 to FT.AGGREGATE cursors to extend timeout (sync+async)
- Add Redis Cluster RENAME support via DUMP/RESTORE/DEL fallback (sync+async)
- Improve collision error messages with recovery info for partial renames
- Collapse chained field renames (A→B + B→C → A→C) in wizard
- Auto-clear UNF/no_index when disabling sortable on previously-sortable field
- Add clarifying comments for drop+create non-atomicity and resume scenarios

P2 fixes:
- Track all SVS compression types (set instead of single var) in planner
- Remove stale mismatched checkpoint files instead of just warning
- Handle empty prefix in async _sample_keys matching sync behavior
- Deduplicate explicit index names in batch planner

P3 fixes:
- Fix example plan YAML structure in migrate-indexes.md (diff_classification)
- Fix batch_report example (remove fabricated fields, add report_path)
- Add --accept-data-loss to resume examples in docs
- Move phonetic_matcher from manual-only to wizard-supported in field-attributes.md
- Fix benchmark percent_indexed default from 1 to 0
…s, multi-worker

Design spec for replacing BGSAVE + QuantizationCheckpoint with:
1. Pipelined reads (10x speedup from eliminating per-key round trips)
2. Local vector backup file (targeted rollback instead of full-DB BGSAVE)
3. Opt-in multi-worker parallelism (N=1 default, ThreadPoolExecutor/asyncio.gather)
- Moved dump phase before index drop so FT.AGGREGATE is always available
- Backup file contains full key list — no SCAN needed at any point
- Added crash recovery matrix for all 9 steps
- Updated phase transitions: dump → ready → active → completed
- Explicit dump_completed_batches / quantize_completed_batches counters
- Atomic header update only after pipeline_write succeeds
- Crash mid-batch → re-process entire batch (HSET idempotent)
- Phase table with index state and resume action per phase
2000-key example walking through:
- 4 batches dumped (no mutations, index alive)
- crash mid-pipeline on batch 2 quantize
- state after crash (partial Redis writes, header not updated)
- resume skips batches 0-1, re-processes batch 2 (HSET idempotent)
- rollback reads originals from backup file, HSETs them back
New backup.py with 16 passing tests. Supports:
- Create/load backup files (JSON header + binary data)
- Write batches during dump phase with progress tracking
- Phase transitions: dump → ready → active → completed
- Atomic header updates (temp + rename)
- Resume: iter_remaining_batches skips completed batches
- Rollback: iter_batches reads all original vectors
New quantize.py with 6 passing tests:
- pipeline_read_vectors: batch HGET via pipeline (1 round trip per batch)
- pipeline_write_vectors: batch HSET via pipeline
- convert_vectors: dtype conversion using buffer_to_array/array_to_buffer
…r (TDD)

Two-phase quantize methods with 4 passing tests:
- _dump_vectors: pipeline-reads originals to VectorBackup file
- _quantize_from_backup: reads from backup, converts, pipeline-writes
- Resume test: skips completed batches after simulated crash
… file

- Add backup_dir and batch_size params to apply()
- New flow: enumerate → field renames → DUMP → drop → key renames → QUANTIZE FROM BACKUP → create
- Resume from backup file: reads phase/counters from header, no SCAN needed
- BGSAVE removed from normal path
- Legacy checkpoint_path still supported as fallback
- All 812 tests pass
Mirror of sync executor changes:
- Add backup_dir, batch_size params to async apply()
- Add async _dump_vectors and _quantize_from_backup methods
- Resume from backup file (no SCAN needed)
- BGSAVE removed from normal async path
- Legacy checkpoint_path still supported
- --backup-dir enables crash-safe resume via backup file
- --batch-size controls pipeline batch size (default 500)
- --resume deprecated (still works, use --backup-dir instead)
- Pass new params through _apply_sync and _apply_async
nkanu17 added 15 commits April 13, 2026 20:48
Fixes 4 name-defined errors in executor.py and async_executor.py
Documents what was implemented, what's deferred (multi-worker),
and what legacy components are kept for backward compatibility.
- split_keys: divide keys into N contiguous slices
- multi_worker_quantize: orchestrate N workers via ThreadPoolExecutor
- Each worker: own Redis connection + own backup file shard
- --workers CLI flag (default 1, requires --backup-dir)
- Wired into sync executor apply() flow
- 8 new tests (split_keys, multi-worker, single-worker fallback)
- Fix: restore missing 'return converted' in convert_vectors
- async_multi_worker_quantize: N concurrent async workers
- _async_worker_quantize: per-worker async dump + convert + write
- Wire multi-worker into both sync and async executor apply()
- --workers N CLI flag (requires --backup-dir)
- Fix missing return in convert_vectors, missing ) in multi_worker_quantize
Remove:
- QuantizationCheckpoint/BatchUndoBuffer/bgsave usage from executors
- _quantize_vectors / _async_quantize_vectors methods
- --resume / checkpoint_path CLI flag and parameter
- 32 legacy tests
- Unused imports (array_to_buffer, buffer_to_array, is_already_quantized)

Direct pipeline quantize is the fallback when no backup_dir provided.
788 tests pass, mypy clean.
- Backup files deleted automatically after successful migration
- --keep-backup CLI flag to preserve backup files
- keep_backup param on both sync/async apply()
- Fix: use report.result=='succeeded' not report.status=='applied'
nkode-review findings addressed:
- Add --resume as deprecated alias for --backup-dir with warning
- Add num_workers >= 1 validation in split_keys() and CLI --workers
- Replace assert statements with ValueError for multi-worker guards
- Update apply() docstring to accurately describe multi-worker dump ordering

New features:
- Add 'rvl migrate rollback' CLI command to restore vectors from backups

Documentation:
- Expand executor/planner/async_executor docstrings with full parameter docs
- Add 'Backup, Resume & Rollback' section to migration guide
- Add Performance Tuning section with throughput tables and worker guidance
- Add HNSW vs FLAT index capacity technical note
- Add CLI migration examples to cli.ipynb
- Update common flags (replace --resume with --backup-dir, --workers, etc.)

Test scripts:
- Add test_migration_e2e.py (500K doc benchmark)
- Add test_crash_resume_e2e.py (crash-safe resume verification)
- Add verify_data_correctness.py (float32->float16 value correctness)
Findings addressed:
- Multi-worker resume: sync and async workers now attempt VectorBackup.load()
  before VectorBackup.create(), resuming from partial backups on re-run
- Python 3.8 compat: replaced str.removesuffix() with Path.with_suffix('')
- Rollback progress counter: count only keys with actual originals, not all keys
- Codespell: renamed 'nd' variable to 'num_indexed' in e2e scripts

Tests added:
- TestRollbackCLI: header path derivation, iter_batches restore, edge cases

code-rev results:
- security: 0 confirmed findings (2 informational residual risks)
- inspect --full: 5 findings, all addressed
…orker guards, overcount

- Rollback CLI: add --index filter to scope restore to specific index
- Rollback CLI: remove unused --batch-size flag
- --resume: fail fast if value looks like a checkpoint file (old semantics)
- --workers > 1: enforce --backup-dir at CLI level
- Direct quantize overcount: count len(converted) not len(batch_keys) (sync+async)
…ollback

Findings addressed:
- Rollback gates on backup phase: refuses incomplete (phase='dump') backups
  unless --force is passed
- Rollback requires --index or --yes when multiple indexes detected (no
  interactive input() prompt that blocks CI)
- Backup cleanup uses exact .header/.data extensions and boundary check
  (prevents deleting unrelated files with shared prefix)
- Async/sync direct quantize counts len(converted) not len(batch_keys)

Tests added:
- test_rollback_skips_incomplete_backup_phase
- test_rollback_index_filter
- test_rollback_multi_index_requires_flag
- test_cleanup_only_removes_known_extensions
- test_cleanup_does_not_match_similar_prefix
…mpat note

Findings addressed:
- Add TestWorkerResume: resume from dump/ready/completed phases, async load
  vs create, FileExistsError on double-create
- Replace Unicode checkmark with ASCII in rollback CLI output
- Finding #2 (--resume compat): intentional breaking change, already has
  clear error message - accepted as-is

Tests: 174 passed (4 new resume tests)
…e CLI tests

Findings addressed:
- Add checkpoint_path as deprecated kwarg to MigrationExecutor.apply and
  AsyncMigrationExecutor.apply - maps to backup_dir with DeprecationWarning
- Add TestResumeDeprecation: verify checkpoint_path in signatures, --resume
  YAML rejection, directory acceptance

Accepted as-is:
- Rollback non-recursive dir search: intentional flat layout
- redis.asyncio import: redis>=5.0 already required

Tests: 178 passed (4 new)
…lisions

Critical fix:
- Sync and async executor resume paths now perform key prefix renames
  after quantize (renames happen after drop in normal path, so they
  may not have completed before crash)

Backup naming:
- Add sha256[:8] hash of index_name to backup filenames to prevent
  collisions between distinct names that sanitize identically
  (e.g., 'a/b' and 'a:b' both become 'a_b' but have different hashes)
- Applied to single-worker, multi-worker, and cleanup paths

Accepted as-is:
- --resume backward compat: intentional, clear error message
- Rollback non-recursive: flat layout matches how backups are written

Tests: 178 passed
…llback

Performance fix:
- Sync and async _rename_keys_cluster now batch DUMP+PTTL reads and
  RESTORE+DEL writes in groups of 100 (pipeline_size), reducing from
  5 RTTs/key to ~3 RTTs per batch of 100 keys. ~50x fewer round trips
  for large key renames in Redis Cluster mode.

Backward compat fix:
- Both executors now probe for legacy backup filenames (pre-hash naming
  convention: migration_backup_<safe_name>) when the hashed path is not
  found. This ensures crash-resume works across library upgrades.

Tests: 178 passed
Copilot AI review requested due to automatic review settings April 14, 2026 05:46
Copilot AI (Contributor) left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a pre-release RedisVL index migration system (sync + async) with planning, execution, validation, batch workflows, and extensive documentation/tests to enable crash-safe, document-preserving schema evolution.

Changes:

  • Introduces migration core modules (planner/executor/validator/backup/quantize/reliability) plus batch migration support.
  • Adds async equivalents and a new rvl migrate CLI entry point + docs.
  • Adds comprehensive unit/integration tests and benchmark/e2e helper scripts.

Reviewed changes

Copilot reviewed 45 out of 53 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| tests/unit/test_async_migration_planner.py | Adds async planner unit coverage mirroring sync planner tests. |
| tests/unit/test_async_migration_executor.py | Adds async executor + disk space estimator unit tests and dtype-detection tests. |
| tests/integration/test_migration_v1.py | End-to-end integration test for sync plan/apply/validate flow. |
| tests/integration/test_migration_routes.py | Integration coverage for supported migration "routes" (algo/metric/dtype/params). |
| tests/integration/test_field_modifier_ordering_integration.py | Adds integration tests for new/related field modifiers (INDEXEMPTY/UNF/NOINDEX). |
| tests/integration/test_batch_migration_integration.py | Adds integration tests for batch plan/apply/resume/progress callback. |
| tests/integration/test_async_migration_v1.py | End-to-end integration test for async plan/apply/validate flow. |
| tests/benchmarks/visualize_results.py | Adds benchmark visualization script for retrieval/memory/latency charts. |
| scripts/verify_data_correctness.py | Adds manual script to verify float32→float16 migration correctness. |
| scripts/test_migration_e2e.py | Adds large-scale e2e migration benchmark script. |
| scripts/test_crash_resume_e2e.py | Adds crash/resume robustness test script for quantization checkpointing. |
| redisvl/redis/connection.py | Enhances vector attribute parsing to include HNSW params (m, ef_construction). |
| redisvl/migration/validation.py | Adds sync migration validation (schema/doc counts/key samples/functional checks). |
| redisvl/migration/utils.py | Adds YAML helpers, schema canonicalization, readiness polling, disk estimation utilities. |
| redisvl/migration/reliability.py | Adds crash-safety utilities: dtype detection, checkpointing, BGSAVE helpers, undo buffer. |
| redisvl/migration/quantize.py | Adds pipelined (and multi-worker) quantization for vector dtype conversions. |
| redisvl/migration/models.py | Adds migration/batch models and disk space estimate models/helpers. |
| redisvl/migration/batch_planner.py | Adds batch planner for applying a shared patch across many indexes. |
| redisvl/migration/batch_executor.py | Adds batch executor with checkpointing/resume and reporting. |
| redisvl/migration/backup.py | Adds crash-safe on-disk backup format for vector dumps and resume. |
| redisvl/migration/async_validation.py | Adds async validator parity with sync validation checks. |
| redisvl/migration/async_planner.py | Adds async planner wrapper over sync diff/classification logic. |
| redisvl/migration/__init__.py | Exposes new migration/batch APIs at package boundary. |
| redisvl/cli/utils.py | Fixes redis URL scheme building and refactors CLI option helpers. |
| redisvl/cli/main.py | Wires in new migrate CLI command group. |
| docs/user_guide/index.md | Adds migration to user guide landing page highlights. |
| docs/user_guide/how_to_guides/index.md | Adds "Migrate an Index" how-to link and toctree entry. |
| docs/user_guide/cli.ipynb | Updates CLI notebook with rvl migrate commands and reorganizes connection section. |
| docs/concepts/search-and-indexing.md | Updates concept docs to point to new migration workflow and docs. |
| docs/concepts/index.md | Adds "Index Migrations" concept card and toctree entry. |
| docs/concepts/index-migrations.md | New concept doc describing migration modes, supported changes, and sync/async behavior. |
| docs/concepts/field-attributes.md | Expands vector datatype docs + migration support notes for modifiers. |
| docs/api/cli.rst | Adds a full CLI reference including rvl migrate command group. |
| CLAUDE.md | Adds protected directory note (local_docs/). |


```python
        keys_to_check.append(new_prefix + k[len(old_prefix):])
    else:
        keys_to_check.append(k)
existing_count = target_index.client.exists(*keys_to_check)
```

Copilot AI commented Apr 14, 2026:

Using EXISTS with multiple keys can fail on Redis Cluster when keys span multiple hash slots (multi-key commands are restricted). To make validation work reliably on clustered deployments, check key existence per key (or via a pipeline of single-key EXISTS calls) and then compare the total count.

Suggested change:

```diff
- existing_count = target_index.client.exists(*keys_to_check)
+ existing_count = sum(target_index.client.exists(key) for key in keys_to_check)
```
```python
        keys_to_check.append(new_prefix + k[len(old_prefix):])
    else:
        keys_to_check.append(k)
existing_count = await client.exists(*keys_to_check)
```

Copilot AI commented Apr 14, 2026:

Same Redis Cluster concern as the sync validator: `await client.exists(*keys_to_check)` is a multi-key operation and can error if keys live in different hash slots. Prefer checking keys individually (or pipelined single-key EXISTS) to keep async validation cluster-safe.

Suggested change:

```diff
- existing_count = await client.exists(*keys_to_check)
+ existing_count = 0
+ for key in keys_to_check:
+     existing_count += await client.exists(key)
```
Comment on lines +69 to +73:

```python
if plan.rename_operations.change_prefix is not None:
    old_prefix = plan.source.keyspace.prefixes[0]
    new_prefix = plan.rename_operations.change_prefix
    # Mirror executor logic exactly:
    # new_key = new_prefix + key[len(old_prefix):]
```

Copilot AI commented Apr 14, 2026:

Prefix-change validation assumes `plan.source.keyspace.prefixes[0]` is the only (or correct) source prefix. If the index has multiple prefixes, translating sampled keys using only the first prefix can produce incorrect `keys_to_check` and false validation failures. Consider translating each sampled key by matching against all configured prefixes (or storing the actual matched prefix per sampled key in the snapshot).
Comment on lines +81 to +88:

```python
old_prefix = plan.source.keyspace.prefixes[0]
new_prefix = plan.rename_operations.change_prefix
keys_to_check = []
for k in key_sample:
    if k.startswith(old_prefix):
        keys_to_check.append(new_prefix + k[len(old_prefix):])
    else:
        keys_to_check.append(k)
```

Copilot AI commented Apr 14, 2026:

Async validator has the same multi-prefix assumption as the sync validator, which can mis-translate key samples when more than one prefix is configured. Matching the sampled key against all prefixes (instead of using only `prefixes[0]`) would prevent incorrect existence checks.

Suggested change:

```diff
- old_prefix = plan.source.keyspace.prefixes[0]
- new_prefix = plan.rename_operations.change_prefix
- keys_to_check = []
- for k in key_sample:
-     if k.startswith(old_prefix):
-         keys_to_check.append(new_prefix + k[len(old_prefix):])
-     else:
-         keys_to_check.append(k)
+ old_prefixes = plan.source.keyspace.prefixes
+ new_prefix = plan.rename_operations.change_prefix
+ keys_to_check = []
+ for k in key_sample:
+     translated_key = k
+     for old_prefix in old_prefixes:
+         if k.startswith(old_prefix):
+             translated_key = new_prefix + k[len(old_prefix):]
+             break
+     keys_to_check.append(translated_key)
```
Comment on lines +136 to +139:

```python
    return []
n = len(keys)
chunk_size = math.ceil(n / num_workers)
return [keys[i : i + chunk_size] for i in range(0, n, chunk_size)]
```

Copilot AI commented Apr 14, 2026:

The docstring states the returned list has `num_workers` slices and "some may be empty if keys < workers", but the implementation returns fewer slices (never empty slices). Either update the docstring to reflect the current behavior, or adjust the function to always return exactly `num_workers` slices (padding with empty lists) if that invariant is important for worker bookkeeping and stable shard naming.

Suggested change:

```diff
-     return []
+     return [[] for _ in range(num_workers)]
  n = len(keys)
  chunk_size = math.ceil(n / num_workers)
- return [keys[i : i + chunk_size] for i in range(0, n, chunk_size)]
+ slices = [keys[i : i + chunk_size] for i in range(0, n, chunk_size)]
+ slices.extend([[] for _ in range(num_workers - len(slices))])
+ return slices
```
Comment on lines +215 to +217:

```python
docs_quantized = sum(
    1 for _ in range(0, total, batch_size) for _ in keys[:batch_size]
)
```

Copilot AI commented Apr 14, 2026:

In the "completed" phase, the `sum(...)` computation is dead code because `docs_quantized` is immediately overwritten with `total`. Removing the unused computation reduces confusion and avoids implying that a more complex accounting is needed here.

Suggested change:

```diff
- docs_quantized = sum(
-     1 for _ in range(0, total, batch_size) for _ in keys[:batch_size]
- )
```
Comment on lines +131 to +135:

```python
# The executor should raise an error internally when trying to connect
# but let's verify it doesn't crash before it tries to apply
# For a proper test, we'd need to mock AsyncSearchIndex.from_existing
# For now, we just verify the executor is created
assert executor is not None
```

Copilot AI commented Apr 14, 2026:

This test doesn't exercise the documented behavior ("requires redis_url or redis_client"); it only asserts the executor can be instantiated. To make it meaningful, assert the actual failure mode (e.g., `await executor.apply(plan)` raises/returns a failed report) by mocking the connection path (`AsyncSearchIndex.from_existing`) or by passing neither `redis_url` nor `redis_client` and asserting the expected error/report content.

Suggested change:

```diff
- # The executor should raise an error internally when trying to connect
- # but let's verify it doesn't crash before it tries to apply
- # For a proper test, we'd need to mock AsyncSearchIndex.from_existing
- # For now, we just verify the executor is created
- assert executor is not None
+ with pytest.raises(ValueError, match="redis_url or redis_client"):
+     await executor.apply(plan)
```
@nkanu17 nkanu17 force-pushed the feat/index-migrator-pre-release branch from 3fe4972 to 3f03bb7 Compare April 14, 2026 05:49

jit-ci bot commented Apr 14, 2026

🛡️ Jit Security Scan Results


✅ No security findings were detected in this PR


Security scan by Jit
