feat(storage): multi-copy upload with store->pull->commit flow #593

Open
rvagg wants to merge 1 commit into rvagg/sp-sp-fetch from rvagg/pull-upload-flow

rvagg (Collaborator) commented Feb 6, 2026

Sits on top of #544 which has the synapse-core side of this.


Implement store->pull->commit flow for efficient multi-copy storage replication.

Split operations API on StorageContext:

  • store(): upload data to SP, wait for parking confirmation
  • presignForCommit(): pre-sign EIP-712 extraData for pull + commit reuse
  • pull(): request SP-to-SP transfer from another provider
  • commit(): add pieces on-chain with optional pre-signed extraData
  • getPieceUrl(): get retrieval URL for SP-to-SP pulls

StorageManager.upload() orchestration:

  • Default 2 copies (primary + endorsed secondary)
  • Single-provider: store->commit flow
  • Multi-copy: store on primary, presign, pull to secondaries, commit all
  • Auto-retry failed secondaries with provider exclusion (up to 5 attempts)
  • Pre-signing avoids redundant wallet prompts across providers
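
The multi-copy sequence above can be sketched roughly as follows. This is an illustration only: the PR names the methods but does not show their signatures, so `StorageContextLike`, the argument shapes, and the `makeFakeContext` stand-in are all assumptions rather than the actual synapse-sdk API.

```typescript
// Hypothetical shapes: the PR names these methods but does not show signatures.
type PieceCid = string
type ExtraData = string

interface StorageContextLike {
  providerId: number
  store(data: Uint8Array): Promise<PieceCid>
  presignForCommit(pieceCids: PieceCid[]): Promise<ExtraData>
  pull(pieceCids: PieceCid[], sourceUrls: string[], extraData: ExtraData): Promise<void>
  commit(pieceCids: PieceCid[], extraData?: ExtraData): Promise<void>
  getPieceUrl(pieceCid: PieceCid): string
}

async function multiCopyUpload(
  data: Uint8Array,
  primary: StorageContextLike,
  secondaries: StorageContextLike[]
): Promise<number[]> {
  // 1. Upload once, from the client to the primary.
  const pieceCid = await primary.store(data)
  const sourceUrls = [primary.getPieceUrl(pieceCid)]

  // 2. One signature per secondary, reused for both pull and commit.
  const ready: Array<{ ctx: StorageContextLike; extraData?: ExtraData }> = [{ ctx: primary }]
  for (const secondary of secondaries) {
    const extraData = await secondary.presignForCommit([pieceCid])
    await secondary.pull([pieceCid], sourceUrls, extraData)
    ready.push({ ctx: secondary, extraData })
  }

  // 3. Commit on every provider that now holds the data.
  await Promise.all(ready.map(({ ctx, extraData }) => ctx.commit([pieceCid], extraData)))
  return ready.map(({ ctx }) => ctx.providerId)
}

// In-memory stand-in so the sketch runs without a network or wallet.
function makeFakeContext(providerId: number, log: string[]): StorageContextLike {
  return {
    providerId,
    store: async (data: Uint8Array) => {
      log.push(`store@${providerId}`)
      return `piece-${data.length}`
    },
    presignForCommit: async () => {
      log.push(`presign@${providerId}`)
      return `sig-${providerId}`
    },
    pull: async (_cids: PieceCid[], sourceUrls: string[]) => {
      log.push(`pull@${providerId}<-${sourceUrls.join(',')}`)
    },
    commit: async (_cids: PieceCid[], extraData?: ExtraData) => {
      log.push(`commit@${providerId}:${extraData ?? 'unsigned'}`)
    },
    getPieceUrl: (pieceCid: PieceCid) => `sp${providerId}/${pieceCid}`
  }
}
```

The sketch omits retries and partial-failure handling so that the ordering of store, presign, pull, and commit stays visible.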

Callback refinements:

  • Remove redundant onUploadComplete (use onStored instead)
  • onStored(providerId, pieceCid) - after data parked on provider
  • onPieceAdded(providerId, pieceCid) - after on-chain submission
  • onPieceConfirmed(providerId, pieceCid, pieceId) - after confirmation
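
As a rough illustration of how the three callbacks line up with the stages, a caller could track per-provider progress like this. The parameter names come from the list above; the parameter types, the `UploadCallbacks` interface, and `createProgressTracker` are assumptions made for this sketch, and how the callbacks are passed to upload() is not shown here.

```typescript
// Assumed types: the PR gives the parameter names but not their types.
type ProviderId = number
type PieceCid = string
type Stage = 'stored' | 'added' | 'confirmed'

// Hypothetical grouping of the three callbacks named in the PR description.
interface UploadCallbacks {
  onStored(providerId: ProviderId, pieceCid: PieceCid): void
  onPieceAdded(providerId: ProviderId, pieceCid: PieceCid): void
  onPieceConfirmed(providerId: ProviderId, pieceCid: PieceCid, pieceId: number): void
}

function createProgressTracker(): { callbacks: UploadCallbacks; stages: Map<string, Stage> } {
  // Latest stage reached, keyed by "providerId:pieceCid".
  const stages = new Map<string, Stage>()
  const key = (providerId: ProviderId, pieceCid: PieceCid): string => `${providerId}:${pieceCid}`
  return {
    stages,
    callbacks: {
      // Data is parked on the provider, nothing on-chain yet.
      onStored: (providerId, pieceCid) => {
        stages.set(key(providerId, pieceCid), 'stored')
      },
      // The on-chain transaction has been submitted.
      onPieceAdded: (providerId, pieceCid) => {
        stages.set(key(providerId, pieceCid), 'added')
      },
      // The transaction is confirmed and a pieceId has been assigned.
      onPieceConfirmed: (providerId, pieceCid) => {
        stages.set(key(providerId, pieceCid), 'confirmed')
      }
    }
  }
}
```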

Type clarity:

  • Rename UploadOptions.metadata -> pieceMetadata (piece-level)
  • Rename CommitOptions.pieces[].metadata -> pieceMetadata
  • Dataset-level metadata remains in CreateContextOptions.metadata
  • New: StoreError, CommitError for clear failure semantics
  • New: CopyResult, FailedCopy for multi-copy transparency

Implements #494

rvagg requested a review from hugomrdias as a code owner February 6, 2026 14:06
github-project-automation bot moved this to 📌 Triage in FOC Feb 6, 2026

cloudflare-workers-and-pages bot commented Feb 6, 2026

Deploying with Cloudflare Workers

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ❌ Deployment failed | synapse-dev | 9d9e0d6 | Feb 06 2026, 02:08 PM |

rvagg (Collaborator, Author) commented Feb 6, 2026

Docs lint is failing. This still needs a big docs addition, but that can come a little later as we get through review here.

Here are some notes I built up about failure modes and handling:

Multi-Copy Upload: Failure Handling

Philosophy

  1. Primary store failure = hard fail: If we can't store on primary, throw immediately
  2. All commits fail = hard fail: If no provider commits successfully, throw CommitError
  3. Partial commit failure = record and return: Record failed providers in failures[] (with role), return result with successful copies[]
  4. Secondary failure = best-effort: Retry with replacement SPs, then commit whatever succeeded
  5. Never throw away successful work: If data is committed on any provider, the user gets a result -- not an exception
  6. Explicit providers = no retry: User specified providers, respect their choice
  7. Batch semantics: All pieces must succeed on a provider, or that provider is failed
  8. Transparency over exceptions: failures[] tells the user what went wrong; copies[] tells them what worked

Partial Success Over Atomicity

When a user requests N copies and we can only achieve fewer, we commit what we have rather than throwing everything away:

  • Best-effort exhaustion: For auto-selected providers, we retry up to 5 secondaries before giving up
  • Upload work is expensive: Throwing discards successful uploads; parked pieces get GC'd by the SP
  • No information loss: Throwing after partial success destroys information about what did succeed
  • Result inspection is the contract: result.copies.length < count tells the user they got fewer copies; result.failures tells them why

Failure Modes by Stage

The multi-copy upload has a sequential pipeline: store → pull → commit.

Stage 1: Store (upload data to primary SP)

| What happens | Data on primary? | On-chain? | Behaviour |
| --- | --- | --- | --- |
| Upload to primary succeeds | Yes (parked) | No | Continue to pull |
| Upload to primary fails | No | No | Throw StoreError -- nothing happened, safe to retry |

Store failure is unambiguous: no data exists anywhere, no on-chain state was created.

Stage 2: Pull (SP-to-SP fetch to secondaries)

| What happens | Data on secondary? | On-chain? | Behaviour |
| --- | --- | --- | --- |
| Pull succeeds | Yes (parked) | No | Continue to commit |
| Pull fails (auto-selected) | No | No | Retry with next provider (up to 5 attempts) |
| Pull fails (explicit provider) | No | No | Record in failures[], no retry |
| All secondary attempts exhausted | No | No | Proceed to commit with primary only |

Pull failure is recoverable: data is still on the primary, no on-chain state exists yet. Retrying pull is cheap (SP-to-SP, no client bandwidth).

Stage 3: Commit (addPieces on-chain transaction)

| What happens | Data on SP? | On-chain? | Behaviour |
| --- | --- | --- | --- |
| All commits succeed | Yes | Yes | Build result with all copies |
| Primary commit succeeds, secondary fails | Yes | Primary: yes | Record secondary in failures[] |
| Primary commit fails, secondary succeeds | Yes | Secondary: yes | Record primary in failures[] with role: 'primary', return with secondary in copies[] |
| Primary commit fails, secondary also fails | Yes (parked) | No | Throw CommitError -- nothing on-chain, safe to retry |
| Secondary commit fails | Yes (parked) | No | Record in failures[] -- data on SP, will be GC'd |
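
One way to picture the commit stage is a parallel commit whose outcomes are split into copies and failures, throwing only when nothing committed. This is a sketch under stated assumptions, not the PR's implementation: `Target`, `Copy`, `Failure`, and `commitAll` are hypothetical names, and each `commit` callback is assumed to resolve to the new pieceId.

```typescript
type Role = 'primary' | 'secondary'

// Hypothetical shapes for this sketch only.
interface Target {
  providerId: number
  role: Role
  commit: () => Promise<number> // assumed to resolve to the pieceId
}
interface Copy { providerId: number; role: Role; pieceId: number }
interface Failure { providerId: number; role: Role; error: string }

class CommitError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'CommitError'
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

async function commitAll(targets: Target[]): Promise<{ copies: Copy[]; failures: Failure[] }> {
  // Commit everywhere in parallel; one provider's failure does not cancel the others.
  const settled = await Promise.allSettled(targets.map((t) => t.commit()))
  const copies: Copy[] = []
  const failures: Failure[] = []
  settled.forEach((outcome, i) => {
    const { providerId, role } = targets[i]
    if (outcome.status === 'fulfilled') {
      copies.push({ providerId, role, pieceId: outcome.value })
    } else {
      const reason: unknown = outcome.reason
      failures.push({ providerId, role, error: reason instanceof Error ? reason.message : String(reason) })
    }
  })
  // Hard fail only when nothing reached the chain.
  if (copies.length === 0) {
    throw new CommitError('All commits failed: ' + failures.map((f) => f.error).join('; '))
  }
  return { copies, failures }
}
```

With a failing primary and a succeeding secondary, this returns one entry in copies and one failure with role 'primary', matching the third row of the table.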

Behaviour Matrix

| Scenario | Behaviour |
| --- | --- |
| Primary store fails | Throw StoreError -- nothing happened |
| Primary commit fails, secondary succeeds | Record primary in failures[] with role: 'primary', return result |
| All commits fail | Throw CommitError -- nothing on-chain |
| Secondary pull fails (auto-selected) | Retry with next provider (up to 5 attempts) |
| Secondary pull fails (explicit) | Record in failures[], no retry |
| All secondary attempts exhausted | Commit primary only, record failures |
| Secondary commit fails | Record in failures[] -- data on SP, will be GC'd |
| Failover creates new dataset | Mark isNewDataSet: true in CopyResult |
| copies.length < count | Partial success -- user should inspect failures[] |

Error Types

/** Primary store failed - no data stored anywhere, safe to retry */
class StoreError extends Error {
  name = 'StoreError'
}

/** All commits failed - data stored on SP(s) but nothing on-chain, safe to retry */
class CommitError extends Error {
  name = 'CommitError'
}

// Partial commit failures appear in result.failures[] with role: 'primary' or 'secondary'
// Only throws CommitError when ALL providers fail to commit
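
Since both error types are described as safe to retry, a caller could wrap the upload in a small retry helper. The sketch below is illustrative: `uploadWithRetry` and its `maxAttempts` default are hypothetical, and the two classes are re-declared locally so the example is self-contained rather than imported from the SDK.

```typescript
// Local stand-ins for the two error types described above.
class StoreError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'StoreError'
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class CommitError extends Error {
  constructor(message: string) {
    super(message)
    this.name = 'CommitError'
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

// Hypothetical helper: retries only the two failure types documented as safe to retry.
async function uploadWithRetry<T>(upload: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown = new Error('upload was never attempted')
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await upload()
    } catch (err) {
      // Anything else is unexpected, so surface it immediately.
      if (!(err instanceof StoreError) && !(err instanceof CommitError)) throw err
      lastError = err
    }
  }
  throw lastError
}
```

A result returned with a non-empty failures[] is not an exception, so this helper never retries partial successes; that decision stays with the caller.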

What Users Must Check

Users should always inspect result.failures, not just check that upload() didn't throw:

// If ALL commits fail, upload() throws CommitError
// If at least one succeeds, we get a result:
const result = await synapse.storage.upload(data, { count: 3 })

// Check if endorsed provider (primary) failed
const primaryFailed = result.failures.find(f => f.role === 'primary')
if (primaryFailed) {
  console.warn(`Endorsed provider ${primaryFailed.providerId} failed: ${primaryFailed.error}`)
  // Data is only on non-endorsed secondaries
}

// Check if we got all requested copies
if (result.copies.length < 3) {
  console.warn(`Only ${result.copies.length}/3 copies succeeded`)
  for (const failure of result.failures) {
    console.warn(`  Provider ${failure.providerId} (${failure.role}): ${failure.error}`)
  }
}

// Every copy in copies[] is committed on-chain
for (const copy of result.copies) {
  console.log(`Provider ${copy.providerId}, dataset ${copy.dataSetId}, piece ${copy.pieceId}`)
}

Auto-Retry Logic

When the user calls upload(data, { count: 2 }) without explicit providerIds or dataSetIds:

  1. Select primary (endorsed preferred)
  2. Store on primary
  3. Select secondary candidate from pool (excluding primary)
  4. Pull to secondary
  5. If pull fails:
    • Mark secondary as failed
    • Select next secondary from pool
    • Retry pull (data already on primary)
    • Repeat until: success OR exhausted pool OR hit MAX_SECONDARY_ATTEMPTS (5)
  6. If no secondary succeeded → proceed to commit with primary only
  7. Commit on all successful providers
  8. Return result with copies[] and failures[]

When the user specifies providerIds or dataSetIds: no auto-retry, failures recorded in failures[].
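
Steps 3 to 6 of the auto-selected path can be sketched as a bounded loop over candidate providers. This is a simplified illustration for the count: 2 case (one secondary); `pullWithRetry`, `PullOutcome`, and the `pull` callback are hypothetical, and only the MAX_SECONDARY_ATTEMPTS name and its value of 5 come from the notes above.

```typescript
const MAX_SECONDARY_ATTEMPTS = 5

// Hypothetical result shape for this sketch.
interface PullOutcome {
  secondary: number | null // provider that accepted the pull, or null if none did
  failed: number[]         // providers tried and excluded, in order
}

async function pullWithRetry(
  candidates: number[], // provider ids in preference order, primary already excluded
  pull: (providerId: number) => Promise<void>,
  maxAttempts: number = MAX_SECONDARY_ATTEMPTS
): Promise<PullOutcome> {
  const failed: number[] = []
  // Stop at whichever comes first: success, exhausted pool, or the attempt cap.
  for (const providerId of candidates.slice(0, maxAttempts)) {
    try {
      await pull(providerId) // SP-to-SP transfer; data is already on the primary
      return { secondary: providerId, failed }
    } catch {
      failed.push(providerId) // exclude this provider and move to the next
    }
  }
  // No secondary succeeded: the caller proceeds to commit with the primary only.
  return { secondary: null, failed }
}
```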

Design Decision: Primary Commit Failure Handling

The current implementation commits on all providers in parallel via Promise.allSettled(). If the primary commit fails but a secondary commit succeeds, we record the primary failure and return with the secondary in copies[].

Endorsed providers are selected as primary because they're curated for reliability. If primary (endorsed) fails but secondary (non-endorsed) succeeds, the user ends up with data only on non-endorsed providers. This may not meet product requirements of having one copy on an endorsed provider.

// Check if endorsed provider failed
const primaryFailed = result.failures.some(f => f.role === 'primary')
if (primaryFailed) {
  // Handle: retry, alert, or treat as error depending on requirements
}

timfong888 commented Feb 6, 2026

I noticed this:

> Primary store failure = hard fail: If we can't store on primary, throw immediately

What is the test for the availability of an endorsed provider when we have more than one? If the first store fails, is there a retry?

Under retry:

> Select primary (endorsed preferred)
> Store on primary

If we have two endorsed providers and the store on the primary fails, do we retry with the other endorsed provider?
