
@wojiaodoubao
Contributor

A common practice for organizing multimodal data is to package a large number of scattered files into a few large archives (such as tar or zip files). This prevents excessive inode consumption, enables faster reads via streaming, and avoids inconsistencies between the metadata and the scattered files it describes.

To better support such archives, I suggest extending Lance BlobV2 with an ExternalPacked format. For example, this would allow images stored inside these archives to be linked into a Lance table.

@github-actions github-actions bot added enhancement New feature or request python labels Jan 21, 2026
@wojiaodoubao wojiaodoubao marked this pull request as draft January 21, 2026 03:59
@wojiaodoubao wojiaodoubao marked this pull request as ready for review January 21, 2026 06:03
@codecov

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 57.85124% with 51 lines in your changes missing coverage. Please review.

File with missing lines: rust/lance/src/dataset/blob.rs (patch coverage 17.74%; 46 lines missing, 5 partials) ⚠️


@Xuanwo
Collaborator

Xuanwo commented Jan 21, 2026

This idea is kinda interesting. So your idea focuses more on reading an external packed blob, and Lance won't generate such a blob itself, is that correct?

@wojiaodoubao
Contributor Author

> So your idea focuses more on reading an external packed blob, and Lance won't generate such a blob itself, is that correct?

Yes.

@Xuanwo
Collaborator

Xuanwo commented Jan 22, 2026

> Yes.

Then I think we don't need to add a new type. We can simply enable reading a range from an external blob.

@wojiaodoubao
Contributor Author

> We can simply enable reading a range from an external blob.

Do you mean we could let the External BlobFile handle an arbitrary range (currently its position is always 0), eliminating the need to introduce ExternalPacked as an extra component?

That makes sense to me; let me update the PR.
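To make the range-read idea concrete, here is a minimal Python sketch. It is not Lance's API: it assumes an uncompressed tar archive, and `ExternalBlobRef`, `index_tar`, and `read_blob` are hypothetical names used only for illustration. Indexing the archive once records where each member's bytes start (`TarInfo.offset_data` from the standard `tarfile` module) and how long they are; reading a single blob is then one seek plus a bounded read, which is what a non-zero position on an external blob would allow.

```python
import tarfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalBlobRef:
    """Hypothetical descriptor for one blob stored inside an external archive."""
    uri: str       # path of the archive that holds the blob
    position: int  # byte offset where the blob's data starts in the archive
    size: int      # length of the blob's data in bytes


def index_tar(path: str) -> dict[str, ExternalBlobRef]:
    """Map each regular file in an *uncompressed* tar to its byte range."""
    refs = {}
    # "r:" opens the tar without decompression, so offsets are raw file offsets.
    with tarfile.open(path, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                refs[member.name] = ExternalBlobRef(path, member.offset_data, member.size)
    return refs


def read_blob(ref: ExternalBlobRef) -> bytes:
    """Read only bytes [position, position + size) instead of the whole archive."""
    with open(ref.uri, "rb") as f:
        f.seek(ref.position)
        return f.read(ref.size)
```

This only works when the archive stores member bytes verbatim: for a compressed archive such as `.tar.gz`, offsets into the decompressed stream do not correspond to byte ranges in the file on disk.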

@Xuanwo
Collaborator

Xuanwo commented Jan 22, 2026

> Do you mean we could enable External BlobFile to handle any arbitrary range (currently the position is always 0), thus eliminating the need to introduce ExternalPacked as an extra component?

Yes!
