Icebird is a library for reading Apache Iceberg tables in JavaScript. It is built on top of hyparquet, which reads the underlying Parquet files.
To read an Iceberg table:
const { icebergRead } = await import('icebird')
const tableUrl = 'https://s3.amazonaws.com/hyperparam-iceberg/spark/bunnies'
const data = await icebergRead({
  tableUrl,
  rowStart: 0,
  rowEnd: 10,
})

To read the Iceberg metadata (schema, etc):
import { icebergMetadata } from 'icebird'
const metadata = await icebergMetadata({ tableUrl })
// subsequent reads will be faster if you provide the metadata:
const data = await icebergRead({
  tableUrl,
  metadata,
})

Check out a minimal Iceberg table viewer demo that shows how to integrate Icebird into a React web application, using HighTable to render the table data. You can view any publicly accessible Iceberg table:
- Live Demo: https://hyparam.github.io/demos/icebird/
- Demo Source Code: https://github.com/hyparam/demos/tree/master/icebird
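
A viewer like the demo usually loads rows one page at a time rather than all at once. The sketch below shows one way to do that with the rowStart and rowEnd options shown earlier, reusing a single metadata object across pages. It is a minimal illustration and not part of Icebird: `readPages`, `pageSize`, and `maxRows` are hypothetical names, and the reader is passed in as a parameter (in real use it would presumably be `icebergRead`). It also assumes that rowEnd is exclusive and that each read resolves to an array of rows.

```javascript
// Hypothetical helper: yields pages of rows using rowStart/rowEnd windows.
// `read` is injected so the sketch does not depend on a particular library;
// in real use you would pass icebergRead from 'icebird'.
async function* readPages({ read, tableUrl, metadata, pageSize = 100, maxRows = Infinity }) {
  for (let rowStart = 0; rowStart < maxRows; rowStart += pageSize) {
    const rowEnd = Math.min(rowStart + pageSize, maxRows)
    // Assumption: rowEnd is exclusive and the result is an array of rows
    const rows = await read({ tableUrl, metadata, rowStart, rowEnd })
    if (rows.length === 0) return
    yield rows
    // A short page means there are no more rows to fetch
    if (rows.length < rowEnd - rowStart) return
  }
}
```

A caller would iterate with `for await (const rows of readPages({ read: icebergRead, tableUrl, metadata }))` and append each page to the rendered table. Passing metadata once avoids refetching it for every page.
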
To fetch a previous version of the table, you can specify metadataFileName:
import { icebergRead } from 'icebird'
const data = await icebergRead({
  tableUrl,
  metadataFileName: 'v1.metadata.json',
})

To add authentication or other custom fetch options, create a resolver and lister with requestInit and pass them to the public APIs:
import { icebergMetadata, icebergRead, s3Lister, urlResolver } from 'icebird'
const requestInit = {
  headers: {
    Authorization: 'Bearer my_token',
  },
}
const resolver = urlResolver({ requestInit })
const lister = s3Lister({ requestInit })
const metadata = await icebergMetadata({
  tableUrl,
  resolver,
  lister,
})
const data = await icebergRead({
  tableUrl,
  metadata,
  resolver,
  lister,
})

To read a table from an Iceberg REST Catalog, connect to the catalog and load the table metadata, then pass it to icebergRead:
import {
  icebergRead,
  restCatalogConnect,
  restCatalogListTables,
  restCatalogLoadTable,
} from 'icebird'
const requestInit = {
  headers: { Authorization: 'Bearer my_token' },
}
const ctx = await restCatalogConnect({
  url: 'https://catalog.example.com',
  warehouse: 'my-warehouse', // optional
  requestInit,
})
const tables = await restCatalogListTables(ctx, { namespace: 'analytics' })
const { metadata } = await restCatalogLoadTable(ctx, {
  namespace: 'analytics',
  table: 'orders',
})
const data = await icebergRead({
  tableUrl: metadata.location,
  metadata,
  rowStart: 0,
  rowEnd: 10,
})

Multi-level namespaces can be passed as an array:
const { metadata } = await restCatalogLoadTable(ctx, {
  namespace: ['db', 'sub'],
  table: 'orders',
})

Icebird aims to support reading any Iceberg table, but currently supports only a subset of Iceberg features. The table below shows which features are supported:
| Feature | Supported | Notes |
|---|---|---|
| Read Iceberg v1 Tables | ✅ | |
| Read Iceberg v2 Tables | ✅ | |
| Read Iceberg v3 Tables | ❌ | Broad v3 support is pending wider v3 fixture coverage. |
| Parquet Storage | ✅ | |
| Avro Storage | ✅ | |
| ORC Storage | ❌ | |
| Puffin Storage | Partial | Supports uncompressed deletion-vector-v1 blobs only. |
| File-based Catalog (version-hint.text) | ✅ | |
| REST Catalog | ✅ | |
| Hive Catalog | ❌ | |
| Glue Catalog | ❌ | |
| Service-based Catalog | ❌ | |
| Position Deletes | ✅ | Supports Parquet position delete files and Puffin deletion vectors. |
| Equality Deletes | ✅ | |
| Binary Deletion Vectors | ✅ | Supports uncompressed Puffin deletion-vector-v1 blobs. |
| Delete Partition Scope | ✅ | Applies sequence and partition scope before filtering rows. |
| Rename Columns | ✅ | |
| Efficient Partitioned Read Queries | ❌ | |
| Gzip Metadata JSON | ✅ | Supports .gz.metadata.json and metadata.json.gz. |
| All Parquet Compression Codecs | ✅ | |
| All Parquet Types | ✅ | |
| Variant Types | ✅ | |
| Geometry Types | ✅ | |
| Geography Types | ✅ | |
| Row Lineage | ✅ | v3 _row_id and _last_updated_sequence_number inheritance. |
| Sorting | ❌ | |
| Encryption | ❌ | |

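To make the delete-related rows in the table more concrete, here is a minimal sketch of how merge-on-read deletes are applied in general, following the sequence number rules in the Iceberg spec: a position delete applies to a data file whose data sequence number is less than or equal to the delete's, while an equality delete applies only to strictly older data files. This is not Icebird's implementation. The function name `applyDeletes` and the in-memory shapes are assumptions made for illustration, and partition scope and null handling are left out.

```javascript
// Hypothetical in-memory shapes, for illustration only:
//   dataFile:        { path, sequenceNumber, rows: [...] }
//   positionDeletes: [{ sequenceNumber, filePath, pos }]
//   equalityDeletes: [{ sequenceNumber, columns: ['id'], values: { id: 2 } }]
function applyDeletes(dataFile, positionDeletes, equalityDeletes) {
  // Position deletes apply when the data file's sequence number is <= the delete's
  const deletedPositions = new Set(
    positionDeletes
      .filter(d => d.filePath === dataFile.path && dataFile.sequenceNumber <= d.sequenceNumber)
      .map(d => d.pos)
  )
  // Equality deletes apply only to strictly older data files
  const activeEquality = equalityDeletes.filter(
    d => dataFile.sequenceNumber < d.sequenceNumber
  )
  return dataFile.rows.filter((row, pos) => {
    // Drop rows removed by position within this file
    if (deletedPositions.has(pos)) return false
    // Drop rows whose delete columns all match an active equality delete
    return !activeEquality.some(d => d.columns.every(col => row[col] === d.values[col]))
  })
}
```

The strict comparison for equality deletes keeps a delete from removing rows written in the same commit, which is why the two delete kinds use different comparisons.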