Databend Roadmap 2024-2025 · Multimodal Data+AI Warehouse
Messaging Anchor
- Hero positioning (from README): Blazing analytics, fast search, geo insights, vector AI — supercharged in a new-era Snowflake-compatible warehouse.
- Objective: every 2024-2025 deliverable should reinforce Databend as a Snowflake-compatible, multimodal warehouse that unifies BI, search, geo, and vector AI.
2024: Cloud-Native & Snowflake-Like Experience
- Cloud-native parity: Deliver a fully-managed experience comparable to Snowflake Cloud.
- 50% cost reduction: Keep the cost/performance advantage through storage and compute optimizations.
- Try Databend Cloud: https://www.databend.com/
2025: On-Prem Replacement with Multimodal Depth
Focus areas inherited from the README pillars:
- Analytics-native: ANSI SQL, window functions, incremental aggregates, streaming ingest.
- Search-native: JSON/full-text indexes and structured filters for hybrid search.
- Geo-native: Spatial data types plus indexing (H3/GeoHash) to power location intelligence.
- Vector-native: Vector types, embeddings, and HNSW indexes for semantic retrieval/RAG.
- Unified engine & data: One optimizer/runtime across structured, semi-structured, vector, and unstructured data on object storage.
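For readers unfamiliar with the GeoHash scheme named in the Geo-native pillar, the following is a minimal sketch of the standard public GeoHash encoding. It is not Databend's implementation, and the function name `geohash_encode` is hypothetical. It illustrates why such cell IDs are useful as index keys: nearby points tend to share a common prefix.

```python
# Standard GeoHash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair into a GeoHash string.

    Illustrative only: repeatedly bisects the longitude and latitude
    ranges, interleaving one bit from each, and emits one base-32
    character for every 5 bits.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # A longer hash refines the same cell, so it keeps the shorter prefix.
    coarse = geohash_encode(57.64911, 10.40744, precision=5)
    fine = geohash_encode(57.64911, 10.40744, precision=9)
    print(coarse, fine, fine.startswith(coarse))
```

Because a longer hash always extends the shorter one, a prefix match on the encoded string approximates a bounding-cell lookup, which is the property a spatial index can exploit.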
Main Tasks for 2025
| Task | Status | Comments |
|---|---|---|
| Dynamic Cluster Management | Done | Make query nodes more dynamic for better resource allocation. |
| Resource Group Management | Done | Enhance on-prem query control through better resource group management. |
| Disaster Recovery | Done | Implement multi-region failover solutions including point-in-time recovery and backup automation. |
| Stability Improvements | In Progress | Increase on-prem stability and fault tolerance for higher reliability. |
| Search Improvements | In Progress | v1.2.830-nightly shipped inverted indexes for VARIANT inner fields via #18861; next steps: expose MATCH/LIMIT latency tracking + VARIANT FULLTEXT scoring. |
| Spatial Indexing for Geometry | Planned | Geo defaults (v1.2.832-nightly, #18873) and HTTP geometry output (v1.2.841-nightly, #18963) are in; next deliverable is actual spatial/H3 index structures. |
| Vector Index & Embeddings Enhancements | Planned | Vector index SIMD quantization (v1.2.846-nightly, #18957) and native storage fixes (v1.2.836-nightly, #18932) landed; next up is multi-probe HNSW + native embeddings catalog. |
| Analytics: Table Statistics Admin API | Done | v1.2.841-nightly shipped #18967, giving cost-based optimization visibility into per-table and per-column stats. |
| Analytics: Fuse Parquet Dictionary Toggle | Done | v1.2.840-nightly introduced enable_parquet_dictionary so BI workloads can tune compression vs. speed. |
| Search: External Lake Virtual Columns | Done | v1.2.844-nightly merged #18981, enabling virtual columns on external tables to project search/index keys without copy. |
| Geo: Defaults & Client Output | Done | v1.2.832-nightly turned on geo + virtual-column toggles by default (#18873); v1.2.841-nightly added HTTP geometry_output_format (#18963). |
| Vector: Index Performance Hardening | Done | v1.2.846-nightly (#18957) accelerates quantization; v1.2.836-nightly (#18932) fixes panic paths when mixing vector indexes with native storage. |