Databend Roadmap for 2025 #14167

@BohuTANG

Databend Roadmap 2024-2025 · Multimodal Data+AI Warehouse

Messaging Anchor

  • Hero positioning (from README): Blazing analytics, fast search, geo insights, vector AI — supercharged in a new-era Snowflake-compatible warehouse.
  • Objective: every 2024-2025 deliverable should reinforce Databend as a Snowflake-compatible, multimodal warehouse that unifies BI, search, geo, and vector AI.

2024: Cloud-Native & Snowflake-Like Experience

  • Cloud-native parity: Deliver a fully-managed experience comparable to Snowflake Cloud.
  • 50% cost reduction: Keep the cost/performance advantage through storage and compute optimizations.
  • Try Databend Cloud: https://www.databend.com/

2025: On-Prem Replacement with Multimodal Depth

Focus areas inherited from the README pillars:

  • Analytics-native: ANSI SQL, window functions, incremental aggregates, streaming ingest.
  • Search-native: JSON/full-text indexes and structured filters for hybrid search.
  • Geo-native: Spatial data types plus indexing (H3/GeoHash) to power location intelligence.
  • Vector-native: Vector types, embeddings, and HNSW indexes for semantic retrieval/RAG.
  • Unified engine & data: One optimizer/runtime across structured, semi-structured, vector, and unstructured data on object storage.
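As a rough illustration of the GeoHash idea named in the Geo-native pillar, the sketch below encodes a latitude/longitude pair into a GeoHash string by interleaving longitude and latitude bisection bits. It is a generic textbook version, not Databend's implementation; the function name `geohash_encode` and its `precision` parameter are hypothetical and chosen only for this example.

```python
# Generic GeoHash encoder (illustrative only; not taken from Databend's code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Return a GeoHash string of `precision` characters for (lat, lon)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits map to one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))
```

Because a shorter GeoHash is always a prefix of a longer one for the same point, nearby locations tend to share prefixes, which is what makes the encoding usable as a coarse index key for location filters.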

Main Tasks for 2025

| Task | Status | Comments |
| --- | --- | --- |
| Dynamic Cluster Management | Done | Make query nodes more dynamic for better resource allocation. |
| Resource Group Management | Done | Enhance on-prem query control through better resource group management. |
| Disaster Recovery | Done | Implement multi-region failover solutions including point-in-time recovery and backup automation. |
| Stability Improvements | In Progress | Increase on-prem stability and fault tolerance for higher reliability. |
| Search Improvements | In Progress | v1.2.830-nightly shipped inverted indexes for VARIANT inner fields via #18861; next steps: expose MATCH/LIMIT latency tracking + VARIANT FULLTEXT scoring. |
| Spatial Indexing for Geometry | Planned | Geo defaults (v1.2.832-nightly, #18873) and HTTP geometry output (v1.2.841-nightly, #18963) are in; next deliverable is actual spatial/H3 index structures. |
| Vector Index & Embeddings Enhancements | Planned | Vector index SIMD quantization (v1.2.846-nightly, #18957) and native storage fixes (v1.2.836-nightly, #18932) landed; next up is multi-probe HNSW + native embeddings catalog. |
| Analytics: Table Statistics Admin API | Done | v1.2.841-nightly added #18967, giving cost-based optimizations visibility into per-table/per-column stats. |
| Analytics: Fuse Parquet Dictionary Toggle | Done | v1.2.840-nightly introduced enable_parquet_dictionary so BI workloads can tune compression vs. speed. |
| Search: External Lake Virtual Columns | Done | v1.2.844-nightly merged #18981, enabling virtual columns on external tables to project search/index keys without copy. |
| Geo: Defaults & Client Output | Done | v1.2.832-nightly turned on geo + virtual-column toggles by default (#18873); v1.2.841-nightly added HTTP geometry_output_format (#18963). |
| Vector: Index Performance Hardening | Done | v1.2.846-nightly + #18957 accelerate quantization; v1.2.836-nightly + #18932 fix panic paths when mixing vector indexes with native storage. |
