Commit 3a12aec

chore(python-binding): documentation and PyPI metadata (#18711)
* Improve Python binding documentation and PyPI metadata
  - Enhanced README.md with comprehensive API reference tables organized by functionality
  - Added a clear progression of examples from local tables to cloud storage
  - Updated pyproject.toml description to highlight the multi-modal data warehouse for the AI era
  - Added the readme field to ensure PyPI displays the full documentation
  - Improved messaging around structured data, JSON, and vector embedding support
* Fix YAML formatting issues
  - Removed trailing spaces in bindings.python.yml and the build_bindings_python action
  - Fixed yamllint errors for the Python binding CI files
1 parent fa9b15e commit 3a12aec

File tree

2 files changed: +66 −16 lines

src/bendpy/README.md

Lines changed: 64 additions & 15 deletions
@@ -1,42 +1,91 @@
 # Databend Python Binding
 
-Official Python binding for [Databend](https://databend.com) - The AI-Native Data Warehouse.
+Official Python binding for [Databend](https://databend.com) - The multi-modal data warehouse built for the AI era.
 
-Databend is the open-source alternative to Snowflake with near 100% SQL compatibility and native AI capabilities. Built in Rust with MPP architecture and S3-native storage, Databend unifies structured tables, JSON documents, and vector embeddings in a single platform.
+Databend unifies structured data, JSON documents, and vector embeddings in a single platform with near 100% Snowflake compatibility. Built in Rust with MPP architecture and S3-native storage for cloud-scale analytics.
 
 ## Installation
 
 ```bash
 pip install databend
 ```
 
+To verify the installation, run:
+```bash
+python3 -c "import databend; ctx = databend.SessionContext(); ctx.sql('SELECT version() AS version').show()"
+```
+
+## API Reference
+
+### Core Operations
+| Method | Description |
+|--------|-------------|
+| `SessionContext()` | Create a new session context |
+| `sql(query)` | Execute a SQL query; returns a `DataFrame` |
+
+### File Registration
+| Method | Description |
+|--------|-------------|
+| `register_parquet(name, path, pattern=None, connection=None)` | Register Parquet files as a table |
+| `register_csv(name, path, pattern=None, connection=None)` | Register CSV files as a table |
+| `register_ndjson(name, path, pattern=None, connection=None)` | Register NDJSON files as a table |
+| `register_tsv(name, path, pattern=None, connection=None)` | Register TSV files as a table |
+
+### Cloud Storage Connections
+| Method | Description |
+|--------|-------------|
+| `create_s3_connection(name, key, secret, endpoint=None, region=None)` | Create an S3 connection |
+| `create_azblob_connection(name, url, account, key)` | Create an Azure Blob Storage connection |
+| `create_gcs_connection(name, url, credential)` | Create a Google Cloud Storage connection |
+| `list_connections()` | List all connections |
+| `describe_connection(name)` | Show connection details |
+| `drop_connection(name)` | Remove a connection |
+
+### Stage Management
+| Method | Description |
+|--------|-------------|
+| `create_stage(name, url, connection)` | Create an external stage |
+| `show_stages()` | List all stages |
+| `list_stages(stage_name)` | List files in a stage |
+| `describe_stage(name)` | Show stage details |
+| `drop_stage(name)` | Remove a stage |
+
+### DataFrame Operations
+| Method | Description |
+|--------|-------------|
+| `collect()` | Execute the query and collect results |
+| `show(num=20)` | Display results in the console |
+| `to_pandas()` | Convert to a pandas DataFrame |
+| `to_polars()` | Convert to a Polars DataFrame |
+| `to_arrow_table()` | Convert to a PyArrow Table |
+
 ## Examples
 
-### Local Files
+### Local Tables
 
 ```python
 import databend
 ctx = databend.SessionContext()
 
-# Query local Parquet files
-ctx.register_parquet("orders", "/path/to/orders/")
-ctx.register_parquet("customers", "/path/to/customers/")
-df = ctx.sql("SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id").to_pandas()
+# Create and query in-memory tables
+ctx.sql("CREATE TABLE users (id INT, name STRING, age INT)").collect()
+ctx.sql("INSERT INTO users VALUES (1, 'Alice', 25), (2, 'Bob', 30)").collect()
+df = ctx.sql("SELECT * FROM users WHERE age > 25").to_pandas()
 ```
 
-### Local Tables
+### Working with Local Files
 
 ```python
 import databend
 ctx = databend.SessionContext()
 
-# Create and query local tables
-ctx.sql("CREATE TABLE users (id INT, name STRING, age INT)").collect()
-ctx.sql("INSERT INTO users VALUES (1, 'Alice', 25), (2, 'Bob', 30)").collect()
-df = ctx.sql("SELECT * FROM users WHERE age > 25").to_pandas()
+# Query local Parquet files
+ctx.register_parquet("orders", "/path/to/orders/")
+ctx.register_parquet("customers", "/path/to/customers/")
+df = ctx.sql("SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id").to_pandas()
 ```
 
-### S3 Remote Files
+### Cloud Storage - S3 Files
 
 ```python
 import databend
@@ -49,14 +98,14 @@ ctx.register_parquet("trips", "s3://bucket/trips/", connection="s3")
 df = ctx.sql("SELECT COUNT(*) FROM trips").to_pandas()
 ```
 
-### Remote Tables
+### Cloud Storage - S3 Tables
 
 ```python
 import databend
 import os
 ctx = databend.SessionContext()
 
-# Create S3 connection and table
+# Create S3 connection and persistent table
 ctx.create_s3_connection("s3", os.getenv("AWS_ACCESS_KEY_ID"), os.getenv("AWS_SECRET_ACCESS_KEY"))
 ctx.sql("CREATE TABLE s3_table (id INT, name STRING) 's3://bucket/table/' CONNECTION=(CONNECTION_NAME='s3')").collect()
 df = ctx.sql("SELECT * FROM s3_table").to_pandas()
 ```

src/bendpy/pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,8 @@ build-backend = "maturin"
 version = "0.1.0"
 name = "databend"
 requires-python = ">=3.10"
-description = "Official Python binding for Databend - The AI-Native Data Warehouse. Open-source alternative to Snowflake with near 100% SQL compatibility, native AI capabilities, and unified support for structured, semi-structured, and vector data."
+description = "The multi-modal data warehouse built for the AI era. Unified analytics for structured data, JSON, and vector embeddings with near 100% Snowflake compatibility."
+readme = "README.md"
 license = {text = "Apache-2.0"}
 authors = [{name = "Databend Labs", email = "hi@databend.com"}]
 maintainers = [{name = "Databend Community", email = "hi@databend.com"}]
