2 changes: 2 additions & 0 deletions mkdocs.yaml
@@ -111,6 +111,7 @@ nav:
- Types: reference/specs/type-system.md
- Codec API: reference/specs/codec-api.md
- NPY Codec: reference/specs/npy-codec.md
- Storage Adapter API: reference/specs/storage-adapter-api.md
- Data Operations:
- Data Manipulation: reference/specs/data-manipulation.md
- AutoPopulate: reference/specs/autopopulate.md
@@ -126,6 +127,7 @@ nav:
- API: api/ # Auto-generated via gen-files + literate-nav
- About:
- about/index.md
- What's New in 2.3: about/whats-new-23.md
- What's New in 2.2: about/whats-new-22.md
- What's New in 2.1: about/whats-new-21.md
- What's New in 2.0: about/whats-new-2.md
137 changes: 137 additions & 0 deletions src/about/whats-new-23.md
@@ -0,0 +1,137 @@
# What's New in DataJoint 2.3

DataJoint 2.3 introduces **env-var-only configuration of storage**, **a public plugin-adapter contract for third-party storage protocols**, and broader file-based credential loading: any `.secrets/stores.<name>.<attr>` file is now honored.

> **Upgrading from 2.2?** No breaking changes for projects using `datajoint.json` or `.secrets/`. The new env vars are purely additive.

## Overview

The DataJoint platform, like many production deployments, provisions configuration entirely from environment variables: there is no `datajoint.json` in the container image and no `.secrets/` directory on disk. Until 2.3, this worked for the database connection (`DJ_HOST`, `DJ_USER`, `DJ_PASS`, …) but **not** for object stores: per-store credentials had to be configured through `datajoint.json` or `.secrets/stores.<name>.<attr>` files.

DataJoint 2.3 closes that gap with two new env vars, both purely additive:

- `DJ_STORES` — a JSON-encoded copy of the entire `stores` dict, in the same shape used in `datajoint.json`.
- `DJ_IGNORE_CONFIG_FILE` — a boolean flag that skips both `datajoint.json` and the secrets directory entirely.

The 2.3 release also formalizes the **storage-adapter plugin contract** (`datajoint.storage` entry-point group), which had been used internally since 2.0 but lacked a published spec. Third-party packages can now register storage protocols (Databricks Unity Catalog Volumes, custom HTTP-based stores, lab-specific archive systems, …) by subclassing `dj.StorageAdapter` and declaring an entry point.

## `DJ_STORES` — JSON-encoded stores configuration

!!! version-added "New in 2.3"
    `DJ_STORES` accepts a JSON object identical to the `stores` block of `datajoint.json`.

A single env var carries the entire `stores` dict. The format matches what users already write in `datajoint.json`, so config can be moved between file and env var by copy-paste — no per-field naming scheme to learn.

```bash
export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "my-bucket",
    "location": "my-project/production",
    "access_key": "AKIA...",
    "secret_key": "wJal..."
  }
}'
```
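Because the shapes match, an existing `stores` block can be moved into the env var mechanically. The sketch below assumes a `datajoint.json` whose store definitions live under a top-level `stores` key; `stores_export_line` is a hypothetical helper, not part of DataJoint.

```python
import json
import shlex


def stores_export_line(config_text: str) -> str:
    """Turn the text of a datajoint.json file into an `export DJ_STORES=...` line.

    Hypothetical helper: only the top-level "stores" block is carried over,
    serialized compactly so the env var holds exactly the same dict.
    """
    stores = json.loads(config_text)["stores"]
    encoded = json.dumps(stores, separators=(",", ":"))
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged
    return f"export DJ_STORES={shlex.quote(encoded)}"


if __name__ == "__main__":
    # Example datajoint.json content, using the non-secret fields from the example above
    config_text = json.dumps({
        "stores": {
            "default": "main",
            "main": {
                "protocol": "s3",
                "endpoint": "s3.amazonaws.com",
                "bucket": "my-bucket",
                "location": "my-project/production",
            },
        }
    })
    print(stores_export_line(config_text))
```

If the block contains `access_key` / `secret_key`, treat the generated line as a secret and keep it out of version control.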

For plugin-registered adapters, the field names are whatever the adapter defines — `token`, `api_key`, `workspace_url`, etc.:

```bash
export DJ_STORES='{
  "uc": {
    "protocol": "databricks",
    "workspace_url": "https://my-workspace.cloud.databricks.com",
    "volume": "main.default.my_volume",
    "token": "dapibd..."
  }
}'
```

### Precedence

`DJ_STORES`, if set, replaces the `stores` block loaded from `datajoint.json` wholesale. The `.secrets/` directory is still read after `DJ_STORES` and fills in any attributes that `DJ_STORES` omits. This lets a deployment keep non-sensitive store settings in `DJ_STORES` while supplying credentials as files under `.secrets/`.

| Source | Priority |
|--------|----------|
| `dj.config["stores"][...]` (programmatic) | 1 (highest) |
| `DJ_STORES` env var | 2 |
| `datajoint.json` `stores` block | 3 |
| `.secrets/stores.<name>.<attr>` files | 4 (fills missing attrs only) |
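The precedence rules can be read as a small merge function. This is a minimal sketch, assuming every source has already been read into plain Python values; `resolve_stores` is hypothetical, not DataJoint's loader, and two details are assumptions: secrets only fill stores that a higher source already declares, and programmatic values merge attribute by attribute.

```python
import copy
import json
from typing import Dict, Optional, Tuple


def resolve_stores(
    file_stores: dict,
    dj_stores_env: Optional[str],
    secrets: Dict[Tuple[str, str], str],
    programmatic: Optional[dict] = None,
) -> dict:
    """Combine store sources following the precedence table (hypothetical sketch)."""
    # Priorities 2 and 3: DJ_STORES, when set, replaces the datajoint.json block wholesale
    if dj_stores_env is not None:
        stores = json.loads(dj_stores_env)
    else:
        stores = copy.deepcopy(file_stores)

    # Priority 4: secrets files only fill attributes that are still missing
    for (name, attr), value in secrets.items():
        store = stores.get(name)
        if isinstance(store, dict):  # assumption: undeclared stores are not created
            store.setdefault(attr, value)

    # Priority 1: programmatic settings override everything loaded above
    for name, settings in (programmatic or {}).items():
        if isinstance(settings, dict) and isinstance(stores.get(name), dict):
            stores[name].update(settings)  # assumption: merge attribute by attribute
        else:
            stores[name] = copy.deepcopy(settings)
    return stores
```

Using `setdefault` for the secrets step is what makes them "fill missing attrs only": an attribute already present from `DJ_STORES` or the file is never overwritten.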

### Errors

If `DJ_STORES` is set but unparsable, DataJoint raises `ValueError` at config load time with the JSON error, rather than failing later with a confusing `KeyError` from a half-loaded store.

```python
ValueError: DJ_STORES contains invalid JSON: Expecting property name enclosed in double quotes...
```
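The fail-fast behavior amounts to decoding the variable eagerly at load time. The sketch below illustrates that idea; `parse_dj_stores` is a hypothetical name, and rejecting a top-level value that is not a JSON object is an assumption of this sketch.

```python
import json


def parse_dj_stores(raw: str) -> dict:
    """Decode DJ_STORES eagerly, surfacing JSON errors as ValueError (sketch)."""
    try:
        stores = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Re-raise with the parser's message so the failure points at DJ_STORES itself
        raise ValueError(f"DJ_STORES contains invalid JSON: {exc}") from exc
    if not isinstance(stores, dict):
        # Assumption: a top-level JSON array or scalar is also rejected
        raise ValueError("DJ_STORES must be a JSON object")
    return stores
```

Passing `'{main: 1}'` (an unquoted key, a common shell-quoting slip) produces the "Expecting property name enclosed in double quotes" message shown above.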

## `DJ_IGNORE_CONFIG_FILE` — skip files entirely

!!! version-added "New in 2.3"
    Set `DJ_IGNORE_CONFIG_FILE=true` to skip `datajoint.json` and the secrets directory.

For env-var-only deployments — Kubernetes pods, Lambda functions, the DataJoint platform — set:

```bash
export DJ_IGNORE_CONFIG_FILE=true
```

When `true`, DataJoint skips:

- the recursive parent-directory search for `datajoint.json`
- the project `.secrets/` directory
- the Docker/Kubernetes `/run/secrets/datajoint/` directory

Only env vars (`DJ_HOST`, `DJ_USER`, `DJ_PASS`, `DJ_STORES`, …) and defaults apply. This guarantees that no stray file in a container image can leak into config.

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `DJ_IGNORE_CONFIG_FILE` | `true`, `1`, `yes` / `false`, `0`, `no` | `false` | Skip file-based config sources |
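A sketch of how the flag's values might be interpreted, following the table. Case-insensitive matching and rejecting unrecognized values are assumptions of this sketch, and `ignore_config_file` / `file_sources` are hypothetical names.

```python
import os
from typing import List, Mapping, Optional

_TRUE = {"true", "1", "yes"}
_FALSE = {"false", "0", "no"}


def ignore_config_file(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret DJ_IGNORE_CONFIG_FILE using the values from the table (sketch)."""
    env = os.environ if environ is None else environ
    raw = env.get("DJ_IGNORE_CONFIG_FILE")
    if raw is None:
        return False  # documented default
    value = raw.strip().lower()  # assumption: matching is case-insensitive
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Assumption: unrecognized values are an error rather than silently ignored
    raise ValueError(f"DJ_IGNORE_CONFIG_FILE has unrecognized value {raw!r}")


def file_sources(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the file-based sources that would still be consulted."""
    if ignore_config_file(environ):
        return []  # only env vars and defaults remain
    return [
        "datajoint.json (parent-directory search)",
        ".secrets/",
        "/run/secrets/datajoint/",
    ]
```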

## `.secrets/stores.<name>.<attr>` accepts any attribute

!!! version-added "New in 2.3"
    Any `.secrets/stores.<name>.<attr>` file loads into `dj.config["stores"][<name>][<attr>]`, not just `access_key` / `secret_key`.

Previously, only `.secrets/stores.<name>.access_key` and `.secrets/stores.<name>.secret_key` were honored. Plugin-registered adapters often need other field names — a Databricks adapter wants a Bearer `token`, an HTTP adapter might want `api_key`, etc.

In 2.3, any file matching `stores.<name>.<attr>` under the secrets directory is loaded:

```
.secrets/
├── stores.uc.token # Databricks Bearer token
├── stores.main.access_key # S3 access key
└── stores.main.secret_key # S3 secret key
```

Config-file values and `DJ_STORES` still take precedence — secrets only fill attributes that are not already set.
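The file-name convention can be sketched with `pathlib`. This is an illustration, not DataJoint's loader: `load_store_secrets` is hypothetical, and it assumes that store and attribute names contain no dots and that surrounding whitespace in a secret file should be stripped.

```python
from pathlib import Path
from typing import Dict


def load_store_secrets(secrets_dir) -> Dict[str, Dict[str, str]]:
    """Collect every stores.<name>.<attr> file into {name: {attr: value}} (sketch).

    Assumption: neither <name> nor <attr> contains a dot, so each matching
    file name splits into exactly three parts.
    """
    found: Dict[str, Dict[str, str]] = {}
    for path in sorted(Path(secrets_dir).glob("stores.*.*")):
        parts = path.name.split(".")
        if len(parts) != 3 or not path.is_file():
            continue  # skip names this sketch cannot map unambiguously
        _, name, attr = parts
        # Strip the trailing newline that editors and `echo` usually add
        found.setdefault(name, {})[attr] = path.read_text().strip()
    return found
```

Any attribute name is accepted, so a plugin's `token` is picked up exactly like an S3 `access_key`.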

## Storage-adapter plugin contract

!!! version-added "New in 2.3"
    The `datajoint.storage` entry-point group is now part of the public API.

DataJoint's built-in `file`, `s3`, `gcs`, and `azure` protocols are themselves `StorageAdapter` subclasses. Third-party packages can register additional protocols by declaring an entry point:

```toml
# pyproject.toml of a plugin package
[project.entry-points."datajoint.storage"]
databricks = "dj_databricks:DatabricksVolumesAdapter"
```

Once installed, the protocol name (`databricks` in the example) is accepted in any `stores.<name>.protocol` field, and DataJoint will use the adapter to construct the underlying `fsspec` filesystem.

See [Storage Adapter API](../reference/specs/storage-adapter-api.md) for the full plugin contract.
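To see which protocol names an environment offers through plugins, you can list the `datajoint.storage` entry-point group with the standard library's `importlib.metadata`. This sketch only inspects installed package metadata; `registered_storage_protocols` is a hypothetical helper, and whether DataJoint's built-in protocols are also registered through this group is not stated here.

```python
from importlib.metadata import entry_points
from typing import Dict


def registered_storage_protocols(group: str = "datajoint.storage") -> Dict[str, str]:
    """Map protocol name to "module:Class" for every installed entry point in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in registered_storage_protocols().items():
        print(f"{name}: {target}")
```

With the plugin package from the TOML example installed, the mapping would contain `databricks` pointing at `dj_databricks:DatabricksVolumesAdapter`.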

## See Also

- [What's New in 2.2](whats-new-22.md) — Previous release (isolated instances, thread-safe mode, graph-driven cascade)
- [Release Notes (v2.3.0)](https://github.com/datajoint/datajoint-python/releases) — GitHub changelog
- [Manage Secrets](../how-to/manage-secrets.md) — Updated for `DJ_STORES` and `DJ_IGNORE_CONFIG_FILE`
- [Configure Object Storage](../how-to/configure-storage.md) — Env-var-only deployments
- [Storage Adapter API](../reference/specs/storage-adapter-api.md) — Plugin contract
- [Configuration Reference](../reference/configuration.md) — Full env-var table
- [datajoint-python PR #1452](https://github.com/datajoint/datajoint-python/pull/1452) — Implementation
40 changes: 40 additions & 0 deletions src/how-to/configure-storage.md
@@ -372,7 +372,47 @@ table.insert1({'session_id': 4, 'recording': '_schema/myschema/...'}) # Error!
- Cannot use reserved sections (configured by `hash_prefix` and `schema_prefix`)
- Can be restricted to specific prefix using `filepath_prefix` configuration

## Configuring stores via environment variables

!!! version-added "New in 2.3"
    `DJ_STORES` carries a JSON-encoded copy of the `stores` dict for env-var-only deployments (Kubernetes pods, Lambda, the DataJoint platform). Combined with `DJ_IGNORE_CONFIG_FILE=true`, it removes the need for any file on disk.

The JSON shape is identical to the `stores` block of `datajoint.json`:

```bash
export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "my-bucket",
    "location": "my-project/production",
    "access_key": "AKIA...",
    "secret_key": "wJal..."
  }
}'
```

For plugin-registered adapters, declare whatever fields the adapter requires:

```bash
export DJ_STORES='{
  "uc": {
    "protocol": "databricks",
    "workspace_url": "https://my-workspace.cloud.databricks.com",
    "volume": "main.default.my_volume",
    "token": "dapibd..."
  }
}'
```

`DJ_STORES`, when set, replaces the `stores` block loaded from `datajoint.json`. The `.secrets/` directory is still read afterward and fills in any attribute that `DJ_STORES` omits, so you can keep non-sensitive store settings in `DJ_STORES` and supply credentials as files under `.secrets/`.
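When the pipeline is started from Python (a CI job or a test harness, for example), building the value with `json.dumps` avoids hand-quoting JSON in the shell. A sketch: `run_with_stores` is a hypothetical helper, and the `-c` probe stands in for your real pipeline command.

```python
import json
import os
import subprocess
import sys


def run_with_stores(stores: dict, argv: list) -> subprocess.CompletedProcess:
    """Run a child process whose store config comes only from env vars (sketch)."""
    env = dict(os.environ)
    # json.dumps always emits valid JSON, so no shell quoting is involved
    env["DJ_STORES"] = json.dumps(stores)
    env["DJ_IGNORE_CONFIG_FILE"] = "true"
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    stores = {
        "default": "main",
        "main": {
            "protocol": "s3",
            "endpoint": "s3.amazonaws.com",
            "bucket": "my-bucket",
            "location": "my-project/production",
        },
    }
    # Stand-in child process that just reports the default store it would use
    probe = "import json, os; print(json.loads(os.environ['DJ_STORES'])['default'])"
    print(run_with_stores(stores, [sys.executable, "-c", probe]).stdout.strip())
```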

See [Manage Secrets](manage-secrets.md#env-var-only-deployments) for credential hygiene and [Storage Adapter API](../reference/specs/storage-adapter-api.md) for the plugin contract.

## See Also

- [Use Object Storage](use-object-storage.md) — When and how to use object storage
- [Manage Large Data](manage-large-data.md) — Working with blobs and objects
- [Manage Secrets](manage-secrets.md) — Credential hygiene, env-var-only deployments
- [Storage Adapter API](../reference/specs/storage-adapter-api.md) — Plugin contract for third-party storage protocols *(new in 2.3)*
82 changes: 66 additions & 16 deletions src/how-to/manage-secrets.md
@@ -17,12 +17,12 @@ DataJoint separates configuration into sensitive and non-sensitive components:
DataJoint loads configuration in this priority order (highest to lowest):

1. **Programmatic settings** — `dj.config['key'] = value`
2. **Environment variables** — `DJ_HOST`, `DJ_USER`, etc.
3. **Secrets directory** — `.secrets/datajoint.json`, `.secrets/stores.*`
4. **Project configuration** — `datajoint.json`
2. **Environment variables** — `DJ_HOST`, `DJ_USER`, `DJ_STORES`, etc.
3. **Project configuration** — `datajoint.json`
4. **Secrets directory** — `.secrets/stores.<name>.<attr>` (fills attributes the file/env didn't already set)
5. **Default values** — Built-in defaults

Higher priority sources override lower ones.
Higher priority sources override lower ones. Set `DJ_IGNORE_CONFIG_FILE=true` *(new in 2.3)* to skip both `datajoint.json` and the secrets directory entirely — see [Env-var-only deployments](#env-var-only-deployments) below.

## `.secrets/` Directory Structure

@@ -37,10 +37,14 @@ project/
│ ├── stores.main.access_key # S3/cloud storage credentials
│ ├── stores.main.secret_key
│ ├── stores.archive.access_key
│ └── stores.archive.secret_key
│ ├── stores.archive.secret_key
│ └── stores.uc.token # any stores.<name>.<attr> (new in 2.3)
└── ...
```

!!! version-added "New in 2.3"
    Any `stores.<name>.<attr>` file is loaded, not only `access_key` / `secret_key`. Plugin-registered storage adapters (e.g. a Databricks Bearer-token adapter) can define their own field names; see [Storage Adapter API](../reference/specs/storage-adapter-api.md).

**Critical:** Add `.secrets/` to `.gitignore`:

```gitignore
@@ -163,13 +167,29 @@ wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

#### Alternative: Environment Variables

For cloud deployments:
!!! version-added "New in 2.3"
    `DJ_STORES` carries a JSON-encoded copy of the entire `stores` dict, in the same shape as `datajoint.json`. It replaces the `stores` block from the file; `.secrets/stores.<name>.<attr>` files still fill in attributes that `DJ_STORES` omits.

For cloud deployments, put the entire `stores` block in a single env var:

```bash
export DJ_STORES_MAIN_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE
export DJ_STORES_MAIN_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "my-bucket",
    "location": "my-project/data",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}'
```

For plugin-registered adapters, the field names are whatever the adapter declares — `token`, `api_key`, `workspace_url`, etc. See [Storage Adapter API](../reference/specs/storage-adapter-api.md).

If `DJ_STORES` contains invalid JSON, DataJoint raises `ValueError` at config-load time with the JSON parser's error message.

## Environment Variable Reference

### Database Connections
@@ -180,16 +200,42 @@ export DJ_STORES_MAIN_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
| `database.port` | `DJ_PORT` | Database port (default: 3306) |
| `database.user` | `DJ_USER` | Database username |
| `database.password` | `DJ_PASS` | Database password |
| `database.use_tls` | `DJ_TLS` | Use TLS encryption (true/false) |
| `database.use_tls` | `DJ_USE_TLS` | Use TLS encryption (true/false) |

### Object Stores

| Pattern | Example | Description |
|---------|---------|-------------|
| `DJ_STORES_<NAME>_ACCESS_KEY` | `DJ_STORES_MAIN_ACCESS_KEY` | S3 access key ID |
| `DJ_STORES_<NAME>_SECRET_KEY` | `DJ_STORES_MAIN_SECRET_KEY` | S3 secret access key |
| Variable | Description |
|----------|-------------|
| `DJ_STORES` | JSON-encoded `stores` dict (same shape as `datajoint.json`). Replaces the `stores` block from the file. *(new in 2.3)* |

### Config-Source Control

| Variable | Default | Description |
|----------|---------|-------------|
| `DJ_IGNORE_CONFIG_FILE` | `false` | If `true`, skip `datajoint.json`, the project `.secrets/`, and `/run/secrets/datajoint/`. Only env vars and defaults apply. *(new in 2.3)* |

## Env-var-only deployments

!!! version-added "New in 2.3"
    `DJ_IGNORE_CONFIG_FILE=true` plus `DJ_STORES` gives a deployment a hard guarantee that no file on disk contributes to config; only env vars do. This is how the DataJoint platform configures pipelines.

**Note:** `<NAME>` is the uppercase store name with `_` replacing special characters.
For Kubernetes, Lambda, the DataJoint platform, or any deployment where the container image must not carry configuration:

```bash
export DJ_IGNORE_CONFIG_FILE=true
export DJ_HOST=db.example.com
export DJ_USER=$(vault read -field=username secret/datajoint)
export DJ_PASS=$(vault read -field=password secret/datajoint)
export DJ_STORES="$(vault read -format=json -field=stores secret/datajoint)"
```

With `DJ_IGNORE_CONFIG_FILE=true`, DataJoint skips:

- the recursive parent-directory search for `datajoint.json`
- the project `.secrets/` directory
- the Docker/Kubernetes `/run/secrets/datajoint/` directory

Only env vars (`DJ_HOST`, `DJ_USER`, `DJ_PASS`, `DJ_STORES`, …) and built-in defaults apply. No file under any parent directory of the working directory can contribute to config.
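To fail fast when a variable is missing, a container entrypoint can check the environment before DataJoint is imported. This is a hypothetical preflight, not a DataJoint feature; the `REQUIRED` tuple mirrors the variables in the example above and should be adjusted per deployment.

```python
import json
import os
from typing import List, Mapping, Optional

# Assumption: the variables this particular deployment relies on
REQUIRED = ("DJ_HOST", "DJ_USER", "DJ_PASS", "DJ_STORES")


def preflight(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return problems with an env-var-only configuration (empty list means OK)."""
    env = os.environ if environ is None else environ
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    if env.get("DJ_IGNORE_CONFIG_FILE", "").strip().lower() not in {"true", "1", "yes"}:
        problems.append("DJ_IGNORE_CONFIG_FILE is not enabled; files may still contribute")
    raw = env.get("DJ_STORES")
    if raw:
        try:
            if not isinstance(json.loads(raw), dict):
                problems.append("DJ_STORES is not a JSON object")
        except json.JSONDecodeError as exc:
            problems.append(f"DJ_STORES is invalid JSON: {exc}")
    return problems
```

Call `preflight()` at start-up and exit non-zero if it returns anything, so a misconfigured pod fails before it touches the database.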

## Security Best Practices

@@ -221,8 +267,12 @@ chmod 600 .secrets/datajoint.json
# Use environment variables from secure sources
export DJ_USER=$(vault read -field=username secret/datajoint/db)
export DJ_PASS=$(vault read -field=password secret/datajoint/db)
export DJ_STORES_MAIN_ACCESS_KEY=$(vault read -field=access_key secret/datajoint/s3)
export DJ_STORES_MAIN_SECRET_KEY=$(vault read -field=secret_key secret/datajoint/s3)

# Stores: one JSON-encoded env var (new in 2.3)
export DJ_STORES="$(vault read -format=json -field=stores secret/datajoint)"

# Optional: guarantee no file on disk contributes to config (new in 2.3)
export DJ_IGNORE_CONFIG_FILE=true
```

### CI/CD Environment
27 changes: 25 additions & 2 deletions src/reference/configuration.md
@@ -144,9 +144,13 @@ If table lacks partition attributes, it follows normal path structure.
├── stores.main.access_key
├── stores.main.secret_key
├── stores.archive.access_key
└── stores.archive.secret_key
├── stores.archive.secret_key
└── stores.uc.token # any stores.<name>.<attr> file (new in 2.3)
```

!!! version-added "New in 2.3"
    Any `stores.<name>.<attr>` file is loaded, not only `access_key` / `secret_key`. This supports plugin-registered adapters with arbitrary field names (e.g. a Bearer `token`). See [Storage Adapter API](specs/storage-adapter-api.md).

## Jobs Settings

| Setting | Default | Description |
@@ -178,6 +182,8 @@ If table lacks partition attributes, it follows normal path structure.
| `cache` | — | `None` | Path for query result cache |
| `query_cache` | — | `None` | Path for compiled query cache |
| `download_path` | — | `.` | Download location for attachments/filepaths |
| `stores` | `DJ_STORES` | `{}` | JSON-encoded `stores` dict. Replaces the `stores` block from `datajoint.json`. *(new in 2.3)* |
| `ignore_config_file` | `DJ_IGNORE_CONFIG_FILE` | `False` | Skip `datajoint.json` and the secrets directory. *(new in 2.3)* |

## Example Configuration

@@ -243,9 +249,26 @@ export DJ_HOST=mysql.example.com
export DJ_USER=analyst
export DJ_PASS=secret
export DJ_DATABASE_NAME=my_database # PostgreSQL only (new in 2.2.1)

# Stores (new in 2.3) — JSON-encoded copy of the stores block
export DJ_STORES='{
  "default": "main",
  "main": {
    "protocol": "s3",
    "endpoint": "s3.amazonaws.com",
    "bucket": "datajoint-bucket",
    "location": "neuroscience-lab/production",
    "access_key": "AKIA...",
    "secret_key": "wJal..."
  }
}'

# Skip datajoint.json and .secrets/ entirely (new in 2.3)
export DJ_IGNORE_CONFIG_FILE=true
```

**Note:** Per-store credentials must be configured in `datajoint.json` or `.secrets/` — environment variable overrides are not supported for nested store configurations.
!!! version-added "New in 2.3"
    `DJ_STORES` carries a JSON-encoded copy of the `stores` block. `DJ_IGNORE_CONFIG_FILE=true` skips `datajoint.json`, the project `.secrets/`, and `/run/secrets/datajoint/`, which is useful for env-var-only deployments (Kubernetes pods, the DataJoint platform). See [Manage Secrets](../how-to/manage-secrets.md#env-var-only-deployments).

## Programmatic Access
