Settle the mean-vs-median prediction-centering convention (PEtab v2 median vs legacy mean; reconcile the inconsistent per-family defaults) #424

## Problem

PyBNF's **prediction-centering** convention, i.e. whether the deterministic model prediction is interpreted as the **mean** or the **median** of the noise distribution (CONTEXT.md "Prediction Centering"; ADR-0011 location axis, ADR-0024 native surface), is currently **inconsistent across families and ambiguous by default**. The tension below should be settled as policy before more capability lands (#419):

- **PEtab v2 hardcodes the median** for every noise model. The exporter-first / importer dogfood (ADR-0023/0025/0026) needs a clear, defensible median story.
- **Backward compatibility** pulls toward the legacy interpretations (the original `chi_sq` is mean-on-linear; `lognormal_var` was median-on-log; etc.).
- The current state defaults *some* families to mean and *others* to median — a confusing mess that we should resolve deliberately rather than per-PR.

This issue is for **discussion of the go-forward convention** (and how to keep legacy configs working). It deliberately does **not** prescribe the answer; it gathers the evidence and the options. #419 (implement mean *and* median for every family) is the **capability**; this issue is the **policy** that decides the defaults and the config surface — #419's default/surface choices should gate on the outcome here.

## Current state (the inconsistency, precisely)

Legacy objfuncs (each hardcodes its centering):

| objfunc | noise model | centering |
|---|---|---|
| `chi_sq`, `chi_sq_dynamic` | `Gaussian()` (LINEAR scale) | **mean** |
| `lognormal` | `Gaussian(LOG10, MEDIAN)` | **median** |
| `laplace` | `Laplace()` (LINEAR scale) | **median** |
| `neg_bin`, `neg_bin_dynamic` | `NegBinomial()` | **mean** (prediction *is* the mean) |

Native `noise_model` family tokens (`_NOISE_FAMILIES`) and class defaults:

| token / class | default centering |
|---|---|
| `normal` / `gaussian`, `Gaussian.__init__` | **mean** |
| `lognormal` (Gaussian on LOG10) | **median** |
| `laplace`, `Laplace.__init__` | **median** |
| `neg_bin`, `NegBinomial` | **mean** |

Global `noise_location` key (ADR-0024): optional and unset by default, in which case resolution falls through to each family's class default (i.e. it inherits the inconsistency).

Observations:
- **Two location-scale families ship opposite defaults**: `Gaussian` → mean, `Laplace` → median.
- The code comments (`location.py`, `_NOISE_LOCATIONS` in `objective.py`) state that median is the default "consistent with PEtab v2", but `Gaussian`/`chi_sq` actually default to **mean**, so the comments and the code disagree on what "the default" is.
- On a **linear** scale the choice is invisible (mean = median for these symmetric families), so the inconsistency is latent today and only surfaces on **log scales** (lognormal, a future log-Laplace) and for **`neg_bin`** (asymmetric count family).
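
To make the log-scale point concrete, here is a minimal sketch of how far apart mean and median sit for a Gaussian on a log10 scale. It uses only the standard lognormal identity (mean = median · exp(σ²/2), with σ in natural-log units); the function names are hypothetical and are not PyBNF API.

```python
import math


def lognormal10_mean_and_median(mu_log10: float, sigma_log10: float) -> tuple[float, float]:
    """For log10(y) ~ Normal(mu_log10, sigma_log10), return (mean, median) of y."""
    median = 10.0 ** mu_log10
    # Convert sigma from log10 units to natural-log units, then apply the
    # standard lognormal relation mean = median * exp(sigma_ln^2 / 2).
    sigma_ln = sigma_log10 * math.log(10.0)
    mean = median * math.exp(0.5 * sigma_ln ** 2)
    return mean, median


def mu_log10_for_prediction(prediction: float, sigma_log10: float, centering: str) -> float:
    """Log10-scale location that makes `prediction` the mean or the median of y."""
    if centering == "median":
        return math.log10(prediction)
    if centering == "mean":
        sigma_ln = sigma_log10 * math.log(10.0)
        # Shift the location down so that the resulting mean equals `prediction`.
        return math.log10(prediction) - 0.5 * sigma_ln ** 2 / math.log(10.0)
    raise ValueError(f"unknown centering: {centering!r}")


if __name__ == "__main__":
    for centering in ("median", "mean"):
        mu = mu_log10_for_prediction(100.0, 0.3, centering)
        mean, median = lognormal10_mean_and_median(mu, 0.3)
        print(f"{centering:>6}-centered: mu_log10={mu:.4f} mean={mean:.2f} median={median:.2f}")
```

The two centerings give different location parameters for the same prediction and the same sigma, which is why a silent default flip shifts the likelihood on log scales but not on a linear symmetric family.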

## Why it matters

- **PEtab v2 interop**: a round-trip export/import (the dogfood goal) must agree with PEtab's median convention, or silently shift the likelihood.
- **Legacy reproducibility**: users porting old `.conf` files expect their fits unchanged; flipping a default changes results on log/count models.
- **Clarity**: a single, documented convention (plus an explicit escape hatch) replaces "which family am I, and what does it happen to default to?"

## Options to weigh (for discussion)

- **A. Median-everywhere default (PEtab v2-aligned)** + explicit `location = mean` opt-in. Cleanest forward story; **changes** legacy behavior wherever mean ≠ median (log-scale Gaussian if anyone used a mean-centered log model; `neg_bin`).
- **B. Freeze the current per-family defaults**, require nothing, just document. Zero behavior change; preserves the inconsistency permanently.
- **C. A config-level convention switch** (e.g. `centering_convention = petab | legacy`, or piggyback on a broader `petab_compat` mode) that *selects the default family-by-family*: new PEtab v2 configs get median-everywhere, legacy configs are byte-identical to today. This is the "easily tell new-vs-old behavior" idea — it isolates the breaking change behind an explicit opt-in and lets a config self-declare its era.
- **D. No implicit default where it matters**: make `location` **mandatory-explicit** for any (family × scale) where mean ≠ median (all log scales, `neg_bin`), so ambiguity can never resolve silently; keep the implicit default only where it's a provable no-op (linear symmetric families).

(These are not mutually exclusive — e.g. C + D: a convention switch *and* an explicit-required rule for the genuinely ambiguous cases.)
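
As a discussion aid only, the options can be sketched as one resolution function. Everything here is hypothetical: `resolve_location`, the `convention` values, and the `require_explicit` flag are illustrative names, not existing PyBNF config keys. The legacy table mirrors the per-family defaults listed above, and the ambiguous set is simplified to family tokens (the real rule in option D is per family × scale).

```python
from typing import Optional

# Today's per-family defaults, copied from the "Current state" tables.
LEGACY_DEFAULTS = {
    "gaussian": "mean",
    "lognormal": "median",  # Gaussian on LOG10
    "laplace": "median",
    "neg_bin": "mean",
}

# Simplification: families where mean != median, so the choice matters.
AMBIGUOUS_FAMILIES = {"lognormal", "neg_bin"}


def resolve_location(
    family: str,
    explicit: Optional[str] = None,
    convention: str = "legacy",
    require_explicit: bool = False,
) -> str:
    """Return "mean" or "median" for a family under a hypothetical policy."""
    if explicit is not None:
        if explicit not in ("mean", "median"):
            raise ValueError(f"unknown location: {explicit!r}")
        return explicit  # an explicit per-model setting always wins
    if require_explicit and family in AMBIGUOUS_FAMILIES:
        # Option D: never resolve silently where mean != median.
        raise ValueError(f"location must be set explicitly for {family!r}")
    if convention == "petab":
        return "median"  # Options A / C (PEtab v2 mode): median everywhere
    if convention == "legacy":
        return LEGACY_DEFAULTS[family]  # Options B / C (legacy mode)
    raise ValueError(f"unknown convention: {convention!r}")
```

Under this sketch, option B is `convention="legacy"` with no switch exposed, option A is `convention="petab"` hardwired, option C exposes `convention` in the config, and option D corresponds to `require_explicit=True`.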

## Backward-compat analysis (what actually changes)

- **Linear-scale Gaussian/Laplace** (`chi_sq`, `laplace`, `sos`-adjacent): nothing changes under any option (symmetric).
- **Log-scale Gaussian** (`lognormal`): already median; stays median under A/C. Only a (currently non-existent) mean-centered log model would move.
- **`neg_bin`**: defaults to mean today; option A would flip it to median (a real change to the count likelihood). This is the concrete decision flagged in #419.
- A `.conf` era switch (C) makes all of the above opt-in, so no existing file changes unless it declares PEtab-v2 mode.
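
For the `neg_bin` case, a small sketch shows why the flip is a real change: for a negative binomial, the mean and the median are generally different numbers. This assumes a (mean, dispersion `r`) parameterization with `p = r / (r + mean)`, which may not match how PyBNF's `NegBinomial` is parameterized; the helper names are hypothetical.

```python
import math


def neg_bin_pmf(k: int, mean: float, r: float) -> float:
    """P(X = k) for a negative binomial with the given mean and dispersion r (mean > 0)."""
    p = r / (r + mean)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def neg_bin_median(mean: float, r: float) -> int:
    """Smallest k whose cumulative probability reaches 0.5."""
    cdf = 0.0
    k = 0
    while True:
        cdf += neg_bin_pmf(k, mean, r)
        if cdf >= 0.5:
            return k
        k += 1


if __name__ == "__main__":
    mean, r = 10.0, 2.0
    print("mean:", mean, "median:", neg_bin_median(mean, r))
```

If the model prediction were reinterpreted as the median instead of the mean, the implied distribution (and hence the likelihood of the same count data) would change, which is the behavior change option A would introduce for existing `neg_bin` fits.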

## Acceptance / outcome

A written decision (ADR) that fixes: (1) the go-forward default per (family × scale), (2) the backward-compat mechanism (and whether `.conf` should carry an explicit era/convention marker), (3) the doc/code reconciliation so "the default" means one thing. #419 then implements the capability under that convention.

Relevant ADRs: **0011** (location axis), **0024** (native `location` surface + global `noise_location` + the "median default" intent), **0021** (per-observable noise), **0023/0025/0026** (PEtab v2 interop). Related: **#419** (capability), and the per-observable noise work.

