EMC2/tmp_patch.txt at dev · Howchie/EMC2 · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
*** Begin Patch
*** Add File: docs/OU_Volterra_chunked_vs_spline.md
# OU Volterra: Chunked/Grid vs Vec/Spline PDF — Current State and Fix Plan

This note documents the current behavior and discrepancy between the chunked/grid OU forward solver and the single-t (vec/spline) solver, and proposes a concrete plan to fix accuracy while preserving speed. This is design-only — no code changes are made here.

## Summary

- Function of interest: `ou_fht_pdf_vec_grid_chunked(...)` (chunked/grid path).
- Reference solver: `ou_fht_pdf_vec(...)` and `ou_fht_pdf_spline(...)` (single-t, spline path).
- Observed issue: For identical parameters and evaluating directly at `t`, the chunked function returns a much smaller PDF (e.g., ~7.3e-05) while the single-t solver returns a plausible value (~0.075). This indicates a systematic error in the chunked path, not just resolution differences.

## Where the two paths differ (mechanically)

1. Kernel diagonal K(?,?) for time-varying boundaries
   - Single-t spline path computes ß'(?) at the exact ? (often via spline derivative or a robust central difference on the raw boundary), then plugs it into the K(?,?) limit.
   - Chunked path historically relied on cached/nearest-node derivatives tied to a coarser, non-uniform grid — this is the most sensitive spot and a common source of bias.

2. ß(?) representation and alignment
   - Single-t path builds a ß-spline aligned to the solver’s ? nodes.
   - Chunked path sometimes builds ß on a different grid (time-parametrized or uniform ? unrelated to the chunking), then evaluates on the chunked grid. Misalignment can degrade accuracy.

3. Endpoint derivative of ? (right-end limit in the final integral)
   - Single-t uses a local quadratic fit or spline derivative at ?_eval/?_max.
   - Chunked may rely on a local fit on a sparser last panel — small inconsistencies here can impact the endpoint (singular) limit used in the PDF integral.

4. Final integral panel count (`M`)
   - Single-t uses an adaptive `M` that scales with the evaluation abscissa.
   - Chunked often uses a large or fixed `M` (sometimes equal to N), which can either be wasteful or misbalanced.

5. Physical vs scaled boundary choices
   - The only correct horizon clamp is on the physical boundary B(t), not on the scaled ß(?). Any clamp in scaled coordinates can spuriously zero out mass.

## The “Gemini” chunked function (guide)

You provided an alternative chunked implementation that:

- Builds a ß-spline on a sufficiently dense ? grid up to ?_max (derived from t_max),
- Uses a robust central difference on raw ß for the kernel diagonal K(?,?),
- Uses the spline for off-diagonals and for `g_term` and Abel seeding,
- Interpolates ? via spline on the chunked ? grid,
- Uses an adaptive derivative override at the right endpoint in the final integral,
- Scales the final integral panel count to the evaluation location.

Empirically, that version matches the single-t solver closely, but can be slower than the previous cache-based implementation.

## Root-cause hypothesis for current discrepancy

The chunked path’s main error source is the kernel diagonal K(?,?) in the TV case, where ß'(?) is required:

- Using a cached/nearest-node derivative on a non-uniform chunked grid can be badly biased, especially where B(t) changes rapidly.
- Off-diagonal kernel terms (which use ß differences) are less sensitive because the spline smooths error and the singular contribution is absent.
- Secondary contributions come from the endpoint derivative of ? and from over/under-resolving the final integral.

## Plan to achieve Gemini-level accuracy with current performance

1) Diagonal derivative: on-the-fly central difference at the exact ?

- Keep ß(?) spline for off-diagonal terms, but compute ß'(?) at K(?,?) using a small central difference on the raw boundary:
  - `d = clamp(1e-4·(1+|?|), 1e-7, 1e-3)`.
  - `ß'(?) ˜ [ß_raw(?+d) - ß_raw(?-d)] / (2d)` with ? limited to [0, ?_max].
- This mirrors the reference single-t behavior where the diagonal is most sensitive, without changing the fast off-diagonal path.
- Performance: to avoid per-panel overhead while solving ?, precompute a `beta_prime_diag[j]` only once per chunked ? grid (same grid used to solve ?). Use the cached vector during the ? solve; compute on-the-fly only at final evaluation points if needed.

2) Align ß-spline to the chunked ? grid

- Build ß-spline on the exact chunked ? nodes (the same grid used for ?) to avoid cross-grid interpolation errors and reduce spline call counts.
- If the chunked grid is extremely short (rare), fall back to a tiny uniform ? grid with =5 nodes for Akima (satisfies spline requirements without impacting general performance).

3) Adaptive panel count for the final integral

- Use `M_eval = clamp(80, 240, 2*j)` where `j` is the index (or upper_bound position) of `?_eval` in the solver grid.
- This preserves accuracy where needed and avoids N-scaled work that does not improve fidelity.
- Micro-optimization: precompute ß(?) at the temporary ? points in the final integral once per evaluation and pass them to the integrand to cut spline calls by ~2–3×.

4) ? endpoint handling and singular panel

- For the last panel, use the exact right-end limit for f(?') with `?'(?)` obtained via a local quadratic fit or the ?-spline derivative at ?_eval/?_max.
- Reuse the existing `local_quad_derivative` pattern already present in the code to remain consistent with single-t.

5) Physical boundary clamp (optional safety)

- If clamping is desired to cut horizon cost, compute the clamp time from the physical boundary `B(t)` (not ß). Only return 0 for `t > t_root(B)`; do not clamp using the scaled ß(?).
- This change is optional and should not be enabled by default unless profiling shows a clear benefit without changing the semantics users expect.

## Expected outcome

- Accuracy: With (1) diagonal ß' via central difference and (2) ?-grid alignment for ß-spline, chunked results should match the single-t solver to within ~1e-4 to 1e-3 (typical across regimes that were problematic).
- Performance: Precomputing `beta_prime_diag` on the chunked grid amortizes the cost; adaptive `M_eval` and precomputed ß values in the integral preserve or improve runtime vs current.

## Validation protocol

1. Numerical checks
   - Compare `ou_fht_pdf_vec_grid_chunked` vs `ou_fht_pdf_vec` and `ou_fht_pdf_spline` across:
     - Multiple `t` in [t_min, t_max] including early, mid, and late times,
     - Boundary shapes (flat, monotone decays, near-zero tails),
     - Different chunking settings (ratio, base_panels, with/without early prelude).

2. Diagnostics (temporary)
   - Log for a handful of points: `K(?,?)`, ß'(?) method, `?(?)`, `?'(?)`, and final integral `M_eval` to confirm parity with single-t.

3. Acceptance criteria
   - Max relative error between chunked and single-t = 1e-3 (typical), with no pathological underflow.
   - Runtime within ±10–20% of current chunked performance on representative workloads.

## Implementation checklist (when ready)

- [ ] Build ß on the chunked ? grid and cache `beta_prime_diag[j]` via raw central difference (one-time per solve grid).
- [ ] Use cached `beta_prime_diag` only for K(?,?); keep off-diagonals on the ß-spline.
- [ ] Switch final integral to adaptive `M_eval`; precompute ß at temporary ? points per evaluation to reduce spline calls.
- [ ] Keep ? endpoint derivative consistent with single-t (local quadratic or spline derivative).
- [ ] (Optional) Add physical boundary clamp by B(t), not ß(?), guarded behind a flag.
- [ ] Add temporary diagnostics and remove once validated.

## Notes

- Do not change the OU function(s) now. This document captures a focused path to obtain the accuracy demonstrated by the reference (Gemini) chunked function while retaining the speed characteristics of the current implementation.

*** End Patch