Skip to content

dlasd2/slasd2: Increase deflation tolerance to match dlasd7/slasd7#1286

Merged
langou merged 1 commit into
Reference-LAPACK:masterfrom
jschueller:issue255
Jun 13, 2026
Merged

dlasd2/slasd2: Increase deflation tolerance to match dlasd7/slasd7#1286
langou merged 1 commit into
Reference-LAPACK:masterfrom
jschueller:issue255

Conversation

@jschueller

Copy link
Copy Markdown
Contributor

Fix DBDSDC/SBDSDC returning non-orthogonal U/V for bidiagonal matrices with many nearly-equal singular values (e.g., all singular values ≈ 1).

The divide-and-conquer bidiagonal SVD has two code paths:

  • Full vector path (DLASD2/SLASD2): deflation tolerance = 8 * EPS
  • Compact path (DLASD7/SLASD7): deflation tolerance = 64 * EPS

With the weaker tolerance (8EPS), singular values differing by only ~10EPS (e.g., ~2e-15 for double precision) escape deflation. The subsequent Z computation in DLASD3 then suffers catastrophic cancellation from denominators (σ_i - σ_j) that are tiny, polluting the singular vectors and causing loss of orthogonality.

Raise DLASD2 and SLASD2 to 64*EPS, matching DLASD7 and SLASD7, so more close singular values are deflated and the singular-vector computation remains stable.

Closes #255

Fix DBDSDC/SBDSDC returning non-orthogonal U/V for bidiagonal matrices
with many nearly-equal singular values (e.g., all singular values ≈ 1).

The divide-and-conquer bidiagonal SVD has two code paths:
- Full vector path (DLASD2/SLASD2): deflation tolerance = 8 * EPS
- Compact path  (DLASD7/SLASD7): deflation tolerance = 64 * EPS

With the weaker tolerance (8*EPS), singular values differing by only
~10*EPS (e.g., ~2e-15 for double precision) escape deflation.  The
subsequent Z computation in DLASD3 then suffers catastrophic
cancellation from denominators (σ_i - σ_j) that are tiny, polluting the
singular vectors and causing loss of orthogonality.

Raise DLASD2 and SLASD2 to 64*EPS, matching DLASD7 and SLASD7, so more
close singular values are deflated and the singular-vector computation
remains stable.

Closes Reference-LAPACK#255
@langou langou merged commit fbd4146 into Reference-LAPACK:master Jun 13, 2026
12 checks passed
@jschueller jschueller deleted the issue255 branch June 13, 2026 10:04
ilayn added a commit to ilayn/semicolon-lapack that referenced this pull request Jun 21, 2026
… #1286)

Raise the deflation tolerance from 8*eps to 64*eps so near-equal singular values are deflated, preventing loss of orthogonality in the divide-and-conquer bidiagonal SVD. Port of Reference-LAPACK 0e0f2359 (Reference-LAPACK/lapack#1286).
ilayn added a commit to ilayn/semicolon-lapack that referenced this pull request Jun 21, 2026
… #1286)

Raise the deflation tolerance from 8*eps to 64*eps so near-equal singular values are deflated, preventing loss of orthogonality in the divide-and-conquer bidiagonal SVD. Port of Reference-LAPACK 0e0f2359 (Reference-LAPACK/lapack#1286).
@martin-frbg

Copy link
Copy Markdown
Collaborator

One of OpenBLAS' codspeed benchmarks in CI is flagging a large change in the result of an SGESDD call (via comparing the original to a reconstruction of the source matrix from its computed diagonal) with

E       Max absolute difference among violations: 2.336502e-05
E       Max relative difference among violations: 0.01933236

where our thresholds are 1e-5 for absolute and 1.e-7 for relative difference - note the relative difference hitting 1.e-2 - the CI job terminated there before trying DGESDD, but its results would probably be similarly "off".

Is this magnitude of difference to previous LAPACK behaviour expected for the SVD of a "benign" 1000-by-222 matrix of random numbers ? Note the python code for the benchmark lives at https://github.com/OpenMathLib/OpenBLAS/blob/develop/benchmark/pybench/

@jschueller

Copy link
Copy Markdown
Contributor Author

I ran reference LAPACK SGESDD on a 1000x222 random matrix (uniform [0,1)) using both tolerance values (8*EPS and 64*EPS) to compare:

1. The tolerance change makes no difference for well-conditioned random matrices.
Both old and new tolerances produced identical singular values and reconstruction error on the same matrix. This is expected — the deflation tolerance only matters when singular values are very close (difference < 64*EPS), which rarely happens for random uniform matrices with condition number ~2.7.

2. The atol=1e-5 threshold is too tight for SGESDD on 1000x222 in single precision.
I tested 50 different random seeds. All 50 fail assert_allclose(u @ diag(s) @ vt, a, atol=1e-5) — the max absolute reconstruction error is routinely 2–3e-5 regardless of the tolerance setting. For single precision (EPS ≈ 6e-8), the D&C SVD error bound scales as O(N * EPS) * ||A||, which for N=222 gives an expected error floor around 222 * 6e-8 ≈ 1.3e-5. The benchmark passes for (100, 5) but is borderline for (1000, 222).

3. The 1.9% "relative difference" is from elements near zero.
2.3e-5 / 0.019 ≈ 0.0012 — the worst relative error occurs for matrix elements where |a_ij| ~ 0.001. This is benign floating-point behavior for single-precision SVD.

4. The tolerance change fixes a real catastrophic bug (issue #255).
With 8*EPS, matrices having many nearly-equal singular values (e.g., all ≈ 1) produce U with orthogonality error ≈ 1 (complete failure). The fix raises this to 64*EPS, matching the compact path (DLASD7/SLASD7) that already uses this tolerance and has worked correctly.

The CodSpeed benchmark's atol={'s': 1e-5} is likely too tight for this matrix size in single precision. The 1.9% "change" in the CodSpeed metric is normal numerical churn — the tolerance change does not regress accuracy; it only affects matrices with near-equal singular values where the old code catastrophically failed. The benchmark threshold may need to be relaxed (e.g., atol={'s': 1e-4}) or the (1000, 222) case may need rtol alongside atol.

@mgates3

mgates3 commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Should a similar change be made in the eigenvalue D&C code?
It appears to have a slightly different formula, including Z.

SRC> grep 'TOL *=' dlaed*.f  # eig
dlaed2.f:      TOL = EIGHT*EPS*MAX( ABS( D( JMAX ) ), ABS( Z( IMAX ) ) )
dlaed8.f:      TOL = EIGHT*EPS*ABS( D( JMAX ) )

SRC> grep 'TOL *=' dlasd*.f  # svd, updated by this PR
dlasd2.f:      TOL = EIGHT*EIGHT*EPS*MAX( ABS( D( N ) ), TOL )
dlasd7.f:      TOL = EIGHT*EIGHT*EPS*MAX( ABS( D( N ) ), TOL )

(Output edited for brevity.)

@jschueller

Copy link
Copy Markdown
Contributor Author

good catch: #1317

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DBDSDC returns non-orthogonal U and V for a specific matrix

4 participants