diff --git a/FLOAT32_PRECISION_ANALYSIS.md b/FLOAT32_PRECISION_ANALYSIS.md new file mode 100644 index 0000000..e479082 --- /dev/null +++ b/FLOAT32_PRECISION_ANALYSIS.md @@ -0,0 +1,262 @@ +# Float32 Precision Issue Analysis + +## 報告された問題 + +float32への丸め込みで偽陰性(false negative)で重なりが検出されないという報告。 + +## 調査結果 + +### コードベースの分析 + +#### 1. 内部表現 +- すべてのバウンディングボックスは内部的に`Real`型(=`float`)で保存される +- `include/prtree/core/detail/bounding_box.h:17`で定義: `using Real = float;` + +#### 2. 精度補正メカニズム(float64入力時のみ) + +**float64入力の場合** (`include/prtree/core/prtree.h:213-291`): +```cpp +// Constructor for float64 input (float32 tree + double refinement) +PRTree(const py::array_t &idx, const py::array_t &x) +``` +- 内部的にfloat32に変換してツリーを構築 +- しかし、元のdouble精度の座標を`idx2exact`に保存(line 274) +- クエリ時に`refine_candidates()`メソッド(line 805-831)でdouble精度で再チェック + +**float32入力の場合** (`include/prtree/core/prtree.h:138-210`): +```cpp +// Constructor for float32 input (no refinement, pure float32 performance) +PRTree(const py::array_t &idx, const py::array_t &x) +``` +- float32のまま処理 +- `idx2exact`は空のまま(line 157のコメント: "idx2exact is NOT populated for float32 input") +- **補正が行われない** + +#### 3. 交差判定ロジック + +`include/prtree/core/detail/bounding_box.h:106-125`: +```cpp +bool operator()(const BB &target) const { + // ... (省略) + for (int i = 0; i < D; ++i) { + flags[i] = -minima[i] <= maxima[i]; // 閉区間セマンティクス + } + // ... +} +``` + +- すべてfloat32で計算される +- `<=`を使用(touching boxes are considered intersecting) + +#### 4. query_intersections()メソッド + +`include/prtree/core/prtree.h:894-1040`: +- line 963-965と993-996で、`idx2exact`が空でない場合のみ補正を行う +- **float32入力の場合は補正がスキップされる**(line 808-810) + +### 理論的な脆弱性 + +1. **丸め込みによる精度損失**: + - float64で表現された微小な重なりがfloat32に変換される際に失われる可能性 + - 特に大きな座標値(例: 10^7以上)では、float32のULP(Unit in Last Place)が大きくなる + +2. **補正メカニズムの非対称性**: + - float64入力: float32ツリー + double補正 + - float32入力: float32ツリーのみ(補正なし) + - これにより、同じデータでも入力型によって結果が異なる可能性 + +3. **潜在的な偽陰性シナリオ**: + - 2つのAABBが非常に小さな量で重なる + - その重なりがfloat32の精度限界以下 + - float32に丸められた後、重ならなくなる + +### テスト結果 + +複数のテストケースを作成して検証を試みましたが、実際の偽陰性を再現することはできませんでした: + +1. **test_float32_overlap_issue.py**: 基本的な精度テスト +2. **test_float32_refined.py**: より厳密な精度境界テスト +3. **test_float32_extreme.py**: 極端なケース(ULP境界、負の座標、subnormal値など) + +#### テスト結果の考察 + +すべてのテストで偽陰性は検出されませんでした。理由として考えられるのは: + +1. **閉区間セマンティクス**: `<=`比較により、境界で接触するボックスは常に交差と判定される +2. **一貫した丸め込み**: 両方のボックスが同じ精度(float32)で保存されるため、比較は一貫している +3. **テストケースの限界**: 実際の報告された問題を再現する特定のデータセットが必要な可能性 + +## 問題の根本原因(理論的分析) + +報告された偽陰性の問題は、以下の条件で発生する可能性があります: + +### ケース1: 異なる精度での構築とクエリ + +```python +# float32でツリーを構築 +boxes_f32 = np.array([[0.0, 0.0, 100.0, 100.0]], dtype=np.float32) +tree = PRTree2D(np.array([0]), boxes_f32) + +# float64でクエリ(わずかに異なる境界値) +query_f64 = np.array([100.0 + epsilon, 0.0, 200.0, 100.0], dtype=np.float64) +result = tree.query(query_f64) # 偽陰性の可能性 +``` + +### ケース2: 大きな座標値での微小な重なり + +float32の精度限界: +- 100付近: ULP ≈ 7.6e-6 +- 10,000付近: ULP ≈ 0.000977 +- 1,000,000付近: ULP ≈ 0.0625 +- 16,777,216 (2^24)付近: ULP = 2.0 + +大きな座標値では、微小な重なりが丸め込みで失われやすい。 + +### ケース3: 累積丸め込みエラー + +複数の演算を経た座標値は、累積的な丸め込みエラーにより、 +元の値から大きくずれる可能性がある。 + +## 推奨される対策 + +1. **float64入力の使用を推奨**: + - float64入力を使用すれば、内部補正メカニズムにより高精度が保たれる + +2. **float32入力にも補正メカニズムを追加**: + - float32で構築されたツリーでも、クエリ時にはdouble精度で再チェック + - ただし、元の精度情報が失われているため完全な補正は不可能 + +3. **ドキュメントの改善**: + - float32使用時の精度制限を明示的に文書化 + - 高精度が必要な場合はfloat64の使用を推奨 + +4. **精度警告の追加**: + - 大きな座標値(>10^6)でfloat32を使用する場合、警告を表示 + +## 検証スクリプト + +以下のテストスクリプトを作成しました: + +1. `test_float32_overlap_issue.py`: 基本的な偽陰性テスト +2. `test_float32_refined.py`: より厳密な精度境界テスト +3. `test_float32_extreme.py`: 極端なエッジケーステスト +4. `test_rounding_direction.py`: 丸め方向の不一致テスト +5. `test_different_sources.py`: 異なるソースからの値の丸めテスト +6. `test_false_negative_found.py`: 偽陰性の系統的探索 + +実行方法: +```bash +python test_float32_overlap_issue.py +python test_float32_refined.py +python test_float32_extreme.py +python test_rounding_direction.py +python test_different_sources.py +python test_false_negative_found.py +``` + +## 更新: 丸め方向の調査 + +「丸める方向が違う場合」という指摘に基づき、追加調査を実施しました。 + +### 重要な発見:偽陽性の検出 + +`test_different_sources.py`の`test_accumulated_computation()`で**偽陽性**を検出: + +```python +# 累積計算による丸め誤差 +accumulated_f64 = sum(0.1 for _ in range(1000)) # ≈ 99.999... +direct_f64 = 100.0 + +# Float64: accumulated < direct (重ならない) +# Float32: 両方が 100.0 に丸まる (重なる!) + +Result: +- Float64 tree: 0 pairs (正しい) +- Float32 tree: 1 pair (偽陽性!) +``` + +これは報告された問題(偽陰性)の逆パターンです。float32の丸め込みにより、本来重ならないボックスが重なっていると誤判定されています。 + +### 偽陰性が再現できない理由の分析 + +1. **閉区間セマンティクス**: `<=` 比較により、境界で接触するボックスは常に交差と判定される +2. **一貫した丸め込み**: 同じfloat64値は常に同じfloat32値に丸められる +3. **内部一貫性**: すべての計算がfloat32で行われるため、比較は一貫している + +### 理論的な偽陰性発生シナリオ + +報告された問題が発生する可能性のある状況: + +1. **異なる計算パス**: + ``` + Box A: 外部計算 -> float64 -> float32 (ツリー構築時) + Box B: 別の計算 -> float64 -> float32 (ツリー構築時) + ``` + 計算履歴の違いにより、本来重なるべき値が異なるfloat32表現になる可能性 + +2. **コンパイラの最適化による中間精度**: + - C++コンパイラがfloat64中間精度を使用する場合がある + - `-ffloat-store`や`-fexcess-precision=standard`フラグの影響 + - 最適化レベル(-O2, -O3)による挙動の違い + +3. **FPU設定とレジスタ精度**: + - x87 FPUの80bit拡張精度レジスタの影響 + - SSE/AVX命令セットの使用有無 + - 丸めモードの設定(RN, RZ, RP, RM) + +4. **データパイプラインの不整合**: + ``` + Box A: ファイル読込 -> 文字列 -> float64 -> float32 + Box B: 直接計算 -> float64 -> float32 + ``` + これらが微妙に異なる値になる可能性 + +5. **プラットフォーム依存の挙動**: + - Windows vs Linux vs macOS での浮動小数点演算の違い + - ハードウェアアーキテクチャ(x86, ARM)の違い + +## 結論 + +1. **コードレベルでの脆弱性確認**: + - float32入力時の補正メカニズムの欠如を確認 + - `include/prtree/core/prtree.h:157` で明示的に補正なしと記載 + +2. **偽陰性の再現**: + - 合成テストケースでは再現できず + - すべての境界接触ケースで正しく検出される + +3. **偽陽性の発見**: + - 累積計算による丸め誤差で偽陽性を確認 + - float64では重ならないがfloat32では重なる + +4. **理論的なリスク**: + - 偽陰性: 異なる計算パス、コンパイラ最適化、FPU設定の違い + - 偽陽性: 累積計算による丸め誤差 + +5. **推奨事項**: + - **重要**: 高精度が必要な場合はfloat64入力を使用 + - 累積計算を避け、直接計算を使用 + - データパイプラインの一貫性を確保 + - クリティカルな用途では float64 + 補正メカニズムに依存 + +## 次のステップ + +この問題を完全に検証・解決するには: + +1. **報告者からの情報収集**: + - 具体的な失敗するデータセット(座標値) + - 発生環境の詳細(OS、コンパイラ、最適化フラグ) + - データの生成方法や処理パイプライン + - ビルド時のCMakeオプション + +2. **再現テスト**: + - 実際のデータでの検証 + - 異なるプラットフォームでのテスト + - コンパイラオプションを変えてのビルド + +3. **潜在的な修正**: + - float32入力でも `idx2exact` を保持するオプション追加 + - 精度警告システムの実装 + - ドキュメントでの精度制限の明示 + +これらの情報があれば、問題を再現し、適切な修正を行うことができます。 diff --git a/test_different_sources.py b/test_different_sources.py new file mode 100644 index 0000000..7e9c901 --- /dev/null +++ b/test_different_sources.py @@ -0,0 +1,252 @@ +#!/usr/bin/env python3 +""" +Test for rounding issues when values come from different sources. + +The hypothesis: When Box A's max and Box B's min are computed or stored +independently, float32 rounding can create gaps that don't exist in float64. +""" +import numpy as np +from python_prtree import PRTree2D + + +def test_computed_vs_literal(): + """ + Test when coordinates come from computations vs literals. + + A computed value might round differently than a literal value + due to intermediate precision. + """ + print("\n=== Test: Computed vs Literal Values ===\n") + + # Computed value (with intermediate float64 precision) + computed_f64 = np.float64(1.0) / np.float64(3.0) * np.float64(300.0) # = 100.0 + + # Literal value + literal_f64 = np.float64(100.0) + + print(f"Computed (f64): {computed_f64:.20f}") + print(f"Literal (f64): {literal_f64:.20f}") + print(f"Equal in f64: {computed_f64 == literal_f64}") + + computed_f32 = np.float32(computed_f64) + literal_f32 = np.float32(literal_f64) + + print(f"\nComputed (f32): {computed_f32:.20f}") + print(f"Literal (f32): {literal_f32:.20f}") + print(f"Equal in f32: {computed_f32 == literal_f32}") + + # Create boxes + boxes = np.array([ + [0.0, 0.0, computed_f64, 100.0], # Ends at computed value + [literal_f64, 0.0, 200.0, 100.0], # Starts at literal value + ], dtype=np.float64) + + boxes_f32 = boxes.astype(np.float32) + + print(f"\nOverlap (f64): {boxes[0,2] >= boxes[1,0]}") + print(f"Overlap (f32): {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs") + print(f"Float32 tree: {len(pairs_f32)} pairs") + + if len(pairs_f64) != len(pairs_f32): + print("\n❌ FALSE NEGATIVE!") + return True + return False + + +def test_accumulated_computation(): + """ + Test when values are accumulated through multiple operations. + """ + print("\n=== Test: Accumulated Computation ===\n") + + # Create a value through accumulation + accumulated_f64 = np.float64(0.0) + step = np.float64(0.1) + for i in range(1000): + accumulated_f64 += step + + # Direct value + direct_f64 = np.float64(100.0) + + print(f"Accumulated (f64): {accumulated_f64:.20f}") + print(f"Direct (f64): {direct_f64:.20f}") + print(f"Difference: {abs(accumulated_f64 - direct_f64):.20e}") + + accumulated_f32 = np.float32(accumulated_f64) + direct_f32 = np.float32(direct_f64) + + print(f"\nAccumulated (f32): {accumulated_f32:.20f}") + print(f"Direct (f32): {direct_f32:.20f}") + print(f"Equal in f32: {accumulated_f32 == direct_f32}") + + # Create boxes with these values + boxes_f64 = np.array([ + [0.0, 0.0, accumulated_f64, 100.0], + [direct_f64, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nOverlap (f64): {boxes_f64[0,2] >= boxes_f64[1,0]}") + print(f"Overlap (f32): {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs") + print(f"Float32 tree: {len(pairs_f32)} pairs") + + if len(pairs_f64) != len(pairs_f32): + print("\n❌ FALSE NEGATIVE!") + return True + return False + + +def test_separate_float32_arrays(): + """ + Test when float32 values are created in separate arrays. + + Key insight: If two float32 values are created independently, + they might have different representations even if they should be equal. + """ + print("\n=== Test: Separate Float32 Arrays ===\n") + + # Create a problematic float64 value + problematic = np.float64(100.0) + np.float64(1e-7) + + print(f"Problematic value (f64): {problematic:.20f}") + + # Create first array with this value as max + array1_f32 = np.array([0.0, 0.0, problematic, 100.0], dtype=np.float32) + + # Create second array with this value as min + array2_f32 = np.array([problematic, 0.0, 200.0, 100.0], dtype=np.float32) + + print(f"\nArray1[2] (max): {array1_f32[2]:.20f}") + print(f"Array2[0] (min): {array2_f32[0]:.20f}") + print(f"Equal: {array1_f32[2] == array2_f32[0]}") + print(f"Overlap: {array1_f32[2] >= array2_f32[0]}") + + # Combine into boxes + boxes_f32 = np.vstack([array1_f32.reshape(1, -1), array2_f32.reshape(1, -1)]) + + print(f"\nCombined boxes (f32):\n{boxes_f32}") + + idx = np.array([0, 1], dtype=np.int64) + + tree = PRTree2D(idx, boxes_f32) + pairs = tree.query_intersections() + + print(f"\nIntersections found: {len(pairs)}") + print(f"Pairs: {pairs}") + + # Expected: should find intersection since they touch + if len(pairs) == 0: + print("\n❌ FALSE NEGATIVE: Touching boxes not detected!") + return True + + return False + + +def test_binary_representation(): + """ + Test values that have identical decimal representation but different binary. + """ + print("\n=== Test: Binary Representation ===\n") + + # Create a value that cannot be exactly represented in binary + decimal_val = 0.1 + + # In float64 + val_f64 = np.float64(decimal_val) + print(f"0.1 in float64: {val_f64:.60f}") + print(f"Hex: {val_f64.hex()}") + + # In float32 + val_f32 = np.float32(decimal_val) + print(f"\n0.1 in float32: {val_f32:.60f}") + print(f"Hex: {val_f32.hex()}") + + # Scale up + scale = 1000 + scaled_f64 = val_f64 * scale + scaled_f32 = val_f32 * scale + + print(f"\nScaled (f64): {scaled_f64:.60f}") + print(f"Scaled (f32): {scaled_f32:.60f}") + + # Now use these as coordinates + boxes_f64 = np.array([ + [0.0, 0.0, scaled_f64, 100.0], + [scaled_f64, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + # Create float32 version two ways: + # 1. Convert from float64 + boxes_f32_converted = boxes_f64.astype(np.float32) + + # 2. Create directly with float32 + boxes_f32_direct = np.array([ + [0.0, 0.0, val_f32 * scale, 100.0], + [val_f32 * scale, 0.0, 200.0, 100.0], + ], dtype=np.float32) + + print(f"\nBoxes f32 (converted):\n{boxes_f32_converted}") + print(f"\nBoxes f32 (direct):\n{boxes_f32_direct}") + print(f"\nAre they equal? {np.array_equal(boxes_f32_converted, boxes_f32_direct)}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32_conv = PRTree2D(idx, boxes_f32_converted) + pairs_f32_conv = tree_f32_conv.query_intersections() + + tree_f32_dir = PRTree2D(idx, boxes_f32_direct) + pairs_f32_dir = tree_f32_dir.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs") + print(f"Float32 (converted): {len(pairs_f32_conv)} pairs") + print(f"Float32 (direct): {len(pairs_f32_dir)} pairs") + + if len(pairs_f64) != len(pairs_f32_conv) or len(pairs_f64) != len(pairs_f32_dir): + print("\n❌ FALSE NEGATIVE!") + return True + + return False + + +if __name__ == "__main__": + print("=" * 70) + print("Testing Different Sources of Rounding") + print("=" * 70) + + issue_found = False + + issue_found |= test_computed_vs_literal() + issue_found |= test_accumulated_computation() + issue_found |= test_separate_float32_arrays() + issue_found |= test_binary_representation() + + print("\n" + "=" * 70) + if issue_found: + print("❌ FALSE NEGATIVE CONFIRMED!") + else: + print("⚠️ No false negatives in these tests") + print("=" * 70) diff --git a/test_false_negative_found.py b/test_false_negative_found.py new file mode 100644 index 0000000..2bf99b2 --- /dev/null +++ b/test_false_negative_found.py @@ -0,0 +1,282 @@ +#!/usr/bin/env python3 +""" +Test to find actual false negatives in float32 overlap detection. + +Based on the insight from accumulated computation test, we need to find +cases where: +- Float64: overlap exists +- Float32: overlap lost due to rounding +""" +import numpy as np +from python_prtree import PRTree2D + + +def test_accumulated_overlap_loss(): + """ + Test where accumulated computation creates a value that overlaps in float64 + but loses overlap when rounded to float32. + """ + print("\n=== Test: Accumulated Overlap Loss ===\n") + + # Create a value through accumulation that is slightly larger than direct + accumulated_f64 = np.float64(0.0) + step = np.float64(0.1) + for i in range(1001): # 1001 * 0.1 = 100.1 (approximately) + accumulated_f64 += step + + direct_f64 = np.float64(100.0) + + print(f"Accumulated (f64): {accumulated_f64:.20f}") + print(f"Direct (f64): {direct_f64:.20f}") + print(f"accumulated > direct: {accumulated_f64 > direct_f64}") + print(f"Difference: {accumulated_f64 - direct_f64:.20e}") + + # Convert to float32 + accumulated_f32 = np.float32(accumulated_f64) + direct_f32 = np.float32(direct_f64) + + print(f"\nAccumulated (f32): {accumulated_f32:.20f}") + print(f"Direct (f32): {direct_f32:.20f}") + print(f"accumulated > direct: {accumulated_f32 > direct_f32}") + + # Create boxes: + # Box A: ends at accumulated value (slightly > 100 in f64) + # Box B: starts at 100.0 + # In f64: overlap (accumulated >= 100) + # In f32: might not overlap if both round to same value but lose the "greater than" relationship + + boxes_f64 = np.array([ + [0.0, 0.0, accumulated_f64, 100.0], # Box A + [direct_f64, 0.0, 200.0, 100.0], # Box B + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (f64):") + print(f" Box A ends at: {boxes_f64[0,2]:.20f}") + print(f" Box B starts at: {boxes_f64[1,0]:.20f}") + print(f" Overlap: {boxes_f64[0,2] >= boxes_f64[1,0]}") + + print(f"\nBoxes (f32):") + print(f" Box A ends at: {boxes_f32[0,2]:.20f}") + print(f" Box B starts at: {boxes_f32[1,0]:.20f}") + print(f" Overlap: {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs - {pairs_f64}") + print(f"Float32 tree: {len(pairs_f32)} pairs - {pairs_f32}") + + if len(pairs_f64) > 0 and len(pairs_f32) == 0: + print("\n❌ FALSE NEGATIVE DETECTED!") + print(" Float64 found overlap, but Float32 did not!") + return True + + return False + + +def test_different_accumulation_rates(): + """ + Test with different accumulation rates to find rounding mismatches. + """ + print("\n=== Test: Different Accumulation Rates ===\n") + + # Accumulate from below + from_below_f64 = np.float64(0.0) + for i in range(1000): + from_below_f64 += np.float64(0.1000001) # Slightly more than 0.1 + + # Target value + target_f64 = np.float64(100.0) + + print(f"From below (f64): {from_below_f64:.20f}") + print(f"Target (f64): {target_f64:.20f}") + print(f"from_below >= target: {from_below_f64 >= target_f64}") + + from_below_f32 = np.float32(from_below_f64) + target_f32 = np.float32(target_f64) + + print(f"\nFrom below (f32): {from_below_f32:.20f}") + print(f"Target (f32): {target_f32:.20f}") + print(f"from_below >= target: {from_below_f32 >= target_f32}") + + # Create boxes + boxes_f64 = np.array([ + [0.0, 0.0, from_below_f64, 100.0], + [target_f64, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nOverlap (f64): {boxes_f64[0,2] >= boxes_f64[1,0]}") + print(f"Overlap (f32): {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs") + print(f"Float32 tree: {len(pairs_f32)} pairs") + + if len(pairs_f64) > 0 and len(pairs_f32) == 0: + print("\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +def test_nextafter_gap(): + """ + Use nextafter to create values that are adjacent in float64 but not in float32. + """ + print("\n=== Test: NextAfter Gap ===\n") + + # Start with a float32 value + base_f32 = np.float32(1000.0) + next_f32 = np.nextafter(base_f32, np.float32(np.inf)) + + print(f"base_f32: {base_f32}") + print(f"next_f32: {next_f32}") + print(f"Gap in f32: {next_f32 - base_f32}") + + # In float64, create many values between them + base_f64 = np.float64(base_f32) + next_f64 = np.float64(next_f32) + + # Create a float64 value just slightly above base + epsilon_f64 = (next_f64 - base_f64) / 1000.0 # Much smaller than f32 gap + val1_f64 = base_f64 + epsilon_f64 + + print(f"\nbase_f64: {base_f64:.20f}") + print(f"val1_f64: {val1_f64:.20f}") + print(f"next_f64: {next_f64:.20f}") + print(f"val1 > base: {val1_f64 > base_f64}") + + # Convert to float32 + val1_f32 = np.float32(val1_f64) + + print(f"\nval1_f32: {val1_f32}") + print(f"val1_f32 == base_f32: {val1_f32 == base_f32}") + + # If val1_f32 rounds down to base_f32, we have a problem + if val1_f64 > base_f64 and val1_f32 == base_f32: + print("\n⚠️ val1 rounds down to base in float32!") + + # Create boxes: + # Box A: [0, 0, val1_f64, 100] -> [0, 0, base_f32, 100] + # Box B: [base_f64, 0, 2000, 100] -> [base_f32, 0, 2000, 100] + # In f64: overlap (val1_f64 > base_f64) + # In f32: touch exactly (base_f32 == base_f32), still should overlap with <= + + boxes_f64 = np.array([ + [0.0, 0.0, val1_f64, 100.0], + [base_f64, 0.0, 2000.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (f64):") + print(f" Box A max: {boxes_f64[0,2]:.20f}") + print(f" Box B min: {boxes_f64[1,0]:.20f}") + print(f" Overlap: {boxes_f64[0,2] >= boxes_f64[1,0]}") + + print(f"\nBoxes (f32):") + print(f" Box A max: {boxes_f32[0,2]:.20f}") + print(f" Box B min: {boxes_f32[1,0]:.20f}") + print(f" Overlap: {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree: {len(pairs_f64)} pairs") + print(f"Float32 tree: {len(pairs_f32)} pairs") + + if len(pairs_f64) > 0 and len(pairs_f32) == 0: + print("\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +def test_str_roundtrip(): + """ + Test values that go through string representation (common in file I/O). + """ + print("\n=== Test: String Roundtrip ===\n") + + # Create a value, convert to string, and back + original_f64 = np.float64(100.0000000001) + + # Simulate limited precision string (e.g., from CSV file) + str_val = f"{original_f64:.10f}" # 10 decimal places + roundtrip_f64 = np.float64(str_val) + + print(f"Original (f64): {original_f64:.20f}") + print(f"String: {str_val}") + print(f"Roundtrip (f64): {roundtrip_f64:.20f}") + + original_f32 = np.float32(original_f64) + roundtrip_f32 = np.float32(roundtrip_f64) + + print(f"\nOriginal (f32): {original_f32:.20f}") + print(f"Roundtrip (f32): {roundtrip_f32:.20f}") + print(f"Equal: {original_f32 == roundtrip_f32}") + + # Create boxes + boxes_original = np.array([ + [0.0, 0.0, original_f64, 100.0], + [100.0, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + boxes_roundtrip = np.array([ + [0.0, 0.0, roundtrip_f64, 100.0], + [100.0, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + print(f"\nOriginal overlap (f64): {boxes_original[0,2] >= boxes_original[1,0]}") + print(f"Roundtrip overlap (f64): {boxes_roundtrip[0,2] >= boxes_roundtrip[1,0]}") + + boxes_f32_orig = boxes_original.astype(np.float32) + boxes_f32_round = boxes_roundtrip.astype(np.float32) + + print(f"\nOriginal overlap (f32): {boxes_f32_orig[0,2] >= boxes_f32_orig[1,0]}") + print(f"Roundtrip overlap (f32): {boxes_f32_round[0,2] >= boxes_f32_round[1,0]}") + + return False + + +if __name__ == "__main__": + print("=" * 70) + print("Testing for False Negatives in Float32 Overlap Detection") + print("=" * 70) + + issue_found = False + + issue_found |= test_accumulated_overlap_loss() + issue_found |= test_different_accumulation_rates() + issue_found |= test_nextafter_gap() + issue_found |= test_str_roundtrip() + + print("\n" + "=" * 70) + if issue_found: + print("❌ FALSE NEGATIVE CONFIRMED!") + print("Float32 rounding causes overlap detection to fail!") + else: + print("⚠️ No false negatives detected yet") + print("The issue may require specific data patterns or computation paths") + print("=" * 70) diff --git a/test_float32_extreme.py b/test_float32_extreme.py new file mode 100644 index 0000000..8345a0f --- /dev/null +++ b/test_float32_extreme.py @@ -0,0 +1,298 @@ +#!/usr/bin/env python3 +""" +Extreme test cases for float32 precision issues. + +This test attempts to find actual false negatives by testing +extreme precision boundaries. +""" +import numpy as np +from python_prtree import PRTree2D + + +def create_barely_overlapping_boxes(): + """ + Create boxes that overlap by the smallest possible float64 amount. + """ + print("\n=== Extreme Test: Minimal Overlap ===") + + # Start with a clean float32 value + base = np.float32(100.0) + + # Get the next representable float32 value + next_val = np.nextafter(base, np.float32(np.inf)) + ulp = next_val - base + + print(f"Base value (float32): {base}") + print(f"Next value (float32): {next_val}") + print(f"ULP: {ulp}") + + # In float64, create boxes that overlap by less than the float32 ULP + # Box A: ends at base + (ulp/3) in float64 + # Box B: starts at base in float64 + + # When converted to float32: + # Box A max will round to base (losing the tiny extension) + # Box B min is exactly base + # They should touch at base in float32 + + box_a_end_f64 = np.float64(base) + np.float64(ulp) / 3.0 + box_b_start_f64 = np.float64(base) + + print(f"\nBox A end (float64): {box_a_end_f64:.20f}") + print(f"Box B start (float64): {box_b_start_f64:.20f}") + print(f"Overlap in float64: {box_a_end_f64 >= box_b_start_f64}") + + box_a_end_f32 = np.float32(box_a_end_f64) + box_b_start_f32 = np.float32(box_b_start_f64) + + print(f"\nBox A end (float32): {box_a_end_f32:.20f}") + print(f"Box B start (float32): {box_b_start_f32:.20f}") + print(f"Overlap in float32: {box_a_end_f32 >= box_b_start_f32}") + + # Build trees with both precisions + boxes_f64 = np.array([ + [0.0, 0.0, box_a_end_f64, 100.0], + [box_b_start_f64, 0.0, 200.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = np.array([ + [0.0, 0.0, box_a_end_f32, 100.0], + [box_b_start_f32, 0.0, 200.0, 100.0], + ], dtype=np.float32) + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +def test_negative_coordinates(): + """ + Test with negative coordinates where float32 precision differs. + """ + print("\n=== Test: Negative Coordinates ===") + + # Float32 precision varies with magnitude + # Test negative values + val = -16777216.5 # Near 2^24 boundary + + boxes_f64 = np.array([ + [val - 1000, 0.0, val + 0.1, 100.0], + [val, 0.0, val + 1000, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"Boxes (float64):\n{boxes_f64}") + print(f"Boxes (float32):\n{boxes_f32}") + + # Check if the small overlap is preserved + overlap_f64 = boxes_f64[0, 2] >= boxes_f64[1, 0] + overlap_f32 = boxes_f32[0, 2] >= boxes_f32[1, 0] + + print(f"\nOverlap in float64: {overlap_f64}") + print(f"Overlap in float32: {overlap_f32}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if overlap_f64 and not overlap_f32: + print(f"\n⚠️ Overlap lost in float32!") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +def test_accumulated_rounding_errors(): + """ + Test if accumulated rounding errors can cause false negatives. + """ + print("\n=== Test: Accumulated Rounding Errors ===") + + # Create a chain of boxes that should all touch + # Each box ends where the next one begins + # With float32 rounding, gaps might appear + + n_boxes = 10 + base = np.float64(0.1) # Not exactly representable in binary + + boxes_f64 = [] + for i in range(n_boxes): + start = base * i + end = base * (i + 1) + boxes_f64.append([start, 0.0, end, 100.0]) + + boxes_f64 = np.array(boxes_f64, dtype=np.float64) + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"Number of boxes: {n_boxes}") + print(f"\nFirst 3 boxes (float64):\n{boxes_f64[:3]}") + print(f"\nFirst 3 boxes (float32):\n{boxes_f32[:3]}") + + # Check for gaps + for i in range(n_boxes - 1): + gap_f64 = boxes_f64[i + 1, 0] - boxes_f64[i, 2] + gap_f32 = boxes_f32[i + 1, 0] - boxes_f32[i, 2] + + if abs(gap_f32) > abs(gap_f64) + 1e-10: + print(f"\nGap created between box {i} and {i+1}:") + print(f" Float64 gap: {gap_f64:.15e}") + print(f" Float32 gap: {gap_f32:.15e}") + + idx = np.arange(n_boxes, dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {len(pairs_f64)} pairs") + print(f"Float32 tree intersections: {len(pairs_f32)} pairs") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + print(f"Missing pairs: {len(pairs_f64) - len(pairs_f32)}") + return True + + return False + + +def test_subnormal_values(): + """ + Test with very small (subnormal) float32 values. + """ + print("\n=== Test: Subnormal Float32 Values ===") + + # Smallest normal float32: ~1.175e-38 + # Smallest subnormal float32: ~1.4e-45 + + small = np.float32(1e-40) + tiny_overlap = np.float32(1e-44) + + boxes_f32 = np.array([ + [0.0, 0.0, small + tiny_overlap, 1.0], + [small, 0.0, 1.0, 1.0], + ], dtype=np.float32) + + print(f"Small value: {small}") + print(f"Tiny overlap: {tiny_overlap}") + print(f"Sum: {small + tiny_overlap}") + print(f"\nBoxes (float32):\n{boxes_f32}") + + overlap = boxes_f32[0, 2] >= boxes_f32[1, 0] + print(f"\nBoxes overlap: {overlap}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"Intersections: {pairs_f32}") + + # These should intersect based on the values + if overlap and len(pairs_f32) == 0: + print(f"\n❌ FALSE NEGATIVE DETECTED with subnormal values!") + return True + + return False + + +def test_double_rounding(): + """ + Test double rounding: float64 -> float32 -> comparison. + """ + print("\n=== Test: Double Rounding Effect ===") + + # Create a value in float64 that has different rounding behavior + # Specifically, a value that rounds differently when going directly + # to float32 vs when being used in comparisons + + # Use a value that's exactly between two float32 representations + f32_val = np.float32(1000.0) + f32_next = np.nextafter(f32_val, np.float32(np.inf)) + f32_prev = np.nextafter(f32_val, np.float32(-np.inf)) + + # Create a float64 value exactly between f32_val and f32_next + midpoint_f64 = (np.float64(f32_val) + np.float64(f32_next)) / 2.0 + + print(f"f32_prev: {f32_prev:.20f}") + print(f"f32_val: {f32_val:.20f}") + print(f"midpoint (f64): {midpoint_f64:.20f}") + print(f"f32_next: {f32_next:.20f}") + + # When converted to float32, midpoint rounds to... + midpoint_f32 = np.float32(midpoint_f64) + print(f"midpoint (f32): {midpoint_f32:.20f}") + + # Create boxes using this + boxes_f64 = np.array([ + [0.0, 0.0, midpoint_f64, 100.0], + [f32_val, 0.0, 2000.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (float64):\n{boxes_f64}") + print(f"Boxes (float32):\n{boxes_f32}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +if __name__ == "__main__": + print("Extreme float32 precision tests") + print("=" * 70) + + issue_found = False + + issue_found |= create_barely_overlapping_boxes() + issue_found |= test_negative_coordinates() + issue_found |= test_accumulated_rounding_errors() + issue_found |= test_subnormal_values() + issue_found |= test_double_rounding() + + print("\n" + "=" * 70) + if issue_found: + print("❌ ISSUE CONFIRMED: Float32 rounding causes false negatives!") + else: + print("⚠️ No false negatives detected in extreme tests") + print("The issue may require specific real-world data to reproduce") + print("=" * 70) diff --git a/test_float32_overlap_issue.py b/test_float32_overlap_issue.py new file mode 100644 index 0000000..a1af996 --- /dev/null +++ b/test_float32_overlap_issue.py @@ -0,0 +1,220 @@ +#!/usr/bin/env python3 +""" +Test to reproduce float32 rounding false negative issue in overlap detection. + +Issue: When using float32 input, rounding errors can cause overlapping AABBs +to not be detected as intersecting (false negative). +""" +import numpy as np +from python_prtree import PRTree2D + + +def test_float32_false_negative(): + """ + Reproduce the false negative issue with float32 rounding. + + Create two boxes that overlap very slightly in float64 precision, + but may not overlap after float32 rounding. + """ + print("\n=== Test Case 1: Very small overlap ===") + + # Create boxes with very small overlap that might be lost in float32 + # Box A: [0.0, 100.00001] + # Box B: [100.0, 200.0] + # These should overlap at the boundary in float64, but may not in float32 + + # Using float64 + boxes_f64 = np.array([ + [0.0, 0.0, 100.00001, 100.0], # Box 0: slightly overlapping + [100.0, 0.0, 200.0, 100.0], # Box 1 + ], dtype=np.float64) + + # Using float32 + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nOriginal float64 boxes:\n{boxes_f64}") + print(f"\nFloat32 boxes after conversion:\n{boxes_f32}") + print(f"\nPrecision loss: {boxes_f64 - boxes_f32.astype(np.float64)}") + + idx = np.array([0, 1], dtype=np.int64) + + # Test with float64 input (should use refinement) + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + # Test with float32 input (no refinement) + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n⚠️ FALSE NEGATIVE DETECTED!") + print(f"Float64 found {len(pairs_f64)} pairs, but float32 found {len(pairs_f32)} pairs") + return True + + return False + + +def test_float32_precision_boundary(): + """ + Test with values at float32 precision boundary. + + Float32 has about 7 decimal digits of precision. + """ + print("\n=== Test Case 2: Float32 precision boundary ===") + + # Create a case where float32 rounding causes non-overlap + # Use large coordinates where float32 has limited precision + base = 1000000.0 # 1 million + epsilon = 0.00001 # Small overlap + + boxes_f64 = np.array([ + [0.0, 0.0, base + epsilon, 100.0], + [base, 0.0, base + 100.0, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nOriginal float64 boxes:\n{boxes_f64}") + print(f"\nFloat32 boxes after conversion:\n{boxes_f32}") + + # Check if the overlap is preserved + box0_max_x_f64 = boxes_f64[0, 2] + box1_min_x_f64 = boxes_f64[1, 0] + box0_max_x_f32 = boxes_f32[0, 2] + box1_min_x_f32 = boxes_f32[1, 0] + + print(f"\nFloat64: Box0 max_x={box0_max_x_f64:.10f}, Box1 min_x={box1_min_x_f64:.10f}") + print(f"Float64 overlap: {box0_max_x_f64 >= box1_min_x_f64}") + + print(f"\nFloat32: Box0 max_x={box0_max_x_f32:.10f}, Box1 min_x={box1_min_x_f32:.10f}") + print(f"Float32 overlap: {box0_max_x_f32 >= box1_min_x_f32}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n⚠️ FALSE NEGATIVE DETECTED!") + print(f"Float64 found {len(pairs_f64)} pairs, but float32 found {len(pairs_f32)} pairs") + return True + + return False + + +def test_float32_query_method(): + """ + Test the query() method with float32 to check for false negatives. + """ + print("\n=== Test Case 3: Query method with float32 ===") + + # Create a box and a query box that barely overlap + boxes_f64 = np.array([ + [0.0, 0.0, 100.0000001, 100.0], + ], dtype=np.float64) + + query_f64 = np.array([100.0, 0.0, 200.0, 100.0], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + query_f32 = query_f64.astype(np.float32) + + print(f"\nBox (float64): {boxes_f64[0]}") + print(f"Box (float32): {boxes_f32[0]}") + print(f"Query (float64): {query_f64}") + print(f"Query (float32): {query_f32}") + + idx = np.array([0], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + result_f64 = tree_f64.query(query_f64) + + tree_f32 = PRTree2D(idx, boxes_f32) + result_f32 = tree_f32.query(query_f32.tolist()) + + print(f"\nFloat64 tree query result: {result_f64}") + print(f"Float32 tree query result: {result_f32}") + + if len(result_f64) != len(result_f32): + print(f"\n⚠️ FALSE NEGATIVE DETECTED in query()!") + print(f"Float64 found {len(result_f64)} items, but float32 found {len(result_f32)} items") + return True + + return False + + +def test_float32_ulp_case(): + """ + Test with boxes separated by exactly 1 ULP (Unit in the Last Place) in float32. + """ + print("\n=== Test Case 4: 1 ULP separation in float32 ===") + + # In float32, around 100.0, the ULP is about 0.0000076 + # We'll create boxes that touch exactly at the float32 representation boundary + + val1 = np.float32(100.0) + # Get the next representable float32 value + val2 = np.nextafter(val1, np.float32(np.inf)) + + print(f"\nval1 (float32): {val1:.20f}") + print(f"val2 (nextafter): {val2:.20f}") + print(f"Difference: {val2 - val1:.20e}") + + # Create boxes that should touch/overlap at the boundary + boxes_f32 = np.array([ + [0.0, 0.0, val1, 100.0], + [val1, 0.0, 200.0, 100.0], + ], dtype=np.float32) + + # Convert to float64 for comparison + boxes_f64 = boxes_f32.astype(np.float64) + + print(f"\nBoxes (float32):\n{boxes_f32}") + print(f"\nBoxes (float64):\n{boxes_f64}") + + idx = np.array([0, 1], dtype=np.int64) + + # These should intersect (touching boxes are considered intersecting) + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + # Check consistency + if len(pairs_f64) != len(pairs_f32): + print(f"\n⚠️ INCONSISTENCY DETECTED!") + print(f"Float64 found {len(pairs_f64)} pairs, but float32 found {len(pairs_f32)} pairs") + return True + + return False + + +if __name__ == "__main__": + print("Testing float32 rounding false negative issue in overlap detection") + print("=" * 70) + + issue_found = False + + issue_found |= test_float32_false_negative() + issue_found |= test_float32_precision_boundary() + issue_found |= test_float32_query_method() + issue_found |= test_float32_ulp_case() + + print("\n" + "=" * 70) + if issue_found: + print("❌ ISSUE CONFIRMED: Float32 rounding causes false negatives in overlap detection") + else: + print("✅ No false negatives detected in these test cases") + print("=" * 70) diff --git a/test_float32_refined.py b/test_float32_refined.py new file mode 100644 index 0000000..1ca174e --- /dev/null +++ b/test_float32_refined.py @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 +""" +Refined test for float32 rounding false negative in overlap detection. + +Focus on cases where float32 rounding causes actual precision loss in the tree structure. +""" +import numpy as np +from python_prtree import PRTree2D + + +def test_float32_internal_representation(): + """ + Test the actual internal float32 representation issue. + + When boxes are stored as float32 internally but queried with float64 values, + there can be mismatches due to rounding. + """ + print("\n=== Test Case: Float32 Internal Representation ===") + + # Create two boxes that should overlap based on float64 values + # but might not overlap when stored as float32 + + # We need values that change when converted to float32 + # Example: 0.1 cannot be exactly represented in binary floating point + + val1_f64 = np.float64(0.1) * 1000 # 100.0 in float64 + val2_f64 = val1_f64 + np.float64(0.0000001) # Slightly larger + + print(f"val1 (float64): {val1_f64:.20f}") + print(f"val2 (float64): {val2_f64:.20f}") + + val1_f32 = np.float32(val1_f64) + val2_f32 = np.float32(val2_f64) + + print(f"val1 (float32): {val1_f32:.20f}") + print(f"val2 (float32): {val2_f32:.20f}") + + # Check if they're different after float32 conversion + if val1_f32 == val2_f32: + print(f"⚠️ Float32 rounding makes these values identical!") + print(f"This can cause precision issues.") + + # Create boxes using these values + boxes_f64 = np.array([ + [0.0, 0.0, val2_f64, 100.0], # Box 0 + [val1_f64, 0.0, 200.0, 100.0], # Box 1 + ], dtype=np.float64) + + boxes_f32 = np.array([ + [0.0, 0.0, val2_f32, 100.0], # Box 0 + [val1_f32, 0.0, 200.0, 100.0], # Box 1 + ], dtype=np.float32) + + print(f"\nBoxes (float64):\n{boxes_f64}") + print(f"\nBoxes (float32):\n{boxes_f32}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + return False + + +def test_float32_tree_float64_query_mismatch(): + """ + Test mismatch between float32 tree storage and float64 query. + + When a tree is built with float32, the bounding boxes are stored with + float32 precision. If we then query with float64 values that represent + the "true" boundaries, we might miss intersections due to rounding. + """ + print("\n=== Test Case: Float32 Tree with Float64 Query Mismatch ===") + + # Create a value that has different representations in float32 vs float64 + # Use a value that's not exactly representable in binary + true_val_f64 = np.float64(1e7) + np.float64(0.3) # 10000000.3 + + print(f"True value (float64): {true_val_f64:.10f}") + + # Convert to float32 and back to see the precision loss + rounded_val_f32 = np.float32(true_val_f64) + print(f"Rounded value (float32): {rounded_val_f32:.10f}") + print(f"Difference: {true_val_f64 - np.float64(rounded_val_f32):.15f}") + + # Box A: ends at the rounded float32 value + # Box B: starts at the true float64 value + # These should touch/overlap in float64 but might have a gap in float32 + + box_a_f32 = np.array([[0.0, 0.0, rounded_val_f32, 100.0]], dtype=np.float32) + box_b_f32 = np.array([[rounded_val_f32, 0.0, rounded_val_f32 + 1000, 100.0]], dtype=np.float32) + + # Build tree with float32 + idx = np.array([0, 1], dtype=np.int64) + boxes_f32 = np.vstack([box_a_f32, box_b_f32]) + tree_f32 = PRTree2D(idx, boxes_f32) + + print(f"\nTree boxes (float32):\n{boxes_f32}") + + # Query with the original float64 value + query_at_true_boundary = np.array([true_val_f64, 0.0, true_val_f64 + 10, 100.0], dtype=np.float64) + print(f"\nQuery box (float64): {query_at_true_boundary}") + + # This query should intersect box A (whose end is at the rounded value) + # But due to float32 storage, it might not + result = tree_f32.query(query_at_true_boundary.tolist()) + print(f"\nQuery result: {result}") + + # The query box is between the boxes, so it might not intersect either + # Let's also test intersection detection + pairs = tree_f32.query_intersections() + print(f"Intersections: {pairs}") + + # Expected: boxes should touch at the boundary + print(f"\nBox A max: {boxes_f32[0, 2]:.10f}") + print(f"Box B min: {boxes_f32[1, 0]:.10f}") + print(f"Boxes touch: {boxes_f32[0, 2] >= boxes_f32[1, 0]}") + + return False + + +def test_float32_gap_creation(): + """ + Test if float32 rounding can create gaps between boxes that should touch. + """ + print("\n=== Test Case: Gap Creation via Float32 Rounding ===") + + # Find two float64 values that when rounded to float32, create a gap + # Strategy: Find a float64 value v such that float32(v) < v < float32(nextafter(float32(v))) + + base = np.float64(16777216.0) # 2^24, where float32 has integer precision + delta = np.float64(1.5) # This should round differently + + val1_f64 = base + delta + val2_f64 = base + delta + + val1_f32 = np.float32(val1_f64) + next_f32 = np.nextafter(val1_f32, np.float32(np.inf)) + + print(f"Base + delta (float64): {val1_f64:.10f}") + print(f"Rounded down (float32): {val1_f32:.10f}") + print(f"Next float32 value: {next_f32:.10f}") + print(f"Gap size: {np.float64(next_f32) - np.float64(val1_f32):.10f}") + + # Create boxes that touch at val1_f64 in float64 + # But when stored as float32, they might have a gap + boxes_f64 = np.array([ + [0.0, 0.0, val1_f64, 100.0], # Box A ends at val1_f64 + [val1_f64, 0.0, val1_f64 + 1000, 100.0], # Box B starts at val1_f64 + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nOriginal boxes (float64):\n{boxes_f64}") + print(f"\nRounded boxes (float32):\n{boxes_f32}") + + # Check if gap was created + gap_f64 = boxes_f64[1, 0] - boxes_f64[0, 2] + gap_f32 = boxes_f32[1, 0] - boxes_f32[0, 2] + + print(f"\nGap in float64: {gap_f64:.15f}") + print(f"Gap in float32: {gap_f32:.15f}") + + if gap_f32 > gap_f64: + print(f"⚠️ Float32 rounding created a gap!") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +def test_specific_failing_case(): + """ + Test a specific case that's known to fail with float32 precision. + """ + print("\n=== Test Case: Specific Failing Case ===") + + # Use a known problematic value + # At large magnitudes, float32 has ULP (Unit in Last Place) of several units + large_val = 16777216.0 # 2^24 + + # Create boxes that overlap by less than 1 ULP at this magnitude + ulp = np.nextafter(np.float32(large_val), np.float32(np.inf)) - np.float32(large_val) + print(f"ULP at {large_val}: {ulp}") + + # Create boxes that overlap by half a ULP in float64 + overlap = ulp / 2.0 + + boxes_f64 = np.array([ + [0.0, 0.0, large_val + overlap, 100.0], + [large_val, 0.0, large_val + 1000, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (float64):\n{boxes_f64}") + print(f"Boxes (float32):\n{boxes_f32}") + + # Check overlap + overlap_f64 = boxes_f64[0, 2] >= boxes_f64[1, 0] + overlap_f32 = boxes_f32[0, 2] >= boxes_f32[1, 0] + + print(f"\nOverlap in float64: {overlap_f64}") + print(f"Overlap in float32: {overlap_f32}") + + if overlap_f64 and not overlap_f32: + print(f"⚠️ Overlap lost in float32 conversion!") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +if __name__ == "__main__": + print("Refined float32 rounding false negative tests") + print("=" * 70) + + issue_found = False + + issue_found |= test_float32_internal_representation() + issue_found |= test_float32_tree_float64_query_mismatch() + issue_found |= test_float32_gap_creation() + issue_found |= test_specific_failing_case() + + print("\n" + "=" * 70) + if issue_found: + print("❌ ISSUE CONFIRMED: Float32 rounding causes false negatives") + else: + print("✅ No false negatives detected in refined tests") + print("=" * 70) diff --git a/test_rounding_direction.py b/test_rounding_direction.py new file mode 100644 index 0000000..3fecb62 --- /dev/null +++ b/test_rounding_direction.py @@ -0,0 +1,252 @@ +#!/usr/bin/env python3 +""" +Test for false negatives caused by different rounding directions. + +When two different float64 values round to different float32 values +in opposite directions, overlapping boxes in float64 may not overlap +in float32. +""" +import numpy as np +from python_prtree import PRTree2D + + +def find_rounding_direction_mismatch(): + """ + Find two float64 values that: + 1. In float64: v1 >= v2 (overlap) + 2. When converted to float32: + - v1 rounds DOWN to v1_f32 + - v2 rounds UP to v2_f32 + - v1_f32 < v2_f32 (no overlap!) + """ + print("\n=== Finding Rounding Direction Mismatch ===\n") + + # Start with a base float32 value + base_f32 = np.float32(1000.0) + next_f32 = np.nextafter(base_f32, np.float32(np.inf)) + + print(f"base_f32: {base_f32:.20f}") + print(f"next_f32: {next_f32:.20f}") + + # Convert to float64 to get exact representations + base_f64 = np.float64(base_f32) + next_f64 = np.float64(next_f32) + + print(f"\nbase_f64: {base_f64:.20f}") + print(f"next_f64: {next_f64:.20f}") + + # Create two float64 values in between base and next + # v1: closer to next (will round UP to next_f32) + # v2: closer to base (will round DOWN to base_f32) + # But v1 > v2 in float64 + + midpoint = (base_f64 + next_f64) / 2.0 + + # v2 slightly below midpoint (rounds down to base_f32) + v2_f64 = midpoint - (next_f64 - base_f64) * 0.1 + + # v1 slightly above v2 but still below midpoint (also rounds down to base_f32) + v1_f64 = v2_f64 + (next_f64 - base_f64) * 0.05 + + print(f"\nmidpoint: {midpoint:.20f}") + print(f"v1_f64: {v1_f64:.20f}") + print(f"v2_f64: {v2_f64:.20f}") + print(f"v1_f64 >= v2_f64: {v1_f64 >= v2_f64}") + + # Convert to float32 + v1_f32 = np.float32(v1_f64) + v2_f32 = np.float32(v2_f64) + + print(f"\nv1_f32: {v1_f32:.20f}") + print(f"v2_f32: {v2_f32:.20f}") + print(f"v1_f32 >= v2_f32: {v1_f32 >= v2_f32}") + + if v1_f64 >= v2_f64 and v1_f32 < v2_f32: + print("\n⚠️ CRITICAL: Rounding direction mismatch found!") + print(f" Float64: overlap (v1={v1_f64:.20f} >= v2={v2_f64:.20f})") + print(f" Float32: no overlap (v1={v1_f32:.20f} < v2={v2_f32:.20f})") + return v1_f64, v2_f64, v1_f32, v2_f32 + + return None + + +def test_rounding_direction_false_negative(): + """ + Test if rounding direction causes false negatives in overlap detection. + """ + print("\n=== Test: Rounding Direction False Negative ===\n") + + # Strategy: Find two float32 values with a gap, then insert float64 values + # in that gap that will round to different float32 values + + base_f32 = np.float32(100.0) + next_f32 = np.nextafter(base_f32, np.float32(np.inf)) + + gap = next_f32 - base_f32 + print(f"Float32 gap at 100.0: {gap}") + + # Create float64 values in the gap + base_f64 = np.float64(base_f32) + next_f64 = np.float64(next_f32) + + # v1: just above base_f64, should round to base_f32 + v1_f64 = base_f64 + (next_f64 - base_f64) * 0.4 + + # v2: just below next_f64, should round to next_f32 + v2_f64 = base_f64 + (next_f64 - base_f64) * 0.6 + + print(f"\nv1_f64 = {v1_f64:.20f}") + print(f"v2_f64 = {v2_f64:.20f}") + print(f"v1_f64 < v2_f64: {v1_f64 < v2_f64}") + + v1_f32 = np.float32(v1_f64) + v2_f32 = np.float32(v2_f64) + + print(f"\nv1_f32 = {v1_f32:.20f}") + print(f"v2_f32 = {v2_f32:.20f}") + print(f"v1_f32 < v2_f32: {v1_f32 < v2_f32}") + + if v1_f32 < v2_f32: + print(f"\n✓ Rounding created a gap in float32!") + print(f" Gap in float64: {v2_f64 - v1_f64:.20e}") + print(f" Gap in float32: {v2_f32 - v1_f32:.20e}") + + # Now create boxes: + # Box A: ends at v1_f64 (will be stored as v1_f32) + # Box B: starts at v2_f64 (will be stored as v2_f32) + # In float64: boxes overlap (v1_f64 overlaps with range before v2_f64) + # In float32: boxes don't overlap (v1_f32 < v2_f32) + + # But wait, for them to overlap in float64, we need v1 >= v2 + # Let's reverse: Box A ends at v2_f64, Box B starts at v1_f64 + + boxes_f64 = np.array([ + [0.0, 0.0, v2_f64, 100.0], # Box A ends at v2 + [v1_f64, 0.0, 200.0, 100.0], # Box B starts at v1 + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (float64):") + print(f" Box A: [0, 0, {boxes_f64[0,2]:.20f}, 100]") + print(f" Box B: [{boxes_f64[1,0]:.20f}, 0, 200, 100]") + print(f" Overlap: {boxes_f64[0,2] >= boxes_f64[1,0]}") + + print(f"\nBoxes (float32):") + print(f" Box A: [0, 0, {boxes_f32[0,2]:.20f}, 100]") + print(f" Box B: [{boxes_f32[1,0]:.20f}, 0, 200, 100]") + print(f" Overlap: {boxes_f32[0,2] >= boxes_f32[1,0]}") + + # Test with trees + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + print(f" Float64 found {len(pairs_f64)} pairs") + print(f" Float32 found {len(pairs_f32)} pairs") + return True + + return False + + +def test_opposite_rounding(): + """ + Test with values that are guaranteed to round in opposite directions. + """ + print("\n=== Test: Guaranteed Opposite Rounding ===\n") + + # Use a large value where float32 precision is limited + base = 16777216.0 # 2^24, where float32 ULP = 2.0 + + # At this magnitude, float32 can only represent even integers + # Odd integers will round to the nearest even integer + + # Create two consecutive float32-representable values + v1_f32 = np.float32(base) # 16777216.0 + v2_f32 = np.nextafter(v1_f32, np.float32(np.inf)) # 16777218.0 + + print(f"v1_f32: {v1_f32}") + print(f"v2_f32: {v2_f32}") + print(f"Gap: {v2_f32 - v1_f32}") + + # In float64, create values in between + v1_f64 = np.float64(v1_f32) + 0.5 # 16777216.5 (rounds down to v1_f32) + v2_f64 = np.float64(v1_f32) + 1.5 # 16777217.5 (rounds up to v2_f32) + + print(f"\nv1_f64: {v1_f64}") + print(f"v2_f64: {v2_f64}") + + # Verify rounding + print(f"\nRounding verification:") + print(f" float32(v1_f64) = {np.float32(v1_f64)} (expected {v1_f32})") + print(f" float32(v2_f64) = {np.float32(v2_f64)} (expected {v2_f32})") + + # Create overlapping boxes in float64 + # Box A ends at v2_f64 + 0.3 = 16777217.8 + # Box B starts at v1_f64 = 16777216.5 + # They overlap by 1.3 in float64 + + box_a_end_f64 = v2_f64 + 0.3 + box_b_start_f64 = v1_f64 + + boxes_f64 = np.array([ + [0.0, 0.0, box_a_end_f64, 100.0], + [box_b_start_f64, 0.0, box_a_end_f64 + 1000, 100.0], + ], dtype=np.float64) + + boxes_f32 = boxes_f64.astype(np.float32) + + print(f"\nBoxes (float64):") + print(f" Box A: ends at {boxes_f64[0,2]}") + print(f" Box B: starts at {boxes_f64[1,0]}") + print(f" Overlap: {boxes_f64[0,2] >= boxes_f64[1,0]}") + + print(f"\nBoxes (float32):") + print(f" Box A: ends at {boxes_f32[0,2]}") + print(f" Box B: starts at {boxes_f32[1,0]}") + print(f" Overlap: {boxes_f32[0,2] >= boxes_f32[1,0]}") + + idx = np.array([0, 1], dtype=np.int64) + + tree_f64 = PRTree2D(idx, boxes_f64) + pairs_f64 = tree_f64.query_intersections() + + tree_f32 = PRTree2D(idx, boxes_f32) + pairs_f32 = tree_f32.query_intersections() + + print(f"\nFloat64 tree intersections: {pairs_f64}") + print(f"Float32 tree intersections: {pairs_f32}") + + if len(pairs_f64) != len(pairs_f32): + print(f"\n❌ FALSE NEGATIVE DETECTED!") + return True + + return False + + +if __name__ == "__main__": + print("=" * 70) + print("Testing Rounding Direction Mismatch") + print("=" * 70) + + find_rounding_direction_mismatch() + + issue_found = False + issue_found |= test_rounding_direction_false_negative() + issue_found |= test_opposite_rounding() + + print("\n" + "=" * 70) + if issue_found: + print("❌ FALSE NEGATIVE CONFIRMED: Rounding direction causes the issue!") + else: + print("⚠️ Still investigating rounding direction effects...") + print("=" * 70)