Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .ai
8 changes: 8 additions & 0 deletions .audit/oberstet_modernization-phase-12.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
- [ ] I did **not** use any AI-assistance tools to help create this pull request.
- [x] I **did** use AI-assistance tools to *help* create this pull request.
- [x] I have read, understood and followed the project's AI_POLICY.md when creating code, documentation etc. for this pull request.

Submitted by: @oberstet
Date: 2025-11-25
Related issue(s): #1787
Branch: oberstet:modernization-phase-1.2
228 changes: 228 additions & 0 deletions ISSUES_TO_FIX.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
# Issues To Fix

## Security/Stability Issues

### Issue #1789: NULL Pointer Dereference in nvx_utf8vld_new
- **Reporter**: University of Athens researchers
- **Severity**: Medium (unlikely in practice, but valid bug)
- **Location**: `autobahn/nvx/_utf8validator.c:110-115`
- **Problem**: `malloc()` return value not checked before dereferencing
- **Similar Issue**: `autobahn/nvx/_xormasker.c` likely has same pattern
- **Fix Priority**: Include in Phase 1.2 or later
- **Estimated Effort**: < 1 hour (add NULL checks, return error to Python)
- **Link**: https://github.com/crossbario/autobahn-python/issues/1789

## Future Modernization Considerations

### Rust Rewrite of NVX Module
- See analysis in this file below
- Not critical, but would eliminate entire class of memory safety issues
- **Estimated Effort**: 1-2 weeks (see detailed analysis)


---

## Detailed Analysis: Rust Rewrite of NVX Module

### Current State
- **2 C files**: `_utf8validator.c` (693 LOC), `_xormasker.c` (199 LOC)
- **Total**: ~892 lines of C code
- **No .h headers**: Functions exposed via CFFI `ffi.cdef()`
- **SIMD optimizations**: SSE2, SSE4.1 (compile-time selection)
- **CFFI-based**: Already using CFFI (not CPyExt) - good for PyPy

### API Surface (Must Remain Unchanged)

**UTF-8 Validator (8 functions)**:
```c
void* nvx_utf8vld_new();
void nvx_utf8vld_reset(void* utf8vld);
int nvx_utf8vld_validate(void* utf8vld, const uint8_t* data, size_t length);
void nvx_utf8vld_free(void* utf8vld);
int nvx_utf8vld_set_impl(void* utf8vld, int impl);
int nvx_utf8vld_get_impl(void* utf8vld);
size_t nvx_utf8vld_get_current_index(void* utf8vld);
size_t nvx_utf8vld_get_total_index(void* utf8vld);
```

**XOR Masker (7 functions)**:
```c
void* nvx_xormask_new(const uint8_t* mask);
void nvx_xormask_reset(void* xormask);
size_t nvx_xormask_pointer(void* xormask);
void nvx_xormask_process(void* xormask, uint8_t* data, size_t length);
void nvx_xormask_free(void* xormask);
int nvx_xormask_set_impl(void* xormask, int impl);
int nvx_xormask_get_impl(void* xormask);
```

### Rust Rewrite Approach

**Strategy**: Keep C API, rewrite implementation in Rust

#### Architecture
```
┌─────────────────────────────────────────┐
│ Python (unchanged) │
│ from _nvx_utf8validator import lib │
└──────────────────┬──────────────────────┘
│ CFFI
┌──────────────────▼──────────────────────┐
│ C FFI Shim (thin, ~50 lines) │
│ #[no_mangle] extern "C" functions │
└──────────────────┬──────────────────────┘
┌──────────────────▼──────────────────────┐
│ Rust Implementation │
│ - Safe memory management (Box) │
│ - SIMD via std::arch intrinsics │
│ - No unsafe malloc/free │
└─────────────────────────────────────────┘
```

#### Key Components

1. **Memory Management** (✅ Major Benefit)
- Replace `malloc/free` → Rust `Box::new() / drop`
- Automatic safety: no NULL deref, no double-free, no use-after-free
- `void*` → `Box<Validator>` converted to raw pointer for C FFI

2. **SIMD Code** (⚠️ Requires Care)
- C: `__SSE2__`, `__SSE4_1__` preprocessor + intrinsics
- Rust: `#[cfg(target_feature)]` + `std::arch::x86_64`
- Nearly 1:1 mapping of intrinsics (e.g., `_mm_load_si128` → same in Rust)
- Runtime detection via `is_x86_feature_detected!("sse4.1")`

3. **FFI Boundary** (⚠️ Unsafe but Isolated)
```rust
#[no_mangle]
pub extern "C" fn nvx_utf8vld_new() -> *mut c_void {
let validator = Box::new(Utf8Validator::new());
Box::into_raw(validator) as *mut c_void
}

#[no_mangle]
pub extern "C" fn nvx_utf8vld_free(ptr: *mut c_void) {
if !ptr.is_null() {
unsafe {
let _ = Box::from_raw(ptr as *mut Utf8Validator);
}
}
}
```

### Effort Estimation

#### Phase 1: Setup & Basic Structure (2-3 days)
- [ ] Create Rust library crate (`autobahn-nvx` or in-tree)
- [ ] Setup CFFI to compile Rust instead of C
- [ ] Define Rust structs matching C structs
- [ ] Implement FFI shims (15 functions)
- [ ] Wire up build system (Cargo + CFFI integration)

#### Phase 2: Port Core Logic (3-5 days)
- [ ] Port UTF-8 validator scalar implementation
- [ ] Port XOR masker scalar implementation
- [ ] Port UTF-8 DFA state machine
- [ ] Add comprehensive tests (unit tests in Rust)
- [ ] Validate against existing Python tests

#### Phase 3: SIMD Optimizations (3-5 days)
- [ ] Port SSE2 UTF-8 validator
- [ ] Port SSE4.1 UTF-8 validator (if used)
- [ ] Port SSE2 XOR masker
- [ ] Feature detection & runtime selection
- [ ] Benchmark against C version (must match or exceed)

#### Phase 4: Integration & Validation (2-3 days)
- [ ] Integration testing with autobahn-python
- [ ] Performance benchmarking (throughput, latency)
- [ ] Test on CPython 3.11-3.14
- [ ] Test on PyPy 3.11
- [ ] Cross-platform testing (Linux x86-64, ARM64, macOS, Windows)
- [ ] Memory profiling (no leaks, no regressions)

**Total Estimated Effort**: 10-16 days (2-3 weeks)

### Benefits of Rust Rewrite

✅ **Security & Correctness**:
- Eliminates NULL pointer dereferences (issue #1789 class)
- Prevents buffer overflows
- Prevents use-after-free
- Prevents double-free
- Safe memory management by default

✅ **Maintainability**:
- Better type system
- Clearer ownership semantics
- Modern tooling (cargo, clippy, rustfmt)
- Built-in testing framework

✅ **Performance**:
- Likely same or better (Rust's LLVM backend)
- Better optimization opportunities (aliasing rules)
- No unexpected overhead

✅ **PyPy Compatibility**:
- Still uses CFFI (not CPyExt)
- No change to PyPy story

### Risks & Challenges

⚠️ **SIMD Portability**:
- Must support same architectures as C version
- Runtime feature detection complexity
- Need to test on ARM64 NEON (if C version supports it)

⚠️ **Build Complexity**:
- CFFI + Cargo integration
- Cross-compilation for wheels
- CI/CD updates for Rust toolchain

⚠️ **Learning Curve**:
- Team needs Rust familiarity
- SIMD intrinsics knowledge still required

⚠️ **Binary Size**:
- Rust stdlib might increase binary size slightly
- Mitigated by `#![no_std]` if needed

### Recommendation

**Priority**: Medium-Low (Phase 2 or later)

**Rationale**:
1. Current C code works, issue #1789 fixable in < 1 hour
2. Rust rewrite is 2-3 weeks effort vs. immediate value
3. Better to focus on Phase 1 modernization first
4. Consider after Phase 1 complete, if:
- More memory safety issues found
- Team has Rust expertise
- Time available for non-critical improvements

**Alternative Approach**:
- Fix issue #1789 with NULL checks (Phase 1.2)
- Add comprehensive fuzzing tests for C code
- Revisit Rust rewrite in Phase 2 or 3

### If We Do Proceed

**Prerequisites**:
- [ ] Team Rust training (1-2 weeks)
- [ ] Rust SIMD intrinsics study (3-5 days)
- [ ] CFFI + Cargo integration prototype (1-2 days)
- [ ] Performance baseline established (benchmarks)

**Deliverables**:
- [ ] Rust crate with C-compatible API
- [ ] Drop-in replacement for C files
- [ ] No Python code changes required
- [ ] Performance parity or better
- [ ] Comprehensive test suite
- [ ] Documentation

---

**Summary**: Rust rewrite is feasible (2-3 weeks), provides significant safety benefits, but not urgent. Fix immediate issue #1789 first, revisit Rust later if strategic value aligns.

1 change: 0 additions & 1 deletion docs/AI_AUDIT_PROCESS.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/AI_AUDIT_PROCESS_REVIEW.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/AI_GUIDELINES.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/AI_POLICY.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/ai/AI_AUDIT_FILE.md
1 change: 1 addition & 0 deletions docs/ai/AI_AUDIT_PROCESS.md
1 change: 1 addition & 0 deletions docs/ai/AI_AUDIT_PROCESS_REVIEW.md
1 change: 1 addition & 0 deletions docs/ai/AI_GUIDELINES.md
1 change: 1 addition & 0 deletions docs/ai/AI_POLICY.md
6 changes: 3 additions & 3 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath("./_extensions"))
sys.path.insert(0, os.path.abspath(".."))
sys.path.insert(0, os.path.abspath("../src"))
sys.path.insert(0, os.path.abspath("."))

# monkey-patch txaio so that we can "use" both twisted *and* asyncio,
Expand Down Expand Up @@ -114,7 +114,7 @@ def autodoc_skip_member(app, what, name, obj, skip, options):
# needs_sphinx = '1.0'

autoapi_type = "python"
autoapi_dirs = ["../autobahn"]
autoapi_dirs = ["../src/autobahn"]
# autoapi_ignore doesn’t interpret filesystem-style paths — it uses
# glob-style matching relative to autoapi_dirs roots.
autoapi_ignore = [
Expand Down Expand Up @@ -193,7 +193,7 @@ def autodoc_skip_member(app, what, name, obj, skip, options):
# built documents.
#
base_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))
with open(os.path.join(base_dir, "autobahn", "_version.py")) as f:
with open(os.path.join(base_dir, "src", "autobahn", "_version.py")) as f:
exec(f.read()) # defines __version__

version = release = __version__
Expand Down
23 changes: 19 additions & 4 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,24 @@ Contents
wamp/examples
environments/index
OVERVIEW.md
AI_POLICY.md
AI_GUIDELINES.md
AI_AUDIT_PROCESS.md
AI_AUDIT_PROCESS_REVIEW.md

AI Policy
---------

.. toctree::
:maxdepth: 2

ai/AI_POLICY.md
ai/AI_GUIDELINES.md
ai/AI_AUDIT_FILE.md
ai/AI_AUDIT_PROCESS.md
ai/AI_AUDIT_PROCESS_REVIEW.md

Releases
--------

.. toctree::
:maxdepth: 2

release-notes
changelog
Loading
Loading