Skip to content

Commit d065b4c

Browse files
DOC-5831 fixed C# redundant closing brace issue
1 parent 467e25a commit d065b4c

File tree

4 files changed

+320
-80
lines changed

4 files changed

+320
-80
lines changed

build/jupyterize/SPECIFICATION.md

Lines changed: 81 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -258,7 +258,69 @@ During implementation, several non‑obvious details significantly reduced bugs
258258
- Keep patterns minimal and composable (e.g., separate `class_opening`, `method_opening`, `closing_braces`).
259259
- Validate patterns at startup or wrap application with try/except to warn and continue on malformed regex.
260260

261-
### 10. Pattern Count Differences Between Languages (Java Implementation Insight)
261+
### 10. Closing Brace Removal Must Be Match-Based, Not Pattern-Based (Critical Bug Fix)
262+
263+
**Problem**: The initial implementation removed closing braces based on the number of unwrap patterns configured, not the number of patterns that actually matched. This caused a critical bug where closing braces from control structures (for loops, foreach loops, if statements) were incorrectly removed.
264+
265+
**Example of the bug**:
266+
```csharp
267+
// Original code in a cell
268+
for (var i = 0; i < resultsList.Count; i++)
269+
{
270+
Console.WriteLine(i);
271+
}
272+
273+
// BUG: Closing brace was removed, resulting in:
274+
for (var i = 0; i < resultsList.Count; i++)
275+
{
276+
Console.WriteLine(i);
277+
// Missing }
278+
```
279+
280+
**Root cause**: The unwrapping logic counted braces to remove based on pattern configuration (e.g., "C# has 4 patterns with braces, so remove 4 closing braces from every cell"), rather than counting how many patterns actually matched in each specific cell.
281+
282+
**Solution**: Modified `remove_matching_lines()` to return a tuple `(modified_code, match_count)` and updated `unwrap_code()` to only remove closing braces when patterns actually match:
283+
284+
```python
285+
# Before (WRONG):
286+
for pattern_config in unwrap_patterns:
287+
code = remove_matching_lines(code, pattern, end_pattern)
288+
if '{' in pattern:
289+
braces_removed += 1 # Always increments!
290+
291+
# After (CORRECT):
292+
for pattern_config in unwrap_patterns:
293+
code, match_count = remove_matching_lines(code, pattern, end_pattern)
294+
if match_count > 0 and '{' in pattern:
295+
braces_removed += match_count # Only increments if pattern matched
296+
```
297+
298+
**Implementation details**:
299+
1. `remove_matching_lines()` now returns `(code, match_count)` instead of just `code`
300+
2. `unwrap_code()` tracks `braces_removed` based on actual matches, not pattern configuration
301+
3. `remove_trailing_braces()` scans from the end and removes only the exact number of trailing closing braces
302+
4. The `closing_braces` pattern was removed from configuration files (C# and Java) since it's now handled programmatically
303+
304+
**Time saved by documenting this**: ~2 hours of debugging similar issues in the future.
305+
306+
**Follow-up fix**: After implementing match-based brace removal, a second issue was discovered: cells containing **only** orphaned closing braces (from removed class/method wrappers) were still being included in the notebook. These cells appeared when the closing braces were after a REMOVE block, causing them to be parsed as a separate preamble cell.
307+
308+
**Solution**: Added a filter in `create_cells()` to skip cells that contain only closing braces and whitespace:
309+
310+
```python
311+
# Skip cells that contain only closing braces and whitespace
312+
# (orphaned closing braces from removed class/method wrappers)
313+
if lang_config.get('unwrap_patterns'):
314+
# Remove all whitespace and check if only closing braces remain
315+
code_no_whitespace = re.sub(r'\s', '', code)
316+
if code_no_whitespace and re.match(r'^}+$', code_no_whitespace):
317+
logging.debug(f"Skipping cell {i} (contains only closing braces)")
318+
continue
319+
```
320+
321+
This ensures that orphaned closing brace cells are completely removed from the final notebook.
322+
323+
### 11. Pattern Count Differences Between Languages (Java Implementation Insight)
262324

263325
**Key Discovery**: When adding Java support after C#, the pattern count increased from 5 to 8 patterns.
264326

@@ -1016,8 +1078,8 @@ Minimal example (C#) reflecting patterns that worked in practice:
10161078
{
10171079
"c#": {
10181080
"boilerplate": [
1019-
"#r \"nuget: NRedisStack, 0.12.0\"",
1020-
"#r \"nuget: StackExchange.Redis, 2.6.122\""
1081+
"#r \"nuget: NRedisStack\"",
1082+
"#r \"nuget: StackExchange.Redis\""
10211083
],
10221084
"unwrap_patterns": [
10231085
{ "type": "class_single_line", "pattern": "^\\s*public\\s+class\\s+\\w+.*\\{\\s*$", "end_pattern": "^\\s*public\\s+class\\s+\\w+.*\\{\\s*$", "keep_content": false },
@@ -1061,7 +1123,7 @@ Java has nearly identical structural wrapper patterns to C#, with additional pat
10611123

10621124
**Boilerplate considerations**:
10631125
- **C#**: Requires explicit NuGet package directives (`#r "nuget: ..."`)
1064-
- Example: `#r "nuget: NRedisStack, 0.12.0"`
1126+
- Example: `#r "nuget: NRedisStack"` (version numbers omitted to use latest)
10651127
- These are essential for C# notebooks to work
10661128
- **Java**: Jupyter notebooks typically use `%maven` or `%jars` magic commands, but these vary by kernel
10671129
- For IJava kernel: May need `%maven` directives
@@ -1220,6 +1282,7 @@ This order ensures wrapper removal doesn’t leave code over-indented and avoids
12201282
- **Multiple patterns in one file**: Real files combine class + method + annotation patterns
12211283
- **Indentation complexity**: Real code has varying indentation levels
12221284
- **Edge cases**: Real files expose issues like split wrappers across cells
1285+
- **Control structures**: Real files contain for loops, foreach loops, if statements with closing braces that must be preserved (see Critical Note #10)
12231286

12241287
**Recommended testing workflow**:
12251288
1. Write synthetic tests first (fast, focused, easy to debug)
@@ -1675,13 +1738,26 @@ This specification has been iteratively improved based on real implementation ex
16751738

16761739
### Version 5: After Java Implementation
16771740
- Added Java configuration examples with 8 unwrap patterns
1678-
- Added Critical Implementation Note #10: Pattern count differences between languages
1741+
- Added Critical Implementation Note #11: Pattern count differences between languages
16791742
- Added "Critical Testing Insight" section about testing with real repository files
16801743
- Updated Phase 3 implementation strategy with "examine real files first" guidance
16811744
- Corrected Java pattern count from initial estimate (6) to actual implementation (8)
16821745
- Added insights about pattern order importance in Java (`@Test` before methods)
16831746
- ~1720 lines
16841747

1748+
### Version 6: After Closing Brace Bug Fix
1749+
- Added Critical Implementation Note #10: Closing brace removal must be match-based, not pattern-based
1750+
- Fixed critical bug where control structure closing braces (for, foreach, if) were incorrectly removed
1751+
- Modified `remove_matching_lines()` to return `(code, match_count)` tuple
1752+
- Updated `unwrap_code()` to track braces based on actual matches, not pattern configuration
1753+
- Removed `closing_braces` pattern from C# and Java configurations (now handled programmatically)
1754+
- Added regression test `test_csharp_for_loop_braces()` to prevent future occurrences
1755+
- Updated "Critical Testing Insight" to mention control structure edge case
1756+
- **Follow-up fix**: Added filter to skip cells containing only orphaned closing braces
1757+
- Enhanced regression test to verify no orphaned closing brace cells exist
1758+
- **Boilerplate update**: Removed version numbers from C# NuGet directives (defaults to latest)
1759+
- ~1805 lines
1760+
16851761
### Key Lessons Learned
16861762

16871763
1. **Lead with pitfalls**: Critical notes at the beginning save hours of debugging

0 commit comments

Comments
 (0)