build/jupyterize/SPECIFICATION.md (+111 lines changed: 111 additions & 0 deletions)
@@ -189,6 +189,37 @@ elif step_name:
**See**: [Language-Specific Features](#language-specific-features) section for detailed implementation.

### 9. Unwrapping Patterns: Single-line vs Multi-line, and Dedenting (Based on Implementation Experience)

During implementation, several non-obvious details significantly reduced bugs and rework:

- Pattern classes and semantics
  - Single-line patterns: when `start_pattern == end_pattern`, treat the pattern as "remove this line only". Examples: `public class X {` or `public void Run() {` on one line.
  - Multi-line patterns: when `start_pattern != end_pattern`, remove the start line, every line up to the end line, and the end line itself. Use this to strip a wrapper's braces; preserving the inner code requires a separate "keep content" strategy.
  - Use patterns anchored with `^` to avoid over-matching. Prefer `re.match` (which only matches at the start of the string) over `re.search`.

- Wrappers split across cells
  - Real C# files often split wrappers across lines or blocks (e.g., class name on line N, `{` or `}` on later lines). Because parsing splits code into preamble and step cells, a wrapper's opening and closing tokens may land in separate cells.
  - Practical approach: use separate, simple patterns to remove opener lines (class/method declarations with `{` on either the same line or the next line) and a generic pattern to remove solitary closing braces in any cell.

- Order of operations inside cell creation
  1) Apply unwrapping patterns (in the order listed in the configuration)
  2) Dedent code (e.g., `textwrap.dedent`) so content previously nested inside wrappers aligns to column 0
  3) Strip trailing whitespace (e.g., `rstrip()`)
  4) Skip empty cells

- Dedent all cells when unwrapping is enabled
  - Even if a particular cell didn't change after unwrapping, its content may still be indented because it originated inside a method or class in the source file. Dedent ALL cells whenever `unwrap_patterns` are configured for the language.

- Logging for traceability
  - Emit a `DEBUG` log for each applied pattern (e.g., including the pattern `type`) to simplify diagnosing regex issues.

- Safety tips for patterns
  - Anchor patterns with `^` and keep them specific; avoid overly greedy constructs.
  - Keep patterns minimal and composable (e.g., separate `class_opening`, `method_opening`, `closing_braces`).
  - Validate patterns at startup, or wrap pattern application in try/except, so that a malformed regex produces a warning and processing continues.

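
The single-line and multi-line semantics above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes each pattern is a dict with `start_pattern`, `end_pattern`, and `type` keys, and it takes the pattern list directly rather than a language name. The logger name, the regexes in `PATTERNS`, and the sample C# source are hypothetical. The multi-line branch removes everything from the start line through the end line, so the example uses it only for a method declaration whose `{` sits on the next line.

```python
import logging
import re

logger = logging.getLogger("jupyterize.unwrap")  # hypothetical logger name


def unwrap_code(code, patterns):
    """Remove wrapper lines from `code`, applying `patterns` in order."""
    lines = code.split("\n")
    for pat in patterns:
        start, end = pat["start_pattern"], pat["end_pattern"]
        kept = []
        inside = False  # True between a multi-line start and its end line
        removed = 0
        for line in lines:
            if inside:
                removed += 1
                if re.match(end, line):
                    inside = False  # the end line itself is removed too
                continue
            if re.match(start, line):
                removed += 1
                # Single-line pattern (start == end): drop this line only.
                # Multi-line pattern: keep dropping until the end line.
                inside = start != end
                continue
            kept.append(line)
        if removed:
            logger.debug("pattern %s removed %d line(s)", pat.get("type"), removed)
        lines = kept
    return "\n".join(lines)


# Hypothetical patterns: openers are listed before the generic closer.
CLASS_OPEN = r"^\s*public class \w+\s*\{\s*$"
CLOSING = r"^\s*\}\s*$"
PATTERNS = [
    {"type": "class_opening", "start_pattern": CLASS_OPEN, "end_pattern": CLASS_OPEN},
    {"type": "method_opening",
     "start_pattern": r"^\s*public void \w+\(\)\s*$",
     "end_pattern": r"^\s*\{\s*$"},
    {"type": "closing_braces", "start_pattern": CLOSING, "end_pattern": CLOSING},
]

source = (
    "public class X {\n"
    "    public void Run()\n"
    "    {\n"
    "        var x = 1;\n"
    "    }\n"
    "}"
)
result = unwrap_code(source, PATTERNS)  # the inner statement, still indented
```
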
---
## Code Quality Patterns
@@ -802,6 +833,86 @@ public class SyncLandingExample {
- Harder to maintain
- Breaks existing examples

### Configuration Schema and Semantics (Implementation-Proven)

- Apply `unwrap_code(code, language)` (sequentially over `unwrap_patterns`)
- Dedent with `textwrap.dedent(code)` whenever unwrapping is configured for the language
- `rstrip()` to remove trailing whitespace
- Skip the cell if it is now empty

4) Add step metadata if available

This order ensures wrapper removal doesn't leave code over-indented and avoids generating spurious empty cells.

> Note: When language-specific features are enabled, prefer the extended signature `create_cells(parsed_blocks, language)` and the runtime order defined in the Language-Specific Features section (boilerplate → unwrap → dedent → rstrip → skip empty). The simplified example above illustrates the core cell construction only.

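
The unwrap → dedent → rstrip → skip-empty order can be sketched as below. This is an illustration under stated assumptions, not the project's `create_cells`: `finalize_cells` and `drop_closing_braces` are hypothetical helpers, the unwrapping step is passed in as a plain callable, and the boilerplate and step-metadata steps are left out.

```python
import re
import textwrap


def finalize_cells(blocks, unwrap=None):
    """Turn raw code blocks into cell sources.

    When `unwrap` is given, every block is unwrapped and then dedented,
    including blocks that unwrapping left unchanged.
    """
    cells = []
    for code in blocks:
        if unwrap is not None:
            code = unwrap(code)
            code = textwrap.dedent(code)  # align former wrapper content to column 0
        code = code.rstrip()  # strip trailing whitespace
        if not code:
            continue  # skip cells that are now empty
        cells.append(code)
    return cells


def drop_closing_braces(code):
    """Hypothetical unwrap step: remove lines holding only a closing brace."""
    return "\n".join(
        line for line in code.split("\n") if not re.match(r"^\s*\}\s*$", line)
    )


blocks = [
    "        var x = 1;\n        Console.WriteLine(x);\n",  # untouched by unwrapping
    "    }\n}\n",                                           # only closing braces
]
cells = finalize_cells(blocks, unwrap=drop_closing_braces)
```

The first block contains no wrapper lines, yet it still needs dedenting because it came from inside a method. The second block is empty after unwrapping and produces no cell.
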
### Testing Checklist (Language-Specific)

- Boilerplate
  - First cell is boilerplate for languages with `boilerplate` configured
  - Languages without `boilerplate` configured do not get a boilerplate cell
- Unwrapping
  - Class and method wrappers (single-line and multi-line) are removed
  - Closing braces are removed wherever they appear
  - Inner content remains and is dedented to column 0
- Robustness
  - Missing configuration file → proceed without boilerplate/unwrapping
  - Malformed regex → warn and continue; no crash
  - Real repository example file converts correctly end-to-end

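
The "malformed regex → warn and continue" item could be checked with something like the sketch below. `compile_patterns` is a hypothetical startup validator, and the logger name and pattern dict layout are assumptions rather than part of the specification.

```python
import logging
import re

logger = logging.getLogger("jupyterize.config")  # hypothetical logger name


def compile_patterns(patterns):
    """Return only the patterns whose regexes compile; warn about the rest."""
    valid = []
    for pat in patterns:
        try:
            re.compile(pat["start_pattern"])
            re.compile(pat["end_pattern"])
        except re.error as exc:
            # Warn and continue instead of crashing on a bad pattern.
            logger.warning("skipping pattern %r: %s", pat.get("type"), exc)
            continue
        valid.append(pat)
    return valid


patterns = [
    {"type": "closing_braces",
     "start_pattern": r"^\s*\}\s*$", "end_pattern": r"^\s*\}\s*$"},
    {"type": "broken",  # unbalanced parenthesis: fails to compile
     "start_pattern": r"^\s*(", "end_pattern": r"^\s*("},
]
valid = compile_patterns(patterns)
```

A test can then assert that only the well-formed pattern survives and that no exception escapes.
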
### Edge Cases and Gotchas

- Wrappers split across cells: rely on separate opener and generic `}` patterns
- Dedent all cells when unwrapping is enabled (not only those that changed)
- Anchoring with `^` is crucial to avoid removing mid-line braces in string literals or comments
- Apply patterns in a safe order: openers before closers
- Tabs vs spaces: dedent works on common leading whitespace; prefer spaces in examples
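
Two of these gotchas are easy to demonstrate with the standard library alone. The snippet below is a small illustration with made-up sample lines. It compares an anchored closing-brace pattern with an unanchored search, and shows how `textwrap.dedent` behaves when tabs and spaces are mixed.

```python
import re
import textwrap

CLOSING_BRACE = r"^\s*\}\s*$"  # matches a line holding only a closing brace

line_with_literal = '    Console.WriteLine("}");'
lone_brace = "    }"
samples = (line_with_literal, lone_brace)

# The anchored pattern matches only the solitary brace line.
anchored = [bool(re.match(CLOSING_BRACE, s)) for s in samples]
# An unanchored search also hits the brace inside the string literal.
unanchored = [bool(re.search(r"\}", s)) for s in samples]

spaces = "    a = 1\n    b = 2\n"
mixed = "\ta = 1\n    b = 2\n"

dedented_spaces = textwrap.dedent(spaces)  # common four-space prefix removed
dedented_mixed = textwrap.dedent(mixed)    # tab and spaces share no common prefix
```

Because `textwrap.dedent` treats a tab and spaces as different prefixes, the mixed block comes back unchanged, which is why space-indented examples are the safer choice.
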