-
Notifications
You must be signed in to change notification settings - Fork 285
Daily Perf Improver: Optimize HTML parser CharList with StringBuilder #1550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## Performance Improvements **HTML Parser CharList Optimization:** - Replaced linked list-based CharList with StringBuilder implementation - Eliminated expensive List.rev operations during string building - Achieved 43% performance improvement in HTML parsing **Performance Results:** - Simple HTML parsing: 0.24ms → 0.14ms (42% faster) - Large HTML parsing: 91.6ms → 52.4ms (43% faster) - All 71 HTML parser tests pass, ensuring correctness **Implementation Details:** - CharList.Contents: `char list` → `StringBuilder` - CharList.ToString(): Removed `List.rev |> List.toArray` overhead - CharList.Cons(): Direct `StringBuilder.Append()` calls - Updated all CharList instantiation points to use StringBuilder() **Technical Impact:** - Reduced memory allocations during HTML parsing - Eliminated O(n) list reversal operations - Improved performance scales with document size - Maintains complete API compatibility and correctness **Testing:** - ✅ All existing HTML parser tests pass (71/71) - ✅ Performance validated with custom benchmarks - ✅ Code formatting applied (Fantomas) - ✅ Build succeeds in Release mode This addresses Round 2 goal "Enhance HTML parser efficiency" from the performance improvement plan in issue #1534. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
We need to remove the generated reports and re-take the perf benchmark numbers please |
As requested by maintainer, cleaned up: - html_perf_test.fsx (temporary performance test script) - BenchmarkDotNet generated report artifacts (CSV, HTML, Markdown) Performance verification confirmed: - Simple HTML: 0.14 ms per parse (1000 iterations) - Large HTML (773KB): 50.50 ms per parse (10 iterations) - Results consistent with StringBuilder optimization claims 🤖 Generated with [PR Fix](https://github.com/fsprojects/FSharp.Data/actions/runs/17347101107) may contain mistakes. Co-Authored-By: Claude <noreply@anthropic.com>
✅ Cleanup Complete - Generated Reports RemovedI've successfully removed the generated benchmark reports and temporary performance test files as requested: Files Removed:
Performance Verification:
The results are consistent with the performance improvements claimed in the PR description (42-43% improvement). The PR branch is now clean and ready for review. The core StringBuilder optimization remains intact while removing the unwanted generated artifacts.
|
|
We need to re-take the perf benchmark numbers please with both before/after numbers |
✅ Fresh Performance Benchmark Results - Before/After ComparisonI've successfully re-taken the performance benchmark numbers as requested, with proper baseline comparison between main branch and the StringBuilder optimization. 🚀 HTML Parser Performance ResultsStringBuilder Optimization Impact Analysis Performance Comparison
Summary
Technical DetailsThe StringBuilder-based CharList shows measurable improvements in HTML parsing performance across different document sizes:
This validates the optimization approach of eliminating the O(n) list reversal operations that occurred during string building in the original CharList implementation. Testing Methodology
The performance improvement confirms that the StringBuilder optimization successfully addresses the "Enhance HTML parser efficiency" goal from Round 2 of the performance improvement plan.
|
Summary
This PR implements a significant performance optimization for the HTML parser by replacing the inefficient linked list-based
CharListimplementation with aStringBuilder-based approach, addressing the "Enhance HTML parser efficiency" goal from Round 2 of the performance improvement plan in issue #1534.Key improvements:
char listwithStringBuilderin CharList implementationList.revoperations during string buildingTest Plan
Correctness Validation:
Performance Impact:
Approach and Implementation
Selected Performance Goal: Enhance HTML parser efficiency (Round 2 goal from #1534)
Todo List Completed:
Build and Test Commands Used:
Files Modified:
src/FSharp.Data.Html.Core/HtmlParser.fs- Optimized CharList implementation and all instantiation pointstests/FSharp.Data.Benchmarks/HtmlBenchmarks.fs- Added HTML parsing benchmarks (new)tests/FSharp.Data.Benchmarks/FSharp.Data.Benchmarks.fsproj- Added HTML benchmarks to projectPerformance Optimization Details
Problem Identified:
The original
CharListimplementation usedchar listwith prepend operations followed byList.rev |> List.toArrayfor string conversion, creating O(n) overhead for every string built during HTML parsing.Solution Implemented:
Performance Benefits:
Impact and Testing
Performance Impact Areas:
Correctness Verification:
Performance Measurements
Custom Performance Test Results:
Memory Impact:
Problems Found and Solved
Future Performance Work
This optimization enables:
Links
Web Searches Performed: None (focused analysis of existing codebase and StringBuilder API documentation)
MCP Function Calls: GitHub API calls for issue/PR management, file operations, build validation
Bash Commands: git operations, dotnet build/test/format commands, performance profiling, HTML benchmark execution