Daily Perf Improver: Optimize CSV parser with iterative algorithms #1552
Summary
This PR implements significant performance optimizations for the CSV parser by replacing recursive functions with iterative algorithms and optimizing data structure usage. This addresses the "Improve CSV streaming performance for large files" goal from Round 2 of the performance improvement plan in issue #1534.
Key improvements:
Test Plan
Correctness Validation:
Performance Impact:
Based on custom benchmarking script (csv_perf_test.fsx):
Approach and Implementation
Selected Performance Goal: Improve CSV streaming performance for large files (Round 2 goal from #1534)
Todo List Completed:
Build and Test Commands Used:
Files Modified:
src/FSharp.Data.Csv.Core/CsvRuntime.fs - Optimized CSV parsing core algorithms
tests/FSharp.Data.Benchmarks/CsvBenchmarks.fs - Added CSV parsing benchmarks (new)
tests/FSharp.Data.Benchmarks/FSharp.Data.Benchmarks.fsproj - Added CSV benchmarks to project
tests/FSharp.Data.Benchmarks/Program.fs - Added CSV benchmark execution options
csv_perf_test.fsx - Custom performance testing script (new)
Performance Optimization Details
Problem Identified:
The original CSV parser used recursive functions (readString, readLine, readLines) that built intermediate lists and then reversed them with List.rev, adding overhead to every line parsed.
Solution Implemented:
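As a rough illustration of the recursive-versus-iterative contrast described in this PR, here is a minimal sketch. It is not the code from CsvRuntime.fs: the function names splitRecursive and splitIterative are hypothetical, and the example only splits on commas, ignoring quoting, escaping, and streaming. It only shows why prepending to a list and calling List.rev costs more than appending to a mutable buffer.

```fsharp
open System.Text

// Hypothetical recursive style (not the actual readLine implementation):
// characters and fields are prepended to immutable lists, so both must be
// reversed with List.rev before they can be returned.
let splitRecursive (line: string) : string list =
    let toField (reversedChars: char list) =
        let chars: char[] = reversedChars |> List.rev |> List.toArray
        System.String(chars)
    let rec loop (i: int) (current: char list) (fields: string list) =
        if i >= line.Length then
            List.rev (toField current :: fields)
        elif line.[i] = ',' then
            loop (i + 1) [] (toField current :: fields)
        else
            loop (i + 1) (line.[i] :: current) fields
    loop 0 [] []

// Hypothetical iterative style: a StringBuilder and a ResizeArray are
// appended to in order, so no intermediate lists or reversals are needed.
let splitIterative (line: string) : string[] =
    let fields = ResizeArray<string>()
    let current = StringBuilder()
    for c in line do
        if c = ',' then
            fields.Add(current.ToString())
            current.Clear() |> ignore
        else
            current.Append(c) |> ignore
    fields.Add(current.ToString())
    fields.ToArray()
```

Both functions return the same fields for the same input. The iterative version avoids one list allocation per character and one per field, plus the two reversal passes.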
Performance Benefits:
Impact and Testing
Performance Impact Areas:
Correctness Verification:
Memory and Scalability Benefits
Memory Impact:
Scalability Improvements:
Problems Found and Solved
Future Performance Work
This optimization enables:
Links
Web Searches Performed: None (focused analysis of existing codebase and performance profiling)
MCP Function Calls: GitHub API calls for issue/PR management, file operations, build validation
Bash Commands: git operations, dotnet build/test/format commands, performance profiling, CSV benchmarking