fix(csv): function parse option fieldsPerRecord when negative records may have less fields#7153
fix(csv): function parse option fieldsPerRecord when negative records may have less fields#7153PengjuXu wants to merge 1 commit into
Conversation
… may have less fields
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7153 +/- ##
=======================================
Coverage 94.61% 94.61%
=======================================
Files 634 634
Lines 51843 51855 +12
Branches 9346 9350 +4
=======================================
+ Hits 49050 49062 +12
Misses 2218 2218
Partials 575 575 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bartlomieju
left a comment
There was a problem hiding this comment.
Thanks for catching this — the bug is real: with columns/skipFirstRow + fieldsPerRecord: -1, parse() throws despite the docs saying "no check is made and records may have a variable number of fields."
A few things to address before this can land:
1. CsvParseStream is not fixed. It calls the same convertRowToObject (csv/parse_stream.ts:460) and is not updated by this PR, so the bug persists for the streaming API. The fix + a parallel test should be added to parse_stream.ts / parse_stream_test.ts, otherwise the two APIs diverge.
2. Asymmetry contradicts the docs. fieldsPerRecord < 0 should allow any row length — fewer and more. This PR only permits fewer; more still throws (with a new message). You acknowledge this and point at #7141 as a better solution — agreed. It probably makes sense to resolve the design question on #7141 first rather than ship a half-fix here that needs immediate follow-up.
3. JSDoc update. With this change, a short row produces an object where missing columns are simply absent keys (not undefined, not ""). That's a reasonable choice but it's new public-visible behavior — please document it on ParseOptions.fieldsPerRecord in both parse.ts and parse_stream.ts.
Inline comments below for the smaller issues.
| headers: readonly string[], | ||
| zeroBasedLine: number, | ||
| /* allow less fields in a row than headers */ | ||
| allowLessRowFields = false, |
There was a problem hiding this comment.
Two small nits:
- Use
//for one-line comments — that's the convention in this file (see e.g.parse.ts:229). allowLessRowFields→allowFewerRowFields("fewer" for countables).
Also, threading a boolean flag through an internal helper for one call-site is a bit awkward. Consider passing fieldsPerRecord (or { variable: boolean }) so the policy lives inside the helper rather than at every caller — this would also make it easier to fix CsvParseStream symmetrically.
| `Syntax error on line ${ | ||
| zeroBasedLine + 1 | ||
| }: The record has ${row.length} fields, but the header allows maximum ${headers.length} fields`, | ||
| ); |
There was a problem hiding this comment.
Two issues here:
- Message wording: the header doesn't "allow" anything. Please mirror the existing
expected N fields but got Mstyle used elsewhere (seeparse_stream.ts:453): e.g.expected at most ${headers.length} fields but got ${row.length}. - This throws plain
Error, but the rest of the parser throwsSyntaxError(seeparse_stream.ts:450and the doc example atparse.ts:486). The pre-existing throw above is alsoError— worth fixing both while you're here for consistency.
| } | ||
| const out: Record<string, unknown> = {}; | ||
| for (const [index, header] of headers.entries()) { | ||
| if (index === row.length) break; |
There was a problem hiding this comment.
Minor: a mid-loop break reads a bit awkwardly. Could be expressed more directly as:
const limit = Math.min(row.length, headers.length);
for (let i = 0; i < limit; i++) {
out[headers[i]] = row[i];
}or headers.slice(0, row.length).forEach(...). Not blocking.
| row, | ||
| headers, | ||
| zeroBasedFirstLineIndex + i, | ||
| (options?.fieldsPerRecord ?? 0) < 0, |
There was a problem hiding this comment.
If you adopt the suggestion to push the fieldsPerRecord < 0 check inside convertRowToObject, this call site becomes a plain convertRowToObject(row, headers, zeroBasedFirstLineIndex + i, options.fieldsPerRecord) and the same change can be applied verbatim in parse_stream.ts:460.
| }), | ||
| Error, | ||
| "Syntax error on line 2: The record has 4 fields, but the header allows maximum 3 fields", | ||
| ); |
There was a problem hiding this comment.
Test name "mismatching number of headers and fields 3" is opaque — please name it after the case being asserted, e.g. "throws when row has more fields than columns with fieldsPerRecord: -1". Same suggestion for the new step at line ~313 — "Allow less row fields than columns" could be "allows rows with fewer fields than columns when fieldsPerRecord is negative".
|
@bartlomieju Thanks for detailed comments and pointing out my blind spots. |
problem
When invoked for object (not array) return, csv/parse function not work with fieldsPerRecord, which states
If negative, no check is made and records may have a variable number of fields.It throws error instead like
The record has 5 fields, but the header has 6 fieldsfix
Make it allow row to have less fields than header has. (see small commit's test change)
(It still does not allow more fields in row than header, as it is not clear what to do in the case)
I believe #7141 is the better solution (dropping extra fields in the row than header)