BREAKING(csv): honour fieldsPerRecord when mapping rows to objects#7141
BREAKING(csv): honour fieldsPerRecord when mapping rows to objects#7141fibibot wants to merge 1 commit into
fieldsPerRecord when mapping rows to objects#7141Conversation
…parse()` and `CsvParseStream` When `skipFirstRow` or `columns` is set together with `fieldsPerRecord: -1` (or `fieldsPerRecord` left undefined, which is the same variable-length mode), `convertRowToObject` previously threw "The record has X fields, but the header has Y fields" unconditionally on length mismatch, ignoring the `fieldsPerRecord` setting. It now respects the mode: when variable-length records are permitted, short rows yield `undefined` for missing header keys and extra fields beyond the header list are dropped. Strict modes (`fieldsPerRecord >= 0`) keep the existing length check. The mapped-row value type widens from `Record<string, string>` to `Record<string, string | undefined>` (and the same shift in `RecordWithColumn`) so the static type reflects the runtime behaviour. This is a TS-level breaking change for callers reading values cookie-style without `noUncheckedIndexedAccess`. Fixes #6434.
|
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7141 +/- ##
=======================================
Coverage 94.61% 94.61%
=======================================
Files 634 634
Lines 51830 51838 +8
Branches 9341 9342 +1
=======================================
+ Hits 49037 49045 +8
Misses 2218 2218
Partials 575 575 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| * | ||
| * const string = "name,age\nAlice,34\nBob\n"; | ||
| * const result = parse(string, { skipFirstRow: true }); | ||
| * |
There was a problem hiding this comment.
suggestion : add also an example when rows have more fields?
a \n b,c => {a:"b"} // extra "c" got dropped
| const out: Record<string, unknown> = {}; | ||
| const out: Record<string, string | undefined> = {}; | ||
| for (const [index, header] of headers.entries()) { | ||
| out[header] = row[index]; |
There was a problem hiding this comment.
suggestion : maybe add if (index === row.length) break; , so no field is assigned undefined, and you can avoid adding undefined to union type everywhere
see #7153
bartlomieju
left a comment
There was a problem hiding this comment.
Solid fix for a real bug — fieldsPerRecord: -1 (and the implicit default) is documented as variable-length but convertRowToObject was unconditionally strict, so any caller using skipFirstRow/columns got the documented behavior contradicted. Plumbing an allowVariableLength flag through is the right shape, and the test coverage hits all three repros from #6434 plus a strict-mode guard test.
One thing worth thinking about: the type widening from Record<K, string> to Record<K, string | undefined> is unconditional, even for users in a strict mode (fieldsPerRecord >= 0) where undefined cannot occur at runtime. That's a mild DX regression for strict-mode users, but encoding the mode in the type would require pushing fieldsPerRecord into ParseResult and significantly more conditional-type plumbing. I think unconditional widening is the right call — flagging for discussion only.
A few inline notes on small consistency / clarity improvements.
| row: readonly string[], | ||
| headers: readonly string[], | ||
| zeroBasedLine: number, | ||
| allowVariableLength: boolean = false, |
There was a problem hiding this comment.
Two call sites compute this flag independently — parse.ts does fieldsPerRecord === undefined || fieldsPerRecord < 0, the stream does #fieldsPerRecord === "ANY". Both are correct today, but they're an invariant pair (the stream's normalization step defines what ANY means). Consider exporting a small isVariableLength(fieldsPerRecord: number | undefined): boolean helper from _io.ts and using it on both sides, so if the rules ever change (e.g. someone wants fieldsPerRecord: null to also mean variable-length) there's a single place to touch. Not blocking.
| allowVariableLength: boolean = false, | ||
| ) { | ||
| if (row.length !== headers.length) { | ||
| if (!allowVariableLength && row.length !== headers.length) { |
There was a problem hiding this comment.
Worth a one-line comment that this check still earns its keep in strict mode: the outer parser already enforces record.length === fieldsPerRecord, but columns.length can differ from fieldsPerRecord (e.g. the fieldsPerRecord: 2, columns: ["foo"] test case), so this is the only place that catches that misuse. Otherwise a future cleanup might delete it as redundant.
| record, | ||
| this.#headers, | ||
| this.#zeroBasedLineIndex, | ||
| this.#fieldsPerRecord === "ANY", |
There was a problem hiding this comment.
Minor readability: lifting this to a named local would mirror the shape of the sync parse() call site and make the intent obvious without having to recall what "ANY" means in the discriminated union.
const allowVariableLength = this.#fieldsPerRecord === "ANY";
controller.enqueue(convertRowToObject(
record,
this.#headers,
this.#zeroBasedLineIndex,
allowVariableLength,
));|
|
||
| const zeroBasedFirstLineIndex = options.skipFirstRow ? 1 : 0; | ||
| const allowVariableLength = options.fieldsPerRecord === undefined || | ||
| options.fieldsPerRecord < 0; |
There was a problem hiding this comment.
Note that this intentionally excludes fieldsPerRecord: 0 (the "infer from first row, then enforce" mode) — which is correct, since once inferred it should behave strictly. Worth a brief inline comment to that effect; otherwise a reader who only knows the -1 semantics might think 0 was overlooked.
| * `options.columns`, it returns `Record<string, string>[]`. | ||
| * `options.columns`, it returns `Record<string, string | undefined>[]`. Values | ||
| * are typed as `string | undefined` to reflect that variable-length records | ||
| * (the default when `fieldsPerRecord` is undefined or negative) may produce |
There was a problem hiding this comment.
Nice new example. Suggest also documenting the extra-fields shape here (fields beyond headers.length are silently dropped) — it's tested but not shown in user-facing docs, and "missing fields become undefined" naturally raises the question "what about extra ones?"
| columns: ["foo", "bar", "baz"], | ||
| columns: ["foo"], | ||
| fieldsPerRecord: 2, | ||
| }), |
There was a problem hiding this comment.
Good explicit strict-mode regression test. One more variant worth considering: fieldsPerRecord: 0 (infer-then-enforce) combined with skipFirstRow and a subsequent short row — to lock in that the inferred strict mode still throws and isn't accidentally collapsed into variable-length. Optional.
Summary
Fixes #6434.
CsvParseStreamandparse()documentfieldsPerRecord: -1(or undefined,the default) as allowing variable-length records, and that worked when
emitting
string[][]. OnceskipFirstRow: trueorcolumns: [...]was set,parsing threw
Syntax error on line N: The record has X fields, but the header has Y fieldseven withfieldsPerRecord: -1, becauseconvertRowToObjectthrew unconditionally on length mismatch, ignoring thefieldsPerRecordsetting.This change makes
convertRowToObjectaccept anallowVariableLengthflagthat both call sites compute from the active
fieldsPerRecordmode (thestream's
#fieldsPerRecord === "ANY"; for the syncparse(),options.fieldsPerRecord === undefined || options.fieldsPerRecord < 0).Runtime behaviour:
fieldsPerRecordundefined or negative): short rowsyield
undefinedfor missing header keys, extra fields are dropped.fieldsPerRecord >= 0): unchanged. The existingexpected N fields but got Mcheck still fires beforeconvertRowToObject, and the residual length-vs-header check still throwswhen
columnsdiffers in length from a positivefieldsPerRecord.Type widening (BREAKING)
The mapped-row value type widens from
Record<string, string>toRecord<string, string | undefined>(and the same shift insideRecordWithColumn) so callers see the runtime possibility ofundefined.This is a TS-level breaking change for callers reading values cookie-style
without
noUncheckedIndexedAccess. The issue calls this out; wideningunconditionally was preferred over gating on
fieldsPerRecordbecause thedefault mode (no
fieldsPerRecordset) is variable-length, so any callerusing
skipFirstRoworcolumnswith the defaultfieldsPerRecordcan nowobserve
undefinedand the type should reflect that.Test plan
deno fmt --check csv/deno lint csv/deno test -A --doc csv/(72 tests / 199 steps, all passing)fieldsPerRecorddoesn't work withskipFirstRoworcolumns#6434(skipFirstRow / columns / both) in both
csv/parse_test.tsandcsv/parse_stream_test.ts.fieldsPerRecorderror paths still tested.Closes bartlomieju/orchid-inbox#88