BREAKING(csv): honour fieldsPerRecord when mapping rows to objects#7141

Open
fibibot wants to merge 1 commit into main from orch/issue-88


@fibibot fibibot commented May 14, 2026

Summary

Fixes #6434.

CsvParseStream and parse() document fieldsPerRecord: -1 (or undefined,
the default) as allowing variable-length records, and that worked when
emitting string[][]. Once skipFirstRow: true or columns: [...] was set,
however, parsing threw "Syntax error on line N: The record has X fields, but
the header has Y fields" even with fieldsPerRecord: -1, because
convertRowToObject threw unconditionally on a length mismatch, ignoring the
fieldsPerRecord setting.

This change makes convertRowToObject accept an allowVariableLength flag
that both call sites compute from the active fieldsPerRecord mode: the
stream passes #fieldsPerRecord === "ANY", and the sync parse() passes
options.fieldsPerRecord === undefined || options.fieldsPerRecord < 0.

Runtime behaviour:

  • Variable-length mode (fieldsPerRecord undefined or negative): short rows
    yield undefined for missing header keys, extra fields are dropped.
  • Strict modes (fieldsPerRecord >= 0): unchanged. The existing
    expected N fields but got M check still fires before
    convertRowToObject, and the residual length-vs-header check still throws
    when columns differs in length from a positive fieldsPerRecord.
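
The runtime behaviour listed above can be sketched roughly as follows. This is a simplified stand-in written for illustration, not the code in csv/_io.ts: the name convertRowToObjectSketch, the exact error wording, and the line-number arithmetic are assumptions.

```typescript
// Hypothetical, simplified stand-in for convertRowToObject (not the real csv/_io.ts code).
function convertRowToObjectSketch(
  row: readonly string[],
  headers: readonly string[],
  zeroBasedLine: number,
  allowVariableLength = false,
): Record<string, string | undefined> {
  // Strict modes keep the length check; variable-length mode skips it.
  if (!allowVariableLength && row.length !== headers.length) {
    // Assumption: the reported line is one-based; the message wording is approximate.
    throw new Error(
      `Syntax error on line ${zeroBasedLine + 1}: The record has ${row.length} fields, but the header has ${headers.length} fields`,
    );
  }
  const out: Record<string, string | undefined> = {};
  headers.forEach((header, index) => {
    // Short rows: row[index] is undefined for the missing trailing fields.
    // Long rows: indices beyond headers.length are never visited, so extras are dropped.
    out[header] = row[index];
  });
  return out;
}

const headers = ["name", "age"];
const shortRow = convertRowToObjectSketch(["Bob"], headers, 2, true);
const longRow = convertRowToObjectSketch(["Alice", "34", "extra"], headers, 1, true);
```

In this sketch, shortRow keeps an age key whose value is undefined, and longRow contains only the name and age keys.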

Type widening (BREAKING)

The mapped-row value type widens from Record<string, string> to
Record<string, string | undefined> (and the same shift inside
RecordWithColumn) so callers see the runtime possibility of undefined.
This is a TS-level breaking change for callers reading values cookie-style
without noUncheckedIndexedAccess. The issue calls this out; widening
unconditionally was preferred over gating on fieldsPerRecord because the
default mode (no fieldsPerRecord set) is variable-length, so any caller
using skipFirstRow or columns with the default fieldsPerRecord can now
observe undefined and the type should reflect that.
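
For callers, the widened type mostly means narrowing before use. A minimal sketch of what that might look like on the consuming side, assuming rows typed as Record<string, string | undefined>; the MappedRow alias and the parseAge helper are hypothetical names used only for illustration:

```typescript
// Hypothetical alias for the widened mapped-row type described above.
type MappedRow = Record<string, string | undefined>;

// Narrow the possibly-missing value before using it as a string.
function parseAge(row: MappedRow): number | null {
  const raw = row["age"];
  if (raw === undefined) return null; // short row: the field was missing
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}

const rows: MappedRow[] = [
  { name: "Alice", age: "34" },
  { name: "Bob", age: undefined }, // what a short row "Bob" would map to
];
const ages = rows.map(parseAge);
```

Code that previously called string methods directly on a mapped value would need a similar guard, or a non-null assertion where the caller knows the mode is strict.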

Test plan

Closes bartlomieju/orchid-inbox#88

…parse()` and `CsvParseStream`

When `skipFirstRow` or `columns` is set together with `fieldsPerRecord: -1`
(or `fieldsPerRecord` left undefined, which is the same variable-length mode),
`convertRowToObject` previously threw "The record has X fields, but the header
has Y fields" unconditionally on length mismatch, ignoring the
`fieldsPerRecord` setting.

It now respects the mode: when variable-length records are permitted, short
rows yield `undefined` for missing header keys and extra fields beyond the
header list are dropped. Strict modes (`fieldsPerRecord >= 0`) keep the
existing length check.

The mapped-row value type widens from `Record<string, string>` to
`Record<string, string | undefined>` (and the same shift in `RecordWithColumn`)
so the static type reflects the runtime behaviour. This is a TS-level breaking
change for callers reading values cookie-style without
`noUncheckedIndexedAccess`.

Fixes #6434.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions Bot added the csv label May 14, 2026

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.61%. Comparing base (5ea9159) to head (1b7ffac).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7141   +/-   ##
=======================================
  Coverage   94.61%   94.61%           
=======================================
  Files         634      634           
  Lines       51830    51838    +8     
  Branches     9341     9342    +1     
=======================================
+ Hits        49037    49045    +8     
  Misses       2218     2218           
  Partials      575      575           



@PengjuXu PengjuXu left a comment


Good work!

Comment thread csv/parse.ts
*
* const string = "name,age\nAlice,34\nBob\n";
* const result = parse(string, { skipFirstRow: true });
*

@PengjuXu PengjuXu May 26, 2026


Suggestion: also add an example where a row has more fields than the header?
a \n b,c => {a:"b"} // the extra "c" is dropped

Comment thread csv/_io.ts
-   const out: Record<string, unknown> = {};
+   const out: Record<string, string | undefined> = {};
    for (const [index, header] of headers.entries()) {
      out[header] = row[index];

Suggestion: maybe add an early exit, if (index === row.length) break;, so that no field is assigned undefined and you can avoid adding undefined to the union type everywhere.
See #7153.
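
To make the difference between the two approaches concrete, here is a small sketch comparing them. Both functions are hypothetical illustrations, not code from this PR or from #7153; the Record<string, string> return type on the second one is an assumption about what the suggestion is aiming for.

```typescript
// Variant A (this PR's shape): every header key is assigned, possibly to undefined.
function mapWithUndefined(
  row: readonly string[],
  headers: readonly string[],
): Record<string, string | undefined> {
  const out: Record<string, string | undefined> = {};
  headers.forEach((header, index) => {
    out[header] = row[index];
  });
  return out;
}

// Variant B (the suggested early exit): stop once the row runs out of fields,
// so missing header keys are simply absent from the object.
function mapWithBreak(
  row: readonly string[],
  headers: readonly string[],
): Record<string, string> {
  const out: Record<string, string> = {};
  for (let index = 0; index < headers.length; index++) {
    if (index === row.length) break;
    out[headers[index] as string] = row[index] as string;
  }
  return out;
}

const headerList = ["name", "age"];
const withUndefined = mapWithUndefined(["Bob"], headerList);
const withBreak = mapWithBreak(["Bob"], headerList);

const ageKeyInA = "age" in withUndefined; // key present, value undefined
const ageKeyInB = "age" in withBreak; // key absent
```

In both variants, reading the age property of the short row yields undefined at runtime; what changes is whether the key exists on the object and what the static type says about the value.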

Member

@bartlomieju bartlomieju left a comment


Solid fix for a real bug: fieldsPerRecord: -1 (and the implicit default) is documented as variable-length, but convertRowToObject was unconditionally strict, so any caller using skipFirstRow/columns saw behavior that contradicted the documentation. Plumbing an allowVariableLength flag through is the right shape, and the test coverage hits all three repros from #6434 plus a strict-mode guard test.

One thing worth thinking about: the type widening from Record<K, string> to Record<K, string | undefined> is unconditional, even for users in a strict mode (fieldsPerRecord >= 0) where undefined cannot occur at runtime. That's a mild DX regression for strict-mode users, but encoding the mode in the type would require pushing fieldsPerRecord into ParseResult and significantly more conditional-type plumbing. I think unconditional widening is the right call — flagging for discussion only.

A few inline notes on small consistency / clarity improvements.

Comment thread csv/_io.ts
row: readonly string[],
headers: readonly string[],
zeroBasedLine: number,
allowVariableLength: boolean = false,
Member

Two call sites compute this flag independently — parse.ts does fieldsPerRecord === undefined || fieldsPerRecord < 0, the stream does #fieldsPerRecord === "ANY". Both are correct today, but they're an invariant pair (the stream's normalization step defines what ANY means). Consider exporting a small isVariableLength(fieldsPerRecord: number | undefined): boolean helper from _io.ts and using it on both sides, so if the rules ever change (e.g. someone wants fieldsPerRecord: null to also mean variable-length) there's a single place to touch. Not blocking.
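
A minimal sketch of the helper suggested above, assuming it takes the raw option value. The function does not exist in the PR as described; its name and signature are taken from the suggestion itself.

```typescript
// Hypothetical shared helper, as proposed in the review comment above.
// Variable-length mode is fieldsPerRecord left undefined or set to a negative number.
function isVariableLength(fieldsPerRecord: number | undefined): boolean {
  return fieldsPerRecord === undefined || fieldsPerRecord < 0;
}

// Each mode mentioned in this thread, run through the helper:
const modes: Array<[number | undefined, boolean]> = [
  [undefined, isVariableLength(undefined)], // default: variable-length
  [-1, isVariableLength(-1)], // explicit variable-length
  [0, isVariableLength(0)], // infer from first row, then enforce: strict
  [2, isVariableLength(2)], // fixed count: strict
];
```

The sync parse() call site could pass options.fieldsPerRecord straight in. The stream would need to call it before (or as part of) normalizing the option to "ANY", since the helper takes the raw number rather than the normalized value.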

Comment thread csv/_io.ts
+   allowVariableLength: boolean = false,
  ) {
-   if (row.length !== headers.length) {
+   if (!allowVariableLength && row.length !== headers.length) {
Member

Worth a one-line comment that this check still earns its keep in strict mode: the outer parser already enforces record.length === fieldsPerRecord, but columns.length can differ from fieldsPerRecord (e.g. the fieldsPerRecord: 2, columns: ["foo"] test case), so this is the only place that catches that misuse. Otherwise a future cleanup might delete it as redundant.

Comment thread csv/parse_stream.ts
record,
this.#headers,
this.#zeroBasedLineIndex,
this.#fieldsPerRecord === "ANY",
Member

Minor readability: lifting this to a named local would mirror the shape of the sync parse() call site and make the intent obvious without having to recall what "ANY" means in the discriminated union.

const allowVariableLength = this.#fieldsPerRecord === "ANY";
controller.enqueue(convertRowToObject(
  record,
  this.#headers,
  this.#zeroBasedLineIndex,
  allowVariableLength,
));

Comment thread csv/parse.ts

const zeroBasedFirstLineIndex = options.skipFirstRow ? 1 : 0;
const allowVariableLength = options.fieldsPerRecord === undefined ||
options.fieldsPerRecord < 0;
Member

Note that this intentionally excludes fieldsPerRecord: 0 (the "infer from first row, then enforce" mode) — which is correct, since once inferred it should behave strictly. Worth a brief inline comment to that effect; otherwise a reader who only knows the -1 semantics might think 0 was overlooked.

Comment thread csv/parse.ts
-  * `options.columns`, it returns `Record<string, string>[]`.
+  * `options.columns`, it returns `Record<string, string | undefined>[]`. Values
+  * are typed as `string | undefined` to reflect that variable-length records
+  * (the default when `fieldsPerRecord` is undefined or negative) may produce
Member

Nice new example. Suggest also documenting the extra-fields shape here (fields beyond headers.length are silently dropped) — it's tested but not shown in user-facing docs, and "missing fields become undefined" naturally raises the question "what about extra ones?"

Comment thread csv/parse_test.ts
columns: ["foo", "bar", "baz"],
columns: ["foo"],
fieldsPerRecord: 2,
}),
Member

Good explicit strict-mode regression test. One more variant worth considering: fieldsPerRecord: 0 (infer-then-enforce) combined with skipFirstRow and a subsequent short row — to lock in that the inferred strict mode still throws and isn't accidentally collapsed into variable-length. Optional.
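
The variant described above might look roughly like this. Since the real parse() is not reproduced in this thread, the sketch uses a deliberately tiny, hypothetical parser (parseWithHeaderSketch) that only splits on newlines and commas and always treats the first row as the header; the error wording is approximate.

```typescript
// Hypothetical miniature of a header-mapped parse; not the @std/csv implementation.
type MappedRow = Record<string, string | undefined>;

function parseWithHeaderSketch(text: string, fieldsPerRecord?: number): MappedRow[] {
  const records = text
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => line.split(","));
  const headers = records[0];
  if (headers === undefined) return [];

  const allowVariableLength = fieldsPerRecord === undefined || fieldsPerRecord < 0;
  // 0 means: infer the expected count from the first record, then enforce it.
  const expected = fieldsPerRecord === 0 ? headers.length : fieldsPerRecord;

  const out: MappedRow[] = [];
  records.forEach((record, i) => {
    if (!allowVariableLength && record.length !== expected) {
      throw new SyntaxError(
        `line ${i + 1}: expected ${expected} fields but got ${record.length}`,
      );
    }
    if (i === 0) return; // the header row itself is not emitted
    const row: MappedRow = {};
    headers.forEach((header, index) => {
      row[header] = record[index];
    });
    out.push(row);
  });
  return out;
}

const input = "name,age\nAlice,34\nBob\n";

// Variable-length mode: the short row maps to { name: "Bob", age: undefined }.
const lenient = parseWithHeaderSketch(input, -1);

// Infer-then-enforce mode: the short row on line 3 should still throw.
let strictError: unknown = null;
try {
  parseWithHeaderSketch(input, 0);
} catch (error) {
  strictError = error;
}
```

A real regression test would make the same two assertions against parse() and CsvParseStream with skipFirstRow: true.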

Development

Successfully merging this pull request may close these issues.

CsvParseStream: negative fieldsPerRecord doesn't work with skipFirstRow or columns

4 participants