
Constrain CSV formats supported by ENSRainbow to single-column#1648

Draft
djstrong wants to merge 5 commits into main from
1491-constrain-csv-formats-supported-by-ensrainbow-to-single-column

Conversation

@djstrong
Contributor

@djstrong djstrong commented Feb 17, 2026

Lite PR

Tip: Review docs on the ENSNode PR process

Summary

  • Constrain convert-csv command to single-column CSV format (one label per line). The two-column format (label + labelhash) is no longer supported.
  • All labelhashes are now computed deterministically from labels using labelhash(), removing the risk of incorrect mappings from untrusted labelhash values.
  • Update tests and documentation to reflect the single-column-only format.

Why

  • The two-column format (label + labelhash) allowed callers to supply arbitrary labelhash values that were never verified against the label. This created a risk of silent data corruption from incorrect or malicious mappings.
  • Computing labelhashes deterministically from labels is the only correct approach and eliminates an entire class of bugs.
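The rule described above can be sketched in a few lines of TypeScript. This is a hypothetical illustration of the single-column constraint, not ENSRainbow's actual converter code; in the real implementation the hash is then produced deterministically by labelhash() (keccak-256 of the label), which this sketch leaves out.

```typescript
// Hypothetical sketch of the single-column constraint (not the actual
// convert-csv-command implementation).

/** Returns the label from a parsed CSV row, or throws on extra columns. */
function extractLabel(row: string[], lineNumber: number): string {
  if (row.length !== 1) {
    // Mirrors the "Expected 1 column (label only)"-style rejection the PR adds.
    throw new Error(
      `Line ${lineNumber}: expected 1 column (label only), got ${row.length}`,
    );
  }
  // With the two-column format gone, the labelhash is always derived from
  // this label (deterministically), never taken from untrusted input.
  return row[0];
}

console.log(extractLabel(["vitalik"], 1)); // vitalik

try {
  // A two-column row (label + labelhash) is now rejected outright.
  extractLabel(["vitalik", "0xdeadbeef"], 2);
} catch (e) {
  console.log((e as Error).message);
}
```

Rejecting the row outright, rather than validating a caller-supplied labelhash, closes the verification gap entirely: no input can map a label to the wrong hash.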

Testing

  • Updated convert-csv-command.test.ts: replaced two-column success tests with rejection tests; removed invalid-hash and missing-column tests that no longer apply.
  • Updated cli.test.ts: switched fixture references from test_labels_2col.csv to test_labels_1col.csv.
  • Deleted test_labels_invalid_hash.csv fixture (no longer relevant).

Notes for Reviewer (Optional)

  • This is a breaking change for ensrainbow (see changeset). Callers using two-column CSV files will need to strip the labelhash column.
  • The SQL dump converter (convert-command-sql.ts) is unaffected — it's a separate code path.

Pre-Review Checklist (Blocking)

  • This PR does not introduce significant changes and is low-risk to review quickly.
  • Relevant changesets are included (or are not required)

…ingle-column format only

- Updated tests in cli.test.ts and convert-csv-command.test.ts to use single-column CSV files.
- Refactored convert-csv-command.ts to reject multi-column CSV formats and adjusted documentation accordingly.
Copilot AI review requested due to automatic review settings February 17, 2026 21:48
@vercel
Contributor

vercel bot commented Feb 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| admin.ensnode.io | Skipped | Feb 17, 2026 11:15pm |
| ensnode.io | Skipped | Feb 17, 2026 11:15pm |
| ensrainbow.io | Skipped | Feb 17, 2026 11:15pm |

@changeset-bot

changeset-bot bot commented Feb 17, 2026

🦋 Changeset detected

Latest commit: c865e21

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
| Name | Type |
| --- | --- |
| ensrainbow | Major |
| ensindexer | Major |
| ensadmin | Major |
| ensapi | Major |
| fallback-ensapi | Major |
| @ensnode/datasources | Major |
| @ensnode/ensrainbow-sdk | Major |
| @ensnode/ponder-metadata | Major |
| @ensnode/ensnode-schema | Major |
| @ensnode/ensnode-react | Major |
| @ensnode/ensnode-sdk | Major |
| @ensnode/ponder-sdk | Major |
| @ensnode/ponder-subgraph | Major |
| @ensnode/shared-configs | Major |
| @docs/ensnode | Major |
| @docs/ensrainbow | Major |
| @docs/mintlify | Major |
| @namehash/ens-referrals | Major |
| @namehash/namehash-ui | Major |


Contributor

Copilot AI left a comment


Pull request overview

This PR constrains the ENSRainbow CSV converter to only support single-column CSV format (label only), removing the previously supported two-column format (label + labelhash). This change ensures all label-to-labelhash mappings are computed deterministically using the labelhash() function, rather than accepting pre-computed labelhashes that may be incorrect.

Changes:

  • Removed support for two-column CSV format (label, labelhash)
  • Updated CSV converter to reject multi-column CSV files with clear error messages
  • Updated documentation to reflect single-column CSV format requirement
  • Removed obsolete test fixture for invalid labelhash validation
  • Updated all tests to verify proper rejection of multi-column CSV files

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx | Updated CSV format documentation to describe the single-column requirement and removed references to the two-column format |
| apps/ensrainbow/test/fixtures/test_labels_invalid_hash.csv | Deleted obsolete test fixture that was used to test labelhash validation in the two-column format |
| apps/ensrainbow/src/commands/convert-csv-command.ts | Simplified createRainbowRecord() to only handle the single-column format; removed the LabelHash import, the column detection logic, and unused parameters from processRecord() |
| apps/ensrainbow/src/commands/convert-csv-command.test.ts | Updated tests to verify rejection of multi-column CSV files; removed tests for two-column format support and invalid labelhash validation |
| apps/ensrainbow/src/cli.test.ts | Updated test fixtures from test_labels_2col.csv to test_labels_1col.csv |
| .gitignore | Added ignore patterns for ENSRainbow temporary and versioned directories |


@coderabbitai

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

CSV input is restricted to a single-column format (label only); two-column (label + labelhash) support is removed. Labelhashes are now computed deterministically from labels. Tests, docs, conversion logic, a changeset, and .gitignore were updated.

Changes

  • Changeset & Git ignore (./.changeset/single-column-csv.md, ./.gitignore): Adds a major changeset documenting the single-column CSV requirement; extends .gitignore with apps/ensrainbow/data*, apps/ensrainbow/temp*, and apps/ensrainbow/v2*.
  • Core CSV conversion (apps/ensrainbow/src/commands/convert-csv-command.ts): Enforces exactly one CSV column (label), removes multi-column handling and LabelHash input validation, always computes labelHash bytes from the label, simplifies the function signature (removed expectedColumns/lineNumber), and ensures output stream cleanup on errors.
  • Tests (apps/ensrainbow/src/cli.test.ts, apps/ensrainbow/src/commands/convert-csv-command.test.ts): Replaced 2-column fixtures with 1-column fixtures; removed or rewrote tests that previously accepted two-column input, added assertions that multi-column CSVs are rejected with "Expected 1 column (label only)"-style errors, and adjusted related header/malformed-row expectations.
  • Documentation (docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx, docs/ensnode.io/src/content/docs/ensrainbow/faq.mdx): Updates guidance to require single-column CSV (one label per line), removes two-column examples and labelhash validation text, and documents deterministic labelhash generation from labels.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant CLI as CLI
  participant Converter as CSV Converter
  participant Hasher as labelHash()
  participant DedupDB as Deduplication DB
  participant ENSDB as ENSRainbow DB
  participant Writer as Output Stream

  CLI->>Converter: stream row (label)
  Converter->>Converter: validate 1 column
  Converter->>Hasher: compute labelHash(label)
  Hasher-->>Converter: labelHashBytes
  Converter->>DedupDB: check duplicate(labelHashBytes)
  DedupDB-->>Converter: unique? | duplicate
  alt unique
    Converter->>ENSDB: check existing(labelHashBytes)
    ENSDB-->>Converter: exists? | not exists
    alt not exists
      Converter->>Writer: write Rainbow record (label, labelHashBytes, ...)
    end
  end
  Converter->>CLI: report stats / errors
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I munched a CSV, one column delight,

Labels bloom and hashes grow right,
No stray hashes wandering free,
Rows are tidy, simple, and wee,
Hooray — one column sets us light ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: constraining CSV formats to single-column only. |
| Description check | ✅ Passed | The description follows the template with all required sections completed: Summary, Why, Testing, Notes for Reviewer, and Pre-Review Checklist. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx (2)

44-44: ⚠️ Potential issue | 🟠 Major

Stale overview content still references the removed two-column format.

Line 44 (overview table) still says CSV file (1 or 2 columns) and line 58 says "flexible column formats" — both directly contradict the updated single-column-only behavior introduced in this PR.

✏️ Proposed fix
-| **CSV Conversion** | CSV file (1 or 2 columns) | Building new ENS rainbow tables | `pnpm run convert` |
+| **CSV Conversion** | Single-column CSV file (label only) | Building new ENS rainbow tables | `pnpm run convert` |
-The `convert` command processes CSV files with flexible column formats.
+The `convert` command processes single-column CSV files (one label per line).

Also applies to: 58-58

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx` at
line 44, Update the documentation to reflect the new single-column-only CSV
behavior: change the overview table cell for "CSV Conversion" from "CSV file (1
or 2 columns)" to "CSV file (1 column)" and replace the phrase "flexible column
formats" (near the later description) with wording that explicitly states only a
single-column format is supported; search for the "CSV Conversion" table row and
the sentence containing "flexible column formats" to locate and update the text.

153-163: ⚠️ Potential issue | 🟠 Major

Stale "How It Works" section still describes the removed two-column detection and labelhash validation logic.

Steps 1 and 4 (and their sub-bullets) describe behavior that no longer exists: automatic column-count detection, two-column labelhash validation, and format requirements for labelhash input. These contradict the updated code which only accepts single-column input and always computes labelhashes deterministically.

✏️ Proposed fix
-1. **Detects** CSV format automatically (1 or 2 columns)
+1. **Reads** each row as a single-column entry (label only); rejects any row with more than one column
 2. **Streams** CSV parsing using fast-csv for memory efficiency
 3. **Validates** column count and data format
-4. **Computes** or validates labelhashes as needed
-   - For single-column format: Computes labelhash using the `labelhash()` function
-   - For two-column format: Validates the format of the provided labelhash (does not verify it matches the label)
-   - Invalid labelhashes are rejected if they don't meet format requirements (66 characters including "0x" prefix, lowercase hex, valid hex format)
+4. **Computes** labelhashes deterministically from labels using the `labelhash()` function
 5. **Filters** existing labels if `--existing-db-path` is provided
 6. **Filters** duplicate labels within the same CSV file
 7. **Writes** .ensrainbow file as output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx`
around lines 153 - 163, Update the "How It Works" section to reflect the current
single-column-only behavior: remove the automatic CSV column-count detection and
any mention of two-column input or labelhash validation (references to Steps 1
and 4 and the two-column sub-bullet), and instead state that the tool accepts a
single-column CSV of labels and deterministically computes labelhashes using the
labelhash() function for every entry; also delete the bullet about labelhash
format requirements (66 chars, "0x" prefix, lowercase hex) and any text implying
input labelhashes are validated or accepted.
.gitignore (1)

37-44: ⚠️ Potential issue | 🟡 Minor

apps/ensrainbow/data (line 38) is now redundant and the new patterns are misplaced under the wrong section.

apps/ensrainbow/data* on line 42 already matches the literal apps/ensrainbow/data, making line 38 a dead entry. Additionally, the three new ENSRainbow patterns are placed after the # fallback-ensapi dist comment instead of under the # ENSRainbow data section.

🛠️ Proposed fix
 # ENSRainbow data
-apps/ensrainbow/data
+apps/ensrainbow/data*
+apps/ensrainbow/temp*
+apps/ensrainbow/v2*
 
 # fallback-ensapi dist
 apps/fallback-ensapi/dist
-apps/ensrainbow/data*
-apps/ensrainbow/temp*
-apps/ensrainbow/v2*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 37 - 44, The .gitignore contains a redundant literal
entry "apps/ensrainbow/data" which is already matched by the broader pattern
"apps/ensrainbow/data*", and the three ENSRainbow patterns
("apps/ensrainbow/data*", "apps/ensrainbow/temp*", "apps/ensrainbow/v2*") are
placed beneath the wrong comment ("# fallback-ensapi dist"); remove the
duplicate "apps/ensrainbow/data" line and move the three ENSRainbow patterns so
they appear directly under the "# ENSRainbow data" comment, leaving the "#
fallback-ensapi dist" section intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ensrainbow/src/commands/convert-csv-command.test.ts`:
- Around line 66-80: Remove the duplicate two-column CSV rejection test by
deleting or skipping the less-descriptive fixture-based test that calls
convertCsvCommand with inputFile from TEST_FIXTURES_DIR and outputFile in
tempDir (the test titled "should reject two-column CSV (multi-column formats are
not supported)" around lines 66-80); keep the inline CSV-based test (the one
around lines 622-638) which is more self-documenting, and ensure the remaining
test still asserts the same error regex against convertCsvCommand and uses the
same LabelSetId and silent flag.

In `@apps/ensrainbow/src/commands/convert-csv-command.ts`:
- Around line 315-331: The label value is redundantly coerced with
String(row[0]) in createRainbowRecord; simplify by assigning label = row[0] (row
is already typed as string[]). Update the label variable assignment in the
createRainbowRecord function (keep the existing labelHashBytes =
labelHashToBytes(labelhash(label)) usage intact) and remove the unnecessary
String() wrapper.


- Removed the test case that checked for rejection of two-column CSV files, as it is now redundant with the existing test for multi-column CSV formats.
- Simplified the label extraction in the CSV conversion function by removing unnecessary string conversion.
Copilot AI review requested due to automatic review settings February 17, 2026 22:07
@vercel vercel bot temporarily deployed to Preview – admin.ensnode.io February 17, 2026 22:08 Inactive
@vercel vercel bot temporarily deployed to Preview – ensrainbow.io February 17, 2026 22:08 Inactive
@vercel vercel bot temporarily deployed to Preview – ensnode.io February 17, 2026 22:08 Inactive
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.



- Clarified that the `convert` command processes single-column CSV files with one label per line.
- Updated related sections in the FAQ and creating files documentation to reflect this change.

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
apps/ensrainbow/src/commands/convert-csv-command.ts (2)

538-644: ⚠️ Potential issue | 🟠 Major

outputStream is scoped inside try — file descriptor leaks on every error path.

outputStream is bound inside the try block via destructuring from initializeConversion. The finally block cannot reach it. Any throw between initializeConversion and line 592 (outputStream.end()) — including a multi-column CSV rejection from processCSVFile — leaves the write stream's file descriptor open. processCSVFile calls csvStream.destroy() and fileStream.destroy() on error but never touches outputStream.

On Windows, the unclosed handle also prevents the afterEach teardown (rm(tempDir, { recursive: true })) from deleting the partial output file.

🔧 Proposed fix — hoist `outputStream` to be `finally`-accessible
  let existingDb: ENSRainbowDB | null = openedDb;
  let dedupDb: DeduplicationDB | undefined;
  let tempDb: ClassicLevel<string, string> | undefined;
  let temporaryDedupDir: string | null = null;
+ let outputStream: ReturnType<typeof setupWriteStream> | undefined;

  try {
    const {
      RainbowRecordType,
-     outputStream,
+     outputStream: stream,
      existingDb: db,
    } = await initializeConversion(options, labelSetVersion, outputFile, existingDb);
+   outputStream = stream;
    existingDb = db;

    // ... existing body unchanged ...

-   // Close output stream
-   outputStream.end();
+   await new Promise<void>((resolve, reject) => {
+     outputStream!.end((err?: Error | null) => (err ? reject(err) : resolve()));
+   });

And in finally, after the existing cleanup:

+   // Destroy output stream if it was not cleanly ended
+   if (outputStream && !outputStream.writableEnded) {
+     outputStream.destroy();
+   }
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ensrainbow/src/commands/convert-csv-command.ts` around lines 538 - 644,
The outputStream variable is declared inside the try (via destructuring from
initializeConversion) and can leak on error paths; move/hoist outputStream to an
outer scope (declare let outputStream: WriteStream | null = null before the
try), assign it from the destructured result of initializeConversion (keep
RainbowRecordType and existingDb assignment as-is), and in the finally block
check if outputStream is non-null then gracefully close it (call end(), and if
still open call destroy() or handle errors) to ensure the file descriptor is
always released even if processCSVFile or other operations throw.

47-60: ⚠️ Potential issue | 🟡 Minor

DeduplicationDB.has() silently swallows all DB errors, not only "not-found".

The catch-all catch (_error) { return false; } treats every error (I/O failures, corruption, LEVEL_LOCKED, etc.) as a cache miss, allowing duplicates to silently bypass deduplication. classic-level 1.4.1 throws not-found errors with error.notFound === true. Narrow the catch to check for this flag and rethrow other errors:

🛡️ Proposed fix — narrow the catch
  try {
    await this.db.get(key);
    return true;
  } catch (error) {
-   return false;
+   if (error instanceof Error && (error as any).notFound === true) {
+     return false;
+   }
+   throw error;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ensrainbow/src/commands/convert-csv-command.ts` around lines 47 - 60,
The has method in DeduplicationDB currently swallows all errors in the try/catch
around this.db.get(key); update DeduplicationDB.has to only treat a missing-key
error as a false return: catch the error from db.get, check for the
classic-level not-found indicator (error.notFound === true or equivalent),
return false only in that case, and rethrow any other errors (I/O, LEVEL_LOCKED,
corruption) so callers can surface failures; keep the existing pendingWrites
check and behavior for true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.


---

Duplicate comments:
In `@apps/ensrainbow/src/commands/convert-csv-command.test.ts`:
- Around line 66-77: The test uses the fixture test_labels_invalid_first.csv to
assert convertCsvCommand rejects for multiple columns but appears to duplicate
the inline CSV test at lines ~606–623; open and verify the contents of
test_labels_invalid_first.csv to confirm it contains a two-column case, then
either (a) update the fixture to cover a distinct invalid scenario (so it’s not
redundant) or (b) remove/replace this fixture-based test and keep only the
inline test; reference convert-csv-command.test.ts, the test case "should reject
CSV with more than one column", convertCsvCommand, and the fixture name
test_labels_invalid_first.csv when making the change.

… error handling

- Removed the test case for rejecting multi-column CSV files, as it is now redundant.
- Improved error handling in the CSV conversion function to ensure proper cleanup of output streams on error.
- Updated the import statement to include the WriteStream type for better type safety.
Copilot AI review requested due to automatic review settings February 17, 2026 23:14
@vercel vercel bot temporarily deployed to Preview – admin.ensnode.io February 17, 2026 23:14 Inactive
@vercel vercel bot temporarily deployed to Preview – ensnode.io February 17, 2026 23:14 Inactive
@vercel vercel bot temporarily deployed to Preview – ensrainbow.io February 17, 2026 23:14 Inactive
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated no new comments.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ensrainbow/src/commands/convert-csv-command.ts`:
- Around line 614-627: The error-path cleanup destroys the WriteStream but
leaves the incomplete output file on disk; update the error branch that handles
outputStream to also delete the partial file by calling fs.promises.unlink (or
fs.unlinkSync) on the stream's file path (use outputStream.path or the existing
output path variable if present) inside a try/catch and log success/failure via
logger; ensure this runs after outputStream.destroy() so the descriptor is
closed and suppress ENOENT if the file is already gone.

In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx`:
- Line 86: Update the sentence about CSV input to explicitly state that the CSV
converter does not detect or skip header rows and will store the first line as a
label if a header is present; for example, modify the existing line referencing
"The CSV converter expects a single-column CSV file with one label per line,
without a header row" to add a note that any header row (e.g., "label") will be
ingested as a data entry and must be removed by the user before upload. Ensure
the wording mentions "CSV converter" and clarifies "no automatic header
detection or skipping" so readers know to strip headers manually.

Comment on lines +614 to +627
// Clean up output stream if it wasn't gracefully ended (error path).
// After end(), writable is false, so this only triggers on error paths.
if (outputStream?.writable) {
  try {
    // Suppress errors from in-flight writes whose async I/O completes
    // after destroy (e.g. "Cannot call write after a stream was destroyed").
    // On error paths the output file is incomplete anyway.
    outputStream.on("error", () => {});
    outputStream.destroy();
    logger.info("Destroyed output stream on error path");
  } catch (error) {
    logger.warn(`Failed to destroy output stream: ${error}`);
  }
}


⚠️ Potential issue | 🟠 Major

Partial output file is not deleted on the error path.

Destroying the stream (line 622) closes the file descriptor, but the incomplete .ensrainbow file written so far remains on disk (it will contain at least the protobuf header from writeHeader). Callers that attempt to ingest this partial file will fail silently or with a confusing error.

Consider adding output file cleanup in the error path:

🛡️ Proposed fix — delete partial output file on error
   // Clean up output stream if it wasn't gracefully ended (error path).
   // After end(), writable is false, so this only triggers on error paths.
   if (outputStream?.writable) {
     try {
       outputStream.on("error", () => {});
       outputStream.destroy();
       logger.info("Destroyed output stream on error path");
+      // Remove the incomplete output file left behind after stream destruction
+      try {
+        rmSync(outputFile, { force: true });
+        logger.info(`Removed incomplete output file on error path: ${outputFile}`);
+      } catch (cleanupError) {
+        logger.warn(`Failed to remove incomplete output file: ${cleanupError}`);
+      }
     } catch (error) {
       logger.warn(`Failed to destroy output stream: ${error}`);
     }
   }

### CSV Format

-The CSV converter supports two formats and expects CSV files **without a header row**.
+The CSV converter expects a single-column CSV file with one label per line, **without a header row**.


⚠️ Potential issue | 🟡 Minor

Clarify that header rows are stored as data — no automatic skipping occurs.

The phrase "without a header row" tells users not to include one, but if they do, the header row will be stored as a label entry (no detection or skipping). A user who adds a common CSV header like label will silently end up with that string in their dataset.

Consider adding a note such as:

-The CSV converter expects a single-column CSV file with one label per line, **without a header row**.
+The CSV converter expects a single-column CSV file with one label per line, **without a header row**. Every row is treated as a label — there is no header detection or skipping.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-The CSV converter expects a single-column CSV file with one label per line, **without a header row**.
+The CSV converter expects a single-column CSV file with one label per line, **without a header row**. Every row is treated as a label — there is no header detection or skipping.
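The behavior flagged above can be demonstrated with a tiny hypothetical sketch (plain string splitting, not the converter's fast-csv pipeline):

```typescript
// Without header detection, the first line of a CSV that carries a header
// is ingested as an ordinary label.
const csv = "label\nvitalik\nalice";
const labels = csv.split("\n"); // every line is treated as data

console.log(labels.length); // 3 -- the "label" header became an entry
console.log(labels[0]); // label

// The caller must strip the header themselves before converting:
const stripped = labels.slice(1);
console.log(stripped); // [ 'vitalik', 'alice' ]
```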
