Constrain CSV formats supported by ENSRainbow to single-column#1648
Conversation
…ingle-column format only

- Updated tests in `cli.test.ts` and `convert-csv-command.test.ts` to use single-column CSV files.
- Refactored `convert-csv-command.ts` to reject multi-column CSV formats and adjusted documentation accordingly.
🦋 Changeset detected. Latest commit: c865e21. The changes in this PR will be included in the next version bump. This PR includes changesets to release 19 packages.
Pull request overview
This PR constrains the ENSRainbow CSV converter to only support single-column CSV format (label only), removing the previously supported two-column format (label + labelhash). This change ensures all label-to-labelhash mappings are computed deterministically using the labelhash() function, rather than accepting pre-computed labelhashes that may be incorrect.
Changes:
- Removed support for two-column CSV format (label, labelhash)
- Updated CSV converter to reject multi-column CSV files with clear error messages
- Updated documentation to reflect single-column CSV format requirement
- Removed obsolete test fixture for invalid labelhash validation
- Updated all tests to verify proper rejection of multi-column CSV files
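The core invariant this PR enforces is that every labelhash is a pure function of its label, never an externally supplied value. A minimal sketch of that idea, assuming a stand-in hash (the real `labelhash()` uses keccak-256; sha-256 is used here only so the example runs on Node's standard library):

```typescript
import { createHash } from "node:crypto";

// Stand-in for ENS `labelhash()`; the real function uses keccak-256.
// sha-256 is used here only so this sketch runs with Node's stdlib.
function demoLabelhash(label: string): string {
  // A pure function of the label: same input always yields the same hash,
  // so there is no way for a caller to supply an incorrect mapping.
  return "0x" + createHash("sha256").update(label).digest("hex");
}
```

Because the mapping is deterministic, re-running the converter over the same labels always produces identical records, which is exactly what accepting pre-computed labelhashes could not guarantee.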
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx` | Updated CSV format documentation to describe the single-column requirement and removed references to the two-column format |
| `apps/ensrainbow/test/fixtures/test_labels_invalid_hash.csv` | Deleted obsolete test fixture that was used to test labelhash validation in the two-column format |
| `apps/ensrainbow/src/commands/convert-csv-command.ts` | Simplified `createRainbowRecord()` to only handle the single-column format, removed the `LabelHash` import, removed column detection logic, and removed unused parameters from `processRecord()` |
| `apps/ensrainbow/src/commands/convert-csv-command.test.ts` | Updated tests to verify rejection of multi-column CSV files and removed tests for two-column format support and invalid labelhash validation |
| `apps/ensrainbow/src/cli.test.ts` | Updated test fixtures from `test_labels_2col.csv` to `test_labels_1col.csv` |
| `.gitignore` | Added additional ignore patterns for ENSRainbow temporary and versioned directories |
📝 Walkthrough

CSV input is restricted to a single-column format (label only); two-column (label + labelhash) support is removed. Labelhashes are now computed deterministically from labels. Tests, docs, conversion logic, a changeset, and .gitignore were updated.

Sequence diagram (excerpt): `CLI->>Converter: stream row (label)`

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx (2)
**44-44: ⚠️ Potential issue | 🟠 Major**

Stale overview content still references the removed two-column format.

Line 44 (overview table) still says `CSV file (1 or 2 columns)` and line 58 says "flexible column formats" — both directly contradict the updated single-column-only behavior introduced in this PR.

✏️ Proposed fix

```diff
-| **CSV Conversion** | CSV file (1 or 2 columns) | Building new ENS rainbow tables | `pnpm run convert` |
+| **CSV Conversion** | Single-column CSV file (label only) | Building new ENS rainbow tables | `pnpm run convert` |

-The `convert` command processes CSV files with flexible column formats.
+The `convert` command processes single-column CSV files (one label per line).
```

Also applies to: 58-58
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx` at line 44, Update the documentation to reflect the new single-column-only CSV behavior: change the overview table cell for "CSV Conversion" from "CSV file (1 or 2 columns)" to "CSV file (1 column)" and replace the phrase "flexible column formats" (near the later description) with wording that explicitly states only a single-column format is supported; search for the "CSV Conversion" table row and the sentence containing "flexible column formats" to locate and update the text.
**153-163: ⚠️ Potential issue | 🟠 Major**

Stale "How It Works" section still describes the removed two-column detection and labelhash validation logic.

Steps 1 and 4 (and their sub-bullets) describe behavior that no longer exists: automatic column-count detection, two-column labelhash validation, and format requirements for labelhash input. These contradict the updated code, which only accepts single-column input and always computes labelhashes deterministically.

✏️ Proposed fix

```diff
-1. **Detects** CSV format automatically (1 or 2 columns)
+1. **Reads** each row as a single-column entry (label only); rejects any row with more than one column
 2. **Streams** CSV parsing using fast-csv for memory efficiency
 3. **Validates** column count and data format
-4. **Computes** or validates labelhashes as needed
-   - For single-column format: Computes labelhash using the `labelhash()` function
-   - For two-column format: Validates the format of the provided labelhash (does not verify it matches the label)
-   - Invalid labelhashes are rejected if they don't meet format requirements (66 characters including "0x" prefix, lowercase hex, valid hex format)
+4. **Computes** labelhashes deterministically from labels using the `labelhash()` function
 5. **Filters** existing labels if `--existing-db-path` is provided
 6. **Filters** duplicate labels within the same CSV file
 7. **Writes** .ensrainbow file as output
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx` around lines 153-163: Update the "How It Works" section to reflect the current single-column-only behavior: remove the automatic CSV column-count detection and any mention of two-column input or labelhash validation (references to Steps 1 and 4 and the two-column sub-bullet), and instead state that the tool accepts a single-column CSV of labels and deterministically computes labelhashes using the labelhash() function for every entry; also delete the bullet about labelhash format requirements (66 chars, "0x" prefix, lowercase hex) and any text implying input labelhashes are validated or accepted.

.gitignore (1)
**37-44: ⚠️ Potential issue | 🟡 Minor**

`apps/ensrainbow/data` (line 38) is now redundant, and the new patterns are misplaced under the wrong section.

`apps/ensrainbow/data*` on line 42 already matches the literal `apps/ensrainbow/data`, making line 38 a dead entry. Additionally, the three new ENSRainbow patterns are placed after the `# fallback-ensapi dist` comment instead of under the `# ENSRainbow data` section.

🛠️ Proposed fix

```diff
 # ENSRainbow data
-apps/ensrainbow/data
+apps/ensrainbow/data*
+apps/ensrainbow/temp*
+apps/ensrainbow/v2*

 # fallback-ensapi dist
 apps/fallback-ensapi/dist
-apps/ensrainbow/data*
-apps/ensrainbow/temp*
-apps/ensrainbow/v2*
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 37-44: The .gitignore contains a redundant literal entry "apps/ensrainbow/data" which is already matched by the broader pattern "apps/ensrainbow/data*", and the three ENSRainbow patterns ("apps/ensrainbow/data*", "apps/ensrainbow/temp*", "apps/ensrainbow/v2*") are placed beneath the wrong comment ("# fallback-ensapi dist"); remove the duplicate "apps/ensrainbow/data" line and move the three ENSRainbow patterns so they appear directly under the "# ENSRainbow data" comment, leaving the "# fallback-ensapi dist" section intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ensrainbow/src/commands/convert-csv-command.test.ts`:
- Around line 66-80: Remove the duplicate two-column CSV rejection test by
deleting or skipping the less-descriptive fixture-based test that calls
convertCsvCommand with inputFile from TEST_FIXTURES_DIR and outputFile in
tempDir (the test titled "should reject two-column CSV (multi-column formats are
not supported)" around lines 66-80); keep the inline CSV-based test (the one
around lines 622-638) which is more self-documenting, and ensure the remaining
test still asserts the same error regex against convertCsvCommand and uses the
same LabelSetId and silent flag.
In `@apps/ensrainbow/src/commands/convert-csv-command.ts`:
- Around line 315-331: The label value is redundantly coerced with
String(row[0]) in createRainbowRecord; simplify by assigning label = row[0] (row
is already typed as string[]). Update the label variable assignment in the
createRainbowRecord function (keep the existing labelHashBytes =
labelHashToBytes(labelhash(label)) usage intact) and remove the unnecessary
String() wrapper.
- Removed the test case that checked for rejection of two-column CSV files, as it is now redundant with the existing test for multi-column CSV formats.
- Simplified the label extraction in the CSV conversion function by removing unnecessary string conversion.
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
- Clarified that the `convert` command processes single-column CSV files with one label per line.
- Updated related sections in the FAQ and creating-files documentation to reflect this change.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/ensrainbow/src/commands/convert-csv-command.ts (2)
**538-644: ⚠️ Potential issue | 🟠 Major**

`outputStream` is scoped inside `try` — file descriptor leaks on every error path.

`outputStream` is bound inside the `try` block via destructuring from `initializeConversion`. The `finally` block cannot reach it. Any throw between `initializeConversion` and line 592 (`outputStream.end()`) — including a multi-column CSV rejection from `processCSVFile` — leaves the write stream's file descriptor open. `processCSVFile` calls `csvStream.destroy()` and `fileStream.destroy()` on error but never touches `outputStream`.

On Windows, the unclosed handle also prevents the `afterEach` teardown (`rm(tempDir, { recursive: true })`) from deleting the partial output file.

🔧 Proposed fix — hoist `outputStream` to be `finally`-accessible

```diff
 let existingDb: ENSRainbowDB | null = openedDb;
 let dedupDb: DeduplicationDB | undefined;
 let tempDb: ClassicLevel<string, string> | undefined;
 let temporaryDedupDir: string | null = null;
+let outputStream: ReturnType<typeof setupWriteStream> | undefined;

 try {
   const {
     RainbowRecordType,
-    outputStream,
+    outputStream: stream,
     existingDb: db,
   } = await initializeConversion(options, labelSetVersion, outputFile, existingDb);
+  outputStream = stream;
   existingDb = db;

   // ... existing body unchanged ...

-  // Close output stream
-  outputStream.end();
+  await new Promise<void>((resolve, reject) => {
+    outputStream!.end((err?: Error | null) => (err ? reject(err) : resolve()));
+  });
```

And in `finally`, after the existing cleanup:

```diff
+  // Destroy output stream if it was not cleanly ended
+  if (outputStream && !outputStream.writableEnded) {
+    outputStream.destroy();
+  }
 }
```
Verify each finding against the current code and only fix it if needed. In `@apps/ensrainbow/src/commands/convert-csv-command.ts` around lines 538 - 644, The outputStream variable is declared inside the try (via destructuring from initializeConversion) and can leak on error paths; move/hoist outputStream to an outer scope (declare let outputStream: WriteStream | null = null before the try), assign it from the destructured result of initializeConversion (keep RainbowRecordType and existingDb assignment as-is), and in the finally block check if outputStream is non-null then gracefully close it (call end(), and if still open call destroy() or handle errors) to ensure the file descriptor is always released even if processCSVFile or other operations throw.
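The hoisting pattern the comment proposes can be sketched generically. This is not the ENSRainbow code; `convertSketch` and its `work` callback are hypothetical names that show only the shape of the fix: keep the stream in an outer scope so `finally` can release the file descriptor on every path.

```typescript
import { createWriteStream, type WriteStream } from "node:fs";

// Hypothetical sketch: hoist the stream so `finally` can always reach it.
async function convertSketch(
  outputFile: string,
  work: (out: WriteStream) => Promise<void>,
): Promise<void> {
  let outputStream: WriteStream | undefined;
  try {
    outputStream = createWriteStream(outputFile);
    await work(outputStream); // may throw, e.g. on a multi-column CSV row
    // Graceful shutdown: wait for the stream to flush and close.
    await new Promise<void>((resolve, reject) =>
      outputStream!.end((err?: Error | null) => (err ? reject(err) : resolve())),
    );
  } finally {
    // Error path: the stream was never ended, so destroy it to free the fd.
    if (outputStream && !outputStream.writableEnded) {
      outputStream.destroy();
    }
  }
}
```

The key property is that the `finally` block runs whether `work` throws or not, so the descriptor cannot leak even when the error originates deep inside CSV processing.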
**47-60: ⚠️ Potential issue | 🟡 Minor**

`DeduplicationDB.has()` silently swallows all DB errors, not only "not-found".

The catch-all `catch (_error) { return false; }` treats every error (I/O failures, corruption, `LEVEL_LOCKED`, etc.) as a cache miss, allowing duplicates to silently bypass deduplication. classic-level 1.4.1 throws not-found errors with `error.notFound === true`. Narrow the catch to check for this flag and rethrow other errors:

🛡️ Proposed fix — narrow the catch

```diff
 try {
   await this.db.get(key);
   return true;
 } catch (error) {
-  return false;
+  if (error instanceof Error && (error as any).notFound === true) {
+    return false;
+  }
+  throw error;
 }
```
Verify each finding against the current code and only fix it if needed. In `@apps/ensrainbow/src/commands/convert-csv-command.ts` around lines 47 - 60, The has method in DeduplicationDB currently swallows all errors in the try/catch around this.db.get(key); update DeduplicationDB.has to only treat a missing-key error as a false return: catch the error from db.get, check for the classic-level not-found indicator (error.notFound === true or equivalent), return false only in that case, and rethrow any other errors (I/O, LEVEL_LOCKED, corruption) so callers can surface failures; keep the existing pendingWrites check and behavior for true.
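The narrowed-catch pattern can be demonstrated with a stub in place of classic-level. The stub below is an assumption for illustration; it only mimics the `notFound` flag the comment describes, and `hasKey` is a hypothetical name, not the real `DeduplicationDB.has()`.

```typescript
// Minimal stand-in for a level-style store; not the real classic-level API.
interface KeyValueGetter {
  get(key: string): Promise<string>;
}

async function hasKey(db: KeyValueGetter, key: string): Promise<boolean> {
  try {
    await db.get(key);
    return true;
  } catch (error) {
    // Treat only "key not found" as absence; rethrow I/O and lock errors
    // so callers can surface real failures instead of skipping dedup.
    if (error instanceof Error && (error as Error & { notFound?: boolean }).notFound === true) {
      return false;
    }
    throw error;
  }
}
```

The design point is that "the key is absent" and "the database could not be read" are different outcomes; collapsing them into `false` converts infrastructure failures into silent data corruption.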
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@apps/ensrainbow/src/commands/convert-csv-command.test.ts`:
- Around line 66-77: The test uses the fixture test_labels_invalid_first.csv to
assert convertCsvCommand rejects for multiple columns but appears to duplicate
the inline CSV test at lines ~606–623; open and verify the contents of
test_labels_invalid_first.csv to confirm it contains a two-column case, then
either (a) update the fixture to cover a distinct invalid scenario (so it’s not
redundant) or (b) remove/replace this fixture-based test and keep only the
inline test; reference convert-csv-command.test.ts, the test case "should reject
CSV with more than one column", convertCsvCommand, and the fixture name
test_labels_invalid_first.csv when making the change.
… error handling

- Removed the test case for rejecting multi-column CSV files, as it is now redundant.
- Improved error handling in the CSV conversion function to ensure proper cleanup of output streams on error.
- Updated the import statement to include the `WriteStream` type for better type safety.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated no new comments.
Actionable comments posted: 2
```ts
// Clean up output stream if it wasn't gracefully ended (error path).
// After end(), writable is false, so this only triggers on error paths.
if (outputStream?.writable) {
  try {
    // Suppress errors from in-flight writes whose async I/O completes
    // after destroy (e.g. "Cannot call write after a stream was destroyed").
    // On error paths the output file is incomplete anyway.
    outputStream.on("error", () => {});
    outputStream.destroy();
    logger.info("Destroyed output stream on error path");
  } catch (error) {
    logger.warn(`Failed to destroy output stream: ${error}`);
  }
}
```
Partial output file is not deleted on the error path.
Destroying the stream (line 622) closes the file descriptor, but the incomplete .ensrainbow file written so far remains on disk (it will contain at least the protobuf header from writeHeader). Callers that attempt to ingest this partial file will fail silently or with a confusing error.
Consider adding output file cleanup in the error path:
🛡️ Proposed fix — delete partial output file on error
```diff
 // Clean up output stream if it wasn't gracefully ended (error path).
 // After end(), writable is false, so this only triggers on error paths.
 if (outputStream?.writable) {
   try {
     outputStream.on("error", () => {});
     outputStream.destroy();
     logger.info("Destroyed output stream on error path");
+    // Remove the incomplete output file left behind after stream destruction
+    try {
+      rmSync(outputFile, { force: true });
+      logger.info(`Removed incomplete output file on error path: ${outputFile}`);
+    } catch (cleanupError) {
+      logger.warn(`Failed to remove incomplete output file: ${cleanupError}`);
+    }
   } catch (error) {
     logger.warn(`Failed to destroy output stream: ${error}`);
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/ensrainbow/src/commands/convert-csv-command.ts` around lines 614 - 627,
The error-path cleanup destroys the WriteStream but leaves the incomplete output
file on disk; update the error branch that handles outputStream to also delete
the partial file by calling fs.promises.unlink (or fs.unlinkSync) on the
stream's file path (use outputStream.path or the existing output path variable
if present) inside a try/catch and log success/failure via logger; ensure this
runs after outputStream.destroy() so the descriptor is closed and suppress
ENOENT if the file is already gone.
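A generic version of this cleanup step might look like the sketch below. `removePartialOutput` is a hypothetical helper, not the ENSRainbow code; the relevant detail is that `rmSync(..., { force: true })` ignores a missing file (ENOENT), so the helper is safe to call even if the file was never created.

```typescript
import { existsSync, rmSync, writeFileSync } from "node:fs";

// Hypothetical helper: delete an incomplete output file after the stream's
// file descriptor has been closed. `force: true` suppresses ENOENT.
function removePartialOutput(outputFile: string, log: (msg: string) => void): void {
  try {
    rmSync(outputFile, { force: true });
    log(`Removed incomplete output file: ${outputFile}`);
  } catch (cleanupError) {
    // Best-effort cleanup: a failure here should not mask the original error.
    log(`Failed to remove incomplete output file: ${cleanupError}`);
  }
}
```

Calling it only after `destroy()` matters on Windows, where deleting a file whose descriptor is still open fails.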
### CSV Format

```diff
-The CSV converter supports two formats and expects CSV files **without a header row**.
+The CSV converter expects a single-column CSV file with one label per line, **without a header row**.
```
Clarify that header rows are stored as data — no automatic skipping occurs.
The phrase "without a header row" tells users not to include one, but if they do, the header row will be stored as a label entry (no detection or skipping). A user who adds a common CSV header like label will silently end up with that string in their dataset.
Consider adding a note such as:
```diff
-The CSV converter expects a single-column CSV file with one label per line, **without a header row**.
+The CSV converter expects a single-column CSV file with one label per line, **without a header row**. Every row is treated as a label — there is no header detection or skipping.
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```
The CSV converter expects a single-column CSV file with one label per line, **without a header row**. Every row is treated as a label — there is no header detection or skipping.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/ensnode.io/src/content/docs/ensrainbow/concepts/creating-files.mdx` at
line 86, Update the sentence about CSV input to explicitly state that the CSV
converter does not detect or skip header rows and will store the first line as a
label if a header is present; for example, modify the existing line referencing
"The CSV converter expects a single-column CSV file with one label per line,
without a header row" to add a note that any header row (e.g., "label") will be
ingested as a data entry and must be removed by the user before upload. Ensure
the wording mentions "CSV converter" and clarifies "no automatic header
detection or skipping" so readers know to strip headers manually.
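The header pitfall can be shown with a tiny line-splitting sketch. This is an illustration only (the real converter streams rows with fast-csv), but the observable behavior is the same: every row, including a would-be header, becomes a label, and any multi-column row is rejected.

```typescript
// Illustrative parser: no header detection, every non-empty line is a label.
function readLabels(csv: string): string[] {
  return csv
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      const cols = line.split(",");
      if (cols.length !== 1) {
        throw new Error(`Expected a single column, got ${cols.length}`);
      }
      return cols[0];
    });
}
```

So a file beginning with a `label` header line silently contributes the string "label" to the dataset, which is why the docs should tell users to strip headers themselves.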
Lite PR
Tip: Review docs on the ENSNode PR process
Summary
- Constrains the `convert-csv` command to single-column CSV format (one label per line). The two-column format (label + labelhash) is no longer supported.
- All label-to-labelhash mappings are now computed deterministically via `labelhash()`, removing the risk of incorrect mappings from untrusted labelhash values.

Why
Testing
- `convert-csv-command.test.ts`: replaced two-column success tests with rejection tests; removed invalid-hash and missing-column tests that no longer apply.
- `cli.test.ts`: switched fixture references from `test_labels_2col.csv` to `test_labels_1col.csv`.
- Deleted the `test_labels_invalid_hash.csv` fixture (no longer relevant).
- Breaking change for `ensrainbow` (see changeset). Callers using two-column CSV files will need to strip the labelhash column.
- The SQL-based converter (`convert-command-sql.ts`) is unaffected — it's a separate code path.

Pre-Review Checklist (Blocking)