
Add Merkle Tree–Based Chunk-Level Hashing for Dataset Verification (this establishes dataset verification integrity) #8

Merged
Archit381 merged 11 commits into AOSSIE-Org:main from aniket866:Merkel-root
Feb 27, 2026
Conversation


@aniket866 aniket866 commented Feb 25, 2026

Closes #9

Summary

This PR introduces Merkle Tree–based chunk-level hashing to enhance dataset integrity verification.

Previously, the pipeline computed a single SHA256 hash for the entire raw and processed dataset. While this ensured file-level integrity, it required re-hashing the full dataset (potentially multi-gigabyte) to verify any portion of the data.

With this update, the system now supports chunk-level cryptographic verification using a Merkle Tree construction.

Problem Statement

The existing pipeline only generated a single SHA256 checksum for:

  • Raw Wikipedia dump
  • Processed cleaned dataset

The current approach has limitations:

  • Requires re-hashing the entire dataset to verify integrity
  • Not scalable for large datasets
  • No support for partial verification
  • No cryptographic structure for subset validation

For large training corpora, this becomes inefficient and impractical.

Proposed Solution

Implemented a Merkle Tree–based hashing system that:

  1. Splits files into fixed-size chunks (default: 1MB)
  2. Computes a SHA256 hash for each chunk (leaf nodes)
  3. Builds a Merkle Tree using raw-byte concatenation
  4. Produces a Merkle Root representing the entire dataset
  5. Stores the Merkle Root in the dataset manifest

This allows cryptographic verification of specific chunks without re-hashing the entire file.

Implementation Details

  • Added a compute_merkle_root() function that:
    • Reads the file in chunks
    • Hashes each chunk using SHA256
    • Builds the Merkle Tree bottom-up
    • Returns the final Merkle Root (hex string)

Updated generate_manifest() to include:

  • raw_merkle_root
  • processed_merkle_root
  • chunk_size_bytes

Manifest Example (New Fields)

{
  "raw_sha256": "...",
  "processed_sha256": "...",
  "raw_merkle_root": "...",
  "processed_merkle_root": "...",
  "chunk_size_bytes": 1048576
}

Benefits

• Enables chunk-level verification
• Scales efficiently for large datasets
• Maintains deterministic hashing
• Improves reproducibility
• Aligns with blockchain-style verification models
• Backward compatible
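To illustrate the chunk-level verification this enables: a verifier holding one chunk's leaf hash plus its O(log n) sibling hashes can recompute the root without reading the rest of the file. The helper below is a hypothetical sketch (this PR stores only the root, not proofs), assuming the raw-byte concatenation and duplicate-last-leaf conventions described above.

```python
import hashlib
from typing import List


def merkle_root_from_proof(leaf: bytes, index: int,
                           proof: List[bytes]) -> str:
    """Recompute the Merkle root from one leaf digest and its sibling
    path, ordered from the leaf level up to the root."""
    node = leaf
    for sibling in proof:
        if index % 2 == 0:
            # Node is a left child: sibling goes on the right
            node = hashlib.sha256(node + sibling).digest()
        else:
            # Node is a right child: sibling goes on the left
            node = hashlib.sha256(sibling + node).digest()
        index //= 2
    return node.hex()
```

Comparing the recomputed value against the manifest's stored root verifies one chunk without re-hashing the whole dataset.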

Testing

Added and validated tests for:

  1. Merkle root determinism
  2. Content modification detection
  3. Single-chunk compatibility with SHA256
  4. Empty file handling
  5. Manifest field presence

All tests pass:

16 passed in 1.88s


Impact

This change upgrades the pipeline from simple file hashing to structured cryptographic verification. It significantly improves scalability and verification robustness for large training datasets while preserving existing functionality.

Checklist

  • My code follows the project's code style and conventions
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contributing Guidelines

⚠️ AI Notice - Important!

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.

@Archit381 Please check this out

Summary by CodeRabbit

  • New Features

    • Added Merkle-root based cryptographic checksums for file verification.
    • Manifest now includes Merkle root hashes and chunk-size metadata for raw and processed files.
    • Hashing enhanced to accept additional input formats for verification.
  • Tests

    • Added tests for Merkle determinism, content-change detection, single-chunk and empty-file edge cases.


coderabbitai bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds Merkle-tree based file hashing: new compute_merkle_root() that chunks files, hashes chunks with SHA-256, and builds a deterministic Merkle root. compute_sha256() now accepts bytes input. generate_manifest() is extended to include raw_merkle_root, processed_merkle_root, and chunk_size_bytes. Also adds *.venv to .gitignore.

Changes

Cohort / File(s) | Summary
--- | ---
Configuration: `.gitignore` | Added *.venv ignore pattern for Python virtual environment folders.
Merkle & Hashing Utilities: `openverifiablellm/utils.py` | Added compute_merkle_root(file_path, chunk_size=...) -> str. Updated compute_sha256() to accept Union[str, Path, bytes]. generate_manifest() now includes raw_merkle_root, processed_merkle_root, and chunk_size_bytes.
Tests: `tests/test_util.py` | Added tests for Merkle-related behavior: manifest contains merkle fields, determinism, sensitivity to content changes, single-chunk equality with SHA-256, and empty-file handling.

Sequence Diagram

sequenceDiagram
    participant User
    participant Manifest as generate_manifest
    participant Merkle as compute_merkle_root
    participant FileSystem as "File System"
    participant SHA256

    User->>Manifest: request manifest(raw_path, processed_path)
    Manifest->>Merkle: compute_merkle_root(raw_path)
    Merkle->>FileSystem: read next chunk
    FileSystem-->>Merkle: chunk bytes
    Merkle->>SHA256: hash(chunk)
    SHA256-->>Merkle: leaf hash
    Merkle->>Merkle: aggregate pairs → parent hashes
    Merkle-->>Manifest: raw_merkle_root
    Manifest->>Merkle: compute_merkle_root(processed_path)
    Merkle->>FileSystem: read next chunk
    FileSystem-->>Merkle: chunk bytes
    Merkle->>SHA256: hash(chunk)
    SHA256-->>Merkle: leaf hash
    Merkle->>Merkle: aggregate → root
    Merkle-->>Manifest: processed_merkle_root
    Manifest->>Manifest: write manifest with merkle roots & chunk_size
    Manifest-->>User: return manifest JSON

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested labels

Python Lang

Poem

🐰 I nibble bytes in tidy chunks,

hashes hop and leaf by leaf,
roots arise from little funks,
manifests hold my belief.
🌿✨

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title describes the main change (adding Merkle Tree–based chunk-level hashing) but is somewhat verbose and contains unclear phrasing 'this establishes dataset verification integrity' that doesn't add clarity.
Linked Issues check | ✅ Passed | The PR successfully implements all coding requirements from issue #9: compute_merkle_root function, updated generate_manifest with Merkle fields, bytes support in compute_sha256, and comprehensive tests for determinism, content changes, single-chunk equality, and empty files.
Out of Scope Changes check | ✅ Passed | The .gitignore change to add *.venv pattern is a minor, supporting change for development environment setup and is appropriately scoped to the PR's needs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openverifiablellm/utils.py`:
- Around line 15-40: The function compute_merkle_root currently allows
chunk_size <= 0 which lets the read loop skip data (e.g., chunk_size=0) and
produce incorrect hashes; before opening the file, validate the chunk_size
parameter (ensure it's an int and > 0) and raise a clear ValueError for invalid
values so callers cannot pass zero/negative sizes, then proceed with the
existing logic that reads chunks into leaves and hashes them; reference
compute_merkle_root, chunk_size, and leaves when adding this guard.
- Around line 126-129: Introduce a single module-level constant (e.g.,
CHUNK_SIZE_BYTES) and use it everywhere instead of the hardcoded literal:
replace the literal 1024 * 1024 in the manifest dict's "chunk_size_bytes" and
pass CHUNK_SIZE_BYTES into compute_merkle_root calls (or align
compute_merkle_root's default to this constant) so hashing and manifest metadata
share the same value; update any other uses of the 1024*1024 literal to
reference CHUNK_SIZE_BYTES and ensure the manifest key "chunk_size_bytes" stores
that constant.

In `@tests/test_merkle.py`:
- Around line 9-53: Add a regression test in tests/test_merkle.py to lock the
“duplicate last leaf” rule used by utils.compute_merkle_root (see
duplicate-last-leaf at openverifiablellm/utils.py line 51): write a file that
produces an odd number of chunks (e.g., content length 9 with chunk_size=4
yields 3 leaves), compute each chunk's SHA-256 hex digest, duplicate the final
leaf to make pairs, iteratively hash pairwise concatenated hex strings (using
hashlib.sha256(...).hexdigest()) until a single root is produced, and assert
that this manually computed root equals utils.compute_merkle_root(file,
chunk_size=4); name the test e.g.
test_merkle_root_duplicates_last_leaf_for_odd_chunks so future changes to tree
construction will be caught.

In `@tests/test_util.py`:
- Around line 143-148: The test reads the manifest as raw text and asserts
substrings, which can produce false positives; change it to parse the JSON from
manifest_file (use json.loads on manifest_file.read_text() or json.load on the
file) and then assert that the resulting dict contains keys "raw_merkle_root",
"processed_merkle_root", and "chunk_size_bytes" and validate types/values (e.g.,
merkle roots are strings and chunk_size_bytes is an int > 0) instead of checking
raw text; update references to manifest_file and manifest accordingly in
tests/test_util.py.
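The duplicate-last-leaf rule described in the tests/test_merkle.py comment above can be demonstrated standalone. The fold_hex_leaves helper below is hypothetical and follows the hex-string-concatenation convention the reviewer describes; a real regression test would compare the manual computation against utils.compute_merkle_root instead.

```python
import hashlib


def fold_hex_leaves(leaves):
    """Fold a list of hex-digest leaves into a single root, duplicating
    the last leaf at any level with an odd node count."""
    while len(leaves) > 1:
        if len(leaves) % 2 == 1:
            leaves.append(leaves[-1])  # the rule under test
        leaves = [
            hashlib.sha256((leaves[i] + leaves[i + 1]).encode()).hexdigest()
            for i in range(0, len(leaves), 2)
        ]
    return leaves[0]


# 9 bytes of content with chunk_size=4 yields 3 chunks, i.e. 3 leaves
content = b"123456789"
chunks = [content[i:i + 4] for i in range(0, len(content), 4)]
leaves = [hashlib.sha256(c).hexdigest() for c in chunks]
root = fold_hex_leaves(leaves)
```

A test named along the lines of test_merkle_root_duplicates_last_leaf_for_odd_chunks would then assert that this manually folded root equals the library's output for the same file and chunk size.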

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c70d97d and b970aed.

📒 Files selected for processing (4)
  • .gitignore
  • openverifiablellm/utils.py
  • tests/test_merkle.py
  • tests/test_util.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
openverifiablellm/utils.py (1)

129-131: ⚠️ Potential issue | 🔴 Critical

Define MERKLE_CHUNK_SIZE_BYTES before use (current code crashes).

Line 129, Line 130, and Line 131 reference MERKLE_CHUNK_SIZE_BYTES, but it is not defined in this module, so generate_manifest() will fail with NameError.

🔧 Proposed fix
 logger = logging.getLogger(__name__)
+MERKLE_CHUNK_SIZE_BYTES = 1024 * 1024
 
 # Merkle Tree Chunk-Level Hashing for Large Files
-def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
+def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = MERKLE_CHUNK_SIZE_BYTES) -> str:
#!/bin/bash
set -e
# Verify symbol usages and ensure there is a module-level assignment
rg -n --type=py '\bMERKLE_CHUNK_SIZE_BYTES\b'
ast-grep --pattern 'MERKLE_CHUNK_SIZE_BYTES = $_'

Expected result: at least one assignment match plus the usage sites.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openverifiablellm/utils.py` around lines 129 - 131, generate_manifest() and
the compute_merkle_root calls reference MERKLE_CHUNK_SIZE_BYTES which is not
defined, causing a NameError; add a module-level constant definition (e.g.,
MERKLE_CHUNK_SIZE_BYTES = <appropriate integer>) near the top of the utils
module so the symbol exists before it is used, or import it from the correct
module if it belongs elsewhere, and then re-run tests that cover
generate_manifest() and compute_merkle_root to verify the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_util.py`:
- Around line 143-145: Remove the stray diff marker "@@" from tests/test_util.py
(it causes a syntax error) and move the "import json" statement out of the
function body to the module-level imports at the top of the file with the other
imports; ensure the test function that defines "manifest_file = tmp_path /
'data/dataset_manifest.json'" no longer contains the stray token and that all
imports (including json) are declared before any function/class definitions so
the file parses correctly.

---

Duplicate comments:
In `@openverifiablellm/utils.py`:
- Around line 129-131: generate_manifest() and the compute_merkle_root calls
reference MERKLE_CHUNK_SIZE_BYTES which is not defined, causing a NameError; add
a module-level constant definition (e.g., MERKLE_CHUNK_SIZE_BYTES = <appropriate
integer>) near the top of the utils module so the symbol exists before it is
used, or import it from the correct module if it belongs elsewhere, and then
re-run tests that cover generate_manifest() and compute_merkle_root to verify
the fix.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b970aed and e58ef1d.

📒 Files selected for processing (2)
  • openverifiablellm/utils.py
  • tests/test_util.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
tests/test_util.py (1)

133-149: 🧹 Nitpick | 🔵 Trivial

Consider adding value/type validation for robustness.

The test correctly parses JSON (addressing the past review suggestion), but only checks field presence. Adding type and value assertions would catch more regressions.

♻️ Optional enhancement
     assert "raw_merkle_root" in manifest
     assert "processed_merkle_root" in manifest
     assert "chunk_size_bytes" in manifest
+    # Validate types and expected values
+    assert isinstance(manifest["chunk_size_bytes"], int)
+    assert manifest["chunk_size_bytes"] > 0
+    assert len(manifest["raw_merkle_root"]) == 64  # SHA256 hex length
+    assert len(manifest["processed_merkle_root"]) == 64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_util.py` around lines 133 - 149, Update the
test_manifest_contains_merkle_fields test to not only assert presence but also
validate types/values: after calling utils.generate_manifest and loading
manifest, assert that manifest["raw_merkle_root"] and
manifest["processed_merkle_root"] are non-empty strings (or bytes-encoded hex
strings) and that manifest["chunk_size_bytes"] is an integer > 0; keep
references to the existing test function name and utils.generate_manifest so the
assertions are added in the same test right after loading the manifest.
openverifiablellm/utils.py (1)

106-111: ⚠️ Potential issue | 🔴 Critical

MERKLE_CHUNK_SIZE_BYTES is undefined — code will raise NameError at runtime.

The constant is referenced on lines 108-110 but never defined in the module. Define it at module level and use it in the function default parameter for consistency.

Proposed fix

Add the constant after the logger (line 12):

 logger = logging.getLogger(__name__)
 
+MERKLE_CHUNK_SIZE_BYTES = 1024 * 1024
+
 # Merkle Tree Chunk-Level Hashing for Large Files
-def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
+def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = MERKLE_CHUNK_SIZE_BYTES) -> str:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openverifiablellm/utils.py` around lines 106 - 111, The code references
MERKLE_CHUNK_SIZE_BYTES when calling compute_merkle_root for "raw_merkle_root"
and "processed_merkle_root" but that constant is not defined; add a module-level
constant named MERKLE_CHUNK_SIZE_BYTES (e.g., near the logger definition) and
then update compute_merkle_root's signature to use MERKLE_CHUNK_SIZE_BYTES as
the default chunk_size parameter so callers like the code that sets
raw_merkle_root/processed_merkle_root use the defined constant consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@openverifiablellm/utils.py`:
- Around line 106-111: The code references MERKLE_CHUNK_SIZE_BYTES when calling
compute_merkle_root for "raw_merkle_root" and "processed_merkle_root" but that
constant is not defined; add a module-level constant named
MERKLE_CHUNK_SIZE_BYTES (e.g., near the logger definition) and then update
compute_merkle_root's signature to use MERKLE_CHUNK_SIZE_BYTES as the default
chunk_size parameter so callers like the code that sets
raw_merkle_root/processed_merkle_root use the defined constant consistently.

In `@tests/test_util.py`:
- Around line 133-149: Update the test_manifest_contains_merkle_fields test to
not only assert presence but also validate types/values: after calling
utils.generate_manifest and loading manifest, assert that
manifest["raw_merkle_root"] and manifest["processed_merkle_root"] are non-empty
strings (or bytes-encoded hex strings) and that manifest["chunk_size_bytes"] is
an integer > 0; keep references to the existing test function name and
utils.generate_manifest so the assertions are added in the same test right after
loading the manifest.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 593692a and 8b14fa9.

📒 Files selected for processing (2)
  • openverifiablellm/utils.py
  • tests/test_util.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
openverifiablellm/utils.py (2)

15-20: ⚠️ Potential issue | 🔴 Critical

Reject invalid chunk_size before reading file data.

chunk_size is not validated. With chunk_size=0, Line [20] reads nothing and returns the empty-file hash for non-empty files, which breaks verification semantics. This appears to be a regression of an earlier finding.

🔧 Proposed fix
-def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
+def compute_merkle_root(file_path: Union[str, Path], chunk_size: int = MERKLE_CHUNK_SIZE_BYTES) -> str:
+    if not isinstance(chunk_size, int) or chunk_size <= 0:
+        raise ValueError("chunk_size must be a positive integer")
+
     path = Path(file_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openverifiablellm/utils.py` around lines 15 - 20, The compute_merkle_root
function accepts an unvalidated chunk_size which allows 0 or negative values
causing the file read loop (while chunk := f.read(chunk_size)) to behave
incorrectly; at the top of compute_merkle_root validate chunk_size (ensure it's
an int and > 0) and raise a ValueError for invalid values before opening the
file or reading, so callers cannot pass 0/negative sizes and the rest of the
logic (the file read loop and Merkle leaf construction) can assume a valid chunk
size.

108-110: ⚠️ Potential issue | 🔴 Critical

MERKLE_CHUNK_SIZE_BYTES is undefined and will crash manifest generation.

Lines [108]-[110] reference a missing symbol, causing NameError at runtime.

🔧 Proposed fix
 logger = logging.getLogger(__name__)
+MERKLE_CHUNK_SIZE_BYTES = 1024 * 1024
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openverifiablellm/utils.py` around lines 108 - 110, The manifest generation
references an undefined constant MERKLE_CHUNK_SIZE_BYTES in the
compute_merkle_root calls for raw_merkle_root and processed_merkle_root, causing
a NameError; fix by either defining MERKLE_CHUNK_SIZE_BYTES (e.g., set a
sensible default near the top of openverifiablellm/utils.py) or replace it with
the existing correct symbol (e.g., MERKLE_CHUNK_SIZE or DEFAULT_CHUNK_SIZE) used
elsewhere in the module so compute_merkle_root(raw_path, chunk_size=...) and
compute_merkle_root(processed_path, chunk_size=...) use a defined constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openverifiablellm/utils.py`:
- Around line 126-137: The type annotation of compute_sha256 is inconsistent
with its runtime check for bytearray; update the function signature for
compute_sha256 to include bytearray in the Union (e.g., Union[str, Path, bytes,
bytearray] or reorder to Union[bytes, bytearray, str, Path]) so static typing
matches the isinstance(file_path, (bytes, bytearray)) branch, and add or update
any necessary typing imports (from typing import Union) if required.

---

Duplicate comments:
In `@openverifiablellm/utils.py`:
- Around line 15-20: The compute_merkle_root function accepts an unvalidated
chunk_size which allows 0 or negative values causing the file read loop (while
chunk := f.read(chunk_size)) to behave incorrectly; at the top of
compute_merkle_root validate chunk_size (ensure it's an int and > 0) and raise a
ValueError for invalid values before opening the file or reading, so callers
cannot pass 0/negative sizes and the rest of the logic (the file read loop and
Merkle leaf construction) can assume a valid chunk size.
- Around line 108-110: The manifest generation references an undefined constant
MERKLE_CHUNK_SIZE_BYTES in the compute_merkle_root calls for raw_merkle_root and
processed_merkle_root, causing a NameError; fix by either defining
MERKLE_CHUNK_SIZE_BYTES (e.g., set a sensible default near the top of
openverifiablellm/utils.py) or replace it with the existing correct symbol
(e.g., MERKLE_CHUNK_SIZE or DEFAULT_CHUNK_SIZE) used elsewhere in the module so
compute_merkle_root(raw_path, chunk_size=...) and
compute_merkle_root(processed_path, chunk_size=...) use a defined constant.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b14fa9 and dbef17a.

📒 Files selected for processing (1)
  • openverifiablellm/utils.py

@aniket866
Contributor Author

Hi @Archit381, I have made the changes you suggested. Please have a look.

@Archit381 Archit381 self-assigned this Feb 26, 2026
@Archit381
Member

@aniket866 In openverifiablellm/utils.py, pls update the parameter file_path to some other name since this function now also works on bytes/bytearray

@aniket866
Contributor Author

@aniket866 In openverifiablellm/utils.py, pls update the parameter file_path to some other name since this function now also works on bytes/bytearray

Sure

@aniket866
Contributor Author

Hi @Archit381, I have made the change: file_path is now data. Have a look.

@aniket866 aniket866 marked this pull request as draft February 27, 2026 08:24
@aniket866 aniket866 marked this pull request as ready for review February 27, 2026 08:30
@Archit381
Member

@aniket866 data doesn't seem the best name if someone has to pass a file path for loading. Maybe add another param, file_path, as an Optional


Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dataset verification limited to single SHA256 hash (no chunk-level integrity support)

2 participants