Add notebooks demonstrating standalone ATIF evaluation via Python API #1752
Conversation
Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings.
Walkthrough

Adds two new example Jupyter notebooks demonstrating standalone ATIF evaluation: one shows programmatic evaluator construction, plugin registration, and registry-based resolution; the other shows inline custom evaluators (`ExactMatchEvaluator`, `ToolCountEvaluator`) used directly with `EvaluationHarness`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User as "User / Notebook"
    participant Builder as "WorkflowEvalBuilder\n(type registry)"
    participant Harness as "EvaluationHarness"
    participant Eval as "Evaluator\n(ExactMatch / ToolCount / plugin)"
    participant ATIF as "ATIF Models\n(AtifEvalSample)"
    rect rgba(200,230,255,0.5)
        User->>ATIF: define ATIF trajectories\nand build AtifEvalSample(s)
    end
    rect rgba(200,255,200,0.5)
        User->>Builder: register/discover plugins\nor provide evaluators directly
        Builder->>Harness: resolve evaluator instances
    end
    rect rgba(255,230,200,0.5)
        User->>Harness: invoke run(evaluator(s), AtifEvalSampleList)
        Harness->>Eval: call evaluate_atif_fn(atif_samples)
        Eval-->>Harness: return EvalOutput (items, average)
        Harness-->>User: aggregated evaluation results
    end
```
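The flow in the diagram can be sketched with self-contained stand-ins. The class and attribute names below (`AtifEvalSample`, `EvalOutput`, `ExactMatchEvaluator`) only mirror the real `nvidia-nat-eval` APIs for illustration; the actual signatures in the package may differ.

```python
# Hypothetical, self-contained sketch: stand-in classes replace the real
# nvidia-nat-eval APIs so the evaluate-and-aggregate flow can run anywhere.
from dataclasses import dataclass, field


@dataclass
class AtifEvalSample:
    """Minimal stand-in for an ATIF evaluation sample."""
    id: str
    expected_output: str
    actual_output: str


@dataclass
class EvalOutputItem:
    id: str
    score: float


@dataclass
class EvalOutput:
    eval_output_items: list[EvalOutputItem] = field(default_factory=list)

    @property
    def average_score(self) -> float:
        items = self.eval_output_items
        return sum(i.score for i in items) / len(items) if items else 0.0


class ExactMatchEvaluator:
    """Scores 1.0 when expected and actual outputs match exactly, else 0.0."""

    def evaluate(self, samples: list[AtifEvalSample]) -> EvalOutput:
        items = [
            EvalOutputItem(id=s.id, score=1.0 if s.expected_output == s.actual_output else 0.0)
            for s in samples
        ]
        return EvalOutput(eval_output_items=items)


samples = [
    AtifEvalSample(id="s1", expected_output="42", actual_output="42"),
    AtifEvalSample(id="s2", expected_output="foo", actual_output="bar"),
]
output = ExactMatchEvaluator().evaluate(samples)
print(output.average_score)  # 0.5
```

In the real harness the evaluator instances come from the builder's type registry (or are passed inline), but the per-sample scoring and averaging shape is the same.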
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
…on-api-notebook Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
`nat eval` Python API: `EvaluationHarness` to run with ATIF-native trajectory
Actionable comments posted: 2
🧹 Nitpick comments (4)
examples/notebooks/eval_atif_standalone.ipynb (4)
7-23: Consider documenting or validating NVIDIA_API_KEY usage.

The requirements mention that `NVIDIA_API_KEY` is needed (line 23), but the code never explicitly references or validates this environment variable. While the NIM client likely reads it automatically, users might benefit from:
- A code cell that validates the environment variable is set before running evaluations
- Documentation explaining that the NIM client reads this automatically from the environment
🔍 Example validation cell

Add after the installation cell:

```python
import os

if "NVIDIA_API_KEY" not in os.environ:
    raise EnvironmentError(
        "NVIDIA_API_KEY environment variable is required. "
        "Set it with: export NVIDIA_API_KEY='your-key-here'"
    )
print("✓ NVIDIA_API_KEY is set")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 7 - 23, Add an explicit runtime check and brief note about NVIDIA_API_KEY in the notebook: insert a small validation cell after the installation cell that imports os, verifies "NVIDIA_API_KEY" exists in os.environ, and raises a clear EnvironmentError with instructions to export the key if missing; also add one short sentence near the "Requirements" section stating that the NIM client reads the key from the environment and that users must set NVIDIA_API_KEY before running evaluations so they see the validation message early.
231-239: Consider adding error handling for validation failures.

The `model_validate()` call can raise `pydantic.ValidationError` if the ATIF data is malformed. For a demonstration notebook this is acceptable, but production usage should wrap this in a try-except block to provide clear error messages.

🛡️ Example with error handling

```diff
 from nat.data_models.atif import ATIFTrajectory
+from pydantic import ValidationError

-trajectory = ATIFTrajectory.model_validate(atif_trajectory_data)
+try:
+    trajectory = ATIFTrajectory.model_validate(atif_trajectory_data)
+except ValidationError as e:
+    print(f"Failed to validate ATIF trajectory: {e}")
+    raise
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 231 - 239, Wrap the ATIFTrajectory.model_validate(atif_trajectory_data) call in a try-except that catches pydantic.ValidationError; on failure print or log a clear error message including the exception details and the raw atif_trajectory_data (or a summary) and either exit/raise to stop further processing, then only proceed to use trajectory (e.g. printing trajectory.schema_version, session_id, agent, steps, final_metrics) if validation succeeded.
39-47: Consider documenting notebook execution prerequisites.

The installation command uses relative paths (`../../packages/`) and the `--no-deps` flag, which assumes:
- The notebook is executed from its current directory location
- Dependencies are already installed or will be resolved by transitive dependencies
Consider adding a note in the preceding markdown cell about the expected execution context (e.g., "Run this notebook from the repository root or ensure the relative paths resolve correctly").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 39 - 47, The markdown before the pip install cell should state the expected execution context and dependency assumptions: add a short note above the pip command referencing the install cell (the line starting with "!uv pip install -q --no-deps -e ../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas") that tells users to run the notebook from the repository root (or adjust paths), explains that --no-deps assumes dependencies are already installed or provided transitively, and provide the alternative released install command ("!uv pip install nvidia-nat-eval nvidia-nat-ragas") for users who are not developing locally.
394-416: Consider using async context manager idiomatically.

The code manually calls `__aenter__()` and later `__aexit__()` (in cell 9), which works but isn't the idiomatic pattern. Consider using `async with` to ensure cleanup happens automatically even if an exception occurs.

♻️ Idiomatic async context manager usage
Replace cells 7, 8, and 9 with a single cell:
```python
from nat.plugins.eval.runtime.builder import WorkflowEvalBuilder
from nat.plugins.eval.evaluator.atif_evaluator import AtifEvaluator
from nat.plugins.eval.runtime.eval_harness import EvaluationHarness

async with WorkflowEvalBuilder(
    general_config=config.general,
    eval_general_config=config.eval.general,
) as eval_builder:
    await eval_builder.populate_builder(config, skip_workflow=True)

    # Collect ATIF evaluators
    atif_evaluators = {}
    for name in config.eval.evaluators:
        evaluator_info = eval_builder.get_evaluator(name)
        if isinstance(evaluator_info, AtifEvaluator):
            atif_evaluators[name] = evaluator_info
            print(f"  [ATIF] {name}: {evaluator_info.description}")

    print(f"\nATIF-native evaluators ready: {list(atif_evaluators.keys())}")

    # Run evaluation
    harness = EvaluationHarness()
    results = await harness.evaluate(
        evaluators=atif_evaluators,
        atif_samples=atif_samples,
    )

    print("=" * 60)
    print("Evaluation Results")
    print("=" * 60)
    for evaluator_name, eval_output in results.items():
        print(f"\n--- {evaluator_name} ---")
        print(f"  Average Score: {eval_output.average_score}")
        for item in eval_output.eval_output_items:
            print(f"  Item {item.id}: score={item.score}")
            if item.reasoning:
                reasoning_str = str(item.reasoning)
                print(f"    reasoning: {reasoning_str[:200]}{'...' if len(reasoning_str) > 200 else ''}")

print("Builder resources released.")
```

This ensures cleanup happens automatically.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 394 - 416, The code currently calls eval_builder.__aenter__() and later __aexit__() manually; change this to the idiomatic async context manager: use "async with WorkflowEvalBuilder(general_config=config.general, eval_general_config=config.eval.general) as eval_builder:" then move the call to await eval_builder.populate_builder(config, skip_workflow=True) and all logic that collects AtifEvaluator instances (check isinstance(evaluator_info, AtifEvaluator)) inside that block, removing the explicit __aenter__/__aexit__ calls; also run the evaluation inside the same block (e.g., create EvaluationHarness and await harness.evaluate(evaluators=atif_evaluators, atif_samples=atif_samples)) so cleanup happens automatically.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/notebooks/eval_atif_standalone.ipynb`:
- Around line 7-473: Replace all occurrences of the standalone acronym "NAT" in
markdown/documentation text with "NeMo Agent Toolkit" (e.g., change "without
building or running a NAT workflow", "NAT eval can consume", "NAT ships ATIF
Pydantic models", "To use NAT's built-in evaluators") while leaving package
names and code identifiers that include "nat" unchanged and enclosed in
backticks (e.g., `nvidia-nat-eval`, `nvidia-nat-ragas`); update the notebook
title and all markdown cells where the plain token "NAT" appears, but do not
modify any code strings, variable names, or backticked references.
- Around line 1-24: Insert an Apache License 2.0 markdown cell as the very first
cell in the notebook (before the existing "# Standalone ATIF Evaluation with
NeMo Agent Toolkit" title) containing the standard Apache License, Version 2.0
header and a copyright line updated to 2026; ensure the new cell is a markdown
cell so the title and rest of the notebook remain unchanged and that the license
text appears at the top of the notebook.
---
Nitpick comments:
In `@examples/notebooks/eval_atif_standalone.ipynb`:
- Around line 7-23: Add an explicit runtime check and brief note about
NVIDIA_API_KEY in the notebook: insert a small validation cell after the
installation cell that imports os, verifies "NVIDIA_API_KEY" exists in
os.environ, and raises a clear EnvironmentError with instructions to export the
key if missing; also add one short sentence near the "Requirements" section
stating that the NIM client reads the key from the environment and that users
must set NVIDIA_API_KEY before running evaluations so they see the validation
message early.
- Around line 231-239: Wrap the
ATIFTrajectory.model_validate(atif_trajectory_data) call in a try-except that
catches pydantic.ValidationError; on failure print or log a clear error message
including the exception details and the raw atif_trajectory_data (or a summary)
and either exit/raise to stop further processing, then only proceed to use
trajectory (e.g. printing trajectory.schema_version, session_id, agent, steps,
final_metrics) if validation succeeded.
- Around line 39-47: The markdown before the pip install cell should state the
expected execution context and dependency assumptions: add a short note above
the pip command referencing the install cell (the line starting with "!uv pip
install -q --no-deps -e ../../packages/nvidia_nat_eval -e
../../packages/nvidia_nat_ragas") that tells users to run the notebook from the
repository root (or adjust paths), explains that --no-deps assumes dependencies
are already installed or provided transitively, and provide the alternative
released install command ("!uv pip install nvidia-nat-eval nvidia-nat-ragas")
for users who are not developing locally.
- Around line 394-416: The code currently calls eval_builder.__aenter__() and
later __aexit__() manually; change this to the idiomatic async context manager:
use "async with WorkflowEvalBuilder(general_config=config.general,
eval_general_config=config.eval.general) as eval_builder:" then move the call to
await eval_builder.populate_builder(config, skip_workflow=True) and all logic
that collects AtifEvaluator instances (check isinstance(evaluator_info,
AtifEvaluator)) inside that block, removing the explicit __aenter__/__aexit__
calls; also run the evaluation inside the same block (e.g., create
EvaluationHarness and await harness.evaluate(evaluators=atif_evaluators,
atif_samples=atif_samples)) so cleanup happens automatically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ec77ff3c-5873-4f39-adc1-bc6204af07a2
📒 Files selected for processing (1)
examples/notebooks/eval_atif_standalone.ipynb
`EvaluationHarness` to run with ATIF-native trajectory
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/notebooks/eval_atif_custom_evaluator.ipynb (3)
213-215: Consider removing `_count_tool_calls` from `ExactMatchEvaluator`.

This method seems out of place in an "exact match" evaluator. While it adds `trajectory_tool_call_count` to the reasoning dict (line 234), this metric is tangential to exact-match comparison and duplicates functionality already in `ToolCountEvaluator`. Consider removing it to keep the evaluator focused on its core responsibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_custom_evaluator.ipynb` around lines 213 - 215, The ExactMatchEvaluator includes a helper _count_tool_calls and adds trajectory_tool_call_count to the reasoning dict, which duplicates ToolCountEvaluator and is out of scope for exact-match logic; remove the _count_tool_calls method from ExactMatchEvaluator and stop adding trajectory_tool_call_count to the reasoning/result dict (remove the code that computes/assigns trajectory_tool_call_count inside ExactMatchEvaluator), relying on ToolCountEvaluator to provide that metric instead.
246-252: Add type hint for `sample` parameter.

The `sample` parameter lacks a type annotation. Per coding guidelines, Python methods should use type hints for all parameters.

📝 Suggested fix

```diff
-    def _count_tool_calls(self, sample) -> tuple[int, list[str]]:
+    def _count_tool_calls(self, sample: AtifEvalSample) -> tuple[int, list[str]]:
```

Apply the same fix to `ExactMatchEvaluator._count_tool_calls` at line 213:

```diff
-    def _count_tool_calls(self, sample) -> int:
+    def _count_tool_calls(self, sample: AtifEvalSample) -> int:
```

As per coding guidelines: "Python methods should use type hints for all parameters and return values."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_custom_evaluator.ipynb` around lines 246 - 252, The method _count_tool_calls is missing a type annotation for its sample parameter; add an explicit type hint (e.g., sample: EvalSample or the concrete sample class used in this notebook/repo, or at minimum sample: Any from typing) and keep the return annotation tuple[int, list[str]] unchanged; apply the same change to ExactMatchEvaluator._count_tool_calls as well so both functions declare the sample parameter type consistently.
7-26: Replace "NAT" acronym with "NeMo Agent Toolkit".

Lines 11 and 22 use "NAT" as an acronym. Per coding guidelines, always spell out "NeMo Agent Toolkit" in documentation unless referring to package names in backticks (e.g., `nvidia-nat-eval`).

📝 Suggested fix

```diff
- "no NAT workflow, and no evaluator plugins required.\n",
+ "no NeMo Agent Toolkit workflow, and no evaluator plugins required.\n",
```

```diff
- "- `nvidia-nat-eval` works as a standalone package — no API keys, no LLMs, no plugins\n",
+ "- `nvidia-nat-eval` works as a standalone package — no API keys, no LLMs, no evaluator plugins\n",
```

As per coding guidelines: "Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_custom_evaluator.ipynb` around lines 7 - 26, Replace the bare "NAT" acronym with "NeMo Agent Toolkit" in the notebook text (e.g., change phrases like "no NAT workflow" and any other plain-text "NAT" mentions) while leaving package names in backticks (e.g., `nvidia-nat-eval`) unchanged; update any surrounding wording to keep grammar correct after substitution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/notebooks/eval_atif_custom_evaluator.ipynb`:
- Line 300: The call to ExactMatchEvaluator uses the wrong constructor parameter
name `case_sensitive`; change it to use the declared parameter `normalize_case`
and preserve the intended semantics (the original used case_sensitive=False to
request case-insensitive matching), so replace that argument with
normalize_case=True in the ExactMatchEvaluator instantiation to avoid the
TypeError.
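A minimal sketch of the intended semantics: `normalize_case=True` lowercases both sides before comparing, which is the behavior the original `case_sensitive=False` argument meant to request. The class below is a simplified stand-in for the notebook's inline evaluator, not a published API.

```python
# Illustrative stand-in: only the normalize_case parameter and its matching
# semantics are shown; the real notebook evaluator also produces scores/reasoning.
class ExactMatchEvaluator:
    def __init__(self, normalize_case: bool = False) -> None:
        self.normalize_case = normalize_case

    def match(self, expected: str, actual: str) -> bool:
        if self.normalize_case:
            expected, actual = expected.lower(), actual.lower()
        return expected == actual


strict = ExactMatchEvaluator()
lenient = ExactMatchEvaluator(normalize_case=True)
print(strict.match("Paris", "paris"))   # False
print(lenient.match("Paris", "paris"))  # True
```

Passing an undeclared keyword such as `case_sensitive=False` to this constructor would raise a `TypeError`, which is exactly the failure the review flags.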
---
Nitpick comments:
In `@examples/notebooks/eval_atif_custom_evaluator.ipynb`:
- Around line 213-215: The ExactMatchEvaluator includes a helper
_count_tool_calls and adds trajectory_tool_call_count to the reasoning dict,
which duplicates ToolCountEvaluator and is out of scope for exact-match logic;
remove the _count_tool_calls method from ExactMatchEvaluator and stop adding
trajectory_tool_call_count to the reasoning/result dict (remove the code that
computes/assigns trajectory_tool_call_count inside ExactMatchEvaluator), relying
on ToolCountEvaluator to provide that metric instead.
- Around line 246-252: The method _count_tool_calls is missing a type annotation
for its sample parameter; add an explicit type hint (e.g., sample: EvalSample or
the concrete sample class used in this notebook/repo, or at minimum sample: Any
from typing) and keep the return annotation tuple[int, list[str]] unchanged;
apply the same change to ExactMatchEvaluator._count_tool_calls as well so both
functions declare the sample parameter type consistently.
- Around line 7-26: Replace the bare "NAT" acronym with "NeMo Agent Toolkit" in
the notebook text (e.g., change phrases like "no NAT workflow" and any other
plain-text "NAT" mentions) while leaving package names in backticks (e.g.,
`nvidia-nat-eval`) unchanged; update any surrounding wording to keep grammar
correct after substitution.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: edd55269-8cc0-4e76-ac6c-a2348988b73e
📒 Files selected for processing (1)
examples/notebooks/eval_atif_custom_evaluator.ipynb
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/notebooks/eval_atif_custom_evaluator.ipynb`:
- Line 11: Update the notebook markdown prose to replace the standalone acronym
"NAT" with the full phrase "NeMo Agent Toolkit" in the affected markdown cells
(e.g., the string containing "no NAT workflow, and no evaluator plugins
required." and the cell around line 323), while leaving any package or code
identifiers that include "nat" (such as `nvidia-nat-eval`) untouched and
properly backticked; ensure you search for other standalone "NAT" occurrences in
the notebook and apply the same replacement to match the documentation
guideline.
- Around line 241-247: The _count_tool_calls method is missing a type annotation
for its sample parameter; update the signature of _count_tool_calls to annotate
sample as AtifEvalSample (the notebook already imports AtifEvalSample and uses
atif_samples: AtifEvalSampleList), e.g., change def _count_tool_calls(self,
sample) -> tuple[int, list[str]] to include sample: AtifEvalSample so static
type checkers and linters recognize the type while leaving the rest of the
method (steps, tool_calls iteration, return) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5952bc86-12e3-4cbf-aeef-9bb879056a2a
📒 Files selected for processing (1)
examples/notebooks/eval_atif_custom_evaluator.ipynb
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/notebooks/eval_atif_custom_evaluator.ipynb (1)
198-200: ⚠️ Potential issue | 🟡 Minor

Import and annotate `AtifEvalSample` in `_count_tool_calls`.

`sample` is still untyped here, which violates the repo rule for Python method signatures.

Suggested fix

```diff
-from nat.plugins.eval.evaluator.atif_evaluator import AtifEvalSampleList
+from nat.plugins.eval.evaluator.atif_evaluator import AtifEvalSample
+from nat.plugins.eval.evaluator.atif_evaluator import AtifEvalSampleList
@@
-    def _count_tool_calls(self, sample) -> tuple[int, list[str]]:
+    def _count_tool_calls(self, sample: AtifEvalSample) -> tuple[int, list[str]]:
```

As per coding guidelines: Python methods should use type hints for all parameters and return values.
Also applies to: 237-246
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_custom_evaluator.ipynb` around lines 198 - 200, The _count_tool_calls function currently accepts an untyped parameter named sample; add a proper type annotation using AtifEvalSample to satisfy the repo's typing rules: import AtifEvalSample from nat.plugins.eval.evaluator.atif_evaluator (alongside AtifEvalSampleList) and change the function signature for _count_tool_calls to accept sample: AtifEvalSample and annotate the return type as appropriate; apply the same parameter typing fixes to the other occurrences referenced (around the 237-246 block) so all method signatures are fully typed.
🧹 Nitpick comments (1)
examples/notebooks/eval_atif_standalone.ipynb (1)
305-312: Keep `WorkflowEvalBuilder` inside a single managed scope.

Calling `__aenter__()` in one cell and `__aexit__()` in a later cell leaks resources whenever an intermediate cell fails or is skipped. A single `async with` block, or at least `try/finally`, is safer for a demo notebook.

Safer pattern

```python
from nat.plugins.eval.evaluator.atif_evaluator import AtifEvaluator
from nat.plugins.eval.runtime.builder import WorkflowEvalBuilder
from nat.plugins.eval.runtime.eval_harness import EvaluationHarness

async with WorkflowEvalBuilder(
    general_config=config.general,
    eval_general_config=config.eval.general,
) as eval_builder:
    await eval_builder.populate_builder(config, skip_workflow=True)

    atif_evaluators = {}
    for name in config.eval.evaluators:
        evaluator_info = eval_builder.get_evaluator(name)
        if isinstance(evaluator_info, AtifEvaluator):
            atif_evaluators[name] = evaluator_info

    harness = EvaluationHarness()
    results = await harness.evaluate(
        evaluators=atif_evaluators,
        atif_samples=atif_samples,
    )
```

Also applies to: 380-381
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 305 - 312, The notebook currently calls WorkflowEvalBuilder.__aenter__() in one cell and __aexit__() later, which can leak resources; change to use an async context manager (async with WorkflowEvalBuilder(...) as eval_builder) or wrap the lifecycle in try/finally so enter/exit are paired in the same scope. Inside that single managed block, call eval_builder.populate_builder(...) and then build atif_evaluators via eval_builder.get_evaluator(...) and run EvaluationHarness.evaluate(...); ensure no manual calls to __aenter__/__aexit__ remain (referenced symbols: WorkflowEvalBuilder, __aenter__, __aexit__, populate_builder, get_evaluator, EvaluationHarness.evaluate).
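A small self-contained demonstration of why the single `async with` scope matters: `__aexit__` runs even when the body raises, whereas a manual `__aexit__()` call placed in a later notebook cell never executes if an earlier cell fails. `StubBuilder` here is a hypothetical stand-in, not the real `WorkflowEvalBuilder`.

```python
# Stand-in async context manager recording its lifecycle events so the
# cleanup guarantee of `async with` is observable even when the body raises.
import asyncio

events: list[str] = []


class StubBuilder:
    async def __aenter__(self) -> "StubBuilder":
        events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs unconditionally on scope exit, including on exceptions.
        events.append("exit")


async def main() -> None:
    try:
        async with StubBuilder():
            events.append("work")
            raise RuntimeError("cell failed mid-evaluation")
    except RuntimeError:
        events.append("handled")


asyncio.run(main())
print(events)  # ['enter', 'work', 'exit', 'handled']
```

With the manual split-cell pattern, the equivalent trace would stop at `['enter', 'work']` and the builder's resources would stay open.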
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/notebooks/eval_atif_standalone.ipynb`:
- Around line 37-45: The editable pip install line currently uses the
"--no-deps" flag which prevents transitive dependencies (like nat-core) from
being installed and breaks later imports; update the notebook cell containing
the command string "!uv pip install -q --no-deps -e
../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas" to remove the
"--no-deps" flag so dependencies are installed (keep the "-q" and "-e" editable
options intact).
---
Duplicate comments:
In `@examples/notebooks/eval_atif_custom_evaluator.ipynb`:
- Around line 198-200: The _count_tool_calls function currently accepts an
untyped parameter named sample; add a proper type annotation using
AtifEvalSample to satisfy the repo's typing rules: import AtifEvalSample from
nat.plugins.eval.evaluator.atif_evaluator (alongside AtifEvalSampleList) and
change the function signature for _count_tool_calls to accept sample:
AtifEvalSample and annotate the return type as appropriate; apply the same
parameter typing fixes to the other occurrences referenced (around the 237-246
block) so all method signatures are fully typed.
---
Nitpick comments:
In `@examples/notebooks/eval_atif_standalone.ipynb`:
- Around line 305-312: The notebook currently calls
WorkflowEvalBuilder.__aenter__() in one cell and __aexit__() later, which can
leak resources; change to use an async context manager (async with
WorkflowEvalBuilder(...) as eval_builder) or wrap the lifecycle in try/finally
so enter/exit are paired in the same scope. Inside that single managed block,
call eval_builder.populate_builder(...) and then build atif_evaluators via
eval_builder.get_evaluator(...) and run EvaluationHarness.evaluate(...); ensure
no manual calls to __aenter__/__aexit__ remain (referenced symbols:
WorkflowEvalBuilder, __aenter__, __aexit__, populate_builder, get_evaluator,
EvaluationHarness.evaluate).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: adcab0e4-1781-4493-912d-515c05db22a9
📒 Files selected for processing (2)
examples/notebooks/eval_atif_custom_evaluator.ipynbexamples/notebooks/eval_atif_standalone.ipynb
♻️ Duplicate comments (2)
examples/notebooks/eval_atif_standalone.ipynb (2)
37-45: ⚠️ Potential issue | 🟠 Major

Remove `--no-deps` from the editable install command.

Line 42 contradicts Line 39: `--no-deps` skips the transitive packages that provide the later `nat.*` imports, so this notebook will not bootstrap correctly from a clean environment.

Suggested fix

```diff
-!uv pip install -q --no-deps -e ../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas
+!uv pip install -q -e ../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas
```

```bash
#!/bin/bash
python <<'PY'
import json
import pathlib
import tomllib

nb_path = pathlib.Path("examples/notebooks/eval_atif_standalone.ipynb")
nb = json.loads(nb_path.read_text())
for idx, cell in enumerate(nb["cells"]):
    src = "".join(cell.get("source", []))
    if "!uv pip install" in src or "from nat." in src:
        print(f"\n=== cell {idx} ===")
        print(src)

for rel in ("packages/nvidia_nat_eval/pyproject.toml", "packages/nvidia_nat_ragas/pyproject.toml"):
    path = pathlib.Path(rel)
    if path.exists():
        data = tomllib.loads(path.read_text())
        print(f"\n=== {rel} dependencies ===")
        for dep in data.get("project", {}).get("dependencies", []):
            print(dep)
PY
```

Expected result: the install cell shows `--no-deps`, later cells import `nat.*`, and the package metadata lists the runtime dependencies that the current command suppresses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 37 - 45, The editable pip install line in the notebook cell (the command starting with "!uv pip install -q --no-deps -e ../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas") suppresses transitive runtime dependencies and breaks later "nat.*" imports; remove the "--no-deps" flag from that install invocation so the packages' declared dependencies (from nvidia_nat_eval and nvidia_nat_ragas) are installed when bootstrapping the notebook.
1-24: ⚠️ Potential issue | 🟠 Major

Add the SPDX Apache-2.0 header as the first notebook cell.
The notebook still starts directly with the title cell, so the required file-level license header and updated 2026 copyright notice are missing. Please insert a markdown cell before the title with the standard SPDX Apache-2.0 header.
As per coding guidelines: All code should be licensed under the Apache License 2.0, should contain an Apache License 2.0 header comment at the top of each file, and changed files must have up-to-date copyright years.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/notebooks/eval_atif_standalone.ipynb` around lines 1 - 24, Insert a new first notebook cell (a markdown cell before the existing title cell in the cells array) containing the SPDX Apache-2.0 header and updated 2026 copyright notice so the notebook begins with the required file-level license header; ensure the header is formatted as a markdown cell and placed before the existing cell that contains the title "# Standalone ATIF Evaluation with NeMo Agent Toolkit".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/notebooks/eval_atif_standalone.ipynb`:
- Around line 37-45: The editable pip install line in the notebook cell (the
command starting with "!uv pip install -q --no-deps -e
../../packages/nvidia_nat_eval -e ../../packages/nvidia_nat_ragas") suppresses
transitive runtime dependencies and breaks later "nat.*" imports; remove the
"--no-deps" flag from that install invocation so the packages' declared
dependencies (from nvidia_nat_eval and nvidia_nat_ragas) are installed when
bootstrapping the notebook.
- Around line 1-24: Insert a new first notebook cell (a markdown cell before the
existing title cell in the cells array) containing the SPDX Apache-2.0 header
and updated 2026 copyright notice so the notebook begins with the required
file-level license header; ensure the header is formatted as a markdown cell and
placed before the existing cell that contains the title "# Standalone ATIF
Evaluation with NeMo Agent Toolkit".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9c651d9c-b2e2-421d-b690-965681506a7d
📒 Files selected for processing (1)
examples/notebooks/eval_atif_standalone.ipynb
/merge

1 similar comment

/merge
Description
- `eval_atif_standalone.ipynb` — demonstrates using `EvaluationHarness` with builder-constructed RAGAS evaluators against ATIF trajectories, no YAML config needed
- `eval_atif_custom_evaluator.ipynb` — demonstrates fully standalone evaluation with inline custom evaluators (exact match + tool count), requiring only `nvidia-nat-eval` with no LLMs, no API keys, and no evaluator plugins

Closes
By submitting this PR I confirm:
Summary by CodeRabbit