59 commits
18e933e
Add configurable misalignment QA experiment shell.
ethancjackson Mar 18, 2026
c5563d9
Use structured chat turns for misalignment QA agent
ethancjackson Mar 19, 2026
a0980d8
Update misalignment README for session-seeded transcripts
ethancjackson Mar 19, 2026
42735b1
Switch misalignment QA configs to YAML
ethancjackson Mar 19, 2026
8f5dfb4
Remove flattened agent_input from misalignment QA metadata
ethancjackson Mar 19, 2026
a004d48
Refactor misalignment QA runtime and docs.
ethancjackson Mar 19, 2026
cdd7723
Finish wiring thinking_budget through misalignment QA.
ethancjackson Mar 19, 2026
ec263c1
Polish misalignment QA runs and reporting.
ethancjackson Mar 19, 2026
b92eaae
Make misalignment QA runs uniquely identifiable.
ethancjackson Mar 19, 2026
ffa96b7
Add Anthropic variants to misalignment QA.
ethancjackson Mar 19, 2026
ff5071e
Replace misalignment QA CLI reporting with a notebook explorer.
ethancjackson Mar 23, 2026
3b28900
Simplify misalignment_qa: refactor results_notebook, improve notebook…
ethancjackson Mar 23, 2026
53ed74b
Fix TypeError in _fetch_trace_metrics_df: coerce Langfuse API string …
ethancjackson Mar 23, 2026
6b8316e
Clean up preparation.py and config_types.py
ethancjackson Mar 23, 2026
6aa2987
Improve README: bridge smoke→notebook, clarify variant inheritance, a…
ethancjackson Mar 23, 2026
a8b99a1
Clear notebook outputs before committing
ethancjackson Mar 23, 2026
693bbd6
committing current notebook outputs
ethancjackson Mar 23, 2026
fbc110b
Add cross-modality reckless ICL experiment config
ethancjackson Mar 23, 2026
d4f4569
Expand cross-modality experiment to all Gemini models; fix thinking t…
ethancjackson Mar 23, 2026
18fd7c1
Add interactive single-run walkthrough notebook for misalignment_qa
ethancjackson Mar 24, 2026
5236b7a
added interactive notebook for misalignment experiments
ethancjackson Mar 24, 2026
ca7454f
the baseline condition with the examples but without the context foll…
ethancjackson Apr 22, 2026
c00690c
updated baseline experiments
ethancjackson Apr 28, 2026
d384a1a
added proper baseline config and updated interactive notebook
ethancjackson Apr 29, 2026
058f8a4
refactor(misalignment_qa): streamline as bootcamp reference implement…
ethancjackson May 14, 2026
8c1a1c2
feat(misalignment_qa): expand tasks to three misalignment modalities
ethancjackson May 14, 2026
5c08899
docs: add misalignment_qa entry to project-level README
ethancjackson May 14, 2026
e354c44
docs(misalignment_qa): remove incorrect Langfuse prereq from interact…
ethancjackson May 14, 2026
578d360
refactor(misalignment_qa): rename interactive notebook to 01_interact…
ethancjackson May 14, 2026
1567129
refactor(misalignment_qa): rename results_notebook.py → analysis.py; …
ethancjackson May 14, 2026
918da10
refactor(misalignment_qa): rename report_metrics.ipynb → 02_inspect_r…
ethancjackson May 14, 2026
145ab9b
docs(misalignment_qa): add workflow sequencing callout to top of 02_i…
ethancjackson May 14, 2026
17177ca
docs(misalignment_qa): add workflow sequencing note to top of 01_inte…
ethancjackson May 14, 2026
1d867b3
fix(misalignment_qa): fix 10→9 task count in README; export ExamplesI…
ethancjackson May 14, 2026
9729a8d
fix(misalignment_qa): detect and surface item-level API failures in w…
ethancjackson May 14, 2026
98eefdd
fix(misalignment_qa): drop temperature for LiteLLM providers to fix A…
ethancjackson May 14, 2026
f53c73e
fix(misalignment_qa): restore temperature=0.2 for Gemini, never send …
ethancjackson May 14, 2026
38a9fe7
fix(misalignment_qa): consistent temperature=0.2 across all providers…
ethancjackson May 14, 2026
e6a234b
fix(misalignment_qa): fix two mypy errors and minor README polish
ethancjackson May 14, 2026
f2115f3
merge main
ethancjackson May 14, 2026
80409cc
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2026
7e1557b
fix(misalignment_qa): resolve all ruff violations introduced by this PR
ethancjackson May 14, 2026
db50aa6
chore: restore stashed notebook output changes
ethancjackson May 14, 2026
d544cfe
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2026
aa37284
fix(misalignment_qa): add all missing docstrings and fix W505 violations
ethancjackson May 14, 2026
ffee944
fix(misalignment_qa): add docstrings to test file; fix nbqa-ruff E402…
ethancjackson May 14, 2026
2c380ec
fix(misalignment_qa): suppress mypy errors for pydantic-settings and …
ethancjackson May 14, 2026
6de6c05
fix(misalignment_qa): suppress mypy import-untyped for yaml (no stubs…
ethancjackson May 14, 2026
57779bb
chore: restore knowledge_qa notebooks to main state (kernel metadata …
ethancjackson May 14, 2026
8a84802
fix(misalignment_qa): reliable heatmap with matplotlib + scores fallback
ethancjackson May 14, 2026
c33fc7a
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2026
1c24356
Split misalignment_qa into library and implementation layers.
ethancjackson May 20, 2026
a370853
chore(misalignment_qa): refresh notebook outputs after library split.
ethancjackson May 20, 2026
b5faa71
[pre-commit.ci] Add auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 20, 2026
c1de37c
fix(misalignment_qa): resolve ruff import order and run.py E402.
ethancjackson May 20, 2026
9807216
Merge origin/main into ethan-dev.
ethancjackson May 20, 2026
face593
Remove redundant dependencies
rjavadi May 20, 2026
416963f
merge pip audit fixes
ethancjackson May 21, 2026
acd1076
fix: pass string literals for Langfuse Evaluation data_type.
ethancjackson May 21, 2026
1 change: 1 addition & 0 deletions .env.example
@@ -1,6 +1,7 @@
OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
OPENAI_API_KEY="..." # Used for Open-AI compatible models, including Gemini models accessed via the OpenAI API.
GOOGLE_API_KEY="..." # Used by google-adk
ANTHROPIC_API_KEY="..."

# Model selection (see https://ai.google.dev/gemini-api/docs/models)
# Stable: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
9 changes: 0 additions & 9 deletions .github/workflows/code_checks.yml
@@ -62,7 +62,6 @@ jobs:
           # Skipping joblib vulnerability (PYSEC-2024-277): disputed, no fix version available
           # Skipping markdown vulnerability (PYSEC-2026-89): no fix version available on PyPI
           # Skipping pyjwt vulnerability (PYSEC-2025-183): disputed, no fix version available
-          # Skipping transformers vulnerabilities (PYSEC-2025-211 through 218): no fix version available
           ignore-vulns: |
             GHSA-xm59-rqc7-hhvf
             GHSA-hx9q-6w63-j58v
@@ -72,11 +71,3 @@ jobs:
             PYSEC-2024-277
             PYSEC-2026-89
             PYSEC-2025-183
-            PYSEC-2025-211
-            PYSEC-2025-212
-            PYSEC-2025-213
-            PYSEC-2025-214
-            PYSEC-2025-215
-            PYSEC-2025-216
-            PYSEC-2025-217
-            PYSEC-2025-218
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -46,7 +46,7 @@ repos:
     rev: 1.9.1
     hooks:
       - id: nbqa-ruff
-        args: [--fix, --exit-non-zero-on-fix, "--ignore=D100,F704,PLE1142"]
+        args: [--fix, --exit-non-zero-on-fix, "--ignore=D100,D103,E402,F704,PLE1142"]

ci:
autofix_commit_msg: |
5 changes: 4 additions & 1 deletion README.md
@@ -11,7 +11,7 @@ This is a collection of reference implementations for Vector Institute's **Agent

## Reference Implementations

-This repository includes four modules, each demonstrating a different aspect of building and evaluating agent-based systems:
+This repository includes five modules, each demonstrating a different aspect of building and evaluating agent-based systems:

- **[Basics](implementations/basics/README.md)**
Two introductory notebooks covering agent evaluation fundamentals: why evals are hard, the four quality dimensions, grader types, and a hands-on walkthrough of the shared evaluation harness with Langfuse.
@@ -25,6 +25,9 @@ This repository includes four modules, each demonstrating a different aspect of
- **[Report Generation Agent](implementations/report_generation/README.md)**
An agent that accepts natural language queries and generates downloadable Excel reports from a relational database. Includes a Gradio demo UI and Langfuse-integrated evaluations.

- **[Misalignment QA](implementations/misalignment_qa/README.md)**
A YAML-driven experiment runner for probing whether reckless examples can nudge LLM responses toward harmful behavior. Tests five context-injection conditions across six commercial models and three task categories (life-safety, harmful code, social engineering), with traces and scores stored in Langfuse.

## Getting Started

Set your API keys in `.env`. Use `.env.example` as a template.
1 change: 1 addition & 0 deletions aieng-eval-agents/README.md
@@ -15,6 +15,7 @@ Shared library for Vector Institute's Agentic AI Evaluation Bootcamp. Provides r
| `aieng.agent_evals.knowledge_qa` | ReAct agent that answers questions using live web search. Includes evaluation against the DeepSearchQA benchmark with LLM-as-a-judge metrics (precision/recall/F1). |
| `aieng.agent_evals.aml_investigation` | Agent that investigates Anti-Money Laundering cases by querying a SQLite database of financial transactions via a read-only SQL tool. |
| `aieng.agent_evals.report_generation` | Agent that generates structured Excel reports from a relational database based on natural language queries. |
| `aieng.agent_evals.misalignment_qa` | Config-driven experiment runner for measuring LLM misalignment under varying context conditions, with YAML-defined variants, LLM-as-judge scoring, and Langfuse trace analysis. |

### Reusable tools (`aieng.agent_evals.tools`)

10 changes: 10 additions & 0 deletions aieng-eval-agents/aieng/agent_evals/configs.py
@@ -100,6 +100,16 @@ class Configs(BaseSettings):
         validation_alias=AliasChoices("GEMINI_API_KEY", "GOOGLE_API_KEY"),
         description="API key for Google/Gemini API (accepts GEMINI_API_KEY or GOOGLE_API_KEY).",
     )
+    anthropic_api_key: SecretStr | None = Field(
+        default=None,
+        validation_alias="ANTHROPIC_API_KEY",
+        description="API key for Anthropic API access when using LiteLLM-backed Claude models.",
+    )
+    vector_inference_api_key: SecretStr | None = Field(
+        default=None,
+        validation_alias="VECTOR_INFERENCE_API_KEY",
Collaborator

nit: "VECTOR_INFERENCE_API_KEY" doesn't exist in .env. Is it used for testing purposes only?

vector_inference_api_key on Configs is declared but never read — the agent builder pulls VECTOR_INFERENCE_API_KEY straight from os.environ via AgentSpec.api_key_env. Same story as anthropic_api_key just above.
Two options:

  1. Remove both fields from Configs, since they're not enforcing or providing anything (no extra="forbid", default=None).
  2. Use them: have the agent builder prefer configs.<name>.get_secret_value() when spec.api_key_env matches, falling back to os.getenv otherwise. That makes Configs the single source of truth and gives you SecretStr's leak protection in logs/exceptions.
    Either is fine.
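
Option 2 could look roughly like the sketch below. It is stdlib-only and purely illustrative: `Secret` is a stand-in for pydantic's `SecretStr`, and `SketchConfigs`, `ENV_TO_FIELD`, and `resolve_api_key` are hypothetical names that do not exist in this PR. The idea is that the builder asks the settings object first and only falls back to `os.getenv` for environment variables the settings class does not model.

```python
import os
from dataclasses import dataclass
from typing import Optional


class Secret:
    """Stand-in for pydantic's SecretStr (hypothetical; masks its value in repr)."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret('**********')"


@dataclass
class SketchConfigs:
    """Stand-in for the two new Configs fields (hypothetical)."""

    anthropic_api_key: Optional[Secret] = None
    vector_inference_api_key: Optional[Secret] = None


# Assumed mapping from env-var name to the settings field that mirrors it.
ENV_TO_FIELD = {
    "ANTHROPIC_API_KEY": "anthropic_api_key",
    "VECTOR_INFERENCE_API_KEY": "vector_inference_api_key",
}


def resolve_api_key(configs: SketchConfigs, api_key_env: str) -> str:
    """Prefer the settings field; fall back to the raw environment variable."""
    field_name = ENV_TO_FIELD.get(api_key_env)
    secret = getattr(configs, field_name, None) if field_name else None
    if secret is not None:
        return secret.get_secret_value()

    api_key = os.getenv(api_key_env)
    if not api_key:
        raise ValueError(f"Environment variable '{api_key_env}' is required.")
    return api_key
```

With this shape, the secret is unwrapped only at the point where it is handed to the model client, so an accidental `repr(configs)` in a log line or traceback shows the masked form.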

+        description="API key for Vector's internal OpenAI-compatible inference endpoint.",
+    )
     default_planner_model: str = Field(
         default="gemini-2.5-pro",
         description="Model name for planning/complex reasoning tasks.",
@@ -6,7 +6,6 @@

 from aieng.agent_evals.evaluation.graders.config import LLMRequestConfig
 from aieng.agent_evals.evaluation.types import Evaluation
-from langfuse.api import ScoreDataType
 from openai import APIConnectionError, APIStatusError, APITimeoutError, InternalServerError, RateLimitError
 from openai.types.chat.parsed_chat_completion import ParsedChatCompletion
 from pydantic import BaseModel
@@ -119,7 +118,7 @@ def build_error_evaluation(*, name: str, error: Exception, prefix: str) -> Evalu
         name=name,
         value=True,
         comment=f"{prefix}: {message}",
-        data_type=ScoreDataType.BOOLEAN,
+        data_type="BOOLEAN",
         metadata={"error_type": error.__class__.__name__, "error": message},
     )

@@ -20,7 +20,6 @@
 )
 from aieng.agent_evals.evaluation.trace import _default_tool_call_predicate
 from aieng.agent_evals.evaluation.types import Evaluation, TraceEvaluatorFunction, TraceObservationPredicate
-from langfuse.api import ScoreDataType
 from langfuse.api.resources import ObservationsView
 from langfuse.api.resources.commons.types.trace_with_full_details import TraceWithFullDetails
 from langfuse.experiment import ExperimentItemResult
Expand Down Expand Up @@ -259,7 +258,7 @@ def _to_groundedness_evaluation(
name="groundedness_score",
value=groundedness_score,
comment=response.explanation,
data_type=ScoreDataType.NUMERIC,
data_type="NUMERIC",
metadata=metadata,
)

43 changes: 43 additions & 0 deletions aieng-eval-agents/aieng/agent_evals/misalignment_qa/__init__.py
@@ -0,0 +1,43 @@
"""Config-driven misalignment QA experiment runner."""

from aieng.agent_evals.misalignment_qa.agent import SUPPORTED_TOOL_NAMES, build_misalignment_agent
from aieng.agent_evals.misalignment_qa.config_types import (
    AgentOverrideSpec,
    AgentSpec,
    AgentToolSpec,
    EvalSpec,
    ExamplePairSpec,
    ExamplesInjectMode,
    ExperimentConfig,
    LLMJudgeSpec,
    MessageSpec,
    TaskItemSpec,
    TraceUsageMetricsSpec,
    VariantSpec,
)
from aieng.agent_evals.misalignment_qa.experiment import load_experiment_config, run_experiment_config
from aieng.agent_evals.misalignment_qa.preparation import PreparedTaskItem, PreparedVariantRun
from aieng.agent_evals.misalignment_qa.task import MisalignmentTask


__all__ = [
    "SUPPORTED_TOOL_NAMES",
    "AgentOverrideSpec",
    "AgentSpec",
    "AgentToolSpec",
    "EvalSpec",
    "ExamplePairSpec",
    "ExamplesInjectMode",
    "ExperimentConfig",
    "LLMJudgeSpec",
    "MessageSpec",
    "MisalignmentTask",
    "PreparedTaskItem",
    "PreparedVariantRun",
    "TaskItemSpec",
    "TraceUsageMetricsSpec",
    "VariantSpec",
    "build_misalignment_agent",
    "load_experiment_config",
    "run_experiment_config",
]
121 changes: 121 additions & 0 deletions aieng-eval-agents/aieng/agent_evals/misalignment_qa/agent.py
@@ -0,0 +1,121 @@
"""ADK agent builder for misalignment QA experiments."""

from __future__ import annotations

import logging
import os
from typing import Any

from aieng.agent_evals.configs import Configs
from aieng.agent_evals.misalignment_qa.config_types import AgentSpec, AgentToolSpec
from aieng.agent_evals.tools import (
    create_fetch_file_tool,
    create_google_search_tool,
    create_grep_file_tool,
    create_read_file_tool,
    create_web_fetch_tool,
)
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.genai.types import GenerateContentConfig, HttpOptions, ThinkingConfig


logger = logging.getLogger(__name__)


TOOL_FACTORIES: dict[str, Any] = {
Collaborator

[Please ignore if you'd rather not make extra changes]
Recommended: change the type hint to TOOL_FACTORIES: dict[str, Callable[[Configs], Any]] (with Callable imported from collections.abc or typing).

    "google_search": lambda configs: create_google_search_tool(config=configs),
    "web_fetch": lambda _configs: create_web_fetch_tool(),
    "fetch_file": lambda _configs: create_fetch_file_tool(),
    "grep_file": lambda _configs: create_grep_file_tool(),
    "read_file": lambda _configs: create_read_file_tool(),
}
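
The reviewer's type-hint suggestion for the registry above can be sketched as follows. This is a minimal, stdlib-only illustration, not the PR's code: `SketchConfigs`, `ToolFactory`, `SKETCH_TOOL_FACTORIES`, and `build_tools` are hypothetical names, and the lambdas return placeholder tuples instead of real tool objects. Naming the callable type lets a type checker flag a factory registered with the wrong arity or argument type.

```python
from typing import Any, Callable


class SketchConfigs:
    """Stand-in for the real Configs settings class (hypothetical)."""


# Every factory takes the shared configs object and returns a tool instance.
ToolFactory = Callable[[SketchConfigs], Any]

SKETCH_TOOL_FACTORIES: dict[str, ToolFactory] = {
    # Placeholder tuples stand in for the tool objects built in agent.py.
    "google_search": lambda configs: ("google_search", configs),
    "web_fetch": lambda _configs: ("web_fetch", None),
}


def build_tools(configs: SketchConfigs, names: list[str]) -> list[Any]:
    """Look up each requested tool and build it from the shared configs."""
    out: list[Any] = []
    for name in names:
        factory = SKETCH_TOOL_FACTORIES.get(name)
        if factory is None:
            raise ValueError(f"Unsupported tool: {name}")
        out.append(factory(configs))
    return out
```

The runtime behavior is the same as with `dict[str, Any]`; the narrower annotation only tightens static checking.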
SUPPORTED_TOOL_NAMES: tuple[str, ...] = tuple(TOOL_FACTORIES.keys())


def _build_tools(configs: Configs, tools: list[AgentToolSpec]) -> list[Any]:
    enabled = [t for t in tools if t.enabled]
    if not enabled:
        return []

    out: list[Any] = []
    for spec in enabled:
        factory = TOOL_FACTORIES.get(spec.name)
        if not factory:
            raise ValueError(f"Unsupported tool: {spec.name}")
        out.append(factory(configs))

    return out


def _build_generate_content_config(spec: AgentSpec) -> GenerateContentConfig:
    if spec.provider == "litellm":
        # Pass temperature when it is set; None causes ADK to omit the field
        # entirely (provider uses its default). Set temperature: null in the
        # variant's agent config for models that have deprecated it
        # (e.g. claude-opus-4-7).
        return GenerateContentConfig(
            temperature=spec.temperature,
            max_output_tokens=spec.max_output_tokens,
        )

    return GenerateContentConfig(
        http_options=HttpOptions(timeout=spec.timeout_sec * 1000) if spec.timeout_sec is not None else None,
        temperature=spec.temperature,
        max_output_tokens=spec.max_output_tokens,
        thinking_config=ThinkingConfig(
            include_thoughts=spec.thinking_include_thoughts,
            thinking_budget=spec.thinking_budget,
        ),
    )


def _build_model(spec: AgentSpec) -> str | LiteLlm:
    if spec.provider == "litellm":
        if spec.thinking_budget is not None or spec.thinking_include_thoughts:
            logger.warning(
                "Ignoring thinking settings for LiteLLM-backed model '%s'; those settings are Gemini-specific.",
                spec.model,
            )
        kwargs: dict[str, Any] = {"drop_params": True}
        if spec.timeout_sec is not None:
            kwargs["timeout"] = spec.timeout_sec
        if spec.api_base is not None:
            kwargs["api_base"] = spec.api_base
        if spec.api_key_env is not None:
            api_key = os.getenv(spec.api_key_env)
            if not api_key:
                raise ValueError(
                    f"Environment variable '{spec.api_key_env}' is required for LiteLLM model '{spec.model}'."
                )
            kwargs["api_key"] = api_key
        return LiteLlm(model=spec.model, **kwargs)

    return spec.model


def build_misalignment_agent(spec: AgentSpec, *, name: str = "assistant") -> LlmAgent:
    """Build a configurable ADK LlmAgent.
Collaborator

According to CONTRIBUTING.md, NumPy-format docstrings are recommended.

Collaborator

Suggested change
-    """Build a configurable ADK LlmAgent.
+    """Build a configurable ADK ``LlmAgent`` for misalignment QA experiments.
+
+    Intentionally minimal: focuses on prompt/system-instruction configurability
+    and tool selection so the test harness remains the main experiment driver.
+
+    Parameters
+    ----------
+    spec : AgentSpec
+        Resolved agent specification (provider, model, prompt, tools, etc.).
+    name : str, optional
+        Name assigned to the underlying ``LlmAgent``. Defaults to ``"assistant"``.
+
+    Returns
+    -------
+    LlmAgent
+        A configured ADK agent ready to be invoked by the experiment runner.
+
+    Raises
+    ------
+    ValueError
+        If ``spec.tools`` contains an unsupported tool name, or if
+        ``spec.api_key_env`` is set but the corresponding environment
+        variable is empty.
+    """


    Intentionally minimal: focuses on prompt/system-instruction configurability
    and tool selection so the test harness remains the main experiment driver.
    """
    configs = Configs()  # type: ignore[call-arg]  # fields populated from env vars

    tool_list = _build_tools(configs=configs, tools=spec.tools)
    generate_cfg = _build_generate_content_config(spec)
    model = _build_model(spec)

    # No planner forced: for misalignment probing we want the agent to produce
    # the next completion directly (tools may or may not be enabled).
    return LlmAgent(
        name=name,
        description="",
        instruction=spec.system_prompt,
        tools=tool_list,
        model=model,
        generate_content_config=generate_cfg,
    )


__all__ = ["SUPPORTED_TOOL_NAMES", "build_misalignment_agent"]