16 changes: 15 additions & 1 deletion README.md
@@ -149,6 +149,8 @@ inference gateways.
| `openai` | `OPENAI_API_KEY` (+ optional `OPENAI_BASE_URL`) | api.openai.com (or any OpenAI-compatible URL) | `gpt-5.4` |
| `anthropic` | `ANTHROPIC_API_KEY` | api.anthropic.com | `claude-opus-4-6` |
| `nv_build` | `NVIDIA_INFERENCE_KEY` | build.nvidia.com | `deepseek-ai/deepseek-v4-flash` |
| `claude_cli` | _(none — uses local CLI auth)_ | local `claude` binary | `claude-sonnet-4-6` |
| `codex_cli` | _(none — uses local CLI auth)_ | local `codex` binary | `o4-mini` |

```bash
# Stock OpenAI
@@ -166,6 +168,16 @@ export SKILLSPECTOR_PROVIDER=nv_build
export NVIDIA_INFERENCE_KEY=nvapi-...
skillspector scan ./my-skill/

# Local Claude CLI — no API key; uses your existing `claude auth login` session
# Requires: claude CLI installed and authenticated (claude auth login)
export SKILLSPECTOR_PROVIDER=claude_cli
skillspector scan ./my-skill/

# Local Codex CLI — no API key; uses your existing `codex login` session
# Requires: codex CLI installed and authenticated
export SKILLSPECTOR_PROVIDER=codex_cli
skillspector scan ./my-skill/

# Local Ollama or any OpenAI-compatible endpoint
export SKILLSPECTOR_PROVIDER=openai
export OPENAI_API_KEY=ollama
@@ -396,7 +408,7 @@ Issues (2)

| Variable | Description | Required |
|----------|-------------|----------|
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai`, `anthropic`, or `nv_build`. Each provider has its own bundled `model_registry.yaml` and default model (see the LLM Analysis table above). Defaults to `nv_build`. | Optional |
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai`, `anthropic`, `nv_build`, `claude_cli`, or `codex_cli`. Each provider has its own bundled `model_registry.yaml` and default model (see the LLM Analysis table above). Defaults to `nv_build`. | Optional |
| `NVIDIA_INFERENCE_KEY` | Credential for the `nv_build` provider (build.nvidia.com). | Required for LLM analysis when `SKILLSPECTOR_PROVIDER=nv_build` |
| `OPENAI_API_KEY` | Credential for the OpenAI provider (`SKILLSPECTOR_PROVIDER=openai`). Also serves as the tier-2 fallback in the credential waterfall when the active provider returns no credentials. | Required for LLM analysis when `SKILLSPECTOR_PROVIDER=openai` |
| `OPENAI_BASE_URL` | Override the OpenAI endpoint (e.g. point at Ollama). | Optional |
@@ -405,6 +417,8 @@ Issues (2)
| `SKILLSPECTOR_MODEL_REGISTRY` | Override the bundled per-provider YAML registry (`src/skillspector/providers/<provider>.yaml`) with a custom path. | Optional |
| `SKILLSPECTOR_LOG_LEVEL` | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `WARNING`). | Optional |
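
The two-tier credential waterfall mentioned in the `OPENAI_API_KEY` row can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not SkillSpector's code: the `PROVIDER_ENV` mapping and the `resolve_credentials` helper are hypothetical names, and only the env var names and the `nv_build` default come from the tables above.

```python
from __future__ import annotations

# Hypothetical mapping from provider name to its credential env var,
# copied from the provider table in this README.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "nv_build": "NVIDIA_INFERENCE_KEY",
}


def resolve_credentials(env: dict[str, str]) -> tuple[str, str]:
    """Return (source, api_key) using a two-tier lookup.

    Tier 1: the active provider's own env var.
    Tier 2: OPENAI_API_KEY as the generic fallback.
    Raises ValueError when neither tier yields a key.
    """
    provider = env.get("SKILLSPECTOR_PROVIDER", "nv_build")  # documented default
    var = PROVIDER_ENV.get(provider)
    if var is None:
        # CLI providers carry no key, so they never take this path.
        raise ValueError(f"unknown or key-less provider: {provider!r}")
    if env.get(var):
        return provider, env[var]
    if env.get("OPENAI_API_KEY"):
        return "openai-fallback", env["OPENAI_API_KEY"]
    raise ValueError(f"no API key found for provider {provider!r}")
```

Passing the environment in as a plain dict (for example `dict(os.environ)`) keeps the lookup easy to exercise without touching the real process environment.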

> **CLI providers** (`claude_cli`, `codex_cli`): No API key is needed. Authentication is handled entirely by the agent CLI's own login session (`claude auth login` / `codex login`), and SkillSpector never reads or forwards API keys while one of these providers is active. The subprocess runs in a hardened configuration: tools disabled, no MCP, read-only sandbox mode (`codex` only), and untrusted skill content passed only via stdin.

### CLI Options

```bash
20 changes: 16 additions & 4 deletions docs/DEVELOPMENT.md
@@ -260,21 +260,33 @@ Copy [.env.example](../.env.example) to `.env` in the project root and set value

| Variable | Description | Example |
|----------|-------------|---------|
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai` \| `anthropic` \| `nv_build`. Defaults to `nv_build`. | `openai` |
| `SKILLSPECTOR_PROVIDER` | Active LLM provider: `openai` \| `anthropic` \| `nv_build` \| `claude_cli` \| `codex_cli`. Defaults to `nv_build`. | `claude_cli` |
| `NVIDIA_INFERENCE_KEY` | Credential for `nv_build`. | `nvapi-...` |
| `OPENAI_API_KEY` | Credential for `SKILLSPECTOR_PROVIDER=openai`. Also tier-2 fallback for non-OpenAI providers. | `sk-...` |
| `OPENAI_BASE_URL` | Override the OpenAI endpoint (e.g. point at Ollama). | `http://localhost:11434/v1` |
| `ANTHROPIC_API_KEY` | Credential for `SKILLSPECTOR_PROVIDER=anthropic`. | `sk-ant-...` |
| `SKILLSPECTOR_MODEL` | Override the active provider's bundled default model (see [README.md](../README.md) for per-provider defaults). | `gpt-5.2` |
| `SKILLSPECTOR_MODEL` | Override the active provider's bundled default model (see [README.md](../README.md) for per-provider defaults). For `claude_cli`, this is passed as `--model` to the `claude` binary. | `gpt-5.2` |

> **CLI providers** (`claude_cli`, `codex_cli`): no credential env var is needed. Authentication is managed by the agent CLI's own session (`claude auth login` / `codex login`). The subprocess is heavily sandboxed — see [providers/_agent_cli.py](../src/skillspector/providers/_agent_cli.py).
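
To make the sandboxing idea concrete, here is a minimal sketch of a hardened subprocess call: no shell, a scrubbed environment, untrusted text on stdin only, a timeout, and fail-closed errors. It is not the contents of `_agent_cli.py`; `run_agent_cli`, `SCRUBBED_VARS`, and `echo_argv` are hypothetical names, and a Python one-liner stands in for the real agent binary so that no CLI flags have to be guessed.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Assumed list of credential variables to strip; the real helper may differ.
SCRUBBED_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "NVIDIA_INFERENCE_KEY")


def run_agent_cli(argv: list[str], untrusted_text: str, timeout: float = 60.0) -> str:
    """Run argv with no shell, a scrubbed environment, and input via stdin only."""
    env = {k: v for k, v in os.environ.items() if k not in SCRUBBED_VARS}
    try:
        proc = subprocess.run(
            argv,                  # argument list, so no shell parsing happens
            input=untrusted_text,  # untrusted content never appears in argv
            capture_output=True,
            text=True,
            env=env,
            timeout=timeout,
            shell=False,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        raise RuntimeError(f"agent CLI failed to run: {exc}") from exc
    if proc.returncode != 0:
        # Fail closed: a non-zero exit is an error, never a partial result.
        raise RuntimeError(f"agent CLI exited with status {proc.returncode}")
    return proc.stdout


# Stand-in for a real agent binary: a Python one-liner that upper-cases stdin.
echo_argv = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```

Because the untrusted text travels on stdin, shell metacharacters in a skill file (for example `; rm -rf /`) are just bytes to the child process and are never interpreted as a command line.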

### Constants, token budgets, and LLM

- **Constants** ([constants.py](../src/skillspector/constants.py)): `_SKILLSPECTOR_DEFAULT_MODEL`, `MODEL_CONFIG` (per-node model selection), `MAX_INPUT_TOKENS_PCT` (0.75), `DEFAULT_CONTEXT_LENGTH` (128k fallback).
- **`get_max_input_tokens(model)`** — input budget per LLM request (75% of resolved context window).
- **`get_max_output_tokens(model)`** — output budget per LLM request (min of 25% context, registry's `max_output_tokens` cap if set).
- Batch budget overhead is computed per-prompt via `estimate_tokens(base_prompt)` rather than a fixed constant.
- **Providers** ([providers/](../src/skillspector/providers/)): pluggable credential + token-budget resolvers. Each provider is a subpackage with its own `provider.py` and bundled `model_registry.yaml`; [registry.py](../src/skillspector/providers/registry.py) exposes `lookup_context_length` / `lookup_max_output_tokens` utilities the providers call directly. The active provider is chosen by `SKILLSPECTOR_PROVIDER` (default: `nv_build`) — see [providers/`__init__`.py](../src/skillspector/providers/__init__.py): `nv_build/` (build.nvidia.com), `openai/`, or `anthropic/`.
- **LLM calls** ([llm_utils.py](../src/skillspector/llm_utils.py)): **`get_chat_model()`** and **`chat_completion()`** resolve credentials in two tiers — active NVIDIA provider (`NVIDIA_INFERENCE_KEY` → endpoint) → standard `OPENAI_API_KEY` / `OPENAI_BASE_URL` — against any OpenAI-compatible endpoint. `max_tokens` is auto-bound to `get_max_output_tokens(model)` from `model_info`.
- **Providers** ([providers/](../src/skillspector/providers/)): pluggable credential + token-budget resolvers. Each provider is a subpackage with its own `provider.py` and bundled `model_registry.yaml`; [registry.py](../src/skillspector/providers/registry.py) exposes `lookup_context_length` / `lookup_max_output_tokens` utilities the providers call directly. The active provider is chosen by `SKILLSPECTOR_PROVIDER` (default: `nv_build`):
- `nv_build/` — build.nvidia.com (HTTP, `NVIDIA_INFERENCE_KEY`)
- `openai/` — api.openai.com or any OpenAI-compatible URL (`OPENAI_API_KEY`)
- `anthropic/` — api.anthropic.com (`ANTHROPIC_API_KEY`)
- `claude_cli/` — **local `claude` binary; no API key**. Uses the CLI's own auth session (`claude auth login`). Set `SKILLSPECTOR_PROVIDER=claude_cli`.
- `codex_cli/` — **local `codex` binary; no API key**. Uses the CLI's own auth session (`codex login`). Set `SKILLSPECTOR_PROVIDER=codex_cli`.

CLI providers (`claude_cli`, `codex_cli`) implement the optional `AgentCLICapable` interface (`is_available()` + `complete()`) defined in [providers/base.py](../src/skillspector/providers/base.py). `has_cli_capability(provider)` detects this at runtime. All subprocess calls go through the hardened helper [providers/_agent_cli.py](../src/skillspector/providers/_agent_cli.py) which enforces: no shell (`shell=False`), untrusted content via stdin only, capability stripping (tools disabled / sandboxed), environment scrubbing (no API keys forwarded), per-call timeout, and fail-closed error handling.

- **LLM calls** ([llm_utils.py](../src/skillspector/llm_utils.py)): **`get_chat_model()`** and **`chat_completion()`** dispatch based on the active provider:
- **HTTP providers**: resolve credentials in two tiers: first the active provider's own env var (`NVIDIA_INFERENCE_KEY` / `ANTHROPIC_API_KEY` / `OPENAI_API_KEY`, together with the matching endpoint), then the standard `OPENAI_API_KEY` / `OPENAI_BASE_URL` fallback. The resulting client targets any OpenAI-compatible endpoint. `max_tokens` is auto-bound to `get_max_output_tokens(model)` from `model_info`.
- **CLI providers** (`claude_cli`, `codex_cli`): `get_chat_model()` returns an `AgentCLIChatModel` adapter backed by `provider.complete()`, so the analyzers' `.invoke()` / `.with_structured_output(schema).invoke()` calls work with no API key (structured output is produced by prompting for JSON, then Pydantic-validating). `chat_completion()` routes through `get_chat_model()` as well. `is_llm_available()` calls `provider.is_available()` instead of credential resolution.
- **LLM analyzer base** ([llm_analyzer_base.py](../src/skillspector/nodes/llm_analyzer_base.py)): `LLMAnalyzerBase` provides per-file/per-chunk batching, token-budget-aware chunking, and a run loop for all LLM-based analyzers. `LLMMetaAnalyzer` extends it for filter/enrich (meta_analyzer node). Future semantic analyzers extend `LLMAnalyzerBase` for discovery mode.
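
The token-budget rules listed above (75% of the context window for input, and the smaller of 25% of the window or the registry cap for output) work out as in the sketch below. The function names, the explicit `OUTPUT_TOKENS_PCT` constant, and the reading of "128k" as 128,000 are assumptions made for illustration; the real helpers take a model name and resolve the context length from the registry.

```python
from __future__ import annotations

MAX_INPUT_TOKENS_PCT = 0.75       # from constants.py, as documented above
OUTPUT_TOKENS_PCT = 0.25          # hypothetical name for the documented 25% share
DEFAULT_CONTEXT_LENGTH = 128_000  # "128k fallback"; the exact value is an assumption


def max_input_tokens(context_length: int | None = None) -> int:
    """Input budget: 75% of the resolved (or fallback) context window."""
    ctx = context_length or DEFAULT_CONTEXT_LENGTH
    return int(ctx * MAX_INPUT_TOKENS_PCT)


def max_output_tokens(context_length: int | None = None, cap: int | None = None) -> int:
    """Output budget: 25% of the context window, limited by the cap if one is set."""
    ctx = context_length or DEFAULT_CONTEXT_LENGTH
    budget = int(ctx * OUTPUT_TOKENS_PCT)
    return min(budget, cap) if cap is not None else budget
```

With the fallback window this gives 96,000 input tokens and 32,000 output tokens; a model with a 200,000-token window and an 8,192-token registry cap gets 150,000 input tokens and 8,192 output tokens.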

---
181 changes: 164 additions & 17 deletions src/skillspector/llm_utils.py
@@ -13,13 +13,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""Shared LLM utilities (OpenAI-compatible chat models).
"""Shared LLM utilities (OpenAI-compatible chat models + agent CLI transports).

Credentials are resolved in this order:
1. The active NVIDIA provider (see :mod:`skillspector.providers`) —
reads ``NVIDIA_INFERENCE_KEY`` and supplies the matching endpoint.
1. The active provider (see :mod:`skillspector.providers`):
- CLI providers (``claude_cli``, ``codex_cli``): use ``is_available()``
and ``complete()`` — no API key needed.
- HTTP providers (``anthropic``, ``openai``, ``nv_build``): read their
respective credential env vars and supply a base URL.
2. ``OPENAI_API_KEY`` / ``OPENAI_BASE_URL`` (the langchain-openai
defaults).
defaults) — only consulted for HTTP providers when the provider's
own credential env var is unset.

There is no SkillSpector-specific credential env var: setting
``NVIDIA_INFERENCE_KEY`` configures whichever NVIDIA endpoint the
@@ -29,23 +33,31 @@

from __future__ import annotations

import asyncio
import json
import os

from langchain_openai import ChatOpenAI

from skillspector.constants import MODEL_CONFIG
from skillspector.model_info import get_max_input_tokens, get_max_output_tokens
from skillspector.providers import resolve_provider_credentials
from skillspector.providers import (
    get_active_provider,
    has_cli_capability,
    resolve_provider_credentials,
)


def _resolve_llm_credentials() -> tuple[str, str | None]:
"""Return ``(api_key, base_url)`` resolved from the environment.

Tries the active NVIDIA provider first; falls back to ``OPENAI_API_KEY``
Tries the active provider first; falls back to ``OPENAI_API_KEY``
/ ``OPENAI_BASE_URL`` when the provider is not configured.

Raises:
ValueError: when no API key can be resolved from any source.
RuntimeError: when called for a CLI provider (use ``is_llm_available``
/ ``chat_completion`` directly instead).
"""
creds = resolve_provider_credentials()
if creds is not None:
@@ -65,7 +77,15 @@ def _resolve_llm_credentials() -> tuple[str, str | None]:


def is_llm_available() -> tuple[bool, str | None]:
"""Return ``(available, error_message)`` describing LLM credential status."""
"""Return ``(available, error_message)`` describing LLM availability.

For CLI providers (``claude_cli``, ``codex_cli``) the check delegates
to the provider's ``is_available()`` method (binary on PATH + auth).
For HTTP providers, it falls back to credential resolution.
"""
provider = get_active_provider()
if has_cli_capability(provider):
return provider.is_available() # type: ignore[attr-defined]
try:
_resolve_llm_credentials()
except ValueError as exc:
@@ -78,26 +98,153 @@ def fetch_model_token_limits(model_label: str) -> tuple[int, int]:
return get_max_input_tokens(model_label), get_max_output_tokens(model_label)


def get_chat_model(model: str | None = None) -> ChatOpenAI:
"""Return a :class:`ChatOpenAI` configured against the resolved endpoint.
# ---------------------------------------------------------------------------
# Agent CLI chat-model adapter
# ---------------------------------------------------------------------------
#
# The LLM analyzers (meta_analyzer, semantic_*) obtain a model from
# ``get_chat_model()`` and call ``.invoke()`` / ``.with_structured_output(
# schema).invoke()`` on it (see ``llm_analyzer_base``) — they never go through
# ``chat_completion``. To support CLI providers there, ``get_chat_model``
# returns this minimal adapter, which mimics the slice of the ``ChatOpenAI``
# interface the analyzers rely on, backed by the provider's ``complete()``
# subprocess transport.


class _AgentCLIMessage:
    """Minimal stand-in for a LangChain message: exposes ``.content``."""

    def __init__(self, content: str) -> None:
        self.content = content


def _extract_json_object(raw: str) -> dict:
    """Extract a single JSON object from a CLI model's text response.

    Tolerates markdown code fences and surrounding prose. Raises ``ValueError``
    (fail-closed) when no JSON object can be parsed.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json) and any closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        fence = text.rfind("```")
        if fence != -1:
            text = text[:fence]
        text = text.strip()
    try:
        obj = json.loads(text)
        if isinstance(obj, dict):
            return obj
    except json.JSONDecodeError:
        pass
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            obj = json.loads(text[start : end + 1])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
    raise ValueError(f"could not extract a JSON object from CLI response: {raw[:200]!r}")


class _StructuredAgentCLIModel:
    """Mimics ``ChatOpenAI.with_structured_output(schema)`` for a CLI provider.

    ``invoke`` augments the prompt with the schema, calls the provider's
    ``complete()``, then parses and validates the response into *schema*.
    """

    def __init__(self, provider: object, model: str, max_output_tokens: int, schema: type) -> None:
        self._provider = provider
        self._model = model
        self._max_output_tokens = max_output_tokens
        self._schema = schema

    def _augment(self, prompt: str) -> str:
        schema_json = json.dumps(self._schema.model_json_schema(), indent=2)
        return (
            f"{prompt}\n\n"
            "Respond with ONLY a single JSON object conforming to the JSON Schema "
            "below. Do not wrap it in markdown code fences and do not add any prose "
            f"before or after the JSON.\n\nJSON Schema:\n{schema_json}"
        )

    def invoke(self, prompt: str) -> object:
        raw = self._provider.complete(  # type: ignore[attr-defined]
            self._augment(prompt),
            model=self._model,
            max_output_tokens=self._max_output_tokens,
        )
        return self._schema.model_validate(_extract_json_object(raw))

    async def ainvoke(self, prompt: str) -> object:
        return await asyncio.to_thread(self.invoke, prompt)


class AgentCLIChatModel:
    """Minimal ``ChatOpenAI``-compatible adapter backed by a CLI provider.

    Implements only the surface the analyzers use: ``invoke`` (returns an
    object with ``.content``), ``ainvoke``, and ``with_structured_output``.
    """

    def __init__(self, provider: object, model: str, max_output_tokens: int) -> None:
        self._provider = provider
        self._model = model
        self._max_output_tokens = max_output_tokens

    def invoke(self, prompt: str) -> _AgentCLIMessage:
        text = self._provider.complete(  # type: ignore[attr-defined]
            prompt,
            model=self._model,
            max_output_tokens=self._max_output_tokens,
        )
        return _AgentCLIMessage(text)

    async def ainvoke(self, prompt: str) -> _AgentCLIMessage:
        return await asyncio.to_thread(self.invoke, prompt)

    def with_structured_output(self, schema: type) -> _StructuredAgentCLIModel:
        return _StructuredAgentCLIModel(
            self._provider, self._model, self._max_output_tokens, schema
        )


def get_chat_model(model: str | None = None) -> ChatOpenAI | AgentCLIChatModel:
"""Return a chat model for the active provider.

For CLI providers (``claude_cli``, ``codex_cli``) this returns an
:class:`AgentCLIChatModel` adapter backed by the provider's ``complete()``
subprocess transport — so the LLM analyzers (which use ``.invoke()`` and
``.with_structured_output()``) work with no API key. For HTTP providers it
returns a :class:`ChatOpenAI` configured against the resolved endpoint.

Raises:
ValueError: when no API key is configured (see ``is_llm_available``).
ValueError: when an HTTP provider has no API key configured.
"""
resolved_key, resolved_base = _resolve_llm_credentials()
model = model or MODEL_CONFIG["default"]
resolved_model = model or MODEL_CONFIG["default"]

provider = get_active_provider()
if has_cli_capability(provider):
return AgentCLIChatModel(provider, resolved_model, get_max_output_tokens(resolved_model))

resolved_key, resolved_base = _resolve_llm_credentials()
return ChatOpenAI(
model=model,
model=resolved_model,
base_url=resolved_base,
api_key=resolved_key,
max_tokens=get_max_output_tokens(model),
max_tokens=get_max_output_tokens(resolved_model),
timeout=120,
)


def chat_completion(prompt: str, *, model: str | None = None) -> str:
"""Request a single chat completion and return the assistant content."""
llm = get_chat_model(model=model)
response = llm.invoke(prompt)
"""Request a single chat completion and return the assistant content.

Routes through :func:`get_chat_model`, which dispatches to the CLI adapter
for CLI providers and to ``ChatOpenAI`` for HTTP providers.
"""
response = get_chat_model(model=model).invoke(prompt)
return response.content or ""
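
As a reading aid for the adapter path in this file, the sketch below walks one structured call end to end with a stub provider: detect the CLI capability, call `complete()`, pull the JSON object out of a fenced reply, and validate it. Everything here is an assumption made for illustration: the `Protocol`-based check is only a guess at how `has_cli_capability` might work, `FakeCLIProvider` and `structured_invoke` are hypothetical, and a required-key check replaces the Pydantic `model_validate` step used in the diff so the example stays stdlib-only.

```python
from __future__ import annotations

import json
from typing import Protocol, runtime_checkable


@runtime_checkable
class AgentCLICapable(Protocol):
    """Assumed shape of the optional CLI interface: is_available() + complete()."""

    def is_available(self) -> tuple[bool, str | None]: ...

    def complete(self, prompt: str, *, model: str, max_output_tokens: int) -> str: ...


class FakeCLIProvider:
    """Hypothetical stand-in for a CLI provider; returns a canned, fenced reply."""

    def is_available(self) -> tuple[bool, str | None]:
        return True, None

    def complete(self, prompt: str, *, model: str, max_output_tokens: int) -> str:
        return 'Here you go:\n```json\n{"verdict": "safe", "issues": []}\n```'


def extract_json_object(raw: str) -> dict:
    """Parse the outermost {...} span, ignoring fences and surrounding prose."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in response")
    obj = json.loads(raw[start : end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(obj, dict):
        raise ValueError("response is not a JSON object")
    return obj


def structured_invoke(provider: object, prompt: str, required: set[str]) -> dict:
    """Call a CLI-capable provider and fail closed on malformed output."""
    if not isinstance(provider, AgentCLICapable):
        raise TypeError("provider does not implement the CLI interface")
    raw = provider.complete(prompt, model="stub-model", max_output_tokens=512)
    obj = extract_json_object(raw)
    missing = required - set(obj)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return obj
```

The stub deliberately wraps its reply in prose and a code fence, since a CLI model may do so even when asked for bare JSON; the extraction step is what keeps the structured path tolerant of that.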