feat(providers): add Google Vertex AI inference provider #1568
maxamillion wants to merge 4 commits into
Adds Vertex AI provider profiles, routing, credential refresh plumbing, CLI support, docs, and regression coverage. Keeps the related NETLINK_ROUTE seccomp allowance needed by Vertex client tooling that calls getifaddrs.
Cover the full end-to-end setup for running Claude Code and OpenCode inside an OpenShell sandbox via inference.local with a Vertex AI backend:

- google-vertex-ai.mdx: add 'Use from a Sandbox' section with tabbed examples for Claude Code (--bare flag, no /v1 suffix) and OpenCode (/v1 suffix required). Add providers_v2_enabled prerequisite and --no-verify note for global region. Document policy proposals table covering metadata.google.internal (always blocked), downloads.claude.ai, and storage.googleapis.com.
- inference-routing.mdx: expand 'Use the Local Endpoint' section with tabbed examples for Claude Code, OpenCode, Python OpenAI SDK, and Python Anthropic SDK. Add notes explaining the /v1 path suffix difference between clients.
- supported-agents.mdx: update Claude Code and OpenCode rows to mention inference.local support and correct base URL requirements.
/ok to test 09ddf58
On arm64 under heavy CI load, the /proc fd scan in find_socket_inode_owners can transiently miss the parent process's socket fd entry, returning only the child as an owner. This causes resolve_process_identity to return Ok (single owner, no ambiguity check fires) instead of the expected ambiguous-ownership Err. Extend the retry loop to also handle unexpected Ok results, mirroring the existing retry for transient Err results. 10 retries at 50ms gives a 500ms settling window, which is sufficient for procfs to stabilize on loaded arm64 runners.
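The retry behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the test code from the PR: the `Resolve` enum, `resolve_until_ambiguous`, and the closure-based `attempt` are hypothetical stand-ins for `resolve_process_identity` and its result type.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical outcome of one identity-resolution attempt.
#[derive(Debug, Clone, PartialEq)]
enum Resolve {
    /// Only one PID was seen owning the socket (the transient miss on arm64).
    SingleOwner(u32),
    /// Several PIDs own the socket: the result the test expects.
    Ambiguous(Vec<u32>),
    /// A transient failure, e.g. a /proc entry vanished mid-scan.
    Transient(String),
}

/// Call `attempt` once, then retry up to `retries` times until it reports
/// ambiguity. Transient errors and unexpected single-owner results are
/// both treated as "not settled yet" and retried after `delay`.
fn resolve_until_ambiguous<F>(mut attempt: F, retries: u32, delay: Duration) -> Resolve
where
    F: FnMut() -> Resolve,
{
    let mut last = attempt();
    for _ in 0..retries {
        if matches!(last, Resolve::Ambiguous(_)) {
            break;
        }
        sleep(delay);
        last = attempt();
    }
    last
}

fn main() {
    let mut calls = 0;
    // Simulated procfs scan: one transient error, one partial view, then stable.
    let result = resolve_until_ambiguous(
        || {
            calls += 1;
            match calls {
                1 => Resolve::Transient("fd entry vanished".to_string()),
                2 => Resolve::SingleOwner(42),
                _ => Resolve::Ambiguous(vec![41, 42]),
            }
        },
        10,                        // 10 retries ...
        Duration::from_millis(50), // ... at 50ms each, a 500ms settling window
    );
    println!("{:?} after {} attempts", result, calls);
}
```

Returning the last observed result, rather than panicking inside the loop, leaves the final assertion to the caller, so a run that never settles still fails with the value that was actually seen.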
      ) -> Result<reqwest::Response, RouterError> {
    -     let (builder, url) = prepare_backend_request(client, route, method, path, &headers, body)?;
    +     let (builder, url) =
    +         prepare_backend_request(client, route, method, path, &headers, body, true)?;
Does this always force an upgrade to :streamRawPredict upstream? Is that intended?
No, prepare_backend_request calls build_provider_url which conditionally sets it.

    let suffix = if stream_response
        && suffix == ":rawPredict"
        && is_vertex_anthropic_rawpredict_route(route)
    {
        ":streamRawPredict"
    } else {
        suffix.as_str()
    };

    - name: service_account_key
      description: Google service account JSON refresh bootstrap material; not injected into sandboxes
      env_vars: [GOOGLE_SERVICE_ACCOUNT_KEY]
      required: false
Is this actually read after being written? I don't see it used in minting flows or anywhere else.
The service_account_key credential holds the raw service account JSON. It is not used in the code that gets the token. It is listed to block sandbox injection and to make sure it is never used as a bearer token, which forces an access token to be minted instead. The is_non_injectable_provider_credential function implements the former, and the resolve_vertex_ai_route_requires_minted_access_token() test validates the latter.
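As a rough sketch of the injection guard described in this reply (an assumption about its shape, not the PR's actual code): the `Credential` struct, the `is_non_injectable` rule, and the `access_token` credential name below are all hypothetical and only illustrate filtering gateway-only material before anything is handed to a sandbox.

```rust
/// Hypothetical credential record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Credential {
    name: String,
    value: String,
}

/// Assumed rule: for the Vertex provider, the raw service account key is
/// gateway-only material and must never be injected into a sandbox.
fn is_non_injectable(provider_type: &str, credential_name: &str) -> bool {
    provider_type == "google-vertex-ai" && credential_name == "service_account_key"
}

/// Return only the credentials that may be passed to a sandbox.
fn injectable_credentials(provider_type: &str, creds: &[Credential]) -> Vec<Credential> {
    creds
        .iter()
        .filter(|c| !is_non_injectable(provider_type, &c.name))
        .cloned()
        .collect()
}

fn main() {
    let creds = vec![
        Credential {
            name: "service_account_key".to_string(),
            value: "<raw service account JSON>".to_string(),
        },
        Credential {
            name: "access_token".to_string(), // hypothetical minted-token entry
            value: "<minted token>".to_string(),
        },
    ];
    for c in injectable_credentials("google-vertex-ai", &creds) {
        println!("injectable: {}", c.name);
    }
}
```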
Hey @maxamillion I was looking at your vertex ai changes here and I was thinking that this problem isn't specific to vertex ai - we would run into this with supporting e.g. Azure OpenAI endpoints as well. I'm wondering if instead of hardcoding these transforms per-provider (e.g.

For standard providers (OpenAI, NVIDIA, Anthropic), these fields would all be empty so no extra config. For vertex ai, the route resolver would populate them the same way it already sets

As a concrete example of how we could set this in the provider profile:

    inference:
      protocol: anthropic_messages             # what api the client speaks to inference.local
      model_in_path: true                      # model ID goes in the URL, not the request body
      request_suffix: ":rawPredict"            # append after model ID for buffered requests
      stream_suffix: ":streamRawPredict"       # for streaming requests
      body_remove: [model]                     # fields to remove from the client's JSON body
      body_inject:
        anthropic_version: "vertex-2023-10-16" # k/v pair to add to JSON body if absent
      strip_headers: [anthropic-beta]

I think long term this will make it easier to support more providers/inference endpoints, and keep the router a generic request forwarder, while the provider awareness would just need to stay at route resolution rather than split across resolution + routing. WDYT?
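To make the proposal above concrete, here is a minimal sketch of how a generic router could apply such fields without knowing which provider it is talking to. Everything in it is an assumption for illustration: the `RouteTransform` struct, its methods, and the flat string-to-string body map (a real router would operate on JSON values) are hypothetical and do not come from the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the proposed profile fields.
/// `Default` yields the "standard provider" case: every field empty.
#[derive(Debug, Default)]
struct RouteTransform {
    request_suffix: String,
    stream_suffix: String,
    body_remove: Vec<String>,
    body_inject: Vec<(String, String)>,
    strip_headers: Vec<String>,
}

impl RouteTransform {
    /// Pick the URL suffix; fall back to the buffered suffix when no
    /// streaming suffix is configured.
    fn suffix(&self, streaming: bool) -> &str {
        if streaming && !self.stream_suffix.is_empty() {
            &self.stream_suffix
        } else {
            &self.request_suffix
        }
    }

    /// Remove listed fields, then add each injected pair only if absent.
    fn apply_body(&self, body: &mut BTreeMap<String, String>) {
        for key in &self.body_remove {
            body.remove(key);
        }
        for (key, value) in &self.body_inject {
            body.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }

    /// Drop listed headers; header names compare case-insensitively.
    fn apply_headers(&self, headers: &mut Vec<(String, String)>) {
        headers.retain(|(name, _)| {
            !self.strip_headers.iter().any(|s| s.eq_ignore_ascii_case(name))
        });
    }
}

/// The values from the YAML example in the comment above.
fn vertex_anthropic() -> RouteTransform {
    RouteTransform {
        request_suffix: ":rawPredict".to_string(),
        stream_suffix: ":streamRawPredict".to_string(),
        body_remove: vec!["model".to_string()],
        body_inject: vec![("anthropic_version".to_string(), "vertex-2023-10-16".to_string())],
        strip_headers: vec!["anthropic-beta".to_string()],
    }
}

fn main() {
    let transform = vertex_anthropic();
    let mut body: BTreeMap<String, String> = BTreeMap::new();
    body.insert("model".to_string(), "example-model".to_string()); // placeholder model ID
    body.insert("max_tokens".to_string(), "1024".to_string());
    let mut headers = vec![
        ("Anthropic-Beta".to_string(), "example-beta".to_string()),
        ("content-type".to_string(), "application/json".to_string()),
    ];
    transform.apply_body(&mut body);
    transform.apply_headers(&mut headers);
    println!("suffix: {}", transform.suffix(true));
    println!("body: {:?}", body);
    println!("headers: {:?}", headers);
}
```

With a `RouteTransform::default()`, all three methods leave the request untouched, which matches the "no extra config" case the comment describes for standard providers.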
Summary
Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through Vertex AI rawPredict and all other models (Gemini, Llama, Mistral, etc.) through the Vertex OpenAI-compatible endpoint. Includes a seccomp policy relaxation for NETLINK_ROUTE sockets required by Vertex client tooling.

Related Issue
Changes
Provider profile & discovery
- providers/google-vertex-ai.yaml with three credential entries: raw service account key (gateway-only, never injected into sandboxes), service account JWT-minted token, and gcloud ADC OAuth2-refreshed token.
- ProviderTypeProfile::allows_gateway_refresh_bootstrap() and CredentialRefreshProfile::is_gateway_mintable() replace inline gateway-refresh logic in server and CLI.
- normalize_inference_provider_type() in openshell-core is now the single source of truth for provider alias resolution (vertex, vertex-ai, google-vertex → google-vertex-ai).

Inference routing (server)
- resolve_vertex_ai_route() dispatches by publisher: Anthropic models get rawPredict URLs with model_in_path=true; all others get the OpenAI-compatible /chat/completions endpoint.
- infer_vertex_publisher() maps model prefixes to publishers (6 families: Anthropic, Google, Meta, Mistral, AI21, DeepSeek).
- {region}-aiplatform.googleapis.com, global → aiplatform.googleapis.com, us/eu → aiplatform.{region}.rep.googleapis.com.
- CredentialLookup enum (PreferredOnly vs PreferredThenAny) prevents raw SA JSON from being picked up as a bearer token.

Router backend
- build_provider_url() handles four URL construction cases via model_in_path × request_path_override matrix. Streaming upgrades :rawPredict → :streamRawPredict.
- model from request body (Vertex encodes it in path), injects anthropic_version: "vertex-2023-10-16", and strips anthropic-beta header (Vertex rejects unknown beta values).

Provider gRPC (server)
- is_non_injectable_provider_credential() prevents raw service account JSON from reaching sandboxes.
- ANTHROPIC_VERTEX_PROJECT_ID, GCP_PROJECT_ID, CLOUD_ML_REGION, GCP_LOCATION, GOOSE_PROVIDER=gcp_vertex_ai, etc. so Claude Code, Goose, and OpenCode work inside sandboxes. Explicit credential values take precedence.

Protobuf
- ResolvedRoute gains model_in_path (field 8) and request_path_override (field 9).

CLI
- --from-gcloud-adc flag on provider create (mutually exclusive with --from-existing and --credential). Reads gcloud ADC from GOOGLE_APPLICATION_CREDENTIALS, $CLOUDSDK_CONFIG/application_default_credentials.json, or ~/.config/gcloud/application_default_credentials.json; validates authorized_user type; configures OAuth2 refresh and mints the first token.
- VERTEX_AI_PROJECT_ID, VERTEX_AI_REGION, base URL, publisher).
- SandboxUploadPlan refactor consolidates upload existence-check + git-aware planning.
- scrub_git_env() prevents inherited git env vars from breaking subprocess git calls.

Sandbox
- NETLINK_ROUTE (protocol 0) now allowed through seccomp; all other netlink protocols remain blocked. Required because getifaddrs(3) on Linux uses NETLINK_ROUTE and is called by Node.js, Python, Go, and most HTTP/gRPC client libraries. Security is maintained by CAP_NET_ADMIN absence, network namespace isolation, and nftables rules.
- model_in_path and request_path_override.
- enrich_sandbox_baseline_paths() refactored with injectable path_exists closure for testability.

Documentation
- docs/providers/google-vertex-ai.mdx: full provider setup guide covering both auth flows, configuration keys, region/host selection, supported models, sandbox usage with Claude Code and OpenCode, and policy proposals guidance.
- inference-routing.mdx, manage-providers.mdx, providers-v2.mdx, supported-agents.mdx, best-practices.mdx for Vertex references.
- architecture/gateway.md Inference Resolution section documenting bundle resolution, Vertex host selection, route shaping, header passthrough, and security model.

Testing
- mise run pre-commit passes (lint, format, license headers)

Checklist