From a7017abd2ceaf8e7f75ed103e363e04b0eac9aef Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Manuel=20Sugawara=20=28=E2=88=A9=EF=BD=80-=C2=B4=29?=
 =?UTF-8?q?=E2=8A=83=E2=94=81=E7=82=8E=E7=82=8E=E7=82=8E=E7=82=8E=E7=82=8E?=
Date: Wed, 29 Apr 2026 12:30:09 -0700
Subject: [PATCH] Add a series of technical guides to the project

---
 docs/technical-guide/auth.md            | 208 +++++++++++
 docs/technical-guide/aws-protocols.md   | 143 +++++++
 docs/technical-guide/client-core.md     | 256 +++++++++++++
 docs/technical-guide/codecs.md          | 374 +++++++++++++++++++
 docs/technical-guide/codegen.md         | 475 ++++++++++++++++++++++++
 docs/technical-guide/context.md         | 176 +++++++++
 docs/technical-guide/documents.md       | 298 +++++++++++++++
 docs/technical-guide/dynamic-client.md  | 320 ++++++++++++++++
 docs/technical-guide/http.md            | 264 +++++++++++++
 docs/technical-guide/index.md           | 120 ++++++
 docs/technical-guide/io.md              | 238 ++++++++++++
 docs/technical-guide/mcp.md             | 228 ++++++++++++
 docs/technical-guide/retries-waiters.md | 188 ++++++++++
 docs/technical-guide/rules-engine.md    | 434 ++++++++++++++++++++++
 docs/technical-guide/schemas.md         | 467 +++++++++++++++++++++++
 docs/technical-guide/server.md          | 375 +++++++++++++++++++
 16 files changed, 4564 insertions(+)
 create mode 100644 docs/technical-guide/auth.md
 create mode 100644 docs/technical-guide/aws-protocols.md
 create mode 100644 docs/technical-guide/client-core.md
 create mode 100644 docs/technical-guide/codecs.md
 create mode 100644 docs/technical-guide/codegen.md
 create mode 100644 docs/technical-guide/context.md
 create mode 100644 docs/technical-guide/documents.md
 create mode 100644 docs/technical-guide/dynamic-client.md
 create mode 100644 docs/technical-guide/http.md
 create mode 100644 docs/technical-guide/index.md
 create mode 100644 docs/technical-guide/io.md
 create mode 100644 docs/technical-guide/mcp.md
 create mode 100644 docs/technical-guide/retries-waiters.md
 create mode 100644 docs/technical-guide/rules-engine.md
 create mode 100644 docs/technical-guide/schemas.md
 create mode 100644 docs/technical-guide/server.md

diff --git a/docs/technical-guide/auth.md b/docs/technical-guide/auth.md
new file mode 100644
index 000000000..c1f45a969
--- /dev/null
+++ b/docs/technical-guide/auth.md
@@ -0,0 +1,208 @@
+# Auth System
+
+> **Last updated:** April 29, 2026
+
+The auth system provides pluggable authentication and signing for Smithy-Java clients. It is layered across four modules:
+shared identity abstractions, client-side auth scheme resolution, AWS-specific identity types, and the SigV4 signing
+implementation.
+
+**Source:**
+- [`auth-api/`](https://github.com/smithy-lang/smithy-java/tree/main/auth-api) — Shared abstractions
+- [`client/client-auth-api/`](https://github.com/smithy-lang/smithy-java/tree/main/client/client-auth-api) — Client auth scheme resolution
+- [`aws/aws-auth-api/`](https://github.com/smithy-lang/smithy-java/tree/main/aws/aws-auth-api) — AWS identity types
+- [`aws/aws-sigv4/`](https://github.com/smithy-lang/smithy-java/tree/main/aws/aws-sigv4) — SigV4 implementation
+
+## Core Abstractions (`auth-api`)
+
+### Identity
+
+```java
+public interface Identity {
+    default Instant expirationTime() { return null; }
+}
+```
+
+Built-in identity types:
+- `TokenIdentity` — bearer token (`String token()`)
+- `ApiKeyIdentity` — API key (`String apiKey()`)
+- `LoginIdentity` — username/password
+
+### IdentityResolver
+
+```java
+public interface IdentityResolver<IdentityT extends Identity> {
+    IdentityResult<IdentityT> resolveIdentity(Context requestProperties);
+    Class<IdentityT> identityType();
+    static <I extends Identity> IdentityResolver<I> chain(List<IdentityResolver<I>> resolvers);
+    static <I extends Identity> IdentityResolver<I> of(I identity); // static resolver
+}
+```
+
+Returns an `IdentityResult` (a success-or-error wrapper) instead of throwing. Expected failures, such as missing
+environment variables, return `IdentityResult.ofError()`. `IdentityResolverChain` tries resolvers in order and returns
+the first success.
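The "first success wins" chaining described above can be sketched in a small, self-contained example. It does not use the real `auth-api` types. `Identity`, `BearerToken`, `Result`, `Resolver`, and `ResolverExamples.fromEnv` are hypothetical stand-ins, and a plain `Map` takes the place of the `Context` request properties. Expected failures are collected as values instead of being thrown, in the spirit of `IdentityResult`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the auth-api types; not the real Smithy-Java API.
interface Identity {}

record BearerToken(String token) implements Identity {}

/** Success-or-error wrapper, loosely modeled on IdentityResult. */
record Result<T>(T identity, String error) {
    static <T> Result<T> ok(T identity) {
        return new Result<>(identity, null);
    }

    static <T> Result<T> error(String message) {
        return new Result<>(null, message);
    }

    boolean isOk() {
        return identity != null;
    }
}

@FunctionalInterface
interface Resolver<T extends Identity> {
    // A Map stands in for the Context of request properties.
    Result<T> resolve(Map<String, String> properties);

    /** Tries each resolver in order and returns the first success. */
    static <T extends Identity> Resolver<T> chain(List<Resolver<T>> resolvers) {
        return properties -> {
            List<String> errors = new ArrayList<>();
            for (Resolver<T> resolver : resolvers) {
                Result<T> result = resolver.resolve(properties);
                if (result.isOk()) {
                    return result; // first success wins
                }
                errors.add(result.error()); // expected failure: try the next resolver
            }
            return Result.error("No resolver succeeded: " + errors);
        };
    }
}

final class ResolverExamples {
    /** Resolves a token from an environment-like map, or reports an expected failure. */
    static Resolver<BearerToken> fromEnv(String key) {
        return properties -> properties.containsKey(key)
                ? Result.ok(new BearerToken(properties.get(key)))
                : Result.error("missing " + key);
    }
}
```

Because a missing variable is an ordinary result and not an exception, the chain moves on cheaply. Only when every resolver fails does it report a combined error.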
+
+### IdentityResolvers (Registry)
+
+```java
+public interface IdentityResolvers {
+    <T extends Identity> IdentityResolver<T> identityResolver(Class<T> identityClass);
+    static IdentityResolvers of(IdentityResolver<?>... resolvers);
+}
+```
+
+A type-safe registry mapping each identity class to its resolver. Used by `AuthScheme.identityResolver(resolvers)`.
+
+### Signer
+
+```java
+@FunctionalInterface
+public interface Signer<RequestT, IdentityT extends Identity> extends AutoCloseable {
+    SignResult<RequestT> sign(RequestT request, IdentityT identity, Context properties);
+}
+```
+
+`SignResult` is a record: `(RequestT signedRequest, String signature)`. The signature string is used as the
+seed for event stream signing.
+
+## Client Auth API (`client-auth-api`)
+
+### AuthScheme
+
+```java
+public interface AuthScheme<RequestT, IdentityT extends Identity> {
+    ShapeId schemeId();
+    Class<RequestT> requestClass();
+    Class<IdentityT> identityClass();
+    default IdentityResolver<IdentityT> identityResolver(IdentityResolvers resolvers);
+    default Context getSignerProperties(Context context);
+    default Context getIdentityProperties(Context context);
+    Signer<RequestT, IdentityT> signer();
+    default <FrameT extends Frame<?>> FrameProcessor<FrameT> eventSigner(...);
+}
+```
+
+An `AuthScheme` bundles a scheme ID (e.g., `aws.auth#sigv4`), an identity resolver lookup, and a signer.
+`getSignerProperties()` and `getIdentityProperties()` extract scheme-specific configuration from the client context.
+
+### AuthSchemeResolver
+
+```java
+@FunctionalInterface
+public interface AuthSchemeResolver {
+    List<AuthSchemeOption> resolveAuthScheme(AuthSchemeResolverParams params);
+}
+```
+
+`DefaultAuthSchemeResolver` iterates over `operation.effectiveAuthSchemes()` and wraps each in an `AuthSchemeOption`. The
+client pipeline picks the first option with a matching scheme, a compatible request class, and an available identity
+resolver.
+
+### AuthSchemeFactory (SPI)
+
+```java
+public interface AuthSchemeFactory<T extends Trait> {
+    ShapeId schemeId();
+    AuthScheme<?, ?> createAuthScheme(T trait);
+}
+```
+
+Implementations are discovered via `ServiceLoader`. Each factory receives the Smithy trait instance and creates a
+configured `AuthScheme`.
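The selection rule described under AuthSchemeResolver can be sketched as a small standalone loop. The rule is to take the first option whose scheme is supported, can sign the request's type, and has an identity resolver registered. This is an illustrative simplification, not the `ClientPipeline` code. `Option`, `Scheme`, `AuthSelection`, and the example identity records are hypothetical, and the registered identity resolvers are modeled as a set of identity classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified stand-ins; the real AuthScheme/AuthSchemeOption types differ.
interface Identity {}

record ApiKey(String key) implements Identity {}

record Credentials(String accessKeyId) implements Identity {}

/** A resolver-produced option: just the scheme ID, listed in priority order. */
record Option(String schemeId) {}

/** A supported scheme: its ID, the request type it can sign, and the identity it needs. */
record Scheme(String schemeId, Class<?> requestClass, Class<? extends Identity> identityClass) {}

final class AuthSelection {
    /**
     * Returns the first option whose scheme is supported, accepts the request's type,
     * and has a registered identity resolver for its identity class.
     */
    static Optional<Scheme> select(
            List<Option> options,
            Map<String, Scheme> supported,
            Object request,
            Set<? extends Class<?>> identityTypesWithResolvers) {
        for (Option option : options) {
            Scheme scheme = supported.get(option.schemeId());
            if (scheme == null) {
                continue; // scheme not configured on this client
            }
            if (!scheme.requestClass().isAssignableFrom(request.getClass())) {
                continue; // scheme cannot sign this kind of request
            }
            if (!identityTypesWithResolvers.contains(scheme.identityClass())) {
                continue; // no identity resolver available for this scheme
            }
            return Optional.of(scheme);
        }
        return Optional.empty();
    }
}
```

In the real pipeline, each option can also carry identity and signer property overrides. This sketch leaves those out.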
+
+## Pipeline Integration
+
+Auth resolution happens in `ClientPipeline.doSendOrRetry()` between the `modifyBeforeSigning` and `readBeforeSigning`
+interceptor hooks:
+
+1. Build `AuthSchemeResolverParams` with the protocol ID, operation, and context
+2. Call `authSchemeResolver.resolveAuthScheme(params)` → priority-ordered `List<AuthSchemeOption>`
+3. Iterate over the options and look up each `schemeId` in `supportedAuthSchemes`
+4. Check `authScheme.requestClass().isAssignableFrom(request.getClass())`
+5. Merge identity/signer properties from scheme defaults and option overrides
+6. Call `authScheme.identityResolver(identityResolvers)`; skip the option if it returns null
+7. Call `resolver.resolveIdentity(identityProperties)`
+8. The first scheme with a non-null resolver becomes the `ResolvedScheme`
+9. After endpoint resolution, apply endpoint auth scheme property overrides
+10. `authScheme.signer().sign(request, identity, signerProperties)` → signed request
+
+### Property Layering
+
+Signer properties are merged from three sources (later overrides earlier):
+1. Scheme defaults (`authScheme.getSignerProperties(context)`)
+2. Resolver overrides (`AuthSchemeOption.signerPropertyOverrides()`)
+3. Endpoint overrides (`applyEndpointAuthSchemeOverrides`)
+
+## AWS Auth (`aws-auth-api`)
+
+### AwsCredentialsIdentity
+
+```java
+public interface AwsCredentialsIdentity extends Identity {
+    String accessKeyId();
+    String secretAccessKey();
+    default String sessionToken() { return null; }
+    default String accountId() { return null; }
+}
+```
+
+### AwsCredentialsResolver
+
+```java
+public interface AwsCredentialsResolver extends IdentityResolver<AwsCredentialsIdentity> {
+    @Override default Class<AwsCredentialsIdentity> identityType() {
+        return AwsCredentialsIdentity.class;
+    }
+}
+```
+
+## SigV4 Implementation (`aws-sigv4`)
+
+### SigV4AuthScheme
+
+```java
+public final class SigV4AuthScheme implements AuthScheme<HttpRequest, AwsCredentialsIdentity> {
+    public SigV4AuthScheme(String signingName);
+    // schemeId() → "aws.auth#sigv4"
+    // getSignerProperties() extracts SIGNING_NAME, REGION, CLOCK from context
+    // signer() → SigV4Signer.create()
+    // eventSigner() → SigV4EventSigner
+}
+```
+
+Its inner `Factory` class implements `AuthSchemeFactory` and is registered via SPI.
+
+### SigV4Signer — Signing Flow
+
+1. Extract `region`, `signingName`, and `clock` from the properties
+2. Compute the payload hash (SHA-256 hex of the body)
+3. Build the canonical request: method + path + query + sorted headers + signed headers + payload hash
+4. Derive the signing key: `HMAC(HMAC(HMAC(HMAC("AWS4"+secret, date), region), service), "aws4_request")`
+   - Cached in `SigningCache` (bounded to 300 entries, `StampedLock`-protected)
+   - A cached key stays valid for the same calendar day
+5. Compute the signature: `HMAC(signingKey, stringToSign)`, where the string to sign combines the algorithm
+   (`AWS4-HMAC-SHA256`), the request timestamp, the credential scope, and the SHA-256 hash of the canonical request
+6. Build the `Authorization` header
+
+Performance optimizations:
+- `SigningResources` pools `StringBuilder`, `MessageDigest`, and `Mac` instances
+- `Pool` uses `ConcurrentLinkedQueue` (32 max)
+- `SigningCache` uses `LinkedHashMap` with FIFO eviction and `StampedLock`
+- Manual date formatting (avoids `DateTimeFormatter`)
+
+### SigV4EventSigner
+
+Implements chained event stream signing (`AWS4-HMAC-SHA256-PAYLOAD`). Each frame's signature depends on the previous
+frame's signature. Signed frames carry `:date` and `:chunk-signature` headers. `closingFrame()` returns a final frame
+that signs an empty payload.
+
+## Auth Scheme Discovery
+
+**Generated clients**: Auth schemes are hard-coded by codegen, based on the `AuthSchemeFactory` SPI.
+
+**Dynamic client**: `SimpleAuthDetectionPlugin` discovers auth schemes at runtime via `ServiceLoader`, reads the effective
+  auth schemes from the model via `ServiceIndex.getEffectiveAuthSchemes()`, and creates schemes via the factories.
+
+## Configuration Points
+
+Users can customize auth at three levels:
+1. **Client builder** — `putSupportedAuthSchemes()`, `authSchemeResolver()`, `addIdentityResolver()`
+2. **Plugins** — `ClientPlugin.configureClient()` can add schemes, resolvers, and identity resolvers
+3. **Per-request** — `RequestOverrideConfig` can override the auth scheme resolver and add auth schemes and identity resolvers
diff --git a/docs/technical-guide/aws-protocols.md b/docs/technical-guide/aws-protocols.md
new file mode 100644
index 000000000..e1247f6a1
--- /dev/null
+++ b/docs/technical-guide/aws-protocols.md
@@ -0,0 +1,143 @@
+# AWS Protocol Integrations
+
+> **Last updated:** April 29, 2026
+
+Smithy-Java implements four AWS-specific protocols as `ClientProtocol` plugins, plus a shared event streaming
+infrastructure. These protocols differ in serialization format, operation routing, and HTTP binding usage, but all share
+the same client pipeline and auth system.
+
+**Source:** [`aws/client/`](https://github.com/smithy-lang/smithy-java/tree/main/aws/client)
+
+## Protocol Hierarchy
+
+```
+ClientProtocol
+ └── HttpClientProtocol (abstract)
+      ├── HttpBindingClientProtocol (abstract, REST-style)
+      │    ├── RestJsonClientProtocol — aws.protocols#restJson1
+      │    └── RestXmlClientProtocol — aws.protocols#restXml
+      ├── AwsJsonProtocol (abstract sealed, RPC-style)
+      │    ├── AwsJson1Protocol — aws.protocols#awsJson1_0
+      │    └── AwsJson11Protocol — aws.protocols#awsJson1_1
+      └── AwsQueryClientProtocol — aws.protocols#awsQuery
+```
+
+All protocols are registered via the `ClientProtocolFactory` SPI in `META-INF/services`.
+
+## AWS JSON 1.0 / 1.1
+
+**Module:** `aws-client-awsjson`
+
+RPC-style protocol: all data goes in the body, and no HTTP binding traits are used.
+
+- **Request**: Always `POST`. Sets `X-Amz-Target: {ServiceName}.{OperationName}`. The body is the JSON-serialized
+  input. Content-Type: `application/x-amz-json-1.0` or `application/x-amz-json-1.1`.
+- **Response**: JSON body. An empty body is deserialized as `{}`.
+- **Error detection**: The `x-amzn-errortype` header first, then the `__type` and `code` fields in the JSON body. AWS JSON
+  1.0 strips the URI prefix from `__type`.
+- **Codec**: `JsonCodec` with `useTimestampFormat(true)` but NOT `useJsonName(true)`.
+
+## AWS restJson1
+
+**Module:** `aws-client-restjson`
+
+REST-style protocol that uses HTTP binding traits (`@http`, `@httpHeader`, `@httpQuery`, `@httpLabel`, `@httpPayload`,
+`@httpPrefixHeaders`).
+
+- **Request**: HTTP method and URI pattern come from the `@http` trait; headers, query params, and path labels come
+  from the binding traits. Members without binding traits are serialized into the JSON body.
+- **Response**: HTTP binding deserialization for headers, status code, and payload.
+- **Error detection**: The `x-amzn-errortype` header first, then `__type` in the JSON body. Uses `HttpBindingErrorFactory`
+  for known errors.
+- **Codec**: `JsonCodec` with `useJsonName(true)` AND `useTimestampFormat(true)`.
+- **Key difference from AWS JSON**: Honors the `@jsonName` trait, omits empty payloads, and supports structure payloads
+  via `@httpPayload`.
+
+## AWS restXml
+
+**Module:** `aws-client-restxml`
+
+REST-style protocol with the same HTTP binding support as restJson1, but with an XML body.
+
+- **Request/Response**: Same HTTP binding pattern as restJson1, but the body is XML.
+- **Error detection**: The `x-amzn-errortype` header first, then the XML error code via `XmlUtil.parseErrorCodeName()`.
+- **Codec**: `XmlCodec`.
+
+## AWS Query
+
+**Module:** `aws-client-awsquery`
+
+The most distinctive protocol: requests and responses use different serialization formats.
+
+- **Request**: Always `POST` with `Content-Type: application/x-www-form-urlencoded`. Body format:
+  `Action={OperationName}&Version={version}&Param1=Value1&Param2.SubParam=Value2`. A custom `AwsQueryFormSerializer`
+  encodes nested parameters with dot-delimited keys and respects the `@xmlName` and `@xmlFlattened` traits.
+- **Response**: XML body with wrapper elements (`{OperationName}Response` → `{OperationName}Result`). Uses `XmlCodec`
+  with wrapper element configuration.
+- **Error detection**: XML error code via `XmlUtil.parseErrorCodeName()`. Also checks custom codes from the
+  `@awsQueryError` trait on the operation's error schemas.
+- **Requires**: Both the `service` and `serviceVersion` settings (unlike the other protocols).
+- **Does NOT support**: Event streaming or document types.
+
+## Event Streaming
+
+**Module:** `aws-event-streams`
+
+All AWS protocols except Query share the same event streaming infrastructure, based on the AWS Event Stream binary
+message format (`application/vnd.amazon.eventstream`).
+
+### Core Types
+
+- `AwsEventFrame` — wraps `software.amazon.eventstream.Message` and implements `Frame`
+- `AwsEventEncoderFactory` — creates encoders for input/output streams
+- `AwsEventDecoderFactory` — creates decoders for input/output streams
+
+### Encoding Flow
+
+1. Determine whether the event is the initial request/response or a union event member
+2. Handle `@eventHeader` members → event message headers
+3. Handle `@eventPayload` members → blob (raw bytes), string (UTF-8), or codec-serialized
+4. Regular members → codec-serialized as the payload
+5. Error events: modeled exceptions get an `:exception-type` header; unmodeled errors get `:error-code` + `:error-message`
+
+### Decoding Flow
+
+1. Read the `:message-type` header: `"event"`, `"error"`, or `"exception"`
+2. For errors: extract `:error-code` and `:error-message`, then throw `EventStreamingException`
+3. For exceptions: read `:exception-type` and deserialize the corresponding modeled error
+4. For events: read `:event-type`, find the matching union member, and deserialize the payload and headers
+
+### RPC vs REST Event Streaming
+
+- **RPC protocols** (AWS JSON, rpcv2-cbor): Use the `RpcEventStreamsUtil` helper for body wrapping and initial-event
+  handling
+- **REST protocols** (restJson1, restXml): Use `HttpBindingClientProtocol`'s built-in event streaming via
+  `RequestSerializer.eventEncoderFactory()` and `ResponseDeserializer.eventDecoderFactory()`
+
+## Protocol Comparison
+
+| Aspect | AWS JSON 1.0/1.1 | restJson1 | restXml | AWS Query |
+|--------|-------------------|-----------|---------|-----------|
+| Style | RPC | REST | REST | RPC |
+| Request body | JSON | JSON + HTTP bindings | XML + HTTP bindings | Form URL-encoded |
+| Response body | JSON | JSON + HTTP bindings | XML + HTTP bindings | XML with wrappers |
+| Operation routing | `X-Amz-Target` header | HTTP method + URI | HTTP method + URI | `Action=` in body |
+| `@jsonName` | No | Yes | N/A | N/A |
+| `@xmlName` | N/A | N/A | Yes | Yes |
+| Empty input | `{}` | Omitted | Omitted | `Action=Op&Version=V` |
+| Event streaming | Yes | Yes | Yes | No |
+| Required settings | `service` | `service` | `service` | `service` + `serviceVersion` |
+
+## Key Design Patterns
+
+1. **RPC vs REST split**: RPC protocols extend `HttpClientProtocol` directly. REST protocols extend
+   `HttpBindingClientProtocol`, which delegates to `HttpBinding` for HTTP trait-based serialization.
+
+2. **Shared event streaming**: Every protocol that supports event streams (all except AWS Query) uses the same
+   `AwsEventFrame`/`AwsEventEncoderFactory`/`AwsEventDecoderFactory` infrastructure.
+
+3. **SPI-based discovery**: All protocol factories are registered via the `ClientProtocolFactory` SPI, enabling runtime
+   protocol selection.
+
+4. **Error header priority**: All AWS protocols check the `x-amzn-errortype` header before parsing the response body for
+   the error type.
diff --git a/docs/technical-guide/client-core.md b/docs/technical-guide/client-core.md
new file mode 100644
index 000000000..96cc92e63
--- /dev/null
+++ b/docs/technical-guide/client-core.md
@@ -0,0 +1,256 @@
+# Client Core Architecture
+
+> **Last updated:** April 29, 2026
+
+The client core is the backbone of all Smithy-Java clients, both generated and dynamic. It provides the `Client` base
+class, the plugin system, the interceptor system, and the call pipeline that orchestrates request execution from
+serialization through auth, endpoint resolution, signing, transport, and deserialization.
+
+**Source:** [`client/client-core/`](https://github.com/smithy-lang/smithy-java/tree/main/client/client-core)
+
+## Client Base Class
+
+[`Client`](https://github.com/smithy-lang/smithy-java/blob/main/client/client-core/src/main/java/software/amazon/smithy/java/client/core/Client.java)
+is the abstract base for all generated clients. It holds immutable runtime state:
+
+```java
+public abstract class Client implements Closeable {
+    private final ClientConfig config;
+    private final ClientPipeline<?, ?> pipeline;
+    private final TypeRegistry typeRegistry;
+    private final ClientInterceptor interceptor;
+    private final IdentityResolvers identityResolvers;
+    private final RetryStrategy retryStrategy;
+}
+```
+
+### The call() Method
+
+```java
+protected <I extends SerializableStruct, O extends SerializableStruct> O call(
+        I input, ApiOperation<I, O> operation, RequestOverrideConfig overrideConfig)
+```
+
+This is the entry point for all RPC calls:
+1. Applies the `RequestOverrideConfig`, if present
+2. Invokes `interceptor.modifyBeforeCall()` so interceptors can modify the config
+3. Rebuilds the pipeline and resolvers if the config changed
+4. Constructs a `ClientCall` with all resolved state
+5. Delegates to `pipeline.send(clientCall)`
+
+### Client.Builder
+
+A self-referential generic builder (`B extends Builder`) that implements `ClientSetting` and wraps a
+`ClientConfig.Builder` internally:
+
+```java
+builder.transport(new JavaHttpClientTransport())
+        .protocol(new Rpcv2CborProtocol(serviceId))
+        .endpointResolver(EndpointResolver.staticEndpoint("https://example.com"))
+        .addInterceptor(myInterceptor)
+        .retryStrategy(StandardRetryStrategy.builder().maxAttempts(5).build())
+        .addPlugin(myPlugin)
+        .build();
+```
+
+## Plugin System
+
+### ClientPlugin
+
+```java
+@FunctionalInterface
+public interface ClientPlugin {
+    void configureClient(ClientConfig.Builder config);
+    default List<ClientPlugin> getChildPlugins() { return Collections.emptyList(); }
+    default Phase getPluginPhase() { return Phase.APPLY; }
+}
+```
+
+**Phase enum** (execution order):
+1. `FIRST` — framework-level plugins (e.g., `DefaultPlugin`)
+2. `BEFORE_DEFAULTS` → `DEFAULTS` → `AFTER_DEFAULTS`
+3. `BEFORE_APPLY` → `APPLY` (default) → `AFTER_APPLY`
+4. `LAST` — framework-level finalization
+
+Plugins are deduplicated by class (the first instance wins). Applied plugin classes are tracked so that
+`toBuilder().build()` won't re-apply them.
+
+### AutoClientPlugin (SPI)
+
+```java
+public interface AutoClientPlugin extends ClientPlugin {}
+```
+
+Discovered via `ServiceLoader` and applied through `AutoPlugin`, which is a child of `DefaultPlugin`.
+
+### DefaultPlugin
+
+A singleton that runs at `Phase.FIRST` and is added to every client. Its children handle:
+- **`DiscoverTransportPlugin`** — auto-discovers the transport via SPI
+- **`ApplyModelRetryInfoPlugin`** — sets `RetryInfo` on exceptions based on modeled retry info
+- **`InjectIdempotencyTokenPlugin`** — auto-generates idempotency tokens when not provided
+- **`AutoPlugin`** — loads all `AutoClientPlugin` SPI implementations as children
+
+### Plugin Collection Algorithm
+
+1. Recursively flatten plugins and their children
+2. Skip already-applied classes
+3. Filter through `pluginPredicate`
+4. Sort by `Phase` (a stable sort preserves insertion order within the same phase)
+5. Call `configureClient(builder)` on each
+
+## Interceptor System
+
+### ClientInterceptor
+
+Defines about 20 hooks that inject code into the request execution pipeline:
+
+| Hook | Type | Scope |
+|------|------|-------|
+| `modifyBeforeCall` | write | once/execution |
+| `readBeforeExecution` | read | once/execution |
+| `modifyBeforeSerialization` → `readBeforeSerialization` | write/read | once/execution |
+| `readAfterSerialization` | read | once/execution |
+| `modifyBeforeRetryLoop` | write | once/execution |
+| `readBeforeAttempt` | read | once/attempt |
+| `modifyBeforeSigning` → `readBeforeSigning` → `readAfterSigning` | write/read | once/attempt |
+| `modifyBeforeTransmit` → `readBeforeTransmit` | write/read | once/attempt |
+| `readAfterTransmit` | read | once/attempt |
+| `modifyBeforeDeserialization` → `readBeforeDeserialization` | write/read | once/attempt |
+| `readAfterDeserialization` | read | once/attempt |
+| `modifyBeforeAttemptCompletion` → `readAfterAttempt` | write/read | once/attempt |
+| `modifyBeforeCompletion` → `readAfterExecution` | write/read | once/execution |
+
+**Read hooks** observe in-flight data but cannot modify it. **Write hooks** can modify data and return new values.
+
+### Hook Data Types
+
+A sealed hierarchy in which each hook carries progressively more data:
+
+```
+CallHook — operation, config, input
+InputHook — operation, context, input
+  └─ RequestHook — adds request
+      └─ ResponseHook — adds response
+          └─ OutputHook — adds output
+```
+
+Each hook has `with*()` methods that return a new hook only if the value changed (an identity-check optimization).
+
+### ClientInterceptorChain
+
+Chains multiple interceptors. For read hooks, it catches exceptions with "last error wins" semantics. For write hooks, it
+passes the output of one interceptor as the input to the next.
+
+## Call Pipeline
+
+[`ClientPipeline`](https://github.com/smithy-lang/smithy-java/blob/main/client/client-core/src/main/java/software/amazon/smithy/java/client/core/ClientPipeline.java)
+orchestrates the full request lifecycle. It is created via `ClientPipeline.of(protocol, transport)`, which validates that
+both share the same `MessageExchange`.
+
+### Full Flow
+
+**Pre-retry (once per execution):**
+1. `readBeforeExecution` → `modifyBeforeSerialization` → `readBeforeSerialization`
+2. `protocol.createRequest(operation, input, context, UNRESOLVED_URI)` → serialized request
+3. `readAfterSerialization` → `modifyBeforeRetryLoop`
+
+**Retry token acquisition:**
+4. `retryStrategy.acquireInitialToken()` → may sleep for pre-emptive throttling
+
+**Per-attempt loop:**
+5. `readBeforeAttempt` → `modifyBeforeSigning` → `readBeforeSigning`
+6. **Auth scheme resolution** → resolve identity → resolve endpoint → apply endpoint overrides
+7. `protocol.setServiceEndpoint(request, endpoint)` → merge the endpoint into the request
+8. `signer.sign(request, identity, signerProperties)` → signed request
+9. Activate the event stream writer, if applicable
+10. `readAfterSigning` → `modifyBeforeTransmit` → `readBeforeTransmit`
+11. **`transport.send(context, request)`** → response
+12. `readAfterTransmit` → `modifyBeforeDeserialization` → `readBeforeDeserialization`
+13. `protocol.deserializeResponse(...)` → output or error
+14. `readAfterDeserialization` → `modifyBeforeAttemptCompletion` → `readAfterAttempt`
+
+**Retry decision:**
+15. If the error is retryable and the stream is replayable: `retryStrategy.refreshRetryToken()` → sleep → back to step 5
+16. Otherwise no retry is attempted; on success, `retryStrategy.recordSuccess()` is called
+
+**Completion (once per execution):**
+17. `modifyBeforeCompletion` → `readAfterExecution`
+18. Return the output or throw the error
+
+## Transport Abstraction
+
+```java
+public interface ClientTransport<RequestT, ResponseT> extends Closeable {
+    ResponseT send(Context context, RequestT request);
+    MessageExchange<RequestT, ResponseT> messageExchange();
+}
+```
+
+`ClientTransportFactory` is the SPI for discovering transports, with priority ordering. `remapExceptions()` maps JDK
+exceptions to the `TransportException` hierarchy.
+
+## Protocol Abstraction
+
+```java
+public interface ClientProtocol<RequestT, ResponseT> {
+    ShapeId id();
+    Codec payloadCodec();
+    MessageExchange<RequestT, ResponseT> messageExchange();
+    <I extends SerializableStruct, O extends SerializableStruct> RequestT createRequest(ApiOperation<I, O> operation, I input, Context context, SmithyUri endpoint);
+    RequestT setServiceEndpoint(RequestT request, Endpoint endpoint);
+    <I extends SerializableStruct, O extends SerializableStruct> O deserializeResponse(ApiOperation<I, O> operation, Context context, TypeRegistry errorRegistry,
+            RequestT request, ResponseT response);
+}
+```
+
+`ClientProtocolFactory` is the SPI for creating protocols from Smithy traits.
+
+## RequestOverrideConfig
+
+Allows per-request configuration overrides. It contains the same elements as `ClientConfig`, but all are optional:
+
+```java
+RequestOverrideConfig override = RequestOverrideConfig.builder()
+        .endpointResolver(EndpointResolver.staticEndpoint("https://other.com"))
+        .addInterceptor(loggingInterceptor)
+        .build();
+
+Output result = client.call("GetItem", input, override);
+```
+
+To apply the overrides, the client copies the original `ClientConfig` builder, applies the non-null overrides (additively
+for interceptors, auth schemes, and identity resolvers), and rebuilds the configuration.
+
+## Error Hierarchy
+
+```
+CallException
+  └─ TransportException
+      ├─ ConnectTimeoutException
+      ├─ ConnectionAcquireTimeoutException
+      ├─ ConnectionClosedException
+      ├─ TlsException
+      ├─ TransportProtocolException
+      ├─ TransportSocketException
+      └─ TransportSocketTimeout
+```
+
+## Key Classes Summary
+
+| Class | Role |
+|-------|------|
+| `Client` | Abstract base for all clients. Entry point via `call()`. |
+| `Client.Builder` | Abstract builder wrapping `ClientConfig.Builder`. |
+| `ClientConfig` | Immutable snapshot of all client configuration. |
+| `ClientPipeline` | Orchestrates the full request lifecycle. |
+| `ClientCall` | Per-call data bag (input, operation, config, retry state). |
+| `ClientPlugin` | Functional interface to modify `ClientConfig.Builder`. |
+| `AutoClientPlugin` | SPI marker for auto-discovered plugins. |
+| `DefaultPlugin` | Singleton FIRST-phase plugin (transport, retry, idempotency, auto-plugins). |
+| `ClientInterceptor` | About 20 hook methods for observing/modifying the call pipeline. |
+| `ClientProtocol` | Serializes input → request, deserializes response → output. |
+| `ClientTransport` | Sends serialized requests and returns responses. |
+| `MessageExchange` | Marker ensuring protocol/transport compatibility. |
+| `RequestOverrideConfig` | Per-request configuration overrides. |
+| `CallContext` | Context keys: ENDPOINT, IDENTITY, RETRY_ATTEMPT, etc. |
diff --git a/docs/technical-guide/codecs.md b/docs/technical-guide/codecs.md
new file mode 100644
index 000000000..825d4cac1
--- /dev/null
+++ b/docs/technical-guide/codecs.md
@@ -0,0 +1,374 @@
+# Codecs
+
+> **Last updated:** April 29, 2026
+
+Codecs are the abstraction Smithy-Java uses to read and write bytes from and to structures defined in the model. The
+system is schema-driven: every serialization and deserialization call receives a `Schema` parameter that tells the
+codec how to handle the value (field names, timestamp formats, XML attributes, etc.). This design enables
+protocol-agnostic types: the same generated structure can be serialized to JSON, CBOR, or XML without any
+format-specific code in the type itself.
+
+**Source:** [`codecs/`](https://github.com/smithy-lang/smithy-java/tree/main/codecs)
+
+## Architecture Overview
+
+```
+core/serde/        # Abstract interfaces: Codec, ShapeSerializer, ShapeDeserializer
+codecs/
+├── json-codec/    # JSON (RFC 8259), most feature-rich, two provider implementations
+├── cbor-codec/    # CBOR binary format (RFC 8949, formerly RFC 7049), used by RPC v2
+└── xml-codec/     # XML with Smithy XML trait support, used for legacy AWS protocols
+```
+
+## Core Interfaces
+
+### Codec
+
+[`Codec`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/Codec.java)
+is the top-level abstraction. It creates serializers and deserializers and provides convenience methods:
+
+```java
+public interface Codec extends AutoCloseable {
+    ShapeSerializer createSerializer(OutputStream sink);
+    ShapeDeserializer createDeserializer(ByteBuffer source);
+
+    // Convenience: serialize shape to bytes
+    default ByteBuffer serialize(SerializableShape shape);
+
+    // Convenience: deserialize bytes to shape
+    default <T extends SerializableShape> T deserializeShape(byte[] source, ShapeBuilder<T> builder);
+}
+```
+
+The default `deserializeShape()` follows a three-step pattern:
+`builder.deserialize(createDeserializer(source)).errorCorrection().build()`.
+
+### ShapeSerializer
+
+[`ShapeSerializer`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/ShapeSerializer.java)
+is the **push-based** serialization interface. Every write method takes a `Schema` parameter:
+
+```java
+void writeStruct(Schema schema, SerializableStruct struct);
+void writeBoolean(Schema schema, boolean value);
+void writeString(Schema schema, String value);
+void writeInteger(Schema schema, int value);
+void writeTimestamp(Schema schema, Instant value);
+void writeBlob(Schema schema, ByteBuffer value);
+void writeDocument(Schema schema, Document value);
+void writeNull(Schema schema);
+// ... and all other Smithy types
+```
+
+**Lists and maps use a callback pattern**: the serializer controls the framing (brackets, tags) around the callback:
+
+```java
+<T> void writeList(Schema schema, T listState, int size, BiConsumer<T, ShapeSerializer> consumer);
+<T> void writeMap(Schema schema, T mapState, int size, BiConsumer<T, MapSerializer> consumer);
+```
+
+This design lets the codec write `[` before and `]` after the elements for JSON, or a CBOR array header, without the
+caller needing to know the format.
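The callback pattern described above can be illustrated with a toy writer that owns the JSON framing while the caller writes only elements. `MiniJsonWriter` is hypothetical and far simpler than the real `ShapeSerializer`. It takes no `Schema`, does no string escaping, and ignores `size`, which a CBOR writer would use for the array header.

```java
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical toy writer illustrating the callback pattern; not the real ShapeSerializer.
 * It takes no Schema and does no string escaping.
 */
final class MiniJsonWriter {
    private final StringBuilder out = new StringBuilder();
    private boolean needsComma; // true once the current container already has an element

    private void separator() {
        if (needsComma) {
            out.append(',');
        }
        needsComma = true;
    }

    void writeString(String value) {
        separator();
        out.append('"').append(value).append('"'); // no escaping: illustration only
    }

    void writeInteger(int value) {
        separator();
        out.append(value);
    }

    /** The writer owns the framing; the consumer only writes the elements. */
    <T> void writeList(T state, int size, BiConsumer<T, MiniJsonWriter> consumer) {
        separator();        // this list may itself be an element of a parent list
        out.append('[');
        needsComma = false; // the first element needs no comma
        consumer.accept(state, this);
        out.append(']');
        needsComma = true;  // the closed list counts as an element of its parent
        // `size` is unused here; a CBOR writer would emit it in the array header.
    }

    @Override
    public String toString() {
        return out.toString();
    }
}
```

The caller never writes `[`, `]`, or `,`. The same callback could therefore drive a different writer, such as one that emits a length-prefixed binary array, without any change.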
+
+### ShapeDeserializer
+
+[`ShapeDeserializer`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/ShapeDeserializer.java)
+is the **pull-based** deserialization interface:
+
+```java
+boolean readBoolean(Schema schema);
+String readString(Schema schema);
+int readInteger(Schema schema);
+Instant readTimestamp(Schema schema);
+Document readDocument();
+
+<T> void readStruct(Schema schema, T state, StructMemberConsumer<T> consumer);
+<T> void readList(Schema schema, T state, ListMemberConsumer<T> consumer);
+<T> void readStringMap(Schema schema, T state, MapMemberConsumer<String, T> consumer);
+```
+
+Struct deserialization uses a callback that receives each member:
+
+```java
+interface StructMemberConsumer<T> {
+    void accept(T state, Schema memberSchema, ShapeDeserializer memberDeserializer);
+    default void unknownMember(T state, String memberName) {}
+}
+```
+
+The deserializer iterates over the fields, resolves each field name to a member `Schema`, and calls the consumer with the
+member schema and a deserializer positioned at the value.
+
+### MapSerializer
+
+```java
+public interface MapSerializer {
+    <T> void writeEntry(Schema keySchema, String key, T state,
+                        BiConsumer<T, ShapeSerializer> valueSerializer);
+}
+```
+
+### Supporting Classes
+
+- **`InterceptingSerializer`** — Abstract class with `before(Schema)` and `after(Schema)` hooks. `before()` returns the delegate serializer. Used extensively in XML for element wrapping and in CBOR for writing struct member names.
+- **`SpecificShapeSerializer`** / **`SpecificShapeDeserializer`** — Base classes whose methods all throw by default. Subclasses override only what they support.
+- **`ListSerializer`** — Decorator that calls `beforeEachValue(position)` before delegating each write.
+- **`NullSerializer`** — Singleton that silently discards all writes.
+- **`TypeRegistry`** — Maps shape IDs to builders for polymorphic deserialization (error types, union variants).
+- **`TimestampFormatter`** — Handles epoch-seconds, date-time (ISO 8601), and HTTP-date formats. + +## Schema Interaction + +Codecs use schema metadata extensively: + +- `schema.type()` — Determines the Smithy shape type +- `schema.memberName()` — The field name to write/read +- `schema.memberIndex()` — Positional index for optimized field name lookup +- `schema.members()` — Children of struct/union schemas +- `schema.listMember()` / `schema.mapKeyMember()` / `schema.mapValueMember()` — Collection member schemas +- `schema.getTrait(TraitKey)` — Trait lookup (e.g., `@jsonName`, `@timestampFormat`, `@xmlAttribute`) +- `schema.getExtension(key)` — Lazily-computed, cached format-specific data + +### Schema Extension System + +Codecs pre-compute format-specific data on Schema objects via +[`SchemaExtensionProvider`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/SchemaExtensionProvider.java) +SPI: + +```java +public interface SchemaExtensionProvider<T> { + SchemaExtensionKey<T> key(); + T provide(Schema schema); // called lazily, cached on the Schema +} +``` + +This is how the JSON codec pre-computes UTF-8 byte arrays for field names, the XML codec pre-computes element/attribute +mappings, and the CBOR codec pre-computes canonicalized member name bytes. + +## JSON Codec + +**Source:** [`codecs/json-codec/`](https://github.com/smithy-lang/smithy-java/tree/main/codecs/json-codec) + +### JsonCodec + +`JsonCodec` implements `Codec` and delegates to a pluggable `JsonSerdeProvider`. Configuration: + +```java +JsonCodec codec = JsonCodec.builder() + .useJsonName(true) // honor @jsonName trait + .useTimestampFormat(true) // honor @timestampFormat trait + .defaultTimestampFormat(TimestampFormatter.EPOCH_SECONDS) + .prettyPrint(false) + .build(); +``` + +### Provider Architecture + +Two implementations, selected via `ServiceLoader` with priority ordering: + +1.
**`SmithyJsonSerdeProvider`** (priority 5, name "smithy") — High-performance native implementation. The default. +2. **`JacksonJsonSerdeProvider`** (priority 0, name "jackson") — Jackson-based fallback. + +Override via system property: `-Dsmithy-java.json-provider=smithy|jackson`. + +### SmithyJsonSerializer (Native) + +The native JSON serializer writes directly to a `byte[]` buffer with no intermediate String/char operations. Key +optimizations: + +- **Object pooling** — Striped `AtomicReferenceArray` pool (`processors * 4` slots, power-of-2). Uses + `compareAndExchangeAcquire/Release` for lock-free acquire/release. Skips pooling for virtual threads. +- **Pre-computed field names** — Resolves `byte[][]` field name tables from `SmithyJsonSchemaExtensions` once per + `writeStruct`, then uses `System.arraycopy` for each member. The pre-computed bytes include quotes and colon (e.g., + `"fieldName":`). +- **Fused capacity checks** — For struct members, computes `nameBytes.length + 1 + maxValueBytes` in a single + `ensureCapacity` call. +- **Custom number formatting** — Uses `Schubfach` algorithm for double/float-to-decimal conversion (avoids + `Double.toString()`). +- **Nesting via `boolean[] needsComma`** — Tracks comma insertion per depth level (max 64). + +Serialization flow for a struct: +1. Write `{`, increment depth +2. Resolve field name table from schema extension +3. Call `struct.serializeMembers(structSerializer)` +4. For each member, the inner `StructSerializer` resolves pre-computed field name bytes by `memberIndex`, writes comma if needed, writes field name bytes, then delegates value writing +5. Write `}` + +### SmithyJsonDeserializer (Native) + +Operates directly on `byte[]` with `int pos/end` cursors. Key optimizations: + +- **Speculative field matching** — In `readStruct`, tries to match the next field name against the expected next member + (by schema definition order) using `Arrays.equals` on raw bytes. If it matches, skips the hash lookup entirely. 
Falls + back to `SmithyMemberLookup.lookup()` (hash-based) on miss. +- **Localized hot fields** — Copies `this.buf/end/pos` to local variables before struct/list/map loops to help JIT + register allocation. +- **Fast timestamp parsing** — For epoch-seconds, tries `parseLong` first (most timestamps are whole numbers), then + handles fractional nanoseconds directly from bytes. For ISO-8601 and HTTP-date, parses directly from bytes bypassing + `DateTimeFormatter`. +- **Mutable result fields** — `parsedLong`, `parsedDouble`, `parsedString` avoid allocations. + +### SmithyJsonSchemaExtensions + +Pre-computes per-schema: +- UTF-8 byte arrays for field names (including quotes and colon) +- `SmithyMemberLookup` hash tables for O(1) field resolution during deserialization +- Indexed field name tables by `memberIndex` for O(1) lookup during serialization + +## CBOR Codec + +**Source:** [`codecs/cbor-codec/`](https://github.com/smithy-lang/smithy-java/tree/main/codecs/cbor-codec) + +### Rpcv2CborCodec + +Simpler than JSON, no `@jsonName` or `@timestampFormat` trait handling. Delegates to `CborSerdeProvider` (SPI-loaded). + +### CborSerializer + +Writes CBOR binary format to a `Sink` abstraction (three implementations: `OutputStreamSink`, `ResizingSink`, +`NullSink`). + +Key characteristics: +- **Indefinite-length maps for structs** — Uses `0xBF` (indefinite map start) with `0xFF` (break byte), since member + count isn't known upfront. +- **Definite-length arrays for lists** — Uses definite length when size is known. +- **Compact integer encoding** — Selects 1/2/4/8 byte encoding based on value magnitude. +- **Timestamps as tagged epoch** — CBOR tag 1 wrapping a double of epoch milliseconds / 1000. +- **BigInteger** — CBOR tags 2 (positive) and 3 (negative) with byte string payload. +- **BigDecimal** — CBOR tag 4 with `[exponent, mantissa]` array. 
+ +Inner `CborStructSerializer` extends `InterceptingSerializer`, `before(Schema)` writes the member name as a CBOR text +string, returns the parent serializer for value writing. + +### CborDeserializer + +Uses `CborParser` (token-based parser) operating on `byte[]` payload. + +Key characteristics: +- **Canonicalizer for struct member lookup** — `Canonicalizer` class pre-computes UTF-8 byte arrays for all members, + grouped by length. Resolution compares raw bytes at the parser position without String allocation. Cached in a static + `ConcurrentHashMap`. +- **Half-precision float support** — Decodes IEEE 754 half-precision floats. +- **Null skipping** — `readStruct` skips explicit null values without dispatching events. +- **Container size** — `containerSize()` delegates to `parser.collectionSize()` for pre-allocation hints. + +### Sink Interface + +Sealed interface with three implementations: +- `OutputStreamSink` — Wraps `OutputStream` +- `ResizingSink` — Growable `byte[]` buffer with `finish()` returning `ByteBuffer` +- `NullSink` — Discards all writes (for size calculation) + +## XML Codec + +**Source:** [`codecs/xml-codec/`](https://github.com/smithy-lang/smithy-java/tree/main/codecs/xml-codec) + +The XML codec is the most complex due to XML's structural differences from the Smithy data model. It uses +`javax.xml.stream` (StAX) for both reading and writing, with DTD and external entity support disabled for security. + +### XmlCodec + +```java +XmlCodec codec = XmlCodec.builder() + .wrapperElements(List.of("OperationNameResponse", "OperationNameResult")) + .build(); +``` + +The `wrapperElements` configuration is used for AWS Query protocol, where responses are wrapped in extra elements. + +### XmlSerializer + +Extends `InterceptingSerializer` and uses a hierarchy of 6+ inner serializer classes: + +1. **`XmlSerializer`** (top-level) — Writes root element with `@xmlName` trait and namespace. +2. 
**`StructMemberSerializer`** — Routes members: attributes → skip, flattened → `ValueSerializer`, normal → `NonFlattenedMemberSerializer`. +3. **`StructAttributeSerializer`** — Routes: attributes → `AttributeSerializer`, non-attributes → skip. +4. **`NonFlattenedMemberSerializer`** — Wraps each member in `<memberName>...</memberName>`. +5. **`ValueSerializer`** — Writes actual values. For structs, first serializes attributes then members. +6. **`AttributeSerializer`** — Writes values as XML attributes via `writer.writeAttribute()`. + +Struct serialization flow: +1. Write `<elementName>` +2. If struct has attributes, call `struct.serializeMembers(structAttributeSerializer)` first +3. Then call `struct.serializeMembers(structMemberSerializer)` for elements +4. Write `</elementName>` + +Trait handling: +- `@xmlName` — Overrides element/attribute names +- `@xmlAttribute` — Serializes as XML attribute instead of element +- `@xmlFlattened` — Skips wrapper element for lists/maps +- `@xmlNamespace` — Adds namespace declarations + +### XmlDeserializer + +Uses `XmlReader` (wraps StAX `XMLStreamReader` or buffered events). Two-layer design: + +- **`XmlDeserializer`** (outer) — Handles top-level element entry/exit, validates root element name. +- **`InnerDeserializer`** (inner) — Handles actual value parsing from XML text content. + +Key features: +- **Wrapper element skipping** — For AWS Query protocol, skips the `<OperationNameResponse>` and `<OperationNameResult>` + wrappers. +- **Flattened member buffering** — For interspersed flattened lists (e.g., S3's `ListObjectVersions`), buffers XML + events per member, then replays them via `XmlReader.BufferedReader` after all non-flattened members are read. +- **Attribute deserialization** — `AttributeDeserializer` reads attribute values as strings and parses to target types.
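The attributes-first pass in the struct flow reflects a StAX constraint. `XMLStreamWriter.writeAttribute` is only valid while the start tag is still open, so every attribute must be written before the first child element or text. The sketch below uses only `javax.xml.stream` from the JDK to show that ordering. The class name `AttributesFirstDemo` and its flat `Map`-based input are hypothetical simplifications of what `XmlSerializer` does with schemas.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration of attribute-then-element ordering with StAX; not XmlSerializer itself.
final class AttributesFirstDemo {
    static String write(String element, Map<String, String> attributes, Map<String, String> children) {
        StringWriter sink = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            writer.writeStartElement(element);
            // Pass 1: attributes, written while the start tag is still open.
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                writer.writeAttribute(e.getKey(), e.getValue());
            }
            // Pass 2: child elements; the first one implicitly closes the parent's start tag.
            for (Map.Entry<String, String> e : children.entrySet()) {
                writer.writeStartElement(e.getKey());
                writer.writeCharacters(e.getValue());
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.flush();
            writer.close(); // does not close the underlying StringWriter
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

With the JDK's default StAX implementation, `write("Item", Map.of("id", "42"), Map.of("Name", "widget"))` should produce `<Item id="42"><Name>widget</Name></Item>`. If the two loops were swapped, the attribute would arrive after the start tag had been closed, and the writer would typically throw an `XMLStreamException`.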
+ +### XmlInfo + +Caches pre-computed XML metadata per Schema in `ConcurrentHashMap`s: + +- **`StructInfo`** — `xmlName`, `xmlNamespace`, attribute/element member maps, `hasFlattened` flag +- **`ListMemberInfo`** — `xmlName`, `memberName` (default "member"), `flattened` flag +- **`MapMemberInfo`** — `xmlName`, `entryName` (default "entry"), `keyName`, `valueName`, `flattened` flag + +## Serialization/Deserialization Flow + +### Serialization (shape → bytes) + +``` +codec.serialize(shape) + → codec.createSerializer(outputStream) + → shape.serialize(serializer) + → serializer.writeStruct(schema, struct) + → struct.serializeMembers(innerSerializer) + → innerSerializer.writeString(memberSchema, value) // for each member + → format-specific byte writing +``` + +### Deserialization (bytes → shape) + +``` +codec.deserializeShape(bytes, builder) + → codec.createDeserializer(bytes) + → builder.deserialize(deserializer) + → deserializer.readStruct(schema, builder, consumer) + → for each field: resolve member schema, call consumer.accept(builder, memberSchema, deserializer) + → consumer reads value: deserializer.readString(memberSchema) + → builder.errorCorrection().build() +``` + +## Key Design Patterns + +1. **Push-based serialization, pull-based deserialization** — Serializers receive data pushed by shapes. Deserializers + pull data guided by schemas. + +2. **Schema-driven** — Every read/write takes a `Schema` parameter. Codecs use schema metadata (traits, type, member + index) to determine format-specific behavior. + +3. **Callback-based containers** — Lists and maps use `BiConsumer` callbacks rather than returning sub-serializers. This + lets the codec control framing around the callback. + +4. **Extension pre-computation** — Format-specific data (field name bytes, member lookups, timestamp formatters) is + computed lazily on first access and cached on Schema objects via `SchemaExtensionProvider` SPI. + +5. 
**Provider SPI** — Both JSON and CBOR use `ServiceLoader`-based provider selection with priority ordering, allowing + pluggable implementations. + +6. **InterceptingSerializer pattern** — Used in XML and CBOR for wrapping writes with before/after hooks (element + open/close, field name writing). + +7. **Speculative optimization** — The native JSON deserializer speculatively matches field names in schema definition + order, avoiding hash lookups on the common path. + +8. **Object pooling** — The native JSON serializer uses a lock-free striped pool for buffer reuse, with virtual-thread + awareness. diff --git a/docs/technical-guide/codegen.md b/docs/technical-guide/codegen.md new file mode 100644 index 000000000..e3712f824 --- /dev/null +++ b/docs/technical-guide/codegen.md @@ -0,0 +1,475 @@ +# Code Generation + +> **Last updated:** April 29, 2026 + +Smithy-Java has a single codegen plugin that generates Java code from Smithy models. It can be configured for one or +more modes, `client`, `server`, and `types`, and produces generated types, client implementations, and server stubs +from the same codebase. The codegen builds on the +[smithy-codegen-core](https://github.com/smithy-lang/smithy/tree/main/smithy-codegen-core) framework and follows the +[directed codegen pattern](https://smithy.io/2.0/guides/building-codegen/index.html). 
+ +**Source:** [`codegen/`](https://github.com/smithy-lang/smithy-java/tree/main/codegen) + +## Module Structure + +``` +codegen/ +├── codegen-core/ # Core framework: generators, writer, integrations, symbol provider +│ ├── src/main/ # Public API shared by all modes +│ └── src/internal/ # JavaTypesCodegenPlugin, TypesDirectedJavaCodegen +├── codegen-plugin/ # Main plugin: JavaCodegenPlugin, DirectedJavaCodegen, client/server generators +├── plugins/ # Thin wrapper JARs (client-codegen, server-codegen, types-codegen) +└── test-utils/ # Test utilities +``` + +The split between `codegen-core` and `codegen-plugin` is intentional: `codegen-core` contains everything needed for type +generation (shared between client and server), while `codegen-plugin` adds client-specific and server-specific +generators. This allows lightweight type-only generation without pulling in client/server dependencies. + +## Plugin Architecture + +### SmithyBuildPlugin Implementations + +Two plugins are registered via Java SPI (`META-INF/services/software.amazon.smithy.build.SmithyBuildPlugin`): + +1. **`JavaCodegenPlugin`** (`codegen-plugin`) — name: `"java-codegen"`. The primary plugin supporting all modes. +2. **`JavaTypesCodegenPlugin`** (`codegen-core/internal`) — name: `"internal-types-only"`. A lightweight types-only + plugin used internally (e.g., by `framework-errors`). Lives in `codegen-core` to avoid pulling in client/server + dependencies. 
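Conceptually, smithy-build matches each key under `plugins` in `smithy-build.json` against the names of the `SmithyBuildPlugin` implementations it discovers, and the SPI registration is what makes `java-codegen` and `internal-types-only` discoverable. The following is a minimal sketch of name-based lookup over `ServiceLoader`. It uses a hypothetical `BuildPlugin` interface that keeps only a `getName()` method; it is not smithy-build's actual discovery code.

```java
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical stand-in for SmithyBuildPlugin, reduced to the name used for lookup.
interface BuildPlugin {
    String getName();
}

final class PluginLookup {
    // Works on any Iterable, so it serves both ServiceLoader results and hand-built lists.
    static Optional<BuildPlugin> find(Iterable<? extends BuildPlugin> plugins, String name) {
        for (BuildPlugin plugin : plugins) {
            if (plugin.getName().equals(name)) {
                return Optional.of(plugin);
            }
        }
        return Optional.empty();
    }

    // Providers are listed in META-INF/services/<fully-qualified interface name>.
    static Optional<BuildPlugin> findOnClasspath(String name) {
        return find(ServiceLoader.load(BuildPlugin.class), name);
    }
}
```

`ServiceLoader` is itself an `Iterable`, so `findOnClasspath` can reuse the same loop as the list-based overload.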
+ +### Configuration + +The plugin is configured via `smithy-build.json`: + +```json +{ + "plugins": { + "java-codegen": { + "service": "com.example#CoffeeShop", + "namespace": "com.example", + "modes": ["client"], + "headerFile": "license.txt", + "addNullnessAnnotations": true, + "protocol": "smithy.protocols#rpcv2Cbor", + "edition": "LATEST", + "runtimeTraits": ["custom.trait#MyTrait"], + "runtimeTraitsSelector": "[id|namespace = 'custom']" + } + } +} +``` + +Key settings: +- `service` — The Smithy service shape ID to generate for (required for client/server modes). +- `namespace` — Java package namespace for generated code. +- `modes` — Array of `"client"`, `"server"`, and/or `"types"`. Determines what gets generated. +- `protocol` — Optional protocol to target. Affects which traits are included in schemas. +- `runtimeTraits` / `runtimeTraitsSelector` — Additional traits to include in generated schemas beyond the defaults. + +### Execution Flow + +`JavaCodegenPlugin.execute(PluginContext)`: + +1. Parses `modes` from settings +2. Routes to either `executeTypesMode()` or `executeServiceMode()`: + ```java + if (modes.contains(TYPES) && !modes.contains(CLIENT) && !modes.contains(SERVER)) { + executeTypesMode(context, settingsNode, modes); + } else { + executeServiceMode(context, settingsNode, modes); + } + ``` +3. Creates a `CodegenDirector` parameterized with `` +4. Applies `DefaultTransforms` (enum conversion, pagination flattening, idempotency tokens, deprecation filtering, dedicated I/O, etc.) +5. 
Calls `runner.run()` which drives the directed codegen pattern + +### Dependency Validation + +`JavaCodegenPlugin.validateDependencies()` checks at runtime that required classes are on the classpath: +- CLIENT mode → requires `software.amazon.smithy.java.client.core.Client` +- SERVER mode → requires `software.amazon.smithy.java.server.Service` +- CLIENT with endpoint rules → requires `RulesEngineBuilder` +- CLIENT with waiters → requires `Waiter` + +## Modes + +`CodegenMode` is a simple enum: `CLIENT`, `SERVER`, `TYPES`. + +| Mode(s) | Service Required | Types Generated | Client Generated | Server Generated | +|---------|-----------------|-----------------|-----------------|-----------------| +| `[types]` | No (synthetic) | ✅ | ❌ | ❌ | +| `[client]` | Yes | ✅ | ✅ | ❌ | +| `[server]` | Yes | ✅ | ❌ | ✅ | +| `[client, server]` | Yes | ✅ | ✅ | ✅ | +| `[client, types]` | Yes | ✅ (expanded) | ✅ | ❌ | + +When TYPES is combined with CLIENT or SERVER, `expandServiceClosureForTypes()` adds all model shapes (structures, unions, enums, intEnums) not already in the service closure via a synthetic operation. + +## The Directed Codegen Pattern + +The Smithy codegen framework uses a "directed codegen" pattern where the framework walks the model and calls +shape-specific methods on a `DirectedCodegen` implementation. This is documented in the +[Smithy codegen guide](https://smithy.io/2.0/guides/building-codegen/index.html). + +### DirectedJavaCodegen + +[`DirectedJavaCodegen`](https://github.com/smithy-lang/smithy-java/blob/main/codegen/codegen-plugin/src/main/java/software/amazon/smithy/java/codegen/DirectedJavaCodegen.java) +is the unified implementation for all modes. Shape generation methods are unconditional, they always run. Only +`generateOperation()` and `generateService()` have mode-conditional branches: + +```java +void generateStructure(...) → StructureGenerator // always +void generateError(...) → StructureGenerator // always +void generateUnion(...) 
→ UnionGenerator // always +void generateEnumShape(...) → EnumGenerator // always + +void generateOperation(...) { + if (isSynthetic || isTypesOnly) return; // skip for types-only + if (modes.contains(SERVER)) new OperationInterfaceGenerator().accept(directive); + new OperationGenerator().accept(directive); +} + +void generateService(...) { + if (isTypesOnly) return; + if (modes.contains(CLIENT)) { + new ClientInterfaceGenerator().accept(directive); + new ClientImplementationGenerator().accept(directive); + } + if (modes.contains(SERVER)) new ServiceGenerator().accept(directive); + new ApiServiceGenerator().accept(directive); + new ServiceExceptionGenerator<>().accept(directive); +} +``` + +There's also a `TypesDirectedJavaCodegen` in `codegen-core/internal` — a stripped-down version where +`generateOperation()` and `generateService()` are no-ops. Used only by `JavaTypesCodegenPlugin`. + +### Synthetic Shape Filtering + +Both implementations skip shapes in the `smithy.synthetic` namespace: + +```java +private static boolean isSynthetic(Shape shape) { + return shape.getId().getNamespace().equals(SyntheticServiceTransform.SYNTHETIC_NAMESPACE); +} +``` + +## The Dummy Service Trick for Types Codegen + +**Problem**: Directed codegen requires a root `ServiceShape` to walk the model. Types-only mode has no service. + +**Solution**: + [`SyntheticServiceTransform`](https://github.com/smithy-lang/smithy-java/blob/main/codegen/codegen-core/src/main/java/software/amazon/smithy/java/codegen/SyntheticServiceTransform.java) + creates a synthetic service that wraps the target shapes. + +This is likely to change when we finish the design and implementation of types codegen that can be used for all Smithy +based SDKs. + +### How It Works + +`SyntheticServiceTransform.transform(Model, Set, Map)`: + +1. Creates a `ServiceShape` with ID `smithy.synthetic#TypesGenService` and a `SyntheticTrait` +2. 
For each shape in the closure: + - **Operations** → added directly to the service + - **Structures, Enums, IntEnums, Unions** → collected for wrapping +3. Creates synthetic wrapper shapes: + - `smithy.synthetic#TypesOperationInput` — a structure with one member per type (`m0`, `m1`, ...) + - `smithy.synthetic#TypesOperationOutput` — empty structure + - `smithy.synthetic#TypesOperation` — operation with `@private` and `@synthetic` traits +4. Adds the synthetic operation to the service +5. Returns the modified model + +When TYPES is combined with CLIENT/SERVER, `expandServiceClosure()` adds a synthetic operation to the *existing* service to include additional shapes not already in its closure. + +### TypeCodegenSettings + +For types-only mode, `TypeCodegenSettings` wraps `JavaCodegenSettings` and injects the synthetic service ID: + +```java +nodeBuilder.withMember("service", SyntheticServiceTransform.SYNTHETIC_SERVICE_ID.toString()); +return JavaCodegenSettings.fromNode(nodeBuilder.build()); +``` + +Default selector: `:is(structure, union, enum, intEnum)`, generates all aggregate types. + +## JavaWriter and Format Patterns + +### Class Hierarchy + +``` +SymbolWriter (smithy-codegen-core) + └── DeferredSymbolWriter (codegen-core) — adds symbol table for deferred name resolution + └── JavaWriter (codegen-core) — Java-specific formatters and import handling +``` + +`DeferredSymbolWriter` adds a `Map> symbolTable` that tracks all symbols by name. This enables detecting name conflicts (e.g., two classes named `Error` from different packages) and resolving them at `toString()` time by using fully-qualified names. 
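The following is a simplified sketch of deferred conflict detection that considers only simple-name collisions; a production writer would likely need to consider more cases. The hypothetical `SymbolTable` records every fully-qualified name as it is written. It decides how to render each name only after the whole body is known, which is why this kind of resolution can run at `toString()` time.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of deferred name resolution; not the actual DeferredSymbolWriter code.
final class SymbolTable {
    // simple name -> every fully-qualified name that used it in this file
    private final Map<String, Set<String>> byName = new HashMap<>();

    void record(String fullName) {
        byName.computeIfAbsent(simpleName(fullName), k -> new LinkedHashSet<>()).add(fullName);
    }

    // Called only after all symbols are recorded, so every potential conflict is known.
    String resolve(String fullName) {
        Set<String> candidates = byName.getOrDefault(simpleName(fullName), Set.of());
        return candidates.size() > 1 ? fullName : simpleName(fullName);
    }

    private static String simpleName(String fullName) {
        return fullName.substring(fullName.lastIndexOf('.') + 1);
    }
}
```

After recording `com.example.model.Error`, `java.lang.Error`, and `java.util.List`, `resolve` returns `List` for the unambiguous symbol and the fully-qualified name for both `Error` types.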
+ +### Format Patterns + +JavaWriter registers custom formatters that are used extensively in code templates: + +| Formatter | Character | Purpose | Example | +|-----------|-----------|---------|---------| +| `$T` | T | Java type reference | `$T` → `MyClass` (with import) | +| `$B` | B | Boxed type | `int` → `Integer` | +| `$N` | N | Nullable annotation | Adds `@Nullable` when enabled, else falls back to `$B` | +| `$U` | U | Capitalize first letter | `"foo"` → `"Foo"` | +| `$L` | L | Literal (inherited) | Writes value as-is | +| `$S` | S | String literal (inherited) | Writes value quoted | +| `$C` | C | Code section/consumer (inherited) | Executes a `Runnable` or `Consumer` | + +### The $C Formatter + +`$C` is the most important formatter for understanding the codegen. It executes a `Runnable` or `Consumer` +inline. This is how large templates are broken into manageable pieces, as documented in the +[Smithy codegen guide](https://smithy.io/2.0/guides/building-codegen/generating-code.html#breaking-up-large-templates-with-the-c-formatter). + +Generators store sub-generators in the writer's context: + +```java +writer.putContext("schemas", new SchemaFieldGenerator(directive, writer, shape)); +writer.putContext("properties", new PropertyGenerator(directive, writer)); +writer.putContext("constructor", new ConstructorGenerator(directive, writer)); +``` + +Then the template references them: + +```java +writer.write(""" + public final class ${shape:T} { + ${schemas:C|} + ${properties:C|} + ${constructor:C|} + } + """); +``` + +The `${name:C|}` syntax means "execute the Runnable stored in context key 'name', with a trailing separator". + +### The $T Formatter + +`$T` handles Java type references with automatic import management. 
For generic types, it recursively formats type +references: + +```java +writer.write("$T<$T, $T>", mapSymbol, keySymbol, valueSymbol); +// → Map<String, Integer> (with imports for Map, String, Integer) +``` + +The placeholder mechanism uses `£{fullName:L}` internally, which is resolved at `toString()` time after all symbols are +collected and conflicts detected. + +### toString() Output + +```java +public String toString() { + putNameContext(); // Resolve all symbol name conflicts + setExpressionStart('£'); // Switch to £ for final resolution + return format("£Lpackage £L;\n\n£L\n£L", header, packageNamespace, imports, body); +} +``` + +## Type Generation + +### How Types Are Shared Between Client and Server + +All type generators live in `codegen-core` and are shared: +- `StructureGenerator`, `EnumGenerator`, `UnionGenerator`, `ListGenerator`, `MapGenerator` +- `SchemasGenerator`, `SharedSerdeGenerator`, `SchemaIndexGenerator` +- `OperationGenerator` (generates `ApiOperation` classes) +- `BuilderGenerator`, `SchemaFieldGenerator`, `SerializerMemberGenerator`, etc. + +Mode-specific generators live in `codegen-plugin`: +- Client: `ClientInterfaceGenerator`, `ClientImplementationGenerator`, `BddFileGenerator`, `WaiterContainerGenerator` +- Server: `ServiceGenerator`, `OperationInterfaceGenerator` + +### StructureGenerator + +Generates a `public final class` implementing `SerializableStruct` (or extending `ModeledException` for errors).
Uses a template with context-driven sub-generators: + +| Context Key | Generator | Purpose | +|-------------|-----------|---------| +| `schemas` | `SchemaFieldGenerator` | `$SCHEMA` static field | +| `properties` | `PropertyGenerator` | Private final fields | +| `constructor` | `ConstructorGenerator` | Private constructor from Builder | +| `getters` | `GetterGenerator` | Public getter methods | +| `equals` | `EqualsGenerator` | `equals()` with cost-optimized member ordering | +| `hashCode` | `HashCodeGenerator` | `hashCode()` | +| `serializer` | `StructureSerializerGenerator` | `serializeMembers()` | +| `builder` | `StructureBuilderGenerator` | Nested `Builder` class | + +Key behaviors: +- Error structures extend `ModeledException` (or a service-specific exception) +- Streaming members make the class implement `Closeable` +- Collections are wrapped with `Collections.unmodifiable*()` in the constructor +- `EqualsGenerator` uses `CheapEqualsFirstComparator` to order equality checks by cost (primitives first) +- Builder has `PresenceTracker` for required members and `errorCorrection()` for client error correction + +### EnumGenerator + +Generates a `public sealed interface` implementing `SmithyEnum` (string) or `SmithyIntEnum` (int): + +- Static instances for each variant (e.g., `MyEnum OPTION_ONE = new OptionOneType();`) +- Inner `final class` for each known variant +- `$Unknown` record for unknown values +- Builder with `value()` setter and switch-based `build()` + +### UnionGenerator + +Generates a `public sealed interface` implementing `SerializableStruct`: + +- Inner `record` for each variant (e.g., `record FooMember(String foo) implements MyUnion`) +- `$Unknown` record for unknown members +- Builder enforces single-value constraint + +### SchemasGenerator + +Runs during `customizeBeforeIntegrations()`. Generates `Schemas` classes containing static `Schema` fields for all +shapes. 
Uses `SchemaFieldOrder` to partition shapes across multiple classes to avoid JVM class file size limits. + +Recursive shapes use a two-phase approach in the generated code: + +```java +// Phase 1: Builder +static final SchemaBuilder FOO_BUILDER = Schema.structureBuilder(id); + +// Phase 2: Members (in static block) +FOO_BUILDER.putMember("bar", BAR_SCHEMA); +FOO_BUILDER.putMember("self", FOO_BUILDER); // recursive + +// Phase 3: Build +static final Schema FOO = FOO_BUILDER.build().resolve(); +``` + +## Integration System + +### JavaCodegenIntegration + +```java +public interface JavaCodegenIntegration + extends SmithyIntegration { + + default List> traitInitializers() { + return List.of(); + } +} +``` + +Integrations are discovered via SPI +(`META-INF/services/software.amazon.smithy.java.codegen.JavaCodegenIntegration`). They can: + +- `preprocessModel()` — Transform the model before codegen +- `decorateSymbolProvider()` — Customize symbol resolution +- `interceptors()` — Add code section interceptors +- `customize()` — Post-generation customization +- `traitInitializers()` — Register trait construction code generators + +### Built-in Integrations + +**`CoreIntegration`** (priority -1, runs last): +- Decorates `SymbolProvider` to track generated symbols per package +- Registers 15 `TraitInitializer` implementations for prelude traits (pagination, HTTP, compression, length, range, + etc.) +- Includes a `GenericTraitInitializer` catch-all that uses `TraitService` SPI + Node deserialization for any trait + without a specialized initializer + +**`JavadocIntegration`**: +- Registers 9 code interceptors for Javadoc generation: `@SmithyGenerated` annotation, documentation text, `@see` links, + `@since` tags, `@Deprecated` annotation, operation error docs, builder setter docs, etc. + +### Code Sections + +Interceptors target specific `CodeSection` types. 
When the codegen writes a section (e.g., a class declaration, a getter +method), registered interceptors can inject additional code before/after: + +- `ClassSection` — wraps class/interface generation +- `GetterSection` — wraps getter methods +- `BuilderSetterSection` — wraps builder setter methods +- `OperationSection` — wraps operation methods +- `JavadocSection` — wraps Javadoc blocks +- `MemberSerializerSection` / `MemberDeserializerSection` — wraps serde code + +### TraitInitializer + +```java +public interface TraitInitializer extends BiConsumer { + Class traitClass(); +} +``` + +Used by `TraitInitializerGenerator` to write trait construction code in Schema definitions. For example, +`LengthTraitInitializer` writes `new LengthTrait(min, max)` in the generated `Schemas` class. + +## JavaSymbolProvider + +[`JavaSymbolProvider`](https://github.com/smithy-lang/smithy-java/blob/main/codegen/codegen-core/src/main/java/software/amazon/smithy/java/codegen/JavaSymbolProvider.java) +maps Smithy shapes to Java `Symbol` objects. 
It's mode-aware: + +- **Structures** → `{namespace}.model.{Name}` +- **Enums/IntEnums** → `{namespace}.model.{Name}` +- **Operations** → `{namespace}.model.{Name}` (with server properties when SERVER mode) +- **Services** → mode-dependent: + - CLIENT → `{namespace}.client.{Name}Client` + - SERVER → `{namespace}.service.{Name}` + - TYPES → `null` + +Key symbol properties: +- `IS_PRIMITIVE` — whether the type is a Java primitive +- `BOXED_TYPE` — boxed version (e.g., `int` → `Integer`) +- `EXTERNAL_TYPE` — marks externally-defined types to skip generation +- `SERVICE_EXCEPTION` — service-specific exception symbol + +## CodeGenerationContext + +Central context object available throughout codegen: + +```java +public class CodeGenerationContext { + Model model(); + JavaCodegenSettings settings(); + SymbolProvider symbolProvider(); + FileManifest fileManifest(); + WriterDelegator writerDelegator(); + List integrations(); + Set runtimeTraits(); // Traits to include in Schema definitions + SchemaFieldOrder schemaFieldOrder(); // Ordering/partitioning for Schema classes + TraitInitializer getInitializer(T); // Find initializer for a trait +} +``` + +### Runtime Traits Collection + +`collectRuntimeTraits()` builds the set of traits to include in generated Schemas: +1. Static prelude traits (~25 traits: LengthTrait, PatternTrait, RequiredTrait, etc.) +2. Protocol-defined traits (from `ProtocolDefinitionTrait.getTraits()`) +3. Auth-defined traits (from `AuthDefinitionTrait.getTraits()`) +4. 
User-configured traits (`runtimeTraits` and `runtimeTraitsSelector` settings) + +## Key Classes Summary + +| Class | Location | Role | +|-------|----------|------| +| `JavaCodegenPlugin` | codegen-plugin | Main SmithyBuildPlugin, routes to types/service mode | +| `JavaTypesCodegenPlugin` | codegen-core/internal | Lightweight types-only plugin | +| `DirectedJavaCodegen` | codegen-plugin | Unified DirectedCodegen for all modes | +| `TypesDirectedJavaCodegen` | codegen-core/internal | Types-only DirectedCodegen | +| `CodegenMode` | codegen-core | Enum: CLIENT, SERVER, TYPES | +| `JavaCodegenSettings` | codegen-core | Plugin settings (service, namespace, protocol, etc.) | +| `JavaWriter` | codegen-core | Code writer with $T, $B, $N, $U formatters | +| `DeferredSymbolWriter` | codegen-core | Base class with deferred symbol name resolution | +| `JavaSymbolProvider` | codegen-core | Maps Smithy shapes to Java Symbols | +| `JavaCodegenIntegration` | codegen-core | SPI for extending codegen | +| `CoreIntegration` | codegen-core | Registers trait initializers, decorates symbol provider | +| `SyntheticServiceTransform` | codegen-core | Creates synthetic service for types-only mode | +| `StructureGenerator` | codegen-core | Generates structure/error classes | +| `EnumGenerator` | codegen-core | Generates sealed interface enums | +| `UnionGenerator` | codegen-core | Generates sealed interface unions | +| `SchemasGenerator` | codegen-core | Generates Schema static fields | +| `OperationGenerator` | codegen-core | Generates ApiOperation classes | +| `TraitInitializer` | codegen-core | Interface for writing trait construction code | +| `ClientInterfaceGenerator` | codegen-plugin | Generates client interface | +| `ClientImplementationGenerator` | codegen-plugin | Generates client implementation | +| `ServiceGenerator` | codegen-plugin | Generates server Service class | +| `OperationInterfaceGenerator` | codegen-plugin | Generates server operation interfaces | +| 
`CodeGenerationContext` | codegen-core | Central context with model, symbols, traits, integrations | diff --git a/docs/technical-guide/context.md b/docs/technical-guide/context.md new file mode 100644 index 000000000..15686f5a1 --- /dev/null +++ b/docs/technical-guide/context.md @@ -0,0 +1,176 @@ +# Context System + +> **Last updated:** April 29, 2026 + +The `Context` is a typed key-value store that threads through the entire Smithy-Java client and server pipeline. It +carries configuration, per-call state, and subsystem-specific data using type-safe keys. Understanding the context +system is foundational: nearly every other component reads from or writes to a `Context`. + +**Source:** [`context/`](https://github.com/smithy-lang/smithy-java/tree/main/context) + +## Context Interface + +`Context` is a **sealed interface** with two implementations: + +- `ChunkedArrayStorageContext` — The mutable, default implementation +- `UnmodifiableContext` — A read-only wrapper/decorator + +### Core API + +```java +public sealed interface Context { + <T> Context put(Key<T> key, T value); + <T> T get(Key<T> key); + default <T> T getOrDefault(Key<T> key, T defaultValue); + default <T> T expect(Key<T> key); // throws NullPointerException if missing + default <T> Context putIfAbsent(Key<T> key, T value); + default <T> T computeIfAbsent(Key<T> key, Function<Key<T>, ? extends T> mappingFunction); + void copyTo(Context target); // deep copy via copyFunction + default Context merge(Context other); // returns a NEW context with both +} +``` + +### Factory Methods + +```java +Context.create(); // new mutable context +Context.empty(); // singleton unmodifiable empty context +Context.modifiableCopy(context); // deep copy, mutable +Context.unmodifiableCopy(context); // deep copy, immutable +Context.unmodifiableView(context); // live view (not a copy), immutable +``` + +## Context.Key<T> + +`Context.Key` is a final inner class providing identity-based, type-safe tokens for context values.
+ +```java +static <T> Key<T> key(String name); // identity copy (shallow) +static <T> Key<T> key(String name, Function<T, T> copyFunction); // custom deep copy +``` + +Each key gets a monotonically increasing integer `id` used as an array index. Keys are registered in a global +`CopyOnWriteArrayList` with `synchronized` ID assignment. + +**Critical rule:** Keys MUST be stored as `static final` fields. Creating ephemeral keys permanently grows the global +registry and every context's storage. + +### Copy Function + +The `copyFunction` controls what happens during `copyTo()`. For immutable types, `Function.identity()` is used +(default). For mutable types, provide a copy function, such as a copy-constructor reference: + +```java +// Immutable value, identity copy (default) +Context.Key<String> FOO = Context.key("Foo"); + +// Mutable value, deep copy +Context.Key<Set<String>> FEATURES = Context.key("Features", HashSet::new); +``` + +Without a proper copy function, `copyTo()` shares the same mutable object reference between source and target contexts. + +## Storage Implementation + +`ChunkedArrayStorageContext` uses a **chunked array** strategy for O(1) get/put by key ID: + +- Keys are stored in 32-element chunks (`CHUNK_SIZE = 32`) +- Chunks are allocated lazily as needed +- Key ID is decomposed: `chunkIdx = id >> 5`, `offset = id & 0x1F` +- ~4.5x faster copies than HashMap, ~2x faster gets (per Javadoc) + +`ChunkedArrayStorageContext` is **NOT thread-safe**. It's designed for single-threaded use within a request +pipeline. Key registration IS thread-safe. + +## Context Lifecycle in the Client Pipeline + +``` +ClientConfig.Builder.context (mutable) + ↓ build() +ClientConfig.context (unmodifiable copy, safe to share across threads) + ↓ ClientCall.Builder.withConfig() +ClientCall.context = Context.modifiableCopy(config.context) (mutable per-call copy) + ↓ enriched by pipeline + ├── CallContext.RETRY_ATTEMPT, FEATURE_IDS, IDENTITY, ENDPOINT, etc.
+ ↓ passed to subsystems + ├── Context.unmodifiableView() → AuthSchemeResolverParams + ├── Context.unmodifiableView() → EndpointResolverParams + ├── mutable reference → transport.send() + └── mutable reference → all interceptor hooks +``` + +This ensures: +1. **Client-level config is immutable**, so it can be shared safely across threads and calls +2. **Per-call context is mutable** and is enriched as the pipeline progresses +3. **Subsystem views are read-only**: auth and endpoint resolvers receive unmodifiable views +4. **Deep copies isolate mutations**: `modifiableCopy` plus each key's `copyFunction` prevents cross-call contamination + +## Settings Pattern + +Settings are defined as interfaces with `Context.Key` constants and convenience setter methods: + +```java +public interface RegionSetting<B extends ClientSetting<B>> extends ClientSetting<B> { + Context.Key<String> REGION = Context.key("Region name"); + + default void region(String region) { + putConfig(REGION, region); + } +} +``` + +Settings compose via interface inheritance: + +```java +public interface SigV4Settings extends ClockSetting, RegionSetting { ... } +``` + +This allows client builders and `RequestOverrideConfig.OverrideBuilder` to mix in typed settings while storing all values in the same `Context` instance.
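The settings mixin pattern can be sketched without any Smithy-Java dependency. In the block below, `Key`, `Ctx`, `ClientSetting`, `ClockSetting`, `RegionSetting`, `SigningSettings`, and `Builder` are simplified, hypothetical stand-ins rather than the real classes. The only assumption carried over from the text is that each mixin funnels its writes through a single `putConfig` method, as `RegionSetting.region()` does above.

```java
import java.time.Clock;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins for Context, Context.Key, and ClientSetting.
final class SettingsSketch {
    // Identity-based typed key: no equals/hashCode override, so keys must be shared constants.
    static final class Key<T> {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Minimal typed store keyed by Key identity.
    static final class Ctx {
        private final Map<Key<?>, Object> values = new HashMap<>();

        <T> void put(Key<T> key, T value) {
            values.put(key, value);
        }

        @SuppressWarnings("unchecked")
        <T> T get(Key<T> key) {
            return (T) values.get(key);
        }
    }

    // Every setting mixin writes through putConfig.
    interface ClientSetting<B extends ClientSetting<B>> {
        <T> void putConfig(Key<T> key, T value);
    }

    interface RegionSetting<B extends ClientSetting<B>> extends ClientSetting<B> {
        Key<String> REGION = new Key<>("Region name"); // interface fields are implicitly static final

        default void region(String region) {
            putConfig(REGION, region);
        }
    }

    interface ClockSetting<B extends ClientSetting<B>> extends ClientSetting<B> {
        Key<Clock> CLOCK = new Key<>("Clock");

        default void clock(Clock clock) {
            putConfig(CLOCK, clock);
        }
    }

    // Settings compose through interface inheritance, in the style of SigV4Settings.
    interface SigningSettings<B extends ClientSetting<B>> extends ClockSetting<B>, RegionSetting<B> {}

    // A builder that mixes in the composed settings; every value lands in one Ctx.
    static final class Builder implements SigningSettings<Builder> {
        final Ctx context = new Ctx();

        @Override
        public <T> void putConfig(Key<T> key, T value) {
            context.put(key, value);
        }
    }
}
```

Because `REGION` and `CLOCK` are interface constants, they are shared `static final` keys, which is consistent with the rule that keys must never be created per call.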
+ +## Key Context Keys + +### Core Client Keys (`CallContext`) + +| Key | Type | Description | +|-----|------|-------------| +| `ENDPOINT` | `Endpoint` | Resolved endpoint | +| `ENDPOINT_RESOLVER` | `EndpointResolver` | The resolver used | +| `IDENTITY` | `Identity` | Resolved caller identity | +| `RETRY_ATTEMPT` | `Integer` | Current retry attempt (starts at 1) | +| `RETRY_MAX` | `Integer` | Max retries configured | +| `IDEMPOTENCY_TOKEN` | `String` | Token used with the call | +| `FEATURE_IDS` | `Set` | Feature IDs (uses `HashSet::new` copy function) | + +### Client Configuration Keys (`ClientContext`) + +| Key | Type | Description | +|-----|------|-------------| +| `APPLICATION_ID` | `String` | App name for user-agent | +| `API_CALL_TIMEOUT` | `Duration` | Total call timeout including retries | +| `API_CALL_ATTEMPT_TIMEOUT` | `Duration` | Single attempt timeout | + +### HTTP Keys (`HttpContext`) + +| Key | Type | Description | +|-----|------|-------------| +| `HTTP_REQUEST_TIMEOUT` | `Duration` | HTTP-level request timeout | +| `REQUEST_MIN_COMPRESSION_SIZE_BYTES` | `Integer` | Compression threshold | +| `DISABLE_REQUEST_COMPRESSION` | `Boolean` | Disable compression flag | + +### AWS-Specific Keys + +| Key | Type | Location | +|-----|------|----------| +| `RegionSetting.REGION` | `String` | aws-client-core | +| `SigV4Settings.SIGNING_NAME` | `String` | aws-sigv4 | +| `EndpointSettings.USE_DUAL_STACK` | `Boolean` | aws-client-core | +| `EndpointSettings.USE_FIPS` | `Boolean` | aws-client-core | + +### Other Notable Keys + +| Key | Type | Location | +|-----|------|----------| +| `ModelSetting.MODEL` | `Model` | dynamic-client | +| `ServiceIdSetting.SERVICE_ID` | `ShapeId` | dynamic-client | +| `RulesEngineSettings.BYTECODE` | `Bytecode` | rulesengine | +| `EndpointContext.CUSTOM_ENDPOINT` | `Endpoint` | endpoints | diff --git a/docs/technical-guide/documents.md b/docs/technical-guide/documents.md new file mode 100644 index 000000000..0e7be2f66 --- /dev/null +++ 
b/docs/technical-guide/documents.md @@ -0,0 +1,298 @@ +# Documents + +> **Last updated:** April 29, 2026 + +Documents are a first-class abstraction in the Smithy data model: the `document` type represents protocol-agnostic open +content that can hold any Smithy value. In Smithy-Java, the `Document` interface is the Java equivalent: it can +represent any Smithy shape (string, number, list, map, structure, etc.) without requiring a generated type. Documents +are the foundation of the dynamic client and are used throughout the framework for untyped data handling. + +**Source:** + [`core/src/main/java/software/amazon/smithy/java/core/serde/document/`](https://github.com/smithy-lang/smithy-java/tree/main/core/src/main/java/software/amazon/smithy/java/core/serde/document) + +## The Document Interface + +[`Document`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/document/Document.java) +is a Java interface that extends `SerializableShape`. It represents untyped data from the Smithy data model. + +### Core Design Principles + +1. **Protocol-agnostic** — Documents represent the Smithy data model, not any wire format. Protocol codecs smooth over + incompatibilities (e.g., base64-encoded blobs in JSON are transparently decoded by `asBlob()`). + +2. **Dual serialization contract** — `serialize(ShapeSerializer)` always calls `serializer.writeDocument(schema, this)`, + while `serializeContents(ShapeSerializer)` emits the *inner* data model (string, number, struct, etc.). This + two-level design lets serializers choose whether to treat the value as an opaque document or drill into its contents. + +3. **Typed documents** — Documents can wrap a `SerializableShape` via `Document.of(SerializableShape)`, preserving the + full schema so codecs can serialize/deserialize them exactly as if the original shape were used directly.
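To make the dual serialization contract concrete, here is a minimal sketch with hypothetical `Serializer`, `Doc`, `StringDoc`, `ListDoc`, and `Recorder` types. These are not the real `ShapeSerializer` or `Document` APIs, and schemas are omitted. The recording serializer chooses to drill into every document it receives by calling `serializeContents()`; a different codec could instead treat the value as opaque.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, schema-free model of the serialize/serializeContents split.
final class TwoLevelSketch {
    interface Serializer {
        void writeDocument(Doc value);

        void writeString(String value);

        void writeList(List<Doc> values);
    }

    interface Doc {
        // Level 1: announce "I am a document" and let the serializer decide what to do.
        default void serialize(Serializer serializer) {
            serializer.writeDocument(this);
        }

        // Level 2: emit the inner data-model value.
        void serializeContents(Serializer serializer);
    }

    record StringDoc(String value) implements Doc {
        @Override
        public void serializeContents(Serializer serializer) {
            serializer.writeString(value);
        }
    }

    record ListDoc(List<Doc> values) implements Doc {
        @Override
        public void serializeContents(Serializer serializer) {
            serializer.writeList(values);
        }
    }

    // A codec-like serializer that records each call and inlines document contents.
    static final class Recorder implements Serializer {
        final List<String> events = new ArrayList<>();

        @Override
        public void writeDocument(Doc value) {
            events.add("writeDocument");
            value.serializeContents(this); // drill into the contents
        }

        @Override
        public void writeString(String value) {
            events.add("writeString:" + value);
        }

        @Override
        public void writeList(List<Doc> values) {
            events.add("writeList");
            for (Doc element : values) {
                element.serialize(this); // each element re-enters at level 1
            }
        }
    }
}
```

In this sketch, serializing a `ListDoc` of two strings records `writeDocument`, `writeList`, and then a `writeDocument`/`writeString` pair for each element.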
+ +### Key Methods + +```java +// Type identification +ShapeType type(); + +// The two-level serialization contract +void serialize(ShapeSerializer encoder); // calls writeDocument(schema, this) +void serializeContents(ShapeSerializer serializer); // emits inner data model + +// Scalar accessors (throw SerializationException if wrong type) +boolean asBoolean(); +int asInteger(); +long asLong(); +String asString(); +ByteBuffer asBlob(); +Instant asTimestamp(); +// ... and byte, short, float, double, BigInteger, BigDecimal + +// Collection accessors +List<Document> asList(); +Map<String, Document> asStringMap(); + +// Member access (for map/struct/union documents) +Document getMember(String name); +Collection<String> getMemberNames(); +int size(); + +// Recursive unwrap to standard Java types +Object asObject(); + +// Discriminator support (for typed documents) +ShapeId discriminator(); + +// Convert to a generated shape +<T extends SerializableShape> T asShape(ShapeBuilder<T> builder); +``` + +## Document Implementations + +All concrete implementations are **package-private Java records** inside the `Documents` class: + +| Smithy Type | Record | Key Field | +|---|---|---| +| boolean | `BooleanDocument` | `boolean value` | +| byte | `ByteDocument` | `byte value` | +| short | `ShortDocument` | `short value` | +| integer | `IntegerDocument` | `int value` | +| long | `LongDocument` | `long value` | +| float | `FloatDocument` | `float value` | +| double | `DoubleDocument` | `double value` | +| bigInteger/bigDecimal | `NumberDocument` | `Number value` | +| string | `StringDocument` | `String value` | +| blob | `BlobDocument` | `ByteBuffer value` | +| timestamp | `TimestampDocument` | `Instant value` | +| list | `ListDocument` | `List<Document> values` | +| map | `StringMapDocument` | `Map<String, Document> members` | +| structure/union | `StructureDocument` | `Map<String, Document> members` | +| structure (lazy) | `LazyStructure` | `SerializableStruct struct` | +| streaming blob | `DataStreamDocument` | `DataStream stream` | +| streaming union | `EventStreamDocument` | `EventStream
stream` | + +### Numeric Types + +Numeric types use primitive records (`ByteDocument`, `IntegerDocument`, etc.) to avoid autoboxing. All numeric documents +support cross-type casting; for example, `IntegerDocument.asLong()` returns `(long) value`, following the JLS +widening/narrowing rules. `NumberDocument` is used only for `BigInteger` and `BigDecimal` (already objects). + +### StructureDocument + +`StructureDocument` implements both `Document` and `SerializableStruct`, providing `serializeMembers()` and +`getMemberValue()`. It stores members as a `Map<String, Document>`. + +### LazyStructure — The Key Optimization + +When `Document.of(SerializableShape)` is called on a structure, the result is a `LazyStructure` that **defers member +parsing** until `getMember()`, `asStringMap()`, or `getMemberNames()` is called. If the document is only serialized +(never inspected), the original struct's `serialize()` is used directly and no intermediate document tree is created. + +```java +// LazyStructure wraps the original struct (simplified) +class LazyStructure implements Document { + private final SerializableStruct struct; + private volatile StructureDocument materialized; // lazy + + void serializeContents(ShapeSerializer serializer) { + struct.serialize(serializer); // direct delegation, no materialization + } + + Document getMember(String name) { + return materialize().getMember(name); // materializes on first access + } +} +``` + +Materialization uses `DocumentParser.StructureParser` to parse the struct's members into a `StructureDocument`. The `volatile` field ensures thread-safe lazy initialization.
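The lazy-materialization idea can be sketched in isolation. This is a hypothetical, dependency-free model: `Struct` stands in for `SerializableStruct`, and a `BiConsumer` sink stands in for `ShapeSerializer`. The double-checked locking around the `volatile` field is this sketch's own choice and not necessarily how `LazyStructure` guards its field. `materializeCount` exists only to make the laziness observable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical model of "serialize directly, materialize only on inspection".
final class LazySketch {
    // Stand-in for SerializableStruct: pushes each member name/value to a sink.
    interface Struct {
        void serializeMembers(BiConsumer<String, Object> sink);
    }

    static final class LazyDoc {
        private final Struct struct;
        private volatile Map<String, Object> materialized; // null until first member access
        int materializeCount; // demonstration only

        LazyDoc(Struct struct) {
            this.struct = struct;
        }

        // Serialization path: delegate straight to the struct, no intermediate tree.
        void serialize(BiConsumer<String, Object> sink) {
            struct.serializeMembers(sink);
        }

        // Inspection path: build the member map once, then reuse it.
        Object getMember(String name) {
            return materialize().get(name);
        }

        private Map<String, Object> materialize() {
            Map<String, Object> result = materialized;
            if (result == null) {
                synchronized (this) {
                    result = materialized;
                    if (result == null) {
                        Map<String, Object> map = new LinkedHashMap<>();
                        struct.serializeMembers(map::put);
                        materializeCount++;
                        materialized = result = map; // volatile write publishes the full map
                    }
                }
            }
            return result;
        }
    }
}
```

Serializing the document never touches `materialize()`, while repeated member lookups share one materialized map.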
+ +## Creating Documents + +### Static Factory Methods + +**Scalar factories:** +```java +Document.of(true) // → BooleanDocument +Document.of(42) // → IntegerDocument +Document.of("hello") // → StringDocument +Document.of(Instant.now()) // → TimestampDocument +Document.of(ByteBuffer.wrap(bytes)) // → BlobDocument +``` + +**Collection factories:** +```java +Document.of(List.of(Document.of(1), Document.of(2))) // → ListDocument +Document.of(Map.of("key", Document.of("value"))) // → StringMapDocument +``` + +**Streaming factories:** +```java +Document.of(schema, dataStream) // → DataStreamDocument +Document.of(schema, eventStream) // → EventStreamDocument +``` + +### Document.ofObject(Object) — Universal Factory + +Accepts any standard Java type and recursively converts: + +```java +Document.ofObject("hello") // → StringDocument +Document.ofObject(42) // → IntegerDocument +Document.ofObject(List.of("a", "b")) // → ListDocument of StringDocuments +Document.ofObject(Map.of("k", "v")) // → StringMapDocument +Document.ofObject(mySerializableStruct) // → typed document (LazyStructure) +``` + +This is the primary entry point for the dynamic client's `call(String, Map)` method. + +### Document.of(SerializableShape) — Typed Documents + +Captures the full state of any `SerializableShape` into a document tree: + +```java +Document doc = Document.of(myStruct); +// For structures: returns LazyStructure (defers member parsing) +// For other shapes: uses DocumentParser to capture the value +``` + +## The Serialize/SerializeContents Two-Level Pattern + +This is the most important design pattern in the document system. + +``` +serialize(serializer) + → serializer.writeDocument(schema, this) // "I am a document" + → codec decides how to handle it + → typically calls document.serializeContents(serializer) + → serializer.writeString(schema, value) // "My content is a string" +``` + +**Why two levels?** + +- `serialize()` wraps the value as a document (`writeDocument`). 
This tells the codec "this is an opaque document value." +- `serializeContents()` emits the raw data model. This tells the codec "here's what's inside." +- Codecs receive documents via `writeDocument` and can choose to inline the contents via `serializeContents()`. +- This prevents infinite recursion while allowing codecs full access to document internals. + +For example, the JSON codec's `writeDocument` for structures writes a `__type` discriminator field before the contents, +while for scalars it just inlines the value. + +## DocumentParser — Shapes to Documents + +[`DocumentParser`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/document/DocumentParser.java) +implements `ShapeSerializer` and converts any serialized Smithy data model into `Document` objects. It's the inverse of +`serializeContents`. + +```java +// Usage (internal): +var parser = new DocumentParser(); +shape.serialize(parser); +Document result = parser.getResult(); +``` + +Key behavior: +- Each `write*` method creates the corresponding document record and stores it in `result` +- `writeStruct` for STRUCTURE types creates a `LazyStructure`; for unions, it eagerly parses members +- `writeDocument` calls `value.serializeContents(this)` — unwrapping documents recursively +- `writeList` / `writeMap` collect elements via inner serializers + +`DocumentParser.StructureParser` extends `InterceptingSerializer`: +- `before(Schema)` returns the inner `DocumentParser` +- `after(Schema)` stores the parsed result keyed by `schema.memberName()` +- Produces a `Map` of member name → document + +## DocumentDeserializer — Documents to Shapes + +[`DocumentDeserializer`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/document/DocumentDeserializer.java) +implements `ShapeDeserializer` and wraps a `Document`, allowing any document to be deserialized into a generated shape: + +```java +// Convert 
a document to a generated type: +MyStruct struct = document.asShape(MyStruct.builder()); +``` + +Internally, `asShape()` calls `builder.deserialize(new DocumentDeserializer(this))`. + +The deserializer delegates to the document's type accessors: +- `readString(schema)` → `value.asString()` +- `readInteger(schema)` → `value.asInteger()` +- `readStruct(schema, state, consumer)` → iterates `value.getMemberNames()`, resolves member schemas, calls consumer + +**Extensibility**: `DocumentDeserializer` is designed to be extended by codecs. The `deserializer(Document)` factory + method is `protected` so subclasses can return codec-specific deserializers. + +### Discriminator Parsing + +```java +DocumentDeserializer.parseDiscriminator(String text, String defaultNamespace) +``` + +Parses shape IDs from discriminator strings (e.g., `__type` in JSON). Handles both fully-qualified +(`com.example#MyError`) and short-form (`MyError`) discriminators. + +## Equality and Comparison + +### Document.equals(Document left, Document right, int options) + +Compares documents by type and value, ignoring schemas. Supports `DocumentEqualsFlags.NUMBER_PROMOTION` for JLS 5.1.2 +widening comparison. Recursively compares lists element-by-element and maps entry-by-entry. `BigDecimal` comparison uses +`stripTrailingZeros()` for value equality. + +### Document.compare(Document left, Document right) + +Supports numeric and string types only. Uses `DocumentUtils.compareWithPromotion()` for cross-type numeric comparison — +promotes to `BigDecimal` when either operand is `BigInteger`/`BigDecimal`, to `double` when either is `float`/`double`, +otherwise uses `long`. + +## Relationship to the Dynamic Client + +The dynamic client uses documents as its input/output type. When a user calls: + +```java +Document result = client.call("GetItem", Map.of("id", "123")); +``` + +The flow is: +1. `Map.of("id", "123")` is converted to a `Document` via `Document.ofObject()` +2. 
The document is wrapped into a `StructDocument` (from `dynamic-schemas`) via `StructDocument.of(inputSchema, + document, serviceId)`; this recursively wraps every nested value with the correct schema +3. The `StructDocument` implements `SerializableStruct`, so the protocol codec can serialize it with full schema + information +4. The response is deserialized into a `StructDocument` via `SchemaGuidedDocumentBuilder` +5. The user receives it as a `Document` and accesses members via `result.getMember("name").asString()` + +See the [Dynamic Client](dynamic-client.md) document for details on `StructDocument` and +`ContentDocument`. + +## Key Classes Summary + +| Class | Role | +|---|---| +| `Document` | Core interface, untyped Smithy data model value | +| `Documents` | Package-private record implementations for all types | +| `Documents.LazyStructure` | Lazy typed structure wrapping a `SerializableStruct` | +| `DocumentParser` | `ShapeSerializer` that converts shapes → documents | +| `DocumentDeserializer` | `ShapeDeserializer` that reads from documents (extensible) | +| `DocumentUtils` | Number serialization, comparison, member value extraction | +| `DocumentEqualsFlags` | Bitfield flags for equality options | +| `DataStreamDocument` | Document wrapping a `DataStream` | +| `EventStreamDocument` | Document wrapping an `EventStream` | +| `DiscriminatorException` | Exception for missing/invalid discriminators | diff --git a/docs/technical-guide/dynamic-client.md b/docs/technical-guide/dynamic-client.md new file mode 100644 index 000000000..bc709d4c8 --- /dev/null +++ b/docs/technical-guide/dynamic-client.md @@ -0,0 +1,320 @@ +# Dynamic Client + +> **Last updated:** April 29, 2026 + +The dynamic client is a Smithy-Java feature that creates Smithy clients at runtime from a Smithy model, +without code generation. Instead of generated POJOs for inputs and outputs, it uses `Document` objects guided by +`Schema` objects derived from the model.
This makes it possible to call any Smithy service as long as you have its model +— useful for tooling, testing, proxying, and scenarios where codegen isn't practical. + +**Source:** +- [`client/dynamic-client/`](https://github.com/smithy-lang/smithy-java/tree/main/client/dynamic-client) — The + `DynamicClient` class and supporting plugins +- [`dynamic-schemas/`](https://github.com/smithy-lang/smithy-java/tree/main/dynamic-schemas) — Schema conversion and + schema-aware document types + +## DynamicClient + +[`DynamicClient`](https://github.com/smithy-lang/smithy-java/blob/main/client/dynamic-client/src/main/java/software/amazon/smithy/java/dynamicclient/DynamicClient.java) +extends `Client` — it's a full Smithy client inheriting the standard client pipeline (interceptors, auth, transport, +protocol, endpoint resolution). + +### Builder + +```java +DynamicClient client = DynamicClient.builder() + .model(model) // Smithy Model (required) + .serviceId(ShapeId.from("com.example#MyService")) // optional if model has exactly one service + .protocol(new Rpcv2CborProtocol()) // protocol + .transport(new JavaHttpClientTransport()) // transport + .endpointResolver(EndpointResolver.staticEndpoint("https://example.com")) + .build(); +``` + +The builder implements two setting interfaces: +- **`ModelSetting`** — provides `model(Model)`, stores in `Context.Key MODEL` +- **`ServiceIdSetting`** — provides `serviceId(ToShapeId)`, stores in `Context.Key SERVICE_ID` + +### Build Logic + +1. Retrieves the `Model` from context (required) +2. Resolves the `ServiceShape`: + - If `serviceId` is set, looks it up in the model + - If not set, auto-detects if the model contains exactly one service (throws if 0 or >1) +3. Creates a `SchemaConverter(model)` and converts the service shape to a `Schema` +4. Creates an `ApiService` wrapping the service schema +5. Runs `DetectProtocolPlugin` to auto-detect protocol from model traits if not explicitly set +6. 
Builds a `TypeRegistry` for service-level errors + +### Public API — call() Methods + +```java +// No input (empty document) +Document result = client.call("GetItems"); + +// Map input (converted via Document.ofObject) +Document result = client.call("GetItem", Map.of("id", "123")); + +// Document input +Document result = client.call("GetItem", Document.of(Map.of("id", Document.of("123")))); + +// With request override config +Document result = client.call("GetItem", input, overrideConfig); +``` + +All `call()` methods: +1. Resolve the `ApiOperation` by name (lazy, cached in `ConcurrentHashMap`) +2. Convert the input `Document` to a `StructDocument` via `StructDocument.of(inputSchema, input, serviceId)` +3. Delegate to the inherited `Client.call(inputStruct, apiOperation, overrideConfig)` + +### Other Methods + +```java +// Get the raw ApiOperation for advanced use +ApiOperation op = client.getOperation("GetItem"); + +// Create a SerializableStruct from any shape in the model +SerializableStruct struct = client.createStruct(ShapeId.from("com.example#MyStruct"), document); +``` + +## SchemaConverter + +[`SchemaConverter`](https://github.com/smithy-lang/smithy-java/blob/main/dynamic-schemas/src/main/java/software/amazon/smithy/java/dynamicschemas/SchemaConverter.java) converts Smithy `Shape` objects (from `software.amazon.smithy.model`) into runtime `Schema` objects (from `software.amazon.smithy.java.core.schema`). + +```java +SchemaConverter converter = new SchemaConverter(model); +Schema schema = converter.getSchema(shape); +``` + +### Conversion Strategy + +- **Scalar shapes** (string, boolean, integer, etc.) 
— Direct creation via `Schema.createXxx(shapeId, traits)` +- **Aggregate shapes** (structure, union, list, map) — Recursive builder pattern with cycle detection + +### Recursive Schema Handling + +The converter maintains: +- `Map recursiveBuilders` — Detects and handles recursive references +- `Set building` — Tracks shapes currently being built (cycle detection) + +When a cycle is detected (target shape is in `building`), it uses the deferred `SchemaBuilder` reference instead of a completed `Schema`, producing a `DeferredRootSchema` at build time. + +### Caching + +Results are cached in `ConcurrentMap schemas`. The converter also provides a `SchemaIndex` via `getSchemaIndex()` that delegates to this cache. + +### Document Builder Factory + +```java +// Creates a ShapeBuilder that produces StructDocuments +ShapeBuilder builder = SchemaConverter.createDocumentBuilder(schema, serviceId); +``` + +This is what protocols use to deserialize responses into `StructDocument` instances. + +## StructDocument — The Bridge + +[`StructDocument`](https://github.com/smithy-lang/smithy-java/blob/main/dynamic-schemas/src/main/java/software/amazon/smithy/java/dynamicschemas/StructDocument.java) +is the central type that makes the dynamic client work. It implements both `Document` AND `SerializableStruct`, allowing +it to stand in for a generated POJO anywhere in the client pipeline. + +### Key Design Decision + +> StructDocument intentionally breaks the normal Document invariant. Its `serialize()` calls `serializeContents()` +> (which calls `writeStruct`) instead of `writeDocument`. This is because StructDocument is meant to stand in for a +> modeled value, not be serialized as a document. 
+ +### Fields + +```java +private final Schema schema; // The Smithy schema this document represents +private final ShapeId service; // Service ID for namespace resolution +private final Map members; // Member name → Document value +``` + +### Deep Conversion — StructDocument.of() + +```java +StructDocument input = StructDocument.of(inputSchema, document, serviceId); +``` + +This recursively walks the schema tree and the document tree in parallel, wrapping each value with the correct schema: + +- **STRUCTURE** — Iterates schema members, looks up each in the delegate, recursively converts → `StructDocument` +- **UNION** — Same as structure, handles `@streaming` unions as event streams +- **MAP** — Wraps each value with the map's value member schema → `ContentDocument` +- **LIST/SET** — Wraps each element with the list's member schema → `ContentDocument` +- **Scalars** (BOOLEAN, STRING, INTEGER, etc.) — Wraps in `ContentDocument` with the correct schema +- **BLOB with @streaming** — Passes through as `Document.of(schema, delegate.asDataStream())` +- **DOCUMENT type** — Wraps as-is in `ContentDocument` + +### Serialization + +When the protocol serializes a `StructDocument`: +1. `serialize()` calls `serializeContents()` which calls `serializer.writeStruct(schema, this)` +2. `serializeMembers()` iterates member names, looks up the schema member, calls `value.serialize(serializer)` on each +3. The protocol sees correct shape IDs, traits (like `@jsonName`, `@timestampFormat`), and types for every value + +### Shape ID Awareness (Discriminator) + +```java +public ShapeId discriminator() { + return schema.type() == ShapeType.STRUCTURE ? schema.id() : null; +} +``` + +For structures, returns the schema's shape ID. This is used by protocols for error discrimination (e.g., `__type` in AWS +JSON). 
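A protocol typically turns a wire discriminator such as `__type` into an absolute shape ID before looking up an error builder. The sketch below assumes a hypothetical `DiscriminatorSketch` helper and a plain `Map` standing in for the `TypeRegistry`. It handles only the two forms described for `DocumentDeserializer.parseDiscriminator` (fully qualified `com.example#MyError` and short-form `MyError`), not any protocol-specific extras.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; not part of smithy-java.
final class DiscriminatorSketch {
    // Normalize "MyError" or "com.example#MyError" to an absolute "namespace#Name" ID.
    static String toShapeId(String discriminator, String defaultNamespace) {
        if (discriminator.indexOf('#') >= 0) {
            return discriminator; // already fully qualified
        }
        return defaultNamespace + "#" + discriminator;
    }

    // Look up an error "builder" (just a name here) keyed by absolute shape ID.
    static Optional<String> lookup(Map<String, String> registry, String discriminator, String defaultNamespace) {
        return Optional.ofNullable(registry.get(toShapeId(discriminator, defaultNamespace)));
    }
}
```

Normalizing first means the registry only ever needs absolute IDs as keys, regardless of which form appears on the wire.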
+ +## ContentDocument + +[`ContentDocument`](https://github.com/smithy-lang/smithy-java/blob/main/dynamic-schemas/src/main/java/software/amazon/smithy/java/dynamicschemas/ContentDocument.java) is a record that wraps another `Document` and overrides its schema: + +```java +record ContentDocument(Document document, Schema schema) implements Document +``` + +Used for non-structure values (scalars, lists, maps) that need schema association. While `StructDocument` handles +structures/unions, `ContentDocument` handles everything else. + +Key behavior: +- For `DOCUMENT` type: calls `serializer.writeDocument(schema, this)` (preserves document semantics) +- For all other types: calls the appropriate `serializer.writeXxx(schema, value)` with the overridden schema +- All `asXxx()` methods delegate to the wrapped document + +## SchemaGuidedDocumentBuilder + +[`SchemaGuidedDocumentBuilder`](https://github.com/smithy-lang/smithy-java/blob/main/dynamic-schemas/src/main/java/software/amazon/smithy/java/dynamicschemas/SchemaGuidedDocumentBuilder.java) +implements `ShapeBuilder`. This is what protocols use to deserialize responses into `StructDocument` +instances. + +### Deserialization + +The `deserialize(ShapeDeserializer)` method handles every Smithy type: +- **Scalars** — Reads via `decoder.readXxx(schema)`, wraps in `ContentDocument` +- **LIST** — Uses `decoder.readList()` with a `SchemaList` (ArrayList subclass that captures the member schema) +- **MAP** — Uses `decoder.readStringMap()` with a `SchemaMap` (HashMap subclass that captures the schema) +- **STRUCTURE/UNION** — Uses `decoder.readStruct()` to build a `LinkedHashMap`, creates a `StructDocument` +- **Streaming BLOB** — Uses `decoder.readDataStream()` +- **Streaming UNION** — Uses `decoder.readEventStream()` + +### build() + +For unions, throws if no value was set. Returns `new StructDocument(target, map, service)`. 
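The schema-guided idea behind `StructDocument.of()` and `SchemaGuidedDocumentBuilder` (walk the schema and the data together, tagging every value with its schema) can be sketched with a toy schema model. `Schema`, `Scalar`, `ListOf`, `Struct`, and `Tagged` are hypothetical types. The sketch reads from an already-parsed Java `Map`/`List` rather than from a `ShapeDeserializer`, and in this sketch, keys that the schema does not model are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of schema-guided wrapping (not smithy-java's Schema).
final class GuidedSketch {
    sealed interface Schema permits Scalar, ListOf, Struct {}

    record Scalar(String name) implements Schema {}

    record ListOf(Schema member) implements Schema {}

    record Struct(String id, Map<String, Schema> members) implements Schema {}

    // Every produced value remembers the schema that guided it.
    record Tagged(Schema schema, Object value) {}

    static Tagged build(Schema schema, Object raw) {
        if (schema instanceof Scalar) {
            return new Tagged(schema, raw);
        } else if (schema instanceof ListOf list) {
            // Each element is tagged with the list's member schema.
            List<Tagged> items = new ArrayList<>();
            for (Object item : (List<?>) raw) {
                items.add(build(list.member(), item));
            }
            return new Tagged(schema, items);
        } else {
            // Structures: iterate the schema's members, not the input's keys.
            Struct struct = (Struct) schema;
            Map<?, ?> input = (Map<?, ?>) raw;
            Map<String, Tagged> members = new LinkedHashMap<>();
            for (Map.Entry<String, Schema> entry : struct.members().entrySet()) {
                Object value = input.get(entry.getKey());
                if (value != null) {
                    members.put(entry.getKey(), build(entry.getValue(), value));
                }
            }
            return new Tagged(schema, members);
        }
    }
}
```

Driving the walk from the schema is what lets every nested value carry the correct shape information to the protocol layer.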
+ +## DynamicOperation + +[`DynamicOperation`](https://github.com/smithy-lang/smithy-java/blob/main/client/dynamic-client/src/main/java/software/amazon/smithy/java/dynamicclient/DynamicOperation.java) +implements `ApiOperation`. This is the runtime representation of a Smithy operation +without codegen. + +Key fields: +- `operationSchema`, `inputSchema`, `outputSchema` — `Schema` objects converted from the model +- `errorSchemas` — `List` for operation-specific errors +- `typeRegistry` — Composed from operation errors + service errors, used for error deserialization +- `effectiveAuthSchemes` — `List` from `ServiceIndex.getEffectiveAuthSchemes()` + +`inputBuilder()` / `outputBuilder()` return `SchemaGuidedDocumentBuilder` instances. + +## DocumentException + +[`DocumentException`](https://github.com/smithy-lang/smithy-java/blob/main/client/dynamic-client/src/main/java/software/amazon/smithy/java/dynamicclient/DocumentException.java) +extends `ModeledException`. Wraps a `StructDocument` representing the error contents. + +- `getContents()` returns the error as a `Document` for inspection +- `createMessage()` attempts to extract a `message` or `Message` member from the document + +Its inner `SchemaGuidedExceptionBuilder` is registered in the `TypeRegistry` for error deserialization. + +## Plugins + +### DetectProtocolPlugin + +Runs at `FIRST` phase. Uses `ServiceLoader` to discover `ClientProtocolFactory` implementations on the classpath. Reads +protocol traits from the service via `ServiceIndex.of(model).getProtocols(service)`. If a transport is already set, only +picks protocols whose `messageExchange()` matches the transport's. + +### SimpleAuthDetectionPlugin + +Runs at `DEFAULTS` phase. Auto-discovered via SPI. Uses `ServiceLoader` to discover `AuthSchemeFactory` +implementations. Reads effective auth schemes from `ServiceIndex.getEffectiveAuthSchemes()`. Only applies if no +`AuthSchemeResolver` is already configured. 
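`DetectProtocolPlugin`'s selection rule can be approximated as a nested search. In this hypothetical sketch, a plain `List<Factory>` stands in for the factories `ServiceLoader` would discover, and a string `messageExchange` stands in for the real message-exchange type. Walking the model's protocols in declaration order is also an assumption rather than documented behavior.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical approximation of protocol auto-detection.
final class ProtocolDetectionSketch {
    // Stand-in for what a discovered ClientProtocolFactory advertises.
    record Factory(String protocolTraitId, String messageExchange) {}

    // Return the first model-declared protocol that has a discovered factory and,
    // when a transport is already configured (non-null), a matching message exchange.
    static Optional<Factory> detect(List<String> modelProtocols, List<Factory> discovered, String transportExchange) {
        for (String traitId : modelProtocols) {
            for (Factory factory : discovered) {
                boolean traitMatches = factory.protocolTraitId().equals(traitId);
                boolean exchangeOk = transportExchange == null
                        || factory.messageExchange().equals(transportExchange);
                if (traitMatches && exchangeOk) {
                    return Optional.of(factory);
                }
            }
        }
        return Optional.empty();
    }
}
```

An empty result corresponds to the case where no protocol on the classpath can serve the model with the configured transport.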
+ +## Request/Response Flow + +### Sending a Request + +1. User calls `client.call("GetItem", Map.of("id", "123"))` +2. `Map` is converted to `Document` via `Document.ofObject()` +3. `getApiOperation("GetItem")` lazily creates a `DynamicOperation` with schemas from `SchemaConverter` +4. Input `Document` is wrapped into a `StructDocument` via `StructDocument.of(inputSchema, document, serviceId)` — + recursively wraps every nested value with the correct schema +5. The `StructDocument` (which implements `SerializableStruct`) is passed to the protocol layer +6. The protocol serializes it using `serialize()` / `serializeMembers()`, which emit proper schema-tagged values +7. The protocol sees correct shape IDs, traits, and types for every value + +### Receiving a Response + +1. The protocol receives the raw response +2. It calls `apiOperation.outputBuilder()` which returns a `SchemaGuidedDocumentBuilder` +3. The protocol calls `builder.deserialize(decoder)` +4. The builder reads each field using the schema to guide deserialization, wrapping values in `ContentDocument` or + `StructDocument` +5. `builder.build()` returns a `StructDocument` that the user receives as a `Document` + +### Error Handling + +1. The protocol detects an error response +2. It looks up the error type in the `TypeRegistry` (composed from operation + service errors) +3. The registry returns a `DocumentException.SchemaGuidedExceptionBuilder` +4. The builder deserializes the error body into a `StructDocument` +5. 
A `DocumentException` is thrown, which the user can catch and inspect via `getContents()` + +## Differences from Generated Clients + +| Aspect | Generated Client | Dynamic Client | +|--------|-----------------|----------------| +| Types | Generated POJOs per shape | `Document` / `StructDocument` | +| Schema source | Static `Schema` constants in generated code | `SchemaConverter` converts from `Model` at runtime | +| Input construction | Builder pattern on generated types | `Document.ofObject(Map)` or `Document.of(...)` | +| Output access | Typed getters | `document.getMember("name").asString()` | +| Error types | Generated exception classes | `DocumentException` with `getContents()` | +| Protocol detection | Hardcoded in generated code | Auto-detected via `ServiceLoader` + model traits | +| Auth detection | Hardcoded in generated code | Auto-detected via `SimpleAuthDetectionPlugin` | + +## File Layout + +``` +client/dynamic-client/src/main/java/software/amazon/smithy/java/dynamicclient/ +├── DynamicClient.java — Main client class +├── DynamicOperation.java — Runtime operation representation +├── DocumentException.java — Schema-aware exception + builder +├── plugins/ +│ ├── DetectProtocolPlugin.java — Auto-detect protocol from model +│ └── SimpleAuthDetectionPlugin.java — Auto-detect auth schemes +└── settings/ + ├── ModelSetting.java — Mixin for Model config + └── ServiceIdSetting.java — Mixin for ServiceId config + +dynamic-schemas/src/main/java/software/amazon/smithy/java/dynamicschemas/ +├── SchemaConverter.java — Shape → Schema conversion with recursion handling +├── StructDocument.java — Document + SerializableStruct hybrid +├── ContentDocument.java — Schema-wrapping document for non-struct values +└── SchemaGuidedDocumentBuilder.java — ShapeBuilder for deserializing into StructDocument +``` + +## Key Classes Summary + +| Class | Module | Role | +|---|---|---| +| `DynamicClient` | dynamic-client | Main client class, extends `Client` | +| `DynamicOperation` | 
dynamic-client | `ApiOperation` | +| `DocumentException` | dynamic-client | Error wrapper with `getContents()` | +| `DetectProtocolPlugin` | dynamic-client | Auto-detects protocol from model traits | +| `SimpleAuthDetectionPlugin` | dynamic-client | Auto-detects auth schemes | +| `ModelSetting` | dynamic-client | Builder mixin for `Model` | +| `ServiceIdSetting` | dynamic-client | Builder mixin for `ShapeId` | +| `SchemaConverter` | dynamic-schemas | Converts Smithy `Shape` → runtime `Schema` | +| `StructDocument` | dynamic-schemas | Document + SerializableStruct bridge | +| `ContentDocument` | dynamic-schemas | Schema-wrapping decorator for non-struct documents | +| `SchemaGuidedDocumentBuilder` | dynamic-schemas | `ShapeBuilder` for deserializing into `StructDocument` | diff --git a/docs/technical-guide/http.md b/docs/technical-guide/http.md new file mode 100644 index 000000000..6092cfd0f --- /dev/null +++ b/docs/technical-guide/http.md @@ -0,0 +1,264 @@ +# HTTP Layer + +> **Last updated:** April 29, 2026 + +The HTTP layer provides the abstraction between Smithy's protocol-agnostic model and the wire format. It's split into +four modules: core HTTP types, HTTP binding (mapping Smithy traits to HTTP), the Java HttpClient transport, and +client-side HTTP binding protocol support. 
+ +**Source:** +- [`http/http-api/`](https://github.com/smithy-lang/smithy-java/tree/main/http/http-api) — Core HTTP types +- [`http/http-binding/`](https://github.com/smithy-lang/smithy-java/tree/main/http/http-binding) — Smithy HTTP trait + mapping +- [`client/client-http/`](https://github.com/smithy-lang/smithy-java/tree/main/client/client-http) — Java HttpClient + transport +- [`client/client-http-binding/`](https://github.com/smithy-lang/smithy-java/tree/main/client/client-http-binding) — + Client HTTP binding protocol base + +## HTTP Abstraction (`http-api`) + +### HttpRequest and HttpResponse + +Both extend `HttpMessage` and follow an **immutable-by-default, modifiable-on-demand** pattern: + +```java +public interface HttpMessage extends AutoCloseable { + HttpVersion httpVersion(); + HttpHeaders headers(); + DataStream body(); // never null, may be zero-length +} + +public interface HttpRequest extends HttpMessage { + String method(); + SmithyUri uri(); + ModifiableHttpRequest toModifiable(); // returns self if already modifiable + ModifiableHttpRequest toModifiableCopy(); // always copies + HttpRequest toUnmodifiable(); // freezes + static ModifiableHttpRequest create(); +} + +public interface HttpResponse extends HttpMessage { + int statusCode(); + ModifiableHttpResponse toModifiable(); + ModifiableHttpResponse toModifiableCopy(); + HttpResponse toUnmodifiable(); + static ModifiableHttpResponse create(); +} +``` + +Modifiable variants add fluent setters: `setMethod()`, `setUri()`, `setStatusCode()`, `setBody()`, `setHeader()`, `addHeader()`, `removeHeader()`. 
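The immutable/modifiable contract can be illustrated with a toy request type. The classes below (`Request`, `FrozenRequest`, `ModifiableRequest`) are hypothetical and far simpler than `HttpRequest`. They only model the rules stated above: `toModifiable()` returns `this` when the request is already modifiable, `toModifiableCopy()` always copies, and `toUnmodifiable()` returns a snapshot that later mutation cannot affect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the toModifiable()/toModifiableCopy()/toUnmodifiable()
// contract. It is not the smithy-java implementation.
interface Request {
    String method();
    Map<String, String> headers();

    ModifiableRequest toModifiable();     // returns this if already modifiable
    ModifiableRequest toModifiableCopy(); // always returns a new instance
    Request toUnmodifiable();             // returns a frozen snapshot
}

final class FrozenRequest implements Request {
    private final String method;
    private final Map<String, String> headers;

    FrozenRequest(String method, Map<String, String> headers) {
        this.method = method;
        this.headers = Map.copyOf(headers); // defensive, immutable copy
    }

    public String method() { return method; }
    public Map<String, String> headers() { return headers; }
    public ModifiableRequest toModifiable() { return toModifiableCopy(); } // frozen, so it must copy
    public ModifiableRequest toModifiableCopy() { return new ModifiableRequest(method, headers); }
    public Request toUnmodifiable() { return this; } // already frozen
}

final class ModifiableRequest implements Request {
    private String method;
    private final Map<String, String> headers;

    ModifiableRequest(String method, Map<String, String> headers) {
        this.method = method;
        this.headers = new LinkedHashMap<>(headers);
    }

    // Fluent setters in the spirit of setMethod()/setHeader().
    ModifiableRequest setMethod(String method) { this.method = method; return this; }
    ModifiableRequest setHeader(String name, String value) { headers.put(name, value); return this; }

    public String method() { return method; }
    public Map<String, String> headers() { return headers; }
    public ModifiableRequest toModifiable() { return this; }
    public ModifiableRequest toModifiableCopy() { return new ModifiableRequest(method, headers); }
    public Request toUnmodifiable() { return new FrozenRequest(method, headers); }
}
```

Returning `this` from `toModifiable()` lets a caller mutate an already-modifiable request without paying for a copy, while `toModifiableCopy()` is the safe choice when the original must stay untouched.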
+

### HttpHeaders

Case-insensitive header container:

```java
public interface HttpHeaders {
    static HttpHeaders of(Map> headers);         // immutable
    static ModifiableHttpHeaders ofModifiable();                // mutable
    boolean hasHeader(String name);
    String firstValue(String name);
    List allValues(String name);
    void forEachEntry(BiConsumer consumer);     // most efficient iteration
    ModifiableHttpHeaders toModifiable();
    HttpHeaders toUnmodifiable();
}
```

Backed by flat arrays (`ArrayHttpHeaders` / `ArrayUnmodifiableHttpHeaders`), so iterating over all headers is a simple
linear scan of an array.

### HeaderName

Canonical lowercase header-name constants with fast case-insensitive lookup. `HeaderName` predefines constants for 60+
standard HTTP headers. `canonicalize(String)` returns the canonical constant for a known header (an O(1),
length-bucketed lookup) and a lowercased copy of the input for an unknown one. Comparisons use the ASCII `| 0x20`
trick: OR-ing a letter byte with `0x20` sets its lowercase bit, so `A`–`Z` and `a`–`z` compare equal without branching.

### MessageExchange

```java
public final class HttpMessageExchange implements MessageExchange {
    public static final HttpMessageExchange INSTANCE = new HttpMessageExchange();
}
```

A singleton marker that ties a request type to its response type. Both transport and protocol expose
`messageExchange()`, and the pipeline validates that they match.

## HTTP Binding (`http-binding`)

This module maps Smithy `@http*` traits to HTTP request/response components. It is protocol-agnostic: body
serialization is delegated to a `Codec`.
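The split between binding and body encoding can be sketched as a router that writes header and query members itself and passes only the remaining body members to an opaque codec. Everything here (`Location`, `Member`, `SketchRequest`, `BindingSketch`) is hypothetical and uses strings for values; the real serializer works on schemas and a `Codec`, not maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical binding locations; the real BindingMatcher has more (LABEL, PAYLOAD, ...).
enum Location { HEADER, QUERY, BODY }

record Member(String name, Location location) {}

record SketchRequest(Map<String, String> headers, Map<String, String> query, String body) {}

final class BindingSketch {
    private BindingSketch() {}

    static SketchRequest serialize(
            List<Member> members,
            Map<String, String> values,
            Function<Map<String, String>, String> codec) { // e.g. a JSON or CBOR encoder
        Map<String, String> headers = new LinkedHashMap<>();
        Map<String, String> query = new LinkedHashMap<>();
        Map<String, String> bodyMembers = new LinkedHashMap<>();
        for (Member m : members) {
            String v = values.get(m.name());
            if (v == null) {
                continue; // unset members are omitted
            }
            switch (m.location()) {
                case HEADER -> headers.put(m.name(), v);
                case QUERY -> query.put(m.name(), v);
                case BODY -> bodyMembers.put(m.name(), v);
            }
        }
        // Only BODY-bound members reach the codec; the binding layer never sees the wire format.
        return new SketchRequest(headers, query, codec.apply(bodyMembers));
    }
}
```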
+ +### Entry Point + +```java +public final class HttpBinding { + public RequestSerializer requestSerializer(); + public ResponseSerializer responseSerializer(); + public RequestDeserializer requestDeserializer(); + public ResponseDeserializer responseDeserializer(); +} +``` + +### BindingMatcher + +Pre-computes the binding location for each member of a structure schema using `member.memberIndex()` for O(1) lookup: + +```java +enum Binding { HEADER, QUERY, PAYLOAD, BODY, LABEL, STATUS, PREFIX_HEADERS, QUERY_PARAMS } +``` + +Trait-to-binding mapping: +- `@httpLabel` → LABEL (URI path segment) +- `@httpQuery` → QUERY (query parameter) +- `@httpQueryParams` → QUERY_PARAMS (map of query params) +- `@httpHeader` → HEADER +- `@httpPrefixHeaders` → PREFIX_HEADERS (map of headers with prefix) +- `@httpPayload` → PAYLOAD (entire body) +- `@httpResponseCode` → STATUS +- No trait → BODY (serialized in body alongside other BODY members) + +`BindingMatcher` instances are cached per `Schema` in `ConcurrentHashMap`s. + +### Request Serialization Flow + +```java +httpBinding.requestSerializer() + .operation(operation) + .payloadCodec(codec) + .payloadMediaType("application/json") + .shapeValue(input) + .endpoint(endpoint) + .omitEmptyPayload(true) + .serializeRequest(); // → HttpRequest +``` + +Internally: +1. Gets `BindingMatcher` from cache +2. Creates `HttpBindingSerializer` with the `HttpTrait`, codec, matcher +3. Calls `shapeValue.serialize(serializer)` → triggers `writeStruct()` +4. Each member is routed to the appropriate sub-serializer: + - HEADER → `HttpHeaderSerializer` + - QUERY → `HttpQuerySerializer` (writes to `QueryStringBuilder`) + - LABEL → `HttpLabelSerializer` (writes to labels map) + - PAYLOAD → `PayloadSerializer` (handles DataStream, EventStream, or codec) + - BODY → serialized via codec (filtered to only BODY-bound members) +5. Builds URI from endpoint + path labels + query string +6. 
Constructs `HttpRequest` with method, URI, headers, body + +### Response Deserialization Flow + +```java +httpBinding.responseDeserializer() + .payloadCodec(codec) + .outputShapeBuilder(builder) + .response(response) + .deserialize(); +``` + +Dispatches each member by binding location: +- HEADER → `HttpHeaderDeserializer` +- STATUS → `ResponseStatusDeserializer` +- PAYLOAD → handles event streams, streaming DataStream, or codec-deserialized shapes +- BODY → codec-deserialized (filtered to only BODY-bound members) + +## Java HttpClient Transport (`client-http`) + +### JavaHttpClientTransport + +The primary transport implementation, wrapping `java.net.http.HttpClient`: + +```java +public final class JavaHttpClientTransport implements ClientTransport { + public JavaHttpClientTransport(); + public JavaHttpClientTransport(HttpClient client); + public HttpResponse send(Context context, HttpRequest request); +} +``` + +Request construction: +- Creates `BodyPublisher`: `noBody()` for empty, `ofByteArray()` for known-length, `ofInputStream()` for streaming +- Applies per-request timeout from `HttpContext.HTTP_REQUEST_TIMEOUT` +- Copies headers via `forEachEntry()`, skipping `content-length` (JDK manages it) + +Response parsing: +- Converts `java.net.http.HttpResponse` to Smithy `HttpResponse` +- Wraps response body as `DataStream.ofInputStream()` + +Error mapping: `HttpConnectTimeoutException` → `ConnectTimeoutException`, other exceptions via `ClientTransport.remapExceptions()`. + +Discovered via SPI (`ClientTransportFactory`, name `"http-java"`). + +### HttpClientProtocol + +Abstract base for HTTP-based protocols: + +```java +public abstract class HttpClientProtocol implements ClientProtocol { + public MessageExchange messageExchange(); // → HttpMessageExchange.INSTANCE + public HttpRequest setServiceEndpoint(HttpRequest request, Endpoint endpoint); +} +``` + +`setServiceEndpoint()` merges the endpoint URI with the request URI and copies endpoint-provided HTTP headers. 
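One plausible way to merge an endpoint with a serialized request URI is sketched below using `java.net.URI`. The rule it applies is an assumption, not necessarily what `HttpClientProtocol.setServiceEndpoint()` does: scheme and authority come from the endpoint, the endpoint's base path is prefixed to the request path with a single `/` between them, and the request's raw query string is kept. Copying endpoint headers is omitted, and `EndpointMerge` is a hypothetical name.

```java
import java.net.URI;

// Hypothetical helper illustrating one endpoint/request URI merge rule.
final class EndpointMerge {
    private EndpointMerge() {}

    static URI merge(URI endpoint, URI request) {
        String basePath = endpoint.getRawPath() == null ? "" : endpoint.getRawPath();
        String requestPath = request.getRawPath() == null ? "" : request.getRawPath();

        // Join the two paths with exactly one slash between them.
        if (basePath.endsWith("/") && requestPath.startsWith("/")) {
            requestPath = requestPath.substring(1);
        } else if (!basePath.endsWith("/") && !requestPath.startsWith("/") && !requestPath.isEmpty()) {
            requestPath = "/" + requestPath;
        }
        String path = basePath + requestPath;
        if (path.isEmpty()) {
            path = "/";
        }

        // Build from raw (still percent-encoded) components so nothing is encoded twice.
        StringBuilder sb = new StringBuilder()
                .append(endpoint.getScheme())
                .append("://")
                .append(endpoint.getRawAuthority())
                .append(path);
        if (request.getRawQuery() != null) {
            sb.append('?').append(request.getRawQuery());
        }
        return URI.create(sb.toString());
    }
}
```

Working on the raw path and query avoids double-encoding values such as `%20`, which would happen if decoded components were passed to `URI`'s multi-argument constructors (they always quote `%`).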
+ +### HttpBindingClientProtocol + +Ties together `HttpClientProtocol` + `HttpBinding` for REST-style protocols: + +```java +public abstract class HttpBindingClientProtocol> extends HttpClientProtocol { + abstract protected String payloadMediaType(); + abstract protected Codec payloadCodec(); + abstract protected HttpErrorDeserializer getErrorDeserializer(Context context); +} +``` + +Default `createRequest()` delegates to `HttpBinding.requestSerializer()`. Default `deserializeResponse()` checks status +(200-299 = success), uses `HttpBinding.responseDeserializer()` for success, `HttpErrorDeserializer` for errors. + +### HttpErrorDeserializer + +Configurable error deserialization pipeline: + +```java +HttpErrorDeserializer.builder() + .codec(codec) + .headerErrorExtractor(extractor) // check headers for error type + .knownErrorFactory(factory) // deserialize known errors + .unknownErrorFactory(factory) // create unknown errors + .errorPayloadParser(parser) // extract error ShapeId from payload + .build(); +``` + +Resolution order: headers → payload parsing → fallback to HTTP status code. + +### Plugins + +- `UserAgentPlugin` — Sets User-Agent header +- `HttpChecksumPlugin` — Computes/validates HTTP checksums +- `RequestCompressionPlugin` — Gzip compression for request bodies +- `ApplyHttpRetryInfoPlugin` — Adds retry info headers + +### Built-in Auth Schemes + +- `HttpBearerAuthScheme` / `HttpBearerAuthSigner` +- `HttpBasicAuthAuthScheme` / `HttpBasicAuthSigner` +- `HttpApiKeyAuthScheme` / `HttpApiKeyAuthSigner` +- `HttpDigestAuthAuthScheme` / `HttpDigestAuthSigner` + +## Key Design Patterns + +1. **Immutable/Modifiable duality** — All HTTP messages have immutable and modifiable variants. `toModifiable()` returns + self if already modifiable; `toModifiableCopy()` always copies. + +2. **Binding caching** — `BindingMatcher` instances are cached per `Schema` to avoid re-computing trait lookups on every + request. + +3. 
**Codec delegation** — The HTTP binding layer is protocol-agnostic. Body serialization is delegated to a `Codec` + (JSON, CBOR, XML). + +4. **InterceptingSerializer pattern** — `HttpBindingSerializer` routes each struct member to the correct sub-serializer + based on its binding location. + +5. **MessageExchange as type witness** — Ensures type-safe pairing of transport and protocol. diff --git a/docs/technical-guide/index.md b/docs/technical-guide/index.md new file mode 100644 index 000000000..b274cb9e6 --- /dev/null +++ b/docs/technical-guide/index.md @@ -0,0 +1,120 @@ +# Smithy-Java Knowledge Center + +> **Last updated:** April 29, 2026 + +A collection of technical documents covering the critical subsystems of Smithy-Java. Each document provides enough depth +to understand, maintain, and extend the subsystem, including architecture, key classes, design patterns, and links to +source code. + +**Repository:** [github.com/smithy-lang/smithy-java](https://github.com/smithy-lang/smithy-java) + +## Suggested Reading Order + +The documents are organized in dependency order. Start with the foundations, then work through the client stack, and +finish with the specialized systems. + +### Tier 1, Foundations + +Read these first. Every other document assumes familiarity with these +concepts. + +| # | Document | Summary | +|---|----------|---------| +| 1 | [Context System](context.md) | The `Context` typed key-value store that threads through the entire pipeline. Covers `Context.Key`, chunked array storage, copy semantics, and the settings pattern. | +| 2 | [Schemas](schemas.md) | The `Schema` sealed class hierarchy, runtime representations of Smithy shapes. Covers lists/maps, recursive schemas, presence tracking, validation, trait storage (`TraitKey`/`TraitMap`), and the extension system. | +| 3 | [I/O and Streaming](io.md) | `DataStream` and `EventStream` abstractions. 
Covers frame encoding/decoding, `EventPipeStream` producer-consumer bridge, and how streaming flows through the pipeline. | + +### Tier 2, Serialization + +These build on Schemas and I/O. + +| # | Document | Summary | Depends On | +|---|----------|---------|------------| +| 4 | [Codecs](codecs.md) | `Codec`, `ShapeSerializer`, `ShapeDeserializer` interfaces. JSON (native high-perf impl with speculative matching), CBOR (canonicalizer), XML (StAX with flattened member buffering). | Schemas | +| 5 | [Documents](documents.md) | The `Document` interface for untyped Smithy data. The serialize/serializeContents two-level pattern, `LazyStructure` optimization, `DocumentParser`/`DocumentDeserializer`. | Schemas, Codecs | + +### Tier 3, Client Stack + +Read these in order. Each builds on the previous. + +| # | Document | Summary | Depends On | +|---|----------|---------|------------| +| 6 | [HTTP Layer](http.md) | HTTP abstraction (`HttpRequest`/`HttpResponse`), HTTP binding (mapping `@http*` traits), Java HttpClient transport, `HttpBindingClientProtocol`. | Schemas, Codecs, I/O | +| 7 | [Auth System](auth.md) | `Identity`, `IdentityResolver`, `Signer`, `AuthScheme`, `AuthSchemeResolver`. SigV4 implementation as concrete example. | Context, HTTP | +| 8 | [Client Core](client-core.md) | `Client` base class, plugin system (`ClientPlugin`, phases, `AutoClientPlugin`), interceptor system (~20 hooks), the full call pipeline, transport/protocol abstractions, `RequestOverrideConfig`. | Context, Schemas, HTTP, Auth | +| 9 | [Retries and Waiters](retries-waiters.md) | Token-based retry strategy (Standard, Adaptive with CUBIC rate limiting), token buckets, waiter framework with JMESPath matchers, waiter codegen. | Client Core | + +### Tier 4, Protocol Implementations + +These can be read independently after the client stack. 
+ +| # | Document | Summary | Depends On | +|---|----------|---------|------------| +| 10 | [AWS Protocol Integrations](aws-protocols.md) | AWS JSON 1.0/1.1, restJson1, restXml, AWS Query protocols. Event streaming infrastructure. Comparison table. | HTTP, Codecs, Client Core | + +### Tier 5, Advanced Systems + +These can be read in any order after the client stack. + +| # | Document | Summary | Depends On | +|---|----------|---------|------------| +| 11 | [Dynamic Client](dynamic-client.md) | `DynamicClient`, creating clients at runtime from Smithy models. `SchemaConverter`, `StructDocument` bridge, `ContentDocument`, `SchemaGuidedDocumentBuilder`. | Client Core, Documents, Schemas | +| 12 | [Rules Engine](rules-engine.md) | Endpoint resolution via BDD transformation + bytecode VM. Compilation pipeline, fused opcodes, inline condition optimization, register allocation, context providers. | Client Core | +| 13 | [Server](server.md) | Server-side framework. `Service`/`Operation` interfaces, handler pipeline, Netty integration, protocol SPI, orchestrator hierarchy, server stub codegen. Current limitations. | Schemas, Codecs, HTTP | +| 14 | [MCP](mcp.md) | Model Context Protocol integration. Converting Smithy operations to MCP tools, JSON Schema generation, stdio/HTTP transports, proxy aggregation, `@prompts` trait. | Server, Documents | + +### Tier 6, Code Generation + +Best read after understanding what the generated code looks like (from the client/server docs). + +| # | Document | Summary | Depends On | +|---|----------|---------|------------| +| 15 | [Code Generation](codegen.md) | The `java-codegen` plugin. Modes (client/server/types), directed codegen pattern, `JavaWriter` format patterns (`$C`, `$T`), synthetic service trick, integration system, type generators. 
| Schemas, Client Core, Server | + +## Dependency Graph + +``` +Context ─────────────────────────────────────────────────┐ + │ │ +Schemas ──────────────────────────────────┐ │ + │ │ │ +I/O ──────────────────────┐ │ │ + │ │ │ │ +Codecs ───────────┐ │ │ │ + │ │ │ │ │ +Documents │ │ │ │ + │ │ │ │ │ + │ HTTP Layer ──┤ │ │ + │ │ │ │ │ + │ Auth System ─┤ │ │ + │ │ │ │ │ + │ Client Core ─┼───────────────┤ │ + │ │ │ │ │ + │ Retries/Waiters │ │ + │ │ │ │ + │ AWS Protocols │ │ + │ │ │ +Dynamic Client Server ─── MCP │ + │ │ +Rules Engine Code Generation ────┘ +``` + +## Quick Reference + +| Document | Key Classes | Module(s) | +|----------|-------------|-----------| +| Context | `Context`, `Context.Key` | `context` | +| Schemas | `Schema`, `SchemaBuilder`, `TraitKey`, `TraitMap`, `PresenceTracker`, `Validator` | `core` | +| I/O | `DataStream`, `EventStream`, `EventPipeStream`, `SmithyUri` | `io`, `core/serde/event` | +| Codecs | `Codec`, `ShapeSerializer`, `ShapeDeserializer`, `JsonCodec`, `Rpcv2CborCodec`, `XmlCodec` | `codecs/*` | +| Documents | `Document`, `DocumentParser`, `DocumentDeserializer` | `core/serde/document` | +| HTTP | `HttpRequest`, `HttpResponse`, `HttpBinding`, `JavaHttpClientTransport` | `http/*`, `client/client-http*` | +| Auth | `AuthScheme`, `AuthSchemeResolver`, `Signer`, `Identity`, `SigV4AuthScheme` | `auth-api`, `client/client-auth-api`, `aws/aws-sigv4` | +| Client Core | `Client`, `ClientPipeline`, `ClientPlugin`, `ClientInterceptor`, `ClientProtocol`, `ClientTransport` | `client/client-core` | +| Retries/Waiters | `RetryStrategy`, `StandardRetryStrategy`, `AdaptiveRetryStrategy`, `Waiter`, `Matcher` | `retries*`, `client/client-waiters` | +| AWS Protocols | `AwsJson1Protocol`, `RestJsonClientProtocol`, `RestXmlClientProtocol`, `AwsQueryClientProtocol` | `aws/client/*` | +| Dynamic Client | `DynamicClient`, `SchemaConverter`, `StructDocument`, `ContentDocument` | `client/dynamic-client`, `dynamic-schemas` | +| Rules Engine | `RulesEngineBuilder`, 
`Bytecode`, `BytecodeEvaluator`, `BytecodeEndpointResolver` | `rules-engine` | +| Server | `Service`, `Operation`, `Server`, `ServerProtocol`, `Handler`, `Orchestrator` | `server/*` | +| MCP | `McpServer`, `McpService`, `StdioProxy`, `HttpMcpProxy` | `mcp/*` | +| Code Generation | `JavaCodegenPlugin`, `DirectedJavaCodegen`, `JavaWriter`, `StructureGenerator` | `codegen/*` | diff --git a/docs/technical-guide/io.md b/docs/technical-guide/io.md new file mode 100644 index 000000000..f108777ee --- /dev/null +++ b/docs/technical-guide/io.md @@ -0,0 +1,238 @@ +# I/O and Streaming + +> **Last updated:** April 29, 2026 + +The I/O module provides the foundational abstractions for binary data transfer in Smithy-Java: `DataStream` for byte +streams and `EventStream` for typed event sequences. These abstractions unify how streaming shapes flow through the +client/server pipeline — the transport layer only sees `DataStream`, regardless of whether the payload is a simple blob +or a sequence of modeled events. + +**Source:** [`io/`](https://github.com/smithy-lang/smithy-java/tree/main/io), + [`core/src/main/java/software/amazon/smithy/java/core/serde/event/`](https://github.com/smithy-lang/smithy-java/tree/main/core/src/main/java/software/amazon/smithy/java/core/serde/event) + +## DataStream + +[`DataStream`](https://github.com/smithy-lang/smithy-java/blob/main/io/src/main/java/software/amazon/smithy/java/io/datastream/DataStream.java) +is the central abstraction for reading streams of binary data. It extends `Flow.Publisher` and +`AutoCloseable`. 
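To show what the `Flow.Publisher` side of `DataStream` implies, here is a minimal synchronous publisher over a `ByteBuffer`, assuming `ByteBuffer` as the element type. It borrows the replay idea from `ByteBufferDataStream` (a fresh `duplicate()` per subscriber), but `ReplayableBufferPublisher` is a hypothetical sketch: it emits the whole buffer as one chunk and does not implement every Reactive Streams rule (for example, it calls `onNext` re-entrantly from `request`).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical replayable publisher: every subscriber gets an independent view of the bytes.
final class ReplayableBufferPublisher implements Flow.Publisher<ByteBuffer> {
    private final ByteBuffer source;

    ReplayableBufferPublisher(ByteBuffer source) {
        this.source = source.asReadOnlyBuffer(); // callers cannot mutate the shared bytes
    }

    @Override
    public void subscribe(Flow.Subscriber<? super ByteBuffer> subscriber) {
        ByteBuffer view = source.duplicate(); // own position/limit, so replay starts from the beginning
        AtomicBoolean done = new AtomicBoolean();
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                if (n <= 0) {
                    if (done.compareAndSet(false, true)) {
                        subscriber.onError(new IllegalArgumentException("demand must be positive"));
                    }
                    return;
                }
                if (done.compareAndSet(false, true)) {
                    subscriber.onNext(view); // single chunk containing all bytes
                    subscriber.onComplete();
                }
            }

            @Override
            public void cancel() {
                done.set(true);
            }
        });
    }
}
```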
+ +```java +public interface DataStream extends Flow.Publisher, AutoCloseable { + long contentLength(); // -1 if unknown + boolean hasKnownLength(); + String contentType(); // MIME type or null + boolean isReplayable(); // can re-read from beginning + boolean isAvailable(); // hasn't been consumed yet (or is replayable) + InputStream asInputStream(); // blocking InputStream view + void writeTo(OutputStream out); // transfer to OutputStream + ByteBuffer asByteBuffer(); // read all into memory +} +``` + +### Factory Methods + +```java +DataStream.ofEmpty() // singleton empty stream +DataStream.ofString("hello", "text/plain") // from string +DataStream.ofBytes(bytes, "application/octet-stream") // from byte array +DataStream.ofByteBuffer(buffer, "application/cbor") // from ByteBuffer +DataStream.ofInputStream(is, contentType, contentLength) // from InputStream +DataStream.ofFile(path, contentType) // from file +DataStream.ofPublisher(publisher, contentType, length, replayable) // from reactive publisher +``` + +### Implementations + +| Class | Replayable | Known Length | Notes | +|-------|-----------|-------------|-------| +| `EmptyDataStream` | ✅ | ✅ (0) | Singleton | +| `ByteBufferDataStream` | ✅ | ✅ | Uses `buffer.duplicate()` for replay | +| `FileDataStream` | ✅ | ✅ | Opens new `InputStream` per read | +| `InputStreamDataStream` | ❌ | Depends | Tracks `consumed` flag, throws on re-read | +| `PublisherDataStream` | Configurable | Configurable | Wraps `Flow.Publisher` | +| `WrappedDataStream` | Delegates | Overridden | Decorator for metadata changes | + +DataStream is used as the body type for all HTTP messages. Event streams are converted to `DataStream` via `EventPipeStream`, unifying the transport layer. + +### Replayability and Retries + +`isReplayable()` is critical for retry decisions. In `ClientCall.isRetryDisallowed()`, if the input has a non-replayable +data stream (blob streaming), retries are prevented. 
Event streams (union type) are excluded from this check. + +## EventStream + +[`EventStream`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/serde/event/EventStream.java) +is a sealed interface for typed event sequences: + +``` +EventStream (sealed) +├── EventStreamReader — reading events (implements Iterable) +│ └── ProtocolEventStreamReader — protocol-level reader +│ └── DefaultEventStreamReader +└── EventStreamWriter — writing events + └── ProtocolEventStreamWriter — protocol-level writer + └── DefaultEventStreamWriter +``` + +### EventStreamWriter — Writing Events + +```java +public sealed interface EventStreamWriter extends EventStream { + void write(T event); // blocks until write is possible +} +``` + +The protocol-level writer adds internal methods: +- `toDataStream()` — converts writer to a DataStream for wire transport +- `setEventEncoderFactory(EventEncoderFactory)` — configures encoding +- `setInitialEvent(IE)` — sets the first event +- `addFrameProcessor(FrameProcessor)` — adds signing/transformation +- `activate()` — finalizes setup, unblocks writes + +### EventStreamReader — Reading Events + +```java +public sealed interface EventStreamReader extends Iterable, EventStream { + T read(); // returns null at end of stream +} +``` + +### Two-Phase Initialization + +Writers use a two-phase initialization pattern: + +1. **Configuration phase**: `setEventEncoderFactory()` → `setInitialEvent()` → `addFrameProcessor()` +2. **Activation**: `activate()` unblocks the `readyLatch`, allowing user writes + +This is necessary because the protocol and auth layers must configure encoding and signing before any events can be +written. A `CountDownLatch` blocks user writes until `activate()` is called. 
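The configure-then-activate gate can be sketched with a `CountDownLatch`. In this hypothetical `GatedEventWriter`, events are rendered as strings and a frame processor is a `UnaryOperator<String>` standing in for something like an event signer. `write()` blocks until `activate()` counts the latch down, so processors added before `activate()` are guaranteed to see the first event.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the two-phase writer; not the DefaultEventStreamWriter code.
final class GatedEventWriter<T> {
    private final CountDownLatch readyLatch = new CountDownLatch(1);
    private final List<String> wire = new CopyOnWriteArrayList<>(); // stands in for the transport
    private volatile UnaryOperator<String> frameProcessor = UnaryOperator.identity();

    // Configuration phase: chain another processor after the existing ones.
    void addFrameProcessor(UnaryOperator<String> processor) {
        UnaryOperator<String> previous = frameProcessor;
        frameProcessor = frame -> processor.apply(previous.apply(frame));
    }

    // Activation: releases every writer thread waiting in write().
    void activate() {
        readyLatch.countDown();
    }

    void write(T event) {
        try {
            readyLatch.await(); // blocks until activate() has been called
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for activation", e);
        }
        wire.add(frameProcessor.apply(String.valueOf(event)));
    }

    List<String> wire() {
        return wire;
    }
}
```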
+ +## Frame Encoding/Decoding Architecture + +Events are encoded into frames for wire transport: + +```java +interface Frame { T unwrap(); } +interface FrameEncoder> { ByteBuffer encode(F frame); } +interface FrameDecoder> { List decode(ByteBuffer buffer); } // stateful +interface FrameProcessor> { + F transformFrame(F frame); // e.g., signing + default F closingFrame(); // e.g., SigV4 trailing frame +} +``` + +Event encoding/decoding: + +```java +interface EventEncoder> { + F encode(SerializableStruct item); + F encodeFailure(Throwable exception); +} +interface EventDecoder> { + SerializableStruct decode(F frame); + SerializableStruct decodeInitialEvent(F frame, EventStream stream); +} +``` + +`ProcessingEventEncoderFactory` is a decorator that wraps factories with `FrameProcessor` instances (e.g., for SigV4 +event signing). Multiple processors compose via chained `withFrameProcessor()` calls. + +### EventPipeStream + +`EventPipeStream` is the thread-safe bridge between the event-writing thread and the transport-reading thread: + +- `ArrayBlockingQueue` with capacity 16 +- `write(ByteBuffer)` blocks if queue full +- `complete()` sends a `POISON_PILL` sentinel (empty ByteBuffer) +- `read()` blocks on `queue.take()`, returns -1 at POISON_PILL + +## How Streaming Flows Through the Pipeline + +### Client-Side Request (Event Streaming) + +1. `Client.call()` extracts the `EventStream` from the input and casts to `ProtocolEventStreamWriter` +2. Protocol serializes input, configures the writer with `EventEncoderFactory` +3. `writer.toDataStream()` creates a `DataStream` backed by `EventPipeStream` +4. After signing, `writer.addFrameProcessor(eventSigner)` adds SigV4 event signing +5. `writer.activate()` unblocks writes, sends initial event +6. Transport reads from the `DataStream` while user writes events on another thread + +### Client-Side Response (Event Streaming) + +1. Protocol creates `ProtocolEventStreamReader.newReader(body, eventDecoderFactory, false)` +2. 
User iterates over `EventStreamReader` using `read()` or `iterator()` +3. Reader reads from `DataStream.asInputStream()`, decodes frames, decodes events + +### Data Stream (Simpler) + +- **Serialization**: `writeDataStream()` sets the DataStream directly as the HTTP body +- **Deserialization**: `readDataStream()` returns the response body DataStream directly +- No encoding/decoding pipeline — raw bytes flow through + +## Schema Integration + +Streaming shapes are identified by the `@streaming` trait: + +```java +// In ApiOperation: +default Schema inputStreamMember() { + for (var m : inputSchema().members()) { + if (m.hasTrait(TraitKey.STREAMING_TRAIT)) return m; + } + return null; +} +``` + +The distinction between data streams and event streams is made by checking `Schema.type()`: +- `ShapeType.UNION` → event stream +- Otherwise → data stream (blob) + +## Codec Integration + +```java +// ShapeSerializer +default void writeDataStream(Schema schema, DataStream value) { /* no-op */ } +default void writeEventStream(Schema schema, EventStream value) { /* no-op */ } + +// ShapeDeserializer +default DataStream readDataStream(Schema schema) { throw UnsupportedOperationException; } +default EventStream readEventStream(Schema schema) { throw UnsupportedOperationException; } +``` + +HTTP binding serialization routes streaming members specially — `DataStream` is set as the HTTP body directly, while `EventStream` is converted via the writer pipeline. 
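The streaming rules in this document (find the `@streaming` member, treat a union target as an event stream and anything else as a data stream, and disallow retries only for non-replayable data streams) can be sketched with simplified types. `Kind`, `MemberSchema`, `StreamKind`, and `StreamingClassifier` are hypothetical; the real code inspects `Schema.type()` and `TraitKey.STREAMING_TRAIT`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for ShapeType and member schemas.
enum Kind { BLOB, UNION, STRUCTURE, STRING }

record MemberSchema(String name, Kind targetType, boolean streaming) {}

enum StreamKind { NONE, DATA_STREAM, EVENT_STREAM }

final class StreamingClassifier {
    private StreamingClassifier() {}

    // Mirrors the idea of inputStreamMember(): the first member carrying @streaming, if any.
    static Optional<MemberSchema> streamMember(List<MemberSchema> members) {
        return members.stream().filter(MemberSchema::streaming).findFirst();
    }

    // A streaming union is an event stream; any other streaming member (a blob) is a data stream.
    static StreamKind classify(List<MemberSchema> members) {
        return streamMember(members)
                .map(m -> m.targetType() == Kind.UNION ? StreamKind.EVENT_STREAM : StreamKind.DATA_STREAM)
                .orElse(StreamKind.NONE);
    }

    // Only a non-replayable blob stream blocks retries; event streams are excluded from the check.
    static boolean retryDisallowed(List<MemberSchema> members, boolean bodyReplayable) {
        return classify(members) == StreamKind.DATA_STREAM && !bodyReplayable;
    }
}
```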
+ +## I/O Utilities + +### ByteBufferUtils + +```java +static String base64Encode(ByteBuffer buffer); +static String getUTF8String(ByteBuffer buffer); +static byte[] getBytes(ByteBuffer buffer); // zero-copy if array-backed and exact +static InputStream byteBufferInputStream(ByteBuffer buffer); +``` + +### ByteBufferOutputStream + +Growable `OutputStream` backed by a `byte[]` array: + +```java +ByteBuffer toByteBuffer(); // wraps internal array (no copy) +void writeTo(OutputStream out); // direct write from internal array +void reset(); // reuse buffer without reallocation +``` + +### SmithyUri + +Lightweight, pre-decomposed URI representation optimized for endpoint-style URIs. Stores scheme, host, port, path, query +separately. Lazy caching of `toString()`, `hashCode()`, `toURI()`. Key methods: `withScheme()`, `withHost()`, +`withPath()`, `withConcatPath()`, `withEndpoint()`. + +### URLEncoding + +RFC 3986 percent-encoding: `encodeUnreserved(String, boolean preserveSlashes)` and `urlDecode(String)`. + +### QueryStringBuilder / QueryStringParser + +Build and parse query strings with proper encoding. `QueryStringBuilder.addForQueryParams()` skips keys already added +via `add()` (httpQuery takes precedence). diff --git a/docs/technical-guide/mcp.md b/docs/technical-guide/mcp.md new file mode 100644 index 000000000..75f3df8b6 --- /dev/null +++ b/docs/technical-guide/mcp.md @@ -0,0 +1,228 @@ +# MCP (Model Context Protocol) + +> **Last updated:** April 29, 2026 + +MCP is a JSON-RPC 2.0-based protocol that enables AI/LLM clients (like Claude Desktop, VS Code Copilot) to discover and +invoke "tools" exposed by servers. Smithy-Java's MCP integration bridges Smithy service models to MCP by automatically +converting Smithy operations into MCP tools with JSON Schema descriptions. The system is marked **developer-preview** +(`@SmithyUnstableApi`). 
+
+**Source:**
+- [`mcp/mcp-server/`](https://github.com/smithy-lang/smithy-java/tree/main/mcp/mcp-server) — MCP server runtime
+- [`mcp/mcp-schemas/`](https://github.com/smithy-lang/smithy-java/tree/main/mcp/mcp-schemas) — MCP protocol types
+  (Smithy models + generated Java)
+- [`smithy-ai-traits/`](https://github.com/smithy-lang/smithy-java/tree/main/smithy-ai-traits) — The `@prompts` trait
+
+## Architecture
+
+```
+Smithy Service (with operations)
+    ↓ codegen (server mode)
+Generated Service + Operation interfaces
+    ↓ user implements operations
+Service instance
+    ↓ registered with McpServer
+McpServer (stdio/HTTP transport)
+    ↓ converts operations to MCP tools
+MCP Client (Claude, VS Code, etc.)
+```
+
+### Module Structure
+
+| Module | Purpose |
+|--------|---------|
+| `mcp-schemas` | Smithy models for MCP protocol types + generated Java code |
+| `mcp-server` | MCP server runtime (stdio/HTTP transport, tool dispatch) |
+| `smithy-ai-traits` | The `@prompts` Smithy trait definition |
+| `aws-mcp-types` | AWS-specific MCP types (PreRequest, AwsServiceMetadata) |
+
+## McpServer
+
+```java
+McpServer server = McpServer.builder()
+    .stdio()                                        // use System.in/System.out
+    .name("my-mcp-server")
+    .version("1.0.0")
+    .addService("employee-tools", employeeService)  // local Smithy service
+    .addService(stdioProxy)                         // remote MCP server proxy
+    .toolFilter((serverName, toolName) -> true)     // optional filtering
+    .build();
+
+server.start();
+server.awaitCompletion();
+```
+
+- Reads JSON-RPC messages line-by-line from an `InputStream` (stdio transport)
+- Writes responses to an `OutputStream` with newline delimiters
+- Delegates all protocol logic to `McpService`
+- Supports hot-reloading: `addNewService()` and `addNewProxy()` with `tools/list_changed` notifications
+
+## McpService — The Protocol Engine
+
+Handles all MCP protocol methods:
+
+| Method | Description |
+|--------|-------------|
+| `initialize` | Returns capabilities and server info; negotiates the protocol version |
+| `ping` | Returns empty result |
+| `tools/list` | Returns all registered tools with JSON Schema |
+| `tools/call` | Invokes a tool (local operation or proxy) |
+| `prompts/list` | Returns all registered prompts |
+| `prompts/get` | Returns prompt content with template substitution |
+
+### Smithy-to-JSON-Schema Conversion
+
+This is the most complex part of `McpService`. It maps Smithy types to JSON Schema as follows:
+
+| Smithy Type | JSON Schema |
+|-------------|-------------|
+| STRING, ENUM, BLOB, TIMESTAMP | `"string"` |
+| BYTE, SHORT, INTEGER, LONG, FLOAT, DOUBLE | `"number"` |
+| BOOLEAN | `"boolean"` |
+| STRUCTURE | `"object"` with properties |
+| LIST, SET | `"array"` with items |
+| MAP | `"object"` with additionalProperties |
+| UNION | `oneOf` with wrapped member variants |
+| DOCUMENT | `["string","number","boolean","object","array","null"]` |
+
+### Input/Output Adaptation
+
+Handles type mismatches between JSON (what LLMs send) and Smithy types:
+- `adaptDocument()` — Converts string→BigDecimal, string→BigInteger, and Base64 string→Blob, and accepts various
+  timestamp formats
+- `adaptOutputDocument()` — Reverse: BigDecimal→string, Blob→Base64, discriminated union→wrapper format
+
+### Protocol Version Handling
+
+```java
+public abstract sealed class ProtocolVersion {
+    v2024_11_05   // Original version
+    v2025_03_26   // Adds annotations support (default)
+    v2025_06_18   // Adds outputSchema support
+}
+```
+
+Features are gated by the negotiated protocol version: `outputSchema` is only sent for 2025-06-18 and later, and
+`annotations` only for 2025-03-26 and later.
+
+## Proxy Support
+
+### StdioProxy
+
+Launches an external MCP server as a child process and communicates with it over stdin/stdout:
+
+```java
+StdioProxy proxy = StdioProxy.builder()
+    .command("npx")
+    .arguments(List.of("some-mcp-server"))
+    .name("remote-tools")
+    .build();
+```
+
+Uses a `ConcurrentHashMap` to correlate responses with their requests, and virtual threads to read responses.
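The request/response correlation used by `StdioProxy` can be sketched as a map from JSON-RPC request id to a pending `CompletableFuture`. A reader running on a virtual thread (Java 21) completes each future as its response arrives. `PendingRequests` is hypothetical and is not the proxy's actual code. It finds the id with a crude regex that assumes the top-level `"id"` is the first `"id"` key on each response line; a real implementation would parse the JSON.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: correlates line-delimited JSON-RPC responses with pending requests by id.
final class PendingRequests {
    // Crude id extraction; assumes the first "id" key on the line is the top-level JSON-RPC id.
    private static final Pattern ID = Pattern.compile("\"id\"\\s*:\\s*(\\d+)");

    record Pending(long id, CompletableFuture<String> response) {}

    private final ConcurrentHashMap<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    // Allocates a fresh request id and the future that will receive the matching response line.
    Pending register() {
        long id = nextId.getAndIncrement();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future);
        return new Pending(id, future);
    }

    // Completes the pending future whose id matches this response line, if any.
    void complete(String line) {
        Matcher m = ID.matcher(line);
        if (m.find()) {
            CompletableFuture<String> future = pending.remove(Long.parseLong(m.group(1)));
            if (future != null) {
                future.complete(line);
            }
        }
    }

    // Reads one message per line on a virtual thread, then fails whatever is still pending once the stream ends.
    Thread startReader(Reader source) {
        return Thread.ofVirtual().start(() -> {
            IOException failure = new IOException("response stream closed");
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    complete(line);
                }
            } catch (IOException e) {
                failure = e;
            }
            for (Long id : pending.keySet()) {
                CompletableFuture<String> future = pending.remove(id);
                if (future != null) {
                    future.completeExceptionally(failure);
                }
            }
        });
    }
}
```

Failing every outstanding future when the stream ends keeps callers from blocking forever if the child process exits.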
+
+### HttpMcpProxy
+
+Sends JSON-RPC requests as HTTP POST to a remote endpoint:
+
+```java
+HttpMcpProxy proxy = HttpMcpProxy.builder()
+    .endpoint("https://mcp.example.com")
+    .signer(sigv4Signer)
+    .timeout(Duration.ofMinutes(5))
+    .build();
+```
+
+Supports SSE response parsing, session management via the `Mcp-Session-Id` header, and a custom `Signer` for
+authentication.
+
+## The @prompts Trait
+
+Defined in `smithy-ai-traits` (namespace `smithy.ai`):
+
+```smithy
+@trait(selector: ":is(service, operation)")
+map prompts {
+    key: PromptName
+    value: PromptTemplateDefinition
+}
+
+structure PromptTemplateDefinition {
+    @required description: String
+    @required template: String
+    arguments: ArgumentShape  // @idRef pointing to a structure
+    preferWhen: String
+}
+```
+
+Template substitution uses the `{{placeholder}}` pattern. When set, `preferWhen` is appended as a
+".Tool preference: ..." suffix.
+
+## The @oneOf Trait
+
+Applied to Smithy `document` shapes for discriminated unions:
+
+```smithy
+@oneOf(discriminator: "__type", members: [{name: "circle", target: Circle}])
+document ShapeWithOneOf
+```
+
+The MCP server converts between:
+- **MCP format** (wrapper): `{"circle": {"radius": 5}}`
+- **Smithy format** (discriminated): `{"__type": "smithy.example#Circle", "radius": 5}`
+
+## Building MCP Services
+
+### smithy-build.json Configuration
+
+```json
+{
+  "plugins": {
+    "java-codegen": {
+      "service": "smithy.example#EmployeeService",
+      "namespace": "com.example.mcp",
+      "runtimeTraits": ["smithy.api#documentation", "smithy.api#examples", "smithy.ai#prompts"],
+      "modes": ["server"]
+    }
+  }
+}
+```
+
+The `runtimeTraits` array is critical: it tells codegen to preserve `@documentation` (for tool descriptions) and
+`@prompts` (for prompt templates) at runtime.
+
+### End-to-End Flow
+
+1. Define a Smithy model with a service, its operations, and the optional `@prompts` trait
+2. Run codegen in `server` mode → generates service stubs and operation interfaces
+3. Implement the operations
+4. Build the service using the generated staged builder
+5. Create an `McpServer` and register the service (and any proxies) with `addService()`
+6. Client sends `tools/list` → server converts each operation to a `ToolInfo` with JSON Schema
+7. Client sends `tools/call` → server deserializes the arguments, invokes the operation, and serializes the output
+
+### Proxy Aggregation
+
+A single MCP server can aggregate tools from multiple sources:
+
+```java
+McpServer.builder()
+    .stdio()
+    .addService("local-tools", localService)          // local Smithy service
+    .addService(StdioProxy.builder()...build())       // subprocess MCP server
+    .addService(HttpMcpProxy.builder()...build())     // remote HTTP MCP server
+    .build();
+```
+
+## Key Classes Summary
+
+| Class | Role |
+|---|---|
+| `McpServer` | Transport layer (stdio I/O), lifecycle management |
+| `McpServerBuilder` | Fluent builder for McpServer |
+| `McpService` | Protocol engine — handles all MCP methods, schema conversion |
+| `McpServerProxy` | Abstract base for remote MCP server proxies |
+| `StdioProxy` | Subprocess-based MCP proxy (stdin/stdout) |
+| `HttpMcpProxy` | HTTP-based MCP proxy with SSE and session management |
+| `Prompt` | Prompt handling with template substitution |
+| `PromptLoader` | Loads prompts from `@prompts` traits on Smithy services |
+| `ToolFilter` | Interface for filtering which tools are exposed |
+| `McpMetricsObserver` | Metrics collection interface |
+| `ProtocolVersion` | Sealed class for MCP protocol version negotiation |
diff --git a/docs/technical-guide/retries-waiters.md b/docs/technical-guide/retries-waiters.md
new file mode 100644
index 000000000..c2177f6a4
--- /dev/null
+++ b/docs/technical-guide/retries-waiters.md
@@ -0,0 +1,188 @@
+# Retries and Waiters
+
+> **Last updated:** April 29, 2026
+
+Smithy-Java provides two complementary systems for handling transient failures and polling for resource state: the retry
+strategy (for automatic retries within a single API call) and the waiter framework (for polling across multiple API
+calls until a resource reaches a desired state).
+
+**Source:**
+- [`retries-api/`](https://github.com/smithy-lang/smithy-java/tree/main/retries-api) — Retry strategy interfaces
+- [`retries/`](https://github.com/smithy-lang/smithy-java/tree/main/retries) — Concrete retry implementations
+- [`client/client-waiters/`](https://github.com/smithy-lang/smithy-java/tree/main/client/client-waiters) — Waiter
+  framework
+
+## Retry Strategy
+
+### Token-Based Protocol
+
+The retry system uses an explicit token-passing protocol rather than simple counters:
+
+```java
+public interface RetryStrategy {
+    AcquireInitialTokenResponse acquireInitialToken(AcquireInitialTokenRequest request);
+    RefreshRetryTokenResponse refreshRetryToken(RefreshRetryTokenRequest request);
+    RecordSuccessResponse recordSuccess(RecordSuccessRequest request);
+    int maxAttempts();
+}
+```
+
+Flow:
+1. **Before first attempt**: `acquireInitialToken(scope)` → get a `RetryToken` + initial delay
+2. **On failure**: `refreshRetryToken(token, failure, suggestedDelay)` → new token + backoff delay. Throws
+   `TokenAcquisitionFailedException` if retries are exhausted.
+3. **On success**: `recordSuccess(token)` → releases capacity back to the token bucket
+
+### RetryInfo
+
+Exceptions can implement `RetryInfo` to provide retry metadata:
+
+```java
+public interface RetryInfo {
+    RetrySafety isRetrySafe();  // YES, NO, MAYBE
+    boolean isThrottle();
+    Duration retryAfter();
+}
+```
+
+Only `RetrySafety.YES` allows retries.
+
+### StandardRetryStrategy
+
+The recommended strategy. Its defaults are 3 max attempts (1 initial + 2 retries), a 50ms base delay, and a 20s max
+backoff.
+
+```java
+StandardRetryStrategy strategy = StandardRetryStrategy.builder()
+    .maxAttempts(5)
+    .backoffBaseDelay(Duration.ofMillis(100))
+    .backoffMaxBackoff(Duration.ofSeconds(30))
+    .build();
+```
+
+Uses `ExponentialDelayWithJitter`: `random(0, min(maxDelay, baseDelay * 2^(attempt-2)))`.
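Here is a minimal sketch of the full-jitter formula above. It assumes `attempt` is 1-based, so the first retry (attempt 2) draws from `[0, baseDelay]`, and it treats the range as inclusive. `JitterSketch` is hypothetical and only mirrors the formula; it is not `ExponentialDelayWithJitter` itself. The doubling stops at the cap, which also avoids `long` overflow for large attempt numbers.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of full jitter: random(0, min(maxDelay, baseDelay * 2^(attempt-2))).
final class JitterSketch {
    private JitterSketch() {}

    // Upper bound of the jitter window for a 1-based attempt number (retries start at attempt 2).
    static long ceilingMillis(int attempt, Duration baseDelay, Duration maxDelay) {
        long max = maxDelay.toMillis();
        long ceiling = baseDelay.toMillis();
        int exponent = Math.max(0, attempt - 2);
        // Double until the exponent is used up or the cap is reached (prevents overflow).
        for (int i = 0; i < exponent && ceiling < max; i++) {
            ceiling *= 2;
        }
        return Math.min(max, ceiling);
    }

    // Picks a delay uniformly in [0, ceiling].
    static Duration delay(int attempt, Duration baseDelay, Duration maxDelay) {
        long ceiling = ceilingMillis(attempt, baseDelay, maxDelay);
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceiling + 1));
    }
}
```

With the documented defaults (50ms base, 20s cap), the window ceiling is 50ms, 100ms, and 200ms for attempts 2, 3, and 4. It first hits the 20s cap at attempt 11.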
+
+### AdaptiveRetryStrategy
+
+Extends the standard strategy with client-side rate limiting using a CUBIC-based algorithm. Adds a
+`RateLimiterTokenBucketStore` that tracks per-scope send rates:
+
+- On throttling errors: reduces the estimated safe send rate (`rate * 0.7`)
+- On success: gradually increases the send rate using CUBIC recovery
+- May add delay even before the first attempt if the rate limiter is throttling
+
+### Token Bucket System
+
+- **`TokenBucket`** — Lock-free (`AtomicInteger`) token bucket. Long-polling tokens bypass the capacity check.
+- **`TokenBucketStore`** — Per-scope store using `LinkedHashMap` with LRU eviction (128 max scopes). Thread-safe via
+  `ReentrantLock`.
+- Default capacity: 500 tokens. Normal retry cost: 14 tokens. Throttling retry cost: 5 tokens.
+
+### Claimable
+
+`RetryStrategy` extends `Claimable`: `claim(owner)` succeeds only once, preventing accidental sharing across unrelated
+clients.
+
+## Pipeline Integration
+
+The retry loop is embedded in `ClientPipeline`:
+
+1. `retryStrategy.acquireInitialToken(scope)` — before first attempt
+2. Execute attempt (auth → endpoint → sign → transport → deserialize)
+3. On retryable error: extract `suggestedDelay` from `RetryInfo`, call `refreshRetryToken()`, sleep, loop
+4. On `TokenAcquisitionFailedException`: no further retries are attempted
+5. On success: `recordSuccess(token)`
+
+`ClientCall.isRetryDisallowed()` prevents retries when the input has a non-replayable data stream.
+
+## Waiter Framework
+
+### Waiter
+
+```java
+public final class Waiter {
+    // Built via Waiter.builder(pollingFunction)
+}
+```
+
+### Polling Loop
+
+```java
+waiter.wait(input, maxWaitTimeMillis);
+```
+
+1. `attemptNumber++`
+2. Call `pollingFunction.poll(input, overrideConfig)` — catch `ModeledException`
+3. `resolveState(input, output, exception)` — iterate acceptors, first match wins
+4. Switch on state:
+   - `SUCCESS` → return
+   - `FAILURE` → throw `WaiterFailureException`
+   - `RETRY` → `waitToRetry(attempt, maxWaitTime, startTime)` → `Thread.sleep(delay)`
+
+### Acceptors and Matchers
+
+```java
+record Acceptor(WaiterState state, Matcher matcher) {}
+```
+
+`Matcher` is a sealed interface with four implementations:
+
+| Matcher | Factory | Matches When |
+|---|---|---|
+| `OutputMatcher` | `Matcher.output(Predicate)` | Operation succeeds and predicate matches output |
+| `InputOutputMatcher` | `Matcher.inputOutput(BiPredicate)` | Operation succeeds and bi-predicate matches input and output |
+| `SuccessMatcher` | `Matcher.success(boolean)` | `true`: operation succeeds; `false`: operation fails with any error |
+| `ErrorTypeMatcher` | `Matcher.errorType(String)` | Exception's schema ID name matches |
+
+### Backoff Strategy (Waiter-specific)
+
+Waiters use their own backoff strategy, separate from retry backoff:
+
+```java
+public interface BackoffStrategy {
+    long computeNextDelayInMills(int attempt, long remainingTime);
+    static BackoffStrategy getDefault(Long minDelayMillis, Long maxDelayMillis);
+}
+```
+
+`DefaultBackoffStrategy`: exponential with jitter, 2ms min, 120s max by default.
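The polling loop and first-match acceptor resolution described above can be condensed into a small sketch. `MiniWaiter`, its `Acceptor` record, and the `State` enum are hypothetical stand-ins for `Waiter`, `Acceptor`, and `WaiterState`. The sketch differs from the real framework in three ways:

- It uses a fixed delay instead of `DefaultBackoffStrategy`.
- It injects the sleep function so it can be tested without sleeping.
- It rethrows any error that no acceptor matches. This follows the Smithy waiter rule that an unmatched error stops the wait.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.function.LongConsumer;

// Hypothetical sketch of a waiter: poll, take the first matching acceptor, then succeed, fail, or sleep and retry.
final class MiniWaiter<I, O> {
    enum State { SUCCESS, FAILURE, RETRY }

    // The matcher sees the output (on success) or the exception (on failure); the other argument is null.
    record Acceptor<T>(State state, BiPredicate<T, RuntimeException> matcher) {}

    private final Function<I, O> poller;
    private final List<Acceptor<O>> acceptors;
    private final long delayMillis;
    private final LongConsumer sleeper; // injected so tests don't really sleep

    MiniWaiter(Function<I, O> poller, List<Acceptor<O>> acceptors, long delayMillis, LongConsumer sleeper) {
        this.poller = poller;
        this.acceptors = acceptors;
        this.delayMillis = delayMillis;
        this.sleeper = sleeper;
    }

    // Returns the number of attempts it took to reach SUCCESS.
    int waitFor(I input, long maxWaitMillis) {
        long waited = 0;
        for (int attempt = 1; ; attempt++) {
            O output = null;
            RuntimeException error = null;
            try {
                output = poller.apply(input);
            } catch (RuntimeException e) {
                error = e;
            }
            switch (resolveState(output, error)) {
                case SUCCESS -> {
                    return attempt;
                }
                case FAILURE -> throw new IllegalStateException("waiter reached a failure state on attempt " + attempt);
                case RETRY -> {
                    if (waited + delayMillis > maxWaitMillis) {
                        throw new IllegalStateException("waiter timed out after " + attempt + " attempts");
                    }
                    sleeper.accept(delayMillis);
                    waited += delayMillis;
                }
            }
        }
    }

    private State resolveState(O output, RuntimeException error) {
        for (Acceptor<O> acceptor : acceptors) {
            if (acceptor.matcher().test(output, error)) {
                return acceptor.state(); // first match wins
            }
        }
        if (error != null) {
            throw error; // an error that no acceptor matches ends the wait
        }
        return State.RETRY;
    }
}
```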
+
+### JMESPath Integration
+
+These classes evaluate Smithy waiter acceptor expressions at runtime:
+
+- `JMESPathPredicate` — Parses a JMESPath expression, evaluates it against the output `Document`, and compares the
+  result using a `Comparator`
+- `JMESPathBiPredicate` — For input+output JMESPath expressions
+- `Comparator` — Enum: `STRING_EQUALS`, `BOOLEAN_EQUALS`, `ALL_STRING_EQUALS`, `ANY_STRING_EQUALS`
+
+## Waiter Code Generation
+
+`WaiterContainerGenerator` generates a record class (e.g., `CoffeeShopWaiter`) that wraps the client and exposes one
+method per waiter defined in the model's `@waitable` trait:
+
+```java
+@SmithyGenerated
+public record CoffeeShopWaiter(CoffeeShopClient client) {
+    public Waiter fooReady() {
+        return Waiter.builder(client::getFoo)
+            .backoffStrategy(BackoffStrategy.getDefault(minDelay, maxDelay))
+            .success(Matcher.output(new JMESPathPredicate("status", "DONE", Comparator.STRING_EQUALS)))
+            .failure(Matcher.errorType("ResourceNotFound"))
+            .build();
+    }
+}
+```
+
+The client interface gets a `waiter()` method:
+
+```java
+CoffeeShopWaiter waiter();
+```
+
+## Key Design Patterns
+
+1. **Token-based retry protocol** — Enables circuit breaking via token buckets, scoped throttling, and long-polling
+   exemption.
+2. **Separation of concerns** — `retries-api` (interfaces), `retries` (implementations), `client-core` (pipeline
+   orchestration), `client-waiters` (independent polling).
+3. **Retries vs Waiters** — Retries handle transient failures within a single call (50ms-20s backoff). Waiters poll
+   across multiple calls until a resource state is reached (2ms-120s backoff).
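The token-bucket side of the token-based retry protocol (pattern 1) can be sketched with a single `AtomicInteger`. The sketch uses the documented defaults: capacity 500, retry cost 14, throttling retry cost 5. `RetryTokenBucket` is a hypothetical class, not the library's `TokenBucket`. It does not model the long-polling bypass. How much capacity a success returns is left to the caller, because that amount is an assumption here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free retry token bucket using the documented default numbers.
final class RetryTokenBucket {
    static final int DEFAULT_CAPACITY = 500;
    static final int RETRY_COST = 14;
    static final int THROTTLING_RETRY_COST = 5;

    private final int capacity;
    private final AtomicInteger available;

    RetryTokenBucket(int capacity) {
        this.capacity = capacity;
        this.available = new AtomicInteger(capacity);
    }

    // Takes tokens for one retry; returns the cost taken, or -1 if the bucket cannot afford it.
    int tryAcquireRetry(boolean throttled) {
        int cost = throttled ? THROTTLING_RETRY_COST : RETRY_COST;
        while (true) {
            int current = available.get();
            if (current < cost) {
                return -1; // out of capacity: the caller should stop retrying
            }
            if (available.compareAndSet(current, current - cost)) {
                return cost;
            }
        }
    }

    // Returns tokens after a success, never exceeding the configured capacity.
    void release(int amount) {
        available.updateAndGet(current -> Math.min(capacity, current + amount));
    }

    int available() {
        return available.get();
    }
}
```

With these numbers, a full bucket funds 35 non-throttling retries (35 × 14 = 490 tokens) before further retries are refused. This is the circuit-breaking effect: during a broad outage, retries stop quickly instead of multiplying load.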
diff --git a/docs/technical-guide/rules-engine.md b/docs/technical-guide/rules-engine.md
new file mode 100644
index 000000000..f6d41a02a
--- /dev/null
+++ b/docs/technical-guide/rules-engine.md
@@ -0,0 +1,434 @@
+# Rules Engine
+
+> **Last updated:** April 29, 2026
+
+The rules engine is Smithy-Java's implementation of the
+[Smithy endpoint rules engine specification](https://smithy.io/2.0/additional-specs/rules-engine/index.html). It allows
+model authors to define rules that determine which endpoint is used for each operation. Smithy-Java developed an
+innovative approach: instead of interpreting the rules tree directly, it transforms the rules into a Binary Decision
+Diagram (BDD) and compiles them to bytecode executed by a stack-based virtual machine. This eliminates redundant
+condition evaluation, enables aggressive optimization, and achieves near-zero allocation on the hot path.
+
+**Source:** [`rules-engine/`](https://github.com/smithy-lang/smithy-java/tree/main/rules-engine)
+
+## Architecture Overview
+
+The rules engine transforms a ruleset in three stages (CFG, BDD, bytecode) and then evaluates the result in a VM:
+
+```
+EndpointRuleSet → CFG → BDD (with sifting + node reversal) → Bytecode → VM Evaluation
+```
+
+Two endpoint resolver implementations exist:
+
+- **`BytecodeEndpointResolver`** — The high-performance path. Compiles rules to bytecode and evaluates them with a
+  stack-based VM that traverses a BDD. Used by generated clients.
+- **`DecisionTreeEndpointResolver`** — The slow fallback. Interprets the ruleset decision tree directly using Smithy's
+  `RuleEvaluator`. Used by the dynamic client, where BDD compilation isn't practical.
+
+## RulesEngineBuilder — Entry Point
+
+[`RulesEngineBuilder`](https://github.com/smithy-lang/smithy-java/blob/main/rules-engine/src/main/java/software/amazon/smithy/java/rulesengine/RulesEngineBuilder.java)
+is the central factory. It:
+
+1. Auto-discovers `RulesExtension` implementations via `ServiceLoader` (always includes `StdExtension`)
+2. Collects functions, builtin providers, and builtin context keys from extensions
+3. Provides two compilation paths and a loading path
+
+### Compilation from EndpointRuleSet
+
+```java
+public Bytecode compile(EndpointRuleSet rules) {
+    var cfg = Cfg.from(rules);                                     // 1. Convert to Control Flow Graph
+    var originalTrait = EndpointBddTrait.from(cfg);                // 2. Build BDD from CFG
+    var optimizedTrait = SiftingOptimization.builder()             // 3. Sifting optimization
+        .cfg(cfg).build().apply(originalTrait);
+    var reversedTrait = new NodeReversal().apply(optimizedTrait);  // 4. Node reversal
+    return compile(reversedTrait);                                 // 5. Compile to bytecode
+}
+```
+
+### Compilation from pre-built BDD
+
+```java
+public Bytecode compile(EndpointBddTrait bdd) {
+    return new BytecodeCompiler(extensions, bdd, functions, builtinProviders, builtinKeys).compile();
+}
+```
+
+### Loading pre-compiled bytecode
+
+```java
+public Bytecode load(byte[] data) { ... }  // Validates header, loads all sections
+```
+
+## The BDD Transformation
+
+The BDD (Binary Decision Diagram) is the core data structure that replaces the traditional if-else decision tree. The
+transformation itself happens in the external `smithy-rules` library; the rules engine consumes its output.
+
+### What the BDD Represents
+
+- Each **internal node** tests a single condition (by index)
+- Each node has a **high** (true) and **low** (false) branch
+- **Terminal nodes**: `TRUE (1)`, `FALSE (-1)`, or `RESULT (100_000_000 + resultIndex)`
+- **Complement edges** (negative references) encode logical NOT without extra nodes
+
+### BDD Node Encoding
+
+Nodes are stored in a flat `int[]` array, 3 ints (12 bytes) per node:
+
+```
+[conditionIndex][highRef][lowRef]
+```
+
+Reference encoding:
+- `1` = TRUE terminal
+- `-1` = FALSE terminal
+- `2, 3, ...` = Node at index `ref - 1`
+- `-2, -3, ...` = Complement of node at index `-ref - 1`
+- `100_000_000+` = Result terminal (result index = `ref - 100_000_000`)
+
+### Optimization Pipeline
+
+1. **`Cfg.from(rules)`** — Converts the `EndpointRuleSet` into a Control Flow Graph
+2. **`EndpointBddTrait.from(cfg)`** — Builds the initial BDD from the CFG. Conditions are shared (deduplicated) across
+   paths.
+3. **`SiftingOptimization`** — Reorders BDD variables to minimize node count. This is the classic BDD sifting
+   optimization, which tries different variable orderings.
+4. **`NodeReversal`** — Reverses node ordering for better cache locality during traversal.
+
+The `EndpointBddTrait` stores:
+- Parameters (input variables with types, defaults, builtins)
+- Conditions (each is a Smithy `Condition` expression)
+- Results (each is an `EndpointRule`, `ErrorRule`, or `NoMatchRule`)
+- The BDD graph itself
+
+## Bytecode Compilation
+
+[`BytecodeCompiler`](https://github.com/smithy-lang/smithy-java/blob/main/rules-engine/src/main/java/software/amazon/smithy/java/rulesengine/BytecodeCompiler.java)
+transforms the BDD trait into a `Bytecode` object. It compiles each condition and result into stack-based bytecode
+instructions.
+
+### Compilation Process
+
+```java
+Bytecode compile() {
+    // 1. Compile all conditions into bytecode sequences
+    for (int i = 0; i < bdd.getConditions().size(); i++) {
+        writer.markConditionStart();
+        compileCondition(bdd.getConditions().get(i));
+    }
+    // 2. Compile all results into bytecode sequences
+    for (Rule result : bdd.getResults()) {
+        writer.markResultStart();
+        // EndpointRule, ErrorRule, or NoMatchRule
+    }
+    return buildProgram();
+}
+```
+
+### Expression Compilation
+
+The compiler uses Smithy's `ExpressionVisitor` to walk the AST and emit bytecode:
+
+- **Literals** → `LOAD_CONST` / `LOAD_CONST_W`
+- **References** → `LOAD_REGISTER`
+- **String templates** → Multiple loads + `RESOLVE_TEMPLATE`
+- **GetAttr** → `GET_PROPERTY`, `GET_INDEX`, or register-optimized variants
+- **IsSet** → `TEST_REGISTER_ISSET` (optimized for references) or `ISSET`
+- **Not** → `TEST_REGISTER_NOT_SET` (optimized) or `NOT`
+- **BoolEquals** → `TEST_REGISTER_IS_TRUE`/`TEST_REGISTER_IS_FALSE` (optimized) or `BOOLEAN_EQUALS`
+- **StringEquals** → `STRING_EQUALS_REG_CONST` (optimized for ref==const) or `STRING_EQUALS`
+- **Library functions** → Special opcodes for builtins or generic `FN0`-`FN3`/`FN` for custom functions
+
+### Fused Opcodes
+
+The compiler aggressively fuses common patterns into single opcodes:
+
+| Fused Opcode | Pattern | Benefit |
+|---|---|---|
+| `SUBSTRING_EQ` | `stringEquals(coalesce(substring(ref, s, e, rev), ""), "xx")` | Single opcode instead of 5+ |
+| `SPLIT_GET` | `split(ref, delim, 0)#[index]` | Avoids allocating the split array |
+| `SELECT_BOOL_REG` | `ite(register, constA, constB)` | Single opcode for if-then-else |
+| `STRING_EQUALS_REG_CONST` | `stringEquals(ref, "literal")` | Avoids stack manipulation |
+| `SET_REG_RETURN` | `SET_REGISTER` + `RETURN_VALUE` | Fuses two instructions |
+| `BUILD_URI` | URL template decomposition | Constructs `SmithyUri` directly, avoids `URI.create()` |
+| `STRUCTN` | Small records (≤8 entries) | Uses `ArrayPropertyGetter` (linear scan) instead of `HashMap` |
+
+### Register Allocation
+
+[`RegisterAllocator`](https://github.com/smithy-lang/smithy-java/blob/main/rules-engine/src/main/java/software/amazon/smithy/java/rulesengine/RegisterAllocator.java)
+assigns register indices:
+- Input parameters get registers 0..N-1
+- Temporary variables (condition results) get registers N..M
+- Maximum 256 registers (byte-indexed)
+
+## The Bytecode Format
+
+### Binary Format (44-byte header)
+
+```
+Offset  Size  Description
+0       4     Magic number (0x52554C45 = "RULE")
+4       2     Version (currently 1)
+6       2     Condition count
+8       2     Result count
+10      2     Register count
+12      2     Constant pool size
+14      2     Function count
+16      4     BDD node count
+20      4     BDD root reference
+24      4     Condition table offset
+28      4     Result table offset
+32      4     Function table offset
+36      4     Constant pool offset
+40      4     BDD table offset
+```
+
+### File Layout
+
+1. **Header** (44 bytes)
+2. **Condition Table** — Array of 4-byte absolute offsets to condition bytecode
+3. **Result Table** — Array of 4-byte absolute offsets to result bytecode
+4. **Function Table** — Length-prefixed UTF-8 function names
+5. **Register Definitions** — Serialized parameter metadata
+6. **BDD Table** — Flat array of BDD nodes (12 bytes each)
+7. **Instruction Section** — Compiled bytecode for all conditions and results
+8. **Constant Pool** — Type-tagged constants (null, string, integer, boolean, list, map)
+
+### Condition Classification (Inline BDD Optimization)
+
+At construction time, `Bytecode.classifyConditions()` scans each condition's bytecode to identify trivial patterns that
+can be evaluated inline during BDD traversal without entering the full VM dispatch loop:
+
+```java
+COND_ISSET = 1;                // TEST_REGISTER_ISSET + RETURN_VALUE
+COND_IS_TRUE = 2;              // TEST_REGISTER_IS_TRUE + RETURN_VALUE
+COND_IS_FALSE = 3;             // TEST_REGISTER_IS_FALSE + RETURN_VALUE
+COND_NOT_SET = 4;              // TEST_REGISTER_NOT_SET + RETURN_VALUE
+COND_STRING_EQ_REG_CONST = 5;  // STRING_EQUALS_REG_CONST + RETURN_VALUE
+```
+
+These are stored in `conditionTypes[]` and `conditionOperands[]` arrays for O(1) lookup during BDD traversal.
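To make the 44-byte header concrete, here is a minimal reader sketch. It checks the magic number and version, then extracts the counts and section offsets at the positions listed in the table above. `RulesHeader` is hypothetical; the real loading path is `RulesEngineBuilder.load()` and `BytecodeReader`. Big-endian byte order is an assumption, because the table gives offsets and sizes but not byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the documented 44-byte bytecode header (big-endian byte order assumed).
record RulesHeader(int version, int conditionCount, int resultCount, int registerCount,
                   int constantCount, int functionCount, int bddNodeCount, int bddRootRef,
                   int conditionTableOffset, int resultTableOffset, int functionTableOffset,
                   int constantPoolOffset, int bddTableOffset) {

    static final int MAGIC = 0x52554C45; // "RULE"
    static final int SIZE = 44;

    static RulesHeader parse(byte[] data) {
        if (data.length < SIZE) {
            throw new IllegalArgumentException("bytecode shorter than header: " + data.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer defaults to big-endian
        if (buf.getInt(0) != MAGIC) {
            throw new IllegalArgumentException("bad magic number");
        }
        int version = Short.toUnsignedInt(buf.getShort(4));
        if (version != 1) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        return new RulesHeader(
                version,
                Short.toUnsignedInt(buf.getShort(6)),   // condition count
                Short.toUnsignedInt(buf.getShort(8)),   // result count
                Short.toUnsignedInt(buf.getShort(10)),  // register count
                Short.toUnsignedInt(buf.getShort(12)),  // constant pool size
                Short.toUnsignedInt(buf.getShort(14)),  // function count
                buf.getInt(16),                         // BDD node count
                buf.getInt(20),                         // BDD root reference
                buf.getInt(24),                         // condition table offset
                buf.getInt(28),                         // result table offset
                buf.getInt(32),                         // function table offset
                buf.getInt(36),                         // constant pool offset
                buf.getInt(40));                        // BDD table offset
    }
}
```

Reading the 16-bit counts as unsigned is what makes the documented 65,536-constant limit reachable.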
+
+## The VM Implementation
+
+[`BytecodeEvaluator`](https://github.com/smithy-lang/smithy-java/blob/main/rules-engine/src/main/java/software/amazon/smithy/java/rulesengine/BytecodeEvaluator.java)
+is a stack-based VM with a fixed-size stack (`Object[64]`) and a register file (`Object[]`).
+
+### Core Evaluation Loop
+
+```java
+private Object runLoop(byte[] instructions, RulesFunction[] functions, Object[] constantPool) {
+    int pc = this.pc;
+    int sp = this.stackPosition;
+    Object[] stack = this.stack;
+    Object[] regs = this.registers;
+
+    while (pc < instructions.length) {
+        int opcode = instructions[pc++] & 0xFF;
+        switch (opcode) {
+            case Opcodes.LOAD_CONST -> stack[sp++] = constantPool[instructions[pc++] & 0xFF];
+            case Opcodes.LOAD_REGISTER -> stack[sp++] = regs[instructions[pc++] & 0xFF];
+            // ... 40+ opcodes
+        }
+    }
+}
+```
+
+Key design choice: `pc`, `sp`, `stack`, and `regs` are held in local variables to avoid field-access overhead and to
+help JIT register allocation.
+
+### BDD Evaluation with Inline Optimization
+
+The critical hot path is `evaluateBdd()`:
+
+```java
+Endpoint evaluateBdd() {
+    int ref = bytecode.getBddRootRef();
+    int[] nodes = bytecode.getBddNodes();
+    byte[] condTypes = bytecode.conditionTypes;
+    int[] condOps = bytecode.conditionOperands;
+    Object[] regs = this.registers;
+
+    while (Bdd.isNodeReference(ref)) {
+        int idx = ref > 0 ? ref - 1 : -ref - 1;
+        int base = idx * 3;
+        int condIdx = nodes[base];
+
+        boolean result = switch (condTypes[condIdx]) {
+            case COND_ISSET -> regs[condOps[condIdx]] != null;            // Inline!
+            case COND_IS_TRUE -> regs[condOps[condIdx]] == Boolean.TRUE;  // Inline!
+            case COND_STRING_EQ_REG_CONST -> { ... }                      // Inline!
+            default -> test(condIdx);                                     // Full bytecode evaluation
+        };
+
+        // Handle complement edges (XOR with sign bit)
+        ref = (result ^ (ref < 0)) ? nodes[base + 1] : nodes[base + 2];
+    }
+
+    if (Bdd.isTerminal(ref)) return null;
+    return resolveResult(ref - Bdd.RESULT_OFFSET);
+}
+```
+
+For most real-world rulesets, the majority of conditions are simple register checks (isSet, isTrue, stringEquals) that
+get inlined, avoiding the overhead of the full bytecode dispatch loop.
+
+### Thread-Local Evaluators
+
+`BytecodeEndpointResolver` keeps one `BytecodeEvaluator` per thread in a `ThreadLocal`, so each thread reuses its
+evaluator with no synchronization overhead.
+
+## Register Filling
+
+### RegisterFiller
+
+One of two implementations is selected at construction time:
+- **`FastRegisterFiller`** (< 64 registers) — Uses `long` bitmasks for O(1) tracking of filled/required registers
+- **`LargeRegisterFiller`** (≥ 64 registers) — Uses `boolean[]` arrays
+
+Fill order:
+1. Copy register template (defaults)
+2. Apply input parameters from the request
+3. Apply builtin providers (key-based first, then function-based)
+4. Validate required parameters
+
+### RegisterSink (Zero-Allocation Parameter Passing)
+
+`ContextProvider.RegisterSink` writes endpoint parameters directly into register indices, avoiding `Map`
+allocation:
+
+```java
+void put(String name, Object value) {
+    Integer i = registerMap.get(name);
+    if (i != null) {
+        values[i] = value;
+        if (i < 64) filled |= 1L << i;
+    }
+}
+```
+
+## How Endpoints Are Resolved
+
+### BytecodeEndpointResolver.resolveEndpoint(EndpointResolverParams params)
+
+1. Get the thread-local `BytecodeEvaluator`
+2. Write endpoint params into `RegisterSink` via `ContextProvider.createEndpointParams()`
+3. Call `evaluator.resetFromSink(ctx)` — copies template, drains sink, fills builtins, validates
+4. Call `evaluator.evaluateBdd()` — traverses BDD, evaluates conditions, resolves result
+5. The result's bytecode builds an `Endpoint` via the `RETURN_ENDPOINT` opcode
+
+### Context Parameter Collection
+
+[`ContextProvider`](https://github.com/smithy-lang/smithy-java/blob/main/rules-engine/src/main/java/software/amazon/smithy/java/rulesengine/ContextProvider.java)
+is a sealed interface hierarchy that extracts parameters from operations:
+
+- **`OrchestratingProvider`** — Top-level, caches per-operation providers in a `ConcurrentHashMap`
+- **`ContextParamProvider`** — Reads `smithy.rules#contextParam` trait from input members
+- **`ContextPathProvider`** — Evaluates JMESPath expressions from `smithy.rules#operationContextParams`
+- **`StaticParamsProvider`** — Returns fixed values from `smithy.rules#staticContextParams`
+- **`MultiContextParamProvider`** — Composes multiple providers
+
+## Built-in Functions and Extensibility
+
+### RulesExtension SPI
+
+```java
+public interface RulesExtension {
+    void putBuiltinProviders(Map> providers);
+    void putBuiltinKeys(Map> keys);
+    Iterable getFunctions();
+    void extractEndpointProperties(Endpoint.Builder builder, Context context,
+                                   Map properties, Map> headers);
+}
+```
+
+Extensions are discovered via `ServiceLoader`; `StdExtension` is always included.
+
+### RulesFunction Interface
+
+```java
+public interface RulesFunction {
+    int getArgumentCount();
+    String getFunctionName();
+    Object apply(Object... arguments);        // Generic N-arg
+    Object apply0();                          // 0-arg fast path
+    Object apply1(Object arg1);               // 1-arg fast path
+    Object apply2(Object arg1, Object arg2);  // 2-arg fast path
+}
+```
+
+Functions with 0-3 args use specialized methods to avoid varargs array allocation. The compiler emits `FN0`-`FN3`
+opcodes accordingly.
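The arity-specialized calling convention can be illustrated with a small sketch. Functions implement fixed-arity fast paths, and the caller picks one based on the argument count. It falls back to a varargs array only when no fast path fits. `MiniFunction`, `Dispatch`, and `Concat` are hypothetical and only mimic the shape of `RulesFunction` and the `FN*` opcodes. The sketch stops at two arguments because the interface shown above stops at `apply2`.

```java
// Hypothetical sketch of arity-specialized function calls that avoid allocating a varargs array.
interface MiniFunction {
    int argumentCount();

    Object apply(Object... args); // generic path: allocates an Object[] per call

    // Fast paths default to the generic path; hot functions override them.
    default Object apply0() { return apply(); }
    default Object apply1(Object a) { return apply(a); }
    default Object apply2(Object a, Object b) { return apply(a, b); }
}

final class Dispatch {
    private Dispatch() {}

    // Calls fn with its arguments taken from the top of the stack (top element at stack[sp - 1]).
    static Object call(MiniFunction fn, Object[] stack, int sp) {
        return switch (fn.argumentCount()) {
            case 0 -> fn.apply0();
            case 1 -> fn.apply1(stack[sp - 1]);
            case 2 -> fn.apply2(stack[sp - 2], stack[sp - 1]);
            default -> {
                int n = fn.argumentCount();
                Object[] args = new Object[n]; // only the generic path allocates
                System.arraycopy(stack, sp - n, args, 0, n);
                yield fn.apply(args);
            }
        };
    }
}

// Example function with a two-argument fast path: string concatenation.
final class Concat implements MiniFunction {
    public int argumentCount() { return 2; }
    public Object apply(Object... args) { return apply2(args[0], args[1]); }
    @Override public Object apply2(Object a, Object b) { return String.valueOf(a) + b; }
}
```

Because the argument count is fixed per function, a compiler can choose the opcode once at compile time instead of branching on arity at every call.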
+
+### Built-in Functions with Dedicated Opcodes
+
+| Function | Opcode | Notes |
+|---|---|---|
+| `substring` | `SUBSTRING` | Inline start/end/reverse as operands |
+| `isValidHostLabel` | `IS_VALID_HOST_LABEL` | Validates DNS labels |
+| `parseURL` | `PARSE_URL` | Uses `UriFactory` LRU cache |
+| `uriEncode` | `URI_ENCODE` | Delegates to `URLEncoding.encodeUnreserved()` |
+| `split` | `SPLIT` | String splitting |
+| `coalesce` | `JNN_OR_POP` chain | Short-circuit null coalescing |
+| `ite` | `JMP_IF_FALSE`+`JUMP` or `SELECT_BOOL_REG` | If-then-else |
+| `isSet` | `TEST_REGISTER_ISSET` / `ISSET` | Null check |
+| `not` | `TEST_REGISTER_NOT_SET` / `NOT` | Boolean negation |
+
+## Performance: BDD/VM vs Traditional Evaluation
+
+| Aspect | Traditional (`DecisionTreeEndpointResolver`) | BDD/VM (`BytecodeEndpointResolver`) |
+|---|---|---|
+| Condition evaluation | Redundant across shared paths | Each condition evaluated at most once (BDD sharing) |
+| Hot path allocation | `Map` per resolution | Near-zero allocation (`RegisterSink` + `long` bitmasks) |
+| Simple conditions | Full expression evaluation | Inline array lookup + comparison (no VM dispatch) |
+| Thread safety | New state per call | Thread-local evaluator reuse |
+| Constants | Parsed per evaluation | Pre-loaded constant pool |
+| URI construction | `URI.create(string)` | `BUILD_URI` decomposes template, `UriFactory` LRU cache |
+| Small records | `HashMap` | `ArrayPropertyGetter` (linear scan, ≤8 entries) |
+| Interpreter state | Object fields | Local variables (JIT-friendly) |
+
+### Key Performance Numbers
+
+- Stack size: 64 slots (fixed, no resizing)
+- Max registers: 256 (byte-indexed)
+- Max constants: 65,536 (short-indexed)
+- BDD nodes: flat `int[]` array (12 bytes each), excellent cache locality
+- Condition classification: O(1) lookup via parallel `byte[]` and `int[]` arrays
+- URI cache: LRU (32 entries) with hot-slot optimization
+
+## Key Classes Summary
+
+| Class | Role |
+|---|---|
+| `RulesEngineBuilder` | Factory: compiles rulesets to bytecode, loads bytecode, manages extensions |
+| `BytecodeCompiler` | Compiles `EndpointBddTrait` conditions/results into bytecode |
+| `BytecodeWriter` | Incrementally builds the binary bytecode format with jump patching |
+| `BytecodeReader` | Reads/deserializes the binary bytecode format |
+| `Bytecode` | Immutable compiled program: bytecode, offsets, constant pool, BDD, condition classification |
+| `Opcodes` | Constants for all 40+ instruction opcodes |
+| `BytecodeEvaluator` | Stack-based VM that executes bytecode, contains `evaluateBdd()` hot path |
+| `BytecodeEndpointResolver` | `EndpointResolver` using bytecode VM with thread-local evaluators |
+| `DecisionTreeEndpointResolver` | Fallback `EndpointResolver` using Smithy's `RuleEvaluator` |
+| `RegisterAllocator` | Assigns register indices to parameters and temp variables |
+| `RegisterFiller` | Abstract class with `FastRegisterFiller` (bitmask) and `LargeRegisterFiller` (array) |
+| `ContextProvider` | Sealed interface hierarchy for extracting endpoint parameters from operations |
+| `RulesExtension` | SPI for extending the engine with custom builtins and functions |
+| `RulesFunction` | Interface for custom functions with arity-specialized methods |
+| `StdExtension` | Default extension providing `SDK::Endpoint` builtin |
+| `EndpointUtils` | Static helpers: property access, substring, splitGet, value conversion |
+| `UriFactory` | LRU cache for URI parsing with hot-slot optimization |
+| `BytecodeDisassembler` | Human-readable disassembly (used by `Bytecode.toString()`) |
+| `BytecodeWalker` | Utility for walking bytecode instructions respecting operand boundaries |
+
+## Testing
+
+The test suite includes:
+- **Unit tests**: `BytecodeCompilerTest`, `BytecodeEvaluatorTest`, `BytecodeWriterTest`, `BytecodeReaderTest`,
+  `RegisterAllocatorTest`, `RegisterFillerTest`, `EndpointUtilsTest`, `UriFactoryTest`
+- **Integration tests**: `BytecodeEndpointResolverTest`, `DecisionTreeEndpointResolverTest`
+- **Fuzz tests** (using Jazzer): `BytecodeEvaluatorFuzzTest`, `BytecodeReaderFuzzTest`, `BytecodeLoadFuzzTest`,
+  `BytecodeWalkerFuzzTest`, which test robustness against malformed bytecode inputs
+- **Disassembler tests**: `BytecodeDisassemblerTest`
diff --git a/docs/technical-guide/schemas.md b/docs/technical-guide/schemas.md
new file mode 100644
index 000000000..9de8dc68b
--- /dev/null
+++ b/docs/technical-guide/schemas.md
@@ -0,0 +1,467 @@
+# Schemas
+
+> **Last updated:** April 29, 2026
+
+Schemas are a core innovation in Smithy-Java. A `Schema` is a trimmed-down, runtime representation of a Smithy model
+shape that carries enough information to serialize, deserialize, and validate Java objects, without requiring the full
+Smithy model at runtime. Schemas decouple protocols from generated types: a client can switch protocols (JSON, CBOR,
+XML) at runtime because the protocol codec reads all format-specific behavior (field names, timestamp formats, etc.)
+from the schema, not from hardcoded logic in the generated type.
+
+**Source:**
+  [`core/src/main/java/software/amazon/smithy/java/core/schema/`](https://github.com/smithy-lang/smithy-java/tree/main/core/src/main/java/software/amazon/smithy/java/core/schema)
+
+## Class Hierarchy
+
+`Schema` is an `abstract sealed class` with five permitted subclasses:
+
+```
+Schema (abstract sealed)
+├── RootSchema           — Fully resolved, non-recursive shape (structures, scalars, enums, etc.)
+├── MemberSchema         — A member targeting an already-built schema
+├── DeferredRootSchema   — A possibly-recursive schema with lazily resolved members
+├── DeferredMemberSchema — A member targeting an unbuilt SchemaBuilder (recursive reference)
+└── ResolvedRootSchema   — Created when a DeferredRootSchema is fully resolved
+```
+
+All subclasses are package-private and `final`. External code interacts only with the `Schema` base class.
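A sealed, package-private hierarchy like this lets code inside the package switch exhaustively over the schema variants, while callers outside see only the base class. Below is a minimal sketch of that pattern using Java 21 pattern matching for `switch`. `ShapeSchema` and its three subclasses are hypothetical and much simpler than the real five-way hierarchy. `DeferredShape` only loosely mimics a schema whose target is filled in after construction.

```java
// Hypothetical miniature of a sealed schema hierarchy (the real Schema has five permitted subclasses).
abstract sealed class ShapeSchema permits RootShape, MemberShape, DeferredShape {
    abstract String id();

    // Exhaustive over the permitted subclasses: adding a new subclass breaks compilation here.
    final String describe() {
        return switch (this) {
            case RootShape r -> "root " + r.id();
            case MemberShape m -> "member " + m.id() + " -> " + m.target().id();
            case DeferredShape d -> "deferred " + d.id() + (d.resolved() == null ? " (unresolved)" : " (resolved)");
        };
    }
}

final class RootShape extends ShapeSchema {
    private final String id;
    RootShape(String id) { this.id = id; }
    String id() { return id; }
}

final class MemberShape extends ShapeSchema {
    private final String id;
    private final ShapeSchema target;
    MemberShape(String id, ShapeSchema target) { this.id = id; this.target = target; }
    String id() { return id; }
    ShapeSchema target() { return target; }
}

// Stands in for a possibly-recursive schema whose target is set after construction.
final class DeferredShape extends ShapeSchema {
    private final String id;
    private ShapeSchema resolved;
    DeferredShape(String id) { this.id = id; }
    String id() { return id; }
    ShapeSchema resolved() { return resolved; }
    void resolve(ShapeSchema schema) { this.resolved = schema; }
}
```

Marking each subclass `final` closes the hierarchy at both levels. This is what allows the `switch` to omit a `default` branch.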
+ +### Key Fields on Schema + +Every `Schema` instance carries: + +- `ShapeType type` — The Smithy shape type (STRUCTURE, LIST, STRING, etc.). Members report their *target's* type, never MEMBER. +- `ShapeId id` — The full Smithy shape ID (e.g., `com.example#MyStruct$fieldName` for members). +- `TraitMap traits` — Merged traits (target + member traits for member schemas). See [Trait Storage](#trait-storage) below. +- `String memberName` — Non-null only for member schemas. +- `int memberIndex` — Position in the parent's member list. Used for bitfield presence tracking. +- `boolean isRequiredByValidation` — True if the member has `@required` and no non-null default. +- Precomputed validation constraints — `minLengthConstraint`, `maxLengthConstraint`, `minRangeConstraint`, + `maxRangeConstraint`, `stringPattern`, `stringValidationFlags`, etc. All computed at construction time for + zero-allocation validation. +- `Supplier<ShapeBuilder<?>> shapeBuilder` — Factory for creating shape builders (used during deserialization). +- `Class<?> shapeClass` — The Java class associated with this schema. + +## Creating Schemas + +### Scalar Types + +Static factory methods on `Schema` create simple shapes: + +```java +Schema.createString(ShapeId.from("com.example#Name"), new LengthTrait(...)) +Schema.createInteger(ShapeId.from("com.example#Count")) +Schema.createTimestamp(ShapeId.from("com.example#CreatedAt")) +Schema.createEnum(ShapeId.from("com.example#Color"), Set.of("RED", "GREEN", "BLUE")) +Schema.createIntEnum(ShapeId.from("com.example#Priority"), Set.of(1, 2, 3)) +``` + +These return a `RootSchema` directly; no builder is needed.
+ +### Aggregate Types (Structures, Unions, Lists, Maps) + +Aggregate types use a builder: + +```java +Schema myStruct = Schema.structureBuilder(ShapeId.from("com.example#MyStruct")) + .putMember("name", Schema.createString(ShapeId.from("com.example#MyStruct$name")), + new RequiredTrait()) + .putMember("age", Schema.createInteger(ShapeId.from("com.example#MyStruct$age"))) + .builderSupplier(() -> new MyStruct.Builder()) + .shapeClass(MyStruct.class) + .build(); +``` + +`SchemaBuilder.build()` is idempotent: subsequent calls return the cached result. After `build()`, no more members can +be added. + +### Prelude Schemas + +[`PreludeSchemas`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/PreludeSchemas.java) provides static constants for all Smithy prelude types: + +```java +PreludeSchemas.STRING // smithy.api#String +PreludeSchemas.INTEGER // smithy.api#Integer +PreludeSchemas.BOOLEAN // smithy.api#Boolean +PreludeSchemas.DOCUMENT // smithy.api#Document +// ... and primitive variants with default traits: +PreludeSchemas.PRIMITIVE_BOOLEAN // smithy.api#PrimitiveBoolean (default: false) +PreludeSchemas.PRIMITIVE_INTEGER // smithy.api#PrimitiveInteger (default: 0) +``` + +## Lists and Maps + +### List Schemas + +A list schema must have exactly one member named `"member"`: + +```java +Schema myList = Schema.listBuilder(ShapeId.from("com.example#NameList")) + .putMember("member", Schema.createString(ShapeId.from("com.example#NameList$member"))) + .build(); +``` + +The builder enforces this: passing any name other than `"member"` throws `IllegalArgumentException`. Build validation +also fails if the member is missing.
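The two build-time rules above (only `"member"` is accepted, and `build()` caches its result) can be illustrated with a small standalone sketch. `MiniListSchemaBuilder` and `MiniListSchema` are hypothetical names, not the Smithy-Java `SchemaBuilder` API; the member target is modeled as a plain shape-ID string, and the `IllegalStateException` thrown after `build()` is an assumption, since the guide does not name that exception.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified list-schema builder (not the Smithy-Java SchemaBuilder API).
 * The member target is modeled as a plain shape-ID string to keep the sketch self-contained.
 */
final class MiniListSchemaBuilder {
    private final String id;
    private String memberTarget;   // target of the single "member" member
    private MiniListSchema built;  // cached result that makes build() idempotent

    MiniListSchemaBuilder(String id) {
        this.id = Objects.requireNonNull(id);
    }

    MiniListSchemaBuilder putMember(String name, String target) {
        if (built != null) {
            // Assumption: the real exception type used after build() is not specified in this guide.
            throw new IllegalStateException("Cannot add members after build()");
        }
        if (!"member".equals(name)) {
            throw new IllegalArgumentException("List members must be named 'member', got: " + name);
        }
        this.memberTarget = Objects.requireNonNull(target);
        return this;
    }

    MiniListSchema build() {
        if (built == null) {
            if (memberTarget == null) {
                throw new IllegalStateException("List " + id + " is missing its 'member' member");
            }
            built = new MiniListSchema(id, memberTarget);
        }
        return built; // subsequent calls return the same cached instance
    }
}

/** Immutable result of the hypothetical builder. */
record MiniListSchema(String id, String memberTarget) {}
```

Caching the built instance is what makes repeated `build()` calls cheap and lets any later `putMember` call be rejected outright.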
+ +Access the element schema via the cached accessor: + +```java +Schema elementSchema = myList.listMember(); // O(1), cached after first call +``` + +### Map Schemas + +A map schema must have exactly two members named `"key"` and `"value"`: + +```java +Schema myMap = Schema.mapBuilder(ShapeId.from("com.example#TagMap")) + .putMember("key", Schema.createString(ShapeId.from("com.example#TagMap$key"))) + .putMember("value", Schema.createString(ShapeId.from("com.example#TagMap$value"))) + .build(); +``` + +Access key/value schemas via cached accessors: + +```java +Schema keySchema = myMap.mapKeyMember(); // O(1), cached +Schema valueSchema = myMap.mapValueMember(); // O(1), cached +``` + +### Member Map Optimization + +The internal member storage uses optimized map implementations based on size: + +| Members | Implementation | +|---------|---------------| +| 0 | `Map.of()` | +| 1 | `Map.of(name, schema)` | +| 2 | Custom `Map2` (direct field comparison, faster than `Map.of` for 2 entries) | +| 3+ | `Map.ofEntries(...)` | + +This matters because list schemas always have 1 member and map schemas always have 2, so the most common cases get the +most optimized storage. + +## Recursive Schema Handling + +Recursive shapes (e.g., a tree node that contains children of the same type) require special handling because the schema +can't be fully built before it references itself. + +### The Deferred Resolution Pattern + +`SchemaBuilder.putMember()` has two overloads: + +```java +// Normal: targets an already-built Schema +putMember(String name, Schema target, Trait... traits) + +// Recursive: targets an unbuilt SchemaBuilder +putMember(String name, SchemaBuilder target, Trait... traits) +``` + +When `build()` is called, the builder checks if any member targets an unbuilt `SchemaBuilder`. If so, it creates a +`DeferredRootSchema` instead of a `RootSchema`. + +### Resolution Flow + +1. **Build phase**: `DeferredRootSchema` is created with unresolved member builders. +2.
**Lazy resolution**: On first access to `members()`, `member()`, etc., `resolveInternal()` builds all member + builders, creating `MemberSchema` or `DeferredMemberSchema` instances. Results are cached in a `volatile + ResolvedMembers` record. +3. **Final resolution**: Calling `resolve()` creates a `ResolvedRootSchema` and updates the `SchemaBuilder` to point to + it. +4. **Member access**: `DeferredMemberSchema.memberTarget()` calls `target.build()` on the `SchemaBuilder`, which returns + the already-built schema (cached in `builtShape`). + +### Example: Recursive Tree + +```java +SchemaBuilder treeBuilder = Schema.structureBuilder(ShapeId.from("com.example#TreeNode")); +Schema leafSchema = Schema.createString(ShapeId.from("com.example#TreeNode$value")); + +// Self-reference: children targets the builder, not a built schema +treeBuilder + .putMember("value", leafSchema) + .putMember("children", treeBuilder); // recursive! + +Schema treeSchema = treeBuilder.build(); // Returns DeferredRootSchema +``` + +In generated code, the codegen emits a three-phase pattern: + +```java +// Phase 1: Create builder +static final SchemaBuilder TREE_NODE_BUILDER = Schema.structureBuilder(id); + +// Phase 2: Add members (in static initializer block, after all builders exist) +static { + TREE_NODE_BUILDER.putMember("value", LEAF_SCHEMA); + TREE_NODE_BUILDER.putMember("children", TREE_NODE_BUILDER); // self-reference +} + +// Phase 3: Build and resolve +static final Schema TREE_NODE = TREE_NODE_BUILDER.build().resolve(); +``` + +## Presence Tracking + +Presence tracking determines which required members have been set during deserialization. This is critical for +validation: a structure missing a required member is invalid.
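The bitfield technique described in the rest of this section can be illustrated with a standalone sketch. `MiniPresenceTracker` and its nested classes are hypothetical and do not reproduce the Smithy-Java `PresenceTracker` API. The sketch assumes members are already sorted so that required member *i* has `memberIndex` *i* and optional members come after all required ones; it also uses the `long` variant for zero required members, where the real design uses a no-op singleton.

```java
import java.util.BitSet;

/**
 * Hypothetical presence tracker illustrating the bitfield technique; not the Smithy-Java API.
 * Assumes members were sorted so that required member i has memberIndex i (0-based), and
 * optional members come after all required ones.
 */
abstract class MiniPresenceTracker {
    abstract void setMember(int memberIndex);

    abstract boolean allSet();

    /** Picks the representation from the number of required members. */
    static MiniPresenceTracker forRequiredCount(int requiredCount) {
        return requiredCount <= 64 ? new LongTracker(requiredCount) : new BitSetTracker(requiredCount);
    }

    /** 0-64 required members: one bit per member in a single long. */
    static final class LongTracker extends MiniPresenceTracker {
        private final long requiredMask; // OR of all required members' bits, precomputed once
        private long setBits;

        LongTracker(int requiredCount) {
            // Java masks long shift distances to 6 bits (1L << 64 == 1L), so 64 needs a special case.
            this.requiredMask = requiredCount == 64 ? -1L : (1L << requiredCount) - 1;
        }

        @Override
        void setMember(int memberIndex) {
            if (memberIndex < 64) {
                // Masking ignores optional members, whose bits lie outside requiredMask.
                setBits |= (1L << memberIndex) & requiredMask;
            }
        }

        @Override
        boolean allSet() {
            return setBits == requiredMask;
        }
    }

    /** 65+ required members: fall back to java.util.BitSet. */
    static final class BitSetTracker extends MiniPresenceTracker {
        private final int requiredCount;
        private final BitSet setBits;

        BitSetTracker(int requiredCount) {
            this.requiredCount = requiredCount;
            this.setBits = new BitSet(requiredCount);
        }

        @Override
        void setMember(int memberIndex) {
            if (memberIndex < requiredCount) {
                setBits.set(memberIndex);
            }
        }

        @Override
        boolean allSet() {
            return setBits.cardinality() == requiredCount;
        }
    }
}
```

Sorting required members into the low indices is what lets a single `long` comparison answer "are all required members present?" for the common case.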
+ +### PresenceTracker + +[`PresenceTracker`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/PresenceTracker.java) +is a `sealed abstract class` with three implementations: + +| Implementation | When Used | Mechanism | +|---|---|---| +| `NoOpPresenceTracker` | 0 required members | All operations are no-ops (singleton) | +| `RequiredMemberPresenceTracker` | 1–64 required members | `long` bitfield | +| `BigRequiredMemberPresenceTracker` | 65+ required members | `java.util.BitSet` | + +Usage in a generated builder: + +```java +private final PresenceTracker tracker = PresenceTracker.of(SCHEMA); + +public Builder name(String name) { + this.name = name; + tracker.setMember(SCHEMA_NAME); // sets bit for this member + return this; +} + +public MyStruct build() { + tracker.checkAllSet(SCHEMA); // throws if any required member is missing + return new MyStruct(this); +} +``` + +### How the Bitfield Works + +Members are sorted during schema construction so that required-with-no-default members come first. Each member gets a +`memberIndex` (0-based position). The bitmask for a required member is `1L << memberIndex`. + +- `setMember(Schema member)` → `setBitfields |= member.requiredByValidationBitmask()` +- `allSet()` → `schema.requiredStructureMemberBitfield() == setBitfields` + +The `requiredStructureMemberBitfield()` is the OR of all required members' bitmasks, precomputed at schema construction +time. + +### isRequiredByValidation + +A member is required by validation only if: +1. It has the `@required` trait, AND +2. It either has no `@default` trait, or its default is `null` + +This means a member with `@required` and `@default(0)` is NOT required by validation: the default satisfies the +requirement. + +## Validation + +The +[`Validator`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/Validator.java) +validates shapes using their schemas.
Its key design insight is that validation runs through a **`ShapeSerializer` implementation**, so validation +piggybacks on the serialization visitor pattern. + +### How It Works + +```java +List<ValidationError> errors = Validator.validate(myStruct); +``` + +Internally: +1. Creates a `ShapeValidator` (implements `ShapeSerializer`) +2. Calls `shape.serialize(shapeValidator)` +3. Each `write*` method on the validator checks constraints from the schema +4. Errors are collected into a list + +### What Gets Validated + +| Constraint | Trait | Applies To | +|---|---|---| +| Type checking | (intrinsic) | All shapes | +| Required members | `@required` | Structure members | +| Length | `@length` | Strings (codepoint count), blobs, lists, maps | +| Range | `@range` | All numeric types | +| Pattern | `@pattern` | Strings | +| Enum values | `@enum` | String enums, int enums | +| Sparse | `@sparse` | Lists, maps (null element check) | +| Unique items | `@uniqueItems` | Lists | +| Union exactly-one | (intrinsic) | Unions | + +### Precomputed Constraints + +All constraint values are computed at schema construction time and stored as primitive fields. For example, range +validation for an integer member is a simple comparison against `minLongConstraint` / `maxLongConstraint`, with no trait +lookup at validation time.
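The precomputation idea can be sketched as follows: convert the `@range` bounds once, then validate with two primitive comparisons. `MiniIntMember` is a hypothetical class whose field names echo the prose above; it is not the Smithy-Java `Schema`. Rounding fractional bounds inward is this sketch's own choice, and it assumes the bounds fit in a `long`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;

/**
 * Hypothetical integer member that precomputes its @range bounds once; field names echo the
 * guide's prose, but this is not the Smithy-Java Schema class.
 */
final class MiniIntMember {
    final String name;
    final long minLongConstraint;
    final long maxLongConstraint;

    /** rangeMin/rangeMax mirror the optional BigDecimal bounds of a @range trait (null = absent). */
    MiniIntMember(String name, BigDecimal rangeMin, BigDecimal rangeMax) {
        this.name = name;
        // Round fractional bounds inward so integer comparisons stay exact (e.g. min 1.5 -> 2).
        // longValueExact() throws if a bound does not fit in a long (an assumption of this sketch).
        this.minLongConstraint = rangeMin == null
                ? Long.MIN_VALUE
                : rangeMin.setScale(0, RoundingMode.CEILING).longValueExact();
        this.maxLongConstraint = rangeMax == null
                ? Long.MAX_VALUE
                : rangeMax.setScale(0, RoundingMode.FLOOR).longValueExact();
    }

    /** Validation is two primitive comparisons: no trait lookup and no allocation when valid. */
    void validate(long value, List<String> errors) {
        if (value < minLongConstraint || value > maxLongConstraint) {
            errors.add(name + ": " + value + " is outside [" + minLongConstraint + ", "
                    + maxLongConstraint + "]");
        }
    }
}
```

All `BigDecimal` work happens in the constructor, so the per-value path never touches traits or allocates unless it records an error.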
+ +String validation uses a bitfield (`stringValidationFlags`) to skip unnecessary checks: + +```java +static final int STRING_VALIDATE_LENGTH = 1; +static final int STRING_VALIDATE_ENUM = 2; +static final int STRING_VALIDATE_PATTERN = 4; +``` + +### Validation Errors + +`ValidationError` is a sealed interface with record implementations for each error type: + +- `RequiredValidationFailure` — missing required member +- `LengthValidationFailure` — length constraint violated +- `RangeValidationFailure` — range constraint violated +- `PatternValidationFailure` — regex pattern not matched +- `EnumValidationFailure` / `IntEnumValidationFailure` — invalid enum value +- `SparseValidationFailure` — null in non-sparse collection +- `TypeValidationFailure` — wrong type +- `UnionValidationFailure` — zero or multiple union members set +- `UniqueItemConflict` — duplicate items in `@uniqueItems` list +- `DepthValidationFailure` — exceeded max nesting depth + +Each error carries a `path()` (JSON-pointer-style) and a `message()`. + +### Configuration + +The validator supports `maxDepth` (default 100) and `maxAllowedErrors` (default 100). When limits are reached, it +short-circuits via `ValidationShortCircuitException`. + +## Trait Storage + +### TraitKey + +[`TraitKey`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/TraitKey.java) +is an identity-based key for O(1) trait access. Each trait class gets exactly one `TraitKey` with a unique integer `id`, +assigned via `ClassValue<TraitKey<?>>` for deduplication. + +Pre-defined constants exist for ~40 commonly used traits: + +```java +TraitKey.REQUIRED_TRAIT // RequiredTrait.class +TraitKey.LENGTH_TRAIT // LengthTrait.class +TraitKey.JSON_NAME_TRAIT // JsonNameTrait.class +TraitKey.STREAMING_TRAIT // StreamingTrait.class +TraitKey.HTTP_HEADER_TRAIT // HttpHeaderTrait.class +// ... etc. +``` + +### TraitMap + +`TraitMap` is a package-private, array-indexed trait store.
Internally it's a `Trait[]` array indexed by +`TraitKey.id`: + +```java +TraitMap traits = TraitMap.create(new RequiredTrait(), new LengthTrait(1, 100)); +RequiredTrait req = traits.get(TraitKey.REQUIRED_TRAIT); // O(1) array lookup +``` + +The array is sized to `largestTraitId + 1`, so it's compact for typical schemas. + +### Trait Merging for Members + +Member schemas carry *merged* traits: the target shape's traits overlaid with the member's own traits. This is done via +`TraitMap.withMemberTraits()`: + +```java +// If target has @documentation("A string") and member has @required, +// the merged TraitMap contains both. +// If both have @documentation, the member's wins. +``` + +The `getTrait()` method on `Schema` always returns from the merged map. For member-only traits, use `getDirectTrait()`: + +```java +schema.getTrait(TraitKey.REQUIRED_TRAIT); // merged (target + member) +schema.getDirectTrait(TraitKey.REQUIRED_TRAIT); // member-only +``` + +## Schema Extension System + +Codecs and other components can attach lazily-computed data to schemas via the extension system. + +### SchemaExtensionProvider + +```java +public interface SchemaExtensionProvider<T> { + SchemaExtensionKey<T> key(); + T provide(Schema schema); // called lazily, result cached +} +``` + +Providers are discovered via `ServiceLoader`. Extensions are stored in an `Object[]` array on each `Schema`, initialized +with `NOT_COMPUTED` sentinels. On first access: + +```java +T value = schema.getExtension(MyExtension.KEY); +``` + +The provider's `provide()` is called, and the result is cached. Thread safety relies on returned objects being immutable +(benign race pattern: multiple threads may compute the same value, but the result is always the same).
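The lazy, sentinel-initialized extension slots and the benign race can be sketched without the real types. `MiniExtensionKey` and `MiniSchema` are hypothetical names; ServiceLoader discovery is omitted, and the provider shown (a pre-rendered JSON field-name prefix) only loosely mirrors the JSON codec example that follows.

```java
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical extension key: a dense integer id (array slot) plus the provider that computes the value. */
final class MiniExtensionKey<T> {
    final int id;
    final Function<MiniSchema, T> provider;

    MiniExtensionKey(int id, Function<MiniSchema, T> provider) {
        this.id = id;
        this.provider = provider;
    }
}

/** Hypothetical schema carrying one lazily filled slot per registered extension; not the Smithy-Java Schema. */
final class MiniSchema {
    private static final Object NOT_COMPUTED = new Object();

    final String memberName;
    private final Object[] extensions;

    MiniSchema(String memberName, int extensionCount) {
        this.memberName = memberName;
        this.extensions = new Object[extensionCount];
        Arrays.fill(extensions, NOT_COMPUTED); // every slot starts as "not yet computed"
    }

    @SuppressWarnings("unchecked")
    <T> T getExtension(MiniExtensionKey<T> key) {
        Object value = extensions[key.id];
        if (value == NOT_COMPUTED) {
            // Benign race: without locking, two threads may both run the provider and both store a
            // result. This is only safe if providers are deterministic and return immutable values
            // (objects with only final fields), so every thread observes an equivalent value.
            value = key.provider.apply(this);
            extensions[key.id] = value;
        }
        return (T) value;
    }
}
```

Skipping a lock keeps the read path to one array load and one reference comparison. The cost is that a provider may run more than once under contention.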
+ +### Example: JSON Field Name Pre-computation + +The JSON codec uses `SmithyJsonSchemaExtensions` to pre-compute UTF-8 byte arrays for field names (including quotes and +colon), hash tables for O(1) field resolution, and indexed field name tables by `memberIndex`. This data is computed +once per schema and reused across all serialization/deserialization calls. + +## API-Level Classes + +### SerializableShape and SerializableStruct + +These interfaces connect schemas to the serialization system: + +```java +// Any shape that can serialize itself +@FunctionalInterface +public interface SerializableShape { + void serialize(ShapeSerializer encoder); +} + +// A structure/union with schema awareness +public interface SerializableStruct extends SerializableShape { + Schema schema(); + void serializeMembers(ShapeSerializer serializer); + <T> T getMemberValue(Schema member); +} +``` + +Generated types implement `SerializableStruct`. Their `serializeMembers()` implementations call `serializer.write*(memberSchema, +value)` for each set member. The schema parameter on every write call is what enables protocol-agnostic serialization. + +### ApiOperation + +[`ApiOperation`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/ApiOperation.java) +represents a modeled operation with typed input/output: + +```java +public interface ApiOperation<I extends SerializableStruct, O extends SerializableStruct> { + ShapeBuilder<I> inputBuilder(); + ShapeBuilder<O> outputBuilder(); + Schema schema(); + Schema inputSchema(); + Schema outputSchema(); + TypeRegistry errorRegistry(); + List<ShapeId> effectiveAuthSchemes(); + // ... +} +``` + +### SchemaIndex + +[`SchemaIndex`](https://github.com/smithy-lang/smithy-java/blob/main/core/src/main/java/software/amazon/smithy/java/core/schema/SchemaIndex.java) +provides runtime lookup of schemas by `ShapeId`. It uses `ServiceLoader` to discover and combine multiple indexes (one per +generated package) into a `CombinedSchemaIndex`. + +## Key Design Patterns + +1.
**Sealed class hierarchy** — `Schema` is sealed to 5 subclasses, enabling exhaustive pattern matching and preventing + external extension. +2. **Precomputed validation** — All constraint values computed at construction time, stored as primitive fields for + zero-allocation validation at runtime. +3. **Bitfield presence tracking** — Required members tracked via `long` bitfield (≤64 required members) or `BitSet` (65+), with + member sorting to pack required members into low indices. +4. **Array-indexed trait storage** — `TraitMap` uses `Trait[]` indexed by `TraitKey.id` for O(1) access, avoiding hash + map overhead. +5. **Deferred resolution** — Recursive schemas use `DeferredRootSchema`/`DeferredMemberSchema` with lazy resolution, + breaking circular dependencies. +6. **Serialization-based validation** — `Validator` validates through a `ShapeSerializer` implementation (`ShapeValidator`), reusing the visitor pattern for + validation without a separate traversal mechanism. +7. **Extension SPI** — `SchemaExtensionProvider` via `ServiceLoader` with lazy per-schema computation and benign-race + caching. diff --git a/docs/technical-guide/server.md b/docs/technical-guide/server.md new file mode 100644 index 000000000..ac74e6aae --- /dev/null +++ b/docs/technical-guide/server.md @@ -0,0 +1,375 @@ +# Server + +> **Last updated:** April 29, 2026 + +The server side of Smithy-Java provides a framework for building HTTP services from Smithy models. It's still under +development and marked as `@SmithyUnstableApi`. The architecture follows the same schema-driven, protocol-agnostic +design as the client side: protocols are pluggable via SPI, and generated server stubs use the same type system as +generated clients.
+ +**Source:** [`server/`](https://github.com/smithy-lang/smithy-java/tree/main/server) + +## Module Structure + +| Module | Purpose | +|---|---| +| `server-api` | Public interfaces: `Service`, `Operation`, `Server`, `ServerBuilder`, `Route`, `RequestContext` | +| `server-core` | Internal runtime: orchestration, handler pipeline, protocol resolution, HTTP request/response | +| `server-netty` | Netty-based HTTP/1.1 server implementation | +| `server-rpcv2-cbor` | Server-side `rpcv2-cbor` protocol | +| `server-proxy` | `ProxyService`, which proxies to a downstream service via `DynamicClient` | + +Additionally, `aws/server/aws-server-restjson/` provides the AWS `restJson1` server protocol. + +## Core Abstractions + +### Service + +```java +public interface Service { + + <I extends SerializableStruct, O extends SerializableStruct> Operation<I, O> getOperation(String operationName); + List<Operation<?, ?>> getAllOperations(); + Schema schema(); + TypeRegistry typeRegistry(); + SchemaIndex schemaIndex(); +} +``` + +A `Service` holds a collection of `Operation` instances, a `Schema` (the Smithy service shape), a `TypeRegistry` (for +error types), and a `SchemaIndex`. Generated server stubs implement this interface. + +### Operation + +```java +public final class Operation<I extends SerializableStruct, O extends SerializableStruct> { + // Sync + public static <I extends SerializableStruct, O extends SerializableStruct> Operation<I, O> of( + String name, BiFunction<I, RequestContext, O> operation, + ApiOperation<I, O> sdkOperation, Service service); + + // Async + public static <I extends SerializableStruct, O extends SerializableStruct> Operation<I, O> ofAsync( + String name, BiFunction<I, RequestContext, CompletableFuture<O>> operation, + ApiOperation<I, O> sdkOperation, Service service); + + public boolean isAsync(); + public String name(); + public ApiOperation<I, O> getApiOperation(); + public Service getOwningService(); +} +``` + +Each `Operation` wraps either a sync `BiFunction<I, RequestContext, O>` or an async `BiFunction<I, RequestContext, CompletableFuture<O>>`. It also holds an `ApiOperation` (the codegen-generated schema for the operation) and a +back-reference to its owning `Service`.
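The sync/async split can be illustrated with a simplified stand-in that normalizes both flavors to a `CompletableFuture` at invocation time. `MiniOperation` and its `invoke` method are hypothetical, and the context parameter is an `Object` placeholder for `RequestContext`. Whether the real dispatcher normalizes sync results this way is not stated in this guide.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

/**
 * Hypothetical, simplified operation wrapper illustrating the sync/async split; not the
 * Smithy-Java server Operation class. The context parameter is an Object placeholder for RequestContext.
 */
final class MiniOperation<I, O> {
    private final String name;
    private final BiFunction<I, Object, O> syncFn;                     // null for async operations
    private final BiFunction<I, Object, CompletableFuture<O>> asyncFn; // null for sync operations

    private MiniOperation(String name, BiFunction<I, Object, O> syncFn,
                          BiFunction<I, Object, CompletableFuture<O>> asyncFn) {
        this.name = name;
        this.syncFn = syncFn;
        this.asyncFn = asyncFn;
    }

    static <I, O> MiniOperation<I, O> of(String name, BiFunction<I, Object, O> fn) {
        return new MiniOperation<>(name, fn, null);
    }

    static <I, O> MiniOperation<I, O> ofAsync(String name, BiFunction<I, Object, CompletableFuture<O>> fn) {
        return new MiniOperation<>(name, null, fn);
    }

    boolean isAsync() {
        return asyncFn != null;
    }

    String name() {
        return name;
    }

    /** Gives callers one async shape regardless of which flavor the user supplied. */
    CompletableFuture<O> invoke(I input, Object context) {
        if (isAsync()) {
            return asyncFn.apply(input, context);
        }
        try {
            return CompletableFuture.completedFuture(syncFn.apply(input, context));
        } catch (RuntimeException e) {
            return CompletableFuture.failedFuture(e); // surface sync failures the same way as async ones
        }
    }
}
```

Normalizing at the call site lets downstream handler code treat every operation as asynchronous.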
+ +### Server and ServerBuilder + +```java +public interface Server { + static ServerBuilder builder(); // SPI-discovered + static ServerBuilder builder(String serverName); // Named lookup + void start(); + CompletableFuture shutdown(); +} +``` + +`Server.builder()` uses `ServiceLoader` to discover `ServerProvider` implementations. Currently the only provider is `NettyServerProvider` (name `"smithy-java-netty-server"`). + +`ServerBuilder` provides a fluent API: + +```java +Server server = Server.builder() + .endpoints(URI.create("http://localhost:8080")) + .addService(myService) + .addService("/api/v2", anotherService) // with path prefix + .numberOfWorkers(4) + .build(); + +server.start(); +``` + +### Route + +```java +public final class Route { + String getHostName(); + String getPathPrefix(); + Integer getPort(); + String getProtocol(); + List<Service> getServices(); +} +``` + +Routes enable virtual-host-style routing: requests are matched by hostname, port, protocol scheme, and path prefix to +determine which services are candidates. + +### RequestContext + +```java +public interface RequestContext { + String getRequestId(); +} +``` + +It is currently minimal and marked with `// TODO Fill with more stuff`. The `OperationHandler` currently passes `null` for the +context when invoking operations. + +## Operation Dispatch Flow + +1. **HTTP Request arrives** at Netty → `HttpRequestHandler.channelRead()` +2. **Protocol Resolution**: `ProtocolResolver.resolve()` is called + - `ServiceMatcher.getCandidateServices()` narrows services by route matching + - Each `ServerProtocol` (loaded via SPI) tries `resolveOperation()`; the first non-null result wins + - Returns `ServiceProtocolResolutionResult(service, operation, protocol)` +3. **Job Creation**: An `HttpJob` is created with the resolved operation, protocol, request, and response +4. **Body Accumulation**: HTTP content chunks are accumulated into a `ByteArrayOutputStream` +5.
**Orchestrator Enqueue**: On `LastHttpContent`, the job is enqueued to the `Orchestrator` +6. **Handler Pipeline** (inside `SingleThreadOrchestrator`): + - `ProtocolHandler.before()` → `protocol.deserializeInput(job)` + - `ValidationHandler.before()` → validates the deserialized input using `Validator` + - `OperationHandler.before()` → invokes the user's sync/async function, sets the response + - Then `after()` in reverse order: + - `ProtocolHandler.after()` → `protocol.serializeOutput(job, output)` or `protocol.serializeError(job, error)` +7. **Response Writing**: `HttpRequestHandler.writeResponse()` converts the serialized `DataStream` to a Netty + `DefaultFullHttpResponse` + +## The Netty HTTP Server + +### NettyServer + +`NettyServer` uses Netty's `ServerBootstrap` with platform-optimal transport: +- **Epoll** (Linux) → **KQueue** (macOS) → **NIO** (fallback) +- Boss group: 1 thread for accepting connections +- Worker group: `availableProcessors() * 2` threads for I/O +- Creates an `OrchestratorGroup` with `ErrorHandlingOrchestrator(SingleThreadOrchestrator(...))` per worker + +### ServerChannelInitializer + +```java +void initChannel(Channel channel) { + ChannelPipeline pipeline = channel.pipeline(); + pipeline.addLast("http1Codec", new HttpServerCodec()); + pipeline.addLast(new HttpRequestHandler(orchestratorGroup.next(), protocolResolver)); +} +``` + +Each new channel gets an HTTP/1.1 codec and an `HttpRequestHandler` bound to a specific orchestrator (selected +round-robin). + +### HttpRequestHandler + +A `ChannelDuplexHandler` that: +1. On `HttpRequest`: creates server-side `HttpRequest`, resolves protocol/operation, creates `HttpJob` +2. On `HttpContent`: accumulates body bytes into `ByteArrayOutputStream` +3. On `LastHttpContent`: wraps body as `DataStream`, enqueues job to orchestrator +4.
On completion: writes `DefaultFullHttpResponse` back to the channel, including CORS headers + +## Protocols + +### Protocol SPI + +Protocols are discovered via `ServiceLoader`: + +```java +public interface ServerProtocolProvider { + ServerProtocol provideProtocolHandler(List<Service> candidateServices); + ShapeId getProtocolId(); + int precision(); // lower = tried first +} +``` + +### ServerProtocol (Abstract Base) + +```java +public abstract class ServerProtocol { + public abstract ShapeId getProtocolId(); + public abstract ServiceProtocolResolutionResult resolveOperation( + ServiceProtocolResolutionRequest request, List<Service> candidates); + public abstract CompletableFuture deserializeInput(Job job); + protected abstract CompletableFuture serializeOutput( + Job job, SerializableStruct output, boolean isError); +} +``` + +### RpcV2 CBOR Protocol + +- **Resolution**: Checks for the `smithy-protocol: rpc-v2-cbor` header and the POST method. Parses URL path: + `/service/{ServiceName}/operation/{OperationName}` +- **Deserialization**: Uses `Rpcv2CborCodec` to deserialize the CBOR body +- **Serialization**: Uses `Rpcv2CborCodec` to serialize output to CBOR + +### AWS RestJson1 Protocol + +- **Resolution**: Builds a `UriMatcherMap` (trie-based URI router) from all operations' `@http` traits. Matches by HTTP + method + URI pattern (including path labels and query strings) +- **Deserialization**: Uses `HttpBinding.requestDeserializer()` with `JsonCodec` to handle path labels, query params, + headers, and the JSON body +- **Serialization**: Uses `HttpBinding.responseSerializer()` with `JsonCodec` + +## Handler Pipeline + +### Handler Interface + +```java +public interface Handler { + CompletableFuture before(Job job); + CompletableFuture after(Job job); +} +``` + +Handlers form a pipeline with `before()` called in order and `after()` called in reverse (like a stack). If any +`before()` fails, the pipeline short-circuits to the `after()` phase.
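The before/after ordering and the short-circuit can be modeled with a synchronous sketch. `MiniHandler`, `MiniJob`, and `MiniPipeline` are hypothetical, and the real `Handler` returns futures rather than running synchronously. The sketch also assumes that after a failure, only handlers whose `before()` was invoked receive an `after()` call. That choice still lets the first handler (the protocol step) serialize the error.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical synchronous handler; the guide's Handler returns CompletableFuture instead. */
interface MiniHandler {
    void before(MiniJob job) throws Exception;

    void after(MiniJob job);
}

/** Hypothetical job: records a trace and the failure raised during before(), if any. */
final class MiniJob {
    final List<String> trace = new ArrayList<>();
    Exception failure;
}

final class MiniPipeline {
    private final List<MiniHandler> handlers;

    MiniPipeline(List<MiniHandler> handlers) {
        this.handlers = List.copyOf(handlers);
    }

    void run(MiniJob job) {
        int entered = 0;
        // Phase 1: before() in registration order, stopping at the first failure.
        for (MiniHandler handler : handlers) {
            entered++;
            try {
                handler.before(job);
            } catch (Exception e) {
                job.failure = e;
                break;
            }
        }
        // Phase 2: after() in reverse order. Assumption: only handlers whose before() was
        // invoked get an after() call, so the first handler can still serialize an error.
        for (int i = entered - 1; i >= 0; i--) {
            handlers.get(i).after(job);
        }
    }
}
```

Unwinding in reverse gives each handler's `after()` a view of everything the later handlers did, much like a call stack returning.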
+ +### Built-in Handlers + +Assembled by `HandlerAssembler`: + +```java +List<Handler> handlers = List.of( + new ProtocolHandler(), // 1st: deserialize input / serialize output + new ValidationHandler(), // 2nd: validate deserialized input + new OperationHandler() // 3rd: invoke user's operation function +); +``` + +1. **ProtocolHandler**: `before()` → deserialize input. `after()` → serialize output or error. +2. **ValidationHandler**: `before()` → runs `Validator.validate(input)`, throws `ValidationException` on failure. +3. **OperationHandler**: `before()` → invokes the user's sync/async function, sets response value. + +### Orchestrator Hierarchy + +``` +Orchestrator (sealed interface) + └── ObservableOrchestrator (sealed, adds inflightJobs()) + ├── SingleThreadOrchestrator: runs the handler pipeline on a dedicated thread + ├── OrchestratorGroup: manages a pool with a selection strategy + └── DelegatingObservableOrchestrator (sealed, decorator base) + └── ErrorHandlingOrchestrator: catches unhandled errors +``` + +- **SingleThreadOrchestrator**: Runs a dedicated daemon thread that consumes jobs from a `LinkedBlockingDeque`. Each job walks + the handler pipeline as a state machine (BEFORE → AFTER → DONE), re-enqueuing itself when async handlers return + incomplete futures.
+- **OrchestratorGroup**: Pool with pluggable selection strategies: + - `RoundRobinStrategy`: simple counter-based rotation + - `LeastLoadedStrategy`: picks the orchestrator with the fewest inflight jobs + +## Server Stub Generation + +The codegen generates two types of artifacts for server mode: + +### Operation Interfaces + +For each operation, two `@FunctionalInterface` interfaces are generated: + +```java +// Sync +@FunctionalInterface +public interface AddBeerOperation { + AddBeerOutput addBeer(AddBeerInput input, RequestContext context); +} + +// Async +@FunctionalInterface +public interface AddBeerOperationAsync { + CompletableFuture<AddBeerOutput> addBeer(AddBeerInput input, RequestContext context); +} +``` + +### Service Implementation + +The codegen also generates a concrete `Service` implementation with a **staged builder** pattern: + +```java +// Each stage enforces that all operations are provided +public interface GetBeerStage { + AddBeerStage addGetBeerOperation(GetBeerOperation op); + AddBeerStage addGetBeerOperation(GetBeerOperationAsync op); +} +public interface AddBeerStage { + BuildStage addAddBeerOperation(AddBeerOperation op); + BuildStage addAddBeerOperation(AddBeerOperationAsync op); +} +public interface BuildStage { + BeerService build(); +} +``` + +Usage: + +```java +BeerService service = BeerService.builder() + .addGetBeerOperation(new GetBeerImpl()) // sync + .addAddBeerOperation(new AddBeerAsyncImpl()) // async + .build(); +``` + +## Additional Components + +### FilteredService + +```java +public final class FilteredService implements Service { + public FilteredService(Service delegate, Predicate<Operation<?, ?>> filter); +} +``` + +`FilteredService` wraps a `Service` and filters its operations. It is used with `OperationFilters.allowList()` / `OperationFilters.blockList()`. + +### ProxyService + +A `Service` implementation that proxies requests to a downstream service using `DynamicClient`. It supports model-driven +operation discovery, configurable auth/region/endpoint, and `ProxyOperationTrait` for input wrapping/unwrapping.
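The decorator idea behind `FilteredService` can be sketched on a reduced service view where operations are represented only by their names. Everything here is hypothetical: `MiniService`, `MiniStaticService`, `MiniFilteredService`, and the `allowList`/`blockList` helpers only mirror the idea of `OperationFilters.allowList()` / `blockList()`, whose real signatures may differ. Returning `null` for filtered-out operations is also an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical minimal service view: operations are represented by their names. */
interface MiniService {
    String getOperation(String name); // assumption: returns null for unknown operations

    List<String> getAllOperations();
}

/** Trivial delegate used for the example. */
record MiniStaticService(List<String> operations) implements MiniService {
    @Override
    public String getOperation(String name) {
        return operations.contains(name) ? name : null;
    }

    @Override
    public List<String> getAllOperations() {
        return operations;
    }
}

/** Decorator: exposes only the delegate's operations accepted by the predicate. */
final class MiniFilteredService implements MiniService {
    private final MiniService delegate;
    private final Predicate<String> filter;

    MiniFilteredService(MiniService delegate, Predicate<String> filter) {
        this.delegate = delegate;
        this.filter = filter;
    }

    @Override
    public String getOperation(String name) {
        String op = delegate.getOperation(name);
        return op != null && filter.test(op) ? op : null;
    }

    @Override
    public List<String> getAllOperations() {
        return delegate.getAllOperations().stream().filter(filter).toList();
    }

    // Hypothetical counterparts of OperationFilters.allowList()/blockList(); real signatures may differ.
    static Predicate<String> allowList(Set<String> names) {
        Set<String> copy = Set.copyOf(names);
        return copy::contains;
    }

    static Predicate<String> blockList(Set<String> names) {
        Set<String> copy = Set.copyOf(names);
        return name -> !copy.contains(name);
    }
}
```

Because the filter sits in front of the delegate, an allow list such as `Set.of("GetBeer")` exposes a read-only view of the same service without changing the generated code.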
+ +### Job Hierarchy + +``` +Job (sealed interface) + └── DefaultJob (abstract sealed, holds operation + protocol + failure) + └── HttpJob (final, adds HttpRequest + HttpResponse) +``` + +`Request` and `Response` are sealed interfaces carrying a `Context` (key-value bag), a `DataStream` (serialized bytes), +and a deserialized/serialized `SerializableStruct`. + +## Current Limitations + +The server is in developer preview. Known limitations from code comments: + +1. **RequestContext not wired** — `OperationHandler` passes `null` for `RequestContext` +2. **Full body buffering** — Entire request body accumulated in memory before processing +3. **No max body size** — No limit on request body size +4. **HTTP/1.1 only** — No HTTP/2 support +5. **No TLS support** — Plain HTTP only +6. **No user-facing middleware API** — The `Handler` pipeline is internal; users cannot add custom handlers +7. **No graceful drain** — `shutdown()` returns immediately without draining inflight requests +8. **HandlerAssembler is static** — No user-extensible handler registration + +## Key Classes Summary + +| Class | Module | Role | +|---|---|---| +| `Service` | server-api | Core interface for a Smithy service with operations | +| `Operation` | server-api | Wraps a sync/async user function + ApiOperation metadata | +| `Server` / `ServerBuilder` | server-api | SPI-based server lifecycle + configuration | +| `Route` / `ServiceMatcher` | server-api/core | Request-to-service routing | +| `ServerProtocol` | server-core | Abstract base for protocol implementations | +| `ServerProtocolProvider` | server-core | SPI interface for protocol discovery | +| `ProtocolResolver` | server-core | Iterates protocols to resolve request → operation | +| `Handler` | server-core | Pipeline step with `before()`/`after()` lifecycle | +| `ProtocolHandler` | server-core | Delegates to protocol for (de)serialization | +| `ValidationHandler` | server-core | Validates deserialized input | +| `OperationHandler` | 
server-core | Invokes user's operation function | +| `SingleThreadOrchestrator` | server-core | Runs handler pipeline on a dedicated thread | +| `OrchestratorGroup` | server-core | Pool of orchestrators with selection strategies | +| `ErrorHandlingOrchestrator` | server-core | Catches unhandled errors and serializes them | +| `Job` / `HttpJob` | server-core | Request processing unit | +| `NettyServer` | server-netty | Netty-based server implementation | +| `HttpRequestHandler` | server-netty | Netty channel handler bridging HTTP to orchestrator | +| `RpcV2CborProtocol` | server-rpcv2-cbor | Server-side rpcv2-cbor protocol | +| `AwsRestJson1Protocol` | aws-server-restjson | Server-side AWS restJson1 with URI trie router | +| `FilteredService` | server-api | Decorator that filters operations by predicate | +| `ProxyService` | server-proxy | Proxies to downstream via DynamicClient | +| `OperationInterfaceGenerator` | codegen-plugin | Generates sync/async operation interfaces | +| `ServiceGenerator` | codegen-plugin | Generates Service implementation with staged builder |