Merged
2 changes: 2 additions & 0 deletions .changeset/violet-melons-poke.md
@@ -0,0 +1,2 @@
---
---
3 changes: 1 addition & 2 deletions README.md
@@ -39,10 +39,9 @@ https://github.com/SimplePDF/simplepdf-embed/assets/10613140/8924f018-6076-4e44-

# Get started

- 🧩 [Iframe bridge](./embed/README.md) - `@simplepdf/embed` (framework-free client to embed + programmatically drive the editor, with an AI SDK adapter, generated from the editor manifest; the React layer is `@simplepdf/react-embed-pdf`)
- ⚛️ [React component](./react/README.md) - `@simplepdf/react-embed-pdf`
- 🧩 [Iframe bridge](./embed/README.md) - `@simplepdf/embed` (framework-free client to embed + programmatically drive the editor, with an AI SDK adapter)
- 🚀 [Script tag](./web/README.md) - `@simplepdf/web-embed-pdf`
- 🛠 [Iframe API](./documentation/IFRAME.md) - `postMessage` events
- 🤖 [SimplePDF Copilot](./copilot/README.md) - AI form-filling reference implementation

# Features
24 changes: 12 additions & 12 deletions copilot/README.md
@@ -69,9 +69,9 @@ Browser
- SimplePDF Copilot drives the editor through `postMessage` (focus a field, set a value, navigate, submit)
- LLM streaming runs through your server via the Vercel AI SDK; you choose the provider
- Tool calls are executed in the browser, against the iframe. Your server only proxies the chat stream.
- **Voice input is different - it is a deliberate audio egress, on one of two paths.** Dictating into the composer records a short audio clip in the browser; recording starts when you tap the mic, and when you confirm it (✓) the clip is transcribed and the editable transcript drops into the textarea. Two routes, each named in the recorder before the audio is sent:
- **Voice input is different, it is a deliberate audio egress, on one of two paths.** Dictating into the composer records a short audio clip in the browser; recording starts when you tap the mic, and when you confirm it (✓) the clip is transcribed and the editable transcript drops into the textarea. Two routes, each named in the recorder before the audio is sent:
- **Demo (server):** when the deployment is in demo mode (operator keys configured), the clip uploads to `/api/transcribe`, which forwards it to OpenAI (`gpt-4o-transcribe`) and returns the transcript. So **audio leaves the browser to SimplePDF's server and then OpenAI** (the server keeps no audio, logs no transcript).
- **BYOK (browser-direct):** configure a Speech-to-Text provider (OpenAI or a custom OpenAI-compatible endpoint) in the model picker's Speech-to-Text tab; the clip is sent **directly from the browser to that endpoint, never to SimplePDF**. The key lives only in this browser's encrypted vault - a demo/reference feature (a browser-held key is exposed to anything on the page).
- **BYOK (browser-direct):** configure a Speech-to-Text provider (OpenAI or a custom OpenAI-compatible endpoint) in the model picker's Speech-to-Text tab; the clip is sent **directly from the browser to that endpoint, never to SimplePDF**. The key lives only in this browser's encrypted vault, a demo/reference feature (a browser-held key is exposed to anything on the page).
- In both cases PDF bytes still stay on-device, and audio is sent only when you confirm the recording (never automatically). The recorder prompt names the actual audio recipient before you confirm: the **demo** prompt reads "Speak to SimplePDF Copilot…" (its server→OpenAI flow is the one described above, and "What is this demo?" documents it); the **BYOK** prompt reads "Speak to OpenAI…" or "Speak to <your endpoint>…" (sent directly to that provider, not to SimplePDF).

## Built with
@@ -119,19 +119,19 @@ In the running app, open the chat sidebar, click **Bring your own provider**, pa

### Run the demo without asking viewers for a key

Asking every visitor to paste a provider key is friction-heavy. To skip that, put the deployment **in demo mode**: configure a single chat key/model/turn-cap plus a transcription key in your `.env`, and the server pays for chat + voice under your account for every visitor. There are no invite links - demo mode is simply on whenever both are configured. The chat opens already wired up and the Model Picker stays out of the way (visitors can still bring their own key to override).
Asking every visitor to paste a provider key is friction-heavy. To skip that, put the deployment **in demo mode**: configure a single chat key/model/turn-cap plus a transcription key in your `.env`, and the server pays for chat + voice under your account for every visitor. There are no invite links, demo mode is simply on whenever both are configured. The chat opens already wired up and the Model Picker stays out of the way (visitors can still bring their own key to override).

Demo mode requires **both**:

- `DEMO_CHAT_API_KEY` + `DEMO_CHAT_MODEL` + `DEMO_RATE_LIMIT_TURNS` - the chat key, the model, and the per-IP turn cap.
- `DEMO_STT_OPENAI_API_KEY` - the voice transcription key (transcription-only, never your chat key).
- `DEMO_CHAT_API_KEY` + `DEMO_CHAT_MODEL` + `DEMO_RATE_LIMIT_TURNS`, the chat key, the model, and the per-IP turn cap.
- `DEMO_STT_OPENAI_API_KEY`, the voice transcription key (transcription-only, never your chat key).

Two chat models are supported on the demo path:

- Anthropic Claude Haiku 4.5 (`DEMO_CHAT_MODEL=anthropic_haiku_4_5`)
- DeepSeek V4 Flash (`DEMO_CHAT_MODEL=deepseek_v4_flash`)

Leave the demo vars unset and the deployment runs **BYOK-only**: every visitor brings their own key via the Model Picker. Note: with no invite gating, demo mode is open to anyone who can reach the page, so the **per-IP turn cap (`DEMO_RATE_LIMIT_TURNS`) is the cost control** - size it accordingly. See [`.env.example`](./.env.example) for the exact vars.
Leave the demo vars unset and the deployment runs **BYOK-only**: every visitor brings their own key via the Model Picker. Note: with no invite gating, demo mode is open to anyone who can reach the page, so the **per-IP turn cap (`DEMO_RATE_LIMIT_TURNS`) is the cost control**, size it accordingly. See [`.env.example`](./.env.example) for the exact vars.
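
The "both groups configured" rule above can be sketched as a small check over the environment. This is a minimal illustration, not the copilot server's actual code: the helper name `resolveDemoMode` and its return shape are hypothetical, and only the variable names come from the README.

```ts
// Hypothetical helper: the real server's config loading may differ.
type Env = Record<string, string | undefined>;

type DemoMode =
  | { enabled: true; chatModel: string; turnCap: number }
  | { enabled: false; missing: string[] };

// Variable names as listed in the README; demo mode needs all of them.
const REQUIRED_DEMO_VARS = [
  "DEMO_CHAT_API_KEY",
  "DEMO_CHAT_MODEL",
  "DEMO_RATE_LIMIT_TURNS",
  "DEMO_STT_OPENAI_API_KEY",
] as const;

function resolveDemoMode(env: Env): DemoMode {
  // Any unset or empty variable means the deployment stays BYOK-only.
  const missing: string[] = REQUIRED_DEMO_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    return { enabled: false, missing };
  }
  // Assumption for this sketch: the turn cap must be a positive integer.
  const turnCap = Number.parseInt(env.DEMO_RATE_LIMIT_TURNS ?? "", 10);
  if (!Number.isInteger(turnCap) || turnCap <= 0) {
    return { enabled: false, missing: ["DEMO_RATE_LIMIT_TURNS"] };
  }
  return { enabled: true, chatModel: env.DEMO_CHAT_MODEL ?? "", turnCap };
}
```

Calling `resolveDemoMode(process.env)` at startup would then tell you whether the chat should open pre-wired or fall back to the Model Picker.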

### Voice input (dictation)

@@ -170,9 +170,9 @@ The iframe will refuse to load on origins that aren't whitelisted, so add your s

For multi-container deployments (or any deploy where you want per-IP rate-limit counters to survive restarts), set `REDIS_URL` to a Redis-protocol-compatible instance (Valkey on DO Managed Caching is the canonical fit at $15/mo). When `REDIS_URL` is set, `IP_HASH_SALT` is also required (the server refuses to boot otherwise) so the persisted hashes can't be brute-forced against a leaked snapshot. Generate one with `openssl rand -hex 32`. Without `REDIS_URL`, counters live in memory per container, which is fine for local dev, single-instance hosts, or BYOK-only deployments.
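
The boot-time rule and the salted hashing described above could look roughly like the sketch below. It assumes HMAC-SHA256 keyed with `IP_HASH_SALT`; the helper names `assertRateLimitConfig` and `hashIp` are hypothetical, and the actual server may hash differently.

```ts
import { createHmac } from "node:crypto";

type Env = Record<string, string | undefined>;

// Hypothetical boot check: persisting counters in Redis requires a salt.
function assertRateLimitConfig(env: Env): void {
  if (env.REDIS_URL && !env.IP_HASH_SALT) {
    throw new Error("IP_HASH_SALT is required when REDIS_URL is set");
  }
}

// Assumption: HMAC-SHA256 keyed with the salt. Without the salt, an attacker
// holding a leaked snapshot could enumerate the IPv4 space and match hashes.
function hashIp(ip: string, salt: string): string {
  return createHmac("sha256", salt).update(ip).digest("hex");
}
```

The same IP always maps to the same key for a given salt, so counters survive restarts, while rotating the salt invalidates every stored hash at once.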

> **DO App Platform gotcha - wire the database from the App side.** If you're using DO Managed Caching, **don't** start by adding a Trusted Source on the cluster. Open your App Platform app → **Settings** → **App Spec** → **+ Create or Attach Database** → pick the existing cluster (or provision a new one). DO then auto-handles trusted sources, VPC routing, and injects the connection string into the app's env. Wiring from the cluster side leaves the App on a public-egress IP that can't be matched to a Trusted Source, and you'll get `ETIMEDOUT` even with the source allowlisted. Same shape as adding a custom domain: it has to be done from the App, not from the resource.
> **DO App Platform gotcha, wire the database from the App side.** If you're using DO Managed Caching, **don't** start by adding a Trusted Source on the cluster. Open your App Platform app → **Settings** → **App Spec** → **+ Create or Attach Database** → pick the existing cluster (or provision a new one). DO then auto-handles trusted sources, VPC routing, and injects the connection string into the app's env. Wiring from the cluster side leaves the App on a public-egress IP that can't be matched to a Trusted Source, and you'll get `ETIMEDOUT` even with the source allowlisted. Same shape as adding a custom domain: it has to be done from the App, not from the resource.
>
> Once attached, DO injects the connection string as `DATABASE_URL` (the bind variable's default name) - **rename it** to `REDIS_URL` in the App's env vars, OR add a separate `REDIS_URL` entry whose value is `${cluster-name.DATABASE_URL}` to alias it. The copilot server only reads `REDIS_URL`.
> Once attached, DO injects the connection string as `DATABASE_URL` (the bind variable's default name), **rename it** to `REDIS_URL` in the App's env vars, OR add a separate `REDIS_URL` entry whose value is `${cluster-name.DATABASE_URL}` to alias it. The copilot server only reads `REDIS_URL`.

#### One-click deploy to DigitalOcean

@@ -219,9 +219,9 @@ The chat sidebar advertises these tools to the model. Each runs inside the ifram
| `setFieldValue` | Write a value into a field |
| `selectTool` | Switch the editor toolbar (`TEXT`, `COMB_TEXT`, `CHECKBOX`, `SIGNATURE`, `PICTURE`) |
| `goTo` | Navigate to a specific page (1-indexed) |
| `movePage` | Reorder a visible page (`fromPage` → `toPage`, both 1-indexed). Destructive - only fired on explicit user request |
| `deletePages` | Remove visible pages and their fields (last remaining page can't be deleted). Destructive - only fired on explicit user request |
| `rotatePage` | Rotate a visible page 90° clockwise per call. Destructive - only fired on explicit user request |
| `movePage` | Reorder a visible page (`fromPage` → `toPage`, both 1-indexed). Destructive, only fired on explicit user request |
| `deletePages` | Remove visible pages and their fields (last remaining page can't be deleted). Destructive, only fired on explicit user request |
| `rotatePage` | Rotate a visible page 90° clockwise per call. Destructive, only fired on explicit user request |
| `submit` (Pro mode) / `download` (demo mode) | Finalize: real iframe `SUBMIT` on a Pro fork (lands in BYOS + webhooks) vs. an in-browser `DOWNLOAD` on the hosted demo |

Tool input + output schemas + the bridge that posts these events into the iframe live in the [`@simplepdf/embed`](../embed) package (generated from the editor contract); copilot's tool catalogue + middleware live in `src/lib/tools/` (`definitions.ts`, `middleware.ts`). System prompt: `src/server/tools.ts`. Public iframe contract these tools exercise: [`documentation/IFRAME.md`](../documentation/IFRAME.md).
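
The README does not say how the "only fired on explicit user request" rule for destructive tools is enforced. One way to back it with a client-side check is sketched below; the tool names come from the table, while `guardToolCall`, `ToolCall`, and the `userExplicitlyRequested` flag are hypothetical and not part of copilot's middleware.

```ts
// Tool names come from the table above; the guard itself is hypothetical.
const DESTRUCTIVE_TOOLS = new Set(["movePage", "deletePages", "rotatePage"]);

type ToolCall = { name: string; input: Record<string, unknown> };

type Decision = { allowed: true } | { allowed: false; reason: string };

// Refuse destructive tool calls unless the caller has established that the
// user explicitly asked for them (how that is determined is out of scope).
function guardToolCall(call: ToolCall, userExplicitlyRequested: boolean): Decision {
  if (DESTRUCTIVE_TOOLS.has(call.name) && !userExplicitlyRequested) {
    return {
      allowed: false,
      reason: `${call.name} is destructive and needs an explicit user request`,
    };
  }
  return { allowed: true };
}
```

A check like this would run in the browser before the tool call is forwarded to the iframe, so a model that calls `deletePages` on its own initiative gets a refusal result instead of a modified document.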
@@ -245,7 +245,7 @@ The architecture is deliberate:
- **Document data stays in the browser.** SimplePDF processes PDFs client-side. The iframe never uploads document bytes to SimplePDF.
- **Chat traffic flows through your server.** You control the provider, the keys, the logs, and any RAG / internal data layered in.
- **Submission is direct to your storage.** On Premium with [Bring Your Own Storage](https://simplepdf.com/pricing) (S3, Azure Blob, or SharePoint), completed PDFs upload from the browser to your bucket, never to SimplePDF servers.
- **Voice input is the one exception, and it is opt-in.** Dictation sends the recorded audio clip out of the browser - unlike PDF bytes, dictated audio (which can contain PII/PHI) leaves the device. Two routes, each disclosed before recording: the **demo** route uploads to SimplePDF's server and on to OpenAI (the server keeps no audio and logs no transcript text - only an IP hash, byte size, and elapsed time); the **BYOK** route sends audio **directly to the user's chosen provider, never to SimplePDF** (SimplePDF makes no retention claim for user-selected providers - that's the provider's policy). Audio is sent only on an explicit Record.
- **Voice input is the one exception, and it is opt-in.** Dictation sends the recorded audio clip out of the browser, unlike PDF bytes, dictated audio (which can contain PII/PHI) leaves the device. Two routes, each disclosed before recording: the **demo** route uploads to SimplePDF's server and on to OpenAI (the server keeps no audio and logs no transcript text, only an IP hash, byte size, and elapsed time); the **BYOK** route sends audio **directly to the user's chosen provider, never to SimplePDF** (SimplePDF makes no retention claim for user-selected providers, that's the provider's policy). Audio is sent only on an explicit Record.

### Using the demo account

24 changes: 12 additions & 12 deletions documentation/IFRAME.md
@@ -2,13 +2,13 @@

SimplePDF Embed [React](../react/README.md) and [Web](../web/README.md) integrate `SimplePDF` in a single line of code by displaying the editor in a modal.

**For more control** - embedding the editor inline (e.g. in a `div`), or driving it programmatically - read on.
**For more control**, embedding the editor inline (e.g. in a `div`), or driving it programmatically, read on.

## The iframe URL vs. programmatic control

Pointing an `<iframe src>` at the editor (below) gives you the **full editor, with your account's branding**, in zero JavaScript. What it does **not** give you is **programmatic control**: you can't read a field, jump to a page, prefill values, submit, or let an AI agent fill and read the document on the user's behalf.

For that, drive the same iframe with the typed [`@simplepdf/embed`](#iframe-communication) client - see [Iframe Communication](#iframe-communication).
For that, drive the same iframe with the typed [`@simplepdf/embed`](#iframe-communication) client, see [Iframe Communication](#iframe-communication).

## With a SimplePDF account (to collect customers' submissions)

@@ -96,7 +96,7 @@ See [Data Privacy & companyIdentifier](../README.md#data-privacy--companyidentif

_Programmatic control is only available with a SimplePDF account_

The iframe communicates over the `postMessage` API. Use **[`@simplepdf/embed`](../embed/README.md)** - a zero-dependency client that drives the editor over the iframe for you, **generated from the editor contract** so it can't drift. It wraps everything for you: request/response correlation, timeouts, the editor-ready / document-loaded lifecycle, typed events, and the closed error model. Methods + arguments are camelCase; the editor's `snake_case` wire is handled behind the scenes. (If you'd rather add no dependency, the raw protocol is documented under [Wire shape](#wire-shape).)
The iframe communicates over the `postMessage` API. Use **[`@simplepdf/embed`](../embed/README.md)**, a zero-dependency client that drives the editor over the iframe for you, **generated from the editor contract** so it can't drift. It wraps everything for you: request/response correlation, timeouts, the editor-ready / document-loaded lifecycle, typed events, and the closed error model. Methods + arguments are camelCase; the editor's `snake_case` wire is handled behind the scenes. (If you'd rather add no dependency, the raw protocol is documented under [Wire shape](#wire-shape).)
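
To make "request/response correlation, timeouts" and the camelCase to `snake_case` mapping concrete, here is a minimal sketch of the general technique. It is not the package's implementation: the message fields (`request_id`, `request_type`, `data`, `success`, `error`) and the helper names are assumptions for illustration, and the transport is passed in as a function so the sketch runs without a DOM.

```ts
// Illustrative only: these message field names are assumptions for this
// sketch, not the documented wire shape.
type WireRequest = { request_id: string; request_type: string; data: Record<string, unknown> };
type WireResponse = { request_id: string; success: boolean; data?: unknown; error?: string };

type PendingEntry = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

// camelCase -> snake_case for the top-level keys of an argument object.
function toSnakeCaseKeys(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    out[key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`)] = value;
  }
  return out;
}

// `post` stands in for iframe.contentWindow.postMessage(message, origin).
function createCorrelator(post: (message: WireRequest) => void, timeoutMs = 5000) {
  const pending = new Map<string, PendingEntry>();
  let counter = 0;

  return {
    // Sends a request and resolves when the matching response arrives.
    request(requestType: string, args: Record<string, unknown> = {}): Promise<unknown> {
      const requestId = `req_${++counter}`;
      return new Promise((resolve, reject) => {
        const timer = setTimeout(() => {
          pending.delete(requestId);
          reject(new Error(`${requestType} timed out after ${timeoutMs}ms`));
        }, timeoutMs);
        pending.set(requestId, { resolve, reject, timer });
        post({ request_id: requestId, request_type: requestType, data: toSnakeCaseKeys(args) });
      });
    },
    // Call from a `message` event listener; returns false for unrelated messages.
    handleResponse(response: WireResponse): boolean {
      const entry = pending.get(response.request_id);
      if (!entry) return false;
      clearTimeout(entry.timer);
      pending.delete(response.request_id);
      if (response.success) entry.resolve(response.data);
      else entry.reject(new Error(response.error ?? "unknown error"));
      return true;
    },
    pendingCount: (): number => pending.size,
  };
}
```

Each request gets a unique id, the response handler looks that id up, and the timer guarantees a promise never hangs if the editor does not answer. Using the package means none of this has to be written by hand.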

The examples below go from the simplest embed to full programmatic and agentic control. `createEmbed` either **creates** the iframe inside a container you provide, or **attaches** to an `<iframe>` you render.

@@ -110,8 +110,8 @@ The examples below go from the simplest embed to full programmatic and agentic c
import { createEmbed } from "@simplepdf/embed";

const embed = createEmbed({
target: "#editor", // a container - the iframe is created inside it
companyIdentifier: "acme", // your <companyIdentifier>.simplepdf.com - use "embed" for the free editor
target: "#editor", // a container, the iframe is created inside it
companyIdentifier: "acme", // your <companyIdentifier>.simplepdf.com, use "embed" for the free editor
document: { url: "https://example.com/form.pdf" },
});

@@ -161,7 +161,7 @@ See the [`@simplepdf/embed` README](../embed/README.md#actions) for the full met

### 4. Fill and read a document (what an agent does for you)

"Fill and read this document for me" is just these operations in sequence - read the fields, fill one, then walk the user to a signature: navigate to its page, focus the field, and open the signature tool.
"Fill and read this document for me" is just these operations in sequence, read the fields, fill one, then walk the user to a signature: navigate to its page, focus the field, and open the signature tool.

```ts
// read
@@ -177,33 +177,33 @@ await embed.actions.focusField({ fieldId: "f_signature" });
await embed.actions.selectTool({ tool: "SIGNATURE" });
```

An AI agent does exactly this - the next section exposes these operations as tools the model calls.
An AI agent does exactly this, the next section exposes these operations as tools the model calls.

### 5. Drive the editor from an LLM (agentic)

The same operations are exposed as [Vercel AI SDK](https://sdk.vercel.ai) tools:

```ts
// server - execute-less tool definitions for streamText / generateText
// server: execute-less tool definitions for streamText / generateText
import { simplePDFToolDefinitions } from "@simplepdf/embed/ai-sdk";
streamText({ model, tools: simplePDFToolDefinitions() });

// browser - run the model's tool calls against the live editor
// browser: run the model's tool calls against the live editor
import { createSimplePDFExecutor } from "@simplepdf/embed/ai-sdk";
const execute = createSimplePDFExecutor({ embed });
```

In React, `@simplepdf/react-embed-pdf/ai-sdk`'s `useEmbedTools(embedRef)` is the same registry pre-bound to the live editor - drop it straight into `useChat({ tools })`.
In React, `@simplepdf/react-embed-pdf/ai-sdk`'s `useEmbedTools(embedRef)` is the same registry pre-bound to the live editor, drop it straight into `useChat({ tools })`.

---

## Reference: the editor contract (the spec)

The single source of truth for the available operations and events can be found at **[`https://simplepdf.com/embed/json`](https://simplepdf.com/embed/json)**.

It describes every operation (its `request_type`, input/output JSON Schema, and per-operation error codes), the outbound events, the supported locales, and the **complete closed set of error codes** - each `code` carrying a plain-language description of its meaning. It is the iframe / `postMessage` counterpart to the REST API's OpenAPI spec at [`/api/json`](https://simplepdf.com/api/json).
It describes every operation (its `request_type`, input/output JSON Schema, and per-operation error codes), the outbound events, the supported locales, and the **complete closed set of error codes**, each `code` carrying a plain-language description of its meaning. It is the iframe / `postMessage` counterpart to the REST API's OpenAPI spec at [`/api/json`](https://simplepdf.com/api/json).

- **Programmatic access (recommended):** [`@simplepdf/embed`](https://github.com/SimplePDF/simplepdf-embed/tree/main/embed) generates its client, zod schemas (`/schemas`), and agentic tool registry (`/tools`, `/ai-sdk`) from this exact contract - use the package and you never read the raw spec.
- **Programmatic access (recommended):** [`@simplepdf/embed`](https://github.com/SimplePDF/simplepdf-embed/tree/main/embed) generates its client, zod schemas (`/schemas`), and agentic tool registry (`/tools`, `/ai-sdk`) from this exact contract, use the package and you never read the raw spec.
- **Agents / LLMs:** point the model at `/embed/json` (or `@simplepdf/embed/ai-sdk`) to discover and drive the editor programmatically.

### Wire shape