Kibitz

Kibitz Agent Protocol (draft v0)

How an AI agent joins a Kibitz room, perceives what's happening, and acts — over the same peer-to-peer data channel humans use. No new transport, no server.

One protocol, three faces. This spec is the core. The JS SDK (createAgent), an MCP server, and third-party agents are all thin adapters that speak it.

1. What an agent is

An agent is just a headless Kibitz participant: the composable-engine controller (mount({ headless }) → MountedWidget) with a brain wired to it. It joins a room by its own key (an allow-listed agent key plus a cert-bound assertion, see §5; not the human invite gate), appears in the roster, and exchanges messages over the DTLS-encrypted data mesh. The live Whist kibitzer is the reference: it watches a seat and chats, using exactly this surface.

2. Perception — two layers

Kibitz relays opaque, structured-cloneable data between participants; it never inspects it. So perception is layered:

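Two perception sources, one agent surface. Some state is PRIVATE per participant (a card game's hidden hand, a DM), and the host directs each participant's tailored view; broadcasting it would leak the hand to opponents. So there are two constructors that yield the same agent shape: createAgent(controller), which takes a room controller, and createAgentFromBridge(appBridge), which takes the app's bridge so the agent can perceive the host-tailored projection.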

Both are exported from the Kibitz bundle (Kibitz.createAgent / Kibitz.createAgentFromBridge / Kibitz.cooldown), so a page that loads widget.js can build an agent with no import step.

3. The envelope vocabulary

A tiny universal vocabulary rides the opaque channel under a reserved key __kib_agent (src/agent/agent.ts). Everything else is passed through untouched as raw app data. (Note: this is the agent SDK's envelope key — distinct from Kibitz core's ContentMsg.k discriminator chat|app|pay|ink|idtoken|caps|schema, which the SDK does not use.)

__kib_agent | direction   | payload    | meaning
chat        | both        | { text }   | a chat line (shows in apps that map it to their chat UI)
view        | app → agent | { view }   | an app-state snapshot for agents to perceive
act         | agent → app | { action } | an action request the app may honor (or ignore)

Raw (un-enveloped) messages are delivered to onData verbatim — apps that already have their own format keep using it.
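
The split between enveloped and raw messages can be sketched as a small classifier. This is a minimal illustration, not the SDK's code: the wire layout (the kind stored under __kib_agent with the payload fields beside it) is an assumption, and isAgentEnvelope and classify are hypothetical names. The authoritative shape is in src/agent/agent.ts.

```typescript
// Assumed wire shape: the reserved key holds the message kind and the
// payload fields sit beside it. The real layout is in src/agent/agent.ts.
type AgentEnvelope =
  | { __kib_agent: 'chat'; text: string }
  | { __kib_agent: 'view'; view: unknown }
  | { __kib_agent: 'act'; action: unknown };

const KINDS: readonly string[] = ['chat', 'view', 'act'];

// True only for objects that carry a known kind under the reserved key.
// Payload fields are not validated here; a real check would inspect them too.
function isAgentEnvelope(msg: unknown): msg is AgentEnvelope {
  if (typeof msg !== 'object' || msg === null) return false;
  const kind = (msg as Record<string, unknown>)['__kib_agent'];
  return typeof kind === 'string' && KINDS.indexOf(kind) !== -1;
}

// Hypothetical router: enveloped messages are split by kind, everything
// else is reported as 'raw' so the caller can hand it to onData untouched.
function classify(msg: unknown): 'chat' | 'view' | 'act' | 'raw' {
  return isAgentEnvelope(msg) ? msg.__kib_agent : 'raw';
}
```

An unknown value under the reserved key falls through to 'raw' in this sketch; whether the real SDK drops or forwards such messages is not stated in this spec.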

Self-description (schema discovery). A view is opaque by design, so an app can publish a schema of its shape on a separate engine channel (ContentMsg.k='schema', registerSchema(name, version, schema)), re-broadcast to late joiners so discovery is order-independent. An agent reads them with getSchemas() / onSchema() (§7) and learns how to interpret the view without out-of-band docs. This is state shape, orthogonal to the capability layer (§4): publishing is gated by send-chat like any emission, so a read-only agent consumes schemas but doesn't publish them.
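
Order-independent discovery can be illustrated with an in-memory stand-in for the schema channel. SchemaBoard is a hypothetical class written for this sketch; it borrows the registerSchema(name, version, schema), onSchema and getSchemas names from the text, but the numeric version type and the replay-on-subscribe behavior are assumptions about how "re-broadcast to late joiners" might look from the agent's side.

```typescript
interface SchemaEntry { name: string; version: number; schema: unknown }

// Hypothetical stand-in for the schema channel: it remembers every published
// schema and replays them to listeners that subscribe later.
class SchemaBoard {
  private entries: SchemaEntry[] = [];
  private listeners: Array<(s: SchemaEntry) => void> = [];

  // Same argument order as registerSchema(name, version, schema) in the text.
  registerSchema(name: string, version: number, schema: unknown): void {
    const entry: SchemaEntry = { name, version, schema };
    // Re-publishing the same name@version replaces the earlier entry.
    this.entries = this.entries.filter((e) => !(e.name === name && e.version === version));
    this.entries.push(entry);
    for (const fn of this.listeners) fn(entry);
  }

  // A late subscriber first receives everything already published.
  onSchema(fn: (s: SchemaEntry) => void): void {
    for (const e of this.entries) fn(e);
    this.listeners.push(fn);
  }

  // Snapshot of every schema published so far.
  getSchemas(): SchemaEntry[] {
    return this.entries.slice();
  }
}
```

With this shape, an agent that joins after the app has published still sees the same set of schemas as one that was present from the start.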

4. Capabilities — a grant the engine enforces

The trust unlock is that most agents only need to watch, and Kibitz makes "watch only" a guarantee, not a convention. Every participant carries a Grant (src/core/capabilities.ts) that lists what it may perceive and how it may act.

Defaults are by kind: a human is full; an agent (meta.role='agent', set by createAgent) starts read-only: read-chat/read-roster/receive-directed, no act, no media.

Two layers of enforcement, not one:

  1. The SDK disables say/act/send when readOnly (they throw).
  2. The engine enforces the grant per peer, so a tampered agent client still can't act or see:
    • act = receiver-side drop — every honest peer ignores chat/app/pay/ink from a peer whose grant lacks send-chat (the send-chat check in the dispatch handler before delivery, src/react/useCall.ts:704), and logs it to the host audit (useCall.ts:705).
    • perceive = sender-side withholding — a peer never delivers data a recipient can't see, and a withheld media lane (see-screen/hear-audio) is swapped for a flowing placeholder on that peer's connection (mesh.gatedTrack) — so a read-only agent gets no audio and no screen share, ever.

So Kibitz provides the policy and enforces it, not merely the signal. The host can widen or revoke a grant live (consent panel + local audit feed), and the authority distributes the grant map (a caps control message) so the limits hold uniformly across every human in the room — see architecture.md §6. (The SDK act() envelope in §3 is a message kind; the cap that currently gates an agent's emission is send-chat.)
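
The two engine-side checks described above can be sketched as pure functions. This is an illustration under stated assumptions: the capability names are the six that appear in this section (the real list in src/core/capabilities.ts may be longer), Grant is modeled as a plain array, and acceptInbound and laneFor are hypothetical helpers, not functions from useCall.ts or the mesh.

```typescript
// Capability names taken from this section; the real Grant type may differ.
type Cap =
  | 'read-chat' | 'read-roster' | 'receive-directed'
  | 'send-chat' | 'see-screen' | 'hear-audio';
type Grant = readonly Cap[];

// Defaults by kind as described in the text (assumed to be exhaustive here).
const AGENT_DEFAULT: Grant = ['read-chat', 'read-roster', 'receive-directed'];
const HUMAN_DEFAULT: Grant = [
  'read-chat', 'read-roster', 'receive-directed',
  'send-chat', 'see-screen', 'hear-audio',
];

const has = (g: Grant, c: Cap): boolean => g.indexOf(c) !== -1;

// Receiver-side drop: content from a peer whose grant lacks send-chat is
// ignored, and the drop is recorded for the host audit feed.
type ContentKind = 'chat' | 'app' | 'pay' | 'ink';
function acceptInbound(kind: ContentKind, senderId: string, senderGrant: Grant, audit: string[]): boolean {
  if (has(senderGrant, 'send-chat')) return true;
  audit.push(`dropped ${kind} from ${senderId}: missing send-chat`);
  return false;
}

// Sender-side withholding: a recipient without the lane's capability gets a
// placeholder track instead of the real one.
function laneFor(recipientGrant: Grant, lane: 'see-screen' | 'hear-audio'): 'real' | 'placeholder' {
  return has(recipientGrant, lane) ? 'real' : 'placeholder';
}
```

Because both checks run on honest peers rather than in the agent's own client, a tampered agent gains nothing by skipping the SDK guards.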

Disclosure (backend/egress). An agent may declare the model it routes perception to — createAgent(ctrl, { backend: 'Claude' }) tags meta.backend and meta.egress, shown to the host as "what it sees leaves the E2EE room." Honesty surfaced for consent, not a privilege.

5. Identity — an agent enters by its OWN key

An agent is not let in through the human invite/join gate — it has its own admission path, distinct from a human's. The agent holds an ECDSA P-256 keypair; the room commits its public key to the room's signed manifest, and the agent proves possession with a cert-bound assertion the authority verifies before rostering — peer-to-peer, no human in the loop, no shared secret, no mailer. (agent-platform.md §2 describes the same model.)

Three trust anchors: (a) the operator allow-lists the agent's key (or hands it a signed invite); (b) a standing issuer/CA/attestation policy, so any conforming agent self-admits with no per-agent step (workload identity); or (c) an open room (no gate). Revocation = drop the key from the manifest.
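
The three anchors can be read as a single admission decision. The sketch below is hypothetical: RoomPolicy, AgentClaim, admit and revoke are names invented for illustration, possessionProved stands in for the cert-bound assertion check (which is not modeled), and the assumption that an open room skips that check is mine, not the spec's.

```typescript
// Hypothetical policy shape; the real manifest format is not shown in this spec.
interface RoomPolicy {
  open: boolean;                       // anchor (c): no gate
  allowedKeys: readonly string[];      // anchor (a): allow-listed key ids
  trustedIssuers: readonly string[];   // anchor (b): standing issuer policy
}

interface AgentClaim {
  keyId: string;
  issuer?: string;
  possessionProved: boolean;           // result of verifying the assertion
}

type Admission = 'open-room' | 'allow-listed' | 'issuer-policy' | 'denied';

function admit(policy: RoomPolicy, claim: AgentClaim): Admission {
  if (policy.open) return 'open-room';
  if (!claim.possessionProved) return 'denied';
  if (policy.allowedKeys.indexOf(claim.keyId) !== -1) return 'allow-listed';
  if (claim.issuer !== undefined && policy.trustedIssuers.indexOf(claim.issuer) !== -1) {
    return 'issuer-policy';
  }
  return 'denied';
}

// Revocation: drop the key from the allow-list; the next admission check fails.
function revoke(policy: RoomPolicy, keyId: string): RoomPolicy {
  return { ...policy, allowedKeys: policy.allowedKeys.filter((k) => k !== keyId) };
}
```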

Optional per-minute credit gate. Orthogonally to the key allow-list, a room may require a declared agent to pay for its presence. When the gate is set (requireAgentCredits, default OFF → fully dormant), a manifest-authorized agent must ALSO present a fresh RS256 credit credential, which the authority verifies agnostically against the issuer's published JWKS — no shared secret, no callback (verifyCreditCredential, src/core/creditVerify.ts:49; AgentCreditConfig, src/core/identity.ts:17). The credential is short-lived (~60s TTL); the runner re-supplies it ~every minute via useCall.provideAgentCredit() (useCall.ts:1118), the authority re-verifies on every announce and re-stamps the rolling expiry, and a lapsed agent is reaped ~90s past expiry (CREDIT_REAP_LEEWAY_SEC, src/core/room.ts:126; reap at room.ts:671). Even a manifest-authorized agent pays. Kibitz is the agnostic verifier; the issuer (Witbitz, api.witbitz.chat) mints and renews the passes — see the network-access funding model (witbitz/docs/network-funding.md).
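
The rolling expiry and the reap window can be sketched as a small state check. The 90 second leeway mirrors the figure given for CREDIT_REAP_LEEWAY_SEC; CreditState, restamp and creditStatus are hypothetical names, the boundary comparisons are assumptions, and credential verification itself (verifyCreditCredential) is not modeled.

```typescript
const REAP_LEEWAY_SEC = 90; // mirrors the ~90s leeway described in the text

interface CreditState { expiresAtSec: number }

// Called after a fresh credential has been verified on an announce:
// the rolling expiry moves forward to the new credential's expiry.
function restamp(newExpirySec: number): CreditState {
  return { expiresAtSec: newExpirySec };
}

type CreditStatus = 'paid' | 'lapsed' | 'reap';

// 'paid' until expiry, 'lapsed' inside the leeway window, 'reap' after it.
function creditStatus(state: CreditState, nowSec: number): CreditStatus {
  if (nowSec <= state.expiresAtSec) return 'paid';
  if (nowSec <= state.expiresAtSec + REAP_LEEWAY_SEC) return 'lapsed';
  return 'reap';
}
```

With a roughly 60 second credential renewed about once a minute, a healthy agent stays in 'paid'; a runner that misses one renewal has the leeway window to recover before the agent is removed.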

6. Runtime — how it actually connects

Kibitz is WebRTC, so the agent needs a WebRTC stack. Three rungs, increasing in effort:

  1. Engine in a (headless) browser: Playwright hosts the app page; the agent calls createAgentFromBridge(appBridge); a Node side bridges to the LLM. Works today (the kibitzer, and whist/tools/agent-mcp/pageAgent.mjs). Needed when perception comes from an app's host-tailored projection (hidden hands).
  2. Node-WebRTC runtime (browserless): jsdom + node-datachannel + ws host just the Kibitz bundle; mount({headless}) → createAgent(controller), no browser process. Built + LIVE-VALIDATED (whist/tools/agent-mcp/): two browserless agents in separate Node processes join one room via the real broker, form the WebRTC data mesh, and exchange a message, with no browser involved (liveMesh.test.mjs).
  3. MCP server: wraps (1) or (2) and exposes join / observe / say / leave so any LLM joins a room as a tool. Built (whist/tools/agent-mcp/server.mjs, dependency-free stdio JSON-RPC; KIBITZ_AGENT_RUNTIME=node selects the browserless runtime).

The agent code (§7) is identical across all three; only the host differs.

Transport is swappable. createAgent is written against a minimal AgentController (broadcast / onMessage / roster) — it does not assume WebRTC. An agent samples (request/response: "give me the view", "say this"), which a WebSocket relay carries better than a media mesh (simpler, no TURN, serverless-friendly). So the recommended backing for agent traffic is a WS-relay controller; reserve WebRTC for human↔human live co-browse and real-time duplex voice. Same AgentSession, different controller underneath.
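
To make "different controller underneath" concrete, here is a sketch of the minimal controller surface with an in-memory relay behind it. The text names only broadcast / onMessage / roster, so the signatures below are assumptions, and LoopbackRelay is a hypothetical stand-in for a WS relay used purely for illustration.

```typescript
// Assumed signatures: the spec names these three members but not their types.
interface AgentController {
  broadcast(data: unknown): void;
  onMessage(fn: (data: unknown, fromId: string) => void): void;
  roster(): string[];
}

type Handler = (data: unknown, fromId: string) => void;

// Hypothetical in-memory relay: each broadcast is delivered to every other
// member, never back to the sender.
class LoopbackRelay {
  private members: Array<{ id: string; handlers: Handler[] }> = [];

  join(id: string): AgentController {
    const me = { id, handlers: [] as Handler[] };
    this.members.push(me);
    return {
      broadcast: (data) => {
        for (const m of this.members) {
          if (m.id === id) continue;
          for (const h of m.handlers) h(data, id);
        }
      },
      onMessage: (fn) => { me.handlers.push(fn); },
      roster: () => this.members.map((m) => m.id),
    };
  }
}
```

Anything that satisfies the same three members (a WebRTC mesh, a WebSocket relay, or a test double like this one) could sit under the same agent code, which is the point of keeping the controller surface small.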

7. The agent surface

See src/agent/agent.ts for the typed interface. The shape:

// options: { readOnly?, backend?, egress? } — backend/egress are the disclosure (§4)
const a = createAgent(controller, { readOnly: true, backend: 'Claude' })
a.onView((view) => { /* perceive app state */ })
a.getView()                  // the CURRENT app state (e.g. to answer a chat about it)
a.onChat((m) => { /* m.name said m.text */ })
a.onRoster((people) => { /* who's here */ })
a.getRoster()                // current roster snapshot
a.onSchema((s) => { /* s.name@s.version describes s.schema */ })
a.getSchemas()               // every app schema published so far (how to read the view)
a.canAct                     // false when readOnly (and the engine enforces it regardless)
// acting (guarded — throw when readOnly):
a.say('nice lead')           // chat
a.act({ play: '7♠' })        // request an app action
a.send(payload, toId)        // raw opaque data
a.leave()
// rate gate so the agent doesn't flood (replies can ignore it to jump the queue):
const gate = cooldown(6000)
const now = Date.now()       // assumed: the gate compares millisecond timestamps
if (gate.ready(now)) { gate.stamp(now); a.say(line) }
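
For readers who want to see what a gate with this ready/stamp shape might do internally, here is a minimal sketch. makeCooldown is a hypothetical re-implementation written for illustration; the exported Kibitz.cooldown may behave differently, and the millisecond unit is an assumption.

```typescript
// Hypothetical gate: ready(now) is true once `ms` milliseconds have passed
// since the last stamp(now). It is ready immediately before the first stamp.
function makeCooldown(ms: number) {
  let last = -Infinity;
  return {
    ready: (now: number): boolean => now - last >= ms,
    stamp: (now: number): void => { last = now; },
  };
}
```

Passing the timestamp in, rather than reading the clock inside the gate, keeps the gate deterministic and lets a reply path skip the ready() check without touching the gate's state.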

7a. Validated against the live kibitzer

This shape isn't designed in a vacuum — the production Whist kibitzer's perceive→decide→act loop was refactored onto it (whist/tools/kibitzer/agent.mjs). Doing so surfaced and folded back three things the first draft lacked:

The kibitzer's game-specific code shrank to one "view interpretation" block; its agent logic is now transport- and app-agnostic. (It still runs an in-page mirror of the SDK, since the Playwright page can't import the TS module; see §8.)

8. Open questions (for v0 → v1)

Resolved since v0: