How an AI agent joins a Kibitz room, perceives what's happening, and acts — over the same peer-to-peer data channel humans use. No new transport, no server.
One protocol, three faces. This spec is the core. The JS SDK (createAgent), an MCP server, and third-party agents are all thin adapters that speak it.
An agent is just a headless Kibitz participant — the composable-engine controller
(mount({ headless }) → MountedWidget) with a brain wired to it. It joins a room by
its own key (an allow-listed agent key + a cert-bound assertion — §5, not the human
invite gate), appears in the roster, and exchanges messages over the DTLS-encrypted data
mesh. The live Whist kibitzer is the reference: it watches a seat and chats, using exactly this surface.
Kibitz relays opaque, structured-cloneable data between participants; it never inspects it. So perception is layered:
… view each turn; the kibitzer reads pub/myHand/chat out of it.)

Two perception sources, one agent surface. Some state is PRIVATE per participant (a card game's hidden hand, a DM), and the host directs each participant's tailored view; broadcasting it would leak the hand to opponents. So there are two constructors that yield the same agent shape:
- createAgent(controller): perceive over the generic broadcast/onMessage channel.
- createAgentFromBridge(appBridge): perceive an app's host-tailored, per-participant projection (Whist's onView/sendChat). The right choice for hidden-information apps. Its BridgeAgent is a strict subset, onView/getView/onChat/say only (view + chat), with no act/send/roster/schema; those need the full controller (src/agent/agent.ts:178).

Both are exported from the Kibitz bundle (Kibitz.createAgent / Kibitz.createAgentFromBridge / Kibitz.cooldown), so a page that loads widget.js can build an agent with no import step.
A tiny universal vocabulary rides the opaque channel under a reserved key __kib_agent
(src/agent/agent.ts). Everything else is passed through untouched as raw app data. (Note:
this is the agent SDK's envelope key — distinct from Kibitz core's ContentMsg.k
discriminator chat|app|pay|ink|idtoken|caps|schema, which the SDK does not use.)
| __kib_agent | direction | payload | meaning |
|---|---|---|---|
| chat | both | { text } | a chat line (shows in apps that map it to their chat UI) |
| view | app → agent | { view } | an app-state snapshot for agents to perceive |
| act | agent → app | { action } | an action request the app may honor (or ignore) |
Raw (un-enveloped) messages are delivered to onData verbatim — apps that already have
their own format keep using it.
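To make the vocabulary concrete, here is a minimal sketch of how a receiver might separate enveloped messages from raw app data. The wire shape is an assumption (the kind stored under `__kib_agent` with the payload fields beside it); the names `AgentEnvelope`, `Incoming`, and `classify` are hypothetical and not taken from src/agent/agent.ts.

```typescript
// ASSUMED wire shape: the reserved key holds the message kind and the
// payload fields sit next to it. Hypothetical, not the SDK's actual types.
type AgentEnvelope =
  | { __kib_agent: 'chat'; text: string }
  | { __kib_agent: 'view'; view: unknown }
  | { __kib_agent: 'act'; action: unknown }

type Incoming =
  | { kind: 'envelope'; msg: AgentEnvelope }
  | { kind: 'raw'; data: unknown }

const KNOWN_KINDS: string[] = ['chat', 'view', 'act']

// Classify one message off the opaque channel. A recognized envelope is
// unwrapped; everything else (including an unknown kind, by assumption)
// is passed through untouched as raw app data.
function classify(data: unknown): Incoming {
  if (typeof data === 'object' && data !== null && '__kib_agent' in data) {
    const kind = (data as { __kib_agent: unknown }).__kib_agent
    if (typeof kind === 'string' && KNOWN_KINDS.indexOf(kind) !== -1) {
      return { kind: 'envelope', msg: data as AgentEnvelope }
    }
  }
  return { kind: 'raw', data }
}

const chatLine = classify({ __kib_agent: 'chat', text: 'nice lead' }) // envelope
const appMove = classify({ move: 'e4' })                              // raw
```

The sketch checks only the kind, not the payload fields; a stricter receiver would also validate that `text`, `view`, or `action` is present before dispatching.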
Self-description (schema discovery). A view is opaque by design, so an app can publish a
schema of its shape on a separate engine channel (ContentMsg.k='schema',
registerSchema(name, version, schema)), re-broadcast to late joiners so discovery is
order-independent. An agent reads them with getSchemas() / onSchema() (§7) and learns
how to interpret the view without out-of-band docs. This is state shape, orthogonal to the
capability layer (§4): publishing is gated by send-chat like any emission, so a read-only
agent consumes schemas but doesn't publish them.
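The order-independence described above can be sketched as a small cache that tolerates re-broadcasts and replays known schemas to late subscribers. This is an illustration only: `SchemaCache`, `AppSchema`, and the numeric `version` field are assumptions, and the engine's real channel handling is not shown.

```typescript
// Hypothetical schema record; the real payload shape may differ.
interface AppSchema { name: string; version: number; schema: unknown }
type SchemaListener = (s: AppSchema) => void

class SchemaCache {
  private known: AppSchema[] = []
  private listeners: SchemaListener[] = []

  // Called for every schema message, whether a first publish or a
  // re-broadcast aimed at late joiners. Duplicates are ignored.
  receive(s: AppSchema): void {
    const dup = this.known.some((k) => k.name === s.name && k.version === s.version)
    if (dup) return
    this.known.push(s)
    this.listeners.forEach((fn) => fn(s))
  }

  // Subscribing replays everything already known first, so it does not
  // matter whether the agent subscribed before or after the publish.
  onSchema(fn: SchemaListener): void {
    this.known.forEach((s) => fn(s))
    this.listeners.push(fn)
  }

  // Snapshot of every schema seen so far.
  getSchemas(): AppSchema[] {
    return this.known.slice()
  }
}
```

Keying on name plus version means a newer version is stored alongside the older one; whether the real engine keeps or replaces older versions is not stated in this spec.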
The trust unlock is that most agents only need to watch — and Kibitz makes "watch only"
a guarantee, not a convention. Every participant carries a Grant
(src/core/capabilities.ts) of what it may perceive and how it may act:
- Perceive: see-screen, hear-audio, read-chat, read-roster, receive-directed.
- Act: send-chat, speak, act.

Defaults are by kind: a human is full; an agent (meta.role='agent', set by createAgent) starts read-only: read-chat/read-roster/receive-directed, no act, no media.
Two layers of enforcement, not one:
- … say/act/send when readOnly (they throw).
- … send-chat (the send-chat check in the dispatch handler before delivery, src/react/useCall.ts:704), and logs it to the host audit (useCall.ts:705).
- … see-screen/hear-audio) is swapped for a flowing placeholder on that peer's connection (mesh.gatedTrack), so a read-only agent gets no audio and no screen share, ever.

So Kibitz provides the policy and enforces it, not merely the signal. The host can widen or revoke a grant live (consent panel + local audit feed), and the authority distributes the grant map (a caps control message) so the limits hold uniformly across every human in the room; see architecture.md §6. (The SDK act() envelope in §3 is a message kind; the cap that currently gates an agent's emission is send-chat.)
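A minimal sketch of the grant model follows, assuming a grant is a plain list of capability names and that a failed send-chat check means the emission is not delivered and is recorded in the audit. `defaultGrantSketch`, `allows`, and `gateEmission` are hypothetical names; the real Grant type in src/core/capabilities.ts may be shaped differently.

```typescript
type Capability =
  | 'see-screen' | 'hear-audio' | 'read-chat' | 'read-roster' | 'receive-directed'
  | 'send-chat' | 'speak' | 'act'
type ParticipantKind = 'human' | 'agent'

const PERCEIVE: Capability[] = ['see-screen', 'hear-audio', 'read-chat', 'read-roster', 'receive-directed']
const ACT: Capability[] = ['send-chat', 'speak', 'act']

// Defaults by kind, as described above: a human is full, an agent
// starts read-only with no media and no act capabilities.
function defaultGrantSketch(kind: ParticipantKind): Capability[] {
  if (kind === 'human') return PERCEIVE.concat(ACT)
  return ['read-chat', 'read-roster', 'receive-directed']
}

function allows(grant: Capability[], cap: Capability): boolean {
  return grant.indexOf(cap) !== -1
}

// Engine-side gate (assumed behavior): an emission passes only when the
// sender's grant includes send-chat; otherwise it is refused and audited.
function gateEmission(
  grants: Record<string, Capability[]>,
  senderId: string,
  audit: string[],
): boolean {
  const grant = grants[senderId] || []
  if (allows(grant, 'send-chat')) return true
  audit.push('refused emission from ' + senderId + ': missing send-chat')
  return false
}
```

Widening a grant live then amounts to replacing the sender's entry in the grant map, for example `grants.bot = grants.bot.concat(['send-chat'])`, after which the same gate lets the emission through.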
Disclosure (backend/egress). An agent may declare the model it routes perception to —
createAgent(ctrl, { backend: 'Claude' }) tags meta.backend and meta.egress, shown to the
host as "what it sees leaves the E2EE room." Honesty surfaced for consent, not a privilege.
An agent is not let in through the human invite/join gate — it has its own admission path,
distinct from a human's. The agent holds an ECDSA P-256 keypair; the room commits its
public key to the room's signed manifest, and the agent proves possession with a
cert-bound assertion the authority verifies before rostering — peer-to-peer, no human in
the loop, no shared secret, no mailer. (agent-platform.md §2 describes the same model.)
- RoomManifest.agentKeys is a list of AgentEntry { key, caps?, label? } (src/core/roomManifest.ts:71): key is the agent's public JWK, caps is the capability policy it gets on admission (absent ⇒ perceive-only, defaultGrant('agent'), clamped by sanitizeGrant), label is a display/audit name. Signed with the rest of the manifest, so the allow-list and the granted powers are tamper-proof and room-bound; the entry is public, so it's safe in the link. agentKeys is a valid standalone allow-list: a room can be agents-only / agents-gated with humans left open (verifyManifest, roomManifest.ts:135).
- … signAgentAssertion, src/core/agentKey.ts:67), re-signing per (re-)announce so it stays fresh. Cert-binding stops a captured assertion from being replayed on another connection; room-binding + freshness stop cross-room / stale replay.
- … verifyAgentAssertion / admitAgentByManifest, agentKey.ts:96 + roomManifest.ts:84).

In the engine this is the withAgentGate branch: an agentAssertion is admitted off the manifest before the human verify path (src/widget/Widget.tsx:557); a room with no agent keys simply admits no agents. The runner supplies the agent's private key via useCall.provideAgentKey() (src/react/useCall.ts:1097), which re-signs on a timer. agentKeyThumbprint (agentKey.ts:59) is the short stable key id for allow-listing and audit.

Three trust anchors: (a) the operator allow-lists the agent's key (or hands it a signed invite); (b) a standing issuer/CA/attestation policy, so any conforming agent self-admits with no per-agent step (workload identity); or (c) an open room (no gate). Revocation = drop the key from the manifest.
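The three bindings (allow-listed key, room, connection certificate) plus freshness can be sketched as a pure claims check. This is a hedged illustration, not the verifyAgentAssertion implementation: the claim field names, the `maxAgeSec` window, and the verdict strings are all assumptions, and the ECDSA P-256 signature check is reduced to a boolean computed elsewhere.

```typescript
// Hypothetical claim set carried by an assertion.
interface AssertionClaims {
  keyId: string            // thumbprint of the agent's public key
  room: string             // room the assertion was signed for
  certFingerprint: string  // DTLS certificate of the connection it was signed on
  issuedAtSec: number      // signing time, seconds since epoch
}

interface AdmissionContext {
  room: string
  peerCertFingerprint: string // fingerprint observed on THIS connection
  allowedKeyIds: string[]     // key ids taken from the signed manifest
  nowSec: number
  maxAgeSec: number           // assumed freshness window
}

type Verdict = 'admit' | 'unknown-key' | 'bad-signature' | 'wrong-room' | 'wrong-cert' | 'stale'

function checkAssertion(
  claims: AssertionClaims,
  signatureOk: boolean, // result of the signature verification, done elsewhere
  ctx: AdmissionContext,
): Verdict {
  if (ctx.allowedKeyIds.indexOf(claims.keyId) === -1) return 'unknown-key'
  if (!signatureOk) return 'bad-signature'
  // Room-binding: an assertion signed for another room is useless here.
  if (claims.room !== ctx.room) return 'wrong-room'
  // Cert-binding: a captured assertion fails on any other connection.
  if (claims.certFingerprint !== ctx.peerCertFingerprint) return 'wrong-cert'
  // Freshness: reject old (or implausibly future-dated) assertions.
  const age = ctx.nowSec - claims.issuedAtSec
  if (age > ctx.maxAgeSec || age < -ctx.maxAgeSec) return 'stale'
  return 'admit'
}
```

Under this model, revocation needs no extra machinery: once a key id is dropped from the manifest's list, the next assertion from that key returns `unknown-key`.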
Optional per-minute credit gate. Orthogonally to the key allow-list, a room may require a
declared agent to pay for its presence. When the gate is set (requireAgentCredits, default
OFF → fully dormant), a manifest-authorized agent must ALSO present a fresh RS256 credit
credential, which the authority verifies agnostically against the issuer's published JWKS
— no shared secret, no callback (verifyCreditCredential,
src/core/creditVerify.ts:49;
AgentCreditConfig, src/core/identity.ts:17). The credential is
short-lived (~60s TTL); the runner re-supplies it ~every minute via useCall.provideAgentCredit()
(useCall.ts:1118), the authority re-verifies on every announce
and re-stamps the rolling expiry, and a lapsed agent is reaped ~90s past expiry
(CREDIT_REAP_LEEWAY_SEC, src/core/room.ts:126;
reap at room.ts:671). Even a manifest-authorized agent pays. Kibitz is
the agnostic verifier; the issuer (Witbitz, api.witbitz.chat) mints and renews the passes —
see the network-access funding model (witbitz/docs/network-funding.md).
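The rolling expiry and reap window can be illustrated with a small timeline model. The sketch assumes the stamp simply adopts the verified credential's expiry and uses the approximate 90-second leeway quoted above; `restamp`, `creditStatus`, and the status names are hypothetical, and RS256/JWKS verification is out of scope here.

```typescript
// Approximate value from the text (CREDIT_REAP_LEEWAY_SEC); treated as 90.
const REAP_LEEWAY_SEC = 90

interface CreditStamp { expiresAtSec: number }
type CreditStatus = 'active' | 'lapsed' | 'reap'

// On each announce carrying an already-verified credential, the authority
// re-stamps the rolling expiry. An already-expired credential yields no stamp.
function restamp(nowSec: number, credentialExpSec: number): CreditStamp | null {
  if (credentialExpSec <= nowSec) return null
  return { expiresAtSec: credentialExpSec }
}

// active: within the paid window.
// lapsed: past expiry but inside the leeway, so a late renewal can recover.
// reap:   past expiry plus leeway, the agent is removed.
function creditStatus(stamp: CreditStamp, nowSec: number): CreditStatus {
  if (nowSec <= stamp.expiresAtSec) return 'active'
  if (nowSec <= stamp.expiresAtSec + REAP_LEEWAY_SEC) return 'lapsed'
  return 'reap'
}
```

With a 60-second credential issued at t=1000, the agent is active until t=1060, lapsed until t=1150, and reaped after that unless a fresh credential re-stamps the expiry in the meantime.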
Kibitz is WebRTC, so the agent needs a WebRTC stack. Three rungs, increasing in effort:
- … createAgentFromBridge(appBridge); a Node side bridges to the LLM. Works today (the kibitzer, and whist/tools/agent-mcp/pageAgent.mjs). Needed when perception comes from an app's host-tailored projection (hidden hands).
- … node-datachannel + ws host just the Kibitz bundle; mount({headless}) → createAgent(controller), no browser process. Built + LIVE-VALIDATED (whist/tools/agent-mcp/): two browserless agents in separate Node processes join one room via the real broker, form the WebRTC data mesh, and exchange a message, with no browser (liveMesh.test.mjs).
- … join / observe / say / leave so any LLM joins a room as a tool. Built (whist/tools/agent-mcp/server.mjs, dependency-free stdio JSON-RPC; KIBITZ_AGENT_RUNTIME=node selects the browserless runtime).

The agent code (§7) is identical across all three; only the host differs.
Transport is swappable. createAgent is written against a minimal AgentController
(broadcast / onMessage / roster) — it does not assume WebRTC. An agent samples
(request/response: "give me the view", "say this"), which a WebSocket relay carries
better than a media mesh (simpler, no TURN, serverless-friendly). So the recommended
backing for agent traffic is a WS-relay controller; reserve WebRTC for human↔human live
co-browse and real-time duplex voice. Same AgentSession, different controller underneath.
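To show why the controller surface makes transport swappable, here is a sketch of an in-memory relay that satisfies a minimal broadcast / onMessage / roster contract. The method signatures are assumptions (the typed interface lives in src/agent/agent.ts), and `LoopbackRelay` is a hypothetical stand-in for a WebSocket relay or the WebRTC mesh.

```typescript
type MessageHandler = (data: unknown, fromId: string) => void
interface RosterEntry { id: string; name: string; role?: string }

// ASSUMED minimal controller contract; the real AgentController may differ.
interface AgentControllerSketch {
  broadcast(data: unknown): void
  onMessage(fn: MessageHandler): void
  roster(): RosterEntry[]
}

// An in-memory relay: every member gets a controller, and a broadcast is
// fanned out to all other members. No WebRTC, no sockets.
class LoopbackRelay {
  private members: { entry: RosterEntry; handlers: MessageHandler[] }[] = []

  join(entry: RosterEntry): AgentControllerSketch {
    const self = { entry, handlers: [] as MessageHandler[] }
    this.members.push(self)
    return {
      broadcast: (data) => {
        this.members.forEach((m) => {
          // the sender does not receive its own broadcast
          if (m !== self) m.handlers.forEach((h) => h(data, entry.id))
        })
      },
      onMessage: (fn) => { self.handlers.push(fn) },
      roster: () => this.members.map((m) => m.entry),
    }
  }
}
```

Agent logic written against the three-method contract would run unchanged on this relay in a unit test and on a networked controller in production, which is the property the paragraph above relies on.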
See src/agent/agent.ts for the typed interface. The shape:
// options: { readOnly?, backend?, egress? } — backend/egress are the disclosure (§4)
const a = createAgent(controller, { readOnly: true, backend: 'Claude' })
a.onView((view) => { /* perceive app state */ })
a.getView() // the CURRENT app state (e.g. to answer a chat about it)
a.onChat((m) => { /* m.name said m.text */ })
a.onRoster((people) => { /* who's here */ })
a.getRoster() // current roster snapshot
a.onSchema((s) => { /* s.name@s.version describes s.schema */ })
a.getSchemas() // every app schema published so far (how to read the view)
a.canAct // false when readOnly (and the engine enforces it regardless)
// acting (guarded — throw when readOnly):
a.say('nice lead') // chat
a.act({ play: '7♠' }) // request an app action
a.send(payload, toId) // raw opaque data
a.leave()
// rate gate so the agent doesn't flood (replies can ignore it to jump the queue):
const gate = cooldown(6000); if (gate.ready(now)) { gate.stamp(now); a.say(line) }
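The rate gate on the last line of the shape above could be implemented along these lines. This is a sketch matching the `ready(now)` / `stamp(now)` usage shown, not the SDK's source; `cooldownSketch` and `sayIfReady` are hypothetical names, and timestamps are assumed to be milliseconds.

```typescript
interface CooldownGate {
  ready(now: number): boolean
  stamp(now: number): void
}

// A gate that opens once at least `ms` milliseconds have passed since the
// last stamp. Before the first stamp it is always ready.
function cooldownSketch(ms: number): CooldownGate {
  let last = -Infinity
  return {
    ready: (now) => now - last >= ms,
    stamp: (now) => { last = now },
  }
}

// Unprompted chatter goes through the gate; a direct reply could skip the
// ready() check to jump the queue while still stamping the gate.
function sayIfReady(
  gate: CooldownGate,
  now: number,
  line: string,
  say: (text: string) => void,
): boolean {
  if (!gate.ready(now)) return false
  gate.stamp(now)
  say(line)
  return true
}
```

Passing `now` in explicitly, rather than reading the clock inside the gate, keeps the gate deterministic and easy to test.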
This shape isn't designed in a vacuum — the production Whist kibitzer's perceive→decide→act
loop was refactored onto it (whist/tools/kibitzer/agent.mjs). Doing so surfaced and
folded back three things the first draft lacked:
- getView(): an agent replying to a chat line needs the current state to answer.
- cooldown(ms): every agent needs a flood gate; it was hand-rolled, now it's in the SDK.
- meta.role: the kibitzer skipped other agents by a uid-prefix hack; the protocol does it cleanly off the role tag every agent sets.

The kibitzer's game-specific code shrank to one "view interpretation" block; its agent logic is now transport- and app-agnostic. (It still runs an in-page mirror of the SDK, since the Playwright page can't import the TS module; see §8.)
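As an illustration of filtering by role tag instead of a uid prefix, the following sketch decides whether an agent should answer a chat line. The `Participant` and `ChatLine` shapes, including the `fromId` field, are assumptions made for the example; only the `meta.role === 'agent'` convention comes from the text above.

```typescript
// Hypothetical shapes for a roster entry and an incoming chat line.
interface Participant { id: string; name: string; meta: { role?: string } }
interface ChatLine { fromId: string; text: string }

function isAgent(p: Participant): boolean {
  return p.meta.role === 'agent'
}

// Reply only to lines from known, non-agent participants, and never to
// the agent's own lines. Unknown senders are ignored (an assumption).
function shouldReply(line: ChatLine, roster: Participant[], selfId: string): boolean {
  if (line.fromId === selfId) return false
  const sender = roster.filter((p) => p.id === line.fromId)[0]
  if (!sender) return false
  return !isAgent(sender)
}
```

Because the decision reads only the role tag, two agents in the same room do not answer each other in a loop, regardless of how their ids are formed.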
- … view; add frames later. Note this is the inbound question; gating an agent's media perception is already built, §4.)
- … Kibitz.createAgent / createAgentFromBridge / cooldown) and the kibitzer prefers it; the last step is re-vendoring the current Kibitz bundle into Whist to delete the inline fallback, which is a release chore, not a design question.

Resolved since v0:
- … agentKey.ts, manifest agentKeys), not the human invite gate, with per-entry caps and agents-only-room support.
- … creditVerify.ts, Witbitz-minted). Shipped in code, pending a 2-device live test.
- … view shape over a schema engine channel, re-broadcast to late joiners, consumed via getSchemas()/onSchema().
- … engine version + features (e.g. schema.v1) so a newer build can see what an older one supports (COMPATIBILITY.md).
- From the kibitzer refactor (§7a): cooldown and getView() are in the SDK; agent-vs-agent filtering rides meta.role.