Prompt
Building block for the Web’s Built-in Prompt API (LanguageModel). One-shot ask() for embeds and widgets, plus a createSession() primitive for chat-shaped apps that need independent per-conversation sessions, streaming with deltas, and a multi-turn history.
The package’s README covers install and the API table. This page is the conceptual overview of the core (vanilla) surface. The React adapter lives at @web-ai-sdk/prompt/react; see usePrompt and useSession.
One-shot: ask()
import { ask } from "@web-ai-sdk/prompt";

const result = await ask({
  input: "Summarize this in one sentence: WebMCP lets pages expose tools to agents.",
  systemPrompt: "You are concise. Reply with a single sentence.",
  temperature: 0.2,
  onUpdate: (text) => render(text),
});

console.log(result.response, result.cached);

Pass onUpdate to render partial text as it streams. result.response is the final cleaned text; result.cached tells you whether the response came back without invoking the model.
onUpdate receives the cumulative buffer, not deltas. For delta-shaped streaming use createSession().sendStreaming().
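If you stay on ask() but still want increments (for example to append to a log), the cumulative buffer can be diffed on your side. The following is a minimal sketch, not part of the SDK: createDeltaTracker is a hypothetical helper, and it assumes the buffer normally grows by appending. When the new buffer does not extend the previous one, it returns null so the caller can re-render from scratch.

```typescript
// Hypothetical helper (not exported by @web-ai-sdk/prompt).
// Turns successive cumulative buffers into the newly appended suffix.
function createDeltaTracker(): (cumulative: string) => string | null {
  let seen = "";
  return (cumulative: string): string | null => {
    // Assumption: buffers are append-only. If that does not hold,
    // report null so the caller replaces its rendered text entirely.
    const isAppend = cumulative.startsWith(seen);
    const delta = isAppend ? cumulative.slice(seen.length) : null;
    seen = cumulative;
    return delta;
  };
}

// Usage sketch with plain strings standing in for onUpdate calls:
const nextDelta = createDeltaTracker();
const pieces = ["Hel", "Hello", "Hello!"].map((buffer) => nextDelta(buffer));
console.log(pieces);
```

In an ask() call this would be wired as onUpdate: (text) => { const delta = nextDelta(text); ... }, with one tracker per call.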
Chat: createSession()
ask() shares a warm LanguageModel instance across same-shape callers — perfect for embeds, but the wrong shape for chat. Two chats with the same persona would queue instead of streaming concurrently. createSession() gives every conversation its own underlying instance:
import { createSession } from "@web-ai-sdk/prompt";

const session = createSession({
  systemPrompt: "You are a helpful assistant.",
  temperature: 0.7,
});

// Streaming yields DELTA chunks (not cumulative buffers):
for await (const delta of session.sendStreaming("Tell me about WebMCP.")) {
  process.stdout.write(delta);
}

// Or one-shot per turn:
const text = await session.send("And what about the Prompt API?");

// Tear down explicitly when the conversation ends.
session.destroy();

The wrapper is intentionally thin. It handles cross-browser smoothing every consumer would otherwise reimplement (delta-vs-cumulative chunk detection, output sanitization, abort wiring, typed unavailability) and forwards everything else to the native instance. It does not track conversation history, queue concurrent sends, or wrap clone(); those are your data model and UI concerns.
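Since history tracking is left to the application, a chat UI needs its own transcript. The sketch below is one possible shape, under the assumption that each turn is a user message followed by an assistant message built up from streamed deltas. Transcript, Message, beginTurn, and appendDelta are illustrative names, not SDK exports.

```typescript
// Illustrative app-side transcript; none of these names come from the SDK.
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

class Transcript {
  private messages: Message[] = [];

  // Record the user's input and open an empty assistant message for the reply.
  beginTurn(userText: string): void {
    this.messages.push({ role: "user", content: userText });
    this.messages.push({ role: "assistant", content: "" });
  }

  // Append one streamed delta to the assistant message of the current turn.
  appendDelta(delta: string): void {
    const last = this.messages[this.messages.length - 1];
    if (!last || last.role !== "assistant") {
      throw new Error("appendDelta() called before beginTurn()");
    }
    last.content += delta;
  }

  // Return a copy so renderers cannot mutate internal state.
  snapshot(): Message[] {
    return this.messages.map((message) => ({ ...message }));
  }
}
```

With a session, the loop would call transcript.beginTurn(input) before sendStreaming(input) and transcript.appendDelta(delta) for every yielded chunk.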
createSession() never touches the ask() session cache. Two sessions with identical options get two independent instances with isolated history, system prompt, sampling, and lifecycle — abort() / destroy() on one session never touch another. Concurrent send / sendStreaming calls on the same session are NOT queued — the underlying LanguageModel is sequential per instance and will reject the overlapping call with InvalidStateError. Either await the previous send or call session.abort() before issuing a new turn.
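If awaiting each turn by hand is inconvenient, sends can be serialized on the application side with a promise chain. This is a sketch, not SDK behavior: SendableSession is a hypothetical structural type that assumes only a send(input) method returning a promise of text, and createSendQueue is an illustrative helper.

```typescript
// Hypothetical structural type: anything with a promise-returning send().
interface SendableSession {
  send(input: string): Promise<string>;
}

// Returns a function that runs sends strictly one after another,
// so calls issued back to back never overlap on the same session.
function createSendQueue(
  session: SendableSession,
): (input: string) => Promise<string> {
  let tail: Promise<unknown> = Promise.resolve();
  return (input: string): Promise<string> => {
    const run = tail.then(() => session.send(input));
    // Keep the chain alive even if this turn rejects; the caller
    // still observes the rejection through the returned promise.
    tail = run.catch(() => undefined);
    return run;
  };
}
```

The queue trades latency for safety: a later turn waits for earlier ones instead of being rejected. Calling session.abort() before a new turn, as described above, remains the alternative when the earlier reply is no longer wanted.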
Concurrency note. Each session is an independent LanguageModel instance, but the underlying on-device model is single-instance. Chrome 148 / Edge 138 currently schedule sendStreaming calls across sessions FIFO: overlapping sends do not interleave token-by-token — the second send waits for the first to drain. This is a constraint of the runtime, not of the API; code written against createSession() becomes faster automatically if a future release exposes parallel inference.
How it works
Chrome’s LanguageModel exposes LanguageModel.create({...}) to spin up a session and session.prompt(input) / session.promptStreaming(input) to run it. The wrapper does the following on top:
- Feature detection. isPromptAvailable() / checkAvailability() return false / null on browsers without the API. The vanilla ask() throws PromptUnavailableError; the React hook surfaces status: "unavailable".
- Session reuse for ask(). A bounded LRU (default 8) caches LanguageModel.create() by stringified create options. Cold start ≈ 1-3s; warm calls are sub-second. createSession() is never cached.
- Optional result cache for ask(). Off by default; every call hits the model. Pass cache: createSessionStorageCache() (or any { get, set }-shaped object) to memoize responses by (input, systemPrompt, temperature, topK). Swap the backend (e.g. localStorage for cross-tab persistence) via the cache option.
- Delta-vs-cumulative chunk detection. Chrome ships delta chunks; some Edge backends ship cumulative. The wrapper normalizes per-chunk so onUpdate (cumulative) and sendStreaming() (deltas) work the same across browsers.
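To make the last item concrete, here is a sketch of one way per-chunk normalization could work. It is not the SDK's actual implementation: it assumes a simple prefix heuristic, where a chunk that starts with everything received so far is treated as cumulative and anything else as a delta. createChunkNormalizer is a hypothetical name.

```typescript
// Illustrative heuristic only; the real wrapper may detect shapes differently.
interface NormalizedChunk {
  delta: string; // text added by this chunk
  cumulative: string; // full text so far
}

function createChunkNormalizer(): (chunk: string) => NormalizedChunk {
  let buffer = "";
  return (chunk: string): NormalizedChunk => {
    // A chunk that repeats the whole buffer as its prefix is assumed to be
    // cumulative. Known limitation: a genuine delta that happens to start
    // with the entire buffer would be misclassified.
    const looksCumulative = buffer.length > 0 && chunk.startsWith(buffer);
    const delta = looksCumulative ? chunk.slice(buffer.length) : chunk;
    buffer += delta;
    return { delta, cumulative: buffer };
  };
}

// Both stream shapes end up with the same cumulative text.
const fromDeltas = createChunkNormalizer();
const fromCumulative = createChunkNormalizer();
const lastDelta = ["Hel", "lo", "!"].map(fromDeltas).pop();
const lastCumulative = ["Hel", "Hello", "Hello!"].map(fromCumulative).pop();
console.log(lastDelta?.cumulative === lastCumulative?.cumulative);
```

Producing both fields from every chunk is what lets a cumulative-style callback and a delta-style iterator share one code path.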
System prompt + sampling
systemPrompt is folded into the session’s initialPrompts as a system role. temperature and topK are passed through verbatim; when omitted, the model’s defaults apply.
ask({
  input: "Tell me a joke about web standards.",
  systemPrompt: "You are a stand-up comedian. Be punchy.",
  temperature: 1.0,
  topK: 8,
});

If you also pass createOptions.initialPrompts, that array silently overrides the synthesized system prompt and the persona is lost. The SDK emits a one-shot console.warn when this happens. Pass only one.
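The folding and override rules can be sketched as a small pure function. This is an illustration of the behavior described above, not the SDK's source: buildCreateOptions, AskLikeOptions, and the injectable warn parameter are hypothetical, and the types only model the fields mentioned on this page.

```typescript
// Hypothetical types covering only the options discussed in this section.
interface PromptMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface CreateOptions {
  initialPrompts?: PromptMessage[];
  temperature?: number;
  topK?: number;
}

interface AskLikeOptions {
  systemPrompt?: string;
  temperature?: number;
  topK?: number;
  createOptions?: CreateOptions;
}

let warnedAboutOverride = false; // models the one-shot warning

function buildCreateOptions(
  options: AskLikeOptions,
  warn: (message: string) => void = console.warn,
): CreateOptions {
  const result: CreateOptions = { ...options.createOptions };
  // Sampling values pass through verbatim; omitted ones stay unset
  // so the model's defaults apply.
  if (options.temperature !== undefined) result.temperature = options.temperature;
  if (options.topK !== undefined) result.topK = options.topK;

  if (options.systemPrompt !== undefined) {
    if (options.createOptions?.initialPrompts) {
      // Explicit initialPrompts win; the synthesized persona is dropped.
      if (!warnedAboutOverride) {
        warnedAboutOverride = true;
        warn("initialPrompts overrides systemPrompt; pass only one.");
      }
    } else {
      result.initialPrompts = [{ role: "system", content: options.systemPrompt }];
    }
  }
  return result;
}
```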
Language hints
Chrome’s Prompt API accepts expectedInputs / expectedOutputs with optional language arrays. The wrapper sets these automatically when you pass language and the language is in supportedLanguages (default: ["en"]):
ask({
  input: "Explain CORS in two sentences.",
  language: "en-US",
});

For unsupported languages the hints are silently omitted; the model still runs, just without the hint.
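As a rough sketch of that gating, the function below builds the hint objects only when the language is supported. The matching rule is an assumption: it compares the primary subtag ("en" for "en-US"), which the example above suggests but this page does not confirm. languageHints and its types are hypothetical names.

```typescript
// Hypothetical shapes mirroring expectedInputs / expectedOutputs entries.
interface ExpectedIO {
  type: "text";
  languages: string[];
}

interface LanguageHints {
  expectedInputs?: ExpectedIO[];
  expectedOutputs?: ExpectedIO[];
}

function languageHints(
  language: string | undefined,
  supportedLanguages: readonly string[] = ["en"],
): LanguageHints {
  if (!language) return {};
  // Assumption: match on the primary subtag, so "en-US" counts as "en".
  const [base = ""] = language.toLowerCase().split("-");
  if (!supportedLanguages.includes(base)) {
    return {}; // unsupported: omit the hints, the model still runs
  }
  return {
    expectedInputs: [{ type: "text", languages: [base] }],
    expectedOutputs: [{ type: "text", languages: [base] }],
  };
}
```

Returning an empty object for the unsupported case lets the result be spread into the create options unconditionally.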
Structured output
Pass a JSON Schema via responseConstraint to constrain the model’s output:
const result = await ask({
  input: "Extract the city and country from: 'I live in Belo Horizonte, Brazil.'",
  responseConstraint: {
    type: "object",
    properties: {
      city: { type: "string" },
      country: { type: "string" },
    },
    required: ["city", "country"],
  },
});

const parsed = JSON.parse(result.response ?? "{}");

The wrapper passes responseConstraint straight through to session.prompt() / session.promptStreaming(). Support depends on your Chrome version.
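Because constraint support varies by browser version, it can be worth validating the parsed value rather than trusting it. The sketch below is one defensive approach for the city/country schema above; Place and parsePlace are illustrative names, and the null-on-failure convention is a design choice of this example.

```typescript
// Illustrative type and parser for the schema used in the example above.
interface Place {
  city: string;
  country: string;
}

function parsePlace(response: string | null | undefined): Place | null {
  if (!response) return null;
  let value: unknown;
  try {
    value = JSON.parse(response);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const city = record["city"];
  const country = record["country"];
  // Accept the value only if both required fields are strings.
  return typeof city === "string" && typeof country === "string"
    ? { city, country }
    : null;
}
```

A caller would then write const place = parsePlace(result.response) and branch on null instead of catching a JSON.parse exception.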
Aborting
AbortSignal is supported on every surface. Aborting mid-stream resolves cleanly; an opt-in result cache is not written for aborted runs:
const controller = new AbortController();
const promise = ask({ input: "Long story…", signal: controller.signal });
setTimeout(() => controller.abort(), 1000);

For sessions, call session.abort() to stop the most recent in-flight send / sendStreaming, or session.destroy() to tear down the underlying instance.
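A common variation is combining a timeout with a user-driven cancel (for example a Stop button). The helper below is a sketch built only on the standard AbortController; withTimeout is a hypothetical name, not an SDK export. It returns a signal that aborts when either the timer fires or the parent signal aborts, plus a dispose function to clean up once the call settles.

```typescript
// Hypothetical helper: one signal that aborts on timeout OR parent abort.
function withTimeout(
  ms: number,
  parent?: AbortSignal,
): { signal: AbortSignal; dispose: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  const onParentAbort = (): void => controller.abort();

  if (parent) {
    if (parent.aborted) {
      controller.abort(); // already cancelled before we started
    } else {
      parent.addEventListener("abort", onParentAbort, { once: true });
    }
  }

  // Call dispose() when the request finishes so the timer and
  // listener do not outlive it.
  const dispose = (): void => {
    clearTimeout(timer);
    parent?.removeEventListener("abort", onParentAbort);
  };
  return { signal: controller.signal, dispose };
}
```

Usage would look like const { signal, dispose } = withTimeout(1000, stopButton.signal), passing signal to ask() and calling dispose() in a finally block; stopButton here is a placeholder for your own AbortController.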
Cache controls
import {
  clearSessions, // drop every cached one-shot session
  clearSession, // drop one cached session by create-options
  configurePromptCache, // change the LRU cap (default 8)
} from "@web-ai-sdk/prompt";

// Free memory eagerly when a feature is done.
clearSessions();

// Or just the one cached for a specific persona.
clearSession({ initialPrompts: [{ role: "system", content: "You are concise." }] });

// Raise / lower the cap; excess entries are evicted in LRU order.
configurePromptCache({ max: 4 });

Errors and unavailability
The vanilla ask() throws PromptUnavailableError when the API is missing or reports availability: "unavailable". Callers branch explicitly:
import { ask, PromptUnavailableError } from "@web-ai-sdk/prompt";

try {
  const result = await ask({ input: "hi" });
} catch (err) {
  if (err instanceof PromptUnavailableError) {
    // No Chrome flag, no model, or download blocked; fall back.
    return;
  }
  throw err;
}

createSession() returns a Session synchronously even when creation fails; the error surfaces on the first send / sendStreaming.
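Because the same error can surface from ask() or from a session's first send, the branch can be factored into one helper. This is a sketch under assumptions: withFallback is a hypothetical function, and the PromptUnavailableError class declared here is a local stand-in so the example is self-contained. In real code you would import the class from the package instead.

```typescript
// Local stand-in for the SDK's error class, for illustration only.
class PromptUnavailableError extends Error {
  constructor(message = "Prompt API unavailable") {
    super(message);
    this.name = "PromptUnavailableError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Runs the model call; on unavailability returns the fallback value.
// Any other error is rethrown untouched.
async function withFallback<T>(
  run: () => Promise<T>,
  fallback: () => T | Promise<T>,
): Promise<T> {
  try {
    return await run();
  } catch (err) {
    if (err instanceof PromptUnavailableError) {
      return fallback();
    }
    throw err;
  }
}
```

For example, withFallback(() => session.send(input), () => "On-device AI is not available in this browser.") covers the deferred creation failure described above without a separate try/catch per call site.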