Realtime sessions are ephemeral. The server holds conversation state for the lifetime of the WebSocket and a short trailing TTL (15 minutes of inactivity by default); after that, server-side memory is gone. To support long-running relationships, where a user chats with a voice agent today, comes back next week, and the agent picks up where they left off, your client needs to persist memory externally and re-inject it on every new session. This guide shows the pattern: enable the in-session memory feature, capture the facts and summary it generates, store them in your own database, and seed the next session with what you’ve saved.
How the memory feature fits in
The providerData.memory branch enables automatic conversation memory inside a single session. While the session is open, the server periodically asks the LLM to extract durable facts and a rolling summary from the conversation, prepends them to the system prompt, and trims older transcript items so context stays bounded.
This is what makes a single 90-minute conversation coherent — the agent doesn’t forget what was said 20 minutes ago even as the raw transcript ages out. But because all of this state lives only on the server until the session closes, you need to capture it and replay it yourself to bridge across sessions.
The persistence lifecycle
The contract is: anything your client persists during one session can be sent back as configuration on the next. Nothing about the realtime API itself is durable across sessions; your DB is the source of truth for long-term state.
1. Enable memory in the session
Send the memory config in your initial session.update (or a later one).
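A minimal sketch of such a message might look like the following. The field names turn_interval, max_facts, and max_memory_length come from the tuning tips in this guide; the `enabled` flag, the exact nesting under `session`, the default values, and the helper name `buildMemoryUpdate` are assumptions for illustration, so check the providerData.memory reference for the real schema.

```typescript
// Hypothetical helper: builds a session.update payload that turns on memory.
// ASSUMPTION: the `enabled` flag and the nesting under `session` are
// illustrative; only turn_interval, max_facts, and max_memory_length are
// named in this guide.
interface MemoryConfig {
  enabled: boolean;
  turn_interval: number;     // generate memory every N turns
  max_facts: number;         // cap on the persisted fact list
  max_memory_length: number; // cap on the rolling summary, in characters
}

interface SessionUpdate {
  type: "session.update";
  session: { providerData: { memory: MemoryConfig } };
}

function buildMemoryUpdate(overrides: Partial<MemoryConfig> = {}): SessionUpdate {
  const memory: MemoryConfig = {
    enabled: true,
    turn_interval: 5,        // illustrative defaults, not documented values
    max_facts: 50,
    max_memory_length: 2000,
    ...overrides,
  };
  return { type: "session.update", session: { providerData: { memory } } };
}

// Serialize once, then send the string over your already-open WebSocket.
const enableMemoryMessage: string = JSON.stringify(
  buildMemoryUpdate({ turn_interval: 3 }),
);
```

Building the payload in a pure function keeps the config testable without a live connection; the same function can be reused for a later session.update that changes only one field.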
2. Capture state from session.updated
Each time the server completes a memory-generation cycle, it sends a session.updated event containing the full session object, including a freshly populated providerData.memory.state. The state carries a facts array, a summary string, and a version counter.
The server sends session.updated events for two reasons: as an acknowledgement after every client-sent session.update, and whenever it completes a memory-generation cycle. Both carry the full session object, so most session.updated events you receive will repeat the same memory state you’ve already seen. Use the version counter (incremented only when new memory is generated) to detect a real change and avoid redundant DB writes.
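The version check can be sketched as below. The three state fields (facts, summary, version) are the ones this guide names; their exact types, the event envelope, and the `MemoryChangeDetector` class are assumptions made for the example.

```typescript
// ASSUMPTION: facts are plain strings and version is a number; the guide
// names the three fields but not their exact types.
interface MemoryState {
  facts: string[];
  summary: string;
  version: number;
}

// Minimal view of a session.updated event; other session fields are ignored.
interface SessionUpdatedEvent {
  type: "session.updated";
  session: { providerData?: { memory?: { state?: MemoryState } } };
}

// Hypothetical tracker: remembers the last version seen per user and reports
// only genuinely new memory states.
class MemoryChangeDetector {
  private lastVersion = new Map<string, number>();

  // Returns the state if its version advanced for this user, otherwise null.
  check(userId: string, event: SessionUpdatedEvent): MemoryState | null {
    const state = event.session.providerData?.memory?.state;
    if (!state) return null; // e.g. an ack sent before any memory exists
    const seen = this.lastVersion.get(userId);
    if (seen !== undefined && state.version <= seen) return null; // repeat
    this.lastVersion.set(userId, state.version);
    return state;
  }
}
```

In a message handler you would call `check` on every session.updated event and write to the database only when it returns a state.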
To persist the literal transcript as well, listen for conversation.item.input_audio_transcription.completed for user turns and response.output_audio_transcript.done for assistant turns, and write each turn into your own transcript table keyed by user_id.
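One way to turn those two events into transcript rows is sketched here. The two event names are taken from the text above; the `transcript` payload field, the row shape, and the function name `toTranscriptTurn` are assumptions, so verify the payload against the event reference before relying on it.

```typescript
type Role = "user" | "assistant";

// Row shape for a hypothetical transcript table keyed by user_id.
interface TranscriptTurn {
  user_id: string;
  role: Role;
  text: string;
}

// Minimal event view. ASSUMPTION: both events carry the finished text in a
// `transcript` field.
interface ServerEvent {
  type: string;
  transcript?: string;
}

const ROLE_BY_EVENT: Record<string, Role> = {
  "conversation.item.input_audio_transcription.completed": "user",
  "response.output_audio_transcript.done": "assistant",
};

// Returns a row to persist, or null for events that are not finished turns.
function toTranscriptTurn(userId: string, event: ServerEvent): TranscriptTurn | null {
  const role = ROLE_BY_EVENT[event.type];
  if (!role || !event.transcript) return null;
  return { user_id: userId, role, text: event.transcript };
}
```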
3. Store the memories
There are two reasonable shapes for the persisted side, depending on how many sessions per user you expect.
Option A: flat persistence (KV, Postgres row, JSON blob)
For most apps, just save the latest facts array and summary string per user, and overwrite them on every memory-generation cycle. This is cheap, simple, and fits in a single row.
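As a sketch of the flat shape, the example below uses an in-memory Map as a stand-in for your database. `loadMemory` is the name this guide uses later; `saveMemory`, `SavedMemory`, and the Map itself are hypothetical and would be replaced by a KV write or a single-row upsert in a real app.

```typescript
interface SavedMemory {
  facts: string[];
  summary: string;
}

// Stand-in for a real database: one entry per user, overwritten on each save.
// ASSUMPTION: swap the Map for your KV store or a single Postgres row.
const memoryStore = new Map<string, SavedMemory>();

function saveMemory(userId: string, memory: SavedMemory): void {
  // Copy the array so later mutation by the caller cannot alter the stored row.
  memoryStore.set(userId, { facts: [...memory.facts], summary: memory.summary });
}

function loadMemory(userId: string): SavedMemory {
  // New users get empty memory, so callers need no first-session branch.
  return memoryStore.get(userId) ?? { facts: [], summary: "" };
}
```

Overwriting rather than appending works here because each memory-generation cycle already yields the full current fact list and summary.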
Option B: vector store for retrieval across many sessions
If your users accumulate many sessions over months and the saved facts pile up past what fits in a single prompt, store each fact (or each session’s summary) with embeddings in a vector DB, then retrieve only the most relevant memories at the start of each new session. The Inworld language-learning Node example demonstrates this end-to-end with Supabase + pgvector: it embeds each saved fact, then on a new session runs a similarity query against the user’s opening utterance and injects the top-K matching memories. The same pattern works with Pinecone, Weaviate, or any pgvector-backed Postgres. A rough implementation can pair OpenAI embeddings with pgvector.
4. Restore on a new session
When a returning user connects, load their saved state and bake it into the new session before the conversation starts. The most reliable approach is to inject everything you want the agent to recall directly into the instructions field of the initial session.update. The model already treats instructions as authoritative, and this approach is portable across LLM providers because it doesn’t depend on any server-side state restoration semantics.
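The injection step could be sketched as follows. The name `memoryBlock` matches the one used in the tuning tips; the wording of the block, the `buildMemoryBlock` and `buildResumeUpdate` helpers, and the placement of `instructions` directly under `session` are assumptions for illustration.

```typescript
interface SavedMemory {
  facts: string[];
  summary: string;
}

// Renders saved memory as plain text; returns "" when there is nothing to recall.
function buildMemoryBlock(memory: SavedMemory): string {
  const parts: string[] = [];
  if (memory.summary) {
    parts.push(`Summary of earlier conversations:\n${memory.summary}`);
  }
  if (memory.facts.length > 0) {
    const lines = memory.facts.map((fact) => `- ${fact}`).join("\n");
    parts.push(`Known facts about the user:\n${lines}`);
  }
  return parts.join("\n\n");
}

// Hypothetical helper: appends the memory block to your base system prompt.
// ASSUMPTION: `instructions` sits directly under `session` in session.update.
function buildResumeUpdate(baseInstructions: string, memory: SavedMemory) {
  const memoryBlock = buildMemoryBlock(memory);
  const instructions = memoryBlock
    ? `${baseInstructions}\n\n${memoryBlock}`
    : baseInstructions;
  return { type: "session.update" as const, session: { instructions } };
}
```

Because an empty memory yields an empty block, a first-time user simply receives the base instructions unchanged.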
Optional: replay the last few turns
The injected facts and summary give the agent semantic memory. If you also want the agent to have the literal last few exchanges in context (useful for greetings like “as I was saying earlier…”), replay them with conversation.item.create after the session.updated ack.
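A replay step might be sketched like this. Only the event name conversation.item.create comes from this guide; the item shape (message items with input_text and output_text content parts) follows the common OpenAI-style realtime convention and is an assumption here, as are the `buildReplayEvents` helper and its default of four turns.

```typescript
interface TranscriptTurn {
  role: "user" | "assistant";
  text: string;
}

interface ReplayEvent {
  type: "conversation.item.create";
  item: {
    type: "message";
    role: "user" | "assistant";
    content: { type: "input_text" | "output_text"; text: string }[];
  };
}

// Builds one conversation.item.create event per stored turn, oldest first.
// ASSUMPTION: the content part types below are illustrative, not confirmed.
function buildReplayEvents(turns: TranscriptTurn[], lastN = 4): ReplayEvent[] {
  if (lastN <= 0) return []; // slice(-0) would otherwise return every turn
  return turns.slice(-lastN).map((turn): ReplayEvent => ({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: turn.role,
      content: [
        { type: turn.role === "user" ? "input_text" : "output_text", text: turn.text },
      ],
    },
  }));
}
```

Send the resulting events in order once the session.updated ack arrives, so the replayed turns land after the restored instructions are in effect.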
End-to-end example
A reconnect handler loads from your DB, resumes the session with persisted memory baked in, and keeps capturing fresh memories during the new session.
Tuning tips
- turn_interval: generate memory every N turns. Lower values catch facts sooner but spend more on memory-generation LLM calls. 5 is a good default; drop to 3 for fact-dense conversations (onboarding, intake) or raise to 10 for casual chat.
- max_facts: cap the size of the persisted fact list. If you’re using Option A (flat persistence) and finding the resumed instructions getting too long, drop this number; older facts get aged out and the summary absorbs them.
- max_memory_length: cap on the rolling summary length in characters. Match this to the model’s context budget; a 2000-char summary plus ~50 short facts fits comfortably in the context window of any modern LLM.
- Stable user_id: the entire pattern keys on a stable user identifier. Pass it as providerData.user_id so it shows up in tracing and logs alongside the memory state (see SessionProvider in providerData).
- First-session vs returning-user paths: keep them in one handler. A new user’s loadMemory returns empty facts and an empty summary, which produces an empty memoryBlock, so the same code path serves both cases.
See also
- providerData.memory reference: every config field and the full state payload shape
- Managing conversations: conversation.item.* events used for replay
- Inworld language-learning Node example: Supabase + pgvector reference implementation of Option B