Need ultra-low latency output? Use stream_text - an async iterator that yields text deltas in real time.
1. Prepare the script

Save the following as stream.py:

import asyncio
from ai_sdk import openai, stream_text

model = openai("gpt-4.1-mini")

async def main():
    # on_chunk fires for each text delta as it arrives
    result = stream_text(
        model=model,
        prompt="Write a short poem about the sea where each line rhymes.",
        on_chunk=lambda d: print(d, end="", flush=True),
    )

    # You can also await the full text once streaming completes:
    full = await result.text()
    print("\n---\nFull text:")
    print(full)

asyncio.run(main())

2. Run it

python stream.py
You’ll see tokens appear immediately instead of waiting for the whole response to buffer.
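
The example above pushes deltas through the on_chunk callback. If you prefer to pull deltas yourself, here is a minimal sketch, assuming the result returned by stream_text is directly async-iterable as the intro describes:

import asyncio
from ai_sdk import openai, stream_text

async def main():
    result = stream_text(
        model=openai("gpt-4.1-mini"),
        prompt="Write a short poem about the sea where each line rhymes.",
    )
    # Assumption: the result object is an async iterator over text deltas.
    async for delta in result:
        print(delta, end="", flush=True)

asyncio.run(main())
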
stream_text accepts the same arguments as generate_text, plus optional callbacks (see the sketch after this list):
  • on_chunk(delta) - called for each text delta
  • on_error(exc) - called if an exception is raised while streaming
  • on_finish(full_text) - called once streaming completes
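
For illustration, here is a sketch wiring up all three callbacks; the bodies of on_error and on_finish are placeholders, not prescribed SDK behavior:

import asyncio
from ai_sdk import openai, stream_text

async def main():
    result = stream_text(
        model=openai("gpt-4.1-mini"),
        prompt="Explain streaming in one sentence.",
        on_chunk=lambda delta: print(delta, end="", flush=True),
        on_error=lambda exc: print(f"\nStream error: {exc}"),
        on_finish=lambda full_text: print(f"\nDone ({len(full_text)} chars)"),
    )
    # Awaiting the full text drives the stream to completion.
    await result.text()

asyncio.run(main())
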
stream_object works the same way, but yields structured objects instead of text.
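
A purely hypothetical sketch of stream_object follows; the schema argument and the shape of the yielded partial objects are assumptions, not documented API:

import asyncio
from ai_sdk import openai, stream_object

async def main():
    result = stream_object(
        model=openai("gpt-4.1-mini"),
        prompt="Describe a recipe as JSON with 'name' and 'steps' fields.",
        schema={"name": str, "steps": list},  # hypothetical schema argument
    )
    # Assumption: iterating yields progressively more complete objects.
    async for partial in result:
        print(partial)

asyncio.run(main())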