# Streaming

Receive responses in real-time with server-sent events (SSE).

## Overview
Streaming allows you to receive partial responses as they are generated, providing a much better user experience for real-time applications like chatbots.
## Enable Streaming

Set `stream: true` in your request:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Tell me a story."}],
  "stream": true
}
```

## Server-Sent Events (SSE) Format
The API returns a stream of `data:` events:

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]
```

Each chunk contains a `delta` object with the incremental content.
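If you are not using an SDK, the `data:` lines can be parsed by hand. A minimal sketch in Python, run against a hard-coded sample payload (mirroring the events above) rather than a live connection:

```python
import json

# Sample SSE payload as it would arrive over the wire. This is an
# illustrative hard-coded string, not a live API response.
raw_sse = (
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"role":"assistant"},"index":0}]}\n\n'
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":"Once"},"index":0}]}\n\n'
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":" upon"},"index":0}]}\n\n'
    'data: [DONE]\n\n'
)

def parse_sse(payload: str) -> str:
    """Accumulate content deltas from a raw SSE payload."""
    parts = []
    for line in payload.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:  # role-only chunks carry no content
            parts.append(delta["content"])
    return "".join(parts)

print(parse_sse(raw_sse))  # prints "Once upon"
```

An SDK handles this parsing for you, along with reconnection and error handling, so prefer one where available.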
## Examples

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.yuhuanstudio.com/v1",
});

const stream = await client.chat.completions.create({
  model: "model-id",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

```bash
curl https://api.yuhuanstudio.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -N \
  -d '{
    "model": "model-id",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": true
  }'
```

## Stream Options
### Include Usage

Request token usage information in the final stream event:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

The final chunk before `[DONE]` will include:

```json
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```

## Streaming with Reasoning Models
When using reasoning/thinking models, the stream may include thinking tokens in a `reasoning_content` field on the delta:

```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Solve: What is 15! / 13!?"}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning content (thinking process)
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        print(f"[Thinking] {delta.reasoning_content}", end="")
    # Final content
    if delta.content:
        print(delta.content, end="")
```

## Streaming with Tool Calls
When the model calls a function, tool calls arrive as deltas with partial argument strings:

```
data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": ""}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "{\"lo"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "cation"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "\":\"Tokyo\"}"}}]}}]}
```

Handle streaming tool calls by accumulating the argument fragments:

```python
tool_calls = {}

for chunk in stream:
    if chunk.choices[0].delta.tool_calls:
        for tc in chunk.choices[0].delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {"id": tc.id, "name": tc.function.name, "arguments": ""}
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments

# Once the stream ends, each accumulated "arguments" string is complete
# JSON and can be parsed with json.loads().
```

## Streaming the Messages API
When using the Anthropic Messages API (`POST /v1/messages`), streaming follows the Anthropic SSE format:

| Event | Description |
|---|---|
| `message_start` | Message object begins |
| `content_block_start` | New content block started |
| `content_block_delta` | Incremental content delta |
| `content_block_stop` | Content block completed |
| `message_delta` | Message-level delta with usage |
| `message_stop` | Message completed |

```python
with client.messages.stream(
    model="model-id",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

## Streaming the Responses API
When using the Responses API (`POST /v1/responses`), streaming follows the OpenAI Responses SSE format:

| Event | Description |
|---|---|
| `response.created` | Response object created |
| `response.in_progress` | Response processing started |
| `response.output_item.added` | New output item added |
| `response.content_part.added` | New content part added |
| `response.output_text.delta` | Incremental text delta |
| `response.output_text.done` | Text output completed |
| `response.output_item.done` | Output item completed |
| `response.completed` | Response fully completed |
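As a sketch of how these events compose, the snippet below accumulates `response.output_text.delta` payloads into the final text. The event dicts are hand-written samples that show only the fields used here, not complete event objects:

```python
# Illustrative events, as they might look after SSE parsing. Real events
# carry full objects; only the "type" and delta fields are shown.
events = [
    {"type": "response.created"},
    {"type": "response.output_item.added"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta", "delta": "Once"},
    {"type": "response.output_text.delta", "delta": " upon"},
    {"type": "response.output_text.done", "text": "Once upon"},
    {"type": "response.completed"},
]

def collect_text(events) -> str:
    """Assemble output text from response.output_text.delta events."""
    parts = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            parts.append(event["delta"])
        elif event["type"] == "response.completed":
            break
    return "".join(parts)

print(collect_text(events))  # prints "Once upon"
```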