# Streaming

Receive responses in real-time with server-sent events (SSE).

## Overview
Streaming allows you to receive partial responses as they are generated, providing a much better user experience for real-time applications like chatbots.
## Enable Streaming

Set `stream: true` in your request:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Tell me a story."}],
  "stream": true
}
```

## Server-Sent Events (SSE) Format
The API returns a stream of `data:` events:

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]
```

Each chunk contains a `delta` object with the incremental content.
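If you are not using an SDK, the `data:` lines can be parsed by hand. A minimal sketch in Python, run against a hard-coded sample payload (mirroring the events above) rather than a live connection:

```python
import json

# Sample SSE payload as it would arrive over the wire. This is an
# illustrative hard-coded string, not a live API response.
raw_sse = (
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"role":"assistant"},"index":0}]}\n\n'
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":"Once"},"index":0}]}\n\n'
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":" upon"},"index":0}]}\n\n'
    'data: [DONE]\n\n'
)

def parse_sse(payload: str) -> str:
    """Accumulate content deltas from a raw SSE payload."""
    parts = []
    for line in payload.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:  # role-only chunks carry no content
            parts.append(delta["content"])
    return "".join(parts)

print(parse_sse(raw_sse))  # prints "Once upon"
```

An SDK handles this parsing for you, along with reconnection and error handling, so prefer one where available.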
## Examples

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.yuhuanstudio.com/v1"
)

stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.yuhuanstudio.com/v1",
});

const stream = await client.chat.completions.create({
  model: "model-id",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

```bash
curl https://api.yuhuanstudio.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -N \
  -d '{
    "model": "model-id",
    "messages": [{"role": "user", "content": "Tell me a story."}],
    "stream": true
  }'
```

## Stream Options
### Include Usage

Request token usage information in the final stream event:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

The final chunk before `[DONE]` will include:

```json
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```

## Streaming with Reasoning Models
When using reasoning/thinking models, the stream may include thinking tokens in a `reasoning_content` field on the delta:

```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Solve: What is 15! / 13!?"}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning content (thinking process)
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        print(f"[Thinking] {delta.reasoning_content}", end="")
    # Final content
    if delta.content:
        print(delta.content, end="")
```

## Streaming with Tool Calls
When the model calls a function, tool calls arrive as deltas with partial argument strings:

```
data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": ""}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "{\"lo"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "cation"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "\":\"Tokyo\"}"}}]}}]}
```

Handle streaming tool calls by accumulating the argument fragments:

```python
tool_calls = {}

for chunk in stream:
    if chunk.choices[0].delta.tool_calls:
        for tc in chunk.choices[0].delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {"id": tc.id, "name": tc.function.name, "arguments": ""}
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments

# Once the stream ends, each accumulated "arguments" string is complete
# JSON and can be parsed with json.loads().
```

## Streaming the Messages API
When using the Anthropic Messages API (`POST /v1/messages`), streaming follows the Anthropic SSE format:

| Event | Description |
|---|---|
| `message_start` | Message object begins |
| `content_block_start` | New content block started |
| `content_block_delta` | Incremental content delta |
| `content_block_stop` | Content block completed |
| `message_delta` | Message-level delta with usage |
| `message_stop` | Message completed |

```python
with client.messages.stream(
    model="model-id",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

## Streaming the Responses API
When using the Responses API (`POST /v1/responses`), streaming follows the OpenAI Responses SSE format:

| Event | Description |
|---|---|
| `response.created` | Response object created |
| `response.in_progress` | Response processing started |
| `response.output_item.added` | New output item added |
| `response.content_part.added` | New content part added |
| `response.output_text.delta` | Incremental text delta |
| `response.output_text.done` | Text output completed |
| `response.output_item.done` | Output item completed |
| `response.completed` | Response fully completed |
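As a sketch of how these events compose, the snippet below accumulates `response.output_text.delta` payloads into the final text. The event dicts are hand-written samples that show only the fields used here, not complete event objects:

```python
# Illustrative events, as they might look after SSE parsing. Real events
# carry full objects; only the "type" and delta fields are shown.
events = [
    {"type": "response.created"},
    {"type": "response.output_item.added"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta", "delta": "Once"},
    {"type": "response.output_text.delta", "delta": " upon"},
    {"type": "response.output_text.done", "text": "Once upon"},
    {"type": "response.completed"},
]

def collect_text(events) -> str:
    """Assemble output text from response.output_text.delta events."""
    parts = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            parts.append(event["delta"])
        elif event["type"] == "response.completed":
            break
    return "".join(parts)

print(collect_text(events))  # prints "Once upon"
```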