# Streaming (/docs/streaming)

## Overview [#overview]

Streaming lets you receive partial responses as they are generated, so real-time applications such as chatbots can display output immediately instead of waiting for the complete response.

## Enable Streaming [#enable-streaming]

Set `stream: true` in your request:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Tell me a story."}],
  "stream": true
}
```

## Server-Sent Events (SSE) Format [#server-sent-events-sse-format]

The API returns a stream of `data:` events:

```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}

data: [DONE]
```

Each chunk contains a `delta` object with the incremental content.
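
If you are consuming the stream without an SDK, these events can be decoded by hand: read each `data:` line, stop at the `[DONE]` sentinel, and join the `content` deltas. A minimal sketch over the lines shown above (the helper name is our own):

```python
import json

def collect_text(sse_lines):
    """Extract the assistant text from a sequence of SSE `data:` lines."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # The first chunk carries only the role; later chunks carry content.
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

# The events from the example above:
sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Once"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" upon"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" a"},"index":0}]}',
    "data: [DONE]",
]
print(collect_text(sample))  # → Once upon a
```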

## Examples [#examples]

<Tabs items={["Python", "JavaScript", "cURL"]}>
  <Tab value="Python">
    ```python
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.yuhuanstudio.com/v1"
    )

    stream = client.chat.completions.create(
        model="model-id",
        messages=[{"role": "user", "content": "Tell me a story."}],
        stream=True
    )

    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
    ```
  </Tab>

  <Tab value="JavaScript">
    ```javascript
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "YOUR_API_KEY",
      baseURL: "https://api.yuhuanstudio.com/v1",
    });

    const stream = await client.chat.completions.create({
      model: "model-id",
      messages: [{ role: "user", content: "Tell me a story." }],
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) process.stdout.write(content);
    }
    ```
  </Tab>

  <Tab value="cURL">
    ```bash
    curl https://api.yuhuanstudio.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -N \
      -d '{
        "model": "model-id",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": true
      }'
    ```
  </Tab>
</Tabs>

## Stream Options [#stream-options]

### Include Usage [#include-usage]

Request token usage information in the final stream event:

```json
{
  "model": "model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```

The final chunk before `[DONE]` will include:

```json
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```

## Streaming with Reasoning Models [#streaming-with-reasoning-models]

When using reasoning/thinking models, the stream may include thinking tokens in the `reasoning_content` field:

```python
stream = client.chat.completions.create(
    model="model-id",
    messages=[{"role": "user", "content": "Solve: What is 15! / 13!?"}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta

    # Reasoning content (thinking process)
    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
        print(f"[Thinking] {delta.reasoning_content}", end="")

    # Final content
    if delta.content:
        print(delta.content, end="")
```

## Streaming with Tool Calls [#streaming-with-tool-calls]

When the model calls a function, tool calls appear as deltas with partial arguments:

```
data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": ""}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "{\"lo"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "cation"}}]}}]}

data: {"choices": [{"delta": {"tool_calls": [{"index": 0, "function": {"arguments": "\":\"Tokyo\"}"}}]}}]}
```

Handle streaming tool calls by accumulating arguments:

```python
# Tool-call deltas are keyed by index; the id and function name arrive in
# the first chunk, and the arguments arrive as string fragments afterwards.
tool_calls = {}

for chunk in stream:
    if chunk.choices[0].delta.tool_calls:
        for tc in chunk.choices[0].delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls:
                tool_calls[idx] = {"id": tc.id, "name": tc.function.name, "arguments": ""}
            if tc.function.arguments:
                tool_calls[idx]["arguments"] += tc.function.arguments
```
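
After the stream ends, the accumulated fragments join into a complete JSON string that can be decoded with `json.loads`. Using the argument fragments from the chunks above:

```python
import json

# Argument fragments as they arrived in the tool-call deltas above.
fragments = ["", "{\"lo", "cation", "\":\"Tokyo\"}"]

# Joined, they form the complete JSON argument string.
arguments = json.loads("".join(fragments))
print(arguments)  # → {'location': 'Tokyo'}
```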

## Streaming the Messages API [#streaming-the-messages-api]

When using the Anthropic Messages API (`POST /v1/messages`), streaming follows the Anthropic SSE format:

| Event                 | Description                    |
| --------------------- | ------------------------------ |
| `message_start`       | Message object begins          |
| `content_block_start` | New content block started      |
| `content_block_delta` | Incremental content delta      |
| `content_block_stop`  | Content block completed        |
| `message_delta`       | Message-level delta with usage |
| `message_stop`        | Message completed              |

```python
with client.messages.stream(
    model="model-id",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

## Streaming the Responses API [#streaming-the-responses-api]

When using the Responses API (`POST /v1/responses`), streaming follows the OpenAI Responses SSE format:

| Event                         | Description                 |
| ----------------------------- | --------------------------- |
| `response.created`            | Response object created     |
| `response.in_progress`        | Response processing started |
| `response.output_item.added`  | New output item added       |
| `response.content_part.added` | New content part added      |
| `response.output_text.delta`  | Incremental text delta      |
| `response.output_text.done`   | Text output completed       |
| `response.output_item.done`   | Output item completed       |
| `response.completed`          | Response fully completed    |
