# Vision & Documents


## Overview [#overview]

Multimodal-capable models can analyze images, documents, and videos alongside text. Yunxin accepts these inputs through the Chat Completions API: set a message's `content` to an array of typed parts, each of which is either text or a reference to an image, document, or video.

## Content Types [#content-types]

| Type        | Description       | Supported Formats                    |
| ----------- | ----------------- | ------------------------------------ |
| `image_url` | Image analysis    | JPEG, PNG, GIF, WebP (URL or base64) |
| `file_url`  | Document analysis | PDF, TXT, CSV, DOCX, and more        |
| `video_url` | Video analysis    | MP4, MOV, WebM                       |
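
The table above can be turned into a small routing helper that picks a content type from a URL's file extension. This is a minimal sketch, not part of the Yunxin API: the `content_part_for` function and the `EXTENSION_TO_TYPE` mapping are hypothetical names, and the mapping covers only the formats listed explicitly in the table (the document list is open-ended).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: only the formats named in the table are mapped here.
# The table says documents include "and more", so extend this as needed.
EXTENSION_TO_TYPE = {
    ".jpg": "image_url", ".jpeg": "image_url", ".png": "image_url",
    ".gif": "image_url", ".webp": "image_url",
    ".pdf": "file_url", ".txt": "file_url", ".csv": "file_url", ".docx": "file_url",
    ".mp4": "video_url", ".mov": "video_url", ".webm": "video_url",
}


def content_part_for(url: str) -> dict:
    """Build a content part whose type is inferred from the URL's file extension."""
    # Parse the URL first so query strings and fragments do not affect the suffix.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    part_type = EXTENSION_TO_TYPE.get(suffix)
    if part_type is None:
        raise ValueError(f"Cannot infer a content type from extension {suffix!r}")
    # Each part repeats its type as the key that holds the URL object.
    return {"type": part_type, part_type: {"url": url}}
```

Inferring the type from the extension is a convenience only; URLs without an extension need the type chosen explicitly.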

## Sending Images [#sending-images]

Include images in the `content` array using the `image_url` type:

```json
{
  "model": "model-id",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What's in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/photo.jpg"
          }
        }
      ]
    }
  ]
}
```

### Image Formats [#image-formats]

| Format          | Supported       |
| --------------- | --------------- |
| URL (HTTPS)     | Yes             |
| Base64 data URI | Yes             |
| Local file path | No (use base64) |

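### Base64 Encoding [#base64-encoding]

To send a local image, read the file, base64-encode its bytes, and pass the result as a data URI in the `url` field: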

```python
import base64

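# Read the local file and encode its bytes as base64 text.
with open("image.png", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

# `client` is assumed to be an already-initialized SDK client
# that exposes the Chat Completions API.
response = client.chat.completions.create(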
    model="model-id",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
```
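
The example above hardcodes `image/png` in the data URI. For other image formats the MIME type has to match the file, so a small helper can derive it from the file name. This is a sketch using only the Python standard library; `image_to_data_uri` is a hypothetical helper name, not part of any SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI with a matching MIME type."""
    # Guess the MIME type from the file extension (for example ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used directly as the `url` value of an `image_url` part.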

## Multiple Images [#multiple-images]

To send several images in a single request, add one `image_url` part per image to the same `content` array:

```python
response = client.chat.completions.create(
    model="model-id",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}}
            ]
        }
    ]
)
```
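
When the number of images varies, the `content` array can be assembled programmatically. The sketch below builds the same message shape as the example above; `build_image_message` is a hypothetical helper name.

```python
def build_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a user message with one text part followed by one image part per URL."""
    content = [{"type": "text", "text": prompt}]
    # Append the images in the order given, after the text prompt.
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}
```

The result can be passed as a single element of the `messages` list, for example `messages=[build_image_message("Compare these two images.", urls)]`.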

## Image Detail Level [#image-detail-level]

Set the optional `detail` field inside `image_url` to control the resolution at which the image is analyzed:

```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/photo.jpg",
    "detail": "high"
  }
}
```

| Detail | Description                    | Token Usage |
| ------ | ------------------------------ | ----------- |
| `low`  | 512×512 fixed                  | Lower       |
| `high` | Full resolution (up to 2048px) | Higher      |
| `auto` | Model decides                  | Varies      |
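
Because `detail` accepts only the three values in the table, it can help to validate it before sending a request. The following is a minimal sketch; `image_part` and `VALID_DETAIL_LEVELS` are hypothetical names introduced for illustration.

```python
from typing import Optional

# The three detail levels listed in the table above.
VALID_DETAIL_LEVELS = {"low", "high", "auto"}


def image_part(url: str, detail: Optional[str] = None) -> dict:
    """Build an image_url content part, optionally with a validated detail level."""
    image_url = {"url": url}
    if detail is not None:
        if detail not in VALID_DETAIL_LEVELS:
            raise ValueError(
                f"detail must be one of {sorted(VALID_DETAIL_LEVELS)}, got {detail!r}"
            )
        image_url["detail"] = detail
    # When detail is None the field is omitted from the part entirely.
    return {"type": "image_url", "image_url": image_url}
```

For example, `image_part("https://example.com/photo.jpg", detail="low")` requests the lower-token setting for that image.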

## Vision-Capable Models [#vision-capable-models]

<Callout>
  Not all models support vision. Use the `GET /v1/models` endpoint and check for the `vision` capability to find models that support image inputs.
</Callout>
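
The capability check described above could be sketched as a filter over the parsed `GET /v1/models` response. The response shape used here is an assumption, not documented on this page: a top-level `data` list whose entries carry an `id` string and a `capabilities` list of strings. Adjust the field names to match the actual response. `vision_model_ids` is a hypothetical helper name.

```python
def vision_model_ids(models_response: dict) -> list[str]:
    """Return the IDs of models whose capabilities include "vision".

    Assumed (hypothetical) response shape:
    {"data": [{"id": "...", "capabilities": ["vision", ...]}, ...]}
    """
    return [
        model["id"]
        for model in models_response.get("data", [])
        # Models without a capabilities field are treated as text-only.
        if "vision" in model.get("capabilities", [])
    ]
```

The function works on an already-decoded JSON body, so it is independent of whichever HTTP client is used to call the endpoint.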

## Document Analysis [#document-analysis]

Send documents for analysis using the `file_url` content type:

```json
{
  "model": "model-id",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Summarize the key findings in this document."},
        {"type": "file_url", "file_url": {"url": "https://example.com/report.pdf"}}
      ]
    }
  ]
}
```

### Multiple Documents [#multiple-documents]

To analyze several documents in one request, add one `file_url` part per document:

```python
response = client.chat.completions.create(
    model="model-id",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two documents."},
            {"type": "file_url", "file_url": {"url": "https://example.com/report-q1.pdf"}},
            {"type": "file_url", "file_url": {"url": "https://example.com/report-q2.pdf"}}
        ]
    }]
)
```

## Video Analysis [#video-analysis]

Send videos for analysis using the `video_url` content type:

```json
{
  "model": "model-id",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what happens in this video."},
        {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
      ]
    }
  ]
}
```

<Callout>
  Document and video support depends on the model. Treat the `vision` capability as a general indicator of multimodal support, and check each model's documentation for the specific formats it accepts.
</Callout>
