meta-llama/llama-models: Utilities intended for use with Llama models.
meta-llama/llama3: The official Meta Llama 3 GitHub site
ManishThota/llama3.2-Vision-9B-Instruct · HF Mirror
Llama-3.2-11B-Vision-Instruct · Model Library

Downloading the model with Ollama

llama3.2-vision

We can use the model published on Ollama, or download the safetensors weights from HF or ModelScope and build the Ollama model ourselves.

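Pull the model from the command prompt (cmd). `ollama run` downloads the model if it is not already present and then starts an interactive session: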

ollama run llama3.2-vision

Once the download finishes, run `ollama list` to confirm the model is available.
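The same check can be scripted. The following is a minimal sketch, assuming the Ollama server is running at its default address `http://localhost:11434` and serving the `/api/tags` endpoint. The helper names `model_names`, `has_model`, and `list_local_models` are illustrative and not part of any library.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # assumed default server address


def model_names(payload: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in payload.get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    """True if `wanted` matches a name exactly or when its tag is ignored."""
    return any(n == wanted or n.split(":", 1)[0] == wanted for n in names)


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the running Ollama server which models are available locally."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))
```

With the server running, `has_model(list_local_models(), "llama3.2-vision")` should return `True` once the pull has completed.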

Using it in Open WebUI

In initial testing, the model's multimodal capability looks good.

Calling it from Python via Ollama

Below is a simple example that uses the Ollama Python API to make an asynchronous call with streaming and structured output.

import asyncio
import httpx
from pydantic import BaseModel
from ollama import AsyncClient


# Pydantic models that define the structured output
class Object(BaseModel):
    name: str
    confidence: float
    attributes: str


class ImageDescription(BaseModel):
    summary: str
    objects: list[Object]
    scene: str
    colors: list[str]
    time_of_day: str
    setting: str
    text_content: str | None = None


async def main():
    # Read the URL of an online image
    image_url = input("Enter the image URL: ")

    # Download the image
    async with httpx.AsyncClient() as client_http:
        response = await client_http.get(image_url)
        response.raise_for_status()  # Fail early on an HTTP error status
        image_data = response.content  # Raw image bytes

    # Create the async Ollama client
    client = AsyncClient()

    # Build the JSON schema for the structured output
    schema = ImageDescription.model_json_schema()

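    # Ask the multimodal model to analyze the image
    # Note: client.chat is a coroutine function. With stream=True, awaiting the
    # coroutine it returns yields an async iterator of response parts.
    async_chat_response = client.chat(
        model="llama3.2-vision",  # A vision-capable model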
        messages=[
            {
                "role": "user",
                "content": "Analyze this image and return a detailed JSON description including objects, scene, colors and any text detected. If you cannot determine certain details, leave those fields empty.",
                "images": [image_data],  # Pass the image bytes
            }
        ],
        format=schema,  # Use the schema generated by Pydantic
        options={"temperature": 0},  # Temperature 0 for more deterministic output
        stream=True,  # Enable streaming
    )

    # Await the coroutine to obtain the async iterator
    async for part in await async_chat_response:
        # Print each streamed part as it arrives
        print(part.message.content, end="", flush=True)

    print("\nAnalysis complete!")


if __name__ == "__main__":
    asyncio.run(main())
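The script above only prints the streamed text. To work with the result as a typed object, the chunks can be collected and validated against the same Pydantic model once the stream ends. The following is a minimal sketch, assuming Pydantic v2. The helper `parse_description` is a hypothetical name, and the models are repeated here only so the block is self-contained.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Object(BaseModel):
    name: str
    confidence: float
    attributes: str


class ImageDescription(BaseModel):
    summary: str
    objects: list[Object]
    scene: str
    colors: list[str]
    time_of_day: str
    setting: str
    text_content: Optional[str] = None


def parse_description(chunks: list[str]) -> Optional[ImageDescription]:
    """Join streamed text chunks and validate the result against the schema.

    Returns None if the joined text is not valid JSON or does not match
    the ImageDescription model.
    """
    raw = "".join(chunks)
    try:
        return ImageDescription.model_validate_json(raw)
    except ValidationError:
        return None
```

In the streaming loop, each `part.message.content` would be appended to a list, and `parse_description` called on that list after the loop finishes.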

Calling the script with the same image produces the result shown in the figure.