
3 posts tagged with "LLM"


Generating Images with StableDiffusion

· 2 min read
wencaiwulue
Senior DevOps / System Engineer @ bytedance

Background

We need to generate images from text with StableDiffusion.

Environment

  • The local Mac can reach HuggingFace
  • A Linux host with a GPU card

Since the Linux host cannot reach HuggingFace, we download the model on the local machine first and then upload it to the Linux server.

Steps

Download the model files locally

huggingface-cli download stabilityai/stable-diffusion-2-1 --local-dir ~/stable-diffusion

Upload them to the Linux host with rsync

rsync -e "ssh -i ~/.ssh/common.pem" -aP ~/stable-diffusion root@10.0.1.45:/shared/llm-models

Alternatively, point the CLI at a mirror directly on the Linux host

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download stabilityai/stable-diffusion-2-1 --local-dir ~/stable-diffusion

Install the Python dependencies

pip install diffusers transformers torch

Run text-to-image from Python

from diffusers import StableDiffusionPipeline

# Load the model from the directory uploaded earlier
model_id = "/shared/llm-models/stable-diffusion/"
pipe = StableDiffusionPipeline.from_pretrained(model_id)
pipe.to("cuda")  # or "cpu" if no GPU is available
image = pipe("generate a dog").images[0]
image.save("generated_image.png")
print("Image generated and saved as 'generated_image.png'.")
Run the script:

(base) root@iv-yczfw5rmkgcva4frh5tu:~# python img.py
Loading pipeline components...: 17%|█████████████████▌ | 1/6 [00:00<00:04, 1.12it/s]/root/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:01<00:00, 4.27it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:19<00:00, 2.52it/s]
Image generated and saved as 'generated_image.png'.
(base) root@iv-yczfw5rmkgcva4frh5tu:~# ls
generated_image.png img.py

Download the generated image to the local machine

scp -i ~/.ssh/common.pem root@10.0.1.45:~/generated_image.png generated_image.png

The result

generated_image.png

LLM function call

· 11 min read
wencaiwulue
Senior DevOps / System Engineer @ bytedance

Background

While working on private LLM deployments, I needed to add support for function calling.

Main flow

(flow diagram)

Since the model does not accept streaming input, function calling is in practice implemented as a sequence of individual HTTP requests.

A worked example

Use an LLM function call to look up the location of Nanshan Library (南山图书馆).

Send a request in OpenAI's function format

Suppose we have a function LocationTool that looks up a concrete location for a given name. Its detailed parameter description is as follows:

LocationTool: looks up a concrete location, or recommends concrete places, based on the user's question. When the question names a specific place and asks where it is, it returns location information for hotels, restaurants, museums, government/medical/educational institutions, and so on, but not for districts, city names, or town names. The question may also ask for recommendations of restaurants, museums, government offices, attractions, etc., in which case it returns concrete recommended places.
Supported parameters:
location_keyword: the place the user wants to find; required; string;
poi_keyword: the type of place; required; string;
latitude: the latitude from UserInfo; required; float;
longitude: the longitude from UserInfo; required; float;
sort: the sort order, 0 = unsorted, 1 = nearest first, 2 = cheapest first, 3 = most expensive first; optional; integer; defaults to 0.
Example where the tool should fire: for the question "上海台儿庄路的特色小餐馆" (specialty restaurants on Tai'erzhuang Road, Shanghai), the arguments are {"location_keyword":"上海台儿庄路","poi_keyword":"特色小餐馆","latitude":"30.11","longitude":"121.45"}. The tool should not fire for questions like: fun places in Beijing, where is Mentougou, where are Fangheng's fire hydrants, recommend some local specialties.
➜  ~ curl -X POST localhost:8000/chat/completions -H "Content-Type: application/json" -d '{
  "model": "xxxxxx",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "请帮我查询一下南山图书馆的位置"
    }
  ],
  "temperature": 0.95,
  "top_p": 0.8,
  "top_k": 10,
  "repetition_penalty": 1,
  "max_new_tokens": 2048,
  "tools": [{
    "id": "1",
    "type": "function",
    "function": {
      "name": "LocationTool",
      "description": "根据用户给出的问题查询具体位置或推荐具体地点。当问题中给出指定地点询问位置时可以返回位置信息,包含酒店、餐厅、博物馆、政府/医疗/教学机构等,但不包含市辖区、城市名称、乡镇名称。问题中可对餐厅、博物馆、政府单位、景点等进行推荐,返回推荐的具体地点。",
      "parameters": {
        "type": "object",
        "properties": {
          "location_keyword": {
            "type": "string",
            "description": "表示用户想要查找的地点"
          },
          "poi_keyword": {
            "type": "string",
            "description": "表示地点类型"
          },
          "latitude": {
            "type": "string",
            "description": "表示UserInfo中纬度"
          },
          "longitude": {
            "type": "string",
            "description": "表示UserInfo中经度"
          },
          "sort": {
            "type": "integer",
            "description": "0=不排序,1=最近优先,2=最便宜优先,3=最贵优先,默认值为0"
          }
        },
        "required": ["location_keyword","poi_keyword","latitude","longitude"]
      }
    }
  }]
}'
{"model":"xxxxxx","choices":[{"delta":{"role":"assistant","content":"\n当前提供了 1 个工具,分别是['LocationTool'],需求为查询南山图书馆的位置,需要调用 LocationTool 获取相关信息。","tool_calls":[{"type":"function","function":{"name":"LocationTool","arguments":"{\"latitude\": \"22.536444\", \"location_keyword\": \"南山图书馆\"}"},"index":0,"id":"call_nbkpltop1qzycolf8gw3n7zk"}]},"index":0,"finish_reason":"tool_calls","logprobs":null}],"usage":{"prompt_tokens":220,"completion_tokens":82,"total_tokens":302}}

➜ ~

We can see the response contains:

{
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "LocationTool",
        "arguments": "{\"latitude\": \"22.536444\", \"location_keyword\": \"南山图书馆\"}"
      },
      "index": 0,
      "id": "call_nbkpltop1qzycolf8gw3n7zk"
    }
  ]
}

Invoke the function to get the result

{
  "type": "function",
  "function": {
    "name": "LocationTool",
    "arguments": "{\"latitude\": \"22.536444\", \"location_keyword\": \"南山图书馆\"}"
  },
  "index": 0,
  "id": "call_nbkpltop1qzycolf8gw3n7zk"
}

Using the arguments provided, we call the separate LocationTool component and obtain the result:

南山区南山大道2093号
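The glue code for this step is not shown above; a minimal Python sketch of the dispatch loop, assuming the endpoint and model name from the curl examples and a hypothetical location_tool stub in place of the real component, might look like this:

import json
import requests

API_URL = "http://localhost:8000/chat/completions"  # gateway from the examples above

def location_tool(location_keyword, poi_keyword=None, latitude=None,
                  longitude=None, sort=0):
    # Hypothetical stub standing in for the real LocationTool component.
    return "南山区南山大道2093号"

TOOLS = {"LocationTool": location_tool}

def answer(messages, tools_spec):
    body = {"model": "xxxxxx", "stream": False,
            "messages": messages, "tools": tools_spec}
    choice = requests.post(API_URL, json=body).json()["choices"][0]
    # This gateway nests the turn under "delta"; standard servers use "message".
    reply = choice.get("message") or choice.get("delta")
    calls = reply.get("tool_calls") or []
    if not calls:
        return reply["content"]                # final natural-language answer
    messages.append(reply)                     # echo the assistant turn back first
    for call in calls:
        args = json.loads(call["function"]["arguments"])  # arrives as a JSON string
        messages.append({"role": "tool",
                         "content": TOOLS[call["function"]["name"]](**args),
                         "tool_call_id": call["id"]})
    return answer(messages, tools_spec)        # second round trip for the final text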

Feed the result back to the LLM for the final answer

curl -X POST localhost:8000/chat/completions -H "Content-Type: application/json" -d '{
  "model": "xxxxxx",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "请帮我查询一下南山图书馆的位置"
    },
    {
      "role": "assistant",
      "content": "\n当前提供了 1 个工具,分别是[\"LocationTool\"],需求是查询南山图书馆的位置,需要调用 LocationTool 获取相关信息。",
      "tool_calls": [{
        "type": "function",
        "function": {
          "name": "LocationTool",
          "arguments": "{\"latitude\": \"22.5385\", \"location_keyword\": \"南山图书馆\"}"
        },
        "index": 0,
        "id": "1"
      }]
    },
    {
      "role": "tool",
      "content": "南山区南山大道2093号",
      "tool_call_id": "1"
    }
  ],
  "temperature": 0.95,
  "top_p": 0.8,
  "top_k": 10,
  "repetition_penalty": 1,
  "max_new_tokens": 2048,
  "tools": [{
    "id": "1",
    "type": "function",
    "function": {
      "name": "LocationTool",
      "description": "根据用户给出的问题查询具体位置或推荐具体地点。当问题中给出指定地点询问位置时可以返回位置信息,包含酒店、餐厅、博物馆、政府/医疗/教学机构等,但不包含市辖区、城市名称、乡镇名称。问题中可对餐厅、博物馆、政府单位、景点等进行推荐,返回推荐的具体地点。",
      "parameters": {
        "type": "object",
        "properties": {
          "location_keyword": {
            "type": "string",
            "description": "表示用户想要查找的地点"
          },
          "poi_keyword": {
            "type": "string",
            "description": "表示地点类型"
          },
          "latitude": {
            "type": "string",
            "description": "表示UserInfo中纬度"
          },
          "longitude": {
            "type": "string",
            "description": "表示UserInfo中经度"
          },
          "sort": {
            "type": "integer",
            "description": "0=不排序,1=最近优先,2=最便宜优先,3=最贵优先,默认值为0"
          }
        },
        "required": ["location_keyword","poi_keyword","latitude","longitude"]
      }
    }
  }]
}'
{"model":"xxxxxx","choices":[{"message":{"role":"assistant","content":"南山图书馆的位置是南山区南山大道 2093 号。","tool_calls":[]},"index":0,"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":291,"completion_tokens":28,"total_tokens":319}}

The final result

The model replies 南山图书馆的位置是南山区南山大道 2093 号 (Nanshan Library is at 2093 Nanshan Avenue, Nanshan District), which is exactly the answer we wanted.

Under the hood

In essence, the tool definitions are rendered into the format the model was trained on and sent to it as plain text; the model then recognizes the task and replies in that agreed format. We can reproduce this by hand:

➜  ~ curl -X POST localhost:8000/chat/completions -H "Content-Type: application/json" -d '{
"model": "xxxxxx",
"stream": false,
"messages": [
{
"role":"assistant",
"content":"<|Functions|>:\n- LocationTool:根据用户给出的问题查询具体位置或推荐具体地点。当问题中给出指定地点询问位置时可以返回位置信息,包含酒店、餐厅、博物馆、政府/医疗/教学机构等,但不包含市辖区、城市名称、乡镇名称。问题中可对餐厅、博物馆、政府单位、景点等进行推荐,返回推荐的具体地点。支持如下参数:location_keyword:表示用户 想要查找的地点,必填参数,字符串类型;poi_keyword:表示地点类型,必填参数,字符串类型;latitude:表示UserInfo中纬度,必填参数,浮点型;longitude:表示UserInfo中经度,必填 参数,浮点型;sort:表示排序规则,0=不排序,1=最近优先,2=最便宜优先,3=最贵优先,默认值为0,缺省参数,整型。比如,需要出本插件的情况:问题是 \"上海台儿庄路的特色小餐馆\" ,参数为 {\"location_keyword\":\"上海台儿庄路\",\"poi_keyword\":\"特色小餐馆\",\"latitude\":\"30.11\",\"longitude\":\"121.45\"}。类似以下文本不出本插件:北京好玩的地方、门头沟在哪、方恒的消防栓在哪、推荐一些当地特产。\n\n<|UserInfo|>:\n{\"system_time\": \"2023-09-14T19:27:42\", \"longitude\": 106.03, \"latitude\": 42.04}"
},
{
"role": "user",
"content": "请帮我查询一下南山图书馆的位置"
}
],
"temperature": 0.95,
"top_p": 0.8,
"top_k": 10,
"repetition_penalty": 1,
"max_new_tokens": 2048
}'
{"model":"xxxxxx","choices":[{"message":{"role":"assistant","content":"\n当前提供了 1 个工具,分别是['LocationTool'],需求是查询南山图书馆的位置,需要调用 LocationTool 获取相关信息。","tool_calls":[{"type":"function","function":{"name":"LocationTool","arguments":"{\"latitude\": 42.04, \"longitude\": 106.03, \"location_keyword\": \"南山图书馆\"}"},"index":0,"id":"call_uwy26bq6wvx7ud0g3anhrxhu"}]},"index":0,"finish_reason":"tool_calls","logprobs":null}],"usage":{"prompt_tokens":361,"completion_tokens":87,"total_tokens":448}}
➜  ~ curl -X POST localhost:8000/chat/completions -H "Content-Type: application/json" -d '{
"model": "xxxxxx",
"stream": false,
"messages": [
{
"role":"assistant",
"content":"<|Functions|>:\n- LocationTool:根据用户给出的问题查询具体位置或推荐具体地点。当问题中给出指定地点询问位置时可以返回位置信息,包含酒店、餐厅、博物馆、政府/医疗/教学机构等,但不包含市辖区、城市名称、乡镇名称。问题中可对餐厅、博物馆、政府单位、景点等进行推荐,返回推荐的具体地点。支持如下参数:location_keyword:表示用户 想要查找的地点,必填参数,字符串类型;poi_keyword:表示地点类型,必填参数,字符串类型;latitude:表示UserInfo中纬度,必填参数,浮点型;longitude:表示UserInfo中经度,必填 参数,浮点型;sort:表示排序规则,0=不排序,1=最近优先,2=最便宜优先,3=最贵优先,默认值为0,缺省参数,整型。比如,需要出本插件的情况:问题是 \"上海台儿庄路的特色小餐馆\" ,参数为 {\"location_keyword\":\"上海台儿庄路\",\"poi_keyword\":\"特色小餐馆\",\"latitude\":\"30.11\",\"longitude\":\"121.45\"}。类似以下文本不出本插件:北京好玩的地方、门头沟在哪、方恒的消防栓在哪、推荐一些当地特产。\n\n<|UserInfo|>:\n{\"system_time\": \"2023-09-14T19:27:42\", \"longitude\": 106.03, \"latitude\": 42.04}"
},
{
"role": "user",
"content": "请帮我查询一下南山图书馆的位置"
},
{
"role":"assistant",
"content":"<|Observation|>:南山区南山大道2093号"
}
],
"temperature": 0.95,
"top_p": 0.8,
"top_k": 10,
"repetition_penalty": 1,
"max_new_tokens": 2048
}'
{"model":"xxxxxx","choices":[{"message":{"role":"assistant","content":"为您查询到南山图书馆的地址为:南山区南山大道 2093 号。\n\n如果您想了解更多关于南山图书馆的信息,或者有其他需求,请继续提问。","tool_calls":[]},"index":0,"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":384,"completion_tokens":53,"total_tokens":437}}

A Look Inside the Ollama Project

· 6 min read
wencaiwulue
Senior DevOps / System Engineer @ bytedance

Preface

I have been reading up on LLMs recently and wanted to run one locally for experiments, which led me to ollama. The experience was great, so I dug into how it works.

The project

Build the ollama binary locally

  • Clone the project: git clone https://github.com/ollama/ollama.git
  • Build the bundled llama-server binary: cd ollama && ./scripts/build_darwin.sh
  • Build the ollama binary itself: CGO_ENABLED=1 GOOS=darwin GOARCH=arm64 go build

Looking at the routes it registers, the server exposes quite a few endpoints:

POST   /api/pull
POST   /api/generate
POST   /api/chat
POST   /api/embed
POST   /api/embeddings
POST   /api/create
POST   /api/push
POST   /api/copy
DELETE /api/delete
POST   /api/show
POST   /api/blobs/:digest
HEAD   /api/blobs/:digest
GET    /api/ps
POST   /v1/chat/completions
POST   /v1/completions
POST   /v1/embeddings
GET    /v1/models
GET    /v1/models/:model
GET    /
GET    /api/tags
GET    /api/version
HEAD   /
HEAD   /api/tags
HEAD   /api/version
Running a model makes ollama spawn a llama server, as the logs show:

time=2024-08-14T16:27:00.222+08:00 level=INFO source=memory.go:309 msg="offload to metal" layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[21.3 GiB]" memory.required.full="5.4 GiB" memory.required.partial="5.4 GiB" memory.required.kv="448.0 MiB" memory.required.allocations="[5.4 GiB]" memory.weights.total="3.9 GiB" memory.weights.repeating="3.4 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="478.0 MiB" memory.graph.partial="478.0 MiB"
time=2024-08-14T16:27:00.756+08:00 level=INFO source=server.go:393 msg="starting llama server" cmd="/var/folders/30/cmv9c_5j3mq_kthx63sb1t5c0000gn/T/ollama3971698503/runners/metal/ollama_llama_server --model /Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 29 --parallel 4 --port 61974"

Look closely at the command:

ollama_llama_server --model /Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 29 --parallel 4 --port 61974

Under the hood this is llama.cpp doing the work; calling llama.cpp's own llama-server directly achieves the same thing.

➜  bin ./llama-server --model /Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5 --ctx-size 8192 --batch-size 512 --log-disable --n-gpu-layers 29 --parallel 4 --port 61974
INFO [ main] build info | tid="0x1fa794c00" timestamp=1723624128 build=3581 commit="06943a69"
INFO [ main] system info | tid="0x1fa794c00" timestamp=1723624128 n_threads=8 n_threads_batch=-1 total_threads=10 system_info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | SVE = 0 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | "
llama_model_loader: loaded meta data with 21 key-value pairs and 339 tensors from /Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.name str = Qwen2-7B-Instruct
llama_model_loader: - kv 2: qwen2.block_count u32 = 28
llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
llama_model_loader: - kv 4: qwen2.embedding_length u32 = 3584
llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 18944
llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 28
llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 4
llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q4_0: 197 tensors
llama_model_loader: - type q6_K: 1 tensors
...

Opening the local server in a browser:

llama cpp.png

As you can see, llama-server ships with a simple built-in UI page; you can also switch to the New UI in the top-right corner for a more polished look. Even on this simple page we can already chat with the LLM.

llama_cpp_new_ui.png

In the ollama_llama_server launched by Ollama, however, this UI has been removed; the /completion API still works:

➜  ~ curl 'http://localhost:61974/completion' \
-H 'Accept: text/event-stream' \
-H 'Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7' \
-H 'Cache-Control: no-cache' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Cookie: gitea_incredible=jRSMcBghtF%3A63878871dcbaf7a40498c267e6df0b786550c2b7f8ab9e1d46d610e0affc4286' \
-H 'DNT: 1' \
-H 'Origin: http://localhost:61974' \
-H 'Pragma: no-cache' \
-H 'Referer: http://localhost:61974/' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Site: same-origin' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36' \
-H 'sec-ch-ua: "Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
--data-raw '{"stream":true,"n_predict":400,"temperature":0.7,"stop":["</s>","Llama:","User:"],"repeat_last_n":256,"repeat_penalty":1.18,"penalize_nl":false,"top_k":40,"top_p":0.95,"min_p":0.05,"tfs_z":1,"typical_p":1,"presence_penalty":0,"frequency_penalty":0,"mirostat":0,"mirostat_tau":5,"mirostat_eta":0.1,"grammar":"","n_probs":0,"min_keep":0,"image_data":[],"cache_prompt":true,"api_key":"","slot_id":-1,"prompt":"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\n\nUser: hello\nLlama:"}'
data: {"content":" Hello","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" there","stop":false,"id_slot":0,"multimodal":false}

data: {"content":"!","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" How","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" can","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" I","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" assist","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" you","stop":false,"id_slot":0,"multimodal":false}

data: {"content":" today","stop":false,"id_slot":0,"multimodal":false}

data: {"content":"?\n\n","stop":false,"id_slot":0,"multimodal":false}

data: {"content":"","stop":false,"id_slot":0,"multimodal":false}

data: {"content":"","stop":false,"id_slot":0,"multimodal":false}

data: {"content":"","id_slot":0,"stop":true,"model":"/Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5","tokens_predicted":12,"tokens_evaluated":47,"generation_settings":{"n_ctx":2048,"n_predict":-1,"model":"/Users/bytedance/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5","seed":4294967295,"temperature":0.699999988079071,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":256,"repeat_penalty":1.1799999475479126,"presence_penalty":0.0,"frequency_penalty":0.0,"penalty_prompt_tokens":[],"use_penalty_prompt_tokens":false,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":["</s>","Llama:","User:"],"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":true,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"]},"prompt":"This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.\n\nUser: hello\nLlama:","truncated":false,"stopped_eos":false,"stopped_word":true,"stopped_limit":false,"stopping_word":"User:","tokens_cached":58,"timings":{"prompt_n":1,"prompt_ms":1468.578,"prompt_per_token_ms":1468.578,"prompt_per_second":0.6809308051734398,"predicted_n":12,"predicted_ms":336.452,"predicted_per_token_ms":28.037666666666667,"predicted_per_second":35.66630604068337}}

➜ ~
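The same /completion endpoint is easy to consume programmatically; here is a minimal sketch using Python's requests (the browser headers captured above are not required):

import json
import requests

body = {
    "stream": True,
    "n_predict": 400,
    "stop": ["</s>", "Llama:", "User:"],
    "prompt": "This is a conversation between User and Llama, a friendly chatbot."
              "\n\nUser: hello\nLlama:",
}
with requests.post("http://localhost:61974/completion", json=body, stream=True) as resp:
    for raw in resp.iter_lines():
        if not raw.startswith(b"data: "):
            continue  # skip blank keep-alive lines between SSE events
        chunk = json.loads(raw[len(b"data: "):])
        print(chunk["content"], end="", flush=True)
        if chunk.get("stop"):
            break  # the final event also carries timings and generation settings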

Ollama UI/GUI

UI: open-webui

pip install open-webui && open-webui serve

open_ui.png

GUI: Hollama

hollama.png

Architecture

ollama_arch.svg

Highlights

  • It wraps llama.cpp's fiddly workflow into much simpler operations: users don't need to care about how to start the server locally or where the model files live; they just pick a model and start chatting.
  • It supports new models dynamically by defining a Modelfile, which plays the same role as a Dockerfile, backed by a model registry.
  • It supports custom Modelfiles and importing GGUF (GPT-Generated Unified Format) model files, as well as publishing models (pull/push); a minimal example follows this list.
  • It offers a friendlier graphical experience, with a rich choice of UIs.
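For instance, a minimal Modelfile can import a local GGUF file and set defaults, much like a Dockerfile (the file name and parameter values here are illustrative):

# Import a local GGUF model file
FROM ./Qwen2-7B-Instruct.Q4_0.gguf
# Default sampling and context settings
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
SYSTEM "You are a helpful assistant."

Building and running it then takes two commands:

ollama create my-qwen2 -f Modelfile
ollama run my-qwen2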