# MLLM部署文档 对MLLM进行推理加速和部署可以查看[lmdeploy推理加速文档](LmDeploy推理加速文档.md)和[vLLM推理加速文档](vLLM推理加速文档.md). ## 目录 - [环境准备](#环境准备) - [qwen-vl-chat](#qwen-vl-chat) - [yi-vl-6b-chat](#yi-vl-6b-chat) - [minicpm-v-v2_5-chat](#minicpm-v-v2_5-chat) - [语音与视频模态](#语音与视频模态) ## 环境准备 ```shell git clone https://github.com/modelscope/swift.git cd swift pip install -e '.[llm]' ``` 以下我们给出了若干模型的例子(选择了尺寸较小的模型来方便实验),相信聪明的你可以从中找到部署与调用的规律,我就不多介绍啦。 ## qwen-vl-chat **服务端:** ```bash # 使用原始模型 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen-vl-chat # 使用微调后的LoRA CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx # 使用微调后Merge LoRA的模型 CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/qwen-vl-chat/vx-xxx/checkpoint-xxx-merged ``` **客户端:** 测试: ```bash curl http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "qwen-vl-chat", "messages": [{"role": "user", "content": "https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg图中是什么花,有几只?"}], "max_tokens": 256, "temperature": 0 }' ``` 使用swift: ```python from swift.llm import get_model_list_client, XRequestConfig, inference_client model_list = get_model_list_client() model_type = model_list.data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('rose.jpg', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # query = f'{img_base64}图中是什么花,有几只?' # use local_path # query = 'rose.jpg图中是什么花,有几只?' # use url query = 'https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg图中是什么花,有几只?' request_config = XRequestConfig(seed=42) resp = inference_client(model_type, query, request_config=request_config) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') history = [(query, response)] query = '框出图中的花' request_config = XRequestConfig(stream=True, seed=42) stream_resp = inference_client(model_type, query, history, request_config=request_config) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """ model_type: qwen-vl-chat query: rose.jpg图中是什么花,有几只? response: 图中是三朵红玫瑰花。 query: 框出图中的花 response: (34,449),(368,981)(342,456),(670,917)(585,508),(859,977) """ ``` 使用openai: ```python from openai import OpenAI client = OpenAI( api_key='EMPTY', base_url='http://localhost:8000/v1', ) model_type = client.models.list().data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('rose.jpg', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # image_url = f'data:image/jpeg;base64,{img_base64}' # use local_path # from swift.llm import convert_to_base64 # image_url = convert_to_base64(images=['rose.jpg'])['images'][0] # image_url = f'data:image/jpeg;base64,{image_url}' # use url image_url = 'https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg' query = '图中是什么花,有几只?' messages = [{ 'role': 'user', 'content': [ {'type': 'image_url', 'image_url': {'url': image_url}}, {'type': 'text', 'text': query}, ] }] resp = client.chat.completions.create( model=model_type, messages=messages, seed=42) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') # 流式 messages.append({'role': 'assistant', 'content': response}) query = '框出图中的花' messages.append({'role': 'user', 'content': query}) stream_resp = client.chat.completions.create( model=model_type, messages=messages, stream=True, seed=42) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """Out[0] model_type: qwen-vl-chat query: https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/rose.jpg图中是什么花,有几只? response: 图中是三朵红玫瑰花。 query: 框出图中的花 response: (34,449),(368,981)(342,456),(670,917)(585,508),(859,977) """ ``` ## yi-vl-6b-chat **服务端:** ```bash # 使用原始模型 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type yi-vl-6b-chat # 使用微调后的LoRA CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx # 使用微调后Merge LoRA的模型 CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/yi-vl-6b-chat/vx-xxx/checkpoint-xxx-merged ``` **客户端:** 测试: ```bash curl http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "yi-vl-6b-chat", "messages": [{"role": "user", "content": "描述这张图片"}], "temperature": 0, "images": ["http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"] }' ``` 使用swift: ```python from swift.llm import get_model_list_client, XRequestConfig, inference_client model_list = get_model_list_client() model_type = model_list.data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('cat.png', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # images = [img_base64] # use local_path # from swift.llm import convert_to_base64 # images = ['cat.png'] # images = convert_to_base64(images=images)['images'] # use url images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png'] query = '描述这张图片' request_config = XRequestConfig(temperature=0) resp = inference_client(model_type, query, images=images, request_config=request_config) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') history = [(query, response)] query = '图中有几只羊' images.append('http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png') request_config = XRequestConfig(stream=True, temperature=0) stream_resp = inference_client(model_type, query, history, images=images, request_config=request_config) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """ model_type: yi-vl-6b-chat query: 描述这张图片 response: 图片显示一只小猫坐在地板上,眼睛睁开,凝视着摄像机。小猫看起来很可爱,有灰色和白色的毛皮,以及蓝色的眼睛。小猫似乎正在看摄像机,可能被吸引到它正在拍摄它的照片或视频。 query: 图中有几只羊 response: 图中有四只羊. """ ``` 使用openai: ```python from openai import OpenAI client = OpenAI( api_key='EMPTY', base_url='http://localhost:8000/v1', ) model_type = client.models.list().data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('cat.png', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # image_url = f'data:image/jpeg;base64,{img_base64}' # use local_path # from swift.llm import convert_to_base64 # image_url = convert_to_base64(images=['cat.png'])['images'][0] # image_url = f'data:image/jpeg;base64,{image_url}' # use url image_url = 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png' query = '描述这张图片' messages = [{ 'role': 'user', 'content': [ {'type': 'image_url', 'image_url': {'url': image_url}}, {'type': 'text', 'text': query}, ] }] resp = client.chat.completions.create( model=model_type, messages=messages, temperature=0) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') # 流式 messages.append({'role': 'assistant', 'content': response}) query = '图中有几只羊' messages.append({'role': 'user', 'content': [ {'type': 'image_url', 'image_url': {'url': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png'}}, {'type': 'text', 'text': query}, ]}) stream_resp = client.chat.completions.create( model=model_type, messages=messages, stream=True, temperature=0) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """ model_type: yi-vl-6b-chat query: 描述这张图片 response: 图片显示一只小猫坐在地板上,眼睛睁开,凝视着摄像机。小猫看起来很可爱,有灰色和白色的毛皮,以及蓝色的眼睛。小猫似乎正在看摄像机,可能被吸引到它正在拍摄它的照片或视频。 query: 图中有几只羊 response: 图中有四只羊. """ ``` ## minicpm-v-v2_5-chat **服务端:** ```bash # 使用原始模型 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_5-chat # 使用微调后的LoRA CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx # 使用微调后Merge LoRA的模型 CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged ``` **客户端:** 测试: ```bash curl http://localhost:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "minicpm-v-v2_5-chat", "messages": [{"role": "user", "content": "描述这张图片"}], "temperature": 0, "images": ["http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png"] }' ``` 使用swift: ```python from swift.llm import get_model_list_client, XRequestConfig, inference_client model_list = get_model_list_client() model_type = model_list.data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('cat.png', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # images = [img_base64] # use local_path # from swift.llm import convert_to_base64 # images = ['cat.png'] # images = convert_to_base64(images=images)['images'] # use url images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png'] query = '描述这张图片' request_config = XRequestConfig(temperature=0) resp = inference_client(model_type, query, images=images, request_config=request_config) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') history = [(query, response)] query = '这张图是如何产生的?' request_config = XRequestConfig(stream=True, temperature=0) stream_resp = inference_client(model_type, query, history, images=images, request_config=request_config) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """ model_type: minicpm-v-v2_5-chat query: 描述这张图片 response: 这张图片展示了一只年轻的猫咪的特写,可能是一只小猫,具有明显的特征。它的毛皮主要是白色的,带有灰色和黑色的条纹,尤其是在脸部周围。小猫的眼睛很大,呈蓝色,给人一种好奇和迷人的表情。耳朵尖尖,竖立着,显示出警觉性。背景模糊不清,突出了小猫作为图片的主题。整体的色调柔和,猫咪的毛皮与背景的柔和色调形成对比。 query: 这张图是如何产生的? response: 这张图片看起来是用数字绘画技术创作的。艺术家使用数字绘图工具来模仿毛皮的纹理和颜色,眼睛的反射,以及整体的柔和感。这种技术使艺术家能够精确地控制细节和色彩,创造出逼真的猫咪形象。 """ ``` 使用openai: ```python from openai import OpenAI client = OpenAI( api_key='EMPTY', base_url='http://localhost:8000/v1', ) model_type = client.models.list().data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('cat.png', 'rb') as f: # img_base64 = base64.b64encode(f.read()).decode('utf-8') # image_url = f'data:image/jpeg;base64,{img_base64}' # use local_path # from swift.llm import convert_to_base64 # image_url = convert_to_base64(images=['cat.png'])['images'][0] # image_url = f'data:image/jpeg;base64,{image_url}' # use url image_url = 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png' query = '描述这张图片' messages = [{ 'role': 'user', 'content': [ {'type': 'image_url', 'image_url': {'url': image_url}}, {'type': 'text', 'text': query}, ] }] resp = client.chat.completions.create( model=model_type, messages=messages, temperature=0) response = resp.choices[0].message.content print(f'query: {query}') print(f'response: {response}') # 流式 messages.append({'role': 'assistant', 'content': response}) query = '这张图是如何产生的?' messages.append({'role': 'user', 'content': query}) stream_resp = client.chat.completions.create( model=model_type, messages=messages, stream=True, temperature=0) print(f'query: {query}') print('response: ', end='') for chunk in stream_resp: print(chunk.choices[0].delta.content, end='', flush=True) print() """ model_type: minicpm-v-v2_5-chat query: 描述这张图片 response: 这张图片展示了一只年轻的猫咪的特写,可能是一只小猫,具有明显的特征。它的毛皮主要是白色的,带有灰色和黑色的条纹,尤其是在脸部周围。小猫的眼睛很大,呈蓝色,给人一种好奇和迷人的表情。耳朵尖尖,竖立着,显示出警觉性。背景模糊不清,突出了小猫作为图片的主题。整体的色调柔和,猫咪的毛皮与背景的柔和色调形成对比。 query: 这张图是如何产生的? response: 这张图片看起来是用数字绘画技术创作的。艺术家使用数字绘图工具来模仿毛皮的纹理和颜色,眼睛的反射,以及整体的柔和感。这种技术使艺术家能够精确地控制细节和色彩,创造出逼真的猫咪形象。 """ ``` ## 语音与视频模态 ### qwen2-audio-7b-instruct **服务端:** ```bash # pip install transformers>=4.45 CUDA_VISIBLE_DEVICES=0 swift deploy --model_type qwen2-audio-7b-instruct # ... ``` **客户端:** 使用swift: ```python from swift.llm import get_model_list_client, XRequestConfig, inference_client model_list = get_model_list_client() model_type = model_list.data[0].id print(f'model_type: {model_type}') # use base64 # import base64 # with open('weather.wav', 'rb') as f: # aud_base64 = base64.b64encode(f.read()).decode('utf-8') # audios = [aud_base64] # use local_path # from swift.llm import convert_to_base64 # audios = ['weather.wav'] # audios = convert_to_base64(images=audios)['images'] # use url audios = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'] query = '