# LmDeploy Inference Acceleration and Deployment

lmdeploy GitHub: [https://github.com/InternLM/lmdeploy](https://github.com/InternLM/lmdeploy).

For the multimodal models that support lmdeploy inference acceleration, see [Supported Models](../Instruction/支持的模型和数据集.md#多模态大模型).

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference Acceleration](#inference-acceleration)
- [Deployment](#deployment)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 are all supported.

```bash
# Set the global pip mirror (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# lmdeploy builds are tied to specific CUDA versions;
# follow `https://github.com/InternLM/lmdeploy#installation` when installing
pip install lmdeploy
```

## Inference Acceleration

### Using Python

[OpenGVLab/InternVL2-2B](https://modelscope.cn/models/OpenGVLab/InternVL2-2B/summary)

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Optional: log in to ModelScope if the model requires an access token
# from swift.hub import HubApi
# _api = HubApi()
# _api.login('')  # https://modelscope.cn/my/myaccesstoken

from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

model_type = ModelType.internvl2_2b
model_id_or_path = None  # None: use the default model ID for this model_type
lmdeploy_engine = get_lmdeploy_engine(model_type, model_id_or_path=model_id_or_path)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

# Images can be passed via the 'images' key or inline with <img>...</img> tags
request_list = [
    # "Describe the image"
    {'query': '<img>描述图片',
     'images': ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']},
    # "Who are you?"
    {'query': '你是谁?'},
    {'query': (
        '<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>'
        '<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img>'
        'What is the difference between the two images?'
    )},
]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
    print(f"query: {request['query']}")
    print(f"response: {resp['response']}")
print(generation_info)

# stream
request_list = [{'query': '