# Qwen2-VL Best Practices

For best practices with qwen2-vl-72b-instruct, see [here](https://github.com/modelscope/ms-swift/issues/2064).

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)

## Environment Setup
```shell
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e '.[llm]'
pip install pyav qwen_vl_utils
```

Models (fine-tuning is supported for the base, instruct, gptq-int4, gptq-int8, and awq variants):
- qwen2-vl-2b-instruct: [https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct)
- qwen2-vl-7b-instruct: [https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct)
- qwen2-vl-72b-instruct: [https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct)

## Inference

Run inference with qwen2-vl-7b-instruct:
```shell
# Experimental environment: A100
# 30GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen2-vl-7b-instruct
```

Output (images can be passed as local paths or URLs):
```python
"""
<<< 你是谁？
我是来自阿里云的大规模语言模型，我叫通义千问。
--------------------------------------------------
<<< 这两张图片有什么区别
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
[INFO:swift] Setting size_factor: 28. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting max_pixels: 12845056. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
这两张图片的内容确实不同。第一张图片是一只小猫的特写，它有着大大的眼睛和柔软的毛发，显得非常可爱。第二张图片是一群羊的卡通插画，背景是绿色的草地和山脉，显得非常温馨和自然。
--------------------------------------------------
<<< 图中有几只羊
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
图中有四只羊。
--------------------------------------------------
<<< 计算结果是多少
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
1452 + 45304 = 46756
--------------------------------------------------
<<< 对图片进行OCR
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr.png
图片中的文字内容如下：
简介

SWIFT支持250+ LLM和35+ MLLM（多模态大模型）的训练、推理、评测和部署。开发者可以直接将我们的框架应用到自己的Research和生产环境中，实现模型训练评测到应用的完整链路。我们除支持了 PEFT提供的轻量训练方案外，也提供了一个完整的Adapters库以支持最新的训练技术，如NEFTune、LoRA+、LLaMA-PRO等，这个适配器库可以脱离训练脚本直接使用在自己的自定流程中。

为方便不熟悉深度学习的用户使用，我们提供了一个Gradio的web-ui用于控制训练和推理，并提供了配套的深度学习课程和最佳实践供新手入门。

此外，我们也在拓展其他模态的能力，目前我们支持了AnimateDiff的全参数训练和LoRA训练。

SWIFT具有丰富的文档体系，如有使用问题请查看这里.

可以在Huggingface space 和 ModelScope创空间中体验SWIFT web-ui功能了。
--------------------------------------------------
<<< clear
<<<