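# Supported Models and Datasets
## Table of Contents
- [Models](#models)
  - [Large Language Models](#large-language-models)
  - [Multimodal Large Models](#multimodal-large-models)
- [Datasets](#datasets)
## Models
The table below lists the models integrated into swift:
- Model Type: the `model_type` under which the model is registered in swift.
- Model ID: the model's ID on ModelScope, linked to its model page.
- Default Lora Target Modules: the model's default `lora_target_modules`.
- Default Template: the model's default `template`.
- Support Flash Attn: whether the model supports [flash attention](https://github.com/Dao-AILab/flash-attention) to accelerate inference and fine-tuning.
- Support vLLM: whether the model supports [vllm](https://github.com/vllm-project/vllm) to accelerate inference and deployment.
- Support LMDeploy: whether the model supports [LMDeploy](https://github.com/InternLM/lmdeploy) to accelerate inference and deployment.
- Support Megatron: whether the model supports training with [Megatron](https://github.com/NVIDIA/Megatron-LM).
- Requires: extra dependencies the model needs.
- Tags: extra tags for the model, e.g. `moe` for mixture-of-experts models.
- HF Model ID: the corresponding model ID on Hugging Face, or `-` if there is none.
### Large Language Models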
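Each row's Requires cell is a comma-separated list of extra pip packages, sometimes with a minimum version (for example `auto_gptq>=0.5, transformers>=4.37` or `transformers>=4.37, autoawq`). As a minimal sketch, assuming only the `>=` specifier that appears in this table, the hypothetical helper `unmet_requirements` (not part of swift) reports which entries of a cell a given environment does not satisfy:

```python
import re
from importlib import metadata
from typing import Dict, List, Optional, Tuple


def normalize_name(name: str) -> str:
    """Treat 'auto_gptq', 'auto-gptq' and 'Auto.GPTQ' as the same package."""
    return re.sub(r"[-_.]+", "_", name).lower()


def parse_version(text: str) -> Tuple[int, ...]:
    """'4.40.1' -> (4, 40, 1); stops at the first non-numeric part ('dev0', '+cu118', ...)."""
    parts: List[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0+cu118': keep 0, ignore the suffix
    return tuple(parts)


def version_at_least(installed: str, required: str) -> bool:
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.37' and '4.37.0' compare equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(requires_cell: str, installed: Dict[str, str]) -> List[str]:
    """Return the entries of a Requires cell that `installed` (name -> version) does not satisfy."""
    have = {normalize_name(k): v for k, v in installed.items()}
    unmet: List[str] = []
    for req in requires_cell.split(","):
        req = req.strip()
        if not req:
            continue  # empty cell: no extra requirements
        name, sep, min_version = req.partition(">=")
        version = have.get(normalize_name(name.strip()))
        if version is None:
            unmet.append(req)  # package not installed at all
        elif sep and not version_at_least(version, min_version.strip()):
            unmet.append(req)  # installed, but older than required
    return unmet


def installed_version(name: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed.

    Assumption: lookups such as 'auto_gptq' rely on importlib.metadata's name
    normalization in recent Python versions; try 'auto-gptq' if it is not found.
    """
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    cell = "auto_gptq>=0.5, transformers>=4.37"  # Requires cell of e.g. qwen1half-7b-chat-int4
    env: Dict[str, str] = {}
    for pkg in ("auto_gptq", "transformers"):
        found = installed_version(pkg)
        if found is not None:
            env[pkg] = found
    print(unmet_requirements(cell, env) or "all requirements met")
```

The hand-rolled comparison only covers plain numeric versions; for full PEP 440 semantics (pre-releases, `<`, `~=`), the `packaging` library's `SpecifierSet` is the more robust choice.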
| Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | Requires | Tags | HF Model ID |
| --------- | -------- | --------------------------- | ---------------- | ------------------ | ------------ | ---------------- | ---------------- | -------- | ---- | ----------- |
|qwen-1_8b|[qwen/Qwen-1_8B](https://modelscope.cn/models/qwen/Qwen-1_8B/summary)|c_attn|default-generation|✔|✔|✔|✘||-|[Qwen/Qwen-1_8B](https://huggingface.co/Qwen/Qwen-1_8B)|
|qwen-1_8b-chat|[qwen/Qwen-1_8B-Chat](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary)|c_attn|qwen|✔|✔|✔|✘||-|[Qwen/Qwen-1_8B-Chat](https://huggingface.co/Qwen/Qwen-1_8B-Chat)|
|qwen-1_8b-chat-int4|[qwen/Qwen-1_8B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int4/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-1_8B-Chat-Int4](https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int4)|
|qwen-1_8b-chat-int8|[qwen/Qwen-1_8B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-1_8B-Chat-Int8/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-1_8B-Chat-Int8](https://huggingface.co/Qwen/Qwen-1_8B-Chat-Int8)|
|qwen-7b|[qwen/Qwen-7B](https://modelscope.cn/models/qwen/Qwen-7B/summary)|c_attn|default-generation|✔|✔|✔|✘||-|[Qwen/Qwen-7B](https://huggingface.co/Qwen/Qwen-7B)|
|qwen-7b-chat|[qwen/Qwen-7B-Chat](https://modelscope.cn/models/qwen/Qwen-7B-Chat/summary)|c_attn|qwen|✔|✔|✔|✘||-|[Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)|
|qwen-7b-chat-int4|[qwen/Qwen-7B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int4/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-7B-Chat-Int4](https://huggingface.co/Qwen/Qwen-7B-Chat-Int4)|
|qwen-7b-chat-int8|[qwen/Qwen-7B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-7B-Chat-Int8/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-7B-Chat-Int8](https://huggingface.co/Qwen/Qwen-7B-Chat-Int8)|
|qwen-14b|[qwen/Qwen-14B](https://modelscope.cn/models/qwen/Qwen-14B/summary)|c_attn|default-generation|✔|✔|✔|✘||-|[Qwen/Qwen-14B](https://huggingface.co/Qwen/Qwen-14B)|
|qwen-14b-chat|[qwen/Qwen-14B-Chat](https://modelscope.cn/models/qwen/Qwen-14B-Chat/summary)|c_attn|qwen|✔|✔|✔|✘||-|[Qwen/Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat)|
|qwen-14b-chat-int4|[qwen/Qwen-14B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int4/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-14B-Chat-Int4](https://huggingface.co/Qwen/Qwen-14B-Chat-Int4)|
|qwen-14b-chat-int8|[qwen/Qwen-14B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-14B-Chat-Int8/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-14B-Chat-Int8](https://huggingface.co/Qwen/Qwen-14B-Chat-Int8)|
|qwen-72b|[qwen/Qwen-72B](https://modelscope.cn/models/qwen/Qwen-72B/summary)|c_attn|default-generation|✔|✔|✔|✘||-|[Qwen/Qwen-72B](https://huggingface.co/Qwen/Qwen-72B)|
|qwen-72b-chat|[qwen/Qwen-72B-Chat](https://modelscope.cn/models/qwen/Qwen-72B-Chat/summary)|c_attn|qwen|✔|✔|✔|✘||-|[Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat)|
|qwen-72b-chat-int4|[qwen/Qwen-72B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int4/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int4](https://huggingface.co/Qwen/Qwen-72B-Chat-Int4)|
|qwen-72b-chat-int8|[qwen/Qwen-72B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int8/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int8](https://huggingface.co/Qwen/Qwen-72B-Chat-Int8)|
|modelscope-agent-7b|[iic/ModelScope-Agent-7B](https://modelscope.cn/models/iic/ModelScope-Agent-7B/summary)|c_attn|modelscope-agent|✔|✘|✘|✘||-|-|
|modelscope-agent-14b|[iic/ModelScope-Agent-14B](https://modelscope.cn/models/iic/ModelScope-Agent-14B/summary)|c_attn|modelscope-agent|✔|✘|✘|✘||-|-|
|qwen1half-0_5b|[qwen/Qwen1.5-0.5B](https://modelscope.cn/models/qwen/Qwen1.5-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)|
|qwen1half-1_8b|[qwen/Qwen1.5-1.8B](https://modelscope.cn/models/qwen/Qwen1.5-1.8B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B)|
|qwen1half-4b|[qwen/Qwen1.5-4B](https://modelscope.cn/models/qwen/Qwen1.5-4B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B)|
|qwen1half-7b|[qwen/Qwen1.5-7B](https://modelscope.cn/models/qwen/Qwen1.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-7B](https://huggingface.co/Qwen/Qwen1.5-7B)|
|qwen1half-14b|[qwen/Qwen1.5-14B](https://modelscope.cn/models/qwen/Qwen1.5-14B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-14B](https://huggingface.co/Qwen/Qwen1.5-14B)|
|qwen1half-32b|[qwen/Qwen1.5-32B](https://modelscope.cn/models/qwen/Qwen1.5-32B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen1.5-32B](https://huggingface.co/Qwen/Qwen1.5-32B)|
|qwen1half-72b|[qwen/Qwen1.5-72B](https://modelscope.cn/models/qwen/Qwen1.5-72B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-72B](https://huggingface.co/Qwen/Qwen1.5-72B)|
|qwen1half-110b|[qwen/Qwen1.5-110B](https://modelscope.cn/models/qwen/Qwen1.5-110B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen1.5-110B](https://huggingface.co/Qwen/Qwen1.5-110B)|
|codeqwen1half-7b|[qwen/CodeQwen1.5-7B](https://modelscope.cn/models/qwen/CodeQwen1.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/CodeQwen1.5-7B](https://huggingface.co/Qwen/CodeQwen1.5-7B)|
|qwen1half-moe-a2_7b|[qwen/Qwen1.5-MoE-A2.7B](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.40|moe|[Qwen/Qwen1.5-MoE-A2.7B](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B)|
|qwen1half-0_5b-chat|[qwen/Qwen1.5-0.5B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat)|
|qwen1half-1_8b-chat|[qwen/Qwen1.5-1.8B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat)|
|qwen1half-4b-chat|[qwen/Qwen1.5-4B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat](https://huggingface.co/Qwen/Qwen1.5-4B-Chat)|
|qwen1half-7b-chat|[qwen/Qwen1.5-7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)|
|qwen1half-14b-chat|[qwen/Qwen1.5-14B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat](https://huggingface.co/Qwen/Qwen1.5-14B-Chat)|
|qwen1half-32b-chat|[qwen/Qwen1.5-32B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen1.5-32B-Chat](https://huggingface.co/Qwen/Qwen1.5-32B-Chat)|
|qwen1half-72b-chat|[qwen/Qwen1.5-72B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat)|
|qwen1half-110b-chat|[qwen/Qwen1.5-110B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen1.5-110B-Chat](https://huggingface.co/Qwen/Qwen1.5-110B-Chat)|
|qwen1half-moe-a2_7b-chat|[qwen/Qwen1.5-MoE-A2.7B-Chat](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|transformers>=4.40|moe|[Qwen/Qwen1.5-MoE-A2.7B-Chat](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat)|
|codeqwen1half-7b-chat|[qwen/CodeQwen1.5-7B-Chat](https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/CodeQwen1.5-7B-Chat](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat)|
|qwen1half-0_5b-chat-int4|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4)|
|qwen1half-1_8b-chat-int4|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4)|
|qwen1half-4b-chat-int4|[qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4)|
|qwen1half-7b-chat-int4|[qwen/Qwen1.5-7B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int4)|
|qwen1half-14b-chat-int4|[qwen/Qwen1.5-14B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int4)|
|qwen1half-32b-chat-int4|[qwen/Qwen1.5-32B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-32B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-32B-Chat-GPTQ-Int4)|
|qwen1half-72b-chat-int4|[qwen/Qwen1.5-72B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int4)|
|qwen1half-110b-chat-int4|[qwen/Qwen1.5-110B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-110B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-110B-Chat-GPTQ-Int4)|
|qwen1half-0_5b-chat-int8|[qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8)|
|qwen1half-1_8b-chat-int8|[qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8)|
|qwen1half-4b-chat-int8|[qwen/Qwen1.5-4B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-4B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int8)|
|qwen1half-7b-chat-int8|[qwen/Qwen1.5-7B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-7B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-GPTQ-Int8)|
|qwen1half-14b-chat-int8|[qwen/Qwen1.5-14B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-14B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-GPTQ-Int8)|
|qwen1half-72b-chat-int8|[qwen/Qwen1.5-72B-Chat-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen1.5-72B-Chat-GPTQ-Int8](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-GPTQ-Int8)|
|qwen1half-moe-a2_7b-chat-int4|[qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✘|✘|✘|auto_gptq>=0.5, transformers>=4.40|moe|[Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4](https://huggingface.co/Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4)|
|qwen1half-0_5b-chat-awq|[qwen/Qwen1.5-0.5B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-0.5B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-AWQ)|
|qwen1half-1_8b-chat-awq|[qwen/Qwen1.5-1.8B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-1.8B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-1.8B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-AWQ)|
|qwen1half-4b-chat-awq|[qwen/Qwen1.5-4B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-4B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-4B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-4B-Chat-AWQ)|
|qwen1half-7b-chat-awq|[qwen/Qwen1.5-7B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-7B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-7B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-7B-Chat-AWQ)|
|qwen1half-14b-chat-awq|[qwen/Qwen1.5-14B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-14B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-14B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-14B-Chat-AWQ)|
|qwen1half-32b-chat-awq|[qwen/Qwen1.5-32B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-32B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-32B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-32B-Chat-AWQ)|
|qwen1half-72b-chat-awq|[qwen/Qwen1.5-72B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-72B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-72B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-72B-Chat-AWQ)|
|qwen1half-110b-chat-awq|[qwen/Qwen1.5-110B-Chat-AWQ](https://modelscope.cn/models/qwen/Qwen1.5-110B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen1.5-110B-Chat-AWQ](https://huggingface.co/Qwen/Qwen1.5-110B-Chat-AWQ)|
|codeqwen1half-7b-chat-awq|[qwen/CodeQwen1.5-7B-Chat-AWQ](https://modelscope.cn/models/qwen/CodeQwen1.5-7B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/CodeQwen1.5-7B-Chat-AWQ](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat-AWQ)|
|qwen2-0_5b|[qwen/Qwen2-0.5B](https://modelscope.cn/models/qwen/Qwen2-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-0.5B](https://huggingface.co/Qwen/Qwen2-0.5B)|
|qwen2-0_5b-instruct|[qwen/Qwen2-0.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct)|
|qwen2-0_5b-instruct-int4|[qwen/Qwen2-0.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4)|
|qwen2-0_5b-instruct-int8|[qwen/Qwen2-0.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8)|
|qwen2-0_5b-instruct-awq|[qwen/Qwen2-0.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-0.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2-0.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct-AWQ)|
|qwen2-1_5b|[qwen/Qwen2-1.5B](https://modelscope.cn/models/qwen/Qwen2-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-1.5B](https://huggingface.co/Qwen/Qwen2-1.5B)|
|qwen2-1_5b-instruct|[qwen/Qwen2-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)|
|qwen2-1_5b-instruct-int4|[qwen/Qwen2-1.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4)|
|qwen2-1_5b-instruct-int8|[qwen/Qwen2-1.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8)|
|qwen2-1_5b-instruct-awq|[qwen/Qwen2-1.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-1.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2-1.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct-AWQ)|
|qwen2-7b|[qwen/Qwen2-7B](https://modelscope.cn/models/qwen/Qwen2-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B)|
|qwen2-7b-instruct|[qwen/Qwen2-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)|
|qwen2-7b-instruct-int4|[qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int4)|
|qwen2-7b-instruct-int8|[qwen/Qwen2-7B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-7B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-7B-Instruct-GPTQ-Int8)|
|qwen2-7b-instruct-awq|[qwen/Qwen2-7B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-7B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-7B-Instruct-AWQ)|
|qwen2-72b|[qwen/Qwen2-72B](https://modelscope.cn/models/qwen/Qwen2-72B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-72B](https://huggingface.co/Qwen/Qwen2-72B)|
|qwen2-72b-instruct|[qwen/Qwen2-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)|
|qwen2-72b-instruct-int4|[qwen/Qwen2-72B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-72B-Instruct-GPTQ-Int4)|
|qwen2-72b-instruct-int8|[qwen/Qwen2-72B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2-72B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-72B-Instruct-GPTQ-Int8)|
|qwen2-72b-instruct-awq|[qwen/Qwen2-72B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-72B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2-72B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-72B-Instruct-AWQ)|
|qwen2-57b-a14b|[qwen/Qwen2-57B-A14B](https://modelscope.cn/models/qwen/Qwen2-57B-A14B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.40|moe|[Qwen/Qwen2-57B-A14B](https://huggingface.co/Qwen/Qwen2-57B-A14B)|
|qwen2-57b-a14b-instruct|[qwen/Qwen2-57B-A14B-Instruct](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|transformers>=4.40|moe|[Qwen/Qwen2-57B-A14B-Instruct](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct)|
|qwen2-57b-a14b-instruct-int4|[qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.40|moe|[Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4)|
|qwen2-math-1_5b|[qwen/Qwen2-Math-1.5B](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B](https://huggingface.co/Qwen/Qwen2-Math-1.5B)|
|qwen2-math-1_5b-instruct|[qwen/Qwen2-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-1.5B-Instruct)|
|qwen2-math-7b|[qwen/Qwen2-Math-7B](https://modelscope.cn/models/qwen/Qwen2-Math-7B/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-7B](https://huggingface.co/Qwen/Qwen2-Math-7B)|
|qwen2-math-7b-instruct|[qwen/Qwen2-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-7B-Instruct)|
|qwen2-math-72b|[qwen/Qwen2-Math-72B](https://modelscope.cn/models/qwen/Qwen2-Math-72B/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-72B](https://huggingface.co/Qwen/Qwen2-Math-72B)|
|qwen2-math-72b-instruct|[qwen/Qwen2-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/Qwen2-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct)|
|qwen2_5-0_5b|[qwen/Qwen2.5-0.5B](https://modelscope.cn/models/qwen/Qwen2.5-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)|
|qwen2_5-1_5b|[qwen/Qwen2.5-1.5B](https://modelscope.cn/models/qwen/Qwen2.5-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)|
|qwen2_5-3b|[qwen/Qwen2.5-3B](https://modelscope.cn/models/qwen/Qwen2.5-3B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B)|
|qwen2_5-7b|[qwen/Qwen2.5-7B](https://modelscope.cn/models/qwen/Qwen2.5-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)|
|qwen2_5-14b|[qwen/Qwen2.5-14B](https://modelscope.cn/models/qwen/Qwen2.5-14B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B)|
|qwen2_5-32b|[qwen/Qwen2.5-32B](https://modelscope.cn/models/qwen/Qwen2.5-32B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B)|
|qwen2_5-72b|[qwen/Qwen2.5-72B](https://modelscope.cn/models/qwen/Qwen2.5-72B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B)|
|qwen2_5-0_5b-instruct|[qwen/Qwen2.5-0.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-0.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)|
|qwen2_5-1_5b-instruct|[qwen/Qwen2.5-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)|
|qwen2_5-3b-instruct|[qwen/Qwen2.5-3B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-3B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)|
|qwen2_5-7b-instruct|[qwen/Qwen2.5-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)|
|qwen2_5-14b-instruct|[qwen/Qwen2.5-14B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct)|
|qwen2_5-32b-instruct|[qwen/Qwen2.5-32B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-32B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)|
|qwen2_5-72b-instruct|[qwen/Qwen2.5-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)|
|qwen2_5-0_5b-instruct-gptq-int4|[qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4)|
|qwen2_5-1_5b-instruct-gptq-int4|[qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4)|
|qwen2_5-3b-instruct-gptq-int4|[qwen/Qwen2.5-3B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-3B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4)|
|qwen2_5-7b-instruct-gptq-int4|[qwen/Qwen2.5-7B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4)|
|qwen2_5-14b-instruct-gptq-int4|[qwen/Qwen2.5-14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4)|
|qwen2_5-32b-instruct-gptq-int4|[qwen/Qwen2.5-32B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-32B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4)|
|qwen2_5-72b-instruct-gptq-int4|[qwen/Qwen2.5-72B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4)|
|qwen2_5-0_5b-instruct-gptq-int8|[qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8)|
|qwen2_5-1_5b-instruct-gptq-int8|[qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8)|
|qwen2_5-3b-instruct-gptq-int8|[qwen/Qwen2.5-3B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-3B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8)|
|qwen2_5-7b-instruct-gptq-int8|[qwen/Qwen2.5-7B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8)|
|qwen2_5-14b-instruct-gptq-int8|[qwen/Qwen2.5-14B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-14B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8)|
|qwen2_5-32b-instruct-gptq-int8|[qwen/Qwen2.5-32B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-32B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8)|
|qwen2_5-72b-instruct-gptq-int8|[qwen/Qwen2.5-72B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8)|
|qwen2_5-0_5b-instruct-awq|[qwen/Qwen2.5-0.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-0.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-0.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-AWQ)|
|qwen2_5-1_5b-instruct-awq|[qwen/Qwen2.5-1.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-1.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-1.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-AWQ)|
|qwen2_5-3b-instruct-awq|[qwen/Qwen2.5-3B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-3B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-3B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-AWQ)|
|qwen2_5-7b-instruct-awq|[qwen/Qwen2.5-7B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-7B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-AWQ)|
|qwen2_5-14b-instruct-awq|[qwen/Qwen2.5-14B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-14B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-14B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-AWQ)|
|qwen2_5-32b-instruct-awq|[qwen/Qwen2.5-32B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-32B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-32B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-AWQ)|
|qwen2_5-72b-instruct-awq|[qwen/Qwen2.5-72B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-72B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-72B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-AWQ)|
|qwen2_5-math-1_5b|[qwen/Qwen2.5-Math-1.5B](https://modelscope.cn/models/qwen/Qwen2.5-Math-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)|
|qwen2_5-math-7b|[qwen/Qwen2.5-Math-7B](https://modelscope.cn/models/qwen/Qwen2.5-Math-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)|
|qwen2_5-math-72b|[qwen/Qwen2.5-Math-72B](https://modelscope.cn/models/qwen/Qwen2.5-Math-72B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-72B](https://huggingface.co/Qwen/Qwen2.5-Math-72B)|
|qwen2_5-math-1_5b-instruct|[qwen/Qwen2.5-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)|
|qwen2_5-math-7b-instruct|[qwen/Qwen2.5-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)|
|qwen2_5-math-72b-instruct|[qwen/Qwen2.5-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-72B-Instruct)|
|qwen2_5-coder-0_5b|[qwen/Qwen2.5-Coder-0.5B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B)|
|qwen2_5-coder-0_5b-instruct|[qwen/Qwen2.5-Coder-0.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-0.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)|
|qwen2_5-coder-0_5b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-0_5b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-0_5b-instruct-awq|[qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ)|
|qwen2_5-coder-1_5b|[qwen/Qwen2.5-Coder-1.5B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B)|
|qwen2_5-coder-1_5b-instruct|[qwen/Qwen2.5-Coder-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)|
|qwen2_5-coder-1_5b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-1_5b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-1_5b-instruct-awq|[qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ)|
|qwen2_5-coder-3b|[qwen/Qwen2.5-Coder-3B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-3B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-3B](https://huggingface.co/Qwen/Qwen2.5-Coder-3B)|
|qwen2_5-coder-3b-instruct|[qwen/Qwen2.5-Coder-3B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-3B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)|
|qwen2_5-coder-3b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-3b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-3b-instruct-awq|[qwen/Qwen2.5-Coder-3B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-3B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-3B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct-AWQ)|
|qwen2_5-coder-7b|[qwen/Qwen2.5-Coder-7B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B)|
|qwen2_5-coder-7b-instruct|[qwen/Qwen2.5-Coder-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)|
|qwen2_5-coder-7b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-7b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-7b-instruct-awq|[qwen/Qwen2.5-Coder-7B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-AWQ)|
|qwen2_5-coder-14b|[qwen/Qwen2.5-Coder-14B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-14B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-14B](https://huggingface.co/Qwen/Qwen2.5-Coder-14B)|
|qwen2_5-coder-14b-instruct|[qwen/Qwen2.5-Coder-14B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct)|
|qwen2_5-coder-14b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-14b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-14b-instruct-awq|[qwen/Qwen2.5-Coder-14B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-14B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-14B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct-AWQ)|
|qwen2_5-coder-32b|[qwen/Qwen2.5-Coder-32B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-32B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-32B](https://huggingface.co/Qwen/Qwen2.5-Coder-32B)|
|qwen2_5-coder-32b-instruct|[qwen/Qwen2.5-Coder-32B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-32B-Instruct/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)|
|qwen2_5-coder-32b-instruct-gptq-int4|[qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4)|
|qwen2_5-coder-32b-instruct-gptq-int8|[qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✘|✘|auto_gptq>=0.5, transformers>=4.37|-|[Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8)|
|qwen2_5-coder-32b-instruct-awq|[qwen/Qwen2.5-Coder-32B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2.5-Coder-32B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|qwen2_5|✔|✔|✔|✘|transformers>=4.37, autoawq|-|[Qwen/Qwen2.5-Coder-32B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-AWQ)|
|qwq-32b-preview|[Qwen/QwQ-32B-Preview](https://modelscope.cn/models/Qwen/QwQ-32B-Preview/summary)|q_proj, k_proj, v_proj|qwq|✔|✔|✔|✔|transformers>=4.37|-|[Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)|
|marco-o1|[AIDC-AI/Marco-o1](https://modelscope.cn/models/AIDC-AI/Marco-o1/summary)|q_proj, k_proj, v_proj|marco_o1|✔|✔|✔|✘|transformers>=4.37|-|[AIDC-AI/Marco-o1](https://huggingface.co/AIDC-AI/Marco-o1)|
|chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|
|chatglm2-6b-32k|[ZhipuAI/chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary)|query_key_value|chatglm2|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm2-6b-32k](https://huggingface.co/THUDM/chatglm2-6b-32k)|
|chatglm3-6b-base|[ZhipuAI/chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary)|query_key_value|chatglm-generation|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm3-6b-base](https://huggingface.co/THUDM/chatglm3-6b-base)|
|chatglm3-6b|[ZhipuAI/chatglm3-6b](https://modelscope.cn/models/ZhipuAI/chatglm3-6b/summary)|query_key_value|chatglm3|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b)|
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|✘|✔|✘|✘|transformers<4.42|-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|✘|✔|✘|✘|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|✔|✔|✔|✘|transformers>=4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm4|✔|✔|✔|✘|transformers>=4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm4|✔|✔|✔|✘|transformers>=4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|codegeex4-9b-chat|[ZhipuAI/codegeex4-all-9b](https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b/summary)|query_key_value|codegeex4|✔|✔|✔|✘|transformers<4.42|coding|[THUDM/codegeex4-all-9b](https://huggingface.co/THUDM/codegeex4-all-9b)|
|glm-edge-1_5b-chat|[ZhipuAI/glm-edge-1.5b-chat](https://modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat/summary)|q_proj, k_proj, v_proj|chatglm4|✔|✘|✘|✘|transformers>=4.46|-|[THUDM/glm-edge-1.5b-chat](https://huggingface.co/THUDM/glm-edge-1.5b-chat)|
|glm-edge-4b-chat|[ZhipuAI/glm-edge-4b-chat](https://modelscope.cn/models/ZhipuAI/glm-edge-4b-chat/summary)|q_proj, k_proj, v_proj|chatglm4|✔|✘|✘|✘|transformers>=4.46|-|[THUDM/glm-edge-4b-chat](https://huggingface.co/THUDM/glm-edge-4b-chat)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
|llama2-13b|[modelscope/Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)|
|llama2-13b-chat|[modelscope/Llama-2-13b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)|
|llama2-70b|[modelscope/Llama-2-70b-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-ms/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)|
|llama2-70b-chat|[modelscope/Llama-2-70b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-70b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)|
|llama2-7b-aqlm-2bit-1x16|[AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✘|✘|✘|transformers>=4.38, aqlm, torch>=2.2.0|-|[ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf)|
|llama3-8b|[LLM-Research/Meta-Llama-3-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B)|
|llama3-8b-instruct|[LLM-Research/Meta-Llama-3-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘||-|[meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)|
|llama3-8b-instruct-int4|[swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4](https://modelscope.cn/models/swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|auto_gptq|-|[study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4)|
|llama3-8b-instruct-int8|[swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8](https://modelscope.cn/models/swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|auto_gptq|-|[study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8)|
|llama3-8b-instruct-awq|[swift/Meta-Llama-3-8B-Instruct-AWQ](https://modelscope.cn/models/swift/Meta-Llama-3-8B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|autoawq|-|[study-hjt/Meta-Llama-3-8B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-8B-Instruct-AWQ)|
|llama3-70b|[LLM-Research/Meta-Llama-3-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-70B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B)|
|llama3-70b-instruct|[LLM-Research/Meta-Llama-3-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-70B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘||-|[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)|
|llama3-70b-instruct-int4|[swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://modelscope.cn/models/swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|auto_gptq|-|[study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4)|
|llama3-70b-instruct-int8|[swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8](https://modelscope.cn/models/swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|auto_gptq|-|[study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8)|
|llama3-70b-instruct-awq|[swift/Meta-Llama-3-70B-Instruct-AWQ](https://modelscope.cn/models/swift/Meta-Llama-3-70B-Instruct-AWQ/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|autoawq|-|[study-hjt/Meta-Llama-3-70B-Instruct-AWQ](https://huggingface.co/study-hjt/Meta-Llama-3-70B-Instruct-AWQ)|
|llama3_1-8b|[LLM-Research/Meta-Llama-3.1-8B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)|
|llama3_1-8b-instruct|[LLM-Research/Meta-Llama-3.1-8B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)|
|llama3_1-8b-instruct-awq|[LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, autoawq|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4)|
|llama3_1-8b-instruct-gptq-int4|[LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, auto_gptq|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4)|
|llama3_1-8b-instruct-bnb|[LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, bitsandbytes|-|[hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4)|
|llama3_1-70b|[LLM-Research/Meta-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B)|
|llama3_1-70b-instruct|[LLM-Research/Meta-Llama-3.1-70B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)|
|llama3_1-70b-instruct-fp8|[LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-70B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct-FP8)|
|llama3_1-70b-instruct-awq|[LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43, autoawq|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4)|
|llama3_1-70b-instruct-gptq-int4|[LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, auto_gptq|-|[hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4)|
|llama3_1-70b-instruct-bnb|[LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, bitsandbytes|-|[unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit)|
|llama3_1-405b|[LLM-Research/Meta-Llama-3.1-405B](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B)|
|llama3_1-405b-instruct|[LLM-Research/Meta-Llama-3.1-405B-Instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct)|
|llama3_1-405b-instruct-fp8|[LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43|-|[meta-llama/Meta-Llama-3.1-405B-Instruct-FP8](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8)|
|llama3_1-405b-instruct-awq|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43, autoawq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|
|llama3_1-405b-instruct-gptq-int4|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, auto_gptq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|
|llama3_1-405b-instruct-bnb|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✘|✘|transformers>=4.43, bitsandbytes|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|
|llama-3.1-nemotron-70B-instruct-hf|[AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF](https://modelscope.cn/models/AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘|transformers>=4.43|-|[nvidia/Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)|
|llama3_2-1b|[LLM-Research/Llama-3.2-1B](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.45|-|[meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)|
|llama3_2-1b-instruct|[LLM-Research/Llama-3.2-1B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|✔|✔|✔|✘|transformers>=4.45|-|[meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|
|llama3_2-3b|[LLM-Research/Llama-3.2-3B](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.45|-|[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)|
|llama3_2-3b-instruct|[LLM-Research/Llama-3.2-3B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|✔|✔|✔|✘|transformers>=4.45|-|[meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|
|reflection-llama_3_1-70b|[LLM-Research/Reflection-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Reflection-Llama-3.1-70B/summary)|q_proj, k_proj, v_proj|reflection|✔|✔|✘|✘|transformers>=4.43|-|[mattshumer/Reflection-Llama-3.1-70B](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B)|
|longwriter-glm4-9b|[ZhipuAI/LongWriter-glm4-9b](https://modelscope.cn/models/ZhipuAI/LongWriter-glm4-9b/summary)|query_key_value|chatglm4|✔|✔|✔|✘|transformers>=4.42|-|[THUDM/LongWriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b)|
|longwriter-llama3_1-8b|[ZhipuAI/LongWriter-llama3.1-8b](https://modelscope.cn/models/ZhipuAI/LongWriter-llama3.1-8b/summary)|q_proj, k_proj, v_proj|longwriter-llama3|✔|✔|✔|✘|transformers>=4.43|-|[THUDM/LongWriter-llama3.1-8b](https://huggingface.co/THUDM/LongWriter-llama3.1-8b)|
|chinese-llama-2-1_3b|[AI-ModelScope/chinese-llama-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-1.3b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-1.3b](https://huggingface.co/hfl/chinese-llama-2-1.3b)|
|chinese-llama-2-7b|[AI-ModelScope/chinese-llama-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-7b](https://huggingface.co/hfl/chinese-llama-2-7b)|
|chinese-llama-2-7b-16k|[AI-ModelScope/chinese-llama-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-16k/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-7b-16k](https://huggingface.co/hfl/chinese-llama-2-7b-16k)|
|chinese-llama-2-7b-64k|[AI-ModelScope/chinese-llama-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-7b-64k/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-7b-64k](https://huggingface.co/hfl/chinese-llama-2-7b-64k)|
|chinese-llama-2-13b|[AI-ModelScope/chinese-llama-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-13b](https://huggingface.co/hfl/chinese-llama-2-13b)|
|chinese-llama-2-13b-16k|[AI-ModelScope/chinese-llama-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-llama-2-13b-16k/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/chinese-llama-2-13b-16k](https://huggingface.co/hfl/chinese-llama-2-13b-16k)|
|chinese-alpaca-2-1_3b|[AI-ModelScope/chinese-alpaca-2-1.3b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-1.3b/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-1.3b](https://huggingface.co/hfl/chinese-alpaca-2-1.3b)|
|chinese-alpaca-2-7b|[AI-ModelScope/chinese-alpaca-2-7b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-7b](https://huggingface.co/hfl/chinese-alpaca-2-7b)|
|chinese-alpaca-2-7b-16k|[AI-ModelScope/chinese-alpaca-2-7b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-16k/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-7b-16k](https://huggingface.co/hfl/chinese-alpaca-2-7b-16k)|
|chinese-alpaca-2-7b-64k|[AI-ModelScope/chinese-alpaca-2-7b-64k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-7b-64k/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-7b-64k](https://huggingface.co/hfl/chinese-alpaca-2-7b-64k)|
|chinese-alpaca-2-13b|[AI-ModelScope/chinese-alpaca-2-13b](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-13b](https://huggingface.co/hfl/chinese-alpaca-2-13b)|
|chinese-alpaca-2-13b-16k|[AI-ModelScope/chinese-alpaca-2-13b-16k](https://modelscope.cn/models/AI-ModelScope/chinese-alpaca-2-13b-16k/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘||-|[hfl/chinese-alpaca-2-13b-16k](https://huggingface.co/hfl/chinese-alpaca-2-13b-16k)|
|llama-3-chinese-8b|[ChineseAlpacaGroup/llama-3-chinese-8b](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[hfl/llama-3-chinese-8b](https://huggingface.co/hfl/llama-3-chinese-8b)|
|llama-3-chinese-8b-instruct|[ChineseAlpacaGroup/llama-3-chinese-8b-instruct](https://modelscope.cn/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct/summary)|q_proj, k_proj, v_proj|llama3|✔|✔|✔|✘||-|[hfl/llama-3-chinese-8b-instruct](https://huggingface.co/hfl/llama-3-chinese-8b-instruct)|
|atom-7b|[FlagAlpha/Atom-7B](https://modelscope.cn/models/FlagAlpha/Atom-7B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘||-|[FlagAlpha/Atom-7B](https://huggingface.co/FlagAlpha/Atom-7B)|
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|✔|✔|✘|✘||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
|yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
|yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|
|yi-6b-chat-awq|[01ai/Yi-6B-Chat-4bits](https://modelscope.cn/models/01ai/Yi-6B-Chat-4bits/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘|autoawq|-|[01-ai/Yi-6B-Chat-4bits](https://huggingface.co/01-ai/Yi-6B-Chat-4bits)|
|yi-6b-chat-int8|[01ai/Yi-6B-Chat-8bits](https://modelscope.cn/models/01ai/Yi-6B-Chat-8bits/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|auto_gptq|-|[01-ai/Yi-6B-Chat-8bits](https://huggingface.co/01-ai/Yi-6B-Chat-8bits)|
|yi-9b|[01ai/Yi-9B](https://modelscope.cn/models/01ai/Yi-9B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-9B](https://huggingface.co/01-ai/Yi-9B)|
|yi-9b-200k|[01ai/Yi-9B-200K](https://modelscope.cn/models/01ai/Yi-9B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-9B-200K](https://huggingface.co/01-ai/Yi-9B-200K)|
|yi-34b|[01ai/Yi-34B](https://modelscope.cn/models/01ai/Yi-34B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)|
|yi-34b-200k|[01ai/Yi-34B-200K](https://modelscope.cn/models/01ai/Yi-34B-200K/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K)|
|yi-34b-chat|[01ai/Yi-34B-Chat](https://modelscope.cn/models/01ai/Yi-34B-Chat/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat)|
|yi-34b-chat-awq|[01ai/Yi-34B-Chat-4bits](https://modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘|autoawq|-|[01-ai/Yi-34B-Chat-4bits](https://huggingface.co/01-ai/Yi-34B-Chat-4bits)|
|yi-34b-chat-int8|[01ai/Yi-34B-Chat-8bits](https://modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|auto_gptq|-|[01-ai/Yi-34B-Chat-8bits](https://huggingface.co/01-ai/Yi-34B-Chat-8bits)|
|yi-1_5-6b|[01ai/Yi-1.5-6B](https://modelscope.cn/models/01ai/Yi-1.5-6B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-1.5-6B](https://huggingface.co/01-ai/Yi-1.5-6B)|
|yi-1_5-6b-chat|[01ai/Yi-1.5-6B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-6B-Chat/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-1.5-6B-Chat](https://huggingface.co/01-ai/Yi-1.5-6B-Chat)|
|yi-1_5-9b|[01ai/Yi-1.5-9B](https://modelscope.cn/models/01ai/Yi-1.5-9B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-1.5-9B](https://huggingface.co/01-ai/Yi-1.5-9B)|
|yi-1_5-9b-chat|[01ai/Yi-1.5-9B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-9B-Chat/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-1.5-9B-Chat](https://huggingface.co/01-ai/Yi-1.5-9B-Chat)|
|yi-1_5-9b-chat-16k|[01ai/Yi-1.5-9B-Chat-16K](https://modelscope.cn/models/01ai/Yi-1.5-9B-Chat-16K/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-1.5-9B-Chat-16K](https://huggingface.co/01-ai/Yi-1.5-9B-Chat-16K)|
|yi-1_5-34b|[01ai/Yi-1.5-34B](https://modelscope.cn/models/01ai/Yi-1.5-34B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-1.5-34B](https://huggingface.co/01-ai/Yi-1.5-34B)|
|yi-1_5-34b-chat|[01ai/Yi-1.5-34B-Chat](https://modelscope.cn/models/01ai/Yi-1.5-34B-Chat/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-1.5-34B-Chat](https://huggingface.co/01-ai/Yi-1.5-34B-Chat)|
|yi-1_5-34b-chat-16k|[01ai/Yi-1.5-34B-Chat-16K](https://modelscope.cn/models/01ai/Yi-1.5-34B-Chat-16K/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘||-|[01-ai/Yi-1.5-34B-Chat-16K](https://huggingface.co/01-ai/Yi-1.5-34B-Chat-16K)|
|yi-1_5-6b-chat-awq-int4|[AI-ModelScope/Yi-1.5-6B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-6B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘|autoawq|-|[modelscope/Yi-1.5-6B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-6B-Chat-AWQ)|
|yi-1_5-6b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-6B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-6B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|auto_gptq>=0.5|-|[modelscope/Yi-1.5-6B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-6B-Chat-GPTQ)|
|yi-1_5-9b-chat-awq-int4|[AI-ModelScope/Yi-1.5-9B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-9B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘|autoawq|-|[modelscope/Yi-1.5-9B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-9B-Chat-AWQ)|
|yi-1_5-9b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-9B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-9B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|auto_gptq>=0.5|-|[modelscope/Yi-1.5-9B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-9B-Chat-GPTQ)|
|yi-1_5-34b-chat-awq-int4|[AI-ModelScope/Yi-1.5-34B-Chat-AWQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-34B-Chat-AWQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✔|✘|autoawq|-|[modelscope/Yi-1.5-34B-Chat-AWQ](https://huggingface.co/modelscope/Yi-1.5-34B-Chat-AWQ)|
|yi-1_5-34b-chat-gptq-int4|[AI-ModelScope/Yi-1.5-34B-Chat-GPTQ](https://modelscope.cn/models/AI-ModelScope/Yi-1.5-34B-Chat-GPTQ/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|auto_gptq>=0.5|-|[modelscope/Yi-1.5-34B-Chat-GPTQ](https://huggingface.co/modelscope/Yi-1.5-34B-Chat-GPTQ)|
|yi-coder-1_5b|[01ai/Yi-Coder-1.5B](https://modelscope.cn/models/01ai/Yi-Coder-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-Coder-1.5B](https://huggingface.co/01-ai/Yi-Coder-1.5B)|
|yi-coder-1_5b-chat|[01ai/Yi-Coder-1.5B-Chat](https://modelscope.cn/models/01ai/Yi-Coder-1.5B-Chat/summary)|q_proj, k_proj, v_proj|yi-coder|✔|✔|✔|✘||-|[01-ai/Yi-Coder-1.5B-Chat](https://huggingface.co/01-ai/Yi-Coder-1.5B-Chat)|
|yi-coder-9b|[01ai/Yi-Coder-9B](https://modelscope.cn/models/01ai/Yi-Coder-9B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[01-ai/Yi-Coder-9B](https://huggingface.co/01-ai/Yi-Coder-9B)|
|yi-coder-9b-chat|[01ai/Yi-Coder-9B-Chat](https://modelscope.cn/models/01ai/Yi-Coder-9B-Chat/summary)|q_proj, k_proj, v_proj|yi-coder|✔|✔|✔|✘||-|[01-ai/Yi-Coder-9B-Chat](https://huggingface.co/01-ai/Yi-Coder-9B-Chat)|
|internlm-7b|[Shanghai_AI_Laboratory/internlm-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-7b/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✔|✘||-|[internlm/internlm-7b](https://huggingface.co/internlm/internlm-7b)|
|internlm-7b-chat|[Shanghai_AI_Laboratory/internlm-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b/summary)|q_proj, k_proj, v_proj|internlm|✘|✔|✔|✘||-|[internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)|
|internlm-7b-chat-8k|[Shanghai_AI_Laboratory/internlm-chat-7b-8k](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-7b-8k/summary)|q_proj, k_proj, v_proj|internlm|✘|✔|✔|✘||-|-|
|internlm-20b|[Shanghai_AI_Laboratory/internlm-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-20b/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✔|✘||-|[internlm/internlm-20b](https://huggingface.co/internlm/internlm-20b)|
|internlm-20b-chat|[Shanghai_AI_Laboratory/internlm-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-chat-20b/summary)|q_proj, k_proj, v_proj|internlm|✘|✔|✔|✘||-|[internlm/internlm-chat-20b](https://huggingface.co/internlm/internlm-chat-20b)|
|internlm2-1_8b|[Shanghai_AI_Laboratory/internlm2-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-1_8b](https://huggingface.co/internlm/internlm2-1_8b)|
|internlm2-1_8b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-1_8b-sft](https://huggingface.co/internlm/internlm2-chat-1_8b-sft)|
|internlm2-1_8b-chat|[Shanghai_AI_Laboratory/internlm2-chat-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-1_8b](https://huggingface.co/internlm/internlm2-chat-1_8b)|
|internlm2-7b-base|[Shanghai_AI_Laboratory/internlm2-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-7b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-base-7b](https://huggingface.co/internlm/internlm2-base-7b)|
|internlm2-7b|[Shanghai_AI_Laboratory/internlm2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-7b](https://huggingface.co/internlm/internlm2-7b)|
|internlm2-7b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b-sft/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-7b-sft](https://huggingface.co/internlm/internlm2-chat-7b-sft)|
|internlm2-7b-chat|[Shanghai_AI_Laboratory/internlm2-chat-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-7b/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-7b](https://huggingface.co/internlm/internlm2-chat-7b)|
|internlm2-20b-base|[Shanghai_AI_Laboratory/internlm2-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-base-20b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-base-20b](https://huggingface.co/internlm/internlm2-base-20b)|
|internlm2-20b|[Shanghai_AI_Laboratory/internlm2-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-20b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-20b](https://huggingface.co/internlm/internlm2-20b)|
|internlm2-20b-sft-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b-sft](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b-sft/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-20b-sft](https://huggingface.co/internlm/internlm2-chat-20b-sft)|
|internlm2-20b-chat|[Shanghai_AI_Laboratory/internlm2-chat-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-chat-20b/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)|
|internlm2_5-1_8b|[Shanghai_AI_Laboratory/internlm2_5-1_8b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-1_8b](https://huggingface.co/internlm/internlm2_5-1_8b)|
|internlm2_5-1_8b-chat|[Shanghai_AI_Laboratory/internlm2_5-1_8b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-1_8b-chat/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-1_8b-chat](https://huggingface.co/internlm/internlm2_5-1_8b-chat)|
|internlm2_5-7b|[Shanghai_AI_Laboratory/internlm2_5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-7b](https://huggingface.co/internlm/internlm2_5-7b)|
|internlm2_5-7b-chat|[Shanghai_AI_Laboratory/internlm2_5-7b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-7b-chat](https://huggingface.co/internlm/internlm2_5-7b-chat)|
|internlm2_5-7b-chat-1m|[Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-7b-chat-1m](https://huggingface.co/internlm/internlm2_5-7b-chat-1m)|
|internlm2_5-20b|[Shanghai_AI_Laboratory/internlm2_5-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-20b](https://huggingface.co/internlm/internlm2_5-20b)|
|internlm2_5-20b-chat|[Shanghai_AI_Laboratory/internlm2_5-20b-chat](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2_5-20b-chat/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|-|[internlm/internlm2_5-20b-chat](https://huggingface.co/internlm/internlm2_5-20b-chat)|
|internlm2-math-7b|[Shanghai_AI_Laboratory/internlm2-math-base-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-7b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|math|[internlm/internlm2-math-base-7b](https://huggingface.co/internlm/internlm2-math-base-7b)|
|internlm2-math-7b-chat|[Shanghai_AI_Laboratory/internlm2-math-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-7b/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|math|[internlm/internlm2-math-7b](https://huggingface.co/internlm/internlm2-math-7b)|
|internlm2-math-20b|[Shanghai_AI_Laboratory/internlm2-math-base-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-base-20b/summary)|wqkv|default-generation|✔|✔|✔|✘|transformers>=4.38|math|[internlm/internlm2-math-base-20b](https://huggingface.co/internlm/internlm2-math-base-20b)|
|internlm2-math-20b-chat|[Shanghai_AI_Laboratory/internlm2-math-20b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-math-20b/summary)|wqkv|internlm2|✔|✔|✔|✘|transformers>=4.38|math|[internlm/internlm2-math-20b](https://huggingface.co/internlm/internlm2-math-20b)|
|deepseek-7b|[deepseek-ai/deepseek-llm-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[deepseek-ai/deepseek-llm-7b-base](https://huggingface.co/deepseek-ai/deepseek-llm-7b-base)|
|deepseek-7b-chat|[deepseek-ai/deepseek-llm-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-7b-chat/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔|✔|✘||-|[deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)|
|deepseek-moe-16b|[deepseek-ai/deepseek-moe-16b-base](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘||moe|[deepseek-ai/deepseek-moe-16b-base](https://huggingface.co/deepseek-ai/deepseek-moe-16b-base)|
|deepseek-moe-16b-chat|[deepseek-ai/deepseek-moe-16b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-moe-16b-chat/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔|✘|✘||moe|[deepseek-ai/deepseek-moe-16b-chat](https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat)|
|deepseek-67b|[deepseek-ai/deepseek-llm-67b-base](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[deepseek-ai/deepseek-llm-67b-base](https://huggingface.co/deepseek-ai/deepseek-llm-67b-base)|
|deepseek-67b-chat|[deepseek-ai/deepseek-llm-67b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-llm-67b-chat/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔|✔|✘||-|[deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)|
|deepseek-coder-1_3b|[deepseek-ai/deepseek-coder-1.3b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base)|
|deepseek-coder-1_3b-instruct|[deepseek-ai/deepseek-coder-1.3b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-1.3b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-1.3b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-instruct)|
|deepseek-coder-6_7b|[deepseek-ai/deepseek-coder-6.7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-6.7b-base](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base)|
|deepseek-coder-6_7b-instruct|[deepseek-ai/deepseek-coder-6.7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-6.7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)|
|deepseek-coder-33b|[deepseek-ai/deepseek-coder-33b-base](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-33b-base](https://huggingface.co/deepseek-ai/deepseek-coder-33b-base)|
|deepseek-coder-33b-instruct|[deepseek-ai/deepseek-coder-33b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-coder-33b-instruct/summary)|q_proj, k_proj, v_proj|deepseek-coder|✔|✔|✔|✘||coding|[deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)|
|deepseek-coder-v2-instruct|[deepseek-ai/DeepSeek-Coder-V2-Instruct](https://modelscope.cn/models/deepseek-ai/DeepSeek-Coder-V2-Instruct/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|✔|✔|✘|✘|transformers>=4.39.3|coding, moe|[deepseek-ai/DeepSeek-Coder-V2-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct)|
|deepseek-coder-v2-lite-instruct|[deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://modelscope.cn/models/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|✔|✔|✘|✘|transformers>=4.39.3|coding, moe|[deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct)|
|deepseek-coder-v2|[deepseek-ai/DeepSeek-Coder-V2-Base](https://modelscope.cn/models/deepseek-ai/DeepSeek-Coder-V2-Base/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|✔|✔|✘|✘|transformers>=4.39.3|coding, moe|[deepseek-ai/DeepSeek-Coder-V2-Base](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Base)|
|deepseek-coder-v2-lite|[deepseek-ai/DeepSeek-Coder-V2-Lite-Base](https://modelscope.cn/models/deepseek-ai/DeepSeek-Coder-V2-Lite-Base/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|✔|✔|✘|✘|transformers>=4.39.3|coding, moe|[deepseek-ai/DeepSeek-Coder-V2-Lite-Base](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Base)|
|deepseek-math-7b|[deepseek-ai/deepseek-math-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||math|[deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base)|
|deepseek-math-7b-instruct|[deepseek-ai/deepseek-math-7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔|✔|✘||math|[deepseek-ai/deepseek-math-7b-instruct](https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct)|
|deepseek-math-7b-chat|[deepseek-ai/deepseek-math-7b-rl](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-rl/summary)|q_proj, k_proj, v_proj|deepseek|✔|✔|✔|✘||math|[deepseek-ai/deepseek-math-7b-rl](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)|
|numina-math-7b|[AI-ModelScope/NuminaMath-7B-TIR](https://modelscope.cn/models/AI-ModelScope/NuminaMath-7B-TIR/summary)|q_proj, k_proj, v_proj|numina-math|✔|✔|✘|✘||math|[AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR)|
|deepseek-v2|[deepseek-ai/DeepSeek-V2](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|✔|✔|✘|✘|transformers>=4.39.3|moe|[deepseek-ai/DeepSeek-V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)|
|deepseek-v2-chat|[deepseek-ai/DeepSeek-V2-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|✔|✔|✘|✘|transformers>=4.39.3|moe|[deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)|
|deepseek-v2-lite|[deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|✔|✔|✘|✘|transformers>=4.39.3|moe|[deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)|
|deepseek-v2-lite-chat|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|✔|✔|✘|✘|transformers>=4.39.3|moe|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat)|
|deepseek-v2_5|[deepseek-ai/DeepSeek-V2.5](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2.5/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2_5|✔|✔|✘|✘|transformers>=4.39.3|moe|[deepseek-ai/DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5)|
|gemma-2b|[AI-ModelScope/gemma-2b](https://modelscope.cn/models/AI-ModelScope/gemma-2b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.38|-|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|
|gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
|gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|✘|✘|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
|gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|✘|✘|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
|gemma2-2b|[LLM-Research/gemma-2-2b](https://modelscope.cn/models/LLM-Research/gemma-2-2b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)|
|gemma2-9b|[LLM-Research/gemma-2-9b](https://modelscope.cn/models/LLM-Research/gemma-2-9b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)|
|gemma2-27b|[LLM-Research/gemma-2-27b](https://modelscope.cn/models/LLM-Research/gemma-2-27b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|
|gemma2-2b-instruct|[LLM-Research/gemma-2-2b-it](https://modelscope.cn/models/LLM-Research/gemma-2-2b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)|
|gemma2-9b-instruct|[LLM-Research/gemma-2-9b-it](https://modelscope.cn/models/LLM-Research/gemma-2-9b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)|
|gemma2-27b-instruct|[LLM-Research/gemma-2-27b-it](https://modelscope.cn/models/LLM-Research/gemma-2-27b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|✘|✘|transformers>=4.42|-|[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)|
|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|✘|✘|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|✘|✘||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|✘|✘||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
|minicpm-2b-128k|[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k/summary)|q_proj, k_proj, v_proj|chatml|✔|✔|✘|✘|transformers>=4.36.0|-|[openbmb/MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)|
|minicpm-moe-8x2b|[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|✘|✘|transformers>=4.36.0|moe|[openbmb/MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B)|
|minicpm3-4b|[OpenBMB/MiniCPM3-4B](https://modelscope.cn/models/OpenBMB/MiniCPM3-4B/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj|chatml|✔|✘|✘|✘|transformers>=4.36|-|[openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)|
|openbuddy-llama-65b-chat|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://huggingface.co/OpenBuddy/openbuddy-llama-65b-v8-bf16)|
|openbuddy-llama2-13b-chat|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://huggingface.co/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16)|
|openbuddy-llama2-70b-chat|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://huggingface.co/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16)|
|openbuddy-llama3-8b-chat|[OpenBuddy/openbuddy-llama3-8b-v21.1-8k](https://modelscope.cn/models/OpenBuddy/openbuddy-llama3-8b-v21.1-8k/summary)|q_proj, k_proj, v_proj|openbuddy2|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-llama3-8b-v21.1-8k](https://huggingface.co/OpenBuddy/openbuddy-llama3-8b-v21.1-8k)|
|openbuddy-llama3-70b-chat|[OpenBuddy/openbuddy-llama3-70b-v21.1-8k](https://modelscope.cn/models/OpenBuddy/openbuddy-llama3-70b-v21.1-8k/summary)|q_proj, k_proj, v_proj|openbuddy2|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-llama3-70b-v21.1-8k](https://huggingface.co/OpenBuddy/openbuddy-llama3-70b-v21.1-8k)|
|openbuddy-mistral-7b-chat|[OpenBuddy/openbuddy-mistral-7b-v17.1-32k](https://modelscope.cn/models/OpenBuddy/openbuddy-mistral-7b-v17.1-32k/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘|transformers>=4.34|-|[OpenBuddy/openbuddy-mistral-7b-v17.1-32k](https://huggingface.co/OpenBuddy/openbuddy-mistral-7b-v17.1-32k)|
|openbuddy-zephyr-7b-chat|[OpenBuddy/openbuddy-zephyr-7b-v14.1](https://modelscope.cn/models/OpenBuddy/openbuddy-zephyr-7b-v14.1/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘|transformers>=4.34|-|[OpenBuddy/openbuddy-zephyr-7b-v14.1](https://huggingface.co/OpenBuddy/openbuddy-zephyr-7b-v14.1)|
|openbuddy-deepseek-67b-chat|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://modelscope.cn/models/OpenBuddy/openbuddy-deepseek-67b-v15.2/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✔|✘||-|[OpenBuddy/openbuddy-deepseek-67b-v15.2](https://huggingface.co/OpenBuddy/openbuddy-deepseek-67b-v15.2)|
|openbuddy-mixtral-moe-7b-chat|[OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k](https://modelscope.cn/models/OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k/summary)|q_proj, k_proj, v_proj|openbuddy|✔|✔|✘|✘|transformers>=4.36|moe|[OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k](https://huggingface.co/OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k)|
|openbuddy-llama3_1-8b-chat|[OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k](https://modelscope.cn/models/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k/summary)|q_proj, k_proj, v_proj|openbuddy2|✔|✔|✔|✘|transformers>=4.43|-|[OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k](https://huggingface.co/OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k)|
|mistral-7b|[AI-ModelScope/Mistral-7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.34|-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)|
|mistral-7b-v2|[AI-ModelScope/Mistral-7B-v0.2-hf](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.2-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘|transformers>=4.34|-|[alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf)|
|mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)|
|mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)|
|mistral-7b-instruct-v3|[LLM-Research/Mistral-7B-Instruct-v0.3](https://modelscope.cn/models/LLM-Research/Mistral-7B-Instruct-v0.3/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✔|✘|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)|
|mistral-nemo-base-2407|[AI-ModelScope/Mistral-Nemo-Base-2407](https://modelscope.cn/models/AI-ModelScope/Mistral-Nemo-Base-2407/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.43|-|[mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407)|
|mistral-nemo-instruct-2407|[AI-ModelScope/Mistral-Nemo-Instruct-2407](https://modelscope.cn/models/AI-ModelScope/Mistral-Nemo-Instruct-2407/summary)|q_proj, k_proj, v_proj|mistral-nemo|✔|✔|✘|✘|transformers>=4.43|-|[mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)|
|mistral-large-instruct-2407|[LLM-Research/Mistral-Large-Instruct-2407](https://modelscope.cn/models/LLM-Research/Mistral-Large-Instruct-2407/summary)|q_proj, k_proj, v_proj|mistral-nemo|✔|✔|✘|✘|transformers>=4.43|-|[mistralai/Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)|
|mistral-small-instruct-2409|[AI-ModelScope/Mistral-Small-Instruct-2409](https://modelscope.cn/models/AI-ModelScope/Mistral-Small-Instruct-2409/summary)|q_proj, k_proj, v_proj|mistral-nemo|✔|✔|✘|✘|transformers>=4.43|-|[mistralai/Mistral-Small-Instruct-2409](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)|
|mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.36|moe|[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)|
|mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|✘|✘|transformers>=4.36|moe|[mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)|
|mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✘|✘|✘|transformers>=4.38, aqlm, torch>=2.2.0|moe|[ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf)|
|mixtral-moe-8x22b-v1|[AI-ModelScope/Mixtral-8x22B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x22B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.36|moe|[mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)|
|ministral-8b-instruct-2410|[AI-ModelScope/Ministral-8B-Instruct-2410](https://modelscope.cn/models/AI-ModelScope/Ministral-8B-Instruct-2410/summary)|q_proj, k_proj, v_proj|mistral-nemo|✔|✔|✘|✘|transformers>=4.46|-|[mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)|
|wizardlm2-7b-awq|[AI-ModelScope/WizardLM-2-7B-AWQ](https://modelscope.cn/models/AI-ModelScope/WizardLM-2-7B-AWQ/summary)|q_proj, k_proj, v_proj|wizardlm2-awq|✔|✔|✘|✘|transformers>=4.34|-|[MaziyarPanahi/WizardLM-2-7B-AWQ](https://huggingface.co/MaziyarPanahi/WizardLM-2-7B-AWQ)|
|wizardlm2-8x22b|[AI-ModelScope/WizardLM-2-8x22B](https://modelscope.cn/models/AI-ModelScope/WizardLM-2-8x22B/summary)|q_proj, k_proj, v_proj|wizardlm2|✔|✔|✘|✘|transformers>=4.36|-|[alpindale/WizardLM-2-8x22B](https://huggingface.co/alpindale/WizardLM-2-8x22B)|
|baichuan-7b|[baichuan-inc/baichuan-7B](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary)|W_pack|default-generation|✘|✔|✔|✘|transformers<4.34|-|[baichuan-inc/Baichuan-7B](https://huggingface.co/baichuan-inc/Baichuan-7B)|
|baichuan-13b|[baichuan-inc/Baichuan-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary)|W_pack|default-generation|✘|✔|✔|✘|transformers<4.34|-|[baichuan-inc/Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base)|
|baichuan-13b-chat|[baichuan-inc/Baichuan-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary)|W_pack|baichuan|✘|✔|✔|✘|transformers<4.34|-|[baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)|
|baichuan2-7b|[baichuan-inc/Baichuan2-7B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/summary)|W_pack|default-generation|✘|✔|✔|✘||-|[baichuan-inc/Baichuan2-7B-Base](https://huggingface.co/baichuan-inc/Baichuan2-7B-Base)|
|baichuan2-7b-chat|[baichuan-inc/Baichuan2-7B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat/summary)|W_pack|baichuan|✘|✔|✔|✘||-|[baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)|
|baichuan2-7b-chat-int4|[baichuan-inc/Baichuan2-7B-Chat-4bits](https://modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat-4bits/summary)|W_pack|baichuan|✘|✘|✘|✘|bitsandbytes<0.41.2, accelerate<0.26|-|[baichuan-inc/Baichuan2-7B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits)|
|baichuan2-13b|[baichuan-inc/Baichuan2-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Base/summary)|W_pack|default-generation|✘|✔|✔|✘||-|[baichuan-inc/Baichuan2-13B-Base](https://huggingface.co/baichuan-inc/Baichuan2-13B-Base)|
|baichuan2-13b-chat|[baichuan-inc/Baichuan2-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat/summary)|W_pack|baichuan|✘|✔|✔|✘||-|[baichuan-inc/Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat)|
|baichuan2-13b-chat-int4|[baichuan-inc/Baichuan2-13B-Chat-4bits](https://modelscope.cn/models/baichuan-inc/Baichuan2-13B-Chat-4bits/summary)|W_pack|baichuan|✘|✘|✘|✘|bitsandbytes<0.41.2, accelerate<0.26|-|[baichuan-inc/Baichuan2-13B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits)|
|yuan2-2b-instruct|[YuanLLM/Yuan2.0-2B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-2B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘|✘|✘||-|[IEITYuan/Yuan2-2B-hf](https://huggingface.co/IEITYuan/Yuan2-2B-hf)|
|yuan2-2b-janus-instruct|[YuanLLM/Yuan2-2B-Janus-hf](https://modelscope.cn/models/YuanLLM/Yuan2-2B-Janus-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘|✘|✘||-|[IEITYuan/Yuan2-2B-Janus-hf](https://huggingface.co/IEITYuan/Yuan2-2B-Janus-hf)|
|yuan2-51b-instruct|[YuanLLM/Yuan2.0-51B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-51B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘|✘|✘||-|[IEITYuan/Yuan2-51B-hf](https://huggingface.co/IEITYuan/Yuan2-51B-hf)|
|yuan2-102b-instruct|[YuanLLM/Yuan2.0-102B-hf](https://modelscope.cn/models/YuanLLM/Yuan2.0-102B-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘|✘|✘||-|[IEITYuan/Yuan2-102B-hf](https://huggingface.co/IEITYuan/Yuan2-102B-hf)|
|yuan2-m32|[YuanLLM/Yuan2-M32-hf](https://modelscope.cn/models/YuanLLM/Yuan2-M32-hf/summary)|q_proj, k_proj, v_proj|yuan|✔|✘|✘|✘||moe|[IEITYuan/Yuan2-M32-hf](https://huggingface.co/IEITYuan/Yuan2-M32-hf)|
|xverse-7b|[xverse/XVERSE-7B](https://modelscope.cn/models/xverse/XVERSE-7B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✘|✘||-|[xverse/XVERSE-7B](https://huggingface.co/xverse/XVERSE-7B)|
|xverse-7b-chat|[xverse/XVERSE-7B-Chat](https://modelscope.cn/models/xverse/XVERSE-7B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✔|✘|✘||-|[xverse/XVERSE-7B-Chat](https://huggingface.co/xverse/XVERSE-7B-Chat)|
|xverse-13b|[xverse/XVERSE-13B](https://modelscope.cn/models/xverse/XVERSE-13B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✘|✘||-|[xverse/XVERSE-13B](https://huggingface.co/xverse/XVERSE-13B)|
|xverse-13b-chat|[xverse/XVERSE-13B-Chat](https://modelscope.cn/models/xverse/XVERSE-13B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✔|✘|✘||-|[xverse/XVERSE-13B-Chat](https://huggingface.co/xverse/XVERSE-13B-Chat)|
|xverse-65b|[xverse/XVERSE-65B](https://modelscope.cn/models/xverse/XVERSE-65B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✘|✘||-|[xverse/XVERSE-65B](https://huggingface.co/xverse/XVERSE-65B)|
|xverse-65b-v2|[xverse/XVERSE-65B-2](https://modelscope.cn/models/xverse/XVERSE-65B-2/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✘|✘||-|[xverse/XVERSE-65B-2](https://huggingface.co/xverse/XVERSE-65B-2)|
|xverse-65b-chat|[xverse/XVERSE-65B-Chat](https://modelscope.cn/models/xverse/XVERSE-65B-Chat/summary)|q_proj, k_proj, v_proj|xverse|✘|✔|✘|✘||-|[xverse/XVERSE-65B-Chat](https://huggingface.co/xverse/XVERSE-65B-Chat)|
|xverse-13b-256k|[xverse/XVERSE-13B-256K](https://modelscope.cn/models/xverse/XVERSE-13B-256K/summary)|q_proj, k_proj, v_proj|default-generation|✘|✔|✘|✘||-|[xverse/XVERSE-13B-256K](https://huggingface.co/xverse/XVERSE-13B-256K)|
|xverse-moe-a4_2b|[xverse/XVERSE-MoE-A4.2B](https://modelscope.cn/models/xverse/XVERSE-MoE-A4.2B/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘|✘|✘||moe|[xverse/XVERSE-MoE-A4.2B](https://huggingface.co/xverse/XVERSE-MoE-A4.2B)|
|orion-14b|[OrionStarAI/Orion-14B-Base](https://modelscope.cn/models/OrionStarAI/Orion-14B-Base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✘|✘|✘||-|[OrionStarAI/Orion-14B-Base](https://huggingface.co/OrionStarAI/Orion-14B-Base)|
|orion-14b-chat|[OrionStarAI/Orion-14B-Chat](https://modelscope.cn/models/OrionStarAI/Orion-14B-Chat/summary)|q_proj, k_proj, v_proj|orion|✔|✘|✘|✘||-|[OrionStarAI/Orion-14B-Chat](https://huggingface.co/OrionStarAI/Orion-14B-Chat)|
|bluelm-7b|[vivo-ai/BlueLM-7B-Base](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘|✘|✘||-|[vivo-ai/BlueLM-7B-Base](https://huggingface.co/vivo-ai/BlueLM-7B-Base)|
|bluelm-7b-32k|[vivo-ai/BlueLM-7B-Base-32K](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Base-32K/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘|✘|✘||-|[vivo-ai/BlueLM-7B-Base-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Base-32K)|
|bluelm-7b-chat|[vivo-ai/BlueLM-7B-Chat](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat/summary)|q_proj, k_proj, v_proj|bluelm|✘|✘|✘|✘||-|[vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat)|
|bluelm-7b-chat-32k|[vivo-ai/BlueLM-7B-Chat-32K](https://modelscope.cn/models/vivo-ai/BlueLM-7B-Chat-32K/summary)|q_proj, k_proj, v_proj|bluelm|✘|✘|✘|✘||-|[vivo-ai/BlueLM-7B-Chat-32K](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-32K)|
|ziya2-13b|[Fengshenbang/Ziya2-13B-Base](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✔|✘||-|[IDEA-CCNL/Ziya2-13B-Base](https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base)|
|ziya2-13b-chat|[Fengshenbang/Ziya2-13B-Chat](https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Chat/summary)|q_proj, k_proj, v_proj|ziya|✔|✔|✔|✘||-|[IDEA-CCNL/Ziya2-13B-Chat](https://huggingface.co/IDEA-CCNL/Ziya2-13B-Chat)|
|skywork-13b|[skywork/Skywork-13B-base](https://modelscope.cn/models/skywork/Skywork-13B-base/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘|✘|✘||-|[Skywork/Skywork-13B-base](https://huggingface.co/Skywork/Skywork-13B-base)|
|skywork-13b-chat|[skywork/Skywork-13B-chat](https://modelscope.cn/models/skywork/Skywork-13B-chat/summary)|q_proj, k_proj, v_proj|skywork|✘|✘|✘|✘||-|-|
|zephyr-7b-beta-chat|[modelscope/zephyr-7b-beta](https://modelscope.cn/models/modelscope/zephyr-7b-beta/summary)|q_proj, k_proj, v_proj|zephyr|✔|✔|✔|✘|transformers>=4.34|-|[HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)|
|polylm-13b|[damo/nlp_polylm_13b_text_generation](https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation/summary)|c_attn|default-generation|✘|✘|✘|✘||-|[DAMO-NLP-MT/polylm-13b](https://huggingface.co/DAMO-NLP-MT/polylm-13b)|
|seqgpt-560m|[damo/nlp_seqgpt-560m](https://modelscope.cn/models/damo/nlp_seqgpt-560m/summary)|query_key_value|default-generation|✘|✔|✘|✘||-|[DAMO-NLP/SeqGPT-560M](https://huggingface.co/DAMO-NLP/SeqGPT-560M)|
|sus-34b-chat|[SUSTC/SUS-Chat-34B](https://modelscope.cn/models/SUSTC/SUS-Chat-34B/summary)|q_proj, k_proj, v_proj|sus|✔|✔|✔|✘||-|[SUSTech/SUS-Chat-34B](https://huggingface.co/SUSTech/SUS-Chat-34B)|
|tongyi-finance-14b|[TongyiFinance/Tongyi-Finance-14B](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B/summary)|c_attn|default-generation|✔|✔|✔|✘||financial|-|
|tongyi-finance-14b-chat|[TongyiFinance/Tongyi-Finance-14B-Chat](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B-Chat/summary)|c_attn|qwen|✔|✔|✔|✘||financial|[jxy/Tongyi-Finance-14B-Chat](https://huggingface.co/jxy/Tongyi-Finance-14B-Chat)|
|tongyi-finance-14b-chat-int4|[TongyiFinance/Tongyi-Finance-14B-Chat-Int4](https://modelscope.cn/models/TongyiFinance/Tongyi-Finance-14B-Chat-Int4/summary)|c_attn|qwen|✔|✔|✘|✘|auto_gptq>=0.5|financial|[jxy/Tongyi-Finance-14B-Chat-Int4](https://huggingface.co/jxy/Tongyi-Finance-14B-Chat-Int4)|
|codefuse-codellama-34b-chat|[codefuse-ai/CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary)|q_proj, k_proj, v_proj|codefuse-codellama|✔|✔|✔|✘||coding|[codefuse-ai/CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B)|
|codefuse-codegeex2-6b-chat|[codefuse-ai/CodeFuse-CodeGeeX2-6B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeeX2-6B/summary)|query_key_value|codefuse|✘|✔|✘|✘|transformers<4.34|coding|[codefuse-ai/CodeFuse-CodeGeeX2-6B](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeeX2-6B)|
|codefuse-qwen-14b-chat|[codefuse-ai/CodeFuse-QWen-14B](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B/summary)|c_attn|codefuse|✔|✔|✔|✘||coding|[codefuse-ai/CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B)|
|phi2-3b|[AI-ModelScope/phi-2](https://modelscope.cn/models/AI-ModelScope/phi-2/summary)|Wqkv|default-generation|✔|✔|✘|✘||coding|[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)|
|phi3-4b-4k-instruct|[LLM-Research/Phi-3-mini-4k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-4k-instruct/summary)|qkv_proj|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)|
|phi3-4b-128k-instruct|[LLM-Research/Phi-3-mini-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-128k-instruct/summary)|qkv_proj|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)|
|phi3-small-8k-instruct|[LLM-Research/Phi-3-small-8k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-small-8k-instruct/summary)|query_key_value|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-small-8k-instruct](https://huggingface.co/microsoft/Phi-3-small-8k-instruct)|
|phi3-medium-4k-instruct|[LLM-Research/Phi-3-medium-4k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-medium-4k-instruct/summary)|qkv_proj|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct)|
|phi3-small-128k-instruct|[LLM-Research/Phi-3-small-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-small-128k-instruct/summary)|query_key_value|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)|
|phi3-medium-128k-instruct|[LLM-Research/Phi-3-medium-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-medium-128k-instruct/summary)|qkv_proj|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)|
|phi3_5-mini-instruct|[LLM-Research/Phi-3.5-mini-instruct](https://modelscope.cn/models/LLM-Research/Phi-3.5-mini-instruct/summary)|qkv_proj|phi3|✔|✔|✘|✘|transformers>=4.36|-|[microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)|
|phi3_5-moe-instruct|[LLM-Research/Phi-3.5-MoE-instruct](https://modelscope.cn/models/LLM-Research/Phi-3.5-MoE-instruct/summary)|q_proj, k_proj, v_proj|phi3|✔|✔|✘|✘|transformers>=4.36|moe|[microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)|
|mamba-130m|[AI-ModelScope/mamba-130m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-130m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-130m-hf](https://huggingface.co/state-spaces/mamba-130m-hf)|
|mamba-370m|[AI-ModelScope/mamba-370m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-370m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-370m-hf](https://huggingface.co/state-spaces/mamba-370m-hf)|
|mamba-390m|[AI-ModelScope/mamba-390m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-390m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-390m-hf](https://huggingface.co/state-spaces/mamba-390m-hf)|
|mamba-790m|[AI-ModelScope/mamba-790m-hf](https://modelscope.cn/models/AI-ModelScope/mamba-790m-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-790m-hf](https://huggingface.co/state-spaces/mamba-790m-hf)|
|mamba-1.4b|[AI-ModelScope/mamba-1.4b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-1.4b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-1.4b-hf](https://huggingface.co/state-spaces/mamba-1.4b-hf)|
|mamba-2.8b|[AI-ModelScope/mamba-2.8b-hf](https://modelscope.cn/models/AI-ModelScope/mamba-2.8b-hf/summary)|in_proj, x_proj, embeddings, out_proj|default-generation|✘|✘|✘|✘|transformers>=4.39.0|-|[state-spaces/mamba-2.8b-hf](https://huggingface.co/state-spaces/mamba-2.8b-hf)|
|telechat-7b|[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B/summary)|key_value, query|telechat|✔|✘|✘|✘||-|[Tele-AI/telechat-7B](https://huggingface.co/Tele-AI/telechat-7B)|
|telechat-12b|[TeleAI/TeleChat-12B](https://modelscope.cn/models/TeleAI/TeleChat-12B/summary)|key_value, query|telechat|✔|✘|✘|✘||-|[Tele-AI/TeleChat-12B](https://huggingface.co/Tele-AI/TeleChat-12B)|
|telechat-12b-v2|[TeleAI/TeleChat-12B-v2](https://modelscope.cn/models/TeleAI/TeleChat-12B-v2/summary)|key_value, query|telechat|✔|✘|✘|✘||-|[Tele-AI/TeleChat-12B-v2](https://huggingface.co/Tele-AI/TeleChat-12B-v2)|
|telechat-12b-v2-gptq-int4|[swift/TeleChat-12B-V2-GPTQ-Int4](https://modelscope.cn/models/swift/TeleChat-12B-V2-GPTQ-Int4/summary)|key_value, query|telechat|✔|✘|✘|✘|auto_gptq>=0.5|-|-|
|telechat2-115b|[TeleAI/TeleChat2-115B](https://modelscope.cn/models/TeleAI/TeleChat2-115B/summary)|key_value, query|telechat2|✔|✘|✘|✘||-|[Tele-AI/TeleChat2-115B](https://huggingface.co/Tele-AI/TeleChat2-115B)|
|grok-1|[colossalai/grok-1-pytorch](https://modelscope.cn/models/colossalai/grok-1-pytorch/summary)|q_proj, k_proj, v_proj|default-generation|✘|✘|✘|✘||-|[hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1)|
|dbrx-instruct|[AI-ModelScope/dbrx-instruct](https://modelscope.cn/models/AI-ModelScope/dbrx-instruct/summary)|attn.Wqkv|dbrx|✔|✔|✘|✘|transformers>=4.36|moe|[databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct)|
|dbrx-base|[AI-ModelScope/dbrx-base](https://modelscope.cn/models/AI-ModelScope/dbrx-base/summary)|attn.Wqkv|dbrx|✔|✔|✘|✘|transformers>=4.36|moe|[databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base)|
|mengzi3-13b-base|[langboat/Mengzi3-13B-Base](https://modelscope.cn/models/langboat/Mengzi3-13B-Base/summary)|q_proj, k_proj, v_proj|mengzi|✔|✔|✘|✘||-|[Langboat/Mengzi3-13B-Base](https://huggingface.co/Langboat/Mengzi3-13B-Base)|
|c4ai-command-r-v01|[AI-ModelScope/c4ai-command-r-v01](https://modelscope.cn/models/AI-ModelScope/c4ai-command-r-v01/summary)|q_proj, k_proj, v_proj|c4ai|✔|✔|✘|✘|transformers>=4.39.1|-|[CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01)|
|c4ai-command-r-plus|[AI-ModelScope/c4ai-command-r-plus](https://modelscope.cn/models/AI-ModelScope/c4ai-command-r-plus/summary)|q_proj, k_proj, v_proj|c4ai|✔|✔|✘|✘|transformers>4.39|-|[CohereForAI/c4ai-command-r-plus](https://huggingface.co/CohereForAI/c4ai-command-r-plus)|
|aya-expanse-8b|[AI-ModelScope/aya-expanse-8b](https://modelscope.cn/models/AI-ModelScope/aya-expanse-8b/summary)|q_proj, k_proj, v_proj|aya|✔|✔|✘|✘|transformers>=4.44.0|-|[CohereForAI/aya-expanse-8b](https://huggingface.co/CohereForAI/aya-expanse-8b)|
|aya-expanse-32b|[AI-ModelScope/aya-expanse-32b](https://modelscope.cn/models/AI-ModelScope/aya-expanse-32b/summary)|q_proj, k_proj, v_proj|aya|✔|✔|✘|✘|transformers>=4.44.0|-|[CohereForAI/aya-expanse-32b](https://huggingface.co/CohereForAI/aya-expanse-32b)|
|codestral-22b|[swift/Codestral-22B-v0.1](https://modelscope.cn/models/swift/Codestral-22B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|✘|✘|transformers>=4.34|-|[mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)|
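
The Default Lora Target Modules column uses two forms. The language models above use comma-separated module names such as `q_proj, k_proj, v_proj`. The multimodal models in the next table use a regular expression, with `*` and `|` escaped for Markdown. The sketch below shows how either form could select modules by name. It assumes PEFT-style matching: a string is applied to the full module name with `re.fullmatch`, and a list matches a name that equals an entry or ends with `.` plus the entry. The helpers `matches_target` and `parse_target_column` and the module names are hypothetical. The sketch covers name matching only, not which layer types actually receive adapters.

```python
import re
from typing import List, Union


def matches_target(module_name: str, target_modules: Union[str, List[str]]) -> bool:
    """Return True if module_name is selected by target_modules.

    Assumed PEFT-style semantics (illustrative, not swift's own code):
    - str: treated as a regex that must match the whole name (re.fullmatch);
    - list: the name equals an entry or ends with "." + entry.
    """
    if isinstance(target_modules, str):
        return re.fullmatch(target_modules, module_name) is not None
    return any(
        module_name == key or module_name.endswith("." + key)
        for key in target_modules
    )


def parse_target_column(cell: str) -> List[str]:
    """Split a comma-separated table cell such as 'q_proj, k_proj, v_proj'."""
    return [part.strip() for part in cell.split(",") if part.strip()]


# Hypothetical module names in the style of Llama-like and Qwen(-VL) checkpoints.
llama_like = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.down_proj",
]
qwen_like = [
    "transformer.h.0.attn.c_attn",
    "transformer.h.0.mlp.c_proj",
    "transformer.wte",
    "lm_head",
]

# List form, as in the language-model table.
llm_targets = parse_target_column("q_proj, k_proj, v_proj")
print([n for n in llama_like if matches_target(n, llm_targets)])

# Regex form: the qwen-vl cell with its Markdown escapes removed.
qwen_vl_regex = r"^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*"
print([n for n in qwen_like if matches_target(n, qwen_vl_regex)])
```

Under these assumptions, the list form selects only the `q_proj` layer from the Llama-style names. The regex selects names under the `transformer.h` prefix whose remainder contains none of `lm_head`, `output`, `emb`, `wte` or `shared`. Here that means `transformer.h.0.attn.c_attn` and `transformer.h.0.mlp.c_proj`.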
### 多模态大模型
| Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | Requires | Tags | HF Model ID |
| --------- | -------- | --------------------------- | ---------------- | ------------------ | ------------ | ---------------- | ---------------- | -------- | ---- | ----------- |
|qwen-vl|[qwen/Qwen-VL](https://modelscope.cn/models/qwen/Qwen-VL/summary)|^(transformer.h)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen-vl-generation|✔|✔|✔|✘||vision|[Qwen/Qwen-VL](https://huggingface.co/Qwen/Qwen-VL)|
|qwen-vl-chat|[qwen/Qwen-VL-Chat](https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary)|^(transformer.h)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen-vl|✔|✔|✔|✘||vision|[Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)|
|qwen-vl-chat-int4|[qwen/Qwen-VL-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary)|^(transformer.h)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen-vl|✔|✔|✘|✘|auto_gptq>=0.5|vision|[Qwen/Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)|
|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|^(transformer.h)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen-audio-generation|✔|✘|✘|✘||audio|[Qwen/Qwen-Audio](https://huggingface.co/Qwen/Qwen-Audio)|
|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|^(transformer.h)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen-audio|✔|✘|✘|✘||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)|
|qwen2-audio-7b|[qwen/Qwen2-Audio-7B](https://modelscope.cn/models/qwen/Qwen2-Audio-7B/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-audio-generation|✔|✘|✘|✘|librosa, transformers>=4.45|audio|[Qwen/Qwen2-Audio-7B](https://huggingface.co/Qwen/Qwen2-Audio-7B)|
|qwen2-audio-7b-instruct|[qwen/Qwen2-Audio-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Audio-7B-Instruct/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-audio|✔|✘|✘|✘|librosa, transformers>=4.45|audio|[Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct)|
|qwen2-vl-2b|[qwen/Qwen2-VL-2B](https://modelscope.cn/models/qwen/Qwen2-VL-2B/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl-generation|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B)|
|qwen2-vl-2b-instruct|[qwen/Qwen2-VL-2B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)|
|qwen2-vl-2b-instruct-gptq-int4|[qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4)|
|qwen2-vl-2b-instruct-gptq-int8|[qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8)|
|qwen2-vl-2b-instruct-awq|[qwen/Qwen2-VL-2B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct-AWQ/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, autoawq|vision, video|[Qwen/Qwen2-VL-2B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-AWQ)|
|qwen2-vl-7b|[qwen/Qwen2-VL-7B](https://modelscope.cn/models/qwen/Qwen2-VL-7B/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl-generation|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-7B](https://huggingface.co/Qwen/Qwen2-VL-7B)|
|qwen2-vl-7b-instruct|[qwen/Qwen2-VL-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)|
|qwen2-vl-7b-instruct-gptq-int4|[qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4)|
|qwen2-vl-7b-instruct-gptq-int8|[qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8)|
|qwen2-vl-7b-instruct-awq|[qwen/Qwen2-VL-7B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct-AWQ/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, autoawq|vision, video|[Qwen/Qwen2-VL-7B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ)|
|qwen2-vl-72b|[qwen/Qwen2-VL-72B](https://modelscope.cn/models/qwen/Qwen2-VL-72B/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl-generation|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-72B](https://huggingface.co/Qwen/Qwen2-VL-72B)|
|qwen2-vl-72b-instruct|[qwen/Qwen2-VL-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils|vision, video|[Qwen/Qwen2-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct)|
|qwen2-vl-72b-instruct-gptq-int4|[qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4)|
|qwen2-vl-72b-instruct-gptq-int8|[qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5|vision, video|[Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8)|
|qwen2-vl-72b-instruct-awq|[qwen/Qwen2-VL-72B-Instruct-AWQ](https://modelscope.cn/models/qwen/Qwen2-VL-72B-Instruct-AWQ/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|qwen2-vl|✔|✔|✘|✘|transformers>=4.45.dev.0, qwen_vl_utils, autoawq|vision, video|[Qwen/Qwen2-VL-72B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ)|
|glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|^(transformer.encoder)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|glm4v|✘|✘|✘|✘|transformers>=4.42|vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
|glm-edge-v-2b|[ZhipuAI/glm-edge-v-2b](https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|glm-edge-v|✔|✘|✘|✘|transformers>=4.46|vision|[THUDM/glm-edge-v-2b](https://huggingface.co/THUDM/glm-edge-v-2b)|
|glm-edge-v-5b|[ZhipuAI/glm-edge-v-5b](https://modelscope.cn/models/ZhipuAI/glm-edge-v-5b/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|glm-edge-v|✔|✘|✘|✘|transformers>=4.46|vision|[THUDM/glm-edge-v-5b](https://huggingface.co/THUDM/glm-edge-v-5b)|
|llama3_2-11b-vision|[LLM-Research/Llama-3.2-11B-Vision](https://modelscope.cn/models/LLM-Research/Llama-3.2-11B-Vision/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3_2-vision-generation|✔|✔|✘|✘|transformers>=4.45|vision|[meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision)|
|llama3_2-11b-vision-instruct|[LLM-Research/Llama-3.2-11B-Vision-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-11B-Vision-Instruct/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3_2-vision|✔|✔|✘|✘|transformers>=4.45|vision|[meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)|
|llama3_2-90b-vision|[LLM-Research/Llama-3.2-90B-Vision](https://modelscope.cn/models/LLM-Research/Llama-3.2-90B-Vision/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3_2-vision-generation|✔|✔|✘|✘|transformers>=4.45|vision|[meta-llama/Llama-3.2-90B-Vision](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision)|
|llama3_2-90b-vision-instruct|[LLM-Research/Llama-3.2-90B-Vision-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-90B-Vision-Instruct/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3_2-vision|✔|✔|✘|✘|transformers>=4.45|vision|[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct)|
|llama3_1-8b-omni|[ICTNLP/Llama-3.1-8B-Omni](https://modelscope.cn/models/ICTNLP/Llama-3.1-8B-Omni/summary)|^(model.layers\|model.speech_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3_1-omni|✔|✘|✘|✘|whisper, openai-whisper|audio|[ICTNLP/Llama-3.1-8B-Omni](https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni)|
|idefics3-8b-llama3|[AI-ModelScope/Idefics3-8B-Llama3](https://modelscope.cn/models/AI-ModelScope/Idefics3-8B-Llama3/summary)|^(model.text_model\|model.connector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|idefics3|✔|✘|✘|✘|transformers>=4.45|vision|[HuggingFaceM4/Idefics3-8B-Llama3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3)|
|llava1_5-7b-instruct|[swift/llava-1.5-7b-hf](https://modelscope.cn/models/swift/llava-1.5-7b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava1_5|✔|✔|✘|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)|
|llava1_5-13b-instruct|[swift/llava-1.5-13b-hf](https://modelscope.cn/models/swift/llava-1.5-13b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava1_5|✔|✔|✘|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-13b-hf](https://huggingface.co/llava-hf/llava-1.5-13b-hf)|
|llava1_6-mistral-7b-instruct|[swift/llava-v1.6-mistral-7b-hf](https://modelscope.cn/models/swift/llava-v1.6-mistral-7b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-mistral|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)|
|llava1_6-vicuna-7b-instruct|[swift/llava-v1.6-vicuna-7b-hf](https://modelscope.cn/models/swift/llava-v1.6-vicuna-7b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-vicuna|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf)|
|llava1_6-vicuna-13b-instruct|[swift/llava-v1.6-vicuna-13b-hf](https://modelscope.cn/models/swift/llava-v1.6-vicuna-13b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-vicuna|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf)|
|llava1_6-llama3_1-8b-instruct|[swift/llava-llama3.1-8b](https://modelscope.cn/models/swift/llava-llama3.1-8b/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-next-llama3|✔|✘|✘|✘|transformers>=4.41|vision|-|
|llava1_6-yi-34b-instruct|[swift/llava-v1.6-34b-hf](https://modelscope.cn/models/swift/llava-v1.6-34b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-yi|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf)|
|llama3-llava-next-8b-hf|[swift/llama3-llava-next-8b-hf](https://modelscope.cn/models/swift/llama3-llava-next-8b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama-llava-next-hf|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llama3-llava-next-8b-hf](https://huggingface.co/llava-hf/llama3-llava-next-8b-hf)|
|llava-next-72b-hf|[AI-ModelScope/llava-next-72b-hf](https://modelscope.cn/models/AI-ModelScope/llava-next-72b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama-qwen-hf|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-next-72b-hf](https://huggingface.co/llava-hf/llava-next-72b-hf)|
|llava-next-110b-hf|[AI-ModelScope/llava-next-110b-hf](https://modelscope.cn/models/AI-ModelScope/llava-next-110b-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama-qwen-hf|✔|✔|✘|✘|transformers>=4.39|vision|[llava-hf/llava-next-110b-hf](https://huggingface.co/llava-hf/llava-next-110b-hf)|
|llava-onevision-qwen2-0_5b-ov|[AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf](https://modelscope.cn/models/AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-onevision-qwen|✔|✘|✘|✘|transformers>=4.45|vision, video|[llava-hf/llava-onevision-qwen2-0.5b-ov-hf](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf)|
|llava-onevision-qwen2-7b-ov|[AI-ModelScope/llava-onevision-qwen2-7b-ov-hf](https://modelscope.cn/models/AI-ModelScope/llava-onevision-qwen2-7b-ov-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-onevision-qwen|✔|✘|✘|✘|transformers>=4.45|vision, video|[llava-hf/llava-onevision-qwen2-7b-ov-hf](https://huggingface.co/llava-hf/llava-onevision-qwen2-7b-ov-hf)|
|llava-onevision-qwen2-72b-ov|[AI-ModelScope/llava-onevision-qwen2-72b-ov-hf](https://modelscope.cn/models/AI-ModelScope/llava-onevision-qwen2-72b-ov-hf/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-onevision-qwen|✔|✘|✘|✘|transformers>=4.45|vision, video|[llava-hf/llava-onevision-qwen2-72b-ov-hf](https://huggingface.co/llava-hf/llava-onevision-qwen2-72b-ov-hf)|
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|^(model.layers\|model.mm_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llama3-llava-next|✔|✘|✘|✘||vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-qwen|✔|✘|✘|✘||vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-qwen|✔|✘|✘|✘||vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
|llava-next-video-7b-instruct|[swift/LLaVA-NeXT-Video-7B-hf](https://modelscope.cn/models/swift/LLaVA-NeXT-Video-7B-hf/summary)|^(language_model\|multi_modal_projector\|vision_resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-next-video|✔|✔|✘|✘|transformers>=4.42, av|video|[llava-hf/LLaVA-NeXT-Video-7B-hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-hf)|
|llava-next-video-7b-32k-instruct|[swift/LLaVA-NeXT-Video-7B-32K-hf](https://modelscope.cn/models/swift/LLaVA-NeXT-Video-7B-32K-hf/summary)|^(language_model\|multi_modal_projector\|vision_resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-next-video|✔|✔|✘|✘|transformers>=4.42, av|video|[llava-hf/LLaVA-NeXT-Video-7B-32K-hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-32K-hf)|
|llava-next-video-7b-dpo-instruct|[swift/LLaVA-NeXT-Video-7B-DPO-hf](https://modelscope.cn/models/swift/LLaVA-NeXT-Video-7B-DPO-hf/summary)|^(language_model\|multi_modal_projector\|vision_resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-next-video|✔|✔|✘|✘|transformers>=4.42, av|video|[llava-hf/LLaVA-NeXT-Video-7B-DPO-hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-7B-DPO-hf)|
|llava-next-video-34b-instruct|[swift/LLaVA-NeXT-Video-34B-hf](https://modelscope.cn/models/swift/LLaVA-NeXT-Video-34B-hf/summary)|^(language_model\|multi_modal_projector\|vision_resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-next-video-yi|✔|✔|✘|✘|transformers>=4.42, av|video|[llava-hf/LLaVA-NeXT-Video-34B-hf](https://huggingface.co/llava-hf/LLaVA-NeXT-Video-34B-hf)|
|yi-vl-6b-chat|[01ai/Yi-VL-6B](https://modelscope.cn/models/01ai/Yi-VL-6B/summary)|^(model.layers\|model.mm_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|yi-vl|✔|✘|✘|✘|transformers>=4.34|vision|[01-ai/Yi-VL-6B](https://huggingface.co/01-ai/Yi-VL-6B)|
|yi-vl-34b-chat|[01ai/Yi-VL-34B](https://modelscope.cn/models/01ai/Yi-VL-34B/summary)|^(model.layers\|model.mm_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|yi-vl|✔|✘|✘|✘|transformers>=4.34|vision|[01-ai/Yi-VL-34B](https://huggingface.co/01-ai/Yi-VL-34B)|
|llava-llama3-8b-v1_1|[AI-ModelScope/llava-llama-3-8b-v1_1-transformers](https://modelscope.cn/models/AI-ModelScope/llava-llama-3-8b-v1_1-transformers/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|llava-llama-instruct|✔|✔|✘|✘|transformers>=4.36|vision|[xtuner/llava-llama-3-8b-v1_1-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers)|
|internlm-xcomposer2-7b-chat|[Shanghai_AI_Laboratory/internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary)|attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3|internlm-xcomposer2|✔|✘|✔|✘||vision|[internlm/internlm-xcomposer2-7b](https://huggingface.co/internlm/internlm-xcomposer2-7b)|
|internlm-xcomposer2-4khd-7b-chat|[Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b/summary)|attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3|internlm-xcomposer2-4khd|✔|✘|✔|✘||vision|[internlm/internlm-xcomposer2-4khd-7b](https://huggingface.co/internlm/internlm-xcomposer2-4khd-7b)|
|internlm-xcomposer2_5-7b-chat|[Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b/summary)|attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3|internlm-xcomposer2_5|✔|✘|✔|✘||vision|[internlm/internlm-xcomposer2d5-7b](https://huggingface.co/internlm/internlm-xcomposer2d5-7b)|
|internvl-chat-v1_5|[AI-ModelScope/InternVL-Chat-V1-5](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl|✔|✔|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)|
|internvl-chat-v1_5-int8|[AI-ModelScope/InternVL-Chat-V1-5-int8](https://modelscope.cn/models/AI-ModelScope/InternVL-Chat-V1-5-int8/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl|✔|✘|✘|✘|transformers>=4.35, timm|vision|[OpenGVLab/InternVL-Chat-V1-5-int8](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-int8)|
|mini-internvl-chat-2b-v1_5|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl|✔|✔|✔|✘|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)|
|mini-internvl-chat-4b-v1_5|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl-phi3|✔|✔|✘|✘|transformers>=4.35,<4.42, timm|vision|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5)|
|internvl2-1b|[OpenGVLab/InternVL2-1B](https://modelscope.cn/models/OpenGVLab/InternVL2-1B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-1B](https://huggingface.co/OpenGVLab/InternVL2-1B)|
|internvl2-2b|[OpenGVLab/InternVL2-2B](https://modelscope.cn/models/OpenGVLab/InternVL2-2B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-2B](https://huggingface.co/OpenGVLab/InternVL2-2B)|
|internvl2-4b|[OpenGVLab/InternVL2-4B](https://modelscope.cn/models/OpenGVLab/InternVL2-4B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2-phi3|✔|✔|✔|✘|transformers>=4.36,<4.42, timm|vision, video|[OpenGVLab/InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)|
|internvl2-8b|[OpenGVLab/InternVL2-8B](https://modelscope.cn/models/OpenGVLab/InternVL2-8B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)|
|internvl2-26b|[OpenGVLab/InternVL2-26B](https://modelscope.cn/models/OpenGVLab/InternVL2-26B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-26B](https://huggingface.co/OpenGVLab/InternVL2-26B)|
|internvl2-40b|[OpenGVLab/InternVL2-40B](https://modelscope.cn/models/OpenGVLab/InternVL2-40B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-40B](https://huggingface.co/OpenGVLab/InternVL2-40B)|
|internvl2-llama3-76b|[OpenGVLab/InternVL2-Llama3-76B](https://modelscope.cn/models/OpenGVLab/InternVL2-Llama3-76B/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-Llama3-76B](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B)|
|internvl2-2b-awq|[OpenGVLab/InternVL2-2B-AWQ](https://modelscope.cn/models/OpenGVLab/InternVL2-2B-AWQ/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-2B-AWQ](https://huggingface.co/OpenGVLab/InternVL2-2B-AWQ)|
|internvl2-8b-awq|[OpenGVLab/InternVL2-8B-AWQ](https://modelscope.cn/models/OpenGVLab/InternVL2-8B-AWQ/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-8B-AWQ](https://huggingface.co/OpenGVLab/InternVL2-8B-AWQ)|
|internvl2-26b-awq|[OpenGVLab/InternVL2-26B-AWQ](https://modelscope.cn/models/OpenGVLab/InternVL2-26B-AWQ/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-26B-AWQ](https://huggingface.co/OpenGVLab/InternVL2-26B-AWQ)|
|internvl2-40b-awq|[OpenGVLab/InternVL2-40B-AWQ](https://modelscope.cn/models/OpenGVLab/InternVL2-40B-AWQ/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-40B-AWQ](https://huggingface.co/OpenGVLab/InternVL2-40B-AWQ)|
|internvl2-llama3-76b-awq|[OpenGVLab/InternVL2-Llama3-76B-AWQ](https://modelscope.cn/models/OpenGVLab/InternVL2-Llama3-76B-AWQ/summary)|^(language_model\|mlp1)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|internvl2|✔|✔|✔|✘|transformers>=4.36, timm|vision, video|[OpenGVLab/InternVL2-Llama3-76B-AWQ](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B-AWQ)|
|deepseek-janus-1_3b|[deepseek-ai/Janus-1.3B](https://modelscope.cn/models/deepseek-ai/Janus-1.3B/summary)|^(language_model\|aligner)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|deepseek-janus|✔|✘|✘|✘||vision|[deepseek-ai/Janus-1.3B](https://huggingface.co/deepseek-ai/Janus-1.3B)|
|deepseek-vl-1_3b-chat|[deepseek-ai/deepseek-vl-1.3b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-1.3b-chat/summary)|^(language_model\|aligner)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|deepseek-vl|✔|✘|✔|✘||vision|[deepseek-ai/deepseek-vl-1.3b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-1.3b-chat)|
|deepseek-vl-7b-chat|[deepseek-ai/deepseek-vl-7b-chat](https://modelscope.cn/models/deepseek-ai/deepseek-vl-7b-chat/summary)|^(language_model\|aligner)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|deepseek-vl|✔|✘|✔|✘||vision|[deepseek-ai/deepseek-vl-7b-chat](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)|
|ovis1_6-gemma2-9b|[AIDC-AI/Ovis1.6-Gemma2-9B](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Gemma2-9B/summary)|^(llm)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|ovis1_6|✔|✘|✘|✘|transformers>=4.42|vision|[AIDC-AI/Ovis1.6-Gemma2-9B](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B)|
|paligemma-3b-pt-224|[AI-ModelScope/paligemma-3b-pt-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-224/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|paligemma|✔|✔|✘|✘|transformers>=4.41|vision|[google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224)|
|paligemma-3b-pt-448|[AI-ModelScope/paligemma-3b-pt-448](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-448/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|paligemma|✔|✔|✘|✘|transformers>=4.41|vision|[google/paligemma-3b-pt-448](https://huggingface.co/google/paligemma-3b-pt-448)|
|paligemma-3b-pt-896|[AI-ModelScope/paligemma-3b-pt-896](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-pt-896/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|paligemma|✔|✔|✘|✘|transformers>=4.41|vision|[google/paligemma-3b-pt-896](https://huggingface.co/google/paligemma-3b-pt-896)|
|paligemma-3b-mix-224|[AI-ModelScope/paligemma-3b-mix-224](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-224/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|paligemma|✔|✔|✘|✘|transformers>=4.41|vision|[google/paligemma-3b-mix-224](https://huggingface.co/google/paligemma-3b-mix-224)|
|paligemma-3b-mix-448|[AI-ModelScope/paligemma-3b-mix-448](https://modelscope.cn/models/AI-ModelScope/paligemma-3b-mix-448/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|paligemma|✔|✔|✘|✘|transformers>=4.41|vision|[google/paligemma-3b-mix-448](https://huggingface.co/google/paligemma-3b-mix-448)|
|minicpm-v-3b-chat|[OpenBMB/MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)|^(llm\|resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|minicpm-v|✔|✘|✘|✘|timm, transformers<4.42|vision|[openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V)|
|minicpm-v-v2-chat|[OpenBMB/MiniCPM-V-2](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2/summary)|^(llm\|resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|minicpm-v|✔|✘|✘|✘|timm, transformers<4.42|vision|[openbmb/MiniCPM-V-2](https://huggingface.co/openbmb/MiniCPM-V-2)|
|minicpm-v-v2_5-chat|[OpenBMB/MiniCPM-Llama3-V-2_5](https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/summary)|^(llm\|resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|minicpm-v-v2_5|✔|✔|✘|✘|timm, transformers>=4.36|vision|[openbmb/MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5)|
|minicpm-v-v2_6-chat|[OpenBMB/MiniCPM-V-2_6](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/summary)|^(llm\|resampler)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|minicpm-v-v2_6|✔|✔|✘|✘|timm, transformers>=4.36|vision, video|[openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)|
|pixtral-12b|[AI-ModelScope/pixtral-12b](https://modelscope.cn/models/AI-ModelScope/pixtral-12b/summary)|^(language_model\|multi_modal_projector)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|pixtral|✘|✘|✘|✘|transformers>=4.45|vision|[mistral-community/pixtral-12b](https://huggingface.co/mistral-community/pixtral-12b)|
|mplug-owl2-chat|[iic/mPLUG-Owl2](https://modelscope.cn/models/iic/mPLUG-Owl2/summary)|q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1|mplug-owl2|✔|✘|✘|✘|transformers<4.35, icecream|vision|[MAGAer13/mplug-owl2-llama2-7b](https://huggingface.co/MAGAer13/mplug-owl2-llama2-7b)|
|mplug-owl2_1-chat|[iic/mPLUG-Owl2.1](https://modelscope.cn/models/iic/mPLUG-Owl2.1/summary)|c_attn.multiway.0, c_attn.multiway.1|mplug-owl2|✔|✘|✘|✘|transformers<4.35, icecream|vision|[Mizukiluke/mplug_owl_2_1](https://huggingface.co/Mizukiluke/mplug_owl_2_1)|
|mplug-owl3-1b-chat|[iic/mPLUG-Owl3-1B-241014](https://modelscope.cn/models/iic/mPLUG-Owl3-1B-241014/summary)|^(language_model\|vision2text_model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|mplug_owl3|✔|✘|✘|✘|transformers>=4.36, icecream|vision, video|[mPLUG/mPLUG-Owl3-1B-241014](https://huggingface.co/mPLUG/mPLUG-Owl3-1B-241014)|
|mplug-owl3-2b-chat|[iic/mPLUG-Owl3-2B-241014](https://modelscope.cn/models/iic/mPLUG-Owl3-2B-241014/summary)|^(language_model\|vision2text_model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|mplug_owl3|✔|✘|✘|✘|transformers>=4.36, icecream|vision, video|[mPLUG/mPLUG-Owl3-2B-241014](https://huggingface.co/mPLUG/mPLUG-Owl3-2B-241014)|
|mplug-owl3-7b-chat|[iic/mPLUG-Owl3-7B-240728](https://modelscope.cn/models/iic/mPLUG-Owl3-7B-240728/summary)|^(language_model\|vision2text_model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|mplug_owl3|✔|✘|✘|✘|transformers>=4.36, icecream|vision, video|[mPLUG/mPLUG-Owl3-7B-240728](https://huggingface.co/mPLUG/mPLUG-Owl3-7B-240728)|
|mplug-owl3v-7b-chat|[iic/mPLUG-Owl3-7B-241101](https://modelscope.cn/models/iic/mPLUG-Owl3-7B-241101/summary)|^(language_model\|vision2text_model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|mplug_owl3v|✔|✘|✘|✘|transformers>=4.36, icecream|vision, video|[mPLUG/mPLUG-Owl3-7B-241101](https://huggingface.co/mPLUG/mPLUG-Owl3-7B-241101)|
|phi3-vision-128k-instruct|[LLM-Research/Phi-3-vision-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary)|^(model.layers\|model.vision_embed_tokens.img_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|phi3-vl|✔|✔|✘|✘|transformers>=4.36|vision|[microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)|
|phi3_5-vision-instruct|[LLM-Research/Phi-3.5-vision-instruct](https://modelscope.cn/models/LLM-Research/Phi-3.5-vision-instruct/summary)|^(model.layers\|model.vision_embed_tokens.img_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|phi3-vl|✔|✔|✘|✘|transformers>=4.36|vision|[microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)|
|cogvlm-17b-chat|[ZhipuAI/cogvlm-chat](https://modelscope.cn/models/ZhipuAI/cogvlm-chat/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogvlm|✘|✘|✘|✘|transformers<4.42|vision|[THUDM/cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf)|
|cogvlm2-19b-chat|[ZhipuAI/cogvlm2-llama3-chinese-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogvlm|✘|✘|✔|✘|transformers<4.42|vision|[THUDM/cogvlm2-llama3-chinese-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chinese-chat-19B)|
|cogvlm2-en-19b-chat|[ZhipuAI/cogvlm2-llama3-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chat-19B/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogvlm|✘|✘|✔|✘|transformers<4.42|vision|[THUDM/cogvlm2-llama3-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B)|
|cogvlm2-video-13b-chat|[ZhipuAI/cogvlm2-video-llama3-chat](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-chat/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogvlm2-video|✘|✘|✘|✘|decord, pytorchvideo, transformers>=4.42|vision, video|[THUDM/cogvlm2-video-llama3-chat](https://huggingface.co/THUDM/cogvlm2-video-llama3-chat)|
|cogagent-18b-chat|[ZhipuAI/cogagent-chat](https://modelscope.cn/models/ZhipuAI/cogagent-chat/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogagent-chat|✘|✘|✘|✘|timm|vision|[THUDM/cogagent-chat-hf](https://huggingface.co/THUDM/cogagent-chat-hf)|
|cogagent-18b-instruct|[ZhipuAI/cogagent-vqa](https://modelscope.cn/models/ZhipuAI/cogagent-vqa/summary)|^(model.layers)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|cogagent-instruct|✘|✘|✘|✘|timm|vision|[THUDM/cogagent-vqa-hf](https://huggingface.co/THUDM/cogagent-vqa-hf)|
|molmoe-1b|[LLM-Research/MolmoE-1B-0924](https://modelscope.cn/models/LLM-Research/MolmoE-1B-0924/summary)|^(model.transformer)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|molmo|✔|✘|✘|✘|transformers>=4.45.0|vision|[allenai/MolmoE-1B-0924](https://huggingface.co/allenai/MolmoE-1B-0924)|
|molmo-7b-o|[LLM-Research/Molmo-7B-O-0924](https://modelscope.cn/models/LLM-Research/Molmo-7B-O-0924/summary)|^(model.transformer)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|molmo|✔|✘|✘|✘|transformers>=4.45.0|vision|[allenai/Molmo-7B-O-0924](https://huggingface.co/allenai/Molmo-7B-O-0924)|
|molmo-7b-d|[LLM-Research/Molmo-7B-D-0924](https://modelscope.cn/models/LLM-Research/Molmo-7B-D-0924/summary)|^(model.transformer)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|molmo|✔|✘|✘|✘|transformers>=4.45.0|vision|[allenai/Molmo-7B-D-0924](https://huggingface.co/allenai/Molmo-7B-D-0924)|
|molmo-72b|[LLM-Research/Molmo-72B-0924](https://modelscope.cn/models/LLM-Research/Molmo-72B-0924/summary)|^(model.transformer)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|molmo|✔|✘|✘|✘|transformers>=4.45.0|vision|[allenai/Molmo-72B-0924](https://huggingface.co/allenai/Molmo-72B-0924)|
|emu3-chat|[BAAI/Emu3-Chat](https://modelscope.cn/models/BAAI/Emu3-Chat/summary)|^(model)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|emu3-chat|✔|✘|✘|✘|transformers>=4.44.0|vision|[BAAI/Emu3-Chat](https://huggingface.co/BAAI/Emu3-Chat)|
|florence-2-base|[AI-ModelScope/Florence-2-base](https://modelscope.cn/models/AI-ModelScope/Florence-2-base/summary)|^(language_model\|image_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|florence|✔|✘|✘|✘||vision|[microsoft/Florence-2-base](https://huggingface.co/microsoft/Florence-2-base)|
|florence-2-base-ft|[AI-ModelScope/Florence-2-base-ft](https://modelscope.cn/models/AI-ModelScope/Florence-2-base-ft/summary)|^(language_model\|image_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|florence|✔|✘|✘|✘||vision|[microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)|
|florence-2-large|[AI-ModelScope/Florence-2-large](https://modelscope.cn/models/AI-ModelScope/Florence-2-large/summary)|^(language_model\|image_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|florence|✔|✘|✘|✘||vision|[microsoft/Florence-2-large](https://huggingface.co/microsoft/Florence-2-large)|
|florence-2-large-ft|[AI-ModelScope/Florence-2-large-ft](https://modelscope.cn/models/AI-ModelScope/Florence-2-large-ft/summary)|^(language_model\|image_projection)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|florence|✔|✘|✘|✘||vision|[microsoft/Florence-2-large-ft](https://huggingface.co/microsoft/Florence-2-large-ft)|
|got-ocr2|[stepfun-ai/GOT-OCR2_0](https://modelscope.cn/models/stepfun-ai/GOT-OCR2_0/summary)|^(model.layers\|model.mm_projector_vary)(?!.\*(lm_head\|output\|emb\|wte\|shared)).\*|got_ocr2|✔|✘|✘|✘||vision|[stepfun-ai/GOT-OCR2_0](https://huggingface.co/stepfun-ai/GOT-OCR2_0)|
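Many multimodal rows above give the default LoRA target modules as a regular expression instead of a list of module names. The sketch below shows which modules such a pattern selects. It assumes the pattern is handed to PEFT as a string `target_modules`, which PEFT matches against each module's full dotted name with `re.fullmatch`. The module names are hypothetical and only illustrate typical llava-style naming; real names depend on the checkpoint. The markdown escapes (`\|`, `\*`) in the table are removed to get the actual regex.

```python
import re

# Default LoRA target pattern from the llava-style rows above,
# with the markdown escapes (\| and \*) removed.
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names for illustration only.
module_names = [
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.embed_tokens",
    "language_model.lm_head",
    "multi_modal_projector.linear_1",
    "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
]


def lora_targets(names: list[str], pattern: str) -> list[str]:
    """Return the names the pattern matches in full (PEFT-style re.fullmatch)."""
    return [name for name in names if re.fullmatch(pattern, name)]


for name in lora_targets(module_names, PATTERN):
    print(name)
```

The prefix group restricts LoRA to the language model and the projector, so the vision tower stays frozen, while the negative lookahead drops any module whose name contains `lm_head`, `output`, `emb`, `wte`, or `shared` (here the embedding table and the output head).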
## 数据集
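The table below describes the datasets integrated into swift:
- Dataset Name: the `dataset_name` under which the dataset is registered in swift.
- Dataset ID: the `dataset_id` of the dataset on [ModelScope](https://www.modelscope.cn/my/overview).
- Subsets: the subsets of the dataset, if any.
- Dataset Size: the number of samples in the dataset.
- Statistic (token): token-count statistics for the dataset, reported as mean±standard deviation, min, and max. These numbers help when tuning the `max_length` hyperparameter. The training and validation splits are concatenated before the statistics are computed, and the text is tokenized with the qwen tokenizer. Because different tokenizers yield different counts, you can run the [script](https://github.com/modelscope/swift/tree/main/scripts/utils/run_dataset_info.py) yourself to obtain the statistics for another model's tokenizer.
- Tags: labels describing the dataset's domain or task.
- HF Dataset ID: the corresponding dataset ID on Hugging Face, or `-` if there is none.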
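As a rough illustration of how the Statistic column is formed and how it relates to `max_length`, here is a minimal sketch. It is not the linked script: it assumes per-sample token counts are already available (the counts below are hypothetical, not produced by the qwen tokenizer) and that the ± value is a population standard deviation. It also reports the fraction of samples longer than a candidate `max_length`, i.e. the samples that would be truncated.

```python
import statistics


def token_stat(lengths: list[int]) -> str:
    """Format per-sample token counts as 'mean±std, min=..., max=...'."""
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)  # population standard deviation (assumption)
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


def truncated_fraction(lengths: list[int], max_length: int) -> float:
    """Fraction of samples whose token count exceeds max_length."""
    return sum(n > max_length for n in lengths) / len(lengths)


# Hypothetical token counts; the train and validation splits are concatenated
# first, as described for the Statistic column.
train_lengths = [120, 340, 95, 2048]
val_lengths = [210, 180]
lengths = train_lengths + val_lengths

print(token_stat(lengths))
print(truncated_fraction(lengths, max_length=1024))
```

A large gap between the mean and the max (as in several rows of the table) means a `max_length` near the max wastes memory on most batches, while a value near the mean truncates a noticeable share of samples.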
| Dataset Name | Dataset ID | Subsets | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
| ------------ | ---------- | ------- |------------- | ----------------- | ---- | ------------- |
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)||316820|346.9±443.2, min=22, max=30960|chat, general, multi-round|-|
|🔥alpaca-en|[AI-ModelScope/alpaca-gpt4-data-en](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-en/summary)||52002|176.2±125.8, min=26, max=740|chat, general|[vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4)|
|🔥alpaca-zh|[AI-ModelScope/alpaca-gpt4-data-zh](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh/summary)||48818|162.1±93.9, min=26, max=856|chat, general|[llm-wizard/alpaca-gpt4-data-zh](https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data-zh)|
|multi-alpaca|[damo/nlp_polylm_multialpaca_sft](https://modelscope.cn/datasets/damo/nlp_polylm_multialpaca_sft/summary)|ar<br>de<br>es<br>fr<br>id<br>ja<br>ko<br>pt<br>ru<br>th<br>vi|131867|112.9±50.6, min=26, max=1226|chat, general, multilingual|-|
|instinwild|[wyj123456/instinwild](https://modelscope.cn/datasets/wyj123456/instinwild/summary)|default<br>subset|103695|145.4±60.7, min=28, max=1434|-|-|
|cot-en|[YorickHe/CoT](https://modelscope.cn/datasets/YorickHe/CoT/summary)||74771|122.7±64.8, min=51, max=8320|chat, general|-|
|cot-zh|[YorickHe/CoT_zh](https://modelscope.cn/datasets/YorickHe/CoT_zh/summary)||74771|117.5±70.8, min=43, max=9636|chat, general|-|
|instruct-en|[wyj123456/instruct](https://modelscope.cn/datasets/wyj123456/instruct/summary)||888970|269.1±331.5, min=26, max=7254|chat, general|-|
|firefly-zh|[AI-ModelScope/firefly-train-1.1M](https://modelscope.cn/datasets/AI-ModelScope/firefly-train-1.1M/summary)||1649399|178.1±260.4, min=26, max=12516|chat, general|[YeungNLP/firefly-train-1.1M](https://huggingface.co/datasets/YeungNLP/firefly-train-1.1M)|
|gpt4all-en|[wyj123456/GPT4all](https://modelscope.cn/datasets/wyj123456/GPT4all/summary)||806199|302.7±384.5, min=27, max=7391|chat, general|-|
|sharegpt|[swift/sharegpt](https://modelscope.cn/datasets/swift/sharegpt/summary)|common-zh<br>computer-zh<br>unknow-zh<br>common-en<br>computer-en|96566|933.3±864.8, min=21, max=66412|chat, general, multi-round|-|
|tulu-v2-sft-mixture|[AI-ModelScope/tulu-v2-sft-mixture](https://modelscope.cn/datasets/AI-ModelScope/tulu-v2-sft-mixture/summary)||5119|520.7±437.6, min=68, max=2549|chat, multilingual, general, multi-round|[allenai/tulu-v2-sft-mixture](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture)|
|wikipedia-zh|[AI-ModelScope/wikipedia-cn-20230720-filtered](https://modelscope.cn/datasets/AI-ModelScope/wikipedia-cn-20230720-filtered/summary)||254547|568.4±713.2, min=37, max=78678|text-generation, general, pretrained|[pleisto/wikipedia-cn-20230720-filtered](https://huggingface.co/datasets/pleisto/wikipedia-cn-20230720-filtered)|
|open-orca|[AI-ModelScope/OpenOrca](https://modelscope.cn/datasets/AI-ModelScope/OpenOrca/summary)||994896|382.3±417.4, min=31, max=8740|chat, multilingual, general|-|
|🔥sharegpt-gpt4|[AI-ModelScope/sharegpt_gpt4](https://modelscope.cn/datasets/AI-ModelScope/sharegpt_gpt4/summary)|default<br>V3_format<br>zh_38K_format|72684|1047.6±1313.1, min=22, max=66412|chat, multilingual, general, multi-round, gpt4|-|
|deepctrl-sft|[AI-ModelScope/deepctrl-sft-data](https://modelscope.cn/datasets/AI-ModelScope/deepctrl-sft-data/summary)|default<br>en|14149024|389.8±628.6, min=21, max=626237|chat, general, sft, multi-round|-|
|🔥coig-cqia|[AI-ModelScope/COIG-CQIA](https://modelscope.cn/datasets/AI-ModelScope/COIG-CQIA/summary)|chinese_traditional<br>coig_pc<br>exam<br>finance<br>douban<br>human_value<br>logi_qa<br>ruozhiba<br>segmentfault<br>wiki<br>wikihow<br>xhs<br>zhihu|44694|703.8±654.2, min=33, max=19288|general|-|
|🔥ruozhiba|[AI-ModelScope/ruozhiba](https://modelscope.cn/datasets/AI-ModelScope/ruozhiba/summary)|post-annual<br>title-good<br>title-norm|85658|39.9±13.1, min=21, max=559|pretrain|-|
|long-alpaca-12k|[AI-ModelScope/LongAlpaca-12k](https://modelscope.cn/datasets/AI-ModelScope/LongAlpaca-12k/summary)||11998|9619.0±8295.8, min=36, max=78925|longlora, QA|[Yukang/LongAlpaca-12k](https://huggingface.co/datasets/Yukang/LongAlpaca-12k)|
|lmsys-chat-1m|[AI-ModelScope/lmsys-chat-1m](https://modelscope.cn/datasets/AI-ModelScope/lmsys-chat-1m/summary)||-|The dataset is too large to compute statistics; see the original link for details.|chat, em|[lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)|
|🔥ms-agent|[iic/ms_agent](https://modelscope.cn/datasets/iic/ms_agent/summary)||26336|650.9±217.2, min=209, max=2740|chat, agent, multi-round|-|
|🔥ms-agent-for-agentfabric|[AI-ModelScope/ms_agent_for_agentfabric](https://modelscope.cn/datasets/AI-ModelScope/ms_agent_for_agentfabric/summary)|default<br>addition|30000|617.8±199.1, min=251, max=2657|chat, agent, multi-round|-|
|ms-agent-multirole|[iic/MSAgent-MultiRole](https://modelscope.cn/datasets/iic/MSAgent-MultiRole/summary)||9500|447.6±84.9, min=145, max=1101|chat, agent, multi-round, role-play, multi-agent|-|
|🔥toolbench-for-alpha-umi|[shenweizhou/alpha-umi-toolbench-processed-v2](https://modelscope.cn/datasets/shenweizhou/alpha-umi-toolbench-processed-v2/summary)|backbone<br>caller<br>planner<br>summarizer|1448337|1439.7±853.9, min=123, max=18467|chat, agent|-|
|damo-agent-zh|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)||386984|956.5±407.3, min=326, max=19001|chat, agent, multi-round|-|
|damo-agent-zh-mini|[damo/MSAgent-Bench](https://modelscope.cn/datasets/damo/MSAgent-Bench/summary)||20845|1326.4±329.6, min=571, max=4304|chat, agent, multi-round|-|
|agent-instruct-all-en|[huangjintao/AgentInstruct_copy](https://modelscope.cn/datasets/huangjintao/AgentInstruct_copy/summary)|alfworld<br>db<br>kg<br>mind2web<br>os<br>webshop|1866|1144.3±635.5, min=206, max=6412|chat, agent, multi-round|-|
|🔥msagent-pro|[iic/MSAgent-Pro](https://modelscope.cn/datasets/iic/MSAgent-Pro/summary)||21905|1524.5±921.3, min=64, max=16770|chat, agent, multi-round|-|
|toolbench|[swift/ToolBench](https://modelscope.cn/datasets/swift/ToolBench/summary)||124345|3669.5±1600.9, min=1047, max=22581|chat, agent, multi-round|-|
|code-alpaca-en|[wyj123456/code_alpaca_en](https://modelscope.cn/datasets/wyj123456/code_alpaca_en/summary)||20016|100.2±60.1, min=29, max=1776|-|[sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k)|
|🔥leetcode-python-en|[AI-ModelScope/leetcode-solutions-python](https://modelscope.cn/datasets/AI-ModelScope/leetcode-solutions-python/summary)||2359|727.1±235.9, min=259, max=2146|chat, coding|-|
|🔥codefuse-python-en|[codefuse-ai/CodeExercise-Python-27k](https://modelscope.cn/datasets/codefuse-ai/CodeExercise-Python-27k/summary)||27224|483.6±193.9, min=45, max=3082|chat, coding|-|
|🔥codefuse-evol-instruction-zh|[codefuse-ai/Evol-instruction-66k](https://modelscope.cn/datasets/codefuse-ai/Evol-instruction-66k/summary)||66862|439.6±206.3, min=37, max=2983|chat, coding|-|
|medical-en|[swift/medical_zh](https://modelscope.cn/datasets/swift/medical_zh/summary)|en|117617|257.4±89.1, min=36, max=2564|chat, medical|-|
|medical-zh|[swift/medical_zh](https://modelscope.cn/datasets/swift/medical_zh/summary)|zh|1950972|167.2±219.7, min=26, max=27351|chat, medical|-|
|🔥disc-med-sft-zh|[AI-ModelScope/DISC-Med-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Med-SFT/summary)||441767|354.1±193.1, min=25, max=2231|chat, medical|[Flmc/DISC-Med-SFT](https://huggingface.co/datasets/Flmc/DISC-Med-SFT)|
|lawyer-llama-zh|[AI-ModelScope/lawyer_llama_data](https://modelscope.cn/datasets/AI-ModelScope/lawyer_llama_data/summary)||21476|194.4±91.7, min=27, max=924|chat, law|[Skepsun/lawyer_llama_data](https://huggingface.co/datasets/Skepsun/lawyer_llama_data)|
|tigerbot-law-zh|[AI-ModelScope/tigerbot-law-plugin](https://modelscope.cn/datasets/AI-ModelScope/tigerbot-law-plugin/summary)||55895|109.9±126.4, min=37, max=18878|text-generation, law, pretrained|[TigerResearch/tigerbot-law-plugin](https://huggingface.co/datasets/TigerResearch/tigerbot-law-plugin)|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)||166758|533.7±495.4, min=30, max=15169|chat, law|[ShengbinYue/DISC-Law-SFT](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT)|
|🔥blossom-math-zh|[AI-ModelScope/blossom-math-v2](https://modelscope.cn/datasets/AI-ModelScope/blossom-math-v2/summary)||10000|169.3±58.7, min=35, max=563|chat, math|[Azure99/blossom-math-v2](https://huggingface.co/datasets/Azure99/blossom-math-v2)|
|school-math-zh|[AI-ModelScope/school_math_0.25M](https://modelscope.cn/datasets/AI-ModelScope/school_math_0.25M/summary)||248480|157.7±72.2, min=33, max=3450|chat, math, quality|[BelleGroup/school_math_0.25M](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)|
|open-platypus-en|[AI-ModelScope/Open-Platypus](https://modelscope.cn/datasets/AI-ModelScope/Open-Platypus/summary)||24926|367.9±254.8, min=30, max=3951|chat, math, quality|[garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)|
|text2sql-en|[AI-ModelScope/texttosqlv2_25000_v2](https://modelscope.cn/datasets/AI-ModelScope/texttosqlv2_25000_v2/summary)||25000|274.6±326.4, min=38, max=1975|chat, sql|[Clinton/texttosqlv2_25000_v2](https://huggingface.co/datasets/Clinton/texttosqlv2_25000_v2)|
|🔥sql-create-context-en|[AI-ModelScope/sql-create-context](https://modelscope.cn/datasets/AI-ModelScope/sql-create-context/summary)||78577|80.2±17.8, min=36, max=456|chat, sql|[b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)|
|synthetic-text-to-sql|[AI-ModelScope/synthetic_text_to_sql](https://modelscope.cn/datasets/AI-ModelScope/synthetic_text_to_sql/summary)|default|100000|283.4±115.8, min=61, max=1356|nl2sql, en|[gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)|
|🔥advertise-gen-zh|[lvjianjin/AdvertiseGen](https://modelscope.cn/datasets/lvjianjin/AdvertiseGen/summary)||98399|130.6±21.7, min=51, max=241|text-generation|[shibing624/AdvertiseGen](https://huggingface.co/datasets/shibing624/AdvertiseGen)|
|🔥dureader-robust-zh|[modelscope/DuReader_robust-QG](https://modelscope.cn/datasets/modelscope/DuReader_robust-QG/summary)||17899|241.1±137.4, min=60, max=1416|text-generation|-|
|cmnli-zh|[modelscope/clue](https://modelscope.cn/datasets/modelscope/clue/summary)|cmnli|404024|82.6±16.6, min=51, max=199|text-generation, classification|[clue](https://huggingface.co/datasets/clue)|
|🔥jd-sentiment-zh|[DAMO_NLP/jd](https://modelscope.cn/datasets/DAMO_NLP/jd/summary)||50000|66.0±83.2, min=39, max=4039|text-generation, classification|-|
|🔥hc3-zh|[simpleai/HC3-Chinese](https://modelscope.cn/datasets/simpleai/HC3-Chinese/summary)|baike<br>open_qa<br>nlpcc_dbqa<br>finance<br>medicine<br>law<br>psychology|39781|176.8±81.5, min=57, max=3051|text-generation, classification|[Hello-SimpleAI/HC3-Chinese](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)|
|🔥hc3-en|[simpleai/HC3](https://modelscope.cn/datasets/simpleai/HC3/summary)|finance<br>medicine|11021|298.3±138.7, min=65, max=2267|text-generation, classification|[Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)|
|dolly-15k|[AI-ModelScope/databricks-dolly-15k](https://modelscope.cn/datasets/AI-ModelScope/databricks-dolly-15k/summary)|default|15011|199.2±267.8, min=22, max=8615|multi-task, en, quality|[databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)|
|zhihu-kol|[OmniData/Zhihu-KOL](https://modelscope.cn/datasets/OmniData/Zhihu-KOL/summary)|default|-|Dataset is too huge, please click the original link to view the dataset stat.|zhihu, qa|[wangrui6/Zhihu-KOL](https://huggingface.co/datasets/wangrui6/Zhihu-KOL)|
|zhihu-kol-filtered|[OmniData/Zhihu-KOL-More-Than-100-Upvotes](https://modelscope.cn/datasets/OmniData/Zhihu-KOL-More-Than-100-Upvotes/summary)|default|271261|952.0±1727.2, min=25, max=98658|zhihu, qa|[bzb2023/Zhihu-KOL-More-Than-100-Upvotes](https://huggingface.co/datasets/bzb2023/Zhihu-KOL-More-Than-100-Upvotes)|
|finance-en|[wyj123456/finance_en](https://modelscope.cn/datasets/wyj123456/finance_en/summary)||68911|135.6±134.3, min=26, max=3525|chat, financial|[ssbuild/alpaca_finance_en](https://huggingface.co/datasets/ssbuild/alpaca_finance_en)|
|poetry-zh|[modelscope/chinese-poetry-collection](https://modelscope.cn/datasets/modelscope/chinese-poetry-collection/summary)||390309|55.2±9.4, min=23, max=83|text-generation, poetry|-|
|webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
|generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
|🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
|🔥swift-mix|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture/summary)|sharegpt<br>firefly<br>codefuse<br>metamathqa|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, sft, general|-|
|cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
|ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
|coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
|🔥coco-en-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|299.8±2.6, min=295, max=338|chat, multi-modal, vision|-|
|coco-en-2|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|36.8±2.8, min=32, max=89|chat, multi-modal, vision|-|
|🔥coco-en-2-mini|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|40504|36.8±2.6, min=32, max=75|chat, multi-modal, vision|-|
|capcha-images|[AI-ModelScope/captcha-images](https://modelscope.cn/datasets/AI-ModelScope/captcha-images/summary)||8000|31.0±0.0, min=31, max=31|chat, multi-modal, vision|-|
|latex-ocr-print|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|default|17918|362.7±34.8, min=294, max=528|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
|latex-ocr-handwrite|[AI-ModelScope/LaTeX_OCR](https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR/summary)|synthetic_handwrite|95424|375.1±59.4, min=292, max=2115|chat, ocr, multi-modal, vision|[linxy/LaTeX_OCR](https://huggingface.co/datasets/linxy/LaTeX_OCR)|
|aishell1-zh|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||141600|152.2±36.8, min=63, max=419|chat, multi-modal, audio|-|
|🔥aishell1-zh-mini|[speech_asr/speech_asr_aishell1_trainsets](https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets/summary)||14526|152.2±35.6, min=74, max=359|chat, multi-modal, audio|-|
|🔥video-chatgpt|[swift/VideoChatGPT](https://modelscope.cn/datasets/swift/VideoChatGPT/summary)|Generic<br>Temporal<br>Consistency|3206|88.4±48.3, min=32, max=399|chat, multi-modal, video|[lmms-lab/VideoChatGPT](https://huggingface.co/datasets/lmms-lab/VideoChatGPT)|
|egoschema|[AI-ModelScope/egoschema](https://modelscope.cn/datasets/AI-ModelScope/egoschema/summary)|Subset|101|191.6±80.7, min=96, max=435|chat, multi-modal, video|[lmms-lab/egoschema](https://huggingface.co/datasets/lmms-lab/egoschema)|
|llava-video-178k|[lmms-lab/LLaVA-Video-178K](https://modelscope.cn/datasets/lmms-lab/LLaVA-Video-178K/summary)|0_30_s_academic_v0_1<br>0_30_s_youtube_v0_1<br>1_2_m_academic_v0_1<br>1_2_m_youtube_v0_1<br>2_3_m_academic_v0_1<br>2_3_m_youtube_v0_1<br>30_60_s_academic_v0_1<br>30_60_s_youtube_v0_1|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, multi-modal, video|[lmms-lab/LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K)|
|moviechat-1k-test|[AI-ModelScope/MovieChat-1K-test](https://modelscope.cn/datasets/AI-ModelScope/MovieChat-1K-test/summary)||486|36.1±4.3, min=27, max=42|chat, multi-modal, video|[Enxin/MovieChat-1K-test](https://huggingface.co/datasets/Enxin/MovieChat-1K-test)|
|hh-rlhf|[AI-ModelScope/hh-rlhf](https://modelscope.cn/datasets/AI-ModelScope/hh-rlhf/summary)|harmless-base<br>helpful-base<br>helpful-online<br>helpful-rejection-sampled|127459|245.4±190.7, min=22, max=1999|rlhf, dpo, pairwise|-|
|🔥hh-rlhf-cn|[AI-ModelScope/hh_rlhf_cn](https://modelscope.cn/datasets/AI-ModelScope/hh_rlhf_cn/summary)|hh_rlhf<br>harmless_base_cn<br>harmless_base_en<br>helpful_base_cn<br>helpful_base_en|355920|171.2±122.7, min=22, max=3078|rlhf, dpo, pairwise|-|
|orpo-dpo-mix-40k|[AI-ModelScope/orpo-dpo-mix-40k](https://modelscope.cn/datasets/AI-ModelScope/orpo-dpo-mix-40k/summary)|default|43666|548.3±397.4, min=28, max=8483|dpo, orpo, en, quality|[mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)|
|stack-exchange-paired|[AI-ModelScope/stack-exchange-paired](https://modelscope.cn/datasets/AI-ModelScope/stack-exchange-paired/summary)||4483004|534.5±594.6, min=31, max=56588|rlhf, dpo, pairwise|[lvwerra/stack-exchange-paired](https://huggingface.co/datasets/lvwerra/stack-exchange-paired)|
|shareai-llama3-dpo-zh-en-emoji|[hjh0119/shareAI-Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/hjh0119/shareAI-Llama3-DPO-zh-en-emoji/summary)|default|2449|334.0±162.8, min=36, max=1801|rlhf, dpo, pairwise|-|
|ultrafeedback-kto|[AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto](https://modelscope.cn/datasets/AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto/summary)|default|230720|11.0±0.0, min=11, max=11|rlhf, kto|-|
|rlaif-v|[swift/RLAIF-V-Dataset](https://modelscope.cn/datasets/swift/RLAIF-V-Dataset/summary)|default|83132|119.8±52.6, min=28, max=556|rlhf, dpo, multi-modal, en|[openbmb/RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset)|
|pileval|[swift/pile-val-backup](https://modelscope.cn/datasets/swift/pile-val-backup/summary)||214670|1612.3±8856.2, min=11, max=1208955|text-generation, awq|[mit-han-lab/pile-val-backup](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)|
|mantis-instruct|[swift/Mantis-Instruct](https://modelscope.cn/datasets/swift/Mantis-Instruct/summary)|birds-to-words<br>chartqa<br>coinstruct<br>contrastive_caption<br>docvqa<br>dreamsim<br>dvqa<br>iconqa<br>imagecode<br>llava_665k_multi<br>lrv_multi<br>multi_vqa<br>nextqa<br>nlvr2<br>spot-the-diff<br>star<br>visual_story_telling|655351|825.7±812.5, min=284, max=13563|chat, multi-modal, vision, quality|[TIGER-Lab/Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct)|
|llava-data-instruct|[swift/llava-data](https://modelscope.cn/datasets/swift/llava-data/summary)|llava_instruct|364100|189.0±142.1, min=33, max=5183|sft, multi-modal, quality|[TIGER-Lab/llava-data](https://huggingface.co/datasets/TIGER-Lab/llava-data)|
|midefics|[swift/MideficsDataset](https://modelscope.cn/datasets/swift/MideficsDataset/summary)||3800|201.3±70.2, min=60, max=454|medical, en, vqa|[WinterSchool/MideficsDataset](https://huggingface.co/datasets/WinterSchool/MideficsDataset)|
|gqa|-|train_all_instructions|-|Dataset is too huge, please click the original link to view the dataset stat.|multi-modal, en, vqa, quality|[lmms-lab/GQA](https://huggingface.co/datasets/lmms-lab/GQA)|
|text-caps|[swift/TextCaps](https://modelscope.cn/datasets/swift/TextCaps/summary)||18145|38.2±4.4, min=31, max=73|multi-modal, en, caption, quality|[HuggingFaceM4/TextCaps](https://huggingface.co/datasets/HuggingFaceM4/TextCaps)|
|refcoco-unofficial-caption|[swift/refcoco](https://modelscope.cn/datasets/swift/refcoco/summary)||46215|44.7±3.2, min=36, max=71|multi-modal, en, caption|[jxu124/refcoco](https://huggingface.co/datasets/jxu124/refcoco)|
|refcoco-unofficial-grounding|[swift/refcoco](https://modelscope.cn/datasets/swift/refcoco/summary)||46215|45.2±3.1, min=37, max=69|multi-modal, en, grounding|[jxu124/refcoco](https://huggingface.co/datasets/jxu124/refcoco)|
|refcocog-unofficial-caption|[swift/refcocog](https://modelscope.cn/datasets/swift/refcocog/summary)||44799|49.7±4.7, min=37, max=88|multi-modal, en, caption|[jxu124/refcocog](https://huggingface.co/datasets/jxu124/refcocog)|
|refcocog-unofficial-grounding|[swift/refcocog](https://modelscope.cn/datasets/swift/refcocog/summary)||44799|50.1±4.7, min=37, max=90|multi-modal, en, grounding|[jxu124/refcocog](https://huggingface.co/datasets/jxu124/refcocog)|
|a-okvqa|[swift/A-OKVQA](https://modelscope.cn/datasets/swift/A-OKVQA/summary)||18201|45.8±7.9, min=32, max=100|multi-modal, en, vqa, quality|[HuggingFaceM4/A-OKVQA](https://huggingface.co/datasets/HuggingFaceM4/A-OKVQA)|
|okvqa|[swift/OK-VQA_train](https://modelscope.cn/datasets/swift/OK-VQA_train/summary)||9009|34.4±3.3, min=28, max=59|multi-modal, en, vqa, quality|[Multimodal-Fatima/OK-VQA_train](https://huggingface.co/datasets/Multimodal-Fatima/OK-VQA_train)|
|ocr-vqa|[swift/OCR-VQA](https://modelscope.cn/datasets/swift/OCR-VQA/summary)||186753|35.6±6.6, min=29, max=193|multi-modal, en, ocr-vqa|[howard-hou/OCR-VQA](https://huggingface.co/datasets/howard-hou/OCR-VQA)|
|grit|[swift/GRIT](https://modelscope.cn/datasets/swift/GRIT/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|multi-modal, en, caption-grounding, quality|[zzliang/GRIT](https://huggingface.co/datasets/zzliang/GRIT)|
|llava-instruct-mix|[swift/llava-instruct-mix-vsft](https://modelscope.cn/datasets/swift/llava-instruct-mix-vsft/summary)||13640|179.8±120.2, min=30, max=962|multi-modal, en, vqa, quality|[HuggingFaceH4/llava-instruct-mix-vsft](https://huggingface.co/datasets/HuggingFaceH4/llava-instruct-mix-vsft)|
|lnqa|[swift/lnqa](https://modelscope.cn/datasets/swift/lnqa/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|multi-modal, en, ocr-vqa, quality|[vikhyatk/lnqa](https://huggingface.co/datasets/vikhyatk/lnqa)|
|science-qa|[swift/ScienceQA](https://modelscope.cn/datasets/swift/ScienceQA/summary)||8315|100.3±59.5, min=38, max=638|multi-modal, science, vqa, quality|[derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA)|
|guanaco|[AI-ModelScope/GuanacoDataset](https://modelscope.cn/datasets/AI-ModelScope/GuanacoDataset/summary)|default|31561|250.1±70.3, min=89, max=1436|chat, zh|[JosephusCheung/GuanacoDataset](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset)|
|mind2web|[swift/Multimodal-Mind2Web](https://modelscope.cn/datasets/swift/Multimodal-Mind2Web/summary)||1009|297522.4±325496.2, min=8592, max=3499715|agent, multi-modal|[osunlp/Multimodal-Mind2Web](https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web)|
|sharegpt-4o-image|[AI-ModelScope/ShareGPT-4o](https://modelscope.cn/datasets/AI-ModelScope/ShareGPT-4o/summary)|image_caption|57289|638.7±157.9, min=47, max=4640|vqa, multi-modal|[OpenGVLab/ShareGPT-4o](https://huggingface.co/datasets/OpenGVLab/ShareGPT-4o)|
|pixelprose|[swift/pixelprose](https://modelscope.cn/datasets/swift/pixelprose/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|caption, multi-modal, vision|[tomg-group-umd/pixelprose](https://huggingface.co/datasets/tomg-group-umd/pixelprose)|
|m3it|[AI-ModelScope/M3IT](https://modelscope.cn/datasets/AI-ModelScope/M3IT/summary)|coco<br>vqa-v2<br>shapes<br>shapes-rephrased<br>coco-goi-rephrased<br>snli-ve<br>snli-ve-rephrased<br>okvqa<br>a-okvqa<br>viquae<br>textcap<br>docvqa<br>science-qa<br>imagenet<br>imagenet-open-ended<br>imagenet-rephrased<br>coco-goi<br>clevr<br>clevr-rephrased<br>nlvr<br>coco-itm<br>coco-itm-rephrased<br>vsr<br>vsr-rephrased<br>mocheg<br>mocheg-rephrased<br>coco-text<br>fm-iqa<br>activitynet-qa<br>msrvtt<br>ss<br>coco-cn<br>refcoco<br>refcoco-rephrased<br>multi30k<br>image-paragraph-captioning<br>visual-dialog<br>visual-dialog-rephrased<br>iqa<br>vcr<br>visual-mrc<br>ivqa<br>msrvtt-qa<br>msvd-qa<br>gqa<br>text-vqa<br>ocr-vqa<br>st-vqa<br>flickr8k-cn|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, multi-modal, vision|-|
|sharegpt4v|[AI-ModelScope/ShareGPT4V](https://modelscope.cn/datasets/AI-ModelScope/ShareGPT4V/summary)|ShareGPT4V<br>ShareGPT4V-PT|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, multi-modal, vision|-|
|llava-instruct-150k|[AI-ModelScope/LLaVA-Instruct-150K](https://modelscope.cn/datasets/AI-ModelScope/LLaVA-Instruct-150K/summary)||624610|490.4±180.2, min=288, max=5438|chat, multi-modal, vision|-|
|llava-pretrain|[AI-ModelScope/LLaVA-Pretrain](https://modelscope.cn/datasets/AI-ModelScope/LLaVA-Pretrain/summary)|default|-|Dataset is too huge, please click the original link to view the dataset stat.|vqa, multi-modal, quality|[liuhaotian/LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain)|
|sa1b-dense-caption|[Tongyi-DataEngine/SA1B-Dense-Caption](https://modelscope.cn/datasets/Tongyi-DataEngine/SA1B-Dense-Caption/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|zh, multi-modal, vqa|-|
|sa1b-paired-caption|[Tongyi-DataEngine/SA1B-Paired-Captions-Images](https://modelscope.cn/datasets/Tongyi-DataEngine/SA1B-Paired-Captions-Images/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|zh, multi-modal, vqa|-|
|alpaca-cleaned|[AI-ModelScope/alpaca-cleaned](https://modelscope.cn/datasets/AI-ModelScope/alpaca-cleaned/summary)||51760|177.9±126.4, min=26, max=1044|chat, general, bench, quality|[yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned)|
|aya-collection|[swift/aya_collection](https://modelscope.cn/datasets/swift/aya_collection/summary)|aya_dataset|202364|494.0±6911.3, min=21, max=3044268|multi-lingual, qa|[CohereForAI/aya_collection](https://huggingface.co/datasets/CohereForAI/aya_collection)|
|belle-generated-chat-0.4M|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|common, zh|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
|belle-math-0.25M|[AI-ModelScope/school_math_0.25M](https://modelscope.cn/datasets/AI-ModelScope/school_math_0.25M/summary)||248480|157.7±72.2, min=33, max=3450|math, zh|[BelleGroup/school_math_0.25M](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)|
|belle-train-0.5M-CN|[AI-ModelScope/train_0.5M_CN](https://modelscope.cn/datasets/AI-ModelScope/train_0.5M_CN/summary)||519255|129.1±91.5, min=27, max=6507|common, zh, quality|[BelleGroup/train_0.5M_CN](https://huggingface.co/datasets/BelleGroup/train_0.5M_CN)|
|belle-train-1M-CN|[AI-ModelScope/train_1M_CN](https://modelscope.cn/datasets/AI-ModelScope/train_1M_CN/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|common, zh, quality|[BelleGroup/train_1M_CN](https://huggingface.co/datasets/BelleGroup/train_1M_CN)|
|belle-train-2M-CN|[AI-ModelScope/train_2M_CN](https://modelscope.cn/datasets/AI-ModelScope/train_2M_CN/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|common, zh, quality|[BelleGroup/train_2M_CN](https://huggingface.co/datasets/BelleGroup/train_2M_CN)|
|belle-train-3.5M-CN|[swift/train_3.5M_CN](https://modelscope.cn/datasets/swift/train_3.5M_CN/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|common, zh, quality|[BelleGroup/train_3.5M_CN](https://huggingface.co/datasets/BelleGroup/train_3.5M_CN)|
|c4|-||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[allenai/c4](https://huggingface.co/datasets/allenai/c4)|
|chart-qa|[swift/ChartQA](https://modelscope.cn/datasets/swift/ChartQA/summary)||28299|43.1±5.5, min=29, max=77|en, vqa, quality|[HuggingFaceM4/ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA)|
|chinese-c4|[swift/chinese-c4](https://modelscope.cn/datasets/swift/chinese-c4/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, zh, quality|[shjwudp/chinese-c4](https://huggingface.co/datasets/shjwudp/chinese-c4)|
|cinepile|[swift/cinepile](https://modelscope.cn/datasets/swift/cinepile/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|vqa, en, youtube, video|[tomg-group-umd/cinepile](https://huggingface.co/datasets/tomg-group-umd/cinepile)|
|classical-chinese-translate|[swift/classical_chinese_translate](https://modelscope.cn/datasets/swift/classical_chinese_translate/summary)||6655|344.0±76.4, min=61, max=815|chat, play-ground|-|
|codealpaca-20k|[AI-ModelScope/CodeAlpaca-20k](https://modelscope.cn/datasets/AI-ModelScope/CodeAlpaca-20k/summary)||20016|100.2±60.1, min=29, max=1776|code, en|[HuggingFaceH4/CodeAlpaca_20K](https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K)|
|cosmopedia|-|auto_math_text<br>khanacademy<br>openstax<br>stanford<br>stories<br>web_samples_v1<br>web_samples_v2<br>wikihow|-|Dataset is too huge, please click the original link to view the dataset stat.|multi-domain, en, qa|[HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)|
|cosmopedia-100k|[swift/cosmopedia-100k](https://modelscope.cn/datasets/swift/cosmopedia-100k/summary)||100000|1024.5±243.1, min=239, max=2981|multi-domain, en, qa|[HuggingFaceTB/cosmopedia-100k](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia-100k)|
|dolma|[swift/dolma](https://modelscope.cn/datasets/swift/dolma/summary)|v1_7|-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[allenai/dolma](https://huggingface.co/datasets/allenai/dolma)|
|dolphin|[swift/dolphin](https://modelscope.cn/datasets/swift/dolphin/summary)|flan1m-alpaca-uncensored<br>flan5m-alpaca-uncensored|-|Dataset is too huge, please click the original link to view the dataset stat.|en|[cognitivecomputations/dolphin](https://huggingface.co/datasets/cognitivecomputations/dolphin)|
|duet|[AI-ModelScope/Duet-v0.5](https://modelscope.cn/datasets/AI-ModelScope/Duet-v0.5/summary)||5000|1157.4±189.3, min=657, max=2344|CoT, en|[G-reen/Duet-v0.5](https://huggingface.co/datasets/G-reen/Duet-v0.5)|
|evol-instruct-v2|[AI-ModelScope/WizardLM_evol_instruct_V2_196k](https://modelscope.cn/datasets/AI-ModelScope/WizardLM_evol_instruct_V2_196k/summary)||109184|480.9±333.1, min=26, max=4942|chat, en|[WizardLM/WizardLM_evol_instruct_V2_196k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k)|
|fineweb|-||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)|
|gen-qa|[swift/GenQA](https://modelscope.cn/datasets/swift/GenQA/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|qa, quality, multi-task|[tomg-group-umd/GenQA](https://huggingface.co/datasets/tomg-group-umd/GenQA)|
|github-code|[swift/github-code](https://modelscope.cn/datasets/swift/github-code/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[codeparrot/github-code](https://huggingface.co/datasets/codeparrot/github-code)|
|gpt4v-dataset|[swift/gpt4v-dataset](https://modelscope.cn/datasets/swift/gpt4v-dataset/summary)||12356|217.9±68.3, min=35, max=596|en, caption, multi-modal, quality|[laion/gpt4v-dataset](https://huggingface.co/datasets/laion/gpt4v-dataset)|
|guanaco-belle-merge|[AI-ModelScope/guanaco_belle_merge_v1.0](https://modelscope.cn/datasets/AI-ModelScope/guanaco_belle_merge_v1.0/summary)||693987|134.2±92.0, min=24, max=6507|QA, zh|[Chinese-Vicuna/guanaco_belle_merge_v1.0](https://huggingface.co/datasets/Chinese-Vicuna/guanaco_belle_merge_v1.0)|
|infinity-instruct|[swift/Infinity-Instruct](https://modelscope.cn/datasets/swift/Infinity-Instruct/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|qa, quality, multi-task|[BAAI/Infinity-Instruct](https://huggingface.co/datasets/BAAI/Infinity-Instruct)|
|llava-med-zh-instruct|[swift/llava-med-zh-instruct-60k](https://modelscope.cn/datasets/swift/llava-med-zh-instruct-60k/summary)||56649|207.7±67.6, min=37, max=657|zh, medical, vqa|[BUAADreamer/llava-med-zh-instruct-60k](https://huggingface.co/datasets/BUAADreamer/llava-med-zh-instruct-60k)|
|🔥longwriter-6k|[ZhipuAI/LongWriter-6k](https://modelscope.cn/datasets/ZhipuAI/LongWriter-6k/summary)||6000|4887.2±2879.2, min=117, max=30354|long, chat, sft|[THUDM/LongWriter-6k](https://huggingface.co/datasets/THUDM/LongWriter-6k)|
|🔥longwriter-6k-filtered|[swift/longwriter-6k-filtered](https://modelscope.cn/datasets/swift/longwriter-6k-filtered/summary)||666|4108.9±2636.9, min=1190, max=17050|long, chat, sft|-|
|math-instruct|[AI-ModelScope/MathInstruct](https://modelscope.cn/datasets/AI-ModelScope/MathInstruct/summary)||262283|254.4±183.5, min=11, max=4383|math, cot, en, quality|[TIGER-Lab/MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct)|
|math-plus|[TIGER-Lab/MATH-plus](https://modelscope.cn/datasets/TIGER-Lab/MATH-plus/summary)|train|893929|287.1±158.7, min=24, max=2919|qa, math, en, quality|[TIGER-Lab/MATH-plus](https://huggingface.co/datasets/TIGER-Lab/MATH-plus)|
|moondream2-coyo-5M|[swift/moondream2-coyo-5M-captions](https://modelscope.cn/datasets/swift/moondream2-coyo-5M-captions/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|caption, pretrain, quality|[isidentical/moondream2-coyo-5M-captions](https://huggingface.co/datasets/isidentical/moondream2-coyo-5M-captions)|
|no-robots|[swift/no_robots](https://modelscope.cn/datasets/swift/no_robots/summary)||9485|298.7±246.4, min=40, max=6739|multi-task, quality, human-annotated|[HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)|
|open-hermes|[swift/OpenHermes-2.5](https://modelscope.cn/datasets/swift/OpenHermes-2.5/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|cot, en, quality|[teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)|
|open-o1|[AI-ModelScope/OpenO1-SFT](https://modelscope.cn/datasets/AI-ModelScope/OpenO1-SFT/summary)|default|203579|615.5±659.6, min=11, max=27509|chat, general, o1|[O1-OPEN/OpenO1-SFT](https://huggingface.co/datasets/O1-OPEN/OpenO1-SFT)|
|open-orca-chinese|[AI-ModelScope/OpenOrca-Chinese](https://modelscope.cn/datasets/AI-ModelScope/OpenOrca-Chinese/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|QA, zh, general, quality|[yys/OpenOrca-Chinese](https://huggingface.co/datasets/yys/OpenOrca-Chinese)|
|orca_dpo_pairs|[swift/orca_dpo_pairs](https://modelscope.cn/datasets/swift/orca_dpo_pairs/summary)||12859|366.9±251.9, min=30, max=2010|rlhf, quality|[Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs)|
|path-vqa|[swift/path-vqa](https://modelscope.cn/datasets/swift/path-vqa/summary)||19654|34.8±7.3, min=27, max=85|multi-modal, vqa, medical|[flaviagiammarino/path-vqa](https://huggingface.co/datasets/flaviagiammarino/path-vqa)|
|pile|[AI-ModelScope/pile](https://modelscope.cn/datasets/AI-ModelScope/pile/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain|[EleutherAI/pile](https://huggingface.co/datasets/EleutherAI/pile)|
|poison-mpts|[iic/100PoisonMpts](https://modelscope.cn/datasets/iic/100PoisonMpts/summary)||906|150.6±80.8, min=39, max=656|poison-management, zh|-|
|🔥qwen2-pro-en|[AI-ModelScope/Magpie-Qwen2-Pro-200K-English](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English/summary)||200000|605.4±287.3, min=221, max=4267|chat, sft, en|[Magpie-Align/Magpie-Qwen2-Pro-200K-English](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-200K-English)|
|🔥qwen2-pro-filtered|[AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered/summary)||300000|555.8±286.6, min=148, max=4267|chat, sft|[Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered)|
|🔥qwen2-pro-zh|[AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese](https://modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese/summary)||200000|446.2±246.4, min=74, max=4101|chat, sft, zh|[Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese](https://huggingface.co/datasets/Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese)|
|redpajama-data-1t|[swift/RedPajama-Data-1T](https://modelscope.cn/datasets/swift/RedPajama-Data-1T/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[togethercomputer/RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)|
|redpajama-data-v2|[swift/RedPajama-Data-V2](https://modelscope.cn/datasets/swift/RedPajama-Data-V2/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[togethercomputer/RedPajama-Data-V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2)|
|refinedweb|-||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)|
|rwkv-pretrain-web|[mapjack/openwebtext_dataset](https://modelscope.cn/datasets/mapjack/openwebtext_dataset/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, zh, quality|-|
|sft-nectar|[AI-ModelScope/SFT-Nectar](https://modelscope.cn/datasets/AI-ModelScope/SFT-Nectar/summary)||131192|396.4±272.1, min=44, max=10732|cot, en, quality|[AstraMindAI/SFT-Nectar](https://huggingface.co/datasets/AstraMindAI/SFT-Nectar)|
|skypile|[AI-ModelScope/SkyPile-150B](https://modelscope.cn/datasets/AI-ModelScope/SkyPile-150B/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality, zh|[Skywork/SkyPile-150B](https://huggingface.co/datasets/Skywork/SkyPile-150B)|
|slim-orca|[swift/SlimOrca](https://modelscope.cn/datasets/swift/SlimOrca/summary)||517982|399.1±370.2, min=35, max=8756|quality, en|[Open-Orca/SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca)|
|slim-pajama-627b|-||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[cerebras/SlimPajama-627B](https://huggingface.co/datasets/cerebras/SlimPajama-627B)|
|starcoder|[AI-ModelScope/starcoderdata](https://modelscope.cn/datasets/AI-ModelScope/starcoderdata/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata)|
|tagengo-gpt4|[swift/tagengo-gpt4](https://modelscope.cn/datasets/swift/tagengo-gpt4/summary)||78057|472.3±292.9, min=22, max=3521|chat, multi-lingual, quality|[lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4)|
|the-stack|[AI-ModelScope/the-stack](https://modelscope.cn/datasets/AI-ModelScope/the-stack/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[bigcode/the-stack](https://huggingface.co/datasets/bigcode/the-stack)|
|ultrachat-200k|[swift/ultrachat_200k](https://modelscope.cn/datasets/swift/ultrachat_200k/summary)||207865|1195.4±573.7, min=76, max=4470|chat, en, quality|[HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)|
|vqa-v2|[swift/VQAv2](https://modelscope.cn/datasets/swift/VQAv2/summary)||443757|31.8±2.2, min=27, max=58|en, vqa, quality|[HuggingFaceM4/VQAv2](https://huggingface.co/datasets/HuggingFaceM4/VQAv2)|
|web-instruct-sub|[swift/WebInstructSub](https://modelscope.cn/datasets/swift/WebInstructSub/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|qa, en, math, quality, multi-domain, science|[TIGER-Lab/WebInstructSub](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub)|
|wikipedia|[swift/wikipedia](https://modelscope.cn/datasets/swift/wikipedia/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[wikipedia](https://huggingface.co/datasets/wikipedia)|
|wikipedia-cn-filtered|[AI-ModelScope/wikipedia-cn-20230720-filtered](https://modelscope.cn/datasets/AI-ModelScope/wikipedia-cn-20230720-filtered/summary)||-|Dataset is too huge, please click the original link to view the dataset stat.|pretrain, quality|[pleisto/wikipedia-cn-20230720-filtered](https://huggingface.co/datasets/pleisto/wikipedia-cn-20230720-filtered)|
|zhihu-rlhf|[AI-ModelScope/zhihu_rlhf_3k](https://modelscope.cn/datasets/AI-ModelScope/zhihu_rlhf_3k/summary)||3460|594.5±365.9, min=31, max=1716|rlhf, dpo, zh|[liyucheng/zhihu_rlhf_3k](https://huggingface.co/datasets/liyucheng/zhihu_rlhf_3k)|
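
The statistic column packs each dataset's length distribution into a compact `mean±std, min=…, max=…` string, which is easy to read but awkward to filter on (for example, when picking datasets whose typical sample length fits a training context window). The following is a minimal Python sketch that turns one row of this table into numbers. It assumes the seven-column row layout used above (name, ModelScope ID, subsets, size, statistic, tags, HF ID); `parse_row`, `parse_stat` and `LengthStats` are hypothetical helpers written for illustration, not part of swift.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches statistic cells such as "257.4±89.1, min=36, max=2564" (mean±std, min, max).
_STAT_RE = re.compile(
    r"^\s*(?P<mean>\d+(?:\.\d+)?)±(?P<std>\d+(?:\.\d+)?),"
    r"\s*min=(?P<min>\d+),\s*max=(?P<max>\d+)\s*$"
)


class LengthStats(NamedTuple):
    mean: float
    std: float
    min_len: int
    max_len: int


def parse_stat(cell: str) -> Optional[LengthStats]:
    """Parse a statistic cell; return None for placeholder text such as
    'Dataset is too huge, ...' that carries no numbers."""
    m = _STAT_RE.match(cell)
    if m is None:
        return None
    return LengthStats(float(m["mean"]), float(m["std"]), int(m["min"]), int(m["max"]))


def parse_row(row: str) -> Tuple[str, Optional[int], Optional[LengthStats]]:
    """Split one table row into (dataset name, sample count, length stats).

    Assumes the layout: name | ModelScope ID | subsets | size | statistic | tags | HF ID.
    """
    cells = row.strip().strip("|").split("|")
    name = cells[0].lstrip("🔥")  # drop the 🔥 highlight marker
    size = int(cells[3]) if cells[3].isdigit() else None  # "-" means no count is given
    return name, size, parse_stat(cells[4])


row = ("|medical-zh|[swift/medical_zh](https://modelscope.cn/datasets/swift/medical_zh/summary)"
       "|zh|1950972|167.2±219.7, min=26, max=27351|chat, medical|-|")
print(parse_row(row))
# -> ('medical-zh', 1950972, LengthStats(mean=167.2, std=219.7, min_len=26, max_len=27351))
```

Rows whose statistic cell holds the "too huge" placeholder come back with `None` stats rather than raising, so a whole table can be scanned in one pass and the unmeasured datasets handled separately.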