Supported Models and Datasets

Table of Contents

Models
- Large Language Models
- Multimodal Large Models
Datasets

Models

The tables below describe the models integrated into swift:

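Model Type: the model_type under which the model is registered in swift.
Default Lora Target Modules: the default lora_target_modules for the model.
Default Template: the default template for the model.
Support Flash Attn: whether the model supports flash attention to accelerate inference and fine-tuning.
Support vLLM: whether the model supports vLLM to accelerate inference and deployment.
Requires: the additional dependencies the model requires.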
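
As a rough illustration of how these columns can be read programmatically, the sketch below parses a few tab-separated rows copied from the large language model table and filters them by a support column. The helper names `load_rows` and `supporting` are hypothetical and are not part of swift; the sketch only assumes that cells are separated by tabs and that `✔` marks support.

```python
import csv
import io

# A few rows copied from the table; cells are tab-separated.
TABLE = "\n".join([
    "Model Type\tModel ID\tDefault Lora Target Modules\tDefault Template\t"
    "Support Flash Attn\tSupport vLLM\tSupport LMDeploy\tSupport Megatron\t"
    "Requires\tTags\tHF Model ID",
    "qwen-7b-chat\tqwen/Qwen-7B-Chat\tc_attn\tqwen\t✔\t✔\t✔\t✘\t\t-\t"
    "Qwen/Qwen-7B-Chat",
    "qwen-7b-chat-int4\tqwen/Qwen-7B-Chat-Int4\tc_attn\tqwen\t✔\t✔\t✘\t✘\t"
    "auto_gptq>=0.5\t-\tQwen/Qwen-7B-Chat-Int4",
    "chatglm3-6b\tZhipuAI/chatglm3-6b\tquery_key_value\tchatglm3\t✘\t✔\t✘\t✘\t"
    "transformers<4.42\t-\tTHUDM/chatglm3-6b",
])


def load_rows(text):
    """Parse the tab-separated table into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def supporting(rows, column):
    """Return the model_type of every row marked with a check in `column`."""
    return [row["Model Type"] for row in rows if row[column] == "✔"]


rows = load_rows(TABLE)
flash_attn_models = supporting(rows, "Support Flash Attn")
lmdeploy_models = supporting(rows, "Support LMDeploy")

# The Requires cell is a comma-separated list of dependency specifiers;
# an empty cell means no extra dependency is listed.
int4_requires = [r.strip() for r in rows[1]["Requires"].split(",") if r.strip()]

print(flash_attn_models)
print(lmdeploy_models)
print(int4_requires)
```

The same filter works for any of the support columns, so a shortlist for a given backend can be built by passing `"Support vLLM"` or `"Support Megatron"` instead.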

Large Language Models

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support vLLM	Support LMDeploy	Support Megatron	Requires	Tags	HF Model ID
qwen-1_8b	qwen/Qwen-1_8B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-1_8B
qwen-1_8b-chat	qwen/Qwen-1_8B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-1_8B-Chat
qwen-1_8b-chat-int4	qwen/Qwen-1_8B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int4
qwen-1_8b-chat-int8	qwen/Qwen-1_8B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-1_8B-Chat-Int8
qwen-7b	qwen/Qwen-7B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-7B
qwen-7b-chat	qwen/Qwen-7B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-7B-Chat
qwen-7b-chat-int4	qwen/Qwen-7B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int4
qwen-7b-chat-int8	qwen/Qwen-7B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-7B-Chat-Int8
qwen-14b	qwen/Qwen-14B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-14B
qwen-14b-chat	qwen/Qwen-14B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-14B-Chat
qwen-14b-chat-int4	qwen/Qwen-14B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int4
qwen-14b-chat-int8	qwen/Qwen-14B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-14B-Chat-Int8
qwen-72b	qwen/Qwen-72B	c_attn	default-generation	✔	✔	✔	✘		-	Qwen/Qwen-72B
qwen-72b-chat	qwen/Qwen-72B-Chat	c_attn	qwen	✔	✔	✔	✘		-	Qwen/Qwen-72B-Chat
qwen-72b-chat-int4	qwen/Qwen-72B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int4
qwen-72b-chat-int8	qwen/Qwen-72B-Chat-Int8	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	-	Qwen/Qwen-72B-Chat-Int8
modelscope-agent-7b	iic/ModelScope-Agent-7B	c_attn	modelscope-agent	✔	✘	✘	✘		-	-
modelscope-agent-14b	iic/ModelScope-Agent-14B	c_attn	modelscope-agent	✔	✘	✘	✘		-	-
qwen1half-0_5b	qwen/Qwen1.5-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B
qwen1half-1_8b	qwen/Qwen1.5-1.8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B
qwen1half-4b	qwen/Qwen1.5-4B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B
qwen1half-7b	qwen/Qwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B
qwen1half-14b	qwen/Qwen1.5-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B
qwen1half-32b	qwen/Qwen1.5-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-32B
qwen1half-72b	qwen/Qwen1.5-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B
qwen1half-110b	qwen/Qwen1.5-110B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-110B
codeqwen1half-7b	qwen/CodeQwen1.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/CodeQwen1.5-7B
qwen1half-moe-a2_7b	qwen/Qwen1.5-MoE-A2.7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B
qwen1half-0_5b-chat	qwen/Qwen1.5-0.5B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat
qwen1half-1_8b-chat	qwen/Qwen1.5-1.8B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat
qwen1half-4b-chat	qwen/Qwen1.5-4B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat
qwen1half-7b-chat	qwen/Qwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat
qwen1half-14b-chat	qwen/Qwen1.5-14B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat
qwen1half-32b-chat	qwen/Qwen1.5-32B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat
qwen1half-72b-chat	qwen/Qwen1.5-72B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat
qwen1half-110b-chat	qwen/Qwen1.5-110B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat
qwen1half-moe-a2_7b-chat	qwen/Qwen1.5-MoE-A2.7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B-Chat
codeqwen1half-7b-chat	qwen/CodeQwen1.5-7B-Chat	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37	-	Qwen/CodeQwen1.5-7B-Chat
qwen1half-0_5b-chat-int4	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4
qwen1half-1_8b-chat-int4	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4
qwen1half-4b-chat-int4	qwen/Qwen1.5-4B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int4
qwen1half-7b-chat-int4	qwen/Qwen1.5-7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int4
qwen1half-14b-chat-int4	qwen/Qwen1.5-14B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int4
qwen1half-32b-chat-int4	qwen/Qwen1.5-32B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
qwen1half-72b-chat-int4	qwen/Qwen1.5-72B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int4
qwen1half-110b-chat-int4	qwen/Qwen1.5-110B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-110B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-int8	qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8
qwen1half-1_8b-chat-int8	qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8
qwen1half-4b-chat-int8	qwen/Qwen1.5-4B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-4B-Chat-GPTQ-Int8
qwen1half-7b-chat-int8	qwen/Qwen1.5-7B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-7B-Chat-GPTQ-Int8
qwen1half-14b-chat-int8	qwen/Qwen1.5-14B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-14B-Chat-GPTQ-Int8
qwen1half-72b-chat-int8	qwen/Qwen1.5-72B-Chat-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen1.5-72B-Chat-GPTQ-Int8
qwen1half-moe-a2_7b-chat-int4	qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✘	✘	✘	auto_gptq>=0.5, transformers>=4.40	moe	Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4
qwen1half-0_5b-chat-awq	qwen/Qwen1.5-0.5B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-0.5B-Chat-AWQ
qwen1half-1_8b-chat-awq	qwen/Qwen1.5-1.8B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-1.8B-Chat-AWQ
qwen1half-4b-chat-awq	qwen/Qwen1.5-4B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-4B-Chat-AWQ
qwen1half-7b-chat-awq	qwen/Qwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-7B-Chat-AWQ
qwen1half-14b-chat-awq	qwen/Qwen1.5-14B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-14B-Chat-AWQ
qwen1half-32b-chat-awq	qwen/Qwen1.5-32B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-32B-Chat-AWQ
qwen1half-72b-chat-awq	qwen/Qwen1.5-72B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-72B-Chat-AWQ
qwen1half-110b-chat-awq	qwen/Qwen1.5-110B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen1.5-110B-Chat-AWQ
codeqwen1half-7b-chat-awq	qwen/CodeQwen1.5-7B-Chat-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/CodeQwen1.5-7B-Chat-AWQ
qwen2-0_5b	qwen/Qwen2-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B
qwen2-0_5b-instruct	qwen/Qwen2-0.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct
qwen2-0_5b-instruct-int4	qwen/Qwen2-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4
qwen2-0_5b-instruct-int8	qwen/Qwen2-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8
qwen2-0_5b-instruct-awq	qwen/Qwen2-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-0.5B-Instruct-AWQ
qwen2-1_5b	qwen/Qwen2-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B
qwen2-1_5b-instruct	qwen/Qwen2-1.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct
qwen2-1_5b-instruct-int4	qwen/Qwen2-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4
qwen2-1_5b-instruct-int8	qwen/Qwen2-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8
qwen2-1_5b-instruct-awq	qwen/Qwen2-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-1.5B-Instruct-AWQ
qwen2-7b	qwen/Qwen2-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B
qwen2-7b-instruct	qwen/Qwen2-7B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-7B-Instruct
qwen2-7b-instruct-int4	qwen/Qwen2-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int4
qwen2-7b-instruct-int8	qwen/Qwen2-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-7B-Instruct-GPTQ-Int8
qwen2-7b-instruct-awq	qwen/Qwen2-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-7B-Instruct-AWQ
qwen2-72b	qwen/Qwen2-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B
qwen2-72b-instruct	qwen/Qwen2-72B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-72B-Instruct
qwen2-72b-instruct-int4	qwen/Qwen2-72B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int4
qwen2-72b-instruct-int8	qwen/Qwen2-72B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2-72B-Instruct-GPTQ-Int8
qwen2-72b-instruct-awq	qwen/Qwen2-72B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2-72B-Instruct-AWQ
qwen2-57b-a14b	qwen/Qwen2-57B-A14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen2-57B-A14B
qwen2-57b-a14b-instruct	qwen/Qwen2-57B-A14B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	transformers>=4.40	moe	Qwen/Qwen2-57B-A14B-Instruct
qwen2-57b-a14b-instruct-int4	qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.40	moe	Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4
qwen2-math-1_5b	qwen/Qwen2-Math-1.5B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-1.5B
qwen2-math-1_5b-instruct	qwen/Qwen2-Math-1.5B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-1.5B-Instruct
qwen2-math-7b	qwen/Qwen2-Math-7B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-7B
qwen2-math-7b-instruct	qwen/Qwen2-Math-7B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-7B-Instruct
qwen2-math-72b	qwen/Qwen2-Math-72B	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-72B
qwen2-math-72b-instruct	qwen/Qwen2-Math-72B-Instruct	q_proj, k_proj, v_proj	qwen	✔	✔	✔	✔	transformers>=4.37	-	Qwen/Qwen2-Math-72B-Instruct
qwen2_5-0_5b	qwen/Qwen2.5-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-0.5B
qwen2_5-1_5b	qwen/Qwen2.5-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-1.5B
qwen2_5-3b	qwen/Qwen2.5-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-3B
qwen2_5-7b	qwen/Qwen2.5-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-7B
qwen2_5-14b	qwen/Qwen2.5-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-14B
qwen2_5-32b	qwen/Qwen2.5-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-32B
qwen2_5-72b	qwen/Qwen2.5-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-72B
qwen2_5-0_5b-instruct	qwen/Qwen2.5-0.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct
qwen2_5-1_5b-instruct	qwen/Qwen2.5-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct
qwen2_5-3b-instruct	qwen/Qwen2.5-3B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct
qwen2_5-7b-instruct	qwen/Qwen2.5-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct
qwen2_5-14b-instruct	qwen/Qwen2.5-14B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct
qwen2_5-32b-instruct	qwen/Qwen2.5-32B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct
qwen2_5-72b-instruct	qwen/Qwen2.5-72B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct
qwen2_5-0_5b-instruct-gptq-int4	qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4
qwen2_5-1_5b-instruct-gptq-int4	qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4
qwen2_5-3b-instruct-gptq-int4	qwen/Qwen2.5-3B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4
qwen2_5-7b-instruct-gptq-int4	qwen/Qwen2.5-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4
qwen2_5-14b-instruct-gptq-int4	qwen/Qwen2.5-14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4
qwen2_5-32b-instruct-gptq-int4	qwen/Qwen2.5-32B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4
qwen2_5-72b-instruct-gptq-int4	qwen/Qwen2.5-72B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
qwen2_5-0_5b-instruct-gptq-int8	qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8
qwen2_5-1_5b-instruct-gptq-int8	qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8
qwen2_5-3b-instruct-gptq-int8	qwen/Qwen2.5-3B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8
qwen2_5-7b-instruct-gptq-int8	qwen/Qwen2.5-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8
qwen2_5-14b-instruct-gptq-int8	qwen/Qwen2.5-14B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8
qwen2_5-32b-instruct-gptq-int8	qwen/Qwen2.5-32B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8
qwen2_5-72b-instruct-gptq-int8	qwen/Qwen2.5-72B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8
qwen2_5-0_5b-instruct-awq	qwen/Qwen2.5-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-0.5B-Instruct-AWQ
qwen2_5-1_5b-instruct-awq	qwen/Qwen2.5-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-1.5B-Instruct-AWQ
qwen2_5-3b-instruct-awq	qwen/Qwen2.5-3B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-3B-Instruct-AWQ
qwen2_5-7b-instruct-awq	qwen/Qwen2.5-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-7B-Instruct-AWQ
qwen2_5-14b-instruct-awq	qwen/Qwen2.5-14B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-14B-Instruct-AWQ
qwen2_5-32b-instruct-awq	qwen/Qwen2.5-32B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-32B-Instruct-AWQ
qwen2_5-72b-instruct-awq	qwen/Qwen2.5-72B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-72B-Instruct-AWQ
qwen2_5-math-1_5b	qwen/Qwen2.5-Math-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-1.5B
qwen2_5-math-7b	qwen/Qwen2.5-Math-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-7B
qwen2_5-math-72b	qwen/Qwen2.5-Math-72B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-72B
qwen2_5-math-1_5b-instruct	qwen/Qwen2.5-Math-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-1.5B-Instruct
qwen2_5-math-7b-instruct	qwen/Qwen2.5-Math-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-7B-Instruct
qwen2_5-math-72b-instruct	qwen/Qwen2.5-Math-72B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Math-72B-Instruct
qwen2_5-coder-0_5b	qwen/Qwen2.5-Coder-0.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B
qwen2_5-coder-0_5b-instruct	qwen/Qwen2.5-Coder-0.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct
qwen2_5-coder-0_5b-instruct-gptq-int4	qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4
qwen2_5-coder-0_5b-instruct-gptq-int8	qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8
qwen2_5-coder-0_5b-instruct-awq	qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ
qwen2_5-coder-1_5b	qwen/Qwen2.5-Coder-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B
qwen2_5-coder-1_5b-instruct	qwen/Qwen2.5-Coder-1.5B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct
qwen2_5-coder-1_5b-instruct-gptq-int4	qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4
qwen2_5-coder-1_5b-instruct-gptq-int8	qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8
qwen2_5-coder-1_5b-instruct-awq	qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ
qwen2_5-coder-3b	qwen/Qwen2.5-Coder-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B
qwen2_5-coder-3b-instruct	qwen/Qwen2.5-Coder-3B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct
qwen2_5-coder-3b-instruct-gptq-int4	qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4
qwen2_5-coder-3b-instruct-gptq-int8	qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8
qwen2_5-coder-3b-instruct-awq	qwen/Qwen2.5-Coder-3B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-3B-Instruct-AWQ
qwen2_5-coder-7b	qwen/Qwen2.5-Coder-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B
qwen2_5-coder-7b-instruct	qwen/Qwen2.5-Coder-7B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct
qwen2_5-coder-7b-instruct-gptq-int4	qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4
qwen2_5-coder-7b-instruct-gptq-int8	qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8
qwen2_5-coder-7b-instruct-awq	qwen/Qwen2.5-Coder-7B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-7B-Instruct-AWQ
qwen2_5-coder-14b	qwen/Qwen2.5-Coder-14B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B
qwen2_5-coder-14b-instruct	qwen/Qwen2.5-Coder-14B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct
qwen2_5-coder-14b-instruct-gptq-int4	qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4
qwen2_5-coder-14b-instruct-gptq-int8	qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8
qwen2_5-coder-14b-instruct-awq	qwen/Qwen2.5-Coder-14B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-14B-Instruct-AWQ
qwen2_5-coder-32b	qwen/Qwen2.5-Coder-32B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B
qwen2_5-coder-32b-instruct	qwen/Qwen2.5-Coder-32B-Instruct	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct
qwen2_5-coder-32b-instruct-gptq-int4	qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4
qwen2_5-coder-32b-instruct-gptq-int8	qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✘	✘	auto_gptq>=0.5, transformers>=4.37	-	Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8
qwen2_5-coder-32b-instruct-awq	qwen/Qwen2.5-Coder-32B-Instruct-AWQ	q_proj, k_proj, v_proj	qwen2_5	✔	✔	✔	✘	transformers>=4.37, autoawq	-	Qwen/Qwen2.5-Coder-32B-Instruct-AWQ
qwq-32b-preview	Qwen/QwQ-32B-Preview	q_proj, k_proj, v_proj	qwq	✔	✔	✔	✔	transformers>=4.37	-	Qwen/QwQ-32B-Preview
marco-o1	AIDC-AI/Marco-o1	q_proj, k_proj, v_proj	marco_o1	✔	✔	✔	✘	transformers>=4.37	-	AIDC-AI/Marco-o1
chatglm2-6b	ZhipuAI/chatglm2-6b	query_key_value	chatglm2	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm2-6b
chatglm2-6b-32k	ZhipuAI/chatglm2-6b-32k	query_key_value	chatglm2	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm2-6b-32k
chatglm3-6b-base	ZhipuAI/chatglm3-6b-base	query_key_value	chatglm-generation	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-base
chatglm3-6b	ZhipuAI/chatglm3-6b	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b
chatglm3-6b-32k	ZhipuAI/chatglm3-6b-32k	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-32k
chatglm3-6b-128k	ZhipuAI/chatglm3-6b-128k	query_key_value	chatglm3	✘	✔	✘	✘	transformers<4.42	-	THUDM/chatglm3-6b-128k
codegeex2-6b	ZhipuAI/codegeex2-6b	query_key_value	chatglm-generation	✘	✔	✘	✘	transformers<4.34	coding	THUDM/codegeex2-6b
glm4-9b	ZhipuAI/glm-4-9b	query_key_value	chatglm-generation	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b
glm4-9b-chat	ZhipuAI/glm-4-9b-chat	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b-chat
glm4-9b-chat-1m	ZhipuAI/glm-4-9b-chat-1m	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/glm-4-9b-chat-1m
codegeex4-9b-chat	ZhipuAI/codegeex4-all-9b	query_key_value	codegeex4	✔	✔	✔	✘	transformers<4.42	coding	THUDM/codegeex4-all-9b
glm-edge-1_5b-chat	ZhipuAI/glm-edge-1.5b-chat	q_proj, k_proj, v_proj	chatglm4	✔	✘	✘	✘	transformers>=4.46	-	THUDM/glm-edge-1.5b-chat
glm-edge-4b-chat	ZhipuAI/glm-edge-4b-chat	q_proj, k_proj, v_proj	chatglm4	✔	✘	✘	✘	transformers>=4.46	-	THUDM/glm-edge-4b-chat
llama2-7b	modelscope/Llama-2-7b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-7b-hf
llama2-7b-chat	modelscope/Llama-2-7b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-7b-chat-hf
llama2-13b	modelscope/Llama-2-13b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-13b-hf
llama2-13b-chat	modelscope/Llama-2-13b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-13b-chat-hf
llama2-70b	modelscope/Llama-2-70b-ms	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Llama-2-70b-hf
llama2-70b-chat	modelscope/Llama-2-70b-chat-ms	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	meta-llama/Llama-2-70b-chat-hf
llama2-7b-aqlm-2bit-1x16	AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘	transformers>=4.38, aqlm, torch>=2.2.0	-	ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf
llama3-8b	LLM-Research/Meta-Llama-3-8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-8B
llama3-8b-instruct	LLM-Research/Meta-Llama-3-8B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-8B-Instruct
llama3-8b-instruct-int4	swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4
llama3-8b-instruct-int8	swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8
llama3-8b-instruct-awq	swift/Meta-Llama-3-8B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	autoawq	-	study-hjt/Meta-Llama-3-8B-Instruct-AWQ
llama3-70b	LLM-Research/Meta-Llama-3-70B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-70B
llama3-70b-instruct	LLM-Research/Meta-Llama-3-70B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	meta-llama/Meta-Llama-3-70B-Instruct
llama3-70b-instruct-int4	swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4
llama3-70b-instruct-int8	swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	auto_gptq	-	study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8
llama3-70b-instruct-awq	swift/Meta-Llama-3-70B-Instruct-AWQ	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	autoawq	-	study-hjt/Meta-Llama-3-70B-Instruct-AWQ
llama3_1-8b	LLM-Research/Meta-Llama-3.1-8B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-8B
llama3_1-8b-instruct	LLM-Research/Meta-Llama-3.1-8B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-8B-Instruct
llama3_1-8b-instruct-awq	LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4
llama3_1-8b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4
llama3_1-8b-instruct-bnb	LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4
llama3_1-70b	LLM-Research/Meta-Llama-3.1-70B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B
llama3_1-70b-instruct	LLM-Research/Meta-Llama-3.1-70B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B-Instruct
llama3_1-70b-instruct-fp8	LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-70B-Instruct-FP8
llama3_1-70b-instruct-awq	LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
llama3_1-70b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4
llama3_1-70b-instruct-bnb	LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit
llama3_1-405b	LLM-Research/Meta-Llama-3.1-405B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B
llama3_1-405b-instruct	LLM-Research/Meta-Llama-3.1-405B-Instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B-Instruct
llama3_1-405b-instruct-fp8	LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43	-	meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
llama3_1-405b-instruct-awq	LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43, autoawq	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4
llama3_1-405b-instruct-gptq-int4	LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, auto_gptq	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4
llama3_1-405b-instruct-bnb	LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4	q_proj, k_proj, v_proj	llama3	✔	✔	✘	✘	transformers>=4.43, bitsandbytes	-	hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4
llama-3.1-nemotron-70B-instruct-hf	AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘	transformers>=4.43	-	nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
llama3_2-1b	LLM-Research/Llama-3.2-1B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-1B
llama3_2-1b-instruct	LLM-Research/Llama-3.2-1B-Instruct	q_proj, k_proj, v_proj	llama3_2	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-1B-Instruct
llama3_2-3b	LLM-Research/Llama-3.2-3B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-3B
llama3_2-3b-instruct	LLM-Research/Llama-3.2-3B-Instruct	q_proj, k_proj, v_proj	llama3_2	✔	✔	✔	✘	transformers>=4.45	-	meta-llama/Llama-3.2-3B-Instruct
reflection-llama_3_1-70b	LLM-Research/Reflection-Llama-3.1-70B	q_proj, k_proj, v_proj	reflection	✔	✔	✘	✘	transformers>=4.43	-	mattshumer/Reflection-Llama-3.1-70B
longwriter-glm4-9b	ZhipuAI/LongWriter-glm4-9b	query_key_value	chatglm4	✔	✔	✔	✘	transformers>=4.42	-	THUDM/LongWriter-glm4-9b
longwriter-llama3_1-8b	ZhipuAI/LongWriter-llama3.1-8b	q_proj, k_proj, v_proj	longwriter-llama3	✔	✔	✔	✘	transformers>=4.43	-	THUDM/LongWriter-llama3.1-8b
chinese-llama-2-1_3b	AI-ModelScope/chinese-llama-2-1.3b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-1.3b
chinese-llama-2-7b	AI-ModelScope/chinese-llama-2-7b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b
chinese-llama-2-7b-16k	AI-ModelScope/chinese-llama-2-7b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b-16k
chinese-llama-2-7b-64k	AI-ModelScope/chinese-llama-2-7b-64k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-7b-64k
chinese-llama-2-13b	AI-ModelScope/chinese-llama-2-13b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-13b
chinese-llama-2-13b-16k	AI-ModelScope/chinese-llama-2-13b-16k	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/chinese-llama-2-13b-16k
chinese-alpaca-2-1_3b	AI-ModelScope/chinese-alpaca-2-1.3b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-1.3b
chinese-alpaca-2-7b	AI-ModelScope/chinese-alpaca-2-7b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b
chinese-alpaca-2-7b-16k	AI-ModelScope/chinese-alpaca-2-7b-16k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b-16k
chinese-alpaca-2-7b-64k	AI-ModelScope/chinese-alpaca-2-7b-64k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-7b-64k
chinese-alpaca-2-13b	AI-ModelScope/chinese-alpaca-2-13b	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-13b
chinese-alpaca-2-13b-16k	AI-ModelScope/chinese-alpaca-2-13b-16k	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘		-	hfl/chinese-alpaca-2-13b-16k
llama-3-chinese-8b	ChineseAlpacaGroup/llama-3-chinese-8b	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	hfl/llama-3-chinese-8b
llama-3-chinese-8b-instruct	ChineseAlpacaGroup/llama-3-chinese-8b-instruct	q_proj, k_proj, v_proj	llama3	✔	✔	✔	✘		-	hfl/llama-3-chinese-8b-instruct
atom-7b	FlagAlpha/Atom-7B	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘		-	FlagAlpha/Atom-7B
atom-7b-chat	FlagAlpha/Atom-7B-Chat	q_proj, k_proj, v_proj	atom	✔	✔	✘	✘		-	FlagAlpha/Atom-7B-Chat
yi-6b	01ai/Yi-6B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-6B
yi-6b-200k	01ai/Yi-6B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-6B-200K
yi-6b-chat	01ai/Yi-6B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-6B-Chat
yi-6b-chat-awq	01ai/Yi-6B-Chat-4bits	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	01-ai/Yi-6B-Chat-4bits
yi-6b-chat-int8	01ai/Yi-6B-Chat-8bits	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq	-	01-ai/Yi-6B-Chat-8bits
yi-9b	01ai/Yi-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-9B
yi-9b-200k	01ai/Yi-9B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-9B-200K
yi-34b	01ai/Yi-34B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-34B
yi-34b-200k	01ai/Yi-34B-200K	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-34B-200K
yi-34b-chat	01ai/Yi-34B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-34B-Chat
yi-34b-chat-awq	01ai/Yi-34B-Chat-4bits	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	01-ai/Yi-34B-Chat-4bits
yi-34b-chat-int8	01ai/Yi-34B-Chat-8bits	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq	-	01-ai/Yi-34B-Chat-8bits
yi-1_5-6b	01ai/Yi-1.5-6B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-6B
yi-1_5-6b-chat	01ai/Yi-1.5-6B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-6B-Chat
yi-1_5-9b	01ai/Yi-1.5-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B
yi-1_5-9b-chat	01ai/Yi-1.5-9B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B-Chat
yi-1_5-9b-chat-16k	01ai/Yi-1.5-9B-Chat-16K	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-9B-Chat-16K
yi-1_5-34b	01ai/Yi-1.5-34B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B
yi-1_5-34b-chat	01ai/Yi-1.5-34B-Chat	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B-Chat
yi-1_5-34b-chat-16k	01ai/Yi-1.5-34B-Chat-16K	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘		-	01-ai/Yi-1.5-34B-Chat-16K
yi-1_5-6b-chat-awq-int4	AI-ModelScope/Yi-1.5-6B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-6B-Chat-AWQ
yi-1_5-6b-chat-gptq-int4	AI-ModelScope/Yi-1.5-6B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-6B-Chat-GPTQ
yi-1_5-9b-chat-awq-int4	AI-ModelScope/Yi-1.5-9B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-9B-Chat-AWQ
yi-1_5-9b-chat-gptq-int4	AI-ModelScope/Yi-1.5-9B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-9B-Chat-GPTQ
yi-1_5-34b-chat-awq-int4	AI-ModelScope/Yi-1.5-34B-Chat-AWQ	q_proj, k_proj, v_proj	chatml	✔	✔	✔	✘	autoawq	-	modelscope/Yi-1.5-34B-Chat-AWQ
yi-1_5-34b-chat-gptq-int4	AI-ModelScope/Yi-1.5-34B-Chat-GPTQ	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	auto_gptq>=0.5	-	modelscope/Yi-1.5-34B-Chat-GPTQ
yi-coder-1_5b	01ai/Yi-Coder-1.5B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-Coder-1.5B
yi-coder-1_5b-chat	01ai/Yi-Coder-1.5B-Chat	q_proj, k_proj, v_proj	yi-coder	✔	✔	✔	✘		-	01-ai/Yi-Coder-1.5B-Chat
yi-coder-9b	01ai/Yi-Coder-9B	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	01-ai/Yi-Coder-9B
yi-coder-9b-chat	01ai/Yi-Coder-9B-Chat	q_proj, k_proj, v_proj	yi-coder	✔	✔	✔	✘		-	01-ai/Yi-Coder-9B-Chat
internlm-7b	Shanghai_AI_Laboratory/internlm-7b	q_proj, k_proj, v_proj	default-generation	✘	✔	✔	✘		-	internlm/internlm-7b
internlm-7b-chat	Shanghai_AI_Laboratory/internlm-chat-7b	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	internlm/internlm-chat-7b
internlm-7b-chat-8k	Shanghai_AI_Laboratory/internlm-chat-7b-8k	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	-
internlm-20b	Shanghai_AI_Laboratory/internlm-20b	q_proj, k_proj, v_proj	default-generation	✘	✔	✔	✘		-	internlm/internlm-20b
internlm-20b-chat	Shanghai_AI_Laboratory/internlm-chat-20b	q_proj, k_proj, v_proj	internlm	✘	✔	✔	✘		-	internlm/internlm-chat-20b
internlm2-1_8b	Shanghai_AI_Laboratory/internlm2-1_8b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-1_8b
internlm2-1_8b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-1_8b-sft
internlm2-1_8b-chat	Shanghai_AI_Laboratory/internlm2-chat-1_8b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-1_8b
internlm2-7b-base	Shanghai_AI_Laboratory/internlm2-base-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-base-7b
internlm2-7b	Shanghai_AI_Laboratory/internlm2-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-7b
internlm2-7b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-7b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-7b-sft
internlm2-7b-chat	Shanghai_AI_Laboratory/internlm2-chat-7b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-7b
internlm2-20b-base	Shanghai_AI_Laboratory/internlm2-base-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-base-20b
internlm2-20b	Shanghai_AI_Laboratory/internlm2-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-20b
internlm2-20b-sft-chat	Shanghai_AI_Laboratory/internlm2-chat-20b-sft	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-20b-sft
internlm2-20b-chat	Shanghai_AI_Laboratory/internlm2-chat-20b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2-chat-20b
internlm2_5-1_8b	Shanghai_AI_Laboratory/internlm2_5-1_8b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-1_8b
internlm2_5-1_8b-chat	Shanghai_AI_Laboratory/internlm2_5-1_8b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-1_8b-chat
internlm2_5-7b	Shanghai_AI_Laboratory/internlm2_5-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b
internlm2_5-7b-chat	Shanghai_AI_Laboratory/internlm2_5-7b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b-chat
internlm2_5-7b-chat-1m	Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-7b-chat-1m
internlm2_5-20b	Shanghai_AI_Laboratory/internlm2_5-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-20b
internlm2_5-20b-chat	Shanghai_AI_Laboratory/internlm2_5-20b-chat	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	-	internlm/internlm2_5-20b-chat
internlm2-math-7b	Shanghai_AI_Laboratory/internlm2-math-base-7b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-base-7b
internlm2-math-7b-chat	Shanghai_AI_Laboratory/internlm2-math-7b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-7b
internlm2-math-20b	Shanghai_AI_Laboratory/internlm2-math-base-20b	wqkv	default-generation	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-base-20b
internlm2-math-20b-chat	Shanghai_AI_Laboratory/internlm2-math-20b	wqkv	internlm2	✔	✔	✔	✘	transformers>=4.38	math	internlm/internlm2-math-20b
deepseek-7b	deepseek-ai/deepseek-llm-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-7b-base
deepseek-7b-chat	deepseek-ai/deepseek-llm-7b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-7b-chat
deepseek-moe-16b	deepseek-ai/deepseek-moe-16b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘		moe	deepseek-ai/deepseek-moe-16b-base
deepseek-moe-16b-chat	deepseek-ai/deepseek-moe-16b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✘	✘		moe	deepseek-ai/deepseek-moe-16b-chat
deepseek-67b	deepseek-ai/deepseek-llm-67b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-67b-base
deepseek-67b-chat	deepseek-ai/deepseek-llm-67b-chat	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		-	deepseek-ai/deepseek-llm-67b-chat
deepseek-coder-1_3b	deepseek-ai/deepseek-coder-1.3b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-1.3b-base
deepseek-coder-1_3b-instruct	deepseek-ai/deepseek-coder-1.3b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-1.3b-instruct
deepseek-coder-6_7b	deepseek-ai/deepseek-coder-6.7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-6.7b-base
deepseek-coder-6_7b-instruct	deepseek-ai/deepseek-coder-6.7b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-6.7b-instruct
deepseek-coder-33b	deepseek-ai/deepseek-coder-33b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-33b-base
deepseek-coder-33b-instruct	deepseek-ai/deepseek-coder-33b-instruct	q_proj, k_proj, v_proj	deepseek-coder	✔	✔	✔	✘		coding	deepseek-ai/deepseek-coder-33b-instruct
deepseek-coder-v2-instruct	deepseek-ai/DeepSeek-Coder-V2-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Instruct
deepseek-coder-v2-lite-instruct	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
deepseek-coder-v2	deepseek-ai/DeepSeek-Coder-V2-Base	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Base
deepseek-coder-v2-lite	deepseek-ai/DeepSeek-Coder-V2-Lite-Base	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	coding, moe	deepseek-ai/DeepSeek-Coder-V2-Lite-Base
deepseek-math-7b	deepseek-ai/deepseek-math-7b-base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-base
deepseek-math-7b-instruct	deepseek-ai/deepseek-math-7b-instruct	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-instruct
deepseek-math-7b-chat	deepseek-ai/deepseek-math-7b-rl	q_proj, k_proj, v_proj	deepseek	✔	✔	✔	✘		math	deepseek-ai/deepseek-math-7b-rl
numina-math-7b	AI-ModelScope/NuminaMath-7B-TIR	q_proj, k_proj, v_proj	numina-math	✔	✔	✘	✘		math	AI-MO/NuminaMath-7B-TIR
deepseek-v2	deepseek-ai/DeepSeek-V2	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2
deepseek-v2-chat	deepseek-ai/DeepSeek-V2-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Chat
deepseek-v2-lite	deepseek-ai/DeepSeek-V2-Lite	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	default-generation	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Lite
deepseek-v2-lite-chat	deepseek-ai/DeepSeek-V2-Lite-Chat	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2-Lite-Chat
deepseek-v2_5	deepseek-ai/DeepSeek-V2.5	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj	deepseek2_5	✔	✔	✘	✘	transformers>=4.39.3	moe	deepseek-ai/DeepSeek-V2.5
gemma-2b	AI-ModelScope/gemma-2b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-2b
gemma-7b	AI-ModelScope/gemma-7b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-7b
gemma-2b-instruct	AI-ModelScope/gemma-2b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-2b-it
gemma-7b-instruct	AI-ModelScope/gemma-7b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.38	-	google/gemma-7b-it
gemma2-2b	LLM-Research/gemma-2-2b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-2b
gemma2-9b	LLM-Research/gemma-2-9b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-9b
gemma2-27b	LLM-Research/gemma-2-27b	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-27b
gemma2-2b-instruct	LLM-Research/gemma-2-2b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-2b-it
gemma2-9b-instruct	LLM-Research/gemma-2-9b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-9b-it
gemma2-27b-instruct	LLM-Research/gemma-2-27b-it	q_proj, k_proj, v_proj	gemma	✔	✔	✘	✘	transformers>=4.42	-	google/gemma-2-27b-it
minicpm-1b-sft-chat	OpenBMB/MiniCPM-1B-sft-bf16	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘	transformers>=4.36.0	-	openbmb/MiniCPM-1B-sft-bf16
minicpm-2b-sft-chat	OpenBMB/MiniCPM-2B-sft-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘		-	openbmb/MiniCPM-2B-sft-fp32
minicpm-2b-chat	OpenBMB/MiniCPM-2B-dpo-fp32	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘		-	openbmb/MiniCPM-2B-dpo-fp32
minicpm-2b-128k	OpenBMB/MiniCPM-2B-128k	q_proj, k_proj, v_proj	chatml	✔	✔	✘	✘	transformers>=4.36.0	-	openbmb/MiniCPM-2B-128k
minicpm-moe-8x2b	OpenBMB/MiniCPM-MoE-8x2B	q_proj, k_proj, v_proj	minicpm	✔	✔	✘	✘	transformers>=4.36.0	moe	openbmb/MiniCPM-MoE-8x2B
minicpm3-4b	OpenBMB/MiniCPM3-4B	q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj	chatml	✔	✘	✘	✘	transformers>=4.36	-	openbmb/MiniCPM3-4B
openbuddy-llama-65b-chat	OpenBuddy/openbuddy-llama-65b-v8-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama-65b-v8-bf16
openbuddy-llama2-13b-chat	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama2-13b-v8.1-fp16
openbuddy-llama2-70b-chat	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama2-70b-v10.1-bf16
openbuddy-llama3-8b-chat	OpenBuddy/openbuddy-llama3-8b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama3-8b-v21.1-8k
openbuddy-llama3-70b-chat	OpenBuddy/openbuddy-llama3-70b-v21.1-8k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘		-	OpenBuddy/openbuddy-llama3-70b-v21.1-8k
openbuddy-mistral-7b-chat	OpenBuddy/openbuddy-mistral-7b-v17.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘	transformers>=4.34	-	OpenBuddy/openbuddy-mistral-7b-v17.1-32k
openbuddy-zephyr-7b-chat	OpenBuddy/openbuddy-zephyr-7b-v14.1	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘	transformers>=4.34	-	OpenBuddy/openbuddy-zephyr-7b-v14.1
openbuddy-deepseek-67b-chat	OpenBuddy/openbuddy-deepseek-67b-v15.2	q_proj, k_proj, v_proj	openbuddy	✔	✔	✔	✘		-	OpenBuddy/openbuddy-deepseek-67b-v15.2
openbuddy-mixtral-moe-7b-chat	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k	q_proj, k_proj, v_proj	openbuddy	✔	✔	✘	✘	transformers>=4.36	moe	OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k
openbuddy-llama3_1-8b-chat	OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k	q_proj, k_proj, v_proj	openbuddy2	✔	✔	✔	✘	transformers>=4.43	-	OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k
mistral-7b	AI-ModelScope/Mistral-7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-v0.1
mistral-7b-v2	AI-ModelScope/Mistral-7B-v0.2-hf	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘	transformers>=4.34	-	alpindale/Mistral-7B-v0.2-hf
mistral-7b-instruct	AI-ModelScope/Mistral-7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.1
mistral-7b-instruct-v2	AI-ModelScope/Mistral-7B-Instruct-v0.2	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.2
mistral-7b-instruct-v3	LLM-Research/Mistral-7B-Instruct-v0.3	q_proj, k_proj, v_proj	llama	✔	✔	✔	✘	transformers>=4.34	-	mistralai/Mistral-7B-Instruct-v0.3
mistral-nemo-base-2407	AI-ModelScope/Mistral-Nemo-Base-2407	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Nemo-Base-2407
mistral-nemo-instruct-2407	AI-ModelScope/Mistral-Nemo-Instruct-2407	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Nemo-Instruct-2407
mistral-large-instruct-2407	LLM-Research/Mistral-Large-Instruct-2407	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Large-Instruct-2407
mistral-small-instruct-2409	AI-ModelScope/Mistral-Small-Instruct-2409	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.43	-	mistralai/Mistral-Small-Instruct-2409
mixtral-moe-7b	AI-ModelScope/Mixtral-8x7B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.36	moe	mistralai/Mixtral-8x7B-v0.1
mixtral-moe-7b-instruct	AI-ModelScope/Mixtral-8x7B-Instruct-v0.1	q_proj, k_proj, v_proj	llama	✔	✔	✘	✘	transformers>=4.36	moe	mistralai/Mixtral-8x7B-Instruct-v0.1
mixtral-moe-7b-aqlm-2bit-1x16	AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘	transformers>=4.38, aqlm, torch>=2.2.0	moe	ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf
mixtral-moe-8x22b-v1	AI-ModelScope/Mixtral-8x22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.36	moe	mistral-community/Mixtral-8x22B-v0.1
ministral-8b-instruct-2410	AI-ModelScope/Ministral-8B-Instruct-2410	q_proj, k_proj, v_proj	mistral-nemo	✔	✔	✘	✘	transformers>=4.46	-	mistralai/Ministral-8B-Instruct-2410
wizardlm2-7b-awq	AI-ModelScope/WizardLM-2-7B-AWQ	q_proj, k_proj, v_proj	wizardlm2-awq	✔	✔	✘	✘	transformers>=4.34	-	MaziyarPanahi/WizardLM-2-7B-AWQ
wizardlm2-8x22b	AI-ModelScope/WizardLM-2-8x22B	q_proj, k_proj, v_proj	wizardlm2	✔	✔	✘	✘	transformers>=4.36	-	alpindale/WizardLM-2-8x22B
baichuan-7b	baichuan-inc/baichuan-7B	W_pack	default-generation	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-7B
baichuan-13b	baichuan-inc/Baichuan-13B-Base	W_pack	default-generation	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-13B-Base
baichuan-13b-chat	baichuan-inc/Baichuan-13B-Chat	W_pack	baichuan	✘	✔	✔	✘	transformers<4.34	-	baichuan-inc/Baichuan-13B-Chat
baichuan2-7b	baichuan-inc/Baichuan2-7B-Base	W_pack	default-generation	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-7B-Base
baichuan2-7b-chat	baichuan-inc/Baichuan2-7B-Chat	W_pack	baichuan	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-7B-Chat
baichuan2-7b-chat-int4	baichuan-inc/Baichuan2-7B-Chat-4bits	W_pack	baichuan	✘	✘	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-7B-Chat-4bits
baichuan2-13b	baichuan-inc/Baichuan2-13B-Base	W_pack	default-generation	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-13B-Base
baichuan2-13b-chat	baichuan-inc/Baichuan2-13B-Chat	W_pack	baichuan	✘	✔	✔	✘		-	baichuan-inc/Baichuan2-13B-Chat
baichuan2-13b-chat-int4	baichuan-inc/Baichuan2-13B-Chat-4bits	W_pack	baichuan	✘	✘	✘	✘	bitsandbytes<0.41.2, accelerate<0.26	-	baichuan-inc/Baichuan2-13B-Chat-4bits
yuan2-2b-instruct	YuanLLM/Yuan2.0-2B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-2B-hf
yuan2-2b-janus-instruct	YuanLLM/Yuan2-2B-Janus-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-2B-Janus-hf
yuan2-51b-instruct	YuanLLM/Yuan2.0-51B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-51B-hf
yuan2-102b-instruct	YuanLLM/Yuan2.0-102B-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		-	IEITYuan/Yuan2-102B-hf
yuan2-m32	YuanLLM/Yuan2-M32-hf	q_proj, k_proj, v_proj	yuan	✔	✘	✘	✘		moe	IEITYuan/Yuan2-M32-hf
xverse-7b	xverse/XVERSE-7B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-7B
xverse-7b-chat	xverse/XVERSE-7B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-7B-Chat
xverse-13b	xverse/XVERSE-13B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-13B
xverse-13b-chat	xverse/XVERSE-13B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-13B-Chat
xverse-65b	xverse/XVERSE-65B	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-65B
xverse-65b-v2	xverse/XVERSE-65B-2	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-65B-2
xverse-65b-chat	xverse/XVERSE-65B-Chat	q_proj, k_proj, v_proj	xverse	✘	✔	✘	✘		-	xverse/XVERSE-65B-Chat
xverse-13b-256k	xverse/XVERSE-13B-256K	q_proj, k_proj, v_proj	default-generation	✘	✔	✘	✘		-	xverse/XVERSE-13B-256K
xverse-moe-a4_2b	xverse/XVERSE-MoE-A4.2B	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		moe	xverse/XVERSE-MoE-A4.2B
orion-14b	OrionStarAI/Orion-14B-Base	q_proj, k_proj, v_proj	default-generation	✔	✘	✘	✘		-	OrionStarAI/Orion-14B-Base
orion-14b-chat	OrionStarAI/Orion-14B-Chat	q_proj, k_proj, v_proj	orion	✔	✘	✘	✘		-	OrionStarAI/Orion-14B-Chat
bluelm-7b	vivo-ai/BlueLM-7B-Base	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Base
bluelm-7b-32k	vivo-ai/BlueLM-7B-Base-32K	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Base-32K
bluelm-7b-chat	vivo-ai/BlueLM-7B-Chat	q_proj, k_proj, v_proj	bluelm	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Chat
bluelm-7b-chat-32k	vivo-ai/BlueLM-7B-Chat-32K	q_proj, k_proj, v_proj	bluelm	✘	✘	✘	✘		-	vivo-ai/BlueLM-7B-Chat-32K
ziya2-13b	Fengshenbang/Ziya2-13B-Base	q_proj, k_proj, v_proj	default-generation	✔	✔	✔	✘		-	IDEA-CCNL/Ziya2-13B-Base
ziya2-13b-chat	Fengshenbang/Ziya2-13B-Chat	q_proj, k_proj, v_proj	ziya	✔	✔	✔	✘		-	IDEA-CCNL/Ziya2-13B-Chat
skywork-13b	skywork/Skywork-13B-base	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	Skywork/Skywork-13B-base
skywork-13b-chat	skywork/Skywork-13B-chat	q_proj, k_proj, v_proj	skywork	✘	✘	✘	✘		-	-
zephyr-7b-beta-chat	modelscope/zephyr-7b-beta	q_proj, k_proj, v_proj	zephyr	✔	✔	✔	✘	transformers>=4.34	-	HuggingFaceH4/zephyr-7b-beta
polylm-13b	damo/nlp_polylm_13b_text_generation	c_attn	default-generation	✘	✘	✘	✘		-	DAMO-NLP-MT/polylm-13b
seqgpt-560m	damo/nlp_seqgpt-560m	query_key_value	default-generation	✘	✔	✘	✘		-	DAMO-NLP/SeqGPT-560M
sus-34b-chat	SUSTC/SUS-Chat-34B	q_proj, k_proj, v_proj	sus	✔	✔	✔	✘		-	SUSTech/SUS-Chat-34B
tongyi-finance-14b	TongyiFinance/Tongyi-Finance-14B	c_attn	default-generation	✔	✔	✔	✘		financial	-
tongyi-finance-14b-chat	TongyiFinance/Tongyi-Finance-14B-Chat	c_attn	qwen	✔	✔	✔	✘		financial	jxy/Tongyi-Finance-14B-Chat
tongyi-finance-14b-chat-int4	TongyiFinance/Tongyi-Finance-14B-Chat-Int4	c_attn	qwen	✔	✔	✘	✘	auto_gptq>=0.5	financial	jxy/Tongyi-Finance-14B-Chat-Int4
codefuse-codellama-34b-chat	codefuse-ai/CodeFuse-CodeLlama-34B	q_proj, k_proj, v_proj	codefuse-codellama	✔	✔	✔	✘		coding	codefuse-ai/CodeFuse-CodeLlama-34B
codefuse-codegeex2-6b-chat	codefuse-ai/CodeFuse-CodeGeeX2-6B	query_key_value	codefuse	✘	✔	✘	✘	transformers<4.34	coding	codefuse-ai/CodeFuse-CodeGeeX2-6B
codefuse-qwen-14b-chat	codefuse-ai/CodeFuse-QWen-14B	c_attn	codefuse	✔	✔	✔	✘		coding	codefuse-ai/CodeFuse-QWen-14B
phi2-3b	AI-ModelScope/phi-2	Wqkv	default-generation	✔	✔	✘	✘		coding	microsoft/phi-2
phi3-4b-4k-instruct	LLM-Research/Phi-3-mini-4k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-mini-4k-instruct
phi3-4b-128k-instruct	LLM-Research/Phi-3-mini-128k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-mini-128k-instruct
phi3-small-8k-instruct	LLM-Research/Phi-3-small-8k-instruct	query_key_value	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-small-8k-instruct
phi3-medium-4k-instruct	LLM-Research/Phi-3-medium-4k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-medium-4k-instruct
phi3-small-128k-instruct	LLM-Research/Phi-3-small-128k-instruct	query_key_value	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-small-128k-instruct
phi3-medium-128k-instruct	LLM-Research/Phi-3-medium-128k-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3-medium-128k-instruct
phi3_5-mini-instruct	LLM-Research/Phi-3.5-mini-instruct	qkv_proj	phi3	✔	✔	✘	✘	transformers>=4.36	-	microsoft/Phi-3.5-mini-instruct
phi3_5-moe-instruct	LLM-Research/Phi-3.5-MoE-instruct	q_proj, k_proj, v_proj	phi3	✔	✔	✘	✘	transformers>=4.36	moe	microsoft/Phi-3.5-MoE-instruct
mamba-130m	AI-ModelScope/mamba-130m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-130m-hf
mamba-370m	AI-ModelScope/mamba-370m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-370m-hf
mamba-390m	AI-ModelScope/mamba-390m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-390m-hf
mamba-790m	AI-ModelScope/mamba-790m-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-790m-hf
mamba-1.4b	AI-ModelScope/mamba-1.4b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-1.4b-hf
mamba-2.8b	AI-ModelScope/mamba-2.8b-hf	in_proj, x_proj, embeddings, out_proj	default-generation	✘	✘	✘	✘	transformers>=4.39.0	-	state-spaces/mamba-2.8b-hf
telechat-7b	TeleAI/TeleChat-7B	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/telechat-7B
telechat-12b	TeleAI/TeleChat-12B	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/TeleChat-12B
telechat-12b-v2	TeleAI/TeleChat-12B-v2	key_value, query	telechat	✔	✘	✘	✘		-	Tele-AI/TeleChat-12B-v2
telechat-12b-v2-gptq-int4	swift/TeleChat-12B-V2-GPTQ-Int4	key_value, query	telechat	✔	✘	✘	✘	auto_gptq>=0.5	-	-
telechat2-115b	TeleAI/TeleChat2-115B	key_value, query	telechat2	✔	✘	✘	✘		-	Tele-AI/TeleChat2-115B
grok-1	colossalai/grok-1-pytorch	q_proj, k_proj, v_proj	default-generation	✘	✘	✘	✘		-	hpcai-tech/grok-1
dbrx-instruct	AI-ModelScope/dbrx-instruct	attn.Wqkv	dbrx	✔	✔	✘	✘	transformers>=4.36	moe	databricks/dbrx-instruct
dbrx-base	AI-ModelScope/dbrx-base	attn.Wqkv	dbrx	✔	✔	✘	✘	transformers>=4.36	moe	databricks/dbrx-base
mengzi3-13b-base	langboat/Mengzi3-13B-Base	q_proj, k_proj, v_proj	mengzi	✔	✔	✘	✘		-	Langboat/Mengzi3-13B-Base
c4ai-command-r-v01	AI-ModelScope/c4ai-command-r-v01	q_proj, k_proj, v_proj	c4ai	✔	✔	✘	✘	transformers>=4.39.1	-	CohereForAI/c4ai-command-r-v01
c4ai-command-r-plus	AI-ModelScope/c4ai-command-r-plus	q_proj, k_proj, v_proj	c4ai	✔	✔	✘	✘	transformers>4.39	-	CohereForAI/c4ai-command-r-plus
aya-expanse-8b	AI-ModelScope/aya-expanse-8b	q_proj, k_proj, v_proj	aya	✔	✔	✘	✘	transformers>=4.44.0	-	CohereForAI/aya-expanse-8b
aya-expanse-32b	AI-ModelScope/aya-expanse-32b	q_proj, k_proj, v_proj	aya	✔	✔	✘	✘	transformers>=4.44.0	-	CohereForAI/aya-expanse-32b
codestral-22b	swift/Codestral-22B-v0.1	q_proj, k_proj, v_proj	default-generation	✔	✔	✘	✘	transformers>=4.34	-	mistralai/Codestral-22B-v0.1
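
The rows above are tab-separated and follow the column order of the table header. As a minimal sketch of how one row could be read programmatically, the snippet below splits a row into named fields and turns the ✔/✘ marks into booleans. The helper `parse_row` and the constants `COLUMNS` and `FLAG_COLUMNS` are hypothetical names introduced for illustration only; they are not part of swift.

```python
# Column names copied from the table header, in order.
COLUMNS = [
    "Model Type", "Model ID", "Default Lora Target Modules", "Default Template",
    "Support Flash Attn", "Support vLLM", "Support LMDeploy", "Support Megatron",
    "Requires", "Tags", "HF Model ID",
]
# Columns that hold a ✔ / ✘ mark.
FLAG_COLUMNS = (
    "Support Flash Attn", "Support vLLM", "Support LMDeploy", "Support Megatron",
)


def parse_row(line: str) -> dict:
    """Parse one tab-separated table row into a dict (hypothetical helper)."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    # ✔ means supported; anything else (✘) is treated as unsupported.
    for name in FLAG_COLUMNS:
        row[name] = row[name] == "✔"
    # Comma-separated cells become lists; an empty cell becomes an empty list.
    row["Default Lora Target Modules"] = [
        m.strip() for m in row["Default Lora Target Modules"].split(",") if m.strip()
    ]
    row["Requires"] = [r.strip() for r in row["Requires"].split(",") if r.strip()]
    # "-" in the Tags column means the model has no tags.
    row["Tags"] = (
        [] if row["Tags"] == "-" else [t.strip() for t in row["Tags"].split(",")]
    )
    return row


# A row copied from the table above.
example = (
    "yi-6b-chat-awq\t01ai/Yi-6B-Chat-4bits\tq_proj, k_proj, v_proj\tchatml"
    "\t✔\t✔\t✔\t✘\tautoawq\t-\t01-ai/Yi-6B-Chat-4bits"
)
row = parse_row(example)
print(row["Model Type"], row["Support vLLM"], row["Requires"])
```

Splitting the target-module cell on commas is only meant for rows like these, whose cells list plain module names.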

Multimodal Large Models

Model Type	Model ID	Default Lora Target Modules	Default Template	Support Flash Attn	Support vLLM	Support LMDeploy	Support Megatron	Requires	Tags	HF Model ID
qwen-vl	qwen/Qwen-VL	^(transformer.h)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen-vl-generation	✔	✔	✔	✘		vision	Qwen/Qwen-VL
qwen-vl-chat	qwen/Qwen-VL-Chat	^(transformer.h)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen-vl	✔	✔	✔	✘		vision	Qwen/Qwen-VL-Chat
qwen-vl-chat-int4	qwen/Qwen-VL-Chat-Int4	^(transformer.h)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen-vl	✔	✔	✘	✘	auto_gptq>=0.5	vision	Qwen/Qwen-VL-Chat-Int4
qwen-audio	qwen/Qwen-Audio	^(transformer.h)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen-audio-generation	✔	✘	✘	✘		audio	Qwen/Qwen-Audio
qwen-audio-chat	qwen/Qwen-Audio-Chat	^(transformer.h)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen-audio	✔	✘	✘	✘		audio	Qwen/Qwen-Audio-Chat
qwen2-audio-7b	qwen/Qwen2-Audio-7B	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-audio-generation	✔	✘	✘	✘	librosa, transformers>=4.45	audio	Qwen/Qwen2-Audio-7B
qwen2-audio-7b-instruct	qwen/Qwen2-Audio-7B-Instruct	^(language_model\|multi_modal_projector)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-audio	✔	✘	✘	✘	librosa, transformers>=4.45	audio	Qwen/Qwen2-Audio-7B-Instruct
qwen2-vl-2b	qwen/Qwen2-VL-2B	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-2B
qwen2-vl-2b-instruct	qwen/Qwen2-VL-2B-Instruct	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-2B-Instruct
qwen2-vl-2b-instruct-gptq-int4	qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4
qwen2-vl-2b-instruct-gptq-int8	qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
qwen2-vl-2b-instruct-awq	qwen/Qwen2-VL-2B-Instruct-AWQ	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-2B-Instruct-AWQ
qwen2-vl-7b	qwen/Qwen2-VL-7B	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-7B
qwen2-vl-7b-instruct	qwen/Qwen2-VL-7B-Instruct	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-7B-Instruct
qwen2-vl-7b-instruct-gptq-int4	qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4
qwen2-vl-7b-instruct-gptq-int8	qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8
qwen2-vl-7b-instruct-awq	qwen/Qwen2-VL-7B-Instruct-AWQ	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-7B-Instruct-AWQ
qwen2-vl-72b	qwen/Qwen2-VL-72B	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl-generation	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-72B
qwen2-vl-72b-instruct	qwen/Qwen2-VL-72B-Instruct	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils	vision, video	Qwen/Qwen2-VL-72B-Instruct
qwen2-vl-72b-instruct-gptq-int4	qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4
qwen2-vl-72b-instruct-gptq-int8	qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5	vision, video	Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8
qwen2-vl-72b-instruct-awq	qwen/Qwen2-VL-72B-Instruct-AWQ	^(model)(?!.(lm_head\|output\|emb\|wte\|shared)).	qwen2-vl	✔	✔	✘	✘	transformers>=4.45.dev.0, qwen_vl_utils, autoawq	vision, video	Qwen/Qwen2-VL-72B-Instruct-AWQ
glm4v-9b-chat	ZhipuAI/glm-4v-9b	^(transformer.encoder)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm4v	✘	✘	✘	✘	transformers>=4.42	vision	THUDM/glm-4v-9b
glm-edge-v-2b	ZhipuAI/glm-edge-v-2b	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm-edge-v	✔	✘	✘	✘	transformers>=4.46	vision	THUDM/glm-edge-v-2b
glm-edge-v-5b	ZhipuAI/glm-edge-v-5b	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	glm-edge-v	✔	✘	✘	✘	transformers>=4.46	vision	THUDM/glm-edge-v-5b
llama3_2-11b-vision	LLM-Research/Llama-3.2-11B-Vision	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision-generation	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-11B-Vision
llama3_2-11b-vision-instruct	LLM-Research/Llama-3.2-11B-Vision-Instruct	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-11B-Vision-Instruct
llama3_2-90b-vision	LLM-Research/Llama-3.2-90B-Vision	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision-generation	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-90B-Vision
llama3_2-90b-vision-instruct	LLM-Research/Llama-3.2-90B-Vision-Instruct	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_2-vision	✔	✔	✘	✘	transformers>=4.45	vision	meta-llama/Llama-3.2-90B-Vision-Instruct
llama3_1-8b-omni	ICTNLP/Llama-3.1-8B-Omni	^(model.layers\|model.speech_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3_1-omni	✔	✘	✘	✘	whisper, openai-whisper	audio	ICTNLP/Llama-3.1-8B-Omni
idefics3-8b-llama3	AI-ModelScope/Idefics3-8B-Llama3	^(model.text_model\|model.connector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	idefics3	✔	✘	✘	✘	transformers>=4.45	vision	HuggingFaceM4/Idefics3-8B-Llama3
llava1_5-7b-instruct	swift/llava-1.5-7b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava1_5	✔	✔	✘	✘	transformers>=4.36	vision	llava-hf/llava-1.5-7b-hf
llava1_5-13b-instruct	swift/llava-1.5-13b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava1_5	✔	✔	✘	✘	transformers>=4.36	vision	llava-hf/llava-1.5-13b-hf
llava1_6-mistral-7b-instruct	swift/llava-v1.6-mistral-7b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-mistral	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-mistral-7b-hf
llava1_6-vicuna-7b-instruct	swift/llava-v1.6-vicuna-7b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-vicuna	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-vicuna-7b-hf
llava1_6-vicuna-13b-instruct	swift/llava-v1.6-vicuna-13b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-vicuna	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-vicuna-13b-hf
llava1_6-llama3_1-8b-instruct	swift/llava-llama3.1-8b	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-next-llama3	✔	✘	✘	✘	transformers>=4.41	vision	-
llava1_6-yi-34b-instruct	swift/llava-v1.6-34b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-yi	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-v1.6-34b-hf
llama3-llava-next-8b-hf	swift/llama3-llava-next-8b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama-llava-next-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llama3-llava-next-8b-hf
llava-next-72b-hf	AI-ModelScope/llava-next-72b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama-qwen-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-next-72b-hf
llava-next-110b-hf	AI-ModelScope/llava-next-110b-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama-qwen-hf	✔	✔	✘	✘	transformers>=4.39	vision	llava-hf/llava-next-110b-hf
llava-onevision-qwen2-0_5b-ov	AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-0.5b-ov-hf
llava-onevision-qwen2-7b-ov	AI-ModelScope/llava-onevision-qwen2-7b-ov-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-7b-ov-hf
llava-onevision-qwen2-72b-ov	AI-ModelScope/llava-onevision-qwen2-72b-ov-hf	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-onevision-qwen	✔	✘	✘	✘	transformers>=4.45	vision, video	llava-hf/llava-onevision-qwen2-72b-ov-hf
llama3-llava-next-8b	AI-Modelscope/llama3-llava-next-8b	^(model.layers\|model.mm_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llama3-llava-next	✔	✘	✘	✘		vision	lmms-lab/llama3-llava-next-8b
llava-next-72b	AI-Modelscope/llava-next-72b	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-qwen	✔	✘	✘	✘		vision	lmms-lab/llava-next-72b
llava-next-110b	AI-Modelscope/llava-next-110b	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-qwen	✔	✘	✘	✘		vision	lmms-lab/llava-next-110b
llava-next-video-7b-instruct	swift/LLaVA-NeXT-Video-7B-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-hf
llava-next-video-7b-32k-instruct	swift/LLaVA-NeXT-Video-7B-32K-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-32K-hf
llava-next-video-7b-dpo-instruct	swift/LLaVA-NeXT-Video-7B-DPO-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-next-video	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
llava-next-video-34b-instruct	swift/LLaVA-NeXT-Video-34B-hf	^(language_model\|multi_modal_projector\|vision_resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-next-video-yi	✔	✔	✘	✘	transformers>=4.42, av	video	llava-hf/LLaVA-NeXT-Video-34B-hf
yi-vl-6b-chat	01ai/Yi-VL-6B	^(model.layers\|model.mm_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	yi-vl	✔	✘	✘	✘	transformers>=4.34	vision	01-ai/Yi-VL-6B
yi-vl-34b-chat	01ai/Yi-VL-34B	^(model.layers\|model.mm_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	yi-vl	✔	✘	✘	✘	transformers>=4.34	vision	01-ai/Yi-VL-34B
llava-llama3-8b-v1_1	AI-ModelScope/llava-llama-3-8b-v1_1-transformers	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	llava-llama-instruct	✔	✔	✘	✘	transformers>=4.36	vision	xtuner/llava-llama-3-8b-v1_1-transformers
internlm-xcomposer2-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2-7b
internlm-xcomposer2-4khd-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2-4khd	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2-4khd-7b
internlm-xcomposer2_5-7b-chat	Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b	attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3	internlm-xcomposer2_5	✔	✘	✔	✘		vision	internlm/internlm-xcomposer2d5-7b
internvl-chat-v1_5	AI-ModelScope/InternVL-Chat-V1-5	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl	✔	✔	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5
internvl-chat-v1_5-int8	AI-ModelScope/InternVL-Chat-V1-5-int8	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl	✔	✘	✘	✘	transformers>=4.35, timm	vision	OpenGVLab/InternVL-Chat-V1-5-int8
mini-internvl-chat-2b-v1_5	OpenGVLab/Mini-InternVL-Chat-2B-V1-5	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl	✔	✔	✔	✘	transformers>=4.35, timm	vision	OpenGVLab/Mini-InternVL-Chat-2B-V1-5
mini-internvl-chat-4b-v1_5	OpenGVLab/Mini-InternVL-Chat-4B-V1-5	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl-phi3	✔	✔	✘	✘	transformers>=4.35,<4.42, timm	vision	OpenGVLab/Mini-InternVL-Chat-4B-V1-5
internvl2-1b	OpenGVLab/InternVL2-1B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-1B
internvl2-2b	OpenGVLab/InternVL2-2B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-2B
internvl2-4b	OpenGVLab/InternVL2-4B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2-phi3	✔	✔	✔	✘	transformers>=4.36,<4.42, timm	vision, video	OpenGVLab/InternVL2-4B
internvl2-8b	OpenGVLab/InternVL2-8B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-8B
internvl2-26b	OpenGVLab/InternVL2-26B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-26B
internvl2-40b	OpenGVLab/InternVL2-40B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-40B
internvl2-llama3-76b	OpenGVLab/InternVL2-Llama3-76B	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-Llama3-76B
internvl2-2b-awq	OpenGVLab/InternVL2-2B-AWQ	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-2B-AWQ
internvl2-8b-awq	OpenGVLab/InternVL2-8B-AWQ	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-8B-AWQ
internvl2-26b-awq	OpenGVLab/InternVL2-26B-AWQ	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-26B-AWQ
internvl2-40b-awq	OpenGVLab/InternVL2-40B-AWQ	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-40B-AWQ
internvl2-llama3-76b-awq	OpenGVLab/InternVL2-Llama3-76B-AWQ	^(language_model\|mlp1)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	internvl2	✔	✔	✔	✘	transformers>=4.36, timm	vision, video	OpenGVLab/InternVL2-Llama3-76B-AWQ
deepseek-janus-1_3b	deepseek-ai/Janus-1.3B	^(language_model\|aligner)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	deepseek-janus	✔	✘	✘	✘		vision	deepseek-ai/Janus-1.3B
deepseek-vl-1_3b-chat	deepseek-ai/deepseek-vl-1.3b-chat	^(language_model\|aligner)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	deepseek-vl	✔	✘	✔	✘		vision	deepseek-ai/deepseek-vl-1.3b-chat
deepseek-vl-7b-chat	deepseek-ai/deepseek-vl-7b-chat	^(language_model\|aligner)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	deepseek-vl	✔	✘	✔	✘		vision	deepseek-ai/deepseek-vl-7b-chat
ovis1_6-gemma2-9b	AIDC-AI/Ovis1.6-Gemma2-9B	^(llm)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	ovis1_6	✔	✘	✘	✘	transformers>=4.42	vision	AIDC-AI/Ovis1.6-Gemma2-9B
paligemma-3b-pt-224	AI-ModelScope/paligemma-3b-pt-224	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-224
paligemma-3b-pt-448	AI-ModelScope/paligemma-3b-pt-448	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-448
paligemma-3b-pt-896	AI-ModelScope/paligemma-3b-pt-896	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-pt-896
paligemma-3b-mix-224	AI-ModelScope/paligemma-3b-mix-224	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-mix-224
paligemma-3b-mix-448	AI-ModelScope/paligemma-3b-mix-448	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	paligemma	✔	✔	✘	✘	transformers>=4.41	vision	google/paligemma-3b-mix-448
minicpm-v-3b-chat	OpenBMB/MiniCPM-V	^(llm\|resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	minicpm-v	✔	✘	✘	✘	timm, transformers<4.42	vision	openbmb/MiniCPM-V
minicpm-v-v2-chat	OpenBMB/MiniCPM-V-2	^(llm\|resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	minicpm-v	✔	✘	✘	✘	timm, transformers<4.42	vision	openbmb/MiniCPM-V-2
minicpm-v-v2_5-chat	OpenBMB/MiniCPM-Llama3-V-2_5	^(llm\|resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	minicpm-v-v2_5	✔	✔	✘	✘	timm, transformers>=4.36	vision	openbmb/MiniCPM-Llama3-V-2_5
minicpm-v-v2_6-chat	OpenBMB/MiniCPM-V-2_6	^(llm\|resampler)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	minicpm-v-v2_6	✔	✔	✘	✘	timm, transformers>=4.36	vision, video	openbmb/MiniCPM-V-2_6
pixtral-12b	AI-ModelScope/pixtral-12b	^(language_model\|multi_modal_projector)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	pixtral	✘	✘	✘	✘	transformers>=4.45	vision	mistral-community/pixtral-12b
mplug-owl2-chat	iic/mPLUG-Owl2	q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1	mplug-owl2	✔	✘	✘	✘	transformers<4.35, icecream	vision	MAGAer13/mplug-owl2-llama2-7b
mplug-owl2_1-chat	iic/mPLUG-Owl2.1	c_attn.multiway.0, c_attn.multiway.1	mplug-owl2	✔	✘	✘	✘	transformers<4.35, icecream	vision	Mizukiluke/mplug_owl_2_1
mplug-owl3-1b-chat	iic/mPLUG-Owl3-1B-241014	^(language_model\|vision2text_model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-1B-241014
mplug-owl3-2b-chat	iic/mPLUG-Owl3-2B-241014	^(language_model\|vision2text_model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-2B-241014
mplug-owl3-7b-chat	iic/mPLUG-Owl3-7B-240728	^(language_model\|vision2text_model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	mplug_owl3	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-7B-240728
mplug-owl3v-7b-chat	iic/mPLUG-Owl3-7B-241101	^(language_model\|vision2text_model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	mplug_owl3v	✔	✘	✘	✘	transformers>=4.36, icecream	vision, video	mPLUG/mPLUG-Owl3-7B-241101
phi3-vision-128k-instruct	LLM-Research/Phi-3-vision-128k-instruct	^(model.layers\|model.vision_embed_tokens.img_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	phi3-vl	✔	✔	✘	✘	transformers>=4.36	vision	microsoft/Phi-3-vision-128k-instruct
phi3_5-vision-instruct	LLM-Research/Phi-3.5-vision-instruct	^(model.layers\|model.vision_embed_tokens.img_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	phi3-vl	✔	✔	✘	✘	transformers>=4.36	vision	microsoft/Phi-3.5-vision-instruct
cogvlm-17b-chat	ZhipuAI/cogvlm-chat	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogvlm	✘	✘	✘	✘	transformers<4.42	vision	THUDM/cogvlm-chat-hf
cogvlm2-19b-chat	ZhipuAI/cogvlm2-llama3-chinese-chat-19B	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogvlm	✘	✘	✔	✘	transformers<4.42	vision	THUDM/cogvlm2-llama3-chinese-chat-19B
cogvlm2-en-19b-chat	ZhipuAI/cogvlm2-llama3-chat-19B	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogvlm	✘	✘	✔	✘	transformers<4.42	vision	THUDM/cogvlm2-llama3-chat-19B
cogvlm2-video-13b-chat	ZhipuAI/cogvlm2-video-llama3-chat	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogvlm2-video	✘	✘	✘	✘	decord, pytorchvideo, transformers>=4.42	vision, video	THUDM/cogvlm2-video-llama3-chat
cogagent-18b-chat	ZhipuAI/cogagent-chat	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogagent-chat	✘	✘	✘	✘	timm	vision	THUDM/cogagent-chat-hf
cogagent-18b-instruct	ZhipuAI/cogagent-vqa	^(model.layers)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	cogagent-instruct	✘	✘	✘	✘	timm	vision	THUDM/cogagent-vqa-hf
molmoe-1b	LLM-Research/MolmoE-1B-0924	^(model.transformer)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/MolmoE-1B-0924
molmo-7b-o	LLM-Research/Molmo-7B-O-0924	^(model.transformer)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-7B-O-0924
molmo-7b-d	LLM-Research/Molmo-7B-D-0924	^(model.transformer)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-7B-D-0924
molmo-72b	LLM-Research/Molmo-72B-0924	^(model.transformer)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	molmo	✔	✘	✘	✘	transformers>=4.45.0	vision	allenai/Molmo-72B-0924
emu3-chat	BAAI/Emu3-Chat	^(model)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	emu3-chat	✔	✘	✘	✘	transformers>=4.44.0	vision	BAAI/Emu3-Chat
florence-2-base	AI-ModelScope/Florence-2-base	^(language_model\|image_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-base
florence-2-base-ft	AI-ModelScope/Florence-2-base-ft	^(language_model\|image_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-base-ft
florence-2-large	AI-ModelScope/Florence-2-large	^(language_model\|image_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-large
florence-2-large-ft	AI-ModelScope/Florence-2-large-ft	^(language_model\|image_projection)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	florence	✔	✘	✘	✘		vision	microsoft/Florence-2-large-ft
got-ocr2	stepfun-ai/GOT-OCR2_0	^(model.layers\|model.mm_projector_vary)(?!.*(lm_head\|output\|emb\|wte\|shared)).*	got_ocr2	✔	✘	✘	✘		vision	stepfun-ai/GOT-OCR2_0
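
As a rough illustration of how the regular expressions in the Default Lora Target Modules column pick out modules, the sketch below applies one of them to a few module names. It assumes the pattern is meant to read `^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*`, with `\|` in the table being an escaped `|`, and that a module is selected when its full name matches. The module names and the `select_modules` helper are hypothetical and are not part of swift.

```python
import re

# Assumed form of the table pattern: an allowed prefix, then a negative
# lookahead that rejects names containing any excluded fragment.
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"


def select_modules(names, pattern=PATTERN):
    """Return the module names that the pattern matches in full (hypothetical helper)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


# Hypothetical module names, chosen only to exercise each branch of the pattern.
names = [
    "language_model.model.layers.0.self_attn.q_proj",  # allowed prefix, kept
    "language_model.model.embed_tokens",               # contains "emb", rejected
    "language_model.lm_head",                          # contains "lm_head", rejected
    "multi_modal_projector.linear_1",                  # allowed prefix, kept
    "vision_tower.encoder.layers.0.mlp.fc1",           # prefix not listed, rejected
]
selected = select_modules(names)
print(selected)
```

Under these assumptions only the attention projection and the projector layer are selected; embeddings, the output head, and the vision tower are left out.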

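Datasets

The table below describes the datasets integrated into swift:

Dataset Name: the dataset_name under which the dataset is registered in swift.
Dataset ID: the dataset_id of the dataset on ModelScope.
Dataset Size: the number of samples in the dataset.
Statistic (token): statistics of the dataset, measured in tokens, which helps when tuning the max_length hyperparameter. The training and validation sets of each dataset are concatenated before the statistics are computed, and the text is tokenized with the qwen tokenizer. Different tokenizers give different statistics; to get token statistics for another model's tokenizer, you can compute them yourself with a script.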
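
The statistic described above can be sketched as follows. This is a minimal illustration, not swift's own script: the `token_stats` helper and the sample texts are hypothetical, whitespace splitting stands in for the qwen tokenizer, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Callable, Iterable, List


def token_stats(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> str:
    """Summarize per-sample token counts as 'mean±std, min=..., max=...'."""
    lengths = [len(tokenize(text)) for text in texts]
    if not lengths:
        raise ValueError("no samples to summarize")
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)  # population std: an assumption
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


# Hypothetical splits; whitespace splitting is a stand-in for a real tokenizer.
train = ["hello world", "a b c d"]
val = ["one two three"]

# Concatenate the training and validation sets, then compute the statistics.
summary = token_stats(train + val, str.split)
print(summary)  # 3.0±0.8, min=2, max=4
```

To get figures for a real model, pass that model's tokenizer function in place of `str.split`; the resulting maximum gives a starting point for choosing max_length.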

Dataset Name	Dataset ID	Subsets	Dataset Size	Statistic (token)	Tags	HF Dataset ID
🔥ms-bench	iic/ms_bench		316820	346.9±443.2, min=22, max=30960	chat, general, multi-round	-
🔥alpaca-en	AI-ModelScope/alpaca-gpt4-data-en		52002	176.2±125.8, min=26, max=740	chat, general	vicgalle/alpaca-gpt4
🔥alpaca-zh	AI-ModelScope/alpaca-gpt4-data-zh		48818	162.1±93.9, min=26, max=856	chat, general	llm-wizard/alpaca-gpt4-data-zh
multi-alpaca	damo/nlp_polylm_multialpaca_sft	ar de es fr id ja ko pt ru th vi	131867	112.9±50.6, min=26, max=1226	chat, general, multilingual	-
instinwild	wyj123456/instinwild	default subset	103695	145.4±60.7, min=28, max=1434	-	-
cot-en	YorickHe/CoT		74771	122.7±64.8, min=51, max=8320	chat, general	-
cot-zh	YorickHe/CoT_zh		74771	117.5±70.8, min=43, max=9636	chat, general	-
instruct-en	wyj123456/instruct		888970	269.1±331.5, min=26, max=7254	chat, general	-
firefly-zh	AI-ModelScope/firefly-train-1.1M		1649399	178.1±260.4, min=26, max=12516	chat, general	YeungNLP/firefly-train-1.1M
gpt4all-en	wyj123456/GPT4all		806199	302.7±384.5, min=27, max=7391	chat, general	-
sharegpt	swift/sharegpt	common-zh computer-zh unknow-zh common-en computer-en	96566	933.3±864.8, min=21, max=66412	chat, general, multi-round	-
tulu-v2-sft-mixture	AI-ModelScope/tulu-v2-sft-mixture		5119	520.7±437.6, min=68, max=2549	chat, multilingual, general, multi-round	allenai/tulu-v2-sft-mixture
wikipedia-zh	AI-ModelScope/wikipedia-cn-20230720-filtered		254547	568.4±713.2, min=37, max=78678	text-generation, general, pretrained	pleisto/wikipedia-cn-20230720-filtered
open-orca	AI-ModelScope/OpenOrca		994896	382.3±417.4, min=31, max=8740	chat, multilingual, general	-
🔥sharegpt-gpt4	AI-ModelScope/sharegpt_gpt4	default V3_format zh_38K_format	72684	1047.6±1313.1, min=22, max=66412	chat, multilingual, general, multi-round, gpt4	-
deepctrl-sft	AI-ModelScope/deepctrl-sft-data	default en	14149024	389.8±628.6, min=21, max=626237	chat, general, sft, multi-round	-
🔥coig-cqia	AI-ModelScope/COIG-CQIA	chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu	44694	703.8±654.2, min=33, max=19288	general	-
🔥ruozhiba	AI-ModelScope/ruozhiba	post-annual title-good title-norm	85658	39.9±13.1, min=21, max=559	pretrain	-
long-alpaca-12k	AI-ModelScope/LongAlpaca-12k		11998	9619.0±8295.8, min=36, max=78925	longlora, QA	Yukang/LongAlpaca-12k
lmsys-chat-1m	AI-ModelScope/lmsys-chat-1m		-	Dataset is too huge, please click the original link to view the dataset stat.	chat, em	lmsys/lmsys-chat-1m
🔥ms-agent	iic/ms_agent		26336	650.9±217.2, min=209, max=2740	chat, agent, multi-round	-
🔥ms-agent-for-agentfabric	AI-ModelScope/ms_agent_for_agentfabric	default addition	30000	617.8±199.1, min=251, max=2657	chat, agent, multi-round	-
ms-agent-multirole	iic/MSAgent-MultiRole		9500	447.6±84.9, min=145, max=1101	chat, agent, multi-round, role-play, multi-agent	-
🔥toolbench-for-alpha-umi	shenweizhou/alpha-umi-toolbench-processed-v2	backbone caller planner summarizer	1448337	1439.7±853.9, min=123, max=18467	chat, agent	-
damo-agent-zh	damo/MSAgent-Bench		386984	956.5±407.3, min=326, max=19001	chat, agent, multi-round	-
damo-agent-zh-mini	damo/MSAgent-Bench		20845	1326.4±329.6, min=571, max=4304	chat, agent, multi-round	-
agent-instruct-all-en	huangjintao/AgentInstruct_copy	alfworld db kg mind2web os webshop	1866	1144.3±635.5, min=206, max=6412	chat, agent, multi-round	-
🔥msagent-pro	iic/MSAgent-Pro		21905	1524.5±921.3, min=64, max=16770	chat, agent, multi-round	-
toolbench	swift/ToolBench		124345	3669.5±1600.9, min=1047, max=22581	chat, agent, multi-round	-
code-alpaca-en	wyj123456/code_alpaca_en		20016	100.2±60.1, min=29, max=1776	-	sahil2801/CodeAlpaca-20k
🔥leetcode-python-en	AI-ModelScope/leetcode-solutions-python		2359	727.1±235.9, min=259, max=2146	chat, coding	-
🔥codefuse-python-en	codefuse-ai/CodeExercise-Python-27k		27224	483.6±193.9, min=45, max=3082	chat, coding	-
🔥codefuse-evol-instruction-zh	codefuse-ai/Evol-instruction-66k		66862	439.6±206.3, min=37, max=2983	chat, coding	-
medical-en	swift/medical_zh	en	117617	257.4±89.1, min=36, max=2564	chat, medical	-
medical-zh	swift/medical_zh	zh	1950972	167.2±219.7, min=26, max=27351	chat, medical	-
🔥disc-med-sft-zh	AI-ModelScope/DISC-Med-SFT		441767	354.1±193.1, min=25, max=2231	chat, medical	Flmc/DISC-Med-SFT
lawyer-llama-zh	AI-ModelScope/lawyer_llama_data		21476	194.4±91.7, min=27, max=924	chat, law	Skepsun/lawyer_llama_data
tigerbot-law-zh	AI-ModelScope/tigerbot-law-plugin		55895	109.9±126.4, min=37, max=18878	text-generation, law, pretrained	TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh	AI-ModelScope/DISC-Law-SFT		166758	533.7±495.4, min=30, max=15169	chat, law	ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh	AI-ModelScope/blossom-math-v2		10000	169.3±58.7, min=35, max=563	chat, math	Azure99/blossom-math-v2
school-math-zh	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	chat, math, quality	BelleGroup/school_math_0.25M
open-platypus-en	AI-ModelScope/Open-Platypus		24926	367.9±254.8, min=30, max=3951	chat, math, quality	garage-bAInd/Open-Platypus
text2sql-en	AI-ModelScope/texttosqlv2_25000_v2		25000	274.6±326.4, min=38, max=1975	chat, sql	Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en	AI-ModelScope/sql-create-context		78577	80.2±17.8, min=36, max=456	chat, sql	b-mc2/sql-create-context
synthetic-text-to-sql	AI-ModelScope/synthetic_text_to_sql	default	100000	283.4±115.8, min=61, max=1356	nl2sql, en	gretelai/synthetic_text_to_sql
🔥advertise-gen-zh	lvjianjin/AdvertiseGen		98399	130.6±21.7, min=51, max=241	text-generation	shibing624/AdvertiseGen
🔥dureader-robust-zh	modelscope/DuReader_robust-QG		17899	241.1±137.4, min=60, max=1416	text-generation	-
cmnli-zh	modelscope/clue	cmnli	404024	82.6±16.6, min=51, max=199	text-generation, classification	clue
🔥jd-sentiment-zh	DAMO_NLP/jd		50000	66.0±83.2, min=39, max=4039	text-generation, classification	-
🔥hc3-zh	simpleai/HC3-Chinese	baike open_qa nlpcc_dbqa finance medicine law psychology	39781	176.8±81.5, min=57, max=3051	text-generation, classification	Hello-SimpleAI/HC3-Chinese
🔥hc3-en	simpleai/HC3	finance medicine	11021	298.3±138.7, min=65, max=2267	text-generation, classification	Hello-SimpleAI/HC3
dolly-15k	AI-ModelScope/databricks-dolly-15k	default	15011	199.2±267.8, min=22, max=8615	multi-task, en, quality	databricks/databricks-dolly-15k
zhihu-kol	OmniData/Zhihu-KOL	default	-	Dataset is too huge, please click the original link to view the dataset stat.	zhihu, qa	wangrui6/Zhihu-KOL
zhihu-kol-filtered	OmniData/Zhihu-KOL-More-Than-100-Upvotes	default	271261	952.0±1727.2, min=25, max=98658	zhihu, qa	bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en	wyj123456/finance_en		68911	135.6±134.3, min=26, max=3525	chat, financial	ssbuild/alpaca_finance_en
poetry-zh	modelscope/chinese-poetry-collection		390309	55.2±9.4, min=23, max=83	text-generation, poetry	-
webnovel-zh	AI-ModelScope/webnovel_cn		50000	1478.9±11526.1, min=100, max=490484	chat, novel	zxbsmk/webnovel_cn
generated-chat-zh	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	chat, character-dialogue	BelleGroup/generated_chat_0.4M
🔥self-cognition	swift/self-cognition		134	53.6±18.6, min=29, max=121	chat, self-cognition	modelscope/self-cognition
🔥swift-mix	swift/swift-sft-mixture	sharegpt firefly codefuse metamathqa	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, sft, general	-
cls-fudan-news-zh	damo/zh_cls_fudan-news		4959	3234.4±2547.5, min=91, max=19548	chat, classification	-
ner-jave-zh	damo/zh_ner-JAVE		1266	118.3±45.5, min=44, max=223	chat, ner	-
coco-en	modelscope/coco_2014_caption	coco_2014_caption	454617	299.8±2.8, min=295, max=352	chat, multi-modal, vision	-
🔥coco-en-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	299.8±2.6, min=295, max=338	chat, multi-modal, vision	-
coco-en-2	modelscope/coco_2014_caption	coco_2014_caption	454617	36.8±2.8, min=32, max=89	chat, multi-modal, vision	-
🔥coco-en-2-mini	modelscope/coco_2014_caption	coco_2014_caption	40504	36.8±2.6, min=32, max=75	chat, multi-modal, vision	-
capcha-images	AI-ModelScope/captcha-images		8000	31.0±0.0, min=31, max=31	chat, multi-modal, vision	-
latex-ocr-print	AI-ModelScope/LaTeX_OCR	default	17918	362.7±34.8, min=294, max=528	chat, ocr, multi-modal, vision	linxy/LaTeX_OCR
latex-ocr-handwrite	AI-ModelScope/LaTeX_OCR	synthetic_handwrite	95424	375.1±59.4, min=292, max=2115	chat, ocr, multi-modal, vision	linxy/LaTeX_OCR
aishell1-zh	speech_asr/speech_asr_aishell1_trainsets		141600	152.2±36.8, min=63, max=419	chat, multi-modal, audio	-
🔥aishell1-zh-mini	speech_asr/speech_asr_aishell1_trainsets		14526	152.2±35.6, min=74, max=359	chat, multi-modal, audio	-
🔥video-chatgpt	swift/VideoChatGPT	Generic Temporal Consistency	3206	88.4±48.3, min=32, max=399	chat, multi-modal, video	lmms-lab/VideoChatGPT
egoschema	AI-ModelScope/egoschema	Subset	101	191.6±80.7, min=96, max=435	chat, multi-modal, video	lmms-lab/egoschema
llava-video-178k	lmms-lab/LLaVA-Video-178K	0_30_s_academic_v0_1 0_30_s_youtube_v0_1 1_2_m_academic_v0_1 1_2_m_youtube_v0_1 2_3_m_academic_v0_1 2_3_m_youtube_v0_1 30_60_s_academic_v0_1 30_60_s_youtube_v0_1	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, video	lmms-lab/LLaVA-Video-178K
moviechat-1k-test	AI-ModelScope/MovieChat-1K-test		486	36.1±4.3, min=27, max=42	chat, multi-modal, video	Enxin/MovieChat-1K-test
hh-rlhf	AI-ModelScope/hh-rlhf	harmless-base helpful-base helpful-online helpful-rejection-sampled	127459	245.4±190.7, min=22, max=1999	rlhf, dpo, pairwise	-
🔥hh-rlhf-cn	AI-ModelScope/hh_rlhf_cn	hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en	355920	171.2±122.7, min=22, max=3078	rlhf, dpo, pairwise	-
orpo-dpo-mix-40k	AI-ModelScope/orpo-dpo-mix-40k	default	43666	548.3±397.4, min=28, max=8483	dpo, orpo, en, quality	mlabonne/orpo-dpo-mix-40k
stack-exchange-paired	AI-ModelScope/stack-exchange-paired		4483004	534.5±594.6, min=31, max=56588	hfrl, dpo, pairwise	lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji	hjh0119/shareAI-Llama3-DPO-zh-en-emoji	default	2449	334.0±162.8, min=36, max=1801	rlhf, dpo, pairwise	-
ultrafeedback-kto	AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto	default	230720	11.0±0.0, min=11, max=11	rlhf, kto	-
rlaif-v	swift/RLAIF-V-Dataset	default	83132	119.8±52.6, min=28, max=556	rlhf, dpo, multi-modal, en	openbmb/RLAIF-V-Dataset
pileval	swift/pile-val-backup		214670	1612.3±8856.2, min=11, max=1208955	text-generation, awq	mit-han-lab/pile-val-backup
mantis-instruct	swift/Mantis-Instruct	birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling	655351	825.7±812.5, min=284, max=13563	chat, multi-modal, vision, quality	TIGER-Lab/Mantis-Instruct
llava-data-instruct	swift/llava-data	llava_instruct	364100	189.0±142.1, min=33, max=5183	sft, multi-modal, quality	TIGER-Lab/llava-data
midefics	swift/MideficsDataset		3800	201.3±70.2, min=60, max=454	medical, en, vqa	WinterSchool/MideficsDataset
gqa	-	train_all_instructions	-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, vqa, quality	lmms-lab/GQA
text-caps	swift/TextCaps		18145	38.2±4.4, min=31, max=73	multi-modal, en, caption, quality	HuggingFaceM4/TextCaps
refcoco-unofficial-caption	swift/refcoco		46215	44.7±3.2, min=36, max=71	multi-modal, en, caption	jxu124/refcoco
refcoco-unofficial-grounding	swift/refcoco		46215	45.2±3.1, min=37, max=69	multi-modal, en, grounding	jxu124/refcoco
refcocog-unofficial-caption	swift/refcocog		44799	49.7±4.7, min=37, max=88	multi-modal, en, caption	jxu124/refcocog
refcocog-unofficial-grounding	swift/refcocog		44799	50.1±4.7, min=37, max=90	multi-modal, en, grounding	jxu124/refcocog
a-okvqa	swift/A-OKVQA		18201	45.8±7.9, min=32, max=100	multi-modal, en, vqa, quality	HuggingFaceM4/A-OKVQA
okvqa	swift/OK-VQA_train		9009	34.4±3.3, min=28, max=59	multi-modal, en, vqa, quality	Multimodal-Fatima/OK-VQA_train
ocr-vqa	swift/OCR-VQA		186753	35.6±6.6, min=29, max=193	multi-modal, en, ocr-vqa	howard-hou/OCR-VQA
grit	swift/GRIT		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, caption-grounding, quality	zzliang/GRIT
llava-instruct-mix	swift/llava-instruct-mix-vsft		13640	179.8±120.2, min=30, max=962	multi-modal, en, vqa, quality	HuggingFaceH4/llava-instruct-mix-vsft
lnqa	swift/lnqa		-	Dataset is too huge, please click the original link to view the dataset stat.	multi-modal, en, ocr-vqa, quality	vikhyatk/lnqa
science-qa	swift/ScienceQA		8315	100.3±59.5, min=38, max=638	multi-modal, science, vqa, quality	derek-thomas/ScienceQA
guanaco	AI-ModelScope/GuanacoDataset	default	31561	250.1±70.3, min=89, max=1436	chat, zh	JosephusCheung/GuanacoDataset
mind2web	swift/Multimodal-Mind2Web		1009	297522.4±325496.2, min=8592, max=3499715	agent, multi-modal	osunlp/Multimodal-Mind2Web
sharegpt-4o-image	AI-ModelScope/ShareGPT-4o	image_caption	57289	638.7±157.9, min=47, max=4640	vqa, multi-modal	OpenGVLab/ShareGPT-4o
pixelprose	swift/pixelprose		-	Dataset is too huge, please click the original link to view the dataset stat.	caption, multi-modal, vision	tomg-group-umd/pixelprose
m3it	AI-ModelScope/M3IT	coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, vision	-
sharegpt4v	AI-ModelScope/ShareGPT4V	ShareGPT4V ShareGPT4V-PT	-	Dataset is too huge, please click the original link to view the dataset stat.	chat, multi-modal, vision	-
llava-instruct-150k	AI-ModelScope/LLaVA-Instruct-150K		624610	490.4±180.2, min=288, max=5438	chat, multi-modal, vision	-
llava-pretrain	AI-ModelScope/LLaVA-Pretrain	default	-	Dataset is too huge, please click the original link to view the dataset stat.	vqa, multi-modal, quality	liuhaotian/LLaVA-Pretrain
sa1b-dense-caption	Tongyi-DataEngine/SA1B-Dense-Caption		-	Dataset is too huge, please click the original link to view the dataset stat.	zh, multi-modal, vqa	-
sa1b-paired-caption	Tongyi-DataEngine/SA1B-Paired-Captions-Images		-	Dataset is too huge, please click the original link to view the dataset stat.	zh, multi-modal, vqa	-
alpaca-cleaned	AI-ModelScope/alpaca-cleaned		51760	177.9±126.4, min=26, max=1044	chat, general, bench, quality	yahma/alpaca-cleaned
aya-collection	swift/aya_collection	aya_dataset	202364	494.0±6911.3, min=21, max=3044268	multi-lingual, qa	CohereForAI/aya_collection
belle-generated-chat-0.4M	AI-ModelScope/generated_chat_0.4M		396004	273.3±52.0, min=32, max=873	common, zh	BelleGroup/generated_chat_0.4M
belle-math-0.25M	AI-ModelScope/school_math_0.25M		248480	157.7±72.2, min=33, max=3450	math, zh	BelleGroup/school_math_0.25M
belle-train-0.5M-CN	AI-ModelScope/train_0.5M_CN		519255	129.1±91.5, min=27, max=6507	common, zh, quality	BelleGroup/train_0.5M_CN
belle-train-1M-CN	AI-ModelScope/train_1M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_1M_CN
belle-train-2M-CN	AI-ModelScope/train_2M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_2M_CN
belle-train-3.5M-CN	swift/train_3.5M_CN		-	Dataset is too huge, please click the original link to view the dataset stat.	common, zh, quality	BelleGroup/train_3.5M_CN
c4	-		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	allenai/c4
chart-qa	swift/ChartQA		28299	43.1±5.5, min=29, max=77	en, vqa, quality	HuggingFaceM4/ChartQA
chinese-c4	swift/chinese-c4		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, zh, quality	shjwudp/chinese-c4
cinepile	swift/cinepile		-	Dataset is too huge, please click the original link to view the dataset stat.	vqa, en, youtube, video	tomg-group-umd/cinepile
classical-chinese-translate	swift/classical_chinese_translate		6655	344.0±76.4, min=61, max=815	chat, play-ground	-
codealpaca-20k	AI-ModelScope/CodeAlpaca-20k		20016	100.2±60.1, min=29, max=1776	code, en	HuggingFaceH4/CodeAlpaca_20K
cosmopedia	-	auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow	-	Dataset is too huge, please click the original link to view the dataset stat.	multi-domain, en, qa	HuggingFaceTB/cosmopedia
cosmopedia-100k	swift/cosmopedia-100k		100000	1024.5±243.1, min=239, max=2981	multi-domain, en, qa	HuggingFaceTB/cosmopedia-100k
dolma	swift/dolma	v1_7	-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	allenai/dolma
dolphin	swift/dolphin	flan1m-alpaca-uncensored flan5m-alpaca-uncensored	-	Dataset is too huge, please click the original link to view the dataset stat.	en	cognitivecomputations/dolphin
duet	AI-ModelScope/Duet-v0.5		5000	1157.4±189.3, min=657, max=2344	CoT, en	G-reen/Duet-v0.5
evol-instruct-v2	AI-ModelScope/WizardLM_evol_instruct_V2_196k		109184	480.9±333.1, min=26, max=4942	chat, en	WizardLM/WizardLM_evol_instruct_V2_196k
fineweb	-		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	HuggingFaceFW/fineweb
gen-qa	swift/GenQA		-	Dataset is too huge, please click the original link to view the dataset stat.	qa, quality, multi-task	tomg-group-umd/GenQA
github-code	swift/github-code		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	codeparrot/github-code
gpt4v-dataset	swift/gpt4v-dataset		12356	217.9±68.3, min=35, max=596	en, caption, multi-modal, quality	laion/gpt4v-dataset
guanaco-belle-merge	AI-ModelScope/guanaco_belle_merge_v1.0		693987	134.2±92.0, min=24, max=6507	QA, zh	Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct	swift/Infinity-Instruct		-	Dataset is too huge, please click the original link to view the dataset stat.	qa, quality, multi-task	BAAI/Infinity-Instruct
llava-med-zh-instruct	swift/llava-med-zh-instruct-60k		56649	207.7±67.6, min=37, max=657	zh, medical, vqa	BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k	ZhipuAI/LongWriter-6k		6000	4887.2±2879.2, min=117, max=30354	long, chat, sft	THUDM/LongWriter-6k
🔥longwriter-6k-filtered	swift/longwriter-6k-filtered		666	4108.9±2636.9, min=1190, max=17050	long, chat, sft	-
math-instruct	AI-ModelScope/MathInstruct		262283	254.4±183.5, min=11, max=4383	math, cot, en, quality	TIGER-Lab/MathInstruct
math-plus	TIGER-Lab/MATH-plus	train	893929	287.1±158.7, min=24, max=2919	qa, math, en, quality	TIGER-Lab/MATH-plus
moondream2-coyo-5M	swift/moondream2-coyo-5M-captions		-	Dataset is too huge, please click the original link to view the dataset stat.	caption, pretrain, quality	isidentical/moondream2-coyo-5M-captions
no-robots	swift/no_robots		9485	298.7±246.4, min=40, max=6739	multi-task, quality, human-annotated	HuggingFaceH4/no_robots
open-hermes	swift/OpenHermes-2.5		-	Dataset is too huge, please click the original link to view the dataset stat.	cot, en, quality	teknium/OpenHermes-2.5
open-o1	AI-ModelScope/OpenO1-SFT	default	203579	615.5±659.6, min=11, max=27509	chat, general, o1	O1-OPEN/OpenO1-SFT
open-orca-chinese	AI-ModelScope/OpenOrca-Chinese		-	Dataset is too huge, please click the original link to view the dataset stat.	QA, zh, general, quality	yys/OpenOrca-Chinese
orca_dpo_pairs	swift/orca_dpo_pairs		12859	366.9±251.9, min=30, max=2010	rlhf, quality	Intel/orca_dpo_pairs
path-vqa	swift/path-vqa		19654	34.8±7.3, min=27, max=85	multi-modal, vqa, medical	flaviagiammarino/path-vqa
pile	AI-ModelScope/pile		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain	EleutherAI/pile
poison-mpts	iic/100PoisonMpts		906	150.6±80.8, min=39, max=656	poison-management, zh	-
🔥qwen2-pro-en	AI-ModelScope/Magpie-Qwen2-Pro-200K-English		200000	605.4±287.3, min=221, max=4267	chat, sft, en	Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered	AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered		300000	555.8±286.6, min=148, max=4267	chat, sft	Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh	AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese		200000	446.2±246.4, min=74, max=4101	chat, sft, zh	Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t	swift/RedPajama-Data-1T		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	togethercomputer/RedPajama-Data-1T
redpajama-data-v2	swift/RedPajama-Data-V2		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	togethercomputer/RedPajama-Data-V2
refinedweb	-		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	tiiuae/falcon-refinedweb
rwkv-pretrain-web	mapjack/openwebtext_dataset		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, zh, quality	-
sft-nectar	AI-ModelScope/SFT-Nectar		131192	396.4±272.1, min=44, max=10732	cot, en, quality	AstraMindAI/SFT-Nectar
skypile	AI-ModelScope/SkyPile-150B		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality, zh	Skywork/SkyPile-150B
slim-orca	swift/SlimOrca		517982	399.1±370.2, min=35, max=8756	quality, en	Open-Orca/SlimOrca
slim-pajama-627b	-		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	cerebras/SlimPajama-627B
starcoder	AI-ModelScope/starcoderdata		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	bigcode/starcoderdata
tagengo-gpt4	swift/tagengo-gpt4		78057	472.3±292.9, min=22, max=3521	chat, multi-lingual, quality	lightblue/tagengo-gpt4
the-stack	AI-ModelScope/the-stack		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	bigcode/the-stack
ultrachat-200k	swift/ultrachat_200k		207865	1195.4±573.7, min=76, max=4470	chat, en, quality	HuggingFaceH4/ultrachat_200k
vqa-v2	swift/VQAv2		443757	31.8±2.2, min=27, max=58	en, vqa, quality	HuggingFaceM4/VQAv2
web-instruct-sub	swift/WebInstructSub		-	Dataset is too huge, please click the original link to view the dataset stat.	qa, en, math, quality, multi-domain, science	TIGER-Lab/WebInstructSub
wikipedia	swift/wikipedia		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	wikipedia
wikipedia-cn-filtered	AI-ModelScope/wikipedia-cn-20230720-filtered		-	Dataset is too huge, please click the original link to view the dataset stat.	pretrain, quality	pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf	AI-ModelScope/zhihu_rlhf_3k		3460	594.5±365.9, min=31, max=1716	rlhf, dpo, zh	liyucheng/zhihu_rlhf_3k
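
The rows above can also be read programmatically, for example to filter datasets by tag or by token length. The following is a minimal sketch, not part of swift: it assumes each row is tab-separated with seven fields in the order dataset name, dataset ID, subsets, dataset size, token statistic, tags, HF dataset ID. The header row is not shown in this part of the table, so the field names used here (`DatasetRow`, `parse_row`, and so on) are hypothetical. Placeholder values (`-`, `None`, or an empty cell) and the "too huge" statistic message are mapped to `None`.

```python
import re
from typing import List, NamedTuple, Optional


class DatasetRow(NamedTuple):
    """One table row. Field names are assumptions, not swift identifiers."""
    name: str
    dataset_id: Optional[str]
    subsets: List[str]
    size: Optional[int]
    mean_tokens: Optional[float]
    std_tokens: Optional[float]
    min_tokens: Optional[int]
    max_tokens: Optional[int]
    tags: List[str]
    hf_dataset_id: Optional[str]


# Matches statistics such as "35.6±6.6, min=29, max=193".
STAT_RE = re.compile(r"([\d.]+)±([\d.]+), min=(\d+), max=(\d+)")
# Cell values that mean "not available" in the table.
MISSING = {"", "-", "None"}


def _optional(value: str) -> Optional[str]:
    return None if value in MISSING else value


def parse_row(line: str) -> DatasetRow:
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    name, dataset_id, subsets, size, stats, tags, hf_id = fields
    # Rows without statistics carry a free-text message, which will not match.
    match = STAT_RE.fullmatch(stats)
    return DatasetRow(
        name=name,
        dataset_id=_optional(dataset_id),
        subsets=subsets.split(),  # subsets are space-separated; may be empty
        size=int(size) if size.isdigit() else None,
        mean_tokens=float(match.group(1)) if match else None,
        std_tokens=float(match.group(2)) if match else None,
        min_tokens=int(match.group(3)) if match else None,
        max_tokens=int(match.group(4)) if match else None,
        tags=[t.strip() for t in tags.split(",") if t.strip()],
        hf_dataset_id=_optional(hf_id),
    )


if __name__ == "__main__":
    sample = ("ocr-vqa\tswift/OCR-VQA\t\t186753\t35.6±6.6, min=29, max=193"
              "\tmulti-modal, en, ocr-vqa\thoward-hou/OCR-VQA")
    row = parse_row(sample)
    print(row.name, row.size, row.max_tokens, row.tags)
```

With rows parsed this way, a selection such as "all datasets tagged `zh` whose maximum token count is known and below 2048" becomes a simple list comprehension over the `DatasetRow` tuples.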