Supported Models and Datasets
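Models
The table below describes the models integrated into swift:
Model Type: the model_type under which the model is registered in swift.
Default Lora Target Modules: the model's default lora_target_modules.
Default Template: the model's default template.
Support Flash Attn: whether the model supports Flash Attention to accelerate inference and fine-tuning.
Support vLLM: whether the model supports vLLM to accelerate inference and deployment.
Requires: additional dependencies required by the model.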
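The Requires column holds comma-separated version constraints such as `auto_gptq>=0.5, transformers>=4.37`. As a rough illustration of how such a cell could be checked against an environment, here is a minimal sketch. The helpers `parse_requires` and `satisfied` are hypothetical names introduced for this example and are not part of swift; the sketch also assumes plain numeric versions like `4.41.2`.

```python
import re

# Hypothetical helpers for illustration only; they are not part of swift.
# One requirement looks like "transformers>=4.37" or a bare name like "autoawq".
_SPEC = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(>=|<=|==|<|>)?\s*([0-9][0-9.]*)?\s*$"
)


def parse_requires(cell):
    """Split a Requires cell into (package, operator, version) triples."""
    cell = cell.strip()
    if cell in ("", "-"):
        return []  # an empty cell or "-" means no extra dependency
    specs = []
    for part in cell.split(","):
        match = _SPEC.match(part)
        if match is None:
            raise ValueError(f"unrecognized requirement: {part!r}")
        specs.append(match.groups())
    return specs


def version_tuple(text):
    """Turn "4.41.2" into (4, 41, 2). Assumes numeric components only."""
    return tuple(int(piece) for piece in text.split("."))


def satisfied(spec, installed):
    """Check one triple against a {package: version} mapping."""
    name, op, wanted = spec
    if name not in installed:
        return False
    if op is None:
        return True  # bare package name: any installed version is accepted
    have, want = version_tuple(installed[name]), version_tuple(wanted)
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))  # pad so "2.2" compares equal to "2.2.0"
    want += (0,) * (width - len(want))
    checks = {
        ">=": have >= want,
        "<=": have <= want,
        "==": have == want,
        "<": have < want,
        ">": have > want,
    }
    return checks[op]


if __name__ == "__main__":
    env = {"transformers": "4.41.2", "auto_gptq": "0.7.1"}  # made-up versions
    for spec in parse_requires("auto_gptq>=0.5, transformers>=4.37"):
        print(spec, satisfied(spec, env))
```

For real use, the `packaging` library parses and compares version specifiers more robustly (pre-releases, local versions, and so on) than this hand-rolled comparison.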
Large Language Models
| Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support vLLM | Support LMDeploy | Support Megatron | Requires | Tags | HF Model ID |
|---|---|---|---|---|---|---|---|---|---|---|
| qwen-1_8b | | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-1_8b-chat | | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-1_8b-chat-int4 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-1_8b-chat-int8 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-7b | | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-7b-chat | | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-7b-chat-int4 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-7b-chat-int8 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-14b | | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-14b-chat | | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-14b-chat-int4 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-14b-chat-int8 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-72b | | c_attn | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-72b-chat | | c_attn | qwen | ✔ | ✔ | ✔ | ✘ | | - | |
| qwen-72b-chat-int4 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| qwen-72b-chat-int8 | | c_attn | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| modelscope-agent-7b | | c_attn | modelscope-agent | ✔ | ✘ | ✘ | ✘ | | - | - |
| modelscope-agent-14b | | c_attn | modelscope-agent | ✔ | ✘ | ✘ | ✘ | | - | - |
| qwen1half-0_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-1_8b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-4b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-14b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-32b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen1half-72b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-110b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| codeqwen1half-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen1half-moe-a2_7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | |
| qwen1half-0_5b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-1_8b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-4b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-7b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-14b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-32b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen1half-72b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen1half-110b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen1half-moe-a2_7b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | |
| codeqwen1half-7b-chat | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen1half-0_5b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-1_8b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-4b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-7b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-14b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-32b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-72b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-110b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-0_5b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-1_8b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-4b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-7b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-14b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-72b-chat-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen1half-moe-a2_7b-chat-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.40 | moe | |
| qwen1half-0_5b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-1_8b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-4b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-7b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-14b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-32b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-72b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen1half-110b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| codeqwen1half-7b-chat-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2-0_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-0_5b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-0_5b-instruct-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-0_5b-instruct-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-0_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2-1_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-1_5b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-1_5b-instruct-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-1_5b-instruct-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-1_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-7b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-7b-instruct-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-7b-instruct-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-7b-instruct-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2-72b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-72b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-72b-instruct-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-72b-instruct-int8 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2-72b-instruct-awq | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2-57b-a14b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | |
| qwen2-57b-a14b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | transformers>=4.40 | moe | |
| qwen2-57b-a14b-instruct-int4 | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.40 | moe | |
| qwen2-math-1_5b | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-math-1_5b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-math-7b | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-math-7b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-math-72b | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2-math-72b-instruct | | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| qwen2_5-0_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-1_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-3b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-14b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-32b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-72b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-0_5b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-1_5b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-3b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-7b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-14b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-32b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-72b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-0_5b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-1_5b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-3b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-7b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-14b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-32b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-72b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-0_5b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-1_5b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-3b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-7b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-14b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-32b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-72b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-0_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-1_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-3b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-7b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-14b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-32b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-72b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-math-1_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-math-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-math-72b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-math-1_5b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-math-7b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-math-72b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-0_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-0_5b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-0_5b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-0_5b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-0_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-coder-1_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-1_5b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-1_5b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-1_5b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-1_5b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-coder-3b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-3b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-3b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-3b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-3b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-coder-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-7b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-7b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-7b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-7b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-coder-14b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-14b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-14b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-14b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-14b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwen2_5-coder-32b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-32b-instruct | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| qwen2_5-coder-32b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-32b-instruct-gptq-int8 | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5, transformers>=4.37 | - | |
| qwen2_5-coder-32b-instruct-awq | | q_proj, k_proj, v_proj | qwen2_5 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37, autoawq | - | |
| qwq-32b-preview | | q_proj, k_proj, v_proj | qwq | ✔ | ✔ | ✔ | ✔ | transformers>=4.37 | - | |
| marco-o1 | | q_proj, k_proj, v_proj | marco_o1 | ✔ | ✔ | ✔ | ✘ | transformers>=4.37 | - | |
| chatglm2-6b | | query_key_value | chatglm2 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| chatglm2-6b-32k | | query_key_value | chatglm2 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| chatglm3-6b-base | | query_key_value | chatglm-generation | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| chatglm3-6b | | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| chatglm3-6b-32k | | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| chatglm3-6b-128k | | query_key_value | chatglm3 | ✘ | ✔ | ✘ | ✘ | transformers<4.42 | - | |
| codegeex2-6b | | query_key_value | chatglm-generation | ✘ | ✔ | ✘ | ✘ | transformers<4.34 | coding | |
| glm4-9b | | query_key_value | chatglm-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | |
| glm4-9b-chat | | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | |
| glm4-9b-chat-1m | | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | |
| codegeex4-9b-chat | | query_key_value | codegeex4 | ✔ | ✔ | ✔ | ✘ | transformers<4.42 | coding | |
| glm-edge-1_5b-chat | | q_proj, k_proj, v_proj | chatglm4 | ✔ | ✘ | ✘ | ✘ | transformers>=4.46 | - | |
| glm-edge-4b-chat | | q_proj, k_proj, v_proj | chatglm4 | ✔ | ✘ | ✘ | ✘ | transformers>=4.46 | - | |
| llama2-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-7b-chat | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-13b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-13b-chat | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-70b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-70b-chat | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| llama2-7b-aqlm-2bit-1x16 | | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | ✘ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | |
| llama3-8b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama3-8b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | |
| llama3-8b-instruct-int4 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| llama3-8b-instruct-int8 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| llama3-8b-instruct-awq | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| llama3-70b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama3-70b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | |
| llama3-70b-instruct-int4 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| llama3-70b-instruct-int8 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| llama3-70b-instruct-awq | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| llama3_1-8b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-8b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-8b-instruct-awq | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, autoawq | - | |
| llama3_1-8b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | |
| llama3_1-8b-instruct-bnb | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | |
| llama3_1-70b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-70b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-70b-instruct-fp8 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | |
| llama3_1-70b-instruct-awq | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43, autoawq | - | |
| llama3_1-70b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | |
| llama3_1-70b-instruct-bnb | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | |
| llama3_1-405b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-405b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_1-405b-instruct-fp8 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | |
| llama3_1-405b-instruct-awq | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43, autoawq | - | |
| llama3_1-405b-instruct-gptq-int4 | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, auto_gptq | - | |
| llama3_1-405b-instruct-bnb | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✘ | ✘ | transformers>=4.43, bitsandbytes | - | |
| llama-3.1-nemotron-70B-instruct-hf | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| llama3_2-1b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | |
| llama3_2-1b-instruct | | q_proj, k_proj, v_proj | llama3_2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | |
| llama3_2-3b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | |
| llama3_2-3b-instruct | | q_proj, k_proj, v_proj | llama3_2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.45 | - | |
| reflection-llama_3_1-70b | | q_proj, k_proj, v_proj | reflection | ✔ | ✔ | ✘ | ✘ | transformers>=4.43 | - | |
| longwriter-glm4-9b | | query_key_value | chatglm4 | ✔ | ✔ | ✔ | ✘ | transformers>=4.42 | - | |
| longwriter-llama3_1-8b | | q_proj, k_proj, v_proj | longwriter-llama3 | ✔ | ✔ | ✔ | ✘ | transformers>=4.43 | - | |
| chinese-llama-2-1_3b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-llama-2-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-llama-2-7b-16k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-llama-2-7b-64k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-llama-2-13b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-llama-2-13b-16k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-1_3b | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-7b | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-7b-16k | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-7b-64k | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-13b | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| chinese-alpaca-2-13b-16k | | q_proj, k_proj, v_proj | llama | ✔ | ✔ | ✔ | ✘ | | - | |
| llama-3-chinese-8b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| llama-3-chinese-8b-instruct | | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | ✔ | ✘ | | - | |
| atom-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | | - | |
| atom-7b-chat | | q_proj, k_proj, v_proj | atom | ✔ | ✔ | ✘ | ✘ | | - | |
| yi-6b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-6b-200k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-6b-chat | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-6b-chat-awq | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| yi-6b-chat-int8 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| yi-9b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-9b-200k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-34b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-34b-200k | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-34b-chat | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-34b-chat-awq | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| yi-34b-chat-int8 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq | - | |
| yi-1_5-6b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-6b-chat | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-9b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-9b-chat | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-9b-chat-16k | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-34b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-34b-chat | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-34b-chat-16k | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-1_5-6b-chat-awq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| yi-1_5-6b-chat-gptq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| yi-1_5-9b-chat-awq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| yi-1_5-9b-chat-gptq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| yi-1_5-34b-chat-awq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✔ | ✘ | autoawq | - | |
| yi-1_5-34b-chat-gptq-int4 | | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | ✘ | ✘ | auto_gptq>=0.5 | - | |
| yi-coder-1_5b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-coder-1_5b-chat | | q_proj, k_proj, v_proj | yi-coder | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-coder-9b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| yi-coder-9b-chat | | q_proj, k_proj, v_proj | yi-coder | ✔ | ✔ | ✔ | ✘ | | - | |
| internlm-7b | | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✔ | ✘ | | - | |
| internlm-7b-chat | | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | |
| internlm-7b-chat-8k | | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | - |
| internlm-20b | | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | ✔ | ✘ | | - | |
| internlm-20b-chat | | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | ✔ | ✘ | | - | |
| internlm2-1_8b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-1_8b-sft-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-1_8b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-7b-base | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-7b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-7b-sft-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-7b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-20b-base | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-20b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-20b-sft-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-20b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-1_8b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-1_8b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-7b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-7b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-7b-chat-1m | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-20b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2_5-20b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | - | |
| internlm2-math-7b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | |
| internlm2-math-7b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | |
| internlm2-math-20b | | wqkv | default-generation | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | |
| internlm2-math-20b-chat | | wqkv | internlm2 | ✔ | ✔ | ✔ | ✘ | transformers>=4.38 | math | |
| deepseek-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| deepseek-7b-chat | | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | - | |
| deepseek-moe-16b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | | moe | |
| deepseek-moe-16b-chat | | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✘ | ✘ | | moe | |
| deepseek-67b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | - | |
| deepseek-67b-chat | | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | - | |
| deepseek-coder-1_3b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-1_3b-instruct | | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-6_7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-6_7b-instruct | | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-33b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-33b-instruct | | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | ✔ | ✘ | | coding | |
| deepseek-coder-v2-instruct | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | |
| deepseek-coder-v2-lite-instruct | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | |
| deepseek-coder-v2 | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | |
| deepseek-coder-v2-lite | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | coding, moe | |
| deepseek-math-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✔ | ✘ | | math | |
| deepseek-math-7b-instruct | | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | math | |
| deepseek-math-7b-chat | | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | ✔ | ✘ | | math | |
| numina-math-7b | | q_proj, k_proj, v_proj | numina-math | ✔ | ✔ | ✘ | ✘ | | math | |
| deepseek-v2 | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | |
| deepseek-v2-chat | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | |
| deepseek-v2-lite | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | |
| deepseek-v2-lite-chat | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | |
| deepseek-v2_5 | | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2_5 | ✔ | ✔ | ✘ | ✘ | transformers>=4.39.3 | moe | |
| gemma-2b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | |
| gemma-7b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | |
| gemma-2b-instruct | | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | |
| gemma-7b-instruct | | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.38 | - | |
| gemma2-2b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| gemma2-9b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| gemma2-27b | | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| gemma2-2b-instruct | | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| gemma2-9b-instruct | | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| gemma2-27b-instruct | | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | ✘ | ✘ | transformers>=4.42 | - | |
| minicpm-1b-sft-chat | | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | transformers>=4.36.0 | - | |
| minicpm-2b-sft-chat | | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | | - | |
| minicpm-2b-chat | | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | ✘ | ✘ | | - | |
minicpm-2b-128k |
q_proj, k_proj, v_proj |
chatml |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36.0 |
- |
||
minicpm-moe-8x2b |
q_proj, k_proj, v_proj |
minicpm |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36.0 |
moe |
||
minicpm3-4b |
q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj |
chatml |
✔ |
✘ |
✘ |
✘ |
transformers>=4.36 |
- |
||
openbuddy-llama-65b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-llama2-13b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-llama2-70b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-llama3-8b-chat |
q_proj, k_proj, v_proj |
openbuddy2 |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-llama3-70b-chat |
q_proj, k_proj, v_proj |
openbuddy2 |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-mistral-7b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
openbuddy-zephyr-7b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
openbuddy-deepseek-67b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✔ |
✘ |
- |
|||
openbuddy-mixtral-moe-7b-chat |
q_proj, k_proj, v_proj |
openbuddy |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
openbuddy-llama3_1-8b-chat |
q_proj, k_proj, v_proj |
openbuddy2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.43 |
- |
||
mistral-7b |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
mistral-7b-v2 |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
mistral-7b-instruct |
q_proj, k_proj, v_proj |
llama |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
mistral-7b-instruct-v2 |
q_proj, k_proj, v_proj |
llama |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
mistral-7b-instruct-v3 |
q_proj, k_proj, v_proj |
llama |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
mistral-nemo-base-2407 |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.43 |
- |
||
mistral-nemo-instruct-2407 |
q_proj, k_proj, v_proj |
mistral-nemo |
✔ |
✔ |
✘ |
✘ |
transformers>=4.43 |
- |
||
mistral-large-instruct-2407 |
q_proj, k_proj, v_proj |
mistral-nemo |
✔ |
✔ |
✘ |
✘ |
transformers>=4.43 |
- |
||
mistral-small-instruct-2409 |
q_proj, k_proj, v_proj |
mistral-nemo |
✔ |
✔ |
✘ |
✘ |
transformers>=4.43 |
- |
||
mixtral-moe-7b |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
mixtral-moe-7b-instruct |
q_proj, k_proj, v_proj |
llama |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
mixtral-moe-7b-aqlm-2bit-1x16 |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✘ |
✘ |
✘ |
transformers>=4.38, aqlm, torch>=2.2.0 |
moe |
||
mixtral-moe-8x22b-v1 |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
ministral-8b-instruct-2410 |
q_proj, k_proj, v_proj |
mistral-nemo |
✔ |
✔ |
✘ |
✘ |
transformers>=4.46 |
- |
||
wizardlm2-7b-awq |
q_proj, k_proj, v_proj |
wizardlm2-awq |
✔ |
✔ |
✘ |
✘ |
transformers>=4.34 |
- |
||
wizardlm2-8x22b |
q_proj, k_proj, v_proj |
wizardlm2 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
baichuan-7b |
W_pack |
default-generation |
✘ |
✔ |
✔ |
✘ |
transformers<4.34 |
- |
||
baichuan-13b |
W_pack |
default-generation |
✘ |
✔ |
✔ |
✘ |
transformers<4.34 |
- |
||
baichuan-13b-chat |
W_pack |
baichuan |
✘ |
✔ |
✔ |
✘ |
transformers<4.34 |
- |
||
baichuan2-7b |
W_pack |
default-generation |
✘ |
✔ |
✔ |
✘ |
- |
|||
baichuan2-7b-chat |
W_pack |
baichuan |
✘ |
✔ |
✔ |
✘ |
- |
|||
baichuan2-7b-chat-int4 |
W_pack |
baichuan |
✘ |
✘ |
✘ |
✘ |
bitsandbytes<0.41.2, accelerate<0.26 |
- |
||
baichuan2-13b |
W_pack |
default-generation |
✘ |
✔ |
✔ |
✘ |
- |
|||
baichuan2-13b-chat |
W_pack |
baichuan |
✘ |
✔ |
✔ |
✘ |
- |
|||
baichuan2-13b-chat-int4 |
W_pack |
baichuan |
✘ |
✘ |
✘ |
✘ |
bitsandbytes<0.41.2, accelerate<0.26 |
- |
||
yuan2-2b-instruct |
q_proj, k_proj, v_proj |
yuan |
✔ |
✘ |
✘ |
✘ |
- |
|||
yuan2-2b-janus-instruct |
q_proj, k_proj, v_proj |
yuan |
✔ |
✘ |
✘ |
✘ |
- |
|||
yuan2-51b-instruct |
q_proj, k_proj, v_proj |
yuan |
✔ |
✘ |
✘ |
✘ |
- |
|||
yuan2-102b-instruct |
q_proj, k_proj, v_proj |
yuan |
✔ |
✘ |
✘ |
✘ |
- |
|||
yuan2-m32 |
q_proj, k_proj, v_proj |
yuan |
✔ |
✘ |
✘ |
✘ |
moe |
|||
xverse-7b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-7b-chat |
q_proj, k_proj, v_proj |
xverse |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-13b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-13b-chat |
q_proj, k_proj, v_proj |
xverse |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-65b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-65b-v2 |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-65b-chat |
q_proj, k_proj, v_proj |
xverse |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-13b-256k |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
xverse-moe-a4_2b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
moe |
|||
orion-14b |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✘ |
✘ |
✘ |
- |
|||
orion-14b-chat |
q_proj, k_proj, v_proj |
orion |
✔ |
✘ |
✘ |
✘ |
- |
|||
bluelm-7b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
- |
|||
bluelm-7b-32k |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
- |
|||
bluelm-7b-chat |
q_proj, k_proj, v_proj |
bluelm |
✘ |
✘ |
✘ |
✘ |
- |
|||
bluelm-7b-chat-32k |
q_proj, k_proj, v_proj |
bluelm |
✘ |
✘ |
✘ |
✘ |
- |
|||
ziya2-13b |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✔ |
✘ |
- |
|||
ziya2-13b-chat |
q_proj, k_proj, v_proj |
ziya |
✔ |
✔ |
✔ |
✘ |
- |
|||
skywork-13b |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
- |
|||
skywork-13b-chat |
q_proj, k_proj, v_proj |
skywork |
✘ |
✘ |
✘ |
✘ |
- |
- |
||
zephyr-7b-beta-chat |
q_proj, k_proj, v_proj |
zephyr |
✔ |
✔ |
✔ |
✘ |
transformers>=4.34 |
- |
||
polylm-13b |
c_attn |
default-generation |
✘ |
✘ |
✘ |
✘ |
- |
|||
seqgpt-560m |
query_key_value |
default-generation |
✘ |
✔ |
✘ |
✘ |
- |
|||
sus-34b-chat |
q_proj, k_proj, v_proj |
sus |
✔ |
✔ |
✔ |
✘ |
- |
|||
tongyi-finance-14b |
c_attn |
default-generation |
✔ |
✔ |
✔ |
✘ |
financial |
- |
||
tongyi-finance-14b-chat |
c_attn |
qwen |
✔ |
✔ |
✔ |
✘ |
financial |
|||
tongyi-finance-14b-chat-int4 |
c_attn |
qwen |
✔ |
✔ |
✘ |
✘ |
auto_gptq>=0.5 |
financial |
||
codefuse-codellama-34b-chat |
q_proj, k_proj, v_proj |
codefuse-codellama |
✔ |
✔ |
✔ |
✘ |
coding |
|||
codefuse-codegeex2-6b-chat |
query_key_value |
codefuse |
✘ |
✔ |
✘ |
✘ |
transformers<4.34 |
coding |
||
codefuse-qwen-14b-chat |
c_attn |
codefuse |
✔ |
✔ |
✔ |
✘ |
coding |
|||
phi2-3b |
Wqkv |
default-generation |
✔ |
✔ |
✘ |
✘ |
coding |
|||
phi3-4b-4k-instruct |
qkv_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3-4b-128k-instruct |
qkv_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3-small-8k-instruct |
query_key_value |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3-medium-4k-instruct |
qkv_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3-small-128k-instruct |
query_key_value |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3-medium-128k-instruct |
qkv_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3_5-mini-instruct |
qkv_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
- |
||
phi3_5-moe-instruct |
q_proj, k_proj, v_proj |
phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
mamba-130m |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
mamba-370m |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
mamba-390m |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
mamba-790m |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
mamba-1.4b |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
mamba-2.8b |
in_proj, x_proj, embeddings, out_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
transformers>=4.39.0 |
- |
||
telechat-7b |
key_value, query |
telechat |
✔ |
✘ |
✘ |
✘ |
- |
|||
telechat-12b |
key_value, query |
telechat |
✔ |
✘ |
✘ |
✘ |
- |
|||
telechat-12b-v2 |
key_value, query |
telechat |
✔ |
✘ |
✘ |
✘ |
- |
|||
telechat-12b-v2-gptq-int4 |
key_value, query |
telechat |
✔ |
✘ |
✘ |
✘ |
auto_gptq>=0.5 |
- |
- |
|
telechat2-115b |
key_value, query |
telechat2 |
✔ |
✘ |
✘ |
✘ |
- |
|||
grok-1 |
q_proj, k_proj, v_proj |
default-generation |
✘ |
✘ |
✘ |
✘ |
- |
|||
dbrx-instruct |
attn.Wqkv |
dbrx |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
dbrx-base |
attn.Wqkv |
dbrx |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
moe |
||
mengzi3-13b-base |
q_proj, k_proj, v_proj |
mengzi |
✔ |
✔ |
✘ |
✘ |
- |
|||
c4ai-command-r-v01 |
q_proj, k_proj, v_proj |
c4ai |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39.1 |
- |
||
c4ai-command-r-plus |
q_proj, k_proj, v_proj |
c4ai |
✔ |
✔ |
✘ |
✘ |
transformers>4.39 |
- |
||
aya-expanse-8b |
q_proj, k_proj, v_proj |
aya |
✔ |
✔ |
✘ |
✘ |
transformers>=4.44.0 |
- |
||
aya-expanse-32b |
q_proj, k_proj, v_proj |
aya |
✔ |
✔ |
✘ |
✘ |
transformers>=4.44.0 |
- |
||
codestral-22b |
q_proj, k_proj, v_proj |
default-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.34 |
- |
Multimodal Large Models
Model Type |
Model ID |
Default Lora Target Modules |
Default Template |
Support Flash Attn |
Support vLLM |
Support LMDeploy |
Support Megatron |
Requires |
Tags |
HF Model ID |
|---|---|---|---|---|---|---|---|---|---|---|
qwen-vl |
^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen-vl-generation |
✔ |
✔ |
✔ |
✘ |
vision |
|||
qwen-vl-chat |
^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen-vl |
✔ |
✔ |
✔ |
✘ |
vision |
|||
qwen-vl-chat-int4 |
^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen-vl |
✔ |
✔ |
✘ |
✘ |
auto_gptq>=0.5 |
vision |
||
qwen-audio |
^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen-audio-generation |
✔ |
✘ |
✘ |
✘ |
audio |
|||
qwen-audio-chat |
^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen-audio |
✔ |
✘ |
✘ |
✘ |
audio |
|||
qwen2-audio-7b |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-audio-generation |
✔ |
✘ |
✘ |
✘ |
librosa, transformers>=4.45 |
audio |
||
qwen2-audio-7b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-audio |
✔ |
✘ |
✘ |
✘ |
librosa, transformers>=4.45 |
audio |
||
qwen2-vl-2b |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-2b-instruct |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-2b-instruct-gptq-int4 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-2b-instruct-gptq-int8 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-2b-instruct-awq |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, autoawq |
vision, video |
||
qwen2-vl-7b |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-7b-instruct |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-7b-instruct-gptq-int4 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-7b-instruct-gptq-int8 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-7b-instruct-awq |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, autoawq |
vision, video |
||
qwen2-vl-72b |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-72b-instruct |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils |
vision, video |
||
qwen2-vl-72b-instruct-gptq-int4 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-72b-instruct-gptq-int8 |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5 |
vision, video |
||
qwen2-vl-72b-instruct-awq |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
qwen2-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45.dev.0, qwen_vl_utils, autoawq |
vision, video |
||
glm4v-9b-chat |
^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).* |
glm4v |
✘ |
✘ |
✘ |
✘ |
transformers>=4.42 |
vision |
||
glm-edge-v-2b |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
glm-edge-v |
✔ |
✘ |
✘ |
✘ |
transformers>=4.46 |
vision |
||
glm-edge-v-5b |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
glm-edge-v |
✔ |
✘ |
✘ |
✘ |
transformers>=4.46 |
vision |
||
llama3_2-11b-vision |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3_2-vision-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
llama3_2-11b-vision-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3_2-vision |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
llama3_2-90b-vision |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3_2-vision-generation |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
llama3_2-90b-vision-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3_2-vision |
✔ |
✔ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
llama3_1-8b-omni |
^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3_1-omni |
✔ |
✘ |
✘ |
✘ |
whisper, openai-whisper |
audio |
||
idefics3-8b-llama3 |
^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).* |
idefics3 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
llava1_5-7b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava1_5 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
vision |
||
llava1_5-13b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava1_5 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
vision |
||
llava1_6-mistral-7b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-mistral |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava1_6-vicuna-7b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-vicuna |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava1_6-vicuna-13b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-vicuna |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava1_6-llama3_1-8b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-next-llama3 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.41 |
vision |
- |
|
llava1_6-yi-34b-instruct |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-yi |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llama3-llava-next-8b-hf |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama-llava-next-hf |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava-next-72b-hf |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama-qwen-hf |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava-next-110b-hf |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama-qwen-hf |
✔ |
✔ |
✘ |
✘ |
transformers>=4.39 |
vision |
||
llava-onevision-qwen2-0_5b-ov |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-onevision-qwen |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45 |
vision, video |
||
llava-onevision-qwen2-7b-ov |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-onevision-qwen |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45 |
vision, video |
||
llava-onevision-qwen2-72b-ov |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-onevision-qwen |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45 |
vision, video |
||
llama3-llava-next-8b |
^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llama3-llava-next |
✔ |
✘ |
✘ |
✘ |
vision |
|||
llava-next-72b |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-qwen |
✔ |
✘ |
✘ |
✘ |
vision |
|||
llava-next-110b |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-qwen |
✔ |
✘ |
✘ |
✘ |
vision |
|||
llava-next-video-7b-instruct |
^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-next-video |
✔ |
✔ |
✘ |
✘ |
transformers>=4.42, av |
video |
||
llava-next-video-7b-32k-instruct |
^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-next-video |
✔ |
✔ |
✘ |
✘ |
transformers>=4.42, av |
video |
||
llava-next-video-7b-dpo-instruct |
^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-next-video |
✔ |
✔ |
✘ |
✘ |
transformers>=4.42, av |
video |
||
llava-next-video-34b-instruct |
^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-next-video-yi |
✔ |
✔ |
✘ |
✘ |
transformers>=4.42, av |
video |
||
yi-vl-6b-chat |
^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
yi-vl |
✔ |
✘ |
✘ |
✘ |
transformers>=4.34 |
vision |
||
yi-vl-34b-chat |
^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
yi-vl |
✔ |
✘ |
✘ |
✘ |
transformers>=4.34 |
vision |
||
llava-llama3-8b-v1_1 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
llava-llama-instruct |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
vision |
||
internlm-xcomposer2-7b-chat |
attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 |
internlm-xcomposer2 |
✔ |
✘ |
✔ |
✘ |
vision |
|||
internlm-xcomposer2-4khd-7b-chat |
attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 |
internlm-xcomposer2-4khd |
✔ |
✘ |
✔ |
✘ |
vision |
|||
internlm-xcomposer2_5-7b-chat |
attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3 |
internlm-xcomposer2_5 |
✔ |
✘ |
✔ |
✘ |
vision |
|||
internvl-chat-v1_5 |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl |
✔ |
✔ |
✔ |
✘ |
transformers>=4.35, timm |
vision |
||
internvl-chat-v1_5-int8 |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl |
✔ |
✘ |
✘ |
✘ |
transformers>=4.35, timm |
vision |
||
mini-internvl-chat-2b-v1_5 |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl |
✔ |
✔ |
✔ |
✘ |
transformers>=4.35, timm |
vision |
||
mini-internvl-chat-4b-v1_5 |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl-phi3 |
✔ |
✔ |
✘ |
✘ |
transformers>=4.35,<4.42, timm |
vision |
||
internvl2-1b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-2b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-4b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2-phi3 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36,<4.42, timm |
vision, video |
||
internvl2-8b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-26b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-40b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-llama3-76b |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-2b-awq |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-8b-awq |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-26b-awq |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-40b-awq |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
internvl2-llama3-76b-awq |
^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).* |
internvl2 |
✔ |
✔ |
✔ |
✘ |
transformers>=4.36, timm |
vision, video |
||
deepseek-janus-1_3b |
^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* |
deepseek-janus |
✔ |
✘ |
✘ |
✘ |
vision |
|||
deepseek-vl-1_3b-chat |
^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* |
deepseek-vl |
✔ |
✘ |
✔ |
✘ |
vision |
|||
deepseek-vl-7b-chat |
^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).* |
deepseek-vl |
✔ |
✘ |
✔ |
✘ |
vision |
|||
ovis1_6-gemma2-9b |
^(llm)(?!.*(lm_head|output|emb|wte|shared)).* |
ovis1_6 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.42 |
vision |
||
paligemma-3b-pt-224 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
paligemma |
✔ |
✔ |
✘ |
✘ |
transformers>=4.41 |
vision |
||
paligemma-3b-pt-448 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
paligemma |
✔ |
✔ |
✘ |
✘ |
transformers>=4.41 |
vision |
||
paligemma-3b-pt-896 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
paligemma |
✔ |
✔ |
✘ |
✘ |
transformers>=4.41 |
vision |
||
paligemma-3b-mix-224 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
paligemma |
✔ |
✔ |
✘ |
✘ |
transformers>=4.41 |
vision |
||
paligemma-3b-mix-448 |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
paligemma |
✔ |
✔ |
✘ |
✘ |
transformers>=4.41 |
vision |
||
minicpm-v-3b-chat |
^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
minicpm-v |
✔ |
✘ |
✘ |
✘ |
timm, transformers<4.42 |
vision |
||
minicpm-v-v2-chat |
^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
minicpm-v |
✔ |
✘ |
✘ |
✘ |
timm, transformers<4.42 |
vision |
||
minicpm-v-v2_5-chat |
^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
minicpm-v-v2_5 |
✔ |
✔ |
✘ |
✘ |
timm, transformers>=4.36 |
vision |
||
minicpm-v-v2_6-chat |
^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).* |
minicpm-v-v2_6 |
✔ |
✔ |
✘ |
✘ |
timm, transformers>=4.36 |
vision, video |
||
pixtral-12b |
^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).* |
pixtral |
✘ |
✘ |
✘ |
✘ |
transformers>=4.45 |
vision |
||
mplug-owl2-chat |
q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 |
mplug-owl2 |
✔ |
✘ |
✘ |
✘ |
transformers<4.35, icecream |
vision |
||
mplug-owl2_1-chat |
c_attn.multiway.0, c_attn.multiway.1 |
mplug-owl2 |
✔ |
✘ |
✘ |
✘ |
transformers<4.35, icecream |
vision |
||
mplug-owl3-1b-chat |
^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* |
mplug_owl3 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.36, icecream |
vision, video |
||
mplug-owl3-2b-chat |
^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* |
mplug_owl3 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.36, icecream |
vision, video |
||
mplug-owl3-7b-chat |
^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* |
mplug_owl3 |
✔ |
✘ |
✘ |
✘ |
transformers>=4.36, icecream |
vision, video |
||
mplug-owl3v-7b-chat |
^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).* |
mplug_owl3v |
✔ |
✘ |
✘ |
✘ |
transformers>=4.36, icecream |
vision, video |
||
phi3-vision-128k-instruct |
^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
phi3-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
vision |
||
phi3_5-vision-instruct |
^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
phi3-vl |
✔ |
✔ |
✘ |
✘ |
transformers>=4.36 |
vision |
||
cogvlm-17b-chat |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogvlm |
✘ |
✘ |
✘ |
✘ |
transformers<4.42 |
vision |
||
cogvlm2-19b-chat |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogvlm |
✘ |
✘ |
✔ |
✘ |
transformers<4.42 |
vision |
||
cogvlm2-en-19b-chat |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogvlm |
✘ |
✘ |
✔ |
✘ |
transformers<4.42 |
vision |
||
cogvlm2-video-13b-chat |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogvlm2-video |
✘ |
✘ |
✘ |
✘ |
decord, pytorchvideo, transformers>=4.42 |
vision, video |
||
cogagent-18b-chat |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogagent-chat |
✘ |
✘ |
✘ |
✘ |
timm |
vision |
||
cogagent-18b-instruct |
^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).* |
cogagent-instruct |
✘ |
✘ |
✘ |
✘ |
timm |
vision |
||
molmoe-1b |
^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* |
molmo |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45.0 |
vision |
||
molmo-7b-o |
^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* |
molmo |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45.0 |
vision |
||
molmo-7b-d |
^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* |
molmo |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45.0 |
vision |
||
molmo-72b |
^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).* |
molmo |
✔ |
✘ |
✘ |
✘ |
transformers>=4.45.0 |
vision |
||
emu3-chat |
^(model)(?!.*(lm_head|output|emb|wte|shared)).* |
emu3-chat |
✔ |
✘ |
✘ |
✘ |
transformers>=4.44.0 |
vision |
||
florence-2-base |
^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
florence |
✔ |
✘ |
✘ |
✘ |
vision |
|||
florence-2-base-ft |
^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
florence |
✔ |
✘ |
✘ |
✘ |
vision |
|||
florence-2-large |
^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
florence |
✔ |
✘ |
✘ |
✘ |
vision |
|||
florence-2-large-ft |
^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).* |
florence |
✔ |
✘ |
✘ |
✘ |
vision |
|||
got-ocr2 |
^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).* |
got_ocr2 |
✔ |
✘ |
✘ |
✘ |
audio |
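
Many Default Lora Target Modules entries in the multimodal table above are regular expressions rather than comma-separated module names. The sketch below shows which module names one of those patterns would select. It assumes the pattern is applied with full-match semantics to dotted module names (as PEFT does when `target_modules` is given as a string); the module names are hypothetical and chosen only for illustration.

```python
import re

# Pattern copied from the "Default Lora Target Modules" column (e.g. llava1_5-7b-instruct).
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names; a real model exposes its own names via named_modules().
MODULE_NAMES = [
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.embed_tokens",
    "language_model.lm_head",
    "multi_modal_projector.linear_1",
    "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj",
]


def select_target_modules(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


selected = select_target_modules(PATTERN, MODULE_NAMES)
for name in MODULE_NAMES:
    marker = "LoRA" if name in selected else "skip"
    print(f"{marker:5s} {name}")
```

The first group restricts matches to the language model and the projector, so the vision tower is left untouched. The negative lookahead then rejects any name that contains `lm_head`, `output`, `emb`, `wte`, or `shared`, which excludes embeddings and output heads.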
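Datasets
The table below lists the datasets integrated into swift:
Dataset Name: the dataset_name under which the dataset is registered in swift.
Dataset ID: the dataset_id of the dataset on ModelScope.
Dataset Size: the number of samples in the dataset.
Statistic: token-count statistics for the dataset, which help when tuning the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed, and the text is tokenized with the qwen tokenizer. Statistics differ between tokenizers; to obtain token statistics for another model's tokenizer, you can compute them yourself with a script.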
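
As a rough illustration of how a Statistic entry such as `346.9±443.2, min=22, max=30960` can be produced for a different tokenizer, here is a minimal sketch. It is not the script swift uses: the function name, the whitespace tokenizer, and the choice of population standard deviation are all assumptions made for the example.

```python
import statistics
from typing import Callable, List, Sequence


def token_length_stats(texts: Sequence[str], tokenize: Callable[[str], List]) -> str:
    """Format token-count statistics as 'mean±std, min=..., max=...'."""
    if not texts:
        raise ValueError("texts must not be empty")
    lengths = [len(tokenize(text)) for text in texts]
    mean = statistics.fmean(lengths)
    # Population standard deviation; whether the table uses this or the
    # sample standard deviation is an assumption.
    std = statistics.pstdev(lengths)
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer for the example; swap in a real one for real numbers."""
    return text.split()


samples = ["hello world", "a b c d", "one"]
print(token_length_stats(samples, whitespace_tokenize))
```

To get statistics for a real model, pass a callable that returns that tokenizer's token IDs (for example, the `encode` method of a Hugging Face tokenizer) in place of `whitespace_tokenize`, and feed it the concatenated training and validation samples.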
| Dataset Name | Dataset ID | Subsets | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
|---|---|---|---|---|---|---|
| 🔥ms-bench | iic/ms_bench | 316820 | 346.9±443.2, min=22, max=30960 | chat, general, multi-round | - | |
| 🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | 52002 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 | |
| 🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | 48818 | 162.1±93.9, min=26, max=856 | chat, general | llm-wizard/alpaca-gpt4-data-zh | |
| multi-alpaca | damo/nlp_polylm_multialpaca_sft | ar de es fr id ja ko pt ru th vi |
131867 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
| instinwild | wyj123456/instinwild | default subset |
103695 | 145.4±60.7, min=28, max=1434 | - | - |
| cot-en | YorickHe/CoT | 74771 | 122.7±64.8, min=51, max=8320 | chat, general | - | |
| cot-zh | YorickHe/CoT_zh | 74771 | 117.5±70.8, min=43, max=9636 | chat, general | - | |
| instruct-en | wyj123456/instruct | 888970 | 269.1±331.5, min=26, max=7254 | chat, general | - | |
| firefly-zh | AI-ModelScope/firefly-train-1.1M | 1649399 | 178.1±260.4, min=26, max=12516 | chat, general | YeungNLP/firefly-train-1.1M | |
| gpt4all-en | wyj123456/GPT4all | 806199 | 302.7±384.5, min=27, max=7391 | chat, general | - | |
| sharegpt | swift/sharegpt | common-zh computer-zh unknow-zh common-en computer-en |
96566 | 933.3±864.8, min=21, max=66412 | chat, general, multi-round | - |
| tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | 5119 | 520.7±437.6, min=68, max=2549 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture | |
| wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | 254547 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered | |
| open-orca | AI-ModelScope/OpenOrca | 994896 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - | |
| 🔥sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | default V3_format zh_38K_format | 72684 | 1047.6±1313.1, min=22, max=66412 | chat, multilingual, general, multi-round, gpt4 | - |
| deepctrl-sft | AI-ModelScope/deepctrl-sft-data | default en | 14149024 | 389.8±628.6, min=21, max=626237 | chat, general, sft, multi-round | - |
| 🔥coig-cqia | AI-ModelScope/COIG-CQIA | chinese_traditional coig_pc exam finance douban human_value logi_qa ruozhiba segmentfault wiki wikihow xhs zhihu | 44694 | 703.8±654.2, min=33, max=19288 | general | - |
| 🔥ruozhiba | AI-ModelScope/ruozhiba | post-annual title-good title-norm | 85658 | 39.9±13.1, min=21, max=559 | pretrain | - |
| long-alpaca-12k | AI-ModelScope/LongAlpaca-12k | | 11998 | 9619.0±8295.8, min=36, max=78925 | longlora, QA | Yukang/LongAlpaca-12k |
| lmsys-chat-1m | AI-ModelScope/lmsys-chat-1m | | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, em | lmsys/lmsys-chat-1m |
| 🔥ms-agent | iic/ms_agent | | 26336 | 650.9±217.2, min=209, max=2740 | chat, agent, multi-round | - |
| 🔥ms-agent-for-agentfabric | AI-ModelScope/ms_agent_for_agentfabric | default addition | 30000 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
| ms-agent-multirole | iic/MSAgent-MultiRole | | 9500 | 447.6±84.9, min=145, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
| 🔥toolbench-for-alpha-umi | shenweizhou/alpha-umi-toolbench-processed-v2 | backbone caller planner summarizer | 1448337 | 1439.7±853.9, min=123, max=18467 | chat, agent | - |
| damo-agent-zh | damo/MSAgent-Bench | | 386984 | 956.5±407.3, min=326, max=19001 | chat, agent, multi-round | - |
| damo-agent-zh-mini | damo/MSAgent-Bench | | 20845 | 1326.4±329.6, min=571, max=4304 | chat, agent, multi-round | - |
| agent-instruct-all-en | huangjintao/AgentInstruct_copy | alfworld db kg mind2web os webshop | 1866 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
| 🔥msagent-pro | iic/MSAgent-Pro | | 21905 | 1524.5±921.3, min=64, max=16770 | chat, agent, multi-round | - |
| toolbench | swift/ToolBench | | 124345 | 3669.5±1600.9, min=1047, max=22581 | chat, agent, multi-round | - |
| code-alpaca-en | wyj123456/code_alpaca_en | | 20016 | 100.2±60.1, min=29, max=1776 | - | sahil2801/CodeAlpaca-20k |
| 🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | | 2359 | 727.1±235.9, min=259, max=2146 | chat, coding | - |
| 🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | | 27224 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
| 🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | | 66862 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
| medical-en | swift/medical_zh | en | 117617 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
| medical-zh | swift/medical_zh | zh | 1950972 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
| 🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | | 441767 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
| lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | | 21476 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
| tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | | 55895 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
| 🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | | 166758 | 533.7±495.4, min=30, max=15169 | chat, law | ShengbinYue/DISC-Law-SFT |
| 🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | | 10000 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
| school-math-zh | AI-ModelScope/school_math_0.25M | | 248480 | 157.7±72.2, min=33, max=3450 | chat, math, quality | BelleGroup/school_math_0.25M |
| open-platypus-en | AI-ModelScope/Open-Platypus | | 24926 | 367.9±254.8, min=30, max=3951 | chat, math, quality | garage-bAInd/Open-Platypus |
| text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | | 25000 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
| 🔥sql-create-context-en | AI-ModelScope/sql-create-context | | 78577 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
| synthetic-text-to-sql | AI-ModelScope/synthetic_text_to_sql | default | 100000 | 283.4±115.8, min=61, max=1356 | nl2sql, en | gretelai/synthetic_text_to_sql |
| 🔥advertise-gen-zh | lvjianjin/AdvertiseGen | | 98399 | 130.6±21.7, min=51, max=241 | text-generation | shibing624/AdvertiseGen |
| 🔥dureader-robust-zh | modelscope/DuReader_robust-QG | | 17899 | 241.1±137.4, min=60, max=1416 | text-generation | - |
| cmnli-zh | modelscope/clue | cmnli | 404024 | 82.6±16.6, min=51, max=199 | text-generation, classification | clue |
| 🔥jd-sentiment-zh | DAMO_NLP/jd | | 50000 | 66.0±83.2, min=39, max=4039 | text-generation, classification | - |
| 🔥hc3-zh | simpleai/HC3-Chinese | baike open_qa nlpcc_dbqa finance medicine law psychology | 39781 | 176.8±81.5, min=57, max=3051 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
| 🔥hc3-en | simpleai/HC3 | finance medicine | 11021 | 298.3±138.7, min=65, max=2267 | text-generation, classification | Hello-SimpleAI/HC3 |
| dolly-15k | AI-ModelScope/databricks-dolly-15k | default | 15011 | 199.2±267.8, min=22, max=8615 | multi-task, en, quality | databricks/databricks-dolly-15k |
| zhihu-kol | OmniData/Zhihu-KOL | default | - | Dataset is too huge, please click the original link to view the dataset stat. | zhihu, qa | wangrui6/Zhihu-KOL |
| zhihu-kol-filtered | OmniData/Zhihu-KOL-More-Than-100-Upvotes | default | 271261 | 952.0±1727.2, min=25, max=98658 | zhihu, qa | bzb2023/Zhihu-KOL-More-Than-100-Upvotes |
| finance-en | wyj123456/finance_en | | 68911 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
| poetry-zh | modelscope/chinese-poetry-collection | | 390309 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
| webnovel-zh | AI-ModelScope/webnovel_cn | | 50000 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
| generated-chat-zh | AI-ModelScope/generated_chat_0.4M | | 396004 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
| 🔥self-cognition | swift/self-cognition | | 134 | 53.6±18.6, min=29, max=121 | chat, self-cognition | modelscope/self-cognition |
| 🔥swift-mix | swift/swift-sft-mixture | sharegpt firefly codefuse metamathqa | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, sft, general | - |
| cls-fudan-news-zh | damo/zh_cls_fudan-news | | 4959 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
| ner-jave-zh | damo/zh_ner-JAVE | | 1266 | 118.3±45.5, min=44, max=223 | chat, ner | - |
| coco-en | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 299.8±2.8, min=295, max=352 | chat, multi-modal, vision | - |
| 🔥coco-en-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 299.8±2.6, min=295, max=338 | chat, multi-modal, vision | - |
| coco-en-2 | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 36.8±2.8, min=32, max=89 | chat, multi-modal, vision | - |
| 🔥coco-en-2-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 36.8±2.6, min=32, max=75 | chat, multi-modal, vision | - |
| capcha-images | AI-ModelScope/captcha-images | | 8000 | 31.0±0.0, min=31, max=31 | chat, multi-modal, vision | - |
| latex-ocr-print | AI-ModelScope/LaTeX_OCR | default | 17918 | 362.7±34.8, min=294, max=528 | chat, ocr, multi-modal, vision | linxy/LaTeX_OCR |
| latex-ocr-handwrite | AI-ModelScope/LaTeX_OCR | synthetic_handwrite | 95424 | 375.1±59.4, min=292, max=2115 | chat, ocr, multi-modal, vision | linxy/LaTeX_OCR |
| aishell1-zh | speech_asr/speech_asr_aishell1_trainsets | | 141600 | 152.2±36.8, min=63, max=419 | chat, multi-modal, audio | - |
| 🔥aishell1-zh-mini | speech_asr/speech_asr_aishell1_trainsets | | 14526 | 152.2±35.6, min=74, max=359 | chat, multi-modal, audio | - |
| 🔥video-chatgpt | swift/VideoChatGPT | Generic Temporal Consistency | 3206 | 88.4±48.3, min=32, max=399 | chat, multi-modal, video | lmms-lab/VideoChatGPT |
| egoschema | AI-ModelScope/egoschema | Subset | 101 | 191.6±80.7, min=96, max=435 | chat, multi-modal, video | lmms-lab/egoschema |
| llava-video-178k | lmms-lab/LLaVA-Video-178K | 0_30_s_academic_v0_1 0_30_s_youtube_v0_1 1_2_m_academic_v0_1 1_2_m_youtube_v0_1 2_3_m_academic_v0_1 2_3_m_youtube_v0_1 30_60_s_academic_v0_1 30_60_s_youtube_v0_1 | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, multi-modal, video | lmms-lab/LLaVA-Video-178K |
| moviechat-1k-test | AI-ModelScope/MovieChat-1K-test | | 486 | 36.1±4.3, min=27, max=42 | chat, multi-modal, video | Enxin/MovieChat-1K-test |
| hh-rlhf | AI-ModelScope/hh-rlhf | harmless-base helpful-base helpful-online helpful-rejection-sampled | 127459 | 245.4±190.7, min=22, max=1999 | rlhf, dpo, pairwise | - |
| 🔥hh-rlhf-cn | AI-ModelScope/hh_rlhf_cn | hh_rlhf harmless_base_cn harmless_base_en helpful_base_cn helpful_base_en | 355920 | 171.2±122.7, min=22, max=3078 | rlhf, dpo, pairwise | - |
| orpo-dpo-mix-40k | AI-ModelScope/orpo-dpo-mix-40k | default | 43666 | 548.3±397.4, min=28, max=8483 | dpo, orpo, en, quality | mlabonne/orpo-dpo-mix-40k |
| stack-exchange-paired | AI-ModelScope/stack-exchange-paired | | 4483004 | 534.5±594.6, min=31, max=56588 | hfrl, dpo, pairwise | lvwerra/stack-exchange-paired |
| shareai-llama3-dpo-zh-en-emoji | hjh0119/shareAI-Llama3-DPO-zh-en-emoji | default | 2449 | 334.0±162.8, min=36, max=1801 | rlhf, dpo, pairwise | - |
| ultrafeedback-kto | AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto | default | 230720 | 11.0±0.0, min=11, max=11 | rlhf, kto | - |
| rlaif-v | swift/RLAIF-V-Dataset | default | 83132 | 119.8±52.6, min=28, max=556 | rlhf, dpo, multi-modal, en | openbmb/RLAIF-V-Dataset |
| pileval | swift/pile-val-backup | | 214670 | 1612.3±8856.2, min=11, max=1208955 | text-generation, awq | mit-han-lab/pile-val-backup |
| mantis-instruct | swift/Mantis-Instruct | birds-to-words chartqa coinstruct contrastive_caption docvqa dreamsim dvqa iconqa imagecode llava_665k_multi lrv_multi multi_vqa nextqa nlvr2 spot-the-diff star visual_story_telling | 655351 | 825.7±812.5, min=284, max=13563 | chat, multi-modal, vision, quality | TIGER-Lab/Mantis-Instruct |
| llava-data-instruct | swift/llava-data | llava_instruct | 364100 | 189.0±142.1, min=33, max=5183 | sft, multi-modal, quality | TIGER-Lab/llava-data |
| midefics | swift/MideficsDataset | | 3800 | 201.3±70.2, min=60, max=454 | medical, en, vqa | WinterSchool/MideficsDataset |
| gqa | None | train_all_instructions | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, vqa, quality | lmms-lab/GQA |
| text-caps | swift/TextCaps | | 18145 | 38.2±4.4, min=31, max=73 | multi-modal, en, caption, quality | HuggingFaceM4/TextCaps |
| refcoco-unofficial-caption | swift/refcoco | | 46215 | 44.7±3.2, min=36, max=71 | multi-modal, en, caption | jxu124/refcoco |
| refcoco-unofficial-grounding | swift/refcoco | | 46215 | 45.2±3.1, min=37, max=69 | multi-modal, en, grounding | jxu124/refcoco |
| refcocog-unofficial-caption | swift/refcocog | | 44799 | 49.7±4.7, min=37, max=88 | multi-modal, en, caption | jxu124/refcocog |
| refcocog-unofficial-grounding | swift/refcocog | | 44799 | 50.1±4.7, min=37, max=90 | multi-modal, en, grounding | jxu124/refcocog |
| a-okvqa | swift/A-OKVQA | | 18201 | 45.8±7.9, min=32, max=100 | multi-modal, en, vqa, quality | HuggingFaceM4/A-OKVQA |
| okvqa | swift/OK-VQA_train | | 9009 | 34.4±3.3, min=28, max=59 | multi-modal, en, vqa, quality | Multimodal-Fatima/OK-VQA_train |
| ocr-vqa | swift/OCR-VQA | | 186753 | 35.6±6.6, min=29, max=193 | multi-modal, en, ocr-vqa | howard-hou/OCR-VQA |
| grit | swift/GRIT | | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, caption-grounding, quality | zzliang/GRIT |
| llava-instruct-mix | swift/llava-instruct-mix-vsft | | 13640 | 179.8±120.2, min=30, max=962 | multi-modal, en, vqa, quality | HuggingFaceH4/llava-instruct-mix-vsft |
| lnqa | swift/lnqa | | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-modal, en, ocr-vqa, quality | vikhyatk/lnqa |
| science-qa | swift/ScienceQA | | 8315 | 100.3±59.5, min=38, max=638 | multi-modal, science, vqa, quality | derek-thomas/ScienceQA |
| guanaco | AI-ModelScope/GuanacoDataset | default | 31561 | 250.1±70.3, min=89, max=1436 | chat, zh | JosephusCheung/GuanacoDataset |
| mind2web | swift/Multimodal-Mind2Web | | 1009 | 297522.4±325496.2, min=8592, max=3499715 | agent, multi-modal | osunlp/Multimodal-Mind2Web |
| sharegpt-4o-image | AI-ModelScope/ShareGPT-4o | image_caption | 57289 | 638.7±157.9, min=47, max=4640 | vqa, multi-modal | OpenGVLab/ShareGPT-4o |
| pixelprose | swift/pixelprose | | - | Dataset is too huge, please click the original link to view the dataset stat. | caption, multi-modal, vision | tomg-group-umd/pixelprose |
| m3it | AI-ModelScope/M3IT | coco vqa-v2 shapes shapes-rephrased coco-goi-rephrased snli-ve snli-ve-rephrased okvqa a-okvqa viquae textcap docvqa science-qa imagenet imagenet-open-ended imagenet-rephrased coco-goi clevr clevr-rephrased nlvr coco-itm coco-itm-rephrased vsr vsr-rephrased mocheg mocheg-rephrased coco-text fm-iqa activitynet-qa msrvtt ss coco-cn refcoco refcoco-rephrased multi30k image-paragraph-captioning visual-dialog visual-dialog-rephrased iqa vcr visual-mrc ivqa msrvtt-qa msvd-qa gqa text-vqa ocr-vqa st-vqa flickr8k-cn | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, multi-modal, vision | - |
| sharegpt4v | AI-ModelScope/ShareGPT4V | ShareGPT4V ShareGPT4V-PT | - | Dataset is too huge, please click the original link to view the dataset stat. | chat, multi-modal, vision | - |
| llava-instruct-150k | AI-ModelScope/LLaVA-Instruct-150K | | 624610 | 490.4±180.2, min=288, max=5438 | chat, multi-modal, vision | - |
| llava-pretrain | AI-ModelScope/LLaVA-Pretrain | default | - | Dataset is too huge, please click the original link to view the dataset stat. | vqa, multi-modal, quality | liuhaotian/LLaVA-Pretrain |
| sa1b-dense-caption | Tongyi-DataEngine/SA1B-Dense-Caption | | - | Dataset is too huge, please click the original link to view the dataset stat. | zh, multi-modal, vqa | - |
| sa1b-paired-caption | Tongyi-DataEngine/SA1B-Paired-Captions-Images | | - | Dataset is too huge, please click the original link to view the dataset stat. | zh, multi-modal, vqa | - |
| alpaca-cleaned | AI-ModelScope/alpaca-cleaned | | 51760 | 177.9±126.4, min=26, max=1044 | chat, general, bench, quality | yahma/alpaca-cleaned |
| aya-collection | swift/aya_collection | aya_dataset | 202364 | 494.0±6911.3, min=21, max=3044268 | multi-lingual, qa | CohereForAI/aya_collection |
| belle-generated-chat-0.4M | AI-ModelScope/generated_chat_0.4M | | 396004 | 273.3±52.0, min=32, max=873 | common, zh | BelleGroup/generated_chat_0.4M |
| belle-math-0.25M | AI-ModelScope/school_math_0.25M | | 248480 | 157.7±72.2, min=33, max=3450 | math, zh | BelleGroup/school_math_0.25M |
| belle-train-0.5M-CN | AI-ModelScope/train_0.5M_CN | | 519255 | 129.1±91.5, min=27, max=6507 | common, zh, quality | BelleGroup/train_0.5M_CN |
| belle-train-1M-CN | AI-ModelScope/train_1M_CN | | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_1M_CN |
| belle-train-2M-CN | AI-ModelScope/train_2M_CN | | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_2M_CN |
| belle-train-3.5M-CN | swift/train_3.5M_CN | | - | Dataset is too huge, please click the original link to view the dataset stat. | common, zh, quality | BelleGroup/train_3.5M_CN |
| c4 | None | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | allenai/c4 |
| chart-qa | swift/ChartQA | | 28299 | 43.1±5.5, min=29, max=77 | en, vqa, quality | HuggingFaceM4/ChartQA |
| chinese-c4 | swift/chinese-c4 | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, zh, quality | shjwudp/chinese-c4 |
| cinepile | swift/cinepile | | - | Dataset is too huge, please click the original link to view the dataset stat. | vqa, en, youtube, video | tomg-group-umd/cinepile |
| classical-chinese-translate | swift/classical_chinese_translate | | 6655 | 344.0±76.4, min=61, max=815 | chat, play-ground | - |
| codealpaca-20k | AI-ModelScope/CodeAlpaca-20k | | 20016 | 100.2±60.1, min=29, max=1776 | code, en | HuggingFaceH4/CodeAlpaca_20K |
| cosmopedia | None | auto_math_text khanacademy openstax stanford stories web_samples_v1 web_samples_v2 wikihow | - | Dataset is too huge, please click the original link to view the dataset stat. | multi-domain, en, qa | HuggingFaceTB/cosmopedia |
| cosmopedia-100k | swift/cosmopedia-100k | | 100000 | 1024.5±243.1, min=239, max=2981 | multi-domain, en, qa | HuggingFaceTB/cosmopedia-100k |
| dolma | swift/dolma | v1_7 | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | allenai/dolma |
| dolphin | swift/dolphin | flan1m-alpaca-uncensored flan5m-alpaca-uncensored | - | Dataset is too huge, please click the original link to view the dataset stat. | en | cognitivecomputations/dolphin |
| duet | AI-ModelScope/Duet-v0.5 | | 5000 | 1157.4±189.3, min=657, max=2344 | CoT, en | G-reen/Duet-v0.5 |
| evol-instruct-v2 | AI-ModelScope/WizardLM_evol_instruct_V2_196k | | 109184 | 480.9±333.1, min=26, max=4942 | chat, en | WizardLM/WizardLM_evol_instruct_V2_196k |
| fineweb | None | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | HuggingFaceFW/fineweb |
| gen-qa | swift/GenQA | | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, quality, multi-task | tomg-group-umd/GenQA |
| github-code | swift/github-code | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | codeparrot/github-code |
| gpt4v-dataset | swift/gpt4v-dataset | | 12356 | 217.9±68.3, min=35, max=596 | en, caption, multi-modal, quality | laion/gpt4v-dataset |
| guanaco-belle-merge | AI-ModelScope/guanaco_belle_merge_v1.0 | | 693987 | 134.2±92.0, min=24, max=6507 | QA, zh | Chinese-Vicuna/guanaco_belle_merge_v1.0 |
| infinity-instruct | swift/Infinity-Instruct | | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, quality, multi-task | BAAI/Infinity-Instruct |
| llava-med-zh-instruct | swift/llava-med-zh-instruct-60k | | 56649 | 207.7±67.6, min=37, max=657 | zh, medical, vqa | BUAADreamer/llava-med-zh-instruct-60k |
| 🔥longwriter-6k | ZhipuAI/LongWriter-6k | | 6000 | 4887.2±2879.2, min=117, max=30354 | long, chat, sft | THUDM/LongWriter-6k |
| 🔥longwriter-6k-filtered | swift/longwriter-6k-filtered | | 666 | 4108.9±2636.9, min=1190, max=17050 | long, chat, sft | - |
| math-instruct | AI-ModelScope/MathInstruct | | 262283 | 254.4±183.5, min=11, max=4383 | math, cot, en, quality | TIGER-Lab/MathInstruct |
| math-plus | TIGER-Lab/MATH-plus | train | 893929 | 287.1±158.7, min=24, max=2919 | qa, math, en, quality | TIGER-Lab/MATH-plus |
| moondream2-coyo-5M | swift/moondream2-coyo-5M-captions | | - | Dataset is too huge, please click the original link to view the dataset stat. | caption, pretrain, quality | isidentical/moondream2-coyo-5M-captions |
| no-robots | swift/no_robots | | 9485 | 298.7±246.4, min=40, max=6739 | multi-task, quality, human-annotated | HuggingFaceH4/no_robots |
| open-hermes | swift/OpenHermes-2.5 | | - | Dataset is too huge, please click the original link to view the dataset stat. | cot, en, quality | teknium/OpenHermes-2.5 |
| open-o1 | AI-ModelScope/OpenO1-SFT | default | 203579 | 615.5±659.6, min=11, max=27509 | chat, general, o1 | O1-OPEN/OpenO1-SFT |
| open-orca-chinese | AI-ModelScope/OpenOrca-Chinese | | - | Dataset is too huge, please click the original link to view the dataset stat. | QA, zh, general, quality | yys/OpenOrca-Chinese |
| orca_dpo_pairs | swift/orca_dpo_pairs | | 12859 | 366.9±251.9, min=30, max=2010 | rlhf, quality | Intel/orca_dpo_pairs |
| path-vqa | swift/path-vqa | | 19654 | 34.8±7.3, min=27, max=85 | multi-modal, vqa, medical | flaviagiammarino/path-vqa |
| pile | AI-ModelScope/pile | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain | EleutherAI/pile |
| poison-mpts | iic/100PoisonMpts | | 906 | 150.6±80.8, min=39, max=656 | poison-management, zh | - |
| 🔥qwen2-pro-en | AI-ModelScope/Magpie-Qwen2-Pro-200K-English | | 200000 | 605.4±287.3, min=221, max=4267 | chat, sft, en | Magpie-Align/Magpie-Qwen2-Pro-200K-English |
| 🔥qwen2-pro-filtered | AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered | | 300000 | 555.8±286.6, min=148, max=4267 | chat, sft | Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered |
| 🔥qwen2-pro-zh | AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese | | 200000 | 446.2±246.4, min=74, max=4101 | chat, sft, zh | Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese |
| redpajama-data-1t | swift/RedPajama-Data-1T | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | togethercomputer/RedPajama-Data-1T |
| redpajama-data-v2 | swift/RedPajama-Data-V2 | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | togethercomputer/RedPajama-Data-V2 |
| refinedweb | None | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | tiiuae/falcon-refinedweb |
| rwkv-pretrain-web | mapjack/openwebtext_dataset | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, zh, quality | - |
| sft-nectar | AI-ModelScope/SFT-Nectar | | 131192 | 396.4±272.1, min=44, max=10732 | cot, en, quality | AstraMindAI/SFT-Nectar |
| skypile | AI-ModelScope/SkyPile-150B | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality, zh | Skywork/SkyPile-150B |
| slim-orca | swift/SlimOrca | | 517982 | 399.1±370.2, min=35, max=8756 | quality, en | Open-Orca/SlimOrca |
| slim-pajama-627b | None | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | cerebras/SlimPajama-627B |
| starcoder | AI-ModelScope/starcoderdata | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | bigcode/starcoderdata |
| tagengo-gpt4 | swift/tagengo-gpt4 | | 78057 | 472.3±292.9, min=22, max=3521 | chat, multi-lingual, quality | lightblue/tagengo-gpt4 |
| the-stack | AI-ModelScope/the-stack | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | bigcode/the-stack |
| ultrachat-200k | swift/ultrachat_200k | | 207865 | 1195.4±573.7, min=76, max=4470 | chat, en, quality | HuggingFaceH4/ultrachat_200k |
| vqa-v2 | swift/VQAv2 | | 443757 | 31.8±2.2, min=27, max=58 | en, vqa, quality | HuggingFaceM4/VQAv2 |
| web-instruct-sub | swift/WebInstructSub | | - | Dataset is too huge, please click the original link to view the dataset stat. | qa, en, math, quality, multi-domain, science | TIGER-Lab/WebInstructSub |
| wikipedia | swift/wikipedia | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | wikipedia |
| wikipedia-cn-filtered | AI-ModelScope/wikipedia-cn-20230720-filtered | | - | Dataset is too huge, please click the original link to view the dataset stat. | pretrain, quality | pleisto/wikipedia-cn-20230720-filtered |
| zhihu-rlhf | AI-ModelScope/zhihu_rlhf_3k | | 3460 | 594.5±365.9, min=31, max=1716 | rlhf, dpo, zh | liyucheng/zhihu_rlhf_3k |
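
The statistics cells in the table above look like `112.9±50.6, min=26, max=1226`. As a minimal sketch, assuming this pattern means mean ± standard deviation followed by the minimum and maximum (an interpretation of the table, not something stated here), the values could be parsed with Python's standard library. The names `LengthStats` and `parse_statistic` are hypothetical helpers introduced for illustration and are not part of swift.

```python
import re
from typing import NamedTuple, Optional


class LengthStats(NamedTuple):
    """Hypothetical container for one statistics cell."""
    mean: float
    std: float
    min: int
    max: int


# Matches cells such as "112.9±50.6, min=26, max=1226".
_STAT_RE = re.compile(r"^\s*([\d.]+)±([\d.]+),\s*min=(\d+),\s*max=(\d+)\s*$")


def parse_statistic(cell: str) -> Optional[LengthStats]:
    """Parse a statistics cell; return None when it holds no numbers
    (for example the "Dataset is too huge ..." placeholder)."""
    match = _STAT_RE.match(cell)
    if match is None:
        return None
    mean, std, lo, hi = match.groups()
    return LengthStats(float(mean), float(std), int(lo), int(hi))


print(parse_statistic("112.9±50.6, min=26, max=1226"))
# LengthStats(mean=112.9, std=50.6, min=26, max=1226)
```

Returning `None` for the placeholder text keeps rows without published statistics distinguishable from rows with real numbers, for example when filtering datasets by maximum length.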