Supported Models and Datasets

Models

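The table below describes the models integrated into swift:

  • Model Type: the model_type under which the model is registered in swift.

  • Default Lora Target Modules: the model's default lora_target_modules.

  • Default Template: the model's default template.

  • Support Flash Attn: whether the model supports flash attention to accelerate inference and fine-tuning.

  • Support vLLM: whether the model supports vllm to accelerate inference and deployment.

  • Requires: additional dependencies the model needs.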
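
The Requires column holds strings such as `transformers>=4.37` or `auto_gptq>=0.5`. As a rough illustration of how such a cell could be checked before loading a model, here is a minimal sketch. The `ModelRow` class and the helpers `parse_requirement` and `unmet_requirements` are hypothetical names invented for this example and are not part of swift's API. Installed versions are passed in as a plain dict so that the sketch stays self-contained.

```python
import operator
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Comparison operators that appear in the "Requires" column.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


@dataclass
class ModelRow:
    """One table row. Hypothetical container, not a swift class."""
    model_type: str
    model_id: str
    lora_target_modules: List[str]
    template: str
    requires: List[str] = field(default_factory=list)


def _version_tuple(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '4.37' into (4, 37, 0) so versions compare numerically."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def parse_requirement(req: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'transformers>=4.37' into ('transformers', '>=', '4.37')."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.-]+?)\s*(>=|<=|==|>|<)\s*([0-9.]+)\s*", req)
    if match:
        return match.group(1), match.group(2), match.group(3)
    return req.strip(), None, None  # bare package name such as 'autoawq'


def unmet_requirements(row: ModelRow, installed: Dict[str, str]) -> List[str]:
    """Return the requirement strings that `installed` does not satisfy."""
    unmet = []
    for req in row.requires:
        name, op, wanted = parse_requirement(req)
        have = installed.get(name)
        if have is None:
            unmet.append(req)  # package is missing entirely
        elif op is not None and not _OPS[op](
                _version_tuple(have), _version_tuple(wanted)):
            unmet.append(req)  # package present but version out of range
    return unmet


# Values copied from the qwen1half-7b-chat-int4 row of the table.
row = ModelRow(
    model_type="qwen1half-7b-chat-int4",
    model_id="qwen/Qwen1.5-7B-Chat-GPTQ-Int4",
    lora_target_modules=["q_proj", "k_proj", "v_proj"],
    template="qwen",
    requires=["auto_gptq>=0.5", "transformers>=4.37"],
)
print(unmet_requirements(row, {"transformers": "4.36.2"}))
```

In a real environment the `installed` dict could be filled from `importlib.metadata.version`, keeping in mind that a distribution name on PyPI may differ slightly from the spelling used in the table.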

Large Language Models

Model Type

Model ID

Default Lora Target Modules

Default Template

Support Flash Attn

Support vLLM

Support LMDeploy

Support Megatron

Requires

Tags

HF Model ID

qwen-1_8b

qwen/Qwen-1_8B

c_attn

default-generation

-

Qwen/Qwen-1_8B

qwen-1_8b-chat

qwen/Qwen-1_8B-Chat

c_attn

qwen

-

Qwen/Qwen-1_8B-Chat

qwen-1_8b-chat-int4

qwen/Qwen-1_8B-Chat-Int4

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-1_8B-Chat-Int4

qwen-1_8b-chat-int8

qwen/Qwen-1_8B-Chat-Int8

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-1_8B-Chat-Int8

qwen-7b

qwen/Qwen-7B

c_attn

default-generation

-

Qwen/Qwen-7B

qwen-7b-chat

qwen/Qwen-7B-Chat

c_attn

qwen

-

Qwen/Qwen-7B-Chat

qwen-7b-chat-int4

qwen/Qwen-7B-Chat-Int4

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-7B-Chat-Int4

qwen-7b-chat-int8

qwen/Qwen-7B-Chat-Int8

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-7B-Chat-Int8

qwen-14b

qwen/Qwen-14B

c_attn

default-generation

-

Qwen/Qwen-14B

qwen-14b-chat

qwen/Qwen-14B-Chat

c_attn

qwen

-

Qwen/Qwen-14B-Chat

qwen-14b-chat-int4

qwen/Qwen-14B-Chat-Int4

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-14B-Chat-Int4

qwen-14b-chat-int8

qwen/Qwen-14B-Chat-Int8

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-14B-Chat-Int8

qwen-72b

qwen/Qwen-72B

c_attn

default-generation

-

Qwen/Qwen-72B

qwen-72b-chat

qwen/Qwen-72B-Chat

c_attn

qwen

-

Qwen/Qwen-72B-Chat

qwen-72b-chat-int4

qwen/Qwen-72B-Chat-Int4

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-72B-Chat-Int4

qwen-72b-chat-int8

qwen/Qwen-72B-Chat-Int8

c_attn

qwen

auto_gptq>=0.5

-

Qwen/Qwen-72B-Chat-Int8

modelscope-agent-7b

iic/ModelScope-Agent-7B

c_attn

modelscope-agent

-

-

modelscope-agent-14b

iic/ModelScope-Agent-14B

c_attn

modelscope-agent

-

-

qwen1half-0_5b

qwen/Qwen1.5-0.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-0.5B

qwen1half-1_8b

qwen/Qwen1.5-1.8B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-1.8B

qwen1half-4b

qwen/Qwen1.5-4B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-4B

qwen1half-7b

qwen/Qwen1.5-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-7B

qwen1half-14b

qwen/Qwen1.5-14B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-14B

qwen1half-32b

qwen/Qwen1.5-32B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-32B

qwen1half-72b

qwen/Qwen1.5-72B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-72B

qwen1half-110b

qwen/Qwen1.5-110B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen1.5-110B

codeqwen1half-7b

qwen/CodeQwen1.5-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/CodeQwen1.5-7B

qwen1half-moe-a2_7b

qwen/Qwen1.5-MoE-A2.7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B

qwen1half-0_5b-chat

qwen/Qwen1.5-0.5B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat

qwen1half-1_8b-chat

qwen/Qwen1.5-1.8B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat

qwen1half-4b-chat

qwen/Qwen1.5-4B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat

qwen1half-7b-chat

qwen/Qwen1.5-7B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat

qwen1half-14b-chat

qwen/Qwen1.5-14B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat

qwen1half-32b-chat

qwen/Qwen1.5-32B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-32B-Chat

qwen1half-72b-chat

qwen/Qwen1.5-72B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat

qwen1half-110b-chat

qwen/Qwen1.5-110B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen1.5-110B-Chat

qwen1half-moe-a2_7b-chat

qwen/Qwen1.5-MoE-A2.7B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B-Chat

codeqwen1half-7b-chat

qwen/CodeQwen1.5-7B-Chat

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/CodeQwen1.5-7B-Chat

qwen1half-0_5b-chat-int4

qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4

qwen1half-1_8b-chat-int4

qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4

qwen1half-4b-chat-int4

qwen/Qwen1.5-4B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat-GPTQ-Int4

qwen1half-7b-chat-int4

qwen/Qwen1.5-7B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat-GPTQ-Int4

qwen1half-14b-chat-int4

qwen/Qwen1.5-14B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat-GPTQ-Int4

qwen1half-32b-chat-int4

qwen/Qwen1.5-32B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-32B-Chat-GPTQ-Int4

qwen1half-72b-chat-int4

qwen/Qwen1.5-72B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat-GPTQ-Int4

qwen1half-110b-chat-int4

qwen/Qwen1.5-110B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-110B-Chat-GPTQ-Int4

qwen1half-0_5b-chat-int8

qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8

qwen1half-1_8b-chat-int8

qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8

qwen1half-4b-chat-int8

qwen/Qwen1.5-4B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-4B-Chat-GPTQ-Int8

qwen1half-7b-chat-int8

qwen/Qwen1.5-7B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-7B-Chat-GPTQ-Int8

qwen1half-14b-chat-int8

qwen/Qwen1.5-14B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-14B-Chat-GPTQ-Int8

qwen1half-72b-chat-int8

qwen/Qwen1.5-72B-Chat-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen1.5-72B-Chat-GPTQ-Int8

qwen1half-moe-a2_7b-chat-int4

qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.40

moe

Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4

qwen1half-0_5b-chat-awq

qwen/Qwen1.5-0.5B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-0.5B-Chat-AWQ

qwen1half-1_8b-chat-awq

qwen/Qwen1.5-1.8B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-1.8B-Chat-AWQ

qwen1half-4b-chat-awq

qwen/Qwen1.5-4B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-4B-Chat-AWQ

qwen1half-7b-chat-awq

qwen/Qwen1.5-7B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-7B-Chat-AWQ

qwen1half-14b-chat-awq

qwen/Qwen1.5-14B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-14B-Chat-AWQ

qwen1half-32b-chat-awq

qwen/Qwen1.5-32B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-32B-Chat-AWQ

qwen1half-72b-chat-awq

qwen/Qwen1.5-72B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-72B-Chat-AWQ

qwen1half-110b-chat-awq

qwen/Qwen1.5-110B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen1.5-110B-Chat-AWQ

codeqwen1half-7b-chat-awq

qwen/CodeQwen1.5-7B-Chat-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/CodeQwen1.5-7B-Chat-AWQ

qwen2-0_5b

qwen/Qwen2-0.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2-0.5B

qwen2-0_5b-instruct

qwen/Qwen2-0.5B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct

qwen2-0_5b-instruct-int4

qwen/Qwen2-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4

qwen2-0_5b-instruct-int8

qwen/Qwen2-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8

qwen2-0_5b-instruct-awq

qwen/Qwen2-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen2-0.5B-Instruct-AWQ

qwen2-1_5b

qwen/Qwen2-1.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2-1.5B

qwen2-1_5b-instruct

qwen/Qwen2-1.5B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct

qwen2-1_5b-instruct-int4

qwen/Qwen2-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4

qwen2-1_5b-instruct-int8

qwen/Qwen2-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8

qwen2-1_5b-instruct-awq

qwen/Qwen2-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen2-1.5B-Instruct-AWQ

qwen2-7b

qwen/Qwen2-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2-7B

qwen2-7b-instruct

qwen/Qwen2-7B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-7B-Instruct

qwen2-7b-instruct-int4

qwen/Qwen2-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-7B-Instruct-GPTQ-Int4

qwen2-7b-instruct-int8

qwen/Qwen2-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-7B-Instruct-GPTQ-Int8

qwen2-7b-instruct-awq

qwen/Qwen2-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen2-7B-Instruct-AWQ

qwen2-72b

qwen/Qwen2-72B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2-72B

qwen2-72b-instruct

qwen/Qwen2-72B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-72B-Instruct

qwen2-72b-instruct-int4

qwen/Qwen2-72B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-72B-Instruct-GPTQ-Int4

qwen2-72b-instruct-int8

qwen/Qwen2-72B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2-72B-Instruct-GPTQ-Int8

qwen2-72b-instruct-awq

qwen/Qwen2-72B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen

transformers>=4.37, autoawq

-

Qwen/Qwen2-72B-Instruct-AWQ

qwen2-57b-a14b

qwen/Qwen2-57B-A14B

q_proj, k_proj, v_proj

default-generation

transformers>=4.40

moe

Qwen/Qwen2-57B-A14B

qwen2-57b-a14b-instruct

qwen/Qwen2-57B-A14B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.40

moe

Qwen/Qwen2-57B-A14B-Instruct

qwen2-57b-a14b-instruct-int4

qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen

auto_gptq>=0.5, transformers>=4.40

moe

Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4

qwen2-math-1_5b

qwen/Qwen2-Math-1.5B

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-1.5B

qwen2-math-1_5b-instruct

qwen/Qwen2-Math-1.5B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-1.5B-Instruct

qwen2-math-7b

qwen/Qwen2-Math-7B

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-7B

qwen2-math-7b-instruct

qwen/Qwen2-Math-7B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-7B-Instruct

qwen2-math-72b

qwen/Qwen2-Math-72B

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-72B

qwen2-math-72b-instruct

qwen/Qwen2-Math-72B-Instruct

q_proj, k_proj, v_proj

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-72B-Instruct

qwen2_5-0_5b

qwen/Qwen2.5-0.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-0.5B

qwen2_5-1_5b

qwen/Qwen2.5-1.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-1.5B

qwen2_5-3b

qwen/Qwen2.5-3B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-3B

qwen2_5-7b

qwen/Qwen2.5-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-7B

qwen2_5-14b

qwen/Qwen2.5-14B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-14B

qwen2_5-32b

qwen/Qwen2.5-32B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-32B

qwen2_5-72b

qwen/Qwen2.5-72B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-72B

qwen2_5-0_5b-instruct

qwen/Qwen2.5-0.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct

qwen2_5-1_5b-instruct

qwen/Qwen2.5-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct

qwen2_5-3b-instruct

qwen/Qwen2.5-3B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct

qwen2_5-7b-instruct

qwen/Qwen2.5-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct

qwen2_5-14b-instruct

qwen/Qwen2.5-14B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct

qwen2_5-32b-instruct

qwen/Qwen2.5-32B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct

qwen2_5-72b-instruct

qwen/Qwen2.5-72B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct

qwen2_5-0_5b-instruct-gptq-int4

qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4

qwen2_5-1_5b-instruct-gptq-int4

qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4

qwen2_5-3b-instruct-gptq-int4

qwen/Qwen2.5-3B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4

qwen2_5-7b-instruct-gptq-int4

qwen/Qwen2.5-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4

qwen2_5-14b-instruct-gptq-int4

qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

qwen2_5-32b-instruct-gptq-int4

qwen/Qwen2.5-32B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4

qwen2_5-72b-instruct-gptq-int4

qwen/Qwen2.5-72B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4

qwen2_5-0_5b-instruct-gptq-int8

qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8

qwen2_5-1_5b-instruct-gptq-int8

qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8

qwen2_5-3b-instruct-gptq-int8

qwen/Qwen2.5-3B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8

qwen2_5-7b-instruct-gptq-int8

qwen/Qwen2.5-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8

qwen2_5-14b-instruct-gptq-int8

qwen/Qwen2.5-14B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8

qwen2_5-32b-instruct-gptq-int8

qwen/Qwen2.5-32B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8

qwen2_5-72b-instruct-gptq-int8

qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8

qwen2_5-0_5b-instruct-awq

qwen/Qwen2.5-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-0.5B-Instruct-AWQ

qwen2_5-1_5b-instruct-awq

qwen/Qwen2.5-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-1.5B-Instruct-AWQ

qwen2_5-3b-instruct-awq

qwen/Qwen2.5-3B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-3B-Instruct-AWQ

qwen2_5-7b-instruct-awq

qwen/Qwen2.5-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-7B-Instruct-AWQ

qwen2_5-14b-instruct-awq

qwen/Qwen2.5-14B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-14B-Instruct-AWQ

qwen2_5-32b-instruct-awq

qwen/Qwen2.5-32B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-32B-Instruct-AWQ

qwen2_5-72b-instruct-awq

qwen/Qwen2.5-72B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-72B-Instruct-AWQ

qwen2_5-math-1_5b

qwen/Qwen2.5-Math-1.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Math-1.5B

qwen2_5-math-7b

qwen/Qwen2.5-Math-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B

qwen2_5-math-72b

qwen/Qwen2.5-Math-72B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Math-72B

qwen2_5-math-1_5b-instruct

qwen/Qwen2.5-Math-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Math-1.5B-Instruct

qwen2_5-math-7b-instruct

qwen/Qwen2.5-Math-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B-Instruct

qwen2_5-math-72b-instruct

qwen/Qwen2.5-Math-72B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Math-72B-Instruct

qwen2_5-coder-0_5b

qwen/Qwen2.5-Coder-0.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B

qwen2_5-coder-0_5b-instruct

qwen/Qwen2.5-Coder-0.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct

qwen2_5-coder-0_5b-instruct-gptq-int4

qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4

qwen2_5-coder-0_5b-instruct-gptq-int8

qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8

qwen2_5-coder-0_5b-instruct-awq

qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ

qwen2_5-coder-1_5b

qwen/Qwen2.5-Coder-1.5B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B

qwen2_5-coder-1_5b-instruct

qwen/Qwen2.5-Coder-1.5B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct

qwen2_5-coder-1_5b-instruct-gptq-int4

qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4

qwen2_5-coder-1_5b-instruct-gptq-int8

qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8

qwen2_5-coder-1_5b-instruct-awq

qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ

qwen2_5-coder-3b

qwen/Qwen2.5-Coder-3B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B

qwen2_5-coder-3b-instruct

qwen/Qwen2.5-Coder-3B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct

qwen2_5-coder-3b-instruct-gptq-int4

qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4

qwen2_5-coder-3b-instruct-gptq-int8

qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8

qwen2_5-coder-3b-instruct-awq

qwen/Qwen2.5-Coder-3B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-3B-Instruct-AWQ

qwen2_5-coder-7b

qwen/Qwen2.5-Coder-7B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B

qwen2_5-coder-7b-instruct

qwen/Qwen2.5-Coder-7B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct

qwen2_5-coder-7b-instruct-gptq-int4

qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4

qwen2_5-coder-7b-instruct-gptq-int8

qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8

qwen2_5-coder-7b-instruct-awq

qwen/Qwen2.5-Coder-7B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-7B-Instruct-AWQ

qwen2_5-coder-14b

qwen/Qwen2.5-Coder-14B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B

qwen2_5-coder-14b-instruct

qwen/Qwen2.5-Coder-14B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct

qwen2_5-coder-14b-instruct-gptq-int4

qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4

qwen2_5-coder-14b-instruct-gptq-int8

qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8

qwen2_5-coder-14b-instruct-awq

qwen/Qwen2.5-Coder-14B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-14B-Instruct-AWQ

qwen2_5-coder-32b

qwen/Qwen2.5-Coder-32B

q_proj, k_proj, v_proj

default-generation

transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B

qwen2_5-coder-32b-instruct

qwen/Qwen2.5-Coder-32B-Instruct

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct

qwen2_5-coder-32b-instruct-gptq-int4

qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4

qwen2_5-coder-32b-instruct-gptq-int8

qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

qwen2_5

auto_gptq>=0.5, transformers>=4.37

-

Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8

qwen2_5-coder-32b-instruct-awq

qwen/Qwen2.5-Coder-32B-Instruct-AWQ

q_proj, k_proj, v_proj

qwen2_5

transformers>=4.37, autoawq

-

Qwen/Qwen2.5-Coder-32B-Instruct-AWQ

qwq-32b-preview

Qwen/QwQ-32B-Preview

q_proj, k_proj, v_proj

qwq

transformers>=4.37

-

Qwen/QwQ-32B-Preview

marco-o1

AIDC-AI/Marco-o1

q_proj, k_proj, v_proj

marco_o1

transformers>=4.37

-

AIDC-AI/Marco-o1

chatglm2-6b

ZhipuAI/chatglm2-6b

query_key_value

chatglm2

transformers<4.42

-

THUDM/chatglm2-6b

chatglm2-6b-32k

ZhipuAI/chatglm2-6b-32k

query_key_value

chatglm2

transformers<4.42

-

THUDM/chatglm2-6b-32k

chatglm3-6b-base

ZhipuAI/chatglm3-6b-base

query_key_value

chatglm-generation

transformers<4.42

-

THUDM/chatglm3-6b-base

chatglm3-6b

ZhipuAI/chatglm3-6b

query_key_value

chatglm3

transformers<4.42

-

THUDM/chatglm3-6b

chatglm3-6b-32k

ZhipuAI/chatglm3-6b-32k

query_key_value

chatglm3

transformers<4.42

-

THUDM/chatglm3-6b-32k

chatglm3-6b-128k

ZhipuAI/chatglm3-6b-128k

query_key_value

chatglm3

transformers<4.42

-

THUDM/chatglm3-6b-128k

codegeex2-6b

ZhipuAI/codegeex2-6b

query_key_value

chatglm-generation

transformers<4.34

coding

THUDM/codegeex2-6b

glm4-9b

ZhipuAI/glm-4-9b

query_key_value

chatglm-generation

transformers>=4.42

-

THUDM/glm-4-9b

glm4-9b-chat

ZhipuAI/glm-4-9b-chat

query_key_value

chatglm4

transformers>=4.42

-

THUDM/glm-4-9b-chat

glm4-9b-chat-1m

ZhipuAI/glm-4-9b-chat-1m

query_key_value

chatglm4

transformers>=4.42

-

THUDM/glm-4-9b-chat-1m

codegeex4-9b-chat

ZhipuAI/codegeex4-all-9b

query_key_value

codegeex4

transformers<4.42

coding

THUDM/codegeex4-all-9b

glm-edge-1_5b-chat

ZhipuAI/glm-edge-1.5b-chat

q_proj, k_proj, v_proj

chatglm4

transformers>=4.46

-

THUDM/glm-edge-1.5b-chat

glm-edge-4b-chat

ZhipuAI/glm-edge-4b-chat

q_proj, k_proj, v_proj

chatglm4

transformers>=4.46

-

THUDM/glm-edge-4b-chat

llama2-7b

modelscope/Llama-2-7b-ms

q_proj, k_proj, v_proj

default-generation

-

meta-llama/Llama-2-7b-hf

llama2-7b-chat

modelscope/Llama-2-7b-chat-ms

q_proj, k_proj, v_proj

llama

-

meta-llama/Llama-2-7b-chat-hf

llama2-13b

modelscope/Llama-2-13b-ms

q_proj, k_proj, v_proj

default-generation

-

meta-llama/Llama-2-13b-hf

llama2-13b-chat

modelscope/Llama-2-13b-chat-ms

q_proj, k_proj, v_proj

llama

-

meta-llama/Llama-2-13b-chat-hf

llama2-70b

modelscope/Llama-2-70b-ms

q_proj, k_proj, v_proj

default-generation

-

meta-llama/Llama-2-70b-hf

llama2-70b-chat

modelscope/Llama-2-70b-chat-ms

q_proj, k_proj, v_proj

llama

-

meta-llama/Llama-2-70b-chat-hf

llama2-7b-aqlm-2bit-1x16

AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf

q_proj, k_proj, v_proj

default-generation

transformers>=4.38, aqlm, torch>=2.2.0

-

ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf

llama3-8b

LLM-Research/Meta-Llama-3-8B

q_proj, k_proj, v_proj

default-generation

-

meta-llama/Meta-Llama-3-8B

llama3-8b-instruct

LLM-Research/Meta-Llama-3-8B-Instruct

q_proj, k_proj, v_proj

llama3

-

meta-llama/Meta-Llama-3-8B-Instruct

llama3-8b-instruct-int4

swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

llama3

auto_gptq

-

study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4

llama3-8b-instruct-int8

swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

llama3

auto_gptq

-

study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8

llama3-8b-instruct-awq

swift/Meta-Llama-3-8B-Instruct-AWQ

q_proj, k_proj, v_proj

llama3

autoawq

-

study-hjt/Meta-Llama-3-8B-Instruct-AWQ

llama3-70b

LLM-Research/Meta-Llama-3-70B

q_proj, k_proj, v_proj

default-generation

-

meta-llama/Meta-Llama-3-70B

llama3-70b-instruct

LLM-Research/Meta-Llama-3-70B-Instruct

q_proj, k_proj, v_proj

llama3

-

meta-llama/Meta-Llama-3-70B-Instruct

llama3-70b-instruct-int4

swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4

q_proj, k_proj, v_proj

llama3

auto_gptq

-

study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4

llama3-70b-instruct-int8

swift/Meta-Llama-3-70b-Instruct-GPTQ-Int8

q_proj, k_proj, v_proj

llama3

auto_gptq

-

study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8

llama3-70b-instruct-awq

swift/Meta-Llama-3-70B-Instruct-AWQ

q_proj, k_proj, v_proj

llama3

autoawq

-

study-hjt/Meta-Llama-3-70B-Instruct-AWQ

llama3_1-8b

LLM-Research/Meta-Llama-3.1-8B

q_proj, k_proj, v_proj

default-generation

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-8B

llama3_1-8b-instruct

LLM-Research/Meta-Llama-3.1-8B-Instruct

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-8B-Instruct

llama3_1-8b-instruct-awq

LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4

llama3_1-8b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4

llama3_1-8b-instruct-bnb

LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, bitsandbytes

-

hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4

llama3_1-70b

LLM-Research/Meta-Llama-3.1-70B

q_proj, k_proj, v_proj

default-generation

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B

llama3_1-70b-instruct

LLM-Research/Meta-Llama-3.1-70B-Instruct

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B-Instruct

llama3_1-70b-instruct-fp8

LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-70B-Instruct-FP8

llama3_1-70b-instruct-awq

LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4

llama3_1-70b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4

llama3_1-70b-instruct-bnb

LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit

q_proj, k_proj, v_proj

llama3

transformers>=4.43, bitsandbytes

-

unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit

llama3_1-405b

LLM-Research/Meta-Llama-3.1-405B

q_proj, k_proj, v_proj

default-generation

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B

llama3_1-405b-instruct

LLM-Research/Meta-Llama-3.1-405B-Instruct

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B-Instruct

llama3_1-405b-instruct-fp8

LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

meta-llama/Meta-Llama-3.1-405B-Instruct-FP8

llama3_1-405b-instruct-awq

LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, autoawq

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4

llama3_1-405b-instruct-gptq-int4

LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, auto_gptq

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4

llama3_1-405b-instruct-bnb

LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4

q_proj, k_proj, v_proj

llama3

transformers>=4.43, bitsandbytes

-

hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4

llama-3.1-nemotron-70B-instruct-hf

AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF

q_proj, k_proj, v_proj

llama3

transformers>=4.43

-

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

llama3_2-1b

LLM-Research/Llama-3.2-1B

q_proj, k_proj, v_proj

default-generation

transformers>=4.45

-

meta-llama/Llama-3.2-1B

llama3_2-1b-instruct

LLM-Research/Llama-3.2-1B-Instruct

q_proj, k_proj, v_proj

llama3_2

transformers>=4.45

-

meta-llama/Llama-3.2-1B-Instruct

llama3_2-3b

LLM-Research/Llama-3.2-3B

q_proj, k_proj, v_proj

default-generation

transformers>=4.45

-

meta-llama/Llama-3.2-3B

llama3_2-3b-instruct

LLM-Research/Llama-3.2-3B-Instruct

q_proj, k_proj, v_proj

llama3_2

transformers>=4.45

-

meta-llama/Llama-3.2-3B-Instruct

reflection-llama_3_1-70b

LLM-Research/Reflection-Llama-3.1-70B

q_proj, k_proj, v_proj

reflection

transformers>=4.43

-

mattshumer/Reflection-Llama-3.1-70B

longwriter-glm4-9b

ZhipuAI/LongWriter-glm4-9b

query_key_value

chatglm4

transformers>=4.42

-

THUDM/LongWriter-glm4-9b

longwriter-llama3_1-8b

ZhipuAI/LongWriter-llama3.1-8b

q_proj, k_proj, v_proj

longwriter-llama3

transformers>=4.43

-

THUDM/LongWriter-llama3.1-8b

chinese-llama-2-1_3b

AI-ModelScope/chinese-llama-2-1.3b

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-1.3b

chinese-llama-2-7b

AI-ModelScope/chinese-llama-2-7b

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-7b

chinese-llama-2-7b-16k

AI-ModelScope/chinese-llama-2-7b-16k

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-7b-16k

chinese-llama-2-7b-64k

AI-ModelScope/chinese-llama-2-7b-64k

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-7b-64k

chinese-llama-2-13b

AI-ModelScope/chinese-llama-2-13b

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-13b

chinese-llama-2-13b-16k

AI-ModelScope/chinese-llama-2-13b-16k

q_proj, k_proj, v_proj

default-generation

-

hfl/chinese-llama-2-13b-16k

chinese-alpaca-2-1_3b

AI-ModelScope/chinese-alpaca-2-1.3b

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-1.3b

chinese-alpaca-2-7b

AI-ModelScope/chinese-alpaca-2-7b

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-7b

chinese-alpaca-2-7b-16k

AI-ModelScope/chinese-alpaca-2-7b-16k

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-7b-16k

chinese-alpaca-2-7b-64k

AI-ModelScope/chinese-alpaca-2-7b-64k

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-7b-64k

chinese-alpaca-2-13b

AI-ModelScope/chinese-alpaca-2-13b

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-13b

chinese-alpaca-2-13b-16k

AI-ModelScope/chinese-alpaca-2-13b-16k

q_proj, k_proj, v_proj

llama

-

hfl/chinese-alpaca-2-13b-16k

llama-3-chinese-8b

ChineseAlpacaGroup/llama-3-chinese-8b

q_proj, k_proj, v_proj

default-generation

-

hfl/llama-3-chinese-8b

llama-3-chinese-8b-instruct

ChineseAlpacaGroup/llama-3-chinese-8b-instruct

q_proj, k_proj, v_proj

llama3

-

hfl/llama-3-chinese-8b-instruct

atom-7b

FlagAlpha/Atom-7B

q_proj, k_proj, v_proj

default-generation

-

FlagAlpha/Atom-7B

atom-7b-chat

FlagAlpha/Atom-7B-Chat

q_proj, k_proj, v_proj

atom

-

FlagAlpha/Atom-7B-Chat

yi-6b

01ai/Yi-6B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-6B

yi-6b-200k

01ai/Yi-6B-200K

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-6B-200K

yi-6b-chat

01ai/Yi-6B-Chat

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-6B-Chat

yi-6b-chat-awq

01ai/Yi-6B-Chat-4bits

q_proj, k_proj, v_proj

chatml

autoawq

-

01-ai/Yi-6B-Chat-4bits

yi-6b-chat-int8

01ai/Yi-6B-Chat-8bits

q_proj, k_proj, v_proj

chatml

auto_gptq

-

01-ai/Yi-6B-Chat-8bits

yi-9b

01ai/Yi-9B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-9B

yi-9b-200k

01ai/Yi-9B-200K

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-9B-200K

yi-34b

01ai/Yi-34B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-34B

yi-34b-200k

01ai/Yi-34B-200K

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-34B-200K

yi-34b-chat

01ai/Yi-34B-Chat

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-34B-Chat

yi-34b-chat-awq

01ai/Yi-34B-Chat-4bits

q_proj, k_proj, v_proj

chatml

autoawq

-

01-ai/Yi-34B-Chat-4bits

yi-34b-chat-int8

01ai/Yi-34B-Chat-8bits

q_proj, k_proj, v_proj

chatml

auto_gptq

-

01-ai/Yi-34B-Chat-8bits

yi-1_5-6b

01ai/Yi-1.5-6B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-1.5-6B

yi-1_5-6b-chat

01ai/Yi-1.5-6B-Chat

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-1.5-6B-Chat

yi-1_5-9b

01ai/Yi-1.5-9B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-1.5-9B

yi-1_5-9b-chat

01ai/Yi-1.5-9B-Chat

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-1.5-9B-Chat

yi-1_5-9b-chat-16k

01ai/Yi-1.5-9B-Chat-16K

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-1.5-9B-Chat-16K

yi-1_5-34b

01ai/Yi-1.5-34B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-1.5-34B

yi-1_5-34b-chat

01ai/Yi-1.5-34B-Chat

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-1.5-34B-Chat

yi-1_5-34b-chat-16k

01ai/Yi-1.5-34B-Chat-16K

q_proj, k_proj, v_proj

chatml

-

01-ai/Yi-1.5-34B-Chat-16K

yi-1_5-6b-chat-awq-int4

AI-ModelScope/Yi-1.5-6B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

autoawq

-

modelscope/Yi-1.5-6B-Chat-AWQ

yi-1_5-6b-chat-gptq-int4

AI-ModelScope/Yi-1.5-6B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

auto_gptq>=0.5

-

modelscope/Yi-1.5-6B-Chat-GPTQ

yi-1_5-9b-chat-awq-int4

AI-ModelScope/Yi-1.5-9B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

autoawq

-

modelscope/Yi-1.5-9B-Chat-AWQ

yi-1_5-9b-chat-gptq-int4

AI-ModelScope/Yi-1.5-9B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

auto_gptq>=0.5

-

modelscope/Yi-1.5-9B-Chat-GPTQ

yi-1_5-34b-chat-awq-int4

AI-ModelScope/Yi-1.5-34B-Chat-AWQ

q_proj, k_proj, v_proj

chatml

autoawq

-

modelscope/Yi-1.5-34B-Chat-AWQ

yi-1_5-34b-chat-gptq-int4

AI-ModelScope/Yi-1.5-34B-Chat-GPTQ

q_proj, k_proj, v_proj

chatml

auto_gptq>=0.5

-

modelscope/Yi-1.5-34B-Chat-GPTQ

yi-coder-1_5b

01ai/Yi-Coder-1.5B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-Coder-1.5B

yi-coder-1_5b-chat

01ai/Yi-Coder-1.5B-Chat

q_proj, k_proj, v_proj

yi-coder

-

01-ai/Yi-Coder-1.5B-Chat

yi-coder-9b

01ai/Yi-Coder-9B

q_proj, k_proj, v_proj

default-generation

-

01-ai/Yi-Coder-9B

yi-coder-9b-chat

01ai/Yi-Coder-9B-Chat

q_proj, k_proj, v_proj

yi-coder

-

01-ai/Yi-Coder-9B-Chat

internlm-7b

Shanghai_AI_Laboratory/internlm-7b

q_proj, k_proj, v_proj

default-generation

-

internlm/internlm-7b

internlm-7b-chat

Shanghai_AI_Laboratory/internlm-chat-7b

q_proj, k_proj, v_proj

internlm

-

internlm/internlm-chat-7b

internlm-7b-chat-8k

Shanghai_AI_Laboratory/internlm-chat-7b-8k

q_proj, k_proj, v_proj

internlm

-

-

internlm-20b

Shanghai_AI_Laboratory/internlm-20b

q_proj, k_proj, v_proj

default-generation

-

internlm/internlm-20b

internlm-20b-chat

Shanghai_AI_Laboratory/internlm-chat-20b

q_proj, k_proj, v_proj

internlm

-

internlm/internlm-chat-20b

internlm2-1_8b

Shanghai_AI_Laboratory/internlm2-1_8b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2-1_8b

internlm2-1_8b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-1_8b-sft

internlm2-1_8b-chat

Shanghai_AI_Laboratory/internlm2-chat-1_8b

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-1_8b

internlm2-7b-base

Shanghai_AI_Laboratory/internlm2-base-7b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2-base-7b

internlm2-7b

Shanghai_AI_Laboratory/internlm2-7b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2-7b

internlm2-7b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-7b-sft

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-7b-sft

internlm2-7b-chat

Shanghai_AI_Laboratory/internlm2-chat-7b

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-7b

internlm2-20b-base

Shanghai_AI_Laboratory/internlm2-base-20b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2-base-20b

internlm2-20b

Shanghai_AI_Laboratory/internlm2-20b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2-20b

internlm2-20b-sft-chat

Shanghai_AI_Laboratory/internlm2-chat-20b-sft

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-20b-sft

internlm2-20b-chat

Shanghai_AI_Laboratory/internlm2-chat-20b

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2-chat-20b

internlm2_5-1_8b

Shanghai_AI_Laboratory/internlm2_5-1_8b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2_5-1_8b

internlm2_5-1_8b-chat

Shanghai_AI_Laboratory/internlm2_5-1_8b-chat

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2_5-1_8b-chat

internlm2_5-7b

Shanghai_AI_Laboratory/internlm2_5-7b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2_5-7b

internlm2_5-7b-chat

Shanghai_AI_Laboratory/internlm2_5-7b-chat

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2_5-7b-chat

internlm2_5-7b-chat-1m

Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2_5-7b-chat-1m

internlm2_5-20b

Shanghai_AI_Laboratory/internlm2_5-20b

wqkv

default-generation

transformers>=4.38

-

internlm/internlm2_5-20b

internlm2_5-20b-chat

Shanghai_AI_Laboratory/internlm2_5-20b-chat

wqkv

internlm2

transformers>=4.38

-

internlm/internlm2_5-20b-chat

internlm2-math-7b

Shanghai_AI_Laboratory/internlm2-math-base-7b

wqkv

default-generation

transformers>=4.38

math

internlm/internlm2-math-base-7b

internlm2-math-7b-chat

Shanghai_AI_Laboratory/internlm2-math-7b

wqkv

internlm2

transformers>=4.38

math

internlm/internlm2-math-7b

internlm2-math-20b

Shanghai_AI_Laboratory/internlm2-math-base-20b

wqkv

default-generation

transformers>=4.38

math

internlm/internlm2-math-base-20b

internlm2-math-20b-chat

Shanghai_AI_Laboratory/internlm2-math-20b

wqkv

internlm2

transformers>=4.38

math

internlm/internlm2-math-20b

deepseek-7b

deepseek-ai/deepseek-llm-7b-base

q_proj, k_proj, v_proj

default-generation

-

deepseek-ai/deepseek-llm-7b-base

deepseek-7b-chat

deepseek-ai/deepseek-llm-7b-chat

q_proj, k_proj, v_proj

deepseek

-

deepseek-ai/deepseek-llm-7b-chat

deepseek-moe-16b

deepseek-ai/deepseek-moe-16b-base

q_proj, k_proj, v_proj

default-generation

moe

deepseek-ai/deepseek-moe-16b-base

deepseek-moe-16b-chat

deepseek-ai/deepseek-moe-16b-chat

q_proj, k_proj, v_proj

deepseek

moe

deepseek-ai/deepseek-moe-16b-chat

deepseek-67b

deepseek-ai/deepseek-llm-67b-base

q_proj, k_proj, v_proj

default-generation

-

deepseek-ai/deepseek-llm-67b-base

deepseek-67b-chat

deepseek-ai/deepseek-llm-67b-chat

q_proj, k_proj, v_proj

deepseek

-

deepseek-ai/deepseek-llm-67b-chat

deepseek-coder-1_3b

deepseek-ai/deepseek-coder-1.3b-base

q_proj, k_proj, v_proj

default-generation

coding

deepseek-ai/deepseek-coder-1.3b-base

deepseek-coder-1_3b-instruct

deepseek-ai/deepseek-coder-1.3b-instruct

q_proj, k_proj, v_proj

deepseek-coder

coding

deepseek-ai/deepseek-coder-1.3b-instruct

deepseek-coder-6_7b

deepseek-ai/deepseek-coder-6.7b-base

q_proj, k_proj, v_proj

default-generation

coding

deepseek-ai/deepseek-coder-6.7b-base

deepseek-coder-6_7b-instruct

deepseek-ai/deepseek-coder-6.7b-instruct

q_proj, k_proj, v_proj

deepseek-coder

coding

deepseek-ai/deepseek-coder-6.7b-instruct

deepseek-coder-33b

deepseek-ai/deepseek-coder-33b-base

q_proj, k_proj, v_proj

default-generation

coding

deepseek-ai/deepseek-coder-33b-base

deepseek-coder-33b-instruct

deepseek-ai/deepseek-coder-33b-instruct

q_proj, k_proj, v_proj

deepseek-coder

coding

deepseek-ai/deepseek-coder-33b-instruct

deepseek-coder-v2-instruct

deepseek-ai/DeepSeek-Coder-V2-Instruct

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Instruct

deepseek-coder-v2-lite-instruct

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

deepseek-coder-v2

deepseek-ai/DeepSeek-Coder-V2-Base

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Base

deepseek-coder-v2-lite

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

transformers>=4.39.3

coding, moe

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

deepseek-math-7b

deepseek-ai/deepseek-math-7b-base

q_proj, k_proj, v_proj

default-generation

math

deepseek-ai/deepseek-math-7b-base

deepseek-math-7b-instruct

deepseek-ai/deepseek-math-7b-instruct

q_proj, k_proj, v_proj

deepseek

math

deepseek-ai/deepseek-math-7b-instruct

deepseek-math-7b-chat

deepseek-ai/deepseek-math-7b-rl

q_proj, k_proj, v_proj

deepseek

math

deepseek-ai/deepseek-math-7b-rl

numina-math-7b

AI-ModelScope/NuminaMath-7B-TIR

q_proj, k_proj, v_proj

numina-math

math

AI-MO/NuminaMath-7B-TIR

deepseek-v2

deepseek-ai/DeepSeek-V2

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2

deepseek-v2-chat

deepseek-ai/DeepSeek-V2-Chat

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Chat

deepseek-v2-lite

deepseek-ai/DeepSeek-V2-Lite

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

default-generation

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Lite

deepseek-v2-lite-chat

deepseek-ai/DeepSeek-V2-Lite-Chat

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2-Lite-Chat

deepseek-v2_5

deepseek-ai/DeepSeek-V2.5

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj

deepseek2_5

transformers>=4.39.3

moe

deepseek-ai/DeepSeek-V2.5

gemma-2b

AI-ModelScope/gemma-2b

q_proj, k_proj, v_proj

default-generation

transformers>=4.38

-

google/gemma-2b

gemma-7b

AI-ModelScope/gemma-7b

q_proj, k_proj, v_proj

default-generation

transformers>=4.38

-

google/gemma-7b

gemma-2b-instruct

AI-ModelScope/gemma-2b-it

q_proj, k_proj, v_proj

gemma

transformers>=4.38

-

google/gemma-2b-it

gemma-7b-instruct

AI-ModelScope/gemma-7b-it

q_proj, k_proj, v_proj

gemma

transformers>=4.38

-

google/gemma-7b-it

gemma2-2b

LLM-Research/gemma-2-2b

q_proj, k_proj, v_proj

default-generation

transformers>=4.42

-

google/gemma-2-2b

gemma2-9b

LLM-Research/gemma-2-9b

q_proj, k_proj, v_proj

default-generation

transformers>=4.42

-

google/gemma-2-9b

gemma2-27b

LLM-Research/gemma-2-27b

q_proj, k_proj, v_proj

default-generation

transformers>=4.42

-

google/gemma-2-27b

gemma2-2b-instruct

LLM-Research/gemma-2-2b-it

q_proj, k_proj, v_proj

gemma

transformers>=4.42

-

google/gemma-2-2b-it

gemma2-9b-instruct

LLM-Research/gemma-2-9b-it

q_proj, k_proj, v_proj

gemma

transformers>=4.42

-

google/gemma-2-9b-it

gemma2-27b-instruct

LLM-Research/gemma-2-27b-it

q_proj, k_proj, v_proj

gemma

transformers>=4.42

-

google/gemma-2-27b-it

minicpm-1b-sft-chat

OpenBMB/MiniCPM-1B-sft-bf16

q_proj, k_proj, v_proj

minicpm

transformers>=4.36.0

-

openbmb/MiniCPM-1B-sft-bf16

minicpm-2b-sft-chat

OpenBMB/MiniCPM-2B-sft-fp32

q_proj, k_proj, v_proj

minicpm

-

openbmb/MiniCPM-2B-sft-fp32

minicpm-2b-chat

OpenBMB/MiniCPM-2B-dpo-fp32

q_proj, k_proj, v_proj

minicpm

-

openbmb/MiniCPM-2B-dpo-fp32

minicpm-2b-128k

OpenBMB/MiniCPM-2B-128k

q_proj, k_proj, v_proj

chatml

transformers>=4.36.0

-

openbmb/MiniCPM-2B-128k

minicpm-moe-8x2b

OpenBMB/MiniCPM-MoE-8x2B

q_proj, k_proj, v_proj

minicpm

transformers>=4.36.0

moe

openbmb/MiniCPM-MoE-8x2B

minicpm3-4b

OpenBMB/MiniCPM3-4B

q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj

chatml

transformers>=4.36

-

openbmb/MiniCPM3-4B

openbuddy-llama-65b-chat

OpenBuddy/openbuddy-llama-65b-v8-bf16

q_proj, k_proj, v_proj

openbuddy

-

OpenBuddy/openbuddy-llama-65b-v8-bf16

openbuddy-llama2-13b-chat

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

q_proj, k_proj, v_proj

openbuddy

-

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

openbuddy-llama2-70b-chat

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

q_proj, k_proj, v_proj

openbuddy

-

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

openbuddy-llama3-8b-chat

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

q_proj, k_proj, v_proj

openbuddy2

-

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

openbuddy-llama3-70b-chat

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

q_proj, k_proj, v_proj

openbuddy2

-

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

openbuddy-mistral-7b-chat

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

q_proj, k_proj, v_proj

openbuddy

transformers>=4.34

-

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

openbuddy-zephyr-7b-chat

OpenBuddy/openbuddy-zephyr-7b-v14.1

q_proj, k_proj, v_proj

openbuddy

transformers>=4.34

-

OpenBuddy/openbuddy-zephyr-7b-v14.1

openbuddy-deepseek-67b-chat

OpenBuddy/openbuddy-deepseek-67b-v15.2

q_proj, k_proj, v_proj

openbuddy

-

OpenBuddy/openbuddy-deepseek-67b-v15.2

openbuddy-mixtral-moe-7b-chat

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

q_proj, k_proj, v_proj

openbuddy

transformers>=4.36

moe

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

openbuddy-llama3_1-8b-chat

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

q_proj, k_proj, v_proj

openbuddy2

transformers>=4.43

-

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

mistral-7b

AI-ModelScope/Mistral-7B-v0.1

q_proj, k_proj, v_proj

default-generation

transformers>=4.34

-

mistralai/Mistral-7B-v0.1

mistral-7b-v2

AI-ModelScope/Mistral-7B-v0.2-hf

q_proj, k_proj, v_proj

default-generation

transformers>=4.34

-

alpindale/Mistral-7B-v0.2-hf

mistral-7b-instruct

AI-ModelScope/Mistral-7B-Instruct-v0.1

q_proj, k_proj, v_proj

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.1

mistral-7b-instruct-v2

AI-ModelScope/Mistral-7B-Instruct-v0.2

q_proj, k_proj, v_proj

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.2

mistral-7b-instruct-v3

LLM-Research/Mistral-7B-Instruct-v0.3

q_proj, k_proj, v_proj

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.3

mistral-nemo-base-2407

AI-ModelScope/Mistral-Nemo-Base-2407

q_proj, k_proj, v_proj

default-generation

transformers>=4.43

-

mistralai/Mistral-Nemo-Base-2407

mistral-nemo-instruct-2407

AI-ModelScope/Mistral-Nemo-Instruct-2407

q_proj, k_proj, v_proj

mistral-nemo

transformers>=4.43

-

mistralai/Mistral-Nemo-Instruct-2407

mistral-large-instruct-2407

LLM-Research/Mistral-Large-Instruct-2407

q_proj, k_proj, v_proj

mistral-nemo

transformers>=4.43

-

mistralai/Mistral-Large-Instruct-2407

mistral-small-instruct-2409

AI-ModelScope/Mistral-Small-Instruct-2409

q_proj, k_proj, v_proj

mistral-nemo

transformers>=4.43

-

mistralai/Mistral-Small-Instruct-2409

mixtral-moe-7b

AI-ModelScope/Mixtral-8x7B-v0.1

q_proj, k_proj, v_proj

default-generation

transformers>=4.36

moe

mistralai/Mixtral-8x7B-v0.1

mixtral-moe-7b-instruct

AI-ModelScope/Mixtral-8x7B-Instruct-v0.1

q_proj, k_proj, v_proj

llama

transformers>=4.36

moe

mistralai/Mixtral-8x7B-Instruct-v0.1

mixtral-moe-7b-aqlm-2bit-1x16

AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf

q_proj, k_proj, v_proj

default-generation

transformers>=4.38, aqlm, torch>=2.2.0

moe

ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf

mixtral-moe-8x22b-v1

AI-ModelScope/Mixtral-8x22B-v0.1

q_proj, k_proj, v_proj

default-generation

transformers>=4.36

moe

mistral-community/Mixtral-8x22B-v0.1

ministral-8b-instruct-2410

AI-ModelScope/Ministral-8B-Instruct-2410

q_proj, k_proj, v_proj

mistral-nemo

transformers>=4.46

-

mistralai/Ministral-8B-Instruct-2410

wizardlm2-7b-awq

AI-ModelScope/WizardLM-2-7B-AWQ

q_proj, k_proj, v_proj

wizardlm2-awq

transformers>=4.34

-

MaziyarPanahi/WizardLM-2-7B-AWQ

wizardlm2-8x22b

AI-ModelScope/WizardLM-2-8x22B

q_proj, k_proj, v_proj

wizardlm2

transformers>=4.36

-

alpindale/WizardLM-2-8x22B

baichuan-7b

baichuan-inc/baichuan-7B

W_pack

default-generation

transformers<4.34

-

baichuan-inc/Baichuan-7B

baichuan-13b

baichuan-inc/Baichuan-13B-Base

W_pack

default-generation

transformers<4.34

-

baichuan-inc/Baichuan-13B-Base

baichuan-13b-chat

baichuan-inc/Baichuan-13B-Chat

W_pack

baichuan

transformers<4.34

-

baichuan-inc/Baichuan-13B-Chat

baichuan2-7b

baichuan-inc/Baichuan2-7B-Base

W_pack

default-generation

-

baichuan-inc/Baichuan2-7B-Base

baichuan2-7b-chat

baichuan-inc/Baichuan2-7B-Chat

W_pack

baichuan

-

baichuan-inc/Baichuan2-7B-Chat

baichuan2-7b-chat-int4

baichuan-inc/Baichuan2-7B-Chat-4bits

W_pack

baichuan

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-7B-Chat-4bits

baichuan2-13b

baichuan-inc/Baichuan2-13B-Base

W_pack

default-generation

-

baichuan-inc/Baichuan2-13B-Base

baichuan2-13b-chat

baichuan-inc/Baichuan2-13B-Chat

W_pack

baichuan

-

baichuan-inc/Baichuan2-13B-Chat

baichuan2-13b-chat-int4

baichuan-inc/Baichuan2-13B-Chat-4bits

W_pack

baichuan

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-13B-Chat-4bits

yuan2-2b-instruct

YuanLLM/Yuan2.0-2B-hf

q_proj, k_proj, v_proj

yuan

-

IEITYuan/Yuan2-2B-hf

yuan2-2b-janus-instruct

YuanLLM/Yuan2-2B-Janus-hf

q_proj, k_proj, v_proj

yuan

-

IEITYuan/Yuan2-2B-Janus-hf

yuan2-51b-instruct

YuanLLM/Yuan2.0-51B-hf

q_proj, k_proj, v_proj

yuan

-

IEITYuan/Yuan2-51B-hf

yuan2-102b-instruct

YuanLLM/Yuan2.0-102B-hf

q_proj, k_proj, v_proj

yuan

-

IEITYuan/Yuan2-102B-hf

yuan2-m32

YuanLLM/Yuan2-M32-hf

q_proj, k_proj, v_proj

yuan

moe

IEITYuan/Yuan2-M32-hf

xverse-7b

xverse/XVERSE-7B

q_proj, k_proj, v_proj

default-generation

-

xverse/XVERSE-7B

xverse-7b-chat

xverse/XVERSE-7B-Chat

q_proj, k_proj, v_proj

xverse

-

xverse/XVERSE-7B-Chat

xverse-13b

xverse/XVERSE-13B

q_proj, k_proj, v_proj

default-generation

-

xverse/XVERSE-13B

xverse-13b-chat

xverse/XVERSE-13B-Chat

q_proj, k_proj, v_proj

xverse

-

xverse/XVERSE-13B-Chat

xverse-65b

xverse/XVERSE-65B

q_proj, k_proj, v_proj

default-generation

-

xverse/XVERSE-65B

xverse-65b-v2

xverse/XVERSE-65B-2

q_proj, k_proj, v_proj

default-generation

-

xverse/XVERSE-65B-2

xverse-65b-chat

xverse/XVERSE-65B-Chat

q_proj, k_proj, v_proj

xverse

-

xverse/XVERSE-65B-Chat

xverse-13b-256k

xverse/XVERSE-13B-256K

q_proj, k_proj, v_proj

default-generation

-

xverse/XVERSE-13B-256K

xverse-moe-a4_2b

xverse/XVERSE-MoE-A4.2B

q_proj, k_proj, v_proj

default-generation

moe

xverse/XVERSE-MoE-A4.2B

orion-14b

OrionStarAI/Orion-14B-Base

q_proj, k_proj, v_proj

default-generation

-

OrionStarAI/Orion-14B-Base

orion-14b-chat

OrionStarAI/Orion-14B-Chat

q_proj, k_proj, v_proj

orion

-

OrionStarAI/Orion-14B-Chat

bluelm-7b

vivo-ai/BlueLM-7B-Base

q_proj, k_proj, v_proj

default-generation

-

vivo-ai/BlueLM-7B-Base

bluelm-7b-32k

vivo-ai/BlueLM-7B-Base-32K

q_proj, k_proj, v_proj

default-generation

-

vivo-ai/BlueLM-7B-Base-32K

bluelm-7b-chat

vivo-ai/BlueLM-7B-Chat

q_proj, k_proj, v_proj

bluelm

-

vivo-ai/BlueLM-7B-Chat

bluelm-7b-chat-32k

vivo-ai/BlueLM-7B-Chat-32K

q_proj, k_proj, v_proj

bluelm

-

vivo-ai/BlueLM-7B-Chat-32K

ziya2-13b

Fengshenbang/Ziya2-13B-Base

q_proj, k_proj, v_proj

default-generation

-

IDEA-CCNL/Ziya2-13B-Base

ziya2-13b-chat

Fengshenbang/Ziya2-13B-Chat

q_proj, k_proj, v_proj

ziya

-

IDEA-CCNL/Ziya2-13B-Chat

skywork-13b

skywork/Skywork-13B-base

q_proj, k_proj, v_proj

default-generation

-

Skywork/Skywork-13B-base

skywork-13b-chat

skywork/Skywork-13B-chat

q_proj, k_proj, v_proj

skywork

-

-

zephyr-7b-beta-chat

modelscope/zephyr-7b-beta

q_proj, k_proj, v_proj

zephyr

transformers>=4.34

-

HuggingFaceH4/zephyr-7b-beta

polylm-13b

damo/nlp_polylm_13b_text_generation

c_attn

default-generation

-

DAMO-NLP-MT/polylm-13b

seqgpt-560m

damo/nlp_seqgpt-560m

query_key_value

default-generation

-

DAMO-NLP/SeqGPT-560M

sus-34b-chat

SUSTC/SUS-Chat-34B

q_proj, k_proj, v_proj

sus

-

SUSTech/SUS-Chat-34B

tongyi-finance-14b

TongyiFinance/Tongyi-Finance-14B

c_attn

default-generation

financial

-

tongyi-finance-14b-chat

TongyiFinance/Tongyi-Finance-14B-Chat

c_attn

qwen

financial

jxy/Tongyi-Finance-14B-Chat

tongyi-finance-14b-chat-int4

TongyiFinance/Tongyi-Finance-14B-Chat-Int4

c_attn

qwen

auto_gptq>=0.5

financial

jxy/Tongyi-Finance-14B-Chat-Int4

codefuse-codellama-34b-chat

codefuse-ai/CodeFuse-CodeLlama-34B

q_proj, k_proj, v_proj

codefuse-codellama

coding

codefuse-ai/CodeFuse-CodeLlama-34B

codefuse-codegeex2-6b-chat

codefuse-ai/CodeFuse-CodeGeeX2-6B

query_key_value

codefuse

transformers<4.34

coding

codefuse-ai/CodeFuse-CodeGeeX2-6B

codefuse-qwen-14b-chat

codefuse-ai/CodeFuse-QWen-14B

c_attn

codefuse

coding

codefuse-ai/CodeFuse-QWen-14B

phi2-3b

AI-ModelScope/phi-2

Wqkv

default-generation

coding

microsoft/phi-2

phi3-4b-4k-instruct

LLM-Research/Phi-3-mini-4k-instruct

qkv_proj

phi3

transformers>=4.36

-

microsoft/Phi-3-mini-4k-instruct

phi3-4b-128k-instruct

LLM-Research/Phi-3-mini-128k-instruct

qkv_proj

phi3

transformers>=4.36

-

microsoft/Phi-3-mini-128k-instruct

phi3-small-8k-instruct

LLM-Research/Phi-3-small-8k-instruct

query_key_value

phi3

transformers>=4.36

-

microsoft/Phi-3-small-8k-instruct

phi3-medium-4k-instruct

LLM-Research/Phi-3-medium-4k-instruct

qkv_proj

phi3

transformers>=4.36

-

microsoft/Phi-3-medium-4k-instruct

phi3-small-128k-instruct

LLM-Research/Phi-3-small-128k-instruct

query_key_value

phi3

transformers>=4.36

-

microsoft/Phi-3-small-128k-instruct

phi3-medium-128k-instruct

LLM-Research/Phi-3-medium-128k-instruct

qkv_proj

phi3

transformers>=4.36

-

microsoft/Phi-3-medium-128k-instruct

phi3_5-mini-instruct

LLM-Research/Phi-3.5-mini-instruct

qkv_proj

phi3

transformers>=4.36

-

microsoft/Phi-3.5-mini-instruct

phi3_5-moe-instruct

LLM-Research/Phi-3.5-MoE-instruct

q_proj, k_proj, v_proj

phi3

transformers>=4.36

moe

microsoft/Phi-3.5-MoE-instruct

mamba-130m

AI-ModelScope/mamba-130m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-130m-hf

mamba-370m

AI-ModelScope/mamba-370m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-370m-hf

mamba-390m

AI-ModelScope/mamba-390m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-390m-hf

mamba-790m

AI-ModelScope/mamba-790m-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-790m-hf

mamba-1.4b

AI-ModelScope/mamba-1.4b-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-1.4b-hf

mamba-2.8b

AI-ModelScope/mamba-2.8b-hf

in_proj, x_proj, embeddings, out_proj

default-generation

transformers>=4.39.0

-

state-spaces/mamba-2.8b-hf

telechat-7b

TeleAI/TeleChat-7B

key_value, query

telechat

-

Tele-AI/telechat-7B

telechat-12b

TeleAI/TeleChat-12B

key_value, query

telechat

-

Tele-AI/TeleChat-12B

telechat-12b-v2

TeleAI/TeleChat-12B-v2

key_value, query

telechat

-

Tele-AI/TeleChat-12B-v2

telechat-12b-v2-gptq-int4

swift/TeleChat-12B-V2-GPTQ-Int4

key_value, query

telechat

auto_gptq>=0.5

-

-

telechat2-115b

TeleAI/TeleChat2-115B

key_value, query

telechat2

-

Tele-AI/TeleChat2-115B

grok-1

colossalai/grok-1-pytorch

q_proj, k_proj, v_proj

default-generation

-

hpcai-tech/grok-1

dbrx-instruct

AI-ModelScope/dbrx-instruct

attn.Wqkv

dbrx

transformers>=4.36

moe

databricks/dbrx-instruct

dbrx-base

AI-ModelScope/dbrx-base

attn.Wqkv

dbrx

transformers>=4.36

moe

databricks/dbrx-base

mengzi3-13b-base

langboat/Mengzi3-13B-Base

q_proj, k_proj, v_proj

mengzi

-

Langboat/Mengzi3-13B-Base

c4ai-command-r-v01

AI-ModelScope/c4ai-command-r-v01

q_proj, k_proj, v_proj

c4ai

transformers>=4.39.1

-

CohereForAI/c4ai-command-r-v01

c4ai-command-r-plus

AI-ModelScope/c4ai-command-r-plus

q_proj, k_proj, v_proj

c4ai

transformers>4.39

-

CohereForAI/c4ai-command-r-plus

aya-expanse-8b

AI-ModelScope/aya-expanse-8b

q_proj, k_proj, v_proj

aya

transformers>=4.44.0

-

CohereForAI/aya-expanse-8b

aya-expanse-32b

AI-ModelScope/aya-expanse-32b

q_proj, k_proj, v_proj

aya

transformers>=4.44.0

-

CohereForAI/aya-expanse-32b

codestral-22b

swift/Codestral-22B-v0.1

q_proj, k_proj, v_proj

default-generation

transformers>=4.34

-

mistralai/Codestral-22B-v0.1
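The Requires cells in the table above are comma-separated dependency constraints such as `transformers>=4.43, auto_gptq` or `bitsandbytes<0.41.2, accelerate<0.26`. The following is a minimal sketch of how such a cell could be parsed and checked against an installed version. The helper names `parse_requires` and `satisfies` are hypothetical and are not part of swift. The sketch only handles purely numeric versions; a dedicated library such as `packaging` is the more robust choice in practice.

```python
import re

# Hypothetical helpers for illustration; not part of swift.
_SPEC = re.compile(
    r'\s*([A-Za-z0-9_\-]+)\s*(?:(>=|<=|==|>|<)\s*([0-9]+(?:\.[0-9]+)*))?\s*')


def parse_requires(cell):
    """Split a Requires cell into (package, operator, version) tuples."""
    specs = []
    for item in cell.split(','):
        if item.strip() in ('', '-'):
            continue  # empty cell or placeholder
        match = _SPEC.fullmatch(item)
        if match is None:
            raise ValueError(f'unsupported requirement: {item!r}')
        specs.append(match.groups())
    return specs


def _version_tuple(version):
    return tuple(int(part) for part in version.split('.'))


def satisfies(installed, op, required):
    """Compare two numeric versions, padding the shorter one with zeros."""
    a, b = _version_tuple(installed), _version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return {'>=': a >= b, '<=': a <= b, '==': a == b,
            '>': a > b, '<': a < b}[op]


if __name__ == '__main__':
    for name, op, version in parse_requires('transformers>=4.43, auto_gptq'):
        print(name, op, version)
```

A package listed without an operator, such as `auto_gptq`, yields `None` for both the operator and the version, meaning any installed version is accepted.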

Multimodal Large Models
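For the multimodal models, the Default Lora Target Modules column holds a regular expression rather than a list of module names. The pattern anchors on a prefix such as `transformer.h` and uses a negative lookahead to skip any module whose remaining path contains `lm_head`, `output`, `emb`, `wte`, or `shared`. The sketch below applies the qwen-vl pattern to a handful of module names. The names are hypothetical and chosen only for illustration, and `re.fullmatch` is used on the assumption that the whole module name has to match.

```python
import re

# Pattern copied from the qwen-vl rows of the table.
PATTERN = re.compile(
    r'^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*')

# Hypothetical module names, for illustration only.
MODULE_NAMES = [
    'transformer.h.0.attn.c_attn',    # inside the decoder blocks: selected
    'transformer.h.0.mlp.w1',         # inside the decoder blocks: selected
    'transformer.h.0.output_layer',   # contains "output": excluded
    'transformer.wte',                # does not start with transformer.h
    'transformer.visual.conv1',       # does not start with transformer.h
    'lm_head',                        # does not start with transformer.h
]


def select_modules(pattern, names):
    """Return the names that the target-module regex matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == '__main__':
    for name in select_modules(PATTERN, MODULE_NAMES):
        print(name)
```

Note that the unescaped `.` in `transformer.h` matches any character, so the pattern is slightly more permissive than a literal prefix match.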

Model Type

Model ID

Default Lora Target Modules

Default Template

Support Flash Attn

Support vLLM

Support LMDeploy

Support Megatron

Requires

Tags

HF Model ID

qwen-vl

qwen/Qwen-VL

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl-generation

vision

Qwen/Qwen-VL

qwen-vl-chat

qwen/Qwen-VL-Chat

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl

vision

Qwen/Qwen-VL-Chat

qwen-vl-chat-int4

qwen/Qwen-VL-Chat-Int4

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-vl

auto_gptq>=0.5

vision

Qwen/Qwen-VL-Chat-Int4

qwen-audio

qwen/Qwen-Audio

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-audio-generation

audio

Qwen/Qwen-Audio

qwen-audio-chat

qwen/Qwen-Audio-Chat

^(transformer.h)(?!.*(lm_head|output|emb|wte|shared)).*

qwen-audio

audio

Qwen/Qwen-Audio-Chat

qwen2-audio-7b

qwen/Qwen2-Audio-7B

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-audio-generation

librosa, transformers>=4.45

audio

Qwen/Qwen2-Audio-7B

qwen2-audio-7b-instruct

qwen/Qwen2-Audio-7B-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-audio

librosa, transformers>=4.45

audio

Qwen/Qwen2-Audio-7B-Instruct

qwen2-vl-2b

qwen/Qwen2-VL-2B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-2B

qwen2-vl-2b-instruct

qwen/Qwen2-VL-2B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-2B-Instruct

qwen2-vl-2b-instruct-gptq-int4

qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

qwen2-vl-2b-instruct-gptq-int8

qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

qwen2-vl-2b-instruct-awq

qwen/Qwen2-VL-2B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-2B-Instruct-AWQ

qwen2-vl-7b

qwen/Qwen2-VL-7B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-7B

qwen2-vl-7b-instruct

qwen/Qwen2-VL-7B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-7B-Instruct

qwen2-vl-7b-instruct-gptq-int4

qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

qwen2-vl-7b-instruct-gptq-int8

qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

qwen2-vl-7b-instruct-awq

qwen/Qwen2-VL-7B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-7B-Instruct-AWQ

qwen2-vl-72b

qwen/Qwen2-VL-72B

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl-generation

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-72B

qwen2-vl-72b-instruct

qwen/Qwen2-VL-72B-Instruct

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils

vision, video

Qwen/Qwen2-VL-72B-Instruct

qwen2-vl-72b-instruct-gptq-int4

qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

qwen2-vl-72b-instruct-gptq-int8

qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, auto_gptq>=0.5

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

qwen2-vl-72b-instruct-awq

qwen/Qwen2-VL-72B-Instruct-AWQ

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

qwen2-vl

transformers>=4.45.dev.0, qwen_vl_utils, autoawq

vision, video

Qwen/Qwen2-VL-72B-Instruct-AWQ

glm4v-9b-chat

ZhipuAI/glm-4v-9b

^(transformer.encoder)(?!.*(lm_head|output|emb|wte|shared)).*

glm4v

transformers>=4.42

vision

THUDM/glm-4v-9b

glm-edge-v-2b

ZhipuAI/glm-edge-v-2b

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

glm-edge-v

transformers>=4.46

vision

THUDM/glm-edge-v-2b

glm-edge-v-5b

ZhipuAI/glm-edge-v-5b

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

glm-edge-v

transformers>=4.46

vision

THUDM/glm-edge-v-5b

llama3_2-11b-vision

LLM-Research/Llama-3.2-11B-Vision

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision-generation

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision

llama3_2-11b-vision-instruct

LLM-Research/Llama-3.2-11B-Vision-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision-Instruct

llama3_2-90b-vision

LLM-Research/Llama-3.2-90B-Vision

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision-generation

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision

llama3_2-90b-vision-instruct

LLM-Research/Llama-3.2-90B-Vision-Instruct

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_2-vision

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision-Instruct

llama3_1-8b-omni

ICTNLP/Llama-3.1-8B-Omni

^(model.layers|model.speech_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3_1-omni

whisper, openai-whisper

audio

ICTNLP/Llama-3.1-8B-Omni

idefics3-8b-llama3

AI-ModelScope/Idefics3-8B-Llama3

^(model.text_model|model.connector)(?!.*(lm_head|output|emb|wte|shared)).*

idefics3

transformers>=4.45

vision

HuggingFaceM4/Idefics3-8B-Llama3

llava1_5-7b-instruct

swift/llava-1.5-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava1_5

transformers>=4.36

vision

llava-hf/llava-1.5-7b-hf

llava1_5-13b-instruct

swift/llava-1.5-13b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava1_5

transformers>=4.36

vision

llava-hf/llava-1.5-13b-hf

llava1_6-mistral-7b-instruct

swift/llava-v1.6-mistral-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-mistral

transformers>=4.39

vision

llava-hf/llava-v1.6-mistral-7b-hf

llava1_6-vicuna-7b-instruct

swift/llava-v1.6-vicuna-7b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-vicuna

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-7b-hf

llava1_6-vicuna-13b-instruct

swift/llava-v1.6-vicuna-13b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-vicuna

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-13b-hf

llava1_6-llama3_1-8b-instruct

swift/llava-llama3.1-8b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-llama3

transformers>=4.41

vision

-

llava1_6-yi-34b-instruct

swift/llava-v1.6-34b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-yi

transformers>=4.39

vision

llava-hf/llava-v1.6-34b-hf

llama3-llava-next-8b-hf

swift/llama3-llava-next-8b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-llava-next-hf

transformers>=4.39

vision

llava-hf/llama3-llava-next-8b-hf

llava-next-72b-hf

AI-ModelScope/llava-next-72b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-qwen-hf

transformers>=4.39

vision

llava-hf/llava-next-72b-hf

llava-next-110b-hf

AI-ModelScope/llava-next-110b-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama-qwen-hf

transformers>=4.39

vision

llava-hf/llava-next-110b-hf

llava-onevision-qwen2-0_5b-ov

AI-ModelScope/llava-onevision-qwen2-0.5b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava-onevision-qwen2-7b-ov

AI-ModelScope/llava-onevision-qwen2-7b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava-onevision-qwen2-72b-ov

AI-ModelScope/llava-onevision-qwen2-72b-ov-hf

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-onevision-qwen

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-72b-ov-hf

llama3-llava-next-8b

AI-Modelscope/llama3-llava-next-8b

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llama3-llava-next

vision

lmms-lab/llama3-llava-next-8b

llava-next-72b

AI-Modelscope/llava-next-72b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-qwen

vision

lmms-lab/llava-next-72b

llava-next-110b

AI-Modelscope/llava-next-110b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-qwen

vision

lmms-lab/llava-next-110b

llava-next-video-7b-instruct

swift/LLaVA-NeXT-Video-7B-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-hf

llava-next-video-7b-32k-instruct

swift/LLaVA-NeXT-Video-7B-32K-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava-next-video-7b-dpo-instruct

swift/LLaVA-NeXT-Video-7B-DPO-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava-next-video-34b-instruct

swift/LLaVA-NeXT-Video-34B-hf

^(language_model|multi_modal_projector|vision_resampler)(?!.*(lm_head|output|emb|wte|shared)).*

llava-next-video-yi

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-34B-hf

yi-vl-6b-chat

01ai/Yi-VL-6B

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

yi-vl

transformers>=4.34

vision

01-ai/Yi-VL-6B

yi-vl-34b-chat

01ai/Yi-VL-34B

^(model.layers|model.mm_projector)(?!.*(lm_head|output|emb|wte|shared)).*

yi-vl

transformers>=4.34

vision

01-ai/Yi-VL-34B

llava-llama3-8b-v1_1

AI-ModelScope/llava-llama-3-8b-v1_1-transformers

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

llava-llama-instruct

transformers>=4.36

vision

xtuner/llava-llama-3-8b-v1_1-transformers

internlm-xcomposer2-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2

vision

internlm/internlm-xcomposer2-7b

internlm-xcomposer2-4khd-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2-4khd

vision

internlm/internlm-xcomposer2-4khd-7b

internlm-xcomposer2_5-7b-chat

Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b

attention.wqkv, attention.wo, feed_forward.w1, feed_forward.w2, feed_forward.w3

internlm-xcomposer2_5

vision

internlm/internlm-xcomposer2d5-7b

internvl-chat-v1_5

AI-ModelScope/InternVL-Chat-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5

internvl-chat-v1_5-int8

AI-ModelScope/InternVL-Chat-V1-5-int8

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5-int8

mini-internvl-chat-2b-v1_5

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl

transformers>=4.35, timm

vision

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

mini-internvl-chat-4b-v1_5

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl-phi3

transformers>=4.35,<4.42, timm

vision

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

internvl2-1b

OpenGVLab/InternVL2-1B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-1B

internvl2-2b

OpenGVLab/InternVL2-2B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B

internvl2-4b

OpenGVLab/InternVL2-4B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2-phi3

transformers>=4.36,<4.42, timm

vision, video

OpenGVLab/InternVL2-4B

internvl2-8b

OpenGVLab/InternVL2-8B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B

internvl2-26b

OpenGVLab/InternVL2-26B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B

internvl2-40b

OpenGVLab/InternVL2-40B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B

internvl2-llama3-76b

OpenGVLab/InternVL2-Llama3-76B

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B

internvl2-2b-awq

OpenGVLab/InternVL2-2B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B-AWQ

internvl2-8b-awq

OpenGVLab/InternVL2-8B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-AWQ

internvl2-26b-awq

OpenGVLab/InternVL2-26B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B-AWQ

internvl2-40b-awq

OpenGVLab/InternVL2-40B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B-AWQ

internvl2-llama3-76b-awq

OpenGVLab/InternVL2-Llama3-76B-AWQ

^(language_model|mlp1)(?!.*(lm_head|output|emb|wte|shared)).*

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B-AWQ

deepseek-janus-1_3b

deepseek-ai/Janus-1.3B

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-janus

vision

deepseek-ai/Janus-1.3B

deepseek-vl-1_3b-chat

deepseek-ai/deepseek-vl-1.3b-chat

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-vl

vision

deepseek-ai/deepseek-vl-1.3b-chat

deepseek-vl-7b-chat

deepseek-ai/deepseek-vl-7b-chat

^(language_model|aligner)(?!.*(lm_head|output|emb|wte|shared)).*

deepseek-vl

vision

deepseek-ai/deepseek-vl-7b-chat

ovis1_6-gemma2-9b

AIDC-AI/Ovis1.6-Gemma2-9B

^(llm)(?!.*(lm_head|output|emb|wte|shared)).*

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B

paligemma-3b-pt-224

AI-ModelScope/paligemma-3b-pt-224

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-224

paligemma-3b-pt-448

AI-ModelScope/paligemma-3b-pt-448

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-448

paligemma-3b-pt-896

AI-ModelScope/paligemma-3b-pt-896

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-896

paligemma-3b-mix-224

AI-ModelScope/paligemma-3b-mix-224

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-224

paligemma-3b-mix-448

AI-ModelScope/paligemma-3b-mix-448

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-448

minicpm-v-3b-chat

OpenBMB/MiniCPM-V

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v

timm, transformers<4.42

vision

openbmb/MiniCPM-V

minicpm-v-v2-chat

OpenBMB/MiniCPM-V-2

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v

timm, transformers<4.42

vision

openbmb/MiniCPM-V-2

minicpm-v-v2_5-chat

OpenBMB/MiniCPM-Llama3-V-2_5

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v-v2_5

timm, transformers>=4.36

vision

openbmb/MiniCPM-Llama3-V-2_5

minicpm-v-v2_6-chat

OpenBMB/MiniCPM-V-2_6

^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*

minicpm-v-v2_6

timm, transformers>=4.36

vision, video

openbmb/MiniCPM-V-2_6

pixtral-12b

AI-ModelScope/pixtral-12b

^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*

pixtral

transformers>=4.45

vision

mistral-community/pixtral-12b

mplug-owl2-chat

iic/mPLUG-Owl2

q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1

mplug-owl2

transformers<4.35, icecream

vision

MAGAer13/mplug-owl2-llama2-7b

mplug-owl2_1-chat

iic/mPLUG-Owl2.1

c_attn.multiway.0, c_attn.multiway.1

mplug-owl2

transformers<4.35, icecream

vision

Mizukiluke/mplug_owl_2_1

mplug-owl3-1b-chat

iic/mPLUG-Owl3-1B-241014

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-1B-241014

mplug-owl3-2b-chat

iic/mPLUG-Owl3-2B-241014

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-2B-241014

mplug-owl3-7b-chat

iic/mPLUG-Owl3-7B-240728

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-240728

mplug-owl3v-7b-chat

iic/mPLUG-Owl3-7B-241101

^(language_model|vision2text_model)(?!.*(lm_head|output|emb|wte|shared)).*

mplug_owl3v

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-241101

phi3-vision-128k-instruct

LLM-Research/Phi-3-vision-128k-instruct

^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).*

phi3-vl

transformers>=4.36

vision

microsoft/Phi-3-vision-128k-instruct

phi3_5-vision-instruct

LLM-Research/Phi-3.5-vision-instruct

^(model.layers|model.vision_embed_tokens.img_projection)(?!.*(lm_head|output|emb|wte|shared)).*

phi3-vl

transformers>=4.36

vision

microsoft/Phi-3.5-vision-instruct

cogvlm-17b-chat

ZhipuAI/cogvlm-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

transformers<4.42

vision

THUDM/cogvlm-chat-hf

cogvlm2-19b-chat

ZhipuAI/cogvlm2-llama3-chinese-chat-19B

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

transformers<4.42

vision

THUDM/cogvlm2-llama3-chinese-chat-19B

cogvlm2-en-19b-chat

ZhipuAI/cogvlm2-llama3-chat-19B

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm

transformers<4.42

vision

THUDM/cogvlm2-llama3-chat-19B

cogvlm2-video-13b-chat

ZhipuAI/cogvlm2-video-llama3-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogvlm2-video

decord, pytorchvideo, transformers>=4.42

vision, video

THUDM/cogvlm2-video-llama3-chat

cogagent-18b-chat

ZhipuAI/cogagent-chat

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogagent-chat

timm

vision

THUDM/cogagent-chat-hf

cogagent-18b-instruct

ZhipuAI/cogagent-vqa

^(model.layers)(?!.*(lm_head|output|emb|wte|shared)).*

cogagent-instruct

timm

vision

THUDM/cogagent-vqa-hf

molmoe-1b

LLM-Research/MolmoE-1B-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

transformers>=4.45.0

vision

allenai/MolmoE-1B-0924

molmo-7b-o

LLM-Research/Molmo-7B-O-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

transformers>=4.45.0

vision

allenai/Molmo-7B-O-0924

molmo-7b-d

LLM-Research/Molmo-7B-D-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

transformers>=4.45.0

vision

allenai/Molmo-7B-D-0924

molmo-72b

LLM-Research/Molmo-72B-0924

^(model.transformer)(?!.*(lm_head|output|emb|wte|shared)).*

molmo

transformers>=4.45.0

vision

allenai/Molmo-72B-0924

emu3-chat

BAAI/Emu3-Chat

^(model)(?!.*(lm_head|output|emb|wte|shared)).*

emu3-chat

transformers>=4.44.0

vision

BAAI/Emu3-Chat

florence-2-base

AI-ModelScope/Florence-2-base

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

vision

microsoft/Florence-2-base

florence-2-base-ft

AI-ModelScope/Florence-2-base-ft

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

vision

microsoft/Florence-2-base-ft

florence-2-large

AI-ModelScope/Florence-2-large

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

vision

microsoft/Florence-2-large

florence-2-large-ft

AI-ModelScope/Florence-2-large-ft

^(language_model|image_projection)(?!.*(lm_head|output|emb|wte|shared)).*

florence

vision

microsoft/Florence-2-large-ft

got-ocr2

stepfun-ai/GOT-OCR2_0

^(model.layers|model.mm_projector_vary)(?!.*(lm_head|output|emb|wte|shared)).*

got_ocr2

vision

stepfun-ai/GOT-OCR2_0
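
For most of the multimodal rows above, the Default Lora Target Modules value is a regular expression rather than a list of module names. The leading group names the top-level submodules to adapt, and the negative lookahead excludes any module whose name contains lm_head, output, emb, wte, or shared. The sketch below applies one pattern copied from the table using Python's `re.fullmatch`. The module names are hypothetical, and full-match semantics are an assumption about how such a pattern is applied, not a description of swift's internals.

```python
import re

# Pattern copied from the Default Lora Target Modules column (LLaVA-style rows).
PATTERN = r"^(language_model|multi_modal_projector)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical module names, chosen only to illustrate the matching rules.
MODULE_NAMES = [
    "language_model.model.layers.0.self_attn.q_proj",
    "multi_modal_projector.linear_1",
    "vision_tower.encoder.layers.0.self_attn.q_proj",
    "language_model.lm_head",
    "language_model.model.embed_tokens",
]


def select_modules(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


selected = select_modules(PATTERN, MODULE_NAMES)
for name in selected:
    print(name)
```

In this sketch the vision tower is skipped because it does not start with one of the listed prefixes, while the output head and the embedding layer are rejected by the lookahead.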

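Datasets

The table below describes the datasets integrated into swift:

  • Dataset Name: the dataset_name under which the dataset is registered in swift.

  • Dataset ID: the dataset_id of the dataset on ModelScope.

  • Dataset Size: the number of samples in the dataset.

  • Statistic: token-count statistics for the dataset, which are useful when tuning the max_length hyperparameter. The training and validation sets are concatenated before the statistics are computed, and the text is tokenized with the qwen tokenizer. Statistics differ between tokenizers; to obtain token statistics for another model's tokenizer, you can compute them yourself with a script.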
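
As a rough illustration of how a statistic in the `mean±std, min=…, max=…` form used in the table could be computed for another tokenizer, here is a minimal sketch. It is not the script swift uses. The whitespace tokenizer is a stand-in, the function names are hypothetical, and the choice of population standard deviation is an assumption.

```python
import statistics


def whitespace_tokenize(text):
    """Stand-in tokenizer (an assumption): splits on whitespace."""
    return text.split()


def token_statistic(samples, tokenize=whitespace_tokenize):
    """Summarise per-sample token counts as 'mean±std, min=..., max=...'."""
    lengths = [len(tokenize(sample)) for sample in samples]
    mean = statistics.fmean(lengths)
    # Population standard deviation; whether the table uses the population
    # or the sample formula is not stated, so this is an assumption.
    std = statistics.pstdev(lengths)
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


samples = ["hello world", "a b c", "one two three four"]
print(token_statistic(samples))
```

To use a real model tokenizer, pass a callable that maps a string to its list of tokens as the `tokenize` argument, for example the `tokenize` method of a Hugging Face tokenizer. The reported maximum then gives a starting point for choosing max_length.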

Dataset Name Dataset ID Subsets Dataset Size Statistic (token) Tags HF Dataset ID
🔥ms-bench iic/ms_bench 316820 346.9±443.2, min=22, max=30960 chat, general, multi-round -
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 176.2±125.8, min=26, max=740 chat, general vicgalle/alpaca-gpt4
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 162.1±93.9, min=26, max=856 chat, general llm-wizard/alpaca-gpt4-data-zh
multi-alpaca damo/nlp_polylm_multialpaca_sft ar
de
es
fr
id
ja
ko
pt
ru
th
vi
131867 112.9±50.6, min=26, max=1226 chat, general, multilingual -
instinwild wyj123456/instinwild default
subset
103695 145.4±60.7, min=28, max=1434 - -
cot-en YorickHe/CoT 74771 122.7±64.8, min=51, max=8320 chat, general -
cot-zh YorickHe/CoT_zh 74771 117.5±70.8, min=43, max=9636 chat, general -
instruct-en wyj123456/instruct 888970 269.1±331.5, min=26, max=7254 chat, general -
firefly-zh AI-ModelScope/firefly-train-1.1M 1649399 178.1±260.4, min=26, max=12516 chat, general YeungNLP/firefly-train-1.1M
gpt4all-en wyj123456/GPT4all 806199 302.7±384.5, min=27, max=7391 chat, general -
sharegpt swift/sharegpt common-zh
computer-zh
unknow-zh
common-en
computer-en
96566 933.3±864.8, min=21, max=66412 chat, general, multi-round -
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 5119 520.7±437.6, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 568.4±713.2, min=37, max=78678 text-generation, general, pretrained pleisto/wikipedia-cn-20230720-filtered
open-orca AI-ModelScope/OpenOrca 994896 382.3±417.4, min=31, max=8740 chat, multilingual, general -
🔥sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 default
V3_format
zh_38K_format
72684 1047.6±1313.1, min=22, max=66412 chat, multilingual, general, multi-round, gpt4 -
deepctrl-sft AI-ModelScope/deepctrl-sft-data default
en
14149024 389.8±628.6, min=21, max=626237 chat, general, sft, multi-round -
🔥coig-cqia AI-ModelScope/COIG-CQIA chinese_traditional
coig_pc
exam
finance
douban
human_value
logi_qa
ruozhiba
segmentfault
wiki
wikihow
xhs
zhihu
44694 703.8±654.2, min=33, max=19288 general -
🔥ruozhiba AI-ModelScope/ruozhiba post-annual
title-good
title-norm
85658 39.9±13.1, min=21, max=559 pretrain -
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 9619.0±8295.8, min=36, max=78925 longlora, QA Yukang/LongAlpaca-12k
lmsys-chat-1m AI-ModelScope/lmsys-chat-1m - Dataset is too huge, please click the original link to view the dataset stat. chat, em lmsys/lmsys-chat-1m
🔥ms-agent iic/ms_agent 26336 650.9±217.2, min=209, max=2740 chat, agent, multi-round -
🔥ms-agent-for-agentfabric AI-ModelScope/ms_agent_for_agentfabric default
addition
30000 617.8±199.1, min=251, max=2657 chat, agent, multi-round -
ms-agent-multirole iic/MSAgent-MultiRole 9500 447.6±84.9, min=145, max=1101 chat, agent, multi-round, role-play, multi-agent -
🔥toolbench-for-alpha-umi shenweizhou/alpha-umi-toolbench-processed-v2 backbone
caller
planner
summarizer
1448337 1439.7±853.9, min=123, max=18467 chat, agent -
damo-agent-zh damo/MSAgent-Bench 386984 956.5±407.3, min=326, max=19001 chat, agent, multi-round -
damo-agent-zh-mini damo/MSAgent-Bench 20845 1326.4±329.6, min=571, max=4304 chat, agent, multi-round -
agent-instruct-all-en huangjintao/AgentInstruct_copy alfworld
db
kg
mind2web
os
webshop
1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
🔥msagent-pro iic/MSAgent-Pro 21905 1524.5±921.3, min=64, max=16770 chat, agent, multi-round -
toolbench swift/ToolBench 124345 3669.5±1600.9, min=1047, max=22581 chat, agent, multi-round -
code-alpaca-en wyj123456/code_alpaca_en 20016 100.2±60.1, min=29, max=1776 - sahil2801/CodeAlpaca-20k
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 727.1±235.9, min=259, max=2146 chat, coding -
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 483.6±193.9, min=45, max=3082 chat, coding -
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 439.6±206.3, min=37, max=2983 chat, coding -
medical-en swift/medical_zh en 117617 257.4±89.1, min=36, max=2564 chat, medical -
medical-zh swift/medical_zh zh 1950972 167.2±219.7, min=26, max=27351 chat, medical -
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 354.1±193.1, min=25, max=2231 chat, medical Flmc/DISC-Med-SFT
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 194.4±91.7, min=27, max=924 chat, law Skepsun/lawyer_llama_data
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 109.9±126.4, min=37, max=18878 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 533.7±495.4, min=30, max=15169 chat, law ShengbinYue/DISC-Law-SFT
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 169.3±58.7, min=35, max=563 chat, math Azure99/blossom-math-v2
school-math-zh AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 chat, math, quality BelleGroup/school_math_0.25M
open-platypus-en AI-ModelScope/Open-Platypus 24926 367.9±254.8, min=30, max=3951 chat, math, quality garage-bAInd/Open-Platypus
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 274.6±326.4, min=38, max=1975 chat, sql Clinton/texttosqlv2_25000_v2
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 80.2±17.8, min=36, max=456 chat, sql b-mc2/sql-create-context
synthetic-text-to-sql AI-ModelScope/synthetic_text_to_sql default 100000 283.4±115.8, min=61, max=1356 nl2sql, en gretelai/synthetic_text_to_sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 98399 130.6±21.7, min=51, max=241 text-generation shibing624/AdvertiseGen
🔥dureader-robust-zh modelscope/DuReader_robust-QG 17899 241.1±137.4, min=60, max=1416 text-generation -
cmnli-zh modelscope/clue cmnli 404024 82.6±16.6, min=51, max=199 text-generation, classification clue
🔥jd-sentiment-zh DAMO_NLP/jd 50000 66.0±83.2, min=39, max=4039 text-generation, classification -
🔥hc3-zh simpleai/HC3-Chinese baike
open_qa
nlpcc_dbqa
finance
medicine
law
psychology
39781 176.8±81.5, min=57, max=3051 text-generation, classification Hello-SimpleAI/HC3-Chinese
🔥hc3-en simpleai/HC3 finance
medicine
11021 298.3±138.7, min=65, max=2267 text-generation, classification Hello-SimpleAI/HC3
dolly-15k AI-ModelScope/databricks-dolly-15k default 15011 199.2±267.8, min=22, max=8615 multi-task, en, quality databricks/databricks-dolly-15k
zhihu-kol OmniData/Zhihu-KOL default - Dataset is too huge, please click the original link to view the dataset stat. zhihu, qa wangrui6/Zhihu-KOL
zhihu-kol-filtered OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 952.0±1727.2, min=25, max=98658 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
finance-en wyj123456/finance_en 68911 135.6±134.3, min=26, max=3525 chat, financial ssbuild/alpaca_finance_en
poetry-zh modelscope/chinese-poetry-collection 390309 55.2±9.4, min=23, max=83 text-generation, poetry -
webnovel-zh AI-ModelScope/webnovel_cn 50000 1478.9±11526.1, min=100, max=490484 chat, novel zxbsmk/webnovel_cn
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 chat, character-dialogue BelleGroup/generated_chat_0.4M
🔥self-cognition swift/self-cognition 134 53.6±18.6, min=29, max=121 chat, self-cognition modelscope/self-cognition
🔥swift-mix swift/swift-sft-mixture sharegpt
firefly
codefuse
metamathqa
- Dataset is too huge, please click the original link to view the dataset stat. chat, sft, general -
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
ner-jave-zh damo/zh_ner-JAVE 1266 118.3±45.5, min=44, max=223 chat, ner -
coco-en modelscope/coco_2014_caption coco_2014_caption 454617 299.8±2.8, min=295, max=352 chat, multi-modal, vision -
🔥coco-en-mini modelscope/coco_2014_caption coco_2014_caption 40504 299.8±2.6, min=295, max=338 chat, multi-modal, vision -
coco-en-2 modelscope/coco_2014_caption coco_2014_caption 454617 36.8±2.8, min=32, max=89 chat, multi-modal, vision -
🔥coco-en-2-mini modelscope/coco_2014_caption coco_2014_caption 40504 36.8±2.6, min=32, max=75 chat, multi-modal, vision -
capcha-images AI-ModelScope/captcha-images 8000 31.0±0.0, min=31, max=31 chat, multi-modal, vision -
latex-ocr-print AI-ModelScope/LaTeX_OCR default 17918 362.7±34.8, min=294, max=528 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
latex-ocr-handwrite AI-ModelScope/LaTeX_OCR synthetic_handwrite 95424 375.1±59.4, min=292, max=2115 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 141600 152.2±36.8, min=63, max=419 chat, multi-modal, audio -
🔥aishell1-zh-mini speech_asr/speech_asr_aishell1_trainsets 14526 152.2±35.6, min=74, max=359 chat, multi-modal, audio -
🔥video-chatgpt swift/VideoChatGPT Generic
Temporal
Consistency
3206 88.4±48.3, min=32, max=399 chat, multi-modal, video lmms-lab/VideoChatGPT
egoschema AI-ModelScope/egoschema Subset 101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
llava-video-178k lmms-lab/LLaVA-Video-178K 0_30_s_academic_v0_1
0_30_s_youtube_v0_1
1_2_m_academic_v0_1
1_2_m_youtube_v0_1
2_3_m_academic_v0_1
2_3_m_youtube_v0_1
30_60_s_academic_v0_1
30_60_s_youtube_v0_1
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, video lmms-lab/LLaVA-Video-178K
moviechat-1k-test AI-ModelScope/MovieChat-1K-test 486 36.1±4.3, min=27, max=42 chat, multi-modal, video Enxin/MovieChat-1K-test
hh-rlhf AI-ModelScope/hh-rlhf harmless-base
helpful-base
helpful-online
helpful-rejection-sampled
127459 245.4±190.7, min=22, max=1999 rlhf, dpo, pairwise -
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn hh_rlhf
harmless_base_cn
harmless_base_en
helpful_base_cn
helpful_base_en
355920 171.2±122.7, min=22, max=3078 rlhf, dpo, pairwise -
orpo-dpo-mix-40k AI-ModelScope/orpo-dpo-mix-40k default 43666 548.3±397.4, min=28, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise lvwerra/stack-exchange-paired
shareai-llama3-dpo-zh-en-emoji hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo, pairwise -
ultrafeedback-kto AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 11.0±0.0, min=11, max=11 rlhf, kto -
rlaif-v swift/RLAIF-V-Dataset default 83132 119.8±52.6, min=28, max=556 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
pileval swift/pile-val-backup 214670 1612.3±8856.2, min=11, max=1208955 text-generation, awq mit-han-lab/pile-val-backup
mantis-instruct swift/Mantis-Instruct birds-to-words
chartqa
coinstruct
contrastive_caption
docvqa
dreamsim
dvqa
iconqa
imagecode
llava_665k_multi
lrv_multi
multi_vqa
nextqa
nlvr2
spot-the-diff
star
visual_story_telling
655351 825.7±812.5, min=284, max=13563 chat, multi-modal, vision, quality TIGER-Lab/Mantis-Instruct
llava-data-instruct swift/llava-data llava_instruct 364100 189.0±142.1, min=33, max=5183 sft, multi-modal, quality TIGER-Lab/llava-data
midefics swift/MideficsDataset 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
gqa None train_all_instructions - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, vqa, quality lmms-lab/GQA
text-caps swift/TextCaps 18145 38.2±4.4, min=31, max=73 multi-modal, en, caption, quality HuggingFaceM4/TextCaps
refcoco-unofficial-caption swift/refcoco 46215 44.7±3.2, min=36, max=71 multi-modal, en, caption jxu124/refcoco
refcoco-unofficial-grounding swift/refcoco 46215 45.2±3.1, min=37, max=69 multi-modal, en, grounding jxu124/refcoco
refcocog-unofficial-caption swift/refcocog 44799 49.7±4.7, min=37, max=88 multi-modal, en, caption jxu124/refcocog
refcocog-unofficial-grounding swift/refcocog 44799 50.1±4.7, min=37, max=90 multi-modal, en, grounding jxu124/refcocog
a-okvqa swift/A-OKVQA 18201 45.8±7.9, min=32, max=100 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
okvqa swift/OK-VQA_train 9009 34.4±3.3, min=28, max=59 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
ocr-vqa swift/OCR-VQA 186753 35.6±6.6, min=29, max=193 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
grit swift/GRIT - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, caption-grounding, quality zzliang/GRIT
llava-instruct-mix swift/llava-instruct-mix-vsft 13640 179.8±120.2, min=30, max=962 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
lnqa swift/lnqa - Dataset is too huge, please click the original link to view the dataset stat. multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
science-qa swift/ScienceQA 8315 100.3±59.5, min=38, max=638 multi-modal, science, vqa, quality derek-thomas/ScienceQA
guanaco AI-ModelScope/GuanacoDataset default 31561 250.1±70.3, min=89, max=1436 chat, zh JosephusCheung/GuanacoDataset
mind2web swift/Multimodal-Mind2Web 1009 297522.4±325496.2, min=8592, max=3499715 agent, multi-modal osunlp/Multimodal-Mind2Web
sharegpt-4o-image AI-ModelScope/ShareGPT-4o image_caption 57289 638.7±157.9, min=47, max=4640 vqa, multi-modal OpenGVLab/ShareGPT-4o
pixelprose swift/pixelprose - Dataset is too huge, please click the original link to view the dataset stat. caption, multi-modal, vision tomg-group-umd/pixelprose
m3it AI-ModelScope/M3IT coco
vqa-v2
shapes
shapes-rephrased
coco-goi-rephrased
snli-ve
snli-ve-rephrased
okvqa
a-okvqa
viquae
textcap
docvqa
science-qa
imagenet
imagenet-open-ended
imagenet-rephrased
coco-goi
clevr
clevr-rephrased
nlvr
coco-itm
coco-itm-rephrased
vsr
vsr-rephrased
mocheg
mocheg-rephrased
coco-text
fm-iqa
activitynet-qa
msrvtt
ss
coco-cn
refcoco
refcoco-rephrased
multi30k
image-paragraph-captioning
visual-dialog
visual-dialog-rephrased
iqa
vcr
visual-mrc
ivqa
msrvtt-qa
msvd-qa
gqa
text-vqa
ocr-vqa
st-vqa
flickr8k-cn
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
sharegpt4v AI-ModelScope/ShareGPT4V ShareGPT4V
ShareGPT4V-PT
- Dataset is too huge, please click the original link to view the dataset stat. chat, multi-modal, vision -
llava-instruct-150k AI-ModelScope/LLaVA-Instruct-150K 624610 490.4±180.2, min=288, max=5438 chat, multi-modal, vision -
llava-pretrain AI-ModelScope/LLaVA-Pretrain default - Dataset is too huge, please click the original link to view the dataset stat. vqa, multi-modal, quality liuhaotian/LLaVA-Pretrain
sa1b-dense-caption Tongyi-DataEngine/SA1B-Dense-Caption - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
sa1b-paired-caption Tongyi-DataEngine/SA1B-Paired-Captions-Images - Dataset is too huge, please click the original link to view the dataset stat. zh, multi-modal, vqa -
alpaca-cleaned AI-ModelScope/alpaca-cleaned 51760 177.9±126.4, min=26, max=1044 chat, general, bench, quality yahma/alpaca-cleaned
aya-collection swift/aya_collection aya_dataset 202364 494.0±6911.3, min=21, max=3044268 multi-lingual, qa CohereForAI/aya_collection
belle-generated-chat-0.4M AI-ModelScope/generated_chat_0.4M 396004 273.3±52.0, min=32, max=873 common, zh BelleGroup/generated_chat_0.4M
belle-math-0.25M AI-ModelScope/school_math_0.25M 248480 157.7±72.2, min=33, max=3450 math, zh BelleGroup/school_math_0.25M
belle-train-0.5M-CN AI-ModelScope/train_0.5M_CN 519255 129.1±91.5, min=27, max=6507 common, zh, quality BelleGroup/train_0.5M_CN
belle-train-1M-CN AI-ModelScope/train_1M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_1M_CN
belle-train-2M-CN AI-ModelScope/train_2M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_2M_CN
belle-train-3.5M-CN swift/train_3.5M_CN - Dataset is too huge, please click the original link to view the dataset stat. common, zh, quality BelleGroup/train_3.5M_CN
c4 None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/c4
chart-qa swift/ChartQA 28299 43.1±5.5, min=29, max=77 en, vqa, quality HuggingFaceM4/ChartQA
chinese-c4 swift/chinese-c4 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality shjwudp/chinese-c4
cinepile swift/cinepile - Dataset is too huge, please click the original link to view the dataset stat. vqa, en, youtube, video tomg-group-umd/cinepile
classical-chinese-translate swift/classical_chinese_translate 6655 344.0±76.4, min=61, max=815 chat, play-ground -
codealpaca-20k AI-ModelScope/CodeAlpaca-20k 20016 100.2±60.1, min=29, max=1776 code, en HuggingFaceH4/CodeAlpaca_20K
cosmopedia None auto_math_text, khanacademy, openstax, stanford, stories, web_samples_v1, web_samples_v2, wikihow - Dataset is too huge, please click the original link to view the dataset stat. multi-domain, en, qa HuggingFaceTB/cosmopedia
cosmopedia-100k swift/cosmopedia-100k 100000 1024.5±243.1, min=239, max=2981 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
dolma swift/dolma v1_7 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality allenai/dolma
dolphin swift/dolphin flan1m-alpaca-uncensored, flan5m-alpaca-uncensored - Dataset is too huge, please click the original link to view the dataset stat. en cognitivecomputations/dolphin
duet AI-ModelScope/Duet-v0.5 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
evol-instruct-v2 AI-ModelScope/WizardLM_evol_instruct_V2_196k 109184 480.9±333.1, min=26, max=4942 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
fineweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality HuggingFaceFW/fineweb
gen-qa swift/GenQA - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task tomg-group-umd/GenQA
github-code swift/github-code - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality codeparrot/github-code
gpt4v-dataset swift/gpt4v-dataset 12356 217.9±68.3, min=35, max=596 en, caption, multi-modal, quality laion/gpt4v-dataset
guanaco-belle-merge AI-ModelScope/guanaco_belle_merge_v1.0 693987 134.2±92.0, min=24, max=6507 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
infinity-instruct swift/Infinity-Instruct - Dataset is too huge, please click the original link to view the dataset stat. qa, quality, multi-task BAAI/Infinity-Instruct
llava-med-zh-instruct swift/llava-med-zh-instruct-60k 56649 207.7±67.6, min=37, max=657 zh, medical, vqa BUAADreamer/llava-med-zh-instruct-60k
🔥longwriter-6k ZhipuAI/LongWriter-6k 6000 4887.2±2879.2, min=117, max=30354 long, chat, sft THUDM/LongWriter-6k
🔥longwriter-6k-filtered swift/longwriter-6k-filtered 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft -
math-instruct AI-ModelScope/MathInstruct 262283 254.4±183.5, min=11, max=4383 math, cot, en, quality TIGER-Lab/MathInstruct
math-plus TIGER-Lab/MATH-plus train 893929 287.1±158.7, min=24, max=2919 qa, math, en, quality TIGER-Lab/MATH-plus
moondream2-coyo-5M swift/moondream2-coyo-5M-captions - Dataset is too huge, please click the original link to view the dataset stat. caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
no-robots swift/no_robots 9485 298.7±246.4, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
open-hermes swift/OpenHermes-2.5 - Dataset is too huge, please click the original link to view the dataset stat. cot, en, quality teknium/OpenHermes-2.5
open-o1 AI-ModelScope/OpenO1-SFT default 203579 615.5±659.6, min=11, max=27509 chat, general, o1 O1-OPEN/OpenO1-SFT
open-orca-chinese AI-ModelScope/OpenOrca-Chinese - Dataset is too huge, please click the original link to view the dataset stat. QA, zh, general, quality yys/OpenOrca-Chinese
orca_dpo_pairs swift/orca_dpo_pairs 12859 366.9±251.9, min=30, max=2010 rlhf, quality Intel/orca_dpo_pairs
path-vqa swift/path-vqa 19654 34.8±7.3, min=27, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
pile AI-ModelScope/pile - Dataset is too huge, please click the original link to view the dataset stat. pretrain EleutherAI/pile
poison-mpts iic/100PoisonMpts 906 150.6±80.8, min=39, max=656 poison-management, zh -
🔥qwen2-pro-en AI-ModelScope/Magpie-Qwen2-Pro-200K-English 200000 605.4±287.3, min=221, max=4267 chat, sft, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
🔥qwen2-pro-filtered AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered 300000 555.8±286.6, min=148, max=4267 chat, sft Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
🔥qwen2-pro-zh AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese 200000 446.2±246.4, min=74, max=4101 chat, sft, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
redpajama-data-1t swift/RedPajama-Data-1T - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-1T
redpajama-data-v2 swift/RedPajama-Data-V2 - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality togethercomputer/RedPajama-Data-V2
refinedweb None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality tiiuae/falcon-refinedweb
rwkv-pretrain-web mapjack/openwebtext_dataset - Dataset is too huge, please click the original link to view the dataset stat. pretrain, zh, quality -
sft-nectar AI-ModelScope/SFT-Nectar 131192 396.4±272.1, min=44, max=10732 cot, en, quality AstraMindAI/SFT-Nectar
skypile AI-ModelScope/SkyPile-150B - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality, zh Skywork/SkyPile-150B
slim-orca swift/SlimOrca 517982 399.1±370.2, min=35, max=8756 quality, en Open-Orca/SlimOrca
slim-pajama-627b None - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality cerebras/SlimPajama-627B
starcoder AI-ModelScope/starcoderdata - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/starcoderdata
tagengo-gpt4 swift/tagengo-gpt4 78057 472.3±292.9, min=22, max=3521 chat, multi-lingual, quality lightblue/tagengo-gpt4
the-stack AI-ModelScope/the-stack - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality bigcode/the-stack
ultrachat-200k swift/ultrachat_200k 207865 1195.4±573.7, min=76, max=4470 chat, en, quality HuggingFaceH4/ultrachat_200k
vqa-v2 swift/VQAv2 443757 31.8±2.2, min=27, max=58 en, vqa, quality HuggingFaceM4/VQAv2
web-instruct-sub swift/WebInstructSub - Dataset is too huge, please click the original link to view the dataset stat. qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
wikipedia swift/wikipedia - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality wikipedia
wikipedia-cn-filtered AI-ModelScope/wikipedia-cn-20230720-filtered - Dataset is too huge, please click the original link to view the dataset stat. pretrain, quality pleisto/wikipedia-cn-20230720-filtered
zhihu-rlhf AI-ModelScope/zhihu_rlhf_3k 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k