Omni-Infer v0.7.0 has been released, bringing inference acceleration for ultra-large-scale MoE models.
v0.7.0
Core Features
- Omni Cache now supports MLA/GQA
- Chunked prefill under hybrid (co-located) deployment is now captured into graph mode
- SGLang is supported as a serving backend (a client-side request sketch follows this list)
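With SGLang available as a backend, requests can be sent over the OpenAI-compatible HTTP API that SGLang typically exposes. The snippet below is a minimal client-side sketch only; the host, port, and model name are assumptions for illustration and depend on how the service was actually launched, not values fixed by Omni-Infer.

```python
# Minimal client sketch: query an OpenAI-compatible chat endpoint
# (e.g. one served by the SGLang backend). Address and model name
# below are placeholders, not Omni-Infer defaults.
import requests

resp = requests.post(
    "http://127.0.0.1:30000/v1/chat/completions",  # assumed serving address
    json={
        "model": "openPangu-72B",                  # placeholder model name
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```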
Other Optimizations
- On a 2P8-1D32@A3 setup with an average 3.5K-input / 1K-output workload, DeepSeek R1 reaches 186 QPM with TTFT < 2 s and TPOT < 20 ms
- On a 2P2-1D4@A3 setup with a 2K-input / 2K-output workload, openPangu-72B reaches a peak per-card decode throughput of 1560 TPS with TPOT < 30 ms (see the metric sketch below)
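For reference, QPM (queries per minute), TTFT (time to first token), and TPOT (time per output token) can be derived from per-request timestamps as in the sketch below. The timing records are hypothetical and only illustrate the metric definitions; this is not the benchmark harness used to produce the numbers above.

```python
# Hypothetical per-request records: (request_start_s, first_token_s,
# request_end_s, output_tokens). Values are made up for illustration.
requests_log = [
    (0.0, 1.2, 18.0, 1024),
    (0.5, 1.6, 19.5, 1000),
    (1.0, 2.0, 21.0, 980),
]

total_wall_s = (max(end for _, _, end, _ in requests_log)
                - min(start for start, _, _, _ in requests_log))
qpm = len(requests_log) / total_wall_s * 60                    # queries per minute
ttft = [first - start for start, first, _, _ in requests_log]  # time to first token
tpot = [(end - first) / (n_out - 1)                            # mean time per output token after the first
        for _, first, end, n_out in requests_log]

print(f"QPM  = {qpm:.1f}")
print(f"TTFT = {sum(ttft) / len(ttft):.2f} s (mean)")
print(f"TPOT = {sum(tpot) / len(tpot) * 1000:.1f} ms (mean)")
```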
Supported Models
| Model | Hardware | Precision | Deployment |
| --- | --- | --- | --- |
| openPangu-Ultra-MoE-718B | A3 | INT8 | PD disaggregation |
| openPangu-Ultra-MoE-718B | A2 | INT8 | PD disaggregation |
| openPangu-72B | A3 | INT8 | PD disaggregation |
| openPangu-38B | A3 | INT8 | Hybrid (co-located) |
| openPangu-38B | A2 | INT8 | Hybrid (co-located) |
| openPangu-7B | A3 | BF16 | Hybrid (co-located) |
| openPangu-7B | A2 | BF16 | Hybrid (co-located) |
| openPangu-7BVL | A3 | BF16 | Hybrid (co-located) |
| DeepSeek-R1 | A3 | INT8 | PD disaggregation |
| DeepSeek-R1 | A3 | W4A8C16 | PD disaggregation |
| DeepSeek-R1 | A3 | BF16 | PD disaggregation |
| DeepSeek-R1 | A2 | INT8 | PD disaggregation |
| DeepSeek-V3.1 | A3 | INT8 | PD disaggregation |
| DeepSeek-V3.2 | A3 | INT8 | PD disaggregation |
| DeepSeek-OCR | A2 | BF16 | Hybrid (co-located) |
| Qwen2.5-7B | A3 | INT8 | Hybrid (co-located, TP>=1, DP=1) |
| Qwen2.5-7B | A2 | INT8 | Hybrid (co-located, TP>=1, DP=1) |
| QwQ | A3 | BF16 | PD disaggregation |
| QwQ | A2 | BF16 | PD disaggregation |
| Qwen3-235B | A3 | INT8 | PD disaggregation |
| Qwen3-235B | A2 | BF16 | PD disaggregation |
| Qwen3-32B | A3 | BF16 | PD disaggregation |
| Qwen3-32B | A3 | INT8 | PD disaggregation |
| Qwen3-30B | A3 | BF16 | PD disaggregation |
| Kimi-K2 | A3 | W4A8C16 | PD disaggregation |
| Kimi-K2 Thinking | A3 | W4A8C16 | PD disaggregation |
| Longcat-flash | A3 | BF16 | PD disaggregation |
| Ling-1T | A3 | BF16 | PD disaggregation |
| GPT-OSS120B | A3 | INT8 | PD disaggregation |
| GPT-OSS120B | A2 | INT8 | PD disaggregation |
| GPT-OSS20B | A3 | INT8 | PD disaggregation |
| GPT-OSS20B | A2 | INT8 | PD disaggregation |
Installation Packages
For details, see: https://gitee.com/omniai/omniinfer/releases/v0.7.0