Omni-Infer v0.4.1 released: inference acceleration for ultra-large-scale MoE models
This release includes the following updates:
v0.4.1
Core features
- Stability stress testing and optimization
Supported models
| Model | Hardware | Precision | Deployment mode |
|---|---|---|---|
| DeepSeek-R1 | A3 | INT8 | PD disaggregated |
| DeepSeek-R1 | A3 | W4A8C16 | PD disaggregated |
| DeepSeek-R1 | A3 | BF16 | PD disaggregated |
| DeepSeek-R1 | A2 | INT8 | PD disaggregated |
| Qwen2.5-7B | A3 | INT8 | Mixed deployment (TP>=1, DP=1) |
| Qwen2.5-7B | A2 | INT8 | Mixed deployment (TP>=1, DP=1) |
| QwQ | A3 | BF16 | PD disaggregated |
| Qwen3-235B | A3 | INT8 | PD disaggregated |
| Kimi-K2 | A3 | W4A8C16 | PD disaggregated |
Installation packages
| Hardware | Architecture | Docker image | Tar package |
|---|---|---|---|
| A3 | arm | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a3-arm:release_v0.4.1 | omni_infer-a3-arm:v0.4.1 |
| A3 | x86 | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a3-x86:release_v0.4.1 | omni_infer-a3-x86:v0.4.1 |
| A2 | arm | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a2-arm:release_v0.4.1 | omni_infer-a2-arm:v0.4.1 |
| A2 | x86 | docker pull swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-a2-x86:release_v0.4.1 | omni_infer-a2-x86:v0.4.1 |
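
The image names in the table follow a regular pattern, so the right one can be selected from the hardware and host architecture. The following is a minimal sketch, not part of the Omni-Infer tooling: the helper names `omni_image` and `host_arch` are hypothetical, and the mapping from `uname -m` output to the `arm`/`x86` labels is an assumption.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the image reference for a given hardware
# (a2 | a3) and architecture (arm | x86), following the table's naming pattern.
omni_image() {
  local hw="$1" arch="$2"
  case "$hw" in
    a2|a3) ;;
    *) echo "unsupported hardware: $hw" >&2; return 1 ;;
  esac
  case "$arch" in
    arm|x86) ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  echo "swr.cn-east-4.myhuaweicloud.com/omni/omni_infer-${hw}-${arch}:release_v0.4.1"
}

# Hypothetical helper: map `uname -m` output to the table's architecture labels.
# The mapping below is an assumption, not something the release notes specify.
host_arch() {
  case "$(uname -m)" in
    aarch64|arm64) echo arm ;;
    x86_64|amd64)  echo x86 ;;
    *) echo "unrecognized architecture: $(uname -m)" >&2; return 1 ;;
  esac
}

# Example usage (left commented out so the sketch has no side effects):
# docker pull "$(omni_image a3 "$(host_arch)")"
```

Passing the hardware type explicitly rather than detecting it keeps the sketch free of assumptions about how A2 and A3 devices are identified on the host.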