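Supported models
Overview
Building on the inference card's software and hardware stack, we have enabled a wide range of algorithm models covering mainstream domains such as large language models (LLM), computer vision (CV), natural language processing (NLP), optical character recognition (OCR), search and recommendation, speech, and multimodal, backed by a complete, mature software stack to help you with deployment and operations. This document lists, in table form, the models that have completed inference tasks, together with their performance metrics.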
Abbreviations used in the tables:
- NN: neural network
- fps: frames per second
- sps: sentences per second
- pps: products per second
- fp16: 16-bit floating point
Computer vision
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| arcface | 5154.4 | fps | 0.05 | fp16, 41.57M parameters, arcface, input 112*112 | 1 |
| arcface_ir50 | 896.54 | fps | 0.0011 | fp16, 41.57M parameters, arcface_ir50, input 112*112 | 1 |
| atmosphere_vulgar | 7128.6 | fps | 0.09 | fp16, 22.5M parameters, atmosphere_vulgar, input 224*224 | 1 |
| BiT | 1640.2 | fps | 0.0006 | fp16, 24.37M parameters, BiT, input 224*224 | 1 |
| centernet_x | 530.96 | fps | 0.0019 | fp16, 13.56M parameters, centernet_x, input 512*512 | 1 |
| Conformer | 5448.2 | fps | 0.09 | fp16, 17.85M parameters, Conformer, input 32*512 | 1 |
| conformer_ctc_zh_trail_3098504_iter_18000 | 10863.9 | fps | 0.09 | fp16, 1.2M parameters, conformer_ctc_zh_trail_3098504_iter_18000, input 32*512 | 1 |
| content_classify | 2348.99 | fps | 0.0004 | fp16, 8.63M parameters, content_classify, input 260*260 | 1 |
| CSPResNet50 | 1595.49 | fps | 0.0006 | fp16, 20.6M parameters, Resnet50, input 256*256 | 1 |
| cv_model_01 | 6014.6 | fps | 0.09 | fp16, 22.75M parameters, cv_model_01, input 224*224 | 1 |
| cv_model_02 | 5011.8 | fps | 0.12 | fp16, 22.47M parameters, cv_model_02, input 224*224 | 1 |
| cv_model_03 | 6273 | fps | 0.09 | fp16, 22.47M parameters, cv_model_03, input 224*224 | 1 |
| db_res18_epoach123_up20 | 199.5 | fps | 0.64 | fp16, 11.64M parameters, db_res18_epoach123_up20, input 960*480 | 1 |
| deeplabv3 | 25.45 | fps | 0.0393 | fp16, 55.38M parameters, v3, input 519*519 | 1 |
| DenseNet121 | 6739.4 | fps | 0.08 | fp16, 7.67M parameters, densenet121, input 224*224 | 1 |
| EfficientNet-B0 | 9584.79 | fps | 0.0001 | fp16, 3.86M parameters, B0, input 112*96 | 1 |
| EfficientNet-B0 | 5398.4 | fps | 0.11 | fp16, 5.02M parameters, B0, input 224*224 | 1 |
| EfficientNet-B1 | 3277.67 | fps | 0.0003 | fp16, 7.4M parameters, B1, input 240*240 | 1 |
| EfficientNet-B5 | 1343.33 | fps | 0.0007 | fp16, 28.9M parameters, B5, input 224*224 | 1 |
| EfficientNetV2 | 1974.27 | fps | 0.0005 | fp16, 12.96M parameters, V2, input 288*288 | 1 |
| EfficientNetV2_s | 5950.4 | fps | 0.09 | fp16, 19.29M parameters, V2, input 224*224 | 1 |
| face_bbox_landmark_dets | 862.95 | fps | 0.0012 | fp16, 4.03M parameters, face_bbox_landmark_dets, input 640*640 | 1 |
| FaceNet | 17191.2 | fps | 0.3 | fp16, 22.38M parameters, FaceNet, input 160*160 | 1 |
| fairface | 12772.6 | fps | 0.25 | fp16, 20.3M parameters, fairface, input 224*224 | 1 |
| fairface_resnet34 | 1607.14 | fps | 0.0006 | fp16, 20.3M parameters, fairface_resnet34, input 224*224 | 1 |
| fairmot | 207.4 | fps | 0.16 | fp16, 4.77M parameters, fairmot, input 608*1088 | 1 |
| fast_reid | 1953.9 | fps | 0.0095 | fp16, 22.41M parameters, fast_reid, input 256*128 | 1 |
| fer | 231.15 | fps | 0.0043 | fp16, 19.1M parameters, fer, input 44*44 | 1 |
| FCOS | 3111.6 | fps | 0.16 | fp16, 30.85M parameters, FCOS, input 800*1216 | 1 |
| GhostNet | 2354.74 | fps | 0.0004 | fp16, 4.93M parameters, GhostNet, input 224*224 | 1 |
| GLEAN | 16.2 | fps | 1.54 | fp16, 151.61M parameters, GLEAN, input 32*32 | 1 |
| goods_tag_fashion_gender | 6448.1 | fps | 0.08 | fp16, 22.47M parameters, goods_tag_fashion_gender, input 224*224 | 1 |
| hastag | 547.9 | fps | 0.94 | fp16, 23.97M parameters, hastag, input 256*256 | 1 |
| HarDNet 39DS | 13008.37 | fps | 0.0001 | fp16, 3.34M parameters, HarDNet 39DS, input 224*224 | 1 |
| HarDNet 68 | 4702.6 | fps | 0.0002 | fp16, 16.76M parameters, HarDNet 68, input 224*224 | 1 |
| HarDNet 85 | 2547.6 | fps | 0.0004 | fp16, 34.99M parameters, HarDNet 85, input 224*224 | 1 |
| hotsoon_live_v6_turbo | 7155.2 | fps | 0.07 | fp16, 22.47M parameters, v6, input 224*224 | 1 |
| hotsoon_live_v8 | 726.6 | fps | 0.78 | fp16, 22.55M parameters, v8, input 256*256 | 1 |
| HRNet-W18 | 3383.43 | fps | 0.0003 | fp16, 20.36M parameters, HRNet-W18, input 224*224 | 1 |
| HRNetV2-W44 | 422.39 | fps | 0.0024 | fp16, 63.91M parameters, HRNetV2-W44, input 224*224 | 1 |
| HRNet-ResNet50 | 854.85 | fps | 0.0012 | fp16, 32.4M parameters, HRNet_pose_resnet50, input 384*288 | 1 |
| Inception-v3 | 3118.11 | fps | 0.0003 | fp16, 22.72M parameters, v3, input 299*299 | 1 |
| Lightweight OpenPose | 1061.62 | fps | 0.0009 | fp16, 3.89M parameters, Lightweight OpenPose, input 368*512 | 1 |
| MobileNetV2 | 9132.14 | fps | 0.0001 | fp16, 5.8M parameters, v2, input 224*224 | 1 |
| MobileNetv3 | 11828.36 | fps | 0.0001 | fp16, 2.41M parameters, v3, input 224*224 | 1 |
| MobileNetV3 large | 13191.29 | fps | 0.0001 | fp16, 5.22M parameters, V3, input 224*224 | 1 |
| MobileNetV3 small | 13963.4 | fps | 0.0001 | fp16, 2.42M parameters, V3, input 224*224 | 1 |
| model_goods_search | 159.5 | fps | 3.2 | fp16, 84.08M parameters, model_goods_search, input 224*224 | 1 |
| model_goods_universal_emb_v6_serving | 6370.4 | fps | 0.09 | fp16, 22.72M parameters, v6, input 224*224 | 1 |
| mp_cls3_fpn | 534.5 | fps | 0.96 | fp16, 25.04M parameters, mp_cls3_fpn, input 224*224 | 1 |
| multi_task_resnet | 3262 | fps | 0.18 | fp16, 22.41M parameters, multi_task_resnet, input 320*320 | 1 |
| pose_hrnet_w32 | 1975.7 | fps | 0.02 | fp16, 27.19M parameters, pose_hrnet_w32, input 512*512 | 1 |
| pose_hrnet_w48 | 389.61 | fps | 0.0026 | fp16, 60.61M parameters, pose_hrnet_w48, input 384*288 | 1 |
| PP-LCNet-0.25x | 14307.58 | fps | 0.0001 | fp16, 1.45M parameters, 0.25x, input 224*224 | 1 |
| PP-LCNet-0.35x | 14462.53 | fps | 0.0001 | fp16, 1.57M parameters, 0.35x, input 224*224 | 1 |
| PP-LCNet-0.5x | 14603.74 | fps | 0.0001 | fp16, 1.8M parameters, 0.5x, input 224*224 | 1 |
| PP-LCNet-0.75x | 14302.47 | fps | 0.0001 | fp16, 2.26M parameters, 0.75x, input 224*224 | 1 |
| PP-LCNet-1.0x | 14011.8 | fps | 0.0001 | fp16, 2.83M parameters, 1.0x, input 224*224 | 1 |
| PP-LCNet-1.5x | 12983.27 | fps | 0.0001 | fp16, 4.31M parameters, 1.5x, input 224*224 | 1 |
| PP-LCNet-2.0x | 12537.93 | fps | 0.0001 | fp16, 6.24M parameters, 2.0x, input 224*224 | 1 |
| PP-LCNet-2.5x | 9184.52 | fps | 0.0001 | fp16, 8.63M parameters, 2.5x, input 224*224 | 1 |
| PSEnet | 325.3 | fps | 1.58 | fp16, 27.39M parameters, PSEnet, input 736*1312 | 1 |
| Pseudo-3D | 632.15 | fps | 0.0016 | fp16, 62.75M parameters, Pseudo-3D, input 160*160 | 1 |
| rec_0530_add | 3048.6 | fps | 0.16 | fp16, 1.98M parameters, rec_0530_add, input 32*512 | 1 |
| regnet_quan_hist_mask | 6477.5 | fps | 0.08 | fp16, 8.05M parameters, regnet_quan_hist_mask, input 224*224 | 1 |
| regnet_quan_hist_mask_live | 5883.5 | fps | 0.09 | fp16, 7.98M parameters, regnet_quan_hist_mask_live, input 224*224 | 1 |
| RegNetX-800MF | 3213.05 | fps | 0.0003 | fp16, 6.91M parameters, RegNetX-800MF, input 224*224 | 1 |
| RepVGG | 245.67 | fps | 0.0041 | fp16, 139.29M parameters, RepVGG, input 256*256 | 1 |
| ResNet101 | 5020.8 | fps | 0.11 | fp16, 42.52M parameters, resnet101, input 224*224 | 1 |
| ResNet18 | 2727.2 | fps | 0.0004 | fp16, 11.14M parameters, Resnet18, input 224*224 | 1 |
| ResNet34 | 7779.2 | fps | 0.07 | fp16, 20.79M parameters, resnet34, input 224*224 | 1 |
| ResNet50 | 8443.2 | fps | 0.07 | fp16, 24.35M parameters, V2, input 224*224 | 1 |
| resnet50_fcnn | 40.3 | fps | 6.31 | fp16, 15.52M parameters, resnet50_fcnn, input 832*832 | 1 |
| resnet50_hotsoon | 280.9 | fps | 1.82 | fp16, 33.84M parameters, resnet50_hotsoon, input 224*224 | 1 |
| ResNet50_v1p5 | 6638.99 | fps | 0.0002 | fp16, 24.35M parameters, v1.5, input 224*224 | 1 |
| ResNet50_v2 | 6423.27 | fps | 0.0002 | fp16, 24.41M parameters, v2, input 224*224 | 1 |
| resnet50-torchvision-v0_10_0 | 7287.74 | fps | 0.0001 | fp16, 24.35M parameters, v0, input 224*224 | 1 |
| ResNeXt50 | 1238.72 | fps | 0.0008 | fp16, 23.84M parameters, ResNeXt50, input 224*224 | 1 |
| RetinaFace_ResNet50 | 51.8 | fps | 0.35 | fp16, 26.0M parameters, RetinaFace_ResNet50, input 1024*1024 | 1 |
| RetinNnet_ResNet50_FPN | 10931.1 | fps | 0.05 | fp16, 36.18M parameters, RetinNnet_ResNet50_FPN, input 640*640 | 1 |
| SENet | 413.57 | fps | 0.0024 | fp16, 116.38M parameters, SENet, input 224*224 | 1 |
| SEResNeXt | 988.8 | fps | 0.51 | fp16, 24.31M parameters, SEResNeXt, input 320*320 | 1 |
| SE_ResNeXt101 | 1873.24 | fps | 0.0005 | fp16, 46.82M parameters, ResNeXt101, input 224*224 | 1 |
| SE_ResNeXt50 | 2468.61 | fps | 0.0004 | fp16, 26.35M parameters, ResNeXt50, input 224*224 | 1 |
| SE_ResNet18 | 10986.57 | fps | 0.0001 | fp16, 11.26M parameters, ResNet18, input 224*224 | 1 |
| SE_ResNet34 | 7313.26 | fps | 0.0001 | fp16, 20.98M parameters, ResNet34, input 224*224 | 1 |
| SE_ResNet50 | 3583.2 | fps | 0.0003 | fp16, 26.86M parameters, ResNet50, input 224*224 | 1 |
| ShuffleNet V2 | 12542.61 | fps | 0.0001 | fp16, 2.17M parameters, v2, input 224*224 | 1 |
| SlowFast | 1978.2 | fps | 0.03 | fp16, 32.85M parameters, SlowFast, input 224*224 | 1 |
| SqueezeNet1_1 | 12884.47 | fps | 0.0001 | fp16, 1.18M parameters, SqueezeNet1_1, input 224*224 | 1 |
| SSD300 | 3306.52 | fps | 0.0003 | fp16, 6.5M parameters, SSD300, input 300*300 | 1 |
| smoke_hotsoon_live_v2 | 5210.1 | fps | 0.12 | fp16, 22.47M parameters, v2, input 256*256 | 1 |
| Swin Transformer | 2030.86 | fps | 0.0005 | fp16, 27.48M parameters, Swin Transformer, input 224*224 | 1 |
| tongcheng_0819 | 11972.2 | fps | 0.04 | fp16, 10.66M parameters, tongcheng, input 224*224 | 1 |
| TSM | 143.4 | fps | 0.08 | fp16, 22.73M parameters, v2, input 256*256 | 1 |
| U-Net 2D | 1225.71 | fps | 0.0008 | fp16, 7.4M parameters, U-Net 2D, input 256*256 | 1 |
| 3D U-Net | 444.3 | fps | 1.15 | fp16, 29.75M parameters, 3D U-Net, input 128*128 | 1 |
| VAN | 5195.72 | fps | 0.0002 | fp16, 3.92M parameters, VAN, input 224*224 | 1 |
| VGG11 | 1969.73 | fps | 0.0006 | fp16, 126.71M parameters, vgg11, input 224*224 | 1 |
| VGG13 | 1648.32 | fps | 0.0006 | fp16, 126.88M parameters, vgg13, input 224*224 | 1 |
| VGG16 | 1447.69 | fps | 0.0007 | fp16, 131.95M parameters, vgg16, input 224*224 | 1 |
| VGG19 | 1242.05 | fps | 0.0008 | fp16, 137.01M parameters, vgg19, input 224*224 | 1 |
| video_jitter | 227.11 | fps | 0.0044 | fp16, 22.47M parameters, video_jitter, input 224*224 | 1 |
| Vision Transformer | 6464.02 | fps | 0.0002 | fp16, 83.78M parameters, Vision Transformer, input 224*224 | 1 |
| Xception | 1463.08 | fps | 0.0007 | fp16, 21.77M parameters, Xception, input 299*299 | 1 |
| YOLOv3 | 500.51 | fps | 0.002 | fp16, 7.11M parameters, YOLOv3, input 416*416 | 1 |
| YOLOv5 | 511.87 | fps | 0.0002 | fp16, 6.9M parameters, YOLOv5, input 640*480 | 1 |
| YOLOv5 | 242.86 | fps | 0.0041 | fp16, 6.91M parameters, YOLOv5, input 768*768 | 1 |
| YOLOv5m | 284.65 | fps | 0.0035 | fp16, 20.19M parameters, YOLOv5m, input 640*640 | 1 |
| YOLOv5s | 366.32 | fps | 0.0027 | fp16, 6.89M parameters, YOLOv5s, input 640*640 | 1 |
| YOLO11s | 557.41 | fps | 0.0018 | fp16, 9.05M parameters, YOLO11s, input 640*640 | 1 |
| YOLO11s-cls | 3155.16 | fps | 0.0003 | fp16, 6.4M parameters, YOLO11s-cls, input 224*224 | 1 |
| YOLO11s-obb | 170.96 | fps | 0.0058 | fp16, 9.32M parameters, YOLO11s-obb, input 1024*1024 | 1 |
| YOLO11s-pose | 686.11 | fps | 0.0015 | fp16, 9.5M parameters, YOLO11s-pose, input 640*640 | 1 |
| YOLO11s-seg | 279.54 | fps | 0.0036 | fp16, 9.67M parameters, YOLO11s-seg, input 640*640 | 1 |
| yolox_m | 378.76 | fps | 0.0026 | fp16, 24.13M parameters, yolox_m, input 640*640 | 1 |
| yolov8_n | 905.46 | fps | 0.0011 | fp16, 3.05M parameters, yolov8_n, input 640*640 | 1 |
Natural language processing
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| ALBERT | 2903 | sps | 0.17 | fp16, 7.44M parameters, ALBERT, input length 128 | 1 |
| ALBERT | 679.72 | sps | 0.0015 | fp16, 84.83M parameters, ALBERT, input length 384 | 1 |
| ALBERT-zh-base | 3523.07 | sps | 0.0003 | fp16, 10.06M parameters, ALBERT-zh-base, input length 128 | 1 |
| ALBERT-zh-large | 616.49 | sps | 0.0016 | fp16, 15.78M parameters, ALBERT-zh-large, input length 128 | 1 |
| ALBERT-zh-small | 13950.31 | sps | 0.0001 | fp16, 4.52M parameters, ALBERT-zh-small, input length 128 | 1 |
| ALBERT-zh-tiny | 17372.22 | sps | 0.0001 | fp16, 3.89M parameters, ALBERT-zh-tiny, input length 128 | 1 |
| ALBERT-zh-xlarge | 78.41 | sps | 0.0128 | fp16, 1158.93M parameters, xlarge, input length 128 | 1 |
| BERT | 3520.76 | sps | 0.0003 | fp16, 81.78M parameters, BERT, input length 128 | 1 |
| BERT | 1302.85 | sps | 0.0008 | fp16, 81.87M parameters, BERT, input length 256 | 1 |
| BERT | 537.19 | sps | 0.0019 | fp16, 103.75M parameters, BERT, input length 384 | 1 |
| BERT | 614.06 | sps | 0.0016 | fp16, 82.06M parameters, BERT, input length 512 | 1 |
| bert_128_mixin_asr_end2end | 1715.3 | sps | 0.39 | fp16, 100.98M parameters, bert_128_mixin_asr_end2end, input length 128 | 1 |
| bert_256_fever_review_nlp_e2e_eco_rate | 666.8 | sps | 0.92 | fp16, 141.54M parameters, bert_256_fever_review_nlp_e2e_eco_rate, input length 256 | 1 |
| bert-base-chinese | 104.89 | sps | 0.0095 | fp16, 97.53M parameters, bert-base-chinese, input length 384 | 1 |
| BERT-Base | 602.78 | sps | 0.0017 | fp16, 103.75M parameters, BERT-Base, input length 384 | 1 |
| BERT-Base | 3581.6 | sps | 0.04 | fp16, 98.66M parameters, BERT-base, input length 128 | 1 |
| bert_base_chinese | 1137.2 | sps | 0.07 | fp16, 96.97M parameters, bert_base_chinese, input length 256 | 1 |
| BERT-BiLSTM | 233.6 | sps | 0.24 | fp16, 506.74M parameters, BERT-BiLSTM, input length 600 | 1 |
| bert_classify_45_huggingface | 4090.5 | sps | 0.13 | fp16, 104.09M parameters, bert_classify_45_huggingface, input length 45 | 1 |
| BERT-Large | 139.22 | sps | 0.0072 | fp16, 318.62M parameters, BERT-Large, input length 384 | 1 |
| bge-base-zh-v1.5 | 113.89 | sps | 0.0088 | fp16, 97.53M parameters, bge-base-zh-v1.5, input length 384 | 1 |
| bge_m3 | 39.93 | sps | 0.025 | fp16, 541.45M parameters, bge-m3, input length 128 | 1 |
| bge-reranker-base | 108.83 | sps | 0.0092 | fp16, 265.16M parameters, bge-reranker-base, input length 512 | 1 |
| bge_small_zh_v1p5 | 363.28 | sps | 0.0028 | fp16, 22.84M parameters, bge_small_zh_v1p5, input length 384 | 1 |
| bvrbert | 20307 | sps | 0.03 | fp16, 76.96M parameters, bvrbert, input length 32 | 1 |
| ChineseBERT-wwm-ext | 757.2 | sps | 0.06 | fp16, 96.97M parameters, ChineseBERT-wwm-ext, input length 512 | 1 |
| chinese-roberta-wwm-ext | 97.26 | sps | 0.0103 | fp16, 97.53M parameters, chinese-roberta-wwm-ext, input length 512 | 1 |
| chinese_roberta_wwm_ext_large | 30.97 | sps | 0.0323 | fp16, 310.44M parameters, chinese_roberta_wwm_ext_large, input length 512 | 1 |
| Chinese-XLNet-base | 118 | sps | 0.13 | fp16, 112.69M parameters, Chinese-XLNet-base, input length 512 | 1 |
| DeBERTa_lay6 | 1442.3 | sps | 0.36 | fp16, 138.88M parameters, DeBERTa_lay6, input length 64 | 1 |
| DistilBERT | 1185.61 | sps | 0.008 | fp16, 63.2M parameters, DistilBERT, input length 384 | 1 |
| Erlangshen-SimCSE-110M-Chinese | 479.32 | sps | 0.0021 | fp16, 97.53M parameters, Erlangshen-SimCSE-110M-Chinese, input length 256 | 1 |
| model_politics_scorer_64 | 3370.4 | sps | 0.19 | fp16, 100.98M parameters, model_politics_scorer_64, input length 128 | 1 |
| model_video_porn_scorer_32 | 6865 | sps | 0.07 | fp16, 100.98M parameters, model_video_porn_scorer_32, input length 128 | 1 |
| RoBERTa | 492.8 | sps | 0.0002 | fp16, 118.31M parameters, RoBERTa, input length 384 | 1 |
| RoBERTa-zh | 118.1 | sps | 0.27 | fp16, 314.95M parameters, RoBERTa-zh, input length 256 | 1 |
| RoFormer | 213.17 | sps | 0.0047 | fp16, 28.47M parameters, RoFormer, input length 1024 | 1 |
| rtcbert | 1418.8 | sps | 0.41 | fp16, 141.54M parameters, rtcbert, input length 128 | 1 |
| t5_small_decoder | 18.57 | sps | 0.0538 | fp16, 59.38M parameters, t5_small_decoder, input length 512 | 1 |
| t5_small_encoder | 228.04 | sps | 0.0044 | fp16, 33.69M parameters, t5_small_encoder, input length 512 | 1 |
| video_medical_mm | 588.2 | sps | 1.11 | fp16, 58.55M parameters, video_medical_mm, input length 128 | 1 |
| XLNet | 1242.4 | sps | 0.0008 | fp16, 88.43M parameters, XLNet, input length 128 | 1 |
| bce_embedding_base_v1 | 96.32 | sps | 0.0104 | fp16, 265.16M parameters, v1, input length 512 | 1 |
| bce_reranker_base_v1 | 103.89 | sps | 0.0096 | fp16, 265.16M parameters, v1, input length 512 | 1 |
| bge_reranker_v2_m3 | 39.72 | sps | 0.0252 | fp16, 541.45M parameters, v2, input length 128 | 1 |
| gte-multilingual-reranker-base | 71.2 | sps | 0.014 | fp16, 291.79M parameters, base, input length 512 | 1 |
Optical character recognition
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| AttentionOCR | 13804.8 | fps | 0.04 | fp16, 7.57M parameters, AttentionOCR, input 32*172 | 1 |
| ch_PP-OCRv4_server_rec | 469.95 | fps | 0.0021 | fp16, 21.56M parameters, ch_PP-OCRv4_server_rec, input 48*320 | 1 |
| CRNN | 13824 | fps | 0.04 | fp16, 7.94M parameters, CRNN, input 32*100 | 1 |
| crnn_r34_ppocr | 11503.2 | fps | 0.35 | fp16, 23.37M parameters, crnn_r34_ppocr, input 32*100 | 1 |
| DBNet-MobileNetV3 | 212.5 | fps | 0.17 | fp16, 1.61M parameters, DBNet-MobileNetV3, input 736*1280 | 1 |
| DBNet-ResNet50_vd | 27.2 | fps | 2.3 | fp16, 24.15M parameters, DBNet-ResNet50_vd, input 736*1280 | 1 |
| ocr_decoder | 2585.63 | fps | 0.0004 | fp16, 17.31M parameters, ocr_decoder, input 128*512 | 1 |
| ocr_encoder | 1283.5 | fps | 0.4 | fp16, 20.63M parameters, ocr_encoder, input 32*512 | 1 |
| PaddleOCRCnRec | 121121.50 | fps | 0.0042 | fp16, 2.53M parameters, PaddleOCRCnRec, input 48*320 | 1 |
| ch_PP-OCRv4_server_det_modified | 33.27 | fps | 0.0301 | fp16, 27.01M parameters, ch_PP-OCRv4_server_det_modified, input 384*512 | 1 |
Search and recommendation
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| ctr_base1501646 | 1440 | pps | 0.34 | fp16, 9.04M parameters, ctr_base1501646 | 1 |
| cvr_pack_dcn_mmcn | 147838.8 | pps | 0.23 | fp16, 19.23M parameters, cvr_pack_dcn_mmcn | 1 |
| cypher_cvr_b1582402 | 513.2 | pps | 1.12 | fp16, 3.35M parameters, cypher_cvr_b1582402 | 1 |
| cypher_norbert_send_seq_iw_afs_r1765296_0 | 269.5 | pps | 0.95 | fp16, 281.87M parameters, cypher_norbert_send_seq_iw_afs_r1765296_0 | 1 |
| cypher_realtime | 421.5 | pps | 1.28 | fp16, 3.24M parameters, cypher_realtime | 1 |
| deep_interest | 96366.68 | pps | 0.0001 | fp16, 4.05M parameters, deep_interest | 1 |
| DeepFM | 951944.90 | pps | 0.0344 | fp16, 0.01M parameters, DeepFM | 1 |
| DFN | 2336814.30 | pps | 0.0018 | fp16, 0.27M parameters, DFN | 1 |
| DF_debias | 6080.1 | pps | 5.84 | fp16, 4.41M parameters, DF_debias | 1 |
| DLRM | 127551.24 | pps | 0.0001 | fp16, 1.26M parameters, DLRM | 1 |
| experience_model_split | 5383.5 | pps | 0.0951 | fp16, 6.22M parameters, experience_model_split | 1 |
| gip_cypher_ltr | 365.7 | pps | 1.4 | fp16, 6.78M parameters, gip_cypher_ltr | 1 |
| ipnn | 205356.29 | pps | 0.0001 | fp16, 25.39M parameters, ipnn | 1 |
| kpnn | 232349.52 | pps | 0.0001 | fp16, 25.4M parameters, kpnn | 1 |
| mmoe_large | 210969.63 | pps | 0.0001 | fp16, 0.08M parameters, mmoe_large | 1 |
| mmoe_XL | 208351.22 | pps | 0.0001 | fp16, 0.08M parameters, mmoe_XL | 1 |
| NCF | 136651.16 | pps | 0.0001 | fp16, 0.37M parameters, NeuralCF | 1 |
| opnn | 150868.48 | pps | 0.0001 | fp16, 25.39M parameters, opnn | 1 |
| preclk_sail | 1598.4 | pps | 21.16 | fp16, 8.52M parameters, preclk_sail | 1 |
| recall_base | 8744.20 | pps | 0.0585 | fp16, 1.9M parameters, recall_base | 1 |
| recall_ctr_base | 6215.4 | pps | 5.2720 | fp16, 1.91M parameters, recall_ctr_base | 1 |
| rough | 339134.40 | pps | 0.2144 | fp16, 0.09M parameters, rough | 1 |
| sail_cypher_ctr_aid_realtime_b1585413 | 484.6 | pps | 1.09 | fp16, 5.03M parameters, sail_cypher_ctr_aid_realtime_b1585413 | 1 |
| sail_model | 28417 | pps | 1.28 | fp16, 358.6M parameters, sail_model | 1 |
| search_ctr | 3282202.60 | pps | 0.0079 | fp16, 7.39M parameters, search_ctr | 1 |
| st_interactive | 4606.6 | pps | 0.1465 | fp16, 7.25M parameters, st_interactive | 1 |
| staytime | 2006867.10 | pps | 0.0163 | fp16, 1.43M parameters, staytime | 1 |
| WDL | 122245.41 | pps | 0.0001 | fp16, 2.27M parameters, WDL, input 532482,204813 | 1 |
Speech
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| conformer_speech_large | 204.4 | fps | 2.5 | fp16, 193.01M parameters, large, input 1078*80 | 1 |
| conformer_speech_medium | 640.22 | fps | 0.0016 | fp16, 65.3M parameters, medium, input 1078*80 | 1 |
| conformer_speech_small | 943.34 | fps | 0.0011 | fp16, 30.39M parameters, small, input 1078*80 | 1 |
| ECAPA-TDNN | 315.80 | fps | 0.2 | fp16, 19.84M parameters, ECAPA-TDNN, input 640*80 | 1 |
| ECAPA-TDNN_s400 | 691.90 | fps | 0.09 | fp16, 19.84M parameters, ECAPA-TDNN_s400, input 400*80 | 1 |
| whisper_small_decoder | 10.34 | fps | 0.0967 | fp16, 184.45M parameters, whisper_small_decoder, input 1500*768 | 1 |
| whisper_small_encoder | 16.92 | fps | 0.0591 | fp16, 84.07M parameters, whisper_small_encoder, input 80*3000 | 1 |
Multimodal
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| METER | 126.13 | fps | 0.0079 | fp16, 56.89M parameters, METER, input image 240*768, sequence length 240 | 1 |
| videobert_t | 1125.5 | fps | 0.45 | fp16, 169.61M parameters, videobert_t, input 256*256 | 1 |
| videobert_v | 598.8 | fps | 0.23 | fp16, 22.64M parameters, videobert_v, input 256*256 | 1 |
Reinforcement learning
| NN | Throughput | Throughput unit | Latency | NN description | Cards |
|---|---|---|---|---|---|
| A3C | 5552.42 | fps | 0.0002 | fp16, 2.9M parameters, A3C, input 42*42 | 1 |
| DQN | 6281.69 | fps | 0.0002 | fp16, 1.61M parameters, DQN, input 84*84 | 1 |
Large language models (LLM)
| NN | Cards | Batch size | Input length | Output length | First-token latency (s) | Throughput (tokens/s) | Total latency (s) |
|---|---|---|---|---|---|---|---|
| Baichuan2-7B-Base | 2 | 1 | 256 | 256 | 0.115 | 10.58 | 23.741 |
| Baichuan2-7B-Chat | 2 | 1 | 256 | 256 | 0.114 | 10.53 | 24.318 |
| chatglm2-6b | 2 | 1 | 256 | 256 | 0.097 | 12.81 | 19.979 |
| chatglm3-6b | 2 | 1 | 256 | 256 | 0.096 | 13.08 | 19.567 |
| chatglm3-6b-32k | 2 | 1 | 256 | 150 | 0.098 | 12.79 | 11.73 |
| chatglm3-6b-base | 2 | 1 | 256 | 200 | 0.097 | 12.83 | 15.591 |
| DeepSeek-R1-Distill-Llama-8B | 2 | 1 | 256 | 256 | 0.114 | 10.76 | 23.803 |
| DeepSeek-R1-Distill-Llama-70B | 16 | 1 | 256 | 256 | 1.005 | 4.42 | 57.956 |
| DeepSeek-R1-Distill-Qwen-1.5B | 1 | 1 | 256 | 256 | 0.05 | 20.9 | 12.25 |
| DeepSeek-R1-Distill-Qwen-14B | 4 | 1 | 256 | 256 | 0.299 | 9.33 | 27.444 |
| DeepSeek-R1-Distill-Qwen-32B | 8 | 1 | 256 | 256 | 0.463 | 7.13 | 35.899 |
| DeepSeek-R1-Distill-Qwen-7B | 2 | 1 | 256 | 256 | 0.113 | 10.4 | 24.622 |
| ERNIE-4.5-21B-A3B-PT | 4 | 1 | 256 | 256 | 0.306 | 8.57 | 30.608 |
| glm-4-9b-chat | 2 | 1 | 256 | 256 | 0.137 | 9.14 | 28.008 |
| internlm2-chat-20b | 4 | 1 | 256 | 256 | 0.365 | 7.91 | 32.361 |
| internlm2-chat-7b | 2 | 1 | 256 | 256 | 0.112 | 11.01 | 23.256 |
| Llama2-Chinese-7b-Chat | 2 | 1 | 256 | 200 | 0.111 | 11.33 | 17.657 |
| Llama-2-13b | 4 | 1 | 256 | 256 | 0.265 | 10.6 | 24.149 |
| Llama3-Chinese-8B-Instruct | 2 | 1 | 256 | 150 | 0.113 | 10.77 | 13.933 |
| Marco-o1 | 2 | 1 | 256 | 256 | 0.114 | 10.37 | 24.679 |
| Meta-Llama-3.1-8B-Instruct | 2 | 1 | 256 | 256 | 0.114 | 10.77 | 23.777 |
| Meta-Llama-3-8B | 2 | 1 | 256 | 256 | 0.115 | 10.74 | 23.835 |
| MiniCPM-1B-sft-bf16 | 1 | 1 | 256 | 256 | 0.047 | 24.12 | 10.612 |
| Mixtral-8x7B-Instruct-v0.1 | 8 | 1 | 256 | 256 | 0.248 | 14.85 | 17.24 |
| moss-moon-003-sft | 4 | 1 | 256 | 256 | 0.3 | 8.87 | 28.86 |
| Phi-2 | 1 | 1 | 256 | 256 | 0.071 | 14.2 | 18.023 |
| Qwen1.5-1.8B-Chat | 1 | 1 | 256 | 256 | 0.045 | 22.61 | 11.321 |
| Qwen1.5-14B-Chat | 4 | 1 | 256 | 256 | 0.272 | 10.06 | 25.442 |
| Qwen1.5-14B-Chat-w8a16 | 4 | 1 | 256 | 256 | 0.256 | 10.92 | 23.438 |
| Qwen1.5-14B-Chat-w8a8 | 4 | 1 | 256 | 256 | 0.218 | 13.61 | 18.812 |
| Qwen1.5-32B-Chat | 8 | 1 | 256 | 256 | 0.466 | 6.99 | 36.6 |
| Qwen1.5-7B-Chat | 2 | 1 | 256 | 256 | 0.117 | 10.28 | 24.902 |
| Qwen2-0.5B-Instruct | 1 | 1 | 256 | 256 | 0.024 | 43 | 5.954 |
| Qwen2-1.5B-Instruct | 1 | 1 | 256 | 256 | 0.051 | 20.08 | 12.747 |
| Qwen2-57B-A14B-Instruct-w8a16 | 8 | 1 | 256 | 256 | 0.655 | 9.09 | 28.176 |
| Qwen2-7B-Instruct | 2 | 1 | 256 | 256 | 0.115 | 10.39 | 24.644 |
| Qwen2-7B-Instruct-w8a8 | 2 | 1 | 256 | 256 | 0.089 | 14 | 18.289 |
| Qwen2-72B-Instruct | 16 | 1 | 256 | 256 | 0.948 | 4.98 | 51.427 |
| Qwen-7B-Chat | 2 | 1 | 256 | 256 | 0.116 | 20.82 | 24.588 |
| Qwen2.5-0.5B-Instruct | 1 | 1 | 256 | 256 | 0.023 | 44.97 | 5.693 |
| Qwen2.5-7B-Instruct | 2 | 1 | 256 | 256 | 0.115 | 10.2 | 25.099 |
| Qwen2.5-14B-Instruct | 4 | 1 | 256 | 256 | 0.301 | 9.59 | 26.686 |
| Qwen2.5-Coder-7B-Instruct | 2 | 1 | 256 | 256 | 0.112 | 10.31 | 24.83 |
| Qwen2.5-Math-7B-Instruct | 2 | 1 | 256 | 256 | 0.113 | 10.37 | 24.676 |
| QwQ-32B | 8 | 1 | 256 | 256 | 0.464 | 7.18 | 35.665 |
| TeleChat-12B-v2 | 2 | 1 | 256 | 256 | 0.316 | 6.91 | 37.027 |
| XVERSE-13B-Chat | 4 | 1 | 256 | 256 | 0.268 | 10.23 | 25.024 |
| Yi-1.5-34B-Chat | 8 | 1 | 256 | 256 | 0.521 | 7.19 | 35.585 |
| Yi-34B-Chat | 8 | 1 | 256 | 256 | 0.519 | 7.34 | 34.859 |
| Qwen3-8B | 2 | 1 | 256 | 256 | 0.124 | 9.97 | 25.68 |
| Qwen3-14B | 4 | 1 | 256 | 256 | 0.32 | 8.1 | 31.591 |
| Qwen3-32B | 8 | 1 | 256 | 256 | 0.461 | 7.14 | 35.833 |
| Jiuzhou-7B | 2 | 1 | 256 | 256 | 0.114 | 10.3 | 24.844 |
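
For most rows in the LLM table, the published throughput appears to equal output length divided by total latency. This relationship is inferred from the numbers and is not stated in the source. The Python sketch below recomputes throughput for three rows copied from the table and flags any row whose published value differs from the derived one by more than 1%. The `LlmRow` record, the helper names, and the 1% tolerance are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LlmRow:
    """One row of the LLM table (hypothetical record type for this sketch)."""
    name: str
    output_len: int        # output length column, in tokens
    throughput: float      # published throughput column, in tokens/s
    total_latency: float   # total latency column, in seconds


def derived_throughput(row: LlmRow) -> float:
    """Assumed relationship: tokens generated divided by end-to-end time."""
    return row.output_len / row.total_latency


def is_consistent(row: LlmRow, rel_tol: float = 0.01) -> bool:
    """True when the published throughput is within rel_tol of the derived one."""
    return abs(derived_throughput(row) - row.throughput) <= rel_tol * row.throughput


# Values copied from the table above.
rows = [
    LlmRow("chatglm2-6b", 256, 12.81, 19.979),
    LlmRow("chatglm3-6b-32k", 150, 12.79, 11.73),
    LlmRow("Qwen-7B-Chat", 256, 20.82, 24.588),
]

for row in rows:
    status = "ok" if is_consistent(row) else "check"
    print(f"{row.name}: published {row.throughput}, "
          f"derived {derived_throughput(row):.2f} ({status})")
```

Under this assumption, chatglm2-6b and chatglm3-6b-32k match their published throughput, while Qwen-7B-Chat (20.82 tokens/s published, about 10.41 derived) does not. The source does not say whether that row was measured differently.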