4. Stream Computing TensorTurbo Model Support
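4.1. Version History
Document version | Product version | Author | Date | Description |
---|---|---|---|---|
V1.1.0 | STCRP v1.5.1 | Stream Computing | 2024-05-19 | Adjusted table headers per business requirements. |
V1.0.1 | STCRP v1.5.1 | Stream Computing | 2024-04-23 | Reclassified some models. |
V1.0.0 | STCRP v1.5.1 | Stream Computing | 2024-04-19 | Initial public release. |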
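4.2. Overview
Stream Computing has brought up a wide range of algorithm models end to end, across hardware and software, on the STCP920 inference card. They cover mainstream domains including computer vision (CV), natural language processing (NLP), optical character recognition (OCR), search and recommendation, speech, and multimodal, and a complete, mature software stack is available to help you deploy and operate them. This document lists, in tables, the models that can be compiled with TensorTurbo and run for inference, together with their performance metrics.
The abbreviations used in the tables are:
NN: Neural Network
fps: frames per second
sps: sentences per second
pps: products per second
fp16: 16-bit floating point (IEEE 754 half precision)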
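Every model in the tables below runs in fp16. As a rough illustration of what that number format can and cannot represent, the following Python sketch round-trips values through IEEE 754 half precision using the standard-library `struct` module (format character `e`). It says nothing about how TensorTurbo itself converts weights or activations to fp16; the helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip a Python float (binary64) through IEEE 754 binary16."""
    # Format character 'e' packs/unpacks a 16-bit half-precision float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# The largest finite fp16 value survives the round trip unchanged.
print(to_fp16(65504.0))              # 65504.0
# 0.1 is not exactly representable: fp16 keeps only 10 fraction bits.
print(to_fp16(0.1))                  # 0.0999755859375
# The gap between 1.0 and the next fp16 value is 2**-10.
print(to_fp16(1.0 + 2**-10) - 1.0)   # 0.0009765625
# An increment of a quarter of that gap rounds back to 1.0.
print(to_fp16(1.0 + 2**-12))         # 1.0
```

The 10-bit fraction is why fp16 roughly halves memory and bandwidth relative to fp32 at the cost of about three significant decimal digits.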
4.3. Computer Vision
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
arcface | 5154.4 | fps | 0.05 | fp16, 41.57M params, arcface, input 112*112 | 1 |
atmosphere_vulgar | 7128.6 | fps | 0.09 | fp16, 22.5M params, atmosphere_vulgar, input 224*224 | 1 |
BiT | 1365.6 | fps | 0.41 | fp16, 24.37M params, BiT, input 224*224 | 1 |
centernet_x | 23800.40 | fps | 0.02 | fp16, 13.56M params, centernet_x, input 512*512 | 1 |
Conformer | 5448.2 | fps | 0.09 | fp16, 17.85M params, Conformer, input 32*512 | 1 |
conformer_ctc_zh_trail_3098504_iter_18000 | 10863.9 | fps | 0.09 | fp16, 1.2M params, conformer_ctc_zh_trail_3098504_iter_18000, input 32*512 | 1 |
content_classify | 1508.8 | fps | 0.34 | fp16, 8.63M params, content_classify, input 260*260 | 1 |
cv_model_01 | 6014.6 | fps | 0.09 | fp16, 22.75M params, cv_model_01, input 224*224 | 1 |
cv_model_02 | 5011.8 | fps | 0.12 | fp16, 22.47M params, cv_model_02, input 224*224 | 1 |
cv_model_03 | 6273 | fps | 0.09 | fp16, 22.47M params, cv_model_03, input 224*224 | 1 |
db_res18_epoach123_up20 | 199.5 | fps | 0.64 | fp16, 11.64M params, db_res18_epoach123_up20, input 960*480 | 1 |
deeplabv3 | 6.2 | fps | 82.88 | fp16, 55.38M params, v3, input 519*519 | 1 |
DenseNet121 | 6739.4 | fps | 0.08 | fp16, 7.67M params, densenet121, input 224*224 | 1 |
EfficientNet-B0 | 8006.4 | fps | 0.07 | fp16, 3.86M params, B0, input 112*96 | 1 |
EfficientNet-B0 | 5398.4 | fps | 0.11 | fp16, 5.02M params, B0, input 224*224 | 1 |
EfficientNet-B1 | 3316.5 | fps | 0.17 | fp16, 7.4M params, B1, input 240*240 | 1 |
EfficientNet-B5 | 1342.4 | fps | 0.39 | fp16, 28.9M params, B5, input 224*224 | 1 |
EfficientNetV2 | 3296.8 | fps | 0.22 | fp16, 12.96M params, V2, input 288*288 | 1 |
EfficientNetV2_s | 5950.4 | fps | 0.09 | fp16, 19.29M params, V2, input 224*224 | 1 |
face_bbox_landmark_dets | 623.5 | fps | 0.21 | fp16, 4.03M params, face_bbox_landmark_dets, input 640*640 | 1 |
FaceNet | 17191.2 | fps | 0.3 | fp16, 22.38M params, FaceNet, input 160*160 | 1 |
fairface | 12772.6 | fps | 0.25 | fp16, 20.3M params, fairface, input 224*224 | 1 |
fairmot | 207.4 | fps | 0.16 | fp16, 4.77M params, fairmot, input 608*1088 | 1 |
fast_reid | 1953.9 | fps | 0.0095 | fp16, 22.41M params, fast_reid, input 256*128 | 1 |
FCOS | 3111.6 | fps | 0.16 | fp16, 30.85M params, FCOS, input 800*1216 | 1 |
GLEAN | 16.2 | fps | 1.54 | fp16, 151.61M params, GLEAN, input 32*32 | 1 |
goods_tag_fashion_gender | 6448.1 | fps | 0.08 | fp16, 22.47M params, goods_tag_fashion_gender, input 224*224 | 1 |
hastag | 547.9 | fps | 0.94 | fp16, 23.97M params, hastag, input 256*256 | 1 |
hotsoon_live_v6_turbo | 7155.2 | fps | 0.07 | fp16, 22.47M params, v6, input 224*224 | 1 |
hotsoon_live_v8 | 726.6 | fps | 0.78 | fp16, 22.55M params, v8, input 256*256 | 1 |
HRNet_pose_resnet50 | 9659.8 | fps | 0.0066 | fp16, 32.4M params, HRNet_pose_resnet50, input 384*288 | 1 |
Inception-v3 | 2798.9 | fps | 0.45 | fp16, 22.72M params, v3, input 299*299 | 1 |
MobileNetV2 | 8163.7 | fps | 0.0627 | fp16, 5.8M params, v2, input 224*224 | 1 |
MobileNetv3 | 25294 | fps | 0.16 | fp16, 2.41M params, v3, input 224*224 | 1 |
model_goods_search | 159.5 | fps | 3.2 | fp16, 84.08M params, model_goods_search, input 224*224 | 1 |
model_goods_universal_emb_v6_serving | 6370.4 | fps | 0.09 | fp16, 22.72M params, v6, input 224*224 | 1 |
mp_cls3_fpn | 534.5 | fps | 0.96 | fp16, 25.04M params, mp_cls3_fpn, input 224*224 | 1 |
multi_task_resnet | 3262 | fps | 0.18 | fp16, 22.41M params, multi_task_resnet, input 320*320 | 1 |
pose_hrnet_w32 | 1975.7 | fps | 0.02 | fp16, 27.19M params, pose_hrnet_w32, input 512*512 | 1 |
pose_hrnet_w48 | 2798.4 | fps | 0.0112 | fp16, 60.61M params, pose_hrnet_w48, input 384*288 | 1 |
PSEnet | 325.3 | fps | 1.58 | fp16, 27.39M params, PSEnet, input 736*1312 | 1 |
rec_0530_add | 3048.6 | fps | 0.16 | fp16, 1.98M params, rec_0530_add, input 32*512 | 1 |
regnet_quan_hist_mask | 6477.5 | fps | 0.08 | fp16, 8.05M params, regnet_quan_hist_mask, input 224*224 | 1 |
regnet_quan_hist_mask_live | 5883.5 | fps | 0.09 | fp16, 7.98M params, regnet_quan_hist_mask_live, input 224*224 | 1 |
ResNet101 | 5020.8 | fps | 0.11 | fp16, 42.52M params, resnet101, input 224*224 | 1 |
ResNet34 | 7779.2 | fps | 0.07 | fp16, 20.79M params, resnet34, input 224*224 | 1 |
ResNet50 | 8443.2 | fps | 0.07 | fp16, 24.35M params, V2, input 224*224 | 1 |
resnet50_fcnn | 40.3 | fps | 6.31 | fp16, 15.52M params, resnet50_fcnn, input 832*832 | 1 |
resnet50_hotsoon | 280.9 | fps | 1.82 | fp16, 33.84M params, resnet50_hotsoon, input 224*224 | 1 |
ResNet50_v1p5 | 7043.7 | fps | 0.09 | fp16, 24.35M params, v1.5, input 224*224 | 1 |
resnet50-torchvision-v0_10_0 | 6207.1 | fps | 0.0099 | fp16, 24.35M params, v0, input 224*224 | 1 |
RetinaFace_ResNet50 | 51.8 | fps | 0.35 | fp16, 26.0M params, RetinaFace_ResNet50, input 1024*1024 | 1 |
RetinNnet_ResNet50_FPN | 10931.1 | fps | 0.05 | fp16, 36.18M params, RetinNnet_ResNet50_FPN, input 640*640 | 1 |
SEResNeXt | 988.8 | fps | 0.51 | fp16, 24.31M params, SEResNeXt, input 320*320 | 1 |
ShuffleNet V2 | 14389.5 | fps | 0.04 | fp16, 2.17M params, v2, input 224*224 | 1 |
SlowFast | 1978.2 | fps | 0.03 | fp16, 32.85M params, SlowFast, input 224*224 | 1 |
smoke_hotsoon_live_v2 | 5210.1 | fps | 0.12 | fp16, 22.47M params, v2, input 256*256 | 1 |
Swin Transformer | 2111.3 | fps | 0.24 | fp16, 27.48M params, Swin Transformer, input 224*224 | 1 |
swin_transformer_s | 1332.8 | fps | 0.39 | fp16, 48.1M params, swin_transformer_s, input 224*224 | 1 |
tongcheng_0819 | 11972.2 | fps | 0.04 | fp16, 10.66M params, tongcheng, input 224*224 | 1 |
TSM | 143.4 | fps | 0.08 | fp16, 22.73M params, v2, input 256*256 | 1 |
U-Net | 864.00 | fps | 0.59 | fp16, 7.4M params, U-Net, input 256*256 | 1 |
3D U-Net | 444.3 | fps | 1.15 | fp16, 29.75M params, 3D U-Net, input 128*128 | 1 |
VGG16 | 1147.2 | fps | 0.45 | fp16, 131.95M params, vgg16, input 224*224 | 1 |
video_jitter | 249.5 | fps | 2.2 | fp16, 22.47M params, video_jitter, input 224*224 | 1 |
Vision Transformer | 355.2 | fps | 1.46 | fp16, 82.56M params, Vision Transformer, input 224*224 | 1 |
ViT | 4099.2 | fps | 0.14 | fp16, 83.78M params, ViT, input 224*224 | 1 |
YOLOv3 | 652.3 | fps | 2.53 | fp16, 7.11M params, YOLOv3, input 416*416 | 1 |
YOLOv5 | 755.3 | fps | 0.99 | fp16, 6.9M params, YOLOv5, input 640*480 | 1 |
YOLOv5 | 406.4 | fps | 1.4 | fp16, 6.91M params, YOLOv5, input 768*768 | 1 |
YOLOv5m | 514 | fps | 0.27 | fp16, 20.19M params, YOLOv5m, input 640*640 | 1 |
YOLOv5s | 759.3 | fps | 0.08 | fp16, 6.89M params, YOLOv5s, input 640*640 | 1 |
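To filter or sort these tables programmatically, each row can be split on its `|` separators. The sketch below is a minimal illustration, assuming every row has the six columns used above and that the description column contains a parameter count ending in `M` and an input shape written as `A*B` (or `AxB`). `ModelRow` and `parse_row` are hypothetical helpers, not part of TensorTurbo. The sketch also derives the average time per item implied by throughput alone (1000 ms divided by items per second); this is a different quantity from the latency column, whose unit the tables do not state.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical patterns: first "<number>M" is the parameter count (millions),
# first "<int>*<int>" or "<int>x<int>" is the input shape.
_PARAMS = re.compile(r"(\d+(?:\.\d+)?)M")
_SHAPE = re.compile(r"(\d+)\s*[*x]\s*(\d+)")


@dataclass
class ModelRow:
    name: str
    throughput: float
    unit: str                              # fps, sps, or pps
    latency: float                         # as printed; unit not stated
    params_m: Optional[float]              # parameter count in millions
    input_hw: Optional[Tuple[int, int]]    # first A*B shape, if any
    cards: int

    def amortized_ms_per_item(self) -> float:
        # Average time per item implied by throughput alone.
        return 1000.0 / self.throughput


def parse_row(line: str) -> ModelRow:
    # Split on '|' and drop the empty field left by the trailing pipe.
    cols = [c.strip() for c in line.split("|")]
    if cols and cols[-1] == "":
        cols = cols[:-1]
    if len(cols) != 6:
        raise ValueError(f"expected 6 columns, got {len(cols)}: {line!r}")
    name, thr, unit, lat, desc, cards = cols
    p = _PARAMS.search(desc)
    s = _SHAPE.search(desc)
    return ModelRow(
        name=name,
        throughput=float(thr),
        unit=unit,
        latency=float(lat),
        params_m=float(p.group(1)) if p else None,
        input_hw=(int(s.group(1)), int(s.group(2))) if s else None,
        cards=int(cards),
    )


row = parse_row(
    "arcface | 5154.4 | fps | 0.05 | fp16, 41.57M params, arcface, input 112*112 | 1 |"
)
print(row.params_m, row.input_hw)              # 41.57 (112, 112)
print(round(row.amortized_ms_per_item(), 3))   # 0.194
```

Because the patterns key only on the `M` suffix and the `A*B` shape, rows without an input shape (such as most search and recommendation entries) simply yield `input_hw=None`.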
4.4. Natural Language Processing
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
ALBERT | 2903 | sps | 0.17 | fp16, 7.44M params, ALBERT, input length 128 | 1 |
ALBERT | 744.8 | sps | 0.81 | fp16, 84.83M params, ALBERT, input length 384 | 1 |
ALBERT-zh-base | 1820.2 | sps | 0.32 | fp16, 10.06M params, ALBERT-zh-base, input length 128 | 1 |
ALBERT-zh-large | 520.4 | sps | 0.99 | fp16, 15.78M params, ALBERT-zh-large, input length 128 | 1 |
ALBERT-zh-small | 4470.1 | sps | 0.17 | fp16, 4.52M params, ALBERT-zh-small, input length 128 | 1 |
ALBERT-zh-tiny | 4712.1 | sps | 0.16 | fp16, 3.89M params, ALBERT-zh-tiny, input length 128 | 1 |
BERT | 1132.4 | sps | 0.45 | fp16, 81.87M params, BERT, input length 256 | 1 |
BERT | 698.7 | sps | 0.82 | fp16, 103.75M params, BERT, input length 384 | 1 |
BERT | 599 | sps | 0.86 | fp16, 82.06M params, BERT, input length 512 | 1 |
bert_128_mixin_asr_end2end | 1715.3 | sps | 0.39 | fp16, 100.98M params, bert_128_mixin_asr_end2end, input length 128 | 1 |
bert_256_fever_review_nlp_e2e_eco_rate | 666.8 | sps | 0.92 | fp16, 141.54M params, bert_256_fever_review_nlp_e2e_eco_rate, input length 256 | 1 |
BERT-Base | 778.6 | sps | 0.16 | fp16, 103.75M params, BERT-Base, input length 384 | 1 |
BERT-Base | 3581.6 | sps | 0.04 | fp16, 98.66M params, BERT-base, input length 128 | 1 |
bert_base_chinese | 1137.2 | sps | 0.07 | fp16, 96.97M params, bert_base_chinese, input length 256 | 1 |
BERT-BiLSTM | 233.6 | sps | 0.24 | fp16, 506.74M params, BERT-BiLSTM, input length 600 | 1 |
bert_classify_45_huggingface | 4090.5 | sps | 0.13 | fp16, 104.09M params, bert_classify_45_huggingface, input length 45 | 1 |
BERT-Large | 122.7 | sps | 4.49 | fp16, 318.62M params, BERT-Large, input length 384 | 1 |
bvrbert | 20307 | sps | 0.03 | fp16, 76.96M params, bvrbert, input length 32 | 1 |
ChineseBERT-wwm-ext | 757.2 | sps | 0.06 | fp16, 96.97M params, ChineseBERT-wwm-ext, input length 512 | 1 |
Chinese-XLNet-base | 118 | sps | 0.13 | fp16, 112.69M params, Chinese-XLNet-base, input length 512 | 1 |
DeBERTa_lay6 | 1442.3 | sps | 0.36 | fp16, 138.88M params, DeBERTa_lay6, input length 64 | 1 |
DistilBERT | 1291.2 | sps | 0.47 | fp16, 63.2M params, DistilBERT, input length 384 | 1 |
model_politics_scorer_64 | 3370.4 | sps | 0.19 | fp16, 100.98M params, model_politics_scorer_64, input length 128 | 1 |
model_video_porn_scorer_32 | 6865 | sps | 0.07 | fp16, 100.98M params, model_video_porn_scorer_32, input length 128 | 1 |
RoBERTa | 457.9 | sps | 1.28 | fp16, 118.31M params, RoBERTa, input length 384 | 1 |
RoBERTa-zh | 118.1 | sps | 0.27 | fp16, 314.95M params, RoBERTa-zh, input length 256 | 1 |
RoFormer | 372.9 | sps | 1.51 | fp16, 28.47M params, RoFormer, input length 1024 | 1 |
rtcbert | 1418.8 | sps | 0.41 | fp16, 141.54M params, rtcbert, input length 128 | 1 |
video_medical_mm | 588.2 | sps | 1.11 | fp16, 58.55M params, video_medical_mm, input length 128 | 1 |
XLNet | 1019.5 | sps | 0.55 | fp16, 88.43M params, XLNet, input length 128 | 1 |
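The NLP throughputs above are given in sentences per second at different input lengths, so rows with different lengths are not directly comparable. One rough way to normalize them is to multiply by the input length, which gives the number of token positions processed per second under the assumption (not stated in this document) that every sentence is padded to the listed length. The function name below is hypothetical, and the figures are copied from the three BERT rows in the table.

```python
def padded_tokens_per_second(sps: float, seq_len: int) -> float:
    # Upper bound on token positions processed per second, ASSUMING every
    # sentence is padded to seq_len (not stated in the table).
    return sps * seq_len


# (sentences per second, input length) from the three BERT rows above.
bert_rows = [(1132.4, 256), (698.7, 384), (599.0, 512)]
for sps, n in bert_rows:
    print(n, round(padded_tokens_per_second(sps, n)))
# 256 289894
# 384 268301
# 512 306688
```

Under that assumption the length-512 row processes the most token positions per second, even though its sentence throughput is the lowest of the three.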
4.5. Optical Character Recognition
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
AttentionOCR | 13804.8 | fps | 0.04 | fp16, 7.57M params, AttentionOCR, input 32*172 | 1 |
CRNN | 13824 | fps | 0.04 | fp16, 7.94M params, CRNN, input 32*100 | 1 |
crnn_r34_ppocr | 11503.2 | fps | 0.35 | fp16, 23.37M params, crnn_r34_ppocr, input 32*100 | 1 |
DBNet-MobileNetV3 | 212.5 | fps | 0.17 | fp16, 1.61M params, DBNet-MobileNetV3, input 736*1280 | 1 |
DBNet-ResNet50_vd | 27.2 | fps | 2.3 | fp16, 24.15M params, DBNet-ResNet50_vd, input 736*1280 | 1 |
ocr_decoder | 1959.2 | fps | 0.31 | fp16, 17.31M params, ocr_decoder, input 128*512 | 1 |
ocr_encoder | 1283.5 | fps | 0.4 | fp16, 20.63M params, ocr_encoder, input 32*512 | 1 |
PaddleOCRCnRec | 121121.50 | fps | 0.0042 | fp16, 2.53M params, PaddleOCRCnRec, input 48*320 | 1 |
4.6. Search and Recommendation
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
ctr_base1501646 | 1440 | pps | 0.34 | fp16, 9.04M params, ctr_base1501646 | 1 |
cvr_pack_dcn_mmcn | 147838.8 | pps | 0.23 | fp16, 19.23M params, cvr_pack_dcn_mmcn | 1 |
cypher_cvr_b1582402 | 513.2 | pps | 1.12 | fp16, 3.35M params, cypher_cvr_b1582402 | 1 |
cypher_norbert_send_seq_iw_afs_r1765296_0 | 269.5 | pps | 0.95 | fp16, 281.87M params, cypher_norbert_send_seq_iw_afs_r1765296_0 | 1 |
cypher_realtime | 421.5 | pps | 1.28 | fp16, 3.24M params, cypher_realtime | 1 |
deep_interest | 35136 | pps | 1.29 | fp16, 4.05M params, deep_interest | 1 |
DeepFM | 951944.90 | pps | 0.0344 | fp16, 0.01M params, DeepFM | 1 |
DFN | 2336814.30 | pps | 0.0018 | fp16, 0.27M params, DFN | 1 |
DF_debias | 6080.1 | pps | 5.84 | fp16, 4.41M params, DF_debias | 1 |
DLRM | 4578983.50 | pps | 0.0072 | fp16, 1.26M params, DLRM | 1 |
experience_model_split | 5383.5 | pps | 0.0951 | fp16, 6.22M params, experience_model_split | 1 |
gip_cypher_ltr | 365.7 | pps | 1.4 | fp16, 6.78M params, gip_cypher_ltr | 1 |
ipnn | 293389.4 | pps | 0.09 | fp16, 25.39M params, ipnn | 1 |
kpnn | 296797.6 | pps | 0.11 | fp16, 25.4M params, kpnn | 1 |
mmoe_large | 4272742 | pps | 0.0082 | fp16, 0.08M params, large | 1 |
mmoe_XL | 18080504 | pps | 0.0021 | fp16, 0.08M params, mmoe_XL | 1 |
NeuralCF | 24377704 | pps | 0.0107 | fp16, 0.37M params, NeuralCF | 1 |
opnn | 199722.6 | pps | 0.0026 | fp16, 25.39M params, opnn | 1 |
preclk_sail | 1598.4 | pps | 21.16 | fp16, 8.52M params, preclk_sail | 1 |
recall_base | 8744.20 | pps | 0.0585 | fp16, 1.9M params, recall_base | 1 |
recall_ctr_base | 6215.4 | pps | 5.2720 | fp16, 1.91M params, recall_ctr_base | 1 |
rough | 339134.40 | pps | 0.2144 | fp16, 0.09M params, rough | 1 |
sail_cypher_ctr_aid_realtime_b1585413 | 484.6 | pps | 1.09 | fp16, 5.03M params, sail_cypher_ctr_aid_realtime_b1585413 | 1 |
sail_model | 28417 | pps | 1.28 | fp16, 358.6M params, sail_model | 1 |
search_ctr | 3282202.60 | pps | 0.0079 | fp16, 7.39M params, search_ctr | 1 |
st_interactive | 4606.6 | pps | 0.1465 | fp16, 7.25M params, st_interactive | 1 |
staytime | 2006867.10 | pps | 0.0163 | fp16, 1.43M params, staytime | 1 |
WDL | 2261253 | pps | 0.0097 | fp16, 2.27M params, WDL, input 53248*2, 2048*13 | 1 |
4.7. Speech
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
conformer_speech_large | 204.4 | fps | 2.5 | fp16, 193.01M params, large, input 1078*80 | 1 |
conformer_speech_medium | 571.4 | fps | 0.9 | fp16, 65.3M params, medium, input 1078*80 | 1 |
conformer_speech_small | 893.2 | fps | 0.58 | fp16, 30.39M params, small, input 1078*80 | 1 |
ECAPA-TDNN | 315.80 | fps | 0.2 | fp16, 19.84M params, ECAPA-TDNN, input 640*80 | 1 |
ECAPA-TDNN_s400 | 691.90 | fps | 0.09 | fp16, 19.84M params, ECAPA-TDNN_s400, input 400*80 | 1 |
4.8. Multimodal
NN | Throughput | Throughput unit | Latency | NN description | Cards |
---|---|---|---|---|---|
METER | 8388.1 | fps | 0.0076 | fp16, 56.89M params, METER, input image 240*768, sequence length 240 | 1 |
videobert_t | 1125.5 | fps | 0.45 | fp16, 169.61M params, videobert_t, input 256*256 | 1 |
videobert_v | 598.8 | fps | 0.23 | fp16, 22.64M params, videobert_v, input 256*256 | 1 |