4. Stream Computing (希姆计算) TensorTurbo Model Support

4.1. Version History

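Document version  Product version  Author  Date  Description
V1.1.0  STCRP v1.5.1  Stream Computing  2024-05-19  Adjusted table headers to meet business requirements.
V1.0.1  STCRP v1.5.1  Stream Computing  2024-04-23  Reclassified some models.
V1.0.0  STCRP v1.5.1  Stream Computing  2024-04-19  Initial public release.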

4.2. Overview

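Building on the hardware and software of the STCP920 inference card, Stream Computing has enabled a large number of models across mainstream domains, including computer vision (CV), natural language processing (NLP), optical character recognition (OCR), search and recommendation, speech, and multimodal. A complete, mature software stack is available to help you with deployment and operations. This document lists, in table form, the models that can be compiled with TensorTurbo and run for inference, together with their performance figures.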

The abbreviations used in the tables are:

  • NN: neural network

  • fps: frames per second

  • sps: sentences per second

  • pps: products per second

  • fp16: 16-bit floating point
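If you want to sort or filter the figures, the table rows in the following sections can be read programmatically. The snippet below is a minimal sketch, not part of TensorTurbo: the names `ROW_RE`, `PARAMS_RE`, `ModelRow` and `parse_row` are hypothetical. It assumes that each row consists of an NN name, a throughput value, one of the unit tokens fps, sps or pps, a latency value, a free-text description, and a card count, in that order. It also assumes that the first number followed by `M` in the description is the parameter count in millions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper for reading this document's tables; it is not part of
# TensorTurbo or any Stream Computing tool.
ROW_RE = re.compile(
    r"^(?P<name>.+?)\s+"                  # NN name, may contain spaces
    r"(?P<throughput>\d+(?:\.\d+)?)\s+"   # throughput value
    r"(?P<unit>fps|sps|pps)\s+"           # unit token from the abbreviation list
    r"(?P<latency>\d+(?:\.\d+)?)\s+"      # latency value
    r"(?P<desc>.+)\s+"                    # free-text NN description
    r"(?P<cards>\d+)$"                    # number of cards
)
# Assumption: the first "<number>M" in the description is the parameter count.
PARAMS_RE = re.compile(r"(\d+(?:\.\d+)?)M")


class ModelRow(NamedTuple):
    name: str
    throughput: float
    unit: str
    latency: float
    params_millions: Optional[float]
    cards: int


def parse_row(line: str) -> Optional[ModelRow]:
    """Parse one table row; return None for headers and non-matching lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    p = PARAMS_RE.search(m["desc"])
    return ModelRow(
        name=m["name"],
        throughput=float(m["throughput"]),
        unit=m["unit"],
        latency=float(m["latency"]),
        params_millions=float(p.group(1)) if p else None,
        cards=int(m["cards"]),
    )


if __name__ == "__main__":
    sample = "arcface 5154.4 fps 0.05 fp16, 41.57M parameters, arcface, input 112*112 1"
    row = parse_row(sample)
    print(row.name, row.throughput, row.unit, row.params_millions)
```

The pattern relies only on the numeric columns and the unit token, so the wording of the description column does not affect parsing. Lines that do not fit the layout, such as table headers, yield `None` rather than raising an error.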

4.3. Computer Vision

NN  Throughput  Latency unit  Latency  NN description  Cards
arcface  5154.4  fps  0.05  fp16, 41.57M parameters, arcface, input 112*112  1
atmosphere_vulgar  7128.6  fps  0.09  fp16, 22.5M parameters, atmosphere_vulgar, input 224*224  1
BiT  1365.6  fps  0.41  fp16, 24.37M parameters, BiT, input 224*224  1
centernet_x  23800.40  fps  0.02  fp16, 13.56M parameters, centernet_x, input 512*512  1
Conformer  5448.2  fps  0.09  fp16, 17.85M parameters, Conformer, input 32*512  1
conformer_ctc_zh_trail_3098504_iter_18000  10863.9  fps  0.09  fp16, 1.2M parameters, conformer_ctc_zh_trail_3098504_iter_18000, input 32*512  1
content_classify  1508.8  fps  0.34  fp16, 8.63M parameters, content_classify, input 260*260  1
cv_model_01  6014.6  fps  0.09  fp16, 22.75M parameters, cv_model_01, input 224*224  1
cv_model_02  5011.8  fps  0.12  fp16, 22.47M parameters, cv_model_02, input 224*224  1
cv_model_03  6273  fps  0.09  fp16, 22.47M parameters, cv_model_03, input 224*224  1
db_res18_epoach123_up20  199.5  fps  0.64  fp16, 11.64M parameters, db_res18_epoach123_up20, input 960*480  1
deeplabv3  6.2  fps  82.88  fp16, 55.38M parameters, v3, input 519*519  1
DenseNet121  6739.4  fps  0.08  fp16, 7.67M parameters, densenet121, input 224*224  1
EfficientNet-B0  8006.4  fps  0.07  fp16, 3.86M parameters, B0, input 112*96  1
EfficientNet-B0  5398.4  fps  0.11  fp16, 5.02M parameters, B0, input 224*224  1
EfficientNet-B1  3316.5  fps  0.17  fp16, 7.4M parameters, B1, input 240*240  1
EfficientNet-B5  1342.4  fps  0.39  fp16, 28.9M parameters, B5, input 224*224  1
EfficientNetV2  3296.8  fps  0.22  fp16, 12.96M parameters, V2, input 288*288  1
EfficientNetV2_s  5950.4  fps  0.09  fp16, 19.29M parameters, V2, input 224*224  1
face_bbox_landmark_dets  623.5  fps  0.21  fp16, 4.03M parameters, face_bbox_landmark_dets, input 640*640  1
FaceNet  17191.2  fps  0.3  fp16, 22.38M parameters, FaceNet, input 160*160  1
fairface  12772.6  fps  0.25  fp16, 20.3M parameters, fairface, input 224*224  1
fairmot  207.4  fps  0.16  fp16, 4.77M parameters, fairmot, input 608*1088  1
fast_reid  1953.9  fps  0.0095  fp16, 22.41M parameters, fast_reid, input 256*128  1
FCOS  3111.6  fps  0.16  fp16, 30.85M parameters, FCOS, input 800*1216  1
GLEAN  16.2  fps  1.54  fp16, 151.61M parameters, GLEAN, input 32*32  1
goods_tag_fashion_gender  6448.1  fps  0.08  fp16, 22.47M parameters, goods_tag_fashion_gender, input 224*224  1
hastag  547.9  fps  0.94  fp16, 23.97M parameters, hastag, input 256*256  1
hotsoon_live_v6_turbo  7155.2  fps  0.07  fp16, 22.47M parameters, v6, input 224*224  1
hotsoon_live_v8  726.6  fps  0.78  fp16, 22.55M parameters, v8, input 256*256  1
HRNet_pose_resnet50  9659.8  fps  0.0066  fp16, 32.4M parameters, HRNet_pose_resnet50, input 384*288  1
Inception-v3  2798.9  fps  0.45  fp16, 22.72M parameters, v3, input 299*299  1
MobileNetV2  8163.7  fps  0.0627  fp16, 5.8M parameters, v2, input 224*224  1
MobileNetv3  25294  fps  0.16  fp16, 2.41M parameters, v3, input 224*224  1
model_goods_search  159.5  fps  3.2  fp16, 84.08M parameters, model_goods_search, input 224*224  1
model_goods_universal_emb_v6_serving  6370.4  fps  0.09  fp16, 22.72M parameters, v6, input 224*224  1
mp_cls3_fpn  534.5  fps  0.96  fp16, 25.04M parameters, mp_cls3_fpn, input 224*224  1
multi_task_resnet  3262  fps  0.18  fp16, 22.41M parameters, multi_task_resnet, input 320*320  1
pose_hrnet_w32  1975.7  fps  0.02  fp16, 27.19M parameters, pose_hrnet_w32, input 512*512  1
pose_hrnet_w48  2798.4  fps  0.0112  fp16, 60.61M parameters, pose_hrnet_w48, input 384*288  1
PSEnet  325.3  fps  1.58  fp16, 27.39M parameters, PSEnet, input 736*1312  1
rec_0530_add  3048.6  fps  0.16  fp16, 1.98M parameters, rec_0530_add, input 32*512  1
regnet_quan_hist_mask  6477.5  fps  0.08  fp16, 8.05M parameters, regnet_quan_hist_mask, input 224*224  1
regnet_quan_hist_mask_live  5883.5  fps  0.09  fp16, 7.98M parameters, regnet_quan_hist_mask_live, input 224*224  1
ResNet101  5020.8  fps  0.11  fp16, 42.52M parameters, resnet101, input 224*224  1
ResNet34  7779.2  fps  0.07  fp16, 20.79M parameters, resnet34, input 224*224  1
ResNet50  8443.2  fps  0.07  fp16, 24.35M parameters, V2, input 224*224  1
resnet50_fcnn  40.3  fps  6.31  fp16, 15.52M parameters, resnet50_fcnn, input 832*832  1
resnet50_hotsoon  280.9  fps  1.82  fp16, 33.84M parameters, resnet50_hotsoon, input 224*224  1
ResNet50_v1p5  7043.7  fps  0.09  fp16, 24.35M parameters, v1.5, input 224*224  1
resnet50-torchvision-v0_10_0  6207.1  fps  0.0099  fp16, 24.35M parameters, v0, input 224*224  1
RetinaFace_ResNet50  51.8  fps  0.35  fp16, 26.0M parameters, RetinaFace_ResNet50, input 1024*1024  1
RetinNnet_ResNet50_FPN  10931.1  fps  0.05  fp16, 36.18M parameters, RetinNnet_ResNet50_FPN, input 640*640  1
SEResNeXt  988.8  fps  0.51  fp16, 24.31M parameters, SEResNeXt, input 320*320  1
ShuffleNet V2  14389.5  fps  0.04  fp16, 2.17M parameters, v2, input 224*224  1
SlowFast  1978.2  fps  0.03  fp16, 32.85M parameters, SlowFast, input 224*224  1
smoke_hotsoon_live_v2  5210.1  fps  0.12  fp16, 22.47M parameters, v2, input 256*256  1
Swin Transformer  2111.3  fps  0.24  fp16, 27.48M parameters, Swin Transformer, input 224*224  1
swin_transformer_s  1332.8  fps  0.39  fp16, 48.1M parameters, swin_transformer_s, input 224*224  1
tongcheng_0819  11972.2  fps  0.04  fp16, 10.66M parameters, tongcheng, input 224*224  1
TSM  143.4  fps  0.08  fp16, 22.73M parameters, v2, input 256*256  1
U-Net  864.00  fps  0.59  fp16, 7.4M parameters, U-Net, input 256*256  1
3D U-Net  444.3  fps  1.15  fp16, 29.75M parameters, 3D U-Net, input 128*128  1
VGG16  1147.2  fps  0.45  fp16, 131.95M parameters, vgg16, input 224*224  1
video_jitter  249.5  fps  2.2  fp16, 22.47M parameters, video_jitter, input 224*224  1
Vision Transformer  355.2  fps  1.46  fp16, 82.56M parameters, Vision Transformer, input 224*224  1
ViT  4099.2  fps  0.14  fp16, 83.78M parameters, ViT, input 224*224  1
YOLOv3  652.3  fps  2.53  fp16, 7.11M parameters, YOLOv3, input 416*416  1
YOLOv5  755.3  fps  0.99  fp16, 6.9M parameters, YOLOv5, input 640*480  1
YOLOv5  406.4  fps  1.4  fp16, 6.91M parameters, YOLOv5, input 768*768  1
YOLOv5m  514  fps  0.27  fp16, 20.19M parameters, YOLOv5m, input 640*640  1
YOLOv5s  759.3  fps  0.08  fp16, 6.89M parameters, YOLOv5s, input 640*640  1

4.4. Natural Language Processing

NN  Throughput  Latency unit  Latency  NN description  Cards
ALBERT  2903  sps  0.17  fp16, 7.44M parameters, ALBERT, input length 128  1
ALBERT  744.8  sps  0.81  fp16, 84.83M parameters, ALBERT, input length 384  1
ALBERT-zh-base  1820.2  sps  0.32  fp16, 10.06M parameters, ALBERT-zh-base, input length 128  1
ALBERT-zh-large  520.4  sps  0.99  fp16, 15.78M parameters, ALBERT-zh-large, input length 128  1
ALBERT-zh-small  4470.1  sps  0.17  fp16, 4.52M parameters, ALBERT-zh-small, input length 128  1
ALBERT-zh-tiny  4712.1  sps  0.16  fp16, 3.89M parameters, ALBERT-zh-tiny, input length 128  1
BERT  1132.4  sps  0.45  fp16, 81.87M parameters, BERT, input length 256  1
BERT  698.7  sps  0.82  fp16, 103.75M parameters, BERT, input length 384  1
BERT  599  sps  0.86  fp16, 82.06M parameters, BERT, input length 512  1
bert_128_mixin_asr_end2end  1715.3  sps  0.39  fp16, 100.98M parameters, bert_128_mixin_asr_end2end, input length 128  1
bert_256_fever_review_nlp_e2e_eco_rate  666.8  sps  0.92  fp16, 141.54M parameters, bert_256_fever_review_nlp_e2e_eco_rate, input length 256  1
BERT-Base  778.6  sps  0.16  fp16, 103.75M parameters, BERT-Base, input length 384  1
BERT-Base  3581.6  sps  0.04  fp16, 98.66M parameters, BERT-base, input length 128  1
bert_base_chinese  1137.2  sps  0.07  fp16, 96.97M parameters, bert_base_chinese, input length 256  1
BERT-BiLSTM  233.6  sps  0.24  fp16, 506.74M parameters, BERT-BiLSTM, input length 600  1
bert_classify_45_huggingface  4090.5  sps  0.13  fp16, 104.09M parameters, bert_classify_45_huggingface, input length 45  1
BERT-Large  122.7  sps  4.49  fp16, 318.62M parameters, BERT-Large, input length 384  1
bvrbert  20307  sps  0.03  fp16, 76.96M parameters, bvrbert, input length 32  1
ChineseBERT-wwm-ext  757.2  sps  0.06  fp16, 96.97M parameters, ChineseBERT-wwm-ext, input length 512  1
Chinese-XLNet-base  118  sps  0.13  fp16, 112.69M parameters, Chinese-XLNet-base, input length 512  1
DeBERTa_lay6  1442.3  sps  0.36  fp16, 138.88M parameters, DeBERTa_lay6, input length 64  1
DistilBERT  1291.2  sps  0.47  fp16, 63.2M parameters, DistilBERT, input length 384  1
model_politics_scorer_64  3370.4  sps  0.19  fp16, 100.98M parameters, model_politics_scorer_64, input length 128  1
model_video_porn_scorer_32  6865  sps  0.07  fp16, 100.98M parameters, model_video_porn_scorer_32, input length 128  1
RoBERTa  457.9  sps  1.28  fp16, 118.31M parameters, RoBERTa, input length 384  1
RoBERTa-zh  118.1  sps  0.27  fp16, 314.95M parameters, RoBERTa-zh, input length 256  1
RoFormer  372.9  sps  1.51  fp16, 28.47M parameters, RoFormer, input length 1024  1
rtcbert  1418.8  sps  0.41  fp16, 141.54M parameters, rtcbert, input length 128  1
video_medical_mm  588.2  sps  1.11  fp16, 58.55M parameters, video_medical_mm, input length 128  1
XLNet  1019.5  sps  0.55  fp16, 88.43M parameters, XLNet, input length 128  1

4.5. Optical Character Recognition

NN  Throughput  Latency unit  Latency  NN description  Cards
AttentionOCR  13804.8  fps  0.04  fp16, 7.57M parameters, AttentionOCR, input 32*172  1
CRNN  13824  fps  0.04  fp16, 7.94M parameters, CRNN, input 32*100  1
crnn_r34_ppocr  11503.2  fps  0.35  fp16, 23.37M parameters, crnn_r34_ppocr, input 32*100  1
DBNet-MobileNetV3  212.5  fps  0.17  fp16, 1.61M parameters, DBNet-MobileNetV3, input 736*1280  1
DBNet-ResNet50_vd  27.2  fps  2.3  fp16, 24.15M parameters, DBNet-ResNet50_vd, input 736*1280  1
ocr_decoder  1959.2  fps  0.31  fp16, 17.31M parameters, ocr_decoder, input 128*512  1
ocr_encoder  1283.5  fps  0.4  fp16, 20.63M parameters, ocr_encoder, input 32*512  1
PaddleOCRCnRec  121121.50  fps  0.0042  fp16, 2.53M parameters, PaddleOCRCnRec, input 48*320  1

4.6. Search and Recommendation

NN  Throughput  Latency unit  Latency  NN description  Cards
ctr_base1501646  1440  pps  0.34  fp16, 9.04M parameters, ctr_base1501646  1
cvr_pack_dcn_mmcn  147838.8  pps  0.23  fp16, 19.23M parameters, cvr_pack_dcn_mmcn  1
cypher_cvr_b1582402  513.2  pps  1.12  fp16, 3.35M parameters, cypher_cvr_b1582402  1
cypher_norbert_send_seq_iw_afs_r1765296_0  269.5  pps  0.95  fp16, 281.87M parameters, cypher_norbert_send_seq_iw_afs_r1765296_0  1
cypher_realtime  421.5  pps  1.28  fp16, 3.24M parameters, cypher_realtime  1
deep_interest  35136  pps  1.29  fp16, 4.05M parameters, deep_interest  1
DeepFM  951944.90  pps  0.0344  fp16, 0.01M parameters, DeepFM  1
DFN  2336814.30  pps  0.0018  fp16, 0.27M parameters, DFN  1
DF_debias  6080.1  pps  5.84  fp16, 4.41M parameters, DF_debias  1
DLRM  4578983.50  pps  0.0072  fp16, 1.26M parameters, DLRM  1
experience_model_split  5383.5  pps  0.0951  fp16, 6.22M parameters, experience_model_split  1
gip_cypher_ltr  365.7  pps  1.4  fp16, 6.78M parameters, gip_cypher_ltr  1
ipnn  293389.4  pps  0.09  fp16, 25.39M parameters, ipnn  1
kpnn  296797.6  pps  0.11  fp16, 25.4M parameters, kpnn  1
mmoe_large  4272742  pps  0.0082  fp16, 0.08M parameters, large  1
mmoe_XL  18080504  pps  0.0021  fp16, 0.08M parameters, mmoe_XL  1
NeuralCF  24377704  pps  0.0107  fp16, 0.37M parameters, NeuralCF  1
opnn  199722.6  pps  0.0026  fp16, 25.39M parameters, opnn  1
preclk_sail  1598.4  pps  21.16  fp16, 8.52M parameters, preclk_sail  1
recall_base  8744.20  pps  0.0585  fp16, 1.9M parameters, recall_base  1
recall_ctr_base  6215.4  pps  5.2720  fp16, 1.91M parameters, recall_ctr_base  1
rough  339134.40  pps  0.2144  fp16, 0.09M parameters, rough  1
sail_cypher_ctr_aid_realtime_b1585413  484.6  pps  1.09  fp16, 5.03M parameters, sail_cypher_ctr_aid_realtime_b1585413  1
sail_model  28417  pps  1.28  fp16, 358.6M parameters, sail_model  1
search_ctr  3282202.60  pps  0.0079  fp16, 7.39M parameters, search_ctr  1
st_interactive  4606.6  pps  0.1465  fp16, 7.25M parameters, st_interactive  1
staytime  2006867.10  pps  0.0163  fp16, 1.43M parameters, staytime  1
WDL  2261253  pps  0.0097  fp16, 2.27M parameters, WDL, input 53248*2, 2048*13  1

4.7. Speech

NN  Throughput  Latency unit  Latency  NN description  Cards
conformer_speech_large  204.4  fps  2.5  fp16, 193.01M parameters, large, input 1078*80  1
conformer_speech_medium  571.4  fps  0.9  fp16, 65.3M parameters, medium, input 1078*80  1
conformer_speech_small  893.2  fps  0.58  fp16, 30.39M parameters, small, input 1078*80  1
ECAPA-TDNN  315.80  fps  0.2  fp16, 19.84M parameters, ECAPA-TDNN, input 640*80  1
ECAPA-TDNN_s400  691.90  fps  0.09  fp16, 19.84M parameters, ECAPA-TDNN_s400, input 400*80  1

4.8. Multimodal

NN  Throughput  Latency unit  Latency  NN description  Cards
METER  8388.1  fps  0.0076  fp16, 56.89M parameters, METER, input image 240*768, sequence length 240  1
videobert_t  1125.5  fps  0.45  fp16, 169.61M parameters, videobert_t, input 256*256  1
videobert_v  598.8  fps  0.23  fp16, 22.64M parameters, videobert_v, input 256*256  1