希姆计算stc-gdb使用说明

版本历史

版本 作者 日期 说明
V1.2.0 希姆计算 2022-04-11 - 更新编译命令和示例代码。
- 文档版本号对齐HPE发版。
- 整篇编辑优化。
V1.0.0 希姆计算 2021-09-13 初始版本。

概述

stc-gdb(Stream Computing Debugger)是希姆计算推出的命令行工具,用于调试NPU异构程序。stc-gdb扩展了GDB(GNU Debugger),支持在Linux系统上调试主机端代码,控制运行在NPU上的程序。您可以使用stc-gdb方便地监视程序运行状态,获取和修改程序的中间运行结果,减轻程序开发过程中的调试工作量,提高开发效率。

stc-gdb具有如下特性:

  • 完全兼容GDB原生命令。

  • ⽀持同时调试主机端和设备端的代码。

  • ⽀持调试使用NPU单核和多核的程序。

  • 支持对设备端代码进行源码级和指令级的单步调试。

  • 支持attach机制直接跟踪运行中的程序,方便定位到设备端代码。

  • 支持检查和修改程序所使用核的寄存器、变量或其他内存数据。

基本概念

概念 全称 说明
NPU Neural-network Processing Unit 神经网络处理器,专门面向大规模神经元和突触处理的处理器。采用数据驱动并行计算的架构,非常适合并行计算的场景。
NerualScale NerualScale 希姆计算自研的NPU神经网络计算加速架构,支持基于RISC-V指令集的扩展指令,具有良好的可编程性。
STCP920 Stream Computing P920 希姆计算推出的云端AI推理计算加速板卡,内含可编程NPU,基于NerualScale架构、软硬件协同设计。
NPC Neural-network Processing Core 希姆计算自研NPU中的神经网络处理核心。 每个NPU中包含多个NPC Cluster,例如STCP920中包含4个NPC Cluster,每个NPC Cluster包括8个NPC,因此STCP920中共包含32个NPC。 每个NPC Cluster以NPC Cluster ID作为标识符,对N个NPC Cluster,异构编程环境中使用0至N-1依次标记,同一个NPU中的所有NPC Cluster的ID是连续的。
核函数 Kernel Function 一个核函数对应一个计算任务,由主机端(CPU)部署到设备端(NPU),并在设备端的多个NPC上并行执行。
本地内存(L1) L1 每个NPC私有的高速内存,STCP920中每个NPC的本地内存大小为1.25 MiB。
共享内存(LLB) Last Level Buffer 每个NPC Cluster私有的中速内存,由NPC Cluster内的NPC共享,STCP920中每个NPC Cluster的共享内存大小为8 MiB。
全局内存(DDR) Double Data Rate Synchronous Dynamic Random-Access Memory 每个NPC Cluster私有的低速内存,由NPC Cluster内的NPC共享,STCP920中每个NPC Cluster的全局内存大小为4 GiB。
sysDMA System Direct Memory Access 数据传输通道,用于在NPC Cluster内的共享内存和全局内存间传输数据。

使用流程

  1. 准备调试环境。

    • 安装HPE。安装HPE后,stc-gdb的二进制文件位于/usr/local/hpe/bin目录中。

    • stc-gdb会将调试过程中产生的临时文件存储在/tmp目录中,因此请保证登录的用户拥有读写/tmp目录的权限。

  2. 编译获得含有调试信息的文件。

    在编译程序时添加-g选项,即可生成含有调试信息的⼆进制⽂件。示例如下,编译源文件matrix_multiply.hc,输出名为matrix_multiply的二进制文件,其中包含了调试信息。

    stcc --rtlib=compiler-rt matrix_multiply.hc -g -o matrix_multiply
    
  3. 启动stc-gdb。

    在启动stc-gdb可以同时载入编译获得的二进制文件。示例如下,启动stc-gdb并载入名为matrix_multiply的二进制文件。

    stc-gdb matrix_multiply
    

    如果需要在调试程序时传入更多参数,可以在启动stc-gdb时添加--args选项并指定命令行参数。示例如下,启动stc-gdb并载入名为matrix_multiply的二进制文件,并传入命令行参数arg1和arg2。

    stc-gdb --args matrix_multiply arg1 arg2
    
  4. 添加断点,然后运行程序开始调试。

    启动stc-gdb后,在需要查看调试信息的位置添加断点并运行程序即可。命中断点时程序暂停运行,您可以查看此时的调用堆栈、内存数据等信息。示例如下,分别在matmul函数处和程序第55行设置断点,然后运行程序开始调试。

    (stc-gdb) b matmul
    (stc-gdb) b 55
    (stc-gdb) run
    

stc-gdb命令

命令类型

运行程序开始调试后,可以执行命令控制程序运行、获取相关的信息。stc-gdb支持的命令类别如下:

  • GDB原生命令和选项:命名及使用方式和GDB定义保持一致。

  • 希姆计算扩展命令和选项:希姆计算的扩展命令均使用stc作为前缀。

与GDB一致,您可以通过help命令查看命令说明。示例如下,查看扩展命令列表及stc focus命令的说明。

说明:希姆计算扩展命令仅在异构程序运行在设备端时可用,具体的调试过程请参见调试示例章节。

(stc-gdb) help stc
STC npu specific commands

List of stc subcommands:

stc auto-switch -- Open/Close auto switch func
stc breakpoint -- Set/Unset breakpoint always on device
stc focus -- Modify/show STC npu focus state
stc info -- Show STC npu hardware info

Type "help stc" followed by stc subcommand name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(stc-gdb) help stc focus
Modify/show STC npu focus state.

stc focus

命令说明

stc focus命令用于在调试程序时查看和管理使用的NPC。

  • NPC坐标

NPC(即core)是运行异构程序的最小单元。一张AI推理卡(device)包括多个cluster,每个cluster包括多个NPC。以STCP920为例,单张STCP920包括4个cluster,单cluster又包括8个NPC。在运行异构程序时可能需要使用多个NPC,为方便灵活操作每个NPC,规定使用[device x, cluster y, core z]的坐标形式标识唯一的NPC。

  • 多核调试

在NPC坐标的基础上,stc-gdb提供了focus机制,通过stc focus命令focus到指定的NPC后,该stc-gdb进程的命令只会作用在指定的NPC上,方便您调试多核程序。

使用示例

  • 查看当前的focus,NPC坐标为[device 0, cluster 0, core 0]。

    (stc-gdb) stc focus
    [Focusing on logical device 0 cluster 0 core 0]
    
  • 将focus从[device 0, cluster 0, core 0]切换到[device 0, cluster 0, core 2]。

    (stc-gdb) stc focus device 0 cluster 0 core 2
    [Switch from logical device 0 cluster 0 core 0 to logical device 0 cluster 0 core 2.]
    
  • 如果未指定完整的NPC坐标信息,则参照当前使用中NPC的坐标信息补全,例如自动补全device和cluster。

    (stc-gdb) stc focus core 3
    [Switch from logical device 0 cluster 0 core 2 to logical device 0 cluster 0 core 3.]
    

stc info

命令说明

stc info命令用于查看当前device硬件相关的信息,包括cluster列表、NPC列表、NPC状态、当前focus的 NPC等。

可能的NPC状态包括:

  • BREAKPOINT:命中断点。

  • INTERRUPT:因Ctrl+C等操作中断。

  • SINGLESTEP:执行n等命令后单步调试。

使用示例

(stc-gdb) stc info
device cluster core phy-core     pc       status    focus 
   0      0      0      0    0x1400180  BREAKPOINT    *   
   0      0      1      1    0x1400180  BREAKPOINT        
   0      0      2      2    0x1400180  BREAKPOINT        
   0      0      3      3    0x1400180  BREAKPOINT        
   0      0      4      4    0x1400180  BREAKPOINT        
   0      0      5      5    0x1400180  BREAKPOINT        
   0      0      6      6    0x1400180  BREAKPOINT        
   0      0      7      7    0x1400180  BREAKPOINT 

说明:focus列中的星号(*)表示当前使用的NPC为[device 0, cluster 0, core 0]。

attach

命令说明

stc-gdb兼容原生GDB的attach机制,您可以跟踪运行中的程序,直观了解设备端代码的执行情况。与GDB一致,stc-gdb支持在启动时和启动后attach到运行中程序所在的进程。

使用示例

  • 启动stc-gdb时指定进程的pid。

    stc-gdb --pid $(pid of matrix_multiply)
    
  • 启动stc-gdb后执行attach命令指定进程的pid。

    (stc-gdb) attach $(pid of matrix_multiply)
    

说明:部分操作系统中需要sudo权限,请按照提示进行提权操作。

调试示例

新创建进程并调试程序

启动stc-gdb后,在需要查看调试信息的位置添加断点并运行程序即可。命中断点时程序暂停运行,您可以使用查看此时的调用堆栈、内存数据等信息。

说明:支持指定函数或代码行添加断点,主机端代码和设备端代码均可,但设备端调试仅支持添加软件断点,不支持硬件断点和观察点。

假设有示例异构程序文件matrix_multiply.hc,代码如下:

/*
 * Copyright (c) 2019-2022 北京希姆计算科技有限公司 (Stream Computing Inc.)
 * All Rights Reserved.
 *
 * NOTICE: All intellectual and technical information contained herein
 * are proprietary to Stream Computing Inc. Any unauthorized disemination,
 * copying or redistribution of this file via any medium is strictly prohibited,
 * unless you get a prior written permission or an applicable license agreement
 * from Stream Computing Inc.
 */
/*
 * This example uses internal instructions to do matrix multiply.
 */

#include <asm_macro.h>
#include <hpe.h>
#include <npurt.h>
#include <stdio.h>

// number of left matrix's col and right matrix's row
#define LCOL_RROW 8

// local_left * local_right = local_out
__device__ void matmul(__fp16 *local_out, __fp16 *local_left,
                       __fp16 *local_right) {
    int shape1, shape2;

    // do matrix multiply and result must be stored in IM buffer
    shape1 = DEFINE_SHAPE(LCOL_RROW, 1);
    shape2 = DEFINE_SHAPE(1, LCOL_RROW);
    CONFIG_VE_BC_CSR(shape1, shape2, 0, 0);
    memul_mm((__fp16 *)IM_BUFFER_START, local_left, local_right);

    // move result from IM buffer to local memory
    shape1 = DEFINE_SHAPE(1, 1);
    shape2 = 0;
    CONFIG_VE_CSR(shape1, shape2, 0, 0);
    mov_m(local_out, (__fp16 *)IM_BUFFER_START);
}

__global__ void matmul_kernel(__fp16 *global_out, __fp16 *global_left,
                              __fp16 *global_right) {
    __local__ __fp16 local_left[LCOL_RROW];
    __local__ __fp16 local_right[LCOL_RROW];
    __local__ __fp16 local_out;
    __shared__ __fp16 share_out[CoreNum];
    printf("CoreNum is %d\n", CoreNum);
    // copy right matrix to each core
    memcpy(local_right, global_right, LCOL_RROW * sizeof(__fp16));

    // copy a row of left matrix for each core
    memcpy(local_left, global_left + CoreID * LCOL_RROW,
           LCOL_RROW * sizeof(__fp16));
    // matrix multiply
    matmul(&local_out, local_left, local_right);

    // copy result in local memory of each core to shared memory
    memcpy(share_out + CoreID, &local_out, sizeof(__fp16));

    // sync to wait each of the core compute share_out data filled
    sync();

    if (CoreID == 0) {
        // copy result in share memory of each core to global memory
        memcpy(global_out, share_out, CoreNum * sizeof(__fp16));
    }
}

#define NCORE 8

int main(void) {
    __fp16 *dev_left, *dev_right, *dev_out;

    // left matrix data for 8 cores
    __fp16 host_left[8 * LCOL_RROW] = {
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, // row1
        1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, // row2
        2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, // row3
        3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, // row4
        4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, // row5
        5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, // row6
        6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, // row7
        7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, // row8
    };
    // right matrix data
    __fp16 host_right[LCOL_RROW] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

    // copy left matrix to device
    int mat_size_left = NCORE * LCOL_RROW * sizeof(__fp16);
    stcMalloc((void **)&dev_left, mat_size_left);
    stcMemcpy(dev_left, host_left, mat_size_left, stcMemcpyHostToDevice);

    // copy right matrix to device
    int mat_size_right = LCOL_RROW * sizeof(__fp16);
    stcMalloc((void **)&dev_right, mat_size_right);
    stcMemcpy(dev_right, host_right, mat_size_right, stcMemcpyHostToDevice);

    // allocate result buffer in device
    int mat_size_out = NCORE * sizeof(__fp16);
    __fp16 host_out[NCORE];
    stcMalloc((void **)&dev_out, mat_size_out);

    matmul_kernel<<<NCORE>>>(dev_out, dev_left, dev_right);
    stcDeviceSynchronize();

    // copy result from device to host
    stcMemcpy(host_out, dev_out, mat_size_out, stcMemcpyDeviceToHost);

    printf("matrix multiply result:");
    for (int i = 0; i < NCORE; i++)
        printf("%.1f, ", (float)(host_out[i]));
    printf("\n");

    stcFree(dev_left);
    stcFree(dev_right);
    stcFree(dev_out);
    return 0;
}

按照使用流程章节的步骤编译matrix_multiply.hc,获得名为matrix_multiply的二进制文件,即可启动stc-gdb开始调试。

  1. 启动stc-gdb并载入matrix_multiply。

    stc-gdb matrix_multiply 
    STREAMCOMPUTING (R) STCNPU Debugger
    1.1.0 release.
    Portions Copyright (C) 2019-2021 STREAMCOMPUTING Inc.
    GNU gdb (GDB) 8.2.50.20181127-git
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    Type "show configuration" for configuration details.
    --Type <RET> for more, q to quit, c to continue without paging--
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.
    
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from matrix_multiply...
    
    warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
    of GDB.  Attempting to continue with the default riscv:rv32 settings.
    
  2. 在核函数上添加断点,然后运行程序。

    a. 执行break(简写为b)命令添加函数断点。

    (stc-gdb) b matmul
    Breakpoint 1 at 0xc000001a: file matrix_multiply.hc, line 29.
    

    b. 添加代码行断点,然后执行run(简写为r)命令运行程序。

    (stc-gdb) b 55
    Breakpoint 2 at 0xc00001b6: file matrix_multiply.hc, line 55.
    (stc-gdb) run
    Starting program: /home/.../matrix_multiply
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [Detaching after vfork from child process 3925344]
    STCGDB LOAD INFERIOR MODULE
    INF PID OF MODULE -- 3925340
            NPU -- 0
            cluster -- 0
            link addr -- 0x1400180
            inf image content addr -- 0x230390
            inf image content size -- 0x93b0
            image name: /tmp/stcimage-pid3925340-dev0-cluster0-2398c0-0
            inf module prefix addr -- 0x2398c0
    
    warning: Probes-based dynamic linker interface failed.
    Reverting to original interface.
    
  3. 命中断点后暂停运行程序,查看调用堆栈、focus等信息。

    a. 执行backtrace(简写为bt)命令查看调用堆栈。

    Breakpoint 2, matmul_kernel (global_out=0x1400100, global_left=0x1400000, global_right=0x1400080) at matrix_multiply.hc:55
    55          matmul(&local_out, local_left, local_right);
    (stc-gdb) bt
    #0  matmul_kernel (global_out=0x1400100, global_left=0x1400000, global_right=0x1400080) at matrix_multiply.hc:55
    

    b. 此时异构程序已经运行到设备端,可以执行扩展命令stc focus查看当前使用的NPC,或者使用stc info查看NPC状态等更多信息。

    (stc-gdb) stc focus
    [Focusing on logical device 0 cluster 0 core 0]
    (stc-gdb) stc info
    device cluster core phy-core     pc       status    focus 
       0      0      0      0    0x1400336  BREAKPOINT    *   
       0      0      1      1    0x1400336  BREAKPOINT        
       0      0      2      2    0x1400336  BREAKPOINT        
       0      0      3      3    0x1400336  BREAKPOINT        
       0      0      4      4    0x1400336  BREAKPOINT        
       0      0      5      5    0x1400336  BREAKPOINT        
       0      0      6      6    0x1400336  BREAKPOINT        
       0      0      7      7    0x1400336  BREAKPOINT   
    
  4. 切换focus获取对应NPC的控制权,然后执行continue(简写为c)继续运行。

    (stc-gdb) stc focus device 0 cluster 0 core 2
    [Switch from logical device 0 cluster 0 core 0 to logical device 0 cluster 0 core 2.]
    (stc-gdb) c
    Continuing.
    
  5. 命中断点后暂停运行程序,查看变量、内存数据等信息。

    • 执行print(简写为p)命令打印变量的值。

      说明:stc-gdb支持打印半精度__fp16类型的变量。

      Breakpoint 1, matmul (local_out=0xc013fff0, local_left=0xc013ffd0, local_right=0xc013ffe0) at matrix_multiply.hc:29
      29          shape1 = DEFINE_SHAPE(LCOL_RROW, 1);
      (stc-gdb) p local_left
      $1 = (__fp16 *) 0xc013ffd0
      (stc-gdb) p *local_left
      $2 = 2
      
    • 执行examine(简写为x)命令查看内存数据。

      (stc-gdb) x /8x local_left
      0xc013ffd0:     0x40004000      0x40004000      0x40004000      0x40004000
      0xc013ffe0:     0x40003c00      0x44004200      0x46004500      0x48004700
      
    • 执行info(简写为i)命令查看寄存器的值。

      (stc-gdb) i registers pc
      pc                  0x140019a   0x140019a <matmul(half*, half*, half*)+26>
      (stc-gdb) i registers
      ra                  0x1400342   0x1400342 <matmul_kernel(half*, half*, half*)+258>
      sp                  0x10dff84   0x10dff84
      gp                  0x4af0      0x4af0
      tp                  0x140a9c0   0x140a9c0
      t0                  0xc0140000  -1072431104
      t1                  0x10        16
      t2                  0x0 0
      fp                  0x10dffb4   0x10dffb4
      s1                  0x2 2
      a0                  0x80001     524289
      a1                  0xc013ffd0  -1072431152
      a2                  0xc013ffe0  -1072431136
      a3                  0x10        16
      a4                  0x0 0
      a5                  0x1 1
      a6                  0x10dff58   17694552
      a7                  0x0 0
      s2                  0x34980     215424
      s3                  0x2b        43
      s4                  0x1400240   20972096
      s5                  0x26980     158080
      s6                  0xc1005c3c  -1056940996
      s7                  0x0 0
      s8                  0x10e0000   17694720
      s9                  0x140a9c0   21014976
      s10                 0xc1020004  -1056833532
      s11                 0x3 3
      t3                  0x1000      4096
      t4                  0x3030101   50528513
      t5                  0x400062    4194402
      t6                  0xc0040000  -1073479680
      pc                  0x140019a   0x140019a <matmul(half*, half*, half*)+26>
      
    • 执行info(简写为i)命令查看扩展寄存器的值。

      (stc-gdb) i registers tid
      tid                 0x2 2
      (stc-gdb) i registers shape_s1
      shape_s1            0x800002    8388610
      
  6. 执行sini命令进行汇编指令级单步调试,执行sn命令进行源码级单步调试。

    (stc-gdb) si
    0x0140019e      29          shape1 = DEFINE_SHAPE(LCOL_RROW, 1);
    (stc-gdb) ni
    0x014001a0      29          shape1 = DEFINE_SHAPE(LCOL_RROW, 1);
    (stc-gdb) n
    30          shape2 = DEFINE_SHAPE(1, LCOL_RROW);
    (stc-gdb) s
    31          CONFIG_VE_BC_CSR(shape1, shape2, 0, 0);
    
  7. 执行continue(简写为c)继续运行,命中断点后暂停运行程序,查看pc附近的汇编指令等信息。

    • 命中断点后暂停运行程序。

      (stc-gdb) c
      Continuing.
      
      Breakpoint 2, matmul_kernel (global_out=0x1400100, global_left=0x1400000, global_right=0x1400080) at matrix_multiply.hc:55
      55          matmul(&local_out, local_left, local_right);
      
    • 执行disassemble(简写为disass)命令查看当前pc附近的汇编指令。

      (stc-gdb) disass
      Dump of assembler code for function matmul_kernel(half*, half*, half*):
         0x01400240 <+0>:     addi    sp,sp,-16
         0x01400242 <+2>:     sw      a0,12(sp)
         0x01400244 <+4>:     sw      a1,8(sp)
         0x01400246 <+6>:     sw      a2,4(sp)
         0x01400248 <+8>:     sw      ra,0(sp)
         0x0140024a <+10>:    li      a1,48
         0x0140024e <+14>:    fmv.x.w a0,fs9
         0x01400252 <+18>:    auipc   ra,0x1
         0x01400256 <+22>:    jalr    1998(ra) # 0x1401a20 <check_local_memory>
         0x0140025a <+26>:    fmv.x.w a0,fs9
         0x0140025e <+30>:    addi    a0,a0,-48
         0x01400262 <+34>:    fmv.w.x fs9,a0
         0x01400266 <+38>:    lw      a0,12(sp)
         0x01400268 <+40>:    lw      a1,8(sp)
         0x0140026a <+42>:    lw      a2,4(sp)
         0x0140026c <+44>:    lw      ra,0(sp)
         0x0140026e <+46>:    addi    sp,sp,16
         0x01400270 <+48>:    addi    sp,sp,-64
         0x01400272 <+50>:    sw      ra,60(sp)
         0x01400274 <+52>:    sw      s0,56(sp)
         0x01400276 <+54>:    fsw     fs8,52(sp)
         0x01400278 <+56>:    fsw     fs11,48(sp)
         0x0140027a <+58>:    fmv.x.w a0,fs11
         0x0140027e <+62>:    mv      a0,a0
         0x01400280 <+64>:    fmv.w.x fs8,a0
         0x01400284 <+68>:    addi    s0,sp,64
         0x01400286 <+70>:    lw      a0,8(s0)
         0x01400288 <+72>:    lw      a0,4(s0)
      --Type <RET> for more, q to quit, c to continue without paging--
      
  8. 执行info(简写i)命令查看已添加的断点,并执行delete(简写d)删除对应的断点。

    (stc-gdb) i b
    Num     Type           Disp Enb Address    What
    1       breakpoint     keep y   <MULTIPLE> 
            breakpoint already hit 1 time
    1.1                         y   0x0140019a in matmul(half*, half*, half*) at matrix_multiply.hc:29
    1.2                         n   0xc000001a in matmul(half*, half*, half*) at matrix_multiply.hc:29
    2       breakpoint     keep y   <MULTIPLE> 
            breakpoint already hit 2 times
    2.1                         y   0x01400336 in matmul_kernel(half*, half*, half*) at matrix_multiply.hc:55
    2.2                         n   0xc00001b6 in matmul_kernel(half*, half*, half*) at matrix_multiply.hc:55
    (stc-gdb) d 1
    (stc-gdb) i b
    Num     Type           Disp Enb Address    What
    2       breakpoint     keep y   <MULTIPLE> 
            breakpoint already hit 2 times
    2.1                         y   0x01400336 in matmul_kernel(half*, half*, half*) at matrix_multiply.hc:55
    2.2                         n   0xc00001b6 in matmul_kernel(half*, half*, half*) at matrix_multiply.hc:55
    (stc-gdb) d 2
    (stc-gdb) i b
    No breakpoints or watchpoints.
    
  9. 执行kill(简写为k)强制结束进程并退出调试。

    (stc-gdb) k
    Kill the program being debugged? (y or n) y
    STCGDB UNLOAD INFERIOR MODULE
    INF PID OF MODULE -- 3925340
            NPU -- 0
            cluster -- 0
            link addr -- 0x1400180
            inf image content addr -- 0x230390
            inf image content size -- 0x93b0
            image name: /tmp/stcimage-pid3925340-dev0-cluster0-2398c0-0
            inf module prefix addr -- 0x2398c0
    
    [Inferior 1 (process 3925340) killed]
    

attach已有进程并调试程序

假设有示例异构程序文件hello_world.hc,代码如下:

/*
 * Copyright (c) 2019-2021 北京希姆计算科技有限公司 (Stream Computing Inc.)
 * All Rights Reserved.
 *
 * NOTICE: All intellectual and technical information contained herein
 * are proprietary to Stream Computing Inc. Any unauthorized disemination,
 * copying or redistribution of this file via any medium is strictly prohibited,
 * unless you get a prior written permission or an applicable license agreement
 * from Stream Computing Inc.
 */
/*
 * This example uses NPURT API 'printf' to print device message in host
 * terminal.
 */
#include <asm_macro.h>
#include <hpe.h>
#include <npurt.h>
#include <stdio.h>

__global__ void hello(void) {
    while (1) {
        printf("hello world from core %d/%d.\n", CoreID, CoreNum);
    }
}

#define NCORE 8

int main(void) {

    printf("running hello_world......\n");

    hello<<<NCORE>>>();
    stcDeviceSynchronize();

    return 0;
}

假设hello_world当前已经在设备端运行,且pid为512906:

  1. 执行attach(简写为at)命令指定hello_world所在的进程,跟踪到hello_world。

    (stc-gdb) at 512906
    Attaching to process 512906
    Reading symbols from /home/.../hello_world...
    warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
    of GDB.  Attempting to continue with the default riscv:rv32 settings.
    
    Reading symbols from /usr/local/hpe-1.1/lib/libhpert.so.1.1...
    (No debugging symbols found in /usr/local/hpe-1.1/lib/libhpert.so.1.1)
    Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
    (No debugging symbols found in /lib/x86_64-linux-gnu/libpthread.so.0)
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Reading symbols from /lib/x86_64-linux-gnu/libstdc++.so.6...
    (No debugging symbols found in /lib/x86_64-linux-gnu/libstdc++.so.6)
    Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
    (No debugging symbols found in /lib/x86_64-linux-gnu/libm.so.6)
    Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...
    (No debugging symbols found in /lib/x86_64-linux-gnu/libgcc_s.so.1)
    Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
    (No debugging symbols found in /lib/x86_64-linux-gnu/libc.so.6)
    Reading symbols from /lib64/ld-linux-x86-64.so.2...
    (No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
    STCGDB LOAD INFERIOR MODULE
    INF PID OF MODULE -- 512906
            NPU -- 2
            cluster -- 1
            link addr -- 0x1400000
            inf image content addr -- 0x194a930
            inf image content size -- 0x9cac
            image name: /tmp/stcimage-pid512906-dev2-cluster1-1954750-0
            inf module prefix addr -- 0x1954750
    
    0x014010d2 in send_uart ()
    
  2. 执行扩展命令stc info查看NPC状态等信息。

    (stc-gdb) stc info
    device cluster core phy-core     pc       status    focus 
       2      1      0      0    0x14010d2   INTERRUPT    *   
       2      1      1      1    0x14010d2   INTERRUPT        
       2      1      2      2    0x14011b4   INTERRUPT        
       2      1      3      3    0x14011b4   INTERRUPT        
       2      1      4      4    0x1401222   INTERRUPT        
       2      1      5      5    0x14011f2   INTERRUPT        
       2      1      6      6    0x1401124   INTERRUPT        
       2      1      7      7    0x14010fc   INTERRUPT        
    
  3. 执行break(简写为b)命令为设备端代码添加断点,然后执行continue(简写为c)继续运行。命中断点后,暂停运行程序。

    (stc-gdb) b 22
    Breakpoint 1 at 0x140000c: file hello_world.hc, line 22.
    (stc-gdb) c
    Continuing.
    
    Breakpoint 1, hello () at hello_world.hc:22
    22              printf("hello world from core %d/%d.\n", CoreID, CoreNum);
    
  4. 执行quitq)命令退出调试。

    (stc-gdb) q
    A debugging session is active.
    
            Inferior 1 [process 512906] will be detached.
    
    Quit anyway? (y or n) y
    Detaching from program: /home/.../hello_world, process 512906
    STCGDB UNLOAD INFERIOR MODULE
    INF PID OF MODULE -- 512906
            NPU -- 2
            cluster -- 1
            link addr -- 0x1400000
            inf image content addr -- 0x194a930
            inf image content size -- 0x9cac
            image name: /tmp/stcimage-pid512906-dev2-cluster1-1954750-0
            inf module prefix addr -- 0x1954750
    
    [Inferior 1 (process 512906) detached]