Hi CoreSight maintainers,

I am currently working on adding CoreSight support for the RISC-V platform and have identified a data loss issue when using workload-only mode in conjunction with fork system calls.

Issue Description

When recording traces in workload-only mode, fork events lead to partial trace data loss. Specifically, trace data prior to the fork call (including the entry of the main() function) is missing. In contrast, per-thread mode correctly captures the complete trace.

Test Case

The following test program was used to reproduce the issue:

```c
/* gcc -D_GNU_SOURCE test_fork.c -o test_fork */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sched.h>

void child_task()
{
    printf("Child process: Running on CPU=%d PID=%d\n", sched_getcpu(), getpid());
    sleep(1);
    printf("Child process: Task finished\n");
}

int main()
{
    pid_t pid;

    printf("Parent process: Running on CPU=%d PID=%d\n", sched_getcpu(), getpid());

    pid = fork();
    if (pid == -1) {
        perror("fork failed");
        exit(1);
    } else if (pid == 0) {
        child_task();
        exit(0);
    } else {
        printf("Parent process: Waiting for child to finish...\n");
        wait(NULL);
        printf("Parent process: Child finished\n");
    }

    return 0;
}
```

Code Change for Debugging

To assist in diagnosing the issue, the following debug output was added to coresight-etm-perf.c:

```diff
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -613,6 +611,10 @@ static void etm_event_pause(struct perf_event *event,
 	size = sink_ops(sink)->update_buffer(sink, handle,
 					     ctxt->event_data->snk_config);
+	dev_dbg(&csdev->dev, "etm_event_pause called on CPU%d, size=%lu\n",
+		smp_processor_id(), size);
+
 	if (READ_ONCE(handle->event)) {
 		if (!size)
 			return;
```

Log Output

The following logs were captured during execution:

```
root@k3:~# ~/perf record -e rvtrace/@tmc_etr0/ ./test_fork
Parent process: Running on CPU=1 PID=1534
[ 2190.481643] coresight encoder1: DEBUG: CPU1 update_buffer returned size=0
Parent process: Waiting for child to finish...
[ 2190.489223] coresight encoder6: DEBUG: CPU6 update_buffer returned size=0
Child process: Running on CPU=6 PID=1535
[ 2191.489372] coresight encoder6: DEBUG: CPU6 update_buffer returned size=30880
Child process: Task finished
Parent process: Child finished
[ 2191.496382] coresight encoder1: DEBUG: CPU1 update_buffer returned size=21504
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.056 MB perf.data ]
root@k3:~# ~/perf script > ./test_fork.log
root@k3:~# grep "/root/test" test_fork.log -nrI
115: test_fork 1535 [006] 52.991808: 1 branches: 2aaacc7910 child_task+0x2e (/root/test_fork) => 2aaacc7794 puts@plt+0x4 (/root/test_fork)
116: test_fork 1535 [006] 52.991808: 1 branches: 2aaacc7798 puts@plt+0x8 (/root/test_fork) => 2aaacc7798 puts@plt+0x8 (/root/test_fork)
962: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc791c child_task+0x3a (/root/test_fork) => 2aaacc7928 child_task+0x46 (/root/test_fork)
963: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7988 main+0x5e (/root/test_fork) => 2aaacc77c8 exit@plt+0x8 (/root/test_fork)
2468: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork) => 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork)
2563: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7898 __do_global_dtors_aux+0x4 (/root/test_fork) => 2aaacc7898 __do_global_dtors_aux+0x4 (/root/test_fork)
2564: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc789c __do_global_dtors_aux+0x8 (/root/test_fork) => 2aaacc78b8 __do_global_dtors_aux+0x24 (/root/test_fork)
2570: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78ba __do_global_dtors_aux+0x26 (/root/test_fork) => 2aaacc7836 deregister_tm_clones+0x18 (/root/test_fork)
2571: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc784c deregister_tm_clones+0x2e (/root/test_fork) => 2aaacc7852 deregister_tm_clones+0x34 (/root/test_fork)
2572: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78be __do_global_dtors_aux+0x2a (/root/test_fork) => 2aaacc78c4 __do_global_dtors_aux+0x30 (/root/test_fork)
2864: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78c8 __do_global_dtors_aux+0x34 (/root/test_fork) => 2aaacc78ce __do_global_dtors_aux+0x3a (/root/test_fork)
4163: test_fork 1534 [001] 52.991977: 1 branches: 2aaacc79a0 main+0x76 (/root/test_fork) => 2aaacc7798 puts@plt+0x8 (/root/test_fork)
4849: test_fork 1534 [001] 52.991977: 1 branches: 2aaacc79ac main+0x82 (/root/test_fork) => 2aaacc79b8 main+0x8e (/root/test_fork)
5244: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork) => 2aaacc78b8 __do_global_dtors_aux+0x24 (/root/test_fork)
5368: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78ba __do_global_dtors_aux+0x26 (/root/test_fork) => 2aaacc7836 deregister_tm_clones+0x18 (/root/test_fork)
5369: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc784c deregister_tm_clones+0x2e (/root/test_fork) => 2aaacc7852 deregister_tm_clones+0x34 (/root/test_fork)
5370: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78be __do_global_dtors_aux+0x2a (/root/test_fork) => 2aaacc78c4 __do_global_dtors_aux+0x30 (/root/test_fork)
5478: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78c8 __do_global_dtors_aux+0x34 (/root/test_fork) => 2aaacc78ce __do_global_dtors_aux+0x3a (/root/test_fork)
```

As shown in the logs, trace data from the parent process before the fork (including the entry to main()) is missing in workload-only mode.

Questions

I am currently uncertain whether this issue also affects the coresight-etm tracer, or if it is a hardware-specific problem on our platform. I would greatly appreciate any suggestions or guidance you may have.

Thank you for your support!

Liang Zhen (梁镇)
SpacemiT (进迭时空)