Hi CoreSight maintainers,
I am currently working on adding CoreSight support for the RISC-V platform and have identified a data loss issue when using workload-only mode in conjunction with fork system calls.
Issue Description
When recording traces in workload-only mode, a fork() in the traced program causes partial trace data loss: trace data from before the fork call, including the entry to main(), is missing. Per-thread mode, in contrast, captures the complete trace.
Test Case
The following test program was used to reproduce the issue:
/* gcc -D_GNU_SOURCE test_fork.c -o test_fork */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sched.h>

void child_task()
{
    printf("Child process: Running on CPU=%d PID=%d\n", sched_getcpu(), getpid());
    sleep(1);
    printf("Child process: Task finished\n");
}

int main()
{
    pid_t pid;

    printf("Parent process: Running on CPU=%d PID=%d\n", sched_getcpu(), getpid());

    pid = fork();
    if (pid == -1) {
        perror("fork failed");
        exit(1);
    } else if (pid == 0) {
        child_task();
        exit(0);
    } else {
        printf("Parent process: Waiting for child to finish...\n");
        wait(NULL);
        printf("Parent process: Child finished\n");
    }
    return 0;
}
Code Change for Debugging
To assist in diagnosing the issue, the following debug output was added to coresight-etm-perf.c:
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -613,6 +611,10 @@ static void etm_event_pause(struct perf_event *event,
        size = sink_ops(sink)->update_buffer(sink, handle, ctxt->event_data->snk_config);
 
+       dev_dbg(&csdev->dev, "etm_event_pause called on CPU%d, size=%lu\n",
+               smp_processor_id(), size);
+
        if (READ_ONCE(handle->event)) {
                if (!size)
                        return;
Log Output
The following logs were captured during execution:
root@k3:~# ~/perf record -e rvtrace/@tmc_etr0/ ./test_fork
Parent process: Running on CPU=1 PID=1534
[ 2190.481643] coresight encoder1: DEBUG: CPU1 update_buffer returned size=0
Parent process: Waiting for child to finish...
[ 2190.489223] coresight encoder6: DEBUG: CPU6 update_buffer returned size=0
Child process: Running on CPU=6 PID=1535
[ 2191.489372] coresight encoder6: DEBUG: CPU6 update_buffer returned size=30880
Child process: Task finished
Parent process: Child finished
[ 2191.496382] coresight encoder1: DEBUG: CPU1 update_buffer returned size=21504
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.056 MB perf.data ]

root@k3:~# ~/perf script > ./test_fork.log
root@k3:~# grep "/root/test" test_fork.log -nrI
115: test_fork 1535 [006] 52.991808: 1 branches: 2aaacc7910 child_task+0x2e (/root/test_fork) => 2aaacc7794 puts@plt+0x4 (/root/test_fork)
116: test_fork 1535 [006] 52.991808: 1 branches: 2aaacc7798 puts@plt+0x8 (/root/test_fork) => 2aaacc7798 puts@plt+0x8 (/root/test_fork)
962: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc791c child_task+0x3a (/root/test_fork) => 2aaacc7928 child_task+0x46 (/root/test_fork)
963: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7988 main+0x5e (/root/test_fork) => 2aaacc77c8 exit@plt+0x8 (/root/test_fork)
2468: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork) => 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork)
2563: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc7898 __do_global_dtors_aux+0x4 (/root/test_fork) => 2aaacc7898 __do_global_dtors_aux+0x4 (/root/test_fork)
2564: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc789c __do_global_dtors_aux+0x8 (/root/test_fork) => 2aaacc78b8 __do_global_dtors_aux+0x24 (/root/test_fork)
2570: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78ba __do_global_dtors_aux+0x26 (/root/test_fork) => 2aaacc7836 deregister_tm_clones+0x18 (/root/test_fork)
2571: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc784c deregister_tm_clones+0x2e (/root/test_fork) => 2aaacc7852 deregister_tm_clones+0x34 (/root/test_fork)
2572: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78be __do_global_dtors_aux+0x2a (/root/test_fork) => 2aaacc78c4 __do_global_dtors_aux+0x30 (/root/test_fork)
2864: test_fork 1535 [006] 52.991809: 1 branches: 2aaacc78c8 __do_global_dtors_aux+0x34 (/root/test_fork) => 2aaacc78ce __do_global_dtors_aux+0x3a (/root/test_fork)
4163: test_fork 1534 [001] 52.991977: 1 branches: 2aaacc79a0 main+0x76 (/root/test_fork) => 2aaacc7798 puts@plt+0x8 (/root/test_fork)
4849: test_fork 1534 [001] 52.991977: 1 branches: 2aaacc79ac main+0x82 (/root/test_fork) => 2aaacc79b8 main+0x8e (/root/test_fork)
5244: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc7894 __do_global_dtors_aux+0x0 (/root/test_fork) => 2aaacc78b8 __do_global_dtors_aux+0x24 (/root/test_fork)
5368: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78ba __do_global_dtors_aux+0x26 (/root/test_fork) => 2aaacc7836 deregister_tm_clones+0x18 (/root/test_fork)
5369: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc784c deregister_tm_clones+0x2e (/root/test_fork) => 2aaacc7852 deregister_tm_clones+0x34 (/root/test_fork)
5370: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78be __do_global_dtors_aux+0x2a (/root/test_fork) => 2aaacc78c4 __do_global_dtors_aux+0x30 (/root/test_fork)
5478: test_fork 1534 [001] 52.991978: 1 branches: 2aaacc78c8 __do_global_dtors_aux+0x34 (/root/test_fork) => 2aaacc78ce __do_global_dtors_aux+0x3a (/root/test_fork)
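As the logs show, trace data from the parent process before the fork (including the entry to main()) is missing in workload-only mode: the first update_buffer call on each of CPU1 and CPU6 returned size=0, and the earliest parent (PID 1534) sample that resolves to test_fork is at main+0x76.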
Questions
I am currently uncertain whether this issue also affects the coresight-etm tracer, or if it is a hardware-specific problem on our platform. I would greatly appreciate any suggestions or guidance you may have.
Thank you for your support!

SpacemiT (进迭时空)
Liang Zhen (梁镇)