On Tue, Dec 10, 2024 at 22:04, Sasha Levin <sashal@kernel.org> wrote:
This is a note to let you know that I've just added the patch titled
rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long
to the 6.6-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: rtla-timerlat-make-timerlat_top_cpu-_count-unsigned-.patch and it can be found in the queue-6.6 subdirectory.
Could you also add "rtla/timerlat: Make timerlat_hist_cpu->*_count unsigned long long" (76b3102148135945b013797fac9b20), just like we already have in the queue for 6.12? It makes no sense to apply one fix but not the other (clearly autosel AI won't take over the world yet).
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
commit 0b8030ad5be8c39c4ad0f27fa740b3140a31023b
Author: Tomas Glozar <tglozar@redhat.com>
Date:   Fri Oct 11 14:10:14 2024 +0200
    rtla/timerlat: Make timerlat_top_cpu->*_count unsigned long long

    [ Upstream commit 4eba4723c5254ba8251ecb7094a5078d5c300646 ]

    Most fields of struct timerlat_top_cpu are unsigned long long, but the
    fields {irq,thread,user}_count are int (32-bit signed).

    This leads to overflow when tracing on a large number of CPUs for a long
    enough time:

    $ rtla timerlat top -a20 -c 1-127 -d 12h
    ...
      0 12:00:00   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
    CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
      1 #43200096  |        0         0         1         2 |        3         2         6        12
    ...
    127 #43200096  |        0         0         1         2 |        3         2         5        11
    ALL #119144 e4 |                  0         5         4 |                  2        28        16

    The average latency should be 0-1 for IRQ and 5-6 for thread, but is
    reported as 5 and 28, about 4 to 5 times more, due to the count
    overflowing when summed over all CPUs: 43200096 * 127 = 5486412192,
    however, 1191444898 (= 5486412192 mod MAX_INT) is reported instead, as
    seen on the last line of the output, and the averages are thus ~4.6
    times higher than they should be (5486412192 / 1191444898 = ~4.6).

    Fix the issue by changing {irq,thread,user}_count fields to unsigned
    long long, similarly to other fields in struct timerlat_top_cpu and to
    the count variable in timerlat_top_print_sum.

    Link: https://lore.kernel.org/20241011121015.2868751-1-tglozar@redhat.com
    Reported-by: Attila Fazekas <afazekas@redhat.com>
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index a84f43857de14..0915092057f85 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -49,9 +49,9 @@ struct timerlat_top_params {
 };
 
 struct timerlat_top_cpu {
-	int irq_count;
-	int thread_count;
-	int user_count;
+	unsigned long long irq_count;
+	unsigned long long thread_count;
+	unsigned long long user_count;
 	unsigned long long cur_irq;
 	unsigned long long min_irq;
@@ -237,7 +237,7 @@ static void timerlat_top_print(struct osnoise_tool *top, int cpu)
 	/*
 	 * Unless trace is being lost, IRQ counter is always the max.
 	 */
-	trace_seq_printf(s, "%3d #%-9d |", cpu, cpu_data->irq_count);
+	trace_seq_printf(s, "%3d #%-9llu |", cpu, cpu_data->irq_count);
 	if (!cpu_data->irq_count) {
 		trace_seq_printf(s, "%s %s %s %s |", no_value, no_value, no_value, no_value);
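
For anyone who wants to check the arithmetic from the commit message outside of rtla, here is a minimal sketch. The helper names (sum_count_32, sum_count_64, format_count) are hypothetical and do not exist in the rtla sources. The narrow accumulator is a uint32_t because unsigned wrap-around is well defined in C, while signed int overflow is undefined behavior. Note that a 32-bit wrap is modulo 2^32, which yields 1191444896 for the numbers above, within 2 of the figure quoted in the commit message; the ~4.6 ratio is the same either way.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helpers, not part of rtla: add the same per-CPU count
 * ncpus times, once into a 32-bit accumulator and once into a 64-bit one.
 */
static uint32_t sum_count_32(uint32_t per_cpu, int ncpus)
{
	uint32_t sum = 0;

	for (int i = 0; i < ncpus; i++)
		sum += per_cpu;	/* wraps modulo 2^32 once the total is too large */
	return sum;
}

static unsigned long long sum_count_64(unsigned long long per_cpu, int ncpus)
{
	unsigned long long sum = 0;

	for (int i = 0; i < ncpus; i++)
		sum += per_cpu;	/* no wrap for counts of this magnitude */
	return sum;
}

/*
 * Format one count column using the same format string as the patched
 * trace_seq_printf() call; %llu matches an unsigned long long argument.
 */
static int format_count(char *buf, size_t len, int cpu, unsigned long long count)
{
	return snprintf(buf, len, "%3d #%-9llu |", cpu, count);
}
```

Calling sum_count_32(43200096, 127) and sum_count_64(43200096, 127) gives the wrapped and the correct total respectively; dividing a latency sum by the wrapped total is what inflates the averages on the ALL line.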
Thanks, Tomas