On Fri, 10 Feb 2023 14:32:13 -0800 Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
fwiw I don't mind reverting commit 3087c61ed2c4. For the record it didn't go through the bpf tree. It went through mm.
I think the solution being presented can work, but still needs some work.
But first we need to define which part of ftrace format is an abi. I think it's a format of tracing/event/foo/format file and not the contents of it. The tracepoints come and go. Their arguments get changed too. So the contents of ftrace format files change.
As Linus always says. The abi is what user space uses. ;-)
In this case Perfetto stumbled on parsing field:char comm[TASK_COMM_LEN]; offset:8;
Perfetto always expected that to be a number, so it must remain one.
We probably should define that 'field: value offset: value' is an abi, but value-s can change. Say offset:8 becomes offset:+8, for whatever reason. If Perfetto fails to parse it, it's a Perfetto bug.
In this case it failed to parse char comm[TASK_COMM_LEN]; But that's not the only such "field:". These three were there for a long time. field:u32 rates[NUM_NL80211_BANDS]; offset:16; size:24; field:int mcast_rate[NUM_NL80211_BANDS]; offset:60; size:24; field:int mcast_rate[NUM_NL80211_BANDS]; offset:108; size:24;
I suspect Perfetto didn't have a use case to parse them, so the bug stayed dormant and a change in TASK_COMM_LEN triggered it.
Correct. Perfetto picks and chooses what events it needs. It does not parse all events. So it wouldn't notice these. In fact, some tools require these fields to be numbers and have used the TRACE_EVENT_ENUM() to convert them.
But with this change, we can likely also remove the parsing of the fields on boot up. Which is a good thing.
We can use Yafang's patch to do: -field:%.*s %s%s; +field:%.*s %s[%d]; but it will affect both TASK_COMM_LEN and NUM_NL80211_BANDS.
Nothing appears to care about the NUM_NL80211_BANDS being there. It's basically useless information. If nothing complains about it changing, then it isn't a breakage of user space.
Remember, Linus's rule is not "do not modify user space interfaces", it is "don't break user space". If a user space interface changes and no user space tool notices, did it really break? The tree in a forest analogy.
In summary the TASK_COMM_LEN change from #define to enum didn't introduce anything new in the kind of "value"s being printed in ftrace files. Hence I'm arguing there is no abi breakage.
There was a question why change from #define to enum is useful to bpf. The reason is that #define-s are not seen in dwarf whereas enums and their values are. bpf tooling has ways to extract that data and auto-adjust bpf programs when enum-s disappear or their values change.
In most cases, having all the [xxx] turn into useful numbers is what we want. There's a few things broken with the current patch, but I can help fix those.
-- Steve