stable-rc/linux-4.14.y bisection: baseline.login on meson8b-odroidc1

Ricardo Cañuelo

10 Apr 2023

6:06 a.m.

Culprit: https://lore.kernel.org/r/20211227180026.4068352-2-martin.blumenstingl@googl...

On Mon 27-12-2021 19:00:24, Martin Blumenstingl wrote:

...

The dt-bindings for the UART controller only allow the following values for Meson6 SoCs:

"amlogic,meson6-uart", "amlogic,meson-ao-uart"

"amlogic,meson6-uart"

Use the correct fallback compatible string "amlogic,meson-ao-uart" for AO UART. Drop the "amlogic,meson-uart" compatible string from the EE domain UART controllers.

KernelCI detected that this patch introduced a regression in stable-rc/linux-4.14.y (4.14.267) on a meson8b-odroidc1. After this patch was applied the tests running on this platform don't show any serial output.

This doesn't happen in other stable branches or in mainline, but 4.14 still hasn't reached EOL and it'd be good to find a fix.
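For readers less familiar with device-tree matching, here is a minimal sketch of why dropping a fallback compatible string can leave a UART without a driver. The node lists are assumptions drawn from the patch description quoted above, and the driver table is purely hypothetical: the thread does not confirm what the 4.14.y driver actually matches, so this illustrates the mechanism rather than a diagnosed root cause.

```python
from typing import Optional, Sequence


def first_match(node_compatible: Sequence[str],
                driver_table: Sequence[str]) -> Optional[str]:
    """Return the first (most specific) node compatible string that the
    driver also lists, or None if the driver would not bind at all."""
    for compat in node_compatible:
        if compat in driver_table:
            return compat
    return None


# HYPOTHETICAL driver match table: a driver that only knows the generic string.
HYPOTHETICAL_DRIVER_TABLE = ["amlogic,meson-uart"]

# ASSUMED node contents before and after the patch, per its description
# ("Drop the amlogic,meson-uart compatible string from the EE domain UARTs").
NODE_BEFORE = ["amlogic,meson6-uart", "amlogic,meson-uart"]
NODE_AFTER = ["amlogic,meson6-uart"]

if __name__ == "__main__":
    print("before:", first_match(NODE_BEFORE, HYPOTHETICAL_DRIVER_TABLE))
    print("after: ", first_match(NODE_AFTER, HYPOTHETICAL_DRIVER_TABLE))
```

Under these assumptions the node binds through the generic fallback before the patch and matches nothing afterwards, which would be consistent with a silent serial console. Whether that is what happens on meson8b-odroidc1 would still need to be checked against the 4.14.y sources.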

Here's the bisection report: https://groups.io/g/kernelci-results/message/40147

KernelCI info: https://linux.kernelci.org/test/case/id/64234f7761021a30b262f776/

Test log: https://storage.kernelci.org/stable-rc/linux-4.14.y/v4.14.311-43-g88e481d604...

Thanks, Ricardo

Linux regression tracking (Thorsten Leemhuis)

4 May

9:06 a.m.

[CCing the regression list, as it should be in the loop for regressions: https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 10.04.23 08:06, Ricardo Cañuelo wrote:

...

Culprit: https://lore.kernel.org/r/20211227180026.4068352-2-martin.blumenstingl@googl...

On Mon 27-12-2021 19:00:24, Martin Blumenstingl wrote:

...
The dt-bindings for the UART controller only allow the following values for Meson6 SoCs:

"amlogic,meson6-uart", "amlogic,meson-ao-uart"

"amlogic,meson6-uart"

Use the correct fallback compatible string "amlogic,meson-ao-uart" for AO UART. Drop the "amlogic,meson-uart" compatible string from the EE domain UART controllers.

KernelCI detected that this patch introduced a regression in stable-rc/linux-4.14.y (4.14.267) on a meson8b-odroidc1. After this patch was applied the tests running on this platform don't show any serial output.

This doesn't happen in other stable branches or in mainline, but 4.14 still hasn't reached EOL and it'd be good to find a fix.

Here's the bisection report: https://groups.io/g/kernelci-results/message/40147

KernelCI info: https://linux.kernelci.org/test/case/id/64234f7761021a30b262f776/

Test log: https://storage.kernelci.org/stable-rc/linux-4.14.y/v4.14.311-43-g88e481d604...

Lo! From the earlier discussion[1] it seems the mainline developers of the patch-set don't care (which is fine). And the stable team always has a lot of work at hand, which might explain why they haven't looked into this. Hence let me try to fill this gap a little here by asking:

Have you tried if reverting the change on top of the latest 4.14.y kernel works and looks safe (e.g. doesn't cause a regression on its own)?

I also briefly looked into "git log v4.14..v4.19 -- arch/arm/boot/dts/meson.dtsi" and noticed commit 291f45dd6da ("ARM: dts: meson: fixing USB support on Meson6, Meson8 and Meson8b") [v4.15-rc1] that mentions a fix for the Odroid-C1+ board -- which afaics wasn't backported to 4.14.y. Is that maybe why this happens on 4.14.y and not on 4.19.y? Note though: It's just a wild guess from the peanut gallery, as this is not my area of expertise!
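The two checks mentioned above (which commits touched the file between two releases, and whether a given upstream commit was backported) can be sketched as small helpers that build the corresponding `git log` invocations. This is only an illustration: the branch name `linux-4.14.y` and the repository path are assumptions, and the backport search relies on the usual stable-tree habit of citing the upstream commit ID in the backport's commit message.

```python
import subprocess
from typing import List


def log_range_cmd(old: str, new: str, path: str) -> List[str]:
    """Build a git command listing commits in old..new that touch `path`."""
    return ["git", "log", "--oneline", f"{old}..{new}", "--", path]


def backport_search_cmd(upstream_sha: str, base: str, branch: str) -> List[str]:
    """Build a git command listing commits in base..branch whose message
    mentions `upstream_sha` (an abbreviated ID matches as a substring)."""
    return ["git", "log", "--oneline", f"--grep={upstream_sha}", f"{base}..{branch}"]


def run(cmd: List[str], repo: str) -> List[str]:
    """Run `cmd` inside the git checkout at `repo`; return its output lines."""
    result = subprocess.run(cmd, cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print the commands only; call run(cmd, "/path/to/linux") on a real checkout.
    print(" ".join(log_range_cmd("v4.14", "v4.19",
                                 "arch/arm/boot/dts/meson.dtsi")))
    # "linux-4.14.y" is an assumed local branch name for the stable tree.
    print(" ".join(backport_search_cmd("291f45dd6da", "v4.14", "linux-4.14.y")))
```

An empty result from the second command would suggest, but not prove, that the commit was never backported, since a backport could in principle omit the upstream reference.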

Ciao, Thorsten

[1] https://lore.kernel.org/lkml/20230405132900.ci35xji3xbb3igar@rcn-XPS-13-9305...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

Ricardo Cañuelo

10:22 a.m.

Hey Thorsten,

Thanks for bringing this up, I think what you mentioned is interesting in a more general way, so let me use this email to share my impressions about the approach to reporting regressions and the role of the reporter.

On 4/5/23 11:06, Linux regression tracking (Thorsten Leemhuis) wrote:

...

Have you tried if reverting the change on top of the latest 4.14.y kernel works and looks safe (e.g. doesn't cause a regression on its own)?

No, I haven't. To be honest, my current approach when I'm reporting regressions is to act merely as a reporter, making sure the regression summaries reach the right people and providing as much info as possible with the data we gather from the test runs in KernelCI.

Sometimes I spend more time on a particular regression and I test it / investigate it more thoroughly to find the exact root cause and try to fix it, but I consider that to be beyond the role of a reporter. At that point I'm basically trying to find a fix, and that's much more time consuming.

...

I also briefly looked into "git log v4.14..v4.19 -- arch/arm/boot/dts/meson.dtsi" and noticed commit 291f45dd6da ("ARM: dts: meson: fixing USB support on Meson6, Meson8 and Meson8b") [v4.15-rc1] that mentions a fix for the Odroid-C1+ board -- which afaics wasn't backported to 4.14.y. Is that maybe why this happens on 4.14.y and not on 4.19.y? Note though: It's just a wild guess from the peanut gallery, as this is not my area of expertise!

Maybe, that's the kind of thing that someone who's familiar with the code (author / maintainers) can quickly evaluate. What you said about that not being your area of expertise is key, IMO. I don't think it's reasonable to expect a single person to investigate every possible type of regression. Investigating a bug could take me 5 minutes if it's something trivial or a few days if it's not and I'm not familiar with it, while the patch author/s could probably have it assessed and fixed in minutes. That's why I think that providing the regression info to the right people is a better use of the reporter's time.

There are many of us now in the community that are working towards building a common effort for regression reporting, so maybe we should take some time to define the roles involved and gather ideas about how to approach certain types of problems.

Thanks, Ricardo

Thorsten Leemhuis

11:28 a.m.

[CCing Greg, in case he's interested]

On 04.05.23 12:22, Ricardo Cañuelo wrote:

...

Thanks for bringing this up, I think what you mentioned is interesting in a more general way, so let me use this email to share my impressions about the approach to reporting regressions and the role of the reporter.

Many thx for this, let me follow suit a bit.

...

On 4/5/23 11:06, Linux regression tracking (Thorsten Leemhuis) wrote:

...
Have you tried if reverting the change on top of the latest 4.14.y kernel works and looks safe (e.g. doesn't cause a regression on its own)?

No, I haven't. To be honest, my current approach when I'm reporting regressions is to act merely as a reporter, making sure the regression summaries reach the right people and providing as much info as possible with the data we gather from the test runs in KernelCI.

Sometimes I spend more time on a particular regression and I test it / investigate it more thoroughly to find the exact root cause and try to fix it, but I consider that to be beyond the role of a reporter. At that point I'm basically trying to find a fix, and that's much more time consuming.

Yeah, my situation is quite similar -- just that I'm not the reporter and instead someone supposed to handle the tracking. But just like you I sometimes do a bit more than the job description in the strict sense requires. That msg you replied to was written in one of those moments. :-D

But FWIW, I have lines I don't cross myself (or at least try not to). Submitting fixes myself for example, even if they are simple -- like patches adding quirk entries to resolve regressions (recently I nevertheless got close to ignoring that line, but then found a better solution...).

...

...
I also briefly looked into "git log v4.14..v4.19 -- arch/arm/boot/dts/meson.dtsi" and noticed commit 291f45dd6da ("ARM: dts: meson: fixing USB support on Meson6, Meson8 and Meson8b") [v4.15-rc1] that mentions a fix for the Odroid-C1+ board -- which afaics wasn't backported to 4.14.y. Is that maybe why this happens on 4.14.y and not on 4.19.y? Note though: It's just a wild guess from the peanut gallery, as this is not my area of expertise!

Maybe, that's the kind of thing that someone who's familiar with the code (author / maintainers) can quickly evaluate.

Definitely. Maybe I should have CCed them in my mail, but I didn't, as that was the point where I thought "the reporter is the better judge here".

...

What you said about that not being your area of expertise is key, IMO. I don't think it's reasonable to expect a single person to investigate every possible type of regression. Investigating a bug could take me 5 minutes if it's something trivial or a few days if it's not and I'm not familiar with it, while the patch author/s could probably have it assessed and fixed in minutes. That's why I think that providing the regression info to the right people is a better use of the reporter's time.

There are many of us now in the community that are working towards building a common effort for regression reporting, so maybe we should take some time to define the roles involved and gather ideas about how to approach certain types of problems.

Yeah, maybe.

But OTOH I think we (e.g. reporters and developers) are all volunteers here (e.g. as hobbyists or because our employer wants us to contribute). Volunteers with a common goal. And all of us only have 24 hours in a day (at least as far as I know) -- which is often not enough to get everything done one is supposed to do. That in an ideal world should not affect duties like "fix any regressions you caused". But well, we don't live in an ideal world.

That's why I sometimes ignore the strict role definitions and also wonder if defining them is worth it. But it's totally fine for me if someone wants to do that.

That might sound a bit like a speech I'm giving trying to convince you to follow my model. But be assured: that's not the case at all. After your words I just felt I wanted to share my view on things.

Maybe that's because this is afaics a situation where a regression likely will remain unfixed, unless some of us do a bit more than what is expected from them. That's because I guess most people don't care much about 4.14.y anymore -- either in general or on the particular platform affected by this regression.

That leads to the question: should we spend our time on it? Maybe the time would be better spent on more important things, even if that means this particular regression will then likely remain unfixed in 4.14.y. Heck, maybe we should define that such an outcome is totally fine in cases like this -- not sure, but I currently think leaving that undefined might be the better approach for the project as a whole.

Ciao, Thorsten

Linux regression tracking (Thorsten Leemhuis)

19 Jun

9:36 a.m.

On 04.05.23 13:28, Thorsten Leemhuis wrote:

...

[CCing Greg, in case he's interested]

On 04.05.23 12:22, Ricardo Cañuelo wrote:

...
Thanks for bringing this up, [...]

BTW and JFYI (as you earlier said my docs helped you): the aspect "who is responsible to handle this regression: the regular maintainer or the stable team?" that came up earlier with this report led me to sit down and write a text called "Why your Linux kernel bug report might be ignored or is fruitless", which I published here:

https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kern...

In the end that document grew a lot, but that aspect is covered there. Maybe it's helpful for you or somebody else down the road.

Still a bit unsure if there is anything else I should do with that text. It is written from the perspective of users (otherwise it would sound apologetic) and thus likely not something that would fit into the kernel's Documentation/ directory. :-/

Anyway, there is a different reason why I write:

...

Maybe that's because this is afaics a situation where a regression likely will remain unfixed, unless some of us do a bit more than what is expected from them. That's because I guess most people don't care much about 4.14.y anymore -- either in general or on the particular platform affected by this regression.

That leads to the question: should we spend our time on it?

As expected there wasn't any progress (at least afaics).

As mentioned earlier, in an ideal world this regression would be addressed, but it looks like it won't come to that, as nobody is motivated enough to look closer (aka "everybody has more important things to do"). Hence I'm inclined to just remove it from the regression tracking. Or I need to create a category "bisected regressions that nevertheless are unlikely to be ever fixed" in the regzbot webui to avoid the clutter (but this is only one of a few that would fit).

Ricardo, how would you and the KernelCI folks feel about ignoring this?

Ricardo Cañuelo

11:53 a.m.

Hi Thorsten,

On Mon, Jun 19 2023 at 11:36:02, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:

...

BTW and JFYI (as you earlier said my docs helped you): the aspect "who is responsible to handle this regression: the regular maintainer or the stable team?" that came up earlier with this report led me to sit down and write a text called "Why your Linux kernel bug report might be ignored or is fruitless", which I published here:

https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kern...

This is fantastic and a much needed document that should be mandatory training for anyone reporting kernel regressions. IMO this kind of document should be located in a more prominent place so that it can become a key reference, especially in this case where there's no single right workflow. Maybe with a bit of effort from all of us we can improve the situation so that bug and regression reporting and tracking in the kernel becomes a much more streamlined process.

...

...
That leads to the question: should we spend our time on it?

As expected there wasn't any progress (at least afaics). [...] Ricardo, how would you and the KernelCI folks feel about ignoring this?

I can't speak on behalf of the KernelCI people, but this being something that isn't failing in mainline and considering that the stable release where it happened was very close to EOL puts this in the low-priority category for me. Fixing bugs can become quite an expensive task in terms of time, and I'm trying to factor in the impact of the fix to make sure the time spent fixing it is worth it. In other words, making test results green just for the sake of green-ness is not a sound reason to go after the failures. We're trying to improve the kernel quality after all, so I'd rather focus on the regressions that seem more important for the kernel integrity and for the users.

Cheers, Ricardo

Thorsten Leemhuis

5:09 p.m.

On 19.06.23 13:53, Ricardo Cañuelo wrote:

...

On Mon, Jun 19 2023 at 11:36:02, "Linux regression tracking (Thorsten Leemhuis)" <regressions@leemhuis.info> wrote:

...
BTW and JFYI (as you earlier said my docs helped you): the aspect "who is responsible to handle this regression: the regular maintainer or the stable team?" that came up earlier with this report led me to sit down and write a text called "Why your Linux kernel bug report might be ignored or is fruitless", which I published here:

https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kern...

This is fantastic

Feels really good to hear this, as it was a lot of work that involved a lot of rewriting...

Nevertheless: let me know, if there is something where you think "this doesn't feel right", "this could be clearer", "I don't understand this", or something like that.

...

and a much needed document that should be mandatory training for anyone reporting kernel regressions.

Well, bugs in general I'd say.

...

IMO this kind of document should be located in a more prominent place

Yeah, but where? I wondered if I should ask Jonathan if this is something for lwn.net, but something in me says it would be an odd fit.

...

Maybe with a bit of effort from all of us we can improve the situation so that bug and regression reporting and tracking in the kernel becomes a much more streamlined process.

I'd really like to work more on that, but this regression tracking thing is a time sink. And regzbot still needs quite a few improvements as well. :-/

Would help if I finally would figure out how to use "git clone" to create a clone or two of myself. ;)

...

...
...
That leads to the question: should we spend our time on it?

As expected there wasn't any progress (at least afaics). [...] Ricardo, how would you and the KernelCI folks feel about ignoring this?

I can't speak on behalf of the KernelCI people, but this being something that isn't failing in mainline and considering that the stable release where it happened was very close to EOL puts this in the low-priority category for me. Fixing bugs can become quite an expensive task in terms of time, and I'm trying to factor in the impact of the fix to make sure the time spent fixing it is worth it. In other words, making test results green just for the sake of green-ness is not a sound reason to go after the failures. We're trying to improve the kernel quality after all, so I'd rather focus on the regressions that seem more important for the kernel integrity and for the users.

Well said. It's similar for regression tracking, hence let me remove it from the list of tracked issues:

#regzbot inconclusive: seems nobody is motivated enough to work on resolving this issue found by KernelCI (see lists for details).

linux-stable-mirror@lists.linaro.org
