On Mon, Nov 08, 2021 at 11:17:07AM -0800, Yi Fan wrote:
On Mon, Nov 8, 2021 at 12:00 AM Greg KH <gregkh@linuxfoundation.org> wrote:
On Thu, Nov 04, 2021 at 12:40:32PM -0700, Yi Fan wrote:
Reply inline.
On Thu, Nov 4, 2021 at 11:56 AM Greg KH <gregkh@linuxfoundation.org> wrote:
On Thu, Nov 04, 2021 at 11:14:55AM -0700, Yi Fan wrote:
Resending this email as plain text.
I found a kernel performance regression that may be related to a 4.14.y LTS commit.
4.14.y commit: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v...
The issue is observed when "console=" is used as a kernel parameter to disable the kernel console.
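For anyone trying to reproduce this, a minimal sketch of how one might check whether a given kernel command line carries the empty "console=" parameter. The helper name `console_disabled` is hypothetical, and the check only looks for a bare `console=` token; how the kernel treats a command line with several console= entries is not modelled here.

```python
def console_disabled(cmdline: str) -> bool:
    """Return True if the command line contains a bare "console=" token.

    Assumption: parameters are whitespace-separated and unquoted,
    which is enough for a quick check but is not a full parser.
    """
    return any(token == "console=" for token in cmdline.split())


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(console_disabled(f.read()))
```

Running it on an affected device should show whether that boot actually used the empty console= setting.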
What exact "performance issue" are you seeing?
[YF] One kernel thread was randomly blocked for more than ~40 milliseconds, causing a certain task to miss its processing deadline.
[YF] The issue is highly random on a single device, but it might happen a few times per 24 hours on a certain percentage of devices. The overall percentage of devices that show the issue seems quite stable over a long period of time (somehow the magic number is ~40%).
[YF] Local tests on a pool of devices do not show any correlation with any particular device.
[YF] A local test after reverting the above single commit passes; no issue is observed.
And what type of device is this?
[YF] It happens on multiple devices running the 4.14.y kernel. (Sorry, I cannot disclose the device type here.)
That's not helpful :(
Can you say "server" or "tiny device you hold in your hand"?
How about architecture type?
If you see this thread:
	https://lore.kernel.org/r/f19c18fd-20b3-b694-5448-7d899966a868@roeck-us.net
it looks like ChromeOS devices have now disabled this change, and there was a long discussion about possible issues and solutions.
Can you try the patch set referenced in that thread to see if that resolves the issue for you or not? Given that I have not seen any reports of this being an issue in over a year, odds are it has been resolved already with some change that we probably also need to backport to 4.14.y.
So any help in identifying that change would be appreciated.
[YF] Thanks for the context. I have not yet found a clear patch that seems to solve this issue.
[YF] For the time being, reverting the offending commit seems the safest solution for 4.14.y.
What about for the 4.19.y kernel tree? Why is this limited to just 4.14.y?
Can you send a patch that reverts this from 4.14 that explains why it should be removed?
thanks,
greg k-h