The standard printk() tries to flush the message to the console immediately. It tries to take the console lock. If the lock is already taken then the current owner is responsible for flushing even the new message.
There is a small race window between checking whether a new message is available and releasing the console lock. It is solved by re-checking the state after releasing the console lock. If the check is positive then console_unlock() tries to take the lock again and process the new message as well.
The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that console_seq is not longer read atomically. As a result, the re-check might be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under the console lock. In the worst case, it will take the lock again only to realized that the new message has already been proceed. But it was possible even before.
The variable next_seq is marked as __maybe_unused to call down compiler warning when CONFIG_PRINTK is not defined.
Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock") Reported-by: kernel test robot lkp@intel.com # unused next_seq warning Cc: stable@vger.kernel.org # 5.13 Signed-off-by: Petr Mladek pmladek@suse.com Acked-by: Sergey Senozhatsky senozhatsky@chromium.org --- Changes in v2:
- fixed warning about unused next_seq variable reported by kernel test robot
kernel/printk/printk.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 142a58d124d9..6dad7da8f383 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2545,6 +2545,7 @@ void console_unlock(void) bool do_cond_resched, retry; struct printk_info info; struct printk_record r; + u64 __maybe_unused next_seq;
if (console_suspended) { up_console_sem(); @@ -2654,8 +2655,10 @@ void console_unlock(void) cond_resched(); }
- console_locked = 0; + /* Get consistent value of the next-to-be-used sequence number. */ + next_seq = console_seq;
+ console_locked = 0; up_console_sem();
/* @@ -2664,7 +2667,7 @@ void console_unlock(void) * there's a new owner and the console_unlock() from them will do the * flush, no worries. */ - retry = prb_read_valid(prb, console_seq, NULL); + retry = prb_read_valid(prb, next_seq, NULL); printk_safe_exit_irqrestore(flags);
if (retry && console_trylock())
On 2021-07-02, Petr Mladek pmladek@suse.com wrote:
The standard printk() tries to flush the message to the console immediately. It tries to take the console lock. If the lock is already taken then the current owner is responsible for flushing even the new message.
There is a small race window between checking whether a new message is available and releasing the console lock. It is solved by re-checking the state after releasing the console lock. If the check is positive then console_unlock() tries to take the lock again and process the new message as well.
The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that console_seq is not longer read atomically. As a result, the re-check might be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under the console lock. In the worst case, it will take the lock again only to realized that the new message has already been proceed. But it was possible even before.
The variable next_seq is marked as __maybe_unused to call down compiler warning when CONFIG_PRINTK is not defined.
As Sergey already pointed out, this patch is not fixing a real problem. An inconsistent value (or an increased consistent value) would mean that another printer is actively printing, and thus a retry is not necessary anyway. But this patch will avoid a KASAN message about an unmarked (although safe) data race.
Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock") Reported-by: kernel test robot lkp@intel.com # unused next_seq warning Cc: stable@vger.kernel.org # 5.13 Signed-off-by: Petr Mladek pmladek@suse.com Acked-by: Sergey Senozhatsky senozhatsky@chromium.org
Reviewed-by: John Ogness john.ogness@linutronix.de
On Sat 2021-07-03 08:32:02, John Ogness wrote:
On 2021-07-02, Petr Mladek pmladek@suse.com wrote:
The standard printk() tries to flush the message to the console immediately. It tries to take the console lock. If the lock is already taken then the current owner is responsible for flushing even the new message.
There is a small race window between checking whether a new message is available and releasing the console lock. It is solved by re-checking the state after releasing the console lock. If the check is positive then console_unlock() tries to take the lock again and process the new message as well.
The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that console_seq is not longer read atomically. As a result, the re-check might be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under the console lock. In the worst case, it will take the lock again only to realized that the new message has already been proceed. But it was possible even before.
The variable next_seq is marked as __maybe_unused to call down compiler warning when CONFIG_PRINTK is not defined.
As Sergey already pointed out, this patch is not fixing a real problem. An inconsistent value (or an increased consistent value) would mean that another printer is actively printing, and thus a retry is not necessary anyway.
Ah, I misunderstood that part. You are right. CPU_X might see wrong console_seq only when CPU_Y incremented console_seq. If CPU_X does not do retry because of racy console_seq. Then CPU_Y would do retry when yet another CPU added yet another new message in the meantime.
But this patch will avoid a KASAN message about an unmarked (although safe) data race.
Yup.
OK, I am going to queue the patch for-5.15. There is no need to rush it for-4.14.
Best Regards, Petr
On Wed 2021-07-07 14:26:03, Petr Mladek wrote:
On Sat 2021-07-03 08:32:02, John Ogness wrote:
On 2021-07-02, Petr Mladek pmladek@suse.com wrote:
The standard printk() tries to flush the message to the console immediately. It tries to take the console lock. If the lock is already taken then the current owner is responsible for flushing even the new message.
There is a small race window between checking whether a new message is available and releasing the console lock. It is solved by re-checking the state after releasing the console lock. If the check is positive then console_unlock() tries to take the lock again and process the new message as well.
The commit 996e966640ddea7b535c ("printk: remove logbuf_lock") causes that console_seq is not longer read atomically. As a result, the re-check might be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under the console lock. In the worst case, it will take the lock again only to realized that the new message has already been proceed. But it was possible even before.
The variable next_seq is marked as __maybe_unused to call down compiler warning when CONFIG_PRINTK is not defined.
As Sergey already pointed out, this patch is not fixing a real problem. An inconsistent value (or an increased consistent value) would mean that another printer is actively printing, and thus a retry is not necessary anyway.
Ah, I misunderstood that part. You are right. CPU_X might see wrong console_seq only when CPU_Y incremented console_seq. If CPU_X does not do retry because of racy console_seq. Then CPU_Y would do retry when yet another CPU added yet another new message in the meantime.
But this patch will avoid a KASAN message about an unmarked (although safe) data race.
Yup.
OK, I am going to queue the patch for-5.15. There is no need to rush it for-4.14.
The patch has been committed into printk/linux.git, branch rework/fixup-for-5.15.
Note that I am going to use topic branches rework/* for the printk rework from now on. It will allow to be more flexible with pushing big changes and fixes into linux-next and mainline.
The "rework/" prefix will still allow to differ printk rework-related changes from "unrelated" printk features and fixes.
As a result, "printk-rework" branch will not longer be merged into "for-next" or "for-linus" branches. But I am still going to merge "rework/*" branches there so that "printk-rework" branch shows the printk rework history. I think about renaming this branch to "rework/history" or "rework/HEAD".
Best Regards, Petr
linux-stable-mirror@lists.linaro.org