On Wed, Feb 23, 2022 at 8:50 AM Kees Cook keescook@chromium.org wrote:
On February 18, 2022 10:19:50 AM PST, Jann Horn jannh@google.com wrote:
pstore_dump() is *always* invoked in atomic context (nowadays in an RCU read-side critical section, before that under a spinlock). It doesn't make sense to try to use semaphores here.
Ah, very nice. Thanks for the analysis!
[...] -static bool pstore_cannot_wait(enum kmsg_dump_reason reason) +bool pstore_cannot_block_path(enum kmsg_dump_reason reason)
Why the rename,
That's one of the parts of commit ea84b580b955 that I included in the revert. "wait" in the name is not accurate, since "wait" in the kernel normally refers to scheduling away until some condition is fulfilled. (Though I guess "block" also isn't the best name either... idk.) The place where we might want to have different behavior depending on whether we're handling a kernel crash are spinlocks; during a kernel crash, we shouldn't deadlock on them, but otherwise, AFAIK it's fine to block on them.
extern, and EXPORT? This appears to still only have the same single caller?
Also part of the revert. I figured it might make sense to also revert that part because:
With this commit applied, the EFI code will always take the "nonblock" path for now, but that's kinda suboptimal; on some platforms the "blocking" path uses a semaphore, so we really can't take that, but on x86 it uses a spinlock, which we could block on if we're not oopsing. We could avoid needlessly losing non-crash dmesg dumps there; I don't know whether we care about that though.
So I figured that we might want to start adding new callers to this later on. But if you want, I'll remove that part of the revert and resend?
[...]
pr_err("dump skipped in %s path: may corrupt error record\n",
in_nmi() ? "NMI" : why);
return;
}
if (down_interruptible(&psinfo->buf_lock)) {
pr_err("could not grab semaphore?!\n");
if (pstore_cannot_block_path(reason)) {
if (!spin_trylock_irqsave(&psinfo->buf_lock, flags)) {
pr_err("dump skipped in %s path because of concurrent dump\n"
, in_nmi() ? "NMI" : why);
The pr_err had the comma following the format string moved,
Ah, whoops, that was also part of the revert, but I guess I should have left that part out...
and the note about corruption removed. Is that no longer accurate?
There should be no more corruption since commit 959217c84c27 ("pstore: Actually give up during locking failure") - if we're bailing out, we can't be causing corruption, I believe?