Hi there, hello,
Sometimes when I suspend (by closing the lid, less often - by pressing
Fn+F1 (sleep key combo)) or poweroff my laptop (both by pressing powerit
button and running "loginctl poweroff"), it goes in such a state when it
doesn't respond to opening/closing the lid, power button nor
Ctrl+Alt+Del, but, unlike in sleep mode, the fan is rotating and the
"awake status" LED is on. I checked /var/log/kern.log, but it didn't
report suspend at that moment at all: went straight from [UFW BLOCK] to
"Microcode updated" on force reboot (marked with an arrow):
Apr 13 10:40:32 bong kernel: asus_wmi: Unknown key code 0xcf
Apr 13 10:44:05 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:47:45 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:47:46 bong kernel: ICMPv6: NA: /*router*/ advertised our address /*ipv6*/ on wlan0!
Apr 13 10:47:48 bong last message buffered 2 times
-> Apr 13 10:49:11 bong kernel: [UFW BLOCK] IN=wlan0 OUT= MAC=/*confidential*/
Apr 13 10:52:34 bong kernel: microcode: microcode updated early to revision 0xf0, date = 2021-11-12
Apr 13 10:52:34 bong kernel: Linux version 6.1.23-bong+ (acid@bong) (gcc (Gentoo Hardened 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.39 p5) 2.39.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 11 15:21:57 EEST 2023
Apr 13 10:52:34 bong kernel: Command line: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\x5cinitramfs-6.1.23-bong+.img
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Apr 13 10:52:34 bong kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Normally it starts like this (taken from dmesg to sync with elogind messages)
[ 7835.869228] elogind-daemon[2033]: Lid closed.
[ 7835.872875] elogind-daemon[2033]: Suspending...
[ 7835.873955] elogind-daemon[2033]: Suspending system...
[ 7835.873970] PM: suspend entry (deep)
[ 7835.902814] Filesystems sync: 0.028 seconds
[ 7835.920362] Freezing user space processes
[ 7835.923030] Freezing user space processes completed (elapsed 0.002 seconds)
[ 7835.923046] OOM killer disabled.
[ 7835.923049] Freezing remaining freezable tasks
[ 7835.924445] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 7835.924624] printk: Suspending console(s) (use no_console_suspend to debug)
The issue appeared when I was using pf-kernel with genpatches and
updated from 6.1-pf2 to 6.1-pf3 (corresponding to vanilla versions 6.1.3
-> 6.1.6). I used that fork until 6.2-pf2, but since then (early March)
moved to vanilla sources and started following the 6.1.y branch when it
was declared LTS. And the issue was present on all of them.
The hang was last detected 3 days ago on 6.1.22 and today on 6.1.23.
I'd like to bisect it, but it could take ages for a couple of reasons:
1) I don't know exact patterns it follows. One of the scenarios I've
noticed was this one (sorry if too ridiculous):
- put the laptop on the nearby couch and simultaneously close
the lid; the loose charger jack might disconnect;
- lay the mouse upside down (so it doesn't wake up when I
reconnect the charger),
but it's not a 100% guarantee of the bug and, as I said earlier, the
laptop also misbehaves on shutdown.
2) The issue happens rarely, once in a few days (sometimes up to a week;
I haven't measured it precisely back then).
Hardware: https://tilde.cafe/u/acidbong/kernel/lspci (`lspci -vvnn`)
Config (latest vanilla): https://git.sr.ht/~acid-bong/kernel/tree/806e6639da610952798e1b5d8c0d700062…
Built with KCFLAGS="-march=native"
Isolated cmdline: root=/dev/genston/root ro loglevel=4 rd.lvm.vg=genston rd.luks.uuid=97d10669-2da1-452d-a372-887e420b2ad4 rd.luks.allow-discards pci=nomsi initrd=\initramfs-6.1.23-bong+.img
# regzbot introduced v6.1.3..v6.1.6
---
Regards,
~acidbong
As noted by Michal, the blkg_iostat_set's in the lockless list hold
reference to blkg's to protect against their removal. Those blkg's
hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which
is called when the blkcg reference count reaches 0. This circular
dependency will prevent blkcg and some blkgs from being freed after
they are made offline.
It is less a problem if the cgroup to be destroyed also has other
controllers like memory that will call cgroup_rstat_flush() which will
clean up the reference count. If block is the only controller that uses
rstat, these offline blkcg and blkgs may never be freed leaking more
and more memory over time.
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
- don't grab bio->bi_blkg reference when adding the stats into blkcg's
per-cpu stat list since all stats are guaranteed to be consumed before
releasing blkg instance, and grabbing blkg reference for stats was the
most fragile part of original patch
Based on Waiman's patch:
https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.…
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Cc: stable(a)vger.kernel.org
Reported-by: Jay Shin <jaeshin(a)redhat.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: mkoutny(a)suse.com
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
V3:
- add one global blkg_stat_lock for avoiding concurrent update on
blkg stat; this way is easier for backport, also won't cause contention;
V2:
- remove kernel/cgroup change, and call blkcg_rstat_flush()
to flush stat directly
block/blk-cgroup.c | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0ce64dd73cfe..f0b5c9c41cde 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -34,6 +34,8 @@
#include "blk-ioprio.h"
#include "blk-throttle.h"
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu);
+
/*
* blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
* blkcg_pol_register_mutex nests outside of it and synchronizes entire
@@ -56,6 +58,8 @@ static LIST_HEAD(all_blkcgs); /* protected by blkcg_pol_mutex */
bool blkcg_debug_stats = false;
+static DEFINE_RAW_SPINLOCK(blkg_stat_lock);
+
#define BLKG_DESTROY_BATCH_SIZE 64
/*
@@ -163,10 +167,20 @@ static void blkg_free(struct blkcg_gq *blkg)
static void __blkg_release(struct rcu_head *rcu)
{
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
+ struct blkcg *blkcg = blkg->blkcg;
+ int cpu;
#ifdef CONFIG_BLK_CGROUP_PUNT_BIO
WARN_ON(!bio_list_empty(&blkg->async_bios));
#endif
+ /*
+ * Flush all the non-empty percpu lockless lists before releasing
+ * us, given these stat belongs to us.
+ *
+ * blkg_stat_lock is for serializing blkg stat update
+ */
+ for_each_possible_cpu(cpu)
+ __blkcg_rstat_flush(blkcg, cpu);
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
@@ -951,23 +965,26 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
}
-static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
{
- struct blkcg *blkcg = css_to_blkcg(css);
struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
struct llist_node *lnode;
struct blkg_iostat_set *bisc, *next_bisc;
- /* Root-level stats are sourced from system-wide IO stats */
- if (!cgroup_parent(css->cgroup))
- return;
-
rcu_read_lock();
lnode = llist_del_all(lhead);
if (!lnode)
goto out;
+ /*
+ * For covering concurrent parent blkg update from blkg_release().
+ *
+ * When flushing from cgroup, cgroup_rstat_lock is always held, so
+ * this lock won't cause contention most of time.
+ */
+ raw_spin_lock(&blkg_stat_lock);
+
/*
* Iterate only the iostat_cpu's queued in the lockless list.
*/
@@ -991,13 +1008,19 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
if (parent && parent->parent)
blkcg_iostat_update(parent, &blkg->iostat.cur,
&blkg->iostat.last);
- percpu_ref_put(&blkg->refcnt);
}
-
+ raw_spin_unlock(&blkg_stat_lock);
out:
rcu_read_unlock();
}
+static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+{
+ /* Root-level stats are sourced from system-wide IO stats */
+ if (cgroup_parent(css->cgroup))
+ __blkcg_rstat_flush(css_to_blkcg(css), cpu);
+}
+
/*
* We source root cgroup stats from the system-wide stats to avoid
* tracking the same information twice and incurring overhead when no
@@ -2075,7 +2098,6 @@ void blk_cgroup_bio_start(struct bio *bio)
llist_add(&bis->lnode, lhead);
WRITE_ONCE(bis->lqueued, true);
- percpu_ref_get(&bis->blkg->refcnt);
}
u64_stats_update_end_irqrestore(&bis->sync, flags);
--
2.40.1
I'm proposing to address the most obvious issues with dpt_i2o on stable
branches. At this stage it may be better to remove it as has been done
upstream, but I'd rather limit the regression for anyone still using
the hardware.
The changes are:
- "scsi: dpt_i2o: Remove broken pass-through ioctl (I2OUSERCMD)",
which closes security flaws including CVE-2023-2007.
- "scsi: dpt_i2o: Do not process completions with invalid addresses",
which removes the remaining bus_to_virt() call and may slightly
improve handling of misbehaving hardware.
These changes have been compiled on all the relevant stable branches,
but I don't have hardware to test on.
Ben.
--
Ben Hutchings - Debian developer, member of kernel, installer and LTS
teams