Stable Team,
> Revert of the revert of "io_uring: wait potential ->release() on
> resurrect", which adds a helper so that resurrect does not race
> completion reinit. It had been removed because of a strange bug with
> no clear root cause or link to the patch.
>
> The approach was improved: instead of rcu_synchronize(), just use
> wait_for_completion(), because we're at 0 refs and it will happen very
> shortly. Specifically, use the non-interruptible version to ignore any
> pending signals that may have ended a prior interruptible wait.
>
> This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.
>
> Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
> ---
> fs/io_uring.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
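For context, the pattern being restored looks roughly like the sketch
below. It is purely illustrative (the struct and function names here are
made up), not the upstream io_uring code: the percpu_ref ->release()
callback signals a completion, and the resurrect path waits on it
uninterruptibly before re-initialising anything.

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/completion.h>
#include <linux/percpu-refcount.h>

struct rsrc_data {
	struct percpu_ref refs;
	struct completion done;
};

/* percpu_ref ->release(): runs once the last reference has been dropped */
static void rsrc_release(struct percpu_ref *ref)
{
	struct rsrc_data *data = container_of(ref, struct rsrc_data, refs);

	complete(&data->done);
}

static int rsrc_data_init(struct rsrc_data *data)
{
	init_completion(&data->done);
	return percpu_ref_init(&data->refs, rsrc_release, 0, GFP_KERNEL);
}

static void rsrc_quiesce_and_resurrect(struct rsrc_data *data)
{
	percpu_ref_kill(&data->refs);

	/*
	 * Refs are already at zero, so this returns almost immediately.
	 * The non-interruptible wait ignores any pending signals and, more
	 * importantly, guarantees ->release() has finished running before
	 * the completion is re-initialised below.
	 */
	wait_for_completion(&data->done);

	reinit_completion(&data->done);
	percpu_ref_resurrect(&data->refs);
}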
Please back-port this as far as it will apply.
Definitely through v5.10.y.
It solves a critical bug.
Subject: "io_uring: return back safer resurrect"
Upstream commit: f70865db5ff35f5ed0c7e9ef63e7cca3d4947f04
--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Daniel Dao has reported [1] a regression on workloads that may trigger
a lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems there are workloads which genuinely
do more stat updates than the batch value within a short amount of time.
Since the rstat flush can happen in performance-critical codepaths such
as page faults, such workloads can suffer greatly.
This patch fixes the regression by making the rstat flushing conditional
in the performance-critical codepaths. More specifically, the kernel now
relies on the asynchronous periodic rstat flusher to flush the stats, and
only allows rstat flushing from the performance-critical codepaths if the
periodic flusher is delayed by more than twice its normal time window.
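Condensed from the diff below, the new logic is:

#define FLUSH_TIME (2UL*HZ)	/* period of the deferrable flush worker */

static u64 flush_next_time;

static void __mem_cgroup_flush_stats(void)
{
	/* push the deadline two periods into the future, then really flush */
	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
	cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
}

/*
 * Hot paths such as workingset_refault() call this instead of the
 * unconditional mem_cgroup_flush_stats(): they only pay for a flush when
 * the periodic worker is running more than FLUSH_TIME late, which bounds
 * the staleness they can observe to roughly 2 * FLUSH_TIME = 4 seconds.
 */
void mem_cgroup_flush_stats_delayed(void)
{
	if (time_after64(jiffies_64, flush_next_time))
		mem_cgroup_flush_stats();	/* existing threshold-checked flush */
}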
Now the question: what are the side effects of this change? The worst
that can happen is that the refault codepath sees lruvec stats up to
4 seconds old, which may cause false (or missed) activations of the
refaulted page and thus under- or overestimate the workingset size.
That is not very concerning, though, as the kernel can already miss or
do false activations.
There are two more codepaths whose flushing behavior is not changed by
this patch, and we may need to revisit them in the future. One is the
writeback stats used by dirty throttling, and the second is the
deactivation heuristic in reclaim. For now we are keeping an eye on
them; if regressions are reported due to these codepaths, we will
reevaluate then.
Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndg… [1]
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Daniel Dao <dqminh(a)cloudflare.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/memcontrol.h | 5 +++++
mm/memcontrol.c | 12 +++++++++++-
mm/workingset.c | 2 +-
3 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a68dce3873fc..89b14729d59f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1012,6 +1012,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
}
void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_delayed(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
@@ -1455,6 +1456,10 @@ static inline void mem_cgroup_flush_stats(void)
{
}
+static inline void mem_cgroup_flush_stats_delayed(void)
+{
+}
+
static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f79bb3f25ce4..edfb337e6948 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -587,6 +587,9 @@ static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
static DEFINE_SPINLOCK(stats_flush_lock);
static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
+static u64 flush_next_time;
+
+#define FLUSH_TIME (2UL*HZ)
/*
* Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
@@ -637,6 +640,7 @@ static void __mem_cgroup_flush_stats(void)
if (!spin_trylock_irqsave(&stats_flush_lock, flag))
return;
+ flush_next_time = jiffies_64 + 2*FLUSH_TIME;
cgroup_rstat_flush_irqsafe(root_mem_cgroup->css.cgroup);
atomic_set(&stats_flush_threshold, 0);
spin_unlock_irqrestore(&stats_flush_lock, flag);
@@ -648,10 +652,16 @@ void mem_cgroup_flush_stats(void)
__mem_cgroup_flush_stats();
}
+void mem_cgroup_flush_stats_delayed(void)
+{
+ if (time_after64(jiffies_64, flush_next_time))
+ mem_cgroup_flush_stats();
+}
+
static void flush_memcg_stats_dwork(struct work_struct *w)
{
__mem_cgroup_flush_stats();
- queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ);
+ queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
}
/**
diff --git a/mm/workingset.c b/mm/workingset.c
index 8a3828acc0bf..592569a8974c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -355,7 +355,7 @@ void workingset_refault(struct folio *folio, void *shadow)
mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_delayed();
/*
* Compare the distance to the existing workingset size. We
* don't activate pages that couldn't stay resident even if
--
2.35.1.616.g0bdcbb4464-goog
Hi Rafael,
We (Fedora) have been receiving a whole bunch of bug reports about
laptops getting hot/toasty while suspended with kernels >= 5.16.10
and this seems to still happen with 5.17-rc7 too.
The following are all bugzilla.redhat.com bug numbers:
1750910 - Laptop failed to suspend and completely drained the battery
2050036 - Framework laptop: 5.16.5 breaks s2idle sleep
2053957 - Package c-states never go below C2
2056729 - No lid events when closing lid / laptop does not suspend
2057909 - Thinkpad X1C 9th in s2idle suspend still draining battery to zero over night , Ap
2059668 - HP Envy Laptop deadlocks on entering suspend power state when plugged in. Case ge
2059688 - Dell G15 5510 s2idle fails in 5.16.11 works in 5.16.10
And one of the bugs has also been mirrored at bugzilla.kernel.org by
the reporter:
bko215641 - Dell G15 5510 s2idle fails in 5.16.11 works in 5.16.10
The common denominator here (besides the kernel version) seems to
be that these are all Ice Lake or Tiger Lake systems (I did not check
that this applies to 100% of the bugs, but it does seem to be a pattern).
A similar Arch Linux report:
https://bbs.archlinux.org/viewtopic.php?id=274292&p=2
suggests that reverting
"ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE",
which was cherry-picked into 5.16.10, fixes things.
If you want, I can create Fedora kernel test RPMs of a recent
5.16.y with just that one commit reverted and ask users to
confirm whether that helps. Please let me know if doing that would
be useful.
Regards,
Hans
As pointed out by this bug report [1], buffered writes are now broken on
the S29GL064N. The reason is that the buffered write path was changed to
use chip_good() instead of chip_ready(). One way to solve the issue is to
partially revert that change and keep using chip_ready() for the
S29GL064N, as the path of least surprise.
[1] https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.…
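For reference, the two completion checks differ roughly as follows
(paraphrased from drivers/mtd/chips/cfi_cmdset_0002.c; exact signatures
vary between kernel versions): chip_ready() only waits for the toggle bit
to stop toggling, whereas chip_good() additionally requires the programmed
data to read back correctly, which appears to be what trips up the
S29GL064N.

#include <linux/mtd/map.h>

static int chip_ready(struct map_info *map, unsigned long addr)
{
	map_word d, t;

	/* toggle bit: two identical consecutive reads mean the chip is idle */
	d = map_read(map, addr);
	t = map_read(map, addr);

	return map_word_equal(map, d, t);
}

static int chip_good(struct map_info *map, unsigned long addr, map_word expected)
{
	map_word oldd, curd;

	oldd = map_read(map, addr);
	curd = map_read(map, addr);

	/* idle *and* the data just programmed reads back correctly */
	return map_word_equal(map, oldd, curd) &&
	       map_word_equal(map, curd, expected);
}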
Fixes: dfeae1073583 ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t(a)gmail.com>
Tested-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de>
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Vignesh Raghavendra <vigneshr(a)ti.com>
Cc: linux-mtd(a)lists.infradead.org
Cc: stable(a)vger.kernel.org
Tokunori Ikegami (3):
mtd: cfi_cmdset_0002: Add S29GL064N ID definition
mtd: cfi_cmdset_0002: Move and rename
chip_check/chip_ready/chip_good_for_write
mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N
drivers/mtd/chips/cfi_cmdset_0002.c | 89 +++++++++++++++--------------
1 file changed, 47 insertions(+), 42 deletions(-)
--
2.32.0
We might have RISC-V systems (such as QEMU) where VMID is not part of
the TLB entry tag, so these systems will have to flush all TLB entries
upon any change in hgatp.VMID.

Currently, we zero out the hgatp CSR in kvm_arch_vcpu_put() and
re-program it in kvm_arch_vcpu_load(). On the systems described above,
this flushes all TLB entries whenever a VCPU exits to user space, hence
reducing performance.

This patch fixes the above-described performance issue by not clearing
the hgatp CSR in kvm_arch_vcpu_put().
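The underlying idea is simply to avoid changing hgatp.VMID when nothing
needs to change. A purely hypothetical illustration (set_hgatp() and
cached_hgatp are made up for this sketch, not kernel code):

#include <asm/csr.h>

/* would be per-CPU state in real code */
static unsigned long cached_hgatp;

static void set_hgatp(unsigned long val)
{
	/*
	 * On implementations that do not tag TLB entries with hgatp.VMID,
	 * any change of hgatp.VMID implies a full guest TLB flush, so skip
	 * writes that would not change anything.
	 */
	if (val == cached_hgatp)
		return;

	csr_write(CSR_HGATP, val);
	cached_hgatp = val;
}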
Fixes: 34bde9d8b9e6 ("RISC-V: KVM: Implement VCPU world-switch")
Cc: stable(a)vger.kernel.org
Signed-off-by: Anup Patel <apatel(a)ventanamicro.com>
---
arch/riscv/kvm/vcpu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 624166004e36..6785aef4cbd4 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -653,8 +653,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
vcpu->arch.isa);
kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context);
- csr_write(CSR_HGATP, 0);
-
csr->vsstatus = csr_read(CSR_VSSTATUS);
csr->vsie = csr_read(CSR_VSIE);
csr->vstvec = csr_read(CSR_VSTVEC);
--
2.25.1