From: Michal Hocko mhocko@suse.com
[ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]
The fill_page_cache_func() function allocates couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however keep retrying even when the previous attempt has failed.
This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function).
And we have seen this function cause workqueue lockups:
kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s! [...] kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146 kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5 kernel: in-flight: 1917:fill_page_cache_func kernel: pending: dbs_work_handler, free_work, kfree_rcu_monitor
Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desireable.
So let's make the fill_page_cache_func() function back off after allocation failure.
Cc: Uladzislau Rezki (Sony) urezki@gmail.com Cc: "Paul E. McKenney" paulmck@kernel.org Cc: Frederic Weisbecker frederic@kernel.org Cc: Neeraj Upadhyay quic_neeraju@quicinc.com Cc: Josh Triplett josh@joshtriplett.org Cc: Steven Rostedt rostedt@goodmis.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Lai Jiangshan jiangshanlai@gmail.com Cc: Joel Fernandes joel@joelfernandes.org Signed-off-by: Michal Hocko mhocko@suse.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b41009a283ca..b10d6bcea77d 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3393,15 +3393,16 @@ static void fill_page_cache_func(struct work_struct *work) bnode = (struct kvfree_rcu_bulk_data *) __get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
- if (bnode) { - raw_spin_lock_irqsave(&krcp->lock, flags); - pushed = put_cached_bnode(krcp, bnode); - raw_spin_unlock_irqrestore(&krcp->lock, flags); + if (!bnode) + break;
- if (!pushed) { - free_page((unsigned long) bnode); - break; - } + raw_spin_lock_irqsave(&krcp->lock, flags); + pushed = put_cached_bnode(krcp, bnode); + raw_spin_unlock_irqrestore(&krcp->lock, flags); + + if (!pushed) { + free_page((unsigned long) bnode); + break; } }
From: Zqiang qiang1.zhang@intel.com
[ Upstream commit fcd53c8a4dfa38bafb89efdd0b0f718f3a03f884 ]
Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y attempt to emit a warning when the synchronize_rcu_tasks_generic() function is called during early boot while the rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE. However the warnings is not actually be printed because the debug_lockdep_rcu_enabled() returns false, exactly because the rcu_scheduler_active variable is still equal to RCU_SCHEDULER_INACTIVE.
This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE() to force these warnings to actually be printed.
Signed-off-by: Zqiang qiang1.zhang@intel.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tasks.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index 14af29fe1377..8b51e6a5b386 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -171,7 +171,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp) { /* Complain if the scheduler has not started. */ - RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE, + WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE, "synchronize_rcu_tasks called too soon");
/* Wait for the grace period. */
From: Arvid Norlander lkml@vorpal.se
[ Upstream commit 574160b8548deff8b80b174f03201e94ab8431e2 ]
Toshiba Satellite Z830 needs the quirk video_disable_backlight_sysfs_if for proper backlight control after suspend/resume cycles.
Toshiba Portege Z830 is simply the same laptop rebranded for certain markets (I looked through the manual to other language sections to confirm this) and thus also needs this quirk.
Thanks to Hans de Goede for suggesting this fix.
Link: https://www.spinics.net/lists/platform-driver-x86/msg34394.html Suggested-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Arvid Norlander lkml@vorpal.se Reviewed-by: Hans de Goede hdegoede@redhat.com Tested-by: Arvid Norlander lkml@vorpal.se Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/acpi_video.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index eb04b2f828ee..cf6c9ffe04a2 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -498,6 +498,22 @@ static const struct dmi_system_id video_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE R830"), }, }, + { + .callback = video_disable_backlight_sysfs_if, + .ident = "Toshiba Satellite Z830", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "SATELLITE Z830"), + }, + }, + { + .callback = video_disable_backlight_sysfs_if, + .ident = "Toshiba Portege Z830", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"), + DMI_MATCH(DMI_PRODUCT_NAME, "PORTEGE Z830"), + }, + }, /* * Some machine's _DOD IDs don't have bit 31(Device ID Scheme) set * but the IDs actually follow the Device ID Scheme.
From: Kees Cook keescook@chromium.org
[ Upstream commit 0dedcf6e3301836eb70cfa649052e7ce4fcd13ba ]
Clang is especially sensitive about argument type matching when using __overloaded functions (like memcmp(), etc). Help it see that function pointers are just "void *". Avoids this error:
arch/mips/bcm47xx/prom.c:89:8: error: no matching function for call to 'memcmp' if (!memcmp(prom_init, prom_init + mem, 32)) ^~~~~~ include/linux/string.h:156:12: note: candidate function not viable: no known conversion from 'void (void)' to 'const void *' for 1st argument extern int memcmp(const void *,const void *,__kernel_size_t);
Cc: Hauke Mehrtens hauke@hauke-m.de Cc: "Rafał Miłecki" zajec5@gmail.com Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: linux-mips@vger.kernel.org Cc: Nathan Chancellor nathan@kernel.org Cc: Nick Desaulniers ndesaulniers@google.com Cc: llvm@lists.linux.dev Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/lkml/202209080652.sz2d68e5-lkp@intel.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/bcm47xx/prom.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/mips/bcm47xx/prom.c b/arch/mips/bcm47xx/prom.c index 3e2a8166377f..22509b5fab74 100644 --- a/arch/mips/bcm47xx/prom.c +++ b/arch/mips/bcm47xx/prom.c @@ -86,7 +86,7 @@ static __init void prom_init_mem(void) pr_debug("Assume 128MB RAM\n"); break; } - if (!memcmp(prom_init, prom_init + mem, 32)) + if (!memcmp((void *)prom_init, (void *)prom_init + mem, 32)) break; } lowmem = mem; @@ -163,7 +163,7 @@ void __init bcm47xx_prom_highmem_init(void)
off = EXTVBASE + __pa(off); for (extmem = 128 << 20; extmem < 512 << 20; extmem <<= 1) { - if (!memcmp(prom_init, (void *)(off + extmem), 16)) + if (!memcmp((void *)prom_init, (void *)(off + extmem), 16)) break; } extmem -= lowmem;
From: Chao Qin chao.qin@intel.com
[ Upstream commit 2d93540014387d1c73b9ccc4d7895320df66d01b ]
When value < time_unit, the parameter of ilog2() will be zero and the return value is -1. u64(-1) is too large for shift exponent and then will trigger shift-out-of-bounds:
shift exponent 18446744073709551615 is too large for 32-bit type 'int' Call Trace: rapl_compute_time_window_core rapl_write_data_raw set_time_window store_constraint_time_window_us
Signed-off-by: Chao Qin chao.qin@intel.com Acked-by: Zhang Rui rui.zhang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 70d6d52bc1e2..64a86990671a 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -938,6 +938,9 @@ static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value, y = value & 0x1f; value = (1 << y) * (4 + f) * rp->time_unit / 4; } else { + if (value < rp->time_unit) + return 0; + do_div(value, rp->time_unit); y = ilog2(value); f = div64_u64(4 * (value - (1 << y)), 1 << y);
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
[ Upstream commit 68b99e94a4a2db6ba9b31fe0485e057b9354a640 ]
When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG:
BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ...
Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning.
Use get_cpu() instead of smp_processor_id() to avoid this BUG.
Suggested-by: Chen Yu yu.c.chen@intel.com Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/intel/intel_powerclamp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c index b0eb5ece9243..14381f7587ff 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -532,8 +532,10 @@ static int start_power_clamp(void)
/* prefer BSP */ control_cpu = 0; - if (!cpu_online(control_cpu)) - control_cpu = smp_processor_id(); + if (!cpu_online(control_cpu)) { + control_cpu = get_cpu(); + put_cpu(); + }
clamping = true; schedule_delayed_work(&poll_pkg_cstate_work, 0);
From: Kees Cook keescook@chromium.org
[ Upstream commit 1b64daf413acd86c2c13f5443f6b4ef3690c8061 ]
The .data.rel.ro.local section has the same semantics as .data.rel.ro here, so include it in the .rodata section of the decompressor. Additionally since the .printk_index section isn't usable outside of the core kernel, discard it in the decompressor. Avoids these warnings:
arm-linux-gnueabi-ld: warning: orphan section `.data.rel.ro.local' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.data.rel.ro.local' arm-linux-gnueabi-ld: warning: orphan section `.printk_index' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.printk_index'
Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/linux-mm/202209080545.qMIVj7YM-lkp@intel.com Cc: Russell King linux@armlinux.org.uk Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/compressed/vmlinux.lds.S | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S index 1bcb68ac4b01..3fcb3e62dc56 100644 --- a/arch/arm/boot/compressed/vmlinux.lds.S +++ b/arch/arm/boot/compressed/vmlinux.lds.S @@ -23,6 +23,7 @@ SECTIONS *(.ARM.extab*) *(.note.*) *(.rel.*) + *(.printk_index) /* * Discard any r/w data - this produces a link error if we have any, * which is required for PIC decompression. Local data generates @@ -57,6 +58,7 @@ SECTIONS *(.rodata) *(.rodata.*) *(.data.rel.ro) + *(.data.rel.ro.*) } .piggydata : { *(.piggydata)
Hi!
From: Kees Cook keescook@chromium.org
[ Upstream commit 1b64daf413acd86c2c13f5443f6b4ef3690c8061 ]
The .data.rel.ro.local section has the same semantics as .data.rel.ro here, so include it in the .rodata section of the decompressor. Additionally since the .printk_index section isn't usable outside of the core kernel, discard it in the decompressor. Avoids these warnings:
arm-linux-gnueabi-ld: warning: orphan section `.data.rel.ro.local' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.data.rel.ro.local' arm-linux-gnueabi-ld: warning: orphan section `.printk_index' from `arch/arm/boot/compressed/fdt_rw.o' being placed in section `.printk_index'
There's no printk_index in 5.10. Perhaps this does not need to be backported?
Best regards, Pavel
+++ b/arch/arm/boot/compressed/vmlinux.lds.S @@ -23,6 +23,7 @@ SECTIONS *(.ARM.extab*) *(.note.*) *(.rel.*)
- *(.printk_index) /*
- Discard any r/w data - this produces a link error if we have any,
- which is required for PIC decompression. Local data generates
@@ -57,6 +58,7 @@ SECTIONS *(.rodata) *(.rodata.*) *(.data.rel.ro)
- *(.data.rel.ro.*) } .piggydata : { *(.piggydata)
-- 2.35.1
From: Kees Cook keescook@chromium.org
[ Upstream commit 3e1730842f142add55dc658929221521a9ea62b6 ]
Clang produces a false positive when building with CONFIG_FORTIFY_SOURCE=y and CONFIG_UBSAN_BOUNDS=y when operating on an array with a dynamic offset. Work around this by using a direct assignment of an empty instance. Avoids this warning:
../include/linux/fortify-string.h:309:4: warning: call to __write_overflow_field declared with 'warn ing' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wat tribute-warning] __write_overflow_field(p_size_field, size); ^
which was isolated to the memset() call in xen_load_idt().
Note that this looks very much like another bug that was worked around: https://github.com/ClangBuiltLinux/linux/issues/1592
Cc: Juergen Gross jgross@suse.com Cc: Boris Ostrovsky boris.ostrovsky@oracle.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Dave Hansen dave.hansen@linux.intel.com Cc: x86@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: xen-devel@lists.xenproject.org Reviewed-by: Boris Ostrovsky boris.ostrovsky@oracle.com Link: https://lore.kernel.org/lkml/41527d69-e8ab-3f86-ff37-6b298c01d5bc@oracle.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/xen/enlighten_pv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 804c65d2b95f..815030b7f6fa 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -768,6 +768,7 @@ static void xen_load_idt(const struct desc_ptr *desc) { static DEFINE_SPINLOCK(lock); static struct trap_info traps[257]; + static const struct trap_info zero = { }; unsigned out;
trace_xen_cpu_load_idt(desc); @@ -777,7 +778,7 @@ static void xen_load_idt(const struct desc_ptr *desc) memcpy(this_cpu_ptr(&idt_desc), desc, sizeof(idt_desc));
out = xen_convert_trap_info(desc, traps, false); - memset(&traps[out], 0, sizeof(traps[0])); + traps[out] = zero;
xen_mc_flush(); if (HYPERVISOR_set_trap_table(traps))
From: Anna Schumaker Anna.Schumaker@Netapp.com
[ Upstream commit 06981d560606ac48d61e5f4fff6738b925c93173 ]
This was discussed with Chuck as part of this patch set. Returning nfserr_resource was decided to not be the best error message here, and he suggested changing to nfserr_serverfault instead.
Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Link: https://lore.kernel.org/linux-nfs/20220907195259.926736-1-anna@kernel.org/T/... Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4xdr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 46f825cf53f4..cc605ee0b2fa 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3871,7 +3871,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr, if (resp->xdr.buf->page_len && test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) { WARN_ON_ONCE(1); - return nfserr_resource; + return nfserr_serverfault; } xdr_commit_encode(xdr);
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 019805fea91599b22dfa62ffb29c022f35abeb06 ]
Use-after-free occurred when the laundromat tried to free expired cpntf_state entry on the s2s_cp_stateids list after inter-server copy completed. The sc_cp_list that the expired copy state was inserted on was already freed.
When COPY completes, the Linux client normally sends LOCKU(lock_state x), FREE_STATEID(lock_state x) and CLOSE(open_state y) to the source server. The nfs4_put_stid call from nfsd4_free_stateid cleans up the copy state from the s2s_cp_stateids list before freeing the lock state's stid.
However, sometimes the CLOSE was sent before the FREE_STATEID request. When this happens, the nfsd4_close_open_stateid call from nfsd4_close frees all lock states on its st_locks list without cleaning up the copy state on the sc_cp_list list. When the time the FREE_STATEID arrives the server returns BAD_STATEID since the lock state was freed. This causes the use-after-free error to occur when the laundromat tries to free the expired cpntf_state.
This patch adds a call to nfs4_free_cpntf_statelist in nfsd4_close_open_stateid to clean up the copy state before calling free_ol_stateid_reaplist to free the lock state's stid on the reaplist.
Signed-off-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f1b503bec222..665d0eaeb8db 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -843,6 +843,7 @@ static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
static void nfs4_free_deleg(struct nfs4_stid *stid) { + WARN_ON(!list_empty(&stid->sc_cp_list)); kmem_cache_free(deleg_slab, stid); atomic_long_dec(&num_delegations); } @@ -1358,6 +1359,7 @@ static void nfs4_free_ol_stateid(struct nfs4_stid *stid) release_all_access(stp); if (stp->st_stateowner) nfs4_put_stateowner(stp->st_stateowner); + WARN_ON(!list_empty(&stid->sc_cp_list)); kmem_cache_free(stateid_slab, stid); }
@@ -6207,6 +6209,7 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s) struct nfs4_client *clp = s->st_stid.sc_client; bool unhashed; LIST_HEAD(reaplist); + struct nfs4_ol_stateid *stp;
spin_lock(&clp->cl_lock); unhashed = unhash_open_stateid(s, &reaplist); @@ -6215,6 +6218,8 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s) if (unhashed) put_ol_stateid_locked(s, &reaplist); spin_unlock(&clp->cl_lock); + list_for_each_entry(stp, &reaplist, st_locks) + nfs4_free_cpntf_statelist(clp->net, &stp->st_stid); free_ol_stateid_reaplist(&reaplist); } else { spin_unlock(&clp->cl_lock);
linux-stable-mirror@lists.linaro.org