July 2024 - Linux-stable-mirror

[PATCH 1/2] soc: fsl: qbman: delete bogus device tree fixup in qbman_init_private_mem()

by Vladimir Oltean

This is effectively a revert of commit 6ea4c0fe4570 ("soc/fsl/qbman: Update device tree with reserved memory"). What that commit intended to do: Fix up the device tree that is passed to a subsequent kexec-loaded kernel, so that the reserved-memory nodes have the same base addresses as the currently running kernel. What that commit actually does: Fix up the running device tree, which has no effect whatsoever upon the device tree passed to the next kernel. I would have refrained from making this kind of non-bugfix change in stable kernels, but qbman_init_private_mem() grossly misrepresents what this function does, and for an actual upcoming bug fix, it needs to be refactored. There is no place for the bogus code afterwards, so it needs to go as part of that, sadly. Cc: <stable(a)vger.kernel.org> Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com> --- drivers/soc/fsl/qbman/dpaa_sys.c | 31 ------------------------------- 1 file changed, 31 deletions(-) diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c index e1d7b79cc450..b1cee145cbd7 100644 --- a/drivers/soc/fsl/qbman/dpaa_sys.c +++ b/drivers/soc/fsl/qbman/dpaa_sys.c @@ -39,8 +39,6 @@ int qbman_init_private_mem(struct device *dev, int idx, const char *compat, { struct device_node *mem_node; struct reserved_mem *rmem; - int err; - __be32 *res_array; mem_node = of_parse_phandle(dev->of_node, "memory-region", idx); if (!mem_node) { @@ -60,34 +58,5 @@ int qbman_init_private_mem(struct device *dev, int idx, const char *compat, *addr = rmem->base; *size = rmem->size; - /* - * Check if the reg property exists - if not insert the node - * so upon kexec() the same memory region address will be preserved. - * This is needed because QBMan HW does not allow the base address/ - * size to be modified once set. - */ - if (!of_property_present(mem_node, "reg")) { - struct property *prop; - - prop = devm_kzalloc(dev, sizeof(*prop), GFP_KERNEL); - if (!prop) - return -ENOMEM; - prop->value = res_array = devm_kzalloc(dev, sizeof(__be32) * 4, - GFP_KERNEL); - if (!prop->value) - return -ENOMEM; - res_array[0] = cpu_to_be32(upper_32_bits(*addr)); - res_array[1] = cpu_to_be32(lower_32_bits(*addr)); - res_array[2] = cpu_to_be32(upper_32_bits(*size)); - res_array[3] = cpu_to_be32(lower_32_bits(*size)); - prop->length = sizeof(__be32) * 4; - prop->name = devm_kstrdup(dev, "reg", GFP_KERNEL); - if (!prop->name) - return -ENOMEM; - err = of_add_property(mem_node, prop); - if (err) - return err; - } - return 0; } -- 2.34.1

11 months, 3 weeks

1
1
0 0

[PATCH v4] mm/numa_balancing: Teach mpol_to_str about the balancing mode

by Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Since balancing mode was added in bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"), it was possible to set this mode but it wouldn't be shown in /proc/<pid>/numa_maps since there was no support for it in the mpol_to_str() helper. Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it would be displayed as 'default' due a workaround introduced a few years earlier in 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps"). To tidy this up we implement two changes: Replace the MPOL_F_MORON check by pointer comparison against the preferred_node_policy array. By doing this we generalise the current special casing and replace the incorrect 'default' with the correct 'bind' for the mode. Secondly, we add a string representation and corresponding handling for the MPOL_F_NUMA_BALANCING flag. With the two changes together we start showing the balancing flag when it is set and therefore complete the fix. Representation format chosen is to separate multiple flags with vertical bars, following what existed long time ago in kernel 2.6.25. But as between then and now there wasn't a way to display multiple flags, this patch does not change the format in practice. Some /proc/<pid>/numa_maps output examples: 555559580000 bind=balancing:0-1,3 file=... 555585800000 bind=balancing|static:0,2 file=... 555635240000 prefer=relative:0 file= v2: * Fully fix by introducing MPOL_F_KERNEL. v3: * Abandoned the MPOL_F_KERNEL approach in favour of pointer comparisons. * Removed lookup generalisation for easier backporting. * Replaced commas as separator with vertical bars. * Added a few more words about the string format in the commit message. v4: * Use is_power_of_2. * Use ARRAY_SIZE and update recommended buffer size for two flags. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps") Cc: Huang Ying <ying.huang(a)intel.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Rik van Riel <riel(a)surriel.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: David Rientjes <rientjes(a)google.com> Cc: <stable(a)vger.kernel.org> # v5.12+ --- mm/mempolicy.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index aec756ae5637..a1bf9aa15c33 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -3293,8 +3293,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) * @pol: pointer to mempolicy to be formatted * * Convert @pol into a string. If @buffer is too short, truncate the string. - * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the - * longest flag, "relative", and to display at least a few node ids. + * Recommend a @maxlen of at least 51 for the longest mode, "weighted + * interleave", plus the longest flag flags, "relative|balancing", and to + * display at least a few node ids. */ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) { @@ -3303,7 +3304,10 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) unsigned short mode = MPOL_DEFAULT; unsigned short flags = 0; - if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) { + if (pol && + pol != &default_policy && + !(pol >= &preferred_node_policy[0] && + pol <= &preferred_node_policy[ARRAY_SIZE(preferred_node_policy) - 1])) { mode = pol->mode; flags = pol->flags; } @@ -3331,12 +3335,18 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) p += snprintf(p, buffer + maxlen - p, "="); /* - * Currently, the only defined flags are mutually exclusive + * Static and relative are mutually exclusive. */ if (flags & MPOL_F_STATIC_NODES) p += snprintf(p, buffer + maxlen - p, "static"); else if (flags & MPOL_F_RELATIVE_NODES) p += snprintf(p, buffer + maxlen - p, "relative"); + + if (flags & MPOL_F_NUMA_BALANCING) { + if (!is_power_of_2(flags & MPOL_MODE_FLAGS)) + p += snprintf(p, buffer + maxlen - p, "|"); + p += snprintf(p, buffer + maxlen - p, "balancing"); + } } if (!nodes_empty(nodes)) -- 2.44.0

11 months, 3 weeks

3
2
0 0

[PATCH 5.4] SUNRPC: Fix RPC client cleaned up the freed pipefs dentries

by Hagar Hemdan

From: felix <fuzhen5(a)huawei.com> [ Upstream commit bfca5fb4e97c46503ddfc582335917b0cc228264 ] RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir() workqueue,which takes care about pipefs superblock locking. In some special scenarios, when kernel frees the pipefs sb of the current client and immediately alloctes a new pipefs sb, rpc_remove_pipedir function would misjudge the existence of pipefs sb which is not the one it used to hold. As a result, the rpc_remove_pipedir would clean the released freed pipefs dentries. To fix this issue, rpc_remove_pipedir should check whether the current pipefs sb is consistent with the original pipefs sb. This error can be catched by KASAN: ========================================================= [ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200 [ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503 [ 250.500549] Workqueue: events rpc_free_client_work [ 250.501001] Call Trace: [ 250.502880] kasan_report+0xb6/0xf0 [ 250.503209] ? dget_parent+0x195/0x200 [ 250.503561] dget_parent+0x195/0x200 [ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10 [ 250.504384] rpc_rmdir_depopulate+0x1b/0x90 [ 250.504781] rpc_remove_client_dir+0xf5/0x150 [ 250.505195] rpc_free_client_work+0xe4/0x230 [ 250.505598] process_one_work+0x8ee/0x13b0 ... [ 22.039056] Allocated by task 244: [ 22.039390] kasan_save_stack+0x22/0x50 [ 22.039758] kasan_set_track+0x25/0x30 [ 22.040109] __kasan_slab_alloc+0x59/0x70 [ 22.040487] kmem_cache_alloc_lru+0xf0/0x240 [ 22.040889] __d_alloc+0x31/0x8e0 [ 22.041207] d_alloc+0x44/0x1f0 [ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140 [ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110 [ 22.042459] rpc_create_client_dir+0x34/0x150 [ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0 [ 22.043284] rpc_client_register+0x136/0x4e0 [ 22.043689] rpc_new_client+0x911/0x1020 [ 22.044057] rpc_create_xprt+0xcb/0x370 [ 22.044417] rpc_create+0x36b/0x6c0 ... [ 22.049524] Freed by task 0: [ 22.049803] kasan_save_stack+0x22/0x50 [ 22.050165] kasan_set_track+0x25/0x30 [ 22.050520] kasan_save_free_info+0x2b/0x50 [ 22.050921] __kasan_slab_free+0x10e/0x1a0 [ 22.051306] kmem_cache_free+0xa5/0x390 [ 22.051667] rcu_core+0x62c/0x1930 [ 22.051995] __do_softirq+0x165/0x52a [ 22.052347] [ 22.052503] Last potentially related work creation: [ 22.052952] kasan_save_stack+0x22/0x50 [ 22.053313] __kasan_record_aux_stack+0x8e/0xa0 [ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0 [ 22.054209] dentry_free+0xb2/0x140 [ 22.054540] __dentry_kill+0x3be/0x540 [ 22.054900] shrink_dentry_list+0x199/0x510 [ 22.055293] shrink_dcache_parent+0x190/0x240 [ 22.055703] do_one_tree+0x11/0x40 [ 22.056028] shrink_dcache_for_umount+0x61/0x140 [ 22.056461] generic_shutdown_super+0x70/0x590 [ 22.056879] kill_anon_super+0x3a/0x60 [ 22.057234] rpc_kill_sb+0x121/0x200 Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines") Signed-off-by: felix <fuzhen5(a)huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com> Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com> --- include/linux/sunrpc/clnt.h | 1 + net/sunrpc/clnt.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 18d4cf2d637b..db22a87f3b0a 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -73,6 +73,7 @@ struct rpc_clnt { #endif struct rpc_xprt_iter cl_xpi; const struct cred *cl_cred; + struct super_block *pipefs_sb; }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index dc3226edf22f..2e08876bf856 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -113,7 +113,8 @@ static void rpc_clnt_remove_pipedir(struct rpc_clnt *clnt) pipefs_sb = rpc_get_sb_net(net); if (pipefs_sb) { - __rpc_clnt_remove_pipedir(clnt); + if (pipefs_sb == clnt->pipefs_sb) + __rpc_clnt_remove_pipedir(clnt); rpc_put_sb_net(net); } } @@ -153,6 +154,8 @@ rpc_setup_pipedir(struct super_block *pipefs_sb, struct rpc_clnt *clnt) { struct dentry *dentry; + clnt->pipefs_sb = pipefs_sb; + if (clnt->cl_program->pipe_dir_name != NULL) { dentry = rpc_setup_pipedir_sb(pipefs_sb, clnt); if (IS_ERR(dentry)) -- 2.40.1

11 months, 3 weeks

1
0
0 0

[PATCH v2] ceph: force sending a cap update msg back to MDS for revoke op

by xiubli＠redhat.com

From: Xiubo Li <xiubli(a)redhat.com> If a client sends out a cap update dropping caps with the prior 'seq' just before an incoming cap revoke request, then the client may drop the revoke because it believes it's already released the requested capabilities. This causes the MDS to wait indefinitely for the client to respond to the revoke. It's therefore always a good idea to ack the cap revoke request with the bumped up 'seq'. Currently if the cap->issued equals to the newcaps the check_caps() will do nothing, we should force flush the caps. Cc: stable(a)vger.kernel.org Link: https://tracker.ceph.com/issues/61782 Signed-off-by: Xiubo Li <xiubli(a)redhat.com> --- V2: - Improved the patch to force send the cap update only when no caps being used. fs/ceph/caps.c | 33 ++++++++++++++++++++++----------- fs/ceph/super.h | 7 ++++--- 2 files changed, 26 insertions(+), 14 deletions(-) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 24c31f795938..b5473085a47b 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2024,6 +2024,8 @@ bool __ceph_should_report_size(struct ceph_inode_info *ci) * CHECK_CAPS_AUTHONLY - we should only check the auth cap * CHECK_CAPS_FLUSH - we should flush any dirty caps immediately, without * further delay. + * CHECK_CAPS_FLUSH_FORCE - we should flush any caps immediately, without + * further delay. */ void ceph_check_caps(struct ceph_inode_info *ci, int flags) { @@ -2105,7 +2107,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags) } doutc(cl, "%p %llx.%llx file_want %s used %s dirty %s " - "flushing %s issued %s revoking %s retain %s %s%s%s\n", + "flushing %s issued %s revoking %s retain %s %s%s%s%s\n", inode, ceph_vinop(inode), ceph_cap_string(file_wanted), ceph_cap_string(used), ceph_cap_string(ci->i_dirty_caps), ceph_cap_string(ci->i_flushing_caps), @@ -2113,7 +2115,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags) ceph_cap_string(retain), (flags & CHECK_CAPS_AUTHONLY) ? " AUTHONLY" : "", (flags & CHECK_CAPS_FLUSH) ? " FLUSH" : "", - (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : ""); + (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : "", + (flags & CHECK_CAPS_FLUSH_FORCE) ? " FLUSH_FORCE" : ""); /* * If we no longer need to hold onto old our caps, and we may @@ -2223,6 +2226,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags) goto ack; } + if (flags & CHECK_CAPS_FLUSH_FORCE) + goto ack; + /* things we might delay */ if ((cap->issued & ~retain) == 0) continue; /* nope, all good */ @@ -3518,6 +3524,8 @@ static void handle_cap_grant(struct inode *inode, bool queue_invalidate = false; bool deleted_inode = false; bool fill_inline = false; + bool revoke_wait = false; + int flags = 0; /* * If there is at least one crypto block then we'll trust @@ -3713,16 +3721,18 @@ static void handle_cap_grant(struct inode *inode, ceph_cap_string(cap->issued), ceph_cap_string(newcaps), ceph_cap_string(revoking)); if (S_ISREG(inode->i_mode) && - (revoking & used & CEPH_CAP_FILE_BUFFER)) + (revoking & used & CEPH_CAP_FILE_BUFFER)) { writeback = true; /* initiate writeback; will delay ack */ - else if (queue_invalidate && + revoke_wait = true; + } else if (queue_invalidate && revoking == CEPH_CAP_FILE_CACHE && - (newcaps & CEPH_CAP_FILE_LAZYIO) == 0) - ; /* do nothing yet, invalidation will be queued */ - else if (cap == ci->i_auth_cap) + (newcaps & CEPH_CAP_FILE_LAZYIO) == 0) { + revoke_wait = true; /* do nothing yet, invalidation will be queued */ + } else if (cap == ci->i_auth_cap) { check_caps = 1; /* check auth cap only */ - else + } else { check_caps = 2; /* check all caps */ + } /* If there is new caps, try to wake up the waiters */ if (~cap->issued & newcaps) wake = true; @@ -3749,8 +3759,9 @@ static void handle_cap_grant(struct inode *inode, BUG_ON(cap->issued & ~cap->implemented); /* don't let check_caps skip sending a response to MDS for revoke msgs */ - if (le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) { + if (!revoke_wait && le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) { cap->mds_wanted = 0; + flags |= CHECK_CAPS_FLUSH_FORCE; if (cap == ci->i_auth_cap) check_caps = 1; /* check auth cap only */ else @@ -3806,9 +3817,9 @@ static void handle_cap_grant(struct inode *inode, mutex_unlock(&session->s_mutex); if (check_caps == 1) - ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL); + ceph_check_caps(ci, flags | CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL); else if (check_caps == 2) - ceph_check_caps(ci, CHECK_CAPS_NOINVAL); + ceph_check_caps(ci, flags | CHECK_CAPS_NOINVAL); } /* diff --git a/fs/ceph/super.h b/fs/ceph/super.h index b0b368ed3018..831e8ec4d5da 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -200,9 +200,10 @@ struct ceph_cap { struct list_head caps_item; }; -#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */ -#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */ -#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */ +#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */ +#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */ +#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */ +#define CHECK_CAPS_FLUSH_FORCE 8 /* force flush any caps */ struct ceph_cap_flush { u64 tid; -- 2.45.1

11 months, 3 weeks

1
0
0 0

[PATCH] net: ks8851: Fix potential TX stall after interface reopen

by Ronald Wahl

From: Ronald Wahl <ronald.wahl(a)raritan.com> The amount of TX space in the hardware buffer is tracked in the tx_space variable. The initial value is currently only set during driver probing. After closing the interface and reopening it the tx_space variable has the last value it had before close. If it is smaller than the size of the first send packet after reopeing the interface the queue will be stopped. The queue is woken up after receiving a TX interrupt but this will never happen since we did not send anything. This commit moves the initialization of the tx_space variable to the ks8851_net_open function right before starting the TX queue. Also query the value from the hardware instead of using a hard coded value. Only the SPI chip variant is affected by this issue because only this driver variant actually depends on the tx_space variable in the xmit function. Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun") Cc: "David S. Miller" <davem(a)davemloft.net> Cc: Eric Dumazet <edumazet(a)google.com> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: Paolo Abeni <pabeni(a)redhat.com> Cc: Simon Horman <horms(a)kernel.org> Cc: netdev(a)vger.kernel.org Cc: stable(a)vger.kernel.org # 5.10+ Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com> --- drivers/net/ethernet/micrel/ks8851_common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 6453c92f0fa7..03a554df6e7a 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -482,6 +482,7 @@ static int ks8851_net_open(struct net_device *dev) ks8851_wrreg16(ks, KS_IER, ks->rc_ier); ks->queued_len = 0; + ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR); netif_start_queue(ks->netdev); netif_dbg(ks, ifup, ks->netdev, "network device up\n"); @@ -1101,7 +1102,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev, int ret; ks->netdev = netdev; - ks->tx_space = 6144; ks->gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH); ret = PTR_ERR_OR_ZERO(ks->gpio); -- 2.45.2

11 months, 3 weeks

3
4
0 0

[PATCH v4 3/7] RISC-V: Check scalar unaligned access on all CPUs

by Jesse Taube

Originally, the check_unaligned_access_emulated_all_cpus function only checked the boot hart. This fixes the function to check all harts. Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe") Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> Cc: stable(a)vger.kernel.org --- V1 -> V2: - New patch V2 -> V3: - Split patch V3 -> V4: - Re-add check for a system where a heterogeneous CPU is hotplugged into a previously homogenous system. --- arch/riscv/kernel/traps_misaligned.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c index b62d5a2f4541..1a1bb41472ea 100644 --- a/arch/riscv/kernel/traps_misaligned.c +++ b/arch/riscv/kernel/traps_misaligned.c @@ -526,11 +526,11 @@ int handle_misaligned_store(struct pt_regs *regs) return 0; } -static bool check_unaligned_access_emulated(int cpu) +static void check_unaligned_access_emulated(struct work_struct *unused) { + int cpu = smp_processor_id(); long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu); unsigned long tmp_var, tmp_val; - bool misaligned_emu_detected; *mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN; @@ -538,19 +538,16 @@ static bool check_unaligned_access_emulated(int cpu) " "REG_L" %[tmp], 1(%[ptr])\n" : [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory"); - misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED); /* * If unaligned_ctl is already set, this means that we detected that all * CPUS uses emulated misaligned access at boot time. If that changed * when hotplugging the new cpu, this is something we don't handle. */ - if (unlikely(unaligned_ctl && !misaligned_emu_detected)) { + if (unlikely(unaligned_ctl && (*mas_ptr != RISCV_HWPROBE_MISALIGNED_EMULATED))) { pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n"); while (true) cpu_relax(); } - - return misaligned_emu_detected; } bool check_unaligned_access_emulated_all_cpus(void) @@ -562,8 +559,11 @@ bool check_unaligned_access_emulated_all_cpus(void) * accesses emulated since tasks requesting such control can run on any * CPU. */ + schedule_on_each_cpu(check_unaligned_access_emulated); + for_each_online_cpu(cpu) - if (!check_unaligned_access_emulated(cpu)) + if (per_cpu(misaligned_access_speed, cpu) + != RISCV_HWPROBE_MISALIGNED_EMULATED) return false; unaligned_ctl = true; -- 2.45.2

11 months, 3 weeks

2
1
0 0

+ watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter.patch added to mm-nonmm-unstable branch

by Andrew Morton

The patch titled Subject: watchdog/perf: Properly initialize the turbo mode timestamp and rearm counter has been added to the -mm mm-nonmm-unstable branch. Its filename is watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-nonmm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Thomas Gleixner <tglx(a)linutronix.de> Subject: watchdog/perf: Properly initialize the turbo mode timestamp and rearm counter Date: Thu, 11 Jul 2024 22:25:21 +0200 For systems on which the performance counter can expire early due to turbo modes the watchdog handler has a safety net in place which validates that since the last watchdog event there has at least 4/5th of the watchdog period elapsed. This works reliably only after the first watchdog event because the per CPU variable which holds the timestamp of the last event is never initialized. So a first spurious event will validate against a timestamp of 0 which results in a delta which is likely to be way over the 4/5 threshold of the period. As this might happen before the first watchdog hrtimer event increments the watchdog counter, this can lead to false positives. Fix this by initializing the timestamp before enabling the hardware event. Reset the rearm counter as well, as that might be non zero after the watchdog was disabled and reenabled. Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes") Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: Arjan van de Ven <arjan(a)linux.intel.com> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/watchdog_perf.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) --- a/kernel/watchdog_perf.c~watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter +++ a/kernel/watchdog_perf.c @@ -75,11 +75,15 @@ static bool watchdog_check_timestamp(voi __this_cpu_write(last_timestamp, now); return true; } -#else -static inline bool watchdog_check_timestamp(void) + +static void watchdog_init_timestamp(void) { - return true; + __this_cpu_write(nmi_rearmed, 0); + __this_cpu_write(last_timestamp, ktime_get_mono_fast_ns()); } +#else +static inline bool watchdog_check_timestamp(void) { return true; } +static inline void watchdog_init_timestamp(void) { } #endif static struct perf_event_attr wd_hw_attr = { @@ -161,6 +165,7 @@ void watchdog_hardlockup_enable(unsigned if (!atomic_fetch_inc(&watchdog_cpus)) pr_info("Enabled. Permanently consumes one hw-PMU counter.\n"); + watchdog_init_timestamp(); perf_event_enable(this_cpu_read(watchdog_ev)); } _ Patches currently in -mm which might be from tglx(a)linutronix.de are watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter.patch

11 months, 3 weeks

1
0
0 0

[PATCH v3 3/8] RISC-V: Check scalar unaligned access on all CPUs

by Jesse Taube

Originally, the check_unaligned_access_emulated_all_cpus function only checked the boot hart. This fixes the function to check all harts. Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe") Signed-off-by: Jesse Taube <jesse(a)rivosinc.com> Cc: stable(a)vger.kernel.org --- V1 -> V2: - New patch V2 -> V3: - Split patch --- arch/riscv/kernel/traps_misaligned.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c index b62d5a2f4541..8fadbe00dd62 100644 --- a/arch/riscv/kernel/traps_misaligned.c +++ b/arch/riscv/kernel/traps_misaligned.c @@ -526,31 +526,17 @@ int handle_misaligned_store(struct pt_regs *regs) return 0; } -static bool check_unaligned_access_emulated(int cpu) +static void check_unaligned_access_emulated(struct work_struct *unused) { + int cpu = smp_processor_id(); long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu); unsigned long tmp_var, tmp_val; - bool misaligned_emu_detected; *mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN; __asm__ __volatile__ ( " "REG_L" %[tmp], 1(%[ptr])\n" : [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory"); - - misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED); - /* - * If unaligned_ctl is already set, this means that we detected that all - * CPUS uses emulated misaligned access at boot time. If that changed - * when hotplugging the new cpu, this is something we don't handle. - */ - if (unlikely(unaligned_ctl && !misaligned_emu_detected)) { - pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n"); - while (true) - cpu_relax(); - } - - return misaligned_emu_detected; } bool check_unaligned_access_emulated_all_cpus(void) @@ -562,8 +548,11 @@ bool check_unaligned_access_emulated_all_cpus(void) * accesses emulated since tasks requesting such control can run on any * CPU. */ + schedule_on_each_cpu(check_unaligned_access_emulated); + for_each_online_cpu(cpu) - if (!check_unaligned_access_emulated(cpu)) + if (per_cpu(misaligned_access_speed, cpu) + != RISCV_HWPROBE_MISALIGNED_EMULATED) return false; unaligned_ctl = true; -- 2.45.2

11 months, 3 weeks

2
2
0 0

[PATCH] binder: fix hang of unregistered readers

by Carlos Llamas

With the introduction of binder_available_for_proc_work_ilocked() in commit 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") a binder thread can only "wait_for_proc_work" after its thread->looper has been marked as BINDER_LOOPER_STATE_{ENTERED|REGISTERED}. This means an unregistered reader risks waiting indefinitely for work since it never gets added to the proc->waiting_threads. If there are no further references to its waitqueue either the task will hang. The same applies to readers using the (e)poll interface. I couldn't find the rationale behind this restriction. So this patch restores the previous behavior of allowing unregistered threads to "wait_for_proc_work". Note that an error message for this scenario, which had previously become unreachable, is now re-enabled. Fixes: 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") Cc: stable(a)vger.kernel.org Cc: Martijn Coenen <maco(a)google.com> Cc: Arve Hjønnevåg <arve(a)google.com> Signed-off-by: Carlos Llamas <cmllamas(a)google.com> --- drivers/android/binder.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index b21a7b246a0d..2d0a24a56508 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -570,9 +570,7 @@ static bool binder_has_work(struct binder_thread *thread, bool do_proc_work) static bool binder_available_for_proc_work_ilocked(struct binder_thread *thread) { return !thread->transaction_stack && - binder_worklist_empty_ilocked(&thread->todo) && - (thread->looper & (BINDER_LOOPER_STATE_ENTERED | - BINDER_LOOPER_STATE_REGISTERED)); + binder_worklist_empty_ilocked(&thread->todo); } static void binder_wakeup_poll_threads_ilocked(struct binder_proc *proc, -- 2.45.2.993.g49e7a77208-goog

11 months, 3 weeks

1
0
0 0

+ mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/huge_memory: avoid PMD-size page cache if needed has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Gavin Shan <gshan(a)redhat.com> Subject: mm/huge_memory: avoid PMD-size page cache if needed Date: Thu, 11 Jul 2024 20:48:40 +1000 Currently, xarray can't support arbitrary page cache size and the largest and supported page cache size is defined as MAX_PAGECACHE_ORDER in commit 099d90642a71 ("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray"). However, it's possible to have 512MB page cache in the huge memory collapsing path on ARM64 system whose base page size is 64KB. A warning is raised when the huge page cache is split as shown in the following example. [root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize KernelPageSize: 64 kB [root@dhcp-10-26-1-207 ~]# cat /tmp/test.c : int main(int argc, char **argv) { const char *filename = TEST_XFS_FILENAME; int fd = 0; void *buf = (void *)-1, *p; int pgsize = getpagesize(); int ret = 0; if (pgsize != 0x10000) { fprintf(stdout, "System with 64KB base page size is required!\n"); return -EPERM; } system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb"); system("echo 1 > /proc/sys/vm/drop_caches"); /* Open xfs or shmem file */ fd = open(filename, O_RDONLY); assert(fd > 0); /* Create VMA */ buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0); assert(buf != (void *)-1); fprintf(stdout, "mapped buffer at 0x%p\n", buf); /* Populate VMA */ ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE); assert(ret == 0); ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ); assert(ret == 0); /* Collapse VMA */ ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE); assert(ret == 0); ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE); if (ret) { fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno); goto out; } /* Split xarray. The file needs to reopened with write permission */ munmap(buf, TEST_MEM_SIZE); buf = (void *)-1; close(fd); fd = open(filename, O_RDWR); assert(fd > 0); fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, TEST_MEM_SIZE - pgsize, pgsize); out: if (buf != (void *)-1) munmap(buf, TEST_MEM_SIZE); if (fd > 0) close(fd); return ret; } [root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test [root@dhcp-10-26-1-207 ~]# /tmp/test ------------[ cut here ]------------ WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \ nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \ nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse \ xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net \ sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : xas_split_alloc+0xf8/0x128 lr : split_huge_page_to_list_to_order+0x1c4/0x780 sp : ffff8000ac32f660 x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0 x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000 x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8 x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000 Call trace: xas_split_alloc+0xf8/0x128 split_huge_page_to_list_to_order+0x1c4/0x780 truncate_inode_partial_folio+0xdc/0x160 truncate_inode_pages_range+0x1b4/0x4a8 truncate_pagecache_range+0x84/0xa0 xfs_flush_unmap_range+0x70/0x90 [xfs] xfs_file_fallocate+0xfc/0x4d8 [xfs] vfs_fallocate+0x124/0x2f0 ksys_fallocate+0x4c/0xa0 __arm64_sys_fallocate+0x24/0x38 invoke_syscall.constprop.0+0x7c/0xd8 do_el0_svc+0xb4/0xd0 el0_svc+0x44/0x1d8 el0t_64_sync_handler+0x134/0x150 el0t_64_sync+0x17c/0x180 Fix it by avoiding PMD-sized page cache in the huge memory collapsing path. After this patch is applied, the test program fails with error -EINVAL returned from __thp_vma_allowable_orders() and the madvise() system call to collapse the page caches. Link: https://lkml.kernel.org/r/20240711104840.200573-1-gshan@redhat.com Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Gavin Shan <gshan(a)redhat.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: William Kucharski <william.kucharski(a)oracle.com> Cc: <stable(a)vger.kernel.org> [5.17+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/huge_memory.c~mm-huge_memory-avoid-pmd-size-page-cache-if-needed +++ a/mm/huge_memory.c @@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders while (orders) { addr = vma->vm_end - (PAGE_SIZE << order); - if (thp_vma_suitable_order(vma, addr, order)) + if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) && + thp_vma_suitable_order(vma, addr, order)) break; order = next_order(&orders, order); } _ Patches currently in -mm which might be from gshan(a)redhat.com are mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch

11 months, 3 weeks

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror July 2024