Linux-stable-mirror January 2024

linux-stable-mirror@lists.linaro.org

377 participants
1022 discussions

[PATCH v4 0/4] x86/cacheinfo: Set the number of leaves per CPU

by Ricardo Neri

Hi, The interface /sys/devices/system/cpu/cpuX/cache is broken (not populated) if CPUs have different numbers of subleaves in CPUID 4. This is the case of Intel Meteor Lake. This is v4 of a patchset to fix the cache sysfs interface by setting the number of cache leaves independently for each CPU. v1, v2, and v3 can be found here[1], here[2], and here[3]. All the tests described in [4] passed. Changes since v3: * Fixed another NULL-pointer dereference when checking the validity of the last-level cache info. * Added the Reviewed-by tags from Radu and Sudeep. Thanks! * Rebased on v6.7-rc5. Changes since v2: * This version uncovered a NULL-pointer dereference in recent changes to cacheinfo[5]. This dereference is observed when the system does not configure cacheinfo early during boot nor makes corrections later during CPU hotplug; as is the case in x86. Patch 1 fixes this issue. Changes since v1: * Dave Hansen suggested to use the existing per-CPU ci_cpu_cacheinfo variable. Now the global variable num_cache_leaves became useless. * While here, I noticed that init_cache_level() also became useless: x86 does not need ci_cpu_cacheinfo::num_levels. Thanks and BR, Ricardo [1]. https://lore.kernel.org/lkml/20230314231658.30169-1-ricardo.neri-calderon@l… [2]. https://lore.kernel.org/all/20230424001956.21434-1-ricardo.neri-calderon@li… [3]. https://lore.kernel.org/lkml/20230805012421.7002-1-ricardo.neri-calderon@li… [4]. https://lore.kernel.org/lkml/20230912032350.GA17008@ranerica-svr.sc.intel.c… [5]. https://lore.kernel.org/all/20230412185759.755408-1-rrendec@redhat.com/ Ricardo Neri (4): cacheinfo: Check for null last-level cache info cacheinfo: Allocate memory for memory if not done from the primary CPU x86/cacheinfo: Delete global num_cache_leaves x86/cacheinfo: Clean out init_cache_level() arch/x86/kernel/cpu/cacheinfo.c | 49 +++++++++++++++++---------------- drivers/base/cacheinfo.c | 9 +++++- 2 files changed, 34 insertions(+), 24 deletions(-) -- 2.25.1

1 year, 11 months

[PATCH v2 1/2] EDAC/device_sysfs: Fix calling kobject_put() with ->state_initialized unset

by Harshit Mogalapalli

In edac_device_register_sysfs_main_kobj(), when dev_root is NULL, kobject_init_and_add() is not called. if (err) { // err = -ENODEV edac_dbg(1, "Failed to register '.../edac/%s'\n", edac_dev->name); goto err_kobj_reg; // This calls kobj_put() } This will cause a runtime warning in kobject_put() if the above happens. Warning: "kobject: '%s' (%p): is not initialized, yet kobject_put() is being called." Fix the error handling to avoid the above possible situation. Cc: <stable(a)vger.kernel.org> Fixes: cb4a0bec0bb9 ("EDAC/sysfs: move to use bus_get_dev_root()") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com> --- This is based on static analysis and only compile tested. v1->v2: Resend as a patchset as they are two similar bugs. --- drivers/edac/edac_device_sysfs.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/edac/edac_device_sysfs.c b/drivers/edac/edac_device_sysfs.c index 010c26be5846..4cac14cbdb60 100644 --- a/drivers/edac/edac_device_sysfs.c +++ b/drivers/edac/edac_device_sysfs.c @@ -253,11 +253,13 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev) /* register */ dev_root = bus_get_dev_root(edac_subsys); - if (dev_root) { - err = kobject_init_and_add(&edac_dev->kobj, &ktype_device_ctrl, - &dev_root->kobj, "%s", edac_dev->name); - put_device(dev_root); - } + if (!dev_root) + goto module_put; + + err = kobject_init_and_add(&edac_dev->kobj, &ktype_device_ctrl, + &dev_root->kobj, "%s", edac_dev->name); + put_device(dev_root); + if (err) { edac_dbg(1, "Failed to register '.../edac/%s'\n", edac_dev->name); @@ -276,8 +278,8 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev) /* Error exit stack */ err_kobj_reg: kobject_put(&edac_dev->kobj); +module_put: module_put(edac_dev->owner); - err_out: return err; } -- 2.42.0

1 year, 11 months

[PATCH 1/2] net: stmmac: do not clear TBS enable bit on link up/down

by Esben Haabendal

With the dma conf being reallocated on each call to stmmac_open(), any information in there is lost, unless we specifically handle it. The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc therefore would stop working when link was set down and then back up. Fixes: ba39b344e924 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open") Cc: stable(a)vger.kernel.org Signed-off-by: Esben Haabendal <esben(a)geanix.com> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b334eb16da23..25519952f754 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -3932,6 +3932,9 @@ static int __stmmac_open(struct net_device *dev, priv->rx_copybreak = STMMAC_RX_COPYBREAK; buf_sz = dma_conf->dma_buf_sz; + for (int i = 0; i < MTL_MAX_TX_QUEUES; i++) + if (priv->dma_conf.tx_queue[i].tbs & STMMAC_TBS_EN) + dma_conf->tx_queue[i].tbs = priv->dma_conf.tx_queue[i].tbs; memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf)); stmmac_reset_queues_param(priv); -- 2.43.0

1 year, 11 months

Re: [PATCH] btrfs: fix infinite directory reads

by Eugeniu Rosca

Hello Greg, Hello Filipe, On Sun, Aug 13, 2023 at 12:34:08PM +0100, fdmanana(a)kernel.org wrote: > From: Filipe Manana <fdmanana(a)suse.com> > > The readdir implementation currently processes always up to the last index > it finds. This however can result in an infinite loop if the directory has > a large number of entries such that they won't all fit in the given buffer > passed to the readdir callback, that is, dir_emit() returns a non-zero > value. Because in that case readdir() will be called again and if in the > meanwhile new directory entries were added and we still can't put all the > remaining entries in the buffer, we keep repeating this over and over. > > The following C program and test script reproduce the problem: This crucial fix successfully landed into vanilla v6.5 [1] and stable v6.4.12 [2], but unfortunately not into the older stable trees. Consequently, the fix is missing on the popular Ubuntu versions like 20.04 (KNL v5.15.x) and 22.04.3 (KNL v6.2.x). For that reason, people still experience infinite loops when building Linux on those systems. To overcome the issue, people fall back to workarounds [3-4]. The patch seems to apply cleanly to v6.2, but not to v5.15 (v5.15 backport attempt failed miserably). Is there a chance for: - Stable maintainers to accept the clean backport to v6.2? - BTRFS experts to suggest a conflict resolution for v5.15? [1] https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b378… [2] https:// git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5441532… [3] https:// android-review.googlesource.com/c/kernel/build/+/2708835 [4] https:// android-review.googlesource.com/c/kernel/build/+/2715296 Best Regards Eugeniu

1 year, 11 months

[PATCH 5.15] btrfs: fix infinite directory reads

by Qu Wenruo

From: Filipe Manana <fdmanana(a)suse.com> [ Upstream commit b4c639f699349880b7918b861e1bd360442ec450 ] The readdir implementation currently processes always up to the last index it finds. This however can result in an infinite loop if the directory has a large number of entries such that they won't all fit in the given buffer passed to the readdir callback, that is, dir_emit() returns a non-zero value. Because in that case readdir() will be called again and if in the meanwhile new directory entries were added and we still can't put all the remaining entries in the buffer, we keep repeating this over and over. The following C program and test script reproduce the problem: $ cat /mnt/readdir_prog.c #include <sys/types.h> #include <dirent.h> #include <stdio.h> int main(int argc, char *argv[]) { DIR *dir = opendir("."); struct dirent *dd; while ((dd = readdir(dir))) { printf("%s\n", dd->d_name); rename(dd->d_name, "TEMPFILE"); rename("TEMPFILE", dd->d_name); } closedir(dir); } $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV &> /dev/null #mkfs.xfs -f $DEV &> /dev/null #mkfs.ext4 -F $DEV &> /dev/null mount $DEV $MNT mkdir $MNT/testdir for ((i = 1; i <= 2000; i++)); do echo -n > $MNT/testdir/file_$i done cd $MNT/testdir /mnt/readdir_prog cd /mnt umount $MNT This behaviour is surprising to applications and it's unlike ext4, xfs, tmpfs, vfat and other filesystems, which always finish. In this case where new entries were added due to renames, some file names may be reported more than once, but this varies according to each filesystem - for example ext4 never reported the same file more than once while xfs reports the first 13 file names twice. So change our readdir implementation to track the last index number when opendir() is called and then make readdir() never process beyond that index number. This gives the same behaviour as ext4. Reported-by: Rob Landley <rob(a)landley.net> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@la… Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681 CC: stable(a)vger.kernel.org # 5.15 Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> [ Resolve a conflict due to member changes in 96d89923fa94 ] Signed-off-by: Qu Wenruo <wqu(a)suse.com> --- fs/btrfs/ctree.h | 1 + fs/btrfs/delayed-inode.c | 5 +- fs/btrfs/delayed-inode.h | 1 + fs/btrfs/inode.c | 131 +++++++++++++++++++++++---------------- 4 files changed, 84 insertions(+), 54 deletions(-) --- Initially I tried to backport all the needed dependency to v5.15, but it turns out to be a disaster, at least 3 large rework series are needed, and the deeper I dig, more dependency series came up. So this version is a manual backport, the conflicts are caused by two missing dependency: - 96d89923fa94 ("btrfs: store index number instead of key in struct btrfs_delayed_item") This changes btrfs_delayed_item::key to index, which saves some space. Thankfully the new @index member is just the same as @key.offset. - 3c32c7212f16 ("btrfs: use cached state when looking for delalloc ranges with lseek") This belongs to the series "[PATCH 0/9] btrfs: more optimizations for lseek and fiemap", thankfully the conflicting member is only utilized by the optimizations, we can go without that member. And if I dig deeper with every conflict, at least the following series are needed to be backported: - [PATCH 0/9] btrfs: more optimizations for lseek and fiemap - [PATCH v2 00/15] btrfs: some updates to delayed items and inode logging - [PATCH v2 00/16] btrfs: inode creation cleanups and fixes At this stage, a full backport with all those needed dependency looks impractical. Thus a manual backport is created, so far the fstests result looks fine and generic/736 (the reproducer inspired by the bug report) passes. diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1467bf439cb4..7905f178efa3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1361,6 +1361,7 @@ struct btrfs_drop_extents_args { struct btrfs_file_private { void *filldir_buf; + u64 last_index; }; diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index fd951aeaeac5..5a98c5da1225 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -1513,6 +1513,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode) } bool btrfs_readdir_get_delayed_items(struct inode *inode, + u64 last_index, struct list_head *ins_list, struct list_head *del_list) { @@ -1532,14 +1533,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode, mutex_lock(&delayed_node->mutex); item = __btrfs_first_delayed_insertion_item(delayed_node); - while (item) { + while (item && item->key.offset <= last_index) { refcount_inc(&item->refs); list_add_tail(&item->readdir_list, ins_list); item = __btrfs_next_delayed_item(item); } item = __btrfs_first_delayed_deletion_item(delayed_node); - while (item) { + while (item && item->key.offset <= last_index) { refcount_inc(&item->refs); list_add_tail(&item->readdir_list, del_list); item = __btrfs_next_delayed_item(item); diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h index b2412160c5bc..a9cfce856d2e 100644 --- a/fs/btrfs/delayed-inode.h +++ b/fs/btrfs/delayed-inode.h @@ -123,6 +123,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info); /* Used for readdir() */ bool btrfs_readdir_get_delayed_items(struct inode *inode, + u64 last_index, struct list_head *ins_list, struct list_head *del_list); void btrfs_readdir_put_delayed_items(struct inode *inode, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 95af29634e55..1df374ce829b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6121,6 +6121,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry, return d_splice_alias(inode, dentry); } +/* + * Find the highest existing sequence number in a directory and then set the + * in-memory index_cnt variable to the first free sequence number. + */ +static int btrfs_set_inode_index_count(struct btrfs_inode *inode) +{ + struct btrfs_root *root = inode->root; + struct btrfs_key key, found_key; + struct btrfs_path *path; + struct extent_buffer *leaf; + int ret; + + key.objectid = btrfs_ino(inode); + key.type = BTRFS_DIR_INDEX_KEY; + key.offset = (u64)-1; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); + if (ret < 0) + goto out; + /* FIXME: we should be able to handle this */ + if (ret == 0) + goto out; + ret = 0; + + if (path->slots[0] == 0) { + inode->index_cnt = BTRFS_DIR_START_INDEX; + goto out; + } + + path->slots[0]--; + + leaf = path->nodes[0]; + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); + + if (found_key.objectid != btrfs_ino(inode) || + found_key.type != BTRFS_DIR_INDEX_KEY) { + inode->index_cnt = BTRFS_DIR_START_INDEX; + goto out; + } + + inode->index_cnt = found_key.offset + 1; +out: + btrfs_free_path(path); + return ret; +} + +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index) +{ + if (dir->index_cnt == (u64)-1) { + int ret; + + ret = btrfs_inode_delayed_dir_index_count(dir); + if (ret) { + ret = btrfs_set_inode_index_count(dir); + if (ret) + return ret; + } + } + + *index = dir->index_cnt; + + return 0; +} + /* * All this infrastructure exists because dir_emit can fault, and we are holding * the tree lock when doing readdir. For now just allocate a buffer and copy @@ -6133,10 +6201,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry, static int btrfs_opendir(struct inode *inode, struct file *file) { struct btrfs_file_private *private; + u64 last_index; + int ret; + + ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index); + if (ret) + return ret; private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL); if (!private) return -ENOMEM; + private->last_index = last_index; private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL); if (!private->filldir_buf) { kfree(private); @@ -6205,7 +6280,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx) INIT_LIST_HEAD(&ins_list); INIT_LIST_HEAD(&del_list); - put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list); + put = btrfs_readdir_get_delayed_items(inode, private->last_index, + &ins_list, &del_list); again: key.type = BTRFS_DIR_INDEX_KEY; @@ -6238,6 +6314,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx) break; if (found_key.offset < ctx->pos) goto next; + if (found_key.offset > private->last_index) + break; if (btrfs_should_delete_dir_index(&del_list, found_key.offset)) goto next; di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); @@ -6371,57 +6449,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now, return dirty ? btrfs_dirty_inode(inode) : 0; } -/* - * find the highest existing sequence number in a directory - * and then set the in-memory index_cnt variable to reflect - * free sequence numbers - */ -static int btrfs_set_inode_index_count(struct btrfs_inode *inode) -{ - struct btrfs_root *root = inode->root; - struct btrfs_key key, found_key; - struct btrfs_path *path; - struct extent_buffer *leaf; - int ret; - - key.objectid = btrfs_ino(inode); - key.type = BTRFS_DIR_INDEX_KEY; - key.offset = (u64)-1; - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); - if (ret < 0) - goto out; - /* FIXME: we should be able to handle this */ - if (ret == 0) - goto out; - ret = 0; - - if (path->slots[0] == 0) { - inode->index_cnt = BTRFS_DIR_START_INDEX; - goto out; - } - - path->slots[0]--; - - leaf = path->nodes[0]; - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]); - - if (found_key.objectid != btrfs_ino(inode) || - found_key.type != BTRFS_DIR_INDEX_KEY) { - inode->index_cnt = BTRFS_DIR_START_INDEX; - goto out; - } - - inode->index_cnt = found_key.offset + 1; -out: - btrfs_free_path(path); - return ret; -} - /* * helper to find a free sequence number in a given directory. This current * code is very simple, later versions will do smarter things in the btree -- 2.43.0

1 year, 11 months

[tip: timers/urgent] tick/sched: Preserve number of idle sleeps across CPU hotplug events

by tip-bot2 for Tim Chen

The following commit has been merged into the timers/urgent branch of tip: Commit-ID: 9a574ea9069be30b835a3da772c039993c43369b Gitweb: https://git.kernel.org/tip/9a574ea9069be30b835a3da772c039993c43369b Author: Tim Chen <tim.c.chen(a)linux.intel.com> AuthorDate: Mon, 22 Jan 2024 15:35:34 -08:00 Committer: Thomas Gleixner <tglx(a)linutronix.de> CommitterDate: Thu, 25 Jan 2024 09:52:40 +01:00 tick/sched: Preserve number of idle sleeps across CPU hotplug events Commit 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") preserved total idle sleep time and iowait sleeptime across CPU hotplug events. Similar reasoning applies to the number of idle calls and idle sleeps to get the proper average of sleep time per idle invocation. Preserve those fields too. Fixes: 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") Signed-off-by: Tim Chen <tim.c.chen(a)linux.intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20240122233534.3094238-1-tim.c.chen@linux.intel.c… --- kernel/time/tick-sched.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index d250167..01fb50c 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1577,6 +1577,7 @@ void tick_cancel_sched_timer(int cpu) { struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); ktime_t idle_sleeptime, iowait_sleeptime; + unsigned long idle_calls, idle_sleeps; # ifdef CONFIG_HIGH_RES_TIMERS if (ts->sched_timer.base) @@ -1585,9 +1586,13 @@ void tick_cancel_sched_timer(int cpu) idle_sleeptime = ts->idle_sleeptime; iowait_sleeptime = ts->iowait_sleeptime; + idle_calls = ts->idle_calls; + idle_sleeps = ts->idle_sleeps; memset(ts, 0, sizeof(*ts)); ts->idle_sleeptime = idle_sleeptime; ts->iowait_sleeptime = iowait_sleeptime; + ts->idle_calls = idle_calls; + ts->idle_sleeps = idle_sleeps; } #endif

1 year, 11 months

[PATCH] wifi: cfg80211: fix wiphy delayed work queueing

by Johannes Berg

From: Johannes Berg <johannes.berg(a)intel.com> When a wiphy work is queued with timer, and then again without a delay, it's started immediately but *also* started again after the timer expires. This can lead, for example, to warnings in mac80211's offchannel code as reported by Jouni. Running the same work twice isn't expected, of course. Fix this by deleting the timer at this point, when queuing immediately due to delay=0. Reported-by: Jouni Malinen <j(a)w1.fi> Cc: stable(a)vger.kernel.org Fixes: a3ee4dc84c4e ("wifi: cfg80211: add a work abstraction with special semantics") Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- net/wireless/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/wireless/core.c b/net/wireless/core.c index 409d74c57ca0..3fb1b637352a 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -5,7 +5,7 @@ * Copyright 2006-2010 Johannes Berg <johannes(a)sipsolutions.net> * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright 2015-2017 Intel Deutschland GmbH - * Copyright (C) 2018-2023 Intel Corporation + * Copyright (C) 2018-2024 Intel Corporation */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -1661,6 +1661,7 @@ void wiphy_delayed_work_queue(struct wiphy *wiphy, unsigned long delay) { if (!delay) { + del_timer(&dwork->timer); wiphy_work_queue(wiphy, &dwork->work); return; } -- 2.43.0

1 year, 11 months

[tip: timers/urgent] clocksource: Skip watchdog check for large watchdog intervals

by tip-bot2 for Jiri Wiesner

The following commit has been merged into the timers/urgent branch of tip: Commit-ID: 644649553508b9bacf0fc7a5bdc4f9e0165576a5 Gitweb: https://git.kernel.org/tip/644649553508b9bacf0fc7a5bdc4f9e0165576a5 Author: Jiri Wiesner <jwiesner(a)suse.de> AuthorDate: Mon, 22 Jan 2024 18:23:50 +01:00 Committer: Thomas Gleixner <tglx(a)linutronix.de> CommitterDate: Thu, 25 Jan 2024 09:13:16 +01:00 clocksource: Skip watchdog check for large watchdog intervals There have been reports of the watchdog marking clocksources unstable on machines with 8 NUMA nodes: clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large: clocksource: 'hpet' wd_nsec: 14523447520 clocksource: 'tsc' cs_nsec: 14524115132 The measured clocksource skew - the absolute difference between cs_nsec and wd_nsec - was 668 microseconds: cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612 The kernel used 200 microseconds for the uncertainty_margin of both the clocksource and watchdog, resulting in a threshold of 400 microseconds (the md variable). Both the cs_nsec and the wd_nsec value indicate that the readout interval was circa 14.5 seconds. The observed behaviour is that watchdog checks failed for large readout intervals on 8 NUMA node machines. This indicates that the size of the skew was directly proportinal to the length of the readout interval on those machines. The measured clocksource skew, 668 microseconds, was evaluated against a threshold (the md variable) that is suited for readout intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second. The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") was to tighten the threshold for evaluating skew and set the lower bound for the uncertainty_margin of clocksources to twice WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to 125 microseconds to fit the limit of NTP, which is able to use a clocksource that suffers from up to 500 microseconds of skew per second. Both the TSC and the HPET use default uncertainty_margin. When the readout interval gets stretched the default uncertainty_margin is no longer a suitable lower bound for evaluating skew - it imposes a limit that is far stricter than the skew with which NTP can deal. The root causes of the skew being directly proportinal to the length of the readout interval are: * the inaccuracy of the shift/mult pairs of clocksources and the watchdog * the conversion to nanoseconds is imprecise for large readout intervals Prevent this by skipping the current watchdog check if the readout interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin (of the TSC and HPET) corresponds to a limit on clocksource skew of 250 ppm (microseconds of skew per second). To keep the limit imposed by NTP (500 microseconds of skew per second) for all possible readout intervals, the margins would have to be scaled so that the threshold value is proportional to the length of the actual readout interval. As for why the readout interval may get stretched: Since the watchdog is executed in softirq context the expiration of the watchdog timer can get severely delayed on account of a ksoftirqd thread not getting to run in a timely manner. Surely, a system with such belated softirq execution is not working well and the scheduling issue should be looked into but the clocksource watchdog should be able to deal with it accordingly. Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") Suggested-by: Feng Tang <feng.tang(a)intel.com> Signed-off-by: Jiri Wiesner <jwiesner(a)suse.de> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Tested-by: Paul E. McKenney <paulmck(a)kernel.org> Reviewed-by: Feng Tang <feng.tang(a)intel.com> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20240122172350.GA740@incl --- kernel/time/clocksource.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index c108ed8..3052b1f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -99,6 +99,7 @@ static u64 suspend_start; * Interval: 0.5sec. */ #define WATCHDOG_INTERVAL (HZ >> 1) +#define WATCHDOG_INTERVAL_MAX_NS ((2 * WATCHDOG_INTERVAL) * (NSEC_PER_SEC / HZ)) /* * Threshold: 0.0312s, when doubled: 0.0625s. @@ -134,6 +135,7 @@ static DECLARE_WORK(watchdog_work, clocksource_watchdog_work); static DEFINE_SPINLOCK(watchdog_lock); static int watchdog_running; static atomic_t watchdog_reset_pending; +static int64_t watchdog_max_interval; static inline void clocksource_watchdog_lock(unsigned long *flags) { @@ -399,8 +401,8 @@ static inline void clocksource_reset_watchdog(void) static void clocksource_watchdog(struct timer_list *unused) { u64 csnow, wdnow, cslast, wdlast, delta; + int64_t wd_nsec, cs_nsec, interval; int next_cpu, reset_pending; - int64_t wd_nsec, cs_nsec; struct clocksource *cs; enum wd_read_status read_ret; unsigned long extra_wait = 0; @@ -470,6 +472,27 @@ static void clocksource_watchdog(struct timer_list *unused) if (atomic_read(&watchdog_reset_pending)) continue; + /* + * The processing of timer softirqs can get delayed (usually + * on account of ksoftirqd not getting to run in a timely + * manner), which causes the watchdog interval to stretch. + * Skew detection may fail for longer watchdog intervals + * on account of fixed margins being used. + * Some clocksources, e.g. acpi_pm, cannot tolerate + * watchdog intervals longer than a few seconds. + */ + interval = max(cs_nsec, wd_nsec); + if (unlikely(interval > WATCHDOG_INTERVAL_MAX_NS)) { + if (system_state > SYSTEM_SCHEDULING && + interval > 2 * watchdog_max_interval) { + watchdog_max_interval = interval; + pr_warn("Long readout interval, skipping watchdog check: cs_nsec: %lld wd_nsec: %lld\n", + cs_nsec, wd_nsec); + } + watchdog_timer.expires = jiffies; + continue; + } + /* Check the deviation from the watchdog clocksource. */ md = cs->uncertainty_margin + watchdog->uncertainty_margin; if (abs(cs_nsec - wd_nsec) > md) {

1 year, 11 months

FAILED: patch "[PATCH] serial: Do not hold the port lock when setting rx-during-tx" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 07c30ea5861fb26a77dade8cdc787252f6122fb1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012204-tattered-oxidation-ac0c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: 07c30ea5861f ("serial: Do not hold the port lock when setting rx-during-tx GPIO") fe2017ba24f3 ("Merge 6.6-rc6 into tty-next") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 07c30ea5861fb26a77dade8cdc787252f6122fb1 Mon Sep 17 00:00:00 2001 From: Lino Sanfilippo <l.sanfilippo(a)kunbus.com> Date: Wed, 3 Jan 2024 07:18:12 +0100 Subject: [PATCH] serial: Do not hold the port lock when setting rx-during-tx GPIO Both the imx and stm32 driver set the rx-during-tx GPIO in rs485_config(). Since this function is called with the port lock held, this can be a problem in case that setting the GPIO line can sleep (e.g. if a GPIO expander is used which is connected via SPI or I2C). Avoid this issue by moving the GPIO setting outside of the port lock into the serial core and thus making it a generic feature. Also with commit c54d48543689 ("serial: stm32: Add support for rs485 RX_DURING_TX output GPIO") the SER_RS485_RX_DURING_TX flag is only set if a rx-during-tx GPIO is _not_ available, which is wrong. Fix this, too. Furthermore reset old GPIO settings in case that changing the RS485 configuration failed. Fixes: c54d48543689 ("serial: stm32: Add support for rs485 RX_DURING_TX output GPIO") Fixes: ca530cfa968c ("serial: imx: Add support for RS485 RX_DURING_TX output GPIO") Cc: Shawn Guo <shawnguo(a)kernel.org> Cc: Sascha Hauer <s.hauer(a)pengutronix.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Lino Sanfilippo <l.sanfilippo(a)kunbus.com> Link: https://lore.kernel.org/r/20240103061818.564-2-l.sanfilippo@kunbus.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index a0fcf18cbeac..8b7d9c5a7455 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -1948,10 +1948,6 @@ static int imx_uart_rs485_config(struct uart_port *port, struct ktermios *termio rs485conf->flags & SER_RS485_RX_DURING_TX) imx_uart_start_rx(port); - if (port->rs485_rx_during_tx_gpio) - gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, - !!(rs485conf->flags & SER_RS485_RX_DURING_TX)); - return 0; } diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index cc866da50b81..8d381c283cec 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -1407,6 +1407,16 @@ static void uart_set_rs485_termination(struct uart_port *port, !!(rs485->flags & SER_RS485_TERMINATE_BUS)); } +static void uart_set_rs485_rx_during_tx(struct uart_port *port, + const struct serial_rs485 *rs485) +{ + if (!(rs485->flags & SER_RS485_ENABLED)) + return; + + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, + !!(rs485->flags & SER_RS485_RX_DURING_TX)); +} + static int uart_rs485_config(struct uart_port *port) { struct serial_rs485 *rs485 = &port->rs485; @@ -1418,12 +1428,17 @@ static int uart_rs485_config(struct uart_port *port) uart_sanitize_serial_rs485(port, rs485); uart_set_rs485_termination(port, rs485); + uart_set_rs485_rx_during_tx(port, rs485); uart_port_lock_irqsave(port, &flags); ret = port->rs485_config(port, NULL, rs485); uart_port_unlock_irqrestore(port, flags); - if (ret) + if (ret) { memset(rs485, 0, sizeof(*rs485)); + /* unset GPIOs */ + gpiod_set_value_cansleep(port->rs485_term_gpio, 0); + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, 0); + } return ret; } @@ -1462,6 +1477,7 @@ static int uart_set_rs485_config(struct tty_struct *tty, struct uart_port *port, return ret; uart_sanitize_serial_rs485(port, &rs485); uart_set_rs485_termination(port, &rs485); + uart_set_rs485_rx_during_tx(port, &rs485); uart_port_lock_irqsave(port, &flags); ret = port->rs485_config(port, &tty->termios, &rs485); @@ -1473,8 +1489,14 @@ static int uart_set_rs485_config(struct tty_struct *tty, struct uart_port *port, port->ops->set_mctrl(port, port->mctrl); } uart_port_unlock_irqrestore(port, flags); - if (ret) + if (ret) { + /* restore old GPIO settings */ + gpiod_set_value_cansleep(port->rs485_term_gpio, + !!(port->rs485.flags & SER_RS485_TERMINATE_BUS)); + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, + !!(port->rs485.flags & SER_RS485_RX_DURING_TX)); return ret; + } if (copy_to_user(rs485_user, &port->rs485, sizeof(port->rs485))) return -EFAULT; diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c index 9781c143def2..794b77512740 100644 --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -226,12 +226,6 @@ static int stm32_usart_config_rs485(struct uart_port *port, struct ktermios *ter stm32_usart_clr_bits(port, ofs->cr1, BIT(cfg->uart_enable_bit)); - if (port->rs485_rx_during_tx_gpio) - gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, - !!(rs485conf->flags & SER_RS485_RX_DURING_TX)); - else - rs485conf->flags |= SER_RS485_RX_DURING_TX; - if (rs485conf->flags & SER_RS485_ENABLED) { cr1 = readl_relaxed(port->membase + ofs->cr1); cr3 = readl_relaxed(port->membase + ofs->cr3); @@ -256,6 +250,8 @@ static int stm32_usart_config_rs485(struct uart_port *port, struct ktermios *ter writel_relaxed(cr3, port->membase + ofs->cr3); writel_relaxed(cr1, port->membase + ofs->cr1); + + rs485conf->flags |= SER_RS485_RX_DURING_TX; } else { stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DEM | USART_CR3_DEP);

1 year, 11 months

FAILED: patch "[PATCH] serial: core: set missing supported flag for RX during TX" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

1 year, 11 months

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror January 2024