The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5596d9e8b553dacb0ac34bcf873cbbfb16c3ba3e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071559-reptilian-chaffing-a991@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5596d9e8b553 ("mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()")
bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
51718e25c53f ("mm: convert arch_clear_hugepage_flags to take a folio")
831bc31a5e82 ("mm: hugetlb: improve the handling of hugetlb allocation failure for freed or in-use hugetlb")
ebc20dcac4ce ("mm: hugetlb_vmemmap: convert page to folio")
c5ad3233ead5 ("hugetlb_vmemmap: use folio argument for hugetlb_vmemmap_* functions")
c24f188b2289 ("hugetlb: batch TLB flushes when restoring vmemmap")
f13b83fdd996 ("hugetlb: batch TLB flushes when freeing vmemmap")
f4b7e3efaddb ("hugetlb: batch PMD split for bulk vmemmap dedup")
91f386bf0772 ("hugetlb: batch freeing of vmemmap pages")
cfb8c75099db ("hugetlb: perform vmemmap restoration on a list of pages")
79359d6d24df ("hugetlb: perform vmemmap optimization on a list of pages")
d67e32f26713 ("hugetlb: restructure pool allocations")
d2cf88c27f51 ("hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles")
30a89adf872d ("hugetlb: check for hugetlb folio before vmemmap_restore")
d5b43e9683ec ("hugetlb: convert remove_pool_huge_page() to remove_pool_hugetlb_folio()")
04bbfd844b99 ("hugetlb: remove a few calls to page_folio()")
fde1c4ecf916 ("mm: hugetlb: skip initialization of gigantic tail struct pages if freed by HVO")
3ee0aa9f0675 ("mm: move some shrinker-related function declarations to mm/internal.h")
d8f5f7e445f0 ("hugetlb: set hugetlb page flag before optimizing vmemmap")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5596d9e8b553dacb0ac34bcf873cbbfb16c3ba3e Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Mon, 8 Jul 2024 10:51:27 +0800
Subject: [PATCH] mm/hugetlb: fix potential race in
__update_and_free_hugetlb_folio()
There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():
CPU1 CPU2
__update_and_free_hugetlb_folio try_memory_failure_hugetlb
folio_test_hugetlb
-- It's still hugetlb folio.
folio_clear_hugetlb_hwpoison
spin_lock_irq(&hugetlb_lock);
__get_huge_page_for_hwpoison
folio_set_hugetlb_hwpoison
spin_unlock_irq(&hugetlb_lock);
spin_lock_irq(&hugetlb_lock);
__folio_clear_hugetlb(folio);
-- Hugetlb flag is cleared but too late.
spin_unlock_irq(&hugetlb_lock);
When the above race occurs, raw error page info will be leaked. Even
worse, raw error pages won't have hwpoisoned flag set and hit
pcplists/buddy. Fix this issue by deferring
folio_clear_hugetlb_hwpoison() until __folio_clear_hugetlb() is done. So
all raw error pages will have hwpoisoned flag set.
Link: https://lkml.kernel.org/r/20240708025127.107713-1-linmiaohe@huawei.com
Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2afb70171b76..fe44324d6383 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1725,13 +1725,6 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
return;
}
- /*
- * Move PageHWPoison flag from head page to the raw error pages,
- * which makes any healthy subpages reusable.
- */
- if (unlikely(folio_test_hwpoison(folio)))
- folio_clear_hugetlb_hwpoison(folio);
-
/*
* If vmemmap pages were allocated above, then we need to clear the
* hugetlb flag under the hugetlb lock.
@@ -1742,6 +1735,13 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
spin_unlock_irq(&hugetlb_lock);
}
+ /*
+ * Move PageHWPoison flag from head page to the raw error pages,
+ * which makes any healthy subpages reusable.
+ */
+ if (unlikely(folio_test_hwpoison(folio)))
+ folio_clear_hugetlb_hwpoison(folio);
+
folio_ref_unfreeze(folio, 1);
/*
From: Francesco Dolcini <francesco.dolcini(a)toradex.com>
This reverts commit 3935fbc87ddebea5439f3ab6a78b1e83e976bf88.
CTRL_SLEEP_MOCI# is a signal that is defined for all the SoM
implementing the Verdin family specification, this signal is supposed to
control the power enable in the carrier board when the system is in deep
sleep mode. However this is not possible with Texas Instruments AM62
SoC, IOs output buffer is disabled in deep sleep and IOs are in
tri-state mode.
Given that we cannot properly control this pin, force it to be always
high to minimize potential issues.
Fixes: 3935fbc87dde ("arm64: dts: ti: k3-am62-verdin-dahlia: support sleep-moci")
Cc: <stable(a)vger.kernel.org>
Link: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1…
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
---
.../boot/dts/ti/k3-am62-verdin-dahlia.dtsi | 22 -------------------
arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi | 6 -----
2 files changed, 28 deletions(-)
diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
index e8f4d136e5df..9202181fbd65 100644
--- a/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
@@ -43,15 +43,6 @@ simple-audio-card,cpu {
sound-dai = <&mcasp0>;
};
};
-
- reg_usb_hub: regulator-usb-hub {
- compatible = "regulator-fixed";
- enable-active-high;
- /* Verdin CTRL_SLEEP_MOCI# (SODIMM 256) */
- gpio = <&main_gpio0 31 GPIO_ACTIVE_HIGH>;
- regulator-boot-on;
- regulator-name = "HUB_PWR_EN";
- };
};
/* Verdin ETHs */
@@ -193,11 +184,6 @@ &ospi0 {
status = "okay";
};
-/* Do not force CTRL_SLEEP_MOCI# always enabled */
-®_force_sleep_moci {
- status = "disabled";
-};
-
/* Verdin SD_1 */
&sdhci1 {
status = "okay";
@@ -218,15 +204,7 @@ &usbss1 {
};
&usb1 {
- #address-cells = <1>;
- #size-cells = <0>;
status = "okay";
-
- usb-hub@1 {
- compatible = "usb424,2744";
- reg = <1>;
- vdd-supply = <®_usb_hub>;
- };
};
/* Verdin CTRL_WAKE1_MICO# */
diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
index 359f53f3e019..5bef31b8577b 100644
--- a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
@@ -138,12 +138,6 @@ reg_1v8_eth: regulator-1v8-eth {
vin-supply = <®_1v8>;
};
- /*
- * By default we enable CTRL_SLEEP_MOCI#, this is required to have
- * peripherals on the carrier board powered.
- * If more granularity or power saving is required this can be disabled
- * in the carrier board device tree files.
- */
reg_force_sleep_moci: regulator-force-sleep-moci {
compatible = "regulator-fixed";
enable-active-high;
--
2.39.2
uevent_show() wants to de-reference dev->driver->name. There is no clean
way for a device attribute to de-reference dev->driver unless that
attribute is defined via (struct device_driver).dev_groups. Instead, the
anti-pattern of taking the device_lock() in the attribute handler risks
deadlocks with code paths that remove device attributes while holding
the lock.
This deadlock is typically invisible to lockdep given the device_lock()
is marked lockdep_set_novalidate_class(), but some subsystems allocate a
local lockdep key for @dev->mutex to reveal reports of the form:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7+ #275 Tainted: G OE N
------------------------------------------------------
modprobe/2374 is trying to acquire lock:
ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
but task is already holding lock:
ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&cxl_root_key){+.+.}-{3:3}:
__mutex_lock+0x99/0xc30
uevent_show+0xac/0x130
dev_attr_show+0x18/0x40
sysfs_kf_seq_show+0xac/0xf0
seq_read_iter+0x110/0x450
vfs_read+0x25b/0x340
ksys_read+0x67/0xf0
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (kn->active#6){++++}-{0:0}:
__lock_acquire+0x121a/0x1fa0
lock_acquire+0xd6/0x2e0
kernfs_drain+0x1e9/0x200
__kernfs_remove+0xde/0x220
kernfs_remove_by_name_ns+0x5e/0xa0
device_del+0x168/0x410
device_unregister+0x13/0x60
devres_release_all+0xb8/0x110
device_unbind_cleanup+0xe/0x70
device_release_driver_internal+0x1c7/0x210
driver_detach+0x47/0x90
bus_remove_driver+0x6c/0xf0
cxl_acpi_exit+0xc/0x11 [cxl_acpi]
__do_sys_delete_module.isra.0+0x181/0x260
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The observation though is that driver objects are typically much longer
lived than device objects. It is reasonable to perform lockless
de-reference of a @driver pointer even if it is racing detach from a
device. Given the infrequency of driver unregistration, use
synchronize_rcu() in module_remove_driver() to close any potential
races. It is potentially overkill to suffer synchronize_rcu() just to
handle the rare module removal racing uevent_show() event.
Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
Reported-by: syzbot+4762dd74e32532cda5ff(a)syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.n… [1]
Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.n…
Cc: stable(a)vger.kernel.org
Cc: Ashish Sangwan <a.sangwan(a)samsung.com>
Cc: Namjae Jeon <namjae.jeon(a)samsung.com>
Cc: Dirk Behme <dirk.behme(a)de.bosch.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/base/core.c | 13 ++++++++-----
drivers/base/module.c | 4 ++++
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b4c0624b704..b5399262198a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
+#include <linux/rcupdate.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
#include <linux/string_helpers.h>
@@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
{
const struct device *dev = kobj_to_dev(kobj);
+ struct device_driver *driver;
int retval = 0;
/* add device node properties if present */
@@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
- if (dev->driver)
- add_uevent_var(env, "DRIVER=%s", dev->driver->name);
+ /* Synchronize with module_remove_driver() */
+ rcu_read_lock();
+ driver = READ_ONCE(dev->driver);
+ if (driver)
+ add_uevent_var(env, "DRIVER=%s", driver->name);
+ rcu_read_unlock();
/* Add common DT information about the device */
of_device_uevent(dev, env);
@@ -2739,11 +2745,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
if (!env)
return -ENOMEM;
- /* Synchronize with really_probe() */
- device_lock(dev);
/* let the kset specific function add its keys */
retval = kset->uevent_ops->uevent(&dev->kobj, env);
- device_unlock(dev);
if (retval)
goto out;
diff --git a/drivers/base/module.c b/drivers/base/module.c
index a1b55da07127..b0b79b9c189d 100644
--- a/drivers/base/module.c
+++ b/drivers/base/module.c
@@ -7,6 +7,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/rcupdate.h>
#include "base.h"
static char *make_driver_name(struct device_driver *drv)
@@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv)
if (!drv)
return;
+ /* Synchronize with dev_uevent() */
+ synchronize_rcu();
+
sysfs_remove_link(&drv->p->kobj, "module");
if (drv->owner)
While calculating the end addresses of main area and segment 0, u32
may be not enough to hold the result without the danger of int
overflow.
Just in case, play it safe and cast one of the operands to a
wider type (u64).
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: fd694733d523 ("f2fs: cover large section in sanity check of super")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
fs/f2fs/super.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 3959fd137cc9..4d8f38ca6fcd 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3356,9 +3356,9 @@ static inline bool sanity_check_area_boundary(struct f2fs_sb_info *sbi,
u32 segment_count = le32_to_cpu(raw_super->segment_count);
u32 log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
u64 main_end_blkaddr = main_blkaddr +
- (segment_count_main << log_blocks_per_seg);
+ ((u64)segment_count_main << log_blocks_per_seg);
u64 seg_end_blkaddr = segment0_blkaddr +
- (segment_count << log_blocks_per_seg);
+ ((u64)segment_count << log_blocks_per_seg);
if (segment0_blkaddr != cp_blkaddr) {
f2fs_info(sbi, "Mismatch start address, segment0(%u) cp_blkaddr(%u)",
The result of multiplication between values derived from functions
dir_buckets() and bucket_blocks() *could* technically reach
2^30 * 2^2 = 2^32.
While unlikely to happen, it is prudent to ensure that it will not
lead to integer overflow. Thus, use mul_u32_u32() as it's more
appropriate to mitigate the issue.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 3843154598a0 ("f2fs: introduce large directory support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
fs/f2fs/dir.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index cbd7a5e96a37..14900ca8a9ff 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -166,7 +166,8 @@ static unsigned long dir_block_index(unsigned int level,
unsigned long bidx = 0;
for (i = 0; i < level; i++)
- bidx += dir_buckets(i, dir_level) * bucket_blocks(i);
+ bidx += mul_u32_u32(dir_buckets(i, dir_level),
+ bucket_blocks(i));
bidx += idx * bucket_blocks(level);
return bidx;
}
When dealing with large extents and calculating file offsets by
summing up according extent offsets and lengths of unsigned int type,
one may encounter possible integer overflow if the values are
big enough.
Prevent this from happening by expanding one of the addends to
(pgoff_t) type.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: d323d005ac4a ("f2fs: support file defragment")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
As the patch covers several code fragments, one singular Fixes: tag
is hard to pinpoint. Hopefully, it's not critical at this stage.
fs/f2fs/extent_cache.c | 4 ++--
fs/f2fs/file.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index fd1fc06359ee..62ac440d9416 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -366,7 +366,7 @@ static unsigned int __free_extent_tree(struct f2fs_sb_info *sbi,
static void __drop_largest_extent(struct extent_tree *et,
pgoff_t fofs, unsigned int len)
{
- if (fofs < et->largest.fofs + et->largest.len &&
+ if (fofs < (pgoff_t)et->largest.fofs + et->largest.len &&
fofs + len > et->largest.fofs) {
et->largest.len = 0;
et->largest_updated = true;
@@ -456,7 +456,7 @@ static bool __lookup_extent_tree(struct inode *inode, pgoff_t pgofs,
if (type == EX_READ &&
et->largest.fofs <= pgofs &&
- et->largest.fofs + et->largest.len > pgofs) {
+ (pgoff_t)et->largest.fofs + et->largest.len > pgofs) {
*ei = et->largest;
ret = true;
stat_inc_largest_node_hit(sbi);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 168f08507004..c598cfe5e0ed 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2710,7 +2710,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi,
* block addresses are continuous.
*/
if (f2fs_lookup_read_extent_cache(inode, pg_start, &ei)) {
- if (ei.fofs + ei.len >= pg_end)
+ if ((pgoff_t)ei.fofs + ei.len >= pg_end)
goto out;
}
It should wait all existing dio write IOs before block removal,
otherwise, previous direct write IO may overwrite data in the
block which may be reused by other inode.
Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Yu <chao(a)kernel.org>
---
fs/f2fs/file.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 76a6043caf27..f2d0e0de775f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1056,6 +1056,13 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
return err;
}
+ /*
+ * wait for inflight dio, blocks should be removed after
+ * IO completion.
+ */
+ if (attr->ia_size < old_size)
+ inode_dio_wait(inode);
+
f2fs_down_write(&fi->i_gc_rwsem[WRITE]);
filemap_invalidate_lock(inode->i_mapping);
@@ -1892,6 +1899,12 @@ static long f2fs_fallocate(struct file *file, int mode,
if (ret)
goto out;
+ /*
+ * wait for inflight dio, blocks should be removed after IO
+ * completion.
+ */
+ inode_dio_wait(inode);
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
if (offset >= inode->i_size)
goto out;
--
2.40.1