Since commit 02fb4f008433 ("clk: clk-loongson2: Fix potential buffer
overflow in flexible-array member access"), the clk provider register is
failed.
The count of `clks_num` is shown below:
for (p = data; p->name; p++)
clks_num++;
In fact, `clks_num` represents the number of SoC clocks and should be
expressed as the maximum value of the clock binding id in use (p->id + 1).
Now we fix it to avoid the following error when trying to register a clk
provider:
[ 13.409595] of_clk_hw_onecell_get: invalid index 17
Cc: stable(a)vger.kernel.org
Cc: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Fixes: 02fb4f008433 ("clk: clk-loongson2: Fix potential buffer overflow in flexible-array member access")
Signed-off-by: Binbin Zhou <zhoubinbin(a)loongson.cn>
---
V2:
- Add Gustavo A. R. Silva to cc list;
- Populate the onecell data with -ENOENT error pointers to avoid
returning NULL, for it is a valid clock.
Link to V1:
https://lore.kernel.org/all/20241225060600.3094154-1-zhoubinbin@loongson.cn/
drivers/clk/clk-loongson2.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/clk-loongson2.c b/drivers/clk/clk-loongson2.c
index 6bf51d5a49a1..9c240a2308f5 100644
--- a/drivers/clk/clk-loongson2.c
+++ b/drivers/clk/clk-loongson2.c
@@ -294,7 +294,7 @@ static int loongson2_clk_probe(struct platform_device *pdev)
return -EINVAL;
for (p = data; p->name; p++)
- clks_num++;
+ clks_num = max(clks_num, p->id + 1);
clp = devm_kzalloc(dev, struct_size(clp, clk_data.hws, clks_num),
GFP_KERNEL);
@@ -309,6 +309,9 @@ static int loongson2_clk_probe(struct platform_device *pdev)
clp->clk_data.num = clks_num;
clp->dev = dev;
+ /* Avoid returning NULL for unused id */
+ memset_p((void **)&clp->clk_data.hws, ERR_PTR(-ENOENT), clks_num);
+
for (i = 0; i < clks_num; i++) {
p = &data[i];
switch (p->type) {
--
2.43.5
Hi, I'm experiencing UBSAN array-index-out-of-bounds errors while using
my Framework 13" AMD laptop with its Mediatek MT7922 wifi adapter
(mt7921e).
It seems to happen only once on boot, and occurs with both kernel
versions 6.12.7 and 6.13-rc4, both compiled from vanilla upstream kernel
sources on Fedora 41 using the kernel.org LLVM toolchain (19.1.6).
I can try some other kernel series if necessary, and also a bisect if I
find a working version, but that may take me a while.
I wasn't sure if I should mark this as a regression, as I'm not sure
which/if there is a working kernel version at this point.
Thanks.
----
[ 17.754417] UBSAN: array-index-out-of-bounds in /data/linux/net/wireless/scan.c:766:2
[ 17.754423] index 0 is out of range for type 'struct ieee80211_channel *[] __counted_by(n_channels)' (aka 'struct ieee80211_channel *[]')
[ 17.754427] CPU: 13 UID: 0 PID: 620 Comm: kworker/u64:10 Tainted: G T 6.13.0-rc4 #9
[ 17.754433] Tainted: [T]=RANDSTRUCT
[ 17.754435] Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
[ 17.754438] Workqueue: events_unbound cfg80211_wiphy_work
[ 17.754446] Call Trace:
[ 17.754449] <TASK>
[ 17.754452] dump_stack_lvl+0x82/0xc0
[ 17.754459] __ubsan_handle_out_of_bounds+0xe7/0x110
[ 17.754464] ? srso_alias_return_thunk+0x5/0xfbef5
[ 17.754470] ? __kmalloc_noprof+0x1a7/0x280
[ 17.754477] cfg80211_scan_6ghz+0x3bb/0xfd0
[ 17.754482] ? srso_alias_return_thunk+0x5/0xfbef5
[ 17.754486] ? try_to_wake_up+0x368/0x4c0
[ 17.754491] ? try_to_wake_up+0x1a9/0x4c0
[ 17.754496] ___cfg80211_scan_done+0xa9/0x1e0
[ 17.754500] cfg80211_wiphy_work+0xb7/0xe0
[ 17.754504] process_scheduled_works+0x205/0x3a0
[ 17.754509] worker_thread+0x24a/0x300
[ 17.754514] ? __cfi_worker_thread+0x10/0x10
[ 17.754519] kthread+0x158/0x180
[ 17.754524] ? __cfi_kthread+0x10/0x10
[ 17.754528] ret_from_fork+0x40/0x50
[ 17.754534] ? __cfi_kthread+0x10/0x10
[ 17.754538] ret_from_fork_asm+0x11/0x30
[ 17.754544] </TASK>
On Tue, Jan 14, 2025 at 10:47:33AM +0100, Johan Hovold wrote:
> On Mon, Jan 13, 2025 at 06:00:34PM +0000, Qasim Ijaz wrote:
> > This patch addresses a null-ptr-deref in qt2_process_read_urb() due to
> > an incorrect bounds check in the following:
> >
> > if (newport > serial->num_ports) {
> > dev_err(&port->dev,
> > "%s - port change to invalid port: %i\n",
> > __func__, newport);
> > break;
> > }
> >
> > The condition doesn't account for the valid range of the serial->port
> > buffer, which is from 0 to serial->num_ports - 1. When newport is equal
> > to serial->num_ports, the assignment of "port" in the
> > following code is out-of-bounds and NULL:
> >
> > serial_priv->current_port = newport;
> > port = serial->port[serial_priv->current_port];
> >
> > The fix checks if newport is greater than or equal to serial->num_ports
> > indicating it is out-of-bounds.
> >
> > Reported-by: syzbot <syzbot+506479ebf12fe435d01a(a)syzkaller.appspotmail.com>
> > Closes: https://syzkaller.appspot.com/bug?extid=506479ebf12fe435d01a
> > Fixes: f7a33e608d9a ("USB: serial: add quatech2 usb to serial driver")
> > Cc: <stable(a)vger.kernel.org> # 3.5
> > Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
> > ---
>
> Thanks for the update. I've applied the patch now after adding Greg's
> Reviewed-by tag (for v2).
>
> For your future contributions, try to remember to include any
> Reviewed-by or Tested-by tags when updating the patch unless the changes
> are non-trivial.
>
> There should typically also be a short change log here under the ---
> line to indicate what changes from previous versions.
>
> It is also encouraged to write the commit message in imperative mood
> (add, change, fix) and to avoid the phrase "this patch". There are some
> more details in
>
> Documentation/process/submitting-patches.rst
>
> Something to keep in mind for the future, but this patch already looks
> really good.
>
> Johan
Hi Johan,
Thanks for reviewing and applying the patch. I appreciate the guidance on patch style and process, and I'll incorporate your suggestions in future submissions.
Best regards,
Qasim
Hi Carlos,
Please pull this branch with changes for xfs.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
--D
The following changes since commit 05290bd5c6236b8ad659157edb36bd2d38f46d3e:
xfs: allow inode-based btrees to reserve space in the data device (2024-12-23 13:06:03 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/realtime-rmap_2024-12-23
for you to fetch changes up to c2358439af374cad47f771797875d0beb8256738:
xfs: enable realtime rmap btree (2024-12-23 13:06:09 -0800)
----------------------------------------------------------------
xfs: realtime reverse-mapping support [v6.2 04/14]
This is the latest revision of a patchset that adds to XFS kernel
support for reverse mapping for the realtime device. This time around
I've fixed some of the bitrot that I've noticed over the past few
months, and most notably have converted rtrmapbt to use the metadata
inode directory feature instead of burning more space in the superblock.
At the beginning of the set are patches to implement storing B+tree
leaves in an inode root, since the realtime rmapbt is rooted in an
inode, unlike the regular rmapbt which is rooted in an AG block.
Prior to this, the only btree that could be rooted in the inode fork
was the block mapping btree; if all the extent records fit in the
inode, format would be switched from 'btree' to 'extents'.
The next few patches enhance the reverse mapping routines to handle
the parts that are specific to rtgroups -- adding the new btree type,
adding a new log intent item type, and wiring up the metadata directory
tree entries.
Finally, implement GETFSMAP with the rtrmapbt and scrub functionality
for the rtrmapbt and rtbitmap and online fsck functionality.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: "Darrick J. Wong" <djwong(a)kernel.org>
----------------------------------------------------------------
Darrick J. Wong (37):
xfs: add some rtgroup inode helpers
xfs: prepare rmap btree cursor tracepoints for realtime
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
xfs: introduce realtime rmap btree ondisk definitions
xfs: realtime rmap btree transaction reservations
xfs: add realtime rmap btree operations
xfs: prepare rmap functions to deal with rtrmapbt
xfs: add a realtime flag to the rmap update log redo items
xfs: support recovering rmap intent items targetting realtime extents
xfs: pretty print metadata file types in error messages
xfs: support file data forks containing metadata btrees
xfs: add realtime reverse map inode to metadata directory
xfs: add metadata reservations for realtime rmap btrees
xfs: wire up a new metafile type for the realtime rmap
xfs: wire up rmap map and unmap to the realtime rmapbt
xfs: create routine to allocate and initialize a realtime rmap btree inode
xfs: wire up getfsmap to the realtime reverse mapping btree
xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs
xfs: report realtime rmap btree corruption errors to the health system
xfs: allow queued realtime intents to drain before scrubbing
xfs: scrub the realtime rmapbt
xfs: cross-reference realtime bitmap to realtime rmapbt scrubber
xfs: cross-reference the realtime rmapbt
xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings
xfs: scrub the metadir path of rt rmap btree files
xfs: walk the rt reverse mapping tree when rebuilding rmap
xfs: online repair of realtime file bmaps
xfs: repair inodes that have realtime extents
xfs: repair rmap btree inodes
xfs: online repair of realtime bitmaps for a realtime group
xfs: support repairing metadata btrees rooted in metadir inodes
xfs: online repair of the realtime rmap btree
xfs: create a shadow rmap btree during realtime rmap repair
xfs: hook live realtime rmap operations during a repair operation
xfs: don't shut down the filesystem for media failures beyond end of log
xfs: react to fsdax failure notifications on the rt device
xfs: enable realtime rmap btree
fs/xfs/Makefile | 3 +
fs/xfs/libxfs/xfs_btree.c | 73 +++
fs/xfs/libxfs/xfs_btree.h | 8 +-
fs/xfs/libxfs/xfs_btree_mem.c | 1 +
fs/xfs/libxfs/xfs_btree_staging.c | 1 +
fs/xfs/libxfs/xfs_defer.h | 1 +
fs/xfs/libxfs/xfs_exchmaps.c | 4 +-
fs/xfs/libxfs/xfs_format.h | 28 +-
fs/xfs/libxfs/xfs_fs.h | 7 +-
fs/xfs/libxfs/xfs_health.h | 4 +-
fs/xfs/libxfs/xfs_inode_buf.c | 32 +-
fs/xfs/libxfs/xfs_inode_fork.c | 25 +
fs/xfs/libxfs/xfs_log_format.h | 6 +-
fs/xfs/libxfs/xfs_log_recover.h | 2 +
fs/xfs/libxfs/xfs_metafile.c | 18 +
fs/xfs/libxfs/xfs_metafile.h | 2 +
fs/xfs/libxfs/xfs_ondisk.h | 2 +
fs/xfs/libxfs/xfs_refcount.c | 6 +-
fs/xfs/libxfs/xfs_rmap.c | 171 +++++--
fs/xfs/libxfs/xfs_rmap.h | 12 +-
fs/xfs/libxfs/xfs_rtbitmap.c | 2 +-
fs/xfs/libxfs/xfs_rtbitmap.h | 9 +
fs/xfs/libxfs/xfs_rtgroup.c | 53 +-
fs/xfs/libxfs/xfs_rtgroup.h | 49 +-
fs/xfs/libxfs/xfs_rtrmap_btree.c | 1011 +++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rtrmap_btree.h | 210 ++++++++
fs/xfs/libxfs/xfs_sb.c | 6 +
fs/xfs/libxfs/xfs_shared.h | 14 +
fs/xfs/libxfs/xfs_trans_resv.c | 12 +-
fs/xfs/libxfs/xfs_trans_space.h | 13 +
fs/xfs/scrub/alloc_repair.c | 5 +-
fs/xfs/scrub/bmap.c | 108 +++-
fs/xfs/scrub/bmap_repair.c | 129 ++++-
fs/xfs/scrub/common.c | 160 +++++-
fs/xfs/scrub/common.h | 23 +-
fs/xfs/scrub/health.c | 1 +
fs/xfs/scrub/inode.c | 10 +-
fs/xfs/scrub/inode_repair.c | 136 ++++-
fs/xfs/scrub/metapath.c | 3 +
fs/xfs/scrub/newbt.c | 42 ++
fs/xfs/scrub/newbt.h | 1 +
fs/xfs/scrub/reap.c | 41 ++
fs/xfs/scrub/reap.h | 2 +
fs/xfs/scrub/repair.c | 191 +++++++
fs/xfs/scrub/repair.h | 17 +
fs/xfs/scrub/rgsuper.c | 6 +-
fs/xfs/scrub/rmap_repair.c | 84 ++-
fs/xfs/scrub/rtbitmap.c | 75 ++-
fs/xfs/scrub/rtbitmap.h | 55 ++
fs/xfs/scrub/rtbitmap_repair.c | 429 +++++++++++++++-
fs/xfs/scrub/rtrmap.c | 271 ++++++++++
fs/xfs/scrub/rtrmap_repair.c | 903 +++++++++++++++++++++++++++++++++
fs/xfs/scrub/rtsummary.c | 17 +-
fs/xfs/scrub/rtsummary_repair.c | 3 +-
fs/xfs/scrub/scrub.c | 11 +-
fs/xfs/scrub/scrub.h | 14 +
fs/xfs/scrub/stats.c | 1 +
fs/xfs/scrub/tempexch.h | 2 +-
fs/xfs/scrub/tempfile.c | 20 +-
fs/xfs/scrub/trace.c | 1 +
fs/xfs/scrub/trace.h | 228 ++++++++-
fs/xfs/xfs_buf.c | 1 +
fs/xfs/xfs_buf_item_recover.c | 4 +
fs/xfs/xfs_drain.c | 20 +-
fs/xfs/xfs_drain.h | 7 +-
fs/xfs/xfs_fsmap.c | 174 ++++++-
fs/xfs/xfs_fsops.c | 11 +
fs/xfs/xfs_health.c | 1 +
fs/xfs/xfs_inode.c | 19 +-
fs/xfs/xfs_inode_item.c | 2 +
fs/xfs/xfs_inode_item_recover.c | 44 +-
fs/xfs/xfs_log_recover.c | 2 +
fs/xfs/xfs_mount.c | 5 +-
fs/xfs/xfs_mount.h | 9 +
fs/xfs/xfs_notify_failure.c | 230 +++++----
fs/xfs/xfs_notify_failure.h | 11 +
fs/xfs/xfs_qm.c | 8 +-
fs/xfs/xfs_rmap_item.c | 216 +++++++-
fs/xfs/xfs_rtalloc.c | 82 ++-
fs/xfs/xfs_rtalloc.h | 10 +
fs/xfs/xfs_stats.c | 4 +-
fs/xfs/xfs_stats.h | 2 +
fs/xfs/xfs_super.c | 6 -
fs/xfs/xfs_super.h | 1 -
fs/xfs/xfs_trace.h | 104 ++--
85 files changed, 5381 insertions(+), 366 deletions(-)
create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.c
create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.h
create mode 100644 fs/xfs/scrub/rtrmap.c
create mode 100644 fs/xfs/scrub/rtrmap_repair.c
create mode 100644 fs/xfs/xfs_notify_failure.h
Hi Carlos,
Please pull this branch with changes for xfs.
As usual, I did a test-merge with the main upstream branch as of a few
minutes ago, and didn't see any conflicts. Please let me know if you
encounter any problems.
--D
The following changes since commit 4bbf9020becbfd8fc2c3da790855b7042fad455b:
Linux 6.13-rc4 (2024-12-22 13:22:21 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/xfs-6.13-fixes_2024-12-23
for you to fetch changes up to 1aacd3fac248902ea1f7607f2d12b93929a4833b:
xfs: release the dquot buf outside of qli_lock (2024-12-23 13:06:01 -0800)
----------------------------------------------------------------
xfs: bug fixes for 6.13 [01/14]
Bug fixes for 6.13.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: "Darrick J. Wong" <djwong(a)kernel.org>
----------------------------------------------------------------
Darrick J. Wong (2):
xfs: don't over-report free space or inodes in statvfs
xfs: release the dquot buf outside of qli_lock
fs/xfs/xfs_dquot.c | 12 ++++++++----
fs/xfs/xfs_qm_bhv.c | 27 +++++++++++++++++----------
2 files changed, 25 insertions(+), 14 deletions(-)
This patch addresses a null-ptr-deref in qt2_process_read_urb() due to
an incorrect bounds check in the following:
if (newport > serial->num_ports) {
dev_err(&port->dev,
"%s - port change to invalid port: %i\n",
__func__, newport);
break;
}
The condition doesn't account for the valid range of the serial->port
buffer, which is from 0 to serial->num_ports - 1. When newport is equal
to serial->num_ports, the assignment of "port" in the
following code is out-of-bounds and NULL:
serial_priv->current_port = newport;
port = serial->port[serial_priv->current_port];
The fix checks if newport is greater than or equal to serial->num_ports
indicating it is out-of-bounds.
Reported-by: syzbot <syzbot+506479ebf12fe435d01a(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=506479ebf12fe435d01a
Fixes: f7a33e608d9a ("USB: serial: add quatech2 usb to serial driver")
Cc: <stable(a)vger.kernel.org> # 3.5
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/usb/serial/quatech2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/serial/quatech2.c b/drivers/usb/serial/quatech2.c
index a317bdbd00ad..72fe83a6c978 100644
--- a/drivers/usb/serial/quatech2.c
+++ b/drivers/usb/serial/quatech2.c
@@ -503,7 +503,7 @@ static void qt2_process_read_urb(struct urb *urb)
newport = *(ch + 3);
- if (newport > serial->num_ports) {
+ if (newport >= serial->num_ports) {
dev_err(&port->dev,
"%s - port change to invalid port: %i\n",
__func__, newport);
--
2.39.5
This is a resend of the patchset discussed here[1] for the 5.15 tree.
[1] https://lore.kernel.org/r/2025011052-backpedal-coat-2fec@gregkh
I've picked the "do not keep dangling zcomp pointer" patch from the
linux-rc tree at the time, so kept Sasha's SOB and added mine on top
-- please let me know if it wasn't appropriate.
I've also prepared the 5.10 patches, but I hadn't realized there were so
many stable deps (8 patchs total!); I'm honestly not sure the problem is
worth the churn but since it's done and tested I'll send the patches if
there is no problem with this 5.15 version.
Thanks!
Dominique Martinet (1):
zram: check comp is non-NULL before calling comp_destroy
Kairui Song (1):
zram: fix uninitialized ZRAM not releasing backing device
Sergey Senozhatsky (1):
drivers/block/zram/zram_drv.c: do not keep dangling zcomp pointer
after zram reset
drivers/block/zram/zram_drv.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
--
2.39.5
The quilt patch titled
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix avoid_reserve to allow taking folio from subpool
Date: Tue, 7 Jan 2025 15:39:56 -0500
Patch series "mm/hugetlb: Refactor hugetlb allocation resv accounting",
v2.
This is a follow up on Ackerley's series here as replacement:
https://lore.kernel.org/r/cover.1728684491.git.ackerleytng@google.com
The goal of this series is to cleanup hugetlb resv accounting, especially
during folio allocation, to decouple a few things:
- Hugetlb folios v.s. Hugetlbfs: IOW, the hope is in the future hugetlb
folios can be allocated completely without hugetlbfs.
- Decouple VMA v.s. hugetlb folio allocations: allocating a hugetlb folio
should not always require a hugetlbfs VMA. For example, either it got
allocated from the inode level (see hugetlbfs_fallocate() where it used
a pesudo VMA for allocation), or it can be allocated by other kernel
subsystems.
It paves way for other users to allocate hugetlb folios out of either
system reservations, or subpools (instead of hugetlbfs, as a file system).
For longer term, this prepares hugetlb as a separate concept versus
hugetlbfs, so that hugetlb folios can be allocated by not only hugetlbfs
and other things.
Tests I've done:
- I had a reproducer in patch 1 for the bug I found, this will start to
work after patch 1 or the whole set applied.
- Hugetlb regression tests (on x86_64 2MBs), includes:
- All vmtests on hugetlbfs
- libhugetlbfs test suite (which may fail some tests, but no new failures
will be introduced by this series, so all such failures happen before
this series so shouldn't be relevant).
This patch (of 7):
Since commit 04f2cbe35699 ("hugetlb: guarantee that COW faults for a
process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed"),
avoid_reserve was introduced for a special case of CoW on hugetlb private
mappings, and only if the owner VMA is trying to allocate yet another
hugetlb folio that is not reserved within the private vma reserved map.
Later on, in commit d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle
areas hole punched by fallocate"), alloc_huge_page() enforced to not
consume any global reservation as long as avoid_reserve=true. This
operation doesn't look correct, because even if it will enforce the
allocation to not use global reservation at all, it will still try to take
one reservation from the spool (if the subpool existed). Then since the
spool reserved pages take from global reservation, it'll also take one
reservation globally.
Logically it can cause global reservation to go wrong.
I wrote a reproducer below, trigger this special path, and every run of
such program will cause global reservation count to increment by one, until
it hits the number of free pages:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#define MSIZE (2UL << 20)
int main(int argc, char *argv[])
{
const char *path;
int *buf;
int fd, ret;
pid_t child;
if (argc < 2) {
printf("usage: %s <hugetlb_file>\n", argv[0]);
return -1;
}
path = argv[1];
fd = open(path, O_RDWR | O_CREAT, 0666);
if (fd < 0) {
perror("open failed");
return -1;
}
ret = fallocate(fd, 0, 0, MSIZE);
if (ret != 0) {
perror("fallocate");
return -1;
}
buf = mmap(NULL, MSIZE, PROT_READ|PROT_WRITE,
MAP_PRIVATE, fd, 0);
if (buf == MAP_FAILED) {
perror("mmap() failed");
return -1;
}
/* Allocate a page */
*buf = 1;
child = fork();
if (child == 0) {
/* child doesn't need to do anything */
exit(0);
}
/* Trigger CoW from owner */
*buf = 2;
munmap(buf, MSIZE);
close(fd);
unlink(path);
return 0;
}
It can only reproduce with a sub-mount when there're reserved pages on the
spool, like:
# sysctl vm.nr_hugepages=128
# mkdir ./hugetlb-pool
# mount -t hugetlbfs -o min_size=8M,pagesize=2M none ./hugetlb-pool
Then run the reproducer on the mountpoint:
# ./reproducer ./hugetlb-pool/test
Fix it by taking the reservation from spool if available. In general,
avoid_reserve is IMHO more about "avoid vma resv map", not spool's.
I copied stable, however I have no intention for backporting if it's not a
clean cherry-pick, because private hugetlb mapping, and then fork() on top
is too rare to hit.
Link: https://lkml.kernel.org/r/20250107204002.2683356-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20250107204002.2683356-2-peterx@redhat.com
Fixes: d85f69b0b533 ("mm/hugetlb: alloc_huge_page handle areas hole punched by fallocate")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Ackerley Tng <ackerleytng(a)google.com>
Tested-by: Ackerley Tng <ackerleytng(a)google.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Breno Leitao <leitao(a)debian.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 22 +++-------------------
1 file changed, 3 insertions(+), 19 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-avoid_reserve-to-allow-taking-folio-from-subpool
+++ a/mm/hugetlb.c
@@ -1398,8 +1398,7 @@ static unsigned long available_huge_page
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
struct vm_area_struct *vma,
- unsigned long address, int avoid_reserve,
- long chg)
+ unsigned long address, long chg)
{
struct folio *folio = NULL;
struct mempolicy *mpol;
@@ -1415,10 +1414,6 @@ static struct folio *dequeue_hugetlb_fol
if (!vma_has_reserves(vma, chg) && !available_huge_pages(h))
goto err;
- /* If reserves cannot be used, ensure enough pages are in the pool */
- if (avoid_reserve && !available_huge_pages(h))
- goto err;
-
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
@@ -1434,7 +1429,7 @@ static struct folio *dequeue_hugetlb_fol
folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
nid, nodemask);
- if (folio && !avoid_reserve && vma_has_reserves(vma, chg)) {
+ if (folio && vma_has_reserves(vma, chg)) {
folio_set_hugetlb_restore_reserve(folio);
h->resv_huge_pages--;
}
@@ -3051,17 +3046,6 @@ struct folio *alloc_hugetlb_folio(struct
gbl_chg = hugepage_subpool_get_pages(spool, 1);
if (gbl_chg < 0)
goto out_end_reservation;
-
- /*
- * Even though there was no reservation in the region/reserve
- * map, there could be reservations associated with the
- * subpool that can be used. This would be indicated if the
- * return value of hugepage_subpool_get_pages() is zero.
- * However, if avoid_reserve is specified we still avoid even
- * the subpool reservations.
- */
- if (avoid_reserve)
- gbl_chg = 1;
}
/* If this allocation is not consuming a reservation, charge it now.
@@ -3084,7 +3068,7 @@ struct folio *alloc_hugetlb_folio(struct
* from the global free pool (global change). gbl_chg == 0 indicates
* a reservation exists for the allocation.
*/
- folio = dequeue_hugetlb_folio_vma(h, vma, addr, avoid_reserve, gbl_chg);
+ folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
if (!folio) {
spin_unlock_irq(&hugetlb_lock);
folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
_
Patches currently in -mm which might be from peterx(a)redhat.com are