July 2024 - Linux-stable-mirror

Re: Patch "drm/amdgpu: fix the warning about the expression (int)size - len" has been added to the 6.1-stable tree

by Greg KH

On Fri, Jul 05, 2024 at 03:34:04PM -0400, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > drm/amdgpu: fix the warning about the expression (int)size - len > > to the 6.1-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > drm-amdgpu-fix-the-warning-about-the-expression-int-.patch > and it can be found in the queue-6.1 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. > > > > commit c71e1d31c7d6735e32dcfd0043970d9fadb00b82 > Author: Jesse Zhang <jesse.zhang(a)amd.com> > Date: Thu Apr 25 15:16:40 2024 +0800 > > drm/amdgpu: fix the warning about the expression (int)size - len > > [ Upstream commit ea686fef5489ef7a2450a9fdbcc732b837fb46a8 ] > > Converting size from size_t to int may overflow. > v2: keep reverse xmas tree order (Christian) > > Signed-off-by: Jesse Zhang <jesse.zhang(a)amd.com> > Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com> > Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> Nope, this breaks the build on 6.1, which is kind of worse than fixing a build warning :( I'll go drop this now. thanks, greg k-h

9 months, 3 weeks

1
0
0 0

[Patch v2] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks

by Piotr Wojtaszczyk

The dma_request_chan() returns error pointer in case of error, while dma_request_channel() returns NULL in case of error therefore different error checks are needed for the two. Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries") Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com> Cc: stable(a)vger.kernel.org --- Changes v2: - Corrected 'Fixes' tag - add Cc: stable drivers/mtd/nand/raw/lpc32xx_mlc.c | 2 +- drivers/mtd/nand/raw/lpc32xx_slc.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c index 92cebe871bb4..b9c3adc54c01 100644 --- a/drivers/mtd/nand/raw/lpc32xx_mlc.c +++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c @@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host) dma_cap_mask_t mask; host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx"); - if (!host->dma_chan) { + if (IS_ERR(host->dma_chan)) { /* fallback to request using platform data */ if (!host->pdata || !host->pdata->dma_filter) { dev_err(mtd->dev.parent, "no DMA platform data\n"); diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c index 3b7e3d259785..ade971e4cc3b 100644 --- a/drivers/mtd/nand/raw/lpc32xx_slc.c +++ b/drivers/mtd/nand/raw/lpc32xx_slc.c @@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host) dma_cap_mask_t mask; host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx"); - if (!host->dma_chan) { + if (IS_ERR(host->dma_chan)) { /* fallback to request using platform data */ if (!host->pdata || !host->pdata->dma_filter) { dev_err(mtd->dev.parent, "no DMA platform data\n"); -- 2.25.1

9 months, 3 weeks

2
1
0 0

[PATCH 1/3] mm/numa_balancing: Teach mpol_to_str about the balancing mode

by Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Since balancing mode was added in bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"), it was possible to set this mode but it wouldn't be shown in /proc/<pid>/numa_maps since there was no support for it in the mpol_to_str() helper. Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it would be displayed as 'default' due a workaround introduced a few years earlier in 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps"). To tidy this up we implement two changes: Replace the MPOL_F_MORON check by pointer comparison against the preferred_node_policy array. By doing this we generalise the current special casing and replace the incorrect 'default' with the correct 'bind' for the mode. Secondly, we add a string representation and corresponding handling for the MPOL_F_NUMA_BALANCING flag. With the two changes together we start showing the balancing flag when it is set and therefore complete the fix. Representation format chosen is to separate multiple flags with vertical bars, following what existed long time ago in kernel 2.6.25. But as between then and now there wasn't a way to display multiple flags, this patch does not change the format in practice. Some /proc/<pid>/numa_maps output examples: 555559580000 bind=balancing:0-1,3 file=... 555585800000 bind=balancing|static:0,2 file=... 555635240000 prefer=relative:0 file= v2: * Fully fix by introducing MPOL_F_KERNEL. v3: * Abandoned the MPOL_F_KERNEL approach in favour of pointer comparisons. * Removed lookup generalisation for easier backporting. * Replaced commas as separator with vertical bars. * Added a few more words about the string format in the commit message. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps") Cc: Huang Ying <ying.huang(a)intel.com> Cc: Mel Gorman <mgorman(a)suse.de> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Rik van Riel <riel(a)surriel.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: David Rientjes <rientjes(a)google.com> Cc: <stable(a)vger.kernel.org> # v5.12+ --- mm/mempolicy.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index aec756ae5637..1bfb6c73a39c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -3293,8 +3293,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol) * @pol: pointer to mempolicy to be formatted * * Convert @pol into a string. If @buffer is too short, truncate the string. - * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the - * longest flag, "relative", and to display at least a few node ids. + * Recommend a @maxlen of at least 42 for the longest mode, "weighted + * interleave", the longest flag, "balancing", and to display at least a few + * node ids. */ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) { @@ -3303,7 +3304,10 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) unsigned short mode = MPOL_DEFAULT; unsigned short flags = 0; - if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) { + if (pol && + pol != &default_policy && + !(pol >= &preferred_node_policy[0] && + pol <= &preferred_node_policy[MAX_NUMNODES - 1])) { mode = pol->mode; flags = pol->flags; } @@ -3331,12 +3335,18 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) p += snprintf(p, buffer + maxlen - p, "="); /* - * Currently, the only defined flags are mutually exclusive + * Static and relative are mutually exclusive. */ if (flags & MPOL_F_STATIC_NODES) p += snprintf(p, buffer + maxlen - p, "static"); else if (flags & MPOL_F_RELATIVE_NODES) p += snprintf(p, buffer + maxlen - p, "relative"); + + if (flags & MPOL_F_NUMA_BALANCING) { + if (hweight16(flags & MPOL_MODE_FLAGS) > 1) + p += snprintf(p, buffer + maxlen - p, "|"); + p += snprintf(p, buffer + maxlen - p, "balancing"); + } } if (!nodes_empty(nodes)) -- 2.44.0

9 months, 3 weeks

3
2
0 0

[PATCH v2 1/2] f2fs: use meta inode for GC of atomic file

by Sunmin Jeong

The page cache of the atomic file keeps new data pages which will be stored in the COW file. It can also keep old data pages when GCing the atomic file. In this case, new data can be overwritten by old data if a GC thread sets the old data page as dirty after new data page was evicted. Also, since all writes to the atomic file are redirected to COW inodes, GC for the atomic file is not working well as below. f2fs_gc(gc_type=FG_GC) - select A as a victim segment do_garbage_collect - iget atomic file's inode for block B move_data_page f2fs_do_write_data_page - use dn of cow inode - set fio->old_blkaddr from cow inode - seg_freed is 0 since block B is still valid - goto gc_more and A is selected as victim again To solve the problem, let's separate GC writes and updates in the atomic file by using the meta inode for GC writes. Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Cc: stable(a)vger.kernel.org #v5.19+ Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com> Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com> Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com> --- v2: - replace post_read to meta_gc fs/f2fs/data.c | 4 ++-- fs/f2fs/f2fs.h | 7 ++++++- fs/f2fs/gc.c | 6 +++--- fs/f2fs/segment.c | 6 +++--- 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index b6dcb3bcaef7..9a213d03005d 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2693,7 +2693,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) } /* wait for GCed page writeback via META_MAPPING */ - if (fio->post_read) + if (fio->meta_gc) f2fs_wait_on_block_writeback(inode, fio->old_blkaddr); /* @@ -2788,7 +2788,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted, .submitted = 0, .compr_blocks = compr_blocks, .need_lock = compr_blocks ? LOCK_DONE : LOCK_RETRY, - .post_read = f2fs_post_read_required(inode) ? 1 : 0, + .meta_gc = f2fs_meta_inode_gc_required(inode) ? 1 : 0, .io_type = io_type, .io_wbc = wbc, .bio = bio, diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index f7ee6c5e371e..796ae11c0fa3 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1211,7 +1211,7 @@ struct f2fs_io_info { unsigned int in_list:1; /* indicate fio is in io_list */ unsigned int is_por:1; /* indicate IO is from recovery or not */ unsigned int encrypted:1; /* indicate file is encrypted */ - unsigned int post_read:1; /* require post read */ + unsigned int meta_gc:1; /* require meta inode GC */ enum iostat_type io_type; /* io type */ struct writeback_control *io_wbc; /* writeback control */ struct bio **bio; /* bio for ipu */ @@ -4263,6 +4263,11 @@ static inline bool f2fs_post_read_required(struct inode *inode) f2fs_compressed_file(inode); } +static inline bool f2fs_meta_inode_gc_required(struct inode *inode) +{ + return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode); +} + /* * compress.c */ diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index ef667fec9a12..cb3006551ab5 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1589,7 +1589,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, start_bidx = f2fs_start_bidx_of_node(nofs, inode) + ofs_in_node; - if (f2fs_post_read_required(inode)) { + if (f2fs_meta_inode_gc_required(inode)) { int err = ra_data_block(inode, start_bidx); f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); @@ -1640,7 +1640,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, start_bidx = f2fs_start_bidx_of_node(nofs, inode) + ofs_in_node; - if (f2fs_post_read_required(inode)) + if (f2fs_meta_inode_gc_required(inode)) err = move_data_block(inode, start_bidx, gc_type, segno, off); else @@ -1648,7 +1648,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, segno, off); if (!err && (gc_type == FG_GC || - f2fs_post_read_required(inode))) + f2fs_meta_inode_gc_required(inode))) submitted++; if (locked) { diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 4db1add43e36..77ef46b384b4 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -3851,7 +3851,7 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio) goto drop_bio; } - if (fio->post_read) + if (fio->meta_gc) f2fs_truncate_meta_inode_pages(sbi, fio->new_blkaddr, 1); stat_inc_inplace_blocks(fio->sbi); @@ -4021,7 +4021,7 @@ void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr) struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct page *cpage; - if (!f2fs_post_read_required(inode)) + if (!f2fs_meta_inode_gc_required(inode)) return; if (!__is_valid_data_blkaddr(blkaddr)) @@ -4040,7 +4040,7 @@ void f2fs_wait_on_block_writeback_range(struct inode *inode, block_t blkaddr, struct f2fs_sb_info *sbi = F2FS_I_SB(inode); block_t i; - if (!f2fs_post_read_required(inode)) + if (!f2fs_meta_inode_gc_required(inode)) return; for (i = 0; i < len; i++) -- 2.25.1

9 months, 3 weeks

2
1
0 0

[PATCH] FIXUP: genirq: defuse spurious-irq timebomb

by Pete Swain

The flapping-irq detector still has a timebomb. A pathological workload, or test script, can arm the spurious-irq timebomb described in 4f27c00bf80f ("Improve behaviour of spurious IRQ detect") This leads to irqs being moved the much slower polled mode, despite the actual unhandled-irq rate being well under the 99.9k/100k threshold that the code appears to check. How? - Queued completion handler, like nvme, servicing events as they appear in the queue, even if the irq corresponding to the event has not yet been seen. - queues frequently empty, so seeing "spurious" irqs whenever the last events of a threaded handler's while (events_queued()) process_them(); ends with those events' irqs posted while thread was scanning. In this case the while() has consumed last event(s), so next handler says IRQ_NONE. - In each run of "unhandled" irqs, exactly one IRQ_NONE response is promoted from IRQ_NONE to IRQ_HANDLED, by note_interrupt()'s SPURIOUS_DEFERRED logic. - Any 2+ unhandled-irq runs will increment irqs_unhandled. The time_after() check in note_interrupt() resets irqs_unhandled to 1 after an idle period, but if irqs are never spaced more than HZ/10 apart, irqs_unhandled keeps growing. - During processing of long completion queues, the non-threaded handlers will return IRQ_WAKE_THREAD, for potentially thousands of per-event irqs. These bypass note_interrupt()'s irq_count++ logic, so do not count as handled, and do not invoke the flapping-irq logic. - When the _counted_ irq_count reaches the 100k threshold, it's possible for irqs_unhandled > 99.9k to force a move to polling mode, even though many millions of _WAKE_THREAD irqs have been handled without being counted. Solution: include IRQ_WAKE_THREAD events in irq_count. Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD) by the old 99:1 ratio will an irq be moved to polling mode. Fixes: 4f27c00bf80f ("Improve behaviour of spurious IRQ detect") Cc: stable(a)vger.kernel.org Signed-off-by: Pete Swain <swine(a)google.com> --- kernel/irq/spurious.c | 68 +++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c index 02b2daf07441..ac596c8dc4b1 100644 --- a/kernel/irq/spurious.c +++ b/kernel/irq/spurious.c @@ -321,44 +321,44 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret) */ if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) { desc->threads_handled_last |= SPURIOUS_DEFERRED; - return; - } - /* - * Check whether one of the threaded handlers - * returned IRQ_HANDLED since the last - * interrupt happened. - * - * For simplicity we just set bit 31, as it is - * set in threads_handled_last as well. So we - * avoid extra masking. And we really do not - * care about the high bits of the handled - * count. We just care about the count being - * different than the one we saw before. - */ - handled = atomic_read(&desc->threads_handled); - handled |= SPURIOUS_DEFERRED; - if (handled != desc->threads_handled_last) { - action_ret = IRQ_HANDLED; - /* - * Note: We keep the SPURIOUS_DEFERRED - * bit set. We are handling the - * previous invocation right now. - * Keep it for the current one, so the - * next hardware interrupt will - * account for it. - */ - desc->threads_handled_last = handled; } else { /* - * None of the threaded handlers felt - * responsible for the last interrupt + * Check whether one of the threaded handlers + * returned IRQ_HANDLED since the last + * interrupt happened. * - * We keep the SPURIOUS_DEFERRED bit - * set in threads_handled_last as we - * need to account for the current - * interrupt as well. + * For simplicity we just set bit 31, as it is + * set in threads_handled_last as well. So we + * avoid extra masking. And we really do not + * care about the high bits of the handled + * count. We just care about the count being + * different than the one we saw before. */ - action_ret = IRQ_NONE; + handled = atomic_read(&desc->threads_handled); + handled |= SPURIOUS_DEFERRED; + if (handled != desc->threads_handled_last) { + action_ret = IRQ_HANDLED; + /* + * Note: We keep the SPURIOUS_DEFERRED + * bit set. We are handling the + * previous invocation right now. + * Keep it for the current one, so the + * next hardware interrupt will + * account for it. + */ + desc->threads_handled_last = handled; + } else { + /* + * None of the threaded handlers felt + * responsible for the last interrupt + * + * We keep the SPURIOUS_DEFERRED bit + * set in threads_handled_last as we + * need to account for the current + * interrupt as well. + */ + action_ret = IRQ_NONE; + } } } else { /* -- 2.45.2.627.g7a2c4fd464-goog

9 months, 3 weeks

2
1
0 0

[PATCH net v2 3/4] ipv6: take care of scope when choosing the src addr

by Nicolas Dichtel

When the source address is selected, the scope must be checked. For example, if a loopback address is assigned to the vrf device, it must not be chosen for packets sent outside. CC: stable(a)vger.kernel.org Fixes: afbac6010aec ("net: ipv6: Address selection needs to consider L3 domains") Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com> --- net/ipv6/addrconf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 5c424a0e7232..4f2c5cc31015 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1873,7 +1873,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev, master, &dst, scores, hiscore_idx); - if (scores[hiscore_idx].ifa) + if (scores[hiscore_idx].ifa && + scores[hiscore_idx].scopedist >= 0) goto out; } -- 2.43.1

9 months, 3 weeks

2
1
0 0

[PATCH v3] mm: Fix khugepaged activation policy

by Ryan Roberts

Since the introduction of mTHP, the docuementation has stated that khugepaged would be enabled when any mTHP size is enabled, and disabled when all mTHP sizes are disabled. There are 2 problems with this; 1. this is not what was implemented by the code and 2. this is not the desirable behavior. Desirable behavior is for khugepaged to be enabled when any PMD-sized THP is enabled, anon or file. (Note that file THP is still controlled by the top-level control so we must always consider that, as well as the PMD-size mTHP control for anon). khugepaged only supports collapsing to PMD-sized THP so there is no value in enabling it when PMD-sized THP is disabled. So let's change the code and documentation to reflect this policy. Further, per-size enabled control modification events were not previously forwarded to khugepaged to give it an opportunity to start or stop. Consequently the following was resulting in khugepaged eroneously not being activated: echo never > /sys/kernel/mm/transparent_hugepage/enabled echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com> Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface") Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redha… Cc: stable(a)vger.kernel.org --- Hi All, Applies on top of mm-unstable from a couple of days ago (9bb8753acdd8). No regressions observed in mm selftests. When fixing this I also noticed that khugepaged doesn't get (and never has been) activated/deactivated by `shmem_enabled=`. I've concluded that this is definitely a (separate) bug. But I'm waiting for the conclusion of the conversation at [3] before fixing, so will send separately. Changes since v1 [1] ==================== - hugepage_pmd_enabled() now considers CONFIG_READ_ONLY_THP_FOR_FS as part of decision; means that for kernels without this config, khugepaged will not be started when only the top-level control is enabled. Changes since v2 [2] ==================== - Make hugepage_pmd_enabled() out-of-line static in khugepaged.c (per Andrew) - Refactor hugepage_pmd_enabled() for better readability (per Andrew) [1] https://lore.kernel.org/linux-mm/20240702144617.2291480-1-ryan.roberts@arm.… [2] https://lore.kernel.org/linux-mm/20240704091051.2411934-1-ryan.roberts@arm.… [3] https://lore.kernel.org/linux-mm/65c37315-2741-481f-b433-cec35ef1af35@arm.c… Thanks, Ryan Documentation/admin-guide/mm/transhuge.rst | 11 ++++---- include/linux/huge_mm.h | 12 -------- mm/huge_memory.c | 7 +++++ mm/khugepaged.c | 33 +++++++++++++++++----- 4 files changed, 38 insertions(+), 25 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index 709fe10b60f4..fc321d40b8ac 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -202,12 +202,11 @@ PMD-mappable transparent hugepage:: cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size -khugepaged will be automatically started when one or more hugepage -sizes are enabled (either by directly setting "always" or "madvise", -or by setting "inherit" while the top-level enabled is set to "always" -or "madvise"), and it'll be automatically shutdown when the last -hugepage size is disabled (either by directly setting "never", or by -setting "inherit" while the top-level enabled is set to "never"). +khugepaged will be automatically started when PMD-sized THP is enabled +(either of the per-size anon control or the top-level control are set +to "always" or "madvise"), and it'll be automatically shutdown when +PMD-sized THP is disabled (when both the per-size anon control and the +top-level control are "never") Khugepaged controls ------------------- diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4d155c7a4792..107da5c4eba4 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -128,18 +128,6 @@ static inline bool hugepage_global_always(void) (1<<TRANSPARENT_HUGEPAGE_FLAG); } -static inline bool hugepage_flags_enabled(void) -{ - /* - * We cover both the anon and the file-backed case here; we must return - * true if globally enabled, even when all anon sizes are set to never. - * So we don't need to look at huge_anon_orders_inherit. - */ - return hugepage_global_enabled() || - READ_ONCE(huge_anon_orders_always) || - READ_ONCE(huge_anon_orders_madvise); -} - static inline int highest_order(unsigned long orders) { return fls_long(orders) - 1; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 251d6932130f..085f5e973231 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -502,6 +502,13 @@ static ssize_t thpsize_enabled_store(struct kobject *kobj, } else ret = -EINVAL; + if (ret > 0) { + int err; + + err = start_stop_khugepaged(); + if (err) + ret = err; + } return ret; } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 409f67a817f1..a5ec03ef8722 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -413,6 +413,26 @@ static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm) test_bit(MMF_DISABLE_THP, &mm->flags); } +static bool hugepage_pmd_enabled(void) +{ + /* + * We cover both the anon and the file-backed case here; file-backed + * hugepages, when configured in, are determined by the global control. + * Anon pmd-sized hugepages are determined by the pmd-size control. + */ + if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && + hugepage_global_enabled()) + return true; + if (test_bit(PMD_ORDER, &huge_anon_orders_always)) + return true; + if (test_bit(PMD_ORDER, &huge_anon_orders_madvise)) + return true; + if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) && + hugepage_global_enabled()) + return true; + return false; +} + void __khugepaged_enter(struct mm_struct *mm) { struct khugepaged_mm_slot *mm_slot; @@ -449,7 +469,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags) { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && - hugepage_flags_enabled()) { + hugepage_pmd_enabled()) { if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS, PMD_ORDER)) __khugepaged_enter(vma->vm_mm); @@ -2462,8 +2482,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result, static int khugepaged_has_work(void) { - return !list_empty(&khugepaged_scan.mm_head) && - hugepage_flags_enabled(); + return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled(); } static int khugepaged_wait_event(void) @@ -2536,7 +2555,7 @@ static void khugepaged_wait_work(void) return; } - if (hugepage_flags_enabled()) + if (hugepage_pmd_enabled()) wait_event_freezable(khugepaged_wait, khugepaged_wait_event()); } @@ -2567,7 +2586,7 @@ static void set_recommended_min_free_kbytes(void) int nr_zones = 0; unsigned long recommended_min; - if (!hugepage_flags_enabled()) { + if (!hugepage_pmd_enabled()) { calculate_min_free_kbytes(); goto update_wmarks; } @@ -2617,7 +2636,7 @@ int start_stop_khugepaged(void) int err = 0; mutex_lock(&khugepaged_mutex); - if (hugepage_flags_enabled()) { + if (hugepage_pmd_enabled()) { if (!khugepaged_thread) khugepaged_thread = kthread_run(khugepaged, NULL, "khugepaged"); @@ -2643,7 +2662,7 @@ int start_stop_khugepaged(void) void khugepaged_min_free_kbytes_update(void) { mutex_lock(&khugepaged_mutex); - if (hugepage_flags_enabled() && khugepaged_thread) + if (hugepage_pmd_enabled() && khugepaged_thread) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } -- 2.43.0

9 months, 3 weeks

2
1
0 0

[merged mm-hotfixes-stable] mm-fix-crashes-from-deferred-split-racing-folio-migration.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: fix crashes from deferred split racing folio migration has been removed from the -mm tree. Its filename was mm-fix-crashes-from-deferred-split-racing-folio-migration.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hugh Dickins <hughd(a)google.com> Subject: mm: fix crashes from deferred split racing folio migration Date: Tue, 2 Jul 2024 00:40:55 -0700 (PDT) Even on 6.10-rc6, I've been seeing elusive "Bad page state"s (often on flags when freeing, yet the flags shown are not bad: PG_locked had been set and cleared??), and VM_BUG_ON_PAGE(page_ref_count(page) == 0)s from deferred_split_scan()'s folio_put(), and a variety of other BUG and WARN symptoms implying double free by deferred split and large folio migration. 6.7 commit 9bcef5973e31 ("mm: memcg: fix split queue list crash when large folio migration") was right to fix the memcg-dependent locking broken in 85ce2c517ade ("memcontrol: only transfer the memcg data for migration"), but missed a subtlety of deferred_split_scan(): it moves folios to its own local list to work on them without split_queue_lock, during which time folio->_deferred_list is not empty, but even the "right" lock does nothing to secure the folio and the list it is on. Fortunately, deferred_split_scan() is careful to use folio_try_get(): so folio_migrate_mapping() can avoid the race by folio_undo_large_rmappable() while the old folio's reference count is temporarily frozen to 0 - adding such a freeze in the !mapping case too (originally, folio lock and unmapping and no swap cache left an anon folio unreachable, so no freezing was needed there: but the deferred split queue offers a way to reach it). Link: https://lkml.kernel.org/r/29c83d1a-11ca-b6c9-f92e-6ccb322af510@google.com Fixes: 9bcef5973e31 ("mm: memcg: fix split queue list crash when large folio migration") Signed-off-by: Hugh Dickins <hughd(a)google.com> Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: David Hildenbrand <david(a)redhat.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Yang Shi <shy828301(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memcontrol.c | 11 ----------- mm/migrate.c | 13 +++++++++++++ 2 files changed, 13 insertions(+), 11 deletions(-) --- a/mm/memcontrol.c~mm-fix-crashes-from-deferred-split-racing-folio-migration +++ a/mm/memcontrol.c @@ -7823,17 +7823,6 @@ void mem_cgroup_migrate(struct folio *ol /* Transfer the charge and the css ref */ commit_charge(new, memcg); - /* - * If the old folio is a large folio and is in the split queue, it needs - * to be removed from the split queue now, in case getting an incorrect - * split queue in destroy_large_folio() after the memcg of the old folio - * is cleared. - * - * In addition, the old folio is about to be freed after migration, so - * removing from the split queue a bit earlier seems reasonable. - */ - if (folio_test_large(old) && folio_test_large_rmappable(old)) - folio_undo_large_rmappable(old); old->memcg_data = 0; } --- a/mm/migrate.c~mm-fix-crashes-from-deferred-split-racing-folio-migration +++ a/mm/migrate.c @@ -415,6 +415,15 @@ int folio_migrate_mapping(struct address if (folio_ref_count(folio) != expected_count) return -EAGAIN; + /* Take off deferred split queue while frozen and memcg set */ + if (folio_test_large(folio) && + folio_test_large_rmappable(folio)) { + if (!folio_ref_freeze(folio, expected_count)) + return -EAGAIN; + folio_undo_large_rmappable(folio); + folio_ref_unfreeze(folio, expected_count); + } + /* No turning back from here */ newfolio->index = folio->index; newfolio->mapping = folio->mapping; @@ -433,6 +442,10 @@ int folio_migrate_mapping(struct address return -EAGAIN; } + /* Take off deferred split queue while frozen and memcg set */ + if (folio_test_large(folio) && folio_test_large_rmappable(folio)) + folio_undo_large_rmappable(folio); + /* * Now we know that no one else is looking at the folio: * no turning back from here. _ Patches currently in -mm which might be from hughd(a)google.com are

9 months, 3 weeks

1
0
0 0

[merged mm-hotfixes-stable] mm-gup-stop-abusing-try_grab_folio.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: gup: stop abusing try_grab_folio has been removed from the -mm tree. Its filename was mm-gup-stop-abusing-try_grab_folio.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Yang Shi <yang(a)os.amperecomputing.com> Subject: mm: gup: stop abusing try_grab_folio Date: Fri, 28 Jun 2024 12:14:58 -0700 A kernel warning was reported when pinning folio in CMA memory when launching SEV virtual machine. The splat looks like: [ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520 [ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6 [ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520 [ 464.325515] Call Trace: [ 464.325520] <TASK> [ 464.325523] ? __get_user_pages+0x423/0x520 [ 464.325528] ? __warn+0x81/0x130 [ 464.325536] ? __get_user_pages+0x423/0x520 [ 464.325541] ? report_bug+0x171/0x1a0 [ 464.325549] ? handle_bug+0x3c/0x70 [ 464.325554] ? exc_invalid_op+0x17/0x70 [ 464.325558] ? asm_exc_invalid_op+0x1a/0x20 [ 464.325567] ? __get_user_pages+0x423/0x520 [ 464.325575] __gup_longterm_locked+0x212/0x7a0 [ 464.325583] internal_get_user_pages_fast+0xfb/0x190 [ 464.325590] pin_user_pages_fast+0x47/0x60 [ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd] [ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd] Per the analysis done by yangge, when starting the SEV virtual machine, it will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory. But the page is in CMA area, so fast GUP will fail then fallback to the slow path due to the longterm pinnalbe check in try_grab_folio(). The slow path will try to pin the pages then migrate them out of CMA area. But the slow path also uses try_grab_folio() to pin the page, it will also fail due to the same check then the above warning is triggered. In addition, the try_grab_folio() is supposed to be used in fast path and it elevates folio refcount by using add ref unless zero. We are guaranteed to have at least one stable reference in slow path, so the simple atomic add could be used. The performance difference should be trivial, but the misuse may be confusing and misleading. Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page() to try_grab_folio(), and use them in the proper paths. This solves both the abuse and the kernel warning. The proper naming makes their usecase more clear and should prevent from abusing in the future. peterx said: : The user will see the pin fails, for gpu-slow it further triggers the WARN : right below that failure (as in the original report): : : folio = try_grab_folio(page, page_increm - 1, : foll_flags); : if (WARN_ON_ONCE(!folio)) { <------------------------ here : /* : * Release the 1st page ref if the : * folio is problematic, fail hard. : */ : gup_put_folio(page_folio(page), 1, : foll_flags); : ret = -EFAULT; : goto out; : } [1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11… [shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast] Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj… Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.… Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com> Reported-by: yangge <yangge1116(a)126.com> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: David Hildenbrand <david(a)redhat.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: <stable(a)vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/gup.c | 289 +++++++++++++++++++++++---------------------- mm/huge_memory.c | 2 mm/internal.h | 4 3 files changed, 156 insertions(+), 139 deletions(-) --- a/mm/gup.c~mm-gup-stop-abusing-try_grab_folio +++ a/mm/gup.c @@ -97,95 +97,6 @@ retry: return folio; } -/** - * try_grab_folio() - Attempt to get or pin a folio. - * @page: pointer to page to be grabbed - * @refs: the value to (effectively) add to the folio's refcount - * @flags: gup flags: these are the FOLL_* flag values. - * - * "grab" names in this file mean, "look at flags to decide whether to use - * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. - * - * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the - * same time. (That's true throughout the get_user_pages*() and - * pin_user_pages*() APIs.) Cases: - * - * FOLL_GET: folio's refcount will be incremented by @refs. - * - * FOLL_PIN on large folios: folio's refcount will be incremented by - * @refs, and its pincount will be incremented by @refs. - * - * FOLL_PIN on single-page folios: folio's refcount will be incremented by - * @refs * GUP_PIN_COUNTING_BIAS. - * - * Return: The folio containing @page (with refcount appropriately - * incremented) for success, or NULL upon failure. If neither FOLL_GET - * nor FOLL_PIN was set, that's considered failure, and furthermore, - * a likely bug in the caller, so a warning is also emitted. - */ -struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) -{ - struct folio *folio; - - if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0)) - return NULL; - - if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) - return NULL; - - if (flags & FOLL_GET) - return try_get_folio(page, refs); - - /* FOLL_PIN is set */ - - /* - * Don't take a pin on the zero page - it's not going anywhere - * and it is used in a *lot* of places. - */ - if (is_zero_page(page)) - return page_folio(page); - - folio = try_get_folio(page, refs); - if (!folio) - return NULL; - - /* - * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a - * right zone, so fail and let the caller fall back to the slow - * path. - */ - if (unlikely((flags & FOLL_LONGTERM) && - !folio_is_longterm_pinnable(folio))) { - if (!put_devmap_managed_folio_refs(folio, refs)) - folio_put_refs(folio, refs); - return NULL; - } - - /* - * When pinning a large folio, use an exact count to track it. - * - * However, be sure to *also* increment the normal folio - * refcount field at least once, so that the folio really - * is pinned. That's why the refcount from the earlier - * try_get_folio() is left intact. - */ - if (folio_test_large(folio)) - atomic_add(refs, &folio->_pincount); - else - folio_ref_add(folio, - refs * (GUP_PIN_COUNTING_BIAS - 1)); - /* - * Adjust the pincount before re-checking the PTE for changes. - * This is essentially a smp_mb() and is paired with a memory - * barrier in folio_try_share_anon_rmap_*(). - */ - smp_mb__after_atomic(); - - node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); - - return folio; -} - static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) { if (flags & FOLL_PIN) { @@ -203,58 +114,59 @@ static void gup_put_folio(struct folio * } /** - * try_grab_page() - elevate a page's refcount by a flag-dependent amount - * @page: pointer to page to be grabbed - * @flags: gup flags: these are the FOLL_* flag values. + * try_grab_folio() - add a folio's refcount by a flag-dependent amount + * @folio: pointer to folio to be grabbed + * @refs: the value to (effectively) add to the folio's refcount + * @flags: gup flags: these are the FOLL_* flag values * * This might not do anything at all, depending on the flags argument. * * "grab" names in this file mean, "look at flags to decide whether to use - * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount. + * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. * * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same - * time. Cases: please see the try_grab_folio() documentation, with - * "refs=1". + * time. * * Return: 0 for success, or if no action was required (if neither FOLL_PIN * nor FOLL_GET was set, nothing is done). A negative error code for failure: * - * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not + * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not * be grabbed. + * + * It is called when we have a stable reference for the folio, typically in + * GUP slow path. */ -int __must_check try_grab_page(struct page *page, unsigned int flags) +int __must_check try_grab_folio(struct folio *folio, int refs, + unsigned int flags) { - struct folio *folio = page_folio(page); - if (WARN_ON_ONCE(folio_ref_count(folio) <= 0)) return -ENOMEM; - if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page))) return -EREMOTEIO; if (flags & FOLL_GET) - folio_ref_inc(folio); + folio_ref_add(folio, refs); else if (flags & FOLL_PIN) { /* * Don't take a pin on the zero page - it's not going anywhere * and it is used in a *lot* of places. */ - if (is_zero_page(page)) + if (is_zero_folio(folio)) return 0; /* - * Similar to try_grab_folio(): be sure to *also* - * increment the normal page refcount field at least once, + * Increment the normal page refcount field at least once, * so that the page really is pinned. */ if (folio_test_large(folio)) { - folio_ref_add(folio, 1); - atomic_add(1, &folio->_pincount); + folio_ref_add(folio, refs); + atomic_add(refs, &folio->_pincount); } else { - folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); + folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS); } - node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); } return 0; @@ -515,6 +427,102 @@ static int record_subpages(struct page * return nr; } + +/** + * try_grab_folio_fast() - Attempt to get or pin a folio in fast path. + * @page: pointer to page to be grabbed + * @refs: the value to (effectively) add to the folio's refcount + * @flags: gup flags: these are the FOLL_* flag values. + * + * "grab" names in this file mean, "look at flags to decide whether to use + * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. + * + * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the + * same time. (That's true throughout the get_user_pages*() and + * pin_user_pages*() APIs.) Cases: + * + * FOLL_GET: folio's refcount will be incremented by @refs. + * + * FOLL_PIN on large folios: folio's refcount will be incremented by + * @refs, and its pincount will be incremented by @refs. + * + * FOLL_PIN on single-page folios: folio's refcount will be incremented by + * @refs * GUP_PIN_COUNTING_BIAS. + * + * Return: The folio containing @page (with refcount appropriately + * incremented) for success, or NULL upon failure. If neither FOLL_GET + * nor FOLL_PIN was set, that's considered failure, and furthermore, + * a likely bug in the caller, so a warning is also emitted. + * + * It uses add ref unless zero to elevate the folio refcount and must be called + * in fast path only. + */ +static struct folio *try_grab_folio_fast(struct page *page, int refs, + unsigned int flags) +{ + struct folio *folio; + + /* Raise warn if it is not called in fast GUP */ + VM_WARN_ON_ONCE(!irqs_disabled()); + + if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0)) + return NULL; + + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + return NULL; + + if (flags & FOLL_GET) + return try_get_folio(page, refs); + + /* FOLL_PIN is set */ + + /* + * Don't take a pin on the zero page - it's not going anywhere + * and it is used in a *lot* of places. + */ + if (is_zero_page(page)) + return page_folio(page); + + folio = try_get_folio(page, refs); + if (!folio) + return NULL; + + /* + * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a + * right zone, so fail and let the caller fall back to the slow + * path. + */ + if (unlikely((flags & FOLL_LONGTERM) && + !folio_is_longterm_pinnable(folio))) { + if (!put_devmap_managed_folio_refs(folio, refs)) + folio_put_refs(folio, refs); + return NULL; + } + + /* + * When pinning a large folio, use an exact count to track it. + * + * However, be sure to *also* increment the normal folio + * refcount field at least once, so that the folio really + * is pinned. That's why the refcount from the earlier + * try_get_folio() is left intact. + */ + if (folio_test_large(folio)) + atomic_add(refs, &folio->_pincount); + else + folio_ref_add(folio, + refs * (GUP_PIN_COUNTING_BIAS - 1)); + /* + * Adjust the pincount before re-checking the PTE for changes. + * This is essentially a smp_mb() and is paired with a memory + * barrier in folio_try_share_anon_rmap_*(). + */ + smp_mb__after_atomic(); + + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); + + return folio; +} #endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_GUP_FAST */ #ifdef CONFIG_ARCH_HAS_HUGEPD @@ -535,7 +543,7 @@ static unsigned long hugepte_addr_end(un */ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz, unsigned long addr, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { unsigned long pte_end; struct page *page; @@ -558,9 +566,15 @@ static int gup_hugepte(struct vm_area_st page = pte_page(pte); refs = record_subpages(page, sz, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); - if (!folio) - return 0; + if (fast) { + folio = try_grab_folio_fast(page, refs, flags); + if (!folio) + return 0; + } else { + folio = page_folio(page); + if (try_grab_folio(folio, refs, flags)) + return 0; + } if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) { gup_put_folio(folio, refs, flags); @@ -588,7 +602,7 @@ static int gup_hugepte(struct vm_area_st static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, unsigned long addr, unsigned int pdshift, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { pte_t *ptep; unsigned long sz = 1UL << hugepd_shift(hugepd); @@ -598,7 +612,8 @@ static int gup_hugepd(struct vm_area_str ptep = hugepte_offset(hugepd, addr, pdshift); do { next = hugepte_addr_end(addr, end, sz); - ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr); + ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr, + fast); if (ret != 1) return ret; } while (ptep++, addr = next, addr != end); @@ -625,7 +640,7 @@ static struct page *follow_hugepd(struct ptep = hugepte_offset(hugepd, addr, pdshift); ptl = huge_pte_lock(h, vma->vm_mm, ptep); ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE, - flags, &page, &nr); + flags, &page, &nr, false); spin_unlock(ptl); if (ret == 1) { @@ -642,7 +657,7 @@ static struct page *follow_hugepd(struct static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, unsigned long addr, unsigned int pdshift, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { return 0; } @@ -729,7 +744,7 @@ static struct page *follow_huge_pud(stru gup_must_unshare(vma, flags, page)) return ERR_PTR(-EMLINK); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) page = ERR_PTR(ret); else @@ -806,7 +821,7 @@ static struct page *follow_huge_pmd(stru VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) && !PageAnonExclusive(page), page); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) return ERR_PTR(ret); @@ -968,8 +983,8 @@ static struct page *follow_page_pte(stru VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) && !PageAnonExclusive(page), page); - /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */ - ret = try_grab_page(page, flags); + /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */ + ret = try_grab_folio(page_folio(page), 1, flags); if (unlikely(ret)) { page = ERR_PTR(ret); goto out; @@ -1233,7 +1248,7 @@ static int get_gate_page(struct mm_struc goto unmap; *page = pte_page(entry); } - ret = try_grab_page(*page, gup_flags); + ret = try_grab_folio(page_folio(*page), 1, gup_flags); if (unlikely(ret)) goto unmap; out: @@ -1636,20 +1651,19 @@ next_page: * pages. */ if (page_increm > 1) { - struct folio *folio; + struct folio *folio = page_folio(page); /* * Since we already hold refcount on the * large folio, this should never fail. */ - folio = try_grab_folio(page, page_increm - 1, - foll_flags); - if (WARN_ON_ONCE(!folio)) { + if (try_grab_folio(folio, page_increm - 1, + foll_flags)) { /* * Release the 1st page ref if the * folio is problematic, fail hard. */ - gup_put_folio(page_folio(page), 1, + gup_put_folio(folio, 1, foll_flags); ret = -EFAULT; goto out; @@ -2797,7 +2811,6 @@ EXPORT_SYMBOL(get_user_pages_unlocked); * This code is based heavily on the PowerPC implementation by Nick Piggin. */ #ifdef CONFIG_HAVE_GUP_FAST - /* * Used in the GUP-fast path to determine whether GUP is permitted to work on * a specific folio. @@ -2962,7 +2975,7 @@ static int gup_fast_pte_range(pmd_t pmd, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - folio = try_grab_folio(page, 1, flags); + folio = try_grab_folio_fast(page, 1, flags); if (!folio) goto pte_unmap; @@ -3049,7 +3062,7 @@ static int gup_fast_devmap_leaf(unsigned break; } - folio = try_grab_folio(page, 1, flags); + folio = try_grab_folio_fast(page, 1, flags); if (!folio) { gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages); break; @@ -3138,7 +3151,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, page = pmd_page(orig); refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3182,7 +3195,7 @@ static int gup_fast_pud_leaf(pud_t orig, page = pud_page(orig); refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3222,7 +3235,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, page = pgd_page(orig); refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3276,7 +3289,8 @@ static int gup_fast_pmd_range(pud_t *pud * pmd format and THP pmd format */ if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr, - PMD_SHIFT, next, flags, pages, nr) != 1) + PMD_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags, pages, nr)) @@ -3306,7 +3320,8 @@ static int gup_fast_pud_range(p4d_t *p4d return 0; } else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) { if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr, - PUD_SHIFT, next, flags, pages, nr) != 1) + PUD_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags, pages, nr)) @@ -3333,7 +3348,8 @@ static int gup_fast_p4d_range(pgd_t *pgd BUILD_BUG_ON(p4d_leaf(p4d)); if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) { if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr, - P4D_SHIFT, next, flags, pages, nr) != 1) + P4D_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags, pages, nr)) @@ -3362,7 +3378,8 @@ static void gup_fast_pgd_range(unsigned return; } else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) { if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr, - PGDIR_SHIFT, next, flags, pages, nr) != 1) + PGDIR_SHIFT, next, flags, pages, nr, + true) != 1) return; } else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags, pages, nr)) --- a/mm/huge_memory.c~mm-gup-stop-abusing-try_grab_folio +++ a/mm/huge_memory.c @@ -1331,7 +1331,7 @@ struct page *follow_devmap_pmd(struct vm if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) page = ERR_PTR(ret); --- a/mm/internal.h~mm-gup-stop-abusing-try_grab_folio +++ a/mm/internal.h @@ -1182,8 +1182,8 @@ int migrate_device_coherent_page(struct /* * mm/gup.c */ -struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags); -int __must_check try_grab_page(struct page *page, unsigned int flags); +int __must_check try_grab_folio(struct folio *folio, int refs, + unsigned int flags); /* * mm/huge_memory.c _ Patches currently in -mm which might be from yang(a)os.amperecomputing.com are

9 months, 3 weeks

1
0
0 0

buckles ,leather hardware ,bag hardware ,shoe hardware ,cord hardware ,hockey hardware, head pins ,wallets ,trims ,handles,tools ,fashion ,wires supply ?

by qualityfactory＠vip.163.com

Dear Sir : Nice day! This is Su from METAL & HARNESSES factory ,we make D rings , O rings ,Plastic hardware ,clasps ,clips , buckles ,leather hardware ,bag hardware ,shoe hardware ,cord hardware ,hockey hardware, head pins ,wallets ,trims ,handles,tools ,fashion ,wires ,fastening,rivets , grommets,eyelets,buckles,screws , harnesses ,wires ,bolts ,nuts ,clips ,bits ,spurs ,screws,anchors,stainless steel hardware , brass hardware,steel harnesses ,aluminium hardware ,iron hardware ,metal & Harnesses,belt and buckles,straps ,leather Harnesses,tags ,belts,straps ,clips ,rings bars ,hooks , safety buckles,brass buckles, brass rings ,belts,buckles ,Split Ring ,straps ,sliders,snaps ,Bars Rings, Fasteners, magnent buttons,craft hardware ,bindings ,knobs,steps ,straps , bits ,eyelets ,tools , metal gear , metal hardware , metal products, metal Components,S hooks ,bars ,snaps ,webbings , buckles, HOOK,brass buckles, clips , hooks ,straps , accessories ,brass snaps ,brass gear , stainless steel hardware,steel harness ,steel wires, wire rings , wire buckles ,brass gears ,brass products ,iron metal hardware ,Fasteners,safety buckles,brass buckles, brass rings ,belts,buckles ,Ring ,Squares ,straps ,sliders,Bars Rings,Buckles, Cord locks, Hooks & Rings,Tabs, Locks and Slides,magnent buttons,carabiners,trigger ,buttons ,snaps ,fasteners, webbings , webbing buckles, buttons,Hook Loop ，textile accessories,safety buckles,sliders,snaps ,Rings ,chains,suspenders as required for our global clients . We are manufactory, we are the source, our price is very competitive ,you will get the best price , We have profuse designs with series quality grade, and expressly. Our factory always produce customer designs and drawing , if you have any products looking please let me know we could surely make for you Sincerely hope could work with you ! Best regards Su

9 months, 3 weeks

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror July 2024