July 2024 - Linux-stable-mirror

[PATCH 6.1 1/2] wifi: mac80211: Allow NSS change only up to capability

by Hauke Mehrtens

From: Rameshkumar Sundaram <quic_ramess(a)quicinc.com> [ Upstream commit 57b341e9ab13e5688491bfd54f8b5502416c8905 ] Stations can update bandwidth/NSS change in VHT action frame with action type Operating Mode Notification. (IEEE Std 802.11-2020 - 9.4.1.53 Operating Mode field) For Operating Mode Notification, an RX NSS change to a value greater than AP's maximum NSS should not be allowed. During fuzz testing, by forcefully sending VHT Op. mode notif. frames from STA with random rx_nss values, it is found that AP accepts rx_nss values greater that APs maximum NSS instead of discarding such NSS change. Hence allow NSS change only up to maximum NSS that is negotiated and capped to AP's capability during association. Signed-off-by: Rameshkumar Sundaram <quic_ramess(a)quicinc.com> Link: https://lore.kernel.org/r/20230207114146.10567-1-quic_ramess@quicinc.com Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> Signed-off-by: Hauke Mehrtens <hauke(a)hauke-m.de> --- net/mac80211/vht.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/net/mac80211/vht.c b/net/mac80211/vht.c index f7526be8a1c7..b3a5c3e96a72 100644 --- a/net/mac80211/vht.c +++ b/net/mac80211/vht.c @@ -637,7 +637,7 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata, enum ieee80211_sta_rx_bandwidth new_bw; struct sta_opmode_info sta_opmode = {}; u32 changed = 0; - u8 nss; + u8 nss, cur_nss; /* ignore - no support for BF yet */ if (opmode & IEEE80211_OPMODE_NOTIF_RX_NSS_TYPE_BF) @@ -648,10 +648,25 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata, nss += 1; if (link_sta->pub->rx_nss != nss) { - link_sta->pub->rx_nss = nss; - sta_opmode.rx_nss = nss; - changed |= IEEE80211_RC_NSS_CHANGED; - sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED; + cur_nss = link_sta->pub->rx_nss; + /* Reset rx_nss and call ieee80211_sta_set_rx_nss() which + * will set the same to max nss value calculated based on capability. + */ + link_sta->pub->rx_nss = 0; + ieee80211_sta_set_rx_nss(link_sta); + /* Do not allow an nss change to rx_nss greater than max_nss + * negotiated and capped to APs capability during association. + */ + if (nss <= link_sta->pub->rx_nss) { + link_sta->pub->rx_nss = nss; + sta_opmode.rx_nss = nss; + changed |= IEEE80211_RC_NSS_CHANGED; + sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED; + } else { + link_sta->pub->rx_nss = cur_nss; + pr_warn_ratelimited("Ignoring NSS change in VHT Operating Mode Notification from %pM with invalid nss %d", + link_sta->pub->addr, nss); + } } switch (opmode & IEEE80211_OPMODE_NOTIF_CHANWIDTH_MASK) { -- 2.45.2

1 year, 1 month

1
1
0 0

[PATCH v3 0/3] ALSA: firewire-lib: restore process context workqueue to prevent deadlock

by Edmund Raile

This patchset serves to prevent an AB/BA deadlock: thread 0: * (lock A) acquire substream lock by snd_pcm_stream_lock_irq() in snd_pcm_status64() * (lock B) wait for tasklet to finish by calling tasklet_unlock_spin_wait() in tasklet_disable_in_atomic() in ohci_flush_iso_completions() of ohci.c thread 1: * (lock B) enter tasklet * (lock A) attempt to acquire substream lock, waiting for it to be released: snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed() in update_pcm_pointers() in process_ctx_payloads() in process_rx_packets() of amdtp-stream.c ? tasklet_unlock_spin_wait </NMI> <TASK> ohci_flush_iso_completions firewire_ohci amdtp_domain_stream_pcm_pointer snd_firewire_lib snd_pcm_update_hw_ptr0 snd_pcm snd_pcm_status64 snd_pcm ? native_queued_spin_lock_slowpath </NMI> <IRQ> _raw_spin_lock_irqsave snd_pcm_period_elapsed snd_pcm process_rx_packets snd_firewire_lib irq_target_callback snd_firewire_lib handle_it_packet firewire_ohci context_tasklet firewire_ohci The issue has been reported as a regression of kernel 5.14: Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7… ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with Fireface 800") Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") removed the process context workqueue from amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove its overhead. Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update") belongs to the same patch series and removed the now-unused workqueue entirely. Though being observed on RME Fireface 800, this issue would affect all Firewire audio interfaces using ohci amdtp + pcm streaming. ALSA streaming, especially under intensive CPU load will reveal this issue the soonest due to issuing more hardIRQs, with time to occurrence ranging from 2 secons to 30 minutes after starting playback. to reproduce the issue: direct ALSA playback to the device: mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac Time to occurrence: 2s to 30m Likelihood increased by: - high CPU load stress --cpu $(nproc) - switching between applications via workspaces tested with i915 in Xfce PulsaAudio / PipeWire conceal the issue as they run PCM substream without period wakeup mode, issuing less hardIRQs. Cc: stable(a)vger.kernel.org Backport note: Also applies to and fixes on (tested): 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164 Edmund Raile (3): Revert "ALSA: firewire-lib: obsolete workqueue for period update" Revert "ALSA: firewire-lib: operate for period elapse event in process context" ALSA: firewire-lib: amdtp-stream work queue inline description sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++------------- sound/firewire/amdtp-stream.h | 1 + 2 files changed, 25 insertions(+), 14 deletions(-) -- 2.45.2

1 year, 1 month

1
3
0 0

[PATCH v3 0/3] ALSA: firewire-lib: restore process context workqueue to prevent deadlock

by Edmund Raile

This patchset serves to prevent an AB/BA deadlock: thread 0: * (lock A) acquire substream lock by snd_pcm_stream_lock_irq() in snd_pcm_status64() * (lock B) wait for tasklet to finish by calling tasklet_unlock_spin_wait() in tasklet_disable_in_atomic() in ohci_flush_iso_completions() of ohci.c thread 1: * (lock B) enter tasklet * (lock A) attempt to acquire substream lock, waiting for it to be released: snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed() in update_pcm_pointers() in process_ctx_payloads() in process_rx_packets() of amdtp-stream.c ? tasklet_unlock_spin_wait </NMI> <TASK> ohci_flush_iso_completions firewire_ohci amdtp_domain_stream_pcm_pointer snd_firewire_lib snd_pcm_update_hw_ptr0 snd_pcm snd_pcm_status64 snd_pcm ? native_queued_spin_lock_slowpath </NMI> <IRQ> _raw_spin_lock_irqsave snd_pcm_period_elapsed snd_pcm process_rx_packets snd_firewire_lib irq_target_callback snd_firewire_lib handle_it_packet firewire_ohci context_tasklet firewire_ohci The issue has been reported as a regression of kernel 5.14: Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7… ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with Fireface 800") Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") removed the process context workqueue from amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove its overhead. Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update") belongs to the same patch series and removed the now-unused workqueue entirely. Though being observed on RME Fireface 800, this issue would affect all Firewire audio interfaces using ohci amdtp + pcm streaming. ALSA streaming, especially under intensive CPU load will reveal this issue the soonest due to issuing more hardIRQs, with time to occurrence ranging from 2 secons to 30 minutes after starting playback. to reproduce the issue: direct ALSA playback to the device: mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac Time to occurrence: 2s to 30m Likelihood increased by: - high CPU load stress --cpu $(nproc) - switching between applications via workspaces tested with i915 in Xfce PulsaAudio / PipeWire conceal the issue as they run PCM substream without period wakeup mode, issuing less hardIRQs. Cc: stable(a)vger.kernel.org Backport note: Also applies to and fixes on (tested): 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164 Edmund Raile (3): Revert "ALSA: firewire-lib: obsolete workqueue for period update" Revert "ALSA: firewire-lib: operate for period elapse event in process context" ALSA: firewire-lib: amdtp-stream work queue inline description sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++------------- sound/firewire/amdtp-stream.h | 1 + 2 files changed, 25 insertions(+), 14 deletions(-) -- 2.45.2

1 year, 1 month

1
3
0 0

FAILED: patch "[PATCH] fuse: verify {g,u}id mount options correctly" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 525bd65aa759ec320af1dc06e114ed69733e9e23 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072914-sierra-aflutter-e231@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: 525bd65aa759 ("fuse: verify {g,u}id mount options correctly") 84c215075b57 ("fuse: name fs_context consistently") 1dd539577c42 ("virtiofs: add a mount option to enable dax") f4fd4ae354ba ("virtiofs: get rid of no_mount_options") b330966f79fb ("fuse: reject options on reconfigure via fsconfig(2)") e8b20a474cf2 ("fuse: ignore 'data' argument of mount(..., MS_REMOUNT)") 0189a2d367f4 ("fuse: use ->reconfigure() instead of ->remount_fs()") 7fd3abfa8dd7 ("virtiofs: do not use fuse_fill_super_common() for device installation") c9d35ee049b4 ("Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 525bd65aa759ec320af1dc06e114ed69733e9e23 Mon Sep 17 00:00:00 2001 From: Eric Sandeen <sandeen(a)redhat.com> Date: Tue, 2 Jul 2024 17:22:41 -0500 Subject: [PATCH] fuse: verify {g,u}id mount options correctly As was done in 0200679fc795 ("tmpfs: verify {g,u}id mount options correctly") we need to validate that the requested uid and/or gid is representable in the filesystem's idmapping. Cribbing from the above commit log, The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, fuse has been doing the correct thing. But since fuse is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock. Fixes: c30da2e981a7 ("fuse: convert to use the new mount API") Cc: stable(a)vger.kernel.org # 5.4+ Signed-off-by: Eric Sandeen <sandeen(a)redhat.com> Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com Reviewed-by: Christian Brauner <brauner(a)kernel.org> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Signed-off-by: Christian Brauner <brauner(a)kernel.org> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 99e44ea7d875..32fe6fa72f46 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -755,6 +755,8 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) struct fs_parse_result result; struct fuse_fs_context *ctx = fsc->fs_private; int opt; + kuid_t kuid; + kgid_t kgid; if (fsc->purpose == FS_CONTEXT_FOR_RECONFIGURE) { /* @@ -799,16 +801,30 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) break; case OPT_USER_ID: - ctx->user_id = make_kuid(fsc->user_ns, result.uint_32); - if (!uid_valid(ctx->user_id)) + kuid = make_kuid(fsc->user_ns, result.uint_32); + if (!uid_valid(kuid)) return invalfc(fsc, "Invalid user_id"); + /* + * The requested uid must be representable in the + * filesystem's idmapping. + */ + if (!kuid_has_mapping(fsc->user_ns, kuid)) + return invalfc(fsc, "Invalid user_id"); + ctx->user_id = kuid; ctx->user_id_present = true; break; case OPT_GROUP_ID: - ctx->group_id = make_kgid(fsc->user_ns, result.uint_32); - if (!gid_valid(ctx->group_id)) + kgid = make_kgid(fsc->user_ns, result.uint_32);; + if (!gid_valid(kgid)) return invalfc(fsc, "Invalid group_id"); + /* + * The requested gid must be representable in the + * filesystem's idmapping. + */ + if (!kgid_has_mapping(fsc->user_ns, kgid)) + return invalfc(fsc, "Invalid group_id"); + ctx->group_id = kgid; ctx->group_id_present = true; break;

1 year, 1 month

2
1
0 0

[PATCH] arm64: dts: rockchip: Correct the Pinebook Pro battery design capacity

by Dragan Simic

All batches of the Pine64 Pinebook Pro, except the latest batch (as of 2024) whose hardware design was revised due to the component shortage, use a 1S lithium battery whose nominal/design capacity is 10,000 mAh, according to the battery datasheet. [1][2] Let's correct the design full-charge value in the Pinebook Pro board dts, to improve the accuracy of the hardware description, and to hopefully improve the accuracy of the fuel gauge a bit on all units that don't belong to the latest batch. The above-mentioned latest batch uses a different 1S lithium battery with a slightly lower capacity, more precisely 9,600 mAh. To make the fuel gauge work reliably on the latest batch, a sample battery would need to be sent to CellWise, to obtain its proprietary battery profile, whose data goes into "cellwise,battery-profile" in the Pinebook Pro board dts. Without that data, the fuel gauge reportedly works unreliably, so changing the design capacity won't have any negative effects on the already unreliable operation of the fuel gauge in the Pinebook Pros that belong to the latest batch. According to the battery datasheet, its voltage can go as low as 2.75 V while discharging, but it's better to leave the current 3.0 V value in the dts file, because of the associated Pinebook Pro's voltage regulation issues. [1] https://wiki.pine64.org/index.php/Pinebook_Pro#Battery [2] https://files.pine64.org/doc/datasheet/pinebook/40110175P%203.8V%2010000mAh… Fixes: c7c4d698cd28 ("arm64: dts: rockchip: add fuel gauge to Pinebook Pro dts") Cc: stable(a)vger.kernel.org Cc: Marek Kraus <gamiee(a)pine64.org> Signed-off-by: Dragan Simic <dsimic(a)manjaro.org> --- arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts index 294eb2de263d..c918968714e4 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts @@ -37,7 +37,7 @@ backlight: edp-backlight { bat: battery { compatible = "simple-battery"; - charge-full-design-microamp-hours = <9800000>; + charge-full-design-microamp-hours = <10000000>; voltage-max-design-microvolt = <4350000>; voltage-min-design-microvolt = <3000000>; };

1 year, 1 month

2
1
0 0

Re: Patch "drm: zynqmp_dpsub: Always register bridge" has been added to the 6.9-stable tree

by Sean Anderson

Hi Sasha, Please also pick [1] when it is applied. --Sean [1] https://lore.kernel.org/all/974d1b062d7c61ee6db00d16fa7c69aa1218ee02.171619… On 6/3/24 07:46, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > drm: zynqmp_dpsub: Always register bridge > > to the 6.9-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > drm-zynqmp_dpsub-always-register-bridge.patch > and it can be found in the queue-6.9 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. > > > > commit d6817dc61b3b6a1e4f098b25109e7f2ecaaf0ee8 > Author: Sean Anderson <sean.anderson(a)linux.dev> > Date: Fri Mar 8 15:47:41 2024 -0500 > > drm: zynqmp_dpsub: Always register bridge > > [ Upstream commit be3f3042391d061cfca2bd22630e0d101acea5fc ] > > We must always register the DRM bridge, since zynqmp_dp_hpd_work_func > calls drm_bridge_hpd_notify, which in turn expects hpd_mutex to be > initialized. We do this before zynqmp_dpsub_drm_init since that calls > drm_bridge_attach. This fixes the following lockdep warning: > > [ 19.217084] ------------[ cut here ]------------ > [ 19.227530] DEBUG_LOCKS_WARN_ON(lock->magic != lock) > [ 19.227768] WARNING: CPU: 0 PID: 140 at kernel/locking/mutex.c:582 __mutex_lock+0x4bc/0x550 > [ 19.241696] Modules linked in: > [ 19.244937] CPU: 0 PID: 140 Comm: kworker/0:4 Not tainted 6.6.20+ #96 > [ 19.252046] Hardware name: xlnx,zynqmp (DT) > [ 19.256421] Workqueue: events zynqmp_dp_hpd_work_func > [ 19.261795] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 19.269104] pc : __mutex_lock+0x4bc/0x550 > [ 19.273364] lr : __mutex_lock+0x4bc/0x550 > [ 19.277592] sp : ffffffc085c5bbe0 > [ 19.281066] x29: ffffffc085c5bbe0 x28: 0000000000000000 x27: ffffff88009417f8 > [ 19.288624] x26: ffffff8800941788 x25: ffffff8800020008 x24: ffffffc082aa3000 > [ 19.296227] x23: ffffffc080d90e3c x22: 0000000000000002 x21: 0000000000000000 > [ 19.303744] x20: 0000000000000000 x19: ffffff88002f5210 x18: 0000000000000000 > [ 19.311295] x17: 6c707369642e3030 x16: 3030613464662072 x15: 0720072007200720 > [ 19.318922] x14: 0000000000000000 x13: 284e4f5f4e524157 x12: 0000000000000001 > [ 19.326442] x11: 0001ffc085c5b940 x10: 0001ff88003f388b x9 : 0001ff88003f3888 > [ 19.334003] x8 : 0001ff88003f3888 x7 : 0000000000000000 x6 : 0000000000000000 > [ 19.341537] x5 : 0000000000000000 x4 : 0000000000001668 x3 : 0000000000000000 > [ 19.349054] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff88003f3880 > [ 19.356581] Call trace: > [ 19.359160] __mutex_lock+0x4bc/0x550 > [ 19.363032] mutex_lock_nested+0x24/0x30 > [ 19.367187] drm_bridge_hpd_notify+0x2c/0x6c > [ 19.371698] zynqmp_dp_hpd_work_func+0x44/0x54 > [ 19.376364] process_one_work+0x3ac/0x988 > [ 19.380660] worker_thread+0x398/0x694 > [ 19.384736] kthread+0x1bc/0x1c0 > [ 19.388241] ret_from_fork+0x10/0x20 > [ 19.392031] irq event stamp: 183 > [ 19.395450] hardirqs last enabled at (183): [<ffffffc0800b9278>] finish_task_switch.isra.0+0xa8/0x2d4 > [ 19.405140] hardirqs last disabled at (182): [<ffffffc081ad3754>] __schedule+0x714/0xd04 > [ 19.413612] softirqs last enabled at (114): [<ffffffc080133de8>] srcu_invoke_callbacks+0x158/0x23c > [ 19.423128] softirqs last disabled at (110): [<ffffffc080133de8>] srcu_invoke_callbacks+0x158/0x23c > [ 19.432614] ---[ end trace 0000000000000000 ]--- > > Fixes: eb2d64bfcc17 ("drm: xlnx: zynqmp_dpsub: Report HPD through the bridge") > Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev> > Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com> > Reviewed-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com> > Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com> > Link: https://patchwork.freedesktop.org/patch/msgid/20240308204741.3631919-1-sean… > (cherry picked from commit 61ba791c4a7a09a370c45b70a81b8c7d4cf6b2ae) > Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/drivers/gpu/drm/xlnx/zynqmp_dpsub.c b/drivers/gpu/drm/xlnx/zynqmp_dpsub.c > index 88eb33acd5f0d..face8d6b2a6fb 100644 > --- a/drivers/gpu/drm/xlnx/zynqmp_dpsub.c > +++ b/drivers/gpu/drm/xlnx/zynqmp_dpsub.c > @@ -256,12 +256,12 @@ static int zynqmp_dpsub_probe(struct platform_device *pdev) > if (ret) > goto err_dp; > > + drm_bridge_add(dpsub->bridge); > + > if (dpsub->dma_enabled) { > ret = zynqmp_dpsub_drm_init(dpsub); > if (ret) > goto err_disp; > - } else { > - drm_bridge_add(dpsub->bridge); > } > > dev_info(&pdev->dev, "ZynqMP DisplayPort Subsystem driver probed"); > @@ -288,9 +288,8 @@ static void zynqmp_dpsub_remove(struct platform_device *pdev) > > if (dpsub->drm) > zynqmp_dpsub_drm_cleanup(dpsub); > - else > - drm_bridge_remove(dpsub->bridge); > > + drm_bridge_remove(dpsub->bridge); > zynqmp_disp_remove(dpsub); > zynqmp_dp_remove(dpsub); >

1 year, 1 month

2
2
0 0

FAILED: patch "[PATCH] fuse: verify {g,u}id mount options correctly" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 525bd65aa759ec320af1dc06e114ed69733e9e23 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072908-everglade-starved-66da@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: 525bd65aa759 ("fuse: verify {g,u}id mount options correctly") 84c215075b57 ("fuse: name fs_context consistently") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 525bd65aa759ec320af1dc06e114ed69733e9e23 Mon Sep 17 00:00:00 2001 From: Eric Sandeen <sandeen(a)redhat.com> Date: Tue, 2 Jul 2024 17:22:41 -0500 Subject: [PATCH] fuse: verify {g,u}id mount options correctly As was done in 0200679fc795 ("tmpfs: verify {g,u}id mount options correctly") we need to validate that the requested uid and/or gid is representable in the filesystem's idmapping. Cribbing from the above commit log, The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, fuse has been doing the correct thing. But since fuse is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock. Fixes: c30da2e981a7 ("fuse: convert to use the new mount API") Cc: stable(a)vger.kernel.org # 5.4+ Signed-off-by: Eric Sandeen <sandeen(a)redhat.com> Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com Reviewed-by: Christian Brauner <brauner(a)kernel.org> Reviewed-by: Josef Bacik <josef(a)toxicpanda.com> Signed-off-by: Christian Brauner <brauner(a)kernel.org> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 99e44ea7d875..32fe6fa72f46 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -755,6 +755,8 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) struct fs_parse_result result; struct fuse_fs_context *ctx = fsc->fs_private; int opt; + kuid_t kuid; + kgid_t kgid; if (fsc->purpose == FS_CONTEXT_FOR_RECONFIGURE) { /* @@ -799,16 +801,30 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) break; case OPT_USER_ID: - ctx->user_id = make_kuid(fsc->user_ns, result.uint_32); - if (!uid_valid(ctx->user_id)) + kuid = make_kuid(fsc->user_ns, result.uint_32); + if (!uid_valid(kuid)) return invalfc(fsc, "Invalid user_id"); + /* + * The requested uid must be representable in the + * filesystem's idmapping. + */ + if (!kuid_has_mapping(fsc->user_ns, kuid)) + return invalfc(fsc, "Invalid user_id"); + ctx->user_id = kuid; ctx->user_id_present = true; break; case OPT_GROUP_ID: - ctx->group_id = make_kgid(fsc->user_ns, result.uint_32); - if (!gid_valid(ctx->group_id)) + kgid = make_kgid(fsc->user_ns, result.uint_32);; + if (!gid_valid(kgid)) return invalfc(fsc, "Invalid group_id"); + /* + * The requested gid must be representable in the + * filesystem's idmapping. + */ + if (!kgid_has_mapping(fsc->user_ns, kgid)) + return invalfc(fsc, "Invalid group_id"); + ctx->group_id = kgid; ctx->group_id_present = true; break;

1 year, 1 month

2
3
0 0

[PATCH] scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic

by Vamshi Gajjela

The ufshcd_add_delay_before_dme_cmd() always introduces a delay of MIN_DELAY_BEFORE_DME_CMDS_US between DME commands even when it's not required. The delay is added when the UFS host controller supplies the quirk UFSHCD_QUIRK_DELAY_BEFORE_DME_CMDS. Fix the logic to update hba->last_dme_cmd_tstamp to ensure subsequent DME commands have the correct delay in the range of 0 to MIN_DELAY_BEFORE_DME_CMDS_US. Update the timestamp at the end of the function to ensure it captures the latest time after any necessary delay has been applied. Signed-off-by: Vamshi Gajjela <vamshigajjela(a)google.com> Fixes: cad2e03d8607 ("ufs: add support to allow non standard behaviours (quirks)") Cc: stable(a)vger.kernel.org --- drivers/ufs/core/ufshcd.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index dc757ba47522..406bda1585f6 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -4090,11 +4090,16 @@ static inline void ufshcd_add_delay_before_dme_cmd(struct ufs_hba *hba) min_sleep_time_us = MIN_DELAY_BEFORE_DME_CMDS_US - delta; else - return; /* no more delay required */ + min_sleep_time_us = 0; /* no more delay required */ } - /* allow sleep for extra 50us if needed */ - usleep_range(min_sleep_time_us, min_sleep_time_us + 50); + if (min_sleep_time_us > 0) { + /* allow sleep for extra 50us if needed */ + usleep_range(min_sleep_time_us, min_sleep_time_us + 50); + } + + /* update the last_dme_cmd_tstamp */ + hba->last_dme_cmd_tstamp = ktime_get(); } /** -- 2.45.2.1089.g2a221341d9-goog

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] io_uring: fix lost getsockopt completions" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 24dce1c538a7ceac43f2f97aae8dfd4bb93ea9b9 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072931-sarcastic-coagulant-6821@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: 24dce1c538a7 ("io_uring: fix lost getsockopt completions") d10f19dff56e ("io_uring/uring_cmd: switch to always allocating async data") a9165b83c193 ("io_uring/rw: always setup io_async_rw for read/write requests") 8e5b3b89ecaf ("io_uring: remove struct io_tw_state::locked") 92219afb980e ("io_uring: force tw ctx locking") 6e6b8c62120a ("io_uring/rw: avoid punting to io-wq directly") e1eef2e56cb0 ("io_uring/cmd: fix tw <-> issue_flags conversion") 6edd953b6ec7 ("io_uring/cmd: kill one issue_flags to tw conversion") da12d9ab5889 ("io_uring/cmd: move io_uring_try_cancel_uring_cmd()") 8d0c12a80cde ("io-uring: add napi busy poll support") 405b4dc14b10 ("io-uring: move io_wait_queue definition to header file") da08d2edb020 ("io_uring: re-arrange struct io_ring_ctx to reduce padding") 42c0905f0cac ("io_uring: cleanup handle_tw_list() calling convention") 9fe3eaea4a35 ("io_uring: remove unconditional looping in local task_work handling") 670d9d3df880 ("io_uring: remove next io_kiocb fetch in task_work running") 170539bdf109 ("io_uring: handle traditional task_work in FIFO order") 592b4805432a ("io_uring: remove looping around handling traditional task_work") 95041b93e90a ("io_uring: add io_file_can_poll() helper") 521223d7c229 ("io_uring/cancel: don't default to setting req->work.cancel_seq") 4bcb982cce74 ("io_uring: expand main struct io_kiocb flags to 64-bits") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 24dce1c538a7ceac43f2f97aae8dfd4bb93ea9b9 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Tue, 16 Jul 2024 19:05:46 +0100 Subject: [PATCH] io_uring: fix lost getsockopt completions There is a report that iowq executed getsockopt never completes. The reason being that io_uring_cmd_sock() can return a positive result, and io_uring_cmd() propagates it back to core io_uring, instead of IOU_OK. In case of io_wq_submit_work(), the request will be dropped without completing it. The offending code was introduced by a hack in a9c3eda7eada9 ("io_uring: fix submission-failure handling for uring-cmd"), however it was fine until getsockopt was introduced and started returning positive results. The right solution is to always return IOU_OK, since e0b23d9953b0c ("io_uring: optimise ltimeout for inline execution"), we should be able to do it without problems, however for the sake of backporting and minimising side effects, let's keep returning negative return codes and otherwise do IOU_OK. Link: https://github.com/axboe/liburing/issues/1181 Cc: stable(a)vger.kernel.org Fixes: 8e9fad0e70b7b ("io_uring: Add io_uring command support for sockets") Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Reviewed-by: Breno Leitao <leitao(a)debian.org> Link: https://lore.kernel.org/r/ff349cf0654018189b6077e85feed935f0f8839e.17211498… Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 21ac5fb2d5f0..a54163a83968 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -265,7 +265,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags) req_set_fail(req); io_req_uring_cleanup(req, issue_flags); io_req_set_res(req, ret, 0); - return ret; + return ret < 0 ? ret : IOU_OK; } int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,

1 year, 1 month

3
2
0 0

FAILED: patch "[PATCH] mm: gup: stop abusing try_grab_folio" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x f442fa6141379a20b48ae3efabee827a3d260787 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071541-observing-landline-c9ec@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: f442fa614137 ("mm: gup: stop abusing try_grab_folio") 01d89b93e176 ("mm/gup: fix hugepd handling in hugetlb rework") 9cbe4954c6d9 ("gup: use folios for gup_devmap") 53e45c4f6d4f ("mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs()") 6785c54a1b43 ("mm: remove put_devmap_managed_page()") 25176ad09ca3 ("mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST") 23babe1934d7 ("mm/gup: consistently name GUP-fast functions") a12083d721d7 ("mm/gup: handle hugepd for follow_page()") 4418c522f683 ("mm/gup: handle huge pmd for follow_pmd_mask()") 1b1676180246 ("mm/gup: handle huge pud for follow_pud_mask()") caf8cab79857 ("mm/gup: cache *pudp in follow_pud_mask()") 878b0c451621 ("mm/gup: handle hugetlb for no_page_table()") f3c94c625fe3 ("mm/gup: refactor record_subpages() to find 1st small page") 607c63195d63 ("mm/gup: drop gup_fast_folio_allowed() in hugepd processing") f002882ca369 ("mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed()") 1965e933ddeb ("mm/treewide: replace pXd_huge() with pXd_leaf()") 7db86dc389aa ("mm/gup: merge pXd huge mapping checks") 089f92141ed0 ("mm/gup: check p4d presence before going on") e6fd5564c07c ("mm/gup: cache p4d in follow_p4d_mask()") 65291dcfcf89 ("mm/secretmem: fix GUP-fast succeeding on secretmem folios") thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f442fa6141379a20b48ae3efabee827a3d260787 Mon Sep 17 00:00:00 2001 From: Yang Shi <yang(a)os.amperecomputing.com> Date: Fri, 28 Jun 2024 12:14:58 -0700 Subject: [PATCH] mm: gup: stop abusing try_grab_folio A kernel warning was reported when pinning folio in CMA memory when launching SEV virtual machine. The splat looks like: [ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520 [ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6 [ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520 [ 464.325515] Call Trace: [ 464.325520] <TASK> [ 464.325523] ? __get_user_pages+0x423/0x520 [ 464.325528] ? __warn+0x81/0x130 [ 464.325536] ? __get_user_pages+0x423/0x520 [ 464.325541] ? report_bug+0x171/0x1a0 [ 464.325549] ? handle_bug+0x3c/0x70 [ 464.325554] ? exc_invalid_op+0x17/0x70 [ 464.325558] ? asm_exc_invalid_op+0x1a/0x20 [ 464.325567] ? __get_user_pages+0x423/0x520 [ 464.325575] __gup_longterm_locked+0x212/0x7a0 [ 464.325583] internal_get_user_pages_fast+0xfb/0x190 [ 464.325590] pin_user_pages_fast+0x47/0x60 [ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd] [ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd] Per the analysis done by yangge, when starting the SEV virtual machine, it will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory. But the page is in CMA area, so fast GUP will fail then fallback to the slow path due to the longterm pinnalbe check in try_grab_folio(). The slow path will try to pin the pages then migrate them out of CMA area. But the slow path also uses try_grab_folio() to pin the page, it will also fail due to the same check then the above warning is triggered. In addition, the try_grab_folio() is supposed to be used in fast path and it elevates folio refcount by using add ref unless zero. We are guaranteed to have at least one stable reference in slow path, so the simple atomic add could be used. The performance difference should be trivial, but the misuse may be confusing and misleading. Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page() to try_grab_folio(), and use them in the proper paths. This solves both the abuse and the kernel warning. The proper naming makes their usecase more clear and should prevent from abusing in the future. peterx said: : The user will see the pin fails, for gpu-slow it further triggers the WARN : right below that failure (as in the original report): : : folio = try_grab_folio(page, page_increm - 1, : foll_flags); : if (WARN_ON_ONCE(!folio)) { <------------------------ here : /* : * Release the 1st page ref if the : * folio is problematic, fail hard. : */ : gup_put_folio(page_folio(page), 1, : foll_flags); : ret = -EFAULT; : goto out; : } [1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11… [shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast] Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj… Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.… Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com> Reported-by: yangge <yangge1116(a)126.com> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: David Hildenbrand <david(a)redhat.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: <stable(a)vger.kernel.org> [6.6+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/gup.c b/mm/gup.c index 469799f805f1..f1d6bc06eb52 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs) return folio; } -/** - * try_grab_folio() - Attempt to get or pin a folio. - * @page: pointer to page to be grabbed - * @refs: the value to (effectively) add to the folio's refcount - * @flags: gup flags: these are the FOLL_* flag values. - * - * "grab" names in this file mean, "look at flags to decide whether to use - * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. - * - * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the - * same time. (That's true throughout the get_user_pages*() and - * pin_user_pages*() APIs.) Cases: - * - * FOLL_GET: folio's refcount will be incremented by @refs. - * - * FOLL_PIN on large folios: folio's refcount will be incremented by - * @refs, and its pincount will be incremented by @refs. - * - * FOLL_PIN on single-page folios: folio's refcount will be incremented by - * @refs * GUP_PIN_COUNTING_BIAS. - * - * Return: The folio containing @page (with refcount appropriately - * incremented) for success, or NULL upon failure. If neither FOLL_GET - * nor FOLL_PIN was set, that's considered failure, and furthermore, - * a likely bug in the caller, so a warning is also emitted. - */ -struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags) -{ - struct folio *folio; - - if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0)) - return NULL; - - if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) - return NULL; - - if (flags & FOLL_GET) - return try_get_folio(page, refs); - - /* FOLL_PIN is set */ - - /* - * Don't take a pin on the zero page - it's not going anywhere - * and it is used in a *lot* of places. - */ - if (is_zero_page(page)) - return page_folio(page); - - folio = try_get_folio(page, refs); - if (!folio) - return NULL; - - /* - * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a - * right zone, so fail and let the caller fall back to the slow - * path. - */ - if (unlikely((flags & FOLL_LONGTERM) && - !folio_is_longterm_pinnable(folio))) { - if (!put_devmap_managed_folio_refs(folio, refs)) - folio_put_refs(folio, refs); - return NULL; - } - - /* - * When pinning a large folio, use an exact count to track it. - * - * However, be sure to *also* increment the normal folio - * refcount field at least once, so that the folio really - * is pinned. That's why the refcount from the earlier - * try_get_folio() is left intact. - */ - if (folio_test_large(folio)) - atomic_add(refs, &folio->_pincount); - else - folio_ref_add(folio, - refs * (GUP_PIN_COUNTING_BIAS - 1)); - /* - * Adjust the pincount before re-checking the PTE for changes. - * This is essentially a smp_mb() and is paired with a memory - * barrier in folio_try_share_anon_rmap_*(). - */ - smp_mb__after_atomic(); - - node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); - - return folio; -} - static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) { if (flags & FOLL_PIN) { @@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags) } /** - * try_grab_page() - elevate a page's refcount by a flag-dependent amount - * @page: pointer to page to be grabbed - * @flags: gup flags: these are the FOLL_* flag values. + * try_grab_folio() - add a folio's refcount by a flag-dependent amount + * @folio: pointer to folio to be grabbed + * @refs: the value to (effectively) add to the folio's refcount + * @flags: gup flags: these are the FOLL_* flag values * * This might not do anything at all, depending on the flags argument. * * "grab" names in this file mean, "look at flags to decide whether to use - * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount. + * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. * * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same - * time. Cases: please see the try_grab_folio() documentation, with - * "refs=1". + * time. * * Return: 0 for success, or if no action was required (if neither FOLL_PIN * nor FOLL_GET was set, nothing is done). A negative error code for failure: * - * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not + * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not * be grabbed. + * + * It is called when we have a stable reference for the folio, typically in + * GUP slow path. */ -int __must_check try_grab_page(struct page *page, unsigned int flags) +int __must_check try_grab_folio(struct folio *folio, int refs, + unsigned int flags) { - struct folio *folio = page_folio(page); - if (WARN_ON_ONCE(folio_ref_count(folio) <= 0)) return -ENOMEM; - if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page))) return -EREMOTEIO; if (flags & FOLL_GET) - folio_ref_inc(folio); + folio_ref_add(folio, refs); else if (flags & FOLL_PIN) { /* * Don't take a pin on the zero page - it's not going anywhere * and it is used in a *lot* of places. */ - if (is_zero_page(page)) + if (is_zero_folio(folio)) return 0; /* - * Similar to try_grab_folio(): be sure to *also* - * increment the normal page refcount field at least once, + * Increment the normal page refcount field at least once, * so that the page really is pinned. */ if (folio_test_large(folio)) { - folio_ref_add(folio, 1); - atomic_add(1, &folio->_pincount); + folio_ref_add(folio, refs); + atomic_add(refs, &folio->_pincount); } else { - folio_ref_add(folio, GUP_PIN_COUNTING_BIAS); + folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS); } - node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1); + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); } return 0; @@ -515,6 +427,102 @@ static int record_subpages(struct page *page, unsigned long sz, return nr; } + +/** + * try_grab_folio_fast() - Attempt to get or pin a folio in fast path. + * @page: pointer to page to be grabbed + * @refs: the value to (effectively) add to the folio's refcount + * @flags: gup flags: these are the FOLL_* flag values. + * + * "grab" names in this file mean, "look at flags to decide whether to use + * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount. + * + * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the + * same time. (That's true throughout the get_user_pages*() and + * pin_user_pages*() APIs.) Cases: + * + * FOLL_GET: folio's refcount will be incremented by @refs. + * + * FOLL_PIN on large folios: folio's refcount will be incremented by + * @refs, and its pincount will be incremented by @refs. + * + * FOLL_PIN on single-page folios: folio's refcount will be incremented by + * @refs * GUP_PIN_COUNTING_BIAS. + * + * Return: The folio containing @page (with refcount appropriately + * incremented) for success, or NULL upon failure. If neither FOLL_GET + * nor FOLL_PIN was set, that's considered failure, and furthermore, + * a likely bug in the caller, so a warning is also emitted. + * + * It uses add ref unless zero to elevate the folio refcount and must be called + * in fast path only. + */ +static struct folio *try_grab_folio_fast(struct page *page, int refs, + unsigned int flags) +{ + struct folio *folio; + + /* Raise warn if it is not called in fast GUP */ + VM_WARN_ON_ONCE(!irqs_disabled()); + + if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0)) + return NULL; + + if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page))) + return NULL; + + if (flags & FOLL_GET) + return try_get_folio(page, refs); + + /* FOLL_PIN is set */ + + /* + * Don't take a pin on the zero page - it's not going anywhere + * and it is used in a *lot* of places. + */ + if (is_zero_page(page)) + return page_folio(page); + + folio = try_get_folio(page, refs); + if (!folio) + return NULL; + + /* + * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a + * right zone, so fail and let the caller fall back to the slow + * path. + */ + if (unlikely((flags & FOLL_LONGTERM) && + !folio_is_longterm_pinnable(folio))) { + if (!put_devmap_managed_folio_refs(folio, refs)) + folio_put_refs(folio, refs); + return NULL; + } + + /* + * When pinning a large folio, use an exact count to track it. + * + * However, be sure to *also* increment the normal folio + * refcount field at least once, so that the folio really + * is pinned. That's why the refcount from the earlier + * try_get_folio() is left intact. + */ + if (folio_test_large(folio)) + atomic_add(refs, &folio->_pincount); + else + folio_ref_add(folio, + refs * (GUP_PIN_COUNTING_BIAS - 1)); + /* + * Adjust the pincount before re-checking the PTE for changes. + * This is essentially a smp_mb() and is paired with a memory + * barrier in folio_try_share_anon_rmap_*(). + */ + smp_mb__after_atomic(); + + node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs); + + return folio; +} #endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_GUP_FAST */ #ifdef CONFIG_ARCH_HAS_HUGEPD @@ -535,7 +543,7 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end, */ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz, unsigned long addr, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { unsigned long pte_end; struct page *page; @@ -558,9 +566,15 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz page = pte_page(pte); refs = record_subpages(page, sz, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); - if (!folio) - return 0; + if (fast) { + folio = try_grab_folio_fast(page, refs, flags); + if (!folio) + return 0; + } else { + folio = page_folio(page); + if (try_grab_folio(folio, refs, flags)) + return 0; + } if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) { gup_put_folio(folio, refs, flags); @@ -588,7 +602,7 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, unsigned long addr, unsigned int pdshift, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { pte_t *ptep; unsigned long sz = 1UL << hugepd_shift(hugepd); @@ -598,7 +612,8 @@ static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, ptep = hugepte_offset(hugepd, addr, pdshift); do { next = hugepte_addr_end(addr, end, sz); - ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr); + ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr, + fast); if (ret != 1) return ret; } while (ptep++, addr = next, addr != end); @@ -625,7 +640,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, ptep = hugepte_offset(hugepd, addr, pdshift); ptl = huge_pte_lock(h, vma->vm_mm, ptep); ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE, - flags, &page, &nr); + flags, &page, &nr, false); spin_unlock(ptl); if (ret == 1) { @@ -642,7 +657,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd, unsigned long addr, unsigned int pdshift, unsigned long end, unsigned int flags, - struct page **pages, int *nr) + struct page **pages, int *nr, bool fast) { return 0; } @@ -729,7 +744,7 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma, gup_must_unshare(vma, flags, page)) return ERR_PTR(-EMLINK); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) page = ERR_PTR(ret); else @@ -806,7 +821,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma, VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) && !PageAnonExclusive(page), page); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) return ERR_PTR(ret); @@ -968,8 +983,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) && !PageAnonExclusive(page), page); - /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */ - ret = try_grab_page(page, flags); + /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */ + ret = try_grab_folio(page_folio(page), 1, flags); if (unlikely(ret)) { page = ERR_PTR(ret); goto out; @@ -1233,7 +1248,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address, goto unmap; *page = pte_page(entry); } - ret = try_grab_page(*page, gup_flags); + ret = try_grab_folio(page_folio(*page), 1, gup_flags); if (unlikely(ret)) goto unmap; out: @@ -1636,20 +1651,19 @@ static long __get_user_pages(struct mm_struct *mm, * pages. */ if (page_increm > 1) { - struct folio *folio; + struct folio *folio = page_folio(page); /* * Since we already hold refcount on the * large folio, this should never fail. */ - folio = try_grab_folio(page, page_increm - 1, - foll_flags); - if (WARN_ON_ONCE(!folio)) { + if (try_grab_folio(folio, page_increm - 1, + foll_flags)) { /* * Release the 1st page ref if the * folio is problematic, fail hard. */ - gup_put_folio(page_folio(page), 1, + gup_put_folio(folio, 1, foll_flags); ret = -EFAULT; goto out; @@ -2797,7 +2811,6 @@ EXPORT_SYMBOL(get_user_pages_unlocked); * This code is based heavily on the PowerPC implementation by Nick Piggin. */ #ifdef CONFIG_HAVE_GUP_FAST - /* * Used in the GUP-fast path to determine whether GUP is permitted to work on * a specific folio. @@ -2962,7 +2975,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); - folio = try_grab_folio(page, 1, flags); + folio = try_grab_folio_fast(page, 1, flags); if (!folio) goto pte_unmap; @@ -3049,7 +3062,7 @@ static int gup_fast_devmap_leaf(unsigned long pfn, unsigned long addr, break; } - folio = try_grab_folio(page, 1, flags); + folio = try_grab_folio_fast(page, 1, flags); if (!folio) { gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages); break; @@ -3138,7 +3151,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr, page = pmd_page(orig); refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3182,7 +3195,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr, page = pud_page(orig); refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3222,7 +3235,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr, page = pgd_page(orig); refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr); - folio = try_grab_folio(page, refs, flags); + folio = try_grab_folio_fast(page, refs, flags); if (!folio) return 0; @@ -3276,7 +3289,8 @@ static int gup_fast_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, * pmd format and THP pmd format */ if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr, - PMD_SHIFT, next, flags, pages, nr) != 1) + PMD_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags, pages, nr)) @@ -3306,7 +3320,8 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr, return 0; } else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) { if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr, - PUD_SHIFT, next, flags, pages, nr) != 1) + PUD_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags, pages, nr)) @@ -3333,7 +3348,8 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr, BUILD_BUG_ON(p4d_leaf(p4d)); if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) { if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr, - P4D_SHIFT, next, flags, pages, nr) != 1) + P4D_SHIFT, next, flags, pages, nr, + true) != 1) return 0; } else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags, pages, nr)) @@ -3362,7 +3378,8 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end, return; } else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) { if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr, - PGDIR_SHIFT, next, flags, pages, nr) != 1) + PGDIR_SHIFT, next, flags, pages, nr, + true) != 1) return; } else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags, pages, nr)) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index db7946a0a28c..2120f7478e55 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1331,7 +1331,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, if (!*pgmap) return ERR_PTR(-EFAULT); page = pfn_to_page(pfn); - ret = try_grab_page(page, flags); + ret = try_grab_folio(page_folio(page), 1, flags); if (ret) page = ERR_PTR(ret); diff --git a/mm/internal.h b/mm/internal.h index 6902b7dd8509..cc2c5e07fad3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1182,8 +1182,8 @@ int migrate_device_coherent_page(struct page *page); /* * mm/gup.c */ -struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags); -int __must_check try_grab_page(struct page *page, unsigned int flags); +int __must_check try_grab_folio(struct folio *folio, int refs, + unsigned int flags); /* * mm/huge_memory.c

1 year, 1 month

3
2
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror July 2024