January 2022 - Linux-stable-mirror

[PATCH v1] drivers/base/memory: add memory block to memory group after registration succeeded

by David Hildenbrand

If register_memory() fails, we freed the memory block but already added the memory block to the group list, not good. Let's defer adding the block to the memory group to after registering the memory block device. We do handle it properly during unregister_memory(), but that's not called when the registration fails. Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks") Cc: stable(a)vger.kernel.org # v5.15+ Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael(a)kernel.org> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Oscar Salvador <osalvador(a)suse.de> Signed-off-by: David Hildenbrand <david(a)redhat.com> --- drivers/base/memory.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 365cd4a7f239..60c38f9cf1a7 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -663,14 +663,16 @@ static int init_memory_block(unsigned long block_id, unsigned long state, mem->nr_vmemmap_pages = nr_vmemmap_pages; INIT_LIST_HEAD(&mem->group_next); + ret = register_memory(mem); + if (ret) + return ret; + if (group) { mem->group = group; list_add(&mem->group_next, &group->memory_blocks); } - ret = register_memory(mem); - - return ret; + return 0; } static int add_memory_block(unsigned long base_section_nr) -- 2.34.1

3 years, 9 months

4
5
0 0

[PATCH] ALSA: usb-audio: Correct quirk for VF0770

by Jonas Hahnfeld

This device provides both audio and video. The original quirk added in commit 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770") used USB_DEVICE to match the vendor and product ID. Depending on module order, if snd-usb-audio was asked first, it would match the entire device and uvcvideo wouldn't get to see it. Change the matching to USB_AUDIO_DEVICE to restore uvcvideo matching in all cases. Fixes: 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770") Reported-by: Jukka Heikintalo <heikintalo.jukka(a)gmail.com> Tested-by: Jukka Heikintalo <heikintalo.jukka(a)gmail.com> Reported-by: Paweł Susicki <pawel.susicki(a)gmail.com> Tested-by: Paweł Susicki <pawel.susicki(a)gmail.com> Cc: <stable(a)vger.kernel.org> # 5.4, 5.10, 5.14, 5.15 Signed-off-by: Jonas Hahnfeld <hahnjo(a)hahnjo.de> --- sound/usb/quirks-table.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h index b1522e43173e..0ea39565e623 100644 --- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -84,7 +84,7 @@ * combination. */ { - USB_DEVICE(0x041e, 0x4095), + USB_AUDIO_DEVICE(0x041e, 0x4095), .driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) { .ifnum = QUIRK_ANY_INTERFACE, .type = QUIRK_COMPOSITE, -- 2.35.1

3 years, 9 months

2
1
0 0

[PATCH] ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback

by Jan Kiszka

From: Georgi Valkov <gvalkov(a)abv.bg> When rx_buf is allocated we need to account for IPHETH_IP_ALIGN, which reduces the usable size by 2 bytes. Otherwise we have 1512 bytes usable instead of 1514, and if we receive more than 1512 bytes, ipheth_rcvbulk_callback is called with status -EOVERFLOW, after which the driver malfunctiones and all communication stops. Resolves ipheth 2-1:4.2: ipheth_rcvbulk_callback: urb status: -75 Fixes: f33d9e2b48a3 ("usbnet: ipheth: fix connectivity with iOS 14") Signed-off-by: Georgi Valkov <gvalkov(a)abv.bg> Tested-by: Jan Kiszka <jan.kiszka(a)siemens.com> --- drivers/net/usb/ipheth.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/usb/ipheth.c b/drivers/net/usb/ipheth.c index cd33955df0b6..6a769df0b421 100644 --- a/drivers/net/usb/ipheth.c +++ b/drivers/net/usb/ipheth.c @@ -121,7 +121,7 @@ static int ipheth_alloc_urbs(struct ipheth_device *iphone) if (tx_buf == NULL) goto free_rx_urb; - rx_buf = usb_alloc_coherent(iphone->udev, IPHETH_BUF_SIZE, + rx_buf = usb_alloc_coherent(iphone->udev, IPHETH_BUF_SIZE + IPHETH_IP_ALIGN, GFP_KERNEL, &rx_urb->transfer_dma); if (rx_buf == NULL) goto free_tx_buf; @@ -146,7 +146,7 @@ static int ipheth_alloc_urbs(struct ipheth_device *iphone) static void ipheth_free_urbs(struct ipheth_device *iphone) { - usb_free_coherent(iphone->udev, IPHETH_BUF_SIZE, iphone->rx_buf, + usb_free_coherent(iphone->udev, IPHETH_BUF_SIZE + IPHETH_IP_ALIGN, iphone->rx_buf, iphone->rx_urb->transfer_dma); usb_free_coherent(iphone->udev, IPHETH_BUF_SIZE, iphone->tx_buf, iphone->tx_urb->transfer_dma); @@ -317,7 +317,7 @@ static int ipheth_rx_submit(struct ipheth_device *dev, gfp_t mem_flags) usb_fill_bulk_urb(dev->rx_urb, udev, usb_rcvbulkpipe(udev, dev->bulk_in), - dev->rx_buf, IPHETH_BUF_SIZE, + dev->rx_buf, IPHETH_BUF_SIZE + IPHETH_IP_ALIGN, ipheth_rcvbulk_callback, dev); dev->rx_urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP; -- 2.34.1

3 years, 9 months

2
2
0 0

[PATCH] arm64: dts: qcom: sm8250: Fix MSI IRQ for PCIe1 and PCIe2

by Manivannan Sadhasivam

Fix the MSI IRQ used for PCIe instances 1 and 2. Cc: stable(a)vger.kernel.org Fixes: e53bdfc00977 ("arm64: dts: qcom: sm8250: Add PCIe support") Reported-by: Jordan Crouse <jordan(a)cosmicpenguin.net> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org> --- arch/arm64/boot/dts/qcom/sm8250.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/qcom/sm8250.dtsi b/arch/arm64/boot/dts/qcom/sm8250.dtsi index 6f6129b39c9c..8a3373c110fc 100644 --- a/arch/arm64/boot/dts/qcom/sm8250.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8250.dtsi @@ -1487,7 +1487,7 @@ pcie1: pci@1c08000 { ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>, <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x1fd00000>; - interrupts = <GIC_SPI 306 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 307 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "msi"; #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0x7>; @@ -1593,7 +1593,7 @@ pcie2: pci@1c10000 { ranges = <0x01000000 0x0 0x64200000 0x0 0x64200000 0x0 0x100000>, <0x02000000 0x0 0x64300000 0x0 0x64300000 0x0 0x3d00000>; - interrupts = <GIC_SPI 236 IRQ_TYPE_EDGE_RISING>; + interrupts = <GIC_SPI 243 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "msi"; #interrupt-cells = <1>; interrupt-map-mask = <0 0 0 0x7>; -- 2.25.1

3 years, 9 months

3
2
0 0

[PATCH v4 3/9] brcmfmac: firmware: Do not crash on a NULL board_type

by Hector Martin

This unbreaks support for USB devices, which do not have a board_type to create an alt_path out of and thus were running into a NULL dereference. Fixes: 5ff013914c62 ("brcmfmac: firmware: Allow per-board firmware binaries") Reviewed-by: Arend van Spriel <arend.vanspriel(a)broadcom.com> Reviewed-by: Dmitry Osipenko <digetx(a)gmail.com> Cc: stable(a)vger.kernel.org Signed-off-by: Hector Martin <marcan(a)marcan.st> --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c index 1001c8888bfe..63821856bbe1 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c @@ -599,6 +599,9 @@ static char *brcm_alt_fw_path(const char *path, const char *board_type) char alt_path[BRCMF_FW_NAME_LEN]; char suffix[5]; + if (!board_type) + return NULL; + strscpy(alt_path, path, BRCMF_FW_NAME_LEN); /* At least one character + suffix */ if (strlen(alt_path) < 5) -- 2.33.0

3 years, 9 months

3
5
0 0

FAILED: patch "[PATCH] psi: Fix uaf issue when psi trigger is destroyed while being" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a06247c6804f1a7c86a2e5398a4c1f1db1471848 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan <surenb(a)google.com> Date: Tue, 11 Jan 2022 15:23:09 -0800 Subject: [PATCH] psi: Fix uaf issue when psi trigger is destroyed while being polled With write operation on psi files replacing old trigger with a new one, the lifetime of its waitqueue is totally arbitrary. Overwriting an existing trigger causes its waitqueue to be freed and pending poll() will stumble on trigger->event_wait which was destroyed. Fix this by disallowing to redefine an existing psi trigger. If a write operation is used on a file descriptor with an already existing psi trigger, the operation will fail with EBUSY error. Also bypass a check for psi_disabled in the psi_trigger_destroy as the flag can be flipped after the trigger is created, leading to a memory leak. Fixes: 0e94682b73bf ("psi: introduce psi monitor") Reported-by: syzbot+cdb5dd11c97cc532efad(a)syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Analyzed-by: Eric Biggers <ebiggers(a)kernel.org> Signed-off-by: Suren Baghdasaryan <surenb(a)google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Reviewed-by: Eric Biggers <ebiggers(a)google.com> Acked-by: Johannes Weiner <hannes(a)cmpxchg.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst index f2b3439edcc2..860fe651d645 100644 --- a/Documentation/accounting/psi.rst +++ b/Documentation/accounting/psi.rst @@ -92,7 +92,8 @@ Triggers can be set on more than one psi metric and more than one trigger for the same psi metric can be specified. However for each trigger a separate file descriptor is required to be able to poll it separately from others, therefore for each trigger a separate open() syscall should be made even -when opening the same psi interface file. +when opening the same psi interface file. Write operations to a file descriptor +with an already existing psi trigger will fail with EBUSY. Monitors activate only when system enters stall state for the monitored psi metric and deactivates upon exit from the stall state. While system is diff --git a/include/linux/psi.h b/include/linux/psi.h index a70ca833c6d7..f8ce53bfdb2a 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -33,7 +33,7 @@ void cgroup_move_task(struct task_struct *p, struct css_set *to); struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res); -void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t); +void psi_trigger_destroy(struct psi_trigger *t); __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 516c0fe836fd..1a3cef26d129 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -141,9 +141,6 @@ struct psi_trigger { * events to one per window */ u64 last_event_time; - - /* Refcounting to prevent premature destruction */ - struct kref refcount; }; struct psi_group { diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index b31e1465868a..9d05c3ca2d5e 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3643,6 +3643,12 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf, cgroup_get(cgrp); cgroup_kn_unlock(of->kn); + /* Allow only one trigger per file descriptor */ + if (ctx->psi.trigger) { + cgroup_put(cgrp); + return -EBUSY; + } + psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi; new = psi_trigger_create(psi, buf, nbytes, res); if (IS_ERR(new)) { @@ -3650,8 +3656,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf, return PTR_ERR(new); } - psi_trigger_replace(&ctx->psi.trigger, new); - + smp_store_release(&ctx->psi.trigger, new); cgroup_put(cgrp); return nbytes; @@ -3690,7 +3695,7 @@ static void cgroup_pressure_release(struct kernfs_open_file *of) { struct cgroup_file_ctx *ctx = of->priv; - psi_trigger_replace(&ctx->psi.trigger, NULL); + psi_trigger_destroy(ctx->psi.trigger); } bool cgroup_psi_enabled(void) diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index a679613a7cb7..c137c4d6983e 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1162,7 +1162,6 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, t->event = 0; t->last_event_time = 0; init_waitqueue_head(&t->event_wait); - kref_init(&t->refcount); mutex_lock(&group->trigger_lock); @@ -1191,15 +1190,19 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, return t; } -static void psi_trigger_destroy(struct kref *ref) +void psi_trigger_destroy(struct psi_trigger *t) { - struct psi_trigger *t = container_of(ref, struct psi_trigger, refcount); - struct psi_group *group = t->group; + struct psi_group *group; struct task_struct *task_to_destroy = NULL; - if (static_branch_likely(&psi_disabled)) + /* + * We do not check psi_disabled since it might have been disabled after + * the trigger got created. + */ + if (!t) return; + group = t->group; /* * Wakeup waiters to stop polling. Can happen if cgroup is deleted * from under a polling process. @@ -1235,9 +1238,9 @@ static void psi_trigger_destroy(struct kref *ref) mutex_unlock(&group->trigger_lock); /* - * Wait for both *trigger_ptr from psi_trigger_replace and - * poll_task RCUs to complete their read-side critical sections - * before destroying the trigger and optionally the poll_task + * Wait for psi_schedule_poll_work RCU to complete its read-side + * critical section before destroying the trigger and optionally the + * poll_task. */ synchronize_rcu(); /* @@ -1254,18 +1257,6 @@ static void psi_trigger_destroy(struct kref *ref) kfree(t); } -void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new) -{ - struct psi_trigger *old = *trigger_ptr; - - if (static_branch_likely(&psi_disabled)) - return; - - rcu_assign_pointer(*trigger_ptr, new); - if (old) - kref_put(&old->refcount, psi_trigger_destroy); -} - __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait) { @@ -1275,24 +1266,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr, if (static_branch_likely(&psi_disabled)) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI; - rcu_read_lock(); - - t = rcu_dereference(*(void __rcu __force **)trigger_ptr); - if (!t) { - rcu_read_unlock(); + t = smp_load_acquire(trigger_ptr); + if (!t) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI; - } - kref_get(&t->refcount); - - rcu_read_unlock(); poll_wait(file, &t->event_wait, wait); if (cmpxchg(&t->event, 1, 0) == 1) ret |= EPOLLPRI; - kref_put(&t->refcount, psi_trigger_destroy); - return ret; } @@ -1316,14 +1298,24 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf, buf[buf_size - 1] = '\0'; - new = psi_trigger_create(&psi_system, buf, nbytes, res); - if (IS_ERR(new)) - return PTR_ERR(new); - seq = file->private_data; + /* Take seq->lock to protect seq->private from concurrent writes */ mutex_lock(&seq->lock); - psi_trigger_replace(&seq->private, new); + + /* Allow only one trigger per file descriptor */ + if (seq->private) { + mutex_unlock(&seq->lock); + return -EBUSY; + } + + new = psi_trigger_create(&psi_system, buf, nbytes, res); + if (IS_ERR(new)) { + mutex_unlock(&seq->lock); + return PTR_ERR(new); + } + + smp_store_release(&seq->private, new); mutex_unlock(&seq->lock); return nbytes; @@ -1358,7 +1350,7 @@ static int psi_fop_release(struct inode *inode, struct file *file) { struct seq_file *seq = file->private_data; - psi_trigger_replace(&seq->private, NULL); + psi_trigger_destroy(seq->private); return single_release(inode, file); }

3 years, 9 months

2
1
0 0

FAILED: patch "[PATCH] psi: Fix uaf issue when psi trigger is destroyed while being" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a06247c6804f1a7c86a2e5398a4c1f1db1471848 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan <surenb(a)google.com> Date: Tue, 11 Jan 2022 15:23:09 -0800 Subject: [PATCH] psi: Fix uaf issue when psi trigger is destroyed while being polled With write operation on psi files replacing old trigger with a new one, the lifetime of its waitqueue is totally arbitrary. Overwriting an existing trigger causes its waitqueue to be freed and pending poll() will stumble on trigger->event_wait which was destroyed. Fix this by disallowing to redefine an existing psi trigger. If a write operation is used on a file descriptor with an already existing psi trigger, the operation will fail with EBUSY error. Also bypass a check for psi_disabled in the psi_trigger_destroy as the flag can be flipped after the trigger is created, leading to a memory leak. Fixes: 0e94682b73bf ("psi: introduce psi monitor") Reported-by: syzbot+cdb5dd11c97cc532efad(a)syzkaller.appspotmail.com Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org> Analyzed-by: Eric Biggers <ebiggers(a)kernel.org> Signed-off-by: Suren Baghdasaryan <surenb(a)google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Reviewed-by: Eric Biggers <ebiggers(a)google.com> Acked-by: Johannes Weiner <hannes(a)cmpxchg.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst index f2b3439edcc2..860fe651d645 100644 --- a/Documentation/accounting/psi.rst +++ b/Documentation/accounting/psi.rst @@ -92,7 +92,8 @@ Triggers can be set on more than one psi metric and more than one trigger for the same psi metric can be specified. However for each trigger a separate file descriptor is required to be able to poll it separately from others, therefore for each trigger a separate open() syscall should be made even -when opening the same psi interface file. +when opening the same psi interface file. Write operations to a file descriptor +with an already existing psi trigger will fail with EBUSY. Monitors activate only when system enters stall state for the monitored psi metric and deactivates upon exit from the stall state. While system is diff --git a/include/linux/psi.h b/include/linux/psi.h index a70ca833c6d7..f8ce53bfdb2a 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -33,7 +33,7 @@ void cgroup_move_task(struct task_struct *p, struct css_set *to); struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res); -void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t); +void psi_trigger_destroy(struct psi_trigger *t); __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 516c0fe836fd..1a3cef26d129 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -141,9 +141,6 @@ struct psi_trigger { * events to one per window */ u64 last_event_time; - - /* Refcounting to prevent premature destruction */ - struct kref refcount; }; struct psi_group { diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index b31e1465868a..9d05c3ca2d5e 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3643,6 +3643,12 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf, cgroup_get(cgrp); cgroup_kn_unlock(of->kn); + /* Allow only one trigger per file descriptor */ + if (ctx->psi.trigger) { + cgroup_put(cgrp); + return -EBUSY; + } + psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi; new = psi_trigger_create(psi, buf, nbytes, res); if (IS_ERR(new)) { @@ -3650,8 +3656,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf, return PTR_ERR(new); } - psi_trigger_replace(&ctx->psi.trigger, new); - + smp_store_release(&ctx->psi.trigger, new); cgroup_put(cgrp); return nbytes; @@ -3690,7 +3695,7 @@ static void cgroup_pressure_release(struct kernfs_open_file *of) { struct cgroup_file_ctx *ctx = of->priv; - psi_trigger_replace(&ctx->psi.trigger, NULL); + psi_trigger_destroy(ctx->psi.trigger); } bool cgroup_psi_enabled(void) diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index a679613a7cb7..c137c4d6983e 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1162,7 +1162,6 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, t->event = 0; t->last_event_time = 0; init_waitqueue_head(&t->event_wait); - kref_init(&t->refcount); mutex_lock(&group->trigger_lock); @@ -1191,15 +1190,19 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, return t; } -static void psi_trigger_destroy(struct kref *ref) +void psi_trigger_destroy(struct psi_trigger *t) { - struct psi_trigger *t = container_of(ref, struct psi_trigger, refcount); - struct psi_group *group = t->group; + struct psi_group *group; struct task_struct *task_to_destroy = NULL; - if (static_branch_likely(&psi_disabled)) + /* + * We do not check psi_disabled since it might have been disabled after + * the trigger got created. + */ + if (!t) return; + group = t->group; /* * Wakeup waiters to stop polling. Can happen if cgroup is deleted * from under a polling process. @@ -1235,9 +1238,9 @@ static void psi_trigger_destroy(struct kref *ref) mutex_unlock(&group->trigger_lock); /* - * Wait for both *trigger_ptr from psi_trigger_replace and - * poll_task RCUs to complete their read-side critical sections - * before destroying the trigger and optionally the poll_task + * Wait for psi_schedule_poll_work RCU to complete its read-side + * critical section before destroying the trigger and optionally the + * poll_task. */ synchronize_rcu(); /* @@ -1254,18 +1257,6 @@ static void psi_trigger_destroy(struct kref *ref) kfree(t); } -void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new) -{ - struct psi_trigger *old = *trigger_ptr; - - if (static_branch_likely(&psi_disabled)) - return; - - rcu_assign_pointer(*trigger_ptr, new); - if (old) - kref_put(&old->refcount, psi_trigger_destroy); -} - __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait) { @@ -1275,24 +1266,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr, if (static_branch_likely(&psi_disabled)) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI; - rcu_read_lock(); - - t = rcu_dereference(*(void __rcu __force **)trigger_ptr); - if (!t) { - rcu_read_unlock(); + t = smp_load_acquire(trigger_ptr); + if (!t) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI; - } - kref_get(&t->refcount); - - rcu_read_unlock(); poll_wait(file, &t->event_wait, wait); if (cmpxchg(&t->event, 1, 0) == 1) ret |= EPOLLPRI; - kref_put(&t->refcount, psi_trigger_destroy); - return ret; } @@ -1316,14 +1298,24 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf, buf[buf_size - 1] = '\0'; - new = psi_trigger_create(&psi_system, buf, nbytes, res); - if (IS_ERR(new)) - return PTR_ERR(new); - seq = file->private_data; + /* Take seq->lock to protect seq->private from concurrent writes */ mutex_lock(&seq->lock); - psi_trigger_replace(&seq->private, new); + + /* Allow only one trigger per file descriptor */ + if (seq->private) { + mutex_unlock(&seq->lock); + return -EBUSY; + } + + new = psi_trigger_create(&psi_system, buf, nbytes, res); + if (IS_ERR(new)) { + mutex_unlock(&seq->lock); + return PTR_ERR(new); + } + + smp_store_release(&seq->private, new); mutex_unlock(&seq->lock); return nbytes; @@ -1358,7 +1350,7 @@ static int psi_fop_release(struct inode *inode, struct file *file) { struct seq_file *seq = file->private_data; - psi_trigger_replace(&seq->private, NULL); + psi_trigger_destroy(seq->private); return single_release(inode, file); }

3 years, 9 months

2
2
0 0

[PATCH RESEND] KVM: x86/mmu: fix UAF in paging_update_accessed_dirty_bits

by Tadeusz Struk

Syzbot reported an use-after-free bug in update_accessed_dirty_bits(). Fix this by checking if the memremap'ed pointer is still valid. Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com> Cc: Wanpeng Li <wanpengli(a)tencent.com> Cc: Jim Mattson <jmattson(a)google.com> Cc: Joerg Roedel <joro(a)8bytes.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: x86(a)kernel.org Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: kvm(a)vger.kernel.org Cc: <stable(a)vger.kernel.org> Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Link: https://syzkaller.appspot.com/bug?id=6cb6102a0a7b0c52060753dd62d070a1d1e713… Reported-by: syzbot+6cde2282daa792c49ab8(a)syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org> --- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 5b5bdac97c7b..d25b72d7b1b1 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -174,7 +174,7 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; paddr = pfn << PAGE_SHIFT; table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB); - if (!table) { + if (!table || !access_ok(table, PAGE_SIZE)) { mmap_read_unlock(current->mm); return -EFAULT; } -- 2.34.1

3 years, 9 months

3
6
0 0

stable-rc/linux-5.10.y build: 184 builds: 3 failed, 181 passed, 4 errors, 10 warnings (v5.10.95-101-gbf18cfd8183f)

by kernelci.org bot

stable-rc/linux-5.10.y build: 184 builds: 3 failed, 181 passed, 4 errors, 10 warnings (v5.10.95-101-gbf18cfd8183f) Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-5.10.y/kernel/v5.10.95-10… Tree: stable-rc Branch: linux-5.10.y Git Describe: v5.10.95-101-gbf18cfd8183f Git Commit: bf18cfd8183fafa8bbba6fdd2c236624bbef333c Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Built: 7 unique architectures Build Failures Detected: arm: rpc_defconfig: (gcc-10) FAIL mips: ip27_defconfig: (gcc-10) FAIL ip28_defconfig: (gcc-10) FAIL Errors and Warnings Detected: arc: arm64: arm: rpc_defconfig (gcc-10): 4 errors i386: mips: 32r2el_defconfig (gcc-10): 1 warning decstation_64_defconfig (gcc-10): 1 warning decstation_defconfig (gcc-10): 1 warning decstation_r4k_defconfig (gcc-10): 1 warning lemote2f_defconfig (gcc-10): 1 warning rm200_defconfig (gcc-10): 1 warning riscv: rv32_defconfig (gcc-10): 4 warnings x86_64: Errors summary: 2 arm-linux-gnueabihf-gcc: error: unrecognized -march target: armv3m 2 arm-linux-gnueabihf-gcc: error: missing argument to ‘-march=’ Warnings summary: 3 kernel/rcu/tasks.h:707:13: warning: ‘show_rcu_tasks_rude_gp_kthread’ defined but not used [-Wunused-function] 2 <stdin>:830:2: warning: #warning syscall fstat64 not implemented [-Wcpp] 2 <stdin>:1127:2: warning: #warning syscall fstatat64 not implemented [-Wcpp] 1 net/mac80211/mlme.c:4335:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] 1 drivers/block/paride/bpck.c:32: warning: "PC" redefined 1 WARNING: modpost: Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped. Section mismatches summary: 1 WARNING: modpost: vmlinux.o(.text+0xd040): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xce9c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcda4): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcd38): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcb84): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcb74): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcb6c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcb4c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xcaa8): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xc8ec): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0xb904): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0x8054): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() 1 WARNING: modpost: vmlinux.o(.text+0x7670): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() ================================================================================ Detailed per-defconfig build reports: -------------------------------------------------------------------------------- 32r2el_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: WARNING: modpost: Symbol info of vmlinux is missing. Unresolved symbol check will be entirely skipped. -------------------------------------------------------------------------------- allnoconfig (x86_64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- allnoconfig (i386, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- allnoconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- am200epdkit_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ar7_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- aspeed_g4_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- aspeed_g5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- assabet_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcb6c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- at91_dt_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ath25_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ath79_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- axm55xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- axs103_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- axs103_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- badge4_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcd38): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- bcm2835_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bcm47xx_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bcm63xx_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bigsur_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bmips_be_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- bmips_stb_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- capcella_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cavium_octeon_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cerfcube_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0x7670): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- ci20_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cm_x300_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cobalt_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- colibri_pxa270_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- colibri_pxa300_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- collie_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xb904): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- corgi_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cu1000-neo_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- cu1830-neo_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- davinci_all_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- db1xxx_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- decstation_64_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: kernel/rcu/tasks.h:707:13: warning: ‘show_rcu_tasks_rude_gp_kthread’ defined but not used [-Wunused-function] -------------------------------------------------------------------------------- decstation_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: kernel/rcu/tasks.h:707:13: warning: ‘show_rcu_tasks_rude_gp_kthread’ defined but not used [-Wunused-function] -------------------------------------------------------------------------------- decstation_r4k_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: kernel/rcu/tasks.h:707:13: warning: ‘show_rcu_tasks_rude_gp_kthread’ defined but not used [-Wunused-function] -------------------------------------------------------------------------------- defconfig (riscv, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- defconfig (arm64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- defconfig+arm64-chromebook (arm64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- dove_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- e55_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ebsa110_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- efm32_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ep93xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0x8054): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- eseries_pxa_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- exynos_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ezx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- footbridge_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- fuloong2e_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- gcw0_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- gemini_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- gpr_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- h3600_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcb84): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- h5000_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- hackkit_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xce9c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- haps_hs_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- haps_hs_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- hisi_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- hsdk_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- i386_defconfig (i386, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- imote2_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- imx_v4_v5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- imx_v6_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- integrator_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- iop32x_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ip22_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ip27_defconfig (mips, gcc-10) — FAIL, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ip28_defconfig (mips, gcc-10) — FAIL, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ip32_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- ixp4xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- jazz_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- jmr3927_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- jornada720_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcaa8): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- keystone_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- lart_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcb74): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- lemote2f_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: net/mac80211/mlme.c:4335:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] -------------------------------------------------------------------------------- loongson1b_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- loongson1c_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- loongson3_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- lpc18xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- lpc32xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- lpd270_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- lubbock_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- magician_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mainstone_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- malta_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- malta_kvm_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- malta_kvm_guest_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- malta_qemu_32r6_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- maltaaprp_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- maltasmvp_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- maltasmvp_eva_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- maltaup_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- maltaup_xpa_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- milbeaut_m10v_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mini2440_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mmp2_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- moxart_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mpc30x_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mps2_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mtx1_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- multi_v4t_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- multi_v5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- multi_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mvebu_v5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mvebu_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- mxs_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- neponset_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcda4): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- netwinder_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nhk8815_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nlm_xlp_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nlm_xlr_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nommu_k210_defconfig (riscv, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nsimosci_hs_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- nsimosci_hs_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- omap1_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- omap2plus_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- omega2p_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- orion5x_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- oxnas_v6_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- palmz72_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pcm027_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pic32mzda_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pistachio_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pleb_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xc8ec): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- prima2_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa168_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa255-idp_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa3xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa910_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- pxa_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- qcom_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- qi_lb60_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- rb532_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- rbtx49xx_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- realview_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- rm200_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches Warnings: drivers/block/paride/bpck.c:32: warning: "PC" redefined -------------------------------------------------------------------------------- rpc_defconfig (arm, gcc-10) — FAIL, 4 errors, 0 warnings, 0 section mismatches Errors: arm-linux-gnueabihf-gcc: error: unrecognized -march target: armv3m arm-linux-gnueabihf-gcc: error: missing argument to ‘-march=’ arm-linux-gnueabihf-gcc: error: unrecognized -march target: armv3m arm-linux-gnueabihf-gcc: error: missing argument to ‘-march=’ -------------------------------------------------------------------------------- rt305x_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- rv32_defconfig (riscv, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches Warnings: <stdin>:830:2: warning: #warning syscall fstat64 not implemented [-Wcpp] <stdin>:1127:2: warning: #warning syscall fstatat64 not implemented [-Wcpp] <stdin>:830:2: warning: #warning syscall fstat64 not implemented [-Wcpp] <stdin>:1127:2: warning: #warning syscall fstatat64 not implemented [-Wcpp] -------------------------------------------------------------------------------- s3c2410_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- s3c6400_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- s5pv210_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- sama5_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- sb1250_swarm_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- shannon_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xcb4c): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- shmobile_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- simpad_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches Section mismatches: WARNING: modpost: vmlinux.o(.text+0xd040): Section mismatch in reference from the function __arm_ioremap_pfn_caller() to the function .meminit.text:memblock_is_map_memory() -------------------------------------------------------------------------------- socfpga_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- spear13xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- spear3xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- spear6xx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- spitz_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- stm32_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- sunxi_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tango4_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tb0219_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tb0226_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tb0287_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tct_hammer_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tegra_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tinyconfig (x86_64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tinyconfig (i386, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- tinyconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- trizeps4_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- u300_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- u8500_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vdk_hs38_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vdk_hs38_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- versatile_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vexpress_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vf610m4_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- viper_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vocore2_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- vt8500_v6_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- workpad_defconfig (mips, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- x86_64_defconfig (x86_64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- x86_64_defconfig+x86-chromebook (x86_64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- xcep_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- zeus_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches -------------------------------------------------------------------------------- zx_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches --- For more info write to <info(a)kernelci.org>

3 years, 9 months

1
0
0 0

+ mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read.patch added to -mm tree

by Andrew Morton

The patch titled Subject: mm: fix race between MADV_FREE reclaim and blkdev direct IO read has been added to the -mm tree. Its filename is mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read.patch This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-race-between-madv_free-rec… and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-race-between-madv_free-rec… Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Mauricio Faria de Oliveira <mfo(a)canonical.com> Subject: mm: fix race between MADV_FREE reclaim and blkdev direct IO read Problem: ======= Userspace might read the zero-page instead of actual data from a direct IO read on a block device if the buffers have been called madvise(MADV_FREE) on earlier (this is discussed below) due to a race between page reclaim on MADV_FREE and blkdev direct IO read. - Race condition: ============== During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks if the page is not dirty, then discards its rmap PTE(s) (vs. remap back if the page is dirty). However, after try_to_unmap_one() returns to shrink_page_list(), it might keep the page _anyway_ if page_ref_freeze() fails (it expects exactly _one_ page reference, from the isolation for page reclaim). Well, blkdev_direct_IO() gets references for all pages, and on READ operations it only sets them dirty _later_. So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct IO read from block devices, and page reclaim happens during __blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages() returns, but BEFORE the pages are set dirty, the situation happens. The direct IO read eventually completes. Now, when userspace reads the buffers, the PTE is no longer there and the page fault handler do_anonymous_page() services that with the zero-page, NOT the data! A synthetic reproducer is provided. - Page faults: =========== If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't happen, because that faults-in all pages as writeable, so do_anonymous_page() sets up a new page/rmap/PTE, and that is used by direct IO. The userspace reads don't fault as the PTE is there (thus zero-page is not used/setup). But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE is no longer there; the subsequent page faults can't help: The data-read from the block device probably won't generate faults due to DMA (no MMU) but even in the case it wouldn't use DMA, that happens on different virtual addresses (not user-mapped addresses) because `struct bio_vec` stores `struct page` to figure addresses out (which are different from user-mapped addresses) for the read. Thus userspace reads (to user-mapped addresses) still fault, then do_anonymous_page() gets another `struct page` that would address/ map to other memory than the `struct page` used by `struct bio_vec` for the read. (The original `struct page` is not available, since it wasn't freed, as page_ref_freeze() failed due to more page refs. And even if it were available, its data cannot be trusted anymore.) Solution: ======== One solution is to check for the expected page reference count in try_to_unmap_one(). There should be one reference from the isolation (that is also checked in shrink_page_list() with page_ref_freeze()) plus one or more references from page mapping(s) (put in discard: label). Further references mean that rmap/PTE cannot be unmapped/nuked. (Note: there might be more than one reference from mapping due to fork()/clone() without CLONE_VM, which use the same `struct page` for references, until the copy-on-write page gets copied.) So, additional page references (e.g., from direct IO read) now prevent the rmap/PTE from being unmapped/dropped; similarly to the page is not freed per shrink_page_list()/page_ref_freeze()). - Races and Barriers: ================== The new check in try_to_unmap_one() should be safe in races with bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's done under the PTE lock. The fast path doesn't take the lock, but it checks if the PTE has changed and if so, it drops the reference and leaves the page for the slow path (which does take that lock). The fast path requires synchronization w/ full memory barrier: it writes the page reference count first then it reads the PTE later, while try_to_unmap() writes PTE first then it reads page refcount. And a second barrier is needed, as the page dirty flag should not be read before the page reference count (as in __remove_mapping()). (This can be a load memory barrier only; no writes are involved.) Call stack/comments: - try_to_unmap_one() - page_vma_mapped_walk() - map_pte() # see pte_offset_map_lock(): pte_offset_map() spin_lock() - ptep_get_and_clear() # write PTE - smp_mb() # (new barrier) GUP fast path - page_ref_count() # (new check) read refcount - page_vma_mapped_walk_done() # see pte_unmap_unlock(): pte_unmap() spin_unlock() - bio_iov_iter_get_pages() - __bio_iov_iter_get_pages() - iov_iter_get_pages() - get_user_pages_fast() - internal_get_user_pages_fast() # fast path - lockless_pages_from_mm() - gup_{pgd,p4d,pud,pmd,pte}_range() ptep = pte_offset_map() # not _lock() pte = ptep_get_lockless(ptep) page = pte_page(pte) try_grab_compound_head(page) # inc refcount # (RMW/barrier # on success) if (pte_val(pte) != pte_val(*ptep)) # read PTE put_compound_head(page) # dec refcount # go slow path # slow path - __gup_longterm_unlocked() - get_user_pages_unlocked() - __get_user_pages_locked() - __get_user_pages() - follow_{page,p4d,pud,pmd}_mask() - follow_page_pte() ptep = pte_offset_map_lock() pte = *ptep page = vm_normal_page(pte) try_grab_page(page) # inc refcount pte_unmap_unlock() - Huge Pages: ========== Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked() (madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn()) thus should reach shrink_page_list() -> split_huge_page_to_list() before try_to_unmap[_one](), so it deals with normal pages only. (And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens, which should not or be rare, the page refcount should be greater than mapcount: the head page is referenced by tail pages. That also prevents checking the head `page` then incorrectly call page_remove_rmap(subpage) for a tail page, that isn't even in the shrink_page_list()'s page_list (an effect of split huge pmd/pmvw), as it might happen today in this unlikely scenario.) MADV_FREE'd buffers: =================== So, back to the "if MADV_FREE pages are used as buffers" note. The case is arguable, and subject to multiple interpretations. The madvise(2) manual page on the MADV_FREE advice value says: 1) 'After a successful MADV_FREE ... data will be lost when the kernel frees the pages.' 2) 'the free operation will be canceled if the caller writes into the page' / 'subsequent writes ... will succeed and then [the] kernel cannot free those dirtied pages' 3) 'If there is no subsequent write, the kernel can free the pages at any time.' Thoughts, questions, considerations... respectively: 1) Since the kernel didn't actually free the page (page_ref_freeze() failed), should the data not have been lost? (on userspace read.) 2) Should writes performed by the direct IO read be able to cancel the free operation? - Should the direct IO read be considered as 'the caller' too, as it's been requested by 'the caller'? - Should the bio technique to dirty pages on return to userspace (bio_check_pages_dirty() is called/used by __blkdev_direct_IO()) be considered in another/special way here? 3) Should an upcoming write from a previously requested direct IO read be considered as a subsequent write, so the kernel should not free the pages? (as it's known at the time of page reclaim.) At last: Technically, the last point would seem a reasonable consideration and balance, as the madvise(2) manual page apparently (and fairly) seem to assume that 'writes' are memory access from the userspace process (not explicitly considering writes from the kernel or its corner cases; again, fairly).. plus the kernel fix implementation for the corner case of the largely 'non-atomic write' encompassed by a direct IO read operation, is relatively simple; and it helps. Reproducer: ========== @ test.c (simplified, but works) #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> int main() { int fd, i; char *buf; fd = open(DEV, O_RDONLY | O_DIRECT); buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) buf[i] = 1; // init to non-zero madvise(buf, BUF_SIZE, MADV_FREE); read(fd, buf, BUF_SIZE); for (i = 0; i < BUF_SIZE; i += PAGE_SIZE) printf("%p: 0x%x ", &buf[i], buf[i]); return 0; } @ block/fops.c (formerly fs/block_dev.c) +#include <linux/swap.h> ... ... __blkdev_direct_IO[_simple](...) { ... + if (!strcmp(current->comm, "good")) + shrink_all_memory(ULONG_MAX); + ret = bio_iov_iter_get_pages(...); + + if (!strcmp(current->comm, "bad")) + shrink_all_memory(ULONG_MAX); ... } @ shell # NUM_PAGES=4 # PAGE_SIZE=$(getconf PAGE_SIZE) # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES} # DEV=$(losetup -f --show test.img) # gcc -DDEV=\"$DEV\" \ -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \ -DPAGE_SIZE=${PAGE_SIZE} \ test.c -o test # od -tx1 $DEV 0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a * 0040000 # mv test good # ./good 0x7f7c10418000: 0x79 0x7f7c10419000: 0x79 0x7f7c1041a000: 0x79 0x7f7c1041b000: 0x79 # mv good bad # ./bad 0x7fa1b8050000: 0x0 0x7fa1b8051000: 0x0 0x7fa1b8052000: 0x0 0x7fa1b8053000: 0x0 Ceph/TCMalloc: ============= For documentation purposes, the use case driving the analysis/fix is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to release unused memory to the system from the mmap'ed page heap (might be committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing. Note: TCMalloc switched back to MADV_DONTNEED a few commits after the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue just 'disappeared' on Ceph on later Ubuntu releases but is still present in the kernel, and can be hit by other use cases. The observed issue seems to be the old Ceph bug #22464 [1], where checksum mismatches are observed (and instrumentation with buffer dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges). The issue in Ceph was reasonably deemed a kernel bug (comment #50) and mostly worked around with a retry mechanism, but other parts of Ceph could still hit that (rocksdb). Anyway, it's less likely to be hit again as TCMalloc switched out of MADV_FREE by default. (Some kernel versions/reports from the Ceph bug, and relation with the MADV_FREE introduction/changes; TCMalloc versions not checked.) - 4.4 good - 4.5 (madv_free: introduction) - 4.9 bad - 4.10 good? maybe a swapless system - 4.12 (madv_free: no longer free instantly on swapless systems) - 4.13 bad [1] https://tracker.ceph.com/issues/22464 Thanks: ====== Several people contributed to analysis/discussions/tests/reproducers in the first stages when drilling down on ceph/tcmalloc/linux kernel: - Dan Hill - Dan Streetman - Dongdong Tao - Gavin Guo - Gerald Yang - Heitor Alves de Siqueira - Ioanna Alifieraki - Jay Vosburgh - Matthew Ruffell - Ponnuvel Palaniyappan Link: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages") Signed-off-by: Mauricio Faria de Oliveira <mfo(a)canonical.com> Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yu Zhao <yuzhao(a)google.com> Cc: Yang Shi <shy828301(a)gmail.com> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Dan Hill <daniel.hill(a)canonical.com> Cc: Dan Streetman <dan.streetman(a)canonical.com> Cc: Dongdong Tao <dongdong.tao(a)canonical.com> Cc: Gavin Guo <gavin.guo(a)canonical.com> Cc: Gerald Yang <gerald.yang(a)canonical.com> Cc: Heitor Alves de Siqueira <halves(a)canonical.com> Cc: Ioanna Alifieraki <ioanna-maria.alifieraki(a)canonical.com> Cc: Jay Vosburgh <jay.vosburgh(a)canonical.com> Cc: Matthew Ruffell <matthew.ruffell(a)canonical.com> Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan(a)canonical.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/rmap.c | 25 ++++++++++++++++++++++++- mm/vmscan.c | 2 +- 2 files changed, 25 insertions(+), 2 deletions(-) --- a/mm/rmap.c~mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read +++ a/mm/rmap.c @@ -1599,7 +1599,30 @@ static bool try_to_unmap_one(struct page /* MADV_FREE page check */ if (!PageSwapBacked(page)) { - if (!PageDirty(page)) { + int ref_count, map_count; + + /* + * Synchronize with gup_pte_range(): + * - clear PTE; barrier; read refcount + * - inc refcount; barrier; read PTE + */ + smp_mb(); + + ref_count = page_count(page); + map_count = page_mapcount(page); + + /* + * Order reads for page refcount and dirty flag; + * see __remove_mapping(). + */ + smp_rmb(); + + /* + * The only page refs must be from the isolation + * plus one or more rmap's (dropped by discard:). + */ + if ((ref_count == 1 + map_count) && + !PageDirty(page)) { /* Invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); --- a/mm/vmscan.c~mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read +++ a/mm/vmscan.c @@ -1714,7 +1714,7 @@ retry: mapping = page_mapping(page); } } else if (unlikely(PageTransHuge(page))) { - /* Split file THP */ + /* Split file/lazyfree THP */ if (split_huge_page_to_list(page, page_list)) goto keep_locked; } _ Patches currently in -mm which might be from mfo(a)canonical.com are mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read.patch

3 years, 9 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror January 2022