Linux-stable-mirror April 2019

linux-stable-mirror@lists.linaro.org

318 participants
849 discussions

patch "USB: core: Fix bug caused by duplicate interface PM usage counter" added to usb-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled USB: core: Fix bug caused by duplicate interface PM usage counter to my usb git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git in the usb-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. >From c2b71462d294cf517a0bc6e4fd6424d7cee5596f Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Fri, 19 Apr 2019 13:52:38 -0400 Subject: USB: core: Fix bug caused by duplicate interface PM usage counter The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-and-tested-by: syzbot+7634edaea4d0b341c625(a)syzkaller.appspotmail.com CC: <stable(a)vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- Documentation/driver-api/usb/power-management.rst | 14 +++++++++----- drivers/usb/core/driver.c | 13 ------------- drivers/usb/storage/realtek_cr.c | 13 +++++-------- include/linux/usb.h | 2 -- 4 files changed, 14 insertions(+), 28 deletions(-) diff --git a/Documentation/driver-api/usb/power-management.rst b/Documentation/driver-api/usb/power-management.rst index 79beb807996b..4a74cf6f2797 100644 --- a/Documentation/driver-api/usb/power-management.rst +++ b/Documentation/driver-api/usb/power-management.rst @@ -370,11 +370,15 @@ autosuspend the interface's device. When the usage counter is = 0 then the interface is considered to be idle, and the kernel may autosuspend the device. -Drivers need not be concerned about balancing changes to the usage -counter; the USB core will undo any remaining "get"s when a driver -is unbound from its interface. As a corollary, drivers must not call -any of the ``usb_autopm_*`` functions after their ``disconnect`` -routine has returned. +Drivers must be careful to balance their overall changes to the usage +counter. Unbalanced "get"s will remain in effect when a driver is +unbound from its interface, preventing the device from going into +runtime suspend should the interface be bound to a driver again. On +the other hand, drivers are allowed to achieve this balance by calling +the ``usb_autopm_*`` functions even after their ``disconnect`` routine +has returned -- say from within a work-queue routine -- provided they +retain an active reference to the interface (via ``usb_get_intf`` and +``usb_put_intf``). Drivers using the async routines are responsible for their own synchronization and mutual exclusion. diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 8987cec9549d..ebcadaad89d1 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -473,11 +473,6 @@ static int usb_unbind_interface(struct device *dev) pm_runtime_disable(dev); pm_runtime_set_suspended(dev); - /* Undo any residual pm_autopm_get_interface_* calls */ - for (r = atomic_read(&intf->pm_usage_cnt); r > 0; --r) - usb_autopm_put_interface_no_suspend(intf); - atomic_set(&intf->pm_usage_cnt, 0); - if (!error) usb_autosuspend_device(udev); @@ -1633,7 +1628,6 @@ void usb_autopm_put_interface(struct usb_interface *intf) int status; usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put_sync(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1662,7 +1656,6 @@ void usb_autopm_put_interface_async(struct usb_interface *intf) int status; usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1684,7 +1677,6 @@ void usb_autopm_put_interface_no_suspend(struct usb_interface *intf) struct usb_device *udev = interface_to_usbdev(intf); usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); pm_runtime_put_noidle(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_put_interface_no_suspend); @@ -1715,8 +1707,6 @@ int usb_autopm_get_interface(struct usb_interface *intf) status = pm_runtime_get_sync(&intf->dev); if (status < 0) pm_runtime_put_sync(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1750,8 +1740,6 @@ int usb_autopm_get_interface_async(struct usb_interface *intf) status = pm_runtime_get(&intf->dev); if (status < 0 && status != -EINPROGRESS) pm_runtime_put_noidle(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1775,7 +1763,6 @@ void usb_autopm_get_interface_no_resume(struct usb_interface *intf) struct usb_device *udev = interface_to_usbdev(intf); usb_mark_last_busy(udev); - atomic_inc(&intf->pm_usage_cnt); pm_runtime_get_noresume(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_get_interface_no_resume); diff --git a/drivers/usb/storage/realtek_cr.c b/drivers/usb/storage/realtek_cr.c index 31b024441938..cc794e25a0b6 100644 --- a/drivers/usb/storage/realtek_cr.c +++ b/drivers/usb/storage/realtek_cr.c @@ -763,18 +763,16 @@ static void rts51x_suspend_timer_fn(struct timer_list *t) break; case RTS51X_STAT_IDLE: case RTS51X_STAT_SS: - usb_stor_dbg(us, "RTS51X_STAT_SS, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); - if (atomic_read(&us->pusb_intf->pm_usage_cnt) > 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) > 0) { usb_stor_dbg(us, "Ready to enter SS state\n"); rts51x_set_stat(chip, RTS51X_STAT_SS); /* ignore mass storage interface's children */ pm_suspend_ignore_children(&us->pusb_intf->dev, true); usb_autopm_put_interface_async(us->pusb_intf); - usb_stor_dbg(us, "RTS51X_STAT_SS 01, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS 01, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); } break; @@ -807,11 +805,10 @@ static void rts51x_invoke_transport(struct scsi_cmnd *srb, struct us_data *us) int ret; if (working_scsi(srb)) { - usb_stor_dbg(us, "working scsi, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "working scsi, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); - if (atomic_read(&us->pusb_intf->pm_usage_cnt) <= 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) <= 0) { ret = usb_autopm_get_interface(us->pusb_intf); usb_stor_dbg(us, "working scsi, ret=%d\n", ret); } diff --git a/include/linux/usb.h b/include/linux/usb.h index 5e49e82c4368..ff010d1fd1c7 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -200,7 +200,6 @@ usb_find_last_int_out_endpoint(struct usb_host_interface *alt, * @dev: driver model's view of this device * @usb_dev: if an interface is bound to the USB major, this will point * to the sysfs representation for that device. - * @pm_usage_cnt: PM usage counter for this interface * @reset_ws: Used for scheduling resets from atomic context. * @resetting_device: USB core reset the device, so use alt setting 0 as * current; needs bandwidth alloc after reset. @@ -257,7 +256,6 @@ struct usb_interface { struct device dev; /* interface specific device info */ struct device *usb_dev; - atomic_t pm_usage_cnt; /* usage counter for autosuspend */ struct work_struct reset_ws; /* for resets in atomic context */ }; #define to_usb_interface(d) container_of(d, struct usb_interface, dev) -- 2.21.0

6 years, 8 months

Applied "ASoC: codec: hdac_hdmi add device_link to card device" to the asoc tree

by Mark Brown

The patch ASoC: codec: hdac_hdmi add device_link to card device has been applied to the asoc tree at https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark >From 01c8327667c249818d3712c3e25c7ad2aca7f389 Mon Sep 17 00:00:00 2001 From: Libin Yang <libin.yang(a)intel.com> Date: Sat, 13 Apr 2019 21:18:12 +0800 Subject: [PATCH] ASoC: codec: hdac_hdmi add device_link to card device In resume from S3, HDAC HDMI codec driver dapm event callback may be operated before HDMI codec driver turns on the display audio power domain because of the contest between display driver and hdmi codec driver. This patch adds the device_link between soc card device (consumer) and hdmi codec device (supplier) to make sure the sequence is always correct. Signed-off-by: Libin Yang <libin.yang(a)intel.com> Reviewed-by: Takashi Iwai <tiwai(a)suse.de> Acked-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com> Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org --- sound/soc/codecs/hdac_hdmi.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/sound/soc/codecs/hdac_hdmi.c b/sound/soc/codecs/hdac_hdmi.c index 5eeb0fe836a9..4de1fbfa8827 100644 --- a/sound/soc/codecs/hdac_hdmi.c +++ b/sound/soc/codecs/hdac_hdmi.c @@ -1854,6 +1854,17 @@ static int hdmi_codec_probe(struct snd_soc_component *component) /* Imp: Store the card pointer in hda_codec */ hdmi->card = dapm->card->snd_card; + /* + * Setup a device_link between card device and HDMI codec device. + * The card device is the consumer and the HDMI codec device is + * the supplier. With this setting, we can make sure that the audio + * domain in display power will be always turned on before operating + * on the HDMI audio codec registers. + * Let's use the flag DL_FLAG_AUTOREMOVE_CONSUMER. This can make + * sure the device link is freed when the machine driver is removed. + */ + device_link_add(component->card->dev, &hdev->dev, DL_FLAG_RPM_ACTIVE | + DL_FLAG_AUTOREMOVE_CONSUMER); /* * hdac_device core already sets the state to active and calls * get_noresume. So enable runtime and set the device to suspend. -- 2.20.1

6 years, 8 months

patch "USB: dummy-hcd: Fix failure to give back unlinked URBs" added to usb-linus

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled USB: dummy-hcd: Fix failure to give back unlinked URBs to my usb git tree which can be found at git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git in the usb-linus branch. The patch will show up in the next release of the linux-next tree (usually sometime within the next 24 hours during the week.) The patch will hopefully also be merged in Linus's tree for the next -rc kernel release. If you have any questions about this process, please let me know. >From fc834e607ae3d18e1a20bca3f9a2d7f52ea7a2be Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Thu, 18 Apr 2019 13:12:07 -0400 Subject: USB: dummy-hcd: Fix failure to give back unlinked URBs The syzkaller USB fuzzer identified a failure mode in which dummy-hcd would never give back an unlinked URB. This causes usb_kill_urb() to hang, leading to WARNINGs and unkillable threads. In dummy-hcd, all URBs are given back by the dummy_timer() routine as it scans through the list of pending URBS. Failure to give back URBs can be caused by failure to start or early exit from the scanning loop. The code currently has two such pathways: One is triggered when an unsupported bus transfer speed is encountered, and the other by exhausting the simulated bandwidth for USB transfers during a frame. This patch removes those two paths, thereby allowing all unlinked URBs to be given back in a timely manner. It adds a check for the bus speed when the gadget first starts running, so that dummy_timer() will never thereafter encounter an unsupported speed. And it prevents the loop from exiting as soon as the total bandwidth has been used up (the scanning loop continues, giving back unlinked URBs as they are found, but not transferring any more data). Thanks to Andrey Konovalov for manually running the syzkaller fuzzer to help track down the source of the bug. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-and-tested-by: syzbot+d919b0f29d7b5a4994b9(a)syzkaller.appspotmail.com CC: <stable(a)vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/usb/gadget/udc/dummy_hcd.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/usb/gadget/udc/dummy_hcd.c b/drivers/usb/gadget/udc/dummy_hcd.c index baf72f95f0f1..213b52508621 100644 --- a/drivers/usb/gadget/udc/dummy_hcd.c +++ b/drivers/usb/gadget/udc/dummy_hcd.c @@ -979,8 +979,18 @@ static int dummy_udc_start(struct usb_gadget *g, struct dummy_hcd *dum_hcd = gadget_to_dummy_hcd(g); struct dummy *dum = dum_hcd->dum; - if (driver->max_speed == USB_SPEED_UNKNOWN) + switch (g->speed) { + /* All the speeds we support */ + case USB_SPEED_LOW: + case USB_SPEED_FULL: + case USB_SPEED_HIGH: + case USB_SPEED_SUPER: + break; + default: + dev_err(dummy_dev(dum_hcd), "Unsupported driver max speed %d\n", + driver->max_speed); return -EINVAL; + } /* * SLAVE side init ... the layer above hardware, which @@ -1784,9 +1794,10 @@ static void dummy_timer(struct timer_list *t) /* Bus speed is 500000 bytes/ms, so use a little less */ total = 490000; break; - default: + default: /* Can't happen */ dev_err(dummy_dev(dum_hcd), "bogus device speed\n"); - return; + total = 0; + break; } /* FIXME if HZ != 1000 this will probably misbehave ... */ @@ -1828,7 +1839,7 @@ static void dummy_timer(struct timer_list *t) /* Used up this frame's bandwidth? */ if (total <= 0) - break; + continue; /* find the gadget's ep for this request (if configured) */ address = usb_pipeendpoint (urb->pipe); -- 2.21.0

6 years, 8 months

FAILED: patch "[PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 442601e87a4769a8daba4976ec3afa5222ca211d Mon Sep 17 00:00:00 2001 From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Date: Fri, 8 Feb 2019 18:30:59 +0200 Subject: [PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete Return -E2BIG when the transfer is incomplete. The upper layer does not retry, so not doing that is incorrect behaviour. Cc: stable(a)vger.kernel.org Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com> Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com> Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com> diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c index 32a8e27c5382..cc4e642d3180 100644 --- a/drivers/char/tpm/tpm_i2c_atmel.c +++ b/drivers/char/tpm/tpm_i2c_atmel.c @@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len) if (status < 0) return status; + /* The upper layer does not support incomplete sends. */ + if (status != len) + return -E2BIG; + return 0; }

6 years, 8 months

[PATCH] inet: update the IP ID generation algorithm to patch 355b98553789b646ed97ad801a619ff898471b92 standards.

by Amit Klein

Patch 355b98553789b646ed97ad801a619ff898471b92 makes net_hash_mix() return true 32 bits of entropy. When used in the IP ID generation algorithm, this has the effect of extending the IP ID generation key from 32 bits to 64 bits. However, net_hash_mix() is only used for IP ID generation starting with kernel version 4.1. Therefore, earlier kernels remain with 32-bit key. The patch addresses this issue by explicitly extending the key to 64 bits for kernels v<4.1. Signed-off-by: Amit Klein <aksecurity(a)gmail.com> --- net/ipv4/route.c | 4 +++- net/ipv6/ip6_output.c | 3 +++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index ede610a4abc8..446b6d202643 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -488,13 +488,15 @@ EXPORT_SYMBOL(ip_idents_reserve); void __ip_select_ident(struct iphdr *iph, int segs) { static u32 ip_idents_hashrnd __read_mostly; + static u32 ip_idents_hashrnd_extra __read_mostly; u32 hash, id; net_get_random_once(&ip_idents_hashrnd, sizeof(ip_idents_hashrnd)); + net_get_random_once(&ip_idents_hashrnd_extra, sizeof(ip_idents_hashrnd_extra)); hash = jhash_3words((__force u32)iph->daddr, (__force u32)iph->saddr, - iph->protocol, + iph->protocol^ip_idents_hashrnd_extra, ip_idents_hashrnd); id = ip_idents_reserve(hash, segs); iph->id = htons(id); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 2b69a4b965ed..58e507a79cdd 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -546,12 +546,15 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from) static void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt) { static u32 ip6_idents_hashrnd __read_mostly; + static u32 ip6_idents_hashrnd_extra __read_mostly; u32 hash, id; net_get_random_once(&ip6_idents_hashrnd, sizeof(ip6_idents_hashrnd)); + net_get_random_once(&ip6_idents_hashrnd_extra, sizeof(ip6_idents_hashrnd_extra)); hash = __ipv6_addr_jhash(&rt->rt6i_dst.addr, ip6_idents_hashrnd); hash = __ipv6_addr_jhash(&rt->rt6i_src.addr, hash); + hash = jhash_1word(hash,ip6_idents_hashrnd_extra); id = ip_idents_reserve(hash, 1); fhdr->identification = htonl(id); -- 2.17.1

6 years, 8 months

[patch 16/16] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping

by akpm＠linux-foundation.org

From: Andrea Arcangeli <aarcange(a)redhat.com> Subject: coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com> Reported-by: Jann Horn <jannh(a)google.com> Suggested-by: Oleg Nesterov <oleg(a)redhat.com> Acked-by: Peter Xu <peterx(a)redhat.com> Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg(a)redhat.com> Reviewed-by: Jann Horn <jannh(a)google.com> Acked-by: Jason Gunthorpe <jgg(a)mellanox.com> Acked-by: Michal Hocko <mhocko(a)suse.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- drivers/infiniband/core/uverbs_main.c | 3 +++ fs/proc/task_mmu.c | 18 ++++++++++++++++++ fs/userfaultfd.c | 9 +++++++++ include/linux/sched/mm.h | 21 +++++++++++++++++++++ mm/mmap.c | 7 ++++++- 5 files changed, 57 insertions(+), 1 deletion(-) --- a/drivers/infiniband/core/uverbs_main.c~coredump-fix-race-condition-between-mmget_not_zero-get_task_mm-and-core-dumping +++ a/drivers/infiniband/core/uverbs_main.c @@ -993,6 +993,8 @@ void uverbs_user_mmap_disassociate(struc * will only be one mm, so no big deal. */ down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto skip_mm; mutex_lock(&ufile->umap_lock); list_for_each_entry_safe (priv, next_priv, &ufile->umaps, list) { @@ -1007,6 +1009,7 @@ void uverbs_user_mmap_disassociate(struc vma->vm_flags &= ~(VM_SHARED | VM_MAYSHARE); } mutex_unlock(&ufile->umap_lock); + skip_mm: up_write(&mm->mmap_sem); mmput(mm); } --- a/fs/proc/task_mmu.c~coredump-fix-race-condition-between-mmget_not_zero-get_task_mm-and-core-dumping +++ a/fs/proc/task_mmu.c @@ -1143,6 +1143,24 @@ static ssize_t clear_refs_write(struct f count = -EINTR; goto out_mm; } + /* + * Avoid to modify vma->vm_flags + * without locked ops while the + * coredump reads the vm_flags. + */ + if (!mmget_still_valid(mm)) { + /* + * Silently return "count" + * like if get_task_mm() + * failed. FIXME: should this + * function have returned + * -ESRCH if get_task_mm() + * failed like if + * get_proc_task() fails? + */ + up_write(&mm->mmap_sem); + goto out_mm; + } for (vma = mm->mmap; vma; vma = vma->vm_next) { vma->vm_flags &= ~VM_SOFTDIRTY; vma_set_page_prot(vma); --- a/fs/userfaultfd.c~coredump-fix-race-condition-between-mmget_not_zero-get_task_mm-and-core-dumping +++ a/fs/userfaultfd.c @@ -629,6 +629,8 @@ static void userfaultfd_event_wait_compl /* the various vma->vm_userfaultfd_ctx still points to it */ down_write(&mm->mmap_sem); + /* no task can run (and in turn coredump) yet */ + VM_WARN_ON(!mmget_still_valid(mm)); for (vma = mm->mmap; vma; vma = vma->vm_next) if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; @@ -883,6 +885,8 @@ static int userfaultfd_release(struct in * taking the mmap_sem for writing. */ down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto skip_mm; prev = NULL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); @@ -905,6 +909,7 @@ static int userfaultfd_release(struct in vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; } +skip_mm: up_write(&mm->mmap_sem); mmput(mm); wakeup: @@ -1333,6 +1338,8 @@ static int userfaultfd_register(struct u goto out; down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto out_unlock; vma = find_vma_prev(mm, start, &prev); if (!vma) goto out_unlock; @@ -1520,6 +1527,8 @@ static int userfaultfd_unregister(struct goto out; down_write(&mm->mmap_sem); + if (!mmget_still_valid(mm)) + goto out_unlock; vma = find_vma_prev(mm, start, &prev); if (!vma) goto out_unlock; --- a/include/linux/sched/mm.h~coredump-fix-race-condition-between-mmget_not_zero-get_task_mm-and-core-dumping +++ a/include/linux/sched/mm.h @@ -49,6 +49,27 @@ static inline void mmdrop(struct mm_stru __mmdrop(mm); } +/* + * This has to be called after a get_task_mm()/mmget_not_zero() + * followed by taking the mmap_sem for writing before modifying the + * vmas or anything the coredump pretends not to change from under it. + * + * NOTE: find_extend_vma() called from GUP context is the only place + * that can modify the "mm" (notably the vm_start/end) under mmap_sem + * for reading and outside the context of the process, so it is also + * the only case that holds the mmap_sem for reading that must call + * this function. Generally if the mmap_sem is hold for reading + * there's no need of this check after get_task_mm()/mmget_not_zero(). + * + * This function can be obsoleted and the check can be removed, after + * the coredump code will hold the mmap_sem for writing before + * invoking the ->core_dump methods. + */ +static inline bool mmget_still_valid(struct mm_struct *mm) +{ + return likely(!mm->core_state); +} + /** * mmget() - Pin the address space associated with a &struct mm_struct. * @mm: The address space to pin. --- a/mm/mmap.c~coredump-fix-race-condition-between-mmget_not_zero-get_task_mm-and-core-dumping +++ a/mm/mmap.c @@ -45,6 +45,7 @@ #include <linux/moduleparam.h> #include <linux/pkeys.h> #include <linux/oom.h> +#include <linux/sched/mm.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> @@ -2525,7 +2526,8 @@ find_extend_vma(struct mm_struct *mm, un vma = find_vma_prev(mm, addr, &prev); if (vma && (vma->vm_start <= addr)) return vma; - if (!prev || expand_stack(prev, addr)) + /* don't alter vm_end if the coredump is running */ + if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr)) return NULL; if (prev->vm_flags & VM_LOCKED) populate_vma_page_range(prev, addr, prev->vm_end, NULL); @@ -2551,6 +2553,9 @@ find_extend_vma(struct mm_struct *mm, un return vma; if (!(vma->vm_flags & VM_GROWSDOWN)) return NULL; + /* don't alter vm_start if the coredump is running */ + if (!mmget_still_valid(mm)) + return NULL; start = vma->vm_start; if (expand_stack(vma, addr)) return NULL; _

6 years, 8 months

[patch 07/16] mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n

by akpm＠linux-foundation.org

From: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Subject: mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly") depends on skipping vmstat entries with empty name introduced in 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat") but reverted in b29940c1abd7 ("mm: rename and change semantics of nr_indirectly_reclaimable_bytes"). So skipping no longer works and /proc/vmstat has misformatted lines " 0". This patch simply shows debug counters "nr_tlb_remote_*" for UP. Link: http://lkml.kernel.org/r/155481488468.467.4295519102880913454.stgit@buzz Fixes: 58bc4c34d249 ("mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly") Signed-off-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Acked-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Roman Gushchin <guro(a)fb.com> Cc: Jann Horn <jannh(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/vmstat.c | 5 ----- 1 file changed, 5 deletions(-) --- a/mm/vmstat.c~mm-vmstat-fix-proc-vmstat-format-for-config_debug_tlbflush=y-config_smp=n +++ a/mm/vmstat.c @@ -1274,13 +1274,8 @@ const char * const vmstat_text[] = { #endif #endif /* CONFIG_MEMORY_BALLOON */ #ifdef CONFIG_DEBUG_TLBFLUSH -#ifdef CONFIG_SMP "nr_tlb_remote_flush", "nr_tlb_remote_flush_received", -#else - "", /* nr_tlb_remote_flush */ - "", /* nr_tlb_remote_flush_received */ -#endif /* CONFIG_SMP */ "nr_tlb_local_flush_all", "nr_tlb_local_flush_one", #endif /* CONFIG_DEBUG_TLBFLUSH */ _

6 years, 8 months

[patch 06/16] mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock

by akpm＠linux-foundation.org

From: zhong jiang <zhongjiang(a)huawei.com> Subject: mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock When adding memory by probing a memory block in the sysfs interface, there is an obvious issue where we will unlock the device_hotplug_lock when we failed to takes it. That issue was introduced in 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock"). We should drop out in time when failing to take the device_hotplug_lock. Link: http://lkml.kernel.org/r/1554696437-9593-1-git-send-email-zhongjiang@huawei… Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock") Signed-off-by: zhong jiang <zhongjiang(a)huawei.com> Reported-by: Yang yingliang <yangyingliang(a)huawei.com> Acked-by: Michal Hocko <mhocko(a)suse.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- drivers/base/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/base/memory.c~mm-memory_hotplug-do-not-unlock-when-fails-to-take-the-device_hotplug_lock +++ a/drivers/base/memory.c @@ -506,7 +506,7 @@ static ssize_t probe_store(struct device ret = lock_device_hotplug_sysfs(); if (ret) - goto out; + return ret; nid = memory_add_physaddr_to_nid(phys_addr); ret = __add_memory(nid, phys_addr, _

6 years, 8 months

[failures] binfmt_elf-move-brk-out-of-mmap-when-doing-direct-loader-exec.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled Subject: fs/binfmt_elf.c: move brk out of mmap when doing direct loader exec has been removed from the -mm tree. Its filename was binfmt_elf-move-brk-out-of-mmap-when-doing-direct-loader-exec.patch This patch was dropped because it had testing failures ------------------------------------------------------ From: Kees Cook <keescook(a)chromium.org> Subject: fs/binfmt_elf.c: move brk out of mmap when doing direct loader exec eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"), made changes in the rare case when the ELF loader was directly invoked (e.g to set a non-inheritable LD_LIBRARY_PATH, testing new versions of the loader), by moving into the mmap region to avoid both ET_EXEC and PIE binaries. This had the effect of also moving the brk region into mmap, which could lead to the stack and brk being arbitrarily close to each other. An unlucky process wouldn't get its requested stack size and stack allocations could end up scribbling on the heap. This is illustrated here. In the case of using the loader directly, brk (so helpfully identified as "[heap]") is allocated with the _loader_ not the binary. For example, with ASLR entirely disabled, you can see this more clearly: $ /bin/cat /proc/self/maps 555555554000-55555555c000 r-xp 00000000 ... /bin/cat 55555575b000-55555575c000 r--p 00007000 ... /bin/cat 55555575c000-55555575d000 rw-p 00008000 ... /bin/cat 55555575d000-55555577e000 rw-p 00000000 ... [heap] ... 7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar] 7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso] 7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffe000-7ffff7fff000 rw-p 00000000 ... 7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack] $ /lib/x86_64-linux-gnu/ld-2.27.so /bin/cat /proc/self/maps ... 7ffff7bcc000-7ffff7bd4000 r-xp 00000000 ... /bin/cat 7ffff7bd4000-7ffff7dd3000 ---p 00008000 ... /bin/cat 7ffff7dd3000-7ffff7dd4000 r--p 00007000 ... /bin/cat 7ffff7dd4000-7ffff7dd5000 rw-p 00008000 ... /bin/cat 7ffff7dd5000-7ffff7dfc000 r-xp 00000000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7fb2000-7ffff7fd6000 rw-p 00000000 ... 7ffff7ff7000-7ffff7ffa000 r--p 00000000 ... [vvar] 7ffff7ffa000-7ffff7ffc000 r-xp 00000000 ... [vdso] 7ffff7ffc000-7ffff7ffd000 r--p 00027000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffd000-7ffff7ffe000 rw-p 00028000 ... /lib/x86_64-linux-gnu/ld-2.27.so 7ffff7ffe000-7ffff8020000 rw-p 00000000 ... [heap] 7ffffffde000-7ffffffff000 rw-p 00000000 ... [stack] The solution is to move brk out of mmap and into ELF_ET_DYN_BASE since nothing is there in this direct loader case (and ET_EXEC still far away at 0x400000). Anything that ran before should still work (i.e. the ultimately-launched binary already had the brk very far from its text, so this should be no different from a COMPAT_BRK standpoint). The only risk I see here is that if someone started to suddenly depend on the entire memory space above the mmap region being available when launching binaries via a direct loader execs which seems highly unlikely, I'd hope: this would mean a binary would _not_ work when execed normally. Link: http://lkml.kernel.org/r/20190416042320.GA36924@beast Link: https://lkml.kernel.org/r/CAGXu5jJ5sj3emOT2QPxQkNQk0qbU6zEfu9=Omfhx_p0nCKPS… Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Signed-off-by: Kees Cook <keescook(a)chromium.org> Reported-by: Ali Saidi <alisaidi(a)amazon.com> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Jann Horn <jannh(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/binfmt_elf.c | 9 +++++++++ 1 file changed, 9 insertions(+) --- a/fs/binfmt_elf.c~binfmt_elf-move-brk-out-of-mmap-when-doing-direct-loader-exec +++ a/fs/binfmt_elf.c @@ -1131,6 +1131,15 @@ out_free_interp: current->mm->end_data = end_data; current->mm->start_stack = bprm->p; + /* + * When executing a loader directly (ET_DYN without Interp), move + * the brk area out of the mmap region (since it grows up, and may + * collide early with the stack growing down), and into the unused + * ELF_ET_DYN_BASE region. + */ + if (!elf_interpreter) + current->mm->brk = current->mm->start_brk = ELF_ET_DYN_BASE; + if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) { current->mm->brk = current->mm->start_brk = arch_randomize_brk(current->mm); _ Patches currently in -mm which might be from keescook(a)chromium.org are

6 years, 8 months

[PATCH] cifs: fix page reference leak with readv/writev

by jglisse＠redhat.com

From: Jérôme Glisse <jglisse(a)redhat.com> CIFS can leak pages reference gotten through GUP (get_user_pages*() through iov_iter_get_pages()). This happen if cifs_send_async_read() or cifs_write_from_iter() calls fail from within __cifs_readv() and __cifs_writev() respectively. This patch move page unreference to cifs_aio_ctx_release() which will happens on all code paths this is all simpler to follow for correctness. Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com> Cc: Steve French <sfrench(a)samba.org> Cc: linux-cifs(a)vger.kernel.org Cc: samba-technical(a)lists.samba.org Cc: Alexander Viro <viro(a)zeniv.linux.org.uk> Cc: linux-fsdevel(a)vger.kernel.org Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Stable <stable(a)vger.kernel.org> --- fs/cifs/file.c | 15 +-------------- fs/cifs/misc.c | 23 ++++++++++++++++++++++- 2 files changed, 23 insertions(+), 15 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 89006e044973..a756a4d3f70f 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2858,7 +2858,6 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx) struct cifs_tcon *tcon; struct cifs_sb_info *cifs_sb; struct dentry *dentry = ctx->cfile->dentry; - unsigned int i; int rc; tcon = tlink_tcon(ctx->cfile->tlink); @@ -2922,10 +2921,6 @@ static void collect_uncached_write_data(struct cifs_aio_ctx *ctx) kref_put(&wdata->refcount, cifs_uncached_writedata_release); } - if (!ctx->direct_io) - for (i = 0; i < ctx->npages; i++) - put_page(ctx->bv[i].bv_page); - cifs_stats_bytes_written(tcon, ctx->total_len); set_bit(CIFS_INO_INVALID_MAPPING, &CIFS_I(dentry->d_inode)->flags); @@ -3563,7 +3558,6 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) struct iov_iter *to = &ctx->iter; struct cifs_sb_info *cifs_sb; struct cifs_tcon *tcon; - unsigned int i; int rc; tcon = tlink_tcon(ctx->cfile->tlink); @@ -3647,15 +3641,8 @@ collect_uncached_read_data(struct cifs_aio_ctx *ctx) kref_put(&rdata->refcount, cifs_uncached_readdata_release); } - if (!ctx->direct_io) { - for (i = 0; i < ctx->npages; i++) { - if (ctx->should_dirty) - set_page_dirty(ctx->bv[i].bv_page); - put_page(ctx->bv[i].bv_page); - } - + if (!ctx->direct_io) ctx->total_len = ctx->len - iov_iter_count(to); - } /* mask nodata case */ if (rc == -ENODATA) diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index bee203055b30..9bc0d17a9d77 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -768,6 +768,11 @@ cifs_aio_ctx_alloc(void) { struct cifs_aio_ctx *ctx; + /* + * Must use kzalloc to initialize ctx->bv to NULL and ctx->direct_io + * to false so that we know when we have to unreference pages within + * cifs_aio_ctx_release() + */ ctx = kzalloc(sizeof(struct cifs_aio_ctx), GFP_KERNEL); if (!ctx) return NULL; @@ -786,7 +791,23 @@ cifs_aio_ctx_release(struct kref *refcount) struct cifs_aio_ctx, refcount); cifsFileInfo_put(ctx->cfile); - kvfree(ctx->bv); + + /* + * ctx->bv is only set if setup_aio_ctx_iter() was call successfuly + * which means that iov_iter_get_pages() was a success and thus that + * we have taken reference on pages. + */ + if (ctx->bv) { + unsigned i; + + for (i = 0; i < ctx->npages; i++) { + if (ctx->should_dirty) + set_page_dirty(ctx->bv[i].bv_page); + put_page(ctx->bv[i].bv_page); + } + kvfree(ctx->bv); + } + kfree(ctx); } -- 2.20.1

6 years, 8 months

← Newer
1
...
38
39
40
41
42
43
44
...
85
Older →

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror April 2019