This is the start of the stable review cycle for the 3.18.133 release. There are 52 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Jan 26 19:01:07 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.133-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 3.18.133-rc1
Michal Hocko mhocko@suse.com mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
Junxiao Bi junxiao.bi@oracle.com ocfs2: fix panic due to unrecovered local alloc
Daniel Vetter daniel.vetter@ffwll.ch sysfs: Disable lockdep for driver bind/unbind files
Takashi Sakamoto o-takashi@sakamocchi.jp ALSA: bebob: fix model-id of unit for Apogee Ensemble
Nikos Tsironis ntsironis@arrikto.com dm snapshot: Fix excessive memory usage and workqueue stalls
Nikos Tsironis ntsironis@arrikto.com dm kcopyd: Fix bug causing workqueue stalls
Arnaldo Carvalho de Melo acme@redhat.com perf parse-events: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo acme@redhat.com perf svghelper: Fix unchecked usage of strncpy()
Jonas Danielsson jonas@orbital-systems.com mmc: atmel-mci: do not assume idle after atmci_request_end
Masahiro Yamada yamada.masahiro@socionext.com kconfig: fix memory leak when EOF is encountered in quotation
Lucas Stach l.stach@pengutronix.de clk: imx6q: reset exclusive gates on init
David Disseldorp ddiss@suse.de scsi: target: use consistent left-aligned ASCII INQUIRY data
yupeng yupeng0921@gmail.com net: call sk_dst_reset when set SO_DONTROUTE
Nathan Chancellor natechancellor@gmail.com media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
Breno Leitao leitao@debian.org powerpc/pseries/cpuidle: Fix preempt warning
Joel Fernandes (Google) joel@joelfernandes.org pstore/ram: Do not treat empty buffers as valid
Daniel Santos daniel.santos@pobox.com jffs2: Fix use of uninitialized delayed_work, lockdep breakage
Maciej W. Rozycki macro@linux-mips.org MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
Kai-Heng Feng kai.heng.feng@canonical.com r8169: Add support for new Realtek Ethernet
Mauro Carvalho Chehab mchehab+samsung@kernel.org media: vb2: be sure to unlock mutex on errors
Ivan Mironov mironov.ivan@gmail.com drm/fb-helper: Ignore the value of fb_var_screeninfo.pixclock
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp block/loop: Use global lock for ioctl() operation.
Xin Long lucien.xin@gmail.com sctp: allocate sctp_sockaddr_entry with kzalloc
Stephen Smalley sds@tycho.nsa.gov selinux: fix GPF on invalid policy
J. Bruce Fields bfields@redhat.com sunrpc: handle ENOMEM in rpcb_getport_async
Hans Verkuil hverkuil@xs4all.nl media: vb2: vb2_mmap: move lock up
Hans Verkuil hverkuil-cisco@xs4all.nl media: vivid: set min width/height to a value > 0
Hans Verkuil hverkuil-cisco@xs4all.nl media: vivid: fix error handling of kthread_run
Vlad Tsyrklevich vlad@tsyrklevich.net omap2fb: Fix stack memory disclosure
YunQiang Su ysu@wavecomp.com Disable MSI also when pcie-octeon.pcie_disable on
Jonathan Hunter jonathanh@nvidia.com mfd: tps6586x: Handle interrupts on suspend
Ivan Mironov mironov.ivan@gmail.com scsi: sd: Fix cache_type_store()
Kees Cook keescook@chromium.org Yama: Check for pid death before checking ancestry
Josef Bacik josef@toxicpanda.com btrfs: wait on ordered extents on abort cleanup
Eric Biggers ebiggers@google.com crypto: authenc - fix parsing key with misaligned rta_len
JianJhen Chen kchen@synology.com net: bridge: fix a bug on using a neighbour cache entry without checking its state
Jason Gunthorpe jgg@mellanox.com packet: Do not leak dev refcounts on error exit
Eric Dumazet edumazet@google.com ipv6: fix kernel-infoleak in ipv6_local_error()
Ben Hutchings ben@decadent.org.uk media: em28xx: Fix misplaced reset of dev->v4l::field_count
Oliver Hartkopp socketcan@hartkopp.net can: gw: ensure DLC boundaries after CAN frame modification
Dmitry Safonov dima@arista.com tty/ldsem: Wake up readers after timed out down_write()
Vasily Averin vvs@virtuozzo.com sunrpc: use-after-free in svc_process_common()
Eric Biggers ebiggers@google.com crypto: cts - fix crash on short inputs
Yi Zeng yizeng@asrmicro.com i2c: dev: prevent adapter retries and timeout being set as minus value
Hans de Goede hdegoede@redhat.com ACPI: power: Skip duplicate power resource references in _PRx
Christoph Lameter cl@linux.com slab: alien caches must not be initialized if the allocation of the alien cache failed
Icenowy Zheng icenowy@aosc.io USB: storage: add quirk for SMI SM3350
Icenowy Zheng icenowy@aosc.io USB: storage: don't insert sane sense for SPC3+ when bad sense specified
Daniele Palmas dnlplm@gmail.com usb: cdc-acm: send ZLP for Telit 3G Intel based modems
Ross Lagerwall ross.lagerwall@citrix.com cifs: Fix potential OOB access of lock element array
Pavel Shilovsky pshilov@microsoft.com CIFS: Do not hide EINTR after sending network packets
Andreas Larsson andreas@gaisler.com sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
-------------
Diffstat:
Documentation/filesystems/proc.txt | 4 +- Makefile | 4 +- arch/arm/mach-imx/clk-imx6q.c | 6 ++- arch/mips/Kconfig | 3 ++ arch/mips/pci/msi-octeon.c | 4 +- arch/mips/sibyte/common/Makefile | 1 + arch/mips/sibyte/common/dma.c | 14 +++++++ arch/sparc/kernel/signal_32.c | 4 +- crypto/authenc.c | 14 +++++-- crypto/cts.c | 8 ++-- drivers/acpi/power.c | 22 +++++++++++ drivers/base/bus.c | 7 +++- drivers/block/loop.c | 47 ++++++++++++------------ drivers/block/loop.h | 1 - drivers/cpuidle/cpuidle-pseries.c | 8 +++- drivers/gpu/drm/drm_fb_helper.c | 7 +++- drivers/i2c/i2c-dev.c | 6 +++ drivers/md/dm-kcopyd.c | 19 +++++++--- drivers/md/dm-snap.c | 22 +++++++++++ drivers/media/firewire/firedtv-avc.c | 6 ++- drivers/media/firewire/firedtv.h | 6 ++- drivers/media/platform/vivid/vivid-kthread-cap.c | 5 ++- drivers/media/platform/vivid/vivid-kthread-out.c | 5 ++- drivers/media/platform/vivid/vivid-vid-common.c | 2 +- drivers/media/usb/em28xx/em28xx-video.c | 4 +- drivers/media/v4l2-core/videobuf2-core.c | 14 +++++-- drivers/mfd/tps6586x.c | 24 ++++++++++++ drivers/mmc/host/atmel-mci.c | 3 +- drivers/net/ethernet/realtek/r8169.c | 2 + drivers/scsi/sd.c | 6 +++ drivers/target/target_core_spc.c | 17 ++++++--- drivers/tty/tty_ldsem.c | 10 +++++ drivers/usb/class/cdc-acm.c | 7 ++++ drivers/usb/storage/scsiglue.c | 8 +++- drivers/usb/storage/unusual_devs.h | 12 ++++++ drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c | 2 + fs/btrfs/disk-io.c | 8 ++++ fs/cifs/file.c | 8 ++-- fs/cifs/smb2file.c | 4 +- fs/cifs/transport.c | 2 +- fs/jffs2/super.c | 3 +- fs/ocfs2/localalloc.c | 9 ++++- fs/pstore/ram_core.c | 5 +++ include/linux/sunrpc/svc.h | 5 ++- mm/slab.c | 6 ++- net/bridge/br_netfilter.c | 2 +- net/can/gw.c | 30 +++++++++++++-- net/core/sock.c | 1 + net/ipv6/datagram.c | 1 + net/packet/af_packet.c | 4 +- net/sctp/ipv6.c | 5 +-- net/sctp/protocol.c | 4 +- net/sunrpc/rpcb_clnt.c | 8 ++++ net/sunrpc/svc.c | 10 +++-- net/sunrpc/svc_xprt.c | 5 ++- net/sunrpc/svcsock.c | 2 +- scripts/kconfig/zconf.l | 2 + security/selinux/ss/policydb.c | 3 +- security/yama/yama_lsm.c | 4 +- sound/firewire/bebob/bebob.c | 2 +- tools/perf/util/parse-events.c | 2 +- tools/perf/util/svghelper.c | 2 +- 62 files changed, 366 insertions(+), 105 deletions(-)
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andreas Larsson andreas@gaisler.com
commit 07b5ab3f71d318e52c18cc3b73c1d44c908aacfa upstream.
Signed-off-by: Andreas Larsson andreas@gaisler.com Signed-off-by: David S. Miller davem@davemloft.net Cc: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/sparc/kernel/signal_32.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/sparc/kernel/signal_32.c +++ b/arch/sparc/kernel/signal_32.c @@ -89,7 +89,7 @@ asmlinkage void do_sigreturn(struct pt_r sf = (struct signal_frame __user *) regs->u_regs[UREG_FP];
/* 1. Make sure we are not getting garbage from the user */ - if (!invalid_frame_pointer(sf, sizeof(*sf))) + if (invalid_frame_pointer(sf, sizeof(*sf))) goto segv_and_exit;
if (get_user(ufp, &sf->info.si_regs.u_regs[UREG_FP])) @@ -150,7 +150,7 @@ asmlinkage void do_rt_sigreturn(struct p
synchronize_user_stack(); sf = (struct rt_signal_frame __user *) regs->u_regs[UREG_FP]; - if (!invalid_frame_pointer(sf, sizeof(*sf))) + if (invalid_frame_pointer(sf, sizeof(*sf))) goto segv;
if (get_user(ufp, &sf->regs.u_regs[UREG_FP]))
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Shilovsky pshilov@microsoft.com
commit ee13919c2e8d1f904e035ad4b4239029a8994131 upstream.
Currently we hide EINTR code returned from sock_sendmsg() and return 0 instead. This makes a caller think that we successfully completed the network operation which is not true. Fix this by properly returning EINTR to callers.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky pshilov@microsoft.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -360,7 +360,7 @@ uncork: if (rc < 0 && rc != -EINTR) cifs_dbg(VFS, "Error %d sending data on socket to server\n", rc); - else + else if (rc > 0) rc = 0;
return rc;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Lagerwall ross.lagerwall@citrix.com
commit b9a74cde94957d82003fb9f7ab4777938ca851cd upstream.
If maxBuf is small but non-zero, it could result in a zero sized lock element array which we would then try and access OOB.
Signed-off-by: Ross Lagerwall ross.lagerwall@citrix.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/file.c | 8 ++++---- fs/cifs/smb2file.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -1067,10 +1067,10 @@ cifs_push_mandatory_locks(struct cifsFil
/* * Accessing maxBuf is racy with cifs_reconnect - need to store value - * and check it for zero before using. + * and check it before using. */ max_buf = tcon->ses->server->maxBuf; - if (!max_buf) { + if (max_buf < (sizeof(struct smb_hdr) + sizeof(LOCKING_ANDX_RANGE))) { free_xid(xid); return -EINVAL; } @@ -1404,10 +1404,10 @@ cifs_unlock_range(struct cifsFileInfo *c
/* * Accessing maxBuf is racy with cifs_reconnect - need to store value - * and check it for zero before using. + * and check it before using. */ max_buf = tcon->ses->server->maxBuf; - if (!max_buf) + if (max_buf < (sizeof(struct smb_hdr) + sizeof(LOCKING_ANDX_RANGE))) return -EINVAL;
max_num = (max_buf - sizeof(struct smb_hdr)) / --- a/fs/cifs/smb2file.c +++ b/fs/cifs/smb2file.c @@ -104,10 +104,10 @@ smb2_unlock_range(struct cifsFileInfo *c
/* * Accessing maxBuf is racy with cifs_reconnect - need to store value - * and check it for zero before using. + * and check it before using. */ max_buf = tcon->ses->server->maxBuf; - if (!max_buf) + if (max_buf < sizeof(struct smb2_lock_element)) return -EINVAL;
max_num = max_buf / sizeof(struct smb2_lock_element);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniele Palmas dnlplm@gmail.com
commit 34aabf918717dd14e05051896aaecd3b16b53d95 upstream.
Telit 3G Intel based modems require zero packet to be sent if out data size is equal to the endpoint max packet size.
Signed-off-by: Daniele Palmas dnlplm@gmail.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/class/cdc-acm.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1904,6 +1904,13 @@ static const struct usb_device_id acm_id .driver_info = IGNORE_DEVICE, },
+ { USB_DEVICE(0x1bc7, 0x0021), /* Telit 3G ACM only composition */ + .driver_info = SEND_ZERO_PACKET, + }, + { USB_DEVICE(0x1bc7, 0x0023), /* Telit 3G ACM + ECM composition */ + .driver_info = SEND_ZERO_PACKET, + }, + /* control interfaces without any protocol set */ { USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ACM, USB_CDC_PROTO_NONE) },
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Icenowy Zheng icenowy@aosc.io
commit c5603d2fdb424849360fe7e3f8c1befc97571b8c upstream.
Currently the code will set US_FL_SANE_SENSE flag unconditionally if device claims SPC3+, however we should allow US_FL_BAD_SENSE flag to prevent this behavior, because SMI SM3350 UFS-USB bridge controller, which claims SPC4, will show strange behavior with 96-byte sense (put the chip into a wrong state that cannot read/write anything).
Check the presence of US_FL_BAD_SENSE when assuming US_FL_SANE_SENSE on SPC4+ devices.
Signed-off-by: Icenowy Zheng icenowy@aosc.io Cc: stable stable@vger.kernel.org Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/scsiglue.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -223,8 +223,12 @@ static int slave_configure(struct scsi_d if (!(us->fflags & US_FL_NEEDS_CAP16)) sdev->try_rc_10_first = 1;
- /* assume SPC3 or latter devices support sense size > 18 */ - if (sdev->scsi_level > SCSI_SPC_2) + /* + * assume SPC3 or latter devices support sense size > 18 + * unless US_FL_BAD_SENSE quirk is specified. + */ + if (sdev->scsi_level > SCSI_SPC_2 && + !(us->fflags & US_FL_BAD_SENSE)) us->fflags |= US_FL_SANE_SENSE;
/* USB-IDE bridges tend to report SK = 0x04 (Non-recoverable
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Icenowy Zheng icenowy@aosc.io
commit 0a99cc4b8ee83885ab9f097a3737d1ab28455ac0 upstream.
The SMI SM3350 USB-UFS bridge controller cannot handle long sense request correctly and will make the chip refuse to do read/write when requested long sense.
Add a bad sense quirk for it.
Signed-off-by: Icenowy Zheng icenowy@aosc.io Cc: stable stable@vger.kernel.org Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/storage/unusual_devs.h | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -1393,6 +1393,18 @@ UNUSUAL_DEV( 0x0d49, 0x7310, 0x0000, 0x US_FL_SANE_SENSE),
/* + * Reported by Icenowy Zheng icenowy@aosc.io + * The SMI SM3350 USB-UFS bridge controller will enter a wrong state + * that do not process read/write command if a long sense is requested, + * so force to use 18-byte sense. + */ +UNUSUAL_DEV( 0x090c, 0x3350, 0x0000, 0xffff, + "SMI", + "SM3350 UFS-to-USB-Mass-Storage bridge", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_BAD_SENSE ), + +/* * Pete Zaitcev zaitcev@yahoo.com, bz#164688. * The device blatantly ignores LUN and returns 1 in GetMaxLUN. */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Lameter cl@linux.com
commit 09c2e76ed734a1d36470d257a778aaba28e86531 upstream.
Callers of __alloc_alien() check for NULL. We must do the same check in __alloc_alien_cache to avoid NULL pointer dereferences on allocation failures.
Link: http://lkml.kernel.org/r/010001680f42f192-82b4e12e-1565-4ee0-ae1f-1e98974906... Fixes: 49dfc304ba241 ("slab: use the lock on alien_cache, instead of the lock on array_cache") Fixes: c8522a3a5832b ("Slab: introduce alloc_alien") Signed-off-by: Christoph Lameter cl@linux.com Reported-by: syzbot+d6ed4ec679652b4fd4e4@syzkaller.appspotmail.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/slab.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/slab.c +++ b/mm/slab.c @@ -869,8 +869,10 @@ static struct alien_cache *__alloc_alien struct alien_cache *alc = NULL;
alc = kmalloc_node(memsize, gfp, node); - init_arraycache(&alc->ac, entries, batch); - spin_lock_init(&alc->lock); + if (alc) { + init_arraycache(&alc->ac, entries, batch); + spin_lock_init(&alc->lock); + } return alc; }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 7d7b467cb95bf29597b417d4990160d4ea6d69b9 upstream.
Some ACPI tables contain duplicate power resource references like this:
Name (_PR0, Package (0x04) // _PR0: Power Resources for D0 { P28P, P18P, P18P, CLK4 })
This causes a WARN_ON in sysfs_add_link_to_group() because we end up adding a link to the same acpi_device twice:
sysfs: cannot create duplicate filename '/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/808622C1:00/OVTI2680:00/power_resources_D0/LNXPOWER:0a' CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.12-301.fc29.x86_64 #1 Hardware name: Insyde CherryTrail/Type2 - Board Product Name, BIOS jumperx.T87.KFBNEEA02 04/13/2016 Call Trace: dump_stack+0x5c/0x80 sysfs_warn_dup.cold.3+0x17/0x2a sysfs_do_create_link_sd.isra.2+0xa9/0xb0 sysfs_add_link_to_group+0x30/0x50 acpi_power_expose_list+0x74/0xa0 acpi_power_add_remove_device+0x50/0xa0 acpi_add_single_object+0x26b/0x5f0 acpi_bus_check_add+0xc4/0x250 ...
To address this issue, make acpi_extract_power_resources() check for duplicates and simply skip them when found.
Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com [ rjw: Subject & changelog, comments ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/power.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
--- a/drivers/acpi/power.c +++ b/drivers/acpi/power.c @@ -132,6 +132,23 @@ void acpi_power_resources_list_free(stru } }
+static bool acpi_power_resource_is_dup(union acpi_object *package, + unsigned int start, unsigned int i) +{ + acpi_handle rhandle, dup; + unsigned int j; + + /* The caller is expected to check the package element types */ + rhandle = package->package.elements[i].reference.handle; + for (j = start; j < i; j++) { + dup = package->package.elements[j].reference.handle; + if (dup == rhandle) + return true; + } + + return false; +} + int acpi_extract_power_resources(union acpi_object *package, unsigned int start, struct list_head *list) { @@ -151,6 +168,11 @@ int acpi_extract_power_resources(union a err = -ENODEV; break; } + + /* Some ACPI tables contain duplicate power resource references */ + if (acpi_power_resource_is_dup(package, start, i)) + continue; + err = acpi_add_power_resource(rhandle); if (err) break;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yi Zeng yizeng@asrmicro.com
commit 6ebec961d59bccf65d08b13fc1ad4e6272a89338 upstream.
If adapter->retries is set to a minus value from user space via ioctl, it will make __i2c_transfer and __i2c_smbus_xfer skip the calling to adapter->algo->master_xfer and adapter->algo->smbus_xfer that is registered by the underlying bus drivers, and return value 0 to all the callers. The bus driver will never be accessed anymore by all users, besides, the users may still get successful return value without any error or information log print out.
If adapter->timeout is set to minus value from user space via ioctl, it will make the retrying loop in __i2c_transfer and __i2c_smbus_xfer always break after the the first try, due to the time_after always returns true.
Signed-off-by: Yi Zeng yizeng@asrmicro.com [wsa: minor grammar updates to commit message] Signed-off-by: Wolfram Sang wsa@the-dreams.de Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/i2c/i2c-dev.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/i2c/i2c-dev.c +++ b/drivers/i2c/i2c-dev.c @@ -462,9 +462,15 @@ static long i2cdev_ioctl(struct file *fi return i2cdev_ioctl_smbus(client, arg);
case I2C_RETRIES: + if (arg > INT_MAX) + return -EINVAL; + client->adapter->retries = arg; break; case I2C_TIMEOUT: + if (arg > INT_MAX) + return -EINVAL; + /* For historical reasons, user-space sets the timeout * value in units of 10 ms. */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
[It's a minimal fix for a bug that was fixed incidentally by a large refactoring in v4.8.]
In the CTS template, when the input length is <= one block cipher block (e.g. <= 16 bytes for AES) pass the correct length to the underlying CBC transform rather than one block. This matches the upstream behavior and makes the encryption/decryption operation correctly return -EINVAL when 1 <= nbytes < bsize or succeed when nbytes == 0, rather than crashing.
This was fixed upstream incidentally by a large refactoring, commit 0605c41cc53c ("crypto: cts - Convert to skcipher"). But syzkaller easily trips over this when running on older kernels, as it's easily reachable via AF_ALG. Therefore, this patch makes the minimal fix for older kernels.
Cc: linux-crypto@vger.kernel.org Fixes: 76cb9521795a ("[CRYPTO] cts: Add CTS mode required for Kerberos AES support") Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/cts.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/crypto/cts.c +++ b/crypto/cts.c @@ -137,8 +137,8 @@ static int crypto_cts_encrypt(struct blk lcldesc.info = desc->info; lcldesc.flags = desc->flags;
- if (tot_blocks == 1) { - err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize); + if (tot_blocks <= 1) { + err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, nbytes); } else if (nbytes <= bsize * 2) { err = cts_cbc_encrypt(ctx, desc, dst, src, 0, nbytes); } else { @@ -232,8 +232,8 @@ static int crypto_cts_decrypt(struct blk lcldesc.info = desc->info; lcldesc.flags = desc->flags;
- if (tot_blocks == 1) { - err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src, bsize); + if (tot_blocks <= 1) { + err = crypto_blkcipher_decrypt_iv(&lcldesc, dst, src, nbytes); } else if (nbytes <= bsize * 2) { err = cts_cbc_decrypt(ctx, desc, dst, src, 0, nbytes); } else {
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Averin vvs@virtuozzo.com
commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.
if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common()
svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace.
According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags.
All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case.
This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL.
To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.
To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case.
Signed-off-by: Vasily Averin vvs@virtuozzo.com Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields bfields@redhat.com v2: - added lost extern svc_tcp_prep_reply_hdr() - context changes in svc_process_common() - dropped trace_svc_process() changes Signed-off-by: Vasily Averin vvs@virtuozzo.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sunrpc/svc.h | 5 ++++- net/sunrpc/svc.c | 10 +++++++--- net/sunrpc/svc_xprt.c | 5 +++-- net/sunrpc/svcsock.c | 2 +- 4 files changed, 15 insertions(+), 7 deletions(-)
--- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -281,9 +281,12 @@ struct svc_rqst { * to prevent encrypting page * cache pages */ struct task_struct *rq_task; /* service thread */ + struct net *rq_bc_net; /* pointer to backchannel's + * net namespace + */ };
-#define SVC_NET(svc_rqst) (svc_rqst->rq_xprt->xpt_net) +#define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net : rqst->rq_bc_net)
/* * Rigorous type checking on sockaddr type conversions --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -1061,6 +1061,8 @@ void svc_printk(struct svc_rqst *rqstp, static __printf(2,3) void svc_printk(struct svc_rqst *rqstp, const char *fmt, ...) {} #endif
+extern void svc_tcp_prep_reply_hdr(struct svc_rqst *); + /* * Common routine for processing the RPC request. */ @@ -1090,7 +1092,8 @@ svc_process_common(struct svc_rqst *rqst rqstp->rq_dropme = false;
/* Setup reply header */ - rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); + if (rqstp->rq_prot == IPPROTO_TCP) + svc_tcp_prep_reply_hdr(rqstp);
svc_putu32(resv, rqstp->rq_xid);
@@ -1137,7 +1140,8 @@ svc_process_common(struct svc_rqst *rqst case SVC_DENIED: goto err_bad_auth; case SVC_CLOSE: - if (test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) + if (rqstp->rq_xprt && + test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) svc_close_xprt(rqstp->rq_xprt); case SVC_DROP: goto dropit; @@ -1347,10 +1351,10 @@ bc_svc_process(struct svc_serv *serv, st struct kvec *resv = &rqstp->rq_res.head[0];
/* Build the svc_rqst used by the common processing routine */ - rqstp->rq_xprt = serv->sv_bc_xprt; rqstp->rq_xid = req->rq_xid; rqstp->rq_prot = req->rq_xprt->prot; rqstp->rq_server = serv; + rqstp->rq_bc_net = req->rq_xprt->xprt_net;
rqstp->rq_addrlen = sizeof(req->rq_xprt->addr); memcpy(&rqstp->rq_addr, &req->rq_xprt->addr, rqstp->rq_addrlen); --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -438,10 +438,11 @@ static struct svc_xprt *svc_xprt_dequeue */ void svc_reserve(struct svc_rqst *rqstp, int space) { + struct svc_xprt *xprt = rqstp->rq_xprt; + space += rqstp->rq_res.head[0].iov_len;
- if (space < rqstp->rq_reserved) { - struct svc_xprt *xprt = rqstp->rq_xprt; + if (xprt && space < rqstp->rq_reserved) { atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved); rqstp->rq_reserved = space;
--- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -1212,7 +1212,7 @@ static int svc_tcp_sendto(struct svc_rqs /* * Setup response header. TCP has a 4B record length field. */ -static void svc_tcp_prep_reply_hdr(struct svc_rqst *rqstp) +void svc_tcp_prep_reply_hdr(struct svc_rqst *rqstp) { struct kvec *resv = &rqstp->rq_res.head[0];
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
commit 231f8fd0cca078bd4396dd7e380db813ac5736e2 upstream.
ldsem_down_read() will sleep if there is pending writer in the queue. If the writer times out, readers in the queue should be woken up, otherwise they may miss a chance to acquire the semaphore until the last active reader will do ldsem_up_read().
There was a couple of reports where there was one active reader and other readers soft locked up: Showing all locks held in the system: 2 locks held by khungtaskd/17: #0: (rcu_read_lock){......}, at: watchdog+0x124/0x6d1 #1: (tasklist_lock){.+.+..}, at: debug_show_all_locks+0x72/0x2d3 2 locks held by askfirst/123: #0: (&tty->ldisc_sem){.+.+.+}, at: ldsem_down_read+0x46/0x58 #1: (&ldata->atomic_read_lock){+.+...}, at: n_tty_read+0x115/0xbe4
Prevent readers wait for active readers to release ldisc semaphore.
Link: lkml.kernel.org/r/20171121132855.ajdv4k6swzhvktl6@wfg-t540p.sh.intel.com Link: lkml.kernel.org/r/20180907045041.GF1110@shao2-debian Cc: Jiri Slaby jslaby@suse.com Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Reported-by: kernel test robot rong.a.chen@intel.com Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/tty_ldsem.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/tty/tty_ldsem.c +++ b/drivers/tty/tty_ldsem.c @@ -306,6 +306,16 @@ down_write_failed(struct ld_semaphore *s if (!locked) ldsem_atomic_update(-LDSEM_WAIT_BIAS, sem); list_del(&waiter.list); + + /* + * In case of timeout, wake up every reader who gave the right of way + * to writer. Prevent separation readers into two groups: + * one that helds semaphore and another that sleeps. + * (in case of no contention with a writer) + */ + if (!locked && list_empty(&sem->write_wait)) + __ldsem_wake_readers(sem); + raw_spin_unlock_irq(&sem->wait_lock);
__set_task_state(tsk, TASK_RUNNING);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oliver Hartkopp socketcan@hartkopp.net
commit 0aaa81377c5a01f686bcdb8c7a6929a7bf330c68 upstream.
Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN frame modification rule that makes the data length code a higher value than the available CAN frame data size. In combination with a configured checksum calculation where the result is stored relatively to the end of the data (e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in skb_shared_info) can be rewritten which finally can cause a system crash.
Michael Kubecek suggested to drop frames that have a DLC exceeding the available space after the modification process and provided a patch that can handle CAN FD frames too. Within this patch we also limit the length for the checksum calculations to the maximum of Classic CAN data length (8).
CAN frames that are dropped by these additional checks are counted with the CGW_DELETED counter which indicates misconfigurations in can-gw rules.
This fixes CVE-2019-3701.
Reported-by: Muyu Yu ieatmuttonchuan@gmail.com Reported-by: Marcus Meissner meissner@suse.de Suggested-by: Michal Kubecek mkubecek@suse.cz Tested-by: Muyu Yu ieatmuttonchuan@gmail.com Tested-by: Oliver Hartkopp socketcan@hartkopp.net Signed-off-by: Oliver Hartkopp socketcan@hartkopp.net Cc: linux-stable stable@vger.kernel.org # >= v3.2 Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/can/gw.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-)
--- a/net/can/gw.c +++ b/net/can/gw.c @@ -417,13 +417,29 @@ static void can_can_gw_rcv(struct sk_buf while (modidx < MAX_MODFUNCTIONS && gwj->mod.modfunc[modidx]) (*gwj->mod.modfunc[modidx++])(cf, &gwj->mod);
- /* check for checksum updates when the CAN frame has been modified */ + /* Has the CAN frame been modified? */ if (modidx) { - if (gwj->mod.csumfunc.crc8) + /* get available space for the processed CAN frame type */ + int max_len = nskb->len - offsetof(struct can_frame, data); + + /* dlc may have changed, make sure it fits to the CAN frame */ + if (cf->can_dlc > max_len) + goto out_delete; + + /* check for checksum updates in classic CAN length only */ + if (gwj->mod.csumfunc.crc8) { + if (cf->can_dlc > 8) + goto out_delete; + (*gwj->mod.csumfunc.crc8)(cf, &gwj->mod.csum.crc8); + } + + if (gwj->mod.csumfunc.xor) { + if (cf->can_dlc > 8) + goto out_delete;
- if (gwj->mod.csumfunc.xor) (*gwj->mod.csumfunc.xor)(cf, &gwj->mod.csum.xor); + } }
/* clear the skb timestamp if not configured the other way */ @@ -435,6 +451,14 @@ static void can_can_gw_rcv(struct sk_buf gwj->dropped_frames++; else gwj->handled_frames++; + + return; + + out_delete: + /* delete frame due to misconfiguration */ + gwj->deleted_frames++; + kfree_skb(nskb); + return; }
static inline int cgw_register_filter(struct cgw_job *gwj)
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Hutchings ben@decadent.org.uk
The backport of commit afeaade90db4 "media: em28xx: make v4l2-compliance happier by starting sequence on zero" added a reset on em28xx_v4l2::field_count to em28xx_ctrl_notify(), but it should be done in em28xx_start_analog_streaming().
Signed-off-by: Ben Hutchings ben@decadent.org.uk Cc: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/usb/em28xx/em28xx-video.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/media/usb/em28xx/em28xx-video.c +++ b/drivers/media/usb/em28xx/em28xx-video.c @@ -931,6 +931,8 @@ int em28xx_start_analog_streaming(struct
em28xx_videodbg("%s\n", __func__);
+ dev->v4l2->field_count = 0; + /* Make sure streaming is not already in progress for this type of filehandle (e.g. video, vbi) */ rc = res_get(dev, vq->type); @@ -1141,8 +1143,6 @@ static void em28xx_ctrl_notify(struct v4 { struct em28xx *dev = priv;
- dev->v4l2->field_count = 0; - /* * In the case of non-AC97 volume controls, we still need * to do some setups at em28xx, in order to mute/unmute
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7d033c9f6a7fd3821af75620a0257db87c2b552a ]
This patch makes sure the flow label in the IPv6 header forged in ipv6_local_error() is initialized.
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675 kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601 _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:177 [inline] move_addr_to_user+0x2e9/0x4f0 net/socket.c:227 ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284 __sys_recvmsg net/socket.c:2327 [inline] __do_sys_recvmsg net/socket.c:2337 [inline] __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x457ec9 Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9 RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005 RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4 R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff
Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline] kmsan_save_stack mm/kmsan/kmsan.c:219 [inline] kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439 __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200 ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475 udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335 inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830 sock_recvmsg_nosec net/socket.c:794 [inline] sock_recvmsg+0x1d1/0x230 net/socket.c:801 ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278 __sys_recvmsg net/socket.c:2327 [inline] __do_sys_recvmsg net/socket.c:2337 [inline] __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334 __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7
Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline] kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2759 [inline] __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383 __kmalloc_reserve net/core/skbuff.c:137 [inline] __alloc_skb+0x309/0xa20 net/core/skbuff.c:205 alloc_skb include/linux/skbuff.h:998 [inline] ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334 __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311 ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775 udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384 inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg net/socket.c:631 [inline] __sys_sendto+0x8c4/0xac0 net/socket.c:1788 __do_sys_sendto net/socket.c:1800 [inline] __se_sys_sendto+0x107/0x130 net/socket.c:1796 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7
Bytes 4-7 of 28 are uninitialized Memory access of size 28 starts at ffff8881937bfce0 Data copied to user address 0000000020000000
Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv6/datagram.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -287,6 +287,7 @@ void ipv6_local_error(struct sock *sk, i skb_reset_network_header(skb); iph = ipv6_hdr(skb); iph->daddr = fl6->daddr; + ip6_flow_hdr(iph, 0, 0);
serr = SKB_EXT_ERR(skb); serr->ee.ee_errno = err;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gunthorpe jgg@mellanox.com
[ Upstream commit d972f3dce8d161e2142da0ab1ef25df00e2f21a9 ]
'dev' is non NULL when the addr_len check triggers so it must goto a label that does the dev_put otherwise dev will have a leaked refcount.
This bug causes the ib_ipoib module to become unloadable when using systemd-network as it triggers this check on InfiniBand links.
Fixes: 99137b7888f4 ("packet: validate address length") Reported-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Acked-by: Willem de Bruijn willemb@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2276,7 +2276,7 @@ static int tpacket_snd(struct packet_soc addr = saddr->sll_halen ? saddr->sll_addr : NULL; dev = dev_get_by_index(sock_net(&po->sk), saddr->sll_ifindex); if (addr && dev && saddr->sll_halen < dev->addr_len) - goto out; + goto out_put; }
err = -ENXIO; @@ -2439,7 +2439,7 @@ static int packet_snd(struct socket *soc addr = saddr->sll_halen ? saddr->sll_addr : NULL; dev = dev_get_by_index(sock_net(sk), saddr->sll_ifindex); if (addr && dev && saddr->sll_halen < dev->addr_len) - goto out; + goto out_unlock; }
err = -ENXIO;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: JianJhen Chen kchen@synology.com
[ Upstream commit 4c84edc11b76590859b1e45dd676074c59602dc4 ]
When handling DNAT'ed packets on a bridge device, the neighbour cache entry from lookup was used without checking its state. It means that a cache entry in the NUD_STALE state will be used directly instead of entering the NUD_DELAY state to confirm the reachability of the neighbor.
This problem becomes worse after commit 2724680bceee ("neigh: Keep neighbour cache entries if number of them is small enough."), since all neighbour cache entries in the NUD_STALE state will be kept in the neighbour table as long as the number of cache entries does not exceed the value specified in gc_thresh1.
This commit validates the state of a neighbour cache entry before using the entry.
Signed-off-by: JianJhen Chen kchen@synology.com Reviewed-by: JinLin Chen jlchen@synology.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/bridge/br_netfilter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -287,7 +287,7 @@ static int br_nf_pre_routing_finish_brid if (neigh) { int ret;
- if (neigh->hh.hh_len) { + if ((neigh->nud_state & NUD_CONNECTED) && neigh->hh.hh_len) { neigh_hh_bridge(&neigh->hh, skb); skb->dev = nf_bridge->physindev; ret = br_handle_frame_finish(skb);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Biggers ebiggers@google.com
commit 8f9c469348487844328e162db57112f7d347c49f upstream.
Keys for "authenc" AEADs are formatted as an rtattr containing a 4-byte 'enckeylen', followed by an authentication key and an encryption key. crypto_authenc_extractkeys() parses the key to find the inner keys.
However, it fails to consider the case where the rtattr's payload is longer than 4 bytes but not 4-byte aligned, and where the key ends before the next 4-byte aligned boundary. In this case, 'keylen -= RTA_ALIGN(rta->rta_len);' underflows to a value near UINT_MAX. This causes a buffer overread and crash during crypto_ahash_setkey().
Fix it by restricting the rtattr payload to the expected size.
Reproducer using AF_ALG:
#include <linux/if_alg.h> #include <linux/rtnetlink.h> #include <sys/socket.h>
int main() { int fd; struct sockaddr_alg addr = { .salg_type = "aead", .salg_name = "authenc(hmac(sha256),cbc(aes))", }; struct { struct rtattr attr; __be32 enckeylen; char keys[1]; } __attribute__((packed)) key = { .attr.rta_len = sizeof(key), .attr.rta_type = 1 /* CRYPTO_AUTHENC_KEYA_PARAM */, };
fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, &key, sizeof(key)); }
It caused:
BUG: unable to handle kernel paging request at ffff88007ffdc000 PGD 2e01067 P4D 2e01067 PUD 2e04067 PMD 2e05067 PTE 0 Oops: 0000 [#1] SMP CPU: 0 PID: 883 Comm: authenc Not tainted 4.20.0-rc1-00108-g00c9fe37a7f27 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:sha256_ni_transform+0xb3/0x330 arch/x86/crypto/sha256_ni_asm.S:155 [...] Call Trace: sha256_ni_finup+0x10/0x20 arch/x86/crypto/sha256_ssse3_glue.c:321 crypto_shash_finup+0x1a/0x30 crypto/shash.c:178 shash_digest_unaligned+0x45/0x60 crypto/shash.c:186 crypto_shash_digest+0x24/0x40 crypto/shash.c:202 hmac_setkey+0x135/0x1e0 crypto/hmac.c:66 crypto_shash_setkey+0x2b/0xb0 crypto/shash.c:66 shash_async_setkey+0x10/0x20 crypto/shash.c:223 crypto_ahash_setkey+0x2d/0xa0 crypto/ahash.c:202 crypto_authenc_setkey+0x68/0x100 crypto/authenc.c:96 crypto_aead_setkey+0x2a/0xc0 crypto/aead.c:62 aead_setkey+0xc/0x10 crypto/algif_aead.c:526 alg_setkey crypto/af_alg.c:223 [inline] alg_setsockopt+0xfe/0x130 crypto/af_alg.c:256 __sys_setsockopt+0x6d/0xd0 net/socket.c:1902 __do_sys_setsockopt net/socket.c:1913 [inline] __se_sys_setsockopt net/socket.c:1910 [inline] __x64_sys_setsockopt+0x1f/0x30 net/socket.c:1910 do_syscall_64+0x4a/0x180 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: e236d4a89a2f ("[CRYPTO] authenc: Move enckeylen into key itself") Cc: stable@vger.kernel.org # v2.6.25+ Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/authenc.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/crypto/authenc.c +++ b/crypto/authenc.c @@ -62,14 +62,22 @@ int crypto_authenc_extractkeys(struct cr return -EINVAL; if (rta->rta_type != CRYPTO_AUTHENC_KEYA_PARAM) return -EINVAL; - if (RTA_PAYLOAD(rta) < sizeof(*param)) + + /* + * RTA_OK() didn't align the rtattr's payload when validating that it + * fits in the buffer. Yet, the keys should start on the next 4-byte + * aligned boundary. To avoid confusion, require that the rtattr + * payload be exactly the param struct, which has a 4-byte aligned size. + */ + if (RTA_PAYLOAD(rta) != sizeof(*param)) return -EINVAL; + BUILD_BUG_ON(sizeof(*param) % RTA_ALIGNTO);
param = RTA_DATA(rta); keys->enckeylen = be32_to_cpu(param->enckeylen);
- key += RTA_ALIGN(rta->rta_len); - keylen -= RTA_ALIGN(rta->rta_len); + key += rta->rta_len; + keylen -= rta->rta_len;
if (keylen < keys->enckeylen) return -EINVAL;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit 74d5d229b1bf60f93bff244b2dfc0eb21ec32a07 upstream.
If we flip read-only before we initiate writeback on all dirty pages for ordered extents we've created then we'll have ordered extents left over on umount, which results in all sorts of bad things happening. Fix this by making sure we wait on ordered extents if we have to do the aborted transaction cleanup stuff.
generic/475 can produce this warning:
[ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs] [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000 [ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000 [ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0 [ 8531.207516] Call Trace: [ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs] [ 8531.210209] ? wait_for_completion+0x5b/0x190 [ 8531.211303] close_ctree+0x157/0x350 [btrfs] [ 8531.212412] generic_shutdown_super+0x64/0x100 [ 8531.213485] kill_anon_super+0x14/0x30 [ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs] [ 8531.215539] deactivate_locked_super+0x29/0x60 [ 8531.216633] cleanup_mnt+0x3b/0x70 [ 8531.217497] task_work_run+0x98/0xc0 [ 8531.218397] exit_to_usermode_loop+0x83/0x90 [ 8531.219324] do_syscall_64+0x15b/0x180 [ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 8531.221286] RIP: 0033:0x7fc68e5e4d07 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000
Leaving a tree in the rb-tree:
3853 void btrfs_free_fs_root(struct btrfs_root *root) 3854 { 3855 iput(root->ino_cache_inode); 3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
CC: stable@vger.kernel.org Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com [ add stacktrace ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/disk-io.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3942,6 +3942,14 @@ static void btrfs_destroy_all_ordered_ex spin_lock(&fs_info->ordered_root_lock); } spin_unlock(&fs_info->ordered_root_lock); + + /* + * We need this here because if we've been flipped read-only we won't + * get sync() from the umount, so we need to make sure any ordered + * extents that haven't had their dirty pages IO start writeout yet + * actually get run and error out properly. + */ + btrfs_wait_ordered_roots(fs_info, -1); }
static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook keescook@chromium.org
commit 9474f4e7cd71a633fa1ef93b7daefd44bbdfd482 upstream.
It's possible that a pid has died before we take the rcu lock, in which case we can't walk the ancestry list as it may be detached. Instead, check for death first before doing the walk.
Reported-by: syzbot+a9ac39bf55329e206219@syzkaller.appspotmail.com Fixes: 2d514487faf1 ("security: Yama LSM") Cc: stable@vger.kernel.org Suggested-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: James Morris james.morris@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/yama/yama_lsm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/security/yama/yama_lsm.c +++ b/security/yama/yama_lsm.c @@ -299,7 +299,9 @@ int yama_ptrace_access_check(struct task break; case YAMA_SCOPE_RELATIONAL: rcu_read_lock(); - if (!task_is_descendant(current, child) && + if (!pid_alive(child)) + rc = -EPERM; + if (!rc && !task_is_descendant(current, child) && !ptracer_exception_found(current, child) && !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE)) rc = -EPERM;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Mironov mironov.ivan@gmail.com
commit 44759979a49bfd2d20d789add7fa81a21eb1a4ab upstream.
Changing of caching mode via /sys/devices/.../scsi_disk/.../cache_type may fail if device responds to MODE SENSE command with DPOFUA flag set, and then checks this flag to be not set on MODE SELECT command.
In this scenario, when trying to change cache_type, write always fails:
# echo "none" >cache_type bash: echo: write error: Invalid argument
And following appears in dmesg:
[13007.865745] sd 1:0:1:0: [sda] Sense Key : Illegal Request [current] [13007.865753] sd 1:0:1:0: [sda] Add. Sense: Invalid field in parameter list
From SBC-4 r15, 6.5.1 "Mode pages overview", description of DEVICE-SPECIFIC
PARAMETER field in the mode parameter header: ... The write protect (WP) bit for mode data sent with a MODE SELECT command shall be ignored by the device server. ... The DPOFUA bit is reserved for mode data sent with a MODE SELECT command. ...
The remaining bits in the DEVICE-SPECIFIC PARAMETER byte are also reserved and shall be set to zero.
[mkp: shuffled commentary to commit description]
Cc: stable@vger.kernel.org Signed-off-by: Ivan Mironov mironov.ivan@gmail.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/scsi/sd.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -205,6 +205,12 @@ cache_type_store(struct device *dev, str buffer_data[2] |= wce << 2 | rcd; sp = buffer_data[0] & 0x80 ? 1 : 0;
+ /* + * Ensure WP, DPOFUA, and RESERVED fields are cleared in + * received mode parameter buffer before doing MODE SELECT. + */ + data.device_specific = 0; + if (scsi_mode_select(sdp, 1, sp, 8, buffer_data, len, SD_TIMEOUT, SD_MAX_RETRIES, &data, &sshdr)) { if (scsi_sense_valid(&sshdr))
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Hunter jonathanh@nvidia.com
commit ac4ca4b9f4623ba5e1ea7a582f286567c611e027 upstream.
The tps6586x driver creates an irqchip that is used by its various child devices for managing interrupts. The tps6586x-rtc device is one of its children that uses the tps6586x irqchip. When using the tps6586x-rtc as a wake-up device from suspend, the following is seen:
PM: Syncing filesystems ... done. Freezing user space processes ... (elapsed 0.001 seconds) done. OOM killer disabled. Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. Disabling non-boot CPUs ... Entering suspend state LP1 Enabling non-boot CPUs ... CPU1 is up tps6586x 3-0034: failed to read interrupt status tps6586x 3-0034: failed to read interrupt status
The reason why the tps6586x interrupt status cannot be read is because the tps6586x interrupt is not masked during suspend and when the tps6586x-rtc interrupt occurs, to wake-up the device, the interrupt is seen before the i2c controller has been resumed in order to read the tps6586x interrupt status.
The tps6586x-rtc driver sets it's interrupt as a wake-up source during suspend, which gets propagated to the parent tps6586x interrupt. However, the tps6586x-rtc driver cannot disable it's interrupt during suspend otherwise we would never be woken up and so the tps6586x must disable it's interrupt instead.
Prevent the tps6586x interrupt handler from executing on exiting suspend before the i2c controller has been resumed by disabling the tps6586x interrupt on entering suspend and re-enabling it on resuming from suspend.
Cc: stable@vger.kernel.org Signed-off-by: Jon Hunter jonathanh@nvidia.com Reviewed-by: Dmitry Osipenko digetx@gmail.com Tested-by: Dmitry Osipenko digetx@gmail.com Acked-by: Thierry Reding treding@nvidia.com Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/mfd/tps6586x.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
--- a/drivers/mfd/tps6586x.c +++ b/drivers/mfd/tps6586x.c @@ -601,6 +601,29 @@ static int tps6586x_i2c_remove(struct i2 return 0; }
+static int __maybe_unused tps6586x_i2c_suspend(struct device *dev) +{ + struct tps6586x *tps6586x = dev_get_drvdata(dev); + + if (tps6586x->client->irq) + disable_irq(tps6586x->client->irq); + + return 0; +} + +static int __maybe_unused tps6586x_i2c_resume(struct device *dev) +{ + struct tps6586x *tps6586x = dev_get_drvdata(dev); + + if (tps6586x->client->irq) + enable_irq(tps6586x->client->irq); + + return 0; +} + +static SIMPLE_DEV_PM_OPS(tps6586x_pm_ops, tps6586x_i2c_suspend, + tps6586x_i2c_resume); + static const struct i2c_device_id tps6586x_id_table[] = { { "tps6586x", 0 }, { }, @@ -612,6 +635,7 @@ static struct i2c_driver tps6586x_driver .name = "tps6586x", .owner = THIS_MODULE, .of_match_table = of_match_ptr(tps6586x_of_match), + .pm = &tps6586x_pm_ops, }, .probe = tps6586x_i2c_probe, .remove = tps6586x_i2c_remove,
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: YunQiang Su ysu@wavecomp.com
commit a214720cbf50cd8c3f76bbb9c3f5c283910e9d33 upstream.
Octeon has an boot-time option to disable pcie.
Since MSI depends on PCI-E, we should also disable MSI also with this option is on in order to avoid inadvertently accessing PCIe registers.
Signed-off-by: YunQiang Su ysu@wavecomp.com Signed-off-by: Paul Burton paul.burton@mips.com Cc: pburton@wavecomp.com Cc: linux-mips@vger.kernel.org Cc: aaro.koskinen@iki.fi Cc: stable@vger.kernel.org # v3.3+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/pci/msi-octeon.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/mips/pci/msi-octeon.c +++ b/arch/mips/pci/msi-octeon.c @@ -369,7 +369,9 @@ int __init octeon_msi_initialize(void) int irq; struct irq_chip *msi;
- if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_PCIE) { + if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_INVALID) { + return 0; + } else if (octeon_dma_bar_type == OCTEON_DMA_BAR_TYPE_PCIE) { msi_rcv_reg[0] = CVMX_PEXP_NPEI_MSI_RCV0; msi_rcv_reg[1] = CVMX_PEXP_NPEI_MSI_RCV1; msi_rcv_reg[2] = CVMX_PEXP_NPEI_MSI_RCV2;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlad Tsyrklevich vlad@tsyrklevich.net
commit a01421e4484327fe44f8e126793ed5a48a221e24 upstream.
Using [1] for static analysis I found that the OMAPFB_QUERY_PLANE, OMAPFB_GET_COLOR_KEY, OMAPFB_GET_DISPLAY_INFO, and OMAPFB_GET_VRAM_INFO cases could all leak uninitialized stack memory--either due to uninitialized padding or 'reserved' fields.
Fix them by clearing the shared union used to store copied out data.
[1] https://github.com/vlad902/kernel-uninitialized-memory-checker
Signed-off-by: Vlad Tsyrklevich vlad@tsyrklevich.net Reviewed-by: Kees Cook keescook@chromium.org Fixes: b39a982ddecf ("OMAP: DSS2: omapfb driver") Cc: security@kernel.org [b.zolnierkie: prefix patch subject with "omap2fb: "] Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c +++ b/drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c @@ -606,6 +606,8 @@ int omapfb_ioctl(struct fb_info *fbi, un
int r = 0;
+ memset(&p, 0, sizeof(p)); + switch (cmd) { case OMAPFB_SYNC_GFX: DBG("ioctl SYNC_GFX\n");
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 701f49bc028edb19ffccd101997dd84f0d71e279 upstream.
kthread_run returns an error pointer, but elsewhere in the code dev->kthread_vid_cap/out is checked against NULL.
If kthread_run returns an error, then set the pointer to NULL.
I chose this method over changing all kthread_vid_cap/out tests elsewhere since this is more robust.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Reported-by: syzbot+53d5b2df0d9744411e2e@syzkaller.appspotmail.com Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/platform/vivid/vivid-kthread-cap.c | 5 ++++- drivers/media/platform/vivid/vivid-kthread-out.c | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/media/platform/vivid/vivid-kthread-cap.c +++ b/drivers/media/platform/vivid/vivid-kthread-cap.c @@ -829,8 +829,11 @@ int vivid_start_generating_vid_cap(struc "%s-vid-cap", dev->v4l2_dev.name);
if (IS_ERR(dev->kthread_vid_cap)) { + int err = PTR_ERR(dev->kthread_vid_cap); + + dev->kthread_vid_cap = NULL; v4l2_err(&dev->v4l2_dev, "kernel_thread() failed\n"); - return PTR_ERR(dev->kthread_vid_cap); + return err; } *pstreaming = true; vivid_grab_controls(dev, true); --- a/drivers/media/platform/vivid/vivid-kthread-out.c +++ b/drivers/media/platform/vivid/vivid-kthread-out.c @@ -248,8 +248,11 @@ int vivid_start_generating_vid_out(struc "%s-vid-out", dev->v4l2_dev.name);
if (IS_ERR(dev->kthread_vid_out)) { + int err = PTR_ERR(dev->kthread_vid_out); + + dev->kthread_vid_out = NULL; v4l2_err(&dev->v4l2_dev, "kernel_thread() failed\n"); - return PTR_ERR(dev->kthread_vid_out); + return err; } *pstreaming = true; vivid_grab_controls(dev, true);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil-cisco@xs4all.nl
commit 9729d6d282a6d7ce88e64c9119cecdf79edf4e88 upstream.
The capture DV timings capabilities allowed for a minimum width and height of 0. So passing a timings struct with 0 values is allowed and will later cause a division by zero.
Ensure that the width and height must be >= 16 to avoid this.
Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Reported-by: syzbot+57c3d83d71187054d56f@syzkaller.appspotmail.com Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/platform/vivid/vivid-vid-common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/media/platform/vivid/vivid-vid-common.c +++ b/drivers/media/platform/vivid/vivid-vid-common.c @@ -33,7 +33,7 @@ const struct v4l2_dv_timings_cap vivid_d .type = V4L2_DV_BT_656_1120, /* keep this initialization for compatibility with GCC < 4.4.6 */ .reserved = { 0 }, - V4L2_INIT_BT_TIMINGS(0, MAX_WIDTH, 0, MAX_HEIGHT, 25000000, 600000000, + V4L2_INIT_BT_TIMINGS(16, MAX_WIDTH, 16, MAX_HEIGHT, 25000000, 600000000, V4L2_DV_BT_STD_CEA861 | V4L2_DV_BT_STD_DMT, V4L2_DV_BT_CAP_PROGRESSIVE | V4L2_DV_BT_CAP_INTERLACED) };
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans Verkuil hverkuil@xs4all.nl
commit cd26d1c4d1bc947b56ae404998ae2276df7b39b7 upstream.
If a filehandle is dup()ped, then it is possible to close it from one fd and call mmap from the other. This creates a race condition in vb2_mmap where it is using queue data that __vb2_queue_free (called from close()) is in the process of releasing.
By moving up the mutex_lock(mmap_lock) in vb2_mmap this race is avoided since __vb2_queue_free is called with the same mutex locked. So vb2_mmap now reads consistent buffer data.
Signed-off-by: Hans Verkuil hverkuil@xs4all.nl Reported-by: syzbot+be93025dd45dccd8923c@syzkaller.appspotmail.com Signed-off-by: Hans Verkuil hansverk@cisco.com Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/v4l2-core/videobuf2-core.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/media/v4l2-core/videobuf2-core.c +++ b/drivers/media/v4l2-core/videobuf2-core.c @@ -2474,9 +2474,13 @@ int vb2_mmap(struct vb2_queue *q, struct return -EINVAL; } } + + mutex_lock(&q->mmap_lock); + if (vb2_fileio_is_active(q)) { dprintk(1, "mmap: file io in progress\n"); - return -EBUSY; + ret = -EBUSY; + goto unlock; }
/* @@ -2484,7 +2488,7 @@ int vb2_mmap(struct vb2_queue *q, struct */ ret = __find_plane_by_offset(q, off, &buffer, &plane); if (ret) - return ret; + goto unlock;
vb = q->bufs[buffer];
@@ -2500,8 +2504,9 @@ int vb2_mmap(struct vb2_queue *q, struct return -EINVAL; }
- mutex_lock(&q->mmap_lock); ret = call_memop(vb, mmap, vb->planes[plane].mem_priv, vma); + +unlock: mutex_unlock(&q->mmap_lock); if (ret) return ret;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: J. Bruce Fields bfields@redhat.com
commit 81c88b18de1f11f70c97f28ced8d642c00bb3955 upstream.
If we ignore the error we'll hit a null dereference a little later.
Reported-by: syzbot+4b98281f2401ab849f4b@syzkaller.appspotmail.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/rpcb_clnt.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/net/sunrpc/rpcb_clnt.c +++ b/net/sunrpc/rpcb_clnt.c @@ -772,6 +772,12 @@ void rpcb_getport_async(struct rpc_task case RPCBVERS_3: map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID]; map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC); + if (!map->r_addr) { + status = -ENOMEM; + dprintk("RPC: %5u %s: no memory available\n", + task->tk_pid, __func__); + goto bailout_free_args; + } map->r_owner = ""; break; case RPCBVERS_2: @@ -794,6 +800,8 @@ void rpcb_getport_async(struct rpc_task rpc_put_task(child); return;
+bailout_free_args: + kfree(map); bailout_release_client: rpc_release_client(rpcb_clnt); bailout_nofree:
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Smalley sds@tycho.nsa.gov
commit 5b0e7310a2a33c06edc7eb81ffc521af9b2c5610 upstream.
levdatum->level can be NULL if we encounter an error while loading the policy during sens_read prior to initializing it. Make sure sens_destroy handles that case correctly.
Reported-by: syzbot+6664500f0f18f07a5c0e@syzkaller.appspotmail.com Signed-off-by: Stephen Smalley sds@tycho.nsa.gov Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- security/selinux/ss/policydb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -717,7 +717,8 @@ static int sens_destroy(void *key, void kfree(key); if (datum) { levdatum = datum; - ebitmap_destroy(&levdatum->level->cat); + if (levdatum->level) + ebitmap_destroy(&levdatum->level->cat); kfree(levdatum->level); } kfree(datum);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xin Long lucien.xin@gmail.com
commit 400b8b9a2a17918f8ce00786f596f530e7f30d50 upstream.
The similar issue as fixed in Commit 4a2eb0c37b47 ("sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event") also exists in sctp_inetaddr_event, as Alexander noticed.
To fix it, allocate sctp_sockaddr_entry with kzalloc for both sctp ipv4 and ipv6 addresses, as does in sctp_v4/6_copy_addrlist().
Reported-by: Alexander Potapenko glider@google.com Signed-off-by: Xin Long lucien.xin@gmail.com Reported-by: syzbot+ae0c70c0c2d40c51bb92@syzkaller.appspotmail.com Acked-by: Marcelo Ricardo Leitner marcelo.leitner@gmail.com Acked-by: Neil Horman nhorman@tuxdriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sctp/ipv6.c | 5 +---- net/sctp/protocol.c | 4 +--- 2 files changed, 2 insertions(+), 7 deletions(-)
--- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -97,11 +97,9 @@ static int sctp_inet6addr_event(struct n
switch (ev) { case NETDEV_UP: - addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC); + addr = kzalloc(sizeof(*addr), GFP_ATOMIC); if (addr) { addr->a.v6.sin6_family = AF_INET6; - addr->a.v6.sin6_port = 0; - addr->a.v6.sin6_flowinfo = 0; addr->a.v6.sin6_addr = ifa->addr; addr->a.v6.sin6_scope_id = ifa->idev->dev->ifindex; addr->valid = 1; @@ -411,7 +409,6 @@ static void sctp_v6_copy_addrlist(struct addr = kzalloc(sizeof(*addr), GFP_ATOMIC); if (addr) { addr->a.v6.sin6_family = AF_INET6; - addr->a.v6.sin6_port = 0; addr->a.v6.sin6_addr = ifp->addr; addr->a.v6.sin6_scope_id = dev->ifindex; addr->valid = 1; --- a/net/sctp/protocol.c +++ b/net/sctp/protocol.c @@ -149,7 +149,6 @@ static void sctp_v4_copy_addrlist(struct addr = kzalloc(sizeof(*addr), GFP_ATOMIC); if (addr) { addr->a.v4.sin_family = AF_INET; - addr->a.v4.sin_port = 0; addr->a.v4.sin_addr.s_addr = ifa->ifa_local; addr->valid = 1; INIT_LIST_HEAD(&addr->list); @@ -755,10 +754,9 @@ static int sctp_inetaddr_event(struct no
switch (ev) { case NETDEV_UP: - addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC); + addr = kzalloc(sizeof(*addr), GFP_ATOMIC); if (addr) { addr->a.v4.sin_family = AF_INET; - addr->a.v4.sin_port = 0; addr->a.v4.sin_addr.s_addr = ifa->ifa_local; addr->valid = 1; spin_lock_bh(&net->sctp.local_addr_lock);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 310ca162d779efee8a2dc3731439680f3e9c1e86 upstream.
syzbot is reporting NULL pointer dereference [1] which is caused by race condition between ioctl(loop_fd, LOOP_CLR_FD, 0) versus ioctl(other_loop_fd, LOOP_SET_FD, loop_fd) due to traversing other loop devices at loop_validate_file() without holding corresponding lo->lo_ctl_mutex locks.
Since ioctl() request on loop devices is not frequent operation, we don't need fine grained locking. Let's use global lock in order to allow safe traversal at loop_validate_file().
Note that syzbot is also reporting circular locking dependency between bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling blkdev_reread_part() with lock held. This patch does not address it.
[1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a... [2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588...
Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Reported-by: syzbot syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/loop.c | 47 ++++++++++++++++++++++++----------------------- drivers/block/loop.h | 1 - 2 files changed, 24 insertions(+), 24 deletions(-)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -81,6 +81,7 @@
static DEFINE_IDR(loop_index_idr); static DEFINE_MUTEX(loop_index_mutex); +static DEFINE_MUTEX(loop_ctl_mutex);
static int max_part; static int part_shift; @@ -1012,7 +1013,7 @@ static int loop_clr_fd(struct loop_devic */ if (lo->lo_refcnt > 1) { lo->lo_flags |= LO_FLAGS_AUTOCLEAR; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); return 0; }
@@ -1061,12 +1062,12 @@ static int loop_clr_fd(struct loop_devic lo->lo_flags = 0; if (!part_shift) lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); /* - * Need not hold lo_ctl_mutex to fput backing file. - * Calling fput holding lo_ctl_mutex triggers a circular + * Need not hold loop_ctl_mutex to fput backing file. + * Calling fput holding loop_ctl_mutex triggers a circular * lock dependency possibility warning as fput can take - * bd_mutex which is usually taken before lo_ctl_mutex. + * bd_mutex which is usually taken before loop_ctl_mutex. */ fput(filp); return 0; @@ -1300,7 +1301,7 @@ static int lo_ioctl(struct block_device struct loop_device *lo = bdev->bd_disk->private_data; int err;
- mutex_lock_nested(&lo->lo_ctl_mutex, 1); + mutex_lock_nested(&loop_ctl_mutex, 1); switch (cmd) { case LOOP_SET_FD: err = loop_set_fd(lo, mode, bdev, arg); @@ -1309,7 +1310,7 @@ static int lo_ioctl(struct block_device err = loop_change_fd(lo, bdev, arg); break; case LOOP_CLR_FD: - /* loop_clr_fd would have unlocked lo_ctl_mutex on success */ + /* loop_clr_fd would have unlocked loop_ctl_mutex on success */ err = loop_clr_fd(lo); if (!err) goto out_unlocked; @@ -1340,7 +1341,7 @@ static int lo_ioctl(struct block_device default: err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL; } - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex);
out_unlocked: return err; @@ -1473,16 +1474,16 @@ static int lo_compat_ioctl(struct block_
switch(cmd) { case LOOP_SET_STATUS: - mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex); err = loop_set_status_compat( lo, (const struct compat_loop_info __user *) arg); - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); break; case LOOP_GET_STATUS: - mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex); err = loop_get_status_compat( lo, (struct compat_loop_info __user *) arg); - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); break; case LOOP_SET_CAPACITY: case LOOP_CLR_FD: @@ -1513,9 +1514,9 @@ static int lo_open(struct block_device * goto out; }
- mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex); lo->lo_refcnt++; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); out: mutex_unlock(&loop_index_mutex); return err; @@ -1525,7 +1526,7 @@ static void __lo_release(struct loop_dev { int err;
- mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex);
if (--lo->lo_refcnt) goto out; @@ -1547,7 +1548,7 @@ static void __lo_release(struct loop_dev }
out: - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); }
static void lo_release(struct gendisk *disk, fmode_t mode) @@ -1593,10 +1594,10 @@ static int unregister_transfer_cb(int id struct loop_device *lo = ptr; struct loop_func_table *xfer = data;
- mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex); if (lo->lo_encryption == xfer) loop_release_xfer(lo); - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); return 0; }
@@ -1677,7 +1678,7 @@ static int loop_add(struct loop_device * if (!part_shift) disk->flags |= GENHD_FL_NO_PART_SCAN; disk->flags |= GENHD_FL_EXT_DEVT; - mutex_init(&lo->lo_ctl_mutex); + mutex_init(&loop_ctl_mutex); lo->lo_number = i; lo->lo_thread = NULL; init_waitqueue_head(&lo->lo_event); @@ -1789,19 +1790,19 @@ static long loop_control_ioctl(struct fi ret = loop_lookup(&lo, parm); if (ret < 0) break; - mutex_lock(&lo->lo_ctl_mutex); + mutex_lock(&loop_ctl_mutex); if (lo->lo_state != Lo_unbound) { ret = -EBUSY; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); break; } if (lo->lo_refcnt > 0) { ret = -EBUSY; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); break; } lo->lo_disk->private_data = NULL; - mutex_unlock(&lo->lo_ctl_mutex); + mutex_unlock(&loop_ctl_mutex); idr_remove(&loop_index_idr, lo->lo_number); loop_remove(lo); break; --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -55,7 +55,6 @@ struct loop_device { struct bio_list lo_bio_list; unsigned int lo_bio_count; int lo_state; - struct mutex lo_ctl_mutex; struct task_struct *lo_thread; wait_queue_head_t lo_event; /* wait queue for incoming requests */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Mironov mironov.ivan@gmail.com
commit 66a8d5bfb518f9f12d47e1d2dce1732279f9451e upstream.
Strict requirement of pixclock to be zero breaks support of SDL 1.2 which contains hardcoded table of supported video modes with non-zero pixclock values[1].
To better understand which pixclock values are considered valid and how driver should handle these values, I briefly examined few existing fbdev drivers and documentation in Documentation/fb/. And it looks like there are no strict rules on that and actual behaviour varies:
* some drivers treat (pixclock == 0) as "use defaults" (uvesafb.c); * some treat (pixclock == 0) as invalid value which leads to -EINVAL (clps711x-fb.c); * some pass converted pixclock value to hardware (uvesafb.c); * some are trying to find nearest value from predefined table (vga16fb.c, video_gx.c).
Given this, I believe that it should be safe to just ignore this value if changing is not supported. It seems that any portable fbdev application which was not written only for one specific device working under one specific kernel version should not rely on any particular behaviour of pixclock anyway.
However, while enabling SDL1 applications to work out of the box when there is no /etc/fb.modes with valid settings, this change affects the video mode choosing logic in SDL. Depending on current screen resolution, contents of /etc/fb.modes and resolution requested by application, this may lead to user-visible difference (not always): image will be displayed in a right way, but it will be aligned to the left instead of center. There is no "right behaviour" here as well, as emulated fbdev, opposing to old fbdev drivers, simply ignores any requsts of video mode changes with resolutions smaller than current.
The easiest way to reproduce this problem is to install sdl-sopwith[2], remove /etc/fb.modes file if it exists, and then try to run sopwith from console without X. At least in Fedora 29, sopwith may be simply installed from standard repositories.
[1] SDL 1.2.15 source code, src/video/fbcon/SDL_fbvideo.c, vesa_timings [2] http://sdl-sopwith.sourceforge.net/
Signed-off-by: Ivan Mironov mironov.ivan@gmail.com Cc: stable@vger.kernel.org Fixes: 79e539453b34e ("DRM: i915: add mode setting support") Fixes: 771fe6b912fca ("drm/radeon: introduce kernel modesetting for radeon hardware") Fixes: 785b93ef8c309 ("drm/kms: move driver specific fb common code to helper functions (v2)") Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20190108072353.28078-3-mironov... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/drm_fb_helper.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -822,9 +822,14 @@ int drm_fb_helper_check_var(struct fb_va struct drm_framebuffer *fb = fb_helper->fb; int depth;
- if (var->pixclock != 0 || in_dbg_master()) + if (in_dbg_master()) return -EINVAL;
+ if (var->pixclock != 0) { + DRM_DEBUG("fbdev emulation doesn't support changing the pixel clock, value of pixclock is ignored\n"); + var->pixclock = 0; + } + /* Need to resize the fb object !!! */ if (var->bits_per_pixel > fb->bits_per_pixel || var->xres > fb->width || var->yres > fb->height ||
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mauro Carvalho Chehab mchehab+samsung@kernel.org
commit c06ef2e9acef4cda1feee2ce055b8086e33d251a upstream.
As reported by smatch: drivers/media/common/videobuf2/videobuf2-core.c: drivers/media/common/videobuf2/videobuf2-core.c:2159 vb2_mmap() warn: inconsistent returns 'mutex:&q->mmap_lock'. Locked on: line 2148 Unlocked on: line 2100 line 2108 line 2113 line 2118 line 2156 line 2159
There is one error condition that doesn't unlock a mutex.
Fixes: cd26d1c4d1bc ("media: vb2: vb2_mmap: move lock up") Reviewed-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/media/v4l2-core/videobuf2-core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/media/v4l2-core/videobuf2-core.c +++ b/drivers/media/v4l2-core/videobuf2-core.c @@ -2501,7 +2501,8 @@ int vb2_mmap(struct vb2_queue *q, struct if (length < (vma->vm_end - vma->vm_start)) { dprintk(1, "MMAP invalid, as it would overflow buffer length\n"); - return -EINVAL; + ret = -EINVAL; + goto unlock; }
ret = call_memop(vb, mmap, vb->planes[plane].mem_priv, vma);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai-Heng Feng kai.heng.feng@canonical.com
[ Upstream commit 36352991835ce99e46b4441dd0eb6980f9a83e8f ]
There are two new Realtek Ethernet devices which are re-branded r8168h. Add the IDs to to support them.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Reviewed-by: Heiner Kallweit hkallweit1@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/realtek/r8169.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -324,6 +324,8 @@ enum cfg_version { };
static const struct pci_device_id rtl8169_pci_tbl[] = { + { PCI_VDEVICE(REALTEK, 0x2502), RTL_CFG_1 }, + { PCI_VDEVICE(REALTEK, 0x2600), RTL_CFG_1 }, { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8129), 0, 0, RTL_CFG_0 }, { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8136), 0, 0, RTL_CFG_2 }, { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8161), 0, 0, RTL_CFG_1 },
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit e4849aff1e169b86c561738daf8ff020e9de1011 ]
The Broadcom SiByte BCM1250, BCM1125, and BCM1125H SOCs have an onchip DRAM controller that supports memory amounts of up to 16GiB, and due to how the address decoder has been wired in the SOC any memory beyond 1GiB is actually mapped starting from 4GiB physical up, that is beyond the 32-bit addressable limit[1]. Consequently if the maximum amount of memory has been installed, then it will span up to 19GiB.
Many of the evaluation boards we support that are based on one of these SOCs have their memory soldered and the amount present fits in the 32-bit address range. The BCM91250A SWARM board however has actual DIMM slots and accepts, depending on the peripherals revision of the SOC, up to 4GiB or 8GiB of memory in commercially available JEDEC modules[2]. I believe this is also the case with the BCM91250C2 LittleSur board. This means that up to either 3GiB or 7GiB of memory requires 64-bit addressing to access.
I believe the BCM91480B BigSur board, which has the BCM1480 SOC instead, accepts at least as much memory, although I have no documentation or actual hardware available to verify that.
Both systems have PCI slots installed for use by any PCI option boards, including ones that only support 32-bit addressing (additionally the 32-bit PCI host bridge of the BCM1250, BCM1125, and BCM1125H SOCs limits addressing to 32-bits), and there is no IOMMU available. Therefore for PCI DMA to work in the presence of memory beyond enable swiotlb for the affected systems.
All the other SOC onchip DMA devices use 40-bit addressing and therefore can address the whole memory, so only enable swiotlb if PCI support and support for DMA beyond 4GiB have been both enabled in the configuration of the kernel.
This shows up as follows:
Broadcom SiByte BCM1250 B2 @ 800 MHz (SB1 rev 2) Board type: SiByte BCM91250A (SWARM) Determined physical RAM map: memory: 000000000fe7fe00 @ 0000000000000000 (usable) memory: 000000001ffffe00 @ 0000000080000000 (usable) memory: 000000000ffffe00 @ 00000000c0000000 (usable) memory: 0000000087fffe00 @ 0000000100000000 (usable) software IO TLB: mapped [mem 0xcbffc000-0xcfffc000] (64MB)
in the bootstrap log and removes failures like these:
defxx 0000:02:00.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi0: Receive buffer allocation failed fddi0: Adapter open failed! IP-Config: Failed to open fddi0 defxx 0000:09:08.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi1: Receive buffer allocation failed fddi1: Adapter open failed! IP-Config: Failed to open fddi1
when memory beyond 4GiB is handed out to devices that can only do 32-bit addressing.
This updates commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.").
References:
[1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R, Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview", "Memory Map", pp. 34-38
[2] "BCM91250A User Manual", Revision 91250A-UM100-R, Broadcom Corporation, 18 May 2004, Section 3: "Physical Description", "Supported DRAM", p. 23
Signed-off-by: Maciej W. Rozycki macro@linux-mips.org [paul.burton@mips.com: Remove GPL text from dma.c; SPDX tag covers it] Signed-off-by: Paul Burton paul.burton@mips.com Reviewed-by: Christoph Hellwig hch@lst.de Patchwork: https://patchwork.linux-mips.org/patch/21108/ References: cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") Cc: Ralf Baechle ralf@linux-mips.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/Kconfig | 3 +++ arch/mips/sibyte/common/Makefile | 1 + arch/mips/sibyte/common/dma.c | 14 ++++++++++++++ 3 files changed, 18 insertions(+) create mode 100644 arch/mips/sibyte/common/dma.c
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 9536ef912f59..077ae203e8f9 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -627,6 +627,7 @@ config SIBYTE_SWARM select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI
config SIBYTE_LITTLESUR bool "Sibyte BCM91250C2-LittleSur" @@ -649,6 +650,7 @@ config SIBYTE_SENTOSA select SYS_HAS_CPU_SB1 select SYS_SUPPORTS_BIG_ENDIAN select SYS_SUPPORTS_LITTLE_ENDIAN + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI
config SIBYTE_BIGSUR bool "Sibyte BCM91480B-BigSur" @@ -662,6 +664,7 @@ config SIBYTE_BIGSUR select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI
config SNI_RM bool "SNI RM200/300/400" diff --git a/arch/mips/sibyte/common/Makefile b/arch/mips/sibyte/common/Makefile index b3d6bf23a662..3ef3fb658136 100644 --- a/arch/mips/sibyte/common/Makefile +++ b/arch/mips/sibyte/common/Makefile @@ -1,4 +1,5 @@ obj-y := cfe.o +obj-$(CONFIG_SWIOTLB) += dma.o obj-$(CONFIG_SIBYTE_BUS_WATCHER) += bus_watcher.o obj-$(CONFIG_SIBYTE_CFE_CONSOLE) += cfe_console.o obj-$(CONFIG_SIBYTE_TBPROF) += sb_tbprof.o diff --git a/arch/mips/sibyte/common/dma.c b/arch/mips/sibyte/common/dma.c new file mode 100644 index 000000000000..eb47a94f3583 --- /dev/null +++ b/arch/mips/sibyte/common/dma.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * DMA support for Broadcom SiByte platforms. + * + * Copyright (c) 2018 Maciej W. Rozycki + */ + +#include <linux/swiotlb.h> +#include <asm/bootinfo.h> + +void __init plat_swiotlb_setup(void) +{ + swiotlb_init(1); +}
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ]
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer:
int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ...
The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled.
[ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58
Signed-off-by: Daniel Santos daniel.santos@pobox.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jffs2/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c index f3194e5147fd..0bbc31d10857 100644 --- a/fs/jffs2/super.c +++ b/fs/jffs2/super.c @@ -101,7 +101,8 @@ static int jffs2_sync_fs(struct super_block *sb, int wait) struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);
#ifdef CONFIG_JFFS2_FS_WRITEBUFFER - cancel_delayed_work_sync(&c->wbuf_dwork); + if (jffs2_is_writebuffered(c)) + cancel_delayed_work_sync(&c->wbuf_dwork); #endif
mutex_lock(&c->alloc_sem);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ]
The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it is does not seem like the right thing to do and could lead to future bugs so lets avoid that. It also prevents misleading prints in the logs which claim the buffer is valid.
I got something like:
found existing buffer, size 0, start 0
When I was expecting:
no valid data in buffer (sig = ...)
This bails out early (and reports with pr_debug()), since it's an acceptable state.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org Co-developed-by: Kees Cook keescook@chromium.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pstore/ram_core.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index 288ce45f613e..5ffbf60f5c88 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -484,6 +484,11 @@ static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 sig, sig ^= PERSISTENT_RAM_SIG;
if (prz->buffer->sig == sig) { + if (buffer_size(prz) == 0) { + pr_debug("found existing empty buffer\n"); + return 0; + } + if (buffer_size(prz) > prz->buffer_size || buffer_start(prz) > buffer_size(prz)) pr_info("found existing invalid buffer, size %zu, start %zu\n",
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]
When booting a pseries kernel with PREEMPT enabled, it dumps the following warning:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable) [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160 [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300 [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500 [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160 [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c
This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled().
Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used.
In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration).
That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca().
Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index 6f7b01956885..b755d3a54e3c 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -237,7 +237,13 @@ static int pseries_idle_probe(void) return -ENODEV;
if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else {
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit b2e9a4eda11fd2cb1e6714e9ad3f455c402568ff ]
Clang warns:
drivers/media/firewire/firedtv-avc.c:999:45: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_APP_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1000:45: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_APP_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1040:44: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_CA_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1041:44: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_CA_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ 4 warnings generated.
Change app_info's type to unsigned char to match the type of the member msg in struct ca_msg, which is the only thing passed into the app_info parameter in this function.
Link: https://github.com/ClangBuiltLinux/linux/issues/105
Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/firewire/firedtv-avc.c | 6 ++++-- drivers/media/firewire/firedtv.h | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/media/firewire/firedtv-avc.c b/drivers/media/firewire/firedtv-avc.c index 251a556112a9..280b5ffea592 100644 --- a/drivers/media/firewire/firedtv-avc.c +++ b/drivers/media/firewire/firedtv-avc.c @@ -968,7 +968,8 @@ static int get_ca_object_length(struct avc_response_frame *r) return r->operand[7]; }
-int avc_ca_app_info(struct firedtv *fdtv, char *app_info, unsigned int *len) +int avc_ca_app_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len) { struct avc_command_frame *c = (void *)fdtv->avc_data; struct avc_response_frame *r = (void *)fdtv->avc_data; @@ -1009,7 +1010,8 @@ out: return ret; }
-int avc_ca_info(struct firedtv *fdtv, char *app_info, unsigned int *len) +int avc_ca_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len) { struct avc_command_frame *c = (void *)fdtv->avc_data; struct avc_response_frame *r = (void *)fdtv->avc_data; diff --git a/drivers/media/firewire/firedtv.h b/drivers/media/firewire/firedtv.h index c2ba085e0d20..54853159482e 100644 --- a/drivers/media/firewire/firedtv.h +++ b/drivers/media/firewire/firedtv.h @@ -124,8 +124,10 @@ int avc_lnb_control(struct firedtv *fdtv, char voltage, char burst, struct dvb_diseqc_master_cmd *diseqcmd); void avc_remote_ctrl_work(struct work_struct *work); int avc_register_remote_control(struct firedtv *fdtv); -int avc_ca_app_info(struct firedtv *fdtv, char *app_info, unsigned int *len); -int avc_ca_info(struct firedtv *fdtv, char *app_info, unsigned int *len); +int avc_ca_app_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len); +int avc_ca_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len); int avc_ca_reset(struct firedtv *fdtv); int avc_ca_pmt(struct firedtv *fdtv, char *app_info, int length); int avc_ca_get_time_date(struct firedtv *fdtv, int *interval);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 0fbe82e628c817e292ff588cd5847fc935e025f2 ]
after set SO_DONTROUTE to 1, the IP layer should not route packets if the dest IP address is not in link scope. But if the socket has cached the dst_entry, such packets would be routed until the sk_dst_cache expires. So we should clean the sk_dst_cache when a user set SO_DONTROUTE option. Below are server/client python scripts which could reprodue this issue:
server side code:
========================================================================== import socket import struct import time
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(('0.0.0.0', 9000)) s.listen(1) sock, addr = s.accept() sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1)) while True: sock.send(b'foo') time.sleep(1) ==========================================================================
client side code: ========================================================================== import socket import time
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('server_address', 9000)) while True: data = s.recv(1024) print(data) ==========================================================================
Signed-off-by: yupeng yupeng0921@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/core/sock.c b/net/core/sock.c index 91ee0eb70fcf..039f101e7807 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -720,6 +720,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; case SO_DONTROUTE: sock_valbool_flag(sk, SOCK_LOCALROUTE, valbool); + sk_dst_reset(sk); break; case SO_BROADCAST: sock_valbool_flag(sk, SOCK_BROADCAST, valbool);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 0de263577de5d5e052be5f4f93334e63cc8a7f0b ]
spc5r17.pdf specifies:
4.3.1 ASCII data field requirements ASCII data fields shall contain only ASCII printable characters (i.e., code values 20h to 7Eh) and may be terminated with one or more ASCII null (00h) characters. ASCII data fields described as being left-aligned shall have any unused bytes at the end of the field (i.e., highest offset) and the unused bytes shall be filled with ASCII space characters (20h).
LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR IDENTIFICATION field in the INQUIRY Device Identification VPD Page are zero-terminated/zero-padded.
Fix this inconsistency by using space-padding for all of the above fields.
Signed-off-by: David Disseldorp ddiss@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Bryant G. Ly bly@catalogicsoftware.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Hannes Reinecke hare@suse.com Reviewed-by: Roman Bolshakov r.bolshakov@yadro.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_spc.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c index 614005b6b08b..3ab264d4fb1f 100644 --- a/drivers/target/target_core_spc.c +++ b/drivers/target/target_core_spc.c @@ -112,12 +112,17 @@ spc_emulate_inquiry_std(struct se_cmd *cmd, unsigned char *buf)
buf[7] = 0x2; /* CmdQue=1 */
- memcpy(&buf[8], "LIO-ORG ", 8); - memset(&buf[16], 0x20, 16); + /* + * ASCII data fields described as being left-aligned shall have any + * unused bytes at the end of the field (i.e., highest offset) and the + * unused bytes shall be filled with ASCII space characters (20h). + */ + memset(&buf[8], 0x20, 8 + 16 + 4); + memcpy(&buf[8], "LIO-ORG", sizeof("LIO-ORG") - 1); memcpy(&buf[16], dev->t10_wwn.model, - min_t(size_t, strlen(dev->t10_wwn.model), 16)); + strnlen(dev->t10_wwn.model, 16)); memcpy(&buf[32], dev->t10_wwn.revision, - min_t(size_t, strlen(dev->t10_wwn.revision), 4)); + strnlen(dev->t10_wwn.revision, 4)); buf[4] = 31; /* Set additional length to 31 */
return 0; @@ -257,7 +262,9 @@ check_t10_vend_desc: buf[off] = 0x2; /* ASCII */ buf[off+1] = 0x1; /* T10 Vendor ID */ buf[off+2] = 0x0; - memcpy(&buf[off+4], "LIO-ORG", 8); + /* left align Vendor ID and pad with spaces */ + memset(&buf[off+4], 0x20, 8); + memcpy(&buf[off+4], "LIO-ORG", sizeof("LIO-ORG") - 1); /* Extra Byte for NULL Terminator */ id_len++; /* Identifier Length */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit f7542d817733f461258fd3a47d77da35b2d9fc81 ]
The exclusive gates may be set up in the wrong way by software running before the clock driver comes up. In that case the exclusive setup is locked in its initial state, as the complementary function can't be activated without disabling the initial setup first.
To avoid this lock situation, reset the exclusive gates to the off state and allow the kernel to provide the proper setup.
Signed-off-by: Lucas Stach l.stach@pengutronix.de Reviewed-by: Dong Aisheng Aisheng.dong@nxp.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/mach-imx/clk-imx6q.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-imx/clk-imx6q.c b/arch/arm/mach-imx/clk-imx6q.c index 5474a76803f0..936ed9bae595 100644 --- a/arch/arm/mach-imx/clk-imx6q.c +++ b/arch/arm/mach-imx/clk-imx6q.c @@ -222,8 +222,12 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node) * lvds1_gate and lvds2_gate are pseudo-gates. Both can be * independently configured as clock inputs or outputs. We treat * the "output_enable" bit as a gate, even though it's really just - * enabling clock output. + * enabling clock output. Initially the gate bits are cleared, as + * otherwise the exclusive configuration gets locked in the setup done + * by software running before the clock driver, with no way to change + * it. */ + writel(readl(base + 0x160) & ~0x3c00, base + 0x160); clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate_exclusive("lvds1_gate", "lvds1_sel", base + 0x160, 10, BIT(12)); clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate_exclusive("lvds2_gate", "lvds2_sel", base + 0x160, 11, BIT(13));
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit fbac5977d81cb2b2b7e37b11c459055d9585273c ]
An unterminated string literal followed by new line is passed to the parser (with "multi-line strings not supported" warning shown), then handled properly there.
On the other hand, an unterminated string literal at end of file is never passed to the parser, then results in memory leak.
[Test Code]
----------(Kconfig begin)---------- source "Kconfig.inc"
config A bool "a" -----------(Kconfig end)-----------
--------(Kconfig.inc begin)-------- config B bool "b\No new line at end of file ---------(Kconfig.inc end)---------
[Summary from Valgrind]
Before the fix:
LEAK SUMMARY: definitely lost: 16 bytes in 1 blocks ...
After the fix:
LEAK SUMMARY: definitely lost: 0 bytes in 0 blocks ...
Eliminate the memory leak path by handling this case. Of course, such a Kconfig file is wrong already, so I will add an error message later.
Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/kconfig/zconf.l | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/scripts/kconfig/zconf.l b/scripts/kconfig/zconf.l index 6c62d93b4ffb..485b19a461f8 100644 --- a/scripts/kconfig/zconf.l +++ b/scripts/kconfig/zconf.l @@ -180,6 +180,8 @@ n [A-Za-z0-9_] } <<EOF>> { BEGIN(INITIAL); + yylval.string = text; + return T_WORD_QUOTE; } }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit ae460c115b7aa50c9a36cf78fced07b27962c9d0 ]
On our AT91SAM9260 board we use the same sdio bus for wifi and for the sd card slot. This caused the atmel-mci to give the following splat on the serial console:
------------[ cut here ]------------ WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44 Modules linked in: CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 #14 Hardware name: Atmel AT91SAM9 [<c000fccc>] (unwind_backtrace) from [<c000d3dc>] (show_stack+0x10/0x14) [<c000d3dc>] (show_stack) from [<c0017644>] (__warn+0xd8/0xf4) [<c0017644>] (__warn) from [<c0017704>] (warn_slowpath_null+0x1c/0x24) [<c0017704>] (warn_slowpath_null) from [<c033bb9c>] (atmci_send_command+0x24/0x44) [<c033bb9c>] (atmci_send_command) from [<c033e984>] (atmci_start_request+0x1f4/0x2dc) [<c033e984>] (atmci_start_request) from [<c033f3b4>] (atmci_request+0xf0/0x164) [<c033f3b4>] (atmci_request) from [<c0327108>] (mmc_start_request+0x280/0x2d0) [<c0327108>] (mmc_start_request) from [<c032800c>] (mmc_start_areq+0x230/0x330) [<c032800c>] (mmc_start_areq) from [<c03366f8>] (mmc_blk_issue_rw_rq+0xc4/0x310) [<c03366f8>] (mmc_blk_issue_rw_rq) from [<c03372c4>] (mmc_blk_issue_rq+0x118/0x5ac) [<c03372c4>] (mmc_blk_issue_rq) from [<c033781c>] (mmc_queue_thread+0xc4/0x118) [<c033781c>] (mmc_queue_thread) from [<c002daf8>] (kthread+0x100/0x118) [<c002daf8>] (kthread) from [<c000a580>] (ret_from_fork+0x14/0x34) ---[ end trace 594371ddfa284bd6 ]---
This is: WARN_ON(host->cmd);
This was fixed on our board by letting atmci_request_end determine what state we are in. Instead of unconditionally setting it to STATE_IDLE on STATE_END_REQUEST.
Signed-off-by: Jonas Danielsson jonas@orbital-systems.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/atmel-mci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c index 6423083c5238..bacc0368db6b 100644 --- a/drivers/mmc/host/atmel-mci.c +++ b/drivers/mmc/host/atmel-mci.c @@ -1838,13 +1838,14 @@ static void atmci_tasklet_func(unsigned long priv) }
atmci_request_end(host, host->mrq); - state = STATE_IDLE; + goto unlock; /* atmci_request_end() sets host->state */ break; } } while (state != prev_state);
host->state = state;
+unlock: spin_unlock(&host->lock); }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ]
The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it.
In this specific case this would only happen if fgets() was buggy, as its man page states that it should read one less byte than the size of the destination buffer, so that it can put the nul byte at the end of it, so it would never copy 255 non-nul chars, as fgets reads into the orig buffer at most 254 non-nul chars and terminates it. But lets just switch to strlcpy to keep the original intent and silence the gcc 8.2 warning.
This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
In function 'cpu_model', inlined from 'svg_cpu_box' at util/svghelper.c:378:2: util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation] strncpy(cpu_m, &buf[13], 255); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Arjan van de Ven arjan@linux.intel.com Fixes: f48d55ce7871 ("perf: Add a SVG helper library file") Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/svghelper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c index 283d3e73e2f2..48a841462536 100644 --- a/tools/perf/util/svghelper.c +++ b/tools/perf/util/svghelper.c @@ -333,7 +333,7 @@ static char *cpu_model(void) if (file) { while (fgets(buf, 255, file)) { if (strstr(buf, "model name")) { - strncpy(cpu_m, &buf[13], 255); + strlcpy(cpu_m, &buf[13], 255); break; } }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ]
The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it.
This fixes this warning on an Alpine Linux Edge system with gcc 8.2:
util/parse-events.c: In function 'print_symbol_events': util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2508:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2511:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors
Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Fixes: 947b4ad1d198 ("perf list: Fix max event string size") Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index c659a3ca1283..78b452421375 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -1319,7 +1319,7 @@ static void print_symbol_events(const char *event_glob, unsigned type, if (strlen(syms->alias)) snprintf(name, MAX_NAME_LEN, "%s OR %s", syms->symbol, syms->alias); else - strncpy(name, syms->symbol, MAX_NAME_LEN); + strlcpy(name, syms->symbol, MAX_NAME_LEN);
printf(" %-50s [%s]\n", name, event_type_descriptors[type]);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ]
When using kcopyd to run callbacks through dm_kcopyd_do_callback() or submitting copy jobs with a source size of 0, the jobs are pushed directly to the complete_jobs list, which could be under processing by the kcopyd thread. As a result, the kcopyd thread can continue running completed jobs indefinitely, without releasing the CPU, as long as someone keeps submitting new completed jobs through the aforementioned paths. Processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is thus stalled for excessive amounts of time, hurting performance.
Running the following test, from the device mapper test suite [1],
dmtest run --suite snapshot -n parallel_io_to_many_snaps_N
, with 8 active snapshots, we get, in dmesg, messages like the following:
[68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s! [68899.949282] Showing busy workqueues and worker pools: [68899.949288] workqueue events: flags=0x0 [68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949306] pending: vmstat_shepherd, cache_reap [68899.949331] workqueue mm_percpu_wq: flags=0x8 [68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949345] pending: vmstat_update [68899.949387] workqueue dm_bufio_cache: flags=0x8 [68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [68899.949400] pending: work_fn [dm_bufio] [68899.949423] workqueue kcopyd: flags=0x8 [68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949437] pending: do_work [dm_mod] [68899.949452] workqueue kcopyd: flags=0x8 [68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949466] in-flight: 13:do_work [dm_mod] [68899.949474] pending: do_work [dm_mod] [68899.949487] workqueue kcopyd: flags=0x8 [68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949501] pending: do_work [dm_mod] [68899.949515] workqueue kcopyd: flags=0x8 [68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949529] pending: do_work [dm_mod] [68899.949541] workqueue kcopyd: flags=0x8 [68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949555] pending: do_work [dm_mod] [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084
Fix this by splitting the complete_jobs list into two parts: A user facing part, named callback_jobs, and one used internally by kcopyd, retaining the name complete_jobs. dm_kcopyd_do_callback() and dispatch_job() now push their jobs to the callback_jobs list, which is spliced to the complete_jobs list once, every time the kcopyd thread wakes up. This prevents kcopyd from hogging the CPU indefinitely and causing workqueue stalls.
Re-running the aforementioned test:
* Workqueue stalls are eliminated * The maximum writing time among all targets is reduced from 09m37.10s to 06m04.85s and the total run time of the test is reduced from 10m43.591s to 7m19.199s
[1] https://github.com/jthornber/device-mapper-test-suite
Signed-off-by: Nikos Tsironis ntsironis@arrikto.com Signed-off-by: Ilias Tsitsimpis iliastsi@arrikto.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-kcopyd.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c index 77833b65e01e..9e626e5df2f6 100644 --- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -55,15 +55,17 @@ struct dm_kcopyd_client { struct dm_kcopyd_throttle *throttle;
/* - * We maintain three lists of jobs: + * We maintain four lists of jobs: * * i) jobs waiting for pages * ii) jobs that have pages, and are waiting for the io to be issued. - * iii) jobs that have completed. + * iii) jobs that don't need to do any IO and just run a callback + * iv) jobs that have completed. * - * All three of these are protected by job_lock. + * All four of these are protected by job_lock. */ spinlock_t job_lock; + struct list_head callback_jobs; struct list_head complete_jobs; struct list_head io_jobs; struct list_head pages_jobs; @@ -583,6 +585,7 @@ static void do_work(struct work_struct *work) struct dm_kcopyd_client *kc = container_of(work, struct dm_kcopyd_client, kcopyd_work); struct blk_plug plug; + unsigned long flags;
/* * The order that these are called is *very* important. @@ -591,6 +594,10 @@ static void do_work(struct work_struct *work) * list. io jobs call wake when they complete and it all * starts again. */ + spin_lock_irqsave(&kc->job_lock, flags); + list_splice_tail_init(&kc->callback_jobs, &kc->complete_jobs); + spin_unlock_irqrestore(&kc->job_lock, flags); + blk_start_plug(&plug); process_jobs(&kc->complete_jobs, kc, run_complete_job); process_jobs(&kc->pages_jobs, kc, run_pages_job); @@ -608,7 +615,7 @@ static void dispatch_job(struct kcopyd_job *job) struct dm_kcopyd_client *kc = job->kc; atomic_inc(&kc->nr_jobs); if (unlikely(!job->source.count)) - push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); else if (job->pages == &zero_page_list) push(&kc->io_jobs, job); else @@ -795,7 +802,7 @@ void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err) job->read_err = read_err; job->write_err = write_err;
- push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); wake(kc); } EXPORT_SYMBOL(dm_kcopyd_do_callback); @@ -825,6 +832,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro return ERR_PTR(-ENOMEM);
spin_lock_init(&kc->job_lock); + INIT_LIST_HEAD(&kc->callback_jobs); INIT_LIST_HEAD(&kc->complete_jobs); INIT_LIST_HEAD(&kc->io_jobs); INIT_LIST_HEAD(&kc->pages_jobs); @@ -874,6 +882,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) /* Wait for completion of all jobs submitted by this client. */ wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));
+ BUG_ON(!list_empty(&kc->callback_jobs)); BUG_ON(!list_empty(&kc->complete_jobs)); BUG_ON(!list_empty(&kc->io_jobs)); BUG_ON(!list_empty(&kc->pages_jobs));
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ]
kcopyd has no upper limit to the number of jobs one can allocate and issue. Under certain workloads this can lead to excessive memory usage and workqueue stalls. For example, when creating multiple dm-snapshot targets with a 4K chunk size and then writing to the origin through the page cache. Syncing the page cache causes a large number of BIOs to be issued to the dm-snapshot origin target, which itself issues an even larger (because of the BIO splitting taking place) number of kcopyd jobs.
Running the following test, from the device mapper test suite [1],
dmtest run --suite snapshot -n many_snapshots_of_same_volume_N
, with 8 active snapshots, results in the kcopyd job slab cache growing to 10G. Depending on the available system RAM this can lead to the OOM killer killing user processes:
[463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0 [463.492894] kthreadd cpuset=/ mems_allowed=0 [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3 [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [463.492952] Call Trace: [463.492964] dump_stack+0x7d/0xbb [463.492973] dump_header+0x6b/0x2fc [463.492987] ? lockdep_hardirqs_on+0xee/0x190 [463.493012] oom_kill_process+0x302/0x370 [463.493021] out_of_memory+0x113/0x560 [463.493030] __alloc_pages_slowpath+0xf40/0x1020 [463.493055] __alloc_pages_nodemask+0x348/0x3c0 [463.493067] cache_grow_begin+0x81/0x8b0 [463.493072] ? cache_grow_begin+0x874/0x8b0 [463.493078] fallback_alloc+0x1e4/0x280 [463.493092] kmem_cache_alloc_node+0xd6/0x370 [463.493098] ? copy_process.part.31+0x1c5/0x20d0 [463.493105] copy_process.part.31+0x1c5/0x20d0 [463.493115] ? __lock_acquire+0x3cc/0x1550 [463.493121] ? __switch_to_asm+0x34/0x70 [463.493129] ? kthread_create_worker_on_cpu+0x70/0x70 [463.493135] ? finish_task_switch+0x90/0x280 [463.493165] _do_fork+0xe0/0x6d0 [463.493191] ? kthreadd+0x19f/0x220 [463.493233] kernel_thread+0x25/0x30 [463.493235] kthreadd+0x1bf/0x220 [463.493242] ? kthread_create_on_cpu+0x90/0x90 [463.493248] ret_from_fork+0x3a/0x50 [463.493279] Mem-Info: [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0 [463.493285] active_file:80216 inactive_file:80107 isolated_file:435 [463.493285] unevictable:0 dirty:51266 writeback:109372 unstable:0 [463.493285] slab_reclaimable:31191 slab_unreclaimable:3483521 [463.493285] mapped:526 shmem:4903 pagetables:1759 bounce:0 [463.493285] free:33623 free_pcp:2392 free_cma:0 ... [463.493489] Unreclaimable slab info: [463.493513] Name Used Total [463.493522] bio-6 1028KB 1028KB [463.493525] bio-5 1028KB 1028KB [463.493528] dm_snap_pending_exception 236783KB 243789KB [463.493531] dm_exception 41KB 42KB [463.493534] bio-4 1216KB 1216KB [463.493537] bio-3 439396KB 439396KB [463.493539] kcopyd_job 6973427KB 6973427KB ... [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Moreover, issuing a large number of kcopyd jobs results in kcopyd hogging the CPU, while processing them. As a result, processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is stalled for long periods of time, hurting performance. Running the aforementioned test we get, in dmesg, messages like the following:
[67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s! [67501.195586] Showing busy workqueues and worker pools: [67501.195591] workqueue events: flags=0x0 [67501.195597] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195611] pending: cache_reap [67501.195641] workqueue mm_percpu_wq: flags=0x8 [67501.195645] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195656] pending: vmstat_update [67501.195682] workqueue kblockd: flags=0x18 [67501.195687] pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256 [67501.195698] pending: blk_timeout_work [67501.195753] workqueue kcopyd: flags=0x8 [67501.195757] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195768] pending: do_work [dm_mod] [67501.195802] workqueue kcopyd: flags=0x8 [67501.195806] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195817] pending: do_work [dm_mod] [67501.195834] workqueue kcopyd: flags=0x8 [67501.195838] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195848] pending: do_work [dm_mod] [67501.195881] workqueue kcopyd: flags=0x8 [67501.195885] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195896] pending: do_work [dm_mod] [67501.195920] workqueue kcopyd: flags=0x8 [67501.195924] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256 [67501.195935] in-flight: 67:do_work [dm_mod] [67501.195945] pending: do_work [dm_mod] [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765
The root cause for these issues is the way dm-snapshot uses kcopyd. In particular, the lack of an explicit or implicit limit to the maximum number of in-flight COW jobs. The merging path is not affected because it implicitly limits the in-flight kcopyd jobs to one.
Fix these issues by using a semaphore to limit the maximum number of in-flight kcopyd jobs. We grab the semaphore before allocating a new kcopyd job in start_copy() and start_full_bio() and release it after the job finishes in copy_callback().
The initial semaphore value is configurable through a module parameter, to allow fine tuning the maximum number of in-flight COW jobs. Setting this parameter to zero initializes the semaphore to INT_MAX.
A default value of 2048 maximum in-flight kcopyd jobs was chosen. This value was decided experimentally as a trade-off between memory consumption, stalling the kernel's workqueues and maintaining a high enough throughput.
Re-running the aforementioned test:
* Workqueue stalls are eliminated * kcopyd's job slab cache uses a maximum of 130MB * The time taken by the test to write to the snapshot-origin target is reduced from 05m20.48s to 03m26.38s
[1] https://github.com/jthornber/device-mapper-test-suite
Signed-off-by: Nikos Tsironis ntsironis@arrikto.com Signed-off-by: Ilias Tsitsimpis iliastsi@arrikto.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-snap.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index b91b39caf7a9..0a304e7de260 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -19,6 +19,7 @@ #include <linux/vmalloc.h> #include <linux/log2.h> #include <linux/dm-kcopyd.h> +#include <linux/semaphore.h>
#include "dm.h"
@@ -98,6 +99,9 @@ struct dm_snapshot { /* The on disk metadata handler */ struct dm_exception_store *store;
+ /* Maximum number of in-flight COW jobs. */ + struct semaphore cow_count; + struct dm_kcopyd_client *kcopyd_client;
/* Wait for events based on state_bits */ @@ -138,6 +142,19 @@ struct dm_snapshot { #define RUNNING_MERGE 0 #define SHUTDOWN_MERGE 1
+/* + * Maximum number of chunks being copied on write. + * + * The value was decided experimentally as a trade-off between memory + * consumption, stalling the kernel's workqueues and maintaining a high enough + * throughput. + */ +#define DEFAULT_COW_THRESHOLD 2048 + +static int cow_threshold = DEFAULT_COW_THRESHOLD; +module_param_named(snapshot_cow_threshold, cow_threshold, int, 0644); +MODULE_PARM_DESC(snapshot_cow_threshold, "Maximum number of chunks being copied on write"); + DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(snapshot_copy_throttle, "A percentage of time allocated for copy on write");
@@ -1173,6 +1190,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv) goto bad_hash_tables; }
+ sema_init(&s->cow_count, (cow_threshold > 0) ? cow_threshold : INT_MAX); + s->kcopyd_client = dm_kcopyd_client_create(&dm_kcopyd_throttle); if (IS_ERR(s->kcopyd_client)) { r = PTR_ERR(s->kcopyd_client); @@ -1545,6 +1564,7 @@ static void copy_callback(int read_err, unsigned long write_err, void *context) } list_add(&pe->out_of_order_entry, lh); } + up(&s->cow_count); }
/* @@ -1568,6 +1588,7 @@ static void start_copy(struct dm_snap_pending_exception *pe) dest.count = src.count;
/* Hand over to kcopyd */ + down(&s->cow_count); dm_kcopyd_copy(s->kcopyd_client, &src, 1, &dest, 0, copy_callback, pe); }
@@ -1588,6 +1609,7 @@ static void start_full_bio(struct dm_snap_pending_exception *pe, pe->full_bio_end_io = bio->bi_end_io; pe->full_bio_private = bio->bi_private;
+ down(&s->cow_count); callback_data = dm_kcopyd_prepare_callback(s->kcopyd_client, copy_callback, pe);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 644b2e97405b0b74845e1d3c2b4fe4c34858062b ]
This commit fixes hard-coded model-id for an unit of Apogee Ensemble with a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly known as BridgeCo AG).
I note that this model supports three modes in the number of data channels in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is switched by Vendor-dependent AV/C command, like:
$ cd linux-firewire-utils $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs)
When switching between different mode, the unit disappears from IEEE 1394 bus, then appears on the bus with different combination of stream formats. In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else up to 192.0 kHz.
$ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': True, 'chip_ID': 21474898341, 'cmc': True, 'cyc_clk_acc': 100, 'generation': 2, 'imc': True, 'isc': True, 'link_spd': 2, 'max_ROM': 1, 'max_rec': 512, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['HARDWARE_VERSION', 19], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble'], ['VERSION', 5297], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble']]], [ 'DEPENDENT_INFO', [ ['SPECIFIER_ID', 2037], ['VERSION', 1], [(58, 'IMMEDIATE'), 16777159], [(59, 'IMMEDIATE'), 1048576], [(60, 'IMMEDIATE'), 16777159], [(61, 'IMMEDIATE'), 6291456]]]]}
Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/firewire/bebob/bebob.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/firewire/bebob/bebob.c b/sound/firewire/bebob/bebob.c index fc19c99654aa..e4312afe2ce5 100644 --- a/sound/firewire/bebob/bebob.c +++ b/sound/firewire/bebob/bebob.c @@ -356,7 +356,7 @@ static const struct ieee1394_device_id bebob_id_table[] = { /* Apogee Electronics, DA/AD/DD-16X (X-FireWire card) */ SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x00010048, &spec_normal), /* Apogee Electronics, Ensemble */ - SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x00001eee, &spec_normal), + SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x01eeee, &spec_normal), /* ESI, Quatafire610 */ SND_BEBOB_DEV_ENTRY(VEN_ESI, 0x00010064, &spec_normal), /* AcousticReality, eARMasterOne */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ]
This is the much more correct fix for my earlier attempt at:
https://lkml.org/lkml/2018/12/10/118
Short recap:
- There's not actually a locking issue, it's just lockdep being a bit too eager to complain about a possible deadlock.
- Contrary to what I claimed the real problem is recursion on kn->count. Greg pointed me at sysfs_break_active_protection(), used by the scsi subsystem to allow a sysfs file to unbind itself. That would be a real deadlock, which isn't what's happening here. Also, breaking the active protection means we'd need to manually handle all the lifetime fun.
- With Rafael we discussed the task_work approach, which kinda works, but has two downsides: It's a functional change for a lockdep annotation issue, and it won't work for the bind file (which needs to get the errno from the driver load function back to userspace).
- Greg also asked why this never showed up: To hit this you need to unregister a 2nd driver from the unload code of your first driver. I guess only gpus do that. The bug has always been there, but only with a recent patch series did we add more locks so that lockdep built a chain from unbinding the snd-hda driver to the acpi_video_unregister call.
Full lockdep splat:
[12301.898799] ============================================ [12301.898805] WARNING: possible recursive locking detected [12301.898811] 4.20.0-rc7+ #84 Not tainted [12301.898815] -------------------------------------------- [12301.898821] bash/5297 is trying to acquire lock: [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80 [12301.898841] but task is already holding lock: [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898856] other info that might help us debug this: [12301.898862] Possible unsafe locking scenario: [12301.898867] CPU0 [12301.898870] ---- [12301.898874] lock(kn->count#39); [12301.898879] lock(kn->count#39); [12301.898883] *** DEADLOCK *** [12301.898891] May be due to missing lock nesting notation [12301.898899] 5 locks held by bash/5297: [12301.898903] #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0 [12301.898915] #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190 [12301.898925] #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898936] #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240 [12301.898950] #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40 [12301.898960] stack backtrace: [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84 [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [12301.898982] Call Trace: [12301.898989] dump_stack+0x67/0x9b [12301.898997] __lock_acquire+0x6ad/0x1410 [12301.899003] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899010] ? find_held_lock+0x2d/0x90 [12301.899017] ? mutex_spin_on_owner+0xe4/0x150 [12301.899023] ? find_held_lock+0x2d/0x90 [12301.899030] ? lock_acquire+0x90/0x180 [12301.899036] lock_acquire+0x90/0x180 [12301.899042] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899049] __kernfs_remove+0x296/0x310 [12301.899055] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899060] ? kernfs_name_hash+0xd/0x80 [12301.899066] ? kernfs_find_ns+0x6c/0x100 [12301.899073] kernfs_remove_by_name_ns+0x3b/0x80 [12301.899080] bus_remove_driver+0x92/0xa0 [12301.899085] acpi_video_unregister+0x24/0x40 [12301.899127] i915_driver_unload+0x42/0x130 [i915] [12301.899160] i915_pci_remove+0x19/0x30 [i915] [12301.899169] pci_device_remove+0x36/0xb0 [12301.899176] device_release_driver_internal+0x185/0x240 [12301.899183] unbind_store+0xaf/0x180 [12301.899189] kernfs_fop_write+0x104/0x190 [12301.899195] __vfs_write+0x31/0x180 [12301.899203] ? rcu_read_lock_sched_held+0x6f/0x80 [12301.899209] ? rcu_sync_lockdep_assert+0x29/0x50 [12301.899216] ? __sb_start_write+0x13c/0x1a0 [12301.899221] ? vfs_write+0x17f/0x1b0 [12301.899227] vfs_write+0xb9/0x1b0 [12301.899233] ksys_write+0x50/0xc0 [12301.899239] do_syscall_64+0x4b/0x180 [12301.899247] entry_SYSCALL_64_after_hwframe+0x49/0xbe [12301.899253] RIP: 0033:0x7f452ac7f7a4 [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4 [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001 [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730 [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d
Looking around I've noticed that usb and i2c already handle similar recursion problems, where a sysfs file can unbind the same type of sysfs somewhere else in the hierarchy. Relevant commits are:
commit 356c05d58af05d582e634b54b40050c73609617b Author: Alan Stern stern@rowland.harvard.edu Date: Mon May 14 13:30:03 2012 -0400
sysfs: get rid of some lockdep false positives
commit e9b526fe704812364bca07edd15eadeba163ebfb Author: Alexander Sverdlin alexander.sverdlin@nsn.com Date: Fri May 17 14:56:35 2013 +0200
i2c: suppress lockdep warning on delete_device
Implement the same trick for driver bind/unbind.
v2: Put the macro into bus.c (Greg).
Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: Ramalingam C ramalingam.c@intel.com Cc: Arend van Spriel aspriel@gmail.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Geert Uytterhoeven geert+renesas@glider.be Cc: Bartosz Golaszewski brgl@bgdev.pl Cc: Heikki Krogerus heikki.krogerus@linux.intel.com Cc: Vivek Gautam vivek.gautam@codeaurora.org Cc: Joe Perches joe@perches.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/bus.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 07ea8608fb0b..c05561e26d85 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -32,6 +32,9 @@ static struct kset *system_kset;
#define to_drv_attr(_attr) container_of(_attr, struct driver_attribute, attr)
+#define DRIVER_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \ + struct driver_attribute driver_attr_##_name = \ + __ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store)
static int __must_check bus_rescan_devices_helper(struct device *dev, void *data); @@ -197,7 +200,7 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf, bus_put(bus); return err; } -static DRIVER_ATTR_WO(unbind); +static DRIVER_ATTR_IGNORE_LOCKDEP(unbind, S_IWUSR, NULL, unbind_store);
/* * Manually attach a device to a driver. @@ -233,7 +236,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, bus_put(bus); return err; } -static DRIVER_ATTR_WO(bind); +static DRIVER_ATTR_IGNORE_LOCKDEP(bind, S_IWUSR, NULL, bind_store);
static ssize_t show_drivers_autoprobe(struct bus_type *bus, char *buf) {
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]
mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic.
This issue was reported at
https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run.
(mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777 o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP <ffff8800ea4db668> ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled
Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi junxiao.bi@oracle.com Reviewed-by: Yiwen Jiang jiangyiwen@huawei.com Acked-by: Joseph Qi jiangqi903@gmail.com Cc: Jun Piao piaojun@huawei.com Cc: Mark Fasheh mfasheh@versity.com Cc: Joel Becker jlbec@evilplan.org Cc: Changwei Ge ge.changwei@h3c.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ocfs2/localalloc.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index 044013455621..12675bb5304a 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -345,13 +345,18 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb) if (num_used || alloc->id1.bitmap1.i_used || alloc->id1.bitmap1.i_total - || la->la_bm_off) - mlog(ML_ERROR, "Local alloc hasn't been recovered!\n" + || la->la_bm_off) { + mlog(ML_ERROR, "inconsistent detected, clean journal with" + " unrecovered local alloc, please run fsck.ocfs2!\n" "found = %u, set = %u, taken = %u, off = %u\n", num_used, le32_to_cpu(alloc->id1.bitmap1.i_used), le32_to_cpu(alloc->id1.bitmap1.i_total), OCFS2_LOCAL_ALLOC(alloc)->la_bm_off);
+ status = -EINVAL; + goto bail; + } + osb->local_alloc_bh = alloc_bh; osb->local_alloc_state = OCFS2_LA_ENABLED;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ]
Patch series "THP eligibility reporting via proc".
This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution.
A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse.
The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful.
The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1]
http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054... [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
This patch (of 3):
Even though vma flags exported via /proc/<pid>/smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time.
Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc/<pid>/smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel.
http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp... : Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp.
In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well.
Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org Signed-off-by: Michal Hocko mhocko@suse.com Acked-by: Jan Kara jack@suse.cz Acked-by: Dan Williams dan.j.williams@intel.com Acked-by: David Rientjes rientjes@google.com Acked-by: Mike Rapoport rppt@linux.ibm.com Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: David Rientjes rientjes@google.com Cc: Paul Oppenheimer bepvte@gmail.com Cc: William Kucharski william.kucharski@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/proc.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 04b23a83ce68..3172a2465ff2 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -439,7 +439,9 @@ manner. The codes are the following:
Note that there is no guarantee that every flag and associated mnemonic will be present in all further kernel releases. Things get changed, the flags may -be vanished or the reverse -- new added. +be vanished or the reverse -- new added. Interpretation of their meaning +might change in future as well. So each consumer of these flags has to +follow each specific kernel version for the exact semantic.
This file is only present if the CONFIG_MMU kernel configuration option is enabled.
On 1/24/19 12:19 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.133 release. There are 52 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Jan 26 19:01:07 UTC 2019. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.133-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Thu, Jan 24, 2019 at 08:19:24PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.133 release. There are 52 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Jan 26 19:01:07 UTC 2019. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 226 pass: 226 fail: 0
Guenter
linux-stable-mirror@lists.linaro.org