I'm announcing the release of the 3.18.117 kernel.
All users of the 3.18 kernel series must upgrade.
The updated 3.18.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-3.18.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 3
arch/arc/include/asm/page.h | 2
arch/arc/include/asm/pgtable.h | 2
arch/arm/include/asm/uaccess.h | 2
arch/x86/kernel/cpu/mcheck/mce.c | 3
drivers/net/can/xilinx_can.c | 98 +++++++++++++-----
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2
drivers/ptp/ptp_chardev.c | 1
drivers/usb/class/cdc-acm.c | 3
drivers/usb/core/hub.c | 8 +
drivers/usb/gadget/function/f_fs.c | 2
fs/fat/inode.c | 20 ++-
include/linux/skbuff.h | 12 +-
include/net/tcp.h | 2
net/core/rtnetlink.c | 9 +
net/core/skbuff.c | 1
net/ipv4/ip_output.c | 2
net/ipv4/sysctl_net_ipv4.c | 5
net/ipv4/tcp_dctcp.c | 50 ++-------
net/ipv4/tcp_input.c | 22 +++-
net/ipv4/tcp_output.c | 33 ++++--
net/ipv6/ip6_output.c | 2
sound/core/rawmidi.c | 20 ++-
23 files changed, 198 insertions(+), 106 deletions(-)
Alexey Brodkin (1):
ARC: Fix CONFIG_SWAP
Anssi Hannula (4):
can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
can: xilinx_can: fix device dropping off bus on RX overrun
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
can: xilinx_can: fix RX overflow interrupt not being enabled
Arnd Bergmann (2):
ARM: fix put_user() for gcc-8
turn off -Wattribute-alias
Bin Liu (1):
usb: core: handle hub C_PORT_OVER_CURRENT condition
Dewet Thibaut (1):
x86/MCE: Remove min interval polling limitation
Eric Dumazet (2):
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
Greg Kroah-Hartman (1):
Linux 3.18.117
Gustavo A. R. Silva (1):
ptp: fix missing break in switch
Jack Morgenstein (1):
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
Jerry Zhang (1):
usb: gadget: f_fs: Only return delayed status when len is 0
Lubomir Rintel (1):
usb: cdc_acm: Add quirk for Castles VEGA3000
OGAWA Hirofumi (1):
fat: fix memory allocation failure handling of match_strdup()
Paolo Abeni (1):
ip: hash fragments consistently
Roopa Prabhu (1):
rtnetlink: add rtnl_link_state check in rtnl_configure_link
Stefano Brivio (2):
net: Don't copy pfmemalloc flag in __copy_skb_header()
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Takashi Iwai (1):
ALSA: rawmidi: Change resized buffers atomically
Tyler Hicks (1):
ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Vineet Gupta (1):
ARC: mm: allow mprotect to make stack mappings executable
Yuchung Cheng (4):
tcp: fix dctcp delayed ACK schedule
tcp: helpers to send special DCTCP ack
tcp: do not cancel delay-AcK on DCTCP special ACK
tcp: do not delay ACK in DCTCP upon CE status change
This is the start of the stable review cycle for the 4.4.140 release.
There are 47 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Jul 12 18:23:24 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.140-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.140-rc1
Dan Carpenter <dan.carpenter(a)oracle.com>
staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
Jann Horn <jannh(a)google.com>
netfilter: nf_log: don't hold nf_log_mutex during user access
Tokunori Ikegami <ikegami(a)allied-telesis.co.jp>
mtd: cfi_cmdset_0002: Change erase functions to check chip good only
Tokunori Ikegami <ikegami(a)allied-telesis.co.jp>
mtd: cfi_cmdset_0002: Change erase functions to retry for error
Tokunori Ikegami <ikegami(a)allied-telesis.co.jp>
mtd: cfi_cmdset_0002: Change definition naming to retry write operation
Mikulas Patocka <mpatocka(a)redhat.com>
dm bufio: don't take the lock in dm_bufio_shrink_count
Martin Kaiser <martin(a)kaiser.cx>
mtd: rawnand: mxc: set spare area size register explicitly
Mikulas Patocka <mpatocka(a)redhat.com>
dm bufio: drop the lock when doing GFP_NOIO allocation
Douglas Anderson <dianders(a)chromium.org>
dm bufio: avoid sleeping while holding the dm_bufio lock
Vlastimil Babka <vbabka(a)suse.cz>
mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
Brad Love <brad(a)nextdimension.cc>
media: cx25840: Use subdev host data for PLL override
Tony Luck <tony.luck(a)intel.com>
x86/mce: Fix incorrect "Machine check from unknown source" message
Yazen Ghannam <Yazen.Ghannam(a)amd.com>
x86/mce: Detect local MCEs properly
Daniel Rosenberg <drosen(a)google.com>
HID: debug: check length before copy_to_user()
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
HID: hiddev: fix potential Spectre v1
Jason Andryuk <jandryuk(a)gmail.com>
HID: i2c-hid: Fix "incomplete report" noise
Jon Derrick <jonathan.derrick(a)intel.com>
ext4: check superblock mapped prior to committing
Theodore Ts'o <tytso(a)mit.edu>
ext4: add more mount time checks of the superblock
Theodore Ts'o <tytso(a)mit.edu>
ext4: add more inode number paranoia checks
Theodore Ts'o <tytso(a)mit.edu>
ext4: clear i_data in ext4_inode_info when removing inline data
Theodore Ts'o <tytso(a)mit.edu>
ext4: include the illegal physical block in the bad map ext4_error msg
Theodore Ts'o <tytso(a)mit.edu>
ext4: verify the depth of extent tree in ext4_find_extent()
Theodore Ts'o <tytso(a)mit.edu>
ext4: only look at the bg_flags field if it is valid
Theodore Ts'o <tytso(a)mit.edu>
ext4: always check block group bounds in ext4_init_block_bitmap()
Theodore Ts'o <tytso(a)mit.edu>
ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
Theodore Ts'o <tytso(a)mit.edu>
jbd2: don't mark block as modified if the handle is out of credits
Paulo Alcantara <paulo(a)paulo.ac>
cifs: Fix infinite loop when using hard mount option
Lars Ellenberg <lars.ellenberg(a)linbit.com>
drbd: fix access after free
Christian Borntraeger <borntraeger(a)de.ibm.com>
s390: Correct register corruption in critical section cleanup
Jann Horn <jannh(a)google.com>
scsi: sg: mitigate read/write abuse
Changbin Du <changbin.du(a)intel.com>
tracing: Fix missing return symbol in function_graph output
Cannon Matthews <cannonmatthews(a)google.com>
mm: hugetlb: yield when prepping struct pages
Richard Weinberger <richard(a)nod.at>
ubi: fastmap: Correctly handle interrupted erasures in EBA
Sean Nyekjaer <sean.nyekjaer(a)prevas.dk>
ARM: dts: imx6q: Use correct SDMA script for SPI5 core
Taehee Yoo <ap420073(a)gmail.com>
netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
Keith Busch <keith.busch(a)intel.com>
nvme-pci: initialize queue memory before interrupts
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes/x86: Do not modify singlestep buffer while resuming
Ben Hutchings <ben.hutchings(a)codethink.co.uk>
ipv4: Fix error return value in fib_convert_metrics()
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: fix resume by always initializing registers before transfer
Vasanthakumar Thiagarajan <vthiagar(a)qti.qualcomm.com>
ath10k: fix rfc1042 header retrieval in QCA4019 with eth decap mode
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/boot: Fix early command-line parsing when matching at end
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
n_tty: Access echo_* variables carefully.
Laura Abbott <labbott(a)redhat.com>
staging: android: ion: Return an ERR_PTR in ion_map_kernel
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
n_tty: Fix stall at n_tty_receive_char_special().
Karoly Pados <pados(a)pados.hu>
USB: serial: cp210x: add Silicon Labs IDs for Windows Update
Johan Hovold <johan(a)kernel.org>
USB: serial: cp210x: add CESINEL device ids
Houston Yaroschoff <hstn(a)4ever3.net>
usb: cdc_acm: Add quirk for Uniden UBC125 scanner
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/imx6q.dtsi | 2 +-
arch/s390/kernel/entry.S | 4 +-
arch/x86/kernel/cpu/mcheck/mce.c | 51 ++++++++-----
arch/x86/kernel/kprobes/core.c | 42 ++++++-----
arch/x86/lib/cmdline.c | 34 ++++++---
drivers/block/drbd/drbd_worker.c | 2 +-
drivers/hid/hid-debug.c | 8 ++-
drivers/hid/i2c-hid/i2c-hid.c | 2 +-
drivers/hid/usbhid/hiddev.c | 11 +++
drivers/i2c/busses/i2c-rcar.c | 3 +-
drivers/md/dm-bufio.c | 31 ++++----
drivers/media/i2c/cx25840/cx25840-core.c | 28 ++++++--
drivers/mtd/chips/cfi_cmdset_0002.c | 30 +++++---
drivers/mtd/nand/mxc_nand.c | 5 +-
drivers/mtd/ubi/eba.c | 92 +++++++++++++++++++++++-
drivers/net/wireless/ath/ath10k/htt_rx.c | 5 +-
drivers/nvme/host/pci.c | 4 +-
drivers/scsi/sg.c | 42 ++++++++++-
drivers/staging/android/ion/ion_heap.c | 2 +-
drivers/staging/comedi/drivers/quatech_daqp_cs.c | 2 +-
drivers/tty/n_tty.c | 55 ++++++++------
drivers/usb/class/cdc-acm.c | 3 +
drivers/usb/serial/cp210x.c | 14 ++++
fs/cifs/cifssmb.c | 10 ++-
fs/cifs/smb2pdu.c | 18 +++--
fs/ext4/balloc.c | 21 +++---
fs/ext4/ext4.h | 5 --
fs/ext4/ext4_extents.h | 1 +
fs/ext4/extents.c | 6 ++
fs/ext4/ialloc.c | 14 +++-
fs/ext4/inline.c | 1 +
fs/ext4/inode.c | 7 +-
fs/ext4/mballoc.c | 6 +-
fs/ext4/super.c | 86 ++++++++++++++++++----
fs/jbd2/transaction.c | 9 ++-
kernel/trace/trace_functions_graph.c | 5 +-
mm/hugetlb.c | 1 +
mm/page_alloc.c | 2 -
net/ipv4/fib_semantics.c | 2 +-
net/netfilter/nf_log.c | 9 ++-
net/netfilter/nf_tables_core.c | 3 +-
42 files changed, 513 insertions(+), 169 deletions(-)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38c0a74fe06da3be133cae3fb7bde6a9438e698b Mon Sep 17 00:00:00 2001
From: Paul Burton <paul.burton(a)mips.com>
Date: Thu, 12 Jul 2018 09:33:04 -0700
Subject: [PATCH] MIPS: Fix off-by-one in pci_resource_to_user()
The MIPS implementation of pci_resource_to_user() introduced in v3.12 by
commit 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci
memory space properly") incorrectly sets *end to the address of the
byte after the resource, rather than the last byte of the resource.
This results in userland seeing resources as a byte larger than they
actually are, for example a 32 byte BAR will be reported by a tool such
as lspci as being 33 bytes in size:
Region 2: I/O ports at 1000 [disabled] [size=33]
Correct this by subtracting one from the calculated end address,
reporting the correct address to userland.
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Reported-by: Rui Wang <rui.wang(a)windriver.com>
Fixes: 4c2924b725fb ("MIPS: PCI: Use pci_resource_to_user to map pci memory space properly")
Cc: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Wolfgang Grandegger <wg(a)grandegger.com>
Cc: linux-mips(a)linux-mips.org
Cc: stable(a)vger.kernel.org # v3.12+
Patchwork: https://patchwork.linux-mips.org/patch/19829/
diff --git a/arch/mips/pci/pci.c b/arch/mips/pci/pci.c
index 9632436d74d7..c2e94cf5ecda 100644
--- a/arch/mips/pci/pci.c
+++ b/arch/mips/pci/pci.c
@@ -54,5 +54,5 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
phys_addr_t size = resource_size(rsrc);
*start = fixup_bigphys_addr(rsrc->start, size);
- *end = rsrc->start + size;
+ *end = rsrc->start + size - 1;
}
The patch titled
Subject: ipc/shm.c add ->pagesize function to shm_vm_ops
has been added to the -mm tree. Its filename is
ipc-shmc-add-pagesize-function-to-shm_vm_ops.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ipc-shmc-add-pagesize-function-to-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ipc-shmc-add-pagesize-function-to-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jane Chu <jane.chu(a)oracle.com>
Subject: ipc/shm.c add ->pagesize function to shm_vm_ops
05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to hugetlb_vm_ops,
intended to cover all hugetlbfs backed files.
With System V shared memory model, if "huge page" is specified, the
"shared memory" is backed by hugetlbfs files, but the mappings initiated
via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
result in below BUG:
fs/hugetlbfs/inode.c
443 if (unlikely(page_mapped(page))) {
444 BUG_ON(truncate_op);
[ 242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[ 282.653208] ------------[ cut here ]------------
[ 282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[ 282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[ 284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
[ 284.246609] task: ffff9bf0507aaf80 task.stack: ffffa9e625628000
[ 284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
....
[ 285.292389] Call Trace:
[ 285.321630] hugetlbfs_evict_inode+0x1e/0x3e
[ 285.372707] evict+0xdb/0x1af
[ 285.408185] iput+0x1a2/0x1f7
[ 285.443661] dentry_unlink_inode+0xc6/0xf0
[ 285.492661] __dentry_kill+0xd8/0x18d
[ 285.536459] dput+0x1b5/0x1ed
[ 285.571939] __fput+0x18b/0x216
[ 285.609495] ____fput+0xe/0x10
[ 285.646030] task_work_run+0x90/0xa7
[ 285.688788] exit_to_usermode_loop+0xdd/0x116
[ 285.740905] do_syscall_64+0x187/0x1ae
[ 285.785740] entry_SYSCALL_64_after_hwframe+0x150/0x0
Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
Signed-off-by: Jane Chu <jane.chu(a)oracle.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
diff -puN include/linux/mm.h~ipc-shmc-add-pagesize-function-to-shm_vm_ops include/linux/mm.h
--- a/include/linux/mm.h~ipc-shmc-add-pagesize-function-to-shm_vm_ops
+++ a/include/linux/mm.h
@@ -389,6 +389,13 @@ enum page_entry_size {
* These are the virtual MM functions - opening of an area, closing and
* unmapping it (needed to keep files on disk up-to-date etc), pointer
* to the functions called when a no-page or a wp-page exception occurs.
+ *
+ * Note, when a new function is introduced to vm_operations_struct and
+ * added to hugetlb_vm_ops, please consider adding the function to
+ * shm_vm_ops. This is because under System V memory model, though
+ * mappings created via shmget/shmat with "huge page" specified are
+ * backed by hugetlbfs files, their original vm_ops are overwritten with
+ * shm_vm_ops.
*/
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
diff -puN ipc/shm.c~ipc-shmc-add-pagesize-function-to-shm_vm_ops ipc/shm.c
--- a/ipc/shm.c~ipc-shmc-add-pagesize-function-to-shm_vm_ops
+++ a/ipc/shm.c
@@ -427,6 +427,17 @@ static int shm_split(struct vm_area_stru
return 0;
}
+static unsigned long shm_pagesize(struct vm_area_struct *vma)
+{
+ struct file *file = vma->vm_file;
+ struct shm_file_data *sfd = shm_file_data(file);
+
+ if (sfd->vm_ops->pagesize)
+ return sfd->vm_ops->pagesize(vma);
+
+ return PAGE_SIZE;
+}
+
#ifdef CONFIG_NUMA
static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
{
@@ -554,6 +565,7 @@ static const struct vm_operations_struct
.close = shm_close, /* callback for when the vm-area is released */
.fault = shm_fault,
.split = shm_split,
+ .pagesize = shm_pagesize,
#if defined(CONFIG_NUMA)
.set_policy = shm_set_policy,
.get_policy = shm_get_policy,
_
Patches currently in -mm which might be from jane.chu(a)oracle.com are
ipc-shmc-add-pagesize-function-to-shm_vm_ops.patch
The patch titled
Subject: kvm, mm: account shadow page tables to kmemcg
has been removed from the -mm tree. Its filename was
kvm-mm-account-shadow-page-tables-to-kmemcg.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: kvm, mm: account shadow page tables to kmemcg
The size of kvm's shadow page tables corresponds to the size of the guest
virtual machines on the system. Large VMs can spend a significant amount
of memory as shadow page tables which can not be left as system memory
overhead. So, account shadow page tables to the kmemcg.
[shakeelb(a)google.com: replace (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT]
Link: http://lkml.kernel.org/r/20180629140224.205849-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627181349.149778-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Peter Feiner <pfeiner(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu.c~kvm-mm-account-shadow-page-tables-to-kmemcg
+++ a/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(s
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL);
+ page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
_
Patches currently in -mm which might be from shakeelb(a)google.com are
fs-fsnotify-account-fsnotify-metadata-to-kmemcg.patch
fs-fsnotify-account-fsnotify-metadata-to-kmemcg-fix.patch
fs-mm-account-buffer_head-to-kmemcg.patch
fs-mm-account-buffer_head-to-kmemcgpatchfix.patch
memcg-reduce-memcg-tree-traversals-for-stats-collection.patch
The patch titled
Subject: mm: introduce vma_init()
has been removed from the -mm tree. Its filename was
mm-introduce-vma_init.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: introduce vma_init()
Not all VMAs allocated with vm_area_alloc(). Some of them allocated on
stack or in data segment.
The new helper can be use to initialize VMA properly regardless where it
was allocated.
Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 6 ++++++
kernel/fork.c | 6 ++----
2 files changed, 8 insertions(+), 4 deletions(-)
--- a/include/linux/mm.h~mm-introduce-vma_init
+++ a/include/linux/mm.h
@@ -452,6 +452,12 @@ struct vm_operations_struct {
unsigned long addr);
};
+static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
+{
+ vma->vm_mm = mm;
+ INIT_LIST_HEAD(&vma->anon_vma_chain);
+}
+
struct mmu_gather;
struct inode;
--- a/kernel/fork.c~mm-introduce-vma_init
+++ a/kernel/fork.c
@@ -312,10 +312,8 @@ struct vm_area_struct *vm_area_alloc(str
{
struct vm_area_struct *vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
- if (vma) {
- vma->vm_mm = mm;
- INIT_LIST_HEAD(&vma->anon_vma_chain);
- }
+ if (vma)
+ vma_init(vma, mm);
return vma;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-page_ext-drop-definition-of-unused-page_ext_debug_poison.patch
mm-page_ext-constify-lookup_page_ext-argument.patch
The patch titled
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
has been removed from the -mm tree. Its filename was
delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Tejun Heo <tj(a)kernel.org>
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
c96f5471ce7d ("delayacct: Account blkio completion on the correct task"),
while updating delayacct_blkio_end() to take the target task instead of
always using %current, made the function test NULL on %current->delays and
then continue to operated on @p->delays. If %current succeeded init while
@p didn't, it leads to the following crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B12 08/17/2017
RIP: 0010:__delayacct_blkio_end+0xc/0x40
RSP: 0000:ffff881fff703bf8 EFLAGS: 00010086
RAX: ffff881f1ec8b800 RBX: ffff8804f735cd54 RCX: ffff881fff703cb0
RDX: 0000000000000002 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff881fff703cc0
R10: 0000000000001000 R11: ffff881fd3f73d00 R12: ffff8804f735c600
R13: 0000000000000000 R14: 000000000000001d R15: ffff881fff703cb0
FS: 00007f5003f7d700(0000) GS:ffff881fff700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000001f401a6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
</IRQ>
Fix it by updating delayacct_blkio_end() check @p->delays instead.
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.c…
Fixes: c96f5471ce7d ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Dave Jones <dsj(a)fb.com>
Debugged-by: Dave Jones <dsj(a)fb.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Josh Snyder <joshs(a)netflix.com>
Cc: <stable(a)vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/delayacct.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/delayacct.h~delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure
+++ a/include/linux/delayacct.h
@@ -124,7 +124,7 @@ static inline void delayacct_blkio_start
static inline void delayacct_blkio_end(struct task_struct *p)
{
- if (current->delays)
+ if (p->delays)
__delayacct_blkio_end(p);
delayacct_clear_flag(DELAYACCT_PF_BLKIO);
}
_
Patches currently in -mm which might be from tj(a)kernel.org are
This is the start of the stable review cycle for the 3.18.117 release.
There are 27 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.117-rc1
Arnd Bergmann <arnd(a)arndb.de>
turn off -Wattribute-alias
Arnd Bergmann <arnd(a)arndb.de>
ARM: fix put_user() for gcc-8
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix RX overflow interrupt not being enabled
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix device dropping off bus on RX overrun
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
Jerry Zhang <zhangjerry(a)google.com>
usb: gadget: f_fs: Only return delayed status when len is 0
Bin Liu <b-liu(a)ti.com>
usb: core: handle hub C_PORT_OVER_CURRENT condition
Lubomir Rintel <lkundrak(a)v3.sk>
usb: cdc_acm: Add quirk for Castles VEGA3000
Eric Dumazet <edumazet(a)google.com>
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
Eric Dumazet <edumazet(a)google.com>
tcp: avoid collapses in tcp_prune_queue() if possible
Yuchung Cheng <ycheng(a)google.com>
tcp: do not delay ACK in DCTCP upon CE status change
Yuchung Cheng <ycheng(a)google.com>
tcp: do not cancel delay-AcK on DCTCP special ACK
Yuchung Cheng <ycheng(a)google.com>
tcp: helpers to send special DCTCP ack
Yuchung Cheng <ycheng(a)google.com>
tcp: fix dctcp delayed ACK schedule
Roopa Prabhu <roopa(a)cumulusnetworks.com>
rtnetlink: add rtnl_link_state check in rtnl_configure_link
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
Paolo Abeni <pabeni(a)redhat.com>
ip: hash fragments consistently
Stefano Brivio <sbrivio(a)redhat.com>
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Stefano Brivio <sbrivio(a)redhat.com>
net: Don't copy pfmemalloc flag in __copy_skb_header()
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ptp: fix missing break in switch
Tyler Hicks <tyhicks(a)canonical.com>
ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Vineet Gupta <vgupta(a)synopsys.com>
ARC: mm: allow mprotect to make stack mappings executable
Alexey Brodkin <abrodkin(a)synopsys.com>
ARC: Fix CONFIG_SWAP
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Change resized buffers atomically
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
fat: fix memory allocation failure handling of match_strdup()
Dewet Thibaut <thibaut.dewet(a)nokia.com>
x86/MCE: Remove min interval polling limitation
-------------
Diffstat:
Makefile | 5 +-
arch/arc/include/asm/page.h | 2 +-
arch/arc/include/asm/pgtable.h | 2 +-
arch/arm/include/asm/uaccess.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 3 -
drivers/net/can/xilinx_can.c | 98 ++++++++++++++++------
.../net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
drivers/ptp/ptp_chardev.c | 1 +
drivers/usb/class/cdc-acm.c | 3 +
drivers/usb/core/hub.c | 8 +-
drivers/usb/gadget/function/f_fs.c | 2 +-
fs/fat/inode.c | 20 +++--
include/linux/skbuff.h | 12 +--
include/net/tcp.h | 2 +
net/core/rtnetlink.c | 9 +-
net/core/skbuff.c | 1 +
net/ipv4/ip_output.c | 2 +
net/ipv4/sysctl_net_ipv4.c | 5 +-
net/ipv4/tcp_dctcp.c | 50 ++++-------
net/ipv4/tcp_input.c | 21 ++++-
net/ipv4/tcp_output.c | 33 ++++++--
net/ipv6/ip6_output.c | 2 +
sound/core/rawmidi.c | 20 +++--
23 files changed, 198 insertions(+), 107 deletions(-)
'ac->ac_2order' is a user-controlled value used to index into
'grp->bb_counters' and based on the value at that index, 'ac->ac_found'
is written to. Clamp the value right after the bounds check to avoid a
speculative out-of-bounds read of 'grp->bb_counters'.
This also protects the access of the s_mb_offsets and s_mb_maxs arrays
inside mb_find_buddy().
These gadgets were discovered with the help of smatch:
* fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential
spectre issue 'grp->bb_counters' [w] (local cap)
* fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap)
* fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremy Cline <jcline(a)redhat.com>
---
fs/ext4/mballoc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f7ab34088162..c0866007a949 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -14,6 +14,7 @@
#include <linux/log2.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <linux/nospec.h>
#include <linux/backing-dev.h>
#include <trace/events/ext4.h>
@@ -1893,6 +1894,7 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
BUG_ON(ac->ac_2order <= 0);
for (i = ac->ac_2order; i <= sb->s_blocksize_bits + 1; i++) {
+ i = array_index_nospec(i, sb->s_blocksize_bits + 2);
if (grp->bb_counters[i] == 0)
continue;
--
2.17.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 393753b217f05474e714aea36c37501546ed1202 Mon Sep 17 00:00:00 2001
From: Roman Fietze <roman.fietze(a)telemotive.de>
Date: Wed, 11 Jul 2018 15:36:14 +0200
Subject: [PATCH] can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit
before checking can.ctrlmode
Inside m_can_chip_config(), when setting up the new value of the CCCR,
the CCCR_NISO bit is not cleared like the others, CCCR_TEST, CCCR_MON,
CCCR_BRSE and CCCR_FDOE, before checking the can.ctrlmode bits for
CAN_CTRLMODE_FD_NON_ISO.
This way once the controller was configured for CAN_CTRLMODE_FD_NON_ISO,
this mode could never be cleared again.
This fix is only relevant for controllers with version 3.1.x or 3.2.x.
Older versions do not support NISO.
Signed-off-by: Roman Fietze <roman.fietze(a)telemotive.de>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index b397a33f3d32..8e2b7f873c4d 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1109,7 +1109,8 @@ static void m_can_chip_config(struct net_device *dev)
} else {
/* Version 3.1.x or 3.2.x */
- cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE);
+ cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE |
+ CCCR_NISO);
/* Only 3.2.x has NISO Bit implemented */
if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO)
I would like to contact the person who manages your images for your
company?
We services such as background image cut out, clipping path, shadow adding
(drop shadow, reflection shadow, natural shadow, mirror effect), image
masking, product image editing.
The following are the kind of services together:
Clipping Path Service
Cut out image,Image Clipping, Clip image
Photo Masking Service
Crop image, Photo cut out
Beauty Retouching, Model retouching
We can give you editing test on your photos.
Also, we also use the most recent application as well as techniques such as
Adobe Photoshop.
Thanks,
Jeremy
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2f4f0f338cf453bfcdbcf089e177c16f35f023c8 Mon Sep 17 00:00:00 2001
From: Anssi Hannula <anssi.hannula(a)bitwise.fi>
Date: Mon, 26 Feb 2018 14:39:59 +0200
Subject: [PATCH] can: xilinx_can: fix incorrect clear of non-processed
interrupts
xcan_interrupt() clears ERROR|RXOFLV|BSOFF|ARBLST interrupts if any of
them is asserted. This does not take into account that some of them
could have been asserted between interrupt status read and interrupt
clear, therefore clearing them without handling them.
Fix the code to only clear those interrupts that it knows are asserted
and therefore going to be processed in xcan_err_interrupt().
Fixes: b1201e44f50b ("can: xilinx CAN controller support")
Signed-off-by: Anssi Hannula <anssi.hannula(a)bitwise.fi>
Cc: Michal Simek <michal.simek(a)xilinx.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index ea9f9d1a5ba7..cb80a9aa7281 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -938,6 +938,7 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id)
struct net_device *ndev = (struct net_device *)dev_id;
struct xcan_priv *priv = netdev_priv(ndev);
u32 isr, ier;
+ u32 isr_errors;
/* Get the interrupt status from Xilinx CAN */
isr = priv->read_reg(priv, XCAN_ISR_OFFSET);
@@ -956,11 +957,10 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id)
xcan_tx_interrupt(ndev, isr);
/* Check for the type of error interrupt and Processing it */
- if (isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
- XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK)) {
- priv->write_reg(priv, XCAN_ICR_OFFSET, (XCAN_IXR_ERROR_MASK |
- XCAN_IXR_RXOFLW_MASK | XCAN_IXR_BSOFF_MASK |
- XCAN_IXR_ARBLST_MASK));
+ isr_errors = isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK |
+ XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK);
+ if (isr_errors) {
+ priv->write_reg(priv, XCAN_ICR_OFFSET, isr_errors);
xcan_err_interrupt(ndev, isr);
}
Starting with gcc-8.1, we get a warning about all system call definitions,
which use an alias between functions with incompatible prototypes, e.g.:
In file included from ../mm/process_vm_access.c:19:
../include/linux/syscalls.h:211:18: warning: 'sys_process_vm_readv' alias between functions of incompatible types 'long int(pid_t, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)' {aka 'long int(int, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias]
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
^~~
../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^~~~~~~~~~~~~~~~~
../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
^~~~~~~~~~~~~~~
../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
^~~~~~~~~~~~~~~
../include/linux/syscalls.h:215:18: note: aliased declaration here
asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
^~~
../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^~~~~~~~~~~~~~~~~
../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx'
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
^~~~~~~~~~~~~~~
../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6'
SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
This is really noisy and does not indicate a real problem. In the latest
mainline kernel, this was addressed by commit bee20031772a ("disable
-Wattribute-alias warning for SYSCALL_DEFINEx()"), which seems too invasive
to backport.
This takes a much simpler approach and just disables the warning across the
kernel.
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/Makefile b/Makefile
index a44d6b2adb76..91f9d2d56eac 100644
--- a/Makefile
+++ b/Makefile
@@ -642,6 +642,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,)
KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context)
+KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias)
ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += $(call cc-option,-Oz,-Os)
--
2.18.0
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 393753b217f05474e714aea36c37501546ed1202 Mon Sep 17 00:00:00 2001
From: Roman Fietze <roman.fietze(a)telemotive.de>
Date: Wed, 11 Jul 2018 15:36:14 +0200
Subject: [PATCH] can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit
before checking can.ctrlmode
Inside m_can_chip_config(), when setting up the new value of the CCCR,
the CCCR_NISO bit is not cleared like the others, CCCR_TEST, CCCR_MON,
CCCR_BRSE and CCCR_FDOE, before checking the can.ctrlmode bits for
CAN_CTRLMODE_FD_NON_ISO.
This way once the controller was configured for CAN_CTRLMODE_FD_NON_ISO,
this mode could never be cleared again.
This fix is only relevant for controllers with version 3.1.x or 3.2.x.
Older versions do not support NISO.
Signed-off-by: Roman Fietze <roman.fietze(a)telemotive.de>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index b397a33f3d32..8e2b7f873c4d 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1109,7 +1109,8 @@ static void m_can_chip_config(struct net_device *dev)
} else {
/* Version 3.1.x or 3.2.x */
- cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE);
+ cccr &= ~(CCCR_TEST | CCCR_MON | CCCR_BRSE | CCCR_FDOE |
+ CCCR_NISO);
/* Only 3.2.x has NISO Bit implemented */
if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1675bee3e732c2449e792feed9caff804f3bd42c Mon Sep 17 00:00:00 2001
From: Faiz Abbas <faiz_abbas(a)ti.com>
Date: Tue, 3 Jul 2018 16:41:02 +0530
Subject: [PATCH] can: m_can: Fix runtime resume call
pm_runtime_get_sync() returns a 1 if the state of the device is already
'active'. This is not a failure case and should return a success.
Therefore fix error handling for pm_runtime_get_sync() call such that
it returns success when the value is 1.
Also cleanup the TODO for using runtime PM for sleep mode as that is
implemented.
Signed-off-by: Faiz Abbas <faiz_abbas(a)ti.com>
Cc: <stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 8e2b7f873c4d..e2f965c2e3aa 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -634,10 +634,12 @@ static int m_can_clk_start(struct m_can_priv *priv)
int err;
err = pm_runtime_get_sync(priv->device);
- if (err)
+ if (err < 0) {
pm_runtime_put_noidle(priv->device);
+ return err;
+ }
- return err;
+ return 0;
}
static void m_can_clk_stop(struct m_can_priv *priv)
@@ -1688,8 +1690,6 @@ static int m_can_plat_probe(struct platform_device *pdev)
return ret;
}
-/* TODO: runtime PM with power down or sleep mode */
-
static __maybe_unused int m_can_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1675bee3e732c2449e792feed9caff804f3bd42c Mon Sep 17 00:00:00 2001
From: Faiz Abbas <faiz_abbas(a)ti.com>
Date: Tue, 3 Jul 2018 16:41:02 +0530
Subject: [PATCH] can: m_can: Fix runtime resume call
pm_runtime_get_sync() returns a 1 if the state of the device is already
'active'. This is not a failure case and should return a success.
Therefore fix error handling for pm_runtime_get_sync() call such that
it returns success when the value is 1.
Also cleanup the TODO for using runtime PM for sleep mode as that is
implemented.
Signed-off-by: Faiz Abbas <faiz_abbas(a)ti.com>
Cc: <stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 8e2b7f873c4d..e2f965c2e3aa 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -634,10 +634,12 @@ static int m_can_clk_start(struct m_can_priv *priv)
int err;
err = pm_runtime_get_sync(priv->device);
- if (err)
+ if (err < 0) {
pm_runtime_put_noidle(priv->device);
+ return err;
+ }
- return err;
+ return 0;
}
static void m_can_clk_stop(struct m_can_priv *priv)
@@ -1688,8 +1690,6 @@ static int m_can_plat_probe(struct platform_device *pdev)
return ret;
}
-/* TODO: runtime PM with power down or sleep mode */
-
static __maybe_unused int m_can_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1675bee3e732c2449e792feed9caff804f3bd42c Mon Sep 17 00:00:00 2001
From: Faiz Abbas <faiz_abbas(a)ti.com>
Date: Tue, 3 Jul 2018 16:41:02 +0530
Subject: [PATCH] can: m_can: Fix runtime resume call
pm_runtime_get_sync() returns a 1 if the state of the device is already
'active'. This is not a failure case and should return a success.
Therefore fix error handling for pm_runtime_get_sync() call such that
it returns success when the value is 1.
Also cleanup the TODO for using runtime PM for sleep mode as that is
implemented.
Signed-off-by: Faiz Abbas <faiz_abbas(a)ti.com>
Cc: <stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 8e2b7f873c4d..e2f965c2e3aa 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -634,10 +634,12 @@ static int m_can_clk_start(struct m_can_priv *priv)
int err;
err = pm_runtime_get_sync(priv->device);
- if (err)
+ if (err < 0) {
pm_runtime_put_noidle(priv->device);
+ return err;
+ }
- return err;
+ return 0;
}
static void m_can_clk_stop(struct m_can_priv *priv)
@@ -1688,8 +1690,6 @@ static int m_can_plat_probe(struct platform_device *pdev)
return ret;
}
-/* TODO: runtime PM with power down or sleep mode */
-
static __maybe_unused int m_can_suspend(struct device *dev)
{
struct net_device *ndev = dev_get_drvdata(dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 56406e017a883b54b339207b230f85599f4d70ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antti=20Sepp=C3=A4l=C3=A4?= <a.seppala(a)gmail.com>
Date: Thu, 5 Jul 2018 17:31:53 +0300
Subject: [PATCH] usb: dwc2: Fix DMA alignment to start at allocated boundary
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more
supported way") introduced a common way to align DMA allocations.
The code in the commit aligns the struct dma_aligned_buffer but the
actual DMA address pointed by data[0] gets aligned to an offset from
the allocated boundary by the kmalloc_ptr and the old_xfer_buffer
pointers.
This is against the recommendation in Documentation/DMA-API.txt which
states:
Therefore, it is recommended that driver writers who don't take
special care to determine the cache line size at run time only map
virtual regions that begin and end on page boundaries (which are
guaranteed also to be cache line boundaries).
The effect of this is that architectures with non-coherent DMA caches
may run into memory corruption or kernel crashes with Unhandled
kernel unaligned accesses exceptions.
Fix the alignment by positioning the DMA area in front of the allocation
and use memory at the end of the area for storing the orginal
transfer_buffer pointer. This may have the added benefit of increased
performance as the DMA area is now fully aligned on all architectures.
Tested with Lantiq xRX200 (MIPS) and RPi Model B Rev 2 (ARM).
Fixes: 3bc04e28a030 ("usb: dwc2: host: Get aligned DMA in a more supported way")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Douglas Anderson <dianders(a)chromium.org>
Signed-off-by: Antti Seppälä <a.seppala(a)gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index b1104be3429c..2ed0ac18e053 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2665,34 +2665,29 @@ static int dwc2_alloc_split_dma_aligned_buf(struct dwc2_hsotg *hsotg,
#define DWC2_USB_DMA_ALIGN 4
-struct dma_aligned_buffer {
- void *kmalloc_ptr;
- void *old_xfer_buffer;
- u8 data[0];
-};
-
static void dwc2_free_dma_aligned_buffer(struct urb *urb)
{
- struct dma_aligned_buffer *temp;
+ void *stored_xfer_buffer;
if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
return;
- temp = container_of(urb->transfer_buffer,
- struct dma_aligned_buffer, data);
+ /* Restore urb->transfer_buffer from the end of the allocated area */
+ memcpy(&stored_xfer_buffer, urb->transfer_buffer +
+ urb->transfer_buffer_length, sizeof(urb->transfer_buffer));
if (usb_urb_dir_in(urb))
- memcpy(temp->old_xfer_buffer, temp->data,
+ memcpy(stored_xfer_buffer, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->old_xfer_buffer;
- kfree(temp->kmalloc_ptr);
+ kfree(urb->transfer_buffer);
+ urb->transfer_buffer = stored_xfer_buffer;
urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}
static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- struct dma_aligned_buffer *temp, *kmalloc_ptr;
+ void *kmalloc_ptr;
size_t kmalloc_size;
if (urb->num_sgs || urb->sg ||
@@ -2700,22 +2695,29 @@ static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
!((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
return 0;
- /* Allocate a buffer with enough padding for alignment */
+ /*
+ * Allocate a buffer with enough padding for original transfer_buffer
+ * pointer. This allocation is guaranteed to be aligned properly for
+ * DMA
+ */
kmalloc_size = urb->transfer_buffer_length +
- sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;
+ sizeof(urb->transfer_buffer);
kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
if (!kmalloc_ptr)
return -ENOMEM;
- /* Position our struct dma_aligned_buffer such that data is aligned */
- temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
- temp->kmalloc_ptr = kmalloc_ptr;
- temp->old_xfer_buffer = urb->transfer_buffer;
+ /*
+ * Position value of original urb->transfer_buffer pointer to the end
+ * of allocation for later referencing
+ */
+ memcpy(kmalloc_ptr + urb->transfer_buffer_length,
+ &urb->transfer_buffer, sizeof(urb->transfer_buffer));
+
if (usb_urb_dir_out(urb))
- memcpy(temp->data, urb->transfer_buffer,
+ memcpy(kmalloc_ptr, urb->transfer_buffer,
urb->transfer_buffer_length);
- urb->transfer_buffer = temp->data;
+ urb->transfer_buffer = kmalloc_ptr;
urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
I would like to contact the person who manages your images for your
company?
We services such as background image cut out, clipping path, shadow adding
(drop shadow, reflection shadow, natural shadow, mirror effect), image
masking, product image editing.
The following are the kind of services together:
Clipping Path Service
Cut out image,Image Clipping, Clip image
Photo Masking Service
Crop image, Photo cut out
Beauty Retouching, Model retouching
We can give you editing test on your photos.
Also, we also use the most recent application as well as techniques such as
Adobe Photoshop.
Thanks,
Jeremy
[ Upstream commit f8f4bef322e4600c5856911c7a632c0e3da920d6 ]
When dealing with ingress rule on a netdev, if we did fine through the
conventional path, there's no need to continue into the egdev route,
and we can stop right there.
Not doing so may cause a 2nd rule to be added by the cls api layer
with the ingress being the egdev.
For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B
will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B
Fixes: 208c0f4b5237 ('net: sched: use tc_setup_cb_call to call per-block callbacks')
Signed-off-by: Or Gerlitz <ogerlitz(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Hi Greg,
The commit that introduced the problem dates to 4.15 and the fix made
on 4.17. I see the fix was pushed to 4.16-stable but not to 4.15-stable,
so sending it now.
Or.
net/sched/cls_api.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c2c732a..86d2d59 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1587,7 +1587,7 @@ int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
return ret;
ok_count = ret;
- if (!exts)
+ if (!exts || ok_count)
return ok_count;
ret = tc_exts_setup_cb_egdev_call(exts, type, type_data, err_stop);
if (ret < 0)
--
2.3.7
[ Eric please double check my TCP backports, thank you... ]
Please queue up the following networking fixes for v4.14.x and v4.17.x
-stable, respectively.
Thank you!
From: Shakeel Butt <shakeelb(a)google.com>
Subject: kvm, mm: account shadow page tables to kmemcg
The size of kvm's shadow page tables corresponds to the size of the guest
virtual machines on the system. Large VMs can spend a significant amount
of memory as shadow page tables which can not be left as system memory
overhead. So, account shadow page tables to the kmemcg.
[shakeelb(a)google.com: replace (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT]
Link: http://lkml.kernel.org/r/20180629140224.205849-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627181349.149778-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Peter Feiner <pfeiner(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu.c~kvm-mm-account-shadow-page-tables-to-kmemcg
+++ a/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(s
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL);
+ page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
_
The patch titled
Subject: slub: track number of slabs irrespective of CONFIG_SLUB_DEBUG
has been removed from the -mm tree. Its filename was
slub-track-number-of-slabs-irrespective-of-config_slub_debug.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: slub: track number of slabs irrespective of CONFIG_SLUB_DEBUG
For !CONFIG_SLUB_DEBUG, SLUB does not maintain the number of slabs
allocated per node for a kmem_cache. Thus, slabs_node() in
__kmem_cache_empty(), __kmem_cache_shrink() and __kmem_cache_destroy()
will always return 0 for such config. This is wrong and can cause issues
for all users of these functions.
In fact in [1] Jason has reported a system crash while using SLUB without
CONFIG_SLUB_DEBUG. The reason was the usage of slabs_node() by
__kmem_cache_empty().
The right solution is to make slabs_node() work even for
!CONFIG_SLUB_DEBUG. The commit 0f389ec63077 ("slub: No need for per node
slab counters if !SLUB_DEBUG") had put the per node slab counter under
CONFIG_SLUB_DEBUG because it was only read through sysfs API and the sysfs
API was disabled on !CONFIG_SLUB_DEBUG. However the users of the per node
slab counter assumed that it will work in the absence of
CONFIG_SLUB_DEBUG. So, make the counter work for !CONFIG_SLUB_DEBUG.
Please note that f9e13c0a5a33 ("slab, slub: skip unnecessary
kasan_cache_shutdown()") exposed this issue but it is present even before.
[1] http://lkml.kernel.org/r/CAHmME9rtoPwxUSnktxzKso14iuVCWT7BE_-_8PAC=pGw1iJnQ…
Link: http://lkml.kernel.org/r/20180620224147.23777-1-shakeelb@google.com
Fixes: f9e13c0a5a33 ("slab, slub: skip unnecessary kasan_cache_shutdown()")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Suggested-by: David Rientjes <rientjes(a)google.com>
Reported-by: Jason A . Donenfeld <Jason(a)zx2c4.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 2 -
mm/slub.c | 80 ++++++++++++++++++++++++----------------------------
2 files changed, 38 insertions(+), 44 deletions(-)
diff -puN mm/slab.h~slub-track-number-of-slabs-irrespective-of-config_slub_debug mm/slab.h
--- a/mm/slab.h~slub-track-number-of-slabs-irrespective-of-config_slub_debug
+++ a/mm/slab.h
@@ -473,8 +473,8 @@ struct kmem_cache_node {
#ifdef CONFIG_SLUB
unsigned long nr_partial;
struct list_head partial;
-#ifdef CONFIG_SLUB_DEBUG
atomic_long_t nr_slabs;
+#ifdef CONFIG_SLUB_DEBUG
atomic_long_t total_objects;
struct list_head full;
#endif
diff -puN mm/slub.c~slub-track-number-of-slabs-irrespective-of-config_slub_debug mm/slub.c
--- a/mm/slub.c~slub-track-number-of-slabs-irrespective-of-config_slub_debug
+++ a/mm/slub.c
@@ -1030,42 +1030,6 @@ static void remove_full(struct kmem_cach
list_del(&page->lru);
}
-/* Tracking of the number of slabs for debugging purposes */
-static inline unsigned long slabs_node(struct kmem_cache *s, int node)
-{
- struct kmem_cache_node *n = get_node(s, node);
-
- return atomic_long_read(&n->nr_slabs);
-}
-
-static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
-{
- return atomic_long_read(&n->nr_slabs);
-}
-
-static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
-{
- struct kmem_cache_node *n = get_node(s, node);
-
- /*
- * May be called early in order to allocate a slab for the
- * kmem_cache_node structure. Solve the chicken-egg
- * dilemma by deferring the increment of the count during
- * bootstrap (see early_kmem_cache_node_alloc).
- */
- if (likely(n)) {
- atomic_long_inc(&n->nr_slabs);
- atomic_long_add(objects, &n->total_objects);
- }
-}
-static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
-{
- struct kmem_cache_node *n = get_node(s, node);
-
- atomic_long_dec(&n->nr_slabs);
- atomic_long_sub(objects, &n->total_objects);
-}
-
/* Object debug checks for alloc/free paths */
static void setup_object_debug(struct kmem_cache *s, struct page *page,
void *object)
@@ -1321,16 +1285,46 @@ slab_flags_t kmem_cache_flags(unsigned i
#define disable_higher_order_debug 0
+#endif /* CONFIG_SLUB_DEBUG */
+
static inline unsigned long slabs_node(struct kmem_cache *s, int node)
- { return 0; }
+{
+ struct kmem_cache_node *n = get_node(s, node);
+
+ return atomic_long_read(&n->nr_slabs);
+}
+
static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
- { return 0; }
-static inline void inc_slabs_node(struct kmem_cache *s, int node,
- int objects) {}
-static inline void dec_slabs_node(struct kmem_cache *s, int node,
- int objects) {}
+{
+ return atomic_long_read(&n->nr_slabs);
+}
-#endif /* CONFIG_SLUB_DEBUG */
+static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+ struct kmem_cache_node *n = get_node(s, node);
+
+ /*
+ * May be called early in order to allocate a slab for the
+ * kmem_cache_node structure. Solve the chicken-egg
+ * dilemma by deferring the increment of the count during
+ * bootstrap (see early_kmem_cache_node_alloc).
+ */
+ if (likely(n)) {
+ atomic_long_inc(&n->nr_slabs);
+#ifdef CONFIG_SLUB_DEBUG
+ atomic_long_add(objects, &n->total_objects);
+#endif
+ }
+}
+static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+ struct kmem_cache_node *n = get_node(s, node);
+
+ atomic_long_dec(&n->nr_slabs);
+#ifdef CONFIG_SLUB_DEBUG
+ atomic_long_sub(objects, &n->total_objects);
+#endif
+}
/*
* Hooks for other subsystems that check memory allocations. In a typical
_
Patches currently in -mm which might be from shakeelb(a)google.com are
kvm-mm-account-shadow-page-tables-to-kmemcg.patch
fs-fsnotify-account-fsnotify-metadata-to-kmemcg.patch
fs-fsnotify-account-fsnotify-metadata-to-kmemcg-fix.patch
fs-mm-account-buffer_head-to-kmemcg.patch
fs-mm-account-buffer_head-to-kmemcgpatchfix.patch
memcg-reduce-memcg-tree-traversals-for-stats-collection.patch
From: Snild Dolkow <snild(a)sony.com>
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
Cc: stable(a)vger.kernel.org
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Snild Dolkow <snild(a)sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/kthread.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 750cb8082694..486dedbd9af5 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -325,8 +325,14 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
+ char name[TASK_COMM_LEN];
- vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
+ /*
+ * task is already visible to other tasks, so updating
+ * COMM must be protected.
+ */
+ vsnprintf(name, sizeof(name), namefmt, args);
+ set_task_comm(task, name);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
* The kernel thread should not inherit these properties.
--
2.17.1
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Commit 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:
"warning: 'link' may be used uninitialized in this function"
It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.
Cc: stable(a)vger.kernel.org
Fixes: 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp(a)intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace_kprobe.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 27ace4513c43..6b71860f3998 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -400,7 +400,7 @@ static struct trace_kprobe *find_trace_kprobe(const char *event,
static int
enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
{
- struct event_file_link *link;
+ struct event_file_link *link = NULL;
int ret = 0;
if (file) {
@@ -426,7 +426,9 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file)
if (ret) {
if (file) {
- list_del_rcu(&link->list);
+ /* Notice the if is true on not WARN() */
+ if (!WARN_ON_ONCE(!link))
+ list_del_rcu(&link->list);
kfree(link);
tk->tp.flags &= ~TP_FLAG_TRACE;
} else {
--
2.17.1
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.
Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.
Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: 7862ad1846e99 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_trigger.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index d18ec0e58be2..5dea177cef53 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1420,6 +1420,9 @@ int event_enable_trigger_func(struct event_command *cmd_ops,
goto out;
}
+ /* Up the trigger_data count to make sure nothing frees it on failure */
+ event_trigger_init(trigger_ops, trigger_data);
+
if (trigger) {
number = strsep(&trigger, ":");
@@ -1470,6 +1473,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops,
goto out_disable;
/* Just return zero, not the number of enabled functions */
ret = 0;
+ event_trigger_free(trigger_ops, trigger_data);
out:
return ret;
@@ -1480,7 +1484,7 @@ int event_enable_trigger_func(struct event_command *cmd_ops,
out_free:
if (cmd_ops->set_filter)
cmd_ops->set_filter(NULL, trigger_data, NULL);
- kfree(trigger_data);
+ event_trigger_free(trigger_ops, trigger_data);
kfree(enable_data);
goto out;
}
--
2.17.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devb…
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Tom Zanussi <tom.zanussi(a)linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka(a)cybertrust.co.jp>
Cc: stable(a)vger.kernel.org
Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 16 ++++++++++++++++
kernel/trace/trace.c | 6 ++++++
3 files changed, 23 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b72ebdff0b77..003d09ab308d 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
void ring_buffer_record_off(struct ring_buffer *buffer);
void ring_buffer_record_on(struct ring_buffer *buffer);
int ring_buffer_record_is_on(struct ring_buffer *buffer);
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6a46af21765c..0b0b688ea166 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3226,6 +3226,22 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer)
return !atomic_read(&buffer->record_disabled);
}
+/**
+ * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
+ * @buffer: The ring buffer to see if write is set enabled
+ *
+ * Returns true if the ring buffer is set writable by ring_buffer_record_on().
+ * Note that this does NOT mean it is in a writable state.
+ *
+ * It may return true when the ring buffer has been disabled by
+ * ring_buffer_record_disable(), as that is a temporary disabling of
+ * the ring buffer.
+ */
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
+{
+ return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
+}
+
/**
* ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
* @buffer: The ring buffer to stop writes to.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 87cf25171fb8..823687997b01 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1373,6 +1373,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
arch_spin_lock(&tr->max_lock);
+ /* Inherit the recordable setting from trace_buffer */
+ if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
+ ring_buffer_record_on(tr->max_buffer.buffer);
+ else
+ ring_buffer_record_off(tr->max_buffer.buffer);
+
swap(tr->trace_buffer.buffer, tr->max_buffer.buffer);
__update_max_tr(tr, tsk, cpu);
--
2.17.1
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From eb493fbc150f4a28151ae1ee84f24395989f3600 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Tue, 3 Jul 2018 16:31:41 -0400
Subject: [PATCH] drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
Currently nouveau doesn't actually expose the state debugfs file that's
usually provided for any modesetting driver that supports atomic, even
if nouveau is loaded with atomic=1. This is due to the fact that the
standard debugfs files that DRM creates for atomic drivers is called
when drm_get_pci_dev() is called from nouveau_drm.c. This happens well
before we've initialized the display core, which is currently
responsible for setting the DRIVER_ATOMIC cap.
So, move the atomic option into nouveau_drm.c and just add the
DRIVER_ATOMIC cap whenever it's enabled on the kernel commandline. This
shouldn't cause any actual issues, as the atomic ioctl will still fail
as expected even if the display core doesn't disable it until later in
the init sequence. This also provides the added benefit of being able to
use the state debugfs file to check the current display state even if
clients aren't allowed to modify it through anything other than the
legacy ioctls.
Additionally, disable the DRIVER_ATOMIC cap in nv04's display core, as
this was already disabled there previously.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs(a)redhat.com>
diff --git a/drivers/gpu/drm/nouveau/dispnv04/disp.c b/drivers/gpu/drm/nouveau/dispnv04/disp.c
index 501d2d290e9c..70dce544984e 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/disp.c
@@ -55,6 +55,9 @@ nv04_display_create(struct drm_device *dev)
nouveau_display(dev)->init = nv04_display_init;
nouveau_display(dev)->fini = nv04_display_fini;
+ /* Pre-nv50 doesn't support atomic, so don't expose the ioctls */
+ dev->driver->driver_features &= ~DRIVER_ATOMIC;
+
nouveau_hw_save_vga_fonts(dev, 1);
nv04_crtc_create(dev, 0);
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 31b12b4f321a..9bae4db84cfb 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2126,10 +2126,6 @@ nv50_display_destroy(struct drm_device *dev)
kfree(disp);
}
-MODULE_PARM_DESC(atomic, "Expose atomic ioctl (default: disabled)");
-static int nouveau_atomic = 0;
-module_param_named(atomic, nouveau_atomic, int, 0400);
-
int
nv50_display_create(struct drm_device *dev)
{
@@ -2154,8 +2150,6 @@ nv50_display_create(struct drm_device *dev)
disp->disp = &nouveau_display(dev)->disp;
dev->mode_config.funcs = &nv50_disp_func;
dev->driver->driver_features |= DRIVER_PREFER_XBGR_30BPP;
- if (nouveau_atomic)
- dev->driver->driver_features |= DRIVER_ATOMIC;
/* small shared memory area we use for notifiers and semaphores */
ret = nouveau_bo_new(&drm->client, 4096, 0x1000, TTM_PL_FLAG_VRAM,
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 514903338782..f5d3158f0378 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -81,6 +81,10 @@ MODULE_PARM_DESC(modeset, "enable driver (default: auto, "
int nouveau_modeset = -1;
module_param_named(modeset, nouveau_modeset, int, 0400);
+MODULE_PARM_DESC(atomic, "Expose atomic ioctl (default: disabled)");
+static int nouveau_atomic = 0;
+module_param_named(atomic, nouveau_atomic, int, 0400);
+
MODULE_PARM_DESC(runpm, "disable (0), force enable (1), optimus only default (-1)");
static int nouveau_runtime_pm = -1;
module_param_named(runpm, nouveau_runtime_pm, int, 0400);
@@ -509,6 +513,9 @@ static int nouveau_drm_probe(struct pci_dev *pdev,
pci_set_master(pdev);
+ if (nouveau_atomic)
+ driver_pci.driver_features |= DRIVER_ATOMIC;
+
ret = drm_get_pci_dev(pdev, pent, &driver_pci);
if (ret) {
nvkm_device_del(&device);
Hi Stable Team,
On 13/06/2018 14:20, Neil Armstrong wrote:
> On Amlogic Meson GXBB & GXL platforms, the SCPI Cortex-M4 Co-Processor
> seems to be dependent on the FCLK_DIV2 to be operationnal.
>
> The issue occured since v4.17-rc1 by freezing the kernel boot when
> the 'schedutil' cpufreq governor was selected as default :
>
> [ 12.071837] scpi_protocol scpi: SCP Protocol 0.0 Firmware 0.0.0 version
> domain-0 init dvfs: 4
> [ 12.087757] hctosys: unable to open rtc device (rtc0)
> [ 12.087907] cfg80211: Loading compiled-in X.509 certificates for regulatory database
> [ 12.102241] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
>
> But when disabling the MMC driver, the boot finished but cpufreq failed to
> change the CPU frequency :
>
> [ 12.153045] cpufreq: __target_index: Failed to change cpu frequency: -5
>
> A bisect between v4.16 and v4.16-rc1 gave the 05f814402d61 commit to be
> the first bad commit.
> This commit added support for the missing clock gates before the fixed PLL
> fixed dividers (FCLK_DIVx) and the clock framework basically disabled
> all the unused fixed dividers, thus disabled a critical clock path for
> the SCPI Co-Processor.
>
> This patch simply sets the FCLK_DIV2 gate as critical to ensure
> nobody can disable it.
>
> Fixes: 05f814402d61 ("clk: meson: add fdiv clock gates")
> Signed-off-by: Neil Armstrong <narmstrong(a)baylibre.com>
This patch hit linux master with commit id c987ac6f1f088663b6dad39281071aeb31d450a8
Could this be backported to the next 4.17 stable release ?
Thanks,
Neil
> ---
> drivers/clk/meson/gxbb.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/clk/meson/gxbb.c b/drivers/clk/meson/gxbb.c
> index b1e4d95..0e053c1 100644
> --- a/drivers/clk/meson/gxbb.c
> +++ b/drivers/clk/meson/gxbb.c
> @@ -511,6 +511,7 @@ static struct clk_regmap gxbb_fclk_div2 = {
> .ops = &clk_regmap_gate_ops,
> .parent_names = (const char *[]){ "fclk_div2_div" },
> .num_parents = 1,
> + .flags = CLK_IS_CRITICAL,
> },
> };
>
>
This is the start of the stable review cycle for the 4.4.139 release.
There are 105 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue Jul 3 15:31:30 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.139-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.139-rc1
Szymon Janc <szymon.janc(a)codecoup.pl>
Bluetooth: Fix connection if directed advertising and privacy is used
Bjørn Mork <bjorn(a)mork.no>
cdc_ncm: avoid padding beyond end of skb
Mike Snitzer <snitzer(a)redhat.com>
dm thin: handle running out of data space vs concurrent discard
Keith Busch <keith.busch(a)intel.com>
block: Fix transfer when chunk sectors exceeds max
Maxime Chevallier <maxime.chevallier(a)bootlin.com>
spi: Fix scatterlist elements size in spi_map_buf
Liu Bo <bo.li.liu(a)oracle.com>
Btrfs: fix unexpected cow in run_delalloc_nocow
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210
??? <kt.liao(a)emc.com.tw>
Input: elantech - fix V4 report decoding for module with middle key
Aaron Ma <aaron.ma(a)canonical.com>
Input: elantech - enable middle button of touchpads on ThinkPad P52
Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Input: elan_i2c_smbus - fix more potential stack buffer overflows
Jan Kara <jack(a)suse.cz>
udf: Detect incorrect directory size
Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
xen: Remove unnecessary BUG_ON from __unbind_from_irq()
Alexandr Savca <alexandr.savca(a)saltedge.com>
Input: elan_i2c - add ELAN0618 (Lenovo v330 15IKB) ACPI ID
Kees Cook <keescook(a)chromium.org>
video: uvesafb: Fix integer overflow in allocation
Dave Wysochanski <dwysocha(a)redhat.com>
NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message
Scott Mayhew <smayhew(a)redhat.com>
nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir
Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
media: dvb_frontend: fix locking issues at dvb_frontend_get_event()
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
media: cx231xx: Add support for AverMedia DVD EZMaker 7
Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
media: v4l2-compat-ioctl32: prevent go past max size
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix packet decoding of CYC packets
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix "Unexpected indirect branch" error
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix MTC timing after overflow
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
Adrian Hunter <adrian.hunter(a)intel.com>
perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
mfd: intel-lpss: Program REMAP register in PIO mode
Johan Hovold <johan(a)kernel.org>
backlight: tps65217_bl: Fix Device Tree node lookup
Johan Hovold <johan(a)kernel.org>
backlight: max8925_bl: Fix Device Tree node lookup
Johan Hovold <johan(a)kernel.org>
backlight: as3711_bl: Fix Device Tree node lookup
Florian Westphal <fw(a)strlen.de>
xfrm: skip policies marked as dead while rehashing
Tobias Brunner <tobias(a)strongswan.org>
xfrm: Ignore socket policies when rebuilding hash tables
Silvio Cesare <silvio.cesare(a)gmail.com>
UBIFS: Fix potential integer overflow in allocation
Richard Weinberger <richard(a)nod.at>
ubi: fastmap: Cancel work upon detach
NeilBrown <neilb(a)suse.com>
md: fix two problems with setting the "re-add" device state.
Robert Elliott <elliott(a)hpe.com>
linvdimm, pmem: Preserve read-only setting for pmem devices
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler
Himanshu Madhani <himanshu.madhani(a)cavium.com>
scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails
Martin Kelly <mkelly(a)xevo.com>
iio:buffer: make length types match kfifo types
Omar Sandoval <osandov(a)fb.com>
Btrfs: fix clone vs chattr NODATASUM race
Geert Uytterhoeven <geert(a)linux-m68k.org>
time: Make sure jiffies_to_msecs() preserves non-zero time periods
Huacai Chen <chenhc(a)lemote.com>
MIPS: io: Add barrier after register read in inX()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
Tokunori Ikegami <ikegami(a)allied-telesis.co.jp>
MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum
Joakim Tjernlund <joakim.tjernlund(a)infinera.com>
mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking.
Joakim Tjernlund <joakim.tjernlund(a)infinera.com>
mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary
Joakim Tjernlund <joakim.tjernlund(a)infinera.com>
mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips
Joakim Tjernlund <joakim.tjernlund(a)infinera.com>
mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock()
Tokunori Ikegami <ikegami(a)allied-telesis.co.jp>
mtd: cfi_cmdset_0002: Change write buffer to check correct value
Leon Romanovsky <leonro(a)mellanox.com>
RDMA/mlx4: Discard unknown SQP work requests
Mike Marciniszyn <mike.marciniszyn(a)intel.com>
IB/qib: Fix DMA api warning with debug kernel
Stefan M Schaeckeler <sschaeck(a)cisco.com>
of: unittest: for strings, account for trailing \0 in property length field
David Rivshin <DRivshin(a)allworx.com>
ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size
Mahesh Salgaonkar <mahesh(a)linux.vnet.ibm.com>
powerpc/fadump: Unregister fadump on kexec down path.
Gautham R. Shenoy <ego(a)linux.vnet.ibm.com>
cpuidle: powernv: Fix promotion from snooze if next state disabled
Michael Neuling <mikey(a)neuling.org>
powerpc/ptrace: Fix enforcement of DAWR constraints
Michael Neuling <mikey(a)neuling.org>
powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG
Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: fix control dir setup and teardown
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
fuse: don't keep dead fuse_conn at fuse_fill_super().
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: atomic_o_trunc should truncate pagecache
Amit Pundir <amit.pundir(a)linaro.org>
Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader
Corey Minyard <cminyard(a)mvista.com>
ipmi:bt: Set the timeout before doing a capabilities check
Mikulas Patocka <mpatocka(a)redhat.com>
branch-check: fix long->int truncation when profiling branches
Matthias Schiffer <mschiffer(a)universe-factory.net>
mips: ftrace: fix static function graph tracing
Geert Uytterhoeven <geert+renesas(a)glider.be>
lib/vsprintf: Remove atomic-unsafe support for %pCr
Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
ASoC: cirrus: i2s: Fix LRCLK configuration
Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it
Ingo Flaschberger <ingo.flaschberger(a)gmail.com>
1wire: family module autoload fails because of upper/lower case mismatch.
Maxim Moseychuk <franchesko.salias.hudro.pedros(a)gmail.com>
usb: do not reset if a low-speed or full-speed device timed out
Eric W. Biederman <ebiederm(a)xmission.com>
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
Daniel Wagner <daniel.wagner(a)siemens.com>
serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version
Michael Schmitz <schmitzmic(a)gmail.com>
m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
Dan Williams <dan.j.williams(a)intel.com>
x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
fs/binfmt_misc.c: do not allow offset overflow
Stefan Potyra <Stefan.Potyra(a)elektrobit.com>
w1: mxc_w1: Enable clock before calling clk_get_rate() on it
Hans de Goede <hdegoede(a)redhat.com>
libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk
Dan Carpenter <dan.carpenter(a)oracle.com>
libata: zpodd: small read overflow in eject_tray()
Colin Ian King <colin.king(a)canonical.com>
libata: zpodd: make arrays cdb static, reduces object code size
Tao Wang <kevin.wangtao(a)hisilicon.com>
cpufreq: Fix new policy initialization during limits updates via sysfs
Dennis Wassenberg <dennis.wassenberg(a)secunet.com>
ALSA: hda: add dock and led support for HP ProBook 640 G4
Dennis Wassenberg <dennis.wassenberg(a)secunet.com>
ALSA: hda: add dock and led support for HP EliteBook 830 G5
Bo Chen <chenbo(a)pdx.edu>
ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()
Qu Wenruo <wqu(a)suse.com>
btrfs: scrub: Don't use inode pages for device replace
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
driver core: Don't ignore class_dir_create_and_add() failure.
Jan Kara <jack(a)suse.cz>
ext4: fix fencepost error in check for inode count overflow during resize
Lukas Czerner <lczerner(a)redhat.com>
ext4: update mtime in ext4_punch_hole even if no blocks are released
Frank van der Linden <fllinden(a)amazon.com>
tcp: verify the checksum of the first data segment in a new connection
Xiangning Yu <yuxiangning(a)gmail.com>
bonding: re-evaluate force_primary when the primary slave name changes
Daniel Glöckner <dg(a)emlix.com>
usb: musb: fix remote wakeup racing with suspend
Liu Bo <bo.li.liu(a)oracle.com>
Btrfs: make raid6 rebuild retry more
Eric Dumazet <edumazet(a)google.com>
tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
Sasha Levin <Alexander.Levin(a)microsoft.com>
Revert "Btrfs: fix scrub to repair raid6 corruption"
Finn Thain <fthain(a)telegraphics.com.au>
net/sonic: Use dma_mapping_error()
Josh Hill <josh(a)joshuajhill.com>
net: qmi_wwan: Add Netgear Aircard 779S
Ivan Bornyakov <brnkv.i1(a)gmail.com>
atm: zatm: fix memcmp casting
Julian Anastasov <ja(a)ssi.bg>
ipvs: fix buffer overflow with sync daemon and service
Paolo Abeni <pabeni(a)redhat.com>
netfilter: ebtables: handle string from userspace with care
Eric Dumazet <edumazet(a)google.com>
xfrm6: avoid potential infinite loop in _decode_session6()
-------------
Diffstat:
Documentation/printk-formats.txt | 3 +-
Makefile | 4 +-
arch/arm/include/asm/kgdb.h | 2 +-
arch/m68k/mm/kmap.c | 3 +-
arch/mips/bcm47xx/setup.c | 6 +
arch/mips/include/asm/io.h | 2 +
arch/mips/include/asm/mipsregs.h | 3 +
arch/mips/kernel/mcount.S | 27 ++---
arch/powerpc/kernel/entry_64.S | 1 +
arch/powerpc/kernel/fadump.c | 3 +
arch/powerpc/kernel/hw_breakpoint.c | 4 +-
arch/powerpc/kernel/ptrace.c | 1 +
arch/x86/include/asm/barrier.h | 2 +-
arch/xtensa/kernel/traps.c | 2 +-
drivers/ata/libata-core.c | 3 -
drivers/ata/libata-zpodd.c | 4 +-
drivers/atm/zatm.c | 4 +-
drivers/base/core.c | 14 ++-
drivers/bluetooth/hci_qca.c | 6 +
drivers/char/ipmi/ipmi_bt_sm.c | 3 +-
drivers/cpufreq/cpufreq.c | 2 +
drivers/cpuidle/cpuidle-powernv.c | 32 +++++-
drivers/iio/buffer/kfifo_buf.c | 4 +-
drivers/infiniband/hw/mlx4/mad.c | 1 -
drivers/infiniband/hw/qib/qib.h | 3 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 10 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 20 ++--
drivers/input/mouse/elan_i2c.h | 2 +
drivers/input/mouse/elan_i2c_core.c | 3 +-
drivers/input/mouse/elan_i2c_smbus.c | 10 +-
drivers/input/mouse/elantech.c | 11 +-
drivers/md/dm-thin.c | 11 +-
drivers/md/md.c | 4 +-
drivers/media/dvb-core/dvb_frontend.c | 23 ++--
drivers/media/usb/cx231xx/cx231xx-cards.c | 3 +
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 2 +-
drivers/mfd/intel-lpss.c | 4 +-
drivers/mtd/chips/cfi_cmdset_0002.c | 21 ++--
drivers/mtd/ubi/build.c | 3 +
drivers/mtd/ubi/wl.c | 4 +-
drivers/net/bonding/bond_options.c | 1 +
drivers/net/ethernet/natsemi/sonic.c | 2 +-
drivers/net/usb/cdc_ncm.c | 4 +-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/nvdimm/bus.c | 14 ++-
drivers/of/unittest.c | 8 +-
drivers/pci/hotplug/pciehp.h | 2 +-
drivers/pci/hotplug/pciehp_core.c | 2 +-
drivers/pci/hotplug/pciehp_hpc.c | 13 ++-
drivers/s390/scsi/zfcp_dbf.c | 40 +++++++
drivers/s390/scsi/zfcp_erp.c | 123 ++++++++++++++++-----
drivers/s390/scsi/zfcp_ext.h | 5 +
drivers/s390/scsi/zfcp_scsi.c | 18 ++-
drivers/scsi/qla2xxx/qla_init.c | 3 +-
drivers/spi/spi.c | 10 +-
drivers/tty/serial/sh-sci.c | 8 +-
drivers/usb/core/hub.c | 4 +-
drivers/usb/musb/musb_host.c | 5 +-
drivers/usb/musb/musb_host.h | 7 +-
drivers/usb/musb/musb_virthub.c | 25 +++--
drivers/video/backlight/as3711_bl.c | 33 ++++--
drivers/video/backlight/max8925_bl.c | 4 +-
drivers/video/backlight/tps65217_bl.c | 4 +-
drivers/video/fbdev/uvesafb.c | 3 +-
drivers/w1/masters/mxc_w1.c | 20 ++--
drivers/w1/w1.c | 2 +-
drivers/xen/events/events_base.c | 2 -
fs/binfmt_misc.c | 12 +-
fs/btrfs/inode.c | 33 +++++-
fs/btrfs/ioctl.c | 12 +-
fs/btrfs/scrub.c | 2 +-
fs/ext4/inode.c | 36 +++---
fs/ext4/resize.c | 2 +-
fs/fuse/control.c | 13 ++-
fs/fuse/dir.c | 13 ++-
fs/fuse/inode.c | 1 +
fs/nfs/nfs4idmap.c | 5 +-
fs/nfsd/nfs4xdr.c | 5 +-
fs/ubifs/journal.c | 2 +-
fs/udf/directory.c | 3 +
include/linux/blkdev.h | 4 +-
include/linux/compiler.h | 2 +-
include/linux/iio/buffer.h | 6 +-
include/net/bluetooth/hci_core.h | 2 +-
kernel/time/time.c | 6 +-
lib/vsprintf.c | 3 -
net/bluetooth/hci_conn.c | 27 +++--
net/bluetooth/hci_event.c | 15 ++-
net/bridge/netfilter/ebtables.c | 3 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_ipv4.c | 4 +
net/ipv6/tcp_ipv6.c | 4 +
net/ipv6/xfrm6_policy.c | 2 +-
net/netfilter/ipvs/ip_vs_ctl.c | 21 +++-
net/xfrm/xfrm_policy.c | 5 +
sound/pci/hda/hda_controller.c | 4 +-
sound/pci/hda/patch_conexant.c | 2 +
sound/pci/hda/patch_realtek.c | 1 +
sound/soc/cirrus/edb93xx.c | 2 +-
sound/soc/cirrus/ep93xx-i2s.c | 26 +++--
sound/soc/cirrus/snappercl15.c | 2 +-
sound/soc/soc-dapm.c | 2 +
tools/perf/util/dso.c | 2 +
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 23 +++-
.../perf/util/intel-pt-decoder/intel-pt-decoder.h | 9 ++
.../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 2 +-
tools/perf/util/intel-pt.c | 5 +
107 files changed, 685 insertions(+), 273 deletions(-)
Hi Greg,
Please consider backporting commit 7ec916f82c48, which fixes an issue
with iwlwifi module loading in some cases. Fabio initially reported the
issue and confirmed reverting fixed the problem, and it has also been
reported by at least one Fedora user[0] as fixing the problem.
Thanks!
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1607092
A VM which has:
- a DMA capable device passed through to it (eg. network card);
- running a malicious kernel that ignores H_PUT_TCE failure;
- capability of using IOMMU pages bigger that physical pages
can create an IOMMU mapping that exposes (for example) 16MB of
the host physical memory to the device when only 64K was allocated to the VM.
The remaining 16MB - 64K will be some other content of host memory, possibly
including pages of the VM, but also pages of host kernel memory, host
programs or other VMs.
The attacking VM does not control the location of the page it can map,
and is only allowed to map as many pages as it has pages of RAM.
We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that
an IOMMU page is contained in the physical page so the PCI hardware won't
get access to unassigned host memory; however this check is missing in
the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and
did not hit this yet as the very first time when the mapping happens
we do not have tbl::it_userspace allocated yet and fall back to
the userspace which in turn calls VFIO IOMMU driver, this fails and
the guest does not retry,
This stores the smallest preregistered page size in the preregistered
region descriptor and changes the mm_iommu_xxx API to check this against
the IOMMU page size.
This calculates maximum page size as a minimum of the natural region
alignment and compound page size. For the page shift this uses the shift
returned by find_linux_pte() which indicates how the page is mapped to
the current userspace - if the page is huge and this is not a zero, then
it is a leaf pte and the page is mapped within the range.
Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO")
Cc: stable(a)vger.kernel.org # v4.12+
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
(cherry picked from commit 76fa4975f3ed12d15762bc979ca44078598ed8ee)
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
---
The original patch did not apply because of fad953ce which fixed
all vmalloc's to use array_size() so the backport is pretty trivial
and applies to v4.17 stable as well.
---
arch/powerpc/include/asm/mmu_context.h | 4 ++--
arch/powerpc/kvm/book3s_64_vio.c | 2 +-
arch/powerpc/kvm/book3s_64_vio_hv.c | 6 ++++--
arch/powerpc/mm/mmu_context_iommu.c | 37 ++++++++++++++++++++++++++++++++--
drivers/vfio/vfio_iommu_spapr_tce.c | 2 +-
5 files changed, 43 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 44fdf47..6f67ff5 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -35,9 +35,9 @@ extern struct mm_iommu_table_group_mem_t *mm_iommu_lookup_rm(
extern struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
unsigned long ua, unsigned long entries);
extern long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa);
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa);
extern long mm_iommu_mapped_inc(struct mm_iommu_table_group_mem_t *mem);
extern void mm_iommu_mapped_dec(struct mm_iommu_table_group_mem_t *mem);
#endif
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 4dffa61..e14cec6 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -433,7 +433,7 @@ long kvmppc_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
/* This only handles v2 IOMMU type, v1 is handled via ioctl() */
return H_TOO_HARD;
- if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, &hpa)))
+ if (WARN_ON_ONCE(mm_iommu_ua_to_hpa(mem, ua, tbl->it_page_shift, &hpa)))
return H_HARDWARE;
if (mm_iommu_mapped_inc(mem))
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index c32e9bfe..648cf6c 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -262,7 +262,8 @@ static long kvmppc_rm_tce_iommu_map(struct kvm *kvm, struct iommu_table *tbl,
if (!mem)
return H_TOO_HARD;
- if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, &hpa)))
+ if (WARN_ON_ONCE_RM(mm_iommu_ua_to_hpa_rm(mem, ua, tbl->it_page_shift,
+ &hpa)))
return H_HARDWARE;
pua = (void *) vmalloc_to_phys(pua);
@@ -431,7 +432,8 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
mem = mm_iommu_lookup_rm(vcpu->kvm->mm, ua, IOMMU_PAGE_SIZE_4K);
if (mem)
- prereg = mm_iommu_ua_to_hpa_rm(mem, ua, &tces) == 0;
+ prereg = mm_iommu_ua_to_hpa_rm(mem, ua,
+ IOMMU_PAGE_SHIFT_4K, &tces) == 0;
}
if (!prereg) {
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index e0a2d8e..8160559 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -19,6 +19,7 @@
#include <linux/hugetlb.h>
#include <linux/swap.h>
#include <asm/mmu_context.h>
+#include <asm/pte-walk.h>
static DEFINE_MUTEX(mem_list_mutex);
@@ -27,6 +28,7 @@ struct mm_iommu_table_group_mem_t {
struct rcu_head rcu;
unsigned long used;
atomic64_t mapped;
+ unsigned int pageshift;
u64 ua; /* userspace address */
u64 entries; /* number of entries in hpas[] */
u64 *hpas; /* vmalloc'ed */
@@ -126,6 +128,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
{
struct mm_iommu_table_group_mem_t *mem;
long i, j, ret = 0, locked_entries = 0;
+ unsigned int pageshift;
+ unsigned long flags;
struct page *page = NULL;
mutex_lock(&mem_list_mutex);
@@ -160,6 +164,12 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
goto unlock_exit;
}
+ /*
+ * For a starting point for a maximum page size calculation
+ * we use @ua and @entries natural alignment to allow IOMMU pages
+ * smaller than huge pages but still bigger than PAGE_SIZE.
+ */
+ mem->pageshift = __ffs(ua | (entries << PAGE_SHIFT));
mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
if (!mem->hpas) {
kfree(mem);
@@ -200,6 +210,23 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
}
}
populate:
+ pageshift = PAGE_SHIFT;
+ if (PageCompound(page)) {
+ pte_t *pte;
+ struct page *head = compound_head(page);
+ unsigned int compshift = compound_order(head);
+
+ local_irq_save(flags); /* disables as well */
+ pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
+ local_irq_restore(flags);
+
+ /* Double check it is still the same pinned page */
+ if (pte && pte_page(*pte) == head &&
+ pageshift == compshift)
+ pageshift = max_t(unsigned int, pageshift,
+ PAGE_SHIFT);
+ }
+ mem->pageshift = min(mem->pageshift, pageshift);
mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
}
@@ -350,7 +377,7 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
EXPORT_SYMBOL_GPL(mm_iommu_find);
long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
u64 *va = &mem->hpas[entry];
@@ -358,6 +385,9 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
*hpa = *va | (ua & ~PAGE_MASK);
return 0;
@@ -365,7 +395,7 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
EXPORT_SYMBOL_GPL(mm_iommu_ua_to_hpa);
long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
- unsigned long ua, unsigned long *hpa)
+ unsigned long ua, unsigned int pageshift, unsigned long *hpa)
{
const long entry = (ua - mem->ua) >> PAGE_SHIFT;
void *va = &mem->hpas[entry];
@@ -374,6 +404,9 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
if (entry >= mem->entries)
return -EFAULT;
+ if (pageshift > mem->pageshift)
+ return -EFAULT;
+
pa = (void *) vmalloc_to_phys(va);
if (!pa)
return -EFAULT;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index b751dd6..b4c68f3 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -467,7 +467,7 @@ static int tce_iommu_prereg_ua_to_hpa(struct tce_container *container,
if (!mem)
return -EINVAL;
- ret = mm_iommu_ua_to_hpa(mem, tce, phpa);
+ ret = mm_iommu_ua_to_hpa(mem, tce, shift, phpa);
if (ret)
return -EINVAL;
--
2.11.0
xen/PVH: Set up GS segment for stack canary
commit 98014068328c5574de9a4a30b604111fd9d8f901 upstream
A 32bit PVH Xen kernel with CONFIG_CC_STACKPROTECTOR_STRONG fails to
boot. Xen detects a triple fault and kills the domain. The IP was
xen_prepare_pvh+9 corresponding to:
mov %gs:0x14,%eax
The 32bit kernel hasn't setup %gs when calling into xen_prepare_pvh.
Curiously, 64bit was not affected. The requested patch sets up the
canary for PVH to boot successfully.
This is applicable to and has been tested on 4.14. It is also
applicable to 4.17.
Thanks,
Jason
The patch
ASoC: zte: Fix incorrect PCM format bit usages
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
>From c889a45d229938a94b50aadb819def8bb11a6a54 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Wed, 25 Jul 2018 22:40:49 +0200
Subject: [PATCH] ASoC: zte: Fix incorrect PCM format bit usages
zx-tdm driver sets the DAI driver definitions with the format bits
wrongly set with SNDRV_PCM_FORMAT_*, instead of SNDRV_PCM_FMTBIT_*.
This patch corrects the definitions.
Spotted by a sparse warning:
sound/soc/zte/zx-tdm.c:363:35: warning: restricted snd_pcm_format_t degrades to integer
Fixes: 870e0ddc4345 ("ASoC: zx-tdm: add zte's tdm controller driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/zte/zx-tdm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/soc/zte/zx-tdm.c b/sound/soc/zte/zx-tdm.c
index dc955272f58b..389272eeba9a 100644
--- a/sound/soc/zte/zx-tdm.c
+++ b/sound/soc/zte/zx-tdm.c
@@ -144,8 +144,8 @@ static void zx_tdm_rx_dma_en(struct zx_tdm_info *tdm, bool on)
#define ZX_TDM_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000)
#define ZX_TDM_FMTBIT \
- (SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FORMAT_MU_LAW | \
- SNDRV_PCM_FORMAT_A_LAW)
+ (SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_MU_LAW | \
+ SNDRV_PCM_FMTBIT_A_LAW)
static int zx_tdm_dai_probe(struct snd_soc_dai *dai)
{
--
2.18.0
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce performance regression because multiple faster
writeback threads of the idle bcache devices will compete the btree level
locks with the bcache device who have I/O requests coming.
This patch fixes the above issue by only permitting fast writebac when
all bcache devices attached on the cache set are idle. And if one of the
bcache devices has new I/O request coming, minimized all writeback
throughput immediately and let PI controller __update_writeback_rate()
to decide the upcoming writeback rate for each bcache device.
Also when all bcache devices are idle, limited wrieback rate to a small
number is wast of thoughput, especially when backing devices are slower
non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
rate for each backing device if the whole cache set is idle. A faster
writeback rate in idle time means new I/Os may have more available space
for dirty data, and people may observe a better write performance then.
Please note bcache may change its cache mode in run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.
Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable(a)vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli(a)suse.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Tested-by: Stefan Priebe <s.priebe(a)profihost.ag>
Cc: Michael Lyle <mlyle(a)lyle.org>
---
drivers/md/bcache/bcache.h | 10 ++--
drivers/md/bcache/request.c | 54 ++++++++++++++++++++-
drivers/md/bcache/super.c | 4 ++
drivers/md/bcache/sysfs.c | 15 ++++--
drivers/md/bcache/util.c | 2 +-
drivers/md/bcache/util.h | 2 +-
drivers/md/bcache/writeback.c | 91 +++++++++++++++++++++++------------
7 files changed, 134 insertions(+), 44 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 5f7082aab1b0..97489573dedc 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -328,13 +328,6 @@ struct cached_dev {
*/
atomic_t has_dirty;
- /*
- * Set to zero by things that touch the backing volume-- except
- * writeback. Incremented by writeback. Used to determine when to
- * accelerate idle writeback.
- */
- atomic_t backing_idle;
-
struct bch_ratelimit writeback_rate;
struct delayed_work writeback_rate_update;
@@ -515,6 +508,8 @@ struct cache_set {
struct cache_accounting accounting;
unsigned long flags;
+ atomic_t idle_counter;
+ atomic_t at_max_writeback_rate;
struct cache_sb sb;
@@ -524,6 +519,7 @@ struct cache_set {
struct bcache_device **devices;
unsigned devices_max_used;
+ atomic_t attached_dev_nr;
struct list_head cached_devs;
uint64_t cached_dev_sectors;
atomic_long_t flash_dev_dirty_sectors;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 91206f329971..86a977c2a176 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1105,6 +1105,44 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
generic_make_request(bio);
}
+static void quit_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *this_dc)
+{
+ int i;
+ struct bcache_device *d;
+ struct cached_dev *dc;
+
+ /*
+ * mutex bch_register_lock may compete with other parallel requesters,
+ * or attach/detach operations on other backing device. Waiting to
+ * the mutex lock may increase I/O request latency for seconds or more.
+ * To avoid such situation, if mutext_trylock() failed, only writeback
+ * rate of current cached device is set to 1, and __update_write_back()
+ * will decide writeback rate of other cached devices (remember now
+ * c->idle_counter is 0 already).
+ */
+ if (mutex_trylock(&bch_register_lock)) {
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ /*
+ * set writeback rate to default minimum value,
+ * then let update_writeback_rate() to decide the
+ * upcoming rate.
+ */
+ atomic_long_set(&dc->writeback_rate.rate, 1);
+ }
+ mutex_unlock(&bch_register_lock);
+ } else
+ atomic_long_set(&this_dc->writeback_rate.rate, 1);
+}
+
/* Cached devices - read & write stuff */
static blk_qc_t cached_dev_make_request(struct request_queue *q,
@@ -1122,7 +1160,21 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
- atomic_set(&dc->backing_idle, 0);
+ if (likely(d->c)) {
+ if (atomic_read(&d->c->idle_counter))
+ atomic_set(&d->c->idle_counter, 0);
+ /*
+ * If at_max_writeback_rate of cache set is true and new I/O
+ * comes, quit max writeback rate of all cached devices
+ * attached to this cache set, and set at_max_writeback_rate
+ * to false.
+ */
+ if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
+ atomic_set(&d->c->at_max_writeback_rate, 0);
+ quit_max_writeback_rate(d->c, dc);
+ }
+ }
+
generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
bio_set_dev(bio, dc->bdev);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f517d7d1fa10..32b95f3b9461 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -696,6 +696,8 @@ static void bcache_device_detach(struct bcache_device *d)
{
lockdep_assert_held(&bch_register_lock);
+ atomic_dec(&d->c->attached_dev_nr);
+
if (test_bit(BCACHE_DEV_DETACHING, &d->flags)) {
struct uuid_entry *u = d->c->uuids + d->id;
@@ -1144,6 +1146,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
bch_cached_dev_run(dc);
bcache_device_link(&dc->disk, c, "bdev");
+ atomic_inc(&c->attached_dev_nr);
/* Allow the writeback thread to proceed */
up_write(&dc->writeback_lock);
@@ -1696,6 +1699,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->block_bits = ilog2(sb->block_size);
c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
c->devices_max_used = 0;
+ atomic_set(&c->attached_dev_nr, 0);
c->btree_pages = bucket_pages(c);
if (c->btree_pages > BTREE_MAX_PAGES)
c->btree_pages = max_t(int, c->btree_pages / 4,
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 3e9d3459a224..6e88142514fb 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -171,7 +171,8 @@ SHOW(__bch_cached_dev)
var_printf(writeback_running, "%i");
var_print(writeback_delay);
var_print(writeback_percent);
- sysfs_hprint(writeback_rate, wb ? dc->writeback_rate.rate << 9 : 0);
+ sysfs_hprint(writeback_rate,
+ wb ? atomic_long_read(&dc->writeback_rate.rate) << 9 : 0);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable);
@@ -193,7 +194,9 @@ SHOW(__bch_cached_dev)
* Except for dirty and target, other values should
* be 0 if writeback is not running.
*/
- bch_hprint(rate, wb ? dc->writeback_rate.rate << 9 : 0);
+ bch_hprint(rate,
+ wb ? atomic_long_read(&dc->writeback_rate.rate) << 9
+ : 0);
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,
@@ -261,8 +264,12 @@ STORE(__cached_dev)
sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40);
- sysfs_strtoul_clamp(writeback_rate,
- dc->writeback_rate.rate, 1, INT_MAX);
+ if (attr == &sysfs_writeback_rate) {
+ int v;
+
+ sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
+ atomic_long_set(&dc->writeback_rate.rate, v);
+ }
sysfs_strtoul_clamp(writeback_rate_update_seconds,
dc->writeback_rate_update_seconds,
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index f912c372978c..c6a99dfa1ad9 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
{
uint64_t now = local_clock();
- d->next += div_u64(done * NSEC_PER_SEC, d->rate);
+ d->next += div_u64(done * NSEC_PER_SEC, atomic_long_read(&d->rate));
/* Bound the time. Don't let us fall further than 2 seconds behind
* (this prevents unnecessary backlog that would make it impossible
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index a1579e28049f..5ff055f0a653 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -443,7 +443,7 @@ struct bch_ratelimit {
* Rate at which we want to do work, in units per second
* The units here correspond to the units passed to bch_next_delay()
*/
- uint32_t rate;
+ atomic_long_t rate;
};
static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 912e969fedba..481d4cf38ac0 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -104,11 +104,56 @@ static void __update_writeback_rate(struct cached_dev *dc)
dc->writeback_rate_proportional = proportional_scaled;
dc->writeback_rate_integral_scaled = integral_scaled;
- dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
- dc->writeback_rate.rate = new_rate;
+ dc->writeback_rate_change = new_rate -
+ atomic_long_read(&dc->writeback_rate.rate);
+ atomic_long_set(&dc->writeback_rate.rate, new_rate);
dc->writeback_rate_target = target;
}
+static bool set_at_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *dc)
+{
+ /*
+ * Idle_counter is increased everytime when update_writeback_rate() is
+ * called. If all backing devices attached to the same cache set have
+ * identical dc->writeback_rate_update_seconds values, it is about 6
+ * rounds of update_writeback_rate() on each backing device before
+ * c->at_max_writeback_rate is set to 1, and then max wrteback rate set
+ * to each dc->writeback_rate.rate.
+ * In order to avoid extra locking cost for counting exact dirty cached
+ * devices number, c->attached_dev_nr is used to calculate the idle
+ * throushold. It might be bigger if not all cached device are in write-
+ * back mode, but it still works well with limited extra rounds of
+ * update_writeback_rate().
+ */
+ if (atomic_inc_return(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6)
+ return false;
+
+ if (atomic_read(&c->at_max_writeback_rate) != 1)
+ atomic_set(&c->at_max_writeback_rate, 1);
+
+ atomic_long_set(&dc->writeback_rate.rate, INT_MAX);
+
+ /* keep writeback_rate_target as existing value */
+ dc->writeback_rate_proportional = 0;
+ dc->writeback_rate_integral_scaled = 0;
+ dc->writeback_rate_change = 0;
+
+ /*
+ * Check c->idle_counter and c->at_max_writeback_rate agagain in case
+ * new I/O arrives during before set_at_max_writeback_rate() returns.
+ * Then the writeback rate is set to 1, and its new value should be
+ * decided via __update_writeback_rate().
+ */
+ if ((atomic_read(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6) ||
+ !atomic_read(&c->at_max_writeback_rate))
+ return false;
+
+ return true;
+}
+
static void update_writeback_rate(struct work_struct *work)
{
struct cached_dev *dc = container_of(to_delayed_work(work),
@@ -136,13 +181,20 @@ static void update_writeback_rate(struct work_struct *work)
return;
}
- down_read(&dc->writeback_lock);
-
- if (atomic_read(&dc->has_dirty) &&
- dc->writeback_percent)
- __update_writeback_rate(dc);
+ if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+ /*
+ * If the whole cache set is idle, set_at_max_writeback_rate()
+ * will set writeback rate to a max number. Then it is
+ * unncessary to update writeback rate for an idle cache set
+ * in maximum writeback rate number(s).
+ */
+ if (!set_at_max_writeback_rate(c, dc)) {
+ down_read(&dc->writeback_lock);
+ __update_writeback_rate(dc);
+ up_read(&dc->writeback_lock);
+ }
+ }
- up_read(&dc->writeback_lock);
/*
* CACHE_SET_IO_DISABLE might be set via sysfs interface,
@@ -422,27 +474,6 @@ static void read_dirty(struct cached_dev *dc)
delay = writeback_delay(dc, size);
- /* If the control system would wait for at least half a
- * second, and there's been no reqs hitting the backing disk
- * for awhile: use an alternate mode where we have at most
- * one contiguous set of writebacks in flight at a time. If
- * someone wants to do IO it will be quick, as it will only
- * have to contend with one operation in flight, and we'll
- * be round-tripping data to the backing disk as quickly as
- * it can accept it.
- */
- if (delay >= HZ / 2) {
- /* 3 means at least 1.5 seconds, up to 7.5 if we
- * have slowed way down.
- */
- if (atomic_inc_return(&dc->backing_idle) >= 3) {
- /* Wait for current I/Os to finish */
- closure_sync(&cl);
- /* And immediately launch a new set. */
- delay = 0;
- }
- }
-
while (!kthread_should_stop() &&
!test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
delay) {
@@ -741,7 +772,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_running = true;
dc->writeback_percent = 10;
dc->writeback_delay = 30;
- dc->writeback_rate.rate = 1024;
+ atomic_long_set(&dc->writeback_rate.rate, 1024);
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
--
2.17.1
There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.
creator other
vsnprintf:
fill (not terminated)
count the rest trace_sched_waking(p):
... memcpy(comm, p->comm, TASK_COMM_LEN)
write \0
The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
0xffffffd5b3818640: "irq/497-pwr_evenkworker/u16:12"
...and a strcpy out of there would cause stack corruption:
[224761.522292] Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: ffffff9bf9783c78
crash-arm64> kbt | grep 'comm\|trace_print_context'
#6 0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
comm (char [16]) = "irq/497-pwr_even"
crash-arm64> rd 0xffffffd4d0e17d14 8
ffffffd4d0e17d14: 2f71726900000000 5f7277702d373934 ....irq/497-pwr_
ffffffd4d0e17d24: 726f776b6e657665 3a3631752f72656b evenkworker/u16:
ffffffd4d0e17d34: f9780248ff003231 cede60e0ffffff9b 12..H.x......`..
ffffffd4d0e17d44: cede60c8ffffffd4 00000fffffffffd4 .....`..........
The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.
Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
Cc: stable(a)vger.kernel.org
Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Signed-off-by: Snild Dolkow <snild(a)sony.com>
---
kernel/kthread.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 481951bf091d..1a481ae12dec 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -319,8 +319,14 @@ struct task_struct *__kthread_create_on_node(int (*threadfn)(void *data),
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
+ char name[TASK_COMM_LEN];
- vsnprintf(task->comm, sizeof(task->comm), namefmt, args);
+ /*
+ * task is already visible to other tasks, so updating
+ * COMM must be protected.
+ */
+ vsnprintf(name, sizeof(name), namefmt, args);
+ set_task_comm(task, name);
/*
* root may have changed our (kthreadd's) priority or CPU mask.
* The kernel thread should not inherit these properties.
--
2.15.1
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce performance regression because multiple faster
writeback threads of the idle bcache devices will compete the btree level
locks with the bcache device who have I/O requests coming.
This patch fixes the above issue by only permitting fast writebac when
all bcache devices attached on the cache set are idle. And if one of the
bcache devices has new I/O request coming, minimized all writeback
throughput immediately and let PI controller __update_writeback_rate()
to decide the upcoming writeback rate for each bcache device.
Also when all bcache devices are idle, limited wrieback rate to a small
number is wast of thoughput, especially when backing devices are slower
non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
rate for each backing device if the whole cache set is idle. A faster
writeback rate in idle time means new I/Os may have more available space
for dirty data, and people may observe a better write performance then.
Please note bcache may change its cache mode in run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.
Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable(a)vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli(a)suse.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Cc: Michael Lyle <mlyle(a)lyle.org>
Cc: Stefan Priebe <s.priebe(a)profihost.ag>
---
Channgelog:
v3, Do not acquire bch_register_lock in set_at_max_writeback_rate().
v2, Fix a deadlock reported by Stefan Priebe.
v1, Initial version.
drivers/md/bcache/bcache.h | 10 ++--
drivers/md/bcache/request.c | 54 ++++++++++++++++++++-
drivers/md/bcache/super.c | 4 ++
drivers/md/bcache/sysfs.c | 14 ++++--
drivers/md/bcache/util.c | 2 +-
drivers/md/bcache/util.h | 2 +-
drivers/md/bcache/writeback.c | 91 +++++++++++++++++++++++------------
7 files changed, 133 insertions(+), 44 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 872ef4d67711..13f908be42ba 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -328,13 +328,6 @@ struct cached_dev {
*/
atomic_t has_dirty;
- /*
- * Set to zero by things that touch the backing volume-- except
- * writeback. Incremented by writeback. Used to determine when to
- * accelerate idle writeback.
- */
- atomic_t backing_idle;
-
struct bch_ratelimit writeback_rate;
struct delayed_work writeback_rate_update;
@@ -515,6 +508,8 @@ struct cache_set {
struct cache_accounting accounting;
unsigned long flags;
+ atomic_t idle_counter;
+ atomic_t at_max_writeback_rate;
struct cache_sb sb;
@@ -524,6 +519,7 @@ struct cache_set {
struct bcache_device **devices;
unsigned devices_max_used;
+ atomic_t attached_dev_nr;
struct list_head cached_devs;
uint64_t cached_dev_sectors;
atomic_long_t flash_dev_dirty_sectors;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 8eece9ef9f46..26f97acde403 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1105,6 +1105,44 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
generic_make_request(bio);
}
+static void quit_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *this_dc)
+{
+ int i;
+ struct bcache_device *d;
+ struct cached_dev *dc;
+
+ /*
+ * mutex bch_register_lock may compete with other parallel requesters,
+ * or attach/detach operations on other backing device. Waiting to
+ * the mutex lock may increase I/O request latency for seconds or more.
+ * To avoid such situation, if mutext_trylock() failed, only writeback
+ * rate of current cached device is set to 1, and __update_write_back()
+ * will decide writeback rate of other cached devices (remember now
+ * c->idle_counter is 0 already).
+ */
+ if (mutex_trylock(&bch_register_lock)) {
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ /*
+ * set writeback rate to default minimum value,
+ * then let update_writeback_rate() to decide the
+ * upcoming rate.
+ */
+ atomic_long_set(&dc->writeback_rate.rate, 1);
+ }
+ mutex_unlock(&bch_register_lock);
+ } else
+ atomic_long_set(&this_dc->writeback_rate.rate, 1);
+}
+
/* Cached devices - read & write stuff */
static blk_qc_t cached_dev_make_request(struct request_queue *q,
@@ -1122,7 +1160,21 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
- atomic_set(&dc->backing_idle, 0);
+ if (likely(d->c)) {
+ if (atomic_read(&d->c->idle_counter))
+ atomic_set(&d->c->idle_counter, 0);
+ /*
+ * If at_max_writeback_rate of cache set is true and new I/O
+ * comes, quit max writeback rate of all cached devices
+ * attached to this cache set, and set at_max_writeback_rate
+ * to false.
+ */
+ if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
+ atomic_set(&d->c->at_max_writeback_rate, 0);
+ quit_max_writeback_rate(d->c, dc);
+ }
+ }
+
generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
bio_set_dev(bio, dc->bdev);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e0a92104ca23..8db6696e2bff 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -696,6 +696,8 @@ static void bcache_device_detach(struct bcache_device *d)
{
lockdep_assert_held(&bch_register_lock);
+ atomic_dec(&d->c->attached_dev_nr);
+
if (test_bit(BCACHE_DEV_DETACHING, &d->flags)) {
struct uuid_entry *u = d->c->uuids + d->id;
@@ -1144,6 +1146,7 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c,
bch_cached_dev_run(dc);
bcache_device_link(&dc->disk, c, "bdev");
+ atomic_inc(&c->attached_dev_nr);
/* Allow the writeback thread to proceed */
up_write(&dc->writeback_lock);
@@ -1695,6 +1698,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->block_bits = ilog2(sb->block_size);
c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
c->devices_max_used = 0;
+ atomic_set(&c->attached_dev_nr, 0);
c->btree_pages = bucket_pages(c);
if (c->btree_pages > BTREE_MAX_PAGES)
c->btree_pages = max_t(int, c->btree_pages / 4,
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 225b15aa0340..a56067e80b10 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -170,7 +170,8 @@ SHOW(__bch_cached_dev)
var_printf(writeback_running, "%i");
var_print(writeback_delay);
var_print(writeback_percent);
- sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
+ sysfs_hprint(writeback_rate,
+ atomic_long_read(&dc->writeback_rate.rate) << 9);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable);
@@ -188,7 +189,8 @@ SHOW(__bch_cached_dev)
char change[20];
s64 next_io;
- bch_hprint(rate, dc->writeback_rate.rate << 9);
+ bch_hprint(rate,
+ atomic_long_read(&dc->writeback_rate.rate) << 9);
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,dc->writeback_rate_proportional << 9);
@@ -255,8 +257,12 @@ STORE(__cached_dev)
sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40);
- sysfs_strtoul_clamp(writeback_rate,
- dc->writeback_rate.rate, 1, INT_MAX);
+ if (attr == &sysfs_writeback_rate) {
+ int v;
+
+ sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
+ atomic_long_set(&dc->writeback_rate.rate, v);
+ }
sysfs_strtoul_clamp(writeback_rate_update_seconds,
dc->writeback_rate_update_seconds,
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index f912c372978c..c6a99dfa1ad9 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
{
uint64_t now = local_clock();
- d->next += div_u64(done * NSEC_PER_SEC, d->rate);
+ d->next += div_u64(done * NSEC_PER_SEC, atomic_long_read(&d->rate));
/* Bound the time. Don't let us fall further than 2 seconds behind
* (this prevents unnecessary backlog that would make it impossible
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index a1579e28049f..5ff055f0a653 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -443,7 +443,7 @@ struct bch_ratelimit {
* Rate at which we want to do work, in units per second
* The units here correspond to the units passed to bch_next_delay()
*/
- uint32_t rate;
+ atomic_long_t rate;
};
static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 912e969fedba..907fa6c0d192 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -104,11 +104,56 @@ static void __update_writeback_rate(struct cached_dev *dc)
dc->writeback_rate_proportional = proportional_scaled;
dc->writeback_rate_integral_scaled = integral_scaled;
- dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
- dc->writeback_rate.rate = new_rate;
+ dc->writeback_rate_change = new_rate -
+ atomic_long_read(&dc->writeback_rate.rate);
+ atomic_long_set(&dc->writeback_rate.rate, new_rate);
dc->writeback_rate_target = target;
}
+static bool set_at_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *dc)
+{
+ /*
+ * Idle_counter is increased everytime when update_writeback_rate() is
+ * called. If all backing devices attached to the same cache set have
+ * identical dc->writeback_rate_update_seconds values, it is about 6
+ * rounds of update_writeback_rate() on each backing device before
+ * c->at_max_writeback_rate is set to 1, and then max wrteback rate set
+ * to each dc->writeback_rate.rate.
+ * In order to avoid extra locking cost for counting exact dirty cached
+ * devices number, c->attached_dev_nr is used to calculate the idle
+ * throushold. It might be bigger if not all cached device are in write-
+ * back mode, but it still works well with limited extra rounds of
+ * update_writeback_rate().
+ */
+ if (atomic_inc_return(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6)
+ return false;
+
+ if (atomic_read(&c->at_max_writeback_rate) != 1)
+ atomic_set(&c->at_max_writeback_rate, 1);
+
+ atomic_long_set(&dc->writeback_rate.rate, INT_MAX);
+
+ /* keep writeback_rate_target as existing value */
+ dc->writeback_rate_proportional = 0;
+ dc->writeback_rate_integral_scaled = 0;
+ dc->writeback_rate_change = 0;
+
+ /*
+ * Check c->idle_counter and c->at_max_writeback_rate agagain in case
+ * new I/O arrives during before set_at_max_writeback_rate() returns.
+ * Then the writeback rate is set to 1, and its new value should be
+ * decided via __update_writeback_rate().
+ */
+ if ((atomic_read(&c->idle_counter) <
+ atomic_read(&c->attached_dev_nr) * 6) ||
+ !atomic_read(&c->at_max_writeback_rate))
+ return false;
+
+ return true;
+}
+
static void update_writeback_rate(struct work_struct *work)
{
struct cached_dev *dc = container_of(to_delayed_work(work),
@@ -136,13 +181,20 @@ static void update_writeback_rate(struct work_struct *work)
return;
}
- down_read(&dc->writeback_lock);
-
- if (atomic_read(&dc->has_dirty) &&
- dc->writeback_percent)
- __update_writeback_rate(dc);
+ if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+ /*
+ * If the whole cache set is idle, set_at_max_writeback_rate()
+ * will set writeback rate to a max number. Then it is
+ * unncessary to update writeback rate for an idle cache set
+ * in maximum writeback rate number(s).
+ */
+ if (!set_at_max_writeback_rate(c, dc)) {
+ down_read(&dc->writeback_lock);
+ __update_writeback_rate(dc);
+ up_read(&dc->writeback_lock);
+ }
+ }
- up_read(&dc->writeback_lock);
/*
* CACHE_SET_IO_DISABLE might be set via sysfs interface,
@@ -422,27 +474,6 @@ static void read_dirty(struct cached_dev *dc)
delay = writeback_delay(dc, size);
- /* If the control system would wait for at least half a
- * second, and there's been no reqs hitting the backing disk
- * for awhile: use an alternate mode where we have at most
- * one contiguous set of writebacks in flight at a time. If
- * someone wants to do IO it will be quick, as it will only
- * have to contend with one operation in flight, and we'll
- * be round-tripping data to the backing disk as quickly as
- * it can accept it.
- */
- if (delay >= HZ / 2) {
- /* 3 means at least 1.5 seconds, up to 7.5 if we
- * have slowed way down.
- */
- if (atomic_inc_return(&dc->backing_idle) >= 3) {
- /* Wait for current I/Os to finish */
- closure_sync(&cl);
- /* And immediately launch a new set. */
- delay = 0;
- }
- }
-
while (!kthread_should_stop() &&
!test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
delay) {
@@ -741,7 +772,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_running = true;
dc->writeback_percent = 10;
dc->writeback_delay = 30;
- dc->writeback_rate.rate = 1024;
+ atomic_long_set(&dc->writeback_rate.rate, 1024);
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
--
2.17.1
Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you
they are only accepted under special conditions. This makes the API a pain to
use.
To avoid this pain, this patch makes it so that the result of the get-list
ioctl can always be used for host-initiated get and set. Since we don't have
a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
ignored when written. Arguably they should not even be in the result of
GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the
outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
Hyper-V feature.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/hyperv.c | 27 ++++++++++++++++++++-------
arch/x86/kvm/hyperv.h | 2 +-
arch/x86/kvm/x86.c | 15 +++++++++------
3 files changed, 30 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index af8caf965baa..01d209ab5481 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -235,7 +235,7 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
int ret;
- if (!synic->active)
+ if (!synic->active && !host)
return 1;
trace_kvm_hv_synic_set_msr(vcpu->vcpu_id, msr, data, host);
@@ -295,11 +295,12 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
return ret;
}
-static int synic_get_msr(struct kvm_vcpu_hv_synic *synic, u32 msr, u64 *pdata)
+static int synic_get_msr(struct kvm_vcpu_hv_synic *synic, u32 msr, u64 *pdata,
+ bool host)
{
int ret;
- if (!synic->active)
+ if (!synic->active && !host)
return 1;
ret = 0;
@@ -1014,6 +1015,11 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
case HV_X64_MSR_TSC_EMULATION_STATUS:
hv->hv_tsc_emulation_status = data;
break;
+ case HV_X64_MSR_TIME_REF_COUNT:
+ /* read-only, but still ignore it if host-initiated */
+ if (!host)
+ return 1;
+ break;
default:
vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
@@ -1101,6 +1107,12 @@ static int kvm_hv_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return stimer_set_count(vcpu_to_stimer(vcpu, timer_index),
data, host);
}
+ case HV_X64_MSR_TSC_FREQUENCY:
+ case HV_X64_MSR_APIC_FREQUENCY:
+ /* read-only, but still ignore it if host-initiated */
+ if (!host)
+ return 1;
+ break;
default:
vcpu_unimpl(vcpu, "Hyper-V uhandled wrmsr: 0x%x data 0x%llx\n",
msr, data);
@@ -1156,7 +1168,8 @@ static int kvm_hv_get_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return 0;
}
-static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
+static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
+ bool host)
{
u64 data = 0;
struct kvm_vcpu_hv *hv = &vcpu->arch.hyperv;
@@ -1183,7 +1196,7 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case HV_X64_MSR_SIMP:
case HV_X64_MSR_EOM:
case HV_X64_MSR_SINT0 ... HV_X64_MSR_SINT15:
- return synic_get_msr(vcpu_to_synic(vcpu), msr, pdata);
+ return synic_get_msr(vcpu_to_synic(vcpu), msr, pdata, host);
case HV_X64_MSR_STIMER0_CONFIG:
case HV_X64_MSR_STIMER1_CONFIG:
case HV_X64_MSR_STIMER2_CONFIG:
@@ -1229,7 +1242,7 @@ int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host)
return kvm_hv_set_msr(vcpu, msr, data, host);
}
-int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
+int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
{
if (kvm_hv_msr_partition_wide(msr)) {
int r;
@@ -1239,7 +1252,7 @@ int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
mutex_unlock(&vcpu->kvm->arch.hyperv.hv_lock);
return r;
} else
- return kvm_hv_get_msr(vcpu, msr, pdata);
+ return kvm_hv_get_msr(vcpu, msr, pdata, host);
}
static __always_inline int get_sparse_bank_no(u64 valid_bank_mask, int bank_no)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 837465d69c6d..d6aa969e20f1 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -48,7 +48,7 @@ static inline struct kvm_vcpu *synic_to_vcpu(struct kvm_vcpu_hv_synic *synic)
}
int kvm_hv_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool host);
-int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+int kvm_hv_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host);
bool kvm_hv_hypercall_enabled(struct kvm *kvm);
int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 153564db7980..f2876053e28b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2166,10 +2166,11 @@ static int set_msr_mce(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.mcg_status = data;
break;
case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P))
+ if (!(mcg_cap & MCG_CTL_P) &&
+ (data || !msr_info->host_initiated))
return 1;
if (data != 0 && data != ~(u64)0)
- return -1;
+ return 1;
vcpu->arch.mcg_ctl = data;
break;
default:
@@ -2557,7 +2558,7 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
}
EXPORT_SYMBOL_GPL(kvm_get_msr);
-static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
+static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
{
u64 data;
u64 mcg_cap = vcpu->arch.mcg_cap;
@@ -2572,7 +2573,7 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = vcpu->arch.mcg_cap;
break;
case MSR_IA32_MCG_CTL:
- if (!(mcg_cap & MCG_CTL_P))
+ if (!(mcg_cap & MCG_CTL_P) && !host)
return 1;
data = vcpu->arch.mcg_ctl;
break;
@@ -2705,7 +2706,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr_info->index, &msr_info->data);
+ return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
case MSR_K7_CLK_CTL:
/*
* Provide expected ramp-up count for K7. All other
@@ -2726,7 +2728,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case HV_X64_MSR_TSC_EMULATION_CONTROL:
case HV_X64_MSR_TSC_EMULATION_STATUS:
return kvm_hv_get_msr_common(vcpu,
- msr_info->index, &msr_info->data);
+ msr_info->index, &msr_info->data,
+ msr_info->host_initiated);
break;
case MSR_IA32_BBL_CR_CTL3:
/* This legacy MSR exists but isn't fully documented in current
--
2.17.1
This is a note to let you know that I've just added the patch titled
iio: ad9523: Fix displayed phase
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 5a4e33c1c53ae7d4425f7d94e60e4458a37b349e Mon Sep 17 00:00:00 2001
From: Lars-Peter Clausen <lars(a)metafoo.de>
Date: Mon, 25 Jun 2018 11:03:07 +0300
Subject: iio: ad9523: Fix displayed phase
Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.
Signed-off-by: Lars-Peter Clausen <lars(a)metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean(a)analog.com>
Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/frequency/ad9523.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/frequency/ad9523.c b/drivers/iio/frequency/ad9523.c
index 48ea46a1bc38..37504739c277 100644
--- a/drivers/iio/frequency/ad9523.c
+++ b/drivers/iio/frequency/ad9523.c
@@ -653,7 +653,7 @@ static int ad9523_read_raw(struct iio_dev *indio_dev,
code = (AD9523_CLK_DIST_DIV_PHASE_REV(ret) * 3141592) /
AD9523_CLK_DIST_DIV_REV(ret);
*val = code / 1000000;
- *val2 = (code % 1000000) * 10;
+ *val2 = code % 1000000;
return IIO_VAL_INT_PLUS_MICRO;
default:
return -EINVAL;
--
2.18.0
This is a note to let you know that I've just added the patch titled
iio: sca3000: Fix missing return in switch
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From c5b974bee9d2ceae4c441ae5a01e498c2674e100 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
Date: Sat, 7 Jul 2018 12:44:01 -0500
Subject: iio: sca3000: Fix missing return in switch
The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
return and will fall through to the default case and errorenously
return -EINVAL.
Fix this by adding in missing *return ret*.
Fixes: 626f971b5b07 ("staging:iio:accel:sca3000 Add write support to the low pass filter control")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/sca3000.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/iio/accel/sca3000.c b/drivers/iio/accel/sca3000.c
index 4dceb75e3586..4964561595f5 100644
--- a/drivers/iio/accel/sca3000.c
+++ b/drivers/iio/accel/sca3000.c
@@ -797,6 +797,7 @@ static int sca3000_write_raw(struct iio_dev *indio_dev,
mutex_lock(&st->lock);
ret = sca3000_write_3db_freq(st, val);
mutex_unlock(&st->lock);
+ return ret;
default:
return -EINVAL;
}
--
2.18.0
Tree/Branch: v4.4.144
Git describe: v4.4.144
Commit: 762b585c49 Linux 4.4.144
Build Time: 0 min 5 sec
Passed: 7 / 7 (100.00 %)
Failed: 0 / 7 ( 0.00 %)
Errors: 0
Warnings: 17
Section Mismatches: 0
-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
17 warnings 0 mismatches : x86_64-allmodconfig
-------------------------------------------------------------------------------
Warnings Summary: 17
1 ../drivers/net/ethernet/rocker/rocker.c:2172:1: warning: the frame size of 2752 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:4759:1: warning: the frame size of 2056 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:4565:1: warning: the frame size of 2096 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:4250:1: warning: the frame size of 4832 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:3436:1: warning: the frame size of 5280 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:3095:1: warning: the frame size of 5864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:2513:1: warning: the frame size of 2304 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:2141:1: warning: the frame size of 2104 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:2073:1: warning: the frame size of 2552 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:1956:1: warning: the frame size of 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:1858:1: warning: the frame size of 3008 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:1599:1: warning: the frame size of 5296 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:1211:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv090x.c:1168:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/stv0367.c:3147:1: warning: the frame size of 4144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/cxd2841er.c:2401:1: warning: the frame size of 2984 bytes is larger than 2048 bytes [-Wframe-larger-than=]
1 ../drivers/media/dvb-frontends/cxd2841er.c:2282:1: warning: the frame size of 4328 bytes is larger than 2048 bytes [-Wframe-larger-than=]
===============================================================================
Detailed per-defconfig build reports below:
-------------------------------------------------------------------------------
x86_64-allmodconfig : PASS, 0 errors, 17 warnings, 0 section mismatches
Warnings:
../drivers/media/dvb-frontends/stv090x.c:1858:1: warning: the frame size of 3008 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:2141:1: warning: the frame size of 2104 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:2513:1: warning: the frame size of 2304 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:4565:1: warning: the frame size of 2096 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:1956:1: warning: the frame size of 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:1599:1: warning: the frame size of 5296 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:1211:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:4250:1: warning: the frame size of 4832 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:4759:1: warning: the frame size of 2056 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:1168:1: warning: the frame size of 2080 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:2073:1: warning: the frame size of 2552 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:3095:1: warning: the frame size of 5864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv090x.c:3436:1: warning: the frame size of 5280 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/stv0367.c:3147:1: warning: the frame size of 4144 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/cxd2841er.c:2401:1: warning: the frame size of 2984 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/media/dvb-frontends/cxd2841er.c:2282:1: warning: the frame size of 4328 bytes is larger than 2048 bytes [-Wframe-larger-than=]
../drivers/net/ethernet/rocker/rocker.c:2172:1: warning: the frame size of 2752 bytes is larger than 2048 bytes [-Wframe-larger-than=]
-------------------------------------------------------------------------------
Passed with no errors, warnings or mismatches:
arm-multi_v5_defconfig
arm-multi_v7_defconfig
x86_64-defconfig
arm-allmodconfig
arm-allnoconfig
x86_64-allnoconfig
When driver is built as module and DT node contains clocks compatible
(e.g. "samsung,s2mps11-clk"), the module will not be autoloaded because
module aliases won't match.
The modalias from uevent: of:NclocksT<NULL>Csamsung,s2mps11-clk
The modalias from driver: platform:s2mps11-clk
The devices are instantiated by parent's MFD. However both Device Tree
bindings and parent define the compatible for clocks devices. In case
of module matching this DT compatible will be used.
The issue will not happen if this is built-in (no need for module
matching) or when clocks DT node does not contain compatible (not
correct from bindings perspective but working for driver).
Note when backporting to stable kernels: adjust the list of device ID
entries.
Cc: <stable(a)vger.kernel.org>
Fixes: 53c31b3437a6 ("mfd: sec-core: Add of_compatible strings for clock MFD cells")
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/clk/clk-s2mps11.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/clk/clk-s2mps11.c b/drivers/clk/clk-s2mps11.c
index d44e0eea31ec..11a1e83ff805 100644
--- a/drivers/clk/clk-s2mps11.c
+++ b/drivers/clk/clk-s2mps11.c
@@ -245,6 +245,36 @@ static const struct platform_device_id s2mps11_clk_id[] = {
};
MODULE_DEVICE_TABLE(platform, s2mps11_clk_id);
+#ifdef CONFIG_OF
+/*
+ * Device is instantiated through parent MFD device and device matching is done
+ * through platform_device_id.
+ *
+ * However if device's DT node contains proper clock compatible and it is built
+ * as a module, then the module matching will be done trough DT aliases. This
+ * requires of_device_id table. In the same time this will not change the
+ * actual device matching so do not add .of_match_table.
+ */
+static const struct of_device_id s2mps11_dt_match[] = {
+ {
+ .compatible = "samsung,s2mps11-clk",
+ .data = (void *)S2MPS11X,
+ }, {
+ .compatible = "samsung,s2mps13-clk",
+ .data = (void *)S2MPS13X,
+ }, {
+ .compatible = "samsung,s2mps14-clk",
+ .data = (void *)S2MPS14X,
+ }, {
+ .compatible = "samsung,s5m8767-clk",
+ .data = (void *)S5M8767X,
+ }, {
+ /* Sentinel */
+ },
+};
+MODULE_DEVICE_TABLE(of, s2mps11_dt_match);
+#endif
+
static struct platform_driver s2mps11_clk_driver = {
.driver = {
.name = "s2mps11-clk",
--
2.14.1
Hi Ingo,
Please consider pulling, I'm now investigating why these failed:
38: LLVM search and compile :
38.1: Basic BPF llvm compile : Ok
38.2: kbuild searching : Ok
38.3: Compile source for BPF prologue generation : Ok
38.4: Compile source for BPF relocation : FAILED!
40: BPF filter :
40.1: Basic BPF filtering : Ok
40.2: BPF pinning : Ok
40.3: BPF prologue generation : Ok
40.4: BPF relocation checker : FAILED!
I think these failures are not related to changes in this patch
kit. Details about the test environment, versions, etc.
Regards,
- Arnaldo
Test results at the end of this message, as usual.
The following changes since commit 1d59d16e9b4d5be80c9786a8b129c0f2af0e9522:
Merge remote-tracking branch 'tip/perf/urgent' into perf/core (2018-07-24 14:34:32 -0300)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.19-20180725
for you to fetch changes up to 9ef0112442bdddef5fb55adf20b3a5464b33de75:
perf test: Fix subtest number when showing results (2018-07-24 14:55:51 -0300)
----------------------------------------------------------------
perf/cores fixes and improvements:
Tools:
top:
- Fix 'struct comm_str' removal crash race, detected with refcount_t
debugging (Jiri Olsa)
- Use last_match threads cache only in single threaded mode, fixing
a crash (Jiri Olsa)
record:
- Synthesize GROUP_DESC feature in pipe mode fixing display of
event groups (Jiri Olsa)
stat:
- Get rid of extra clock display function (Jiri Olsa)
perf script:
- Show correct offsets for DWARF-based unwinding (Sandipan Das)
test:
- Check that complex event name is parsed correctly (Alexey Budankov)
- Fix subtest number when showing results (Thomas Richter)
Arch specific:
arm64:
- Generate syscall table from the kernel sources (asm/unistd.h) like
other arches do, speeding up the support for new system calls in
tools such as 'perf trace' (Kim Phillips)
arm:
- Bail out immediatelly on CoreSight hardware tracing instruction sample failure (Leo Yan)
PowerPC:
- Fix record+probe_libc_inet_pton.sh 'perf test' entry (Sandipan Das)
- Callchain IP filtering fixes (Sandipan Das)
S/390:
- Add support for detailed S/390 PMU event description in 'perf list' (Thomas Richter)
- Add transaction flag (-T) support in 'perf stat' for S/390 (Thomas Richter)
- Fix 'perf kvm' S/390 subcommands (Thomas Richter)
Infrastructure:
hists:
- Clarify callchain disabling when available (Arnaldo Carvalho de Melo)
evsel:
- Use perf_evsel__match instead of open coded equivalent (Jiri Olsa)
Documentation:
- Add missing documentation for 'perf list' --desc and --debug options (Sangwon Hong)
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
----------------------------------------------------------------
Alexey Budankov (1):
perf tests: Check that complex event name is parsed correctly
Arnaldo Carvalho de Melo (1):
perf hists: Clarify callchain disabling when available
Jiri Olsa (7):
perf tools: Synthesize GROUP_DESC feature in pipe mode
perf machine: Add threads__get_last_match function
perf machine: Add threads__set_last_match function
perf machine: Use last_match threads cache only in single thread mode
perf tools: Fix struct comm_str removal crash
perf tools: Use perf_evsel__match instead of open coded equivalent
perf stat: Get rid of extra clock display function
Kim Phillips (3):
tools include: Grab copies of arm64 dependent unistd.h files
perf arm64: Generate system call table from asm/unistd.h
perf trace arm64: Use generated syscall table
Leo Yan (2):
perf cs-etm: Introduce invalid address macro
perf cs-etm: Bail out immediately for instruction sample failure
Sandipan Das (6):
perf powerpc: Fix callchain ip filtering
perf powerpc: Fix callchain ip filtering when return address is in a register
perf tests: Fix record+probe_libc_inet_pton.sh for powerpc64
perf tests: Fix record+probe_libc_inet_pton.sh to ensure cleanups
perf tests: Fix record+probe_libc_inet_pton.sh when event exists
perf script: Show correct offsets for DWARF-based unwinding
Sangwon Hong (1):
perf list: Add missing documentation for --desc and --debug options
Thomas Richter (6):
Revert "perf list: Add s390 support for detailed/verbose PMU event description"
perf list: Add s390 support for detailed PMU event description
perf json: Add s390 transaction counter definition
perf stat: Add transaction flag (-T) support for s390
perf kvm: Fix subcommands on s390
perf test: Fix subtest number when showing results
tools/arch/arm64/include/uapi/asm/unistd.h | 20 +
tools/include/uapi/asm-generic/unistd.h | 783 +++++++++++++++++++++
tools/perf/Documentation/perf-list.txt | 8 +-
tools/perf/Makefile.config | 2 +
tools/perf/arch/arm64/Makefile | 21 +
tools/perf/arch/arm64/entry/syscalls/mksyscalltbl | 62 ++
tools/perf/arch/powerpc/util/skip-callchain-idx.c | 10 +-
tools/perf/arch/s390/util/kvm-stat.c | 2 +-
tools/perf/builtin-c2c.c | 4 +-
tools/perf/builtin-diff.c | 2 +-
tools/perf/builtin-report.c | 4 +-
tools/perf/builtin-stat.c | 60 +-
tools/perf/builtin-top.c | 2 +-
tools/perf/check-headers.sh | 2 +
tools/perf/pmu-events/arch/s390/cf_z10/basic.json | 12 +
tools/perf/pmu-events/arch/s390/cf_z10/crypto.json | 16 +
.../perf/pmu-events/arch/s390/cf_z10/extended.json | 18 +
tools/perf/pmu-events/arch/s390/cf_z13/basic.json | 12 +
tools/perf/pmu-events/arch/s390/cf_z13/crypto.json | 16 +
.../perf/pmu-events/arch/s390/cf_z13/extended.json | 56 ++
.../pmu-events/arch/s390/cf_z13/transaction.json | 7 +
tools/perf/pmu-events/arch/s390/cf_z14/basic.json | 8 +
tools/perf/pmu-events/arch/s390/cf_z14/crypto.json | 16 +
.../perf/pmu-events/arch/s390/cf_z14/extended.json | 53 ++
.../pmu-events/arch/s390/cf_z14/transaction.json | 7 +
tools/perf/pmu-events/arch/s390/cf_z196/basic.json | 12 +
.../perf/pmu-events/arch/s390/cf_z196/crypto.json | 16 +
.../pmu-events/arch/s390/cf_z196/extended.json | 24 +
.../perf/pmu-events/arch/s390/cf_zec12/basic.json | 12 +
.../perf/pmu-events/arch/s390/cf_zec12/crypto.json | 16 +
.../pmu-events/arch/s390/cf_zec12/extended.json | 35 +
.../pmu-events/arch/s390/cf_zec12/transaction.json | 7 +
tools/perf/pmu-events/jevents.c | 2 +
tools/perf/tests/builtin-test.c | 2 +-
tools/perf/tests/parse-events.c | 18 +
.../tests/shell/record+probe_libc_inet_pton.sh | 36 +-
tools/perf/ui/stdio/hist.c | 8 +-
tools/perf/util/comm.c | 16 +-
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 +-
tools/perf/util/cs-etm.c | 3 +
tools/perf/util/evsel.c | 11 +
tools/perf/util/evsel.h | 9 +-
tools/perf/util/header.c | 2 +-
tools/perf/util/hist.h | 2 +-
tools/perf/util/machine.c | 79 ++-
tools/perf/util/metricgroup.c | 22 +
tools/perf/util/metricgroup.h | 1 +
tools/perf/util/pmu.c | 6 -
tools/perf/util/stat-shadow.c | 5 +-
tools/perf/util/syscalltbl.c | 4 +
tools/perf/util/unwind-libdw.c | 2 +-
tools/perf/util/unwind-libunwind-local.c | 2 +-
52 files changed, 1456 insertions(+), 109 deletions(-)
create mode 100644 tools/arch/arm64/include/uapi/asm/unistd.h
create mode 100644 tools/include/uapi/asm-generic/unistd.h
create mode 100755 tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
create mode 100644 tools/perf/pmu-events/arch/s390/cf_z13/transaction.json
create mode 100644 tools/perf/pmu-events/arch/s390/cf_z14/transaction.json
create mode 100644 tools/perf/pmu-events/arch/s390/cf_zec12/transaction.json
Test results:
The first ones are container (docker) based builds of tools/perf with
and without libelf support. Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.
The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to lack of multi-arch devel packages,
available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.
The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.
Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if interested in helping having this in place.
# dm
1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0
2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822
3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0
4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0
5 alpine:edge : Ok gcc (Alpine 6.4.0) 6.4.0
6 amazonlinux:1 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
7 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
8 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
9 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
10 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
11 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
12 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
13 debian:7 : Ok gcc (Debian 4.7.2-5) 4.7.2
14 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u1) 4.9.2
15 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
16 debian:experimental : Ok gcc (Debian 7.3.0-15) 7.3.0
17 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 7.3.0-15) 7.3.0
18 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 7.3.0-19) 7.3.0
19 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 7.3.0-18) 7.3.0
20 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 7.3.0-20) 7.3.0
21 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
22 fedora:21 : Ok gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
23 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
24 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
25 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
26 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
27 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
28 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
29 fedora:27 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
30 fedora:28 : Ok gcc (GCC) 8.1.1 20180712 (Red Hat 8.1.1-5)
31 fedora:rawhide : Ok gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
32 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0
33 mageia:5 : Ok gcc (GCC) 4.9.2
34 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0
35 opensuse:42.1 : Ok gcc (SUSE Linux) 4.8.5
36 opensuse:42.2 : Ok gcc (SUSE Linux) 4.8.5
37 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5
38 opensuse:tumbleweed : Ok gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
39 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
40 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)
41 ubuntu:12.04.5 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
42 ubuntu:14.04.4 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
43 ubuntu:14.04.4-x-linaro-arm64 : Ok aarch64-linux-gnu-gcc (Linaro GCC 5.4-2017.05) 5.4.1 20170404
44 ubuntu:15.04 : Ok gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
45 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
46 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
47 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
48 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
49 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
50 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
51 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
52 ubuntu:16.10 : Ok gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
53 ubuntu:17.04 : Ok gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
54 ubuntu:17.10 : Ok gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
55 ubuntu:18.04 : Ok gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
#
Investigation is underway for the BPF related failures below.
# git log --oneline -1
9ef0112442bd (HEAD -> perf/core, jouet/perf/core) perf test: Fix subtest number when showing results
# perf version --build-options
perf version 4.18.rc6.g9ef0112
dwarf: [ on ] # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
glibc: [ on ] # HAVE_GLIBC_SUPPORT
gtk2: [ on ] # HAVE_GTK2_SUPPORT
syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
lzma: [ on ] # HAVE_LZMA_SUPPORT
get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
# uname -a
Linux seventh 4.18.0-rc6-00093-g9981b4fb8684 #2 SMP Wed Jul 25 12:31:40 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
# perf test
1: vmlinux symtab matches kallsyms : Ok
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4: Read samples using the mmap interface : Ok
5: Test data source output : Ok
6: Parse event definition strings : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : Ok
16: Setup struct perf_event_attr : Ok
17: Match and link multiple hists : Ok
18: 'import perf' in python : Ok
19: Breakpoint overflow signal handler : Ok
20: Breakpoint overflow sampling : Ok
21: Breakpoint accounting : Ok
22: Number of exit events of a simple workload : Ok
23: Software clock events period values : Ok
24: Object code reading : Ok
25: Sample parsing : Ok
26: Use a dummy software event to keep tracking : Ok
27: Parse with no sample_id_all bit set : Ok
28: Filter hist entries : Ok
29: Lookup mmap thread : Ok
30: Share thread mg : Ok
31: Sort output of hist entries : Ok
32: Cumulate child hist entries : Ok
33: Track with sched_switch : Ok
34: Filter fds with revents mask in a fdarray : Ok
35: Add fd to a fdarray, making it autogrow : Ok
36: kmod_path__parse : Ok
37: Thread map : Ok
38: LLVM search and compile :
38.1: Basic BPF llvm compile : Ok
38.2: kbuild searching : Ok
38.3: Compile source for BPF prologue generation : Ok
38.4: Compile source for BPF relocation : FAILED!
39: Session topology : Ok
40: BPF filter :
40.1: Basic BPF filtering : Ok
40.2: BPF pinning : Ok
40.3: BPF prologue generation : Ok
40.4: BPF relocation checker : FAILED!
41: Synthesize thread map : Ok
42: Remove thread map : Ok
43: Synthesize cpu map : Ok
44: Synthesize stat config : Ok
45: Synthesize stat : Ok
46: Synthesize stat round : Ok
47: Synthesize attr update : Ok
48: Event times : Ok
49: Read backward ring buffer : Ok
50: Print cpu map : Ok
51: Probe SDT events : Ok
52: is_printable_array : Ok
53: Print bitmap : Ok
54: perf hooks : Ok
55: builtin clang support : Skip (not compiled in)
56: unit_number__scnprintf : Ok
57: mem2node : Ok
58: x86 rdpmc : Ok
59: Convert perf time to TSC : Ok
60: DWARF unwind : Ok
61: x86 instruction decoder - new instructions : Ok
62: probe libc's inet_pton & backtrace it with ping : Ok
63: Check open filename arg using perf trace + vfs_getname: Ok
64: Use vfs_getname probe to get syscall args filenames : Ok
65: Add vfs_getname probe to get syscall args filenames : Ok
#
$ make -C tools/perf build-test
make: Entering directory '/home/acme/git2/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_with_babeltrace_O: make LIBBABELTRACE=1
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_clean_all_O: make clean all
make_no_libunwind_O: make NO_LIBUNWIND=1
make_util_map_o_O: make util/map.o
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_install_O: make install
make_pure_O: make
make_doc_O: make doc
make_help_O: make help
make_no_gtk2_O: make NO_GTK2=1
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_no_slang_O: make NO_SLANG=1
make_install_bin_O: make install-bin
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_libaudit_O: make NO_LIBAUDIT=1
make_no_libnuma_O: make NO_LIBNUMA=1
make_no_newt_O: make NO_NEWT=1
make_no_demangle_O: make NO_DEMANGLE=1
make_no_libelf_O: make NO_LIBELF=1
make_cscope_O: make cscope
make_static_O: make LDFLAGS=-static
make_debug_O: make DEBUG=1
make_perf_o_O: make perf.o
make_no_backtrace_O: make NO_BACKTRACE=1
make_tags_O: make tags
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_install_prefix_O: make install prefix=/tmp/krava
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_no_libbpf_O: make NO_LIBBPF=1
make_no_libperl_O: make NO_LIBPERL=1
make_no_libpython_O: make NO_LIBPYTHON=1
OK
make: Leaving directory '/home/acme/git2/perf/tools/perf'
$
This is needed to ensure ->is_unity is correct when the plane was
previously configured to output a multi-planar format with scaling
enabled, and is then being reconfigured to output a uniplanar format.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
---
drivers/gpu/drm/vc4/vc4_plane.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 9d7a36f148cf..cfb50fedfa2b 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -320,6 +320,9 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
vc4_state->x_scaling[0] = VC4_SCALING_TPZ;
if (vc4_state->y_scaling[0] == VC4_SCALING_NONE)
vc4_state->y_scaling[0] = VC4_SCALING_TPZ;
+ } else {
+ vc4_state->x_scaling[1] = VC4_SCALING_NONE;
+ vc4_state->y_scaling[1] = VC4_SCALING_NONE;
}
vc4_state->is_unity = (vc4_state->x_scaling[0] == VC4_SCALING_NONE &&
--
2.14.1
drm_atomic_helper_async_check() declares the plane, old_plane_state and
new_plane_state variables to iterate over all planes of the atomic
state and make sure only one plane is enabled.
Unfortunately gcc is not smart enough to figure out that the check on
n_planes is enough to guarantee that plane, new_plane_state and
old_plane_state are initialized.
Explicitly initialize those variables to NULL to make gcc happy.
Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
Reviewed-by: Sean Paul <seanpaul(a)chromium.org>
---
Changes in v2:
- Cc stable
- Add Sean's R-b
- Fix a typo in the commit message
---
drivers/gpu/drm/drm_atomic_helper.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index f7ccfebd3ca8..80be74df7ba6 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1538,8 +1538,9 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
{
struct drm_crtc *crtc;
struct drm_crtc_state *crtc_state;
- struct drm_plane *plane;
- struct drm_plane_state *old_plane_state, *new_plane_state;
+ struct drm_plane *plane = NULL;
+ struct drm_plane_state *old_plane_state = NULL;
+ struct drm_plane_state *new_plane_state = NULL;
const struct drm_plane_helper_funcs *funcs;
int i, n_planes = 0;
--
2.14.1
Async plane update is supposed to work only when updating the FB or FB
position of an already enabled plane. That does not apply to requests
where the plane was previously disabled or assigned to a different
CTRC.
Check old_plane_state->crtc value to make sure async plane update is
allowed.
Fixes: fef9df8b5945 ("drm/atomic: initial support for asynchronous plane update")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon(a)bootlin.com>
Reviewed-by: Eric Anholt <eric(a)anholt.net>
---
Changes in v2:
- Cc stable
- Add Eric's R-b
---
drivers/gpu/drm/drm_atomic_helper.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 866a2cc72ef6..f7ccfebd3ca8 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1555,7 +1555,8 @@ int drm_atomic_helper_async_check(struct drm_device *dev,
if (n_planes != 1)
return -EINVAL;
- if (!new_plane_state->crtc)
+ if (!new_plane_state->crtc ||
+ old_plane_state->crtc != new_plane_state->crtc)
return -EINVAL;
funcs = plane->helper_private;
--
2.14.1
I'm announcing the release of the 4.4.144 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/ABI/testing/sysfs-devices-system-cpu | 1
Documentation/kernel-parameters.txt | 45 ++
Documentation/spec_ctrl.txt | 94 ++++
Makefile | 2
arch/arc/include/asm/page.h | 2
arch/arc/include/asm/pgtable.h | 2
arch/x86/entry/entry_64_compat.S | 75 ++-
arch/x86/include/asm/apm.h | 6
arch/x86/include/asm/barrier.h | 2
arch/x86/include/asm/cpufeature.h | 7
arch/x86/include/asm/cpufeatures.h | 37 +
arch/x86/include/asm/disabled-features.h | 3
arch/x86/include/asm/efi.h | 7
arch/x86/include/asm/intel-family.h | 10
arch/x86/include/asm/irqflags.h | 2
arch/x86/include/asm/mmu.h | 15
arch/x86/include/asm/mmu_context.h | 25 -
arch/x86/include/asm/msr-index.h | 22 +
arch/x86/include/asm/nospec-branch.h | 54 ++
arch/x86/include/asm/required-features.h | 3
arch/x86/include/asm/spec-ctrl.h | 80 +++
arch/x86/include/asm/thread_info.h | 6
arch/x86/include/asm/tlbflush.h | 12
arch/x86/kernel/Makefile | 1
arch/x86/kernel/cpu/amd.c | 38 +
arch/x86/kernel/cpu/bugs.c | 427 +++++++++++++++++++--
arch/x86/kernel/cpu/common.c | 121 +++++
arch/x86/kernel/cpu/cpu.h | 3
arch/x86/kernel/cpu/intel.c | 73 +++
arch/x86/kernel/cpu/mcheck/mce.c | 3
arch/x86/kernel/irqflags.S | 26 +
arch/x86/kernel/ldt.c | 4
arch/x86/kernel/process.c | 224 +++++++++--
arch/x86/kernel/smpboot.c | 5
arch/x86/kvm/svm.c | 2
arch/x86/kvm/vmx.c | 2
arch/x86/mm/tlb.c | 33 +
arch/x86/platform/efi/efi_64.c | 3
arch/x86/xen/enlighten.c | 16
arch/x86/xen/smp.c | 5
arch/x86/xen/suspend.c | 16
block/blk-core.c | 10
drivers/base/cpu.c | 8
drivers/clk/tegra/clk-tegra30.c | 11
drivers/mtd/ubi/attach.c | 139 +++++-
drivers/mtd/ubi/eba.c | 4
drivers/mtd/ubi/fastmap-wl.c | 6
drivers/mtd/ubi/fastmap.c | 51 ++
drivers/mtd/ubi/ubi.h | 46 ++
drivers/mtd/ubi/wl.c | 114 ++++-
drivers/net/ethernet/broadcom/tg3.c | 9
drivers/net/phy/phy_device.c | 7
drivers/ptp/ptp_chardev.c | 1
drivers/usb/host/xhci.c | 40 +
drivers/usb/host/xhci.h | 4
fs/fat/inode.c | 20
fs/proc/array.c | 26 +
include/linux/cpu.h | 2
include/linux/nospec.h | 10
include/linux/sched.h | 9
include/linux/seccomp.h | 3
include/linux/skbuff.h | 12
include/net/ipv6.h | 2
include/uapi/linux/prctl.h | 12
include/uapi/linux/seccomp.h | 4
kernel/seccomp.c | 21 -
kernel/sys.c | 21 +
lib/rhashtable.c | 17
mm/memcontrol.c | 2
net/core/skbuff.c | 1
net/ipv4/fib_frontend.c | 1
net/ipv4/sysctl_net_ipv4.c | 5
sound/core/rawmidi.c | 20
tools/testing/selftests/seccomp/seccomp_bpf.c | 98 ++++
virt/kvm/eventfd.c | 6
75 files changed, 1981 insertions(+), 275 deletions(-)
Alan Jenkins (1):
block: do not use interruptible wait anywhere
Alexander Sergeyev (1):
x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
Alexey Brodkin (1):
ARC: Fix CONFIG_SWAP
Andy Lutomirski (2):
x86/mm: Give each mm TLB flush generation a unique ID
x86/cpu: Re-apply forced caps every time CPU caps are re-read
Andy Shevchenko (1):
x86/cpu: Rename Merrifield2 to Moorefield
Arnd Bergmann (1):
x86/pti: Mark constant arrays as __initconst
Borislav Petkov (4):
Documentation/spec_ctrl: Do some minor cleanups
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
x86/cpu/AMD: Fix erratum 1076 (CPB bit)
x86/bugs: Unify x86_spec_ctrl_{set_guest, restore_host}
Colin Ian King (1):
ipv6: fix useless rol32 call on hash
Dan Williams (2):
x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface
x86/speculation: Fix up array_index_nospec_mask() asm constraint
Dave Hansen (1):
x86/mm: Factor out LDT init from context init
David Ahern (1):
net/ipv4: Set oif in fib_compute_spec_dst
David Woodhouse (14):
x86/cpufeatures: Add CPUID_7_EDX CPUID leaf
x86/cpufeatures: Add Intel feature bits for Speculation Control
x86/cpufeatures: Add AMD feature bits for Speculation Control
x86/msr: Add definitions for new speculation control MSRs
x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown
x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes
x86/speculation: Add basic IBPB (Indirect Branch Prediction Barrier) support
x86/cpufeatures: Clean up Spectre v2 related CPUID flags
x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
x86/speculation: Update Speculation Control microcode blacklist
x86/speculation: Correct Speculation Control microcode blacklist again
x86/speculation: Use IBRS if available before calling into firmware
x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen
x86/bugs/AMD: Add support to disable RDS on Fam[15, 16, 17]h if requested
Davidlohr Bueso (1):
lib/rhashtable: consider param->min_size when setting initial table size
Denys Vlasenko (1):
x86/asm/entry/32: Simplify pushes of zeroed pt_regs->REGs
Dewet Thibaut (1):
x86/MCE: Remove min interval polling limitation
Greg Kroah-Hartman (1):
Linux 4.4.144
Gustavo A. R. Silva (1):
ptp: fix missing break in switch
Heiner Kallweit (1):
net: phy: fix flag masking in __set_phy_supported
Ingo Molnar (2):
x86/speculation: Clean up various Spectre related details
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
Jim Mattson (1):
x86/cpu: Make alternative_msr_write work for 32-bit code
Jing Xia (1):
mm: memcg: fix use after free in mem_cgroup_iter()
Jiri Kosina (2):
x86/bugs: Fix __ssb_select_mitigation() return type
x86/bugs: Make cpu_show_common() static
Juergen Gross (3):
x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend
xen: set cpu capabilities from xen_start_kernel()
x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths
Kees Cook (5):
nospec: Allow getting/setting on non-current task
proc: Provide details on speculation flaw mitigations
seccomp: Enable speculation flaw mitigations
seccomp: Add filter flag to opt-out of SSB mitigation
x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass
Konrad Rzeszutek Wilk (14):
x86/spectre_v2: Don't check microcode versions when running under hypervisors
x86/bugs: Concentrate bug detection into a separate function
x86/bugs: Concentrate bug reporting into a separate function
x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
x86/bugs, KVM: Support the combination of guest and host IBRS
x86/bugs: Expose /sys/../spec_store_bypass
x86/cpufeatures: Add X86_FEATURE_RDS
x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
x86/bugs/intel: Set proper CPU features and setup RDS
x86/bugs: Whitelist allowed SPEC_CTRL MSR values
x86/bugs: Rename _RDS to _SSBD
proc: Use underscores for SSBD in 'status'
x86/bugs: Fix the parameters alignment and missing void
x86/bugs: Rename SSBD_NO to SSB_NO
Kyle Huey (2):
x86/process: Optimize TIF checks in __switch_to_xtra()
x86/process: Correct and optimize TIF_BLOCKSTEP switch
Lan Tianyu (1):
KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
Linus Torvalds (1):
x86/nospec: Simplify alternative_msr_write()
Lucas Stach (1):
clk: tegra: Fix PLL_U post divider and initial rate on Tegra30
Mathias Nyman (1):
xhci: Fix perceived dead host due to runtime suspend race with event handler
Mickaël Salaün (2):
selftest/seccomp: Fix the flag name SECCOMP_FILTER_FLAG_TSYNC
selftest/seccomp: Fix the seccomp(2) signature
Nick Desaulniers (1):
x86/paravirt: Make native_save_fl() extern inline
OGAWA Hirofumi (1):
fat: fix memory allocation failure handling of match_strdup()
Peter Zijlstra (1):
x86/speculation: Add <asm/msr-index.h> dependency
Piotr Luc (1):
x86/cpu/intel: Add Knights Mill to Intel family
Richard Weinberger (5):
ubi: Introduce vol_ignored()
ubi: Rework Fastmap attach base code
ubi: Be more paranoid while seaching for the most recent Fastmap
ubi: Fix races around ubi_refill_pools()
ubi: Fix Fastmap's update_vol()
Sanjeev Bansal (1):
tg3: Add higher cpu clock for 5762.
Sascha Hauer (1):
ubi: fastmap: Erase outdated anchor PEBs during attach
Stefano Brivio (2):
net: Don't copy pfmemalloc flag in __copy_skb_header()
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Takashi Iwai (1):
ALSA: rawmidi: Change resized buffers atomically
Thomas Gleixner (18):
x86/speculation: Create spec-ctrl.h to avoid include hell
prctl: Add speculation control prctls
x86/process: Optimize TIF_NOTSC switch
x86/process: Allow runtime control of Speculative Store Bypass
x86/speculation: Add prctl for Speculative Store Bypass mitigation
prctl: Add force disable speculation
seccomp: Use PR_SPEC_FORCE_DISABLE
seccomp: Move speculation migitation control to arch code
x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
x86/cpufeatures: Disentangle SSBD enumeration
x86/cpufeatures: Add FEATURE_ZEN
x86/speculation: Handle HT correctly on AMD
x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
x86/speculation: Rework speculative_store_bypass_update()
x86/bugs: Expose x86_spec_ctrl_base directly
x86/bugs: Remove x86_spec_ctrl_set()
x86/bugs: Rework spec_ctrl base and mask logic
x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
Tim Chen (1):
x86/speculation: Use Indirect Branch Prediction Barrier in context switch
Tom Lendacky (1):
x86/speculation: Add virtualized speculative store bypass disable support
Tyler Hicks (1):
ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Vineet Gupta (1):
ARC: mm: allow mprotect to make stack mappings executable
Fyi.
Please find attached.
Mohammad Ali Tajick Ghanbary
Associate Prof. of Mycology & Plant Pathology
Department of Plant Protection
College of Agronomic Sciences
Sari Agricultural Sciences and Natural Resources University
P.O.Box 578
Sari,IRAN
Office & fax : +98 11 33687567
Mobile : +98 911 254 6616
From: Christoffer Dall <christoffer.dall(a)arm.com>
When the VCPU is blocked (for example from WFI) we don't inject the
physical timer interrupt if it should fire while the CPU is blocked, but
instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take
care of injecting the interrupt.
Unfortunately, kvm_timer_vcpu_load() doesn't actually do that, it only
has support to schedule a soft timer if the emulated phys timer is
expected to fire in the future.
Follow the same pattern as kvm_timer_update_state() and update the irq
state after potentially scheduling a soft timer.
Reported-by: Andre Przywara <andre.przywara(a)arm.com>
Cc: Stable <stable(a)vger.kernel.org> # 4.15+
Fixes: bbdd52cfcba29 ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit")
Signed-off-by: Christoffer Dall <christoffer.dall(a)arm.com>
---
virt/kvm/arm/arch_timer.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 18ff6203079d..17cecc96f735 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -487,6 +487,7 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
{
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+ struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
if (unlikely(!timer->enabled))
return;
@@ -502,6 +503,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
/* Set the background timer for the physical timer emulation. */
phys_timer_emulate(vcpu);
+
+ /* If the timer fired while we weren't running, inject it now */
+ if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
+ kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
}
bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
--
2.14.4
From: Christoffer Dall <christoffer.dall(a)arm.com>
kvm_timer_update_state() is called when changing the phys timer
configuration registers, either via vcpu reset, as a result of a trap
from the guest, or when userspace programs the registers.
phys_timer_emulate() is in turn called by kvm_timer_update_state() to
either cancel an existing software timer, or program a new software
timer, to emulate the behavior of a real phys timer, based on the change
in configuration registers.
Unfortunately, the interaction between these two functions left a small
race; if the conceptual emulated phys timer should actually fire, but
the soft timer hasn't executed its callback yet, we cancel the timer in
phys_timer_emulate without injecting an irq. This only happens if the
check in kvm_timer_update_state is called before the timer should fire,
which is relatively unlikely, but possible.
The solution is to update the state of the phys timer after calling
phys_timer_emulate, which will pick up the pending timer state and
update the interrupt value.
Note that this leaves the opportunity of raising the interrupt twice,
once in the just-programmed soft timer, and once in
kvm_timer_update_state. Since this always happens synchronously with
the VCPU execution, there is no harm in this, and the guest ever only
sees a single timer interrupt.
Cc: Stable <stable(a)vger.kernel.org> # 4.15+
Signed-off-by: Christoffer Dall <christoffer.dall(a)arm.com>
---
virt/kvm/arm/arch_timer.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index bd3d57f40f1b..18ff6203079d 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -295,9 +295,9 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu)
struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
/*
- * If the timer can fire now we have just raised the IRQ line and we
- * don't need to have a soft timer scheduled for the future. If the
- * timer cannot fire at all, then we also don't need a soft timer.
+ * If the timer can fire now, we don't need to have a soft timer
+ * scheduled for the future. If the timer cannot fire at all,
+ * then we also don't need a soft timer.
*/
if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
soft_timer_cancel(&timer->phys_timer, NULL);
@@ -332,10 +332,10 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
level = kvm_timer_should_fire(vtimer);
kvm_timer_update_irq(vcpu, level, vtimer);
+ phys_timer_emulate(vcpu);
+
if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
-
- phys_timer_emulate(vcpu);
}
static void vtimer_save_state(struct kvm_vcpu *vcpu)
--
2.14.4
How are you?
I would like to speak with the person in charge of purchasing your branded
promotional products for your company?
We create custom LOGO USB flash drives for our clients throughout the US.
We can print your logo, and load your digital images, videos and files!
If you need marketing, advertising, gifts or incentives, USB flash drives
are the solution!
Here is what we include:
-All Memory Sizes from 64MB up to 128GB!
-Second Side Printing
-Low Minimum Quantities
-Rush Service Available
-Full color Printing
NEW: We can make a custom shaped USB drive to look like your Logo or
product!
Send us your product image or logo files; we will create a design mock up
for you at no cost!
We are always running a new deals; email to get pricing!
Ask about the “Double Your Memory” upgrade promotion going on right
now!
Pricing is low right now, so let us know what you need and we will get you
a quick quote.
We will beat any competitors pricing, send us your last invoice and we will
beat it!
We always offer great rates for schools and nonprofits as well.
Regards,
Vanessa Kellen
Logo USB Account Manager
This is a note to let you know that I've just added the patch titled
iio: ad9523: Fix displayed phase
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 5a4e33c1c53ae7d4425f7d94e60e4458a37b349e Mon Sep 17 00:00:00 2001
From: Lars-Peter Clausen <lars(a)metafoo.de>
Date: Mon, 25 Jun 2018 11:03:07 +0300
Subject: iio: ad9523: Fix displayed phase
Fix the displayed phase for the ad9523 driver. Currently the most
significant decimal place is dropped and all other digits are shifted one
to the left. This is due to a multiplication by 10, which is not necessary,
so remove it.
Signed-off-by: Lars-Peter Clausen <lars(a)metafoo.de>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean(a)analog.com>
Fixes: cd1678f9632 ("iio: frequency: New driver for AD9523 SPI Low Jitter Clock Generator")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/frequency/ad9523.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/frequency/ad9523.c b/drivers/iio/frequency/ad9523.c
index 48ea46a1bc38..37504739c277 100644
--- a/drivers/iio/frequency/ad9523.c
+++ b/drivers/iio/frequency/ad9523.c
@@ -653,7 +653,7 @@ static int ad9523_read_raw(struct iio_dev *indio_dev,
code = (AD9523_CLK_DIST_DIV_PHASE_REV(ret) * 3141592) /
AD9523_CLK_DIST_DIV_REV(ret);
*val = code / 1000000;
- *val2 = (code % 1000000) * 10;
+ *val2 = code % 1000000;
return IIO_VAL_INT_PLUS_MICRO;
default:
return -EINVAL;
--
2.18.0
This is a note to let you know that I've just added the patch titled
iio: sca3000: Fix missing return in switch
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From c5b974bee9d2ceae4c441ae5a01e498c2674e100 Mon Sep 17 00:00:00 2001
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
Date: Sat, 7 Jul 2018 12:44:01 -0500
Subject: iio: sca3000: Fix missing return in switch
The IIO_CHAN_INFO_LOW_PASS_FILTER_3DB_FREQUENCY case is missing a
return and will fall through to the default case and errorenously
return -EINVAL.
Fix this by adding in missing *return ret*.
Fixes: 626f971b5b07 ("staging:iio:accel:sca3000 Add write support to the low pass filter control")
Reported-by: Jonathan Cameron <jic23(a)kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/accel/sca3000.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/iio/accel/sca3000.c b/drivers/iio/accel/sca3000.c
index 4dceb75e3586..4964561595f5 100644
--- a/drivers/iio/accel/sca3000.c
+++ b/drivers/iio/accel/sca3000.c
@@ -797,6 +797,7 @@ static int sca3000_write_raw(struct iio_dev *indio_dev,
mutex_lock(&st->lock);
ret = sca3000_write_3db_freq(st, val);
mutex_unlock(&st->lock);
+ return ret;
default:
return -EINVAL;
}
--
2.18.0
From: Roman Pen <roman.penyaev(a)profitbricks.com>
The patch below fixes queue stalling when shared hctx marked for restart
(BLK_MQ_S_SCHED_RESTART bit) but q->shared_hctx_restart stays zero. The
root cause is that hctxs are shared between queues, but 'shared_hctx_restart'
belongs to the particular queue, which in fact may not need to be restarted,
thus we return from blk_mq_sched_restart() and leave shared hctx of another
queue never restarted.
The fix is to make shared_hctx_restart counter belong not to the queue, but
to tags, thereby counter will reflect real number of shared hctx needed to
be restarted.
During tests 1 hctx (set->nr_hw_queues) was used and all stalled requests
were noticed in dd->fifo_list of mq-deadline scheduler.
Seeming possible sequence of events:
1. Request A of queue A is inserted into dd->fifo_list of the scheduler.
2. Request B of queue A bypasses scheduler and goes directly to
hctx->dispatch.
3. Request C of queue B is inserted.
4. blk_mq_sched_dispatch_requests() is invoked, since hctx->dispatch is not
empty (request B is in the list) hctx is only marked for for next restart
and request A is left in a list (see comment "So it's best to leave them
there for as long as we can. Mark the hw queue as needing a restart in
that case." in blk-mq-sched.c)
5. Eventually request B is completed/freed and blk_mq_sched_restart() is
called, but by chance hctx from queue B is chosen for restart and request C
gets a chance to be dispatched.
6. Eventually request C is completed/freed and blk_mq_sched_restart() is
called, but shared_hctx_restart for queue B is zero and we return without
attempt to restart hctx from queue A, thus request A is stuck forever.
But stalling queue is not the only one problem with blk_mq_sched_restart().
My tests show that those loops thru all queues and hctxs can be very costly,
even with shared_hctx_restart counter, which aims to fix performance issue.
For my tests I create 128 devices with 64 hctx each, which share same tags
set.
The following is the fio and ftrace output for v4.14-rc4 kernel:
READ: io=5630.3MB, aggrb=573208KB/s, minb=573208KB/s, maxb=573208KB/s, mint=10058msec, maxt=10058msec
WRITE: io=5650.9MB, aggrb=575312KB/s, minb=575312KB/s, maxb=575312KB/s, mint=10058msec, maxt=10058msec
root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
Function Hit Time Avg s^2
-------- --- ---- --- ---
blk_mq_sched_restart 16347 9540759 us 583.639 us 8804801 us
blk_mq_sched_restart 7884 6073471 us 770.354 us 8780054 us
blk_mq_sched_restart 14176 7586794 us 535.185 us 2822731 us
blk_mq_sched_restart 7843 6205435 us 791.206 us 12424960 us
blk_mq_sched_restart 1490 4786107 us 3212.153 us 1949753 us
blk_mq_sched_restart 7892 6039311 us 765.244 us 2994627 us
blk_mq_sched_restart 15382 7511126 us 488.306 us 3090912 us
[cut]
And here are results with two patches reverted:
8e8320c9315c ("blk-mq: fix performance regression with shared tags")
6d8c6c0f97ad ("blk-mq: Restart a single queue if tag sets are shared")
READ: io=12884MB, aggrb=1284.3MB/s, minb=1284.3MB/s, maxb=1284.3MB/s, mint=10032msec, maxt=10032msec
WRITE: io=12987MB, aggrb=1294.6MB/s, minb=1294.6MB/s, maxb=1294.6MB/s, mint=10032msec, maxt=10032msec
root@pserver16:~/roman# cat /sys/kernel/debug/tracing/trace_stat/* | grep blk_mq
Function Hit Time Avg s^2
-------- --- ---- --- ---
blk_mq_sched_restart 50699 8802.349 us 0.173 us 121.771 us
blk_mq_sched_restart 50362 8740.470 us 0.173 us 161.494 us
blk_mq_sched_restart 50402 9066.337 us 0.179 us 113.009 us
blk_mq_sched_restart 50104 9366.197 us 0.186 us 188.645 us
blk_mq_sched_restart 50375 9317.727 us 0.184 us 54.218 us
blk_mq_sched_restart 50136 9311.657 us 0.185 us 446.790 us
blk_mq_sched_restart 50103 9179.625 us 0.183 us 114.472 us
[cut]
Timings and stdevs are terrible, which leads to significant difference:
570MB/s vs 1280MB/s.
Signed-off-by: Roman Pen <roman.penyaev(a)profitbricks.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Jianchao Wang <jianchao.w.wang(a)oracle.com>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Cc: Jack Wang <jack.wang.usish(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
[ bvanassche: modified patch title, description and Cc-list ]
---
block/blk-mq-sched.c | 10 ++++------
block/blk-mq-tag.c | 1 +
block/blk-mq-tag.h | 1 +
block/blk-mq.c | 4 ++--
include/linux/blkdev.h | 2 --
5 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 56c493c6cd90..d863b1b32b07 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -60,10 +60,10 @@ static void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
return;
if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
- struct request_queue *q = hctx->queue;
+ struct blk_mq_tags *tags = hctx->tags;
if (!test_and_set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
- atomic_inc(&q->shared_hctx_restart);
+ atomic_inc(&tags->shared_hctx_restart);
} else
set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
}
@@ -74,10 +74,8 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
return false;
if (hctx->flags & BLK_MQ_F_TAG_SHARED) {
- struct request_queue *q = hctx->queue;
-
if (test_and_clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
- atomic_dec(&q->shared_hctx_restart);
+ atomic_dec(&hctx->tags->shared_hctx_restart);
} else
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
@@ -415,7 +413,7 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *const hctx)
* If this is 0, then we know that no hardware queues
* have RESTART marked. We're done.
*/
- if (!atomic_read(&queue->shared_hctx_restart))
+ if (!atomic_read(&tags->shared_hctx_restart))
return;
rcu_read_lock();
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 09b2ee6694fb..82cd73631adc 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -379,6 +379,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
+ atomic_set(&tags->shared_hctx_restart, 0);
return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
}
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..477a9d67fb3d 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -12,6 +12,7 @@ struct blk_mq_tags {
unsigned int nr_reserved_tags;
atomic_t active_queues;
+ atomic_t shared_hctx_restart;
struct sbitmap_queue bitmap_tags;
struct sbitmap_queue breserved_tags;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d394cdd8d8c6..a0fdf80db8fd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2335,11 +2335,11 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
queue_for_each_hw_ctx(q, hctx, i) {
if (shared) {
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
- atomic_inc(&q->shared_hctx_restart);
+ atomic_inc(&hctx->tags->shared_hctx_restart);
hctx->flags |= BLK_MQ_F_TAG_SHARED;
} else {
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
- atomic_dec(&q->shared_hctx_restart);
+ atomic_dec(&hctx->tags->shared_hctx_restart);
hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
}
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 79226ca8f80f..62b20da653ca 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -442,8 +442,6 @@ struct request_queue {
int nr_rqs[2]; /* # allocated [a]sync rqs */
int nr_rqs_elvpriv; /* # allocated rqs w/ elvpriv */
- atomic_t shared_hctx_restart;
-
struct blk_queue_stats *stats;
struct rq_wb *rq_wb;
--
2.18.0
The patch titled
Subject: mm: introduce vma_init()
has been added to the -mm tree. Its filename is
mm-introduce-vma_init.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-introduce-vma_init.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-introduce-vma_init.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: introduce vma_init()
Not all VMAs allocated with vm_area_alloc(). Some of them allocated on
stack or in data segment.
The new helper can be use to initialize VMA properly regardless where it
was allocated.
Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 6 ++++++
kernel/fork.c | 6 ++----
2 files changed, 8 insertions(+), 4 deletions(-)
diff -puN include/linux/mm.h~mm-introduce-vma_init include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-vma_init
+++ a/include/linux/mm.h
@@ -452,6 +452,12 @@ struct vm_operations_struct {
unsigned long addr);
};
+static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
+{
+ vma->vm_mm = mm;
+ INIT_LIST_HEAD(&vma->anon_vma_chain);
+}
+
struct mmu_gather;
struct inode;
diff -puN kernel/fork.c~mm-introduce-vma_init kernel/fork.c
--- a/kernel/fork.c~mm-introduce-vma_init
+++ a/kernel/fork.c
@@ -312,10 +312,8 @@ struct vm_area_struct *vm_area_alloc(str
{
struct vm_area_struct *vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
- if (vma) {
- vma->vm_mm = mm;
- INIT_LIST_HEAD(&vma->anon_vma_chain);
- }
+ if (vma)
+ vma_init(vma, mm);
return vma;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-introduce-vma_init.patch
mm-use-vma_init-to-initialize-vmas-on-stack-and-data-segments.patch
mm-fix-vma_is_anonymous-false-positives.patch
mm-page_ext-drop-definition-of-unused-page_ext_debug_poison.patch
mm-page_ext-constify-lookup_page_ext-argument.patch
The patch titled
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
has been added to the -mm tree. Its filename is
delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/delayacct-fix-crash-in-delayacct_b…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/delayacct-fix-crash-in-delayacct_b…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Tejun Heo <tj(a)kernel.org>
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
c96f5471ce7d ("delayacct: Account blkio completion on the correct task"),
while updating delayacct_blkio_end() to take the target task instead of
always using %current, made the function test NULL on %current->delays and
then continue to operated on @p->delays. If %current succeeded init while
@p didn't, it leads to the following crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B12 08/17/2017
RIP: 0010:__delayacct_blkio_end+0xc/0x40
RSP: 0000:ffff881fff703bf8 EFLAGS: 00010086
RAX: ffff881f1ec8b800 RBX: ffff8804f735cd54 RCX: ffff881fff703cb0
RDX: 0000000000000002 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff881fff703cc0
R10: 0000000000001000 R11: ffff881fd3f73d00 R12: ffff8804f735c600
R13: 0000000000000000 R14: 000000000000001d R15: ffff881fff703cb0
FS: 00007f5003f7d700(0000) GS:ffff881fff700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000001f401a6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
</IRQ>
Fix it by updating delayacct_blkio_end() check @p->delays instead.
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.c…
Fixes: c96f5471ce7d ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Dave Jones <dsj(a)fb.com>
Debugged-by: Dave Jones <dsj(a)fb.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Josh Snyder <joshs(a)netflix.com>
Cc: <stable(a)vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/delayacct.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN include/linux/delayacct.h~delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure include/linux/delayacct.h
--- a/include/linux/delayacct.h~delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure
+++ a/include/linux/delayacct.h
@@ -124,7 +124,7 @@ static inline void delayacct_blkio_start
static inline void delayacct_blkio_end(struct task_struct *p)
{
- if (current->delays)
+ if (p->delays)
__delayacct_blkio_end(p);
delayacct_clear_flag(DELAYACCT_PF_BLKIO);
}
_
Patches currently in -mm which might be from tj(a)kernel.org are
delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure.patch
This is the start of the stable review cycle for the 4.9.115 release.
There are 28 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.115-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.115-rc1
Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
block: do not use interruptible wait anywhere
Chuck Lever <chuck.lever(a)oracle.com>
xprtrdma: Return -ENOBUFS when no pages are available
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: Fix perceived dead host due to runtime suspend race with event handler
Stefano Brivio <sbrivio(a)redhat.com>
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Stefano Brivio <sbrivio(a)redhat.com>
net: Don't copy pfmemalloc flag in __copy_skb_header()
Alexander Couzens <lynxis(a)fe80.eu>
net: usb: asix: replace mii_nway_restart in resume path
Sanjeev Bansal <sanjeevb.bansal(a)broadcom.com>
tg3: Add higher cpu clock for 5762.
Matevz Vucnik <vucnikm(a)gmail.com>
qmi_wwan: add support for Quectel EG91
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ptp: fix missing break in switch
Heiner Kallweit <hkallweit1(a)gmail.com>
net: phy: fix flag masking in __set_phy_supported
David Ahern <dsahern(a)gmail.com>
net/ipv4: Set oif in fib_compute_spec_dst
Lorenzo Colitti <lorenzo(a)google.com>
net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
Davidlohr Bueso <dave(a)stgolabs.net>
lib/rhashtable: consider param->min_size when setting initial table size
Colin Ian King <colin.king(a)canonical.com>
ipv6: fix useless rol32 call on hash
Tyler Hicks <tyhicks(a)canonical.com>
ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Toke Høiland-Jørgensen <toke(a)toke.dk>
gen_stats: Fix netlink stats dumping in the presence of padding
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/i915: Fix hotplug irq ack on i965/g4x
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
vfio/pci: Fix potential Spectre v1
Hugh Dickins <hughd(a)google.com>
mm/huge_memory.c: fix data loss when splitting a file pmd
Jing Xia <jing.xia.mail(a)gmail.com>
mm: memcg: fix use after free in mem_cgroup_iter()
Alexey Brodkin <Alexey.Brodkin(a)synopsys.com>
ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
Vineet Gupta <vgupta(a)synopsys.com>
ARC: mm: allow mprotect to make stack mappings executable
Alexey Brodkin <abrodkin(a)synopsys.com>
ARC: Fix CONFIG_SWAP
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Change resized buffers atomically
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
fat: fix memory allocation failure handling of match_strdup()
Dewet Thibaut <thibaut.dewet(a)nokia.com>
x86/MCE: Remove min interval polling limitation
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
x86/apm: Don't access __preempt_count with zeroed fs
Lan Tianyu <tianyu.lan(a)intel.com>
KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
-------------
Diffstat:
Makefile | 4 +--
arch/arc/configs/axs101_defconfig | 1 -
arch/arc/configs/axs103_defconfig | 1 -
arch/arc/configs/axs103_smp_defconfig | 1 -
arch/arc/configs/nsim_700_defconfig | 1 -
arch/arc/configs/nsim_hs_defconfig | 1 -
arch/arc/configs/nsim_hs_smp_defconfig | 1 -
arch/arc/configs/nsimosci_defconfig | 1 -
arch/arc/configs/nsimosci_hs_defconfig | 1 -
arch/arc/configs/nsimosci_hs_smp_defconfig | 1 -
arch/arc/include/asm/page.h | 2 +-
arch/arc/include/asm/pgtable.h | 2 +-
arch/x86/include/asm/apm.h | 6 -----
arch/x86/kernel/apm_32.c | 5 ++++
arch/x86/kernel/cpu/mcheck/mce.c | 3 ---
block/blk-core.c | 9 +++----
drivers/gpu/drm/i915/i915_irq.c | 32 ++++++++++++++++++++++--
drivers/net/ethernet/broadcom/tg3.c | 9 +++++++
drivers/net/phy/phy_device.c | 7 ++----
drivers/net/usb/asix_devices.c | 4 ++-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/ptp/ptp_chardev.c | 1 +
drivers/usb/host/xhci.c | 40 +++++++++++++++++++++++++++---
drivers/usb/host/xhci.h | 4 +++
drivers/vfio/pci/vfio_pci.c | 4 +++
fs/fat/inode.c | 20 +++++++++------
include/linux/skbuff.h | 10 ++++----
include/net/ipv6.h | 2 +-
lib/rhashtable.c | 17 ++++++++-----
mm/huge_memory.c | 2 ++
mm/memcontrol.c | 2 +-
net/core/gen_stats.c | 16 ++++++++++--
net/core/skbuff.c | 1 +
net/ipv4/fib_frontend.c | 1 +
net/ipv4/sysctl_net_ipv4.c | 5 ++--
net/ipv4/tcp.c | 3 +--
net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
sound/core/rawmidi.c | 20 ++++++++++-----
virt/kvm/eventfd.c | 6 ++++-
39 files changed, 176 insertions(+), 73 deletions(-)
According to the official documentation for HFS+ [1], inode timestamps
are supposed to cover the time range from 1904 to 2040 as originally
used in classic MacOS.
The traditional Linux usage is to convert the timestamps into an unsigned
32-bit number based on the Unix epoch and from there to a time_t. On
32-bit systems, that wraps the time from 2038 to 1902, so the last
two years of the valid time range become garbled. On 64-bit systems,
all times before 1970 get turned into timestamps between 2038 and 2106,
which is more convenient but also different from the documented behavior.
Looking at the Darwin sources [2], it seems that MacOS is inconsistent in
yet another way: all timestamps are wrapped around to a 32-bit unsigned
number when written to the disk, but when read back, all numeric values
lower than 2082844800U are assumed to be invalid, so we cannot represent
the times before 1970 or the times after 2040.
While all implementations seem to agree on the interpretation of values
between 1970 and 2038, they often differ on the exact range they support
when reading back values outside of the common range:
MacOS (traditional): 1904-2040
Apple Documentation: 1904-2040
MacOS X source comments: 1970-2040
MacOS X source code: 1970-2038
32-bit Linux: 1902-2038
64-bit Linux: 1970-2106
hfsfuse: 1970-2040
hfsutils (32 bit, old libc) 1902-2038
hfsutils (32 bit, new libc) 1970-2106
hfsutils (64 bit) 1904-2040
hfsplus-utils 1904-2040
hfsexplorer 1904-2040
7-zip 1904-2040
This changes Linux over to mostly the same behavior as described in the
code comment in MacOS X, disallowing all times before 1970 and after
2040, while still allowing times between 2038 and 2040 like most other
implementations do. Most importantly, it means we can have the same
behavior on 32-bit and 64-bit.
Cc: stable(a)vger.kernel.org
Link: [1] https://developer.apple.com/library/archive/technotes/tn/tn1150.html
Link: [2] https://opensource.apple.com/source/hfs/hfs-407.30.1/core/MacOSStubs.c.auto…
Suggested-by: Viacheslav Dubeyko <slava(a)dubeyko.com>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
v2: treat pre-1970 dates as invalid following MacOS X behavior,
reword and expand changelog text
---
fs/hfs/hfs_fs.h | 29 +++++++++++++++++++++++++----
fs/hfsplus/hfsplus_fs.h | 26 +++++++++++++++++++++++---
2 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/fs/hfs/hfs_fs.h b/fs/hfs/hfs_fs.h
index 6d0783e2e276..1af998fb522e 100644
--- a/fs/hfs/hfs_fs.h
+++ b/fs/hfs/hfs_fs.h
@@ -246,14 +246,35 @@ extern void hfs_mark_mdb_dirty(struct super_block *sb);
* mac: unsigned big-endian since 00:00 GMT, Jan. 1, 1904
*
*/
-#define __hfs_u_to_mtime(sec) cpu_to_be32(sec + 2082844800U - sys_tz.tz_minuteswest * 60)
-#define __hfs_m_to_utime(sec) (be32_to_cpu(sec) - 2082844800U + sys_tz.tz_minuteswest * 60)
+static inline time64_t __hfs_m_to_utime(__be32 mt)
+{
+ time64_t ut = (u32)(be32_to_cpu(mt) - 2082844800U);
+
+ /*
+ * Times past 2040-02-06 06:28 are assumed to be invalid,
+ * matching the MacOS behavior.
+ */
+ if (ut > 2082844800U + UINT_MAX)
+ ut = 0;
+
+ return ut + sys_tz.tz_minuteswest * 60;
+}
+static inline __be32 __hfs_u_to_mtime(time64_t ut)
+{
+ ut -= - sys_tz.tz_minuteswest * 60;
+
+ /*
+ * MacOS wraps "invalid" times after 2040 when writing back, so
+ * let's do the same here.
+ */
+ return cpu_to_be32(lower_32_bits(ut + 2082844800U));
+}
#define HFS_I(inode) (container_of(inode, struct hfs_inode_info, vfs_inode))
#define HFS_SB(sb) ((struct hfs_sb_info *)(sb)->s_fs_info)
-#define hfs_m_to_utime(time) (struct timespec){ .tv_sec = __hfs_m_to_utime(time) }
-#define hfs_u_to_mtime(time) __hfs_u_to_mtime((time).tv_sec)
+#define hfs_m_to_utime(time) (struct timespec){ .tv_sec = __hfs_m_to_utime(time) }
+#define hfs_u_to_mtime(time) __hfs_u_to_mtime((time).tv_sec)
#define hfs_mtime() __hfs_u_to_mtime(get_seconds())
static inline const char *hfs_mdb_name(struct super_block *sb)
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index d9255abafb81..7f0943e540a0 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -530,9 +530,29 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector, void *buf,
void **data, int op, int op_flags);
int hfsplus_read_wrapper(struct super_block *sb);
-/* time macros */
-#define __hfsp_mt2ut(t) (be32_to_cpu(t) - 2082844800U)
-#define __hfsp_ut2mt(t) (cpu_to_be32(t + 2082844800U))
+/* time helpers */
+static inline time64_t __hfsp_mt2ut(__be32 mt)
+{
+ time64_t ut = (u32)(be32_to_cpu(mt) - 2082844800U);
+
+ /*
+ * Times past 2040-02-06 06:28 are assumed to be invalid,
+ * matching the MacOS behavior.
+ */
+ if (ut > 2082844800U + UINT_MAX)
+ ut = 0;
+
+ return ut;
+}
+
+static inline __be32 __hfsp_ut2mt(time64_t ut)
+{
+ /*
+ * MacOS wraps "invalid" times after 2040 when writing back, so
+ * let's do the same here.
+ */
+ return cpu_to_be32(lower_32_bits(ut + 2082844800U));
+}
/* compatibility */
#define hfsp_mt2ut(t) (struct timespec){ .tv_sec = __hfsp_mt2ut(t) }
--
2.9.0
Inherit the tracing on/off setting on ring_buffer to next
trace buffer when taking a snapshot.
Taking a snapshot is done by swapping with backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is set
in the ring buffer, when swapping it, tracing on/off setting
can also be changed. This causes a strange result like below;
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This must be a bug, because user never know
that each "ring_buffer" stores tracing-enable state and
snapshot is done by swapping ring buffers.
This patch fixes above strange behavior.
Fixes: commit debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Hiraku Toyooka <hiraku.toyooka(a)cybertrust.co.jp>
Cc: stable(a)vger.kernel.org
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 12 ++++++++++++
kernel/trace/trace.c | 6 ++++++
3 files changed, 19 insertions(+)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index b72ebdff0b77..003d09ab308d 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -165,6 +165,7 @@ void ring_buffer_record_enable(struct ring_buffer *buffer);
void ring_buffer_record_off(struct ring_buffer *buffer);
void ring_buffer_record_on(struct ring_buffer *buffer);
int ring_buffer_record_is_on(struct ring_buffer *buffer);
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer);
void ring_buffer_record_disable_cpu(struct ring_buffer *buffer, int cpu);
void ring_buffer_record_enable_cpu(struct ring_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6a46af21765c..4038ed74ab95 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3227,6 +3227,18 @@ int ring_buffer_record_is_on(struct ring_buffer *buffer)
}
/**
+ * ring_buffer_record_is_set_on - return true if the ring buffer is set writable
+ * @buffer: The ring buffer to see if write is set enabled
+ *
+ * Returns true if the ring buffer is set writable by ring_buffer_record_on().
+ * Note that this does NOT mean it is in a writable state.
+ */
+int ring_buffer_record_is_set_on(struct ring_buffer *buffer)
+{
+ return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
+}
+
+/**
* ring_buffer_record_disable_cpu - stop all writes into the cpu_buffer
* @buffer: The ring buffer to stop writes to.
* @cpu: The CPU buffer to stop
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2556d8c097d2..bbd5a94a7ef1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1378,6 +1378,12 @@ update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
arch_spin_lock(&tr->max_lock);
+ /* Inherit the recordable setting from trace_buffer */
+ if (ring_buffer_record_is_set_on(tr->trace_buffer.buffer))
+ ring_buffer_record_on(tr->max_buffer.buffer);
+ else
+ ring_buffer_record_off(tr->max_buffer.buffer);
+
swap(tr->trace_buffer.buffer, tr->max_buffer.buffer);
__update_max_tr(tr, tsk, cpu);
The existing code to carve up the sg list expected an sg element-per-page
which can be very incorrect with iommu's remapping multiple memory pages
to fewer bus addresses. To hit this error required a large io payload
(greater than 256k) and a system that maps on a per-page basis. It's
possible that large ios could get by fine if the system condensed the
sgl list into the first 64 elements.
This patch corrects the sg list handling by specifically walking the
sg list element by element and attempting to divide the transfer up
on a per-sg element boundary. While doing so, it still tries to keep
sequences under 256k, but will exceed that rule if a single sg element
is larger than 256k.
Fixes: 48fa362b6c3f ("nvmet-fc: simplify sg list handling")
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: James Smart <james.smart(a)broadcom.com>
---
drivers/nvme/target/fc.c | 44 +++++++++++++++++++++++++++++++++++---------
1 file changed, 35 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index 408279cb6f2c..29b4b236afd8 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -58,8 +58,8 @@ struct nvmet_fc_ls_iod {
struct work_struct work;
} __aligned(sizeof(unsigned long long));
+/* desired maximum for a single sequence - if sg list allows it */
#define NVMET_FC_MAX_SEQ_LENGTH (256 * 1024)
-#define NVMET_FC_MAX_XFR_SGENTS (NVMET_FC_MAX_SEQ_LENGTH / PAGE_SIZE)
enum nvmet_fcp_datadir {
NVMET_FCP_NODATA,
@@ -74,6 +74,7 @@ struct nvmet_fc_fcp_iod {
struct nvme_fc_cmd_iu cmdiubuf;
struct nvme_fc_ersp_iu rspiubuf;
dma_addr_t rspdma;
+ struct scatterlist *next_sg;
struct scatterlist *data_sg;
int data_sg_cnt;
u32 offset;
@@ -1025,8 +1026,7 @@ nvmet_fc_register_targetport(struct nvmet_fc_port_info *pinfo,
INIT_LIST_HEAD(&newrec->assoc_list);
kref_init(&newrec->ref);
ida_init(&newrec->assoc_cnt);
- newrec->max_sg_cnt = min_t(u32, NVMET_FC_MAX_XFR_SGENTS,
- template->max_sgl_segments);
+ newrec->max_sg_cnt = template->max_sgl_segments;
ret = nvmet_fc_alloc_ls_iodlist(newrec);
if (ret) {
@@ -1722,6 +1722,7 @@ nvmet_fc_alloc_tgt_pgs(struct nvmet_fc_fcp_iod *fod)
((fod->io_dir == NVMET_FCP_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE));
/* note: write from initiator perspective */
+ fod->next_sg = fod->data_sg;
return 0;
@@ -1866,24 +1867,49 @@ nvmet_fc_transfer_fcp_data(struct nvmet_fc_tgtport *tgtport,
struct nvmet_fc_fcp_iod *fod, u8 op)
{
struct nvmefc_tgt_fcp_req *fcpreq = fod->fcpreq;
+ struct scatterlist *sg = fod->next_sg;
unsigned long flags;
- u32 tlen;
+ u32 remaininglen = fod->req.transfer_len - fod->offset;
+ u32 tlen = 0;
int ret;
fcpreq->op = op;
fcpreq->offset = fod->offset;
fcpreq->timeout = NVME_FC_TGTOP_TIMEOUT_SEC;
- tlen = min_t(u32, tgtport->max_sg_cnt * PAGE_SIZE,
- (fod->req.transfer_len - fod->offset));
+ /*
+ * for next sequence:
+ * break at a sg element boundary
+ * attempt to keep sequence length capped at
+ * NVMET_FC_MAX_SEQ_LENGTH but allow sequence to
+ * be longer if a single sg element is larger
+ * than that amount. This is done to avoid creating
+ * a new sg list to use for the tgtport api.
+ */
+ fcpreq->sg = sg;
+ fcpreq->sg_cnt = 0;
+ while (tlen < remaininglen &&
+ fcpreq->sg_cnt < tgtport->max_sg_cnt &&
+ tlen + sg_dma_len(sg) < NVMET_FC_MAX_SEQ_LENGTH) {
+ fcpreq->sg_cnt++;
+ tlen += sg_dma_len(sg);
+ sg = sg_next(sg);
+ }
+ if (tlen < remaininglen && fcpreq->sg_cnt == 0) {
+ fcpreq->sg_cnt++;
+ tlen += min_t(u32, sg_dma_len(sg), remaininglen);
+ sg = sg_next(sg);
+ }
+ if (tlen < remaininglen)
+ fod->next_sg = sg;
+ else
+ fod->next_sg = NULL;
+
fcpreq->transfer_length = tlen;
fcpreq->transferred_length = 0;
fcpreq->fcp_error = 0;
fcpreq->rsplen = 0;
- fcpreq->sg = &fod->data_sg[fod->offset / PAGE_SIZE];
- fcpreq->sg_cnt = DIV_ROUND_UP(tlen, PAGE_SIZE);
-
/*
* If the last READDATA request: check if LLDD supports
* combined xfr with response.
--
2.13.1
I would like to speak with the person that managing photos for your
company?
We provide image editing like – photos cutting out and retouching.
Enhancing your images is just a part of what we can do for your business.
Whether you’re an ecommerce
store or portrait photographer, real estate professional, or an e-Retailer,
we are your personal team
of photo editors that integrate seamlessly with your business.
Our mainly services are:
. Cut out, masking, clipping path, deep etching, transparent background
. Colour correction, black and white, light and shadows etc.
. Dust cleaning, spot cleaning
. Beauty retouching, skin retouching, face retouching, body retouching
. Fashion/Beauty Image Retouching
. Product image Retouching
. Real estate image Retouching
. Wedding & Event Album Design.
. Restoration and repair old images
. Vector Conversion
. Portrait image Retouching
We can provide you editing test on your photos.
Please reply if you are interested.
Thanks,
Roland
hrtimer_cancel() busy-waits for the hrtimer callback to stop,
pretty much like del_timer_sync(). This creates a possible deadlock
scenario where we hold a spinlock before calling hrtimer_cancel()
while in trying to acquire the same spinlock in the callback.
This kind of deadlock is already known and is catchable by lockdep,
like for del_timer_sync(), we can add lockdep annotations. However,
it is still missing for hrtimer_cancel(). (I have a WIP patch to make
it complete for hrtimer_cancel() but it breaks booting.)
And there is such a deadlock scenario in kernel/events/core.c too,
well actually, it is a simpler version: the hrtimer callback waits
for itself to finish on the same CPU! It sounds stupid but it is
not obvious at all, it hides very deeply in the perf event code:
cpu_clock_event_init():
perf_swevent_init_hrtimer():
hwc->hrtimer.function = perf_swevent_hrtimer;
perf_swevent_hrtimer():
__perf_event_overflow():
__perf_event_account_interrupt():
perf_adjust_period():
pmu->stop():
cpu_clock_event_stop():
perf_swevent_cancel():
hrtimer_cancel()
Getting stuck in a timer doesn't sound very scary, however, in this
case, its consequences are a disaster:
perf_event_overflow() which calls __perf_event_overflow() is called
in NMI handler too, so it is racy with hrtimer callback as disabling
IRQ can't possibly disable NMI. This means this hrtimer callback
once interrupted by an NMI handler could deadlock within NMI!
As a further consequence, other IRQ handling is blocked too, notably
the IPI handler, especially when smp_call_function_*() waits for their
callbacks synchronously. This is why we saw so many soft lockup's in
smp_call_function_single() given how widely they are used in kernel.
Ironically, perf event code uses synchronous smp_call_function_single()
heavily too.
The fix is not easy. To minimize the impact, ideally we should just
avoid busy waiting when it is called within the hrtimer callback on
the same CPU, there is no reason to wait for itself to finish anyway.
Probably it doesn't even need to cancel itself either since it will
restart by pmu->start() later. There are two possible fixes here:
1. Modify hrtimer API to detect if a hrtimer callback is running
on the same CPU now. This does not look pretty though.
2. Passing some information from perf_swevent_hrtimer() down to
perf_swevent_cancel().
So I pick the latter approach, it is simple and straightforward.
Note, currently perf_swevent_hrtimer() still races with
perf_event_overflow() in NMI on the same CPU anyway, given there is no
lock around and probably locking does not even help. But it is nothing
new, and the race itself is not bad either, at most we have some
inconsistent updates on the event sample period.
Fixes: abd50713944c ("perf: Reimplement frequency driven sampling")
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
---
include/linux/perf_event.h | 3 +++
kernel/events/core.c | 43 +++++++++++++++++++++++++++----------------
2 files changed, 30 insertions(+), 16 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1fa12887ec02..aab39b8aa720 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -310,6 +310,9 @@ struct pmu {
#define PERF_EF_START 0x01 /* start the counter when adding */
#define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
#define PERF_EF_UPDATE 0x04 /* update the counter when stopping */
+#define PERF_EF_NO_WAIT 0x08 /* do not wait when stopping, for
+ * example, waiting for a timer
+ */
/*
* Adds/Removes a counter to/from the PMU, can be done inside a
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8f0434a9951a..f15832346b35 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3555,7 +3555,8 @@ do { \
static DEFINE_PER_CPU(int, perf_throttled_count);
static DEFINE_PER_CPU(u64, perf_throttled_seq);
-static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bool disable)
+static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count,
+ bool disable, bool nowait)
{
struct hw_perf_event *hwc = &event->hw;
s64 period, sample_period;
@@ -3574,8 +3575,13 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
hwc->sample_period = sample_period;
if (local64_read(&hwc->period_left) > 8*sample_period) {
- if (disable)
- event->pmu->stop(event, PERF_EF_UPDATE);
+ if (disable) {
+ int flags = PERF_EF_UPDATE;
+
+ if (nowait)
+ flags |= PERF_EF_NO_WAIT;
+ event->pmu->stop(event, flags);
+ }
local64_set(&hwc->period_left, 0);
@@ -3645,7 +3651,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
* twice.
*/
if (delta > 0)
- perf_adjust_period(event, period, delta, false);
+ perf_adjust_period(event, period, delta, false, false);
event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
next:
@@ -7681,7 +7687,8 @@ static void perf_log_itrace_start(struct perf_event *event)
}
static int
-__perf_event_account_interrupt(struct perf_event *event, int throttle)
+__perf_event_account_interrupt(struct perf_event *event, int throttle,
+ bool nowait)
{
struct hw_perf_event *hwc = &event->hw;
int ret = 0;
@@ -7710,7 +7717,8 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
hwc->freq_time_stamp = now;
if (delta > 0 && delta < 2*TICK_NSEC)
- perf_adjust_period(event, delta, hwc->last_period, true);
+ perf_adjust_period(event, delta, hwc->last_period, true,
+ nowait);
}
return ret;
@@ -7718,7 +7726,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
int perf_event_account_interrupt(struct perf_event *event)
{
- return __perf_event_account_interrupt(event, 1);
+ return __perf_event_account_interrupt(event, 1, false);
}
/*
@@ -7727,7 +7735,7 @@ int perf_event_account_interrupt(struct perf_event *event)
static int __perf_event_overflow(struct perf_event *event,
int throttle, struct perf_sample_data *data,
- struct pt_regs *regs)
+ struct pt_regs *regs, bool nowait)
{
int events = atomic_read(&event->event_limit);
int ret = 0;
@@ -7739,7 +7747,7 @@ static int __perf_event_overflow(struct perf_event *event,
if (unlikely(!is_sampling_event(event)))
return 0;
- ret = __perf_event_account_interrupt(event, throttle);
+ ret = __perf_event_account_interrupt(event, throttle, nowait);
/*
* XXX event_limit might not quite work as expected on inherited
@@ -7768,7 +7776,7 @@ int perf_event_overflow(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
{
- return __perf_event_overflow(event, 1, data, regs);
+ return __perf_event_overflow(event, 1, data, regs, true);
}
/*
@@ -7831,7 +7839,7 @@ static void perf_swevent_overflow(struct perf_event *event, u64 overflow,
for (; overflow; overflow--) {
if (__perf_event_overflow(event, throttle,
- data, regs)) {
+ data, regs, false)) {
/*
* We inhibit the overflow from happening when
* hwc->interrupts == MAX_INTERRUPTS.
@@ -9110,7 +9118,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
if (regs && !perf_exclude_event(event, regs)) {
if (!(event->attr.exclude_idle && is_idle_task(current)))
- if (__perf_event_overflow(event, 1, &data, regs))
+ if (__perf_event_overflow(event, 1, &data, regs, true))
ret = HRTIMER_NORESTART;
}
@@ -9141,7 +9149,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
HRTIMER_MODE_REL_PINNED);
}
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+static void perf_swevent_cancel_hrtimer(struct perf_event *event, bool sync)
{
struct hw_perf_event *hwc = &event->hw;
@@ -9149,7 +9157,10 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
local64_set(&hwc->period_left, ktime_to_ns(remaining));
- hrtimer_cancel(&hwc->hrtimer);
+ if (sync)
+ hrtimer_cancel(&hwc->hrtimer);
+ else
+ hrtimer_try_to_cancel(&hwc->hrtimer);
}
}
@@ -9200,7 +9211,7 @@ static void cpu_clock_event_start(struct perf_event *event, int flags)
static void cpu_clock_event_stop(struct perf_event *event, int flags)
{
- perf_swevent_cancel_hrtimer(event);
+ perf_swevent_cancel_hrtimer(event, flags & PERF_EF_NO_WAIT);
cpu_clock_event_update(event);
}
@@ -9277,7 +9288,7 @@ static void task_clock_event_start(struct perf_event *event, int flags)
static void task_clock_event_stop(struct perf_event *event, int flags)
{
- perf_swevent_cancel_hrtimer(event);
+ perf_swevent_cancel_hrtimer(event, flags & PERF_EF_NO_WAIT);
task_clock_event_update(event, event->ctx->time);
}
--
2.14.4
Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
allows the writeback rate to be faster if there is no I/O request on a
bcache device. It works well if there is only one bcache device attached
to the cache set. If there are many bcache devices attached to a cache
set, it may introduce performance regression because multiple faster
writeback threads of the idle bcache devices will compete the btree level
locks with the bcache device who have I/O requests coming.
This patch fixes the above issue by only permitting fast writebac when
all bcache devices attached on the cache set are idle. And if one of the
bcache devices has new I/O request coming, minimized all writeback
throughput immediately and let PI controller __update_writeback_rate()
to decide the upcoming writeback rate for each bcache device.
Also when all bcache devices are idle, limited wrieback rate to a small
number is wast of thoughput, especially when backing devices are slower
non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
rate for each backing device if the whole cache set is idle. A faster
writeback rate in idle time means new I/Os may have more available space
for dirty data, and people may observe a better write performance then.
Please note bcache may change its cache mode in run time, and this patch
still works if the cache mode is switched from writeback mode and there
is still dirty data on cache.
Fixes: Commit b1092c9af9ed ("bcache: allow quick writeback when backing idle")
Cc: stable(a)vger.kernel.org #4.16+
Signed-off-by: Coly Li <colyli(a)suse.de>
Tested-by: Kai Krakow <kai(a)kaishome.de>
Cc: Michael Lyle <mlyle(a)lyle.org>
Cc: Stefan Priebe <s.priebe(a)profihost.ag>
---
Channgelog:
v2, Fix a deadlock reported by Stefan Priebe.
v1, Initial version.
drivers/md/bcache/bcache.h | 11 ++--
drivers/md/bcache/request.c | 51 ++++++++++++++-
drivers/md/bcache/super.c | 1 +
drivers/md/bcache/sysfs.c | 14 +++--
drivers/md/bcache/util.c | 2 +-
drivers/md/bcache/util.h | 2 +-
drivers/md/bcache/writeback.c | 115 ++++++++++++++++++++++++++--------
7 files changed, 155 insertions(+), 41 deletions(-)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index d6bf294f3907..469ab1a955e0 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -328,13 +328,6 @@ struct cached_dev {
*/
atomic_t has_dirty;
- /*
- * Set to zero by things that touch the backing volume-- except
- * writeback. Incremented by writeback. Used to determine when to
- * accelerate idle writeback.
- */
- atomic_t backing_idle;
-
struct bch_ratelimit writeback_rate;
struct delayed_work writeback_rate_update;
@@ -514,6 +507,8 @@ struct cache_set {
struct cache_accounting accounting;
unsigned long flags;
+ atomic_t idle_counter;
+ atomic_t at_max_writeback_rate;
struct cache_sb sb;
@@ -523,6 +518,8 @@ struct cache_set {
struct bcache_device **devices;
unsigned devices_max_used;
+ /* See set_at_max_writeback_rate() for how it is used */
+ unsigned previous_dirty_dc_nr;
struct list_head cached_devs;
uint64_t cached_dev_sectors;
struct closure caching;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index ae67f5fa8047..1af3d96abfa5 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1104,6 +1104,43 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio)
/* Cached devices - read & write stuff */
+static void quit_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *this_dc)
+{
+ int i;
+ struct bcache_device *d;
+ struct cached_dev *dc;
+
+ /*
+ * If bch_register_lock is acquired by other attach/detach operations,
+ * waiting here will increase I/O request latency for seconds or more.
+ * To avoid such situation, only writeback rate of current cached device
+ * is set to 1, and __update_write_back() will decide writeback rate
+ * of other cached devices (remember c->idle_counter is 0 now).
+ */
+ if (mutex_trylock(&bch_register_lock)){
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ /*
+ * set writeback rate to default minimum value,
+ * then let update_writeback_rate() to decide the
+ * upcoming rate.
+ */
+ atomic64_set(&dc->writeback_rate.rate, 1);
+ }
+
+ mutex_unlock(&bch_register_lock);
+ } else
+ atomic64_set(&this_dc->writeback_rate.rate, 1);
+}
+
static blk_qc_t cached_dev_make_request(struct request_queue *q,
struct bio *bio)
{
@@ -1119,7 +1156,19 @@ static blk_qc_t cached_dev_make_request(struct request_queue *q,
return BLK_QC_T_NONE;
}
- atomic_set(&dc->backing_idle, 0);
+ if (d->c) {
+ atomic_set(&d->c->idle_counter, 0);
+ /*
+ * If at_max_writeback_rate of cache set is true and new I/O
+ * comes, quit max writeback rate of all cached devices
+ * attached to this cache set, and set at_max_writeback_rate
+ * to false.
+ */
+ if (unlikely(atomic_read(&d->c->at_max_writeback_rate) == 1)) {
+ atomic_set(&d->c->at_max_writeback_rate, 0);
+ quit_max_writeback_rate(d->c, dc);
+ }
+ }
generic_start_io_acct(q, rw, bio_sectors(bio), &d->disk->part0);
bio_set_dev(bio, dc->bdev);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index fa4058e43202..fa532d9f9353 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1687,6 +1687,7 @@ struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
c->block_bits = ilog2(sb->block_size);
c->nr_uuids = bucket_bytes(c) / sizeof(struct uuid_entry);
c->devices_max_used = 0;
+ c->previous_dirty_dc_nr = 0;
c->btree_pages = bucket_pages(c);
if (c->btree_pages > BTREE_MAX_PAGES)
c->btree_pages = max_t(int, c->btree_pages / 4,
diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 225b15aa0340..d719021bff81 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -170,7 +170,8 @@ SHOW(__bch_cached_dev)
var_printf(writeback_running, "%i");
var_print(writeback_delay);
var_print(writeback_percent);
- sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9);
+ sysfs_hprint(writeback_rate,
+ atomic64_read(&dc->writeback_rate.rate) << 9);
sysfs_hprint(io_errors, atomic_read(&dc->io_errors));
sysfs_printf(io_error_limit, "%i", dc->error_limit);
sysfs_printf(io_disable, "%i", dc->io_disable);
@@ -188,7 +189,8 @@ SHOW(__bch_cached_dev)
char change[20];
s64 next_io;
- bch_hprint(rate, dc->writeback_rate.rate << 9);
+ bch_hprint(rate,
+ atomic64_read(&dc->writeback_rate.rate) << 9);
bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9);
bch_hprint(target, dc->writeback_rate_target << 9);
bch_hprint(proportional,dc->writeback_rate_proportional << 9);
@@ -255,8 +257,12 @@ STORE(__cached_dev)
sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40);
- sysfs_strtoul_clamp(writeback_rate,
- dc->writeback_rate.rate, 1, INT_MAX);
+ if (attr == &sysfs_writeback_rate) {
+ int v;
+
+ sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
+ atomic64_set(&dc->writeback_rate.rate, v);
+ }
sysfs_strtoul_clamp(writeback_rate_update_seconds,
dc->writeback_rate_update_seconds,
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index fc479b026d6d..84f90c3d996d 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -200,7 +200,7 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
{
uint64_t now = local_clock();
- d->next += div_u64(done * NSEC_PER_SEC, d->rate);
+ d->next += div_u64(done * NSEC_PER_SEC, atomic64_read(&d->rate));
/* Bound the time. Don't let us fall further than 2 seconds behind
* (this prevents unnecessary backlog that would make it impossible
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index cced87f8eb27..7e17f32ab563 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -442,7 +442,7 @@ struct bch_ratelimit {
* Rate at which we want to do work, in units per second
* The units here correspond to the units passed to bch_next_delay()
*/
- uint32_t rate;
+ atomic64_t rate;
};
static inline void bch_ratelimit_reset(struct bch_ratelimit *d)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index ad45ebe1a74b..11ffadc3cf8f 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -49,6 +49,80 @@ static uint64_t __calc_target_rate(struct cached_dev *dc)
return (cache_dirty_target * bdev_share) >> WRITEBACK_SHARE_SHIFT;
}
+static bool set_at_max_writeback_rate(struct cache_set *c,
+ struct cached_dev *dc)
+{
+ int i, dirty_dc_nr = 0;
+ struct bcache_device *d;
+
+ /*
+ * bch_register_lock is acquired in cached_dev_detach_finish() before
+ * calling cancel_writeback_rate_update_dwork() to stop the delayed
+ * kworker writeback_rate_update (where the context we are for now).
+ * Therefore call mutex_lock() here may introduce deadlock when shut
+ * down the bcache device.
+ * c->previous_dirty_dc_nr is used to record previous calculated
+ * dirty_dc_nr when mutex_trylock() last time succeeded. Then if
+ * mutex_trylock() failed here, use c->previous_dirty_dc_nr as dirty
+ * cached device number. Of cause it might be inaccurate, but a few more
+ * or less loop before setting c->at_max_writeback_rate is much better
+ * then a deadlock here.
+ */
+ if (mutex_trylock(&bch_register_lock)) {
+ for (i = 0; i < c->devices_max_used; i++) {
+ if (!c->devices[i])
+ continue;
+ if (UUID_FLASH_ONLY(&c->uuids[i]))
+ continue;
+ d = c->devices[i];
+ dc = container_of(d, struct cached_dev, disk);
+ if (atomic_read(&dc->has_dirty))
+ dirty_dc_nr++;
+ }
+ c->previous_dirty_dc_nr = dirty_dc_nr;
+
+ mutex_unlock(&bch_register_lock);
+ } else
+ dirty_dc_nr = c->previous_dirty_dc_nr;
+
+ /*
+ * Idle_counter is increased everytime when update_writeback_rate()
+ * is rescheduled in. If all backing devices attached to the same
+ * cache set has same dc->writeback_rate_update_seconds value, it
+ * is about 10 rounds of update_writeback_rate() is called on each
+ * backing device, then the code will fall through at set 1 to
+ * c->at_max_writeback_rate, and a max wrteback rate to each
+ * dc->writeback_rate.rate. This is not very accurate but works well
+ * to make sure the whole cache set has no new I/O coming before
+ * writeback rate is set to a max number.
+ */
+ if (atomic_inc_return(&c->idle_counter) < dirty_dc_nr * 10)
+ return false;
+
+ if (atomic_read(&c->at_max_writeback_rate) != 1)
+ atomic_set(&c->at_max_writeback_rate, 1);
+
+
+ atomic64_set(&dc->writeback_rate.rate, INT_MAX);
+
+ /* keep writeback_rate_target as existing value */
+ dc->writeback_rate_proportional = 0;
+ dc->writeback_rate_integral_scaled = 0;
+ dc->writeback_rate_change = 0;
+
+ /*
+ * Check c->idle_counter and c->at_max_writeback_rate agagain in case
+ * new I/O arrives during before set_at_max_writeback_rate() returns.
+ * Then the writeback rate is set to 1, and its new value should be
+ * decided via __update_writeback_rate().
+ */
+ if (atomic_read(&c->idle_counter) < dirty_dc_nr * 10 ||
+ !atomic_read(&c->at_max_writeback_rate))
+ return false;
+
+ return true;
+}
+
static void __update_writeback_rate(struct cached_dev *dc)
{
/*
@@ -104,8 +178,9 @@ static void __update_writeback_rate(struct cached_dev *dc)
dc->writeback_rate_proportional = proportional_scaled;
dc->writeback_rate_integral_scaled = integral_scaled;
- dc->writeback_rate_change = new_rate - dc->writeback_rate.rate;
- dc->writeback_rate.rate = new_rate;
+ dc->writeback_rate_change = new_rate -
+ atomic64_read(&dc->writeback_rate.rate);
+ atomic64_set(&dc->writeback_rate.rate, new_rate);
dc->writeback_rate_target = target;
}
@@ -138,9 +213,16 @@ static void update_writeback_rate(struct work_struct *work)
down_read(&dc->writeback_lock);
- if (atomic_read(&dc->has_dirty) &&
- dc->writeback_percent)
- __update_writeback_rate(dc);
+ if (atomic_read(&dc->has_dirty) && dc->writeback_percent) {
+ /*
+ * If the whole cache set is idle, set_at_max_writeback_rate()
+ * will set writeback rate to a max number. Then it is
+ * unncessary to update writeback rate for an idle cache set
+ * in maximum writeback rate number(s).
+ */
+ if (!set_at_max_writeback_rate(c, dc))
+ __update_writeback_rate(dc);
+ }
up_read(&dc->writeback_lock);
@@ -422,27 +504,6 @@ static void read_dirty(struct cached_dev *dc)
delay = writeback_delay(dc, size);
- /* If the control system would wait for at least half a
- * second, and there's been no reqs hitting the backing disk
- * for awhile: use an alternate mode where we have at most
- * one contiguous set of writebacks in flight at a time. If
- * someone wants to do IO it will be quick, as it will only
- * have to contend with one operation in flight, and we'll
- * be round-tripping data to the backing disk as quickly as
- * it can accept it.
- */
- if (delay >= HZ / 2) {
- /* 3 means at least 1.5 seconds, up to 7.5 if we
- * have slowed way down.
- */
- if (atomic_inc_return(&dc->backing_idle) >= 3) {
- /* Wait for current I/Os to finish */
- closure_sync(&cl);
- /* And immediately launch a new set. */
- delay = 0;
- }
- }
-
while (!kthread_should_stop() &&
!test_bit(CACHE_SET_IO_DISABLE, &dc->disk.c->flags) &&
delay) {
@@ -715,7 +776,7 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc)
dc->writeback_running = true;
dc->writeback_percent = 10;
dc->writeback_delay = 30;
- dc->writeback_rate.rate = 1024;
+ atomic64_set(&dc->writeback_rate.rate, 1024);
dc->writeback_rate_minimum = 8;
dc->writeback_rate_update_seconds = WRITEBACK_RATE_UPDATE_SECS_DEFAULT;
--
2.17.1
Changes since v5 [1]:
* Move put_page() before memory_failure() in madvise_inject_error()
(Naoya)
* The previous change uncovered a latent bug / broken assumption in
__put_devmap_managed_page(). We need to preserve page->mapping for
dax pages when they go idle.
* Rename mapping_size() to dev_pagemap_mapping_size() (Naoya)
* Catch and fail attempts to soft-offline dax pages (Naoya)
* Collect Naoya's ack on "mm, memory_failure: Collect mapping size in
collect_procs()"
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-July/016682.html
---
As it stands, memory_failure() gets thoroughly confused by dev_pagemap
backed mappings. The recovery code has specific enabling for several
possible page states and needs new enabling to handle poison in dax
mappings.
In order to support reliable reverse mapping of user space addresses:
1/ Add new locking in the memory_failure() rmap path to prevent races
that would typically be handled by the page lock.
2/ Since dev_pagemap pages are hidden from the page allocator and the
"compound page" accounting machinery, add a mechanism to determine the
size of the mapping that encompasses a given poisoned pfn.
3/ Given pmem errors can be repaired, change the speculatively accessed
poison protection, mce_unmap_kpfn(), to be reversible and otherwise
allow ongoing access from the kernel.
A side effect of this enabling is that MADV_HWPOISON becomes usable for
dax mappings, however the primary motivation is to allow the system to
survive userspace consumption of hardware-poison via dax. Specifically
the current behavior is:
mce: Uncorrected hardware memory error in user-access at af34214200
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
mce: [Hardware Error]: Machine check events logged
{1}[Hardware Error]: event severity: corrected
Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
[..]
Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
mce: Memory error not recovered
<reboot>
...and with these changes:
Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
Memory failure: 0x20cb00: recovery action for dax page: Recovered
Given all the cross dependencies I propose taking this through
nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
folks.
---
Dan Williams (13):
device-dax: Convert to vmf_insert_mixed and vm_fault_t
device-dax: Enable page_mapping()
device-dax: Set page->index
filesystem-dax: Set page->index
mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
mm, dev_pagemap: Do not clear ->mapping on final put
mm, madvise_inject_error: Let memory_failure() optionally take a page reference
mm, memory_failure: Collect mapping size in collect_procs()
filesystem-dax: Introduce dax_lock_mapping_entry()
mm, memory_failure: Teach memory_failure() about dev_pagemap pages
x86/mm/pat: Prepare {reserve,free}_memtype() for "decoy" addresses
x86/memory_failure: Introduce {set,clear}_mce_nospec()
libnvdimm, pmem: Restore page attributes when clearing errors
arch/x86/include/asm/set_memory.h | 42 ++++++
arch/x86/kernel/cpu/mcheck/mce-internal.h | 15 --
arch/x86/kernel/cpu/mcheck/mce.c | 38 -----
arch/x86/mm/pat.c | 16 ++
drivers/dax/device.c | 75 +++++++---
drivers/nvdimm/pmem.c | 26 ++++
drivers/nvdimm/pmem.h | 13 ++
fs/dax.c | 125 ++++++++++++++++-
include/linux/dax.h | 13 ++
include/linux/huge_mm.h | 5 -
include/linux/mm.h | 1
include/linux/set_memory.h | 14 ++
kernel/memremap.c | 1
mm/hmm.c | 2
mm/huge_memory.c | 4 -
mm/madvise.c | 16 ++
mm/memory-failure.c | 210 +++++++++++++++++++++++------
17 files changed, 481 insertions(+), 135 deletions(-)
error_entry and error_exit communicate the user vs kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, The
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX got zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed.]
[Note to stable maintainers: this should probably get applied to all
kernels. If you're nervous about that, a more conservative fix to
add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should
also fix the problem.]
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: xen-devel(a)lists.xenproject.org
Cc: x86(a)kernel.org
Cc: stable(a)vger.kernel.org
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Reported-and-tested-by: "M. Vefa Bicakci" <m.v.b(a)runbox.com>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
I could also submit the conservative fix tagged for -stable and respin
this on top of it. Ingo, Greg, what do you prefer?
arch/x86/entry/entry_64.S | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 73a522d53b53..8ae7ffda8f98 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -981,7 +981,7 @@ ENTRY(\sym)
call \do_sym
- jmp error_exit /* %ebx: no swapgs flag */
+ jmp error_exit
.endif
END(\sym)
.endm
@@ -1222,7 +1222,6 @@ END(paranoid_exit)
/*
* Save all registers in pt_regs, and switch GS if needed.
- * Return: EBX=0: came from user mode; EBX=1: otherwise
*/
ENTRY(error_entry)
UNWIND_HINT_FUNC
@@ -1269,7 +1268,6 @@ ENTRY(error_entry)
* for these here too.
*/
.Lerror_kernelspace:
- incl %ebx
leaq native_irq_return_iret(%rip), %rcx
cmpq %rcx, RIP+8(%rsp)
je .Lerror_bad_iret
@@ -1303,28 +1301,20 @@ ENTRY(error_entry)
/*
* Pretend that the exception came from user mode: set up pt_regs
- * as if we faulted immediately after IRET and clear EBX so that
- * error_exit knows that we will be returning to user mode.
+ * as if we faulted immediately after IRET.
*/
mov %rsp, %rdi
call fixup_bad_iret
mov %rax, %rsp
- decl %ebx
jmp .Lerror_entry_from_usermode_after_swapgs
END(error_entry)
-
-/*
- * On entry, EBX is a "return to kernel mode" flag:
- * 1: already in kernel mode, don't need SWAPGS
- * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode
- */
ENTRY(error_exit)
UNWIND_HINT_REGS
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
- testl %ebx, %ebx
- jnz retint_kernel
+ testb $3, CS(%rsp)
+ jz retint_kernel
jmp retint_user
END(error_exit)
--
2.17.1
The patch titled
Subject: mm: memcg: fix use after free in mem_cgroup_iter()
has been removed from the -mm tree. Its filename was
mm-memcg-fix-use-after-free-in-mem_cgroup_iter.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jing Xia <jing.xia.mail(a)gmail.com>
Subject: mm: memcg: fix use after free in mem_cgroup_iter()
It was reported that a kernel crash happened in mem_cgroup_iter(), which
can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
......
Call trace:
mem_cgroup_iter+0x2e0/0x6d4
shrink_zone+0x8c/0x324
balance_pgdat+0x450/0x640
kswapd+0x130/0x4b8
kthread+0xe8/0xfc
ret_from_fork+0x10/0x20
mem_cgroup_iter():
......
if (css_tryget(css)) <-- crash here
break;
......
The crashing reason is that mem_cgroup_iter() uses the memcg object whose
pointer is stored in iter->position, which has been freed before and
filled with POISON_FREE(0x6b).
And the root cause of the use-after-free issue is that
invalidate_reclaim_iterators() fails to reset the value of iter->position
to NULL when the css of the memcg is released in non- hierarchical mode.
Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.…
Fixes: 6df38689e0e9 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
Signed-off-by: Jing Xia <jing.xia.mail(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <chunyan.zhang(a)unisoc.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/memcontrol.c~mm-memcg-fix-use-after-free-in-mem_cgroup_iter mm/memcontrol.c
--- a/mm/memcontrol.c~mm-memcg-fix-use-after-free-in-mem_cgroup_iter
+++ a/mm/memcontrol.c
@@ -850,7 +850,7 @@ static void invalidate_reclaim_iterators
int nid;
int i;
- while ((memcg = parent_mem_cgroup(memcg))) {
+ for (; memcg; memcg = parent_mem_cgroup(memcg)) {
for_each_node(nid) {
mz = mem_cgroup_nodeinfo(memcg, nid);
for (i = 0; i <= DEF_PRIORITY; i++) {
_
Patches currently in -mm which might be from jing.xia.mail(a)gmail.com are
The patch titled
Subject: mm/huge_memory.c: fix data loss when splitting a file pmd
has been removed from the -mm tree. Its filename was
thp-fix-data-loss-when-splitting-a-file-pmd.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory.c: fix data loss when splitting a file pmd
__split_huge_pmd_locked() must check if the cleared huge pmd was dirty,
and propagate that to PageDirty: otherwise, data may be lost when a huge
tmpfs page is modified then split then reclaimed.
How has this taken so long to be noticed? Because there was no problem
when the huge page is written by a write system call (shmem_write_end()
calls set_page_dirty()), nor when the page is allocated for a write fault
(fault_dirty_shared_page() calls set_page_dirty()); but when allocated for
a read fault (which MAP_POPULATE simulates), no set_page_dirty().
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils
Fixes: d21b9e57c74c ("thp: handle file pages in split_huge_pmd()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Reported-by: Ashwin Chaugule <ashwinch(a)google.com>
Reviewed-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
diff -puN mm/huge_memory.c~thp-fix-data-loss-when-splitting-a-file-pmd mm/huge_memory.c
--- a/mm/huge_memory.c~thp-fix-data-loss-when-splitting-a-file-pmd
+++ a/mm/huge_memory.c
@@ -2084,6 +2084,8 @@ static void __split_huge_pmd_locked(stru
if (vma_is_dax(vma))
return;
page = pmd_page(_pmd);
+ if (!PageDirty(page) && pmd_dirty(_pmd))
+ set_page_dirty(page);
if (!PageReferenced(page) && pmd_young(_pmd))
SetPageReferenced(page);
page_remove_rmap(page, true);
_
Patches currently in -mm which might be from hughd(a)google.com are