From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/vmscan: wake up flushers for legacy cgroups too
Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter dirty
pages on the LRU") added flusher invocation to shrink_inactive_list() when
many dirty pages on the LRU are encountered.
However, shrink_inactive_list() doesn't wake up flushers for legacy cgroup
reclaim, so the next commit bbef938429f5 ("mm: vmscan: remove old flusher
wakeup from direct reclaim path") removed the only source of flusher
wakeups in the legacy memcg reclaim path.
This leads to premature OOM if there are too many dirty pages in the cgroup:
# mkdir /sys/fs/cgroup/memory/test
# echo $$ > /sys/fs/cgroup/memory/test/tasks
# echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
# dd if=/dev/zero of=tmp_file bs=1M count=100
Killed
dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Call Trace:
dump_stack+0x46/0x65
dump_header+0x6b/0x2ac
oom_kill_process+0x21c/0x4a0
out_of_memory+0x2a5/0x4b0
mem_cgroup_out_of_memory+0x3b/0x60
mem_cgroup_oom_synchronize+0x2ed/0x330
pagefault_out_of_memory+0x24/0x54
__do_page_fault+0x521/0x540
page_fault+0x45/0x50
Task in /test killed as a result of limit of /test
memory: usage 51200kB, limit 51200kB, failcnt 73
memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Wake up flushers in legacy cgroup reclaim too.
Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
mm/vmscan.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff -puN mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too
+++ a/mm/vmscan.c
@@ -1780,6 +1780,20 @@ shrink_inactive_list(unsigned long nr_to
set_bit(PGDAT_WRITEBACK, &pgdat->flags);
/*
+ * If dirty pages are scanned that are not queued for IO, it
+ * implies that flushers are not doing their job. This can
+ * happen when memory pressure pushes dirty pages to the end of
+ * the LRU before the dirty limits are breached and the dirty
+ * data has expired. It can also happen when the proportion of
+ * dirty pages grows not through writes but through memory
+ * pressure reclaiming all the clean cache. And in some cases,
+ * the flushers simply cannot keep up with the allocation
+ * rate. Nudge the flusher threads in case they are asleep.
+ */
+ if (stat.nr_unqueued_dirty == nr_taken)
+ wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+ /*
* Legacy memcg will stall in page writeback so avoid forcibly
* stalling here.
*/
@@ -1791,22 +1805,9 @@ shrink_inactive_list(unsigned long nr_to
if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
set_bit(PGDAT_CONGESTED, &pgdat->flags);
- /*
- * If dirty pages are scanned that are not queued for IO, it
- * implies that flushers are not doing their job. This can
- * happen when memory pressure pushes dirty pages to the end of
- * the LRU before the dirty limits are breached and the dirty
- * data has expired. It can also happen when the proportion of
- * dirty pages grows not through writes but through memory
- * pressure reclaiming all the clean cache. And in some cases,
- * the flushers simply cannot keep up with the allocation
- * rate. Nudge the flusher threads in case they are asleep, but
- * also allow kswapd to start writing pages during reclaim.
- */
- if (stat.nr_unqueued_dirty == nr_taken) {
- wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /* Allow kswapd to start writing pages during reclaim. */
+ if (stat.nr_unqueued_dirty == nr_taken)
set_bit(PGDAT_DIRTY, &pgdat->flags);
- }
/*
* If kswapd scans pages marked marked for immediate
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
shmem_unused_huge_shrink() gets called from the reclaim path. Waiting for
the page lock there may lead to a deadlock.
There was a bug report that may be attributed to this:
http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.…
Replace lock_page() with trylock_page() and skip the page if we fail to
lock it. We will get to the page on the next scan.
We can test PageTransHuge() outside the page lock, as we only need
protection against splitting the page under us. Holding a pin on the page
is enough for this.
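As a simplified sketch of the non-blocking pattern this patch (and the
deferred_split_scan() fix below) applies -- the helper name here is made up
for illustration and is not part of the patch:
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/huge_mm.h>
static int example_split_if_lockable(struct page *page)
{
	int ret;
	/* Never sleep on the page lock from a reclaim/shrinker path. */
	if (!trylock_page(page)) {
		put_page(page);
		return -EAGAIN;	/* leave the page for the next scan */
	}
	ret = split_huge_page(page);
	unlock_page(page);
	put_page(page);
	return ret;
}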
Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel…
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Eric Wheeler <linux-mm(a)lists.ewheeler.net>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <>
---
mm/shmem.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff -puN mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink mm/shmem.c
--- a/mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink
+++ a/mm/shmem.c
@@ -493,36 +493,45 @@ next:
info = list_entry(pos, struct shmem_inode_info, shrinklist);
inode = &info->vfs_inode;
- if (nr_to_split && split >= nr_to_split) {
- iput(inode);
- continue;
- }
+ if (nr_to_split && split >= nr_to_split)
+ goto leave;
- page = find_lock_page(inode->i_mapping,
+ page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
if (!page)
goto drop;
+ /* No huge page at the end of the file: nothing to split */
if (!PageTransHuge(page)) {
- unlock_page(page);
put_page(page);
goto drop;
}
+ /*
+ * Leave the inode on the list if we failed to lock
+ * the page at this time.
+ *
+ * Waiting for the lock may lead to deadlock in the
+ * reclaim path.
+ */
+ if (!trylock_page(page)) {
+ put_page(page);
+ goto leave;
+ }
+
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- if (ret) {
- /* split failed: leave it on the list */
- iput(inode);
- continue;
- }
+ /* If split failed leave the inode on the list */
+ if (ret)
+ goto leave;
split++;
drop:
list_del_init(&info->shrinklist);
removed++;
+leave:
iput(inode);
}
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/thp: do not wait for lock_page() in deferred_split_scan()
deferred_split_scan() gets called from the reclaim path. Waiting for the
page lock there may lead to a deadlock.
Replace lock_page() with trylock_page() and skip the page if we fail to
lock it. We will get to the page on the next scan.
Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel…
Fixes: 9a982250f773 ("thp: introduce deferred_split_huge_page()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
mm/huge_memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan mm/huge_memory.c
--- a/mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan
+++ a/mm/huge_memory.c
@@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan
list_for_each_safe(pos, next, &list) {
page = list_entry((void *)pos, struct page, mapping);
- lock_page(page);
+ if (!trylock_page(page))
+ goto next;
/* split_huge_page() removes page from list on success */
if (!split_huge_page(page))
split++;
unlock_page(page);
+next:
put_page(page);
}
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
khugepaged is not yet able to convert PTE-mapped huge pages back to
PMD-mapped ones, so we do not collapse such pages. See the check in
khugepaged_scan_pmd().
But if, between khugepaged_scan_pmd() and __collapse_huge_page_isolate(),
somebody manages to instantiate a THP in the range and then split the PMD
back to PTEs, we have a problem: VM_BUG_ON_PAGE(PageCompound(page)) will
be triggered.
This is possible since we drop mmap_sem during collapse to re-take it for
write.
Replace the VM_BUG_ON() with a graceful collapse failure.
Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel…
Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Jerome Marchand <jmarchan(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
mm/khugepaged.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail mm/khugepaged.c
--- a/mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail
+++ a/mm/khugepaged.c
@@ -530,7 +530,12 @@ static int __collapse_huge_page_isolate(
goto out;
}
- VM_BUG_ON_PAGE(PageCompound(page), page);
+ /* TODO: teach khugepaged to collapse THP mapped with pte */
+ if (PageCompound(page)) {
+ result = SCAN_PAGE_COMPOUND;
+ goto out;
+ }
+
VM_BUG_ON_PAGE(!PageAnon(page), page);
/*
_
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: h8300: remove extraneous __BIG_ENDIAN definition
A bugfix I did earlier caused a build regression on h8300, which defines
the __BIG_ENDIAN macro in a slightly different way than the generic code:
arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined
We don't need to define it here, as the same macro is already provided by
linux/byteorder/big_endian.h, and that version does not conflict.
While this is a v4.16 regression, my earlier patch also got backported to
the 4.14 and 4.15 stable kernels, so we need the fixup there as well.
Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
Fixes: 101110f6271c ("Kbuild: always define endianess in kconfig.h")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Yoshinori Sato <ysato(a)users.sourceforge.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
arch/h8300/include/asm/byteorder.h | 1 -
1 file changed, 1 deletion(-)
diff -puN arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition arch/h8300/include/asm/byteorder.h
--- a/arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition
+++ a/arch/h8300/include/asm/byteorder.h
@@ -2,7 +2,6 @@
#ifndef __H8300_BYTEORDER_H__
#define __H8300_BYTEORDER_H__
-#define __BIG_ENDIAN __ORDER_BIG_ENDIAN__
#include <linux/byteorder/big_endian.h>
#endif
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: check for pgoff value overflow
A vma with vm_pgoff large enough to overflow a loff_t type when converted
to a byte offset can be passed via the remap_file_pages system call. The
hugetlbfs mmap routine uses the byte offset to calculate reservations and
file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when the task exits or the file is closed:
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start), which leaves invalid state behind and
triggers the BUG.
The previous overflow fix to this code was incomplete and did not take the
remap_file_pages system call into account.
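As a worked illustration of the overflow (a standalone userspace sketch,
assuming 4K pages so PAGE_SHIFT == 12 and a 64-bit loff_t; this is not
kernel code): the page offset from the reproducer, 0x20000000000000 (2^53),
shifted left by PAGE_SHIFT gives 2^65, which wraps to 0 in 64 bits, so the
old "(loff_t)vm_pgoff << PAGE_SHIFT < 0" check never sees a negative value,
while masking the upper PAGE_SHIFT + 1 bits does catch it:
#include <stdint.h>
#include <stdio.h>
int main(void)
{
	const unsigned int page_shift = 12;		/* assumed 4K pages */
	const uint64_t pgoff = 0x20000000000000ULL;	/* from the reproducer */
	uint64_t byte_off = pgoff << page_shift;	/* 2^65 wraps to 0 */
	/* the upper page_shift + 1 bits of a 64-bit value */
	uint64_t mask = ((1ULL << (page_shift + 1)) - 1) << (64 - (page_shift + 1));
	printf("byte offset = %llu (old '< 0' check cannot reject it)\n",
	       (unsigned long long)byte_off);
	printf("pgoff & mask = %#llx (new check rejects it)\n",
	       (unsigned long long)(pgoff & mask));
	return 0;
}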
[mike.kravetz(a)oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm(a)linux-foundation.org: include mmdebug.h]
[akpm(a)linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Nic Losby <blurbdust(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Yisheng Xie <xieyisheng1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
fs/hugetlbfs/inode.c | 17 ++++++++++++++---
mm/hugetlb.c | 7 +++++++
2 files changed, 21 insertions(+), 3 deletions(-)
diff -puN fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/fs/hugetlbfs/inode.c
@@ -108,6 +108,16 @@ static void huge_pagevec_release(struct
pagevec_reinit(pvec);
}
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls. This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value. The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX \
+ (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
+
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file_inode(file);
@@ -127,12 +137,13 @@ static int hugetlbfs_file_mmap(struct fi
vma->vm_ops = &hugetlb_vm_ops;
/*
- * Offset passed to mmap (before page shift) could have been
- * negative when represented as a (l)off_t.
+ * page based offset in vm_pgoff could be sufficiently large to
+ * overflow a (l)off_t when converted to byte offset.
*/
- if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+ if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
return -EINVAL;
+ /* must be huge page aligned */
if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
return -EINVAL;
diff -puN mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow mm/hugetlb.c
--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/sysfs.h>
#include <linux/slab.h>
+#include <linux/mmdebug.h>
#include <linux/sched/signal.h>
#include <linux/rmap.h>
#include <linux/string_helpers.h>
@@ -4374,6 +4375,12 @@ int hugetlb_reserve_pages(struct inode *
struct resv_map *resv_map;
long gbl_reserve;
+ /* This should never happen */
+ if (from > to) {
+ VM_WARN(1, "%s called with a negative range\n", __func__);
+ return -EINVAL;
+ }
+
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
_
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: lockdep: fix fs_reclaim warning
Dave Jones reported fs_reclaim lockdep warnings.
============================================
WARNING: possible recursive locking detected
4.15.0-rc9-backup-debug+ #1 Not tainted
--------------------------------------------
sshd/24800 is trying to acquire lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
but task is already holding lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(fs_reclaim);
lock(fs_reclaim);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by sshd/24800:
#0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
#1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
stack backtrace:
CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
Call Trace:
dump_stack+0xbc/0x13f
__lock_acquire+0xa09/0x2040
lock_acquire+0x12e/0x350
fs_reclaim_acquire.part.102+0x29/0x30
kmem_cache_alloc+0x3d/0x2c0
alloc_extent_state+0xa7/0x410
__clear_extent_bit+0x3ea/0x570
try_release_extent_mapping+0x21a/0x260
__btrfs_releasepage+0xb0/0x1c0
btrfs_releasepage+0x161/0x170
try_to_release_page+0x162/0x1c0
shrink_page_list+0x1d5a/0x2fb0
shrink_inactive_list+0x451/0x940
shrink_node_memcg.constprop.88+0x4c9/0x5e0
shrink_node+0x12d/0x260
try_to_free_pages+0x418/0xaf0
__alloc_pages_slowpath+0x976/0x1790
__alloc_pages_nodemask+0x52c/0x5c0
new_slab+0x374/0x3f0
___slab_alloc.constprop.81+0x47e/0x5a0
__slab_alloc.constprop.80+0x32/0x60
__kmalloc_track_caller+0x267/0x310
__kmalloc_reserve.isra.40+0x29/0x80
__alloc_skb+0xee/0x390
sk_stream_alloc_skb+0xb8/0x340
tcp_sendmsg_locked+0x8e6/0x1d30
tcp_sendmsg+0x27/0x40
inet_sendmsg+0xd0/0x310
sock_write_iter+0x17a/0x240
__vfs_write+0x2ab/0x380
vfs_write+0xfb/0x260
SyS_write+0xb6/0x140
do_syscall_64+0x1e5/0xc05
entry_SYSCALL64_slow_path+0x25/0x25
This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep:
Rework FS_RECLAIM annotation"), which replaced
lockdep_set_current_reclaim_state()/lockdep_clear_current_reclaim_state()
in __perform_reclaim() and lockdep_trace_alloc() in slab_pre_alloc_hook()
with fs_reclaim_acquire()/fs_reclaim_release(). Since __kmalloc_reserve()
from __alloc_skb() adds __GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, and
the reclaim path simply propagates __GFP_NOMEMALLOC, fs_reclaim_acquire()
in slab_pre_alloc_hook() tries to grab the 'fake' lock again while
__perform_reclaim() already holds it.
The
/* this guy won't enter reclaim */
if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
return false;
test, which causes slab_pre_alloc_hook() to try to grab the 'fake' lock, was
added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context
(__GFP_NOFS)"). But that test is outdated, because a PF_MEMALLOC thread
won't enter reclaim regardless of __GFP_NOMEMALLOC after commit
341ce06f69abfafa ("page allocator: calculate the alloc_flags for
allocation only once") added the PF_MEMALLOC safeguard (
/* Avoid recursion of direct reclaim */
if (p->flags & PF_MEMALLOC)
goto nopage;
in __alloc_pages_slowpath()).
Thus, fix the outdated test by removing the __GFP_NOMEMALLOC check and
allowing __need_fs_reclaim() to return false.
Link: http://lkml.kernel.org/r/201802280650.FJC73911.FOSOMLJVFFQtHO@I-love.SAKURA…
Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation")
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Tested-by: Dave Jones <davej(a)codemonkey.org.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Nick Piggin <npiggin(a)gmail.com>
Cc: Ingo Molnar <mingo(a)elte.hu>
Cc: Nikolay Borisov <nborisov(a)suse.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/page_alloc.c~lockdep-fix-fs_reclaim-warning mm/page_alloc.c
--- a/mm/page_alloc.c~lockdep-fix-fs_reclaim-warning
+++ a/mm/page_alloc.c
@@ -3596,7 +3596,7 @@ static bool __need_fs_reclaim(gfp_t gfp_
return false;
/* this guy won't enter reclaim */
- if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+ if (current->flags & PF_MEMALLOC)
return false;
/* We're only interested __GFP_FS allocations for now */
_
This is a note to let you know that I've just added the patch titled
RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
rdma-vmw_pvrdma-fix-usage-of-user-response-structures-in-abi-file.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 1f5a6c47aabc4606f91ad2e6ef71a1ff1924101c Mon Sep 17 00:00:00 2001
From: Adit Ranadive <aditr(a)vmware.com>
Date: Thu, 15 Feb 2018 12:36:46 -0800
Subject: RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
From: Adit Ranadive <aditr(a)vmware.com>
commit 1f5a6c47aabc4606f91ad2e6ef71a1ff1924101c upstream.
This ensures that we return the right structures to userspace. Otherwise,
it looks like the reserved fields in the response structures seen by
userspace might contain uninitialized data.
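As an illustrative sketch of the pattern the fix applies (the structure and
function names below are invented for illustration, not the driver's real
ABI types): zero-initialize the full response structure and copy it back in
one piece, so reserved fields can never carry uninitialized stack data:
#include <linux/types.h>
#include <rdma/ib_verbs.h>
struct example_create_resp {
	__u32 handle;
	__u32 reserved;
};
static int example_copy_resp(struct ib_udata *udata, u32 handle)
{
	struct example_create_resp resp = {0};	/* reserved is guaranteed zero */
	resp.handle = handle;
	return ib_copy_to_udata(udata, &resp, sizeof(resp));
}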
Fixes: 8b10ba783c9d ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Bryan Tan <bryantan(a)vmware.com>
Reviewed-by: Aditya Sarwade <asarwade(a)vmware.com>
Reviewed-by: Jorgen Hansen <jhansen(a)vmware.com>
Signed-off-by: Adit Ranadive <aditr(a)vmware.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c | 4 +++-
drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -114,6 +114,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
union pvrdma_cmd_resp rsp;
struct pvrdma_cmd_create_cq *cmd = &req.create_cq;
struct pvrdma_cmd_create_cq_resp *resp = &rsp.create_cq_resp;
+ struct pvrdma_create_cq_resp cq_resp = {0};
struct pvrdma_create_cq ucmd;
BUILD_BUG_ON(sizeof(struct pvrdma_cqe) != 64);
@@ -198,6 +199,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
cq->ibcq.cqe = resp->cqe;
cq->cq_handle = resp->cq_handle;
+ cq_resp.cqn = resp->cq_handle;
spin_lock_irqsave(&dev->cq_tbl_lock, flags);
dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
@@ -206,7 +208,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
cq->uar = &(to_vucontext(context)->uar);
/* Copy udata back. */
- if (ib_copy_to_udata(udata, &cq->cq_handle, sizeof(__u32))) {
+ if (ib_copy_to_udata(udata, &cq_resp, sizeof(cq_resp))) {
dev_warn(&dev->pdev->dev,
"failed to copy back udata\n");
pvrdma_destroy_cq(&cq->ibcq);
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
@@ -444,6 +444,7 @@ struct ib_pd *pvrdma_alloc_pd(struct ib_
union pvrdma_cmd_resp rsp;
struct pvrdma_cmd_create_pd *cmd = &req.create_pd;
struct pvrdma_cmd_create_pd_resp *resp = &rsp.create_pd_resp;
+ struct pvrdma_alloc_pd_resp pd_resp = {0};
int ret;
void *ptr;
@@ -472,9 +473,10 @@ struct ib_pd *pvrdma_alloc_pd(struct ib_
pd->privileged = !context;
pd->pd_handle = resp->pd_handle;
pd->pdn = resp->pd_handle;
+ pd_resp.pdn = resp->pd_handle;
if (context) {
- if (ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32))) {
+ if (ib_copy_to_udata(udata, &pd_resp, sizeof(pd_resp))) {
dev_warn(&dev->pdev->dev,
"failed to copy back protection domain\n");
pvrdma_dealloc_pd(&pd->ibpd);
Patches currently in stable-queue which might be from aditr(a)vmware.com are
queue-4.14/rdma-vmw_pvrdma-fix-usage-of-user-response-structures-in-abi-file.patch
This is a note to let you know that I've just added the patch titled
kbuild: fix linker feature test macros when cross compiling with Clang
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-fix-linker-feature-test-macros-when-cross-compiling-with-clang.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 86a9df597cdd564d2d29c65897bcad42519e3678 Mon Sep 17 00:00:00 2001
From: Nick Desaulniers <ndesaulniers(a)google.com>
Date: Mon, 6 Nov 2017 10:47:54 -0800
Subject: kbuild: fix linker feature test macros when cross compiling with Clang
From: Nick Desaulniers <ndesaulniers(a)google.com>
commit 86a9df597cdd564d2d29c65897bcad42519e3678 upstream.
I was not seeing my linker flags getting added when using ld-option while
cross compiling with Clang. Upon investigation, this seems to be due to
a difference in how GCC and Clang handle cross compilation.
GCC is configured at build time to support one backend, which is implicit
when compiling. Clang is explicit via the use of `-target <triple>` and
ships with all supported backends by default.
GNU Make feature test macros that compile then link will always fail
when cross compiling with Clang unless Clang's triple is passed along to
the compiler. For example:
$ clang -x c /dev/null -c -o temp.o
$ aarch64-linux-android/bin/ld -E temp.o
aarch64-linux-android/bin/ld:
unknown architecture of input file `temp.o' is incompatible with
aarch64 output
aarch64-linux-android/bin/ld:
warning: cannot find entry symbol _start; defaulting to
0000000000400078
$ echo $?
1
$ clang -target aarch64-linux-android- -x c /dev/null -c -o temp.o
$ aarch64-linux-android/bin/ld -E temp.o
aarch64-linux-android/bin/ld:
warning: cannot find entry symbol _start; defaulting to 00000000004002e4
$ echo $?
0
This causes conditional checks that invoke $(CC) without the target
triple, then $(LD) on the result, to always fail.
Suggested-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Matthias Kaehlcke <mka(a)chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Greg Hackmann <ghackmann(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
scripts/Kbuild.include | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -160,12 +160,13 @@ cc-if-fullversion = $(shell [ $(cc-fullv
# cc-ldoption
# Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
cc-ldoption = $(call try-run,\
- $(CC) $(1) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2))
+ $(CC) $(1) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2))
# ld-option
# Usage: LDFLAGS += $(call ld-option, -X)
ld-option = $(call try-run,\
- $(CC) -x c /dev/null -c -o "$$TMPO" ; $(LD) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
+ $(CC) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -x c /dev/null -c -o "$$TMPO"; \
+ $(LD) $(LDFLAGS) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
# ar-option
# Usage: KBUILD_ARFLAGS := $(call ar-option,D)
Patches currently in stable-queue which might be from ndesaulniers(a)google.com are
queue-4.14/kbuild-fix-linker-feature-test-macros-when-cross-compiling-with-clang.patch
This is a note to let you know that I've just added the patch titled
IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx5-fix-out-of-bounds-read-in-create_raw_packet_qp_rq.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 2c292dbb398ee46fc1343daf6c3cf9715a75688e Mon Sep 17 00:00:00 2001
From: Boris Pismenny <borisp(a)mellanox.com>
Date: Thu, 8 Mar 2018 15:51:40 +0200
Subject: IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
From: Boris Pismenny <borisp(a)mellanox.com>
commit 2c292dbb398ee46fc1343daf6c3cf9715a75688e upstream.
Add a check for the length of the qpin structure to prevent out-of-bounds reads:
BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2
Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549
CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0x8d/0xd4
print_address_description+0x73/0x290
kasan_report+0x25c/0x370
? create_raw_packet_qp+0x114c/0x15e2
memcpy+0x1f/0x50
create_raw_packet_qp+0x114c/0x15e2
? create_raw_packet_qp_tis.isra.28+0x13d/0x13d
? lock_acquire+0x370/0x370
create_qp_common+0x2245/0x3b50
? destroy_qp_user.isra.47+0x100/0x100
? kasan_kmalloc+0x13d/0x170
? sched_clock_cpu+0x18/0x180
? fs_reclaim_acquire.part.15+0x5/0x30
? __lock_acquire+0xa11/0x1da0
? sched_clock_cpu+0x18/0x180
? kmem_cache_alloc_trace+0x17e/0x310
? mlx5_ib_create_qp+0x30e/0x17b0
mlx5_ib_create_qp+0x33d/0x17b0
? sched_clock_cpu+0x18/0x180
? create_qp_common+0x3b50/0x3b50
? lock_acquire+0x370/0x370
? __radix_tree_lookup+0x180/0x220
? uverbs_try_lock_object+0x68/0xc0
? rdma_lookup_get_uobject+0x114/0x240
create_qp.isra.5+0xce4/0x1e20
? ib_uverbs_ex_create_cq_cb+0xa0/0xa0
? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00
? ib_uverbs_cq_event_handler+0x160/0x160
? __might_fault+0x17c/0x1c0
ib_uverbs_create_qp+0x21b/0x2a0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
ib_uverbs_write+0x55a/0xad0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
? ib_uverbs_destroy_cq+0x2e0/0x2e0
? ib_uverbs_open+0x760/0x760
? futex_wake+0x147/0x410
? check_prev_add+0x1680/0x1680
? do_futex+0x3d3/0xa60
? sched_clock_cpu+0x18/0x180
__vfs_write+0xf7/0x5c0
? ib_uverbs_open+0x760/0x760
? kernel_read+0x110/0x110
? lock_acquire+0x370/0x370
? __fget+0x264/0x3b0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
? SyS_read+0x1a0/0x1a0
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x4477b9
RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9
RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005
RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff
R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0
Allocated by task 549:
__kmalloc+0x15e/0x340
kvmalloc_node+0xa1/0xd0
create_user_qp.isra.46+0xd42/0x1610
create_qp_common+0x2e63/0x3b50
mlx5_ib_create_qp+0x33d/0x17b0
create_qp.isra.5+0xce4/0x1e20
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0x55a/0xad0
__vfs_write+0xf7/0x5c0
vfs_write+0x18a/0x460
SyS_write+0xc7/0x1a0
entry_SYSCALL_64_fastpath+0x18/0x85
Freed by task 368:
kfree+0xeb/0x2f0
kernfs_fop_release+0x140/0x180
__fput+0x266/0x700
task_work_run+0x104/0x180
exit_to_usermode_loop+0xf7/0x110
syscall_return_slowpath+0x298/0x370
entry_SYSCALL_64_fastpath+0x83/0x85
The buggy address belongs to the object at ffff880066b99180 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 272 bytes inside of 512-byte region [ffff880066b99180, ffff880066b99380)
The buggy address belongs to the page:
page:000000006040eedd count:1 mapcount:0 mapping: (null)
index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019
raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Cc: syzkaller <syzkaller(a)googlegroups.com>
Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Boris Pismenny <borisp(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx5/qp.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1130,7 +1130,7 @@ static void destroy_raw_packet_qp_sq(str
ib_umem_release(sq->ubuffer.umem);
}
-static int get_rq_pas_size(void *qpc)
+static size_t get_rq_pas_size(void *qpc)
{
u32 log_page_size = MLX5_GET(qpc, qpc, log_page_size) + 12;
u32 log_rq_stride = MLX5_GET(qpc, qpc, log_rq_stride);
@@ -1146,7 +1146,8 @@ static int get_rq_pas_size(void *qpc)
}
static int create_raw_packet_qp_rq(struct mlx5_ib_dev *dev,
- struct mlx5_ib_rq *rq, void *qpin)
+ struct mlx5_ib_rq *rq, void *qpin,
+ size_t qpinlen)
{
struct mlx5_ib_qp *mqp = rq->base.container_mibqp;
__be64 *pas;
@@ -1155,9 +1156,12 @@ static int create_raw_packet_qp_rq(struc
void *rqc;
void *wq;
void *qpc = MLX5_ADDR_OF(create_qp_in, qpin, qpc);
- int inlen;
+ size_t rq_pas_size = get_rq_pas_size(qpc);
+ size_t inlen;
int err;
- u32 rq_pas_size = get_rq_pas_size(qpc);
+
+ if (qpinlen < rq_pas_size + MLX5_BYTE_OFF(create_qp_in, pas))
+ return -EINVAL;
inlen = MLX5_ST_SZ_BYTES(create_rq_in) + rq_pas_size;
in = mlx5_vzalloc(inlen);
@@ -1235,7 +1239,7 @@ static void destroy_raw_packet_qp_tir(st
}
static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
- u32 *in,
+ u32 *in, size_t inlen,
struct ib_pd *pd)
{
struct mlx5_ib_raw_packet_qp *raw_packet_qp = &qp->raw_packet_qp;
@@ -1262,7 +1266,7 @@ static int create_raw_packet_qp(struct m
if (qp->rq.wqe_cnt) {
rq->base.container_mibqp = qp;
- err = create_raw_packet_qp_rq(dev, rq, in);
+ err = create_raw_packet_qp_rq(dev, rq, in, inlen);
if (err)
goto err_destroy_sq;
@@ -1753,10 +1757,15 @@ static int create_qp_common(struct mlx5_
qp->flags |= MLX5_IB_QP_LSO;
}
+ if (inlen < 0) {
+ err = -EINVAL;
+ goto err;
+ }
+
if (init_attr->qp_type == IB_QPT_RAW_PACKET) {
qp->raw_packet_qp.sq.ubuffer.buf_addr = ucmd.sq_buf_addr;
raw_packet_qp_copy_info(qp, &qp->raw_packet_qp);
- err = create_raw_packet_qp(dev, qp, in, pd);
+ err = create_raw_packet_qp(dev, qp, in, inlen, pd);
} else {
err = mlx5_core_create_qp(dev->mdev, &base->mqp, in, inlen);
}
Patches currently in stable-queue which might be from borisp(a)mellanox.com are
queue-4.9/ib-mlx5-fix-integer-overflows-in-mlx5_ib_create_srq.patch
queue-4.9/ib-mlx5-fix-out-of-bounds-read-in-create_raw_packet_qp_rq.patch
This is a note to let you know that I've just added the patch titled
IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-mlx5-fix-integer-overflows-in-mlx5_ib_create_srq.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From c2b37f76485f073f020e60b5954b6dc4e55f693c Mon Sep 17 00:00:00 2001
From: Boris Pismenny <borisp(a)mellanox.com>
Date: Thu, 8 Mar 2018 15:51:41 +0200
Subject: IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
From: Boris Pismenny <borisp(a)mellanox.com>
commit c2b37f76485f073f020e60b5954b6dc4e55f693c upstream.
This patch validates user-provided input to prevent integer overflows due
to integer manipulation in the mlx5_ib_create_srq function.
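As a minimal standalone sketch of the general idea (names are illustrative
only, not the driver code): do the size arithmetic in size_t and reject any
result that wrapped before it is used as a buffer size:
#include <stddef.h>
static int example_buf_size(size_t nelem, size_t desc_size, size_t *out)
{
	size_t buf_size = nelem * desc_size;
	/* If dividing back does not recover nelem, the multiplication wrapped. */
	if (desc_size != 0 && buf_size / desc_size != nelem)
		return -1;
	*out = buf_size;
	return 0;
}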
Cc: syzkaller <syzkaller(a)googlegroups.com>
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Boris Pismenny <borisp(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/mlx5/srq.c | 15 +++++++++------
include/linux/mlx5/driver.h | 4 ++--
2 files changed, 11 insertions(+), 8 deletions(-)
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -243,8 +243,8 @@ struct ib_srq *mlx5_ib_create_srq(struct
{
struct mlx5_ib_dev *dev = to_mdev(pd->device);
struct mlx5_ib_srq *srq;
- int desc_size;
- int buf_size;
+ size_t desc_size;
+ size_t buf_size;
int err;
struct mlx5_srq_attr in = {0};
__u32 max_srq_wqes = 1 << MLX5_CAP_GEN(dev->mdev, log_max_srq_sz);
@@ -268,15 +268,18 @@ struct ib_srq *mlx5_ib_create_srq(struct
desc_size = sizeof(struct mlx5_wqe_srq_next_seg) +
srq->msrq.max_gs * sizeof(struct mlx5_wqe_data_seg);
+ if (desc_size == 0 || srq->msrq.max_gs > desc_size)
+ return ERR_PTR(-EINVAL);
desc_size = roundup_pow_of_two(desc_size);
- desc_size = max_t(int, 32, desc_size);
+ desc_size = max_t(size_t, 32, desc_size);
+ if (desc_size < sizeof(struct mlx5_wqe_srq_next_seg))
+ return ERR_PTR(-EINVAL);
srq->msrq.max_avail_gather = (desc_size - sizeof(struct mlx5_wqe_srq_next_seg)) /
sizeof(struct mlx5_wqe_data_seg);
srq->msrq.wqe_shift = ilog2(desc_size);
buf_size = srq->msrq.max * desc_size;
- mlx5_ib_dbg(dev, "desc_size 0x%x, req wr 0x%x, srq size 0x%x, max_gs 0x%x, max_avail_gather 0x%x\n",
- desc_size, init_attr->attr.max_wr, srq->msrq.max, srq->msrq.max_gs,
- srq->msrq.max_avail_gather);
+ if (buf_size < desc_size)
+ return ERR_PTR(-EINVAL);
in.type = init_attr->srq_type;
if (pd->uobject)
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -380,8 +380,8 @@ struct mlx5_core_srq {
struct mlx5_core_rsc_common common; /* must be first */
u32 srqn;
int max;
- int max_gs;
- int max_avail_gather;
+ size_t max_gs;
+ size_t max_avail_gather;
int wqe_shift;
void (*event) (struct mlx5_core_srq *, enum mlx5_event);
Patches currently in stable-queue which might be from borisp(a)mellanox.com are
queue-4.9/ib-mlx5-fix-integer-overflows-in-mlx5_ib_create_srq.patch
queue-4.9/ib-mlx5-fix-out-of-bounds-read-in-create_raw_packet_qp_rq.patch
From: Gabriel Matni <gabriel.matni(a)exfo.com>
Fixes missing characters on the kernel console at low baud rates (e.g. 9600).
The driver should poll TX_RDY or TX_FIFO_EMP instead of TX_EMP to ensure
that the transmitter holding register (THR) is ready to receive a new byte.
TX_EMP tells us when it is possible to send a break sequence via
SND_BRK_SEQ. While this also indicates that both the THR and the TSR are
empty, it does not guarantee that a new byte can be written just yet.
Fixes: 30530791a7a0 ("serial: mvebu-uart: initial support for Armada-3700 serial port")
Reviewed-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Acked-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
Signed-off-by: Gabriel Matni <gabriel.matni(a)exfo.com>
---
Changes since v3:
- mail the patch in a clean email
- fix the subject line accordingly (no Re:)
Changes since v2:
- use one line for the "Fixes" entry
- removed trailing space between Signed-off-by entry and ---
- start using versioning, previous fixes in v1
Changes since v1:
- patch was corrupt, could not be applied
- fixed line indent
---
drivers/tty/serial/mvebu-uart.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index a100e98259d7..f0df0640208e 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -618,7 +618,7 @@ static void wait_for_xmitr(struct uart_port *port)
u32 val;
readl_poll_timeout_atomic(port->membase + UART_STAT, val,
- (val & STAT_TX_EMP), 1, 10000);
+ (val & STAT_TX_RDY(port)), 1, 10000);
}
static void mvebu_uart_console_putchar(struct uart_port *port, int ch)
--
2.7.4
This is a note to let you know that I've just added the patch titled
clk: migrate the count of orphaned clocks at init
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
clk-migrate-the-count-of-orphaned-clocks-at-init.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 99652a469df19086d594e8e89757d4081a812789 Mon Sep 17 00:00:00 2001
From: Jerome Brunet <jbrunet(a)baylibre.com>
Date: Wed, 14 Feb 2018 14:43:36 +0100
Subject: clk: migrate the count of orphaned clocks at init
From: Jerome Brunet <jbrunet(a)baylibre.com>
commit 99652a469df19086d594e8e89757d4081a812789 upstream.
The orphan clock reparenting should migrate any existing count from the
orphan clock to its new ancestor clocks; otherwise we may have
inconsistent counts in the tree and end up with gated critical clocks.
Assume we have two clocks, A and B:
* Clock A has the CLK_IS_CRITICAL flag set.
* Clock B is an ancestor of A which can gate. Clock B's gate is left
enabled by the bootloader.
Step 1: Clock A is registered. Since it is a critical clock, it is
enabled. The clock being still an orphan, no parent is enabled.
Step 2: Clock B is registered and the orphan clock A is reparented to it
(potentially through several other clocks). We are now in a situation where
the enable count of clock A is 1 while the enable count of its ancestors is
0, which is not good.
Step 3: In lateinit, clk_disable_unused() is called; the enable count of
clock B being 0, clock B is gated and the critical clock A actually gets
disabled.
This situation was found while adding fdiv_clk gates to the meson8b
platform. These clocks are parents of the clk81 critical clock, which is
the mother of all peripheral clocks in this system. Because of the issue
described here, the system crashes when clk_disable_unused() is called.
The situation is solved by reverting
commit f8f8f1d04494 ("clk: Don't touch hardware when reparenting during registration").
To avoid breaking the situation described in that commit again, critical
clocks should be enabled before walking the orphan list. This way, a
critical parent clock is not accidentally disabled due to the
CLK_OPS_PARENT_ENABLE mechanism.
Fixes: f8f8f1d04494 ("clk: Don't touch hardware when reparenting during registration")
Cc: Stephen Boyd <sboyd(a)codeaurora.org>
Cc: Shawn Guo <shawnguo(a)kernel.org>
Cc: Dong Aisheng <aisheng.dong(a)nxp.com>
Signed-off-by: Jerome Brunet <jbrunet(a)baylibre.com>
Tested-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Tested-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Michael Turquette <mturquette(a)baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/clk/clk.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2438,22 +2438,37 @@ static int __clk_core_init(struct clk_co
core->rate = core->req_rate = rate;
/*
+ * Enable CLK_IS_CRITICAL clocks so newly added critical clocks
+ * don't get accidentally disabled when walking the orphan tree and
+ * reparenting clocks
+ */
+ if (core->flags & CLK_IS_CRITICAL) {
+ unsigned long flags;
+
+ clk_core_prepare(core);
+
+ flags = clk_enable_lock();
+ clk_core_enable(core);
+ clk_enable_unlock(flags);
+ }
+
+ /*
* walk the list of orphan clocks and reparent any that newly finds a
* parent.
*/
hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node) {
struct clk_core *parent = __clk_init_parent(orphan);
- unsigned long flags;
/*
- * we could call __clk_set_parent, but that would result in a
- * redundant call to the .set_rate op, if it exists
+ * We need to use __clk_set_parent_before() and _after() to
+ * to properly migrate any prepare/enable count of the orphan
+ * clock. This is important for CLK_IS_CRITICAL clocks, which
+ * are enabled during init but might not have a parent yet.
*/
if (parent) {
/* update the clk tree topology */
- flags = clk_enable_lock();
- clk_reparent(orphan, parent);
- clk_enable_unlock(flags);
+ __clk_set_parent_before(orphan, parent);
+ __clk_set_parent_after(orphan, parent, NULL);
__clk_recalc_accuracies(orphan);
__clk_recalc_rates(orphan, 0);
}
@@ -2470,16 +2485,6 @@ static int __clk_core_init(struct clk_co
if (core->ops->init)
core->ops->init(core->hw);
- if (core->flags & CLK_IS_CRITICAL) {
- unsigned long flags;
-
- clk_core_prepare(core);
-
- flags = clk_enable_lock();
- clk_core_enable(core);
- clk_enable_unlock(flags);
- }
-
kref_init(&core->ref);
out:
clk_prepare_unlock();
Patches currently in stable-queue which might be from jbrunet(a)baylibre.com are
queue-4.9/clk-migrate-the-count-of-orphaned-clocks-at-init.patch
This is a note to let you know that I've just added the patch titled
serial: 8250_pci: Don't fail on multiport card class
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
serial-8250_pci-don-t-fail-on-multiport-card-class.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e7f3e99cb1a667d04d60d02957fbed58b50d4e5a Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Date: Fri, 2 Feb 2018 20:39:13 +0200
Subject: serial: 8250_pci: Don't fail on multiport card class
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
commit e7f3e99cb1a667d04d60d02957fbed58b50d4e5a upstream.
Do not fail on multiport cards in serial_pci_is_class_communication().
This restores the behaviour for SUNIX multiport cards, which are enumerated
by class and have custom board data.
Moreover, it allows users to re-enumerate port by port from user space.
Fixes: 7d8905d06405 ("serial: 8250_pci: Enable device after we check black list")
Reported-by: Nikola Ciprich <nikola.ciprich(a)linuxbox.cz>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Tested-by: Nikola Ciprich <nikola.ciprich(a)linuxbox.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_pci.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -3387,11 +3387,9 @@ static int serial_pci_is_class_communica
/*
* If it is not a communications device or the programming
* interface is greater than 6, give up.
- *
- * (Should we try to make guesses for multiport serial devices
- * later?)
*/
if ((((dev->class >> 8) != PCI_CLASS_COMMUNICATION_SERIAL) &&
+ ((dev->class >> 8) != PCI_CLASS_COMMUNICATION_MULTISERIAL) &&
((dev->class >> 8) != PCI_CLASS_COMMUNICATION_MODEM)) ||
(dev->class & 0xff) > 6)
return -ENODEV;
@@ -3428,6 +3426,12 @@ serial_pci_guess_board(struct pci_dev *d
{
int num_iomem, num_port, first_port = -1, i;
+ /*
+ * Should we try to make guesses for multiport serial devices later?
+ */
+ if ((dev->class >> 8) == PCI_CLASS_COMMUNICATION_MULTISERIAL)
+ return -ENODEV;
+
num_iomem = num_port = 0;
for (i = 0; i < PCI_NUM_BAR_RESOURCES; i++) {
if (pci_resource_flags(dev, i) & IORESOURCE_IO) {
Patches currently in stable-queue which might be from andriy.shevchenko(a)linux.intel.com are
queue-4.15/serial-8250_pci-don-t-fail-on-multiport-card-class.patch