The patch titled
Subject: mm: always have io_remap_pfn_range() set pgprot_decrypted()
has been removed from the -mm tree. Its filename was
mm-always-have-io_remap_pfn_range-set-pgprot_decrypted.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jason Gunthorpe <jgg(a)nvidia.com>
Subject: mm: always have io_remap_pfn_range() set pgprot_decrypted()
The purpose of io_remap_pfn_range() is to map IO memory, such as a memory
mapped IO exposed through a PCI BAR. IO devices do not understand
encryption, so this memory must always be decrypted. Automatically call
pgprot_decrypted() as part of the generic implementation.
This fixes a bug where enabling AMD SME causes subsystems that use
io_remap_pfn_range() to expose BAR pages to user space, such as RDMA, to fail. The
CPU will encrypt access to those BAR pages instead of passing unencrypted
IO directly to the device.
Places not mapping IO should use remap_pfn_range().
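For context, the typical caller is a driver mmap handler that exposes a PCI
BAR; a minimal sketch under assumed names (hypothetical_dev, bar_start and
bar_len are illustrative, not from this patch):

/*
 * Hedged sketch: map a device BAR to user space.  With this change,
 * io_remap_pfn_range() applies pgprot_decrypted() itself, so SME systems
 * get unencrypted access to the BAR.
 */
static int hypothetical_bar_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct hypothetical_dev *dev = file->private_data;
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > dev->bar_len)
		return -EINVAL;

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return io_remap_pfn_range(vma, vma->vm_start,
				  dev->bar_start >> PAGE_SHIFT,
				  size, vma->vm_page_prot);
}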
Link: https://lkml.kernel.org/r/0-v1-025d64bdf6c4+e-amd_sme_fix_jgg@nvidia.com
Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: "Dave Young" <dyoung(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Larry Woodman <lwoodman(a)redhat.com>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Toshimitsu Kani <toshi.kani(a)hpe.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 9 +++++++++
include/linux/pgtable.h | 4 ----
2 files changed, 9 insertions(+), 4 deletions(-)
--- a/include/linux/mm.h~mm-always-have-io_remap_pfn_range-set-pgprot_decrypted
+++ a/include/linux/mm.h
@@ -2759,6 +2759,15 @@ static inline vm_fault_t vmf_insert_page
return VM_FAULT_NOPAGE;
}
+#ifndef io_remap_pfn_range
+static inline int io_remap_pfn_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long pfn,
+ unsigned long size, pgprot_t prot)
+{
+ return remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));
+}
+#endif
+
static inline vm_fault_t vmf_error(int err)
{
if (err == -ENOMEM)
--- a/include/linux/pgtable.h~mm-always-have-io_remap_pfn_range-set-pgprot_decrypted
+++ a/include/linux/pgtable.h
@@ -1427,10 +1427,6 @@ typedef unsigned int pgtbl_mod_mask;
#endif /* !__ASSEMBLY__ */
-#ifndef io_remap_pfn_range
-#define io_remap_pfn_range remap_pfn_range
-#endif
-
#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
_
Patches currently in -mm which might be from jgg(a)nvidia.com are
mm-gup-use-unpin_user_pages-in-check_and_migrate_cma_pages.patch
The patch titled
Subject: kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
has been removed from the -mm tree. Its filename was
kthread_worker-prevent-queuing-delayed-work-from-timer_fn-when-it-is-being-canceled.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Zqiang <qiang.zhang(a)windriver.com>
Subject: kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
There is a small race window when a delayed work is being canceled and the
work still might be queued from the timer_fn:
CPU0                                      CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
      __kthread_cancel_work()
         work->canceling++;
                                          kthread_delayed_work_timer_fn()
                                             kthread_insert_work();
BUG: kthread_insert_work() should not get called when work->canceling is
set.
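For reference, a minimal sketch of a kthread_worker user that can hit this
window (the names and the HZ delay are illustrative, not taken from the
patch):

/* Hedged sketch of a kthread_worker user; names are illustrative. */
static struct kthread_worker *worker;
static struct kthread_delayed_work dwork;

static void dwork_fn(struct kthread_work *work)
{
	/* periodic processing */
}

static int example_start(void)
{
	worker = kthread_create_worker(0, "example");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_init_delayed_work(&dwork, dwork_fn);
	kthread_queue_delayed_work(worker, &dwork, HZ);
	return 0;
}

static void example_stop(void)
{
	/*
	 * If the timer fires while this cancel is in flight, the old
	 * timer_fn could still queue the work even though work->canceling
	 * is already set.
	 */
	kthread_cancel_delayed_work_sync(&dwork);
	kthread_destroy_worker(worker);
}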
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Zqiang <qiang.zhang(a)windriver.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kthread.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/kthread.c~kthread_worker-prevent-queuing-delayed-work-from-timer_fn-when-it-is-being-canceled
+++ a/kernel/kthread.c
@@ -897,7 +897,8 @@ void kthread_delayed_work_timer_fn(struc
/* Move the work from worker->delayed_work_list. */
WARN_ON_ONCE(list_empty(&work->node));
list_del_init(&work->node);
- kthread_insert_work(worker, work, &worker->work_list);
+ if (!work->canceling)
+ kthread_insert_work(worker, work, &worker->work_list);
raw_spin_unlock_irqrestore(&worker->lock, flags);
}
_
Patches currently in -mm which might be from qiang.zhang(a)windriver.com are
The patch titled
Subject: ptrace: fix task_join_group_stop() for the case when current is traced
has been removed from the -mm tree. Its filename was
ptrace-fix-task_join_group_stop-for-the-case-when-current-is-traced.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: ptrace: fix task_join_group_stop() for the case when current is traced
This testcase
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>

void *tf(void *arg)
{
	return NULL;
}

int main(void)
{
	int pid = fork();
	if (!pid) {
		kill(getpid(), SIGSTOP);

		pthread_t th;
		pthread_create(&th, NULL, tf, NULL);

		return 0;
	}

	waitpid(pid, NULL, WSTOPPED);

	ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
	waitpid(pid, NULL, 0);

	ptrace(PTRACE_CONT, pid, 0, 0);
	waitpid(pid, NULL, 0);

	int status;
	int thread = waitpid(-1, &status, 0);
	assert(thread > 0 && thread != pid);
	assert(status == 0x80137f);

	return 0;
}
fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
This is because task_join_group_stop() has 2 problems when current is traced:
1. We can't rely on the "JOBCTL_STOP_PENDING" check: a stopped tracee
can be woken up by the debugger and can then clone another thread which
should join the group-stop.
We need to check group_stop_count || SIGNAL_STOP_STOPPED.
2. If SIGNAL_STOP_STOPPED is already set, we should not increment
sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
should stop without another do_notify_parent_cldstop() report.
To clarify, the problem is very old and we should blame
ptrace_init_task(). But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.
Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: syzbot+3485e3773f7da290eecc(a)syzkaller.appspotmail.com
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Christian Brauner <christian(a)brauner.io>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: Zhiqiang Liu <liuzhiqiang26(a)huawei.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/signal.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
--- a/kernel/signal.c~ptrace-fix-task_join_group_stop-for-the-case-when-current-is-traced
+++ a/kernel/signal.c
@@ -391,16 +391,17 @@ static bool task_participate_group_stop(
void task_join_group_stop(struct task_struct *task)
{
+ unsigned long mask = current->jobctl & JOBCTL_STOP_SIGMASK;
+ struct signal_struct *sig = current->signal;
+
+ if (sig->group_stop_count) {
+ sig->group_stop_count++;
+ mask |= JOBCTL_STOP_CONSUME;
+ } else if (!(sig->flags & SIGNAL_STOP_STOPPED))
+ return;
+
/* Have the new thread join an on-going signal group stop */
- unsigned long jobctl = current->jobctl;
- if (jobctl & JOBCTL_STOP_PENDING) {
- struct signal_struct *sig = current->signal;
- unsigned long signr = jobctl & JOBCTL_STOP_SIGMASK;
- unsigned long gstop = JOBCTL_STOP_PENDING | JOBCTL_STOP_CONSUME;
- if (task_set_jobctl_pending(task, signr | gstop)) {
- sig->group_stop_count++;
- }
- }
+ task_set_jobctl_pending(task, mask | JOBCTL_STOP_PENDING);
}
/*
_
Patches currently in -mm which might be from oleg(a)redhat.com are
aio-simplify-read_events.patch
The patch titled
Subject: mm: mempolicy: fix potential pte_unmap_unlock pte error
has been removed from the -mm tree. Its filename was
mm-mempolicy-fix-potential-pte_unmap_unlock-pte-error.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shijie Luo <luoshijie1(a)huawei.com>
Subject: mm: mempolicy: fix potential pte_unmap_unlock pte error
When flags in queue_pages_pte_range() don't have the MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits set, the code breaks out of the pte loop early, and
passing the original pte - 1 to pte_unmap_unlock() is wrong.
queue_pages_pte_range() can run in MPOL_MF_STRICT mode, which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO
when MPOL_MF_STRICT is specified"), an early break on the first pte in the
range results in pte_unmap_unlock() being called on an underflow pte. This
can lead to lockups later on when somebody tries to take the pte lock or
the page_table_lock again.
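The fix restores the usual pattern for pte range walks; a simplified
sketch, not the actual queue_pages_pte_range() body (stop_condition is a
placeholder):

pte_t *pte, *mapped_pte;
spinlock_t *ptl;

/* Remember the pte actually returned by pte_offset_map_lock()... */
mapped_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
	if (stop_condition)
		break;		/* may fire on the very first pte */
}
/* ...and unmap/unlock exactly that pte, never pte - 1. */
pte_unmap_unlock(mapped_pte, ptl);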
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1(a)huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Feilong Lin <linfeilong(a)huawei.com>
Cc: Shijie Luo <luoshijie1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-potential-pte_unmap_unlock-pte-error
+++ a/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *
unsigned long flags = qp->flags;
int ret;
bool has_unmovable = false;
- pte_t *pte;
+ pte_t *pte, *mapped_pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *
if (pmd_trans_unstable(pmd))
return 0;
- pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
if (!pte_present(*pte))
continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *
} else
break;
}
- pte_unmap_unlock(pte - 1, ptl);
+ pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
if (has_unmovable)
_
Patches currently in -mm which might be from luoshijie1(a)huawei.com are
The patch titled
Subject: mm: memcg: link page counters to root if use_hierarchy is false
has been removed from the -mm tree. Its filename was
mm-memcg-link-page-counters-to-root-if-use_hierarchy-is-false.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcg: link page counters to root if use_hierarchy is false
Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):
[ 9.841552] ------------[ cut here ]------------
[ 9.841788] WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
[ 9.841982] Modules linked in:
[ 9.842072] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
[ 9.842266] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
[ 9.842571] Workqueue: events drain_local_stock
[ 9.842750] RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
[ 9.842894] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe ff ff 48 85 db 78 10 48 8b 6d 28 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb ec 90 e8 4b f9 88 2a 48 8b 17 48 39 d6 72 41 41 54 49 89
[ 9.843438] RSP: 0018:ffffb1c18006be28 EFLAGS: 00010086
[ 9.843585] RAX: ffffffffffffffff RBX: ffffffffffffffff RCX: ffff94803bc2cae0
[ 9.843806] RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffff948007d2b248
[ 9.844026] RBP: ffff948007d2b248 R08: ffff948007c58eb0 R09: ffff948007da05ac
[ 9.844248] R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000001
[ 9.844477] R13: ffffffffffffffff R14: 0000000000000000 R15: ffff94803bc2cac0
[ 9.844696] FS: 0000000000000000(0000) GS:ffff94803bc00000(0000) knlGS:0000000000000000
[ 9.844915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.845096] CR2: 00007f0579ee0384 CR3: 000000002cc0a000 CR4: 00000000000006f0
[ 9.845319] Call Trace:
[ 9.845429] __memcg_kmem_uncharge (mm/memcontrol.c:3022)
[ 9.845582] drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
[ 9.845684] drain_local_stock (mm/memcontrol.c:2255)
[ 9.845789] process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
[ 9.845898] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
[ 9.846034] ? process_one_work (kernel/workqueue.c:2358)
[ 9.846162] kthread (kernel/kthread.c:292)
[ 9.846271] ? __kthread_bind_mask (kernel/kthread.c:245)
[ 9.846420] ret_from_fork (arch/x86/entry/entry_64.S:300)
[ 9.846531] ---[ end trace 8b5647c1eba9d18a ]---
The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.
After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining an objcg
stock, for example. It leads to an eventual underflow of the charge and
triggers a warning.
Fix it by linking all page counters to corresponding root page counters in
the non-hierarchical mode.
Please note that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more than
1 level. This patch doesn't change it.
The patch also doesn't affect how the hierarchical mode is working, which
is the only sane and truly supported mode now.
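The mechanics of the underflow can be sketched with the page_counter API
itself (illustrative only, not code from the patch):

struct page_counter root, linked, unlinked;

page_counter_init(&root, NULL);
page_counter_init(&linked, &root);	/* charges propagate up to root */
page_counter_init(&unlinked, NULL);	/* non-hierarchical child before this fix */

page_counter_charge(&unlinked, 1);	/* root.usage stays 0 */
/* after the objcg is reparented, draining the stock uncharges the root: */
page_counter_uncharge(&root, 1);	/* underflow -> WARN in mm/page_counter.c */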
Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!
Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Debugged-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Reported-by: <ltp(a)lists.linux.it>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-link-page-counters-to-root-if-use_hierarchy-is-false
+++ a/mm/memcontrol.c
@@ -5345,17 +5345,22 @@ mem_cgroup_css_alloc(struct cgroup_subsy
memcg->swappiness = mem_cgroup_swappiness(parent);
memcg->oom_kill_disable = parent->oom_kill_disable;
}
- if (parent && parent->use_hierarchy) {
+ if (!parent) {
+ page_counter_init(&memcg->memory, NULL);
+ page_counter_init(&memcg->swap, NULL);
+ page_counter_init(&memcg->kmem, NULL);
+ page_counter_init(&memcg->tcpmem, NULL);
+ } else if (parent->use_hierarchy) {
memcg->use_hierarchy = true;
page_counter_init(&memcg->memory, &parent->memory);
page_counter_init(&memcg->swap, &parent->swap);
page_counter_init(&memcg->kmem, &parent->kmem);
page_counter_init(&memcg->tcpmem, &parent->tcpmem);
} else {
- page_counter_init(&memcg->memory, NULL);
- page_counter_init(&memcg->swap, NULL);
- page_counter_init(&memcg->kmem, NULL);
- page_counter_init(&memcg->tcpmem, NULL);
+ page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
+ page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
+ page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
+ page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
/*
* Deeper hierachy with use_hierarchy == false doesn't make
* much sense so let cgroup subsystem know about this
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcontrol-use-helpers-to-read-pages-memcg-data.patch
mm-memcontrol-slab-use-helpers-to-access-slab-pages-memcg_data.patch
mm-introduce-page-memcg-flags.patch
mm-convert-page-kmemcg-type-to-a-page-memcg-flag.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch
mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch
The patch titled
Subject: hugetlb_cgroup: fix reservation accounting
has been removed from the -mm tree. Its filename was
hugetlb_cgroup-fix-reservation-accounting.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb_cgroup: fix reservation accounting
Michal Privoznik was using "free page reporting" in QEMU/virtio-balloon
with hugetlbfs and hit the warning below. QEMU with free page hinting
uses fallocate(FALLOC_FL_PUNCH_HOLE) to discard pages that are reported
as free by a VM. The reporting granularity is the pageblock size.
So when the guest reports 2M chunks, we fallocate(FALLOC_FL_PUNCH_HOLE)
one huge page in QEMU.
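The discard operation itself is a plain fallocate(); a minimal userspace
sketch assuming 2M huge pages and an already-open hugetlbfs file
descriptor:

#define _GNU_SOURCE
#include <fcntl.h>

/* Hedged sketch: punch out one 2M huge page, as QEMU does per reported chunk. */
static int discard_huge_page(int fd, off_t offset)
{
	const off_t huge_page_size = 2 * 1024 * 1024;

	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, huge_page_size);
}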
[ 315.251417] ------------[ cut here ]------------
[ 315.251424] WARNING: CPU: 7 PID: 6636 at mm/page_counter.c:57 page_counter_uncharge+0x4b/0x50
[ 315.251425] Modules linked in: ...
[ 315.251466] CPU: 7 PID: 6636 Comm: qemu-system-x86 Not tainted 5.9.0 #137
[ 315.251467] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F21 07/31/2020
[ 315.251469] RIP: 0010:page_counter_uncharge+0x4b/0x50
...
[ 315.251479] Call Trace:
[ 315.251485] hugetlb_cgroup_uncharge_file_region+0x4b/0x80
[ 315.251487] region_del+0x1d3/0x300
[ 315.251489] hugetlb_unreserve_pages+0x39/0xb0
[ 315.251492] remove_inode_hugepages+0x1a8/0x3d0
[ 315.251495] ? tlb_finish_mmu+0x7a/0x1d0
[ 315.251497] hugetlbfs_fallocate+0x3c4/0x5c0
[ 315.251519] ? kvm_arch_vcpu_ioctl_run+0x614/0x1700 [kvm]
[ 315.251522] ? file_has_perm+0xa2/0xb0
[ 315.251524] ? inode_security+0xc/0x60
[ 315.251525] ? selinux_file_permission+0x4e/0x120
[ 315.251527] vfs_fallocate+0x146/0x290
[ 315.251529] __x64_sys_fallocate+0x3e/0x70
[ 315.251531] do_syscall_64+0x33/0x40
[ 315.251533] entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[ 315.251542] ---[ end trace 4c88c62ccb1349c9 ]---
Investigation of the issue uncovered bugs in hugetlb cgroup reservation
accounting. This patch addresses the found issues.
Link: https://lkml.kernel.org/r/20201021204426.36069-1-mike.kravetz@oracle.com
Fixes: 075a61d07a8e ("hugetlb_cgroup: add accounting for shared mappings")
Cc: <stable(a)vger.kernel.org>
Reported-by: Michal Privoznik <mprivozn(a)redhat.com>
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Tested-by: Michal Privoznik <mprivozn(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Reviewed-by: Mina Almasry <almasrymina(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/hugetlb.c~hugetlb_cgroup-fix-reservation-accounting
+++ a/mm/hugetlb.c
@@ -648,6 +648,8 @@ retry:
}
del += t - f;
+ hugetlb_cgroup_uncharge_file_region(
+ resv, rg, t - f);
/* New entry for end of split region */
nrg->from = t;
@@ -660,9 +662,6 @@ retry:
/* Original entry is trimmed */
rg->to = f;
- hugetlb_cgroup_uncharge_file_region(
- resv, rg, nrg->to - nrg->from);
-
list_add(&nrg->link, &rg->link);
nrg = NULL;
break;
@@ -678,17 +677,17 @@ retry:
}
if (f <= rg->from) { /* Trim beginning of region */
- del += t - rg->from;
- rg->from = t;
-
hugetlb_cgroup_uncharge_file_region(resv, rg,
t - rg->from);
- } else { /* Trim end of region */
- del += rg->to - f;
- rg->to = f;
+ del += t - rg->from;
+ rg->from = t;
+ } else { /* Trim end of region */
hugetlb_cgroup_uncharge_file_region(resv, rg,
rg->to - f);
+
+ del += rg->to - f;
+ rg->to = f;
}
}
@@ -2443,6 +2442,9 @@ struct page *alloc_huge_page(struct vm_a
rsv_adjust = hugepage_subpool_put_pages(spool, 1);
hugetlb_acct_memory(h, -rsv_adjust);
+ if (deferred_reserve)
+ hugetlb_cgroup_uncharge_page_rsvd(hstate_index(h),
+ pages_per_huge_page(h), page);
}
return page;
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).
This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry.
Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:
ffffffff8266b4b0 D __clk_of_table
ffffffff8266b4c0 d __of_table_fixed_factor_clk
ffffffff8266b5a0 d __of_table_fixed_clk
ffffffff8266b680 d __clk_of_table_sentinel
And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:
812b3ec0 D __irqchip_of_table
812b3ec0 d __of_table_irqchip1
812b3fa0 d __of_table_irqchip2
812b4080 d __of_table_irqchip3
812b4160 d irqchip_of_match_end
Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2.
Note that there are no in-tree users of these tables on x86 currently
(even if they are included in the image).
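To illustrate why the padding is fatal: the section tables are walked
assuming densely packed entries, roughly like the sketch below (the loop
condition and do_match() are simplified placeholders):

extern struct of_device_id __clk_of_table[];	/* start of the linker section */

static void walk_clk_table(void)
{
	const struct of_device_id *match;

	for (match = __clk_of_table; match->compatible[0]; match++)
		do_match(match);	/* placeholder for the real handler */
}

/*
 * match++ advances by sizeof(struct of_device_id).  Any padding gcc
 * inserts before or between the entries breaks that stride, so at best
 * only the first entry is ever matched; the added
 * __aligned(__alignof__(struct of_device_id)) keeps the entries packed.
 */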
Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable(a)vger.kernel.org> # 3.9
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
include/linux/of.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..af655d264f10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1300,6 +1300,7 @@ static inline int of_get_available_child_count(const struct device_node *np)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id __of_table_##name \
__used __section("__" #table "_of_table") \
+ __aligned(__alignof__(struct of_device_id)) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
#else
--
2.26.2