- Linux-stable-mirror - lists.linaro.org

[PATCH v2] powerpc/tm: Set MSR[TS] just prior to recheckpoint

by Breno Leitao

On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable(a)vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao <leitao(a)debian.org> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index e6474a45cef5..fd59fef9931b 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -848,7 +848,23 @@ static long restore_tm_user_regs(struct pt_regs *regs, /* If TM bits are set to the reserved value, it's an invalid context */ if (MSR_TM_RESV(msr_hi)) return 1; - /* Pull in the MSR TM bits from the user context */ + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + * + * Pull in the MSR TM bits from the user context + */ regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr_hi & MSR_TS_MASK); /* Now, recheckpoint. This loads up all of the checkpointed (older) * registers, including FP and V[S]Rs. After recheckpointing, the @@ -873,6 +889,8 @@ static long restore_tm_user_regs(struct pt_regs *regs, } #endif + preempt_enable(); + return 0; } #endif diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index 83d51bf586c7..bbd1c73243d7 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, if (MSR_TM_RESV(msr)) return -EINVAL; - /* pull in MSR TS bits from user context */ - regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); - - /* - * Ensure that TM is enabled in regs->msr before we leave the signal - * handler. It could be the case that (a) user disabled the TM bit - * through the manipulation of the MSR bits in uc_mcontext or (b) the - * TM bit was disabled because a sufficient number of context switches - * happened whilst in the signal handler and load_tm overflowed, - * disabling the TM bit. In either case we can end up with an illegal - * TM state leading to a TM Bad Thing when we return to userspace. - */ - regs->msr |= MSR_TM; - /* pull in MSR LE from user context */ regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE); @@ -572,6 +558,34 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, tm_enable(); /* Make sure the transaction is marked as failed */ tsk->thread.tm_texasr |= TEXASR_FS; + + /* + * Disabling preemption, since it is unsafe to be preempted + * with MSR[TS] set without recheckpointing. + */ + preempt_disable(); + + /* pull in MSR TS bits from user context */ + regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK); + + /* + * Ensure that TM is enabled in regs->msr before we leave the signal + * handler. It could be the case that (a) user disabled the TM bit + * through the manipulation of the MSR bits in uc_mcontext or (b) the + * TM bit was disabled because a sufficient number of context switches + * happened whilst in the signal handler and load_tm overflowed, + * disabling the TM bit. In either case we can end up with an illegal + * TM state leading to a TM Bad Thing when we return to userspace. + * + * CAUTION: + * After regs->MSR[TS] being updated, make sure that get_user(), + * put_user() or similar functions are *not* called. These + * functions can generate page faults which will cause the process + * to be de-scheduled with MSR[TS] set but without calling + * tm_recheckpoint(). This can cause a bug. + */ + regs->msr |= MSR_TM; + /* This loads the checkpointed FP/VEC state, if used */ tm_recheckpoint(&tsk->thread); @@ -585,6 +599,8 @@ static long restore_tm_sigcontexts(struct task_struct *tsk, regs->msr |= MSR_VEC; } + preempt_enable(); + return err; } #endif -- 2.19.0

6 years, 8 months

2
1
0 0

FAILED: patch "[PATCH] ASoC: sta32x: set ->component pointer in private struct" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 747df19747bc9752cd40b9cce761e17a033aa5c2 Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel(a)zonque.org> Date: Thu, 11 Oct 2018 20:32:05 +0200 Subject: [PATCH] ASoC: sta32x: set ->component pointer in private struct The ESD watchdog code in sta32x_watchdog() dereferences the pointer which is never assigned. This is a regression from a1be4cead9b950 ("ASoC: sta32x: Convert to direct regmap API usage.") which went unnoticed since nobody seems to use that ESD workaround. Fixes: a1be4cead9b950 ("ASoC: sta32x: Convert to direct regmap API usage.") Signed-off-by: Daniel Mack <daniel(a)zonque.org> Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org diff --git a/sound/soc/codecs/sta32x.c b/sound/soc/codecs/sta32x.c index d5035f2f2b2b..ce508b4cc85c 100644 --- a/sound/soc/codecs/sta32x.c +++ b/sound/soc/codecs/sta32x.c @@ -879,6 +879,9 @@ static int sta32x_probe(struct snd_soc_component *component) struct sta32x_priv *sta32x = snd_soc_component_get_drvdata(component); struct sta32x_platform_data *pdata = sta32x->pdata; int i, ret = 0, thermal = 0; + + sta32x->component = component; + ret = regulator_bulk_enable(ARRAY_SIZE(sta32x->supplies), sta32x->supplies); if (ret != 0) {

6 years, 8 months

6
5
0 0

FAILED: patch "[PATCH] cifs: integer overflow in in SMB2_ioctl()" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 2d204ee9d671327915260071c19350d84344e096 Mon Sep 17 00:00:00 2001 From: Dan Carpenter <dan.carpenter(a)oracle.com> Date: Mon, 10 Sep 2018 14:12:07 +0300 Subject: [PATCH] cifs: integer overflow in in SMB2_ioctl() The "le32_to_cpu(rsp->OutputOffset) + *plen" addition can overflow and wrap around to a smaller value which looks like it would lead to an information leak. Fixes: 4a72dafa19ba ("SMB2 FSCTL and IOCTL worker function") Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com> Signed-off-by: Steve French <stfrench(a)microsoft.com> Reviewed-by: Aurelien Aptel <aaptel(a)suse.com> CC: Stable <stable(a)vger.kernel.org> diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 6f0e6b42599c..f54d07bda067 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -2459,14 +2459,14 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid, /* We check for obvious errors in the output buffer length and offset */ if (*plen == 0) goto ioctl_exit; /* server returned no data */ - else if (*plen > 0xFF00) { + else if (*plen > rsp_iov.iov_len || *plen > 0xFF00) { cifs_dbg(VFS, "srv returned invalid ioctl length: %d\n", *plen); *plen = 0; rc = -EIO; goto ioctl_exit; } - if (rsp_iov.iov_len < le32_to_cpu(rsp->OutputOffset) + *plen) { + if (rsp_iov.iov_len - *plen < le32_to_cpu(rsp->OutputOffset)) { cifs_dbg(VFS, "Malformed ioctl resp: len %d offset %d\n", *plen, le32_to_cpu(rsp->OutputOffset)); *plen = 0;

6 years, 8 months

4
3
0 0

[PATCH v3 2/2] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

by Mike Kravetz

hugetlbfs page faults can race with truncate and hole punch operations. Current code in the page fault path attempts to handle this by 'backing out' operations if we encounter the race. One obvious omission in the current code is removing a page newly added to the page cache. This is pretty straight forward to address, but there is a more subtle and difficult issue of backing out hugetlb reservations. To handle this correctly, the 'reservation state' before page allocation needs to be noted so that it can be properly backed out. There are four distinct possibilities for reservation state: shared/reserved, shared/no-resv, private/reserved and private/no-resv. Backing out a reservation may require memory allocation which could fail so that needs to be taken into account as well. Instead of writing the required complicated code for this rare occurrence, just eliminate the race. i_mmap_rwsem is now held in read mode for the duration of page fault processing. Hold i_mmap_rwsem longer in truncation and hold punch code to cover the call to remove_inode_hugepages. With this modification, code in remove_inode_hugepages checking for races becomes 'dead' as it can not longer happen. Remove the dead code and expand comments to explain reasoning. Similarly, checks for races with truncation in the page fault path can be simplified and removed. Cc: <stable(a)vger.kernel.org> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd") Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> --- fs/hugetlbfs/inode.c | 61 ++++++++++++++++++++------------------------ mm/hugetlb.c | 21 ++++++++------- 2 files changed, 38 insertions(+), 44 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e..a2fcea5f8225 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) * truncation is indicated by end of range being LLONG_MAX * In this case, we first scan the range and release found pages. * After releasing pages, hugetlb_unreserve_pages cleans up region/reserv - * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() prevents page faults in the - * truncated range. It checks i_size before allocation, and again after - * with the page table lock for the page held. The same lock must be - * acquired to unmap a page. + * maps and global counts. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserv map * deleted. The region/reserv map for ranges without associated - * pages are not modified. Page faults can race with hole punch. - * This is indicated if we find a mapped page. + * pages are not modified. + * + * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent + * races with page faults. + * * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ @@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, for (i = 0; i < pagevec_count(&pvec); ++i) { struct page *page = pvec.pages[i]; - u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, - &pseudo_vma, - mapping, index, 0); - mutex_lock(&hugetlb_fault_mutex_table[hash]); - /* - * If page is mapped, it was faulted in after being - * unmapped in caller. Unmap (again) now after taking - * the fault mutex. The mutex will prevent faults - * until we finish removing the page. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. + * A mapped page is impossible as callers should unmap + * all references before calling. And, i_mmap_rwsem + * prevents the creation of additional mappings. */ - if (unlikely(page_mapped(page))) { - BUG_ON(truncate_op); - - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h)); - i_mmap_unlock_write(mapping); - } + VM_BUG_ON(page_mapped(page)); lock_page(page); /* @@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, } unlock_page(page); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); } huge_pagevec_release(&pvec); cond_resched(); @@ -482,9 +462,20 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, static void hugetlbfs_evict_inode(struct inode *inode) { + struct address_space *mapping = inode->i_mapping; struct resv_map *resv_map; + /* + * The vfs layer guarantees that there are no other users of this + * inode. Therefore, it would be safe to call remove_inode_hugepages + * without holding i_mmap_rwsem. We acquire and hold here to be + * consistent with other callers. Since there will be no contention + * on the semaphore, overhead is negligible. + */ + i_mmap_lock_write(mapping); remove_inode_hugepages(inode, 0, LLONG_MAX); + i_mmap_unlock_write(mapping); + resv_map = (struct resv_map *)inode->i_mapping->private_data; /* root inode doesn't have the resv_map, so we should check it */ if (resv_map) @@ -505,8 +496,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset) i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); + i_mmap_unlock_write(mapping); return 0; } @@ -540,8 +531,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) hugetlb_vmdelete_list(&mapping->i_mmap, hole_start >> PAGE_SHIFT, hole_end >> PAGE_SHIFT); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, hole_start, hole_end); + i_mmap_unlock_write(mapping); inode_unlock(inode); } @@ -624,7 +615,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, /* addr is the offset within the file (zero based) */ addr = index * hpage_size; - /* mutex taken here, fault path and hole punch */ + /* + * fault mutex taken here, protects against fault path + * and hole punch. inode_lock previously taken protects + * against truncation. + */ hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, index, addr); mutex_lock(&hugetlb_fault_mutex_table[hash]); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2a3162030167..cfd9790b01e3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3757,16 +3757,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } /* - * Use page lock to guard against racing truncation - * before we get page_table_lock. + * We can not race with truncation due to holding i_mmap_rwsem. + * Check once here for faults beyond end of file. */ + size = i_size_read(mapping->host) >> huge_page_shift(h); + if (idx >= size) + goto out; + retry: page = find_lock_page(mapping, idx); if (!page) { - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto out; - /* * Check for page in userfault range */ @@ -3856,9 +3856,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } ptl = huge_pte_lock(h, mm, ptep); - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto backout; ret = 0; if (!huge_pte_none(huge_ptep_get(ptep))) @@ -3961,8 +3958,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold - * until finished with ptep. This prevents huge_pmd_unshare from - * being called elsewhere and making the ptep no longer valid. + * until finished with ptep. This serves two purposes: + * 1) It prevents huge_pmd_unshare from being called elsewhere + * and making the ptep no longer valid. + * 2) It synchronizes us with file truncation. * * ptep could have already be assigned via huge_pte_offset. That * is OK, as huge_pte_alloc will return the same value unless -- 2.17.2

6 years, 8 months

1
0
0 0

[PATCH v2 2/2] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race

by Mike Kravetz

hugetlbfs page faults can race with truncate and hole punch operations. Current code in the page fault path attempts to handle this by 'backing out' operations if we encounter the race. One obvious omission in the current code is removing a page newly added to the page cache. This is pretty straight forward to address, but there is a more subtle and difficult issue of backing out hugetlb reservations. To handle this correctly, the 'reservation state' before page allocation needs to be noted so that it can be properly backed out. There are four distinct possibilities for reservation state: shared/reserved, shared/no-resv, private/reserved and private/no-resv. Backing out a reservation may require memory allocation which could fail so that needs to be taken into account as well. Instead of writing the required complicated code for this rare occurrence, just eliminate the race. i_mmap_rwsem is now held in read mode for the duration of page fault processing. Hold i_mmap_rwsem longer in truncation and hold punch code to cover the call to remove_inode_hugepages. With this modification, code in remove_inode_hugepages checking for races becomes 'dead' as it can not longer happen. Remove the dead code and expand comments to explain reasoning. Similarly, checks for races with truncation in the page fault path can be simplified and removed. Cc: <stable(a)vger.kernel.org> Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd") Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com> --- fs/hugetlbfs/inode.c | 50 +++++++++++++++----------------------------- mm/hugetlb.c | 21 +++++++++---------- 2 files changed, 27 insertions(+), 44 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e..a9c00c6ef80d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end) * truncation is indicated by end of range being LLONG_MAX * In this case, we first scan the range and release found pages. * After releasing pages, hugetlb_unreserve_pages cleans up region/reserv - * maps and global counts. Page faults can not race with truncation - * in this routine. hugetlb_no_page() prevents page faults in the - * truncated range. It checks i_size before allocation, and again after - * with the page table lock for the page held. The same lock must be - * acquired to unmap a page. + * maps and global counts. * hole punch is indicated if end is not LLONG_MAX * In the hole punch case we scan the range and release found pages. * Only when releasing a page is the associated region/reserv map * deleted. The region/reserv map for ranges without associated - * pages are not modified. Page faults can race with hole punch. - * This is indicated if we find a mapped page. + * pages are not modified. + * + * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent + * races with page faults. + * * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ @@ -423,32 +422,14 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, for (i = 0; i < pagevec_count(&pvec); ++i) { struct page *page = pvec.pages[i]; - u32 hash; index = page->index; - hash = hugetlb_fault_mutex_hash(h, current->mm, - &pseudo_vma, - mapping, index, 0); - mutex_lock(&hugetlb_fault_mutex_table[hash]); - /* - * If page is mapped, it was faulted in after being - * unmapped in caller. Unmap (again) now after taking - * the fault mutex. The mutex will prevent faults - * until we finish removing the page. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. + * A mapped page is impossible as callers should unmap + * all references before calling. And, i_mmap_rwsem + * prevents the creation of additional mappings. */ - if (unlikely(page_mapped(page))) { - BUG_ON(truncate_op); - - i_mmap_lock_write(mapping); - hugetlb_vmdelete_list(&mapping->i_mmap, - index * pages_per_huge_page(h), - (index + 1) * pages_per_huge_page(h)); - i_mmap_unlock_write(mapping); - } + VM_BUG_ON(page_mapped(page)); lock_page(page); /* @@ -470,7 +451,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, } unlock_page(page); - mutex_unlock(&hugetlb_fault_mutex_table[hash]); } huge_pagevec_release(&pvec); cond_resched(); @@ -505,8 +485,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset) i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); + i_mmap_unlock_write(mapping); return 0; } @@ -540,8 +520,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) hugetlb_vmdelete_list(&mapping->i_mmap, hole_start >> PAGE_SHIFT, hole_end >> PAGE_SHIFT); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, hole_start, hole_end); + i_mmap_unlock_write(mapping); inode_unlock(inode); } @@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, /* addr is the offset within the file (zero based) */ addr = index * hpage_size; - /* mutex taken here, fault path and hole punch */ + /* + * fault mutex taken here, protects against fault path + * and hole punch. inode_lock previously taken protects + * against truncation. + */ hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping, index, addr); mutex_lock(&hugetlb_fault_mutex_table[hash]); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ab4c77b8c72c..25a0cd2f8b39 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } /* - * Use page lock to guard against racing truncation - * before we get page_table_lock. + * We can not race with truncation due to holding i_mmap_rwsem. + * Check once here for faults beyond end of file. */ + size = i_size_read(mapping->host) >> huge_page_shift(h); + if (idx >= size) + goto out; + retry: page = find_lock_page(mapping, idx); if (!page) { - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto out; - /* * Check for page in userfault range */ @@ -3859,9 +3859,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, } ptl = huge_pte_lock(h, mm, ptep); - size = i_size_read(mapping->host) >> huge_page_shift(h); - if (idx >= size) - goto backout; ret = 0; if (!huge_pte_none(huge_ptep_get(ptep))) @@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold - * until finished with ptep. This prevents huge_pmd_unshare from - * being called elsewhere and making the ptep no longer valid. + * until finished with ptep. This serves two purposes: + * 1) It prevents huge_pmd_unshare from being called elsewhere + * and making the ptep no longer valid. + * 2) It synchronizes us with file truncation. * * ptep could have already be assigned via huge_pte_offset. That * is OK, as huge_pte_alloc will return the same value unless -- 2.17.2

6 years, 8 months

2
5
0 0

FAILED: patch "[PATCH] dax: Don't access a freed inode" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 55e56f06ed71d9441f3abd5b1d3c1a870812b3fe Mon Sep 17 00:00:00 2001 From: Matthew Wilcox <willy(a)infradead.org> Date: Tue, 27 Nov 2018 13:16:34 -0800 Subject: [PATCH] dax: Don't access a freed inode After we drop the i_pages lock, the inode can be freed at any time. The get_unlocked_entry() code has no choice but to reacquire the lock, so it can't be used here. Create a new wait_entry_unlocked() which takes care not to acquire the lock or dereference the address_space in any way. Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()") Cc: <stable(a)vger.kernel.org> Signed-off-by: Matthew Wilcox <willy(a)infradead.org> Reviewed-by: Jan Kara <jack(a)suse.cz> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> diff --git a/fs/dax.c b/fs/dax.c index e69fc231833b..3f592dc18d67 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -232,6 +232,34 @@ static void *get_unlocked_entry(struct xa_state *xas) } } +/* + * The only thing keeping the address space around is the i_pages lock + * (it's cycled in clear_inode() after removing the entries from i_pages) + * After we call xas_unlock_irq(), we cannot touch xas->xa. + */ +static void wait_entry_unlocked(struct xa_state *xas, void *entry) +{ + struct wait_exceptional_entry_queue ewait; + wait_queue_head_t *wq; + + init_wait(&ewait.wait); + ewait.wait.func = wake_exceptional_entry_func; + + wq = dax_entry_waitqueue(xas, entry, &ewait.key); + prepare_to_wait_exclusive(wq, &ewait.wait, TASK_UNINTERRUPTIBLE); + xas_unlock_irq(xas); + schedule(); + finish_wait(wq, &ewait.wait); + + /* + * Entry lock waits are exclusive. Wake up the next waiter since + * we aren't sure we will acquire the entry lock and thus wake + * the next waiter up on unlock. + */ + if (waitqueue_active(wq)) + __wake_up(wq, TASK_NORMAL, 1, &ewait.key); +} + static void put_unlocked_entry(struct xa_state *xas, void *entry) { /* If we were the only waiter woken, wake the next one */ @@ -389,9 +417,7 @@ bool dax_lock_mapping_entry(struct page *page) entry = xas_load(&xas); if (dax_is_locked(entry)) { rcu_read_unlock(); - entry = get_unlocked_entry(&xas); - xas_unlock_irq(&xas); - put_unlocked_entry(&xas, entry); + wait_entry_unlocked(&xas, entry); rcu_read_lock(); continue; }

6 years, 8 months

5
5
0 0

Re: [PATCH] MIPS: math-emu: Write-protect delay slot emulation pages

by Paul Burton

Hi Sasha, On Thu, Dec 20, 2018 at 07:26:15PM +0000, Sasha Levin wrote: > Hi, > > [This is an automated email] > > This commit has been processed because it contains a "Fixes:" tag, > fixing commit: 432c6bacbd0c MIPS: Use per-mm page to execute branch delay slot instructions. > > The bot has tested the following trees: v4.19.10, v4.14.89, v4.9.146, Neat! I like the idea of this automation :) > v4.19.10: Build OK! > v4.14.89: Build OK! > v4.9.146: Failed to apply! Possible dependencies: > 05ce77249d50 ("userfaultfd: non-cooperative: add madvise() event for MADV_DONTNEED request") > 163e11bc4f6e ("userfaultfd: hugetlbfs: UFFD_FEATURE_MISSING_HUGETLBFS") > 67dece7d4c58 ("x86/vdso: Set vDSO pointer only after success") > 72f87654c696 ("userfaultfd: non-cooperative: add mremap() event") > 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event") > 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") > 9cd75c3cd4c3 ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") > d811914d8757 ("userfaultfd: non-cooperative: rename *EVENT_MADVDONTNEED to *EVENT_REMOVE") This list includes the correct soft dependency - commit 897ab3e0c49e ("userfaultfd: non-cooperative: add event for memory unmaps") which added an extra argument to mmap_region(). > How should we proceed with this patch? The backport to v4.9 should simply drop the last argument (NULL) in the call to mmap_region(). Is there some way I can indicate this sort of thing in future patches so that the automation can spot that I already know it won't apply cleanly to a particular range of kernel versions? Or even better, that I could indicate what needs to be changed when backporting to those kernel versions? Thanks, Paul

6 years, 8 months

2
1
0 0

Re: Fix "perf tools: Synthesize GROUP_DESC feature in pipe mode" in the LT 4.14 branch

by Jinpu Wang

+cc Greg, stable Greensky, James J <james.j.greensky(a)intel.com> 于2018年12月21日周五上午11:48写道： > > Commit d38d272592737ea88a20 ("perf tools: Synthesize GROUP_DESC feature in pipe mode") broke the LT 4.14 branch when using event groups in pipe-mode. > > # perf record -e '{cycles,instructions,branches}' -- sleep 4 | perf report > # To display the perf.data header info, please use --header/--header-only options > # > Oxd7c [0x60]: failed to process type: 80 > Error: > Failed to process sample > > Commit a2015516c5c0be932a69 ("perf record: Synthesize features before events in pipe mode") is the fix. Can we get this cherry-picked and applied? >

6 years, 8 months

3
3
0 0

v4.19.12 build: 0 failures 2 warnings (v4.19.12)

by Build bot for Mark Brown

Tree/Branch: v4.19.12 Git describe: v4.19.12 Commit: 2a7cb228d2 Linux 4.19.12 Build Time: 124 min 24 sec Passed: 11 / 11 (100.00 %) Failed: 0 / 11 ( 0.00 %) Errors: 0 Warnings: 2 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): 1 warnings 0 mismatches : arm64-allmodconfig 1 warnings 0 mismatches : arm-allmodconfig ------------------------------------------------------------------------------- Warnings Summary: 2 1 ../drivers/staging/erofs/unzip_vle.c:186:29: warning: array subscript is above array bounds [-Warray-bounds] 1 ../drivers/isdn/hardware/eicon/message.c:5985:1: warning: the frame size of 2064 bytes is larger than 2048 bytes [-Wframe-larger-than=] =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- arm64-allmodconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../drivers/isdn/hardware/eicon/message.c:5985:1: warning: the frame size of 2064 bytes is larger than 2048 bytes [-Wframe-larger-than=] ------------------------------------------------------------------------------- arm-allmodconfig : PASS, 0 errors, 1 warnings, 0 section mismatches Warnings: ../drivers/staging/erofs/unzip_vle.c:186:29: warning: array subscript is above array bounds [-Warray-bounds] ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig x86_64-allmodconfig arm64-defconfig

6 years, 8 months

1
0
0 0

Re: [PATCH v2] powerpc/4xx/ocm: fix compilation error

by christophe leroy

Le 22/12/2018 à 15:35, Christian Lamparter a écrit : > This patch fixes a recent compilation regression in ocm: > > | ocm.c: In function ‘ocm_init_node’: > | ocm.c:182:18: error: invalid operands to binary | > | (have ‘int’ and ‘pgprot_t’ {aka ‘struct <anonymous>’}) > | _PAGE_EXEC | PAGE_KERNEL_NCG); > | ^ > | > | ocm.c:197:17: error: invalid operands to binary | > | (have ‘int’ and ‘pgprot_t’ {aka ‘struct <anonymous>’}) > | _PAGE_EXEC | PAGE_KERNEL); > | ^ > > Fixes: 56f3c1413f5c ("powerpc/mm: properly set PAGE_KERNEL flags in ioremap()") > Signed-off-by: Christian Lamparter <chunkeey(a)gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy(a)c-s.fr> Cc: stable(a)vger.kernel.org > --- > arch/powerpc/platforms/4xx/ocm.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/4xx/ocm.c b/arch/powerpc/platforms/4xx/ocm.c > index 561b09de69bf..de3565353153 100644 > --- a/arch/powerpc/platforms/4xx/ocm.c > +++ b/arch/powerpc/platforms/4xx/ocm.c > @@ -179,7 +179,7 @@ static void __init ocm_init_node(int count, struct device_node *node) > /* ioremap the non-cached region */ > if (ocm->nc.memtotal) { > ocm->nc.virt = __ioremap(ocm->nc.phys, ocm->nc.memtotal, > - _PAGE_EXEC | PAGE_KERNEL_NCG); > + _PAGE_EXEC | pgprot_val(PAGE_KERNEL_NCG)); > > if (!ocm->nc.virt) { > printk(KERN_ERR > @@ -194,7 +194,7 @@ static void __init ocm_init_node(int count, struct device_node *node) > > if (ocm->c.memtotal) { > ocm->c.virt = __ioremap(ocm->c.phys, ocm->c.memtotal, > - _PAGE_EXEC | PAGE_KERNEL); > + _PAGE_EXEC | pgprot_val(PAGE_KERNEL)); > > if (!ocm->c.virt) { > printk(KERN_ERR > --- L'absence de virus dans ce courrier électronique a été vérifiée par le logiciel antivirus Avast. https://www.avast.com/antivirus

6 years, 8 months

1
0
0 0

[PATCH] drm/i915: Unwind failure on pinning the gen7 ppgtt

by Chris Wilson

If we fail to pin the ggtt vma slot for the ppgtt page tables, we need to unwind the locals before reporting the error. Or else on subsequent attempts to bind the page tables into the ggtt, we will already believe that the vma has been pinned and continue on blithely. If something else should happen to be at that location, choas ensues. Fixes: a2bbf7148342 ("drm/i915/gtt: Only keep gen6 page directories pinned while active") Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala(a)linux.intel.com> Cc: Matthew Auld <matthew.william.auld(a)gmail.com> Cc: <stable(a)vger.kernel.org> # v4.19+ --- drivers/gpu/drm/i915/i915_gem_gtt.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 6e31745f6156..4ed2f3e61347 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2073,6 +2073,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) int gen6_ppgtt_pin(struct i915_hw_ppgtt *base) { struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base); + int err; /* * Workaround the limited maximum vma->pin_count and the aliasing_ppgtt @@ -2088,9 +2089,17 @@ int gen6_ppgtt_pin(struct i915_hw_ppgtt *base) * allocator works in address space sizes, so it's multiplied by page * size. We allocate at the top of the GTT to avoid fragmentation. */ - return i915_vma_pin(ppgtt->vma, - 0, GEN6_PD_ALIGN, - PIN_GLOBAL | PIN_HIGH); + err = i915_vma_pin(ppgtt->vma, + 0, GEN6_PD_ALIGN, + PIN_GLOBAL | PIN_HIGH); + if (err) + goto unpin; + + return 0; + +unpin: + ppgtt->pin_count = 0; + return err; } void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base) -- 2.20.1

6 years, 8 months

2
1
0 0

[PATCH v2 0/1] f2fs: fix sanity_check_raw_super on big endian machines

by Martin Blumenstingl

Big endian machines (at least the one I have access to) cannot mount f2fs filesystems anymore. This is with Linux 4.14.89 but I suspect that 4.9.144 (and later) is affected as well. commit 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows") treats the "block_count" from struct f2fs_super_block as 32-bit little endian value instead of a 64-bit little endian value. I tested this fix on top of Linux 4.14.49 but it seems that all stable and mainline kernel versions are affected: - 4.9.144 and later because 0cfe75c5b01199 was backported there - 4.14.86 and later because 0cfe75c5b01199 was backported there - 4.19 - 4.20-rcX changes since v1 at [0]: - change the printk format for block_count from "%u" to "%llu" (thanks to "kbuild test robot" for spotting this) - added Chao Yu's reviewed by [0] https://lore.kernel.org/patchwork/cover/1027285/ Martin Blumenstingl (1): f2fs: fix validation of the block count in sanity_check_raw_super fs/f2fs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- 2.20.1

6 years, 8 months

1
1
0 0

[PATCH] ocxl: Fix endiannes bug in ocxl_link_update_pe()

by Greg Kurz

All fields in the PE are big-endian. Use cpu_to_be32() like everywhere else something is written to the PE. Otherwise a wrong TID will be used by the NPU. If this TID happens to point to an existing thread sharing the same mm, it could be woken up by error. This is highly improbable though. The likely outcome of this is the NPU not finding the target thread and forcing the AFU into sending an interrupt, which userspace is supposed to handle anyway. Fixes: e948e06fc63a ("ocxl: Expose the thread_id needed for wait on POWER9") Cc: stable(a)vger.kernel.org # v4.18 Signed-off-by: Greg Kurz <groug(a)kaod.org> --- This bug remained unnoticed so far because the current OCXL test suite happens to call OCXL_IOCTL_ENABLE_P9_WAIT before attaching a context. This causes ocxl_link_update_pe() to be called before ocxl_link_add_pe() which re-writes the TID in the PE with the appropriate endianness. I have some patches that change the behavior of the OCXL test suite so that it can catch the issue: https://github.com/gkurz/libocxl/commits/wake-host-thread-rework --- drivers/misc/ocxl/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..646d16450066 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -566,7 +566,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid) mutex_lock(&spa->spa_lock); - pe->tid = tid; + pe->tid = cpu_to_be32(tid); /* * The barrier makes sure the PE is updated

6 years, 8 months

4
4
0 0

[PATCH] ocxl/afu_irq: Don't include <asm/pnv-ocxl.h>

by Greg Kurz

The AFU irq code doesn't need to reach out to the platform. Signed-off-by: Greg Kurz <groug(a)kaod.org> --- drivers/misc/ocxl/afu_irq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c index e70cfa24577f..11ab996657a2 100644 --- a/drivers/misc/ocxl/afu_irq.c +++ b/drivers/misc/ocxl/afu_irq.c @@ -2,7 +2,6 @@ // Copyright 2017 IBM Corp. #include <linux/interrupt.h> #include <linux/eventfd.h> -#include <asm/pnv-ocxl.h> #include "ocxl_internal.h" #include "trace.h"

6 years, 8 months

4
4
0 0

[PATCH] ocxl: Clarify error path in setup_xsl_irq()

by Greg Kurz

Implementing rollback with goto and labels is a common practice that leads to prettier and more maintainable code. FWIW, this design pattern is already being used in alloc_link() a few lines below in this file. Do the same in setup_xsl_irq(). Signed-off-by: Greg Kurz <groug(a)kaod.org> --- drivers/misc/ocxl/link.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index eed92055184d..659977a17405 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -273,9 +273,9 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x", link->domain, link->bus, link->dev); if (!spa->irq_name) { - unmap_irq_registers(spa); dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n"); - return -ENOMEM; + rc = -ENOMEM; + goto err_xsl; } /* * At some point, we'll need to look into allowing a higher @@ -283,11 +283,10 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) */ spa->virq = irq_create_mapping(NULL, hwirq); if (!spa->virq) { - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "irq_create_mapping failed for translation interrupt\n"); - return -EINVAL; + rc = -EINVAL; + goto err_name; } dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq); @@ -295,15 +294,21 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name, link); if (rc) { - irq_dispose_mapping(spa->virq); - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "request_irq failed for translation interrupt: %d\n", rc); - return -EINVAL; + rc = -EINVAL; + goto err_mapping; } return 0; + +err_mapping: + irq_dispose_mapping(spa->virq); +err_name: + kfree(spa->irq_name); +err_xsl: + unmap_irq_registers(spa); + return rc; } static void release_xsl_irq(struct link *link)

6 years, 8 months

4
3
0 0

FAILED: patch "[PATCH] ib_srpt: Fix a use-after-free in __srpt_close_all_ch()" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 14d15c2b278011056482eb015dff89f9cbf2b841 Mon Sep 17 00:00:00 2001 From: Bart Van Assche <bart.vanassche(a)wdc.com> Date: Mon, 2 Jul 2018 14:08:45 -0700 Subject: [PATCH] ib_srpt: Fix a use-after-free in __srpt_close_all_ch() BUG: KASAN: use-after-free in srpt_set_enabled+0x1a9/0x1e0 [ib_srpt] Read of size 4 at addr ffff8801269d23f8 by task check/29726 CPU: 4 PID: 29726 Comm: check Not tainted 4.18.0-rc2-dbg+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_address_description+0x6f/0x270 kasan_report+0x241/0x360 __asan_load4+0x78/0x80 srpt_set_enabled+0x1a9/0x1e0 [ib_srpt] srpt_tpg_enable_store+0xb8/0x120 [ib_srpt] configfs_write_file+0x14e/0x1d0 [configfs] __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f235cfe6154 Fixes: aaf45bd83eba ("IB/srpt: Detect session shutdown reliably") Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com> diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c index 754da8d30952..e42eec20c631 100644 --- a/drivers/infiniband/ulp/srpt/ib_srpt.c +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c @@ -1940,8 +1940,8 @@ static void __srpt_close_all_ch(struct srpt_port *sport) list_for_each_entry(nexus, &sport->nexus_list, entry) { list_for_each_entry(ch, &nexus->ch_list, list) { if (srpt_disconnect_ch(ch) >= 0) - pr_info("Closing channel %s-%d because target %s_%d has been disabled\n", - ch->sess_name, ch->qp->qp_num, + pr_info("Closing channel %s because target %s_%d has been disabled\n", + ch->sess_name, sport->sdev->device->name, sport->port); srpt_close_ch(ch); }

6 years, 8 months

3
2
0 0

FAILED: patch "[PATCH] ubifs: Fix directory size calculation for symlinks" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 00ee8b60102862f4daf0814d12a2ea2744fc0b9b Mon Sep 17 00:00:00 2001 From: Richard Weinberger <richard(a)nod.at> Date: Mon, 11 Jun 2018 23:41:09 +0200 Subject: [PATCH] ubifs: Fix directory size calculation for symlinks We have to account the name of the symlink and not the target length. Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks") Cc: <stable(a)vger.kernel.org> Signed-off-by: Richard Weinberger <richard(a)nod.at> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index 9da224d4f2da..e8616040bffc 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -1123,8 +1123,7 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry, struct ubifs_inode *ui; struct ubifs_inode *dir_ui = ubifs_inode(dir); struct ubifs_info *c = dir->i_sb->s_fs_info; - int err, len = strlen(symname); - int sz_change = CALC_DENT_SIZE(len); + int err, sz_change, len = strlen(symname); struct fscrypt_str disk_link; struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1, .new_ino_d = ALIGN(len, 8), @@ -1151,6 +1150,8 @@ static int ubifs_symlink(struct inode *dir, struct dentry *dentry, if (err) goto out_budg; + sz_change = CALC_DENT_SIZE(fname_len(&nm)); + inode = ubifs_new_inode(c, dir, S_IFLNK | S_IRWXUGO); if (IS_ERR(inode)) { err = PTR_ERR(inode);

6 years, 8 months

4
3
0 0

[PATCH] iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"

by Dave Chinner

From: Dave Chinner <dchinner(a)redhat.com> This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3. The reverted commit added page reference counting to iomap page structures that are used to track block size < page size state. This was supposed to align the code with page migration page accounting assumptions, but what it has done instead is break XFS filesystems. Every fstests run I've done on sub-page block size XFS filesystems has since picking up this commit 2 days ago has failed with bad page state errors such as: # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038" .... SECTION -- xfs FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 test1 4.20.0-rc6-dgc+ MKFS_OPTIONS -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc MOUNT_OPTIONS -- /dev/sdc /mnt/scratch generic/038 454s ... run fstests generic/038 at 2018-12-20 18:43:05 XFS (sdc): Unmounting Filesystem XFS (sdc): Mounting V5 Filesystem XFS (sdc): Ending clean mount BUG: Bad page state in process kswapd0 pfn:3a7fa page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1 flags: 0xfffffc0000000() raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360 raw: 0000000000000001 0000000000000000 00000000ffffffff page dumped because: non-NULL mapping CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014 Call Trace: dump_stack+0x67/0x90 bad_page.cold.116+0x8a/0xbd free_pcppages_bulk+0x4bf/0x6a0 free_unref_page_list+0x10f/0x1f0 shrink_page_list+0x49d/0xf50 shrink_inactive_list+0x19d/0x3b0 shrink_node_memcg.constprop.77+0x398/0x690 ? shrink_slab.constprop.81+0x278/0x3f0 shrink_node+0x7a/0x2f0 kswapd+0x34b/0x6d0 ? node_reclaim+0x240/0x240 kthread+0x11f/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x24/0x30 Disabling lock debugging due to kernel taint .... The failures are from anyway that frees pages and empties the per-cpu page magazines, so it's not a predictable failure or an easy to debug failure. generic/038 is a reliable reproducer of this problem - it has a 9 in 10 failure rate on one of my test machines. Failure on other machines have been at random points in fstests runs but every run has ended up tripping this problem. Hence generic/038 was used to bisect the failure because it was the most reliable failure. It is too close to the 4.20 release (not to mention holidays) to try to diagnose, fix and test the underlying cause of the problem, so reverting the commit is the only option we have right now. The revert has been tested against a current tot 4.20-rc7+ kernel across multiple machines running sub-page block size XFs filesystems and none of the bad page state failures have been seen. Signed-off-by: Dave Chinner <dchinner(a)redhat.com> --- fs/iomap.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/fs/iomap.c b/fs/iomap.c index 5bc172f3dfe8..d6bc98ae8d35 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -116,12 +116,6 @@ iomap_page_create(struct inode *inode, struct page *page) atomic_set(&iop->read_count, 0); atomic_set(&iop->write_count, 0); bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE); - - /* - * migrate_page_move_mapping() assumes that pages with private data have - * their count elevated by 1. - */ - get_page(page); set_page_private(page, (unsigned long)iop); SetPagePrivate(page); return iop; @@ -138,7 +132,6 @@ iomap_page_release(struct page *page) WARN_ON_ONCE(atomic_read(&iop->write_count)); ClearPagePrivate(page); set_page_private(page, 0); - put_page(page); kfree(iop); } -- 2.19.1

6 years, 8 months

6
13
0 0

stable/linux-3.18.y boot: 47 boots: 1 failed, 45 passed with 1 conflict (v3.18.131)

by kernelci.org bot

stable/linux-3.18.y boot: 47 boots: 1 failed, 45 passed with 1 conflict (v3.18.131) Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-3.18.y/kernel/v3.18.1… Full Build Summary: https://kernelci.org/build/stable/branch/linux-3.18.y/kernel/v3.18.131/ Tree: stable Branch: linux-3.18.y Git Describe: v3.18.131 Git Commit: fa42fea0d8b49ba65b49a999331950d74827a52d Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tested: 21 unique boards, 11 SoC families, 11 builds out of 185 Boot Regressions Detected: arm: multi_v7_defconfig: omap4-panda: lab-collabora: new failure (last pass: v3.18.130) Boot Failure Detected: arm: exynos_defconfig exynos4412-odroidx2: 1 failed lab Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.) arm: multi_v7_defconfig: omap4-panda: lab-baylibre-seattle: PASS lab-collabora: FAIL --- For more info write to <info(a)kernelci.org>

6 years, 8 months

1
0
0 0

stable/linux-4.14.y boot: 101 boots: 2 failed, 98 passed with 1 offline (v4.14.90)

by kernelci.org bot

stable/linux-4.14.y boot: 101 boots: 2 failed, 98 passed with 1 offline (v4.14.90) Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-4.14.y/kernel/v4.14.9… Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.14.y/kernel/v4.14.90/ Tree: stable Branch: linux-4.14.y Git Describe: v4.14.90 Git Commit: 592f5569e18471c07208f74540f4e0f646b226f7 Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tested: 59 unique boards, 22 SoC families, 12 builds out of 197 Boot Regressions Detected: arm64: defconfig: rk3399-firefly: lab-baylibre-seattle: failing since 13 days (last pass: v4.14.86 - first fail: v4.14.87) Boot Failures Detected: arm64: defconfig meson-gxl-s905x-libretech-cc: 1 failed lab rk3399-firefly: 1 failed lab Offline Platforms: arm: multi_v7_defconfig: stih410-b2120: 1 offline lab --- For more info write to <info(a)kernelci.org>

6 years, 8 months

1
0
0 0

[patch 4/4] mm, page_alloc: fix has_unmovable_pages for HugePages

by akpm＠linux-foundation.org

From: Oscar Salvador <osalvador(a)suse.de> Subject: mm, page_alloc: fix has_unmovable_pages for HugePages While playing with gigantic hugepages and memory_hotplug, I triggered the following #PF when "cat memoryX/removable": <--- kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 kernel: #PF error: [normal kernel read fault] kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP PTI kernel: CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 kernel: RIP: 0010:has_unmovable_pages+0x154/0x210 kernel: Code: 1b ff ff ff eb 32 48 8b 45 00 bf 00 10 00 00 a9 00 00 01 00 74 07 0f b6 4d 51 48 d3 e7 e8 c4 81 05 00 48 85 c0 49 89 c1 75 7e <41> 8b 41 08 83 f8 09 74 41 83 f8 1b 74 3c 4d 2b 64 24 58 49 81 ec kernel: RSP: 0018:ffffc90000a1fd30 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000009 kernel: RDX: ffffffff82aed4f0 RSI: 0000000000001000 RDI: 0000000000001000 kernel: RBP: ffffea0001800000 R08: 0000000000200000 R09: 0000000000000000 kernel: R10: 0000000000001000 R11: 0000000000000003 R12: ffff88813ffd45c0 kernel: R13: 0000000000060000 R14: 0000000000000001 R15: ffffea0000000000 kernel: FS: 00007fd71d9b3500(0000) GS:ffff88813bb00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000008 CR3: 00000001371c2002 CR4: 00000000003606e0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: is_mem_section_removable+0x7d/0x100 kernel: removable_show+0x90/0xb0 kernel: dev_attr_show+0x1c/0x50 kernel: sysfs_kf_seq_show+0xca/0x1b0 kernel: seq_read+0x133/0x380 kernel: __vfs_read+0x26/0x180 kernel: vfs_read+0x89/0x140 kernel: ksys_read+0x42/0x90 kernel: do_syscall_64+0x5b/0x180 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 kernel: RIP: 0033:0x7fd71d4c8b41 kernel: Code: fe ff ff 48 8d 3d 27 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 ea fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 kernel: RSP: 002b:00007ffeab5f6448 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fd71d4c8b41 kernel: RDX: 0000000000020000 RSI: 00007fd71d809000 RDI: 0000000000000003 kernel: RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000 kernel: R10: 000000000000038b R11: 0000000000000246 R12: 00007fd71d809000 kernel: R13: 0000000000000003 R14: 00007fd71d80900f R15: 0000000000020000 kernel: Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) drm_kms_helper(E) drm(E) aesni_intel(E) virtio_net(E) syscopyarea(E) net_failover(E) sysfillrect(E) failover(E) aes_x86_64(E) crypto_simd(E) sysimgblt(E) cryptd(E) pcspkr(E) glue_helper(E) parport_pc(E) fb_sys_fops(E) i2c_piix4(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) serio_raw(E) crc32c_intel(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E) kernel: CR2: 0000000000000008 kernel: ---[ end trace 49cade81474e40e7 ]--- kernel: RIP: 0010:has_unmovable_pages+0x154/0x210 kernel: Code: 1b ff ff ff eb 32 48 8b 45 00 bf 00 10 00 00 a9 00 00 01 00 74 07 0f b6 4d 51 48 d3 e7 e8 c4 81 05 00 48 85 c0 49 89 c1 75 7e <41> 8b 41 08 83 f8 09 74 41 83 f8 1b 74 3c 4d 2b 64 24 58 49 81 ec kernel: RSP: 0018:ffffc90000a1fd30 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000009 kernel: RDX: ffffffff82aed4f0 RSI: 0000000000001000 RDI: 0000000000001000 kernel: RBP: ffffea0001800000 R08: 0000000000200000 R09: 0000000000000000 kernel: R10: 0000000000001000 R11: 0000000000000003 R12: ffff88813ffd45c0 kernel: R13: 0000000000060000 R14: 0000000000000001 R15: ffffea0000000000 kernel: FS: 00007fd71d9b3500(0000) GS:ffff88813bb00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000008 CR3: 00000001371c2002 CR4: 00000000003606e0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ---> The reason is we do not pass the Head to page_hstate(), and so, the call to compound_order() in page_hstate() returns 0, so we end up checking all hstates's size to match PAGE_SIZE. Obviously, we do not find any hstate matching that size, and we return NULL. Then, we dereference that NULL pointer in hugepage_migration_supported() and we got the #PF from above. Fix that by getting the head page before calling page_hstate(). Also, since gigantic pages span several pageblocks, re-adjust the logic for skipping pages. While are it, we can also get rid of the round_up(). [osalvador(a)suse.de: remove round_up(), adjust skip pages logic per Michal] Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador(a)suse.de> Acked-by: Michal Hocko <mhocko(a)suse.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: Pavel Tatashin <pavel.tatashin(a)microsoft.com> Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-fix-has_unmovable_pages-for-hugepages +++ a/mm/page_alloc.c @@ -7814,11 +7814,14 @@ bool has_unmovable_pages(struct zone *zo * handle each tail page individually in migration. */ if (PageHuge(page)) { + struct page *head = compound_head(page); + unsigned int skip_pages; - if (!hugepage_migration_supported(page_hstate(page))) + if (!hugepage_migration_supported(page_hstate(head))) goto unmovable; - iter = round_up(iter + 1, 1<<compound_order(page)) - 1; + skip_pages = (1 << compound_order(head)) - (page - head); + iter += skip_pages - 1; continue; } _

6 years, 8 months

1
0
0 0

[patch 3/4] fork,memcg: fix crash in free_thread_stack on memcg charge fail

by akpm＠linux-foundation.org

From: Rik van Riel <riel(a)surriel.com> Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail Changeset 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will result in fork failing if allocating a kernel stack for a task in dup_task_struct exceeds the kernel memory allowance for that cgroup. Unfortunately, it also results in a crash. This is due to the code jumping to free_stack and calling free_thread_stack when the memcg kernel stack charge fails, but without tsk->stack pointing at the freshly allocated stack. This in turn results in the vfree_atomic in free_thread_stack oopsing with a backtrace like this: #5 [ffffc900244efc88] die at ffffffff8101f0ab #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86 #7 [ffffc900244efce0] general_protection at ffffffff818ff082 [exception RIP: llist_add_batch+7] RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000 RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0 R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37 #10 [ffffc900244efe98] _do_fork at ffffffff810884e0 #11 [ffffc900244eff10] sys_vfork at ffffffff810887ff #12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43 RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948 RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421 RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00 R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040 R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018 ORIG_RAX: 000000000000003a CS: 0033 SS: 002b The simplest fix is to assign tsk->stack right where it is allocated. Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Rik van Riel <riel(a)surriel.com> Acked-by: Roman Gushchin <guro(a)fb.com> Acked-by: Michal Hocko <mhocko(a)suse.com> Cc: Shakeel Butt <shakeelb(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/fork.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) --- a/kernel/fork.c~forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail +++ a/kernel/fork.c @@ -240,8 +240,10 @@ static unsigned long *alloc_thread_stack * free_thread_stack() can be called in interrupt context, * so cache the vm_struct. */ - if (stack) + if (stack) { tsk->stack_vm_area = find_vm_area(stack); + tsk->stack = stack; + } return stack; #else struct page *page = alloc_pages_node(node, THREADINFO_GFP, @@ -288,7 +290,10 @@ static struct kmem_cache *thread_stack_c static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) { - return kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node); + unsigned long *stack; + stack = kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node); + tsk->stack = stack; + return stack; } static void free_thread_stack(struct task_struct *tsk) _

6 years, 8 months

1
0
0 0

[patch 2/4] mm: thp: fix flags for pmd migration when split

by akpm＠linux-foundation.org

From: Peter Xu <peterx(a)redhat.com> Subject: mm: thp: fix flags for pmd migration when split When splitting a huge migrating PMD, we'll transfer all the existing PMD bits and apply them again onto the small PTEs. However we are fetching the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_yound() while actually they don't make sense at all when it's a migration entry. Fix them up. Since at it, drop the ifdef together as not needed. Note that if my understanding is correct about the problem then if without the patch there is chance to lose some of the dirty bits in the migrating pmd pages (on x86_64 we're fetching bit 11 which is part of swap offset instead of bit 2) and it could potentially corrupt the memory of an userspace program which depends on the dirty bit. Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx(a)redhat.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Reviewed-by: William Kucharski <william.kucharski(a)oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com> Cc: Souptick Joarder <jrdr.linux(a)gmail.com> Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru> Cc: Zi Yan <zi.yan(a)cs.rutgers.edu> Cc: <stable(a)vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) --- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split +++ a/mm/huge_memory.c @@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru */ old_pmd = pmdp_invalidate(vma, haddr, pmd); -#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION pmd_migration = is_pmd_migration_entry(old_pmd); - if (pmd_migration) { + if (unlikely(pmd_migration)) { swp_entry_t entry; entry = pmd_to_swp_entry(old_pmd); page = pfn_to_page(swp_offset(entry)); - } else -#endif + write = is_write_migration_entry(entry); + young = false; + soft_dirty = pmd_swp_soft_dirty(old_pmd); + } else { page = pmd_page(old_pmd); + if (pmd_dirty(old_pmd)) + SetPageDirty(page); + write = pmd_write(old_pmd); + young = pmd_young(old_pmd); + soft_dirty = pmd_soft_dirty(old_pmd); + } VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); - if (pmd_dirty(old_pmd)) - SetPageDirty(page); - write = pmd_write(old_pmd); - young = pmd_young(old_pmd); - soft_dirty = pmd_soft_dirty(old_pmd); /* * Withdraw the table only after we mark the pmd entry invalid. _

6 years, 8 months

1
0
0 0

[patch 1/4] mm, memory_hotplug: initialize struct pages for the full memory section

by akpm＠linux-foundation.org

From: Mikhail Zaslonko <zaslonko(a)linux.ibm.com> Subject: mm, memory_hotplug: initialize struct pages for the full memory section If memory end is not aligned with the sparse memory section boundary, the mapping of such a section is only partly initialized. This may lead to VM_BUG_ON due to uninitialized struct page access from is_mem_section_removable() or test_pages_in_a_zone() function triggered by memory_hotplug sysfs handlers: Here are the the panic examples: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y kernel parameter mem=2050M -------------------------- page:000003d082008000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160) [<00000000008f15c4>] show_valid_zones+0x5c/0x190 [<00000000008cf9c4>] dev_attr_show+0x34/0x70 [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 [<00000000003e4194>] seq_read+0x204/0x480 [<00000000003b53ea>] __vfs_read+0x32/0x178 [<00000000003b55b2>] vfs_read+0x82/0x138 [<00000000003b5be2>] ksys_read+0x5a/0xb0 [<0000000000b86ba0>] system_call+0xdc/0x2d8 Last Breaking-Event-Address: [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160 Kernel panic - not syncing: Fatal exception: panic_on_oops kernel parameter mem=3075M -------------------------- page:000003d08300c000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ([<000000000038596c>] is_mem_section_removable+0xb4/0x190) [<00000000008f12fa>] show_mem_removable+0x9a/0xd8 [<00000000008cf9c4>] dev_attr_show+0x34/0x70 [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148 [<00000000003e4194>] seq_read+0x204/0x480 [<00000000003b53ea>] __vfs_read+0x32/0x178 [<00000000003b55b2>] vfs_read+0x82/0x138 [<00000000003b5be2>] ksys_read+0x5a/0xb0 [<0000000000b86ba0>] system_call+0xdc/0x2d8 Last Breaking-Event-Address: [<000000000038596c>] is_mem_section_removable+0xb4/0x190 Kernel panic - not syncing: Fatal exception: panic_on_oops Fix the problem by initializing the last memory section of each zone in memmap_init_zone() till the very end, even if it goes beyond the zone end. Michal said: : This has alwways been problem AFAIU. It just went unnoticed because we : have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop : zeroing memory during allocation in vmemmap") and so the above test : would simply skip these ranges as belonging to zone 0 or provided a : garbage. : : So I guess we do care for post f7f99100d8d9 kernels mostly and : therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during : allocation in vmemmap") Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Mikhail Zaslonko <zaslonko(a)linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer(a)de.ibm.com> Suggested-by: Michal Hocko <mhocko(a)kernel.org> Acked-by: Michal Hocko <mhocko(a)suse.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Alexander Duyck <alexander.h.duyck(a)linux.intel.com> Cc: Pasha Tatashin <Pavel.Tatashin(a)microsoft.com> Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) --- a/mm/page_alloc.c~mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section +++ a/mm/page_alloc.c @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned cond_resched(); } } +#ifdef CONFIG_SPARSEMEM + /* + * If the zone does not span the rest of the section then + * we should at least initialize those pages. Otherwise we + * could blow up on a poisoned page in some paths which depend + * on full sections being initialized (e.g. memory hotplug). + */ + while (end_pfn % PAGES_PER_SECTION) { + __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid); + end_pfn++; + } +#endif } #ifdef CONFIG_ZONE_DEVICE _

6 years, 8 months

1
0
0 0

stable/linux-4.9.y boot: 91 boots: 0 failed, 90 passed with 1 offline (v4.9.147)

by kernelci.org bot

stable/linux-4.9.y boot: 91 boots: 0 failed, 90 passed with 1 offline (v4.9.147) Full Boot Summary: https://kernelci.org/boot/all/job/stable/branch/linux-4.9.y/kernel/v4.9.147/ Full Build Summary: https://kernelci.org/build/stable/branch/linux-4.9.y/kernel/v4.9.147/ Tree: stable Branch: linux-4.9.y Git Describe: v4.9.147 Git Commit: bbfc30f29cb328111fec12975ded8223ecc8e1a5 Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git Tested: 48 unique boards, 20 SoC families, 13 builds out of 193 Offline Platforms: arm: multi_v7_defconfig: stih410-b2120: 1 offline lab --- For more info write to <info(a)kernelci.org>

6 years, 8 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror