This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.133-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.133-rc1
Andy Lutomirski luto@kernel.org x86/fpu: Finish excising 'eagerfpu'
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "perf: sync up x86/.../cpufeatures.h"
Rik van Riel riel@redhat.com x86/fpu: Remove struct fpu::counter
Andy Lutomirski luto@kernel.org x86/fpu: Remove use_eager_fpu()
Gao Feng gfree.wind@vip.163.com ebtables: arpreply: Add the standard target sanity check
Zhi Chen zhichen@codeaurora.org ath10k: fix scan crash due to incorrect length calculation
Frederic Weisbecker fweisbec@gmail.com sched/cputime: Fix ksoftirqd cputime accounting regression
Frederic Weisbecker fweisbec@gmail.com sched/cputime: Increment kcpustat directly on irqtime account
Frederic Weisbecker fweisbec@gmail.com sched/cputime: Convert kcpustat to nsecs
Richard Weinberger richard@nod.at ubifs: Check for name being NULL while mounting
Cong Wang xiyou.wangcong@gmail.com ucma: fix a use-after-free in ucma_resolve_ip()
Chao Yu yuchao0@huawei.com f2fs: fix invalid memory access
Feng Tang feng.tang@intel.com x86/mm: Expand static page table for fixmap space
Vineet Gupta vgupta@synopsys.com ARC: clone syscall to setp r25 as thread pointer
Michal Suchanek msuchanek@suse.de powerpc/fadump: Return error when fadump registration fails
Yu Wang yyuwang@codeaurora.org ath10k: fix kernel panic issue during pci probe
Carl Huang cjhuang@codeaurora.org ath10k: fix use-after-free in ath10k_wmi_cmd_send_nowait
Prateek Sood prsood@codeaurora.org cgroup: Fix deadlock in cpu hotplug path
Theodore Ts'o tytso@mit.edu ext4: always verify the magic number in xattr blocks
Theodore Ts'o tytso@mit.edu ext4: add corruption check in ext4_xattr_set_entry()
Guenter Roeck linux@roeck-us.net of: unittest: Disable interrupt node tests for old world MAC systems
Dmitry Safonov dima@arista.com tty: Drop tty->count on tty_reopen() failure
Johan Hovold johan@kernel.org USB: serial: simple: add Motorola Tetra MTP6550 id
Chunfeng Yun chunfeng.yun@mediatek.com usb: xhci-mtk: resume USB3 roothub first
Mathias Nyman mathias.nyman@linux.intel.com xhci: Add missing CAS workaround for Intel Sunrise Point xHCI
Mike Snitzer snitzer@redhat.com dm cache: fix resize crash if user doesn't reload cache table
Joe Thornber ejt@redhat.com dm cache metadata: ignore hints array being too small during resize
Rafael J. Wysocki rafael.j.wysocki@intel.com PM / core: Clear the direct_complete flag on errors
Felix Fietkau nbd@nbd.name mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys
Daniel Drake drake@endlessm.com PCI: Reprogram bridge prefetch registers on resume
Andy Lutomirski luto@kernel.org x86/vdso: Fix vDSO syscall fallback asm constraint regression
Andy Lutomirski luto@kernel.org x86/vdso: Fix asm constraints on vDSO syscall fallbacks
Jan Beulich JBeulich@suse.com xen-netback: fix input validation in xenvif_set_hash_mapping()
Tomi Valkeinen tomi.valkeinen@ti.com fbdev/omapfb: fix omapfb_memory_read infoleak
Jann Horn jannh@google.com mm/vmstat.c: skip NR_TLB_REMOTE_FLUSH* properly
-------------
Diffstat:
Documentation/kernel-parameters.txt | 6 -- Makefile | 4 +- arch/arc/kernel/process.c | 20 +++++++ arch/powerpc/kernel/fadump.c | 23 +++++--- arch/s390/appldata/appldata_os.c | 16 +++--- arch/x86/crypto/crc32c-intel_glue.c | 17 ++---- arch/x86/entry/vdso/vclock_gettime.c | 26 +++++---- arch/x86/include/asm/cpufeatures.h | 1 - arch/x86/include/asm/fixmap.h | 10 ++++ arch/x86/include/asm/fpu/internal.h | 37 +----------- arch/x86/include/asm/fpu/types.h | 34 ----------- arch/x86/include/asm/pgtable_64.h | 3 +- arch/x86/include/asm/trace/fpu.h | 5 +- arch/x86/kernel/fpu/core.c | 39 ++----------- arch/x86/kernel/fpu/signal.c | 8 +-- arch/x86/kernel/fpu/xstate.c | 9 --- arch/x86/kernel/head_64.S | 16 ++++-- arch/x86/kvm/cpuid.c | 4 +- arch/x86/kvm/x86.c | 10 ---- arch/x86/mm/pgtable.c | 9 +++ arch/x86/mm/pkeys.c | 3 +- arch/x86/xen/mmu.c | 8 ++- drivers/base/power/main.c | 5 +- drivers/cpufreq/cpufreq.c | 6 +- drivers/cpufreq/cpufreq_governor.c | 2 +- drivers/cpufreq/cpufreq_stats.c | 1 - drivers/infiniband/core/ucma.c | 2 + drivers/macintosh/rack-meter.c | 2 +- drivers/md/dm-cache-metadata.c | 4 +- drivers/md/dm-cache-target.c | 9 ++- drivers/net/wireless/ath/ath10k/debug.c | 12 +++- drivers/net/wireless/ath/ath10k/trace.h | 12 ++-- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 8 +-- drivers/net/wireless/ath/ath10k/wmi.c | 2 +- drivers/net/xen-netback/hash.c | 12 ++-- drivers/of/unittest.c | 28 ++++++--- drivers/pci/pci.c | 27 ++++++--- drivers/tty/tty_io.c | 11 +++- drivers/usb/host/xhci-mtk.c | 4 +- drivers/usb/host/xhci-pci.c | 2 + drivers/usb/serial/usb-serial-simple.c | 3 +- drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c | 5 +- fs/ext4/xattr.c | 28 +++++---- fs/f2fs/checkpoint.c | 9 +-- fs/proc/stat.c | 68 +++++++++++----------- fs/proc/uptime.c | 7 +-- fs/ubifs/super.c | 3 + include/linux/netfilter_bridge/ebtables.h | 5 ++ kernel/cgroup.c | 6 +- kernel/sched/cpuacct.c | 2 +- kernel/sched/cputime.c | 75 +++++++++++-------------- kernel/sched/sched.h | 12 +++- mm/vmstat.c | 3 + net/bridge/netfilter/ebt_arpreply.c | 3 + net/mac80211/cfg.c | 2 +- tools/arch/x86/include/asm/cpufeatures.h | 1 - 56 files changed, 335 insertions(+), 354 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 58bc4c34d249bf1bc50730a9a209139347cfacfe upstream.
5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") made the availability of the NR_TLB_REMOTE_FLUSH* counters inside the kernel unconditional to reduce #ifdef soup, but (either to avoid showing dummy zero counters to userspace, or because that code was missed) didn't update the vmstat_array, meaning that all following counters would be shown with incorrect values.
This only affects kernel builds with CONFIG_VM_EVENT_COUNTERS=y && CONFIG_DEBUG_TLBFLUSH=y && CONFIG_SMP=n.
Link: http://lkml.kernel.org/r/20181001143138.95119-2-jannh@google.com Fixes: 5dd0b16cdaff ("mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP") Signed-off-by: Jann Horn jannh@google.com Reviewed-by: Kees Cook keescook@chromium.org Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Acked-by: Roman Gushchin guro@fb.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Oleg Nesterov oleg@redhat.com Cc: Christoph Lameter clameter@sgi.com Cc: Kemi Wang kemi.wang@intel.com Cc: Andy Lutomirski luto@kernel.org Cc: Ingo Molnar mingo@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/vmstat.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1078,6 +1078,9 @@ const char * const vmstat_text[] = { #ifdef CONFIG_SMP "nr_tlb_remote_flush", "nr_tlb_remote_flush_received", +#else + "", /* nr_tlb_remote_flush */ + "", /* nr_tlb_remote_flush_received */ #endif /* CONFIG_SMP */ "nr_tlb_local_flush_all", "nr_tlb_local_flush_one",
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ti.com
commit 1bafcbf59fed92af58955024452f45430d3898c5 upstream.
OMAPFB_MEMORY_READ ioctl reads pixels from the LCD's memory and copies them to a userspace buffer. The code has two issues:
- The user provided width and height could be large enough to overflow the calculations - The copy_to_user() can copy uninitialized memory to the userspace, which might contain sensitive kernel information.
Fix these by limiting the width & height parameters, and only copying the amount of data that we actually received from the LCD.
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ti.com Reported-by: Jann Horn jannh@google.com Cc: stable@vger.kernel.org Cc: security@kernel.org Cc: Will Deacon will.deacon@arm.com Cc: Jann Horn jannh@google.com Cc: Tony Lindgren tony@atomide.com Signed-off-by: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c +++ b/drivers/video/fbdev/omap2/omapfb/omapfb-ioctl.c @@ -496,6 +496,9 @@ static int omapfb_memory_read(struct fb_ if (!access_ok(VERIFY_WRITE, mr->buffer, mr->buffer_size)) return -EFAULT;
+ if (mr->w > 4096 || mr->h > 4096) + return -EINVAL; + if (mr->w * mr->h * 3 > mr->buffer_size) return -EINVAL;
@@ -509,7 +512,7 @@ static int omapfb_memory_read(struct fb_ mr->x, mr->y, mr->w, mr->h);
if (r > 0) { - if (copy_to_user(mr->buffer, buf, mr->buffer_size)) + if (copy_to_user(mr->buffer, buf, r)) r = -EFAULT; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich JBeulich@suse.com
commit 780e83c259fc33e8959fed8dfdad17e378d72b62 upstream.
Both len and off are frontend specified values, so we need to make sure there's no overflow when adding the two for the bounds check. We also want to avoid undefined behavior and hence use off to index into ->hash.mapping[] only after bounds checking. This at the same time allows to take care of not applying off twice for the bounds checking against vif->num_queues.
It is also insufficient to bounds check copy_op.len, as this is len truncated to 16 bits.
This is XSA-270 / CVE-2018-15471.
Reported-by: Felix Wilhelm fwilhelm@google.com Signed-off-by: Jan Beulich jbeulich@suse.com Reviewed-by: Paul Durrant paul.durrant@citrix.com Tested-by: Paul Durrant paul.durrant@citrix.com Cc: stable@vger.kernel.org [4.7 onwards] Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/xen-netback/hash.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/net/xen-netback/hash.c +++ b/drivers/net/xen-netback/hash.c @@ -332,20 +332,22 @@ u32 xenvif_set_hash_mapping_size(struct u32 xenvif_set_hash_mapping(struct xenvif *vif, u32 gref, u32 len, u32 off) { - u32 *mapping = &vif->hash.mapping[off]; + u32 *mapping = vif->hash.mapping; struct gnttab_copy copy_op = { .source.u.ref = gref, .source.domid = vif->domid, - .dest.u.gmfn = virt_to_gfn(mapping), .dest.domid = DOMID_SELF, - .dest.offset = xen_offset_in_page(mapping), - .len = len * sizeof(u32), + .len = len * sizeof(*mapping), .flags = GNTCOPY_source_gref };
- if ((off + len > vif->hash.size) || copy_op.len > XEN_PAGE_SIZE) + if ((off + len < off) || (off + len > vif->hash.size) || + len > XEN_PAGE_SIZE / sizeof(*mapping)) return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
+ copy_op.dest.u.gmfn = virt_to_gfn(mapping + off); + copy_op.dest.offset = xen_offset_in_page(mapping + off); + while (len-- != 0) if (mapping[off++] >= vif->num_queues) return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 715bd9d12f84d8f5cc8ad21d888f9bc304a8eb0b upstream.
The syscall fallbacks in the vDSO have incorrect asm constraints. They are not marked as writing to their outputs -- instead, they are marked as clobbering "memory", which is useless. In particular, gcc is smart enough to know that the timespec parameter hasn't escaped, so a memory clobber doesn't clobber it. And passing a pointer as an asm *input* does not tell gcc that the pointed-to value is changed.
Add in the fact that the asm instructions weren't volatile, and gcc was free to omit them entirely unless their sole output (the return value) is used. Which it is (phew!), but that stops happening with some upcoming patches.
As a trivial example, the following code:
void test_fallback(struct timespec *ts) { vdso_fallback_gettime(CLOCK_MONOTONIC, ts); }
compiles to:
00000000000000c0 <test_fallback>: c0: c3 retq
To add insult to injury, the RCX and R11 clobbers on 64-bit builds were missing.
The "memory" clobber is also unnecessary -- no ordering with respect to other memory operations is needed, but that's going to be fixed in a separate not-for-stable patch.
Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Signed-off-by: Andy Lutomirski luto@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.153842229... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/vdso/vclock_gettime.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
--- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -37,8 +37,9 @@ extern u8 pvclock_page notrace static long vdso_fallback_gettime(long clock, struct timespec *ts) { long ret; - asm("syscall" : "=a" (ret) : - "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : "memory"); + asm ("syscall" : "=a" (ret), "=m" (*ts) : + "0" (__NR_clock_gettime), "D" (clock), "S" (ts) : + "memory", "rcx", "r11"); return ret; }
@@ -46,8 +47,9 @@ notrace static long vdso_fallback_gtod(s { long ret;
- asm("syscall" : "=a" (ret) : - "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : "memory"); + asm ("syscall" : "=a" (ret), "=m" (*tv), "=m" (*tz) : + "0" (__NR_gettimeofday), "D" (tv), "S" (tz) : + "memory", "rcx", "r11"); return ret; }
@@ -58,12 +60,12 @@ notrace static long vdso_fallback_gettim { long ret;
- asm( + asm ( "mov %%ebx, %%edx \n" "mov %2, %%ebx \n" "call __kernel_vsyscall \n" "mov %%edx, %%ebx \n" - : "=a" (ret) + : "=a" (ret), "=m" (*ts) : "0" (__NR_clock_gettime), "g" (clock), "c" (ts) : "memory", "edx"); return ret; @@ -73,12 +75,12 @@ notrace static long vdso_fallback_gtod(s { long ret;
- asm( + asm ( "mov %%ebx, %%edx \n" "mov %2, %%ebx \n" "call __kernel_vsyscall \n" "mov %%edx, %%ebx \n" - : "=a" (ret) + : "=a" (ret), "=m" (*tv), "=m" (*tz) : "0" (__NR_gettimeofday), "g" (tv), "c" (tz) : "memory", "edx"); return ret;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit 02e425668f5c9deb42787d10001a3b605993ad15 upstream.
When I added the missing memory outputs, I failed to update the index of the first argument (ebx) on 32-bit builds, which broke the fallbacks. Somehow I must have screwed up my testing or gotten lucky.
Add another test to cover gettimeofday() as well.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Fixes: 715bd9d12f84 ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks") Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/vdso/vclock_gettime.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/entry/vdso/vclock_gettime.c +++ b/arch/x86/entry/vdso/vclock_gettime.c @@ -62,11 +62,11 @@ notrace static long vdso_fallback_gettim
asm ( "mov %%ebx, %%edx \n" - "mov %2, %%ebx \n" + "mov %[clock], %%ebx \n" "call __kernel_vsyscall \n" "mov %%edx, %%ebx \n" : "=a" (ret), "=m" (*ts) - : "0" (__NR_clock_gettime), "g" (clock), "c" (ts) + : "0" (__NR_clock_gettime), [clock] "g" (clock), "c" (ts) : "memory", "edx"); return ret; } @@ -77,11 +77,11 @@ notrace static long vdso_fallback_gtod(s
asm ( "mov %%ebx, %%edx \n" - "mov %2, %%ebx \n" + "mov %[tv], %%ebx \n" "call __kernel_vsyscall \n" "mov %%edx, %%ebx \n" : "=a" (ret), "=m" (*tv), "=m" (*tz) - : "0" (__NR_gettimeofday), "g" (tv), "c" (tz) + : "0" (__NR_gettimeofday), [tv] "g" (tv), "c" (tz) : "memory", "edx"); return ret; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Drake drake@endlessm.com
commit 083874549fdfefa629dfa752785e20427dde1511 upstream.
On 38+ Intel-based ASUS products, the NVIDIA GPU becomes unusable after S3 suspend/resume. The affected products include multiple generations of NVIDIA GPUs and Intel SoCs. After resume, nouveau logs many errors such as:
fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown] DRM: failed to idle channel 0 [DRM]
Similarly, the NVIDIA proprietary driver also fails after resume (black screen, 100% CPU usage in Xorg process). We shipped a sample to NVIDIA for diagnosis, and their response indicated that it's a problem with the parent PCI bridge (on the Intel SoC), not the GPU.
Runtime suspend/resume works fine, only S3 suspend is affected.
We found a workaround: on resume, rewrite the Intel PCI bridge 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In the cases that I checked, this register has value 0 and we just have to rewrite that value.
Linux already saves and restores PCI config space during suspend/resume, but this register was being skipped because upon resume, it already has value 0 (the correct, pre-suspend value).
Intel appear to have previously acknowledged this behaviour and the requirement to rewrite this register: https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
Based on that, rewrite the prefetch register values even when that appears unnecessary.
We have confirmed this solution on all the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).
Additionally, this solves an issue where r8169 MSI-X interrupts were broken after S3 suspend/resume on ASUS X441UAR. This issue was recently worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet patched. I suspect it will also fix the issue that was worked around in commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").
Thomas Martitz reports that this change also solves an issue where the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive after S3 suspend/resume.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069 Signed-off-by: Daniel Drake drake@endlessm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-By: Peter Wu peter@lekensteyn.nl CC: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/pci.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1114,12 +1114,12 @@ int pci_save_state(struct pci_dev *dev) EXPORT_SYMBOL(pci_save_state);
static void pci_restore_config_dword(struct pci_dev *pdev, int offset, - u32 saved_val, int retry) + u32 saved_val, int retry, bool force) { u32 val;
pci_read_config_dword(pdev, offset, &val); - if (val == saved_val) + if (!force && val == saved_val) return;
for (;;) { @@ -1138,25 +1138,36 @@ static void pci_restore_config_dword(str }
static void pci_restore_config_space_range(struct pci_dev *pdev, - int start, int end, int retry) + int start, int end, int retry, + bool force) { int index;
for (index = end; index >= start; index--) pci_restore_config_dword(pdev, 4 * index, pdev->saved_config_space[index], - retry); + retry, force); }
static void pci_restore_config_space(struct pci_dev *pdev) { if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) { - pci_restore_config_space_range(pdev, 10, 15, 0); + pci_restore_config_space_range(pdev, 10, 15, 0, false); /* Restore BARs before the command register. */ - pci_restore_config_space_range(pdev, 4, 9, 10); - pci_restore_config_space_range(pdev, 0, 3, 0); + pci_restore_config_space_range(pdev, 4, 9, 10, false); + pci_restore_config_space_range(pdev, 0, 3, 0, false); + } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + pci_restore_config_space_range(pdev, 12, 15, 0, false); + + /* + * Force rewriting of prefetch registers to avoid S3 resume + * issues on Intel PCI bridges that occur when these + * registers are not explicitly written. + */ + pci_restore_config_space_range(pdev, 9, 11, 0, true); + pci_restore_config_space_range(pdev, 0, 8, 0, false); } else { - pci_restore_config_space_range(pdev, 0, 15, 0); + pci_restore_config_space_range(pdev, 0, 15, 0, false); } }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
commit 211710ca74adf790b46ab3867fcce8047b573cd1 upstream.
key->sta is only valid after ieee80211_key_link, which is called later in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is never set when management frame protection is enabled.
Fixes: e548c49e6dc6b ("mac80211: add key flag for management keys") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/mac80211/cfg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -386,7 +386,7 @@ static int ieee80211_add_key(struct wiph case NL80211_IFTYPE_AP: case NL80211_IFTYPE_AP_VLAN: /* Keys without a station are used for TX only */ - if (key->sta && test_sta_flag(key->sta, WLAN_STA_MFP)) + if (sta && test_sta_flag(sta, WLAN_STA_MFP)) key->conf.flags |= IEEE80211_KEY_FLAG_RX_MGMT; break; case NL80211_IFTYPE_ADHOC:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 69e445ab8b66a9f30519842ef18be555d3ee9b51 upstream.
If __device_suspend() runs asynchronously (in which case the device passed to it is in dpm_suspended_list at that point) and it returns early on an error or pending wakeup, and the power.direct_complete flag has been set for the device already, the subsequent device_resume() will be confused by that and it will call pm_runtime_enable() incorrectly, as runtime PM has not been disabled for the device by __device_suspend().
To avoid that, clear power.direct_complete if __device_suspend() is not going to disable runtime PM for the device before returning.
Fixes: aae4518b3124 (PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily) Reported-by: Al Cooper alcooperx@gmail.com Tested-by: Al Cooper alcooperx@gmail.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Cc: 3.16+ stable@vger.kernel.org # 3.16+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/base/power/main.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1360,8 +1360,10 @@ static int __device_suspend(struct devic
dpm_wait_for_children(dev, async);
- if (async_error) + if (async_error) { + dev->power.direct_complete = false; goto Complete; + }
/* * If a device configured to wake up the system from sleep states @@ -1373,6 +1375,7 @@ static int __device_suspend(struct devic pm_wakeup_event(dev, 0);
if (pm_wakeup_pending()) { + dev->power.direct_complete = false; async_error = -EBUSY; goto Complete; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joe Thornber ejt@redhat.com
commit 4561ffca88c546f96367f94b8f1e4715a9c62314 upstream.
Commit fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to on-disk superblock") enabled previously written policy hints to be used after a cache is reactivated. But in doing so the cache metadata's hint array was left exposed to out of bounds access because on resize the metadata's on-disk hint array wasn't ever extended.
Fix this by ignoring that there are no on-disk hints associated with the newly added cache blocks. An expanded on-disk hint array is later rewritten upon the next clean shutdown of the cache.
Fixes: fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to on-disk superblock") Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber ejt@redhat.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-cache-metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -1262,8 +1262,8 @@ static int __load_mappings(struct dm_cac if (hints_valid) { r = dm_array_cursor_next(&cmd->hint_cursor); if (r) { - DMERR("dm_array_cursor_next for hint failed"); - goto out; + dm_array_cursor_end(&cmd->hint_cursor); + hints_valid = false; } } }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Snitzer snitzer@redhat.com
commit 5d07384a666d4b2f781dc056bfeec2c27fbdf383 upstream.
A reload of the cache's DM table is needed during resize because otherwise a crash will occur when attempting to access smq policy entries associated with the portion of the cache that was recently extended.
The reason is cache-size based data structures in the policy will not be resized, the only way to safely extend the cache is to allow for a proper cache policy initialization that occurs when the cache table is loaded. For example the smq policy's space_init(), init_allocator(), calc_hotspot_params() must be sized based on the extended cache size.
The fix for this is to disallow cache resizes of this pattern: 1) suspend "cache" target's device 2) resize the fast device used for the cache 3) resume "cache" target's device
Instead, the last step must be a full reload of the cache's DM table.
Fixes: 66a636356 ("dm cache: add stochastic-multi-queue (smq) policy") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-cache-target.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -3390,8 +3390,13 @@ static dm_cblock_t get_cache_dev_size(st
static bool can_resize(struct cache *cache, dm_cblock_t new_size) { - if (from_cblock(new_size) > from_cblock(cache->cache_size)) - return true; + if (from_cblock(new_size) > from_cblock(cache->cache_size)) { + if (cache->sized) { + DMERR("%s: unable to extend cache due to missing cache table reload", + cache_device_name(cache)); + return false; + } + }
/* * We can't drop a dirty block when shrinking the cache.
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit ffe84e01bb1b38c7eb9c6b6da127a6c136d251df upstream.
The workaround for missing CAS bit is also needed for xHC on Intel sunrisepoint PCH. For more details see:
Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8
Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/xhci-pci.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -179,6 +179,8 @@ static void xhci_pci_quirks(struct devic } if (pdev->vendor == PCI_VENDOR_ID_INTEL && (pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI || + pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_LP_XHCI || + pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_H_XHCI || pdev->device == PCI_DEVICE_ID_INTEL_APL_XHCI || pdev->device == PCI_DEVICE_ID_INTEL_DNV_XHCI)) xhci->quirks |= XHCI_MISSING_CAS;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chunfeng Yun chunfeng.yun@mediatek.com
commit 555df5820e733cded7eb8d0bf78b2a791be51d75 upstream.
Give USB3 devices a better chance to enumerate at USB3 speeds if they are connected to a suspended host. Porting from "671ffdff5b13 xhci: resume USB 3 roothub first"
Cc: stable@vger.kernel.org Signed-off-by: Chunfeng Yun chunfeng.yun@mediatek.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/xhci-mtk.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/host/xhci-mtk.c +++ b/drivers/usb/host/xhci-mtk.c @@ -735,10 +735,10 @@ static int __maybe_unused xhci_mtk_resum xhci_mtk_host_enable(mtk);
xhci_dbg(xhci, "%s: restart port polling\n", __func__); - set_bit(HCD_FLAG_POLL_RH, &hcd->flags); - usb_hcd_poll_rh_status(hcd); set_bit(HCD_FLAG_POLL_RH, &xhci->shared_hcd->flags); usb_hcd_poll_rh_status(xhci->shared_hcd); + set_bit(HCD_FLAG_POLL_RH, &hcd->flags); + usb_hcd_poll_rh_status(hcd); return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan@kernel.org
commit f5fad711c06e652f90f581fc7c2caee327c33d31 upstream.
Add device-id for the Motorola Tetra radio MTP6550.
Bus 001 Device 004: ID 0cad:9012 Motorola CGISS Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x0cad Motorola CGISS idProduct 0x9012 bcdDevice 24.16 iManufacturer 1 Motorola Solutions, Inc. iProduct 2 TETRA PEI interface iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 55 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 3 Generic Serial config bmAttributes 0x80 (Bus Powered) MaxPower 500mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x01 EP 1 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 0 bInterfaceProtocol 0 iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Device Qualifier (for other device speed): bLength 10 bDescriptorType 6 bcdUSB 2.00 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 bNumConfigurations 1 Device Status: 0x0000 (Bus Powered)
Reported-by: Hans Hult hanshult35@gmail.com Cc: stable stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/serial/usb-serial-simple.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/usb/serial/usb-serial-simple.c +++ b/drivers/usb/serial/usb-serial-simple.c @@ -87,7 +87,8 @@ DEVICE(moto_modem, MOTO_IDS);
/* Motorola Tetra driver */ #define MOTOROLA_TETRA_IDS() \ - { USB_DEVICE(0x0cad, 0x9011) } /* Motorola Solutions TETRA PEI */ + { USB_DEVICE(0x0cad, 0x9011) }, /* Motorola Solutions TETRA PEI */ \ + { USB_DEVICE(0x0cad, 0x9012) } /* MTP6550 */ DEVICE(motorola_tetra, MOTOROLA_TETRA_IDS);
/* Novatel Wireless GPS driver */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Safonov dima@arista.com
commit fe32416790093b31364c08395727de17ec96ace1 upstream.
In case of tty_ldisc_reinit() failure, tty->count should be decremented back, otherwise we will never release_tty(). Tetsuo reported that it fixes noisy warnings on tty release like: pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))
Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")
Cc: stable@vger.kernel.org # v4.6+ Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jslaby@suse.com Reviewed-by: Jiri Slaby jslaby@suse.cz Tested-by: Jiri Slaby jslaby@suse.com Tested-by: Mark Rutland mark.rutland@arm.com Tested-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Signed-off-by: Dmitry Safonov dima@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/tty_io.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/drivers/tty/tty_io.c +++ b/drivers/tty/tty_io.c @@ -1475,6 +1475,7 @@ static void tty_driver_remove_tty(struct static int tty_reopen(struct tty_struct *tty) { struct tty_driver *driver = tty->driver; + int retval;
if (driver->type == TTY_DRIVER_TYPE_PTY && driver->subtype == PTY_TYPE_MASTER) @@ -1488,10 +1489,14 @@ static int tty_reopen(struct tty_struct
tty->count++;
- if (!tty->ldisc) - return tty_ldisc_reinit(tty, tty->termios.c_line); + if (tty->ldisc) + return 0;
- return 0; + retval = tty_ldisc_reinit(tty, tty->termios.c_line); + if (retval) + tty->count--; + + return retval; }
/**
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guenter Roeck linux@roeck-us.net
commit 8894891446c9380709451b99ab45c5c53adfd2fc upstream.
On systems with OF_IMAP_OLDWORLD_MAC set in of_irq_workarounds, the devicetree interrupt parsing code is different, causing unit tests of devicetree interrupt nodes to fail. Due to a bug in unittest code, which tries to dereference an uninitialized pointer, this results in a crash.
OF: /testcase-data/phandle-tests/consumer-a: arguments longer than property Unable to handle kernel paging request for data at address 0x00bc616e Faulting instruction address: 0xc08e9468 Oops: Kernel access of bad area, sig: 11 [#1] BE PREEMPT PowerMac Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.72-rc1-yocto-standard+ #1 task: cf8e0000 task.stack: cf8da000 NIP: c08e9468 LR: c08ea5bc CTR: c08ea5ac REGS: cf8dbb50 TRAP: 0300 Not tainted (4.14.72-rc1-yocto-standard+) MSR: 00001032 <ME,IR,DR,RI> CR: 82004044 XER: 00000000 DAR: 00bc616e DSISR: 40000000 GPR00: c08ea5bc cf8dbc00 cf8e0000 c13ca517 c13ca517 c13ca8a0 00000066 00000002 GPR08: 00000063 00bc614e c0b05865 000affff 82004048 00000000 c00047f0 00000000 GPR16: c0a80000 c0a9cc34 c13ca517 c0ad1134 05ffffff 000affff c0b05860 c0abeef8 GPR24: cecec278 cecec278 c0a8c4d0 c0a885e0 c13ca8a0 05ffffff c13ca8a0 c13ca517
NIP [c08e9468] device_node_gen_full_name+0x30/0x15c LR [c08ea5bc] device_node_string+0x190/0x3c8 Call Trace: [cf8dbc00] [c007f670] trace_hardirqs_on_caller+0x118/0x1fc (unreliable) [cf8dbc40] [c08ea5bc] device_node_string+0x190/0x3c8 [cf8dbcb0] [c08eb794] pointer+0x25c/0x4d0 [cf8dbd00] [c08ebcbc] vsnprintf+0x2b4/0x5ec [cf8dbd60] [c08ec00c] vscnprintf+0x18/0x48 [cf8dbd70] [c008e268] vprintk_store+0x4c/0x22c [cf8dbda0] [c008ecac] vprintk_emit+0x94/0x130 [cf8dbdd0] [c008ff54] printk+0x5c/0x6c [cf8dbe10] [c0b8ddd4] of_unittest+0x2220/0x26f8 [cf8dbea0] [c0004434] do_one_initcall+0x4c/0x184 [cf8dbf00] [c0b4534c] kernel_init_freeable+0x13c/0x1d8 [cf8dbf30] [c0004814] kernel_init+0x24/0x118 [cf8dbf40] [c0013398] ret_from_kernel_thread+0x5c/0x64
The problem was observed when running a qemu test for the g3beige machine with devicetree unittests enabled.
Disable interrupt node tests on affected systems to avoid both false unittest failures and the crash.
With this patch in place, unittest on the affected system passes with the following message.
dt-test ### end of unittest - 144 passed, 0 failed
Fixes: 53a42093d96ef ("of: Add device tree selftests") Signed-off-by: Guenter Roeck linux@roeck-us.net Reviewed-by: Frank Rowand frank.rowand@sony.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/of/unittest.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-)
--- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -548,6 +548,9 @@ static void __init of_unittest_parse_int struct of_phandle_args args; int i, rc;
+ if (of_irq_workarounds & OF_IMAP_OLDWORLD_MAC) + return; + np = of_find_node_by_path("/testcase-data/interrupts/interrupts0"); if (!np) { pr_err("missing testcase data\n"); @@ -622,6 +625,9 @@ static void __init of_unittest_parse_int struct of_phandle_args args; int i, rc;
+ if (of_irq_workarounds & OF_IMAP_OLDWORLD_MAC) + return; + np = of_find_node_by_path("/testcase-data/interrupts/interrupts-extended0"); if (!np) { pr_err("missing testcase data\n"); @@ -778,15 +784,19 @@ static void __init of_unittest_platform_ pdev = of_find_device_by_node(np); unittest(pdev, "device 1 creation failed\n");
- irq = platform_get_irq(pdev, 0); - unittest(irq == -EPROBE_DEFER, "device deferred probe failed - %d\n", irq); - - /* Test that a parsing failure does not return -EPROBE_DEFER */ - np = of_find_node_by_path("/testcase-data/testcase-device2"); - pdev = of_find_device_by_node(np); - unittest(pdev, "device 2 creation failed\n"); - irq = platform_get_irq(pdev, 0); - unittest(irq < 0 && irq != -EPROBE_DEFER, "device parsing error failed - %d\n", irq); + if (!(of_irq_workarounds & OF_IMAP_OLDWORLD_MAC)) { + irq = platform_get_irq(pdev, 0); + unittest(irq == -EPROBE_DEFER, + "device deferred probe failed - %d\n", irq); + + /* Test that a parsing failure does not return -EPROBE_DEFER */ + np = of_find_node_by_path("/testcase-data/testcase-device2"); + pdev = of_find_device_by_node(np); + unittest(pdev, "device 2 creation failed\n"); + irq = platform_get_irq(pdev, 0); + unittest(irq < 0 && irq != -EPROBE_DEFER, + "device parsing error failed - %d\n", irq); + }
np = of_find_node_by_path("/testcase-data/platform-tests"); unittest(np, "No testcase data in device tree\n");
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream.
In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add check to make sure we don't overrun the xattr buffer.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Andreas Dilger adilger@dilger.ca [bwh: Backported to 3.16: - Add inode parameter to ext4_xattr_set_entry() and update callers - Return -EIO instead of -EFSCORRUPTED on error - Adjust context] Signed-off-by: Ben Hutchings ben@decadent.org.uk [adjusted context for 4.9] Signed-off-by: Daniel Rosenberg drosen@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/xattr.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -645,14 +645,20 @@ static size_t ext4_xattr_free_space(stru }
static int -ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s) +ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s, + struct inode *inode) { - struct ext4_xattr_entry *last; + struct ext4_xattr_entry *last, *next; size_t free, min_offs = s->end - s->base, name_len = strlen(i->name);
/* Compute min_offs and last. */ last = s->first; - for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { + for (; !IS_LAST_ENTRY(last); last = next) { + next = EXT4_XATTR_NEXT(last); + if ((void *)next >= s->end) { + EXT4_ERROR_INODE(inode, "corrupted xattr entries"); + return -EIO; + } if (last->e_value_size) { size_t offs = le16_to_cpu(last->e_value_offs); if (offs < min_offs) @@ -834,7 +840,7 @@ ext4_xattr_block_set(handle_t *handle, s mb_cache_entry_delete_block(ext4_mb_cache, hash, bs->bh->b_blocknr); ea_bdebug(bs->bh, "modifying in-place"); - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (!error) { if (!IS_LAST_ENTRY(s->first)) ext4_xattr_rehash(header(s->base), @@ -881,7 +887,7 @@ ext4_xattr_block_set(handle_t *handle, s s->end = s->base + sb->s_blocksize; }
- error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error == -EFSCORRUPTED) goto bad_block; if (error) @@ -1079,7 +1085,7 @@ int ext4_xattr_ibody_inline_set(handle_t
if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error) { if (error == -ENOSPC && ext4_has_inline_data(inode)) { @@ -1091,7 +1097,7 @@ int ext4_xattr_ibody_inline_set(handle_t error = ext4_xattr_ibody_find(inode, i, is); if (error) return error; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); } if (error) return error; @@ -1117,7 +1123,7 @@ static int ext4_xattr_ibody_set(handle_t
if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error) return error; header = IHDR(inode, ext4_raw_inode(&is->iloc));
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o tytso@mit.edu
commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream.
If there an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes.
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Andreas Dilger adilger@dilger.ca [Backported to 4.9: adjust context] Signed-off-by: Daniel Rosenberg drosen@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/xattr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -209,12 +209,12 @@ ext4_xattr_check_block(struct inode *ino { int error;
- if (buffer_verified(bh)) - return 0; - if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) || BHDR(bh)->h_blocks != cpu_to_le32(1)) return -EFSCORRUPTED; + if (buffer_verified(bh)) + return 0; + if (!ext4_xattr_block_csum_verify(inode, bh)) return -EFSBADCRC; error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Prateek Sood prsood@codeaurora.org
commit 116d2f7496c51b2e02e8e4ecdd2bdf5fb9d5a641 upstream.
Deadlock during cgroup migration from cpu hotplug path when a task T is being moved from source to destination cgroup.
kworker/0:0 cpuset_hotplug_workfn() cpuset_hotplug_update_tasks() hotplug_update_tasks_legacy() remove_tasks_in_empty_cpuset() cgroup_transfer_tasks() // stuck in iterator loop cgroup_migrate() cgroup_migrate_add_task()
In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T. Task T will not migrate to destination cgroup. css_task_iter_start() will keep pointing to task T in loop waiting for task T cg_list node to be removed.
Task T do_exit() exit_signals() // sets PF_EXITING exit_task_namespaces() switch_task_namespaces() free_nsproxy() put_mnt_ns() drop_collected_mounts() namespace_unlock() synchronize_rcu() _synchronize_rcu_expedited() schedule_work() // on cpu0 low priority worker pool wait_event() // waiting for work item to execute
Task T inserted a work item in the worklist of cpu0 low priority worker pool. It is waiting for expedited grace period work item to execute. This work item will only be executed once kworker/0:0 complete execution of cpuset_hotplug_workfn().
kworker/0:0 ==> Task T ==>kworker/0:0
In case of PF_EXITING task being migrated from source to destination cgroup, migrate next available task in source cgroup.
Signed-off-by: Prateek Sood prsood@codeaurora.org Signed-off-by: Tejun Heo tj@kernel.org [AmitP: Upstream commit cherry-pick failed, so I picked the backported changes from CAF/msm-4.9 tree instead: https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/?id=49b74f169641...] Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- This patch can be cleanly applied and build tested on 4.4.y and 3.18.y as well but I couldn't find it in msm-4.4 and msm-3.18 trees. So this patch is really untested on those stable trees. Build tested on 4.9.131, 4.4.159 and 3.18.123 for ARCH=arm/arm64 allmodconfig.
kernel/cgroup.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -4386,7 +4386,11 @@ int cgroup_transfer_tasks(struct cgroup */ do { css_task_iter_start(&from->self, &it); - task = css_task_iter_next(&it); + + do { + task = css_task_iter_next(&it); + } while (task && (task->flags & PF_EXITING)); + if (task) get_task_struct(task); css_task_iter_end(&it);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carl Huang cjhuang@codeaurora.org
commit 9ef0f58ed7b4a55da4a64641d538e0d9e46579ac upstream.
The skb may be freed in tx completion context before trace_ath10k_wmi_cmd is called. This can be easily captured when KASAN(Kernel Address Sanitizer) is enabled. The fix is to move trace_ath10k_wmi_cmd before the send operation. As the ret has no meaning in trace_ath10k_wmi_cmd then, so remove this parameter too.
Signed-off-by: Carl Huang cjhuang@codeaurora.org Tested-by: Brian Norris briannorris@chromium.org Reviewed-by: Brian Norris briannorris@chromium.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath10k/trace.h | 12 ++++-------- drivers/net/wireless/ath/ath10k/wmi.c | 2 +- 2 files changed, 5 insertions(+), 9 deletions(-)
--- a/drivers/net/wireless/ath/ath10k/trace.h +++ b/drivers/net/wireless/ath/ath10k/trace.h @@ -152,10 +152,9 @@ TRACE_EVENT(ath10k_log_dbg_dump, );
TRACE_EVENT(ath10k_wmi_cmd, - TP_PROTO(struct ath10k *ar, int id, const void *buf, size_t buf_len, - int ret), + TP_PROTO(struct ath10k *ar, int id, const void *buf, size_t buf_len),
- TP_ARGS(ar, id, buf, buf_len, ret), + TP_ARGS(ar, id, buf, buf_len),
TP_STRUCT__entry( __string(device, dev_name(ar->dev)) @@ -163,7 +162,6 @@ TRACE_EVENT(ath10k_wmi_cmd, __field(unsigned int, id) __field(size_t, buf_len) __dynamic_array(u8, buf, buf_len) - __field(int, ret) ),
TP_fast_assign( @@ -171,17 +169,15 @@ TRACE_EVENT(ath10k_wmi_cmd, __assign_str(driver, dev_driver_string(ar->dev)); __entry->id = id; __entry->buf_len = buf_len; - __entry->ret = ret; memcpy(__get_dynamic_array(buf), buf, buf_len); ),
TP_printk( - "%s %s id %d len %zu ret %d", + "%s %s id %d len %zu", __get_str(driver), __get_str(device), __entry->id, - __entry->buf_len, - __entry->ret + __entry->buf_len ) );
--- a/drivers/net/wireless/ath/ath10k/wmi.c +++ b/drivers/net/wireless/ath/ath10k/wmi.c @@ -1711,8 +1711,8 @@ int ath10k_wmi_cmd_send_nowait(struct at cmd_hdr->cmd_id = __cpu_to_le32(cmd);
memset(skb_cb, 0, sizeof(*skb_cb)); + trace_ath10k_wmi_cmd(ar, cmd_id, skb->data, skb->len); ret = ath10k_htc_send(&ar->htc, ar->wmi.eid, skb); - trace_ath10k_wmi_cmd(ar, cmd_id, skb->data, skb->len, ret);
if (ret) goto err_pull;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Wang yyuwang@codeaurora.org
commit 50e79e25250bf928369996277e85b00536b380c7 upstream.
If device gone during chip reset, ar->normal_mode_fw.board is not initialized, but ath10k_debug_print_hwfw_info() will try to access its member, which will cause 'kernel NULL pointer' issue. This was found using a faulty device (pci link went down sometimes) in a random insmod/rmmod/other-op test. To fix it, check ar->normal_mode_fw.board before accessing the member.
pci 0000:02:00.0: BAR 0: assigned [mem 0xf7400000-0xf75fffff 64bit] ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002) ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0 ath10k_pci 0000:02:00.0: failed to read device register, device is gone ath10k_pci 0000:02:00.0: failed to wait for target init: -5 ath10k_pci 0000:02:00.0: failed to warm reset: -5 ath10k_pci 0000:02:00.0: firmware crashed during chip reset ath10k_pci 0000:02:00.0: firmware crashed! (uuid 5d018951-b8e1-404a-8fde-923078b4423a) ath10k_pci 0000:02:00.0: (null) target 0x00000000 chip_id 0x00340aff sub 0000:0000 ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1 ath10k_pci 0000:02:00.0: firmware ver api 0 features crc32 00000000 ... BUG: unable to handle kernel NULL pointer dereference at 00000004 ... Call Trace: [<fb4e7882>] ath10k_print_driver_info+0x12/0x20 [ath10k_core] [<fb62b7dd>] ath10k_pci_fw_crashed_dump+0x6d/0x4d0 [ath10k_pci] [<fb629f07>] ? ath10k_pci_sleep.part.19+0x57/0xc0 [ath10k_pci] [<fb62c8ee>] ath10k_pci_hif_power_up+0x14e/0x1b0 [ath10k_pci] [<c10477fb>] ? do_page_fault+0xb/0x10 [<fb4eb934>] ath10k_core_register_work+0x24/0x840 [ath10k_core] [<c18a00d8>] ? netlbl_unlhsh_remove+0x178/0x410 [<c10477f0>] ? __do_page_fault+0x480/0x480 [<c1068e44>] process_one_work+0x114/0x3e0 [<c1069d07>] worker_thread+0x37/0x4a0 [<c106e294>] kthread+0xa4/0xc0 [<c1069cd0>] ? create_worker+0x180/0x180 [<c106e1f0>] ? kthread_park+0x50/0x50 [<c18ab4f7>] ret_from_fork+0x1b/0x28 Code: 78 80 b8 50 09 00 00 00 75 5d 8d 75 94 c7 44 24 08 aa d7 52 fb c7 44 24 04 64 00 00 00 89 34 24 e8 82 52 e2 c5 8b 83 dc 08 00 00 <8b> 50 04 8b 08 31 c0 e8 20 57 e3 c5 89 44 24 10 8b 83 58 09 00 EIP: [<fb4e7754>]- ath10k_debug_print_board_info+0x34/0xb0 [ath10k_core] SS:ESP 0068:f4921d90 CR2: 0000000000000004
Signed-off-by: Yu Wang yyuwang@codeaurora.org Signed-off-by: Kalle Valo kvalo@codeaurora.org [AmitP: Minor rebasing for 4.14.y and 4.9.y] Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath10k/debug.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/drivers/net/wireless/ath/ath10k/debug.c +++ b/drivers/net/wireless/ath/ath10k/debug.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2005-2011 Atheros Communications Inc. * Copyright (c) 2011-2013 Qualcomm Atheros, Inc. + * Copyright (c) 2018, The Linux Foundation. All rights reserved. * * Permission to use, copy, modify, and/or distribute this software for any * purpose with or without fee is hereby granted, provided that the above @@ -161,6 +162,8 @@ void ath10k_debug_print_hwfw_info(struct void ath10k_debug_print_board_info(struct ath10k *ar) { char boardinfo[100]; + const struct firmware *board; + u32 crc;
if (ar->id.bmi_ids_valid) scnprintf(boardinfo, sizeof(boardinfo), "%d:%d", @@ -168,11 +171,16 @@ void ath10k_debug_print_board_info(struc else scnprintf(boardinfo, sizeof(boardinfo), "N/A");
+ board = ar->normal_mode_fw.board; + if (!IS_ERR_OR_NULL(board)) + crc = crc32_le(0, board->data, board->size); + else + crc = 0; + ath10k_info(ar, "board_file api %d bmi_id %s crc32 %08x", ar->bd_api, boardinfo, - crc32_le(0, ar->normal_mode_fw.board->data, - ar->normal_mode_fw.board->size)); + crc); }
void ath10k_debug_print_boot_info(struct ath10k *ar)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Suchanek msuchanek@suse.de
commit 98b8cd7f75643e0a442d7a4c1cef2c9d53b7e92b upstream.
- log an error message when registration fails and no error code listed in the switch is returned - translate the hv error code to posix error code and return it from fw_register - return the posix error code from fw_register to the process writing to sysfs - return EEXIST on re-registration - return success on deregistration when fadump is not registered - return ENODEV when no memory is reserved for fadump
Signed-off-by: Michal Suchanek msuchanek@suse.de Tested-by: Hari Bathini hbathini@linux.vnet.ibm.com [mpe: Use pr_err() to shrink the error print] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Cc: Kleber Sacilotto de Souza kleber.souza@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/powerpc/kernel/fadump.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
--- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -365,9 +365,9 @@ static int __init early_fadump_reserve_m } early_param("fadump_reserve_mem", early_fadump_reserve_mem);
-static void register_fw_dump(struct fadump_mem_struct *fdm) +static int register_fw_dump(struct fadump_mem_struct *fdm) { - int rc; + int rc, err; unsigned int wait_time;
pr_debug("Registering for firmware-assisted kernel dump...\n"); @@ -384,7 +384,11 @@ static void register_fw_dump(struct fadu
} while (wait_time);
+ err = -EIO; switch (rc) { + default: + pr_err("Failed to register. Unknown Error(%d).\n", rc); + break; case -1: printk(KERN_ERR "Failed to register firmware-assisted kernel" " dump. Hardware Error(%d).\n", rc); @@ -392,18 +396,22 @@ static void register_fw_dump(struct fadu case -3: printk(KERN_ERR "Failed to register firmware-assisted kernel" " dump. Parameter Error(%d).\n", rc); + err = -EINVAL; break; case -9: printk(KERN_ERR "firmware-assisted kernel dump is already " " registered."); fw_dump.dump_registered = 1; + err = -EEXIST; break; case 0: printk(KERN_INFO "firmware-assisted kernel dump registration" " is successful\n"); fw_dump.dump_registered = 1; + err = 0; break; } + return err; }
void crash_fadump(struct pt_regs *regs, const char *str) @@ -1006,7 +1014,7 @@ static unsigned long init_fadump_header( return addr; }
-static void register_fadump(void) +static int register_fadump(void) { unsigned long addr; void *vaddr; @@ -1017,7 +1025,7 @@ static void register_fadump(void) * assisted dump. */ if (!fw_dump.reserve_dump_area_size) - return; + return -ENODEV;
ret = fadump_setup_crash_memory_ranges(); if (ret) @@ -1032,7 +1040,7 @@ static void register_fadump(void) fadump_create_elfcore_headers(vaddr);
/* register the future kernel dump with firmware. */ - register_fw_dump(&fdm); + return register_fw_dump(&fdm); }
static int fadump_unregister_dump(struct fadump_mem_struct *fdm) @@ -1218,7 +1226,6 @@ static ssize_t fadump_register_store(str switch (buf[0]) { case '0': if (fw_dump.dump_registered == 0) { - ret = -EINVAL; goto unlock_out; } /* Un-register Firmware-assisted dump */ @@ -1226,11 +1233,11 @@ static ssize_t fadump_register_store(str break; case '1': if (fw_dump.dump_registered == 1) { - ret = -EINVAL; + ret = -EEXIST; goto unlock_out; } /* Register Firmware-assisted dump */ - register_fadump(); + ret = register_fadump(); break; default: ret = -EINVAL;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vineet Gupta vgupta@synopsys.com
commit c58a584f05e35d1d4342923cd7aac07d9c3d3d16 upstream.
Per ARC TLS ABI, r25 is designated TP (thread pointer register). However so far kernel didn't do any special treatment, like setting up usermode r25, even for CLONE_SETTLS. We instead relied on libc runtime to do this, in say clone libc wrapper [1]. This was deliberate to keep kernel ABI agnostic (userspace could potentially change TP, specially for different ARC ISA say ARCompact vs. ARCv2 with different spare registers etc)
However userspace setting up r25, after clone syscall opens a race, if child is not scheduled and gets a signal instead. It starts off in userspace not in clone but in a signal handler and anything TP sepcific there such as pthread_self() fails which showed up with uClibc testsuite nptl/tst-kill6 [2]
Fix this by having kernel populate r25 to TP value. So this locks in ABI, but it was not going to change anyways, and fwiw is same for both ARCompact (arc700 core) and ARCvs (HS3x cores)
[1] https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/sysdeps/linux/ar... [2] https://github.com/wbx-github/uclibc-ng-test/blob/master/test/nptl/tst-kill6...
Fixes: ARC STAR 9001378481 Cc: stable@vger.kernel.org Reported-by: Nikita Sobolev sobolev@synopsys.com Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/kernel/process.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
--- a/arch/arc/kernel/process.c +++ b/arch/arc/kernel/process.c @@ -213,6 +213,26 @@ int copy_thread(unsigned long clone_flag task_thread_info(current)->thr_ptr; }
+ + /* + * setup usermode thread pointer #1: + * when child is picked by scheduler, __switch_to() uses @c_callee to + * populate usermode callee regs: this works (despite being in a kernel + * function) since special return path for child @ret_from_fork() + * ensures those regs are not clobbered all the way to RTIE to usermode + */ + c_callee->r25 = task_thread_info(p)->thr_ptr; + +#ifdef CONFIG_ARC_CURR_IN_REG + /* + * setup usermode thread pointer #2: + * however for this special use of r25 in kernel, __switch_to() sets + * r25 for kernel needs and only in the final return path is usermode + * r25 setup, from pt_regs->user_r25. So set that up as well + */ + c_regs->user_r25 = c_callee->r25; +#endif + return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Feng Tang feng.tang@intel.com
commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup.
Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled.
So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables.
Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Feng Tang feng.tang@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: kernel test robot rong.a.chen@intel.com Reviewed-by: Juergen Gross jgross@suse.com (Xen parts) Cc: H Peter Anvin hpa@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Michal Hocko mhocko@kernel.org Cc: Yinghai Lu yinghai@kernel.org Cc: Dave Hansen dave.hansen@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Andy Lutomirsky luto@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/fixmap.h | 10 ++++++++++ arch/x86/include/asm/pgtable_64.h | 3 ++- arch/x86/kernel/head_64.S | 16 ++++++++++++---- arch/x86/mm/pgtable.c | 9 +++++++++ arch/x86/xen/mmu.c | 8 ++++++-- 5 files changed, 39 insertions(+), 7 deletions(-)
--- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -14,6 +14,16 @@ #ifndef _ASM_X86_FIXMAP_H #define _ASM_X86_FIXMAP_H
+/* + * Exposed to assembly code for setting up initial page tables. Cannot be + * calculated in assembly code (fixmap entries are an enum), but is sanity + * checked in the actual fixmap C code to make sure that the fixmap is + * covered fully. + */ +#define FIXMAP_PMD_NUM 2 +/* fixmap starts downwards from the 507th entry in level2_fixmap_pgt */ +#define FIXMAP_PMD_TOP 507 + #ifndef __ASSEMBLY__ #include <linux/kernel.h> #include <asm/acpi.h> --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -13,13 +13,14 @@ #include <asm/processor.h> #include <linux/bitops.h> #include <linux/threads.h> +#include <asm/fixmap.h>
extern pud_t level3_kernel_pgt[512]; extern pud_t level3_ident_pgt[512]; extern pmd_t level2_kernel_pgt[512]; extern pmd_t level2_fixmap_pgt[512]; extern pmd_t level2_ident_pgt[512]; -extern pte_t level1_fixmap_pgt[512]; +extern pte_t level1_fixmap_pgt[512 * FIXMAP_PMD_NUM]; extern pgd_t init_level4_pgt[];
#define swapper_pg_dir init_level4_pgt --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -23,6 +23,7 @@ #include "../entry/calling.h" #include <asm/export.h> #include <asm/nospec-branch.h> +#include <asm/fixmap.h>
#ifdef CONFIG_PARAVIRT #include <asm/asm-offsets.h> @@ -493,13 +494,20 @@ NEXT_PAGE(level2_kernel_pgt) KERNEL_IMAGE_SIZE/PMD_SIZE)
NEXT_PAGE(level2_fixmap_pgt) - .fill 506,8,0 - .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE - /* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */ - .fill 5,8,0 + .fill (512 - 4 - FIXMAP_PMD_NUM),8,0 + pgtno = 0 + .rept (FIXMAP_PMD_NUM) + .quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map \ + + _PAGE_TABLE; + pgtno = pgtno + 1 + .endr + /* 6 MB reserved space + a 2MB hole */ + .fill 4,8,0
NEXT_PAGE(level1_fixmap_pgt) + .rept (FIXMAP_PMD_NUM) .fill 512,8,0 + .endr
#undef PMDS
--- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -536,6 +536,15 @@ void __native_set_fixmap(enum fixed_addr { unsigned long address = __fix_to_virt(idx);
+#ifdef CONFIG_X86_64 + /* + * Ensure that the static initial page tables are covering the + * fixmap completely. + */ + BUILD_BUG_ON(__end_of_permanent_fixed_addresses > + (FIXMAP_PMD_NUM * PTRS_PER_PTE)); +#endif + if (idx >= __end_of_fixed_addresses) { BUG(); return; --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1936,7 +1936,7 @@ void __init xen_setup_kernel_pagetable(p * L3_k[511] -> level2_fixmap_pgt */ convert_pfn_mfn(level3_kernel_pgt);
- /* L3_k[511][506] -> level1_fixmap_pgt */ + /* L3_k[511][508-FIXMAP_PMD_NUM ... 507] -> level1_fixmap_pgt */ convert_pfn_mfn(level2_fixmap_pgt); } /* We get [511][511] and have Xen's version of level2_kernel_pgt */ @@ -1970,7 +1970,11 @@ void __init xen_setup_kernel_pagetable(p set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO); set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO); set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO); - set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO); + + for (i = 0; i < FIXMAP_PMD_NUM; i++) { + set_page_prot(level1_fixmap_pgt + i * PTRS_PER_PTE, + PAGE_KERNEL_RO); + }
/* Pin down new L4 */ pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
On Thu, 2018-10-11 at 17:35 +0200, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Feng Tang feng.tang@intel.com
commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.
This backport is incorrect. The part that updated __startup_64() in arch/x86/kernel/head64.c was dropped, presumably because that function doesn't exist in 4.9. However that seems to be an essential of the fix. In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would need to be changed instead.
I also found that this introduces new boot-time warnings on some systems if CONFIG_DEBUG_WX is enabled.
So, unless someone provides fixes for those issues, I think this should be reverted for the 4.9 branch.
Ben.
We met a kernel panic when enabling earlycon, which is due to the fixmap address of earlycon is not statically setup.
Currently the static fixmap setup in head_64.S only covers 2M virtual address space, while it actually could be in 4M space with different kernel configurations, e.g. when VSYSCALL emulation is disabled.
So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2, and add a build time check to ensure that the fixmap is covered by the initial static page tables.
Fixes: 1ad83c858c7d ("x86_64,vsyscall: Make vsyscall emulation configurable") Suggested-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Feng Tang feng.tang@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: kernel test robot rong.a.chen@intel.com Reviewed-by: Juergen Gross jgross@suse.com (Xen parts) Cc: H Peter Anvin hpa@linux.intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Michal Hocko mhocko@kernel.org Cc: Yinghai Lu yinghai@kernel.org Cc: Dave Hansen dave.hansen@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Andy Lutomirsky luto@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/x86/include/asm/fixmap.h | 10 ++++++++++ arch/x86/include/asm/pgtable_64.h | 3 ++- arch/x86/kernel/head_64.S | 16 ++++++++++++---- arch/x86/mm/pgtable.c | 9 +++++++++ arch/x86/xen/mmu.c | 8 ++++++-- 5 files changed, 39 insertions(+), 7 deletions(-)
--- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -14,6 +14,16 @@ #ifndef _ASM_X86_FIXMAP_H #define _ASM_X86_FIXMAP_H +/*
- Exposed to assembly code for setting up initial page tables. Cannot be
- calculated in assembly code (fixmap entries are an enum), but is sanity
- checked in the actual fixmap C code to make sure that the fixmap is
- covered fully.
- */
+#define FIXMAP_PMD_NUM 2 +/* fixmap starts downwards from the 507th entry in level2_fixmap_pgt */ +#define FIXMAP_PMD_TOP 507
#ifndef __ASSEMBLY__ #include <linux/kernel.h> #include <asm/acpi.h> --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -13,13 +13,14 @@ #include <asm/processor.h> #include <linux/bitops.h> #include <linux/threads.h> +#include <asm/fixmap.h> extern pud_t level3_kernel_pgt[512]; extern pud_t level3_ident_pgt[512]; extern pmd_t level2_kernel_pgt[512]; extern pmd_t level2_fixmap_pgt[512]; extern pmd_t level2_ident_pgt[512]; -extern pte_t level1_fixmap_pgt[512]; +extern pte_t level1_fixmap_pgt[512 * FIXMAP_PMD_NUM]; extern pgd_t init_level4_pgt[]; #define swapper_pg_dir init_level4_pgt --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -23,6 +23,7 @@ #include "../entry/calling.h" #include <asm/export.h> #include <asm/nospec-branch.h> +#include <asm/fixmap.h> #ifdef CONFIG_PARAVIRT #include <asm/asm-offsets.h> @@ -493,13 +494,20 @@ NEXT_PAGE(level2_kernel_pgt) KERNEL_IMAGE_SIZE/PMD_SIZE) NEXT_PAGE(level2_fixmap_pgt)
- .fill 506,8,0
- .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
- /* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
- .fill 5,8,0
- .fill (512 - 4 - FIXMAP_PMD_NUM),8,0
- pgtno = 0
- .rept (FIXMAP_PMD_NUM)
- .quad level1_fixmap_pgt + (pgtno << PAGE_SHIFT) - __START_KERNEL_map \
+ _PAGE_TABLE;
- pgtno = pgtno + 1
- .endr
- /* 6 MB reserved space + a 2MB hole */
- .fill 4,8,0
NEXT_PAGE(level1_fixmap_pgt)
- .rept (FIXMAP_PMD_NUM) .fill 512,8,0
- .endr
#undef PMDS --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -536,6 +536,15 @@ void __native_set_fixmap(enum fixed_addr { unsigned long address = __fix_to_virt(idx); +#ifdef CONFIG_X86_64
/*
- Ensure that the static initial page tables are covering the
- fixmap completely.
- */
- BUILD_BUG_ON(__end_of_permanent_fixed_addresses >
(FIXMAP_PMD_NUM * PTRS_PER_PTE));
+#endif
- if (idx >= __end_of_fixed_addresses) { BUG(); return;
--- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1936,7 +1936,7 @@ void __init xen_setup_kernel_pagetable(p * L3_k[511] -> level2_fixmap_pgt */ convert_pfn_mfn(level3_kernel_pgt);
/* L3_k[511][506] -> level1_fixmap_pgt */
convert_pfn_mfn(level2_fixmap_pgt); } /* We get [511][511] and have Xen's version of level2_kernel_pgt *//* L3_k[511][508-FIXMAP_PMD_NUM ... 507] -> level1_fixmap_pgt */
@@ -1970,7 +1970,11 @@ void __init xen_setup_kernel_pagetable(p set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO); set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO); set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
for (i = 0; i < FIXMAP_PMD_NUM; i++) {
set_page_prot(level1_fixmap_pgt + i * PTRS_PER_PTE,
PAGE_KERNEL_RO);
}
/* Pin down new L4 */ pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
Hi Ben,
On Thu, Nov 01, 2018 at 10:25:43PM +0000, Ben Hutchings wrote:
On Thu, 2018-10-11 at 17:35 +0200, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Feng Tang feng.tang@intel.com
commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.
This backport is incorrect. The part that updated __startup_64() in arch/x86/kernel/head64.c was dropped, presumably because that function doesn't exist in 4.9. However that seems to be an essential of the fix. In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would need to be changed instead.
I also found that this introduces new boot-time warnings on some systems if CONFIG_DEBUG_WX is enabled.
So, unless someone provides fixes for those issues, I think this should be reverted for the 4.9 branch.
Thanks for the catch, I'm fine with the revert for now.
- Feng
On Fri, Nov 02, 2018 at 11:38:13AM +0800, Feng Tang wrote:
Hi Ben,
On Thu, Nov 01, 2018 at 10:25:43PM +0000, Ben Hutchings wrote:
On Thu, 2018-10-11 at 17:35 +0200, Greg Kroah-Hartman wrote:
4.9-stable review patch. If anyone has any objections, please let me know.
From: Feng Tang feng.tang@intel.com
commit 05ab1d8a4b36ee912b7087c6da127439ed0a903e upstream.
This backport is incorrect. The part that updated __startup_64() in arch/x86/kernel/head64.c was dropped, presumably because that function doesn't exist in 4.9. However that seems to be an essential of the fix. In 4.9 the startup_64 routine in arch/x86/kernel/head_64.S would need to be changed instead.
I also found that this introduces new boot-time warnings on some systems if CONFIG_DEBUG_WX is enabled.
So, unless someone provides fixes for those issues, I think this should be reverted for the 4.9 branch.
Thanks for the catch, I'm fine with the revert for now.
I've queued the revert, thank you.
-- Thanks, Sasha
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chao Yu yuchao0@huawei.com
commit d3f07c049dab1a3f1740f476afd3d5e5b738c21c upstream.
syzbot found the following crash on:
HEAD commit: d9bd94c0bcaa Add linux-next specific files for 20180801 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=1001189c400000 kernel config: https://syzkaller.appspot.com/x/.config?x=cc8964ea4d04518c dashboard link: https://syzkaller.appspot.com/bug?extid=c966a82db0b14aa37e81 compiler: gcc (GCC) 8.0.1 20180413 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+c966a82db0b14aa37e81@syzkaller.appspotmail.com
loop7: rw=12288, want=8200, limit=20 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. openvswitch: netlink: Message has 8 unknown bytes. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 1 PID: 7615 Comm: syz-executor7 Not tainted 4.18.0-rc7-next-20180801+ #29 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_get_valid_checkpoint+0x436/0x1ec0 fs/f2fs/checkpoint.c:860 f2fs_fill_super+0x2d42/0x8110 fs/f2fs/super.c:2883 mount_bdev+0x314/0x3e0 fs/super.c:1344 f2fs_mount+0x3c/0x50 fs/f2fs/super.c:3133 legacy_get_tree+0x131/0x460 fs/fs_context.c:729 vfs_get_tree+0x1cb/0x5c0 fs/super.c:1743 do_new_mount fs/namespace.c:2603 [inline] do_mount+0x6f2/0x1e20 fs/namespace.c:2927 ksys_mount+0x12d/0x140 fs/namespace.c:3143 __do_sys_mount fs/namespace.c:3157 [inline] __se_sys_mount fs/namespace.c:3154 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3154 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x45943a Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 bd 8a fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 9a 8a fb ff c3 66 0f 1f 84 00 00 00 00 00 RSP: 002b:00007f36a61d4a88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f36a61d4b30 RCX: 000000000045943a RDX: 00007f36a61d4ad0 RSI: 0000000020000100 RDI: 00007f36a61d4af0 RBP: 0000000020000100 R08: 00007f36a61d4b30 R09: 00007f36a61d4ad0 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000013 R13: 0000000000000000 R14: 00000000004c8ea0 R15: 0000000000000000 Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace bd8550c129352286 ]--- RIP: 0010:__read_once_size include/linux/compiler.h:188 [inline] RIP: 0010:compound_head include/linux/page-flags.h:142 [inline] RIP: 0010:PageLocked include/linux/page-flags.h:272 [inline] RIP: 0010:f2fs_put_page fs/f2fs/f2fs.h:2011 [inline] RIP: 0010:validate_checkpoint+0x66d/0xec0 fs/f2fs/checkpoint.c:835 Code: e8 58 05 7f fe 4c 8d 6b 80 4d 8d 74 24 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 c6 04 02 00 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 f4 06 00 00 4c 89 ea 4d 8b 7c 24 08 48 b8 00 00 RSP: 0018:ffff8801937cebe8 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff8801937cef30 RCX: ffffc90006035000 RDX: 0000000000000000 RSI: ffffffff82fd9658 RDI: 0000000000000005 netlink: 65342 bytes leftover after parsing attributes in process `syz-executor4'. RBP: ffff8801937cef58 R08: ffff8801ab254700 R09: fffff94000d9e026 openvswitch: netlink: Message has 8 unknown bytes. R10: fffff94000d9e026 R11: ffffea0006cf0137 R12: fffffffffffffffb R13: ffff8801937ceeb0 R14: 0000000000000003 R15: ffff880193419b40 FS: 00007f36a61d5700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc04ff93000 CR3: 00000001d0562000 CR4: 00000000001426e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
In validate_checkpoint(), if we failed to call get_checkpoint_version(), we will pass returned invalid page pointer into f2fs_put_page, cause accessing invalid memory, this patch tries to handle error path correctly to fix this issue.
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org
--- fs/f2fs/checkpoint.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -676,6 +676,7 @@ static int get_checkpoint_version(struct
crc_offset = le32_to_cpu((*cp_block)->checksum_offset); if (crc_offset >= blk_size) { + f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc_offset: %zu", crc_offset); return -EINVAL; @@ -684,6 +685,7 @@ static int get_checkpoint_version(struct crc = le32_to_cpu(*((__le32 *)((unsigned char *)*cp_block + crc_offset))); if (!f2fs_crc_valid(sbi, crc, *cp_block, crc_offset)) { + f2fs_put_page(*cp_page, 1); f2fs_msg(sbi->sb, KERN_WARNING, "invalid crc value"); return -EINVAL; } @@ -703,14 +705,14 @@ static struct page *validate_checkpoint( err = get_checkpoint_version(sbi, cp_addr, &cp_block, &cp_page_1, version); if (err) - goto invalid_cp1; + return NULL; pre_version = *version;
cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1; err = get_checkpoint_version(sbi, cp_addr, &cp_block, &cp_page_2, version); if (err) - goto invalid_cp2; + goto invalid_cp; cur_version = *version;
if (cur_version == pre_version) { @@ -718,9 +720,8 @@ static struct page *validate_checkpoint( f2fs_put_page(cp_page_2, 1); return cp_page_1; } -invalid_cp2: f2fs_put_page(cp_page_2, 1); -invalid_cp1: +invalid_cp: f2fs_put_page(cp_page_1, 1); return NULL; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang xiyou.wangcong@gmail.com
commit 5fe23f262e0548ca7f19fb79f89059a60d087d22 upstream.
There is a race condition between ucma_close() and ucma_resolve_ip():
CPU0 CPU1 ucma_resolve_ip(): ucma_close():
ctx = ucma_get_ctx(file, cmd.id);
list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) { mutex_lock(&mut); idr_remove(&ctx_idr, ctx->id); mutex_unlock(&mut); ... mutex_lock(&mut); if (!ctx->closing) { mutex_unlock(&mut); rdma_destroy_id(ctx->cm_id); ... ucma_free_ctx(ctx);
ret = rdma_resolve_addr(); ucma_put_ctx(ctx);
Before idr_remove(), ucma_get_ctx() could still find the ctx and after rdma_destroy_id(), rdma_resolve_addr() may still access id_priv pointer. Also, ucma_put_ctx() may use ctx after ucma_free_ctx() too.
ucma_close() should call ucma_put_ctx() too which tests the refcnt and waits for the last one releasing it. The similar pattern is already used by ucma_destroy_id().
Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com Cc: Jason Gunthorpe jgg@mellanox.com Cc: Doug Ledford dledford@redhat.com Cc: Leon Romanovsky leon@kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com Reviewed-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Doug Ledford dledford@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/ucma.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -1720,6 +1720,8 @@ static int ucma_close(struct inode *inod mutex_lock(&mut); if (!ctx->closing) { mutex_unlock(&mut); + ucma_put_ctx(ctx); + wait_for_completion(&ctx->comp); /* rdma_destroy_id ensures that no event handlers are * inflight for that id before releasing it. */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Weinberger richard@nod.at
commit 37f31b6ca4311b94d985fb398a72e5399ad57925 upstream.
The requested device name can be NULL or an empty string. Check for that and refuse to continue. UBIFS has to do this manually since we cannot use mount_bdev(), which checks for this condition.
Fixes: 1e51764a3c2ac ("UBIFS: add new flash file system") Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ubifs/super.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -1918,6 +1918,9 @@ static struct ubi_volume_desc *open_ubi( int dev, vol; char *endptr;
+ if (!name || !*name) + return ERR_PTR(-EINVAL); + /* First, try to open using the device node path method */ ubi = ubi_open_volume_path(name, mode); if (!IS_ERR(ubi))
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker fweisbec@gmail.com
commit 7fb1327ee9b92fca27662f9b9d60c7c3376d6c69 upstream.
Kernel CPU stats are stored in cputime_t which is an architecture defined type, and hence a bit opaque and requiring accessors and mutators for any operation.
Converting them to nsecs simplifies the code and is one step toward the removal of cputime_t in the core code.
Signed-off-by: Frederic Weisbecker fweisbec@gmail.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Cc: Michael Ellerman mpe@ellerman.id.au Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Tony Luck tony.luck@intel.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@redhat.com Cc: Stanislaw Gruszka sgruszka@redhat.com Cc: Wanpeng Li wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1485832191-26889-4-git-send-email-fweisbec@gmail.co... Signed-off-by: Ingo Molnar mingo@kernel.org [colona: minor conflict as 527b0a76f41d ("sched/cpuacct: Avoid %lld seq_printf warning") is missing from v4.9] Signed-off-by: Ivan Delalande colona@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/appldata/appldata_os.c | 16 ++++---- drivers/cpufreq/cpufreq.c | 6 +-- drivers/cpufreq/cpufreq_governor.c | 2 - drivers/cpufreq/cpufreq_stats.c | 1 drivers/macintosh/rack-meter.c | 2 - fs/proc/stat.c | 68 ++++++++++++++++++------------------- fs/proc/uptime.c | 7 +-- kernel/sched/cpuacct.c | 2 - kernel/sched/cputime.c | 22 +++++------ 9 files changed, 61 insertions(+), 65 deletions(-)
--- a/arch/s390/appldata/appldata_os.c +++ b/arch/s390/appldata/appldata_os.c @@ -113,21 +113,21 @@ static void appldata_get_os_data(void *d j = 0; for_each_online_cpu(i) { os_data->os_cpu[j].per_cpu_user = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_USER]); os_data->os_cpu[j].per_cpu_nice = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_NICE]); os_data->os_cpu[j].per_cpu_system = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM]); os_data->os_cpu[j].per_cpu_idle = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IDLE]); os_data->os_cpu[j].per_cpu_irq = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IRQ]); os_data->os_cpu[j].per_cpu_softirq = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ]); os_data->os_cpu[j].per_cpu_iowait = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_IOWAIT]); os_data->os_cpu[j].per_cpu_steal = - cputime_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]); + nsecs_to_jiffies(kcpustat_cpu(i).cpustat[CPUTIME_STEAL]); os_data->os_cpu[j].cpu_id = i; j++; } --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -132,7 +132,7 @@ static inline u64 get_cpu_idle_time_jiff u64 cur_wall_time; u64 busy_time;
- cur_wall_time = jiffies64_to_cputime64(get_jiffies_64()); + cur_wall_time = jiffies64_to_nsecs(get_jiffies_64());
busy_time = kcpustat_cpu(cpu).cpustat[CPUTIME_USER]; busy_time += kcpustat_cpu(cpu).cpustat[CPUTIME_SYSTEM]; @@ -143,9 +143,9 @@ static inline u64 get_cpu_idle_time_jiff
idle_time = cur_wall_time - busy_time; if (wall) - *wall = cputime_to_usecs(cur_wall_time); + *wall = div_u64(cur_wall_time, NSEC_PER_USEC);
- return cputime_to_usecs(idle_time); + return div_u64(idle_time, NSEC_PER_USEC); }
u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy) --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -152,7 +152,7 @@ unsigned int dbs_update(struct cpufreq_p if (ignore_nice) { u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
- idle_time += cputime_to_usecs(cur_nice - j_cdbs->prev_cpu_nice); + idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC); j_cdbs->prev_cpu_nice = cur_nice; }
--- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -13,7 +13,6 @@ #include <linux/cpufreq.h> #include <linux/module.h> #include <linux/slab.h> -#include <linux/cputime.h>
static DEFINE_SPINLOCK(cpufreq_stats_lock);
--- a/drivers/macintosh/rack-meter.c +++ b/drivers/macintosh/rack-meter.c @@ -91,7 +91,7 @@ static inline cputime64_t get_cpu_idle_t if (rackmeter_ignore_nice) retval += kcpustat_cpu(cpu).cpustat[CPUTIME_NICE];
- return retval; + return nsecs_to_cputime64(retval); }
static void rackmeter_setup_i2s(struct rackmeter *rm) --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -21,23 +21,23 @@
#ifdef arch_idle_time
-static cputime64_t get_idle_time(int cpu) +static u64 get_idle_time(int cpu) { - cputime64_t idle; + u64 idle;
idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE]; if (cpu_online(cpu) && !nr_iowait_cpu(cpu)) - idle += arch_idle_time(cpu); + idle += cputime_to_nsecs(arch_idle_time(cpu)); return idle; }
-static cputime64_t get_iowait_time(int cpu) +static u64 get_iowait_time(int cpu) { - cputime64_t iowait; + u64 iowait;
iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT]; if (cpu_online(cpu) && nr_iowait_cpu(cpu)) - iowait += arch_idle_time(cpu); + iowait += cputime_to_nsecs(arch_idle_time(cpu)); return iowait; }
@@ -45,32 +45,32 @@ static cputime64_t get_iowait_time(int c
static u64 get_idle_time(int cpu) { - u64 idle, idle_time = -1ULL; + u64 idle, idle_usecs = -1ULL;
if (cpu_online(cpu)) - idle_time = get_cpu_idle_time_us(cpu, NULL); + idle_usecs = get_cpu_idle_time_us(cpu, NULL);
- if (idle_time == -1ULL) + if (idle_usecs == -1ULL) /* !NO_HZ or cpu offline so we can rely on cpustat.idle */ idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE]; else - idle = usecs_to_cputime64(idle_time); + idle = idle_usecs * NSEC_PER_USEC;
return idle; }
static u64 get_iowait_time(int cpu) { - u64 iowait, iowait_time = -1ULL; + u64 iowait, iowait_usecs = -1ULL;
if (cpu_online(cpu)) - iowait_time = get_cpu_iowait_time_us(cpu, NULL); + iowait_usecs = get_cpu_iowait_time_us(cpu, NULL);
- if (iowait_time == -1ULL) + if (iowait_usecs == -1ULL) /* !NO_HZ or cpu offline so we can rely on cpustat.iowait */ iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT]; else - iowait = usecs_to_cputime64(iowait_time); + iowait = iowait_usecs * NSEC_PER_USEC;
return iowait; } @@ -115,16 +115,16 @@ static int show_stat(struct seq_file *p, } sum += arch_irq_stat();
- seq_put_decimal_ull(p, "cpu ", cputime64_to_clock_t(user)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(nice)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(system)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(idle)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(iowait)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(irq)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(softirq)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(steal)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest_nice)); + seq_put_decimal_ull(p, "cpu ", nsec_to_clock_t(user)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(system)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(idle)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(iowait)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(irq)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(softirq)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice)); seq_putc(p, '\n');
for_each_online_cpu(i) { @@ -140,16 +140,16 @@ static int show_stat(struct seq_file *p, guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST]; guest_nice = kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE]; seq_printf(p, "cpu%d", i); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(user)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(nice)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(system)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(idle)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(iowait)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(irq)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(softirq)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(steal)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest)); - seq_put_decimal_ull(p, " ", cputime64_to_clock_t(guest_nice)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(user)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(system)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(idle)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(iowait)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(irq)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(softirq)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(steal)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest)); + seq_put_decimal_ull(p, " ", nsec_to_clock_t(guest_nice)); seq_putc(p, '\n'); } seq_put_decimal_ull(p, "intr ", (unsigned long long)sum); --- a/fs/proc/uptime.c +++ b/fs/proc/uptime.c @@ -5,23 +5,20 @@ #include <linux/seq_file.h> #include <linux/time.h> #include <linux/kernel_stat.h> -#include <linux/cputime.h>
static int uptime_proc_show(struct seq_file *m, void *v) { struct timespec uptime; struct timespec idle; - u64 idletime; u64 nsec; u32 rem; int i;
- idletime = 0; + nsec = 0; for_each_possible_cpu(i) - idletime += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE]; + nsec += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
get_monotonic_boottime(&uptime); - nsec = cputime64_to_jiffies64(idletime) * TICK_NSEC; idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem); idle.tv_nsec = rem; seq_printf(m, "%lu.%02lu %lu.%02lu\n", --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -297,7 +297,7 @@ static int cpuacct_stats_show(struct seq for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) { seq_printf(sf, "%s %lld\n", cpuacct_stat_desc[stat], - cputime64_to_clock_t(val[stat])); + nsec_to_clock_t(val[stat])); }
return 0; --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -75,9 +75,9 @@ static cputime_t irqtime_account_update( u64 *cpustat = kcpustat_this_cpu->cpustat; cputime_t irq_cputime;
- irq_cputime = nsecs_to_cputime64(irqtime) - cpustat[idx]; + irq_cputime = nsecs_to_cputime64(irqtime - cpustat[idx]); irq_cputime = min(irq_cputime, maxtime); - cpustat[idx] += irq_cputime; + cpustat[idx] += cputime_to_nsecs(irq_cputime);
return irq_cputime; } @@ -143,7 +143,7 @@ void account_user_time(struct task_struc index = (task_nice(p) > 0) ? CPUTIME_NICE : CPUTIME_USER;
/* Add user time to cpustat. */ - task_group_account_field(p, index, (__force u64) cputime); + task_group_account_field(p, index, cputime_to_nsecs(cputime));
/* Account for user time used */ acct_account_cputime(p); @@ -168,11 +168,11 @@ static void account_guest_time(struct ta
/* Add guest time to cpustat. */ if (task_nice(p) > 0) { - cpustat[CPUTIME_NICE] += (__force u64) cputime; - cpustat[CPUTIME_GUEST_NICE] += (__force u64) cputime; + cpustat[CPUTIME_NICE] += cputime_to_nsecs(cputime); + cpustat[CPUTIME_GUEST_NICE] += cputime_to_nsecs(cputime); } else { - cpustat[CPUTIME_USER] += (__force u64) cputime; - cpustat[CPUTIME_GUEST] += (__force u64) cputime; + cpustat[CPUTIME_USER] += cputime_to_nsecs(cputime); + cpustat[CPUTIME_GUEST] += cputime_to_nsecs(cputime); } }
@@ -193,7 +193,7 @@ void __account_system_time(struct task_s account_group_system_time(p, cputime);
/* Add system time to cpustat. */ - task_group_account_field(p, index, (__force u64) cputime); + task_group_account_field(p, index, cputime_to_nsecs(cputime));
/* Account for system time used */ acct_account_cputime(p); @@ -234,7 +234,7 @@ void account_steal_time(cputime_t cputim { u64 *cpustat = kcpustat_this_cpu->cpustat;
- cpustat[CPUTIME_STEAL] += (__force u64) cputime; + cpustat[CPUTIME_STEAL] += cputime_to_nsecs(cputime); }
/* @@ -247,9 +247,9 @@ void account_idle_time(cputime_t cputime struct rq *rq = this_rq();
if (atomic_read(&rq->nr_iowait) > 0) - cpustat[CPUTIME_IOWAIT] += (__force u64) cputime; + cpustat[CPUTIME_IOWAIT] += cputime_to_nsecs(cputime); else - cpustat[CPUTIME_IDLE] += (__force u64) cputime; + cpustat[CPUTIME_IDLE] += cputime_to_nsecs(cputime); }
/*
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker fweisbec@gmail.com
commit a499a5a14dbd1d0315a96fc62a8798059325e9e6 upstream.
The irqtime is accounted is nsecs and stored in cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the accumulated amount reaches a new jiffy, this one gets accounted to the kcpustat.
This was necessary when kcpustat was stored in cputime_t, which could at worst have jiffies granularity. But now kcpustat is stored in nsecs so this whole discretization game with temporary irqtime storage has become unnecessary.
We can now directly account the irqtime to the kcpustat.
Signed-off-by: Frederic Weisbecker fweisbec@gmail.com Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Fenghua Yu fenghua.yu@intel.com Cc: Heiko Carstens heiko.carstens@de.ibm.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: Paul Mackerras paulus@samba.org Cc: Peter Zijlstra peterz@infradead.org Cc: Rik van Riel riel@redhat.com Cc: Stanislaw Gruszka sgruszka@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: Wanpeng Li wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweisbec@gmail.c... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Ivan Delalande colona@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/cputime.c | 50 ++++++++++++++++--------------------------------- kernel/sched/sched.h | 7 +++--- 2 files changed, 21 insertions(+), 36 deletions(-)
--- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -44,6 +44,7 @@ void disable_sched_clock_irqtime(void) void irqtime_account_irq(struct task_struct *curr) { struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime); + u64 *cpustat = kcpustat_this_cpu->cpustat; s64 delta; int cpu;
@@ -61,49 +62,35 @@ void irqtime_account_irq(struct task_str * in that case, so as not to confuse scheduler with a special task * that do not consume any time, but still wants to run. */ - if (hardirq_count()) - irqtime->hardirq_time += delta; - else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) - irqtime->softirq_time += delta; + if (hardirq_count()) { + cpustat[CPUTIME_IRQ] += delta; + irqtime->tick_delta += delta; + } else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) { + cpustat[CPUTIME_SOFTIRQ] += delta; + irqtime->tick_delta += delta; + }
u64_stats_update_end(&irqtime->sync); } EXPORT_SYMBOL_GPL(irqtime_account_irq);
-static cputime_t irqtime_account_update(u64 irqtime, int idx, cputime_t maxtime) +static cputime_t irqtime_tick_accounted(cputime_t maxtime) { - u64 *cpustat = kcpustat_this_cpu->cpustat; - cputime_t irq_cputime; - - irq_cputime = nsecs_to_cputime64(irqtime - cpustat[idx]); - irq_cputime = min(irq_cputime, maxtime); - cpustat[idx] += cputime_to_nsecs(irq_cputime); - - return irq_cputime; -} + struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime); + cputime_t delta;
-static cputime_t irqtime_account_hi_update(cputime_t maxtime) -{ - return irqtime_account_update(__this_cpu_read(cpu_irqtime.hardirq_time), - CPUTIME_IRQ, maxtime); -} + delta = nsecs_to_cputime(irqtime->tick_delta); + delta = min(delta, maxtime); + irqtime->tick_delta -= cputime_to_nsecs(delta);
-static cputime_t irqtime_account_si_update(cputime_t maxtime) -{ - return irqtime_account_update(__this_cpu_read(cpu_irqtime.softirq_time), - CPUTIME_SOFTIRQ, maxtime); + return delta; }
#else /* CONFIG_IRQ_TIME_ACCOUNTING */
#define sched_clock_irqtime (0)
-static cputime_t irqtime_account_hi_update(cputime_t dummy) -{ - return 0; -} - -static cputime_t irqtime_account_si_update(cputime_t dummy) +static cputime_t irqtime_tick_accounted(cputime_t dummy) { return 0; } @@ -290,10 +277,7 @@ static inline cputime_t account_other_ti accounted = steal_account_process_time(max);
if (accounted < max) - accounted += irqtime_account_hi_update(max - accounted); - - if (accounted < max) - accounted += irqtime_account_si_update(max - accounted); + accounted += irqtime_tick_accounted(max - accounted);
return accounted; } --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -4,6 +4,7 @@ #include <linux/sched/rt.h> #include <linux/u64_stats_sync.h> #include <linux/sched/deadline.h> +#include <linux/kernel_stat.h> #include <linux/binfmts.h> #include <linux/mutex.h> #include <linux/spinlock.h> @@ -1742,8 +1743,7 @@ static inline void nohz_balance_exit_idl
#ifdef CONFIG_IRQ_TIME_ACCOUNTING struct irqtime { - u64 hardirq_time; - u64 softirq_time; + u64 tick_delta; u64 irq_start_time; struct u64_stats_sync sync; }; @@ -1753,12 +1753,13 @@ DECLARE_PER_CPU(struct irqtime, cpu_irqt static inline u64 irq_time_read(int cpu) { struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu); + u64 *cpustat = kcpustat_cpu(cpu).cpustat; unsigned int seq; u64 total;
do { seq = __u64_stats_fetch_begin(&irqtime->sync); - total = irqtime->softirq_time + irqtime->hardirq_time; + total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ]; } while (__u64_stats_fetch_retry(&irqtime->sync, seq));
return total;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker fweisbec@gmail.com
commit 25e2d8c1b9e327ed260edd13169cc22bc7a78bc6 upstream.
irq_time_read() returns the irqtime minus the ksoftirqd time. This is necessary because irq_time_read() is used to substract the IRQ time from the sum_exec_runtime of a task. If we were to include the softirq time of ksoftirqd, this task would substract its own CPU time everytime it updates ksoftirqd->sum_exec_runtime which would therefore never progress.
But this behaviour got broken by:
a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account")
... which now includes ksoftirqd softirq time in the time returned by irq_time_read().
This has resulted in wrong ksoftirqd cputime reported to userspace through /proc/stat and thus "top" not showing ksoftirqd when it should after intense networking load.
ksoftirqd->stime happens to be correct but it gets scaled down by sum_exec_runtime through task_cputime_adjusted().
To fix this, just account the strict IRQ time in a separate counter and use it to report the IRQ time.
Reported-and-tested-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Frederic Weisbecker fweisbec@gmail.com Reviewed-by: Rik van Riel riel@redhat.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stanislaw Gruszka sgruszka@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Wanpeng Li wanpeng.li@hotmail.com Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Ivan Delalande colona@arista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/cputime.c | 27 ++++++++++++++++----------- kernel/sched/sched.h | 9 +++++++-- 2 files changed, 23 insertions(+), 13 deletions(-)
--- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -37,6 +37,18 @@ void disable_sched_clock_irqtime(void) sched_clock_irqtime = 0; }
+static void irqtime_account_delta(struct irqtime *irqtime, u64 delta, + enum cpu_usage_stat idx) +{ + u64 *cpustat = kcpustat_this_cpu->cpustat; + + u64_stats_update_begin(&irqtime->sync); + cpustat[idx] += delta; + irqtime->total += delta; + irqtime->tick_delta += delta; + u64_stats_update_end(&irqtime->sync); +} + /* * Called before incrementing preempt_count on {soft,}irq_enter * and before decrementing preempt_count on {soft,}irq_exit. @@ -44,7 +56,6 @@ void disable_sched_clock_irqtime(void) void irqtime_account_irq(struct task_struct *curr) { struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime); - u64 *cpustat = kcpustat_this_cpu->cpustat; s64 delta; int cpu;
@@ -55,22 +66,16 @@ void irqtime_account_irq(struct task_str delta = sched_clock_cpu(cpu) - irqtime->irq_start_time; irqtime->irq_start_time += delta;
- u64_stats_update_begin(&irqtime->sync); /* * We do not account for softirq time from ksoftirqd here. * We want to continue accounting softirq time to ksoftirqd thread * in that case, so as not to confuse scheduler with a special task * that do not consume any time, but still wants to run. */ - if (hardirq_count()) { - cpustat[CPUTIME_IRQ] += delta; - irqtime->tick_delta += delta; - } else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) { - cpustat[CPUTIME_SOFTIRQ] += delta; - irqtime->tick_delta += delta; - } - - u64_stats_update_end(&irqtime->sync); + if (hardirq_count()) + irqtime_account_delta(irqtime, delta, CPUTIME_IRQ); + else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) + irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ); } EXPORT_SYMBOL_GPL(irqtime_account_irq);
--- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1743,6 +1743,7 @@ static inline void nohz_balance_exit_idl
#ifdef CONFIG_IRQ_TIME_ACCOUNTING struct irqtime { + u64 total; u64 tick_delta; u64 irq_start_time; struct u64_stats_sync sync; @@ -1750,16 +1751,20 @@ struct irqtime {
DECLARE_PER_CPU(struct irqtime, cpu_irqtime);
+/* + * Returns the irqtime minus the softirq time computed by ksoftirqd. + * Otherwise ksoftirqd's sum_exec_runtime is substracted its own runtime + * and never move forward. + */ static inline u64 irq_time_read(int cpu) { struct irqtime *irqtime = &per_cpu(cpu_irqtime, cpu); - u64 *cpustat = kcpustat_cpu(cpu).cpustat; unsigned int seq; u64 total;
do { seq = __u64_stats_fetch_begin(&irqtime->sync); - total = cpustat[CPUTIME_SOFTIRQ] + cpustat[CPUTIME_IRQ]; + total = irqtime->total; } while (__u64_stats_fetch_retry(&irqtime->sync, seq));
return total;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhi Chen zhichen@codeaurora.org
commit c8291988806407e02a01b4b15b4504eafbcc04e0 upstream.
Length of WMI scan message was not calculated correctly. The allocated buffer was smaller than what we expected. So WMI message corrupted skb_info, which is at the end of skb->data. This fix takes TLV header into account even if the element is zero-length.
Crash log: [49.629986] Unhandled kernel unaligned access[#1]: [49.634932] CPU: 0 PID: 1176 Comm: logd Not tainted 4.4.60 #180 [49.641040] task: 83051460 ti: 8329c000 task.ti: 8329c000 [49.646608] $ 0 : 00000000 00000001 80984a80 00000000 [49.652038] $ 4 : 45259e89 8046d484 8046df30 8024ba70 [49.657468] $ 8 : 00000000 804cc4c0 00000001 20306320 [49.662898] $12 : 33322037 000110f2 00000000 31203930 [49.668327] $16 : 82792b40 80984a80 00000001 804207fc [49.673757] $20 : 00000000 0000012c 00000040 80470000 [49.679186] $24 : 00000000 8024af7c [49.684617] $28 : 8329c000 8329db88 00000001 802c58d0 [49.690046] Hi : 00000000 [49.693022] Lo : 453c0000 [49.696013] epc : 800efae4 put_page+0x0/0x58 [49.700615] ra : 802c58d0 skb_release_data+0x148/0x1d4 [49.706184] Status: 1000fc03 KERNEL EXL IE [49.710531] Cause : 00800010 (ExcCode 04) [49.714669] BadVA : 45259e89 [49.717644] PrId : 00019374 (MIPS 24Kc)
Signed-off-by: Zhi Chen zhichen@codeaurora.org Signed-off-by: Kalle Valo kvalo@codeaurora.org Cc: Brian Norris briannorris@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c +++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c @@ -1486,10 +1486,10 @@ ath10k_wmi_tlv_op_gen_start_scan(struct bssid_len = arg->n_bssids * sizeof(struct wmi_mac_addr); ie_len = roundup(arg->ie_len, 4); len = (sizeof(*tlv) + sizeof(*cmd)) + - (arg->n_channels ? sizeof(*tlv) + chan_len : 0) + - (arg->n_ssids ? sizeof(*tlv) + ssid_len : 0) + - (arg->n_bssids ? sizeof(*tlv) + bssid_len : 0) + - (arg->ie_len ? sizeof(*tlv) + ie_len : 0); + sizeof(*tlv) + chan_len + + sizeof(*tlv) + ssid_len + + sizeof(*tlv) + bssid_len + + sizeof(*tlv) + ie_len;
skb = ath10k_wmi_alloc_skb(ar, len); if (!skb)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Feng gfree.wind@vip.163.com
commit c953d63548207a085abcb12a15fefc8a11ffdf0a upstream.
The info->target comes from userspace and it would be used directly. So we need to add the sanity check to make sure it is a valid standard target, although the ebtables tool has already checked it. Kernel needs to validate anything coming from userspace.
If the target is set as an evil value, it would break the ebtables and cause a panic. Because the non-standard target is treated as one offset.
Now add one helper function ebt_invalid_target, and we would replace the macro INVALID_TARGET later.
Signed-off-by: Gao Feng gfree.wind@vip.163.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Cc: Loic hackurx@opensec.fr Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/linux/netfilter_bridge/ebtables.h | 5 +++++ net/bridge/netfilter/ebt_arpreply.c | 3 +++ 2 files changed, 8 insertions(+)
--- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -123,4 +123,9 @@ extern unsigned int ebt_do_table(struct /* True if the target is not a standard target */ #define INVALID_TARGET (info->target < -NUM_STANDARD_TARGETS || info->target >= 0)
+static inline bool ebt_invalid_target(int target) +{ + return (target < -NUM_STANDARD_TARGETS || target >= 0); +} + #endif --- a/net/bridge/netfilter/ebt_arpreply.c +++ b/net/bridge/netfilter/ebt_arpreply.c @@ -67,6 +67,9 @@ static int ebt_arpreply_tg_check(const s if (e->ethproto != htons(ETH_P_ARP) || e->invflags & EBT_IPROTO) return -EINVAL; + if (ebt_invalid_target(info->target)) + return -EINVAL; + return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit c592b57347069abfc0dcad3b3a302cf882602597 upstream.
This removes all the obvious code paths that depend on lazy FPU mode. It shouldn't change the generated code at all.
Signed-off-by: Andy Lutomirski luto@kernel.org Signed-off-by: Rik van Riel riel@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Oleg Nesterov oleg@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Quentin Casasnovas quentin.casasnovas@oracle.com Cc: Thomas Gleixner tglx@linutronix.de Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-5-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Daniel Sangorrin daniel.sangorrin@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/crypto/crc32c-intel_glue.c | 17 ++++------------- arch/x86/include/asm/fpu/internal.h | 34 +--------------------------------- arch/x86/kernel/fpu/core.c | 36 ++++-------------------------------- arch/x86/kernel/fpu/signal.c | 8 +++----- arch/x86/kernel/fpu/xstate.c | 9 --------- arch/x86/kvm/cpuid.c | 4 +--- arch/x86/kvm/x86.c | 10 ---------- 7 files changed, 13 insertions(+), 105 deletions(-)
--- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -48,21 +48,13 @@ #ifdef CONFIG_X86_64 /* * use carryless multiply version of crc32c when buffer - * size is >= 512 (when eager fpu is enabled) or - * >= 1024 (when eager fpu is disabled) to account + * size is >= 512 to account * for fpu state save/restore overhead. */ -#define CRC32C_PCL_BREAKEVEN_EAGERFPU 512 -#define CRC32C_PCL_BREAKEVEN_NOEAGERFPU 1024 +#define CRC32C_PCL_BREAKEVEN 512
asmlinkage unsigned int crc_pcl(const u8 *buffer, int len, unsigned int crc_init); -static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU; -#define set_pcl_breakeven_point() \ -do { \ - if (!use_eager_fpu()) \ - crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU; \ -} while (0) #endif /* CONFIG_X86_64 */
static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t length) @@ -185,7 +177,7 @@ static int crc32c_pcl_intel_update(struc * use faster PCL version if datasize is large enough to * overcome kernel fpu state save/restore overhead */ - if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) { + if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) { kernel_fpu_begin(); *crcp = crc_pcl(data, len, *crcp); kernel_fpu_end(); @@ -197,7 +189,7 @@ static int crc32c_pcl_intel_update(struc static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int len, u8 *out) { - if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) { + if (len >= CRC32C_PCL_BREAKEVEN && irq_fpu_usable()) { kernel_fpu_begin(); *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp)); kernel_fpu_end(); @@ -257,7 +249,6 @@ static int __init crc32c_intel_mod_init( alg.update = crc32c_pcl_intel_update; alg.finup = crc32c_pcl_intel_finup; alg.digest = crc32c_pcl_intel_digest; - set_pcl_breakeven_point(); } #endif return crypto_register_shash(&alg); --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -60,11 +60,6 @@ extern u64 fpu__get_supported_xfeatures_ /* * FPU related CPU feature flag helper routines: */ -static __always_inline __pure bool use_eager_fpu(void) -{ - return true; -} - static __always_inline __pure bool use_xsaveopt(void) { return static_cpu_has(X86_FEATURE_XSAVEOPT); @@ -501,24 +496,6 @@ static inline int fpu_want_lazy_restore( }
-/* - * Wrap lazy FPU TS handling in a 'hw fpregs activation/deactivation' - * idiom, which is then paired with the sw-flag (fpregs_active) later on: - */ - -static inline void __fpregs_activate_hw(void) -{ - if (!use_eager_fpu()) - clts(); -} - -static inline void __fpregs_deactivate_hw(void) -{ - if (!use_eager_fpu()) - stts(); -} - -/* Must be paired with an 'stts' (fpregs_deactivate_hw()) after! */ static inline void __fpregs_deactivate(struct fpu *fpu) { WARN_ON_FPU(!fpu->fpregs_active); @@ -528,7 +505,6 @@ static inline void __fpregs_deactivate(s trace_x86_fpu_regs_deactivated(fpu); }
-/* Must be paired with a 'clts' (fpregs_activate_hw()) before! */ static inline void __fpregs_activate(struct fpu *fpu) { WARN_ON_FPU(fpu->fpregs_active); @@ -554,22 +530,17 @@ static inline int fpregs_active(void) }
/* - * Encapsulate the CR0.TS handling together with the - * software flag. - * * These generally need preemption protection to work, * do try to avoid using these on their own. */ static inline void fpregs_activate(struct fpu *fpu) { - __fpregs_activate_hw(); __fpregs_activate(fpu); }
static inline void fpregs_deactivate(struct fpu *fpu) { __fpregs_deactivate(fpu); - __fpregs_deactivate_hw(); }
/* @@ -596,8 +567,7 @@ switch_fpu_prepare(struct fpu *old_fpu, * or if the past 5 consecutive context-switches used math. */ fpu.preload = static_cpu_has(X86_FEATURE_FPU) && - new_fpu->fpstate_active && - (use_eager_fpu() || new_fpu->counter > 5); + new_fpu->fpstate_active;
if (old_fpu->fpregs_active) { if (!copy_fpregs_to_fpstate(old_fpu)) @@ -615,8 +585,6 @@ switch_fpu_prepare(struct fpu *old_fpu, __fpregs_activate(new_fpu); trace_x86_fpu_regs_activated(new_fpu); prefetch(&new_fpu->state); - } else { - __fpregs_deactivate_hw(); } } else { old_fpu->counter = 0; --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -59,27 +59,9 @@ static bool kernel_fpu_disabled(void) return this_cpu_read(in_kernel_fpu); }
-/* - * Were we in an interrupt that interrupted kernel mode? - * - * On others, we can do a kernel_fpu_begin/end() pair *ONLY* if that - * pair does nothing at all: the thread must not have fpu (so - * that we don't try to save the FPU state), and TS must - * be set (so that the clts/stts pair does nothing that is - * visible in the interrupted kernel thread). - * - * Except for the eagerfpu case when we return true; in the likely case - * the thread has FPU but we are not going to set/clear TS. - */ static bool interrupted_kernel_fpu_idle(void) { - if (kernel_fpu_disabled()) - return false; - - if (use_eager_fpu()) - return true; - - return !current->thread.fpu.fpregs_active && (read_cr0() & X86_CR0_TS); + return !kernel_fpu_disabled(); }
/* @@ -127,7 +109,6 @@ void __kernel_fpu_begin(void) copy_fpregs_to_fpstate(fpu); } else { this_cpu_write(fpu_fpregs_owner_ctx, NULL); - __fpregs_activate_hw(); } } EXPORT_SYMBOL(__kernel_fpu_begin); @@ -138,8 +119,6 @@ void __kernel_fpu_end(void)
if (fpu->fpregs_active) copy_kernel_to_fpregs(&fpu->state); - else - __fpregs_deactivate_hw();
kernel_fpu_enable(); } @@ -201,10 +180,7 @@ void fpu__save(struct fpu *fpu) trace_x86_fpu_before_save(fpu); if (fpu->fpregs_active) { if (!copy_fpregs_to_fpstate(fpu)) { - if (use_eager_fpu()) - copy_kernel_to_fpregs(&fpu->state); - else - fpregs_deactivate(fpu); + copy_kernel_to_fpregs(&fpu->state); } } trace_x86_fpu_after_save(fpu); @@ -262,8 +238,7 @@ int fpu__copy(struct fpu *dst_fpu, struc * Don't let 'init optimized' areas of the XSAVE area * leak into the child task: */ - if (use_eager_fpu()) - memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size); + memset(&dst_fpu->state.xsave, 0, fpu_kernel_xstate_size);
/* * Save current FPU registers directly into the child @@ -285,10 +260,7 @@ int fpu__copy(struct fpu *dst_fpu, struc memcpy(&src_fpu->state, &dst_fpu->state, fpu_kernel_xstate_size);
- if (use_eager_fpu()) - copy_kernel_to_fpregs(&src_fpu->state); - else - fpregs_deactivate(src_fpu); + copy_kernel_to_fpregs(&src_fpu->state); } preempt_enable();
--- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -344,11 +344,9 @@ static int __fpu__restore_sig(void __use }
fpu->fpstate_active = 1; - if (use_eager_fpu()) { - preempt_disable(); - fpu__restore(fpu); - preempt_enable(); - } + preempt_disable(); + fpu__restore(fpu); + preempt_enable();
return err; } else { --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -890,15 +890,6 @@ int arch_set_user_pkey_access(struct tas */ if (!boot_cpu_has(X86_FEATURE_OSPKE)) return -EINVAL; - /* - * For most XSAVE components, this would be an arduous task: - * brining fpstate up to date with fpregs, updating fpstate, - * then re-populating fpregs. But, for components that are - * never lazily managed, we can just access the fpregs - * directly. PKRU is never managed lazily, so we can just - * manipulate it directly. Make sure it stays that way. - */ - WARN_ON_ONCE(!use_eager_fpu());
/* Set the bits we need in PKRU: */ if (init_val & PKEY_DISABLE_ACCESS) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -16,7 +16,6 @@ #include <linux/export.h> #include <linux/vmalloc.h> #include <linux/uaccess.h> -#include <asm/fpu/internal.h> /* For use_eager_fpu. Ugh! */ #include <asm/user.h> #include <asm/fpu/xstate.h> #include "cpuid.h" @@ -114,8 +113,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vc if (best && (best->eax & (F(XSAVES) | F(XSAVEC)))) best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
- if (use_eager_fpu()) - kvm_x86_ops->fpu_activate(vcpu); + kvm_x86_ops->fpu_activate(vcpu);
/* * The existing code assumes virtual address is 48-bit in the canonical --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7631,16 +7631,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu * copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu); __kernel_fpu_end(); ++vcpu->stat.fpu_reload; - /* - * If using eager FPU mode, or if the guest is a frequent user - * of the FPU, just leave the FPU active for next time. - * Every 255 times fpu_counter rolls over to 0; a guest that uses - * the FPU in bursts will revert to loading it on demand. - */ - if (!use_eager_fpu()) { - if (++vcpu->fpu_counter < 5) - kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu); - } trace_kvm_fpu(0); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rik van Riel riel@redhat.com
commit 3913cc3507575273beb165a5e027a081913ed507 upstream.
With the lazy FPU code gone, we no longer use the counter field in struct fpu for anything. Get rid it.
Signed-off-by: Rik van Riel riel@redhat.com Reviewed-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Oleg Nesterov oleg@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Quentin Casasnovas quentin.casasnovas@oracle.com Cc: Thomas Gleixner tglx@linutronix.de Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1475627678-20788-6-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Daniel Sangorrin daniel.sangorrin@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/include/asm/fpu/internal.h | 3 --- arch/x86/include/asm/fpu/types.h | 11 ----------- arch/x86/include/asm/trace/fpu.h | 5 +---- arch/x86/kernel/fpu/core.c | 3 --- 4 files changed, 1 insertion(+), 21 deletions(-)
--- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -581,16 +581,13 @@ switch_fpu_prepare(struct fpu *old_fpu,
/* Don't change CR0.TS if we just switch! */ if (fpu.preload) { - new_fpu->counter++; __fpregs_activate(new_fpu); trace_x86_fpu_regs_activated(new_fpu); prefetch(&new_fpu->state); } } else { - old_fpu->counter = 0; old_fpu->last_cpu = -1; if (fpu.preload) { - new_fpu->counter++; if (fpu_want_lazy_restore(new_fpu, cpu)) fpu.preload = 0; else --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -322,17 +322,6 @@ struct fpu { unsigned char fpregs_active;
/* - * @counter: - * - * This counter contains the number of consecutive context switches - * during which the FPU stays used. If this is over a threshold, the - * lazy FPU restore logic becomes eager, to save the trap overhead. - * This is an unsigned char so that after 256 iterations the counter - * wraps and the context switch behavior turns lazy again; this is to - * deal with bursty apps that only use the FPU for a short time: - */ - unsigned char counter; - /* * @state: * * In-memory copy of all FPU registers that we save/restore --- a/arch/x86/include/asm/trace/fpu.h +++ b/arch/x86/include/asm/trace/fpu.h @@ -14,7 +14,6 @@ DECLARE_EVENT_CLASS(x86_fpu, __field(struct fpu *, fpu) __field(bool, fpregs_active) __field(bool, fpstate_active) - __field(int, counter) __field(u64, xfeatures) __field(u64, xcomp_bv) ), @@ -23,17 +22,15 @@ DECLARE_EVENT_CLASS(x86_fpu, __entry->fpu = fpu; __entry->fpregs_active = fpu->fpregs_active; __entry->fpstate_active = fpu->fpstate_active; - __entry->counter = fpu->counter; if (boot_cpu_has(X86_FEATURE_OSXSAVE)) { __entry->xfeatures = fpu->state.xsave.header.xfeatures; __entry->xcomp_bv = fpu->state.xsave.header.xcomp_bv; } ), - TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d counter: %d xfeatures: %llx xcomp_bv: %llx", + TP_printk("x86/fpu: %p fpregs_active: %d fpstate_active: %d xfeatures: %llx xcomp_bv: %llx", __entry->fpu, __entry->fpregs_active, __entry->fpstate_active, - __entry->counter, __entry->xfeatures, __entry->xcomp_bv ) --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -225,7 +225,6 @@ EXPORT_SYMBOL_GPL(fpstate_init);
int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu) { - dst_fpu->counter = 0; dst_fpu->fpregs_active = 0; dst_fpu->last_cpu = -1;
@@ -433,7 +432,6 @@ void fpu__restore(struct fpu *fpu) trace_x86_fpu_before_restore(fpu); fpregs_activate(fpu); copy_kernel_to_fpregs(&fpu->state); - fpu->counter++; trace_x86_fpu_after_restore(fpu); kernel_fpu_enable(); } @@ -451,7 +449,6 @@ EXPORT_SYMBOL_GPL(fpu__restore); void fpu__drop(struct fpu *fpu) { preempt_disable(); - fpu->counter = 0;
if (fpu->fpregs_active) { /* Ignore delayed exceptions from user space */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit f09a7b0eead737b33d940bf5c2509ca1441e9590
Daniel writes: Because the modification in this patch actually belongs to e63650840e8b ("x86/fpu: Finish excising 'eagerfpu'")
Reported-by: Daniel Sangorrin daniel.sangorrin@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/arch/x86/include/asm/cpufeatures.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -104,7 +104,7 @@ #define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ #define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ -/* free, was #define X86_FEATURE_EAGER_FPU ( 3*32+29) * "eagerfpu" Non lazy FPU restore */ +#define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Lutomirski luto@kernel.org
commit e63650840e8b053aa09ad934877e87e9941ed135 upstream.
Now that eagerfpu= is gone, remove it from the docs and some comments. Also sync the changes to tools/.
Signed-off-by: Andy Lutomirski luto@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Brian Gerst brgerst@gmail.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Denys Vlasenko dvlasenk@redhat.com Cc: Fenghua Yu fenghua.yu@intel.com Cc: H. Peter Anvin hpa@zytor.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Oleg Nesterov oleg@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: Quentin Casasnovas quentin.casasnovas@oracle.com Cc: Rik van Riel riel@redhat.com Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/cf430dd4481d41280e93ac6cf0def1007a67fc8e.1476740397... Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Daniel Sangorrin daniel.sangorrin@toshiba.co.jp Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- Documentation/kernel-parameters.txt | 6 ------ arch/x86/include/asm/cpufeatures.h | 1 - arch/x86/include/asm/fpu/types.h | 23 ----------------------- arch/x86/mm/pkeys.c | 3 +-- tools/arch/x86/include/asm/cpufeatures.h | 1 - 5 files changed, 1 insertion(+), 33 deletions(-)
--- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1084,12 +1084,6 @@ bytes respectively. Such letter suffixes nopku [X86] Disable Memory Protection Keys CPU feature found in some Intel CPUs.
- eagerfpu= [X86] - on enable eager fpu restore - off disable eager fpu restore - auto selects the default scheme, which automatically - enables eagerfpu restore for xsaveopt. - module.async_probe [KNL] Enable asynchronous probe on this module.
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -104,7 +104,6 @@ #define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ #define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ -/* free, was #define X86_FEATURE_EAGER_FPU ( 3*32+29) * "eagerfpu" Non lazy FPU restore */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -329,29 +329,6 @@ struct fpu { * the registers in the FPU are more recent than this state * copy. If the task context-switches away then they get * saved here and represent the FPU state. - * - * After context switches there may be a (short) time period - * during which the in-FPU hardware registers are unchanged - * and still perfectly match this state, if the tasks - * scheduled afterwards are not using the FPU. - * - * This is the 'lazy restore' window of optimization, which - * we track though 'fpu_fpregs_owner_ctx' and 'fpu->last_cpu'. - * - * We detect whether a subsequent task uses the FPU via setting - * CR0::TS to 1, which causes any FPU use to raise a #NM fault. - * - * During this window, if the task gets scheduled again, we - * might be able to skip having to do a restore from this - * memory buffer to the hardware registers - at the cost of - * incurring the overhead of #NM fault traps. - * - * Note that on modern CPUs that support the XSAVEOPT (or other - * optimized XSAVE instructions), we don't use #NM traps anymore, - * as the hardware can track whether FPU registers need saving - * or not. On such CPUs we activate the non-lazy ('eagerfpu') - * logic, which unconditionally saves/restores all FPU state - * across context switches. (if FPU state exists.) */ union fpregs_state state; /* --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -142,8 +142,7 @@ u32 init_pkru_value = PKRU_AD_KEY( 1) | * Called from the FPU code when creating a fresh set of FPU * registers. This is called from a very specific context where * we know the FPU regstiers are safe for use and we can use PKRU - * directly. The fact that PKRU is only available when we are - * using eagerfpu mode makes this possible. + * directly. */ void copy_init_pkru_to_fpregs(void) { --- a/tools/arch/x86/include/asm/cpufeatures.h +++ b/tools/arch/x86/include/asm/cpufeatures.h @@ -104,7 +104,6 @@ #define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ #define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ -#define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
On 10/11/2018 09:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.133-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Thu, 11 Oct 2018 at 21:14, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.133-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 4.9.133-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: c26947f9eb18b1c3de2b4d76a981b0b468b390f2 git describe: v4.9.132-36-gc26947f9eb18 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.132-36-...
No regressions (compared to build v4.9.132)
Ran 20946 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On 10/11/2018 08:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
[preliminary]
powerpc:allmodconfig:
drivers/macintosh/rack-meter.c: In function 'get_cpu_idle_time': drivers/macintosh/rack-meter.c:94:9: error: implicit declaration of function 'nsecs_to_cputime64'
Guenter
On Fri, Oct 12, 2018 at 05:20:40AM -0700, Guenter Roeck wrote:
On 10/11/2018 08:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
[preliminary]
powerpc:allmodconfig:
drivers/macintosh/rack-meter.c: In function 'get_cpu_idle_time': drivers/macintosh/rack-meter.c:94:9: error: implicit declaration of function 'nsecs_to_cputime64'
Ugh, this one is messier, let me figure it out...
greg k-h
On Fri, Oct 12, 2018 at 03:52:43PM +0200, Greg Kroah-Hartman wrote:
On Fri, Oct 12, 2018 at 05:20:40AM -0700, Guenter Roeck wrote:
On 10/11/2018 08:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
[preliminary]
powerpc:allmodconfig:
drivers/macintosh/rack-meter.c: In function 'get_cpu_idle_time': drivers/macintosh/rack-meter.c:94:9: error: implicit declaration of function 'nsecs_to_cputime64'
Ugh, this one is messier, let me figure it out...
Ok, I've just dropped the offending patches now and pushed out a -rc2 with that all fixed up. Hopefully that resolves this issue.
thanks,
greg k-h
On Thu, Oct 11, 2018 at 05:35:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.133-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Thanks! Nathan
On Thu, Oct 11, 2018 at 05:35:02PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.133 release. There are 35 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat Oct 13 15:25:09 UTC 2018. Anything received after that time might be too late.
For v4.9.132-33-ge4e7bd48c9de:
Build results: total: 150 pass: 150 fail: 0 Qemu test results: total: 308 pass: 308 fail: 0
Details are available at https://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org